@ICCV2017: For exploiting second-order statistics, we propose Matrix Power Normalized Covariance pooling (MPN-COV) ConvNets, different from and outperforming those using global average pooling.

Stars: ✭ 63 (-45.22%)

Mutual labels: cuda

LuisaRender

High-Performance Multiple-Backend Renderer Based on LuisaCompute

Stars: ✭ 47 (-59.13%)

Mutual labels: cuda

Wheels

Performance-optimized wheels for TensorFlow (SSE, AVX, FMA, XLA, MPI)

Stars: ✭ 891 (+674.78%)

Mutual labels: cuda

docker python-opencv-ffmpeg

Dockerfile containing FFmpeg, OpenCV4 and Python2/3, based on Ubuntu LTS

Stars: ✭ 38 (-66.96%)

Mutual labels: cuda

Tensorflow Object Detection Tutorial

The purpose of this tutorial is to learn how to install and prepare the TensorFlow framework to train your own convolutional neural network object detection classifier for multiple objects, starting from scratch

Stars: ✭ 113 (-1.74%)

Mutual labels: cuda

PyTorchTOP

GPU PyTorch TOP in TouchDesigner with CUDA-enabled OpenCV

Stars: ✭ 58 (-49.57%)

Mutual labels: cuda

Ddsh Tip2018

Source code for the paper "Deep Discrete Supervised Hashing"

Stars: ✭ 16 (-86.09%)

Mutual labels: cuda

Fbtt Embedding

This is a Tensor Train based compression library for sparse embedding tables used in large-scale machine learning models, such as recommendation and natural language processing models. We showed that this library can reduce the total model size by up to 100x in Facebook’s open-sourced DLRM model while achieving the same model quality. Our implementation is faster than the state-of-the-art implementations. Existing state-of-the-art libraries also decompress the whole embedding table on the fly, so they do not reduce memory use during training. Our library decompresses only the requested rows and can therefore provide a 10,000-times memory footprint reduction per embedding table. The library also includes a software cache that stores a portion of the table entries in decompressed format for faster lookup and processing.

Stars: ✭ 92 (-20%)

Mutual labels: cuda

Cuda Design Patterns

Some CUDA design patterns and a bit of template magic for CUDA

Stars: ✭ 78 (-32.17%)

Mutual labels: cuda

Qualia2.0

Qualia is a deep learning framework deeply integrated with automatic differentiation and dynamic graphing with CUDA acceleration. Qualia was built from scratch.

Stars: ✭ 41 (-64.35%)

Mutual labels: cuda

Icpcuda

Super fast implementation of ICP in CUDA for devices of compute capability 3.5 or higher

Stars: ✭ 416 (+261.74%)

Mutual labels: cuda

Ggnn

GGNN: State of the Art Graph-based GPU Nearest Neighbor Search

Stars: ✭ 63 (-45.22%)

Mutual labels: cuda

desert

A fast (?) random sampling drawing library

Stars: ✭ 61 (-46.96%)

Mutual labels: cuda

Libcudarange

An interval arithmetic and affine arithmetic library for NVIDIA CUDA

Stars: ✭ 5 (-95.65%)

Mutual labels: cuda

opencv-cuda-docker

Dockerfiles for OpenCV compiled with CUDA, opencv_contrib modules and Python 3 bindings

Stars: ✭ 55 (-52.17%)

Mutual labels: cuda

Deep Learning With Cats

Deep learning with cats (^._.^)

Stars: ✭ 1,290 (+1021.74%)

Mutual labels: cuda

CPP-Programming

Various C/C++ examples. DirectX, OpenGL, CUDA, Vulkan, OpenCL.

Stars: ✭ 30 (-73.91%)

Mutual labels: cuda

Scikit Cuda

Python interface to GPU-powered libraries

Stars: ✭ 803 (+598.26%)

Mutual labels: cuda

Gdax Orderbook Ml

Application of machine learning to the Coinbase (GDAX) orderbook

Stars: ✭ 60 (-47.83%)

Mutual labels: cuda

tiny-cuda-nn

Lightning fast & tiny C++/CUDA neural network framework

Stars: ✭ 908 (+689.57%)

Mutual labels: cuda

Blocksparse

Efficient GPU kernels for block-sparse matrix multiplication and convolution

Stars: ✭ 797 (+593.04%)

Mutual labels: cuda

Deepnet

Deep.Net machine learning framework for F#

Stars: ✭ 99 (-13.91%)

Mutual labels: cuda

H2o4gpu

H2Oai GPU Edition

Stars: ✭ 416 (+261.74%)

Mutual labels: cuda

octotiger

Astrophysics program simulating the evolution of star systems based on the fast multipole method on adaptive Octrees

Stars: ✭ 30 (-73.91%)

Mutual labels: cuda

Pycuda

CUDA integration for Python, plus shiny features

Stars: ✭ 1,112 (+866.96%)

Mutual labels: cuda

ThrustRTC

CUDA tool set for non-C++ languages that provides functionality similar to Thrust, with NVRTC at its core.

Stars: ✭ 41 (-64.35%)

Mutual labels: cuda

Numba

NumPy aware dynamic Python compiler using LLVM

Stars: ✭ 7,090 (+6065.22%)

Mutual labels: cuda

SimNDT

Ultrasonic NDT Simulator with engine core based on the Elastodynamic Finite Integration Technique (EFIT)

Stars: ✭ 34 (-70.43%)

Mutual labels: opencl

Minhashcuda

Weighted MinHash implementation on CUDA (multi-gpu).

Stars: ✭ 88 (-23.48%)

Mutual labels: cuda

Sixtyfour

How fast can we brute force a 64-bit comparison?

Stars: ✭ 41 (-64.35%)

Mutual labels: cuda

Deformable Convolution Pytorch

PyTorch implementation of Deformable Convolution

Stars: ✭ 410 (+256.52%)

Mutual labels: cuda

Tf Coriander

OpenCL 1.2 implementation for Tensorflow

Stars: ✭ 775 (+573.91%)

Mutual labels: opencl

Pytorch Baidu Ctc

PyTorch bindings for Baidu's Warp-CTC

Stars: ✭ 61 (-46.96%)

Mutual labels: cuda

warp

Continuous-energy Monte Carlo neutron transport in general geometries on GPUs

Stars: ✭ 27 (-76.52%)

Mutual labels: cuda

Accelerate

Embedded language for high-performance array computations

Stars: ✭ 751 (+553.04%)

Mutual labels: cuda

Torch Mesh Isect

Stars: ✭ 107 (-6.96%)

Mutual labels: cuda

Tensorrt tutorial

Stars: ✭ 407 (+253.91%)

Mutual labels: cuda

bazel.cmake

bazel.cmake mimics the behavior of bazel to simplify the usability of CMake

Stars: ✭ 38 (-66.96%)

Mutual labels: cuda

Kintinuous

Real-time large scale dense visual SLAM system

Stars: ✭ 740 (+543.48%)

Mutual labels: cuda

Jampack

Experimental parallel compression algorithm

Stars: ✭ 21 (-81.74%)

Mutual labels: cuda

Flattened Cnn

Flattened convolutional neural networks (1D convolution modules for Torch nn)

Stars: ✭ 59 (-48.7%)

Mutual labels: cuda

Gunrock

High-Performance Graph Primitives on GPUs

Stars: ✭ 718 (+524.35%)

Mutual labels: cuda

Hiop

HPC solver for nonlinear optimization problems

Stars: ✭ 75 (-34.78%)

Mutual labels: cuda

Octree Slam

Large octree map construction and rendering with CUDA and OpenGL

Stars: ✭ 40 (-65.22%)

Mutual labels: cuda

Ai Lab

All-in-one AI container for rapid prototyping

Stars: ✭ 406 (+253.04%)

Mutual labels: cuda

ClothTOP

GPU-accelerated Cloth TOP node for TouchDesigner using the NVIDIA Flex physics solver.

Stars: ✭ 33 (-71.3%)

Mutual labels: cuda

Deep Learning Boot Camp

A community-run, 5-day PyTorch Deep Learning Bootcamp

Stars: ✭ 1,270 (+1004.35%)

Mutual labels: cuda

Warp Ctc

Fast parallel CTC.

Stars: ✭ 3,954 (+3338.26%)

Mutual labels: cuda

Nbody

N-body gravitational attraction problem solver

Stars: ✭ 40 (-65.22%)

Mutual labels: cuda

Gocv

Go package for computer vision using OpenCV 4 and beyond.

Stars: ✭ 4,511 (+3822.61%)

Mutual labels: cuda

Ios ml

List of Machine Learning, AI, NLP solutions for iOS. The most recent version of this article can be found on my blog.

Stars: ✭ 1,409 (+1125.22%)

Mutual labels: gpgpu

Neural Api

CAI NEURAL API - Pascal based neural network API optimized for AVX, AVX2 and AVX512 instruction sets plus OpenCL capable devices including AMD, Intel and NVIDIA.

Stars: ✭ 94 (-18.26%)

Mutual labels: opencl

Bytecoder

Rich Domain Model for JVM Bytecode and Framework to interpret and transpile it.

Stars: ✭ 401 (+248.7%)

Mutual labels: opencl

Style Feature Reshuffle

Caffe implementation of "Arbitrary Style Transfer with Deep Feature Reshuffle"

Stars: ✭ 38 (-66.96%)

Mutual labels: cuda

301-360 of 583 similar projects
