notebooks: A Docker-based starter kit for machine learning via Jupyter notebooks. Designed for those who just want a runtime environment so they can get on with machine learning. Docker tags:
Stars: ✭ 29 (-50.85%)
dask-pytorch-ddp: A Python package that makes it easy to train PyTorch models on Dask clusters using distributed data parallel.
Stars: ✭ 50 (-15.25%)
Foundations of HPC 2021: This repository collects the materials from the course "Foundations of HPC", 2021, at the Data Science and Scientific Computing Department, University of Trieste.
Stars: ✭ 22 (-62.71%)
blockchain-reading-list: A reading list on blockchain and related technologies, targeted at technical people who want a deep understanding of those topics.
Stars: ✭ 93 (+57.63%)
hp2p: Heavy Peer To Peer, an MPI-based benchmark for network diagnostics
Stars: ✭ 17 (-71.19%)
marsjs: Label images from Unsplash in the browser, using MobileNet on TensorFlow.js
Stars: ✭ 53 (-10.17%)
azurehpc: This repository provides easy automation scripts for building an HPC environment in Azure. It also includes examples for building an end-to-end environment and running some of the key HPC benchmarks and applications.
Stars: ✭ 102 (+72.88%)
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+5059.32%)
high-assurance-legacy: Legacy code connected to the high-assurance implementation of the Ouroboros protocol family
Stars: ✭ 81 (+37.29%)
Every Single Day I Tldr: A daily digest of the articles and videos I've found interesting and want to share with you.
Stars: ✭ 249 (+322.03%)
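leaflet heatmap: A simple visualization of Huzhou call data. Assuming the data volume is too large to draw the heatmap directly in the browser, the heatmap rendering step is moved to offline computation. Apache Spark processes the data in parallel and then renders the heatmap; leafletjs then loads the OpenStreetMap layer and the heatmap layer for good interactivity. Rendering is currently implemented with Apache Spark, but the parallel computation is slower than single-machine computation, possibly because Apache Spark is not well suited to this kind of computation or because I did not design the algorithm well. The Apache Spark heatmap rendering and computation code is at https://github.com/yuanzhaokang/ParallelizeHeatmap.git .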
Stars: ✭ 13 (-77.97%)
Data Accelerator: Data Accelerator for Apache Spark simplifies onboarding to streaming of big data. It offers a rich, easy-to-use experience to help with creation, editing and management of Spark jobs on Azure HDInsight or Databricks while enabling the full power of the Spark engine.
Stars: ✭ 247 (+318.64%)
Quora question pairs NLP Kaggle: Quora Kaggle Competition. Natural language processing using word2vec embeddings, with scikit-learn and xgboost for training
Stars: ✭ 17 (-71.19%)
Neo4j Spark Connector: Neo4j Connector for Apache Spark, which provides bi-directional read/write access to Neo4j from Spark, using the Spark DataSource APIs
Stars: ✭ 245 (+315.25%)
dbcsr: Distributed Block Compressed Sparse Row matrix library
Stars: ✭ 65 (+10.17%)
Recommendationsystem: Book recommender system using collaborative filtering based on Spark
Stars: ✭ 244 (+313.56%)
nebula: A distributed block-based data storage and compute engine
Stars: ✭ 127 (+115.25%)
Hadoop Docker: A Docker-based Hadoop development and test environment, including Hadoop, Hive, HBase, and Spark
Stars: ✭ 238 (+303.39%)
DevOps: DevOps code to deploy eScience services
Stars: ✭ 19 (-67.8%)
fdtd3d: An open source 1D, 2D, 3D FDTD electromagnetics solver with MPI, OpenMP and CUDA support for x86, arm and arm64 architectures
Stars: ✭ 77 (+30.51%)
Mydatascienceportfolio: Applying Data Science and Machine Learning to Solve Real World Business Problems
Stars: ✭ 227 (+284.75%)
Spark Workshop: Apache Spark™ and Scala Workshops
Stars: ✭ 224 (+279.66%)
machine learning: A gentle introduction to machine learning: data handling, linear regression, naive Bayes, clustering
Stars: ✭ 22 (-62.71%)
Sagemaker Spark: A Spark library for Amazon SageMaker.
Stars: ✭ 219 (+271.19%)
spark-word2vec: A parallel implementation of word2vec based on Spark
Stars: ✭ 24 (-59.32%)
Gimel: Big Data Processing Framework - Unified Data API or SQL on Any Storage
Stars: ✭ 216 (+266.1%)
handson-ml: Jupyter notebooks containing the examples and exercises from the book "핸즈온 머신러닝" (Hands-On Machine Learning).
Stars: ✭ 285 (+383.05%)
QCFractal: A distributed compute and database platform for quantum chemistry.
Stars: ✭ 107 (+81.36%)
Spark Knn: k-Nearest Neighbors algorithm on Spark
Stars: ✭ 205 (+247.46%)
clinica: Software platform for clinical neuroimaging studies
Stars: ✭ 153 (+159.32%)
Mmlspark: Simple and Distributed Machine Learning
Stars: ✭ 2,899 (+4813.56%)
model-deployment-flask: 'Deploying machine learning models with a Flask API' tutorial, written for HyperionDev
Stars: ✭ 64 (+8.47%)
Ballista: Distributed compute platform implemented in Rust, and powered by Apache Arrow.
Stars: ✭ 2,274 (+3754.24%)
Galaxy: An asynchronous parallel visualization ray tracer for performant rendering in distributed computing environments. Galaxy builds upon Intel OSPRay and Intel Embree, including ray queueing and sending logic inspired by TACC GraviT.
Stars: ✭ 18 (-69.49%)
distex: Distributed process pool for Python
Stars: ✭ 101 (+71.19%)
Kotlin Spark Api: This project provides Kotlin bindings and several extensions for Apache Spark. We are looking to have this as a part of Apache Spark 3.x.
Stars: ✭ 183 (+210.17%)
PracticalMachineLearning: A collection of ML-related material, including notebooks, code, and a curated list of useful resources such as books and software. Almost everything mentioned here is free (as in speech, not free food) or open source.
Stars: ✭ 60 (+1.69%)
Awesome-Scripts: A collection of awesome scripts from developers around the globe.
Stars: ✭ 135 (+128.81%)
Xsql: Unified SQL Analytics Engine Based on SparkSQL
Stars: ✭ 176 (+198.31%)
data-science-learning: 📊 All of the courses, assignments, exercises, mini-projects and books that I've worked through so far while teaching myself Machine Learning and Data Science.
Stars: ✭ 32 (-45.76%)
Kraps Rpc: An RPC framework leveraging the Spark RPC module
Stars: ✭ 175 (+196.61%)
kdd99-scikit: Solutions to the kdd99 dataset with a decision tree and a neural network using scikit-learn
Stars: ✭ 50 (-15.25%)
Spark Nlp: State of the Art Natural Language Processing
Stars: ✭ 2,518 (+4167.8%)
plinycompute: A system for the development of high-performance, data-intensive, distributed computing applications, tools, and libraries.
Stars: ✭ 27 (-54.24%)
Transmogrifai: TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Stars: ✭ 2,084 (+3432.2%)
awesome-AI-kubernetes: ❄️ 🐳 Awesome tools and libs for AI, Deep Learning, Machine Learning, Computer Vision, Data Science, Data Analytics and Cognitive Computing that are baked in the oven to be native on Kubernetes and Docker with Python, R, Scala, Java, C#, Go, Julia, C++, etc.
Stars: ✭ 95 (+61.02%)
splink: Implementation of Fellegi-Sunter's canonical model of record linkage in Apache Spark, including an EM algorithm to estimate parameters
Stars: ✭ 181 (+206.78%)
pycobra: Python library implementing ensemble methods for regression, classification and visualisation tools including Voronoi tessellations.
Stars: ✭ 111 (+88.14%)
hpc: Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.)
Stars: ✭ 39 (-33.9%)
pyspark-algorithms: PySpark Algorithms Book: https://www.amazon.com/dp/B07X4B2218/ref=sr_1_2
Stars: ✭ 72 (+22.03%)
ml-rest: REST API (and possible UI) for Machine Learning workflows
Stars: ✭ 62 (+5.08%)
spark-stringmetric: Spark functions to run popular phonetic and string matching algorithms
Stars: ✭ 51 (-13.56%)