H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+13695.12%)
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (+212.2%)
Verticapy
VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (+43.9%)
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (+273.17%)
S3git
s3git: git for Cloud Storage. Distributed Version Control for Data. Create decentralized and versioned repos that scale infinitely to 100s of millions of files. Clone huge PB-scale repos on your local SSD to make changes, commit and push back. Oh yeah, it dedupes too and offers directory versioning.
Stars: ✭ 1,287 (+3039.02%)
Vizuka
Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (+143.9%)
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (+160.98%)
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (+909.76%)
Courses
Quizzes and assignments from Coursera
Stars: ✭ 454 (+1007.32%)
Pachyderm
Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+12839.02%)
Maze
Maze Applied Reinforcement Learning Framework
Stars: ✭ 85 (+107.32%)
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+3163.41%)
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+3585.37%)
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (+58.54%)
nebula
A distributed, fast open-source graph database featuring horizontal scalability and high availability
Stars: ✭ 8,196 (+19890.24%)
Datmo
Open source production model management tool for data scientists
Stars: ✭ 334 (+714.63%)
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+11073.17%)
Data Science Career
Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (+1436.59%)
Scanner
Efficient video analysis at scale
Stars: ✭ 569 (+1287.8%)
Titanoboa
Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (+1819.51%)
Plynx
PLynx is a domain agnostic platform for managing reproducible experiments and data-oriented workflows.
Stars: ✭ 192 (+368.29%)
Autodl
Automated Deep Learning without ANY human intervention. 1st solution for AutoDL [email protected]
Stars: ✭ 854 (+1982.93%)
Nipype
Workflows and interfaces for neuroimaging packages
Stars: ✭ 557 (+1258.54%)
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+53675.61%)
Datumbox Framework
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Stars: ✭ 1,063 (+2492.68%)
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (+590.24%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (+92.68%)
Mlbox
MLBox is a powerful Automated Machine Learning python library.
Stars: ✭ 1,199 (+2824.39%)
Nni
An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Stars: ✭ 10,698 (+25992.68%)
Pythondata
Repo for code published on pythondata.com
Stars: ✭ 113 (+175.61%)
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (+165.85%)
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (+234.15%)
Crate
CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of data in real-time.
Stars: ✭ 3,254 (+7836.59%)
Ray
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Stars: ✭ 18,547 (+45136.59%)
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+7324.39%)
dxram
A distributed in-memory key-value storage for billions of small objects.
Stars: ✭ 25 (-39.02%)
Data Science Live Book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (+370.73%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (+270.73%)
Hazelcast
Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+11270.73%)
Pretzel
JavaScript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-36.59%)
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+1982.93%)
Nsfw Filter
🚀 A Google Chrome / Firefox extension that blocks NSFW images from the web pages that you load using TensorFlow JS.
Stars: ✭ 984 (+2300%)
Weidentity
A blockchain-based distributed identity solution that complies with the W3C DID and Verifiable Credential specifications
Stars: ✭ 972 (+2270.73%)
Python Training
Python training for business analysts and traders
Stars: ✭ 972 (+2270.73%)
Data Polygamy
Data Polygamy is a topology-based framework that allows users to query for statistically significant relationships between spatio-temporal data sets.
Stars: ✭ 39 (-4.88%)
Esper Tv
Esper instance for TV news analysis
Stars: ✭ 37 (-9.76%)
Feagen
(deprecated) A fast and memory-efficient Python data engineering framework for machine learning.
Stars: ✭ 33 (-19.51%)
Paralleldist
R Package: Parallel Distance Matrix Computation using Multiple Threads
Stars: ✭ 37 (-9.76%)
Mljar Supervised
Automated Machine Learning Pipeline with Feature Engineering and Hyper-Parameters Tuning 🚀
Stars: ✭ 961 (+2243.9%)