H2o 3
H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
Stars: ✭ 5,656 (+915.44%)
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-80.79%)
Spark Py Notebooks
Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 1,338 (+140.22%)
Scipipe
Robust, flexible and resource-efficient pipelines using Go and the command line
Stars: ✭ 826 (+48.29%)
Nextflow
A DSL for data-driven computational pipelines
Stars: ✭ 1,337 (+140.04%)
Dnai.Editor
Dnai Editor - Visual Scripting (Node Editor)
Stars: ✭ 117 (-78.99%)
Verticapy
VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
Stars: ✭ 59 (-89.41%)
Rsparkling
RSparkling: Use H2O Sparkling Water from R (Spark + R + Machine Learning)
Stars: ✭ 65 (-88.33%)
Koalas
Koalas: pandas API on Apache Spark
Stars: ✭ 3,044 (+446.5%)
Titanoboa
Titanoboa makes complex workflows easy. It is a low-code workflow orchestration platform for JVM - distributed, highly scalable and fault tolerant.
Stars: ✭ 787 (+41.29%)
Just Dashboard
📊 📋 Dashboards using YAML or JSON files
Stars: ✭ 1,511 (+171.27%)
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (-49.19%)
Chigraph
A visual systems language for beginners compiled using LLVM
Stars: ✭ 247 (-55.66%)
Trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Stars: ✭ 4,581 (+722.44%)
Pythondata
Repo for code published on pythondata.com
Stars: ✭ 113 (-79.71%)
Dataflowjavasdk
Google Cloud Dataflow provides a simple, powerful model for building both batch and streaming parallel data processing pipelines.
Stars: ✭ 854 (+53.32%)
Attaca
Robust, distributed version control for large files.
Stars: ✭ 41 (-92.64%)
Dltk
Deep Learning Toolkit for Medical Image Analysis
Stars: ✭ 1,249 (+124.24%)
Vizuka
Explore high-dimensional datasets and how your algo handles specific regions.
Stars: ✭ 100 (-82.05%)
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-72.53%)
Geni
A Clojure dataframe library that runs on Spark
Stars: ✭ 152 (-72.71%)
Prefect
The easiest way to automate your data
Stars: ✭ 7,956 (+1328.37%)
Raftlib
The RaftLib C++ library, streaming/dataflow concurrency via C++ iostream-like operators
Stars: ✭ 717 (+28.73%)
DFiant
DFiant: A Dataflow Hardware Description Language
Stars: ✭ 21 (-96.23%)
flowd
An inter-language runtime for flow-based programming (FBP)
Stars: ✭ 18 (-96.77%)
Data Science Live Book
An open source book to learn data science, data analysis and machine learning, suitable for all ages!
Stars: ✭ 193 (-65.35%)
Data Science Ipython Notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Stars: ✭ 22,048 (+3858.35%)
Griffon Vm
Griffon Data Science Virtual Machine
Stars: ✭ 128 (-77.02%)
Drawflow
Simple flow library 🖥️🖱️
Stars: ✭ 730 (+31.06%)
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Stars: ✭ 137 (-75.4%)
dspatch
The Refreshingly Simple Cross-Platform C++ Dataflow / Pipelining / Stream Processing / Reactive Programming Framework
Stars: ✭ 124 (-77.74%)
act
ACT hardware description language and core tools.
Stars: ✭ 53 (-90.48%)
Pachyderm
Reproducible Data Science at Scale!
Stars: ✭ 5,305 (+852.42%)
Data Science Career
Career Resources for Data Science, Machine Learning, Big Data and Business Analytics Career Repository
Stars: ✭ 630 (+13.11%)
Autodl
Automated Deep Learning without ANY human intervention. 1st Solution for AutoDL [email protected]
Stars: ✭ 854 (+53.32%)
Pretzel
JavaScript full-stack framework for Big Data visualisation and analysis
Stars: ✭ 26 (-95.33%)
Datumbox Framework
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Stars: ✭ 1,063 (+90.84%)
Setl
A simple Spark-powered ETL framework that just works 🍺
Stars: ✭ 79 (-85.82%)
Spark R Notebooks
R on Apache Spark (SparkR) tutorials for Big Data analysis and Machine Learning as IPython / Jupyter notebooks
Stars: ✭ 109 (-80.43%)
Batchflow
BatchFlow helps you conveniently work with random or sequential batches of your data and define data processing and machine learning workflows even for datasets that do not fit into memory.
Stars: ✭ 156 (-71.99%)
Datascience Ai Machinelearning Resources
Alex Castrounis' curated set of resources for artificial intelligence (AI), machine learning, data science, internet of things (IoT), and more.
Stars: ✭ 414 (-25.67%)
Courses
Quizzes & Assignments of Coursera
Stars: ✭ 454 (-18.49%)
Pgm Index
🏅 State-of-the-art learned data structure that enables fast lookup, predecessor, range searches and updates in arrays of billions of items using orders of magnitude less space than traditional indexes
Stars: ✭ 499 (-10.41%)
Course V3
The 3rd edition of course.fast.ai
Stars: ✭ 4,785 (+759.07%)
Snorkel
A system for quickly generating training data with weak supervision
Stars: ✭ 4,953 (+789.23%)
Stream Framework
Stream Framework is a Python library that allows you to build news feeds, activity streams and notification systems using Cassandra and/or Redis. The authors of Stream-Framework also provide a cloud service for feed technology:
Stars: ✭ 4,576 (+721.54%)
Feature Selection
Feature selector based on a self-selected algorithm, loss function and validation method
Stars: ✭ 534 (-4.13%)
Dapy
Easy-to-use data analysis / manipulation framework for humans
Stars: ✭ 523 (-6.1%)
Awesome R
A curated list of awesome R packages, frameworks and software.
Stars: ✭ 4,858 (+772.17%)
Dataframe Go
DataFrames for Go: For statistics, machine-learning, and data manipulation/exploration
Stars: ✭ 487 (-12.57%)
Disk.frame
Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
Stars: ✭ 517 (-7.18%)
Fit Sne
Fast Fourier Transform-accelerated Interpolation-based t-SNE (FIt-SNE)
Stars: ✭ 485 (-12.93%)
Couchdb
Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
Stars: ✭ 5,166 (+827.47%)
Glue
Linked Data Visualizations Across Multiple Files
Stars: ✭ 518 (-7%)
Learn Data Science For Free
This repository is a combination of different resources scattered all over the internet. The reason for making such a repository is to combine all the valuable resources in a sequential manner, so that it helps every beginner who is in search of a free and structured learning resource for Data Science. For Constant Updates Follow me in …
Stars: ✭ 4,757 (+754.04%)
Machine Learning Roadmap
A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.
Stars: ✭ 5,277 (+847.4%)