Benchm Ml: A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Labs: Labs for the Foundations of Applied Mathematics curriculum.
Ml Workspace: 🛠 All-in-one web-based IDE specialized for machine learning and data science.
Testovoe: Home assignments for data science positions.
Project kojak: Training a neural network to detect gestures and control smart home devices with OpenCV in Python.
Ml Hub: 🧰 Multi-user development platform for machine learning teams. Simple to set up within minutes.
Nyc Transport: A unified database of NYC transport (subway, taxi/Uber, and Citi Bike) data.
Evalml: EvalML is an AutoML library written in Python.
Datacompy: Pandas and Spark DataFrame comparison for humans.
Pycwt: A Python module for continuous wavelet spectral analysis. It includes a collection of routines for wavelet transform and statistical analysis via the FFT algorithm. The module also includes cross-wavelet transforms, wavelet coherence tests and sample scripts.
Fantasy Basketball: Scraping statistics, predicting NBA player performance with neural networks and boosting algorithms, and optimising lineups for Draft Kings with a genetic algorithm. Capstone project for the Machine Learning Engineer Nanodegree by Udacity.
Selfie2anime: Anime2Selfie Backend Services - Lambda, Queue, API Gateway and traffic processing.
Docker tutorial: Code and helper scripts for the Medium article "How Docker Can Help You Become A More Effective Data Scientist".
Textbook: Principles and Techniques of Data Science, the textbook for Data 100 at UC Berkeley.
Py Rse: Research Software Engineering with Python course material.
Tscv: Time Series Cross-Validation, an extension for scikit-learn.
Bodywork Core: Deploy machine learning projects developed in Python to Kubernetes. Accelerated MLOps 🚀
Scalable Data Science: Scalable Data Science, course sets in big data using Apache Spark on Databricks, and their mathematical, statistical and computational foundations using SageMath.
Stumpy: STUMPY is a powerful and scalable Python library for modern time series analysis.
Raspberryturk: The Raspberry Turk is a robot that can play chess. It is entirely open source, based on Raspberry Pi, and inspired by the 18th-century chess-playing machine, the Mechanical Turk.
Book: This book serves as an introduction to a whole new way of thinking systematically about geographic data, using geographical analysis and computation to unlock new insights hidden within data.
Matrixprofile: A Python 3 library that makes time series data mining tasks using matrix profile algorithms accessible to everyone.
Nlpaug: Data augmentation for NLP.
Ripser.py: A Lean Persistent Homology Library for Python.
Toma: Helps you write algorithms in PyTorch that adapt to the available (CUDA) memory.
Datasciencecoursera: Data Science repo and blog for Johns Hopkins Coursera courses. Please let me know if you have any questions.
Traffic: A toolbox for processing and analysing air traffic data.
Scilab: Free and open source software for numerical computation, providing a powerful computing environment for engineering and scientific applications.
Machine Learning And Data Science: This repository contains all my work related to Machine Learning, AI and Data Science. It includes my graduate projects, machine learning competition code, algorithm implementations and reading material.
Accelerator: The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Pandasschema: A validation library for Pandas data frames using user-friendly schemas.
Qlik Py Tools: Data Science algorithms for Qlik implemented as a Python Server Side Extension (SSE).
Beyond Jupyter: 🐍💻📊 All material from the PyCon.DE 2018 talk "Beyond Jupyter Notebooks - Building your own data science platform with Python & Docker" (incl. slides, video, Udemy MOOC and other references).
Blockchain2graph: Blockchain2graph extracts blockchain data (bitcoin) and inserts it into a graph database (neo4j).
Accelerators: Data science and AI solution accelerator suite that provides templates for prototyping, reporting, and presenting data science analytics for specific domains.