Top 1642 data-science open source projects

Drake
An R-focused pipeline toolkit for reproducibility and high-performance computing
Starcraft2 Replay Analysis
A Jupyter notebook that provides analysis for StarCraft 2 replays
Pymc Example Project
Example PyMC3 project for performing Bayesian data analysis using a probabilistic programming approach to machine learning.
Epfl
EPFL summaries & cheatsheets over 5 years (computer science, communication systems, data science and computational neuroscience).
Daily Coding Problem
A series of problems 💯 and solutions ✅ from the Daily Coding Problem 👨‍🎓 website.
Vvedenie Mashinnoe Obuchenie
📝 A collection of machine learning resources
Stocker
Financial Web Scraper & Sentiment Classifier
Kaggle Competitions
Plenty of courses and tutorials can help you learn machine learning from scratch, but in this GitHub repository I want to solve some Kaggle competitions as a comprehensive workflow with Python packages. After reading, you can use this workflow as a template for solving other real problems.
Aile
Automatic Item List Extraction
Topic Modeling Tool
A point-and-click tool for creating and analyzing topic models produced by MALLET.
Pymrmr
Python3 binding to mRMR Feature Selection algorithm (currently not maintained)
R Text Data
List of textual data sources to be used for text mining in R
Sortingalgorithm.hayateshiki
Hayate-Shiki is an improved merge sort algorithm with the goal of "faster than quick sort".
Xcessiv
A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling in Python.
Conferences
List of Machine Learning & Data Science Conferences
Flyte
Accelerate your ML and data workflows to production. Flyte is a production-grade orchestration system for your data and ML workloads. It has been battle-tested at Lyft, Spotify, Freenome, and others, and is truly open source.
Dex
Dex: The Data Explorer. A data visualization tool written in Java/Groovy/JavaFX, capable of powerful ETL and of publishing web visualizations.
Gopup
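Data APIs: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; qianlima (high-potential) and unicorn companies; Xinwen Lianbo news transcripts; film and TV box office data; university lists; epidemic data…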
Pydepta
A Python implementation of DEPTA
Phormatics
Using A.I. and computer vision to build a virtual personal fitness trainer. (Most Startup-Viable Hack - HackNYU2018)
Sayn
Data processing and modelling framework for automating tasks (incl. Python & SQL transformations).
Tsv Utils
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Notebooks
A collection of Jupyter/IPython notebooks
Learning python
Source material for Python Like You Mean It
Covid19 Dashboard
A site that displays up-to-date COVID-19 stats, powered by fastpages.
Hyperlearn
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Neuralpy
NeuralPy: A Keras-like deep learning library that works on top of PyTorch
Uc R.github.io
Main repository for R programming courses at the University of Cincinnati: courses and tutorials that focus on data wrangling, exploration, visualization, and analysis with R.
Suspeitando
Project for analyzing contracts suspected of overbilling and poor quality of service delivery.
Oswitch
Provides access to complex Bioinformatics software (even BioLinux!) in just one command.
Kbet
An R package to test for batch effects in high-dimensional single-cell RNA sequencing data.
Tsrepr
TSrepr: R package for time series representations
Tgcontest
Telegram Data Clustering contest solution by Mindful Squirrel
Dream3d
Data analysis program and framework for materials science data analytics, based on the SIMPL managing framework.
Magicbox
A platform that uses real-time data to inform life-saving humanitarian responses to emergency situations
Gorilla Notebook
A Clojure/ClojureScript notebook application and library based on Gorilla-REPL
Permon
A tool to monitor everything you want. Clean, simple, extensible and in one place.
Asne
A sparsity-aware and memory-efficient implementation of "Attributed Social Network Embedding" (TKDE 2018).
Budgetml
Deploy an ML inference service on a budget in less than 10 lines of code.
Data Science Roadmap
Roadmap to learn Data Science and related areas.