krzjoa / Awesome Python Data Science

Licence: cc-by-4.0
Probably the best curated list of data science software in Python.

Contents

Machine Learning

General Purpose Machine Learning

  • scikit-learn - Machine learning in Python. sklearn
  • Shogun - Machine learning toolbox.
  • xLearn - High Performance, Easy-to-use, and Scalable Machine Learning Package.
  • cuML - RAPIDS Machine Learning Library. sklearn GPU accelerated
  • modAL - Modular active learning framework for Python3. sklearn
  • Sparkit-learn - PySpark + scikit-learn = Sparkit-learn. sklearn Apache Spark based
  • mlpack - A scalable C++ machine learning library (Python bindings).
  • dlib - Toolkit for making real world machine learning and data analysis applications in C++ (Python bindings).
  • MLxtend - Extension and helper modules for Python's data analysis and machine learning libraries. sklearn
  • hyperlearn - 50%+ Faster, 50%+ less RAM usage, GPU support re-written Sklearn, Statsmodels. sklearn PyTorch based/compatible
  • Reproducible Experiment Platform (REP) - Machine Learning toolbox for Humans. sklearn
  • scikit-multilearn - Multi-label classification for python. sklearn
  • seqlearn - Sequence classification toolkit for Python. sklearn
  • pystruct - Simple structured learning framework for Python. sklearn
  • sklearn-expertsys - Highly interpretable classifiers for scikit learn, producing easily understood decision rules instead of black box models. sklearn
  • RuleFit - Implementation of the RuleFit algorithm. sklearn
  • metric-learn - Metric learning algorithms in Python. sklearn
  • pyGAM - Generalized Additive Models in Python.
  • Karate Club - An unsupervised machine learning library for graph structured data.
  • Little Ball of Fur - A library for sampling graph structured data.
  • causalml - Uplift modeling and causal inference with machine learning algorithms. sklearn

Time Series

  • tslearn - Machine learning toolkit dedicated to time-series data. sklearn
  • tick - Module for statistical learning, with a particular emphasis on time-dependent modelling. sklearn
  • Prophet - Automatic Forecasting Procedure.
  • PyFlux - Open source time series library for Python.
  • bayesloop - Probabilistic programming framework that facilitates objective model selection for time-varying parameter models.
  • luminol - Anomaly Detection and Correlation library.
  • dateutil - Powerful extensions to the standard datetime module.
  • maya - Makes it very easy to parse a string and change timezones.

Automated Machine Learning

  • TPOT - Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming. sklearn
  • auto-sklearn - An automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator. sklearn
  • MLBox - A powerful Automated Machine Learning python library.

Ensemble Methods

  • ML-Ensemble - High performance ensemble learning. sklearn
  • Stacking - Simple and useful stacking library, written in Python. sklearn
  • stacked_generalization - Library for machine learning stacking generalization. sklearn
  • vecstack - Python package for stacking (machine learning technique). sklearn

Imbalanced Datasets

  • imbalanced-learn - Module to perform under sampling and over sampling with various techniques. sklearn
  • imbalanced-algorithms - Python-based implementations of algorithms for learning on imbalanced data. sklearn
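
The over-sampling idea mentioned above can be illustrated without any of the listed packages. The following is a minimal sketch of random over-sampling that uses only the standard library. The function name `random_oversample` is hypothetical and is not part of the imbalanced-learn API, which offers more sophisticated techniques.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equally frequent.

    Hypothetical helper for illustration only; not taken from any listed library.
    """
    rng = random.Random(seed)

    # Group the samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Keep every original sample, then draw the extras with replacement.
        extras = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extras:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


X = [[0.1], [0.2], [0.3], [0.4], [5.0]]
y = ["neg", "neg", "neg", "neg", "pos"]
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

After resampling, both classes contain four samples. The single "pos" sample is simply repeated, which is why plain duplication tends to overfit on very small minority classes.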

Random Forests

Extreme Learning Machine

  • Python-ELM - Extreme Learning Machine implementation in Python. sklearn
  • Python Extreme Learning Machine (ELM) - A machine learning technique used for classification/regression tasks.
  • hpelm - High performance implementation of Extreme Learning Machines (fast randomized neural networks). GPU accelerated

Kernel Methods

  • pyFM - Factorization machines in python. sklearn
  • fastFM - A library for Factorization Machines. sklearn
  • tffm - TensorFlow implementation of an arbitrary order Factorization Machine. sklearn
  • liquidSVM - An implementation of SVMs.
  • scikit-rvm - Relevance Vector Machine implementation using the scikit-learn API. sklearn
  • ThunderSVM - A fast SVM Library on GPUs and CPUs. sklearn GPU accelerated

Gradient Boosting

  • XGBoost - Scalable, Portable and Distributed Gradient Boosting. sklearn GPU accelerated
  • LightGBM - A fast, distributed, high performance gradient boosting. sklearn GPU accelerated
  • CatBoost - An open-source gradient boosting on decision trees library. sklearn GPU accelerated
  • ThunderGBM - Fast GBDTs and Random Forests on GPUs. sklearn GPU accelerated

Deep Learning

PyTorch

  • PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration. PyTorch based/compatible
  • torchvision - Datasets, Transforms and Models specific to Computer Vision. PyTorch based/compatible
  • torchtext - Data loaders and abstractions for text and NLP. PyTorch based/compatible
  • torchaudio - An audio library for PyTorch. PyTorch based/compatible
  • ignite - High-level library to help with training neural networks in PyTorch. PyTorch based/compatible
  • PyToune - A Keras-like framework and utilities for PyTorch.
  • skorch - A scikit-learn compatible neural network library that wraps pytorch. sklearn PyTorch based/compatible
  • PyTorchNet - An abstraction to train neural networks. PyTorch based/compatible
  • pytorch_geometric - Geometric Deep Learning Extension Library for PyTorch. PyTorch based/compatible
  • Catalyst - High-level utils for PyTorch DL & RL research. PyTorch based/compatible
  • pytorch_geometric_temporal - Temporal Extension Library for PyTorch Geometric. PyTorch based/compatible

TensorFlow

  • TensorFlow - Computation using data flow graphs for scalable machine learning by Google. sklearn
  • TensorLayer - Deep Learning and Reinforcement Learning Library for Researcher and Engineer. sklearn
  • TFLearn - Deep learning library featuring a higher-level API for TensorFlow. sklearn
  • Sonnet - TensorFlow-based neural network library. sklearn
  • tensorpack - A Neural Net Training Interface on TensorFlow. sklearn
  • Polyaxon - A platform that helps you build, manage and monitor deep learning models. sklearn
  • NeuPy - NeuPy is a Python library for Artificial Neural Networks and Deep Learning (previously: Theano compatible). sklearn
  • tfdeploy - Deploy tensorflow graphs for fast evaluation and export to tensorflow-less environments running numpy. sklearn
  • tensorflow-upstream - TensorFlow ROCm port. sklearn Possible to run on AMD GPU
  • TensorFlow Fold - Deep learning with dynamic computation graphs in TensorFlow. sklearn
  • tensorlm - Wrapper library for text generation / language models at char and word level with RNN. sklearn
  • TensorLight - A high-level framework for TensorFlow. sklearn
  • Mesh TensorFlow - Model Parallelism Made Easier. sklearn
  • Ludwig - A toolbox that lets you train and test deep learning models without writing code. sklearn
  • Keras - A high-level neural networks API running on top of TensorFlow. Keras compatible
  • keras-contrib - Keras community contributions. Keras compatible
  • Hyperas - Keras + Hyperopt: A very simple wrapper for convenient hyperparameter optimization. Keras compatible
  • Elephas - Distributed Deep learning with Keras & Spark. Keras compatible
  • Hera - Train/evaluate a Keras model, get metrics streamed to a dashboard in your browser. Keras compatible
  • Spektral - Deep learning on graphs. Keras compatible
  • qkeras - A quantization deep learning library. Keras compatible

MXNet

  • MXNet - Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler. MXNet based
  • Gluon - A clear, concise, simple yet powerful and efficient API for deep learning (now included in MXNet). MXNet based
  • MXbox - Simple, efficient and flexible vision toolbox for mxnet framework. MXNet based
  • gluon-cv - Provides implementations of the state-of-the-art deep learning models in computer vision. MXNet based
  • gluon-nlp - NLP made easy. MXNet based
  • Xfer - Transfer Learning library for Deep Neural Networks. MXNet based
  • MXNet - HIP Port of MXNet. MXNet based Possible to run on AMD GPU

Others

  • Tangent - Source-to-Source Debuggable Derivatives in Pure Python.
  • autograd - Efficiently computes derivatives of numpy code.
  • Myia - Deep Learning framework (pre-alpha).
  • nnabla - Neural Network Libraries by Sony.
  • Caffe - A fast open framework for deep learning.
  • hipCaffe - The HIP port of Caffe. Possible to run on AMD GPU

DISCONTINUED PROJECTS

Web Scraping

  • BeautifulSoup - The easiest library for beginners to scrape static websites.
  • Scrapy - Fast and extensible scraping library. Lets you write rules and build a customized scraper without touching the core.
  • Selenium - Use the Selenium Python API to access all functionalities of Selenium WebDriver in an intuitive way, like a real user.
  • Pattern - High-level scraping for well-established websites such as Google, Twitter, and Wikipedia. Also has NLP, machine learning algorithms, and visualization.
  • twitterscraper - Efficient library to scrape Twitter.

Data Manipulation

Data Containers

  • pandas - Powerful Python data analysis toolkit.
  • pandas_profiling - Create HTML profiling reports from pandas DataFrame objects
  • cuDF - GPU DataFrame Library. pandas compatible GPU accelerated
  • blaze - NumPy and pandas interface to Big Data. pandas compatible
  • pandasql - Allows you to query pandas DataFrames using SQL syntax. pandas compatible
  • pandas-gbq - pandas Google Big Query. pandas compatible
  • xpandas - Universal 1d/2d data containers with Transformers functionality for data analysis by The Alan Turing Institute.
  • pysparkling - A pure Python implementation of Apache Spark's RDD and DStream interfaces. Apache Spark based
  • Arctic - High performance datastore for time series and tick data.
  • datatable - Data.table for Python. R inspired/ported lib
  • koalas - pandas API on Apache Spark. pandas compatible
  • modin - Speed up your pandas workflows by changing a single line of code. pandas compatible
  • swifter - A package which efficiently applies any function to a pandas dataframe or series in the fastest available manner.
  • pandas_flavor - A package that lets you write your own flavor of pandas easily.
  • pandas-log - A package that provides feedback about basic pandas operations and helps find both business logic and performance issues.
  • vaex - Out-of-Core DataFrames for Python, ML, visualize and explore big tabular data at a billion rows per second.

Pipelines

  • pdpipe - Easy pipelines for pandas DataFrames.
  • SSPipe - Python pipe (|) operator with support for DataFrames and Numpy and Pytorch.
  • pandas-ply - Functional data manipulation for pandas. pandas compatible
  • Dplython - Dplyr for Python. R inspired/ported lib
  • sklearn-pandas - pandas integration with sklearn. sklearn pandas compatible
  • Dataset - Helps you conveniently work with random or sequential batches of your data and define data processing.
  • pyjanitor - Clean APIs for data cleaning. pandas compatible
  • meza - A Python toolkit for processing tabular data.
  • Prodmodel - Build system for data science pipelines.
  • dopanda - Hints and tips for using pandas in an analysis environment. pandas compatible
  • CircleCI - Automates your software builds, tests, and deployments.
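
Several of the pipeline tools above chain processing steps with a pipe-style syntax. The following is a minimal sketch of how a `|` operator can be made to chain plain functions in Python. The `Pipe` class is hypothetical and written only for illustration; it is not how SSPipe, pdpipe, or any other listed library is implemented.

```python
class Pipe:
    """Wrap a one-argument function so it can be applied with `value | Pipe(func)`.

    Hypothetical class for illustration only.
    """

    def __init__(self, func):
        self.func = func

    def __ror__(self, value):
        # Python calls this for `value | self` when the left operand
        # does not handle `|` for a Pipe itself.
        return self.func(value)


rows = [
    {"name": "a", "score": 3},
    {"name": "b", "score": 9},
    {"name": "c", "score": 5},
]

top_names = (
    rows
    | Pipe(lambda rs: [r for r in rs if r["score"] >= 5])  # filter
    | Pipe(lambda rs: sorted(rs, key=lambda r: r["score"], reverse=True))  # sort
    | Pipe(lambda rs: [r["name"] for r in rs])  # project
)
print(top_names)
```

Each step receives the result of the previous one, so the chain reads top to bottom in the order the data flows. The sketch only works when the left operand leaves `|` unhandled for a `Pipe`; objects that implement their own `|` for arbitrary operands would need extra care.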

Feature Engineering

General

  • Featuretools - Automated feature engineering.
  • skl-groups - A scikit-learn addon to operate on set/"group"-based features. sklearn
  • Feature Forge - A set of tools for creating and testing machine learning features. sklearn
  • few - A feature engineering wrapper for sklearn. sklearn
  • scikit-mdr - A sklearn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction. sklearn
  • tsfresh - Automatic extraction of relevant features from time series. sklearn

Feature Selection

  • scikit-feature - Feature selection repository in python.
  • boruta_py - Implementations of the Boruta all-relevant feature selection method. sklearn
  • BoostARoota - A fast xgboost feature selection algorithm. sklearn
  • scikit-rebate - A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for Machine Learning. sklearn

Visualization

General Purposes

  • Matplotlib - Plotting with Python.
  • seaborn - Statistical data visualization using matplotlib.
  • prettyplotlib - Painlessly create beautiful matplotlib plots.
  • python-ternary - Ternary plotting library for python with matplotlib.
  • missingno - Missing data visualization module for Python.
  • chartify - Python library that makes it easy for data scientists to create charts.
  • physt - Improved histograms.

Interactive plots

  • animatplot - A Python package for animating plots, built on matplotlib.
  • plotly - A Python library that makes interactive and publication-quality graphs.
  • Bokeh - Interactive Web Plotting for Python.
  • Altair - Declarative statistical visualization library for Python. Supports many data transformations within the code that creates the graph.
  • bqplot - Plotting library for IPython/Jupyter notebooks

Map

  • folium - Makes it easy to visualize data on an interactive open street map
  • geemap - Python package for interactive mapping with Google Earth Engine (GEE)

Automatic Plotting

  • HoloViews - Stop plotting your data - annotate your data and let it visualize itself.
  • AutoViz - Visualize data automatically with 1 line of code (ideal for machine learning).
  • SweetViz - Visualize and compare datasets, target values and associations, with one line of code.

NLP

  • pyLDAvis - Interactive topic model visualization.

Deployment

  • datapane - A collection of APIs to turn scripts and notebooks into interactive reports.
  • binder - Enables sharing and executing Jupyter Notebooks.
  • fastapi - Modern, fast (high-performance), web framework for building APIs with Python
  • streamlit - Makes it easy to deploy machine learning models.

Model Explanation

  • Shapley - A data-driven framework to quantify the value of classifiers in a machine learning ensemble.
  • Alibi - Algorithms for monitoring and explaining machine learning models.
  • anchor - Code for "High-Precision Model-Agnostic Explanations" paper.
  • aequitas - Bias and Fairness Audit Toolkit.
  • Contrastive Explanation - Contrastive Explanation (Foil Trees). sklearn
  • yellowbrick - Visual analysis and diagnostic tools to facilitate machine learning model selection. sklearn
  • scikit-plot - An intuitive library to add plotting functionality to scikit-learn objects. sklearn
  • shap - A unified approach to explain the output of any machine learning model. sklearn
  • ELI5 - A library for debugging/inspecting machine learning classifiers and explaining their predictions.
  • Lime - Explaining the predictions of any machine learning classifier. sklearn
  • FairML - FairML is a python toolbox auditing the machine learning models for bias. sklearn
  • L2X - Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation.
  • PDPbox - Partial dependence plot toolbox.
  • pyBreakDown - Python implementation of R package breakDown. sklearn R inspired/ported lib
  • PyCEbox - Python Individual Conditional Expectation Plot Toolbox.
  • Skater - Python Library for Model Interpretation.
  • model-analysis - Model analysis tools for TensorFlow. sklearn
  • themis-ml - A library that implements fairness-aware machine learning algorithms. sklearn
  • treeinterpreter - Interpreting scikit-learn's decision tree and random forest predictions. sklearn
  • AI Explainability 360 - Interpretability and explainability of data and machine learning models.
  • Auralisation - Auralisation of learned features in CNN (for audio).
  • CapsNet-Visualization - A visualization of the CapsNet layers to better understand how it works.
  • lucid - A collection of infrastructure and tools for research in neural network interpretability.
  • Netron - Visualizer for deep learning and machine learning models (no Python code, but visualizes models from most Python Deep Learning frameworks).
  • FlashLight - Visualization Tool for your NeuralNetwork.
  • tensorboard-pytorch - Tensorboard for pytorch (and chainer, mxnet, numpy, ...).
  • mxboard - Logging MXNet data for visualization in TensorBoard. MXNet based

Reinforcement Learning

  • OpenAI Gym - A toolkit for developing and comparing reinforcement learning algorithms.
  • Coach - Easy experimentation with state of the art Reinforcement Learning algorithms.
  • garage - A toolkit for reproducible reinforcement learning research.
  • OpenAI Baselines - High-quality implementations of reinforcement learning algorithms.
  • Stable Baselines - A set of improved implementations of reinforcement learning algorithms based on OpenAI Baselines.
  • RLlib - Scalable Reinforcement Learning.
  • Horizon - A platform for Applied Reinforcement Learning.
  • TF-Agents - A library for Reinforcement Learning in TensorFlow. sklearn
  • TensorForce - A TensorFlow library for applied reinforcement learning. sklearn
  • TRFL - TensorFlow Reinforcement Learning. sklearn
  • Dopamine - A research framework for fast prototyping of reinforcement learning algorithms.
  • keras-rl - Deep Reinforcement Learning for Keras. Keras compatible
  • ChainerRL - A deep reinforcement learning library built on top of Chainer.

Probabilistic Methods

  • pomegranate - Probabilistic and graphical models for Python. GPU accelerated
  • pyro - A flexible, scalable deep probabilistic programming library built on PyTorch. PyTorch based/compatible
  • ZhuSuan - Bayesian Deep Learning. sklearn
  • PyMC - Bayesian Stochastic Modelling in Python.
  • PyMC3 - Python package for Bayesian statistical modeling and Probabilistic Machine Learning. Theano compatible
  • sampled - Decorator for reusable models in PyMC3.
  • Edward - A library for probabilistic modeling, inference, and criticism. sklearn
  • InferPy - Deep Probabilistic Modelling Made Easy. sklearn
  • GPflow - Gaussian processes in TensorFlow. sklearn
  • PyStan - Bayesian inference using the No-U-Turn sampler (Python interface).
  • gelato - Bayesian dessert for Lasagne. Theano compatible
  • sklearn-bayes - Python package for Bayesian Machine Learning with scikit-learn API. sklearn
  • skggm - Estimation of general graphical models. sklearn
  • pgmpy - A python library for working with Probabilistic Graphical Models.
  • skpro - Supervised domain-agnostic prediction framework for probabilistic modelling by The Alan Turing Institute. sklearn
  • Aboleth - A bare-bones TensorFlow framework for Bayesian deep learning and Gaussian process approximation. sklearn
  • PtStat - Probabilistic Programming and Statistical Inference in PyTorch. PyTorch based/compatible
  • PyVarInf - Bayesian Deep Learning methods with Variational Inference for PyTorch. PyTorch based/compatible
  • emcee - The Python ensemble sampling toolkit for affine-invariant MCMC.
  • hsmmlearn - A library for hidden semi-Markov models with explicit durations.
  • pyhsmm - Bayesian inference in HSMMs and HMMs.
  • GPyTorch - A highly efficient and modular implementation of Gaussian Processes in PyTorch. PyTorch based/compatible
  • MXFusion - Modular Probabilistic Programming on MXNet. MXNet based
  • sklearn-crfsuite - A scikit-learn inspired API for CRFsuite. sklearn

Genetic Programming

  • gplearn - Genetic Programming in Python. sklearn
  • DEAP - Distributed Evolutionary Algorithms in Python.
  • karoo_gp - A Genetic Programming platform for Python with GPU support. sklearn
  • monkeys - A strongly-typed genetic programming framework for Python.
  • sklearn-genetic - Genetic feature selection module for scikit-learn. sklearn

Optimization

  • Spearmint - Bayesian optimization.
  • BoTorch - Bayesian optimization in PyTorch. PyTorch based/compatible
  • scikit-opt - Heuristic Algorithms for optimization.
  • SMAC3 - Sequential Model-based Algorithm Configuration.
  • Optunity - A library containing various optimizers for hyperparameter tuning.
  • hyperopt - Distributed Asynchronous Hyperparameter Optimization in Python.
  • hyperopt-sklearn - Hyper-parameter optimization for sklearn. sklearn
  • sklearn-deap - Use evolutionary algorithms instead of gridsearch in scikit-learn. sklearn
  • sigopt_sklearn - SigOpt wrappers for scikit-learn methods. sklearn
  • Bayesian Optimization - A Python implementation of global optimization with gaussian processes.
  • SafeOpt - Safe Bayesian Optimization.
  • scikit-optimize - Sequential model-based optimization with a scipy.optimize interface.
  • Solid - A comprehensive gradient-free optimization framework written in Python.
  • PySwarms - A research toolkit for particle swarm optimization in Python.
  • Platypus - A Free and Open Source Python Library for Multiobjective Optimization.
  • GPflowOpt - Bayesian Optimization using GPflow. sklearn
  • POT - Python Optimal Transport library.
  • Talos - Hyperparameter Optimization for Keras Models.
  • nlopt - Library for nonlinear optimization (global and local, constrained or unconstrained).

Natural Language Processing

  • NLTK - Modules, data sets, and tutorials supporting research and development in Natural Language Processing.
  • CLTK - The Classical Language Toolkit.
  • gensim - Topic Modelling for Humans.
  • PSI-Toolkit - A natural language processing toolkit.
  • pyMorfologik - Python binding for Morfologik.
  • skift - Scikit-learn wrappers for Python fastText. sklearn
  • Phonemizer - Simple text to phonemes converter for multiple languages.
  • flair - Very simple framework for state-of-the-art NLP.
  • spaCy - Industrial-Strength Natural Language Processing.

Computer Audition

  • librosa - Python library for audio and music analysis.
  • Yaafe - Audio features extraction.
  • aubio - A library for audio and music analysis.
  • Essentia - Library for audio and music analysis, description and synthesis.
  • LibXtract - A simple, portable, lightweight library of audio feature extraction functions.
  • Marsyas - Music Analysis, Retrieval and Synthesis for Audio Signals.
  • muda - A library for augmenting annotated audio data.
  • madmom - Python audio and music signal processing library.

Computer Vision

  • OpenCV - Open Source Computer Vision Library.
  • scikit-image - Image Processing SciKit (Toolbox for SciPy).
  • imgaug - Image augmentation for machine learning experiments.
  • imgaug_extension - Additional augmentations for imgaug.
  • Augmentor - Image augmentation library in Python for machine learning.
  • albumentations - Fast image augmentation library and easy to use wrapper around other libraries.

Statistics

  • pandas_summary - Extension to pandas dataframes describe function. pandas compatible
  • Pandas Profiling - Create HTML profiling reports from pandas DataFrame objects. pandas compatible
  • statsmodels - Statistical modeling and econometrics in Python.
  • stockstats - Supply a wrapper StockDataFrame based on the pandas.DataFrame with inline stock statistics/indicators support.
  • weightedcalcs - A pandas-based utility to calculate weighted means, medians, distributions, standard deviations, and more.
  • scikit-posthocs - Pairwise Multiple Comparisons Post-hoc Tests.
  • Alphalens - Performance analysis of predictive (alpha) stock factors.

Distributed Computing

  • Horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. sklearn
  • PySpark - Exposes the Spark programming model to Python. Apache Spark based
  • Veles - Distributed machine learning platform.
  • Jubatus - Framework and Library for Distributed Online Machine Learning.
  • DMTK - Microsoft Distributed Machine Learning Toolkit.
  • PaddlePaddle - PArallel Distributed Deep LEarning.
  • dask-ml - Distributed and parallel machine learning. sklearn
  • Distributed - Distributed computation in Python.

Experimentation

  • Sacred - A tool to help you configure, organize, log and reproduce experiments.
  • Xcessiv - A web-based application for quick, scalable, and automated hyperparameter tuning and stacked ensembling.
  • Persimmon - A visual dataflow programming language for sklearn.
  • Ax - Adaptive Experimentation Platform. sklearn
  • Neptune - A lightweight ML experiment tracking, results visualization and management tool.

Evaluation

  • recmetrics - Library of useful metrics and plots for evaluating recommender systems.
  • Metrics - Machine learning evaluation metrics.
  • sklearn-evaluation - Model evaluation made easy: plots, tables and markdown reports. sklearn
  • AI Fairness 360 - Fairness metrics for datasets and ML models, explanations and algorithms to mitigate bias in datasets and models.
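
As a reminder of what such evaluation metrics compute, here is a minimal sketch of precision, recall, and F1 for a binary classifier in plain Python. The function name `precision_recall_f1` and the sample labels are made up for illustration; the listed libraries provide their own, more complete implementations.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class given by `positive`.

    Hypothetical helper for illustration only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this made-up example there are two true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 2/3.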

Computations

  • numpy - The fundamental package needed for scientific computing with Python.
  • Dask - Parallel computing with task scheduling. pandas compatible
  • bottleneck - Fast NumPy array functions written in C.
  • CuPy - NumPy-like API accelerated with CUDA.
  • scikit-tensor - Python library for multilinear algebra and tensor factorizations.
  • numdifftools - Solve automatic numerical differentiation problems in one or more variables.
  • quaternion - Add built-in support for quaternions to numpy.
  • adaptive - Tools for adaptive and parallel sampling of mathematical functions.

Spatial Analysis

  • GeoPandas - Python tools for geographic data. pandas compatible
  • PySal - Python Spatial Analysis Library.

Quantum Computing

  • PennyLane - Quantum machine learning, automatic differentiation, and optimization of hybrid quantum-classical computations.
  • QML - A Python Toolkit for Quantum Machine Learning.

Conversion

  • sklearn-porter - Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
  • ONNX - Open Neural Network Exchange.
  • MMdnn - A set of tools to help users inter-operate among different deep learning frameworks.

Contributing

Contributions are welcome! 😎 Read the contribution guideline.

License

This work is licensed under the Creative Commons Attribution 4.0 International License - CC BY 4.0
