I have gathered here only a bouquet of other people's flowers, having supplied nothing of my own but the thread that binds them.

My Machine Learning-related stuff!

My Apache Zeppelin and Jupyter notebooks, and more: a collection of useful data analysis and machine learning material in general

ML Resources (with an emphasis on Python)

This document is an attempt at a curated list of Machine Learning resources, including books, papers, software, libraries, notebooks, etc. Most of the libraries are for Python, though the rest of the materials here are generally suited for working with data.

Books and Writings

Foundations of Machine Learning: I strongly suggest reading this book
Readings in Database Systems (The Red Book): I strongly suggest reading this book
A Course in Machine Learning: Good book to start learning ML
Mining Massive Datasets: Great book about Big Data concepts, Data Mining algorithms, and their applications
Networks, Crowds, and Markets: Reasoning About a Highly Connected World: Good starter book for Network Science and its applications (e.g. graph analysis, social network analysis)
An Introduction to Statistical Learning
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Arxiv.org/ML
Python Machine Learning
Python Data Science Handbook
Whirlwind Tour Of Python: Good starter book for learning Python
Python Machine Learning (second edition)
Deep Learning Book (MIT Press)
Probability and Statistics Cookbook
An ML Cheat Sheet
Hand-book on STATISTICAL DISTRIBUTIONS for experimentalists
Deep Learning Papernotes: A repository of many of the research papers published about various DL-related topics over the years
NRL Papers: A great collection of papers on Network Representation Learning and Network Embedding
KRL Papers: A nice collection of papers on Knowledge Representation Learning and Knowledge Embedding
Stanford CS 229 ML Cheatsheets: A nice collection of ML cheat-sheets on various important subject matters
Machine Learning for Business: Machine Learning for Business teaches you how to make your company more automated, productive, and competitive by mastering practical, implementable machine learning techniques and tools
A gentle introduction to Tensors and their uses: An introduction to Tensors and their sample applications
Linear Algebra course book: Jim Hefferon's Linear Algebra book, a good companion for learning linear algebra fundamentals
Top 10 Data Mining Algorithms: A good article describing how 10 of the more famous Data Mining algorithms work
Representation Learning: A Review and New Perspectives: An excellent introduction to Representation Learning and its implications
NLTK Book: A great book if you want to process and analyze texts with NLTK
An Introduction to Variable and Feature Selection
Machine Learning Workflow with Python: A collection of useful ML-related stuff for people interested in working with the data
Interpretable Machine Learning
GNNPapers: A collection of research papers on Graph Neural Networks
Mining Social Media: The web version of an easy-to-follow introductory book for mining social media data
The Economist data visualization: A set of articles describing how the Economist uses data visualization
Text Mining with R - A tidy approach
Explanatory Model Analysis: Explore, Explain and Examine Predictive Models
Graph Representation Learning
Official Matplotlib cheat sheets
Data Mining and Machine Learning: Fundamental Concepts and Algorithms, Second Edition
Dive into Deep Learning: "Interactive deep learning book with code, math, and discussions" -- its website

Dataset Repositories

UCI Machine Learning Repository: Lots of exciting datasets piled up just for you to use!
Kaggle: A very active community, a great place to learn from others
Network Repository: Many network/graph datasets. If you like graphs, it's the place for you!
Deep Learning Datasets: DL datasets, of course!
MLDatasets: Another nice dataset repository
Open Data for Deep Learning: Deep means big here, I guess!
Wikipedia List of Datasets for Machine Learning Research: It's Wikipedia!
GHTorrent: The GHTorrent project is an attempt to make an offline, queryable mirror of GitHub projects' data available to everyone
SOTorrent: A very rich dataset of Stack Overflow posts and related content such as post comments
Datalist.com: A handy list of ML-related datasets from all over the web
awesome-twitter-data: An extensive collection of datasets from Twitter's data
Dataset for Graph classification: A collection of datasets for classification on graphs
Google's dataset search

Q&A Websites

Quora Data Science: A superb place to ask and seek answers!
Stack Exchange Data Science: Another friendly Q&A community with an emphasis on the technical side
Kaggle: Kaggle again:)
Quora Machine Learning: Quora again:)
Stack Overflow: General Q&A for developers who need help with their code

Useful Websites

Kaggle: Kaggle again:)
Reddit Machine Learning Community
CrowdAI: A Kaggle alternative, popular among students
Quora: Quora Q&A platform
Github.com: GitHub contains many valuable resources, such as the code for many algorithms, all in one centralized platform!
Apache Projects: A few hundred cool software projects, many of them related to data management in some way (e.g. the Hadoop ecosystem)
Stanford Machine Learning Course (Have a look at the project section!)
NIPS Website: Website of NIPS (now NeurIPS), a very prestigious AI conference held every year
Scipy Lectures
Nice website about Data Mining
ML Resources on Github
A list of research on a few interesting topics
Open Machine Learning Course: An ML course covering so many topics
Tanagra - Data Mining and Data Science Tutorials: TANAGRA's tutorials cover a vast amount of topics
Papers with code: A convenient repository of research papers published together with their code; you can find the code for many recent cutting-edge algorithms here
Twitter datasets: A list of datasets related to the social platform Twitter
KDnuggets
Deepnote
Google Colab
DeepLearning.com: A website for resources related to everything DL
Data Science blogs: A curated list of Data Science blogs
PaddleHub: PaddleHub is a large repository of useful pre-trained ML models

Tutorials & Courses

NLP-Notebooks: A collection of notebooks covering conventional NLP tasks such as word embeddings, text classification, etc.
2021 DeepMind x UCL Reinforcement Learning Lecture Series: Video lectures from DeepMind covering the area of Reinforcement Learning
Stanford Graph Learning Workshop: Recordings of the Stanford Graph Learning Workshop sessions from September 2021
Machine Learning for Beginners - A Curriculum
Introductory Machine Learning Course: An accessible machine learning course taught by Professor Yaser Abu-Mostafa from Caltech
The Hugging Face Deep Reinforcement Learning Class
Google Machine Learning Education: A series of courses that cover machine learning fundamentals and core concepts

Editors & IDEs for Python

Spyder: A great Python IDE for scientists in general
PyCharm CE: An excellent IDE for the development of anything with Python
GNU Emacs: GNU Emacs is an environment for doing almost anything
IDLE: Default Python IDE, lean and clean environment to develop in Python
Rodeo: A Python IDE for data scientists

Toolboxes & Software Distributions

Anaconda: A very user-friendly environment for scientific Python development
Miniconda
Vowpal Wabbit
StackNet
Sofia ML
LIBLINEAR
LibFM
SVM Rank

Notebook Authoring Environments

Jupyter
Apache Zeppelin: A great notebook environment for data visualization and analytics; it can connect to many different databases and data management systems
Beaker
nteract
JupyterLab: Next-generation Jupyter notebook environment
Spark Notebook: Spark Notebook is an interactive notebook authoring environment for working with Scala code on top of Spark clusters
Python(x,y): Python(x,y) is an open-source environment for scientific and numerical computations and analysis
Polynote: A notebook authoring tool with native support for Scala on Spark from Netflix

Python Machine Learning, Data Mining, Statistical Analysis Libraries

Pandas: Python's famous data manipulation library
Scipy: Python's de facto scientific computation library
Numpy: N-dimensional array and linear algebra library for fast numerical computation
Scikit Learn: High-level Machine Learning library with tons of features, very easy to use and extensible
Bokeh: An interactive high-level data visualization library
Matplotlib: A compelling data visualization library, more low-level than other visualization libraries
Graph Tool: A fast and powerful library for working with graphs in Python. It's developed on top of the Boost C++ libraries, so it's very efficient
NetworkX: A Python module for complex network modelling and analysis; very easy to use but can be slow at times because it's written in pure Python
TensorFlow: Low-level library for creating deep artificial neural networks that works on both CPU and GPU. Usually, you use TF together with a higher-level API, such as Keras, that exposes TF's functionality
Keras: "Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano" - Keras's website
NLTK: Swiss Army knife tool for text processing in Python
Pattern: Another good text processing library for Python
IPython
Orange: Orange is a general-purpose data mining and analysis tool, and also a library, that lets you develop machine learning pipelines with just a bit of dragging and dropping
Theano
CatBoost: Yandex's implementation of Gradient Boosting on Decision Trees. It supports categorical features out of the box
XGBoost: The original XGBoost library, a very efficient Gradient Boosting library with extra regularisation
Mlxtend: A great Data Mining and Machine Learning library
NetworKit: A very high-performance graph processing and analysis toolkit written in C++ that uses OpenMP, so it is very fast on multicore computers
Eli5
Pandasql
Dask: A fast data manipulation library with out-of-core handling of data, suited for distributed environments; its DataFrame API mirrors a large subset of the Pandas API
MLBox
Gensim
Scikit-learn-Contrib/Imbalanced-learn: An extension library for Scikit-learn for handling imbalanced datasets
Patsy: "Kamelot!!! ... It's just a model Shhhh!"
Statsmodels: A Python package for building various statistical models
Seaborn: A high-level visualization library for Python
Pandas-profiling
Blaze
Altair
Numba
BigARTM
GYM: An open-source toolkit for reinforcement learning from OpenAI
PyBrain: A Machine Learning library for Python with emphasis on modelling via many types of neural network architectures
Sklearn-pandas
Auto-ML
Scikit-Learn Contrib/Lightning: An extension library to Scikit-learn for large-scale linear classification, regression, and ranking problems
GPLearn
Nengo
Scikit-learn Contrib/*: A collection of extension libraries for Scikit-learn adding new (missing) functionalities to it
Koolmogorov: A Python library for hierarchical clustering and visualization
Lime: A tool for exploring and explaining the output of classifiers
TreeInterpreter
SNAP-Python: Python wrapper library for Stanford Network Analysis Platform (SNAP)
Pycobra: A Python library implementing ensemble methods for regression, classification, and visualization tools, including Voronoi tessellations
TF Learn: A library on top of TensorFlow providing a higher-level API than TensorFlow
Featuretools: A Python library for automated feature engineering
spaCy: NLP library with tons of features (like various CNN models)
SymPy: Symbolic computation library for Python, aiming to become a full-fledged CAS
Uniform Manifold Approximation and Projection: A general non-linear dimensionality reduction algorithm implemented in Python
Scikit-learn Contrib/HDBSCAN: A high-performance implementation of HDBSCAN, a robust and easy-to-use clustering algorithm with minimal parameters that is ideal for exploratory data analysis; it works as an extension to Scikit-learn
Turi Create: A fast tool/library for simplifying various ML tasks
Scikit-learn-Contrib/Categorical-Encoding: An extension library for Scikit-learn that provides additional categorical feature encoding schemes (e.g. LeaveOneOut scheme)
Optunity: A library for hyperparameter optimization
Kmodes
TF-Slim
Pyro: "Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend" - Pyro's website
GEM: A Python library that provides various graph embedding methods like 'node2vec' and 'locally linear embedding.'
DynamicGEM: A dynamic graph embedding library like GEM
GraphSAGE: A graph embedding framework to generate low-dimensional vector representations for nodes, instrumental if you need to use deep learning on graph data
Horovod: A distributed training framework for TensorFlow, Keras, and PyTorch by Uber
NetLSD: Python implementation of NetLSD, a scalable graph embedding algorithm for representing a graph via a low-dimensional vector
SHAP: A tool for exploring and explaining the outcome of an arbitrary model
NLPre: Another fantastic Python NLP library
GCN: Python implementation of graph convolutional networks in TensorFlow
AllenNLP: "An open-source NLP research library, built on PyTorch" - AllenNLP's repository documentation
TensorLy: A Python Library for efficient Tensor operations
CuPy: A Python array library accelerated with NVIDIA CUDA; its API is largely compatible with Numpy's
Scikit-Multiflow: A Python library for Stream Mining
MLflow: A software toolbox to manage the workflow and life-cycle of ML projects; it aims to make ML software projects easier to implement by providing helper components for each step
pyGAM: A Python module for building Generalized Additive Models (GAMs)
ggplot: "ggplot is a plotting system for Python based on R's ggplot2 and the Grammar of Graphics. It is built for making professionally-looking plots quickly with minimal code" - ggplot's website
Linkpred: A Python package for link prediction on graphs
SparklingGraph: A Python library to process large-scale graphs using Spark and GraphX in a distributed manner
OpenNE: An open-source network embedding library
Galry: A high-performance visualization library in Python
Dedupe: A Python library for fuzzy entity resolution and record deduplication
PyText: A deep-learning-based NLP modelling framework built on top of PyTorch
flair: A state-of-the-art NLP framework in Python from Zalando
NearPy: "A Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive hashes," according to its descriptions
fastchunking: A (fast) text chunking algorithm implemented in C++ and Python
Vaex: Vaex is a data manipulation library much like Pandas and Dask, with a lazy out-of-core approach to handling data, so you can use it to work with huge tables
openTSNE: An extensible, parallel implementation of t-SNE
Faust: A stream processing library for Python
Active Semi-Supervised Clustering: An extension library for Scikit-learn that implements a collection of useful active semi-supervised clustering algorithms
TextDistance: A Python library for calculating and comparing the distance between two sequences (such as text documents) with many algorithms
Ray: A scalable, high-performance distributed execution framework for executing arbitrary Python functions on multiple machines, suitable for many ML workloads
Pyitlib: An open-source library for calculating a useful collection of information-theoretic measures (e.g., entropy) for discrete random variables
KDEpy: A collection of useful kernel density estimators in Python 3.5+
Tsfresh: A Python library for (automatic) feature extraction and engineering on time-dependent data
GPy: A Python library for working with Gaussian processes
Tslearn: A machine learning library dedicated to working with time-dependent data
Ludwig: "Ludwig is a toolbox that allows to train and test deep learning models without the need to write code" - Ludwig's website
Record Linkage Toolkit: A Python software toolkit for record deduplication and linkage
PyJanitor: Python port of R's janitor package for data cleansing and manipulation
FastText: A library for fast and efficient text embedding and classification
Mimesis: A fast and valuable fake data generation library
PyOD: A Python software toolbox for scalable Outlier Detection (aka Anomaly Detection)
Creme: A Python library for Online Learning and building incremental models
vg: A linear algebra library much like Numpy with a more human-friendly interface
GraphKernels: A fast library for calculating various graph kernels
GraKeL: A graph kernel calculation library that is using Scikit-learn's API so it can be used with other functionalities and routines already present in Scikit-learn without much hassle
Graphsim: A graph similarity extension library for NetworkX
Textract: A general text extraction tool from many file formats
Sacred: Sacred is a Python library to make an ML workflow easier to reproduce and manage for you!
TextDistance: TextDistance is a Python library for calculating and comparing the distance between two or more sequences over an arbitrary alphabet (e.g., words, DNA sequences); it offers over 30 distance algorithms
Py_stringmatching: Py_stringmatching is a Python library consisting of a comprehensive set of string tokenizers (such as alphabetical tokenizers, whitespace tokenizers) and also string similarity measures (e.g., edit distance, Jaccard distance)
JGraph: JGraph is a WebGL graph drawing library for Python
Kedro: A Python library and also a tool to manage your data analysis workflow in your projects
PySAL: PySAL is a Python package for geolocation-based data analysis
k-Shape: This is a Python implementation of the k-Shape clustering algorithm for clustering the time series data
Pyforest: You could use Pyforest to import all Python data science-related libraries lazily as you need them in your code
ETE Toolkit: ETE Toolkit is a Python toolbox for the visualization and analysis of tree-structured data
Whoosh: Whoosh is a full-text indexing and search library for Python
Geoplot: Geoplot is a Python visualization library for geospatial plotting of geolocational records
GeoPandas: GeoPandas is a high-level library with an API similar to Pandas that makes working with geospatial datasets in Python much easier
Edward: "A library for probabilistic modelling, inference, and criticism" - its website
HyperTools: A Python library for high-dimensional data visualization and analysis
TextRank: TextRank algorithm implementation for Python 3
pymorton: A Python package for ordinal hashing of multidimensional points into a one-dimensional ordering
PySS3: A Python package implementing the SS3 text classifier with visualization tools for explainable artificial intelligence (XAI)
Lpproj: A Python implementation of Locality Preserving Projections (LPP) with Scikit-Learn compatible API
Multi-Rake: Multilingual rapid automatic keyword extraction (Multi-RAKE) is a Python library for automatic text summarization and keyword extraction of text in many different languages
PyCaret
ACME: A software framework for research on reinforcement learning
fastText: A fast text representation learning and classification library from Facebook
Distance: A useful library in pure Python to calculate the distance between arbitrary sequences
Texthero: "Text preprocessing, representation and visualization from zero to hero" - Texthero's website
xLearn: "High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM) for Python and CLI interface." -- xLearn's description
TextBlob: A text processing library with a high-level API
Plotnine
Dtale: A Python tool to analyze data stored in pandas dataframes
Lasagne: A lightweight library to build and train neural networks in Theano
Magnitude: Magnitude is a vector embedding helper library (much in the spirit of Gensim)
Missingno: A useful Python library for visualizing missing data
Vector Hub: Vector Hub is a Python library that can help turn almost everything (including text, graph, and image data) into vector representations
pyLDAvis: A Python library for interactive topic modeling and visualization of topics in textual datasets
PyTextRank: A Python implementation of the TextRank algorithm
Mitosheets: A JupyterLab extension that makes it easier to work with Pandas dataframes
Transformers
CoClust: A Python library for co-clustering
PySurvival: PySurvival is a Python package for survival analysis of data
Scikit-survival: Scikit-survival is an extension to Scikit-learn that adds survival analysis capabilities to it
rfpimp: A Python package that brings permutation-based feature importance measure to Scikit-learn Random Forests learners
Jiant: Jiant is an NLP software toolkit with multitask and transfer learning capabilities
PyG: "PyG (PyTorch Geometric) is a library built upon PyTorch to easily write and train Graph Neural Networks (GNNs) for a wide range of applications related to structured data." - PyG's documentation
Nodevectors: A Python package with fast and scalable implementations for some popular vertex embedding algorithms
JGraphT: JGraphT now supports Python
DGL: Deep graph library is a Python package for using deep learning algorithms on graph data. It can use PyTorch, TensorFlow, or Apache MXNet as its backend
Spektral: A Python package for creating and running graph neural networks
HyperNetX(HNX): A Python library to work with data modelled as hypergraphs
Graph4NLP: Graph4nlp is a Python library that makes it easier to use graph neural networks in and for NLP tasks
JAX: JAX is a Python library for high-performance numerical computation used in machine learning
Metric-learn: Metric-learn is a Python library for metric learning. It's available as part of the scikit-learn-contrib collection (https://github.com/scikit-learn-contrib)
AmpliGraph: AmpliGraph is a Python library for representation learning on knowledge graphs
Distfit: "Distfit is a python package for probability density fitting of univariate distributions on non-censored data" - Distfit's website
Pke: Pke is a Python toolkit for extracting keyphrases from text
Albumentations: Albumentations is a library for image augmentation
Spark NLP: Spark NLP is an NLP library built on Apache Spark, with a Python API
Skorch: Skorch is a neural network library that uses PyTorch as its backend and provides APIs compatible with the Scikit-learn machine learning library
Optuna: Optuna is an open source hyperparameter optimization framework for automating hyperparameter search
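
Several entries above (TextDistance, Py_stringmatching, Distance) compute sequence similarity measures such as edit distance and Jaccard distance. As a rough, library-free sketch of what those two measures compute (the function names `levenshtein` and `jaccard_distance` are made up for this example and are not the API of any package listed here):

```python
def levenshtein(a, b):
    """Edit distance: minimum number of single-item insertions,
    deletions, and substitutions needed to turn sequence a into b."""
    # Keep the shorter sequence in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def jaccard_distance(a, b, tokenize=str.split):
    """Jaccard distance between the token sets of two strings:
    1 - |intersection| / |union|. Whitespace tokens by default."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 0.0  # two empty token sets are treated as identical
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(jaccard_distance("graph neural networks", "deep neural networks"))
```

The dedicated libraries listed above offer many more measures, and are the better choice for real workloads.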

Additional Useful Resources

PyPy Python Implementation: An alternative implementation of Python's runtime with a JIT compiler and stackless support
Useful Metrics: A collection of useful ML-related scoring and learning metrics
XGboost Benchmarks
Franchise Notebook
Orange
Weka: The famous Data Mining tool from where Kiwis live
ELKI: A Data Mining software framework in Java
Julia Programming Language: New language for Scientific Computing and HPC
SQL Notebook
IPython: An augmented Python shell with lots of features
Incanter: A statistical analysis environment for a Lisp (for Clojure, to be exact)
Torch: Scientific Computing framework running on top of LuaJIT, a just-in-time compiler for Lua, brilliant idea!
BPython: An advanced Python shell
RAnalyticFlow: Great environment for Data Flow Programming in R
SPMF: A Java Data Mining library with tons of excellent algorithms
SageMath: Open source math software system, a complete math environment for everyone
H2O AI Platform: A software tool for Big Data Analysis that can be used for both data mining and machine learning tasks. It has tons of features
Various ML Cheat Sheets
OpenRefine: An open-source data cleansing and refinement tool
Deep Learning Papers
Apache MXNet: A high-performance and scalable ANN framework for Deep Learning
Material for the book 'Python for Data Analysis'
Encog Machine Learning Framework: An ML library for Java and .NET with a focus on ANN algorithms
Apache Spark MLlib: An ML library on top of your Spark cluster!
Awesome-Python: A comprehensive list of Pythonic resources (libraries, frameworks, etc.)
GATE: A mature text processing toolkit in Java
MALLET: "MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modelling, information extraction, and other machine learning applications to text." - MALLET's website
MLPack: A fast ML library written in C++ with bindings to Python
t-SNE: Implementation of famous t-distributed stochastic neighbour embedding algorithm for various languages
Caffe
Apache Singa
CompLearn
SNAP
Apache PredictionIO
JGraphT: A Java library for working with graphs with tonnes of features
JGraphX: A Java library for diagramming and visualizing graphs
Microsoft Distributed Machine Learning Toolkit
Microsoft Cognitive Toolkit
BIDMat: A CPU- and GPU-accelerated matrix library for data mining tasks
BIDMach
Apache SystemML
Apache Mahout
Accord.NET: Accord.NET is a Machine Learning framework written in C#. Its API is available for .NET, and it also comes combined with some audio and image processing libraries entirely written in C#
BitMAGIC Library
Cassovary
Dex: A friendly tool written in Java for data analysis and data mining
Apache OpenNLP
OpenNN: A C++ library to build complex neural network models
MOA: A tool for mining stream data by people who also created Weka
MLPACK: C++ Machine Learning library for scalability, speed, and ease-of-use
MOSES: "Moses is a statistical machine translation system that allows you to train translation models for any language pair automatically." - Moses's website
Parallel Python: A Python module for parallel execution of code on SMP and Cluster environment
BeautifulSoup: A handy Python library to digest almost anything from the World Wide Web
Wordbatch: A library for parallel feature extraction on textual data (and potentially other complex data types)
Mypy: Static typing facilities for Python
SKIL: A platform for managing the life cycle of an ML/DS-related project or product
An unofficial Python extension package repository for Windows
LIBOL: An online learning library
Smile: "Smile is a fast and comprehensive machine learning system" - Smile's website
Tablesaw: A dataframe and visualization library for Java
TensorFlow Models: A repository of models and examples built with TensorFlow
Curated list of graph embedding methods: A collection of paper-code pairs for state-of-the-art graph embedding (a.k.a. network representation learning) algorithms
Curated list of resources for Recommender Systems
Pegasus: An open-source system for analyzing huge graphs. It seems it has not been developed or maintained for a long time
Dataset: A handy tool to simplify the task of reading and writing to relational databases
Twython: A Twitter API library in pure Python with tonnes of features
Apache TinkerPop: Excellent graph storage and computation framework. It can be used both as a graph analytics platform and a graph database system; love the little gremlins!
Graphexp: Graphexp is a visual graph explorer with D3.js for TinkerPop
Scilab: An open-source numerical computation language and environment, a great Matlab alternative
Glow: A compiler for Neural Network hardware accelerators for various hardware
GraphJet: A real-time graph processing library in Java
GraphDrawing: A lovely graph analysis and drawing library in Java
Sketch Library: A C++ library for data summarization
The Lemur Project: A collection of search engine, text processing, and data mining tools and libraries in C++ and Java, like RankLib for ranking
VisPy: A Python library for interactive scientific visualization that is designed to be fast, scalable, and easy to use
Awesome Machine Learning: A curated list of remarkable Machine Learning frameworks, libraries, software, etc
MOA Framework: A fantastic Java software environment and framework for Stream Mining
MEKA: A multi-label classification tool, it works on top of Weka
Mulan: A Java library for learning on multi-labelled data
Dlib: A fast Machine Learning library implemented in C++ for solving real-world data problems
MITIE: A library and tool for information extraction on text data; it's built on top of Dlib, with bindings for languages like Java and Python
GraphStream: GraphStream is a Java library for analyzing and visualizing dynamic graphs
Cytoscape: A complex network (graph) visualization tool in Java
Gephi: A network visualization and analysis tool in Java
SocNetV: A handy social network visualization tool
Visone: Yet another handy social network analysis and visualization tool
Flashlight: A fast Machine Learning library in C++
Machine Learning with Python: A collection of ML algorithms and their sample use-cases implemented in Python
TANAGRA: "TANAGRA is a free DATA MINING software for academic and research purposes" - its website
KNIME: KNIME is an open-source data analytics, reporting, and data integration platform
MG4J: An open-source, high-performance full-text search engine written in Java
WebGraph: A Java framework for working on massive graphs
RTree: Reactive implementations of immutable in-memory R-tree and R*-tree in Java
Recommender Systems: A useful repository of stuff all about the Recommender Systems (e.g. best practices to build Recommender Systems)
Awesome-Graph: A curated list of resources (e.g., libraries, frameworks, and databases) related to graphs
Parallel Graph AnalytiX (PGX): A graph processing and analytics toolbox from Oracle which is written in Java
ROOT: A scientific toolbox for data processing and analysis in C++
Stanford Topic Modeling Toolbox (TMT): TMT is a friendly Java toolkit for topic modelling on textual data
Java Data Mining Package: An open-source Java package for mining massive datasets, implementing a vast collection of algorithms (e.g., clustering, regression, classification, and graphical models)
ScalaNLP: A numerical computation and Data Mining library suite written in Scala, with an emphasis on NLP
Vegas: A very flexible declarative data visualization library in Scala that works with Apache Spark right out of the box
DeepLearning.scala: A simple Scala library for creating complex artificial neural networks by ThoughtWorks
XAPIAN: An open-source search engine library with bindings to be used in many high-level programming languages, for example, Python, Java, and Lua!
DataMelt: "DataMelt is a free software for numeric computation, mathematics, statistics, symbolic calculations, data analysis, and data visualization" - DataMelt's website
Luna: A functional programming language to create data processing friendly programs in a WYSIWYG way
NetLogo: A computational multi-agent development and simulation environment, an incredible tool for investigating complex phenomena via implementing simple computational rules for agents!
LabPlot: LabPlot is a lovely application for data analysis and plotting; it is part of KDE Project!
Meta Toolkit: A fast software toolkit implementing many useful ML algorithms; it is written in C++
Record Linkage Tools: A collection of valuable resources for record deduplication and linkage
Gunrock: A GPU-based graph analytics and processing library; it works with CUDA
Papers on Graph Analytics: A thorough list of publications related to graphs covering many interesting topics
GraphIt: GraphIt - "A High-Performance Domain Specific Language for Graph Analytics" - GraphIt's website
SMORe: A handy tool and library for fast weighted graph embedding in C++
Warp-ctc: A fast parallel implementation of CTC, for both CPU and GPU
Grew: Grew is a graph library and tool written in OCaml with applications in NLP; it is a companion tool for the book Application of Graph Rewriting to Natural Language Processing
ZVTM: A handy graph visualization library for Java
mrJob: A Python library to create MapReduce jobs and run them on multiple machines (i.e., in a cluster)
Metanome: A collection of interesting materials (e.g., algorithms, code, articles) related to data profiling
Graphillion: Graphillion is a software library for working with many graphs in a parallel fashion
Awesome graph classification: A very comprehensive collection of graph embedding, classification, and representation learning papers with the code!
VFML: Very Fast ML (hence the name VFML) is a fast C library for mining very massive data streams
Talisman: Talisman is a modular JavaScript library for NLP and Machine Learning activities
StyleGAN: StyleGAN is the TensorFlow implementation of a proposed architecture for GANs from NVIDIA. You can use it to create photo-realistic pictures of people who don't exist!
Java String Similarity: A Java library implementing a collection of helpful text similarity/distance measures
Label Studio: Label Studio is a handy tool with a friendly UI for labelling your data (e.g., records and documents)
GraphML: GraphML is a graph representation and serialization file format based on XML that could store many different types of graphs with their attributes without loss of information
Taco: A compiler that generates machine code for CPUs and GPUs to execute general tensor algebra operations on sparse tensors
Libspatialindex: Libspatialindex contains many robust spatial indexing methods such as the R*-tree and TPR-tree
NLP Best Practices: A collection of best practices and their examples in the NLP domain from Microsoft
Tulip: Tulip is an excellent open-source data visualization and analysis software toolbox; it is perfect for working with graphs and graph datasets
Juno: Juno is an IDE based on Atom for the Julia programming language
BoofCV: A real-time machine vision and image processing library in Java
cuDF: cuDF is a library with an API similar to Pandas, built on the Apache Arrow columnar memory format; cuDF uses GPU routines for loading, joining, aggregating, filtering, and otherwise manipulating data
LASER toolkit: LASER (Language-Agnostic SEntence Representations) is a software toolkit for sentence embedding for about 100 different languages
Idyll: "A toolkit for creating data-driven stories and explorable explanations" - Idyll's website
DeepLearning4J: A Java-based software toolbox for building and training deep artificial neural networks
NeMo: NeMo is a software toolkit for building AI applications
TRAINS Agent: TRAINS Agent is a DevOps tool for setting up and running an AI experiment on a cluster computing environment
TensorFlow Hub: TensorFlow Hub is a library for the publication, discovery, and consumption of reusable parts of deep learning models
AIX360: An explainable AI (XAI) toolkit to interpret Machine Learning models
Catalyst: Catalyst is a tool for making Deep Learning experiments on PyTorch reproducible
TensorFlowJS: TensorFlowJS is a JavaScript library to use TensorFlow models in web applications in the browser
Kst: Kst is a handy data visualization tool from the KDE project
AMIDST: AMIDST is a Java software toolbox for probabilistic modelling of data
LIBFFM: "LIBFFM is an open-source tool for field-aware factorization machines (FFM)"; it has been used to win a few real-world data science challenges on Kaggle
jLDADMM: A Java package for LDA and DMM topic modelling
Stan: "Stan is a state-of-the-art platform for statistical modelling and high-performance statistical computation." - Stan's website
DEAP: Distributed Evolutionary Algorithms in Python
Stanford CoreNLP
SimMetrics
Neuroph
MLeap
HiSee
JSAT: A Java-based library implementing a standard set of data analysis and machine learning activities
Ark tweet Pos tagger: CMU ARK Twitter part-of-speech tagger
JASP
Jamovi
DynaML: "DynaML is a Scala & JVM Machine Learning toolbox for research, education & industry." -- Its website
ExecuteMulan: A Java utility to run the multi-label classification methods from Mulan more easily
GTN: "GTN is an open-source framework for automatic differentiation with a powerful, expressive type of graph called weighted finite-state transducers (WFSTs). Just as PyTorch provides a framework for automatic differentiation with tensors, GTN provides such a framework for WFSTs. AI researchers and engineers can use GTN to train graph-based machine learning models more effectively." -- Facebook
Tribuo: An open-source machine learning library in Java from Oracle
Neo4J's Graph Data Science Library
Libbow: "Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modelling, and information retrieval programs." -- Its website
Doccano
FACTORIE
BigDL: BigDL is a deep learning library that runs in an Apache Spark cluster
ImageJ: "ImageJ is an open-source image processing program designed for multidimensional scientific images."--ImageJ's website
OjAlgo: Oj! Algorithms is a pure-Java linear algebra and mathematical optimization library
Ivy: Ivy is a unifying framework for different deep learning frameworks such as PyTorch and TensorFlow. You write your code in Ivy once and can then run it on many deep learning frameworks
Gradio: Gradio is a user interface (UI) framework for building UIs for machine learning and data science applications in Python
Label Studio: Label Studio is a tool for annotating and labeling datasets
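
The GraphML entry above describes an XML-based format that stores graphs together with their attributes. As a rough illustration only, the sketch below writes and reads a tiny GraphML document using nothing but Python's standard library. The helper names (`build_graphml`, `parse_graphml`), the key ids (`d0`, `d1`), and the attribute names (`label`, `weight`) are assumptions made up for this example; real projects would more likely rely on one of the graph libraries listed here.

```python
import xml.etree.ElementTree as ET

# Namespace used by GraphML documents.
NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(nodes, edges):
    """Serialize a small undirected graph to a GraphML string.

    nodes: dict mapping node id -> label (hypothetical attribute "label")
    edges: list of (source, target, weight) tuples (hypothetical attribute "weight")
    """
    ET.register_namespace("", NS)  # emit xmlns="..." instead of an ns0: prefix
    root = ET.Element(f"{{{NS}}}graphml")
    # <key> elements declare the attributes that <data> elements refer to.
    ET.SubElement(root, f"{{{NS}}}key", {
        "id": "d0", "for": "node", "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, f"{{{NS}}}key", {
        "id": "d1", "for": "edge", "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, f"{{{NS}}}graph",
                          {"id": "G", "edgedefault": "undirected"})
    for node_id, label in nodes.items():
        node = ET.SubElement(graph, f"{{{NS}}}node", {"id": node_id})
        ET.SubElement(node, f"{{{NS}}}data", {"key": "d0"}).text = label
    for source, target, weight in edges:
        edge = ET.SubElement(graph, f"{{{NS}}}edge",
                             {"source": source, "target": target})
        ET.SubElement(edge, f"{{{NS}}}data", {"key": "d1"}).text = repr(weight)
    return ET.tostring(root, encoding="unicode")


def parse_graphml(text):
    """Read back the node labels and weighted edges written by build_graphml."""
    ns = {"g": NS}
    root = ET.fromstring(text)
    nodes = {n.get("id"): n.find("g:data", ns).text
             for n in root.iterfind(".//g:node", ns)}
    edges = [(e.get("source"), e.get("target"), float(e.find("g:data", ns).text))
             for e in root.iterfind(".//g:edge", ns)]
    return nodes, edges


if __name__ == "__main__":
    document = build_graphml({"a": "Alice", "b": "Bob"}, [("a", "b", 0.5)])
    print(document)
    print(parse_graphml(document))
```

The parser here only understands documents with exactly one `data` element per node and edge, which is enough to show the round trip but far from a complete GraphML reader.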

Ali Rahimi's talk NIPS 2017: Good talk from someone inside the field
Procrustes: How could we live without Wikipedia?
Probably Approximately Correct
Foundations of Machine Learning: An excellent book to start learning ML; a must for every ML enthusiast
Scikit-Learn website: Scikit-learn's website itself is a great resource to learn!
What Computers Still Can't Do: Some old and still valid criticisms of Strong AI! Are AI and Alchemy the same?
Readings in Database Systems(The Red Book): An enjoyable read. I found it a little hard to follow at first, but a great many resources are mentioned at the end of each chapter, and it gives significant insights into the history, trends, and future of DBMSs and data processing platforms
Kolmogorov Complexity: Let's compress everything!
Machine Learning Meets Databases: A very informative and also easy to follow article, including a short introduction to Machine Learning and also describing its relation to Data Mining and Databases
A gentle introduction to Tensors and their uses: An introduction to tensors and their sample applications. Don't let the math scare you off! :0)
Mining Massive Datasets: A lovely blend of theory and application for what can be done to data
Networks, Crowds, and Markets: Reasoning About a Highly Connected World : Very insightful if you like to know more about the interconnected world and networks
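
As a small aside on the Kolmogorov Complexity entry in the list above: Kolmogorov complexity itself is not computable, but the length of a compressed encoding gives a computable upper bound (up to a constant), so a general-purpose compressor is often used as a rough stand-in. The sketch below assumes `zlib` as that stand-in; the function name `compressed_size` is made up for this illustration.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input, a crude complexity proxy."""
    return len(zlib.compress(data, 9))


if __name__ == "__main__":
    regular = b"ab" * 5000          # highly regular: a short description exists
    noisy = os.urandom(10000)       # random bytes: essentially incompressible
    print("regular:", len(regular), "->", compressed_size(regular))
    print("noisy:  ", len(noisy), "->", compressed_size(noisy))
```

The regular input shrinks to a small fraction of its size, while the random input stays at roughly its original length, which matches the intuition that patterned data has a short description and random data does not.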
