📝 An awesome Data Science repository to learn from and apply to real-world problems.
Data mining for materials science
🍊 📊 💡 Orange: Interactive data analysis
Real-time sentiment analysis in Python using Twitter's Streaming API
(MLSys '21) An Acceleration System for Large-scale Unsupervised Heterogeneous Outlier Detection (Anomaly Detection)
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Curated list of Python resources for data science.
Python library for reading and writing well data using Log ASCII Standard (LAS) files
Analyze Data with Pandas-based Networks. Documentation:
Interface to manage and centralize Google Alert information
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Lecture Slides and R Sessions for Trevor Hastie and Rob Tibshirani's "Statistical Learning" Stanford course
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
Analytic platform for real-time large-scale streams containing structured and unstructured data.
HTTP(S) Rotating Residential proxies - Code examples & General information
Statistics with R
Personal notes on statistics, machine learning, and the R programming language
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI
Manipulate JSON files
A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting (GBDT, GBRT, GBM), Random Forest and AdaBoost w/categorical features support for Python
Data Science Resources
👨🏽‍🏫 You can learn about what data science is and why it's important in today's modern world. Are you interested in data science? 🔋
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
An (un-)ethical hacking-station based on Raspberry Pi and Python
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Course "Applied Problems in Data Analysis" (Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University)
Topic Modelling for Humans
Extract indicators of compromise from text, including "escaped" ones.
Scraping statistics, predicting NBA player performance with neural networks and boosting algorithms, and optimising lineups for DraftKings with a genetic algorithm. Capstone Project for Machine Learning Engineer Nanodegree by Udacity.
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Lithology and stratigraphic logs for wells or outcrop.
Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object