Top 285 data-mining open source projects

Matminer
Data mining for materials science
Tweetfeels
Real-time sentiment analysis in Python using Twitter's streaming API
Python Projects
Some Python projects
Suod
(MLSys '21) An Acceleration System for Large-scale Unsupervised Heterogeneous Outlier Detection (Anomaly Detection)
Reaper
Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs
Data Mining Conferences
Ranking, acceptance rate, deadline, and publication tips
Lasio
Python library for reading and writing well data using Log ASCII Standard (LAS) files
Chirp
Interface to manage and centralize Google Alert information
Automlpipeline.jl
A package that makes it trivial to create and evaluate machine learning pipeline architectures.
Statistical Learning
Lecture Slides and R Sessions for Trevor Hastie and Rob Tibshirani's "Statistical Learning" Stanford course
Amazing Feature Engineering
Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.
Prefixspan Py
The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.
Zhihu Analysis Python
Social Network Analysis of Zhihu with Python
Qminer
Analytic platform for real-time large-scale streams containing structured and unstructured data.
Smartproxy
HTTP(S) Rotating Residential proxies - Code examples & General information
Estadistica Con R
Personal notes on statistics, machine learning, and the R programming language
Tradingview Data Scraper
Extract price and indicator data from TradingView charts to create ML datasets
Instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Awesome Ensemble Learning
Ensemble learning related books, papers, videos, and toolboxes
Pyss3
A Python package implementing a new machine learning model for text classification with visualization tools for Explainable AI
Ail Framework
AIL framework - Analysis Information Leak framework
Dataaspirant codes
Complete machine learning model codes
2017 Ccf Bdci Aijudge
2017 CCF BDCI, "Let AI Be the Judge" (preliminary round): 7th/415 (top 1.68%)
Chefboost
A Lightweight Decision Tree Framework supporting regular algorithms: ID3, C4.5, CART, CHAID and Regression Trees; some advanced techniques: Gradient Boosting (GBDT, GBRT, GBM), Random Forest and Adaboost with categorical features support for Python
Python practice of data analysis and mining
Companion source code and data for the book 《Python数据分析与挖掘实战》 (Python Data Analysis and Mining in Practice)
Data Science Resources
👨🏽‍🏫You can learn about what data science is and why it's important in today's modern world. Are you interested in data science?🔋
Lightgbm
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
Pipeline
The `pipeline` shell command
Spypi
An (un-)ethical hacking-station based on Raspberry Pi and Python
Pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Pzad
Course "Applied Problems in Data Analysis" (CMC faculty, Lomonosov Moscow State University)
Xioc
Extract indicators of compromise from text, including "escaped" ones.
Alimusic
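🎼 Tianchi Ali Music popularity trend prediction competition. The project covers all core code from the preliminary round through the semi-final round. The aggregated data for the semi-final round can be downloaded from Baidu Netdisk; for a more detailed walkthrough of the approach, visit my blog.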
Rosie Pattern Language
Rosie Pattern Language (RPL) and the Rosie Pattern Engine have MOVED!
Fantasy Basketball
Scraping statistics, predicting NBA player performance with neural networks and boosting algorithms, and optimising lineups for DraftKings with a genetic algorithm. Capstone Project for Machine Learning Engineer Nanodegree by Udacity.
Efficient Apriori
An efficient Python implementation of the Apriori algorithm.
Matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
Accelerator
The Accelerator is a tool for fast and reproducible processing of large amounts of data.
Easyocr
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Striplog
Lithology and stratigraphic logs for wells or outcrop.
Tipdm
TipDM modeling platform, an open-source data mining tool.
Wekadeeplearning4j
Weka package for the Deeplearning4j Java library
Emotion Recognition From Speech
A machine learning application for emotion recognition from speech
Rightmove webscraper.py
Python class to scrape data from rightmove.co.uk and return listings in a pandas DataFrame object