A super-fast and scalable Random Forest library based on a fast histogram decision tree algorithm and a distributed bagging framework. It can be used for binary classification, multi-label classification, and regression tasks. The library provides both Python and command-line interfaces.

Stars: ✭ 20 (+17.65%)

Mutual labels: data-mining

hdnom

Benchmarking and Visualization Toolkit for Penalized Cox Models

Stars: ✭ 36 (+111.76%)

Mutual labels: high-dimensional-data

Lasio

Python library for reading and writing well data using Log ASCII Standard (LAS) files

Stars: ✭ 234 (+1276.47%)

Mutual labels: data-mining

Statistical Learning

Lecture Slides and R Sessions for Trevor Hastie and Rob Tibshirani's "Statistical Learning" Stanford course

Stars: ✭ 223 (+1211.76%)

Mutual labels: data-mining

Data-Mining-on-Social-Media

Python scripts to extract tweets and Facebook posts from public users.

Stars: ✭ 99 (+482.35%)

Mutual labels: data-mining

KaliIntelligenceSuite

Kali Intelligence Suite (KIS) shall aid in the fast, autonomous, central, and comprehensive collection of intelligence by executing standard penetration testing tools. The collected data is internally stored in a structured manner to allow the fast identification and visualisation of the collected information.

Stars: ✭ 58 (+241.18%)

Mutual labels: data-mining

Semantic-Bus

object flow treatment, data transformation

Stars: ✭ 49 (+188.24%)

Mutual labels: data-mining

Apriori-and-Eclat-Frequent-Itemset-Mining

Python implementation of the Apriori and Eclat algorithms, two of the best-known basic algorithms for mining frequent item sets in a set of transactions.

Stars: ✭ 36 (+111.76%)

Mutual labels: data-mining

kenchi

A scikit-learn compatible library for anomaly detection

Stars: ✭ 36 (+111.76%)

Mutual labels: data-mining

corpusexplorer2.0

Corpus linguistics has never been so easy...

Stars: ✭ 16 (-5.88%)

Mutual labels: data-mining

Tweetfeels

Real-time sentiment analysis in Python using Twitter's streaming API

Stars: ✭ 249 (+1364.71%)

Mutual labels: data-mining

PyDREAM

Python Implementation of Decay Replay Mining (DREAM)

Stars: ✭ 22 (+29.41%)

Mutual labels: data-mining

Data Mining Conferences

Ranking, acceptance rate, deadline, and publication tips

Stars: ✭ 236 (+1288.24%)

Mutual labels: data-mining

MetQy

Repository for R package MetQy (read related publication here: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6247936/)

Stars: ✭ 17 (+0%)

Mutual labels: data-mining

Chirp

Interface to manage and centralize Google Alert information

Stars: ✭ 227 (+1235.29%)

Mutual labels: data-mining

Medium-Stats-Analysis

Exploring data and analyzing metrics for user-specific Medium Stats

Stars: ✭ 27 (+58.82%)

Mutual labels: data-mining

DiVE

An interactive 3D web viewer of up to a million points on one screen that represent data. Provides interaction for viewing high-dimensional data that has been previously embedded in 3D or 2D. Based on graphosaurus.js and three.js. For a Linux release of a complete embedding+visualization pipeline please visit https://github.com/sonjageorgievska/Em…

Stars: ✭ 26 (+52.94%)

Mutual labels: high-dimensional-data

Prefixspan Py

The shortest yet efficient Python implementation of the sequential pattern mining algorithm PrefixSpan, closed sequential pattern mining algorithm BIDE, and generator sequential pattern mining algorithm FEAT.

Stars: ✭ 214 (+1158.82%)

Mutual labels: data-mining

Zhihu Analysis Python

Social Network Analysis of Zhihu with Python

Stars: ✭ 215 (+1164.71%)

Mutual labels: data-mining

PaperWeeklyAI

📚「@MaiweiAI」Studying papers in the fields of computer vision, NLP, and machine learning algorithms every week.

Stars: ✭ 50 (+194.12%)

Mutual labels: data-mining

sciblox

sciblox - Easier Data Science and Machine Learning

Stars: ✭ 48 (+182.35%)

Mutual labels: data-mining

Asclepius

Open Price Comparison for US Hospitals

Stars: ✭ 20 (+17.65%)

Mutual labels: data-mining

ppmlhdfe

Poisson pseudo-likelihood regression with multiple levels of fixed effects

Stars: ✭ 46 (+170.59%)

Mutual labels: high-dimensional-data

sugarcube

Monoidal data processes.

Stars: ✭ 32 (+88.24%)

Mutual labels: data-mining

hierarchical-clustering

A Python implementation of divisive and hierarchical clustering algorithms. The algorithms were tested on the Human Gene DNA Sequence dataset and dendrograms were plotted.

Stars: ✭ 62 (+264.71%)

Mutual labels: data-mining

Qminer

Analytic platform for real-time large-scale streams containing structured and unstructured data.

Stars: ✭ 206 (+1111.76%)

Mutual labels: data-mining

Data-Analyst-Nanodegree

This repo consists of the projects that I completed as part of Udacity's Data Analyst Nanodegree curriculum.

Stars: ✭ 13 (-23.53%)

Mutual labels: data-mining

Rule Extraction from Trees

A toolkit for extracting comprehensible rules from tree-based algorithms

Stars: ✭ 34 (+100%)

Mutual labels: data-mining

awesome-Python-data-science-books

Probably the best curated list of data science books in Python

Stars: ✭ 331 (+1847.06%)

Mutual labels: data-mining

Awesome Datascience

📝 An awesome Data Science repository to learn from and apply to real-world problems.

Stars: ✭ 17,520 (+102958.82%)

Mutual labels: data-mining

teanaps

An open-source Python library for natural language processing and text analysis.

Stars: ✭ 91 (+435.29%)

Mutual labels: data-mining

Orange3

🍊 📊 💡 Orange: Interactive data analysis

Stars: ✭ 3,152 (+18441.18%)

Mutual labels: data-mining

conferencias matutinas amlo

CSVs of the stenographic transcripts of the morning press conferences of President Andres Manuel López Obrador (Mañaneras AMLO)

Stars: ✭ 25 (+47.06%)

Mutual labels: data-mining

Python Projects

Some Python projects

Stars: ✭ 247 (+1352.94%)

Mutual labels: data-mining

bsu

🎓Repository for university labs on FAMCS, BSU

Stars: ✭ 91 (+435.29%)

Mutual labels: data-mining

Reaper

Social media scraping / data collection tool for the Facebook, Twitter, Reddit, YouTube, Pinterest, and Tumblr APIs

Stars: ✭ 240 (+1311.76%)

Mutual labels: data-mining

loon

A Toolkit for Interactive Statistical Data Visualization

Stars: ✭ 45 (+164.71%)

Mutual labels: high-dimensional-data

Datascience

Curated list of Python resources for data science.

Stars: ✭ 3,051 (+17847.06%)

Mutual labels: data-mining

xgboost-smote-detect-fraud

Can we predict accurately on skewed data? What sampling techniques can be used? Which models or techniques can be used in this scenario? Find the answers in this code pattern!

Stars: ✭ 59 (+247.06%)

Mutual labels: data-mining

Deepgraph

Analyze Data with Pandas-based Networks.

Stars: ✭ 232 (+1264.71%)

Mutual labels: data-mining

TextClassification

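Text classification of Sina News articles using scikit-learn. The dataset contains 1 million documents in 10 categories, split 1:1 between test and training sets. The classifiers are SVM and Bayes, with Bayes serving as the baseline.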

Stars: ✭ 86 (+405.88%)

Mutual labels: data-mining

Automlpipeline.jl

A package that makes it trivial to create and evaluate machine learning pipeline architectures.

Stars: ✭ 223 (+1211.76%)

Mutual labels: data-mining

website-to-json

Converts website to json using jQuery selectors

Stars: ✭ 37 (+117.65%)

Mutual labels: data-mining

Amazing Feature Engineering

Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

Stars: ✭ 218 (+1182.35%)

Mutual labels: data-mining

Heart disease prediction

Heart Disease prediction using 5 algorithms

Stars: ✭ 43 (+152.94%)

Mutual labels: data-mining

Gwu data mining

Materials for GWU DNSC 6279 and DNSC 6290.

Stars: ✭ 217 (+1176.47%)

Mutual labels: data-mining

iis

Information Inference Service of the OpenAIRE system