mstuefer / Data_mining
License: MIT
The Ruby DataMining gem is a small collection of data-mining algorithms
Stars: ✭ 10
Programming Languages
ruby
Projects that are alternatives to or similar to Data_mining
genieclust
Genie++ Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R
Stars: ✭ 34 (+240%)
Mutual labels: data-mining, clustering, machine-learning-algorithms
genie
Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)
Stars: ✭ 21 (+110%)
Mutual labels: data-mining, clustering, machine-learning-algorithms
Heart disease prediction
Heart Disease prediction using 5 algorithms
Stars: ✭ 43 (+330%)
Mutual labels: data-mining, clustering, machine-learning-algorithms
taller SparkR
SparkR workshop for the Jornadas de Usuarios de R (R Users Conference)
Stars: ✭ 12 (+20%)
Mutual labels: data-mining, machine-learning-algorithms
xgboost-smote-detect-fraud
Can we predict accurately on skewed data? What sampling techniques can be used? Which models/techniques apply in this scenario? Find the answers in this code pattern!
Stars: ✭ 59 (+490%)
Mutual labels: data-mining, machine-learning-algorithms
teanaps
An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+810%)
Mutual labels: data-mining, clustering
EgoSplitting
A NetworkX implementation of "Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters" (KDD 2017).
Stars: ✭ 78 (+680%)
Mutual labels: clustering, machine-learning-algorithms
SparseLSH
A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets.
Stars: ✭ 127 (+1170%)
Mutual labels: data-mining, clustering
Lihang algorithms
Implementations of the algorithms from Li Hang's "Statistical Learning Methods", each written both in plain Python and with sklearn
Stars: ✭ 263 (+2530%)
Mutual labels: data-mining, machine-learning-algorithms
Moa
MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
Stars: ✭ 409 (+3990%)
Mutual labels: machine-learning-algorithms, clustering
Model Describer
model-describer : Making machine learning interpretable to humans
Stars: ✭ 22 (+120%)
Mutual labels: data-mining, machine-learning-algorithms
hierarchical-clustering
A Python implementation of divisive and hierarchical clustering algorithms. The algorithms were tested on the Human Gene DNA Sequence dataset and dendrograms were plotted.
Stars: ✭ 62 (+520%)
Mutual labels: data-mining, clustering
kmeans
A simple implementation of K-means (and Bisecting K-means) clustering algorithm in Python
Stars: ✭ 18 (+80%)
Mutual labels: data-mining, clustering
PaperWeeklyAI
📚「@MaiweiAI」Studying papers in the fields of computer vision, NLP, and machine learning algorithms every week.
Stars: ✭ 50 (+400%)
Mutual labels: data-mining, machine-learning-algorithms
Spring2017 proffosterprovost
Introduction to Data Science
Stars: ✭ 18 (+80%)
Mutual labels: data-mining, machine-learning-algorithms
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+31420%)
Mutual labels: data-mining, clustering
Machine-Learning-Algorithms
All Machine Learning Algorithms
Stars: ✭ 24 (+140%)
Mutual labels: clustering, machine-learning-algorithms
Machine Learning Books
book
Stars: ✭ 290 (+2800%)
Mutual labels: data-mining, machine-learning-algorithms
DataMining
DataMining is a small collection of data-mining algorithms. Since it is written in pure Ruby and does not depend on any native extensions, it is platform independent.
Algorithms
Already implemented
- Density Based Clustering (DBSCAN)
- Apriori
- PageRank
- k-Nearest Neighbor Classifier
Coming soon
- k-Means
- Naive Bayes
- ...
Installation
$ gem install data_mining
Usage
For Density Based Clustering
require 'data_mining'
#
# Point with id 'point1', x-value 1 and y-value 2:
# [:point1, [1, 2]]
#
input_data = [
[:point1, [1,2]],
[:point2, [2,1]],
[:point3, [10,10]]
]
radius = 3
min_points = 2
dbscan = DataMining::DBScan.new(input_data, radius, min_points)
dbscan.cluster!
dbscan.clusters # => one cluster, containing point1 and point2
dbscan.outliers # => point3
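To make the radius and min_points parameters concrete, here is a plain-Ruby sketch (not the gem's internals) of the core-point test DBSCAN performs: a point is a core point when at least min_points points, itself included, lie within radius of it under Euclidean distance.

```ruby
# Plain-Ruby sketch of DBSCAN's core-point test; the variable names
# mirror the gem's usage example above but the code is illustrative only.
points = { point1: [1, 2], point2: [2, 1], point3: [10, 10] }

def euclidean(a, b)
  Math.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)
end

radius = 3
min_points = 2

core = points.select do |_id, coords|
  neighbors = points.count { |_other_id, other| euclidean(coords, other) <= radius }
  neighbors >= min_points # a point counts as its own neighbor here
end

core.keys # => [:point1, :point2]
```

point3 has no neighbor within radius 3 except itself, so it fails the min_points threshold and ends up an outlier, matching the gem's output above.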
For Apriori
require 'data_mining'
transactions = [
[:transaction1, [:product_a, :product_b, :product_e]],
[:transaction2, [:product_b, :product_d]],
[:transaction3, [:product_b, :product_c]],
[:transaction4, [:product_a, :product_b, :product_d]]
]
min_support = 2
apriori = DataMining::Apriori.new(transactions, min_support)
apriori.mine!
apriori.results
# gives the following array:
# => [ [[:product_a], [:product_b], [:product_d]],
# [[:product_a, :product_b], [:product_b, :product_d]]
# ]
# where position 0 of the array holds all single items that satisfy the
# min_support, position 1 holds all item pairs that satisfy the min_support,
# and so on for as long as the min_support is satisfied.
# A more direct way to get the item sets of a given size:
apriori.item_sets_size(2)
# gives the following array, holding all item sets of size two that satisfy
# the min_support:
# [[:product_a, :product_b], [:product_b, :product_d]]
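To clarify what min_support means, here is a plain-Ruby sketch (not the gem's internals) of Apriori's first pass: counting how many transactions contain each single item and keeping those at or above the threshold.

```ruby
# Plain-Ruby sketch of Apriori's first pass (frequent single items).
# Same transactions as the gem's usage example above.
transactions = [
  [:transaction1, [:product_a, :product_b, :product_e]],
  [:transaction2, [:product_b, :product_d]],
  [:transaction3, [:product_b, :product_c]],
  [:transaction4, [:product_a, :product_b, :product_d]]
]
min_support = 2

# Count, for each item, the number of transactions it appears in.
support = Hash.new(0)
transactions.each { |_id, items| items.each { |item| support[item] += 1 } }

frequent_singles = support.select { |_item, count| count >= min_support }.keys.sort
# product_a appears in transactions 1 and 4, product_b in all four,
# product_d in 2 and 4 -- matching position 0 of apriori.results above.
frequent_singles # => [:product_a, :product_b, :product_d]
```

Apriori then builds candidate pairs only from these frequent singles, which is what keeps the search space manageable.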
For PageRank
require 'data_mining'
graph = [
[:node_1, [:node_2]],
[:node_2, [:node_1, :node_3]],
[:node_3, [:node_2]]
]
pagerank = DataMining::PageRank.new(graph)
# we can also pass a damping factor, default is 0.85
# DataMining::PageRank.new(graph, 0.90)
# as well as the iterations to calculate the pagerank, default
# is 100
# DataMining::PageRank.new(graph, 0.85, 1000)
pagerank.rank!
pagerank.ranks
# gives the following hash:
# => {:node_1 => 0.2567567634554257, :node_2 => 0.4864864730891484,
# :node_3 => 0.2567567634554257}
# where each key is a node and each value its calculated
# pagerank
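The values above are consistent with the standard PageRank iteration PR(v) = (1 - d)/N + d * Σ PR(u)/outdegree(u) over all nodes u linking to v. Here is a plain-Ruby sketch of that iteration (illustrative, not the gem's code) on the same graph:

```ruby
# Plain-Ruby sketch of the standard PageRank power iteration:
# PR(v) = (1 - d)/N + d * sum over in-links u->v of PR(u)/outdegree(u)
graph = { node_1: [:node_2], node_2: [:node_1, :node_3], node_3: [:node_2] }
d = 0.85          # damping factor, matching the gem's default
n = graph.size

# Start with a uniform distribution.
ranks = graph.keys.to_h { |k| [k, 1.0 / n] }

100.times do
  new_ranks = graph.keys.to_h { |k| [k, (1 - d) / n] }
  graph.each do |node, out_links|
    share = d * ranks[node] / out_links.size
    out_links.each { |target| new_ranks[target] += share }
  end
  ranks = new_ranks
end

ranks[:node_2].round(6) # => 0.486486
```

node_2 receives links from both other nodes, so it converges to the highest rank, matching the hash shown above.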
For K-Nearest Neighbor Classifier
require 'data_mining'
data = [
[:class_1, [1, 1]],
[:class_1, [2, 2]],
[:class_2, [10, 10]],
[:class_2, [11, 12]],
[:class_3, [12, 12]]
]
k = 2
knn = DataMining::KNearestNeighbor.new(data, k)
knn.classify([:unknown_class, [2, 3]]) # gives :class_1 back
# Since the given point of :unknown_class has the coordinates
# (2, 3) for (x, y), its two (k = 2) nearest neighbors are:
# [:class_1, [1, 1]]
# [:class_1, [2, 2]]
#
# Since both neighbors belong to the same class (:class_1), the
# majority vote among the k nearest neighbors is also :class_1
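The classification step itself is simple enough to sketch in plain Ruby (illustrative, not the gem's internals): take the k training points closest to the query and return the class that the most of them share.

```ruby
# Plain-Ruby sketch of k-NN classification on the data set above.
data = [
  [:class_1, [1, 1]],
  [:class_1, [2, 2]],
  [:class_2, [10, 10]],
  [:class_2, [11, 12]],
  [:class_3, [12, 12]]
]
query = [2, 3]
k = 2

# Squared Euclidean distance is enough for ranking neighbors.
nearest = data.min_by(k) { |_label, (x, y)| (x - query[0])**2 + (y - query[1])**2 }

# Majority vote among the k nearest neighbors.
votes = nearest.group_by { |label, _coords| label }
predicted = votes.max_by { |_label, members| members.size }.first

predicted # => :class_1
```

Note that with even k a tie is possible; a common convention is to break ties by the closest neighbor's class, though which rule the gem uses is not documented here.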
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Added some feature')
- Push to the branch (git push origin my-new-feature)
- Create a new Pull Request
License
(The MIT License)