
mstuefer / Data_mining

License: MIT
The Ruby DataMining gem is a small collection of data-mining algorithms.

Programming Languages

ruby
36898 projects - #4 most used programming language

Projects that are alternatives to or similar to Data mining

genieclust
Genie++ Fast and Robust Hierarchical Clustering with Noise Point Detection - for Python and R
Stars: ✭ 34 (+240%)
Mutual labels:  data-mining, clustering, machine-learning-algorithms
genie
Genie: A Fast and Robust Hierarchical Clustering Algorithm (this R package has now been superseded by genieclust)
Stars: ✭ 21 (+110%)
Mutual labels:  data-mining, clustering, machine-learning-algorithms
Heart disease prediction
Heart Disease prediction using 5 algorithms
Stars: ✭ 43 (+330%)
Mutual labels:  data-mining, clustering, machine-learning-algorithms
taller SparkR
SparkR workshop for the Jornadas de Usuarios de R
Stars: ✭ 12 (+20%)
Mutual labels:  data-mining, machine-learning-algorithms
xgboost-smote-detect-fraud
Can we predict accurately on the skewed data? What are the sampling techniques that can be used. Which models/techniques can be used in this scenario? Find the answers in this code pattern!
Stars: ✭ 59 (+490%)
Mutual labels:  data-mining, machine-learning-algorithms
teanaps
An open-source Python library for natural language processing and text analysis.
Stars: ✭ 91 (+810%)
Mutual labels:  data-mining, clustering
EgoSplitting
A NetworkX implementation of "Ego-splitting Framework: from Non-Overlapping to Overlapping Clusters" (KDD 2017).
Stars: ✭ 78 (+680%)
Mutual labels:  clustering, machine-learning-algorithms
SparseLSH
A Locality Sensitive Hashing (LSH) library with an emphasis on large, highly-dimensional datasets.
Stars: ✭ 127 (+1170%)
Mutual labels:  data-mining, clustering
Lihang algorithms
Algorithms from Li Hang's "Statistical Learning Methods", implemented in both plain Python and sklearn
Stars: ✭ 263 (+2530%)
Mutual labels:  data-mining, machine-learning-algorithms
Moa
MOA is an open source framework for Big Data stream mining. It includes a collection of machine learning algorithms (classification, regression, clustering, outlier detection, concept drift detection and recommender systems) and tools for evaluation.
Stars: ✭ 409 (+3990%)
Mutual labels:  machine-learning-algorithms, clustering
R
All Algorithms implemented in R
Stars: ✭ 294 (+2840%)
Mutual labels:  data-mining, clustering
Model Describer
model-describer : Making machine learning interpretable to humans
Stars: ✭ 22 (+120%)
Mutual labels:  data-mining, machine-learning-algorithms
hierarchical-clustering
A Python implementation of divisive and hierarchical clustering algorithms. The algorithms were tested on the Human Gene DNA Sequence dataset and dendrograms were plotted.
Stars: ✭ 62 (+520%)
Mutual labels:  data-mining, clustering
kmeans
A simple implementation of K-means (and Bisecting K-means) clustering algorithm in Python
Stars: ✭ 18 (+80%)
Mutual labels:  data-mining, clustering
PaperWeeklyAI
📚「@MaiweiAI」Studying papers in the fields of computer vision, NLP, and machine learning algorithms every week.
Stars: ✭ 50 (+400%)
Mutual labels:  data-mining, machine-learning-algorithms
Spring2017 proffosterprovost
Introduction to Data Science
Stars: ✭ 18 (+80%)
Mutual labels:  data-mining, machine-learning-algorithms
Orange3
🍊 📊 💡 Orange: Interactive data analysis
Stars: ✭ 3,152 (+31420%)
Mutual labels:  data-mining, clustering
Machine-Learning-Algorithms
All Machine Learning Algorithms
Stars: ✭ 24 (+140%)
Mutual labels:  clustering, machine-learning-algorithms
Machine Learning Books
book
Stars: ✭ 290 (+2800%)
Mutual labels:  data-mining, machine-learning-algorithms
Elki
ELKI Data Mining Toolkit
Stars: ✭ 613 (+6030%)
Mutual labels:  data-mining, clustering

DataMining

DataMining is a small collection of data-mining algorithms. Since it is written in pure Ruby and does not depend on any native extensions, it is platform independent.

Algorithms

Already implemented

  1. Density Based Clustering (DBSCAN)
  2. Apriori
  3. PageRank
  4. k-Nearest Neighbor Classifier

Coming soon

  1. k-Means
  2. Naive Bayes
  3. ...

Installation

$ gem install data_mining

Usage

For Density Based Clustering

  require 'data_mining'

  #
  # Point with id 'point1', x-value 1 and y-value 2:
  # [:point1, [1, 2]]
  #
  input_data = [
                [:point1, [1,2]],
                [:point2, [2,1]],
                [:point3, [10,10]]
               ]
  radius = 3
  min_points = 2
  dbscan = DataMining::DBScan.new(input_data, radius, min_points)
  dbscan.cluster!

  dbscan.clusters # gives 1 cluster found containing point1 and point2

  dbscan.outliers # gives point3 as outlier
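Under the hood, DBSCAN links points whose pairwise Euclidean distance is at most `radius`, and a point can seed a cluster only if its neighborhood (counting the point itself) contains at least `min_points` points. As a rough sketch of that neighborhood test — plain Ruby, independent of the gem, with hypothetical names:

```ruby
# Euclidean distance between two 2-D points
def distance(a, b)
  Math.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)
end

points = { point1: [1, 2], point2: [2, 1], point3: [10, 10] }
radius = 3

# For each point, collect the other points within the radius
neighbors = points.map do |id, coords|
  near = points.select { |other, c| other != id && distance(coords, c) <= radius }
  [id, near.keys]
end.to_h

neighbors # => { point1: [:point2], point2: [:point1], point3: [] }
```

With `min_points = 2`, `point1` and `point2` each have a neighborhood of size two (themselves plus one neighbor) and so form a cluster, while `point3` has an empty neighborhood and ends up an outlier — matching the result above.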

For Apriori

  require 'data_mining'

  transactions = [
                    [:transaction1, [:product_a, :product_b, :product_e]],
                    [:transaction2, [:product_b, :product_d]],
                    [:transaction3, [:product_b, :product_c]],
                    [:transaction4, [:product_a, :product_b, :product_d]]
                  ]
  min_support  = 2
  apriori      = DataMining::Apriori.new(transactions, min_support)
  apriori.mine!

  apriori.results
  # gives the following array:
  # => [ [[:product_a], [:product_b], [:product_d]],
  #      [[:product_a, :product_b], [:product_b, :product_d]]
  #    ]
  # where position 0 of the array holds all single items that satisfy
  # min_support, position 1 holds all item pairs that satisfy
  # min_support, and so on for as long as min_support is satisfied.

  # An easier way to get an item set of a given size directly:
  apriori.item_sets_size(2)
  # gives the following array, representing all item sets of size two, satisfying
  # the min_support:
  # [[:product_a, :product_b], [:product_b, :product_d]]
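The filtering that `min_support` performs can be checked by hand. A short plain-Ruby sketch — independent of the gem — that counts how many transactions contain each candidate pair:

```ruby
transactions = [
  [:product_a, :product_b, :product_e],
  [:product_b, :product_d],
  [:product_b, :product_c],
  [:product_a, :product_b, :product_d]
]
min_support = 2

# For every pair of distinct items, count the transactions containing
# both items, and keep the pairs reaching min_support
items = transactions.flatten.uniq.sort
frequent_pairs = items.combination(2).select do |pair|
  transactions.count { |t| (pair - t).empty? } >= min_support
end

frequent_pairs # => [[:product_a, :product_b], [:product_b, :product_d]]
```

Only `[:product_a, :product_b]` (transactions 1 and 4) and `[:product_b, :product_d]` (transactions 2 and 4) appear in at least two transactions, which is exactly what `apriori.item_sets_size(2)` returns above. Apriori reaches the same result more efficiently by extending only the item sets that were already frequent at the previous size.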

For PageRank

  require 'data_mining'

  graph   = [
              [:node_1, [:node_2]],
              [:node_2, [:node_1, :node_3]],
              [:node_3, [:node_2]]
            ]

  pagerank  = DataMining::PageRank.new(graph)
  # we can also pass a damping factor, default is 0.85
  # DataMining::PageRank.new(graph, 0.90)
  # as well as the iterations to calculate the pagerank, default
  # is 100
  # DataMining::PageRank.new(graph, 0.85, 1000)
  pagerank.rank!

  pagerank.ranks
  # gives the following hash:
  # => {:node_1 => 0.2567567634554257, :node_2 => 0.4864864730891484,
  #     :node_3 => 0.2567567634554257}
  # where each key stands for a node and each value for its
  # calculated pagerank
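The ranks above follow the standard damped update rule PR(n) = (1 - d)/N + d * Σ PR(m)/outdegree(m), summed over the nodes m that link to n. A minimal power-iteration sketch in plain Ruby — not the gem's actual implementation, but it reproduces the numbers shown:

```ruby
# Adjacency list: node => nodes it links to
graph = {
  node_1: [:node_2],
  node_2: [:node_1, :node_3],
  node_3: [:node_2]
}
d = 0.85               # damping factor
n = graph.size
ranks = graph.keys.map { |k| [k, 1.0 / n] }.to_h

100.times do
  # Synchronous update: build the new ranks from the old ones
  ranks = graph.keys.map do |node|
    incoming = graph.select { |_src, targets| targets.include?(node) }
    score = (1 - d) / n + d * incoming.sum { |src, targets| ranks[src] / targets.size }
    [node, score]
  end.to_h
end

ranks[:node_2].round(4) # => 0.4865
```

Since the graph has no dangling nodes, the ranks always sum to 1, and after 100 iterations they have converged to the values in the hash above.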

For K-Nearest Neighbor Classifier

  require 'data_mining'

  data  = [
            [:class_1, [1, 1]],
            [:class_1, [2, 2]],
            [:class_2, [10, 10]],
            [:class_2, [11, 12]],
            [:class_3, [12, 12]]
          ]
  k     = 2

  knn   = DataMining::KNearestNeighbor.new(data, k)

  knn.classify([:unknown_class, [2, 3]]) # gives :class_1 back

  # Since the given point of :unknown_class has the coordinates
  # (2, 3), it has the following two points as its 2 (k=2)
  # nearest neighbors:
  #   [:class_1, [1, 1]]
  #   [:class_1, [2, 2]]
  #
  # Since both neighbors belong to the same class (:class_1), the
  # majority class among the k nearest neighbors is also :class_1.
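The classification step itself is just a Euclidean-distance ranking followed by a majority vote. A plain-Ruby sketch of that logic — independent of the gem, with hypothetical helper names:

```ruby
# Euclidean distance between two coordinate arrays
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

data = [
  [:class_1, [1, 1]],
  [:class_1, [2, 2]],
  [:class_2, [10, 10]],
  [:class_2, [11, 12]],
  [:class_3, [12, 12]]
]
k = 2
query = [2, 3]

# Take the k training points closest to the query, then pick the
# most frequent class label among them
nearest = data.min_by(k) { |_label, coords| euclidean(coords, query) }
predicted = nearest.map(&:first).tally.max_by { |_label, count| count }.first

predicted # => :class_1
```

Ties in the vote (possible with an even k) are broken arbitrarily here; a real implementation might prefer the closer neighbor or require an odd k.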

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

License

(The MIT License)
