stream - Infrastructure for Data Stream Mining - R package

The package provides support for modeling and simulating data streams as well as an extensible framework for implementing, interfacing and experimenting with algorithms for various data stream mining tasks. The main advantage of stream is that it seamlessly integrates with the large existing infrastructure provided by R. The package currently focuses on data stream clustering and provides implementations of BICO, BIRCH, D-Stream, DBSTREAM, and evoStream.

Additional packages in the stream family are:

  • streamMOA: Interface to clustering algorithms implemented in the MOA framework. Includes implementations of DenStream, ClusTree and CluStream.
  • subspaceMOA: Interface to Subspace MOA and its implementations of HDDStream and PreDeConStream.

The development of the stream package was supported in part by NSF IIS-0948893 and NIH R21HG005912.

Installation

Stable CRAN version: install from within R with

install.packages("stream")

Current development version: Download package from AppVeyor or install from GitHub (needs devtools).

devtools::install_github("mhahsler/stream")

Usage

Load the package and create micro-clusters via sampling.

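library("stream")

# Simulated data stream: 3 Gaussian clusters, no noise points
stream <- DSD_Gaussians(k=3, noise=0)

# Reservoir sample of size 20; each sampled point serves as a micro-cluster
sample <- DSC_Sample(k=20)
update(sample, stream, 500)   # feed 500 points from the stream
sample
## Reservoir sampling
## Class: DSC_Sample, DSC_Micro, DSC_R, DSC
## Number of micro-clusters: 20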
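
The printed summary names reservoir sampling as the technique behind DSC_Sample. The base R sketch below illustrates the general idea only (the classic "Algorithm R"); it is not the package's implementation. The helper reservoir_update and the state list are hypothetical names introduced for this example, and the input is plain rnorm() data rather than a DSD object.

```r
# Reservoir sampling (Algorithm R): keep a uniform random sample of k points
# from a stream whose length is not known in advance.
# `reservoir_update` is a hypothetical helper, not part of the stream package.
reservoir_update <- function(state, point) {
  state$n <- state$n + 1L
  if (state$n <= state$k) {
    # Fill phase: the first k points are stored directly.
    state$points[state$n, ] <- point
  } else {
    # Replacement phase: point number n is kept with probability k / n
    # and, if kept, overwrites a uniformly chosen slot.
    j <- sample.int(state$n, 1L)
    if (j <= state$k) state$points[j, ] <- point
  }
  state
}

set.seed(42)
k <- 20L
state <- list(k = k, n = 0L, points = matrix(NA_real_, nrow = k, ncol = 2))

# Feed 500 two-dimensional points one at a time, as a stream would deliver them.
for (i in seq_len(500)) {
  state <- reservoir_update(state, rnorm(2))
}

nrow(state$points)  # number of retained points (the "micro-clusters")
```

Memory use stays fixed at k points no matter how many points arrive, which is why a sample of this kind can stand in for micro-clusters on an unbounded stream.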

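Recluster the micro-clusters into 3 macro-clusters using k-means and plot the results.

# Macro-clusterer: k-means with k = 3, applied to the micro-clusters
kmeans <- DSC_Kmeans(k=3)
recluster(kmeans, sample)

# type="both" draws micro- and macro-clusters over points from the stream
plot(kmeans, stream, type="both")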
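
The reclustering step can be pictured as ordinary k-means run on the micro-cluster centers instead of on the raw stream. The following is a minimal base R sketch under stated assumptions: the 20 "micro-cluster centers" are synthetic points generated here for illustration, all micro-clusters are treated as equally weighted, and assign_macro is a hypothetical helper. It does not call or reproduce DSC_Kmeans.

```r
set.seed(1)

# Stand-in for 20 micro-cluster centers scattered around three true centers.
# (Synthetic data for illustration; not output of the stream package.)
true_centers <- matrix(c(0, 0,  5, 5,  0, 5), ncol = 2, byrow = TRUE)
idx <- rep(1:3, length.out = 20)
micro <- true_centers[idx, ] + matrix(rnorm(40, sd = 0.3), ncol = 2)

# Offline step: group the micro-cluster centers into k = 3 macro-clusters.
macro <- stats::kmeans(micro, centers = 3, nstart = 10)

# Assign a new point to the macro-cluster with the nearest center
# (squared Euclidean distance).
assign_macro <- function(point, centers) {
  diffs <- centers - matrix(point, nrow(centers), length(point), byrow = TRUE)
  which.min(rowSums(diffs^2))
}

macro$centers                         # one row per macro-cluster
assign_macro(c(5, 5), macro$centers)  # label of the macro-cluster near (5, 5)
```

Because only the 20 summary points are clustered, this step stays cheap regardless of how many stream points were processed beforehand.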
