
danilkolikov / fsfc

Licence: MIT license
Feature Selection for Clustering

Programming Languages

python
Makefile

Projects that are alternatives of or similar to fsfc

Mlr
Machine Learning in R
Stars: ✭ 1,542 (+1827.5%)
Mutual labels:  clustering, feature-selection
Machine-Learning-Algorithms
All Machine Learning Algorithms
Stars: ✭ 24 (-70%)
Mutual labels:  clustering
Dcc
This repository contains the source code and data for reproducing results of Deep Continuous Clustering paper
Stars: ✭ 179 (+123.75%)
Mutual labels:  clustering
Gemsec
The TensorFlow reference implementation of 'GEMSEC: Graph Embedding with Self Clustering' (ASONAM 2019).
Stars: ✭ 210 (+162.5%)
Mutual labels:  clustering
Clustergrammer
An interactive heatmap visualization built using D3.js
Stars: ✭ 188 (+135%)
Mutual labels:  clustering
Clustering.jl
A Julia package for data clustering
Stars: ✭ 227 (+183.75%)
Mutual labels:  clustering
Splitter
A Pytorch implementation of "Splitter: Learning Node Representations that Capture Multiple Social Contexts" (WWW 2019).
Stars: ✭ 177 (+121.25%)
Mutual labels:  clustering
Pottslab
Unsupervised multilabel image segmentation (color/gray/multichannel) based on the Potts model (aka piecewise constant Mumford-Shah model)
Stars: ✭ 97 (+21.25%)
Mutual labels:  clustering
dmmclust
dmmclust is a package for clustering short texts, based on Yin and Wang (2014)
Stars: ✭ 23 (-71.25%)
Mutual labels:  clustering
Keras deep clustering
How to do Unsupervised Clustering with Keras
Stars: ✭ 202 (+152.5%)
Mutual labels:  clustering
Vectorai
Vector AI — A platform for building vector based applications. Encode, query and analyse data using vectors.
Stars: ✭ 195 (+143.75%)
Mutual labels:  clustering
Pqkmeans
Fast and memory-efficient clustering
Stars: ✭ 189 (+136.25%)
Mutual labels:  clustering
Clustering With Deep Learning
Generic implementation for clustering with deep learning : representation learning (DNN) + clustering
Stars: ✭ 236 (+195%)
Mutual labels:  clustering
Dtwclust
R Package for Time Series Clustering Along with Optimizations for DTW
Stars: ✭ 185 (+131.25%)
Mutual labels:  clustering
ml-book
Source code and errata for my book "A tu per tu col Machine Learning"
Stars: ✭ 16 (-80%)
Mutual labels:  clustering
Gsdmm
GSDMM: Short text clustering
Stars: ✭ 175 (+118.75%)
Mutual labels:  clustering
Uci Ml Api
Simple API for UCI Machine Learning Dataset Repository (search, download, analyze)
Stars: ✭ 190 (+137.5%)
Mutual labels:  clustering
Spectralcluster
Python re-implementation of the spectral clustering algorithm in the paper "Speaker Diarization with LSTM"
Stars: ✭ 220 (+175%)
Mutual labels:  clustering
scicloj.ml
A Clojure machine learning library
Stars: ✭ 152 (+90%)
Mutual labels:  clustering
CytoPy
A data-centric flow/mass cytometry automated analysis framework
Stars: ✭ 27 (-66.25%)
Mutual labels:  clustering

Feature Selection for Clustering


FSFC is a library of feature selection algorithms for clustering.

It is based on the article "Feature Selection for Clustering: A Review" by S. Alelyani, J. Tang, and H. Liu.

The algorithms are covered by tests that check their correctness and compute clustering metrics. The tests use the following open datasets:

Project documentation is available on Read the Docs

Implemented algorithms:

  • Generic Data:
    • SPEC family - NormalizedCut, ArbitraryClustering, FixedClustering
    • Sparse clustering - Lasso
    • Localised feature selection - LFSBSS algorithm
    • Multi-Cluster Feature Selection
    • Weighted K-means
  • Text Data:
    • Text clustering - Chi-R algorithm, Feature Set-Based Clustering (FTC)
    • Frequent itemset extraction - Apriori
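The SPEC family listed above ranks features by how smoothly they vary over a similarity graph built from the samples: a feature that is nearly constant inside each cluster and changes between clusters gets a low (good) score. The following is a minimal numpy sketch of that idea, not fsfc's NormalizedCut code. It assumes an RBF similarity graph with a hypothetical bandwidth parameter `sigma` and uses the normalized ranking function usually written φ2 in the SPEC formulation by Zhao and Liu. `spec_scores` is a hypothetical helper name.

```python
import numpy as np


def spec_scores(X, sigma=1.0):
    """SPEC-style feature scores (phi_2); lower means more relevant.

    Hypothetical helper for illustration, not part of fsfc.
    """
    X = np.asarray(X, dtype=float)
    # RBF similarity between every pair of samples
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    d_sqrt = np.sqrt(S.sum(axis=1))  # square roots of node degrees
    # Normalized Laplacian: I - D^{-1/2} S D^{-1/2}
    L_norm = np.eye(len(X)) - S / np.outer(d_sqrt, d_sqrt)
    # Trivial eigenvector of L_norm (eigenvalue 0): D^{1/2} 1, normalized
    xi1 = d_sqrt / np.linalg.norm(d_sqrt)

    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        f_hat = d_sqrt * X[:, j]  # D^{1/2} f
        norm = np.linalg.norm(f_hat)
        if norm == 0:
            scores[j] = np.inf  # all-zero feature carries no information
            continue
        f_hat /= norm
        numerator = f_hat @ L_norm @ f_hat
        # Remove the contribution of the trivial eigenvector
        denominator = 1.0 - (f_hat @ xi1) ** 2
        # Constant features align with xi1, so the denominator vanishes
        scores[j] = numerator / denominator if denominator > 1e-12 else np.inf
    return scores
```

The `denominator` guard gives constant features an infinite score. Without it they would look perfectly smooth on the graph and be ranked first.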

Dependencies:

  • numpy
  • scikit-learn
  • scipy

How to use:

The project is currently in an early alpha stage, so it isn't published to pip.

Because of this, installation is a bit more involved. To use FSFC you should:

  1. Clone the repository to your computer.
  2. Run make init to install the dependencies.
  3. Copy the contents of the fsfc folder into the source root of your project.

After that, you can use the feature selectors as follows:

import numpy as np
from fsfc.generic import NormalizedCut
from sklearn.pipeline import Pipeline
from sklearn.cluster import KMeans

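# Samples as an array of shape (n_samples, n_features)
data = np.array([...])

pipeline = Pipeline([
    ('select', NormalizedCut(3)),  # feature selection step; 3 is the number of features to keep
    ('cluster', KMeans())          # clusters the samples using only the selected features
])
labels = pipeline.fit_predict(data)  # one cluster label per sample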
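This pipeline works because scikit-learn requires every step except the last to implement fit and transform, which fsfc selectors are used for here. The sketch below makes that contract concrete without needing fsfc installed. `TopVarianceSelector` is a hypothetical stand-in that simply keeps the k highest-variance features. It is not an fsfc class and not a clustering-specific selection algorithm. The synthetic data is also illustrative only.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline


class TopVarianceSelector(BaseEstimator, TransformerMixin):
    """Hypothetical selector: keeps the k features with the largest variance."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Indices of the k highest-variance columns, sorted by column index
        self.selected_ = np.sort(np.argsort(X.var(axis=0))[-self.k:])
        return self

    def transform(self, X):
        # Keep only the columns chosen during fit
        return np.asarray(X, dtype=float)[:, self.selected_]


rng = np.random.default_rng(0)
# Two well-separated clusters in features 0 and 1, plus three low-variance noise features
centers = np.repeat([[-5.0, -5.0], [5.0, 5.0]], 50, axis=0)
informative = centers + rng.normal(scale=0.5, size=centers.shape)
noise = rng.normal(scale=0.1, size=(100, 3))
data = np.hstack([informative, noise])

pipeline = Pipeline([
    ('select', TopVarianceSelector(2)),
    ('cluster', KMeans(n_clusters=2, n_init=10, random_state=0)),
])
labels = pipeline.fit_predict(data)  # fit_transform on 'select', then fit_predict on 'cluster'
print(pipeline.named_steps['select'].selected_)  # prints [0 1]
```

Replacing `TopVarianceSelector(2)` with an fsfc selector such as `NormalizedCut(3)` leaves the rest of the pipeline unchanged.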

How to support:

You can support development by testing the library, reporting bugs, or opening pull requests.

The project has tests. Run them with the command make test.

There is also Sphinx documentation for the code. Build it with the command make html. The documentation uses numpydoc, so numpydoc must be installed on the system first. To install it, run pip install numpydoc.

References:
