hank110 / Bagofconcepts

Licence: MIT
Python implementation of bag-of-concepts

BOC (Bag-of-Concepts)

This is a Python implementation of Bag-of-Concepts, as proposed in the paper "Bag-of-Concepts: Comprehending Document Representation through Clustering Words in Distributed Representation" (Han Kyul Kim, Hyunjoong Kim, Sungzoon Cho).

Given a text corpus, it trains word2vec vectors for the words in the corpus and clusters semantically similar words into a common "concept".
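
The clustering step can be illustrated with a minimal sketch. It assumes the word vectors are already trained and groups them by cosine similarity with a basic spherical k-means. The function names, the toy 2-D vectors, and the naive initialisation are all hypothetical and are not taken from the package, whose internal clustering routine may differ.

```python
import math


def normalize(v):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cluster_words(vectors, k, n_iter=10):
    """Group words by cosine similarity (a basic spherical k-means sketch)."""
    words = list(vectors)
    unit = [normalize(vectors[w]) for w in words]
    # Naive deterministic initialisation: the first k word vectors.
    centroids = [list(u) for u in unit[:k]]
    labels = [0] * len(words)
    for _ in range(n_iter):
        # Assign each word to the centroid with the highest cosine similarity.
        labels = [
            max(range(k), key=lambda c: sum(a * b for a, b in zip(u, centroids[c])))
            for u in unit
        ]
        # Recompute each centroid as the normalised mean of its members.
        for c in range(k):
            members = [u for u, lab in zip(unit, labels) if lab == c]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalize(mean)
    return dict(zip(words, labels))


# Made-up 2-D "embeddings"; real word2vec vectors have far more dimensions.
vectors = {
    "cat": [1.0, 0.1],
    "car": [0.1, 1.0],
    "dog": [0.9, 0.2],
    "bus": [0.2, 0.9],
}
word2concept = cluster_words(vectors, k=2)
```

In this toy example the animal words end up sharing one concept id and the vehicle words another; each concept id then acts as a single feature in the document representation.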

Each document is then represented by the counts of these concepts, weighted with a concept frequency-inverse document frequency (CF-IDF) scheme.
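
As a rough illustration of this weighting, the sketch below computes CF-IDF on a toy corpus. It assumes CF-IDF takes the familiar TF-IDF form with concepts in place of terms (relative concept frequency in a document times the log of the inverse document frequency). The `cfidf` function and the word-to-concept mapping are hypothetical and not part of the package's API.

```python
import math
from collections import Counter


def cfidf(docs, word2concept, n_concepts):
    """Return a len(docs) x n_concepts list of CF-IDF weights."""
    # Count concept occurrences per document; unmapped words are skipped.
    counts = []
    for doc in docs:
        c = Counter(word2concept[w] for w in doc.split() if w in word2concept)
        counts.append([c.get(k, 0) for k in range(n_concepts)])

    n_docs = len(docs)
    # Document frequency: number of documents that contain each concept.
    df = [sum(1 for row in counts if row[k] > 0) for k in range(n_concepts)]

    matrix = []
    for row in counts:
        total = sum(row)
        weighted = []
        for k, n in enumerate(row):
            if n == 0 or total == 0:
                weighted.append(0.0)
            else:
                # Concept frequency times inverse document frequency.
                weighted.append((n / total) * math.log(n_docs / df[k]))
        matrix.append(weighted)
    return matrix


# Hypothetical mapping; in practice it would come from clustering word vectors.
toy_word2concept = {"cat": 0, "dog": 0, "car": 1, "bus": 1}
toy_docs = ["cat dog car", "car bus", "dog cat"]
toy_matrix = cfidf(toy_docs, toy_word2concept, n_concepts=2)
```

Each row of `toy_matrix` is one document and each column one concept, so a document that never mentions a concept gets a weight of zero for it, and a concept that appears in every document is weighted down to zero by the logarithm.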

Installation

$ pip install bagofconcepts

Basic Usage

import bagofconcepts as boc

# Each line of the input file is treated as one document of the corpus
boc_model = boc.BOCModel(doc_path="input corpus path")

# The output can be saved with the save_path parameter
boc_matrix, word2concept_list, idx2word_converter = boc_model.fit()