MaartenGr / Concept

License: MIT
Concept Modeling: Topic Modeling on Images and Text

Projects that are alternatives of or similar to Concept

Lda2vec Pytorch
Topic modeling with word vectors
Stars: ✭ 108 (-9.24%)
Mutual labels:  topic-modeling
Familia
A Toolkit for Industrial Topic Modeling
Stars: ✭ 2,499 (+2000%)
Mutual labels:  topic-modeling
TOM
A library for topic modeling and browsing
Stars: ✭ 91 (-23.53%)
Mutual labels:  topic-modeling
Hypertools
A Python toolbox for gaining geometric insights into high-dimensional data
Stars: ✭ 1,678 (+1310.08%)
Mutual labels:  topic-modeling
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+10625.21%)
Mutual labels:  topic-modeling
Ldagibbssampling
Open Source Package for Gibbs Sampling of LDA
Stars: ✭ 218 (+83.19%)
Mutual labels:  topic-modeling
Learning Social Media Analytics With R
This repository contains code and bonus content which will be added from time to time for the book "Learning Social Media Analytics with R" by Packt
Stars: ✭ 102 (-14.29%)
Mutual labels:  topic-modeling
keras-aquarium
a small collection of models implemented in keras, including matrix factorization(recommendation system), topic modeling, text classification, etc. Runs on tensorflow.
Stars: ✭ 14 (-88.24%)
Mutual labels:  topic-modeling
Lftm
Improving topic models LDA and DMM (one-topic-per-document model for short texts) with word embeddings (TACL 2015)
Stars: ✭ 168 (+41.18%)
Mutual labels:  topic-modeling
auto-gfqg
Automatic Gap-Fill Question Generation
Stars: ✭ 17 (-85.71%)
Mutual labels:  topic-modeling
Tmtoolkit
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Stars: ✭ 135 (+13.45%)
Mutual labels:  topic-modeling
Palmetto
Palmetto is a quality measuring tool for topics
Stars: ✭ 144 (+21.01%)
Mutual labels:  topic-modeling
Chinese keyphrase extractor
An off-the-shelf tool for Chinese Keyphrase Extraction 一个快速从中文里抽取关键短语的工具,仅占35M内存
Stars: ✭ 237 (+99.16%)
Mutual labels:  topic-modeling
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+1347.06%)
Mutual labels:  topic-modeling
stripnet
STriP Net: Semantic Similarity of Scientific Papers (S3P) Network
Stars: ✭ 82 (-31.09%)
Mutual labels:  topic-modeling
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+9227.73%)
Mutual labels:  topic-modeling
Tomotopy
Python package of Tomoto, the Topic Modeling Tool
Stars: ✭ 213 (+78.99%)
Mutual labels:  topic-modeling
topic modelling financial news
Topic modelling on financial news with Natural Language Processing
Stars: ✭ 51 (-57.14%)
Mutual labels:  topic-modeling
Topic-Modeling-Workshop-with-R
A workshop on analyzing topic modeling (LDA, CTM, STM) using R
Stars: ✭ 51 (-57.14%)
Mutual labels:  topic-modeling
text-analysis
Weaving analytical stories from text data
Stars: ✭ 12 (-89.92%)
Mutual labels:  topic-modeling

Concept

Concept is a technique that leverages CLIP and BERTopic-based techniques to perform Concept Modeling on images.

Since topics are part of conversations and text, they do not represent the context of images well. Therefore, these clusters of images are referred to as 'Concepts' instead of the traditional 'Topics'.

Thus, Concept Modeling takes inspiration from topic modeling techniques to cluster images, find common concepts and model them both visually using images and textually using topic representations.

Installation

Concept, together with its sentence-transformers dependency, can be installed from PyPI:

pip install concept

Quick Start

First, we need to download and extract the 25,000 Unsplash images used in the sentence-transformers example:

import os
import glob
import zipfile
from tqdm import tqdm
from sentence_transformers import util

# 25k images from Unsplash
img_folder = 'photos/'
if not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:
    os.makedirs(img_folder, exist_ok=True)
    
    photo_filename = 'unsplash-25k-photos.zip'
    if not os.path.exists(photo_filename):  # Download the dataset if it does not exist
        util.http_get('http://sbert.net/datasets/' + photo_filename, photo_filename)

    # Extract all images
    with zipfile.ZipFile(photo_filename, 'r') as zf:
        for member in tqdm(zf.infolist(), desc='Extracting'):
            zf.extract(member, img_folder)
# Collect the paths of all extracted images
img_names = list(glob.glob(os.path.join(img_folder, '*.jpg')))

Next, we only need to pass images to Concept:

from concept import ConceptModel
concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names)
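Before visualizing anything, it can help to check how many images ended up in each concept. The following is a minimal sketch that assumes the `concepts` returned above is a list with one integer label per image, and that -1 marks images not assigned to any concept (the HDBSCAN convention); both points are assumptions, not confirmed here. The list `toy_concepts` is a hypothetical stand-in so the snippet runs on its own.

```python
from collections import Counter

# Toy stand-in for `concepts` (assumed: one integer label per image).
toy_concepts = [0, 2, 1, 0, -1, 0, 1, -1, 0]

# Count how many images carry each label.
concept_sizes = Counter(toy_concepts)

# Print the labels from largest to smallest group.
# Assumption: -1 collects images that were not assigned to a concept.
for concept_id, n_images in concept_sizes.most_common():
    print(concept_id, n_images)
```

With real data, replacing `toy_concepts` with `concepts` should give the same kind of overview, provided the assumptions above hold.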

The resulting concepts can be visualized with concept_model.visualize_concepts().

However, to get the full experience, we need to label the concept clusters with topics. To do this, we need a vocabulary. Here, we feed the model 50,000 English nouns sampled at random from WordNet:

import random
import nltk
nltk.download("wordnet")
from nltk.corpus import wordnet as wn

all_nouns = [word for synset in wn.all_synsets('n') for word in synset.lemma_names() if "_" not in word]
selected_nouns = random.sample(all_nouns, 50_000)
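Because a word can belong to several synsets, all_nouns contains duplicates, and random.sample without a seed gives a different vocabulary on every run. If a reproducible, duplicate-free vocabulary is preferred, one option is sketched below. The helper name `sample_vocabulary`, the seed value, and the toy word list are hypothetical and not part of Concept.

```python
import random


def sample_vocabulary(words, k, seed=42):
    """Deduplicate `words` and draw up to `k` of them reproducibly.

    Hypothetical helper for illustration; not part of the Concept API.
    """
    # Sort the unique words so the sampling input has a stable order.
    unique_words = sorted(set(words))
    # Use a local RNG so the global random state is left untouched.
    rng = random.Random(seed)
    return rng.sample(unique_words, min(k, len(unique_words)))


# Toy stand-in for `all_nouns`, including repeated words.
toy_nouns = ["beach", "sand", "beach", "wave", "sand", "dune"]
vocabulary = sample_vocabulary(toy_nouns, k=3)
print(vocabulary)
```

With the real data, `sample_vocabulary(all_nouns, 50_000)` would play the role of selected_nouns.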

Then, we can pass in the resulting selected_nouns to Concept:

from concept import ConceptModel

concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names, docs=selected_nouns)

Again, the resulting concepts can be visualized with concept_model.visualize_concepts(). This time, however, the visualization also shows the generated topics.

NOTE: Use ConceptModel(embedding_model="clip-ViT-B-32-multilingual-v1") to select a model that supports 50+ languages.

Search Concepts

We can quickly search for specific concepts by embedding a search term and finding the cluster embeddings that best match it. As an example, let us search for the term "beach" and see what we can find. To do this, we simply run the following:

>>> concept_model.find_concepts("beach")
[(100, 0.277577825349102),
 (53, 0.27431058773894657),
 (95, 0.25973751319723837),
 (77, 0.2560122597417548),
 (97, 0.25361988261846297)]

Each tuple contains two values: the first is the concept cluster and the second is its similarity to the search term. The five most similar concepts are returned.
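The idea behind this kind of search can be illustrated in a few lines of plain Python. This is a minimal sketch, not Concept's actual implementation: it assumes every concept is represented by a single embedding vector and ranks concepts by cosine similarity to the query embedding. The function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_concepts(query_embedding, cluster_embeddings, top_n=5):
    """Return (cluster_index, similarity) pairs, best match first.

    Hypothetical helper for illustration; not part of the Concept API.
    """
    scored = [
        (index, cosine_similarity(query_embedding, embedding))
        for index, embedding in enumerate(cluster_embeddings)
    ]
    # Highest similarity first, then keep only the top_n entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data: three 2-D "concept" embeddings and one query vector.
toy_clusters = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_query = [1.0, 0.1]
ranking = rank_concepts(toy_query, toy_clusters, top_n=2)
print(ranking)
```

In this toy example the query points almost along the first axis, so concept 0 ranks first and the diagonal concept 2 ranks second. In the real model, the query and concept embeddings would come from the CLIP model rather than being written by hand.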

Now, let us visualize those concepts to see how well the search function works:

concept_model.visualize_concepts(concepts=[100, 53, 95, 77, 97])
