A small collection of models implemented in Keras, including matrix factorization (recommendation systems), topic modeling, text classification, etc. Runs on TensorFlow.

Stars: ✭ 14 (-88.24%)

Mutual labels: topic-modeling

Lftm

Improving topic models LDA and DMM (one-topic-per-document model for short texts) with word embeddings (TACL 2015)

Stars: ✭ 168 (+41.18%)

Mutual labels: topic-modeling

auto-gfqg

Automatic Gap-Fill Question Generation

Stars: ✭ 17 (-85.71%)

Mutual labels: topic-modeling

Tmtoolkit

Text Mining and Topic Modeling Toolkit for Python with parallel processing power

Stars: ✭ 135 (+13.45%)

Mutual labels: topic-modeling

Palmetto

Palmetto is a quality measuring tool for topics

Stars: ✭ 144 (+21.01%)

Mutual labels: topic-modeling

Chinese keyphrase extractor

An off-the-shelf tool for Chinese keyphrase extraction. It quickly extracts key phrases from Chinese text and uses only 35 MB of memory.

Stars: ✭ 237 (+99.16%)

Mutual labels: topic-modeling

Scattertext

Beautiful visualizations of how language differs among document types.

Stars: ✭ 1,722 (+1347.06%)

Mutual labels: topic-modeling

stripnet

STriP Net: Semantic Similarity of Scientific Papers (S3P) Network

Stars: ✭ 82 (-31.09%)

Mutual labels: topic-modeling

Numpy Ml

Machine learning, in numpy

Stars: ✭ 11,100 (+9227.73%)

Mutual labels: topic-modeling

Tomotopy

Python package of Tomoto, the Topic Modeling Tool

Stars: ✭ 213 (+78.99%)

Mutual labels: topic-modeling

topic modelling financial news

Topic modelling on financial news with Natural Language Processing

Stars: ✭ 51 (-57.14%)

Mutual labels: topic-modeling

Topic-Modeling-Workshop-with-R

A workshop on analyzing topic modeling (LDA, CTM, STM) using R

Stars: ✭ 51 (-57.14%)

Mutual labels: topic-modeling

text-analysis

Weaving analytical stories from text data

Stars: ✭ 12 (-89.92%)

Mutual labels: topic-modeling


Concept

Concept is a technique that leverages CLIP and BERTopic-based techniques to perform Concept Modeling on images.

Since topics are part of conversations and text, they do not represent the context of images well. Therefore, these clusters of images are referred to as 'Concepts' instead of the traditional 'Topics'.

Thus, Concept Modeling takes inspiration from topic modeling techniques to cluster images, find common concepts and model them both visually using images and textually using topic representations.

Installation

Concept, together with sentence-transformers, can be installed from PyPI:

pip install concept

Quick Start

First, we need to download and extract the 25,000 Unsplash images used in the sentence-transformers example:

import os
import glob
import zipfile
from tqdm import tqdm
from sentence_transformers import util

# 25k images from Unsplash
img_folder = 'photos/'
if not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:
    os.makedirs(img_folder, exist_ok=True)
    
    photo_filename = 'unsplash-25k-photos.zip'
    if not os.path.exists(photo_filename):  # Download the dataset if it does not exist yet
        util.http_get('http://sbert.net/datasets/' + photo_filename, photo_filename)

    # Extract all images into img_folder
    with zipfile.ZipFile(photo_filename, 'r') as zf:
        for member in tqdm(zf.infolist(), desc='Extracting'):
            zf.extract(member, img_folder)

# Collect the paths of all extracted images
img_names = glob.glob(os.path.join(img_folder, '*.jpg'))
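If you want to run the same steps on your own image folder, note that the glob pattern above only picks up files ending in `.jpg` (and, on case-sensitive file systems, only in lowercase). The following is a minimal sketch of a more permissive listing helper. The function name `list_images` and the set of extensions are assumptions made for illustration; they are not part of Concept.

```python
import tempfile
from pathlib import Path


def list_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return sorted paths of files in `folder` whose suffix matches, ignoring case."""
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in extensions
    )


# Small demonstration on a temporary folder with two images and one text file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.JPG", "a.jpg", "notes.txt"):
        (Path(tmp) / name).touch()
    found = [Path(p).name for p in list_images(tmp)]
```

The resulting list of path strings can be used in place of `img_names`, assuming the files are images that the model can open.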

Next, we only need to pass images to Concept:

from concept import ConceptModel
concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names)

The resulting concepts can be visualized through concept_model.visualize_concepts():

However, to get the full experience, we need to label the concept clusters with topics. To do this, we need to create a vocabulary. We are going to feed our model 50,000 nouns from the English vocabulary:

import random
import nltk
nltk.download("wordnet")
from nltk.corpus import wordnet as wn

all_nouns = [word for synset in wn.all_synsets('n') for word in synset.lemma_names() if "_" not in word]
selected_nouns = random.sample(all_nouns, 50_000)
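Because a lemma can belong to several WordNet synsets, `all_nouns` may contain the same word more than once, and `random.sample` returns a different selection on every run. The sketch below shows one way to deduplicate the candidates and make the sample reproducible. The helper `sample_vocabulary`, its `seed` parameter, and the toy word list are assumptions for illustration only.

```python
import random


def sample_vocabulary(words, k, seed=42):
    """Deduplicate, drop multi-word lemmas, and sample up to k words reproducibly."""
    # Sorting the set gives a stable order, so the same seed yields the same sample.
    unique = sorted({word for word in words if "_" not in word})
    rng = random.Random(seed)
    return rng.sample(unique, min(k, len(unique)))


# Toy candidates standing in for the WordNet lemma names.
candidates = ["beach", "beach", "ice_cream", "forest", "city", "forest"]
vocab = sample_vocabulary(candidates, k=2)
full_vocab = sample_vocabulary(candidates, k=10)
```

With the real data, this would be called as `sample_vocabulary(all_nouns, 50_000)`. Capping `k` at the number of unique words avoids the `ValueError` that `random.sample` raises when asked for more items than are available.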

Then, we can pass in the resulting selected_nouns to Concept:

from concept import ConceptModel

concept_model = ConceptModel()
concepts = concept_model.fit_transform(img_names, docs=selected_nouns)

Again, the resulting concepts can be visualized. This time, however, concept_model.visualize_concepts() also shows the generated topics:

NOTE: Use ConceptModel(embedding_model="clip-ViT-B-32-multilingual-v1") to select a model that supports 50+ languages.

Search Concepts

We can quickly search for specific concepts by embedding a search term and finding the cluster embeddings that best represent them. As an example, let us search for the term beach and see what we can find. To do this, we simply run the following:

>>> concept_model.find_concepts("beach")
[(100, 0.277577825349102),
 (53, 0.27431058773894657),
 (95, 0.25973751319723837),
 (77, 0.2560122597417548),
 (97, 0.25361988261846297)]

Each tuple contains two values: the first is the concept cluster and the second is its similarity to the search term. The five most similar concepts are returned.
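To make the idea of comparing a search term with cluster embeddings concrete, here is a minimal sketch that ranks clusters by cosine similarity to a query vector. It is not Concept's actual implementation: the function `find_nearest_clusters` and the 2-D toy vectors are assumptions, and a real setup would use embeddings from a model such as CLIP.

```python
import numpy as np


def find_nearest_clusters(query_emb, cluster_emb, top_n=5):
    """Return (cluster_index, cosine_similarity) pairs, most similar first."""
    # Scale to unit length so that a dot product equals cosine similarity.
    query = query_emb / np.linalg.norm(query_emb)
    clusters = cluster_emb / np.linalg.norm(cluster_emb, axis=1, keepdims=True)
    sims = clusters @ query
    order = np.argsort(sims)[::-1][:top_n]
    return [(int(i), float(sims[i])) for i in order]


# Toy data: three cluster embeddings and one query embedding.
cluster_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query_emb = np.array([2.0, 0.5])
hits = find_nearest_clusters(query_emb, cluster_emb, top_n=2)
```

The return value has the same shape as the output shown above: a list of `(cluster, similarity)` tuples sorted from most to least similar.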

Now, let us visualize those concepts to see how well the search function works:

concept_model.visualize_concepts(concepts=[100, 53, 95, 77, 97])
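Rather than copying the cluster numbers by hand, the search result can be unpacked directly. The short sketch below reuses the output shown earlier; the 0.26 similarity cut-off is an arbitrary value chosen for illustration.

```python
# Output of concept_model.find_concepts("beach") as shown above.
results = [
    (100, 0.277577825349102),
    (53, 0.27431058773894657),
    (95, 0.25973751319723837),
    (77, 0.2560122597417548),
    (97, 0.25361988261846297),
]

# Keep only the cluster ids, in ranked order.
concept_ids = [concept_id for concept_id, _ in results]

# Optionally keep only clusters above a similarity threshold (arbitrary value).
strong_ids = [concept_id for concept_id, sim in results if sim >= 0.26]

# With a fitted model, the ids could then be passed on:
# concept_model.visualize_concepts(concepts=concept_ids)
```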


MaartenGr / Concept
