koheiw / seededlda

Licence: other

Semisupervised LDA for theory-driven text analysis

Programming Languages: C++

Projects that are alternatives of or similar to seededlda

Bible text gcn

Pytorch implementation of "Graph Convolutional Networks for Text Classification"

Stars: ✭ 90 (+95.65%)

Mutual labels: text-classification, semi-supervised-learning

ganbert-pytorch

Enhancing the BERT training with Semi-supervised Generative Adversarial Networks in Pytorch/HuggingFace

Stars: ✭ 60 (+30.43%)

Mutual labels: text-classification, semi-supervised-learning

character-level-cnn

Keras implementation of Character-level CNN for Text Classification

Stars: ✭ 56 (+21.74%)

Mutual labels: text-classification

backprop

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

Stars: ✭ 229 (+397.83%)

Mutual labels: text-classification

X-Transformer

X-Transformer: Taming Pretrained Transformers for eXtreme Multi-label Text Classification

Stars: ✭ 127 (+176.09%)

Mutual labels: text-classification

sesemi

supervised and semi-supervised image classification with self-supervision (Keras)

Stars: ✭ 43 (-6.52%)

Mutual labels: semi-supervised-learning

Graph-Based-TC

Graph-based framework for text classification

Stars: ✭ 24 (-47.83%)

Mutual labels: text-classification

RE2RNN

Source code for the EMNLP 2020 paper "Cold-Start and Interpretability: Turning Regular Expressions into Trainable Recurrent Neural Networks"

Stars: ✭ 96 (+108.7%)

Mutual labels: text-classification

deepOF

TensorFlow implementation for "Guided Optical Flow Learning"

Stars: ✭ 26 (-43.48%)

Mutual labels: semi-supervised-learning

Ask2Transformers

A Framework for Textual Entailment based Zero Shot text classification

Stars: ✭ 102 (+121.74%)

Mutual labels: text-classification

ganbert

Enhancing the BERT training with Semi-supervised Generative Adversarial Networks

Stars: ✭ 205 (+345.65%)

Mutual labels: semi-supervised-learning

rnn-text-classification-tf

Tensorflow implementation of Attention-based Bidirectional RNN text classification.

Stars: ✭ 26 (-43.48%)

Mutual labels: text-classification

Nepali-News-Classifier

Text Classification of Nepali Language Documents. This mini project was done in partial fulfillment of the NLP course COMP 473.

Stars: ✭ 13 (-71.74%)

Mutual labels: text-classification

Cross-Speaker-Emotion-Transfer

PyTorch Implementation of ByteDance's Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech

Stars: ✭ 107 (+132.61%)

Mutual labels: semi-supervised-learning

pyroVED

Invariant representation learning from imaging and spectral data

Stars: ✭ 23 (-50%)

Mutual labels: semi-supervised-learning

text-classification-transformers

Easy text classification for everyone: BERT-based models via Huggingface transformers (KR / EN)

Stars: ✭ 32 (-30.43%)

Mutual labels: text-classification

GPQ

Generalized Product Quantization Network For Semi-supervised Image Retrieval - CVPR 2020

Stars: ✭ 60 (+30.43%)

Mutual labels: semi-supervised-learning

Billion-scale-semi-supervised-learning

Implementing Billion-scale semi-supervised learning for image classification using Pytorch

Stars: ✭ 81 (+76.09%)

Mutual labels: semi-supervised-learning

watson-document-classifier

Augment IBM Watson Natural Language Understanding APIs with a configurable mechanism for text classification, uses Watson Studio.

Stars: ✭ 41 (-10.87%)

Mutual labels: text-classification

generative models

Pytorch implementations of generative models: VQVAE2, AIR, DRAW, InfoGAN, DCGAN, SSVAE

Stars: ✭ 82 (+78.26%)

Mutual labels: semi-supervised-learning


Seeded-LDA for semisupervised topic modeling

seededlda is an R package that implements seeded-LDA for semisupervised topic modeling using quanteda. The seeded-LDA model was proposed by Lu et al. (2010). Up to version 0.3, the package was a simple wrapper around the topicmodels package; the LDA estimator has since been reimplemented in C++ using the GibbsLDA++ library, in preparation for submission to CRAN in August 2020. The author believes this implementation is closer to the original proposal of seeded-LDA.
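GibbsLDA++ fits LDA by collapsed Gibbs sampling. The base-R sketch below illustrates that sampler on a toy corpus. How seed words enter the model is an assumption here: they are added as extra pseudo-counts (the hypothetical `seed` matrix) to the topic-word counts, which pulls seed words toward their topics. seededlda's actual weighting of seed words may differ, and `gibbs_lda_sketch()` is a hypothetical helper, not part of the package.

```r
# Minimal collapsed Gibbs sampler for LDA with seed pseudo-counts (illustrative sketch).
# docs: list of numeric vectors of word indices in 1..V
# seed: K x V matrix of non-negative pseudo-counts (ASSUMPTION: one way to apply seeds)
gibbs_lda_sketch <- function(docs, K, V, seed = matrix(0, K, V),
                             alpha = 0.5, beta = 0.1, iter = 200) {
  n_dk <- matrix(0, length(docs), K)  # topic counts per document
  n_kw <- matrix(0, K, V)             # word counts per topic
  z <- vector("list", length(docs))   # topic assignment of every token
  # Random initial topic assignments
  for (d in seq_along(docs)) {
    z[[d]] <- sample.int(K, length(docs[[d]]), replace = TRUE)
    for (i in seq_along(docs[[d]])) {
      k <- z[[d]][i]; w <- docs[[d]][i]
      n_dk[d, k] <- n_dk[d, k] + 1
      n_kw[k, w] <- n_kw[k, w] + 1
    }
  }
  for (it in seq_len(iter)) {
    for (d in seq_along(docs)) {
      for (i in seq_along(docs[[d]])) {
        k <- z[[d]][i]; w <- docs[[d]][i]
        # Remove the current token from the counts
        n_dk[d, k] <- n_dk[d, k] - 1
        n_kw[k, w] <- n_kw[k, w] - 1
        # Full conditional p(z = k | rest); seed counts act as extra topic-word counts
        num <- n_kw[, w] + seed[, w] + beta
        den <- rowSums(n_kw) + rowSums(seed) + V * beta
        p <- (n_dk[d, ] + alpha) * num / den
        k <- sample.int(K, 1, prob = p)
        # Add the token back under its newly sampled topic
        z[[d]][i] <- k
        n_dk[d, k] <- n_dk[d, k] + 1
        n_kw[k, w] <- n_kw[k, w] + 1
      }
    }
  }
  list(phi   = (n_kw + seed + beta) / rowSums(n_kw + seed + beta),  # topic-word
       theta = (n_dk + alpha) / rowSums(n_dk + alpha))              # document-topic
}

set.seed(1)
vocab <- c("bank", "money", "market", "army", "navy", "soldier")
docs <- list(c(1, 2, 3, 1, 2), c(4, 5, 6, 4), c(1, 3, 2, 4, 5, 6))
seed <- matrix(0, 2, length(vocab))
seed[1, 1:2] <- 5  # hypothetical seeds: "bank", "money" -> topic 1
seed[2, 4:5] <- 5  # hypothetical seeds: "army", "navy" -> topic 2
fit <- gibbs_lda_sketch(docs, K = 2, V = length(vocab), seed = seed)
```

Setting all seed counts to zero reduces the sketch to plain LDA, which is one way to see what the seeds contribute.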

Please see Theory-Driven Analysis of Large Corpora: Semisupervised Topic Classification of the UN Speeches for an overview of semisupervised topic classification techniques and their advantages in social science research.

keyATM is the latest addition to the family of semisupervised topic models. Users of seeded-LDA are also encouraged to try that package.

Install

install.packages("devtools")
devtools::install_github("koheiw/seededlda")

Example

The corpus and seed words in this example are from Conspiracist propaganda: How Russia promotes anti-establishment sentiment online?

require(quanteda)
require(seededlda)

Users of seeded-LDA must provide a small dictionary of keywords (seed words) to define the desired topics.

dict <- dictionary(file = "tests/data/topics.yml")
print(dict)
## Dictionary object with 5 key entries.
## - [economy]:
##   - market*, money, bank*, stock*, bond*, industry, company, shop*
## - [politics]:
##   - parliament*, congress*, white house, party leader*, party member*, voter*, lawmaker*, politician*
## - [society]:
##   - police, prison*, school*, hospital*
## - [diplomacy]:
##   - ambassador*, diplomat*, embassy, treaty
## - [military]:
##   - military, soldier*, terrorist*, air force, marine, navy, army
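Seed words such as market* are glob patterns, so one entry can match several features. The base-R sketch below approximates that matching with `utils::glob2rx()` and builds a topic-by-feature indicator matrix from a list of seed patterns, appending an empty row for a residual topic to mimic residual = TRUE. `seed_matrix_sketch()` is a hypothetical helper that does not use quanteda; quanteda's own pattern matching (for example, multi-word entries such as "white house", which become white_house after tokens_compound) may behave differently.

```r
# Sketch: map glob-style seed words onto a feature vocabulary (base R only).
seed_matrix_sketch <- function(seeds, features, residual = FALSE) {
  # seeds: named list of character vectors of glob patterns, e.g. "market*"
  m <- matrix(0L, nrow = length(seeds), ncol = length(features),
              dimnames = list(names(seeds), features))
  for (k in seq_along(seeds)) {
    regex <- utils::glob2rx(seeds[[k]])  # "market*" -> "^market", "money" -> "^money$"
    for (r in regex) {
      m[k, grepl(r, features, ignore.case = TRUE)] <- 1L
    }
  }
  if (residual) {
    m <- rbind(m, other = 0L)  # residual ("other") topic without seed words
  }
  m
}

seeds <- list(economy  = c("market*", "money", "bank*"),
              military = c("army", "soldier*"))
features <- c("market", "markets", "supermarket", "money", "banks",
              "army", "soldiers", "trump")
m <- seed_matrix_sketch(seeds, features, residual = TRUE)
print(m)
```

Because glob2rx() anchors patterns at the start, market* matches "markets" but not "supermarket".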

corp <- readRDS("tests/data/data_corpus_sputnik.RDS")
toks <- tokens(corp, remove_punct = TRUE, remove_symbols = TRUE, remove_numbers = TRUE) %>%
        tokens_select(min_nchar = 2) %>% 
        tokens_compound(dict) # for multi-word expressions
dfmt <- dfm(toks) %>% 
    dfm_remove(stopwords('en')) %>% 
    dfm_trim(min_termfreq = 0.90, termfreq_type = "quantile", 
             max_docfreq = 0.2, docfreq_type = "prop")
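The dfm_trim() call keeps frequent features but drops widespread ones: min_termfreq = 0.90 with termfreq_type = "quantile" keeps roughly the features at or above the 90th percentile of total frequency, and max_docfreq = 0.2 with docfreq_type = "prop" removes features that occur in more than 20% of documents. The base-R sketch below reproduces that logic on a plain count matrix. `trim_sketch()` is hypothetical (its argument names only mirror dfm_trim()), and the quantile definition quanteda uses internally may differ from R's default `quantile()`. The toy call uses a looser 0.5 quantile so the output is not empty.

```r
# Sketch of quantile/proportion trimming on a documents x features count matrix.
trim_sketch <- function(x, min_termfreq = 0.90, max_docfreq = 0.2) {
  termfreq <- colSums(x)                # total count of each feature
  docfreq  <- colSums(x > 0) / nrow(x)  # share of documents containing each feature
  threshold <- quantile(termfreq, min_termfreq, names = FALSE)
  keep <- termfreq >= threshold & docfreq <= max_docfreq
  x[, keep, drop = FALSE]
}

x <- matrix(0, 10, 5, dimnames = list(paste0("d", 1:10),
                                      c("said", "sanctions", "cup", "deal", "kim")))
x[, "said"] <- 3            # frequent but in every document: dropped by max_docfreq
x[1:2, "sanctions"] <- 10   # frequent and concentrated in 20% of documents: kept
x[3, "cup"] <- 1
x[4, "deal"] <- 2
x[5, "kim"] <- 1
out <- trim_sketch(x, min_termfreq = 0.5, max_docfreq = 0.2)
colnames(out)
```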

Many of the top terms in each topic are seed words, but related words are also identified. The result includes “other” as a junk topic because residual = TRUE.

set.seed(1234)
slda <- textmodel_seededlda(dfmt, dict, residual = TRUE)
print(terms(slda, 20))
##       economy     politics        society           diplomacy   
##  [1,] "company"   "parliament"    "police"          "diplomatic"
##  [2,] "money"     "congress"      "school"          "embassy"   
##  [3,] "market"    "white_house"   "hospital"        "ambassador"
##  [4,] "bank"      "politicians"   "prison"          "treaty"    
##  [5,] "industry"  "parliamentary" "schools"         "diplomat"  
##  [6,] "banks"     "lawmakers"     "pic.twitter.com" "diplomats" 
##  [7,] "markets"   "voters"        "media"           "like"      
##  [8,] "banking"   "lawmaker"      "reported"        "just"      
##  [9,] "stock"     "politician"    "local"           "now"       
## [10,] "stockholm" "minister"      "information"     "think"     
## [11,] "china"     "european"      "video"           "even"      
## [12,] "percent"   "sanctions"     "public"          "trump"     
## [13,] "chinese"   "eu"            "social"          "going"     
## [14,] "economic"  "political"     "court"           "made"      
## [15,] "india"     "party"         "women"           "years"     
## [16,] "year"      "foreign"       "man"             "way"       
## [17,] "oil"       "prime"         "report"          "say"       
## [18,] "project"   "union"         "found"           "want"      
## [19,] "billion"   "moscow"        "investigation"   "many"      
## [20,] "million"   "trump"         "department"      "really"    
##       military        other      
##  [1,] "army"          "north"    
##  [2,] "terrorist"     "nuclear"  
##  [3,] "navy"          "korea"    
##  [4,] "terrorists"    "south"    
##  [5,] "air_force"     "iran"     
##  [6,] "soldiers"      "trump"    
##  [7,] "marine"        "korean"   
##  [8,] "soldier"       "world"    
##  [9,] "defense"       "israel"   
## [10,] "syria"         "deal"     
## [11,] "syrian"        "saudi"    
## [12,] "forces"        "kim"      
## [13,] "security"      "show"     
## [14,] "nato"          "israeli"  
## [15,] "weapons"       "agreement"
## [16,] "daesh"         "program"  
## [17,] "turkish"       "cup"      
## [18,] "turkey"        "trump's"  
## [19,] "international" "japan"    
## [20,] "group"         "peace"
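Conceptually, each column of the terms() output lists the features with the highest probability in that topic's word distribution. The base-R sketch below shows that ranking on a small, made-up topic-by-feature probability matrix `phi`; `top_terms_sketch()` is a hypothetical helper that illustrates the idea, not seededlda's implementation.

```r
# Sketch: top-n features per topic from a topics x features probability matrix.
top_terms_sketch <- function(phi, n = 3) {
  n <- min(n, ncol(phi))
  vapply(rownames(phi), function(k) {
    names(sort(phi[k, ], decreasing = TRUE))[seq_len(n)]
  }, character(n))
}

phi <- rbind(
  economy  = c(bank = 0.40, money = 0.30, army = 0.05, navy = 0.05, oil = 0.20),
  military = c(bank = 0.02, money = 0.08, army = 0.50, navy = 0.30, oil = 0.10)
)
res <- top_terms_sketch(phi, n = 2)
print(res)
```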

topic <- table(topics(slda))
print(topic)
## 
##   economy  politics   society diplomacy  military     other 
##       136       181       262       158       144       119
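topics() assigns each document to its most likely topic, and table() then counts documents per topic. The base-R sketch below shows the same idea on a made-up document-by-topic probability matrix `theta`, using `max.col()` with ties.method = "first"; how seededlda itself breaks ties is an assumption left open here.

```r
# Sketch: assign each document its highest-probability topic, then count.
theta <- rbind(
  doc1 = c(economy = 0.7, military = 0.2, other = 0.1),
  doc2 = c(economy = 0.1, military = 0.8, other = 0.1),
  doc3 = c(economy = 0.5, military = 0.1, other = 0.4)
)
assigned <- factor(colnames(theta)[max.col(theta, ties.method = "first")],
                   levels = colnames(theta))
tab <- table(assigned)
print(tab)
```

Converting to a factor with all topic names as levels keeps topics that receive no documents (here "other") in the table with a count of zero.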

Examples

Please read the following papers for examples of how seeded-LDA is applied in social science research:

Curini, Luigi and Vignoli, Valerio. 2021. Committed Moderates and Uncommitted Extremists: Ideological Leaning and Parties’ Narratives on Military Interventions in Italy, Foreign Policy Analysis.
