Travis-CI Build Status AppVeyor Build Status MIT licensed

stmprinter: Print multiple stm model dashboards to a pdf file for inspection

Estimate multiple stm models and print a dashboard for each run on a separate pdf page for inspection. These functions are designed for models with 15 or fewer topics (such as with survey data) and can be particularly useful when it is difficult to find a qualitatively good model on the first run.

The package includes two main functions:

| Function | Explanation |
| --- | --- |
| many_models() | Runs stm::selectModel() for every provided number of topics K (in parallel). Unlike stm::manyTopics(), it keeps all runs retained by stm::selectModel() for each K. |
| print_models() | Prints all runs produced by either many_models() or stm::manyTopics() into a pdf file. The file makes it easy to look manually through several runs for several numbers of topics. Does not work well if you have more than 15 topics. An example is shown below. |

[Images: Example 1, Example 2]

Installation

You can install stmprinter from GitHub with:

# install.packages("devtools")
devtools::install_github("mikajoh/stmprinter")

Example

Here is an example with the gadarian data that is included with the stm package.

First let’s prep the data as usual with stm::textProcessor() and stm::prepDocuments().

library(stm)
#> stm v1.3.3 (2018-1-26) successfully loaded. See ?stm for help. 
#>  Papers, resources, and other materials at structuraltopicmodel.com
library(stmprinter)

processed <- textProcessor(
  documents = gadarian$open.ended.response,
  metadata = gadarian
)
#> Building corpus... 
#> Converting to Lower Case... 
#> Removing punctuation... 
#> Removing stopwords... 
#> Removing numbers... 
#> Stemming... 
#> Creating Output...

out <- prepDocuments(
  documents = processed$documents,
  vocab = processed$vocab,
  meta = processed$meta
)
#> Removing 640 of 1102 terms (640 of 3789 tokens) due to frequency 
#> Your corpus now has 341 documents, 462 terms and 3149 tokens.

We can then run the many_models() function included in this package. It runs stm::selectModel() for each requested number of topics K (in parallel) and returns a list with the output. This is convenient if you wish to estimate several models but, unlike with stm::manyTopics() (which keeps only one model per K), you wish to keep several runs per K. Note, though, that the print_models() function is also compatible with output from manyTopics().

many_models() takes the same arguments as stm::selectModel(), with the exception of K and cores. Here, K should be a vector of all the desired numbers of topics. The cores argument lets you choose how many cores to use (it defaults to the number of cores available on the machine).
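
If you would rather not use every core, one option is to compute a value for cores yourself. The sketch below is only an illustration: the name n_cores and the policy of leaving one core free are assumptions, not part of stmprinter. It relies only on parallel::detectCores(), which ships with R.

```r
library(parallel)

# detectCores() can return NA on some platforms, so fall back to 1.
available <- detectCores()
if (is.na(available)) available <- 1L

# Assumption: leave one core free for the rest of the system,
# but never go below a single core.
n_cores <- max(1L, as.integer(available) - 1L)

# Hypothetical usage (not run here):
# many_models(K = 3:12, ..., cores = n_cores)
n_cores
```

The resulting value can then be passed as cores = n_cores in the many_models() call.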

With our gadarian example, we could run the following to estimate stm models for 3 to 12 topics.

set.seed(2018)

stm_models <- many_models(
  K = 3:12,
  documents = out$documents,
  vocab = out$vocab,
  prevalence = ~ treatment + s(pid_rep),
  data = out$meta,
  N = 4,
  runs = 100
)
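
To get a rough sense of how much work this call implies, the arithmetic can be sketched as follows. It assumes that stm::selectModel() performs runs short initial runs and retains exactly N models for every K; the variable names are illustrative only.

```r
K <- 3:12    # candidate numbers of topics, as in the call above
N <- 4       # models retained per K
runs <- 100  # short initial runs per K

n_retained <- length(K) * N     # models kept across all K
n_initial <- length(K) * runs   # initial runs across all K

c(retained = n_retained, initial = n_initial)
```

Under these assumptions there are up to 40 retained models to look through, which is why a printed overview is convenient.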

You can then print all N runs for each of the provided K values using print_models() with the following code.

Here, stm_models must be the output from either many_models() or stm::manyTopics(). The second argument is the texts to use for printing the most representative texts (see ?stm::findThoughts). You can also provide the file name (file) and the title at the top of the first page (title).

print_models(
  stm_models, gadarian$open.ended.response,
  file = "gadarian_stm_runs.pdf",
  title = "gadarian project"
)

An example of the output is shown below.

Note that the text argument should contain the full text responses, in the same order as the documents in out$documents (see ?stm::findThoughts). If documents are removed during stm::textProcessor() or stm::prepDocuments(), you will need to remove the same texts from the original. You can typically do that with the following code.

text <- gadarian$open.ended.response[-c(as.integer(processed$docs.removed))][-c(as.integer(out$docs.removed))]
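
One caveat when adapting negative indexing like this: in R, x[-integer(0)] returns an empty vector rather than x, so the result is empty whenever one of the steps removed no documents. A minimal sketch of a guard is shown below. The helper name drop_removed() is hypothetical (it is not part of stm or stmprinter), and toy data stands in for the gadarian objects.

```r
# Drop elements by position; an empty or NULL index leaves x unchanged.
# (x[-integer(0)] would instead return an empty vector.)
drop_removed <- function(x, removed) {
  removed <- as.integer(removed)
  if (length(removed) == 0L) x else x[-removed]
}

# Toy data standing in for gadarian$open.ended.response.
responses <- c("a", "b", "c", "d", "e")
step1_removed <- c(2L)       # stands in for processed$docs.removed
step2_removed <- integer(0)  # stands in for out$docs.removed

# Apply the removals in the same order as the preprocessing steps,
# because the second set of indices refers to the already reduced vector.
text <- drop_removed(drop_removed(responses, step1_removed), step2_removed)
text
```

With the real objects, the same pattern would be drop_removed(drop_removed(gadarian$open.ended.response, processed$docs.removed), out$docs.removed).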

Pull requests, questions, suggestions, etc., are welcome!
