malcolmgreaves / auto-gfqg

License: Apache-2.0
Automatic Gap-Fill Question Generation


Projects that are alternatives of or similar to auto-gfqg

Opencog
A framework for integrated Artificial Intelligence & Artificial General Intelligence (AGI)
Stars: ✭ 2,132 (+12441.18%)
Mutual labels:  natural-language-understanding, unsupervised-machine-learning
NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (+129.41%)
Mutual labels:  topic-modeling, unsupervised-machine-learning
abae-pytorch
PyTorch implementation of 'An Unsupervised Neural Attention Model for Aspect Extraction' by He et al. ACL2017'
Stars: ✭ 52 (+205.88%)
Mutual labels:  topic-modeling, unsupervised-machine-learning
Learning Social Media Analytics With R
This repository contains code and bonus content which will be added from time to time for the book "Learning Social Media Analytics with R" by Packt
Stars: ✭ 102 (+500%)
Mutual labels:  topic-modeling
Lda2vec Pytorch
Topic modeling with word vectors
Stars: ✭ 108 (+535.29%)
Mutual labels:  topic-modeling
Familia
A Toolkit for Industrial Topic Modeling
Stars: ✭ 2,499 (+14600%)
Mutual labels:  topic-modeling
FUTURE
A private, free, open-source search engine built on a P2P network
Stars: ✭ 19 (+11.76%)
Mutual labels:  natural-language-understanding
Lda Topic Modeling
A PureScript, browser-based implementation of LDA topic modeling.
Stars: ✭ 91 (+435.29%)
Mutual labels:  topic-modeling
GLUE-bert4keras
GLUE benchmark code based on bert4keras
Stars: ✭ 59 (+247.06%)
Mutual labels:  natural-language-understanding
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+74976.47%)
Mutual labels:  topic-modeling
Palmetto
Palmetto is a quality measuring tool for topics
Stars: ✭ 144 (+747.06%)
Mutual labels:  topic-modeling
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+10029.41%)
Mutual labels:  topic-modeling
Tomotopy
Python package of Tomoto, the Topic Modeling Tool
Stars: ✭ 213 (+1152.94%)
Mutual labels:  topic-modeling
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+65194.12%)
Mutual labels:  topic-modeling
COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Stars: ✭ 109 (+541.18%)
Mutual labels:  natural-language-understanding
Sttm
Short Text Topic Modeling, JAVA
Stars: ✭ 100 (+488.24%)
Mutual labels:  topic-modeling
Chinese keyphrase extractor
An off-the-shelf tool for Chinese keyphrase extraction; fast, and uses only 35 MB of memory
Stars: ✭ 237 (+1294.12%)
Mutual labels:  topic-modeling
Kate
Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"
Stars: ✭ 135 (+694.12%)
Mutual labels:  topic-modeling
Tmtoolkit
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Stars: ✭ 135 (+694.12%)
Mutual labels:  topic-modeling
Lftm
Improving topic models LDA and DMM (one-topic-per-document model for short texts) with word embeddings (TACL 2015)
Stars: ✭ 168 (+888.24%)
Mutual labels:  topic-modeling

auto-gfqg

This Automatic Gap-Fill Question Generation system creates multiple-choice, fill-in-the-blank questions from text corpora. Textbooks, factoid archives, news articles, reports, lecture notes, legal proceedings -- the minimum viable input is a small-to-moderate-sized collection of coherent, well-formed English text.

This work is a proof-of-concept reimplementation of the ideas behind RevUp. The ideas implemented here are largely the same as those in the paper, with two notable differences. First, we use a biterm topic model instead of the deep autoencoder topic model. Second, we use topic-weighted word vectors to perform gap-phrase selection, whereas RevUp uses a supervised model trained on human judgments collected via Mechanical Turk.
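The biterm representation at the heart of BTM is easy to illustrate. The sketch below is a hypothetical illustration, not code from this repository: it extracts all unordered word pairs (the "biterms") from a tokenized sentence. BTM learns topics from the corpus-wide distribution of these pairs rather than from whole documents, which makes it well suited to short texts like single sentences.

```scala
// Sketch only: extracts the unordered word co-occurrence pairs ("biterms")
// that a biterm topic model is trained on. Hypothetical helper, not this
// repository's code.
object BitermSketch {

  // All unordered, distinct-position word pairs in a (short) token sequence.
  def biterms(tokens: Seq[String]): Seq[(String, String)] =
    for {
      i <- tokens.indices
      j <- (i + 1) until tokens.length
    } yield (tokens(i), tokens(j))

  def main(args: Array[String]): Unit = {
    val sentence = Seq("mitochondria", "produce", "atp")
    // A 3-token sentence yields 3 biterms.
    println(biterms(sentence))
  }
}
```

A sentence of n tokens yields n(n-1)/2 biterms, so every word pair in a sentence counts as evidence of topical co-occurrence.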

Setup

This project uses sbt for build management. If you're unfamiliar with sbt, see the last section for some pointers.

Build

To download all dependencies and compile code, run sbt compile.

Test

To run all tests, execute sbt test.

Command Line Applications

To produce bash scripts that will execute each individual command-line application within this codebase, execute sbt pack. The output bash scripts will be located under target/pack/bin/: their names correspond to filenames for executable Scala programs within the project.

How to use sbt

When using sbt, it is best to start it in the "interactive shell mode". To do this, simply execute from the command line:

$ sbt

After starting up (give it a few seconds), you can execute the following commands:

compile  // compiles code
pack     // creates executable scripts
test     // runs tests
coverage // initializes the code-coverage system; use right before 'test'
reload   // re-loads the sbt build definition, including plugin definitions
update   // grabs all dependencies

There are many more sbt commands, as well as a large ecosystem of community plugins that extend sbt's functionality.

Final results

The conclusions, results, and future work file summarizes the findings of this proof-of-concept (PoC). Importantly, if you are interested in viewing the generated gap-fill questions and distractors, read this page.

Overview of Information Flow

This gap-fill question generation system consists of a series of different programs and data resources. It is hacked-together research code that, in its current form, is unsuitable for production work. It does, however, demonstrate a question generation system from end to end.

Before attempting to run any programs here, please read through the documentation and ensure that your machine has the necessary prerequisites.

The following numbered list roughly describes the system's sequential operation:

  1. Use NLP tools to pre-process text: sentence splitting, tokenization, and word stemming over all corpus text. See NLP process with CoreNLP for more.

  2. Use word2vec to create word vectors over a larger, different corpus of text. See create word vectors for more.

  3. Use biterm topic modelling (BTM) to discover latent topics that are expressed on a per-sentence basis within the corpus. See train BTM for more.

  4. Use the learned BTM word-topic conditional probabilities and intuitive heuristics to score all sentences from the corpus. Then, threshold and eliminate low-scoring sentences, creating gap-fill question candidates. See score and generate gap fill question candidates for more.

  5. For each candidate sentence, choose a gap word. Removing the gap word from the sentence creates the fill-in-the-blank question (i.e. the gap word is the correct answer). Additionally, discover appropriate distractors for the chosen gap word. Distractors are semantically related to, but ultimately different from, the gap word (i.e. these are the incorrect answers). See finding gap words and distractors for more.
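The selection logic in steps 4 and 5 can be sketched as follows. All names below are hypothetical illustrations, not this project's actual API: a sentence is scored by the topic-conditional probabilities of its words, the most topical word becomes the gap, and distractors are chosen by cosine similarity of word vectors -- close to the gap word's vector, but never the gap word itself. The real programs in this repository implement more careful heuristics.

```scala
// Sketch of the score -> gap -> distractor pipeline. Hypothetical names and
// toy probabilities; not this repository's code.
object GapFillSketch {

  // P(word | topic), as a BTM might learn it.
  type TopicProb = Map[String, Double]

  // Step 4: score a sentence by the mean topic probability of its words;
  // unknown words contribute 0, so incoherent sentences score low.
  def scoreSentence(tokens: Seq[String], p: TopicProb): Double =
    if (tokens.isEmpty) 0.0
    else tokens.map(t => p.getOrElse(t, 0.0)).sum / tokens.length

  // Step 5a: the gap word is the most topical word in the sentence.
  def chooseGap(tokens: Seq[String], p: TopicProb): String =
    tokens.maxBy(t => p.getOrElse(t, 0.0))

  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val na  = math.sqrt(a.map(x => x * x).sum)
    val nb  = math.sqrt(b.map(x => x * x).sum)
    if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
  }

  // Step 5b: distractors are the k vocabulary words whose vectors are most
  // similar to the gap word's vector, excluding the gap word itself.
  def distractors(gap: String, vecs: Map[String, Array[Double]], k: Int): Seq[String] =
    vecs.keys.toSeq
      .filter(_ != gap)
      .sortBy(w => -cosine(vecs(gap), vecs(w)))
      .take(k)
}
```

Thresholding in step 4 then amounts to keeping only sentences whose score exceeds some cutoff before gap selection runs.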

All of the Scala programs have built-in help support. Invoke them with "-h" or "--help" to see information about how to use each program.
