malcolmgreaves / auto-gfqg

License: Apache-2.0
Automatic Gap-Fill Question Generation


Projects that are alternatives of or similar to auto-gfqg

Opencog
A framework for integrated Artificial Intelligence & Artificial General Intelligence (AGI)
Stars: ✭ 2,132 (+12441.18%)
Mutual labels:  natural-language-understanding, unsupervised-machine-learning
NMFADMM
A sparsity aware implementation of "Alternating Direction Method of Multipliers for Non-Negative Matrix Factorization with the Beta-Divergence" (ICASSP 2014).
Stars: ✭ 39 (+129.41%)
Mutual labels:  topic-modeling, unsupervised-machine-learning
abae-pytorch
PyTorch implementation of 'An Unsupervised Neural Attention Model for Aspect Extraction' by He et al. ACL2017'
Stars: ✭ 52 (+205.88%)
Mutual labels:  topic-modeling, unsupervised-machine-learning
Learning Social Media Analytics With R
This repository contains code and bonus content which will be added from time to time for the book "Learning Social Media Analytics with R" by Packt
Stars: ✭ 102 (+500%)
Mutual labels:  topic-modeling
Lda2vec Pytorch
Topic modeling with word vectors
Stars: ✭ 108 (+535.29%)
Mutual labels:  topic-modeling
Familia
A Toolkit for Industrial Topic Modeling
Stars: ✭ 2,499 (+14600%)
Mutual labels:  topic-modeling
FUTURE
A private, free, open-source search engine built on a P2P network
Stars: ✭ 19 (+11.76%)
Mutual labels:  natural-language-understanding
Lda Topic Modeling
A PureScript, browser-based implementation of LDA topic modeling.
Stars: ✭ 91 (+435.29%)
Mutual labels:  topic-modeling
GLUE-bert4keras
GLUE benchmark code based on bert4keras
Stars: ✭ 59 (+247.06%)
Mutual labels:  natural-language-understanding
Gensim
Topic Modelling for Humans
Stars: ✭ 12,763 (+74976.47%)
Mutual labels:  topic-modeling
Palmetto
Palmetto is a quality measuring tool for topics
Stars: ✭ 144 (+747.06%)
Mutual labels:  topic-modeling
Scattertext
Beautiful visualizations of how language differs among document types.
Stars: ✭ 1,722 (+10029.41%)
Mutual labels:  topic-modeling
Tomotopy
Python package of Tomoto, the Topic Modeling Tool
Stars: ✭ 213 (+1152.94%)
Mutual labels:  topic-modeling
Numpy Ml
Machine learning, in numpy
Stars: ✭ 11,100 (+65194.12%)
Mutual labels:  topic-modeling
COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
Stars: ✭ 109 (+541.18%)
Mutual labels:  natural-language-understanding
Sttm
Short Text Topic Modeling, JAVA
Stars: ✭ 100 (+488.24%)
Mutual labels:  topic-modeling
Chinese keyphrase extractor
An off-the-shelf tool for Chinese keyphrase extraction; fast, and uses only 35 MB of memory
Stars: ✭ 237 (+1294.12%)
Mutual labels:  topic-modeling
Kate
Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"
Stars: ✭ 135 (+694.12%)
Mutual labels:  topic-modeling
Tmtoolkit
Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Stars: ✭ 135 (+694.12%)
Mutual labels:  topic-modeling
Lftm
Improving topic models LDA and DMM (one-topic-per-document model for short texts) with word embeddings (TACL 2015)
Stars: ✭ 168 (+888.24%)
Mutual labels:  topic-modeling

auto-gfqg

This Automatic Gap-Fill Question Generation system creates multiple-choice, fill-in-the-blank questions from text corpora. Textbooks, factoid archives, news articles, reports, lecture notes, legal proceedings -- the minimum viable input is a small-to-moderate-sized collection of coherent, well-formed English text.

This work is a proof-of-concept reimplementation of the ideas behind RevUp. The ideas implemented here are largely the same as those in the paper, with two notable differences. First, we use a biterm topic model instead of the deep autoencoder topic model. Second, we use topic-weighted word vectors to perform gap-phrase selection, whereas RevUp uses a supervised model trained on human judgments collected via Mechanical Turk.
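The biterm representation at the heart of BTM is easy to illustrate. The sketch below is a hypothetical illustration, not code from this repository: it extracts all unordered word pairs (the "biterms") from a tokenized sentence. BTM learns topics from the corpus-wide distribution of these pairs rather than from whole documents, which makes it well suited to short texts like single sentences.

```scala
// Sketch only: extracts the unordered word co-occurrence pairs ("biterms")
// that a biterm topic model is trained on. Hypothetical helper, not this
// repository's code.
object BitermSketch {

  // All unordered, distinct-position word pairs in a (short) token sequence.
  def biterms(tokens: Seq[String]): Seq[(String, String)] =
    for {
      i <- tokens.indices
      j <- (i + 1) until tokens.length
    } yield (tokens(i), tokens(j))

  def main(args: Array[String]): Unit = {
    val sentence = Seq("mitochondria", "produce", "atp")
    // A 3-token sentence yields 3 biterms.
    println(biterms(sentence))
  }
}
```

A sentence of n tokens yields n(n-1)/2 biterms, so every word pair in a sentence counts as evidence of topical co-occurrence.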

Setup

This project uses sbt for build management. If you're unfamiliar with sbt, see the last section for some pointers.

Build

To download all dependencies and compile code, run sbt compile.

Test

To run all tests, execute sbt test.

Command Line Applications

To produce bash scripts that will execute each individual command-line application within this codebase, execute sbt pack. The output bash scripts will be located under target/pack/bin/: their names correspond to filenames for executable Scala programs within the project.

How to use sbt

When using sbt, it is best to start it in the "interactive shell mode". To do this, simply execute from the command line:

$ sbt

After starting up (give it a few seconds), you can execute the following commands:

compile  // compiles code
pack     // creates executable scripts
test     // runs tests
coverage // initializes the code-coverage system; use right before 'test'
reload   // re-loads the sbt build definition, including plugin definitions
update   // grabs all dependencies

There are many more sbt commands, as well as a large ecosystem of community plugins that extend sbt's functionality.

Final results

The conclusions, results, and future work file summarizes the findings of this proof-of-concept (PoC). Importantly, if you are interested in viewing the generated gap-fill questions and distractors, read this page.

Overview of Information Flow

This gap-fill question generation system consists of a series of different programs and data resources. It is hacked-together research code that, in its current form, is unsuitable for production work. It does, however, demonstrate a question generation system from end to end.

Before attempting to run any programs here, please read through the documentation and ensure that your machine has the necessary prerequisites.

The following numbered list roughly describes the system's sequential operation:

  1. Use NLP tools to pre-process text: sentence splitting, tokenization, and word stemming over all corpus text. See NLP process with CoreNLP for more.

  2. Use word2vec to create word vectors over a larger, different corpus of text. See create word vectors for more.

  3. Use biterm topic modelling (BTM) to discover latent topics that are expressed on a per-sentence basis within the corpus. See train BTM for more.

  4. Use the learned BTM word-topic conditional probabilities and intuitive heuristics to score all sentences from the corpus. Then, threshold and eliminate low-scoring sentences, creating gap-fill question candidates. See score and generate gap fill question candidates for more.

  5. For each candidate sentence, choose a gap word. Removing the gap word from the sentence creates the fill-in-the-blank question (i.e. the gap word is the correct answer). Additionally, discover appropriate distractors for the chosen gap word. Distractors are semantically related to, but ultimately different from, the gap word (i.e. these are the incorrect answers). See finding gap words and distractors for more.
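The selection logic in steps 4 and 5 can be sketched as follows. All names below are hypothetical illustrations, not this project's actual API: a sentence is scored by the topic-conditional probabilities of its words, the most topical word becomes the gap, and distractors are chosen by cosine similarity of word vectors -- close to the gap word's vector, but never the gap word itself. The real programs in this repository implement more careful heuristics.

```scala
// Sketch of the score -> gap -> distractor pipeline. Hypothetical names and
// toy probabilities; not this repository's code.
object GapFillSketch {

  // P(word | topic), as a BTM might learn it.
  type TopicProb = Map[String, Double]

  // Step 4: score a sentence by the mean topic probability of its words;
  // unknown words contribute 0, so incoherent sentences score low.
  def scoreSentence(tokens: Seq[String], p: TopicProb): Double =
    if (tokens.isEmpty) 0.0
    else tokens.map(t => p.getOrElse(t, 0.0)).sum / tokens.length

  // Step 5a: the gap word is the most topical word in the sentence.
  def chooseGap(tokens: Seq[String], p: TopicProb): String =
    tokens.maxBy(t => p.getOrElse(t, 0.0))

  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val na  = math.sqrt(a.map(x => x * x).sum)
    val nb  = math.sqrt(b.map(x => x * x).sum)
    if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
  }

  // Step 5b: distractors are the k vocabulary words whose vectors are most
  // similar to the gap word's vector, excluding the gap word itself.
  def distractors(gap: String, vecs: Map[String, Array[Double]], k: Int): Seq[String] =
    vecs.keys.toSeq
      .filter(_ != gap)
      .sortBy(w => -cosine(vecs(gap), vecs(w)))
      .take(k)
}
```

Thresholding in step 4 then amounts to keeping only sentences whose score exceeds some cutoff before gap selection runs.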

All of the Scala programs have built-in help support. Invoke them with "-h" or "--help" to see information about how to use each program.
