
GT-SALT / MixText

License: MIT
MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification

Projects that are alternatives to or similar to MixText

Deep Nlp Seminars
Materials for deep NLP course
Stars: ✭ 113 (-28.93%)
Mutual labels:  jupyter-notebook, natural-language-processing
Aws Machine Learning University Accelerated Nlp
Machine Learning University: Accelerated Natural Language Processing Class
Stars: ✭ 1,695 (+966.04%)
Mutual labels:  jupyter-notebook, natural-language-processing
Tensorflow Nlp
NLP and Text Generation Experiments in TensorFlow 2.x / 1.x
Stars: ✭ 1,487 (+835.22%)
Mutual labels:  jupyter-notebook, natural-language-processing
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (+766.67%)
Mutual labels:  jupyter-notebook, natural-language-processing
Multihead Siamese Nets
Implementation of Siamese Neural Networks built upon multihead attention mechanism for text semantic similarity task.
Stars: ✭ 144 (-9.43%)
Mutual labels:  jupyter-notebook, natural-language-processing
Pytorchnlpbook
Code and data accompanying Natural Language Processing with PyTorch published by O'Reilly Media https://nlproc.info
Stars: ✭ 1,390 (+774.21%)
Mutual labels:  jupyter-notebook, natural-language-processing
Pytextrank
Python implementation of TextRank for phrase extraction and summarization of text documents
Stars: ✭ 1,675 (+953.46%)
Mutual labels:  jupyter-notebook, natural-language-processing
Nlp Tutorial
Natural Language Processing Tutorial for Deep Learning Researchers
Stars: ✭ 9,895 (+6123.27%)
Mutual labels:  jupyter-notebook, natural-language-processing
Nlpaug
Data augmentation for NLP
Stars: ✭ 2,761 (+1636.48%)
Mutual labels:  jupyter-notebook, natural-language-processing
Practical Machine Learning With Python
Master the essential skills needed to recognize and solve complex real-world problems with Machine Learning and Deep Learning by leveraging the highly popular Python Machine Learning Eco-system.
Stars: ✭ 1,868 (+1074.84%)
Mutual labels:  jupyter-notebook, natural-language-processing
Pytorch Pos Tagging
A tutorial on how to implement models for part-of-speech tagging using PyTorch and TorchText.
Stars: ✭ 96 (-39.62%)
Mutual labels:  jupyter-notebook, natural-language-processing
Pytorch Question Answering
Important paper implementations for Question Answering using PyTorch
Stars: ✭ 154 (-3.14%)
Mutual labels:  jupyter-notebook, natural-language-processing
Spark Nlp Models
Models and Pipelines for the Spark NLP library
Stars: ✭ 88 (-44.65%)
Mutual labels:  jupyter-notebook, natural-language-processing
Awesome Embedding Models
A curated list of awesome embedding models tutorials, projects and communities.
Stars: ✭ 1,486 (+834.59%)
Mutual labels:  jupyter-notebook, natural-language-processing
Turkish Bert Nlp Pipeline
Bert-base NLP pipeline for Turkish, Ner, Sentiment Analysis, Question Answering etc.
Stars: ✭ 85 (-46.54%)
Mutual labels:  jupyter-notebook, natural-language-processing
Dat8
General Assembly's 2015 Data Science course in Washington, DC
Stars: ✭ 1,516 (+853.46%)
Mutual labels:  jupyter-notebook, natural-language-processing
Nlp Tutorial
A list of NLP(Natural Language Processing) tutorials
Stars: ✭ 1,188 (+647.17%)
Mutual labels:  jupyter-notebook, natural-language-processing
Course Computational Literary Analysis
Course materials for Introduction to Computational Literary Analysis, taught at UC Berkeley in Summer 2018, 2019, and 2020, and at Columbia University in Fall 2020.
Stars: ✭ 74 (-53.46%)
Mutual labels:  jupyter-notebook, natural-language-processing
100 Days Of Nlp
Stars: ✭ 125 (-21.38%)
Mutual labels:  jupyter-notebook, natural-language-processing
Natural Language Processing Specialization
This repo contains my coursework, assignments, and Slides for Natural Language Processing Specialization by deeplearning.ai on Coursera
Stars: ✭ 151 (-5.03%)
Mutual labels:  jupyter-notebook, natural-language-processing

MixText

This repo contains the code for the following paper:

Jiaao Chen, Zichao Yang, Diyi Yang: MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020)

If you use this code or refer to this work, please cite the paper above.
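
For convenience, a BibTeX entry along the following lines should work (the citation key and field formatting here are our own; check the ACL Anthology for the official entry):

@inproceedings{chen2020mixtext,
  title     = {MixText: Linguistically-Informed Interpolation of Hidden Space for Semi-Supervised Text Classification},
  author    = {Chen, Jiaao and Yang, Zichao and Yang, Diyi},
  booktitle = {Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics},
  year      = {2020}
}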

Getting Started

These instructions will get you up and running with the MixText code.

Requirements

  • Python 3.6 or higher
  • PyTorch >= 1.3.0
  • pytorch_transformers (also known as transformers)
  • pandas, NumPy, pickle
  • Fairseq
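
One possible way to install the dependencies with pip (the version pin mirrors the requirement above; exact pins are not specified by this repo, and pickle ships with the Python standard library):

pip install "torch>=1.3.0" pytorch_transformers pandas numpy fairseq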

Code Structure

|__ data/
        |__ yahoo_answers_csv/ --> Dataset for Yahoo Answers
            |__ back_translate.ipynb --> Jupyter notebook for back translating the dataset
            |__ classes.txt --> Classes for the Yahoo Answers dataset
            |__ train.csv --> Original training dataset
            |__ test.csv --> Original test dataset
            |__ de_1.pkl --> Back-translated training dataset with German as the intermediate language
            |__ ru_1.pkl --> Back-translated training dataset with Russian as the intermediate language

|__ code/
        |__ transformers/ --> Code copied from huggingface/transformers
        |__ read_data.py --> Code for reading the dataset; forming the labeled training set, unlabeled training set, development set, and test set; and building dataloaders
        |__ normal_bert.py --> Code for the BERT baseline model
        |__ normal_train.py --> Code for training the BERT baseline model
        |__ mixtext.py --> Code for our proposed TMix/MixText model
        |__ train.py --> Code for training/testing TMix/MixText

Downloading the data

Please download the datasets and put them in the data folder. You can find Yahoo Answers, AG News, and DBpedia here, and IMDB here.

Pre-processing the data

For Yahoo Answers, we concatenate the question title, question content, and best answer to form the text to be classified. The pre-processed Yahoo Answers dataset can be downloaded here.
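
As a minimal sketch of this concatenation (assuming the standard headerless Yahoo Answers CSV layout of class index, question title, question content, and best answer; the actual loading logic lives in ./code/read_data.py):

import pandas as pd

# Assumed headerless column layout of the original Yahoo Answers CSV.
df = pd.read_csv('./data/yahoo_answers_csv/train.csv', header=None,
                 names=['label', 'title', 'content', 'answer'])

# Join the three text fields into the single string to be classified.
df['text'] = (df['title'].fillna('') + ' ' +
              df['content'].fillna('') + ' ' +
              df['answer'].fillna(''))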

Note that for AG News and DBpedia, we only use the content (without titles) for classification, and for IMDB we do not perform any pre-processing.

We utilize Fairseq to perform back translation on the training dataset. Please refer to ./data/yahoo_answers_csv/back_translate.ipynb for details.

Here, we have also put two examples of back-translated data, de_1.pkl and ru_1.pkl, in ./data/yahoo_answers_csv/. You can use them directly for Yahoo Answers or generate your own back-translated data following ./data/yahoo_answers_csv/back_translate.ipynb.
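
For reference, here is a hedged sketch of English-German back translation using Fairseq's public pretrained WMT'19 models via torch.hub; the notebook above is the authoritative pipeline, and the model names and sampling settings below are illustrative rather than taken from it:

import torch

# Public Fairseq torch.hub entries for the WMT'19 single models.
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de.single_model',
                       tokenizer='moses', bpe='fastbpe')
de2en = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.de-en.single_model',
                       tokenizer='moses', bpe='fastbpe')

def back_translate(text, topk=10):
    # Translate to German and back; sampling diversifies the paraphrase.
    german = en2de.translate(text, sampling=True, sampling_topk=topk)
    return de2en.translate(german, sampling=True, sampling_topk=topk)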

Training models

This section contains instructions for training models on Yahoo Answers using 10 labeled examples per class.

Training BERT baseline model

Please run ./code/normal_train.py to train the BERT baseline model (uses only the labeled training data):

python ./code/normal_train.py --gpu 0,1 --n-labeled 10 --data-path ./data/yahoo_answers_csv/ \
--batch-size 8 --epochs 20 

Training TMix model

Please run ./code/train.py to train the TMix model (uses only the labeled training data):

python ./code/train.py --gpu 0,1 --n-labeled 10 --data-path ./data/yahoo_answers_csv/ \
--batch-size 8 --batch-size-u 1 --epochs 50 --val-iteration 20 \
--lambda-u 0 --T 0.5 --alpha 16 --mix-layers-set 7 9 12 --separate-mix True 
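
Conceptually, TMix interpolates the hidden states of two training examples at one of the chosen encoder layers (here sampled from layers 7, 9, and 12, per --mix-layers-set) and mixes their labels with the same weight. A simplified sketch of the idea, not the actual ./code/mixtext.py implementation:

import torch

def tmix(hidden_a, hidden_b, y_a, y_b, alpha=16.0):
    # Sample the interpolation weight from Beta(alpha, alpha); alpha=16
    # corresponds to the --alpha flag above.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    lam = max(lam, 1.0 - lam)  # keep the mix closer to the first example
    mixed_hidden = lam * hidden_a + (1.0 - lam) * hidden_b
    mixed_label = lam * y_a + (1.0 - lam) * y_b  # soft label for the mix
    return mixed_hidden, mixed_label

The mixed hidden state is then passed through the remaining encoder layers and the classifier.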

Training MixText model

Please run ./code/train.py to train the MixText model (uses both labeled and unlabeled training data):

python ./code/train.py --gpu 0,1,2,3 --n-labeled 10 \
--data-path ./data/yahoo_answers_csv/ --batch-size 4 --batch-size-u 8 --epochs 20 --val-iteration 1000 \
--lambda-u 1 --T 0.5 --alpha 16 --mix-layers-set 7 9 12 \
--lrmain 0.000005 --lrlast 0.0005
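
The --T flag controls temperature sharpening of the model's predictions on unlabeled (back-translated) data before they are used as soft labels, in the MixMatch style. A minimal sketch:

import torch

def sharpen(p, T=0.5):
    # p: predicted class probabilities, shape (batch, num_classes).
    p_t = p.pow(1.0 / T)
    return p_t / p_t.sum(dim=1, keepdim=True)

With T < 1 the sharpened distribution becomes more peaked, pushing the guessed labels toward one-hot.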