
makcedward / Nlpaug

Licence: MIT
Data augmentation for NLP

Programming Languages: Jupyter Notebook, Python, Shell

Projects that are alternatives of or similar to Nlpaug

Imodels
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
Stars: ✭ 194 (-92.97%)
Mutual labels:  artificial-intelligence, ai, jupyter-notebook, data-science, ml
Awesome Ai Ml Dl
Awesome Artificial Intelligence, Machine Learning and Deep Learning as we learn it. Study notes and a curated list of awesome resources of such topics.
Stars: ✭ 831 (-69.9%)
Mutual labels:  artificial-intelligence, ai, jupyter-notebook, natural-language-processing, ml
Fixy
Our goal is to build an open-source spelling assistant/checker that tackles many different problems in the Turkish NLP literature together, proposes unique approaches, and fills the gaps in existing work. It corrects spelling mistakes in users' texts with a deep learning approach while also performing semantic analysis of the texts, so that errors arising in that context can be detected and corrected as well.
Stars: ✭ 165 (-94.02%)
Mutual labels:  artificial-intelligence, ai, jupyter-notebook, data-science, natural-language-processing
Ml
A high-level machine learning and deep learning library for the PHP language.
Stars: ✭ 1,270 (-54%)
Mutual labels:  artificial-intelligence, ai, data-science, natural-language-processing
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (-61.03%)
Mutual labels:  artificial-intelligence, ai, data-science, ml
Atlas
An Open Source, Self-Hosted Platform For Applied Deep Learning Development
Stars: ✭ 259 (-90.62%)
Mutual labels:  artificial-intelligence, ai, data-science, ml
Data Science
Collection of useful data science topics along with code and articles
Stars: ✭ 315 (-88.59%)
Mutual labels:  artificial-intelligence, jupyter-notebook, data-science, natural-language-processing
Polyaxon
Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (+7.42%)
Mutual labels:  artificial-intelligence, ai, data-science, ml
Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+696.02%)
Mutual labels:  artificial-intelligence, ai, data-science, natural-language-processing
Hyperparameter hunter
Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
Stars: ✭ 648 (-76.53%)
Mutual labels:  artificial-intelligence, ai, data-science, ml
Pba
Efficient Learning of Augmentation Policy Schedules
Stars: ✭ 461 (-83.3%)
Mutual labels:  artificial-intelligence, jupyter-notebook, data-science, augmentation
Codesearchnet
Datasets, tools, and benchmarks for representation learning of code.
Stars: ✭ 1,378 (-50.09%)
Mutual labels:  jupyter-notebook, data-science, natural-language-processing, ml
Datasciencevm
Tools and Docs on the Azure Data Science Virtual Machine (http://aka.ms/dsvm)
Stars: ✭ 153 (-94.46%)
Mutual labels:  ai, jupyter-notebook, data-science, ml
Csinva.github.io
Slides, paper notes, class notes, blog posts, and research on ML 📉, statistics 📊, and AI 🤖.
Stars: ✭ 342 (-87.61%)
Mutual labels:  artificial-intelligence, ai, data-science, ml
Caer
High-performance Vision library in Python. Scale your research, not boilerplate.
Stars: ✭ 452 (-83.63%)
Mutual labels:  artificial-intelligence, ai, data-science, augmentation
Image classifier
CNN image classifier implemented in Keras Notebook 🖼️.
Stars: ✭ 139 (-94.97%)
Mutual labels:  artificial-intelligence, ai, jupyter-notebook, ml
Modelchimp
Experiment tracking for machine and deep learning projects
Stars: ✭ 121 (-95.62%)
Mutual labels:  artificial-intelligence, ai, data-science, ml
Mit Deep Learning
Tutorials, assignments, and competitions for MIT Deep Learning related courses.
Stars: ✭ 8,912 (+222.78%)
Mutual labels:  artificial-intelligence, jupyter-notebook, data-science
Caffe2
Caffe2 is a lightweight, modular, and scalable deep learning framework.
Stars: ✭ 8,409 (+204.56%)
Mutual labels:  artificial-intelligence, ai, ml
Computervision Recipes
Best Practices, code samples, and documentation for Computer Vision.
Stars: ✭ 8,214 (+197.5%)
Mutual labels:  artificial-intelligence, jupyter-notebook, data-science




nlpaug

This Python library helps you augment NLP data for your machine learning projects. Visit this introduction to learn about data augmentation in NLP. An Augmenter is the basic element of augmentation, while a Flow is a pipeline that orchestrates multiple augmenters together.

Features

  • Generate synthetic data for improving model performance without manual effort
  • Simple, easy-to-use and lightweight library. Augment data in 3 lines of code
  • Plug and play with any machine learning/neural network framework (e.g. scikit-learn, PyTorch, TensorFlow)
  • Supports textual and audio input
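
As a rough illustration of the "3 lines of code" claim, the sketch below applies KeyboardAug (listed in the Augmenter section) to one sentence. It assumes nlpaug is installed; the import guard and the sample sentence are additions for illustration only. The output is random, and depending on the installed version `augment` may return a string or a list of strings.

```python
import importlib.util

text = "The quick brown fox jumps over the lazy dog"
augmented = None  # stays None when nlpaug is not available

if importlib.util.find_spec("nlpaug") is not None:
    import nlpaug.augmenter.char as nac

    aug = nac.KeyboardAug()        # simulates keyboard-distance typos
    augmented = aug.augment(text)  # random output; str or list of str
    print(augmented)
else:
    print("nlpaug is not installed; see the Installation section")
```

Without the guard, the core of the example is the import, the constructor call and the `augment` call.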

Textual Data Augmentation Example


Acoustic Data Augmentation Example


Section | Description
Quick Demo | How to use this library
Augmenter | Introduces all available augmentation methods
Installation | How to install this library
Recent Changes | Latest enhancements
Extension Reading | More real-life examples and research
Reference | References to external resources such as data or models

Quick Demo

Augmenter

Type | Target | Augmenter | Action | Description
Textual | Character | KeyboardAug | substitute | Simulate keyboard distance error
Textual | Character | OcrAug | substitute | Simulate OCR engine error
Textual | Character | RandomAug | insert, substitute, swap, delete | Apply augmentation randomly
Textual | Word | AntonymAug | substitute | Substitute a word with its opposite according to WordNet antonyms
Textual | Word | ContextualWordEmbsAug | insert, substitute | Feed surrounding words to a BERT, DistilBERT, RoBERTa or XLNet language model to find the most suitable word for augmentation
Textual | Word | RandomWordAug | swap, crop, delete | Apply augmentation randomly
Textual | Word | SpellingAug | substitute | Substitute a word according to a spelling mistake dictionary
Textual | Word | SplitAug | split | Split one word into two words randomly
Textual | Word | SynonymAug | substitute | Substitute a similar word according to WordNet/PPDB synonyms
Textual | Word | TfIdfAug | insert, substitute | Use TF-IDF to decide how a word should be augmented
Textual | Word | WordEmbsAug | insert, substitute | Leverage word2vec, GloVe or fasttext embeddings to apply augmentation
Textual | Word | BackTranslationAug | substitute | Leverage two translation models for augmentation
Textual | Word | ReservedAug | substitute | Replace reserved words
Textual | Sentence | ContextualWordEmbsForSentenceAug | insert | Insert a sentence according to XLNet, GPT2 or DistilGPT2 prediction
Textual | Sentence | AbstSummAug | substitute | Summarize an article with an abstractive summarization method
Textual | Sentence | LambadaAug | substitute | Use a language model to generate text, then a classification model to retain high-quality results
Signal | Audio | CropAug | delete | Delete a segment of the audio
Signal | Audio | LoudnessAug | substitute | Adjust the audio's volume
Signal | Audio | MaskAug | substitute | Mask a segment of the audio
Signal | Audio | NoiseAug | substitute | Inject noise
Signal | Audio | PitchAug | substitute | Adjust the audio's pitch
Signal | Audio | ShiftAug | substitute | Shift the time dimension forward/backward
Signal | Audio | SpeedAug | substitute | Adjust the audio's speed
Signal | Audio | VtlpAug | substitute | Change vocal tract
Signal | Audio | NormalizeAug | substitute | Normalize audio
Signal | Audio | PolarityInverseAug | substitute | Swap positive and negative values of the audio
Signal | Spectrogram | FrequencyMaskingAug | substitute | Set a block of values to zero along the frequency dimension
Signal | Spectrogram | TimeMaskingAug | substitute | Set a block of values to zero along the time dimension
Signal | Spectrogram | LoudnessAug | substitute | Adjust volume
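
To make the idea behind FrequencyMaskingAug and TimeMaskingAug concrete, here is a minimal pure-Python sketch of "set a block of values to zero" along one dimension. It is not nlpaug's implementation: the function name `mask_block`, its parameters, and the nested-list spectrogram (rows are frequency bins, columns are time steps) are all assumptions made for illustration.

```python
import random

def mask_block(spec, axis, width, rng=random):
    """Return a copy of spec with one contiguous block zeroed.

    axis="frequency" zeroes `width` whole rows,
    axis="time" zeroes `width` whole columns.
    """
    if axis not in ("frequency", "time"):
        raise ValueError("axis must be 'frequency' or 'time'")
    n_freq, n_time = len(spec), len(spec[0])
    size = n_freq if axis == "frequency" else n_time
    width = min(width, size)
    start = rng.randint(0, size - width)   # random position of the block
    masked = [list(row) for row in spec]   # leave the input untouched
    for f in range(n_freq):
        for t in range(n_time):
            idx = f if axis == "frequency" else t
            if start <= idx < start + width:
                masked[f][t] = 0.0
    return masked

rng = random.Random(0)
spec = [[1.0] * 6 for _ in range(4)]       # toy 4 x 6 spectrogram of ones
freq_masked = mask_block(spec, "frequency", width=2, rng=rng)
time_masked = mask_block(spec, "time", width=3, rng=rng)
```

Here `freq_masked` has two all-zero rows and `time_masked` has three zeroed columns, while `spec` itself is unchanged.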

Flow

Type | Augmenter | Description
Pipeline | Sequential | Apply list of augmentation functions sequentially
Pipeline | Sometimes | Apply some augmentation functions randomly
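
The difference between the two flows can be sketched in plain Python. This is a conceptual illustration only, not nlpaug's implementation: `sequential`, `sometimes`, the per-augmenter probability `p`, and the two toy augmenters are hypothetical names introduced here.

```python
import random

def sequential(augmenters):
    """Compose augmenters so each runs on the previous one's output."""
    def apply(text):
        for aug in augmenters:
            text = aug(text)
        return text
    return apply

def sometimes(augmenters, p=0.5, rng=random):
    """Compose augmenters so each one runs only with probability p."""
    def apply(text):
        for aug in augmenters:
            if rng.random() < p:
                text = aug(text)
        return text
    return apply

# Toy deterministic "augmenters" (hypothetical, for illustration only).
def drop_last_word(text):
    return " ".join(text.split()[:-1])

def upper_first_word(text):
    words = text.split()
    return " ".join([words[0].upper()] + words[1:]) if words else text

text = "the quick brown fox"
always = sequential([drop_last_word, upper_first_word])(text)
maybe = sometimes([drop_last_word, upper_first_word], p=0.5,
                  rng=random.Random(1))(text)
print(always)  # THE quick brown
print(maybe)   # one of four possible results, depending on the draws
```

A sequential flow always applies every step in order, whereas the random flow may skip any step, so repeated calls can yield different variants of the same input.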

Installation

The library supports Python 3.5+ on Linux and Windows.

To install the library:

pip install numpy requests nlpaug

Or install the latest version (including beta features) directly from GitHub:

pip install numpy git+https://github.com/makcedward/nlpaug.git

Or install with conda:

conda install -c makcedward nlpaug

If you use BackTranslationAug, ContextualWordEmbsAug, ContextualWordEmbsForSentenceAug or AbstSummAug, install the following dependencies as well (the version specifiers are quoted so the shell does not treat `>` as a redirect):

pip install "torch>=1.6.0" "transformers>=4.11.3" sentencepiece

If you use LambadaAug, install the following dependency as well:

pip install "simpletransformers>=0.61.10"

If you use AntonymAug or SynonymAug, install the following dependency as well:

pip install "nltk>=3.4.5"

If you use WordEmbsAug (word2vec, GloVe or fasttext), download a pre-trained model first and install the following dependency as well:

from nlpaug.util.file.download import DownloadUtil
DownloadUtil.download_word2vec(dest_dir='.') # Download word2vec model
DownloadUtil.download_glove(model_name='glove.6B', dest_dir='.') # Download GloVe model
DownloadUtil.download_fasttext(model_name='wiki-news-300d-1M', dest_dir='.') # Download fasttext model

pip install "gensim>=4.1.2"

If you use SynonymAug (PPDB), download the file from the following URI. You may not be able to run the augmenter if you get the PPDB file from another website.

http://paraphrase.org/#/download

If you use PitchAug, SpeedAug or VtlpAug, install the following dependencies as well:

pip install "librosa>=0.7.1" matplotlib

Recent Changes

1.1.9 Dec 1, 2021

See changelog for more details.

Extension Reading

Reference

This library uses data (e.g. captured from the internet), research (e.g. the ideas behind the augmenters) and models (e.g. pre-trained models). See data source for more details.

Citation

@misc{ma2019nlpaug,
  title={NLP Augmentation},
  author={Edward Ma},
  howpublished={https://github.com/makcedward/nlpaug},
  year={2019}
}

This package is cited by many books, workshops and academic research papers (70+). Here are some examples; you may visit here to get the full list.

Workshops citing nlpaug

Books citing nlpaug

Research papers citing nlpaug

Contributions


sakares saengkaew


Binoy Dalal
