makcedward / Nlp
This repository records my NLP journey.
Stars: 820
Programming Languages
python
139335 projects - #7 most used programming language
Projects that are alternatives of or similar to Nlp
Tensorwatch
Debugging, monitoring and visualization for Python Machine Learning and Data Science
Stars: 3,191 (+289.15%)
Mutual labels: ai, data-science
Metaflow
Build and manage real-life data science projects with ease!
Stars: 5,108 (+522.93%)
Mutual labels: ai, data-science
Artificio
Deep Learning Computer Vision Algorithms for Real-World Use
Stars: 326 (-60.24%)
Mutual labels: ai, data-science
Hub
Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: 4,003 (+388.17%)
Mutual labels: ai, data-science
Awesome Mlops
A curated list of references for MLOps
Stars: 7,119 (+768.17%)
Mutual labels: ai, data-science
Awesome Mlops
A curated list of awesome MLOps tools
Stars: 258 (-68.54%)
Mutual labels: ai, data-science
Awesome Feature Engineering
A curated list of resources dedicated to Feature Engineering Techniques for Machine Learning
Stars: 433 (-47.2%)
Mutual labels: ai, data-science
Compose
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.
Stars: 203 (-75.24%)
Mutual labels: ai, data-science
Caer
High-performance Vision library in Python. Scale your research, not boilerplate.
Stars: 452 (-44.88%)
Mutual labels: ai, data-science
Tensor House
A collection of reference machine learning and optimization models for enterprise operations: marketing, pricing, supply chain
Stars: 449 (-45.24%)
Mutual labels: ai, data-science
Polyaxon
Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: 2,966 (+261.71%)
Mutual labels: ai, data-science
Snorkel
A system for quickly generating training data with weak supervision
Stars: 4,953 (+504.02%)
Mutual labels: ai, data-science
Atlas
An Open Source, Self-Hosted Platform For Applied Deep Learning Development
Stars: 259 (-68.41%)
Mutual labels: ai, data-science
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: 283 (-65.49%)
Mutual labels: ai, data-science
Voice Gender
Gender recognition by voice and speech analysis
Stars: 248 (-69.76%)
Mutual labels: ai, data-science
Csinva.github.io
Slides, paper notes, class notes, blog posts, and research on ML, statistics, and AI.
Stars: 342 (-58.29%)
Mutual labels: ai, data-science
Imodels
Interpretable ML package for concise, transparent, and accurate predictive modeling (sklearn-compatible).
Stars: 194 (-76.34%)
Mutual labels: ai, data-science
Ml Auto Baseball Pitching Overlay
Automatic baseball pitching overlay in realtime
Stars: 200 (-75.61%)
Mutual labels: ai, data-science
Spacy
Industrial-strength Natural Language Processing (NLP) in Python
Stars: 21,978 (+2580.24%)
Mutual labels: ai, data-science
Cookiecutter Data Science
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Stars: 5,271 (+542.8%)
Mutual labels: ai, data-science
NLP - Tutorial
A repository showing how NLP can tackle real problems, including source code, datasets, and state-of-the-art NLP methods.
Data Augmentation
- Data Augmentation in NLP
- Data Augmentation library for Text
- Is your NLP model able to prevent adversarial attacks?
- How does Data Noising Help to Improve your NLP Model?
- Data Augmentation library for Speech Recognition
- Data Augmentation library for Audio
- Unsupervised Data Augmentation
- Adversarial Attacks in Textual Deep Neural Networks
- Back Translation in Text Augmentation by nlpaug
General
Text Preprocessing
| Section | Sub-Section | Description | Story |
| --- | --- | --- | --- |
| Tokenization | Subword Tokenization | | Medium |
| Tokenization | Word Tokenization | | Medium Github |
| Tokenization | Sentence Tokenization | | Medium Github |
| Part of Speech | | | Medium Github |
| Lemmatization | | | Medium Github |
| Stemming | | | Medium Github |
| Stop Words | | | Medium Github |
| Phrase Word Recognition | | | |
| Spell Checking | Lexicon-based | Peter Norvig algorithm | Medium Github |
| | Lexicon-based | Symspell | Medium Github |
| Machine Translation | Statistical Machine Translation | | Medium |
| Machine Translation | Attention | | Medium |
| String Matching | Fuzzywuzzy | | Medium Github |
Text Representation
| Section | Sub-Section | Research Lab | Story | Source |
| --- | --- | --- | --- | --- |
| Traditional Method | Bag-of-words (BoW) | | Medium Github | |
| | Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) | | Medium Github | |
| Character Level | Character Embedding | NYU | Medium Github | Paper |
| Word Level | Negative Sampling and Hierarchical Softmax | | Medium | |
| | Word2Vec, GloVe, fastText | | Medium Github | |
| | Contextualized Word Vectors (CoVe) | Salesforce | Medium Github | Paper Code |
| | Misspelling Oblivious (word) Embeddings | | Medium | Paper |
| | Embeddings from Language Models (ELMo) | AI2 | Medium Github | Paper Code |
| | Contextual String Embeddings | Zalando Research | Medium | Paper Code |
| Sentence Level | Skip-thoughts | | Medium Github | Paper Code |
| | InferSent | | Medium Github | Paper Code |
| | Quick-Thoughts | | Medium | Paper Code |
| | General Purpose Sentence (GenSen) | | Medium | Paper Code |
| | Bidirectional Encoder Representations from Transformers (BERT) | | Medium | Paper(2019) Code |
| | Generative Pre-Training (GPT) | OpenAI | Medium | Paper(2019) Code |
| | Self-Governing Neural Networks (SGNN) | | Medium | Paper |
| | Multi-Task Deep Neural Networks (MT-DNN) | Microsoft | Medium | Paper(2019) |
| | Generative Pre-Training-2 (GPT-2) | OpenAI | Medium | Paper(2019) Code |
| | Universal Language Model Fine-tuning (ULMFiT) | fast.ai | Medium | Paper Code |
| | BERT in Science Domain | | Medium | Paper(2019) Paper(2019) |
| | BERT in Clinical Domain | NYU/PU | Medium | Paper(2019) Paper(2019) |
| | RoBERTa | UW/Facebook | Medium | Paper(2019) Paper |
| | Unified Language Model for NLP and NLU (UNILM) | Microsoft | Medium | Paper(2019) |
| | Cross-lingual Language Model (XLMs) | | Medium | Paper(2019) |
| | Transformer-XL | CMU/Google | Medium | Paper(2019) |
| | XLNet | CMU/Google | Medium | Paper(2019) |
| | CTRL | Salesforce | Medium | Paper(2019) |
| | ALBERT | Google/Toyota | Medium | Paper(2019) |
| | T5 | Google | Medium | Paper(2019) |
| | MultiFiT | | Medium | Paper(2019) |
| | XTREME | | Medium | Paper(2020) |
| | REALM | | Medium | Paper(2020) |
| Document Level | lda2vec | | Medium | Paper |
| | doc2vec | Google | Medium Github | Paper |
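
As a rough illustration of the bag-of-words (BoW) entry in the table above, the sketch below builds a count vector per document using only the standard library. It is not taken from the repository's notebooks; the helper names `build_vocabulary` and `bag_of_words` are hypothetical, and whitespace tokenization with lowercasing is an assumption made for brevity.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index (sorted order)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def bag_of_words(document, vocabulary):
    """Return a count vector for one document; unknown tokens are ignored."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical toy corpus, used only to show the shape of the output.
docs = ["the cat sat", "the cat saw the dog"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
```

Each vector has one column per vocabulary entry, so word order is discarded and only term counts remain, which is the defining property of the BoW representation.
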
NLP Problem
| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| Named Entity Recognition (NER) | Pattern-based Recognition | | | Medium | |
| | Lexicon-based Recognition | | | Medium | |
| | spaCy Pre-trained NER | | | Medium Github | |
| Optical Character Recognition (OCR) | Printed Text | Google Cloud Vision API | | Medium | Paper |
| | Handwriting | LSTM | | Medium | Paper |
| Text Summarization | Extractive Approach | | | Medium Github | |
| | Abstractive Approach | | | Medium | |
| Emotion Recognition | Audio, Text, Visual | 3 Multimodals for Emotion Recognition | | Medium | |
Acoustic Problem
| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| Feature Representation | Unsupervised Learning | Introduction to Audio Feature Learning | | Medium | Paper 1 Paper 2 Paper 3 |
| Feature Representation | Unsupervised Learning | Speech2Vec and Sentence Level Embeddings | | Medium | Paper 1 Paper 2 |
| Feature Representation | Unsupervised Learning | Wav2vec | | Medium | Paper |
| Speech-to-text | | Introduction to Speech-to-text | | Medium | |
Text Distance Measurement
| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| Euclidean Distance, Cosine Similarity and Jaccard Similarity | | | | Medium Github | |
| Edit Distance | Levenshtein Distance | | | Medium Github | |
| Word Mover's Distance (WMD) | | | | Medium Github | |
| Supervised Word Mover's Distance (S-WMD) | | | | Medium | |
| Manhattan LSTM | | | | Medium | Paper |
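
For orientation, three of the measures listed above can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the code from the linked stories: the function names are hypothetical, the Levenshtein variant assumes unit cost for every insert, delete, and substitute, and the cosine variant assumes raw term counts as vector components.

```python
import math
from collections import Counter


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete, substitute (two-row DP)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free match
            ))
        previous = current
    return previous[-1]


def jaccard_similarity(a_tokens, b_tokens):
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(a_tokens), set(b_tokens)
    if not set_a and not set_b:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a_tokens, b_tokens):
    """Cosine of the angle between two term-count vectors."""
    ca, cb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

For example, `levenshtein("kitten", "sitting")` needs two substitutions and one insertion, while `jaccard_similarity` and `cosine_similarity` take token lists such as `"the cat sat".split()`.
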
Model Interpretation
| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| ELI5, LIME and Skater | | | | Medium Github | |
| SHapley Additive exPlanations (SHAP) | | | | Medium Github | |
| Anchors | | | | Medium Github | |
Graph
| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| Embeddings | TransE, RESCAL, DistMult, ComplEx, PyTorch BigGraph | | | Medium | RESCAL(2011) TransE(2013) DistMult(2015) ComplEx(2016) PyTorch BigGraph(2019) |
| Embeddings | DeepWalk, node2vec, LINE, GraphSAGE | | | Medium | DeepWalk(2014) node2vec(2015) LINE(2015) GraphSAGE(2018) |
| Embeddings | WLG, GCN, GAT, GIN | | | Medium | WLG(2011) GCN(2017) GAT(2017) GraphSAGE(2018) |
| Embeddings | PinSAGE(2018) | | | Medium | |
| Embeddings | HolE(2015), SimplE(2018) | | | Medium | |
| Embeddings | ContE(2017), ETE(2017) | | | Medium | |
Meta-Learning
| Section | Sub-Section | Description | Story |
| --- | --- | --- | --- |
| Introduction | | Matching Nets(2016) MANN(2016) LSTM-based meta-learner(2017) Prototypical Networks(2017) ARC(2017) MAML(2017) MetaNet(2017) | Medium |
| NLP | Dialog Generation | DAML(2019), PAML(2019), NTMS(2019) | Medium |
| | Classification | Intent Embeddings(2016) LEOPARD(2019) | Medium |
| CV | Unsupervised Learning | CACTUs(2018) | Medium |
| General | | Siamese Network(1994), Triplet Network(2015) | Medium |
| | | MAML+(2018) | Medium |
Image
| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| Object Detection | R-CNN | | | Medium | Paper(2013) |
| Object Detection | Fast R-CNN | | | Medium | Paper(2015) |
| Object Detection | Faster R-CNN | | | Medium | Paper(2015) |
| Object Detection | VGGNet | | | Medium | Paper(2014) |
| Instance Segmentation | Mask R-CNN | | FAIR | Medium | Paper(2017) |
| Image Classification | ResNet(2015) | | Microsoft | Medium | |
| Image Classification | ResNeXt(2016) | | | Medium | |
Evaluation
| Section | Sub-Section | Description | Story |
| --- | --- | --- | --- |
| Introduction | | | Medium |
| Classification | | Confusion Matrix, ROC, AUC | Medium |
| Regression | | MAE, MSE, RMSE, MAPE, WMAPE | Medium |
| Textual | | Perplexity, BLEU, GER, WER, GLUE | Medium |
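
The regression metrics named in the table can be written out directly. The sketch below is a minimal standard-library version, not the repository's code; the function names are hypothetical, MAPE and WMAPE are returned as percentages, and MAPE assumes no actual value is zero.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes every actual value is nonzero."""
    return 100 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


def wmape(y_true, y_pred):
    """Weighted MAPE: total absolute error divided by total absolute actuals."""
    return 100 * sum(abs(t - p) for t, p in zip(y_true, y_pred)) / sum(abs(t) for t in y_true)


# Hypothetical values chosen only to make the arithmetic easy to follow.
actual = [100, 200, 400]
predicted = [110, 180, 400]
scores = {
    "MAE": mae(actual, predicted),
    "MSE": mse(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
    "WMAPE": wmape(actual, predicted),
}
```

WMAPE weights each error by the size of the actual value, so large targets dominate, whereas MAPE treats every row's percentage error equally.
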
Source Code
| Section | Sub-Section | Description | Link |
| --- | --- | --- | --- |
| Spellcheck | | | Github |
| InferSent | | | Github |