
makcedward / Nlp

📝 This repository records my NLP journey.

Programming Languages

python

Projects that are alternatives of or similar to Nlp

Tensorwatch
Debugging, monitoring and visualization for Python Machine Learning and Data Science
Stars: ✭ 3,191 (+289.15%)
Mutual labels: ai, data-science
Metaflow
🚀 Build and manage real-life data science projects with ease!
Stars: ✭ 5,108 (+522.93%)
Mutual labels: ai, data-science
Artificio
Deep Learning Computer Vision Algorithms for Real-World Use
Stars: ✭ 326 (-60.24%)
Mutual labels: ai, data-science
Hub
Dataset format for AI. Build, manage, & visualize datasets for deep learning. Stream data real-time to PyTorch/TensorFlow & version-control it. https://activeloop.ai
Stars: ✭ 4,003 (+388.17%)
Mutual labels: ai, data-science
Awesome Mlops
A curated list of references for MLOps
Stars: ✭ 7,119 (+768.17%)
Mutual labels: ai, data-science
Awesome Mlops
😎 A curated list of awesome MLOps tools
Stars: ✭ 258 (-68.54%)
Mutual labels: ai, data-science
Awesome Feature Engineering
A curated list of resources dedicated to Feature Engineering Techniques for Machine Learning
Stars: ✭ 433 (-47.2%)
Mutual labels: ai, data-science
Compose
A machine learning tool for automated prediction engineering. It allows you to easily structure prediction problems and generate labels for supervised learning.
Stars: ✭ 203 (-75.24%)
Mutual labels: ai, data-science
Caer
High-performance Vision library in Python. Scale your research, not boilerplate.
Stars: ✭ 452 (-44.88%)
Mutual labels: ai, data-science
Tensor House
A collection of reference machine learning and optimization models for enterprise operations: marketing, pricing, supply chain
Stars: ✭ 449 (-45.24%)
Mutual labels: ai, data-science
Polyaxon
Machine Learning Platform for Kubernetes (MLOps tools for experimentation and automation)
Stars: ✭ 2,966 (+261.71%)
Mutual labels: ai, data-science
Snorkel
A system for quickly generating training data with weak supervision
Stars: ✭ 4,953 (+504.02%)
Mutual labels: ai, data-science
Atlas
An Open Source, Self-Hosted Platform For Applied Deep Learning Development
Stars: ✭ 259 (-68.41%)
Mutual labels: ai, data-science
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (-65.49%)
Mutual labels: ai, data-science
Voice Gender
Gender recognition by voice and speech analysis
Stars: ✭ 248 (-69.76%)
Mutual labels: ai, data-science
Csinva.github.io
Slides, paper notes, class notes, blog posts, and research on ML 📉, statistics 📊, and AI 🤖.
Stars: ✭ 342 (-58.29%)
Mutual labels: ai, data-science
Imodels
Interpretable ML package 🔍 for concise, transparent, and accurate predictive modeling (sklearn-compatible).
Stars: ✭ 194 (-76.34%)
Mutual labels: ai, data-science
Ml Auto Baseball Pitching Overlay
⚾🤖⚾ Automatic baseball pitching overlay in realtime
Stars: ✭ 200 (-75.61%)
Mutual labels: ai, data-science
Spacy
💫 Industrial-strength Natural Language Processing (NLP) in Python
Stars: ✭ 21,978 (+2580.24%)
Mutual labels: ai, data-science
Cookiecutter Data Science
A logical, reasonably standardized, but flexible project structure for doing and sharing data science work.
Stars: ✭ 5,271 (+542.8%)
Mutual labels: ai, data-science

NLP - Tutorial

A repository showing how NLP can tackle real problems, including source code, datasets, and state-of-the-art NLP techniques.

Data Augmentation

General

Text Preprocessing

| Section | Sub-Section | Description | Story |
| --- | --- | --- | --- |
| Tokenization | Subword Tokenization | | Medium |
| Tokenization | Word Tokenization | | Medium Github |
| Tokenization | Sentence Tokenization | | Medium Github |
| Part of Speech | | | Medium Github |
| Lemmatization | | | Medium Github |
| Stemming | | | Medium Github |
| Stop Words | | | Medium Github |
| Phrase Word Recognition | | | |
| Spell Checking | Lexicon-based | Peter Norvig algorithm | Medium Github |
| | Lexicon-based | Symspell | Medium Github |
| Machine Translation | Statistical Machine Translation | | Medium |
| Machine Translation | Attention | | Medium |
| String Matching | Fuzzywuzzy | | Medium Github |
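The Spell Checking row cites Peter Norvig's algorithm. Below is a minimal sketch in the spirit of his published corrector: generate every string within one or two edits (deletion, transposition, replacement, insertion) of the input, keep only those found in a word-frequency table, and return the most frequent. The tiny corpus is a hypothetical stand-in; a real checker would count words from a large text file.

```python
import re
from collections import Counter

def words(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"\w+", text.lower())

# Hypothetical toy corpus; a real checker would count words in a large text file.
WORDS = Counter(words("the quick brown fox jumps over the lazy dog the end"))
TOTAL = sum(WORDS.values())

def probability(word):
    """Relative frequency of `word` in the corpus (0 for unknown words)."""
    return WORDS[word] / TOTAL

def edits1(word):
    """All strings one deletion, transposition, replacement, or insertion away."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def edits2(word):
    """All strings two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}

def known(candidates):
    """Keep only candidates that appear in the corpus."""
    return {w for w in candidates if w in WORDS}

def correction(word):
    """Prefer the word itself, then known 1-edit, then known 2-edit words; pick the most frequent."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=probability)
```

For example, `correction("teh")` yields `"the"` through a single transposition. Checking 1-edit candidates before 2-edit ones encodes the assumption that smaller edits are far more likely than larger ones.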

Text Representation

| Section | Sub-Section | Research Lab | Story | Source |
| --- | --- | --- | --- | --- |
| Traditional Method | Bag-of-words (BoW) | | Medium Github | |
| | Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) | | Medium Github | |
| Character Level | Character Embedding | NYU | Medium Github | Paper |
| Word Level | Negative Sampling and Hierarchical Softmax | | Medium | |
| | Word2Vec, GloVe, fastText | | Medium Github | |
| | Contextualized Word Vectors (CoVe) | Salesforce | Medium Github | Paper Code |
| | Misspelling Oblivious (word) Embeddings | Facebook | Medium | Paper |
| | Embeddings from Language Models (ELMo) | AI2 | Medium Github | Paper Code |
| | Contextual String Embeddings | Zalando Research | Medium | Paper Code |
| Sentence Level | Skip-thoughts | | Medium Github | Paper Code |
| | InferSent | | Medium Github | Paper Code |
| | Quick-Thoughts | Google | Medium | Paper Code |
| | General Purpose Sentence (GenSen) | | Medium | Paper Code |
| | Bidirectional Encoder Representations from Transformers (BERT) | Google | Medium | Paper(2019) Code |
| | Generative Pre-Training (GPT) | OpenAI | Medium | Paper(2019) Code |
| | Self-Governing Neural Networks (SGNN) | Google | Medium | Paper |
| | Multi-Task Deep Neural Networks (MT-DNN) | Microsoft | Medium | Paper(2019) |
| | Generative Pre-Training-2 (GPT-2) | OpenAI | Medium | Paper(2019) Code |
| | Universal Language Model Fine-tuning (ULMFiT) | fast.ai | Medium | Paper Code |
| | BERT in Science Domain | | Medium | Paper(2019) Paper(2019) |
| | BERT in Clinical Domain | NYU/PU | Medium | Paper(2019) Paper(2019) |
| | RoBERTa | UW/Facebook | Medium | Paper(2019) Paper |
| | Unified Language Model for NLU and NLG (UNILM) | Microsoft | Medium | Paper(2019) |
| | Cross-lingual Language Model (XLMs) | Facebook | Medium | Paper(2019) |
| | Transformer-XL | CMU/Google | Medium | Paper(2019) |
| | XLNet | CMU/Google | Medium | Paper(2019) |
| | CTRL | Salesforce | Medium | Paper(2019) |
| | ALBERT | Google/Toyota | Medium | Paper(2019) |
| | T5 | Google | Medium | Paper(2019) |
| | MultiFiT | | Medium | Paper(2019) |
| | XTREME | | Medium | Paper(2020) |
| | REALM | | Medium | Paper(2020) |
| Document Level | lda2vec | | Medium | Paper |
| | doc2vec | Google | Medium Github | Paper |
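The traditional Bag-of-words (BoW) representation in the first row turns each document into a vector of token counts over a shared vocabulary. A minimal sketch in plain Python, assuming a simple regex tokenizer and a sorted vocabulary (both illustrative choices, not taken from the linked notebook):

```python
import re

def tokenize(text):
    """Lower-case the text and keep runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token in the corpus to a fixed column index (sorted order)."""
    vocab = sorted({token for doc in documents for token in tokenize(doc)})
    return {token: index for index, token in enumerate(vocab)}

def bag_of_words(document, vocabulary):
    """Count how often each vocabulary token occurs; out-of-vocabulary tokens are ignored."""
    vector = [0] * len(vocabulary)
    for token in tokenize(document):
        if token in vocabulary:
            vector[vocabulary[token]] += 1
    return vector

# Hypothetical two-document corpus.
corpus = ["The cat sat.", "The dog sat on the mat."]
vocabulary = build_vocabulary(corpus)
vectors = [bag_of_words(doc, vocabulary) for doc in corpus]
```

Here the vocabulary is `cat, dog, mat, on, sat, the`, so the second document becomes `[0, 1, 1, 1, 1, 2]`. Word order is discarded entirely, which is the main limitation the embedding methods further down the table try to address.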

NLP Problem

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| Named Entity Recognition (NER) | Pattern-based Recognition | | | Medium | |
| | Lexicon-based Recognition | | | Medium | |
| | spaCy Pre-trained NER | | | Medium Github | |
| Optical Character Recognition (OCR) | Printed Text | Google Cloud Vision API | Google | Medium | Paper |
| | Handwriting | LSTM | Google | Medium | Paper |
| Text Summarization | Extractive Approach | | | Medium Github | |
| | Abstractive Approach | | | Medium | |
| Emotion Recognition | Audio, Text, Visual | 3 Multimodals for Emotion Recognition | | Medium | |
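Lexicon-based Recognition, listed under NER, can be illustrated with a greedy longest-match lookup: scan the tokens and, at each position, tag the longest span that appears in an entity dictionary. This is a minimal sketch; the lexicon entries and labels are hypothetical, and spaCy's pre-trained NER is a statistical model that works differently.

```python
def lexicon_ner(tokens, lexicon):
    """Greedy longest-match lookup of token spans in an entity lexicon.

    `lexicon` maps a tuple of lower-cased tokens to an entity label.
    Returns a list of (start, end, label) spans, with `end` exclusive.
    """
    max_len = max((len(key) for key in lexicon), default=0)
    entities = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate span first so "New York Times" beats "New York".
        for length in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(token.lower() for token in tokens[i:i + length])
            if span in lexicon:
                match = (i, i + length, lexicon[span])
                break
        if match:
            entities.append(match)
            i = match[1]  # skip past the matched entity
        else:
            i += 1
    return entities

# Hypothetical lexicon for illustration.
LEXICON = {
    ("new", "york"): "GPE",
    ("new", "york", "times"): "ORG",
    ("google",): "ORG",
}
```

On `"I read the New York Times at Google".split()` this returns `[(3, 6, "ORG"), (7, 8, "ORG")]`. Preferring the longest match resolves overlaps such as "New York" versus "New York Times", but the approach cannot recognize entities missing from the lexicon.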

Acoustic Problem

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| Feature Representation | Unsupervised Learning | Introduction to Audio Feature Learning | | Medium | Paper 1 Paper 2 Paper 3 |
| Feature Representation | Unsupervised Learning | Speech2Vec and Sentence Level Embeddings | | Medium | Paper 1 Paper 2 |
| Feature Representation | Unsupervised Learning | Wav2vec | | Medium | Paper |
| Speech-to-text | | Introduction to Speech-to-text | | Medium | |

Text Distance Measurement

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| Euclidean Distance, Cosine Similarity and Jaccard Similarity | | | | Medium Github | |
| Edit Distance | Levenshtein Distance | | | Medium Github | |
| Word Mover's Distance (WMD) | | | | Medium Github | |
| Supervised Word Mover's Distance (S-WMD) | | | | Medium | |
| Manhattan LSTM | | | | Medium | Paper |
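The first two rows name standard measures. The sketch below implements them in plain Python, assuming dense vectors of equal length for Euclidean distance and cosine similarity, token lists for Jaccard similarity, and raw strings for Levenshtein distance (computed with the usual two-row dynamic program):

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def jaccard_similarity(tokens_a, tokens_b):
    """Size of the token-set intersection divided by the size of the union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def levenshtein_distance(s, t):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,               # delete cs
                current[j - 1] + 1,            # insert ct
                previous[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```

For instance, `levenshtein_distance("kitten", "sitting")` is 3, and the token lists `["the", "cat", "sat"]` and `["the", "cat", "ran"]` have a Jaccard similarity of 0.5. Keeping only two rows makes the edit-distance table O(len(t)) in memory.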

Model Interpretation

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| ELI5, LIME and Skater | | | | Medium Github | |
| SHapley Additive exPlanations (SHAP) | | | | Medium Github | |
| Anchors | | | | Medium Github | |
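SHAP is built on Shapley values. For a model with only a few features, they can be computed exactly by enumerating every coalition of features and averaging each feature's marginal contribution with the Shapley weights. The sketch below assumes that features outside a coalition are replaced by a single baseline value, which is a simplification chosen here; it does not use the `shap` library's API, which provides faster approximations and other ways of handling absent features.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one instance by enumerating all feature subsets.

    `model` takes a list of feature values and returns a number.
    Features outside a coalition are set to `baseline` (an assumption of this sketch).
    Cost grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)

    def value(coalition):
        # Model output when only the features in `coalition` keep their real values.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi
```

For a linear model `lambda z: 2 * z[0] + 3 * z[1]` with `x = [1, 1]` and a zero baseline, the values are `[2.0, 3.0]`. In general they sum to `model(x) - model(baseline)`, the additivity property that SHAP explanations rely on.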

Graph

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| Embeddings | | TransE, RESCAL, DistMult, ComplEx, PyTorch BigGraph | | Medium | RESCAL(2011) TransE(2013) DistMult(2015) ComplEx(2016) PyTorch BigGraph(2019) |
| Embeddings | | DeepWalk, node2vec, LINE, GraphSAGE | | Medium | DeepWalk(2014) node2vec(2015) LINE(2015) GraphSAGE(2018) |
| Embeddings | | WLG, GCN, GAT, GIN | | Medium | WLG(2011) GCN(2017) GAT(2017) GraphSAGE(2018) |
| Embeddings | | PinSAGE(2018) | Pinterest | Medium | |
| Embeddings | | HolE(2015), SimplE(2018) | | Medium | |
| Embeddings | | ContE(2017), ETE(2017) | | Medium | |
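TransE, listed in the first Graph row, treats a relation as a translation in embedding space. For a true triple (head, relation, tail), it wants h + r to land close to t, and it is trained with a margin ranking loss that pushes true triples to score higher than corrupted ones. The sketch below shows only the scoring function and the loss on plain Python lists. Learning the embeddings (for example with SGD and negative sampling) is omitted, and the toy vectors are hypothetical.

```python
import math

def transe_score(head, relation, tail, norm=1):
    """TransE plausibility score: -||h + r - t|| (higher means more plausible)."""
    diff = [h + r - t for h, r, t in zip(head, relation, tail)]
    if norm == 1:
        return -sum(abs(d) for d in diff)      # L1 distance
    return -math.sqrt(sum(d * d for d in diff))  # L2 distance

def margin_ranking_loss(positive_score, negative_score, margin=1.0):
    """Hinge loss max(0, margin - s(pos) + s(neg)), i.e. margin + d(pos) - d(neg)."""
    return max(0.0, margin - positive_score + negative_score)

# Hypothetical 2-d embeddings: a true triple and a corrupted one with a replaced tail.
h, r, t = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
t_corrupted = [0.0, 0.0]
positive = transe_score(h, r, t)            # 0.0, since h + r == t exactly
negative = transe_score(h, r, t_corrupted)  # -2.0 under the L1 norm
```

Here the true triple already beats the corrupted one by more than the margin, so `margin_ranking_loss(positive, negative)` is 0.0 and this pair would contribute no gradient.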

Meta-Learning

| Section | Sub-Section | Description | Story |
| --- | --- | --- | --- |
| Introduction | | Matching Nets(2016), MANN(2016), LSTM-based meta-learner(2017), Prototypical Networks(2017), ARC(2017), MAML(2017), MetaNet(2017) | Medium |
| NLP | Dialog Generation | DAML(2019), PAML(2019), NTMS(2019) | Medium |
| | Classification | Intent Embeddings(2016), LEOPARD(2019) | Medium |
| CV | Unsupervised Learning | CACTUs(2018) | Medium |
| General | | Siamese Network(1994), Triplet Network(2015) | Medium |
| | | MAML+(2018) | Medium |
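Prototypical Networks, listed in the introduction row, classify a query by its distance to class prototypes, where each prototype is the mean embedding of that class's few support examples. The sketch below covers only this classification step and assumes the embeddings are already computed; in the actual method they come from a neural encoder trained episodically, which is omitted here. The support set is hypothetical.

```python
import math

def prototypes(support_embeddings, support_labels):
    """Class prototype = mean embedding of that class's support examples."""
    sums, counts = {}, {}
    for emb, label in zip(support_embeddings, support_labels):
        if label not in sums:
            sums[label] = [0.0] * len(emb)
            counts[label] = 0
        sums[label] = [s + e for s, e in zip(sums[label], emb)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, protos):
    """Predict the label whose prototype is nearest to the query."""
    return min(protos, key=lambda label: squared_euclidean(query, protos[label]))

def class_probabilities(query, protos):
    """Softmax over negative squared distances, as in Prototypical Networks."""
    logits = {label: -squared_euclidean(query, p) for label, p in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: v / total for label, v in exps.items()}

# Hypothetical 2-way, 2-shot episode with precomputed 2-d embeddings.
support = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
labels = ["a", "a", "b", "b"]
protos = prototypes(support, labels)
```

With these prototypes at `[0, 1]` and `[10, 11]`, a query at `[1, 1]` is classified as `"a"`. Because new classes only need a mean over a few examples, no retraining is required at test time.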

Image

| Section | Sub-Section | Description | Research Lab | Story | Paper & Code |
| --- | --- | --- | --- | --- | --- |
| Object Detection | R-CNN | | | Medium | Paper(2013) |
| Object Detection | Fast R-CNN | | | Medium | Paper(2015) |
| Object Detection | Faster R-CNN | | | Medium | Paper(2015) |
| Object Detection | VGGNet | | | Medium | Paper(2014) |
| Instance Segmentation | Mask R-CNN | | FAIR | Medium | Paper(2017) |
| Image Classification | ResNet(2015) | | Microsoft | Medium | |
| Image Classification | ResNeXt(2016) | | | Medium | |

Evaluation

| Section | Sub-Section | Description | Story |
| --- | --- | --- | --- |
| Introduction | | | Medium |
| Classification | | Confusion Matrix, ROC, AUC | Medium |
| Regression | | MAE, MSE, RMSE, MAPE, WMAPE | Medium |
| Textual | | Perplexity, BLEU, GER, WER, GLUE | Medium |
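Several metrics in this table reduce to short formulas. The sketch below implements the listed regression metrics, a binary confusion matrix, and perplexity in plain Python. The helper names are illustrative. MAPE and WMAPE are returned as percentages. Perplexity here assumes you already have the probability the model assigned to each token.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined if any true value is 0."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def wmape(y_true, y_pred):
    """Weighted MAPE: total absolute error divided by total actual volume."""
    return 100 * sum(abs(t - p) for t, p in zip(y_true, y_pred)) / sum(abs(t) for t in y_true)

def confusion_matrix(y_true, y_pred, positive=1):
    """Counts of true/false positives and negatives for a binary task."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts

def perplexity(token_probabilities):
    """exp of the average negative log-probability assigned to each token."""
    avg_nll = -sum(math.log(p) for p in token_probabilities) / len(token_probabilities)
    return math.exp(avg_nll)
```

For `y_true = [100, 200, 300]` and `y_pred = [110, 190, 330]`, MAE is 50/3 and WMAPE is 50/600, or about 8.33%. WMAPE avoids MAPE's division by individual actual values, so it stays defined when some actuals are zero.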

Source Code

| Section | Sub-Section | Description | Link |
| --- | --- | --- | --- |
| Spellcheck | | | Github |
| InferSent | | | Github |