makcedward / Nlp

📝 This repository records my NLP journey.

Labels

deep-learning machine-learning nlp data-science ai

NLP - Tutorial

This repository shows how NLP can tackle real-world problems, with source code, datasets, and state-of-the-art NLP techniques.

Data Augmentation

General

Tricks of Building an ML or DNN Model

Text Preprocessing

Section	Sub-Section	Description	Story
Tokenization	Subword Tokenization		Medium
Tokenization	Word Tokenization		Medium Github
Tokenization	Sentence Tokenization		Medium Github
Part of Speech			Medium Github
Lemmatization			Medium Github
Stemming			Medium Github
Stop Words			Medium Github
Phrase Word Recognition
Spell Checking	Lexicon-based	Peter Norvig algorithm	Medium Github
	Lexicon-based	SymSpell	Medium Github
	Machine Translation	Statistical Machine Translation	Medium
	Machine Translation	Attention	Medium
String Matching	Fuzzywuzzy		Medium Github
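The lexicon-based spell checking row cites Peter Norvig's algorithm. Below is a minimal sketch in the spirit of that algorithm. It makes three assumptions. First, `WORD_COUNTS` is a hypothetical toy lexicon; a real checker counts words in a large corpus. Second, it considers only candidates up to two edits away. Third, all function names are illustrative. The approach is to generate every string one or two edits away, keep the ones found in the lexicon, and return the most frequent.

```python
from collections import Counter
import string

# Illustrative sketch; WORD_COUNTS is a hypothetical toy lexicon, not real corpus counts.
WORD_COUNTS = Counter(
    "the quick brown fox jumps over the lazy dog while the spelling bee spells".split()
)
LETTERS = string.ascii_lowercase


def edits1(word: str) -> set[str]:
    """Every string one deletion, transposition, replacement or insertion away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in LETTERS]
    inserts = [left + c + right for left, right in splits for c in LETTERS]
    return set(deletes + transposes + replaces + inserts)


def edits2(word: str) -> set[str]:
    """Every string two edits away."""
    return {e2 for e1 in edits1(word) for e2 in edits1(e1)}


def known(words) -> set[str]:
    """Keep only candidates that appear in the lexicon."""
    return {w for w in words if w in WORD_COUNTS}


def correct(word: str) -> str:
    """Prefer the word itself, then known 1-edit, then 2-edit words; pick the most frequent."""
    candidates = known([word]) or known(edits1(word)) or known(edits2(word)) or {word}
    return max(candidates, key=lambda w: WORD_COUNTS[w])


print(correct("speling"))  # spelling
print(correct("teh"))      # the
```

Checking closer edits first means a real word is never "corrected" away. Frequency only breaks ties among candidates at the same edit distance.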

Text Representation

Section	Sub-Section	Research Lab	Story	Source
Traditional Method	Bag-of-words (BoW)		Medium Github
	Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA)		Medium Github
Character Level	Character Embedding	NYU	Medium Github	Paper
Word Level	Negative Sampling and Hierarchical Softmax		Medium
	Word2Vec, GloVe, fastText		Medium Github
	Contextualized Word Vectors (CoVe)	Salesforce	Medium Github	Paper Code
	Misspelling Oblivious (word) Embeddings	Facebook	Medium	Paper
	Embeddings from Language Models (ELMo)	AI2	Medium Github	Paper Code
	Contextual String Embeddings	Zalando Research	Medium	Paper Code
Sentence Level	Skip-thoughts		Medium Github	Paper Code
	InferSent		Medium Github	Paper Code
	Quick-Thoughts	Google	Medium	Paper Code
	General Purpose Sentence (GenSen)		Medium	Paper Code
	Bidirectional Encoder Representations from Transformers (BERT)	Google	Medium	Paper(2019) Code
	Generative Pre-Training (GPT)	OpenAI	Medium	Paper(2019) Code
	Self-Governing Neural Networks (SGNN)	Google	Medium	Paper
	Multi-Task Deep Neural Networks (MT-DNN)	Microsoft	Medium	Paper(2019)
	Generative Pre-Training-2 (GPT-2)	OpenAI	Medium	Paper(2019) Code
	Universal Language Model Fine-tuning (ULMFiT)	fast.ai	Medium	Paper Code
	BERT in Science Domain		Medium	Paper(2019) Paper(2019)
	BERT in Clinical Domain	NYU/PU	Medium	Paper(2019) Paper(2019)
	RoBERTa	UW/Facebook	Medium	Paper(2019) Paper
	Unified Language Model for NLU and NLG (UniLM)	Microsoft	Medium	Paper(2019)
	Cross-lingual Language Model (XLM)	Facebook	Medium	Paper(2019)
	Transformer-XL	CMU/Google	Medium	Paper(2019)
	XLNet	CMU/Google	Medium	Paper(2019)
	CTRL	Salesforce	Medium	Paper(2019)
	ALBERT	Google/Toyota	Medium	Paper(2019)
	T5	Google	Medium	Paper(2019)
	MultiFiT		Medium	Paper(2019)
	XTREME		Medium	Paper(2020)
	REALM		Medium	Paper(2020)
Document Level	lda2vec		Medium	Paper
	doc2vec	Google	Medium Github	Paper
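The bag-of-words (BoW) row in this table represents a document by its word counts over a fixed vocabulary, discarding word order. Below is a minimal pure-Python sketch. It assumes lowercase whitespace tokenization, which is a simplification; scikit-learn's `CountVectorizer` tokenizes more carefully. The function names are illustrative.

```python
from collections import Counter


def build_vocabulary(documents: list[str]) -> list[str]:
    """Sorted unique lowercase tokens across all documents (illustrative helper)."""
    return sorted({token for doc in documents for token in doc.lower().split()})


def bag_of_words(document: str, vocabulary: list[str]) -> list[int]:
    """Count vector for one document; word order is discarded."""
    counts = Counter(document.lower().split())
    return [counts[term] for term in vocabulary]


docs = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(d, vocab) for d in docs]
print(vocab)    # ['cat', 'dog', 'mat', 'on', 'sat', 'the']
print(vectors)  # [[1, 0, 1, 1, 1, 2], [0, 1, 0, 0, 1, 1]]
```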

NLP Problem

Section	Sub-Section	Description	Research Lab	Story	Paper & Code
Named Entity Recognition (NER)	Pattern-based Recognition			Medium
	Lexicon-based Recognition			Medium
	spaCy Pre-trained NER			Medium Github
Optical Character Recognition (OCR)	Printed Text	Google Cloud Vision API	Google	Medium	Paper
	Handwriting	LSTM	Google	Medium	Paper
Text Summarization	Extractive Approach			Medium Github
	Abstractive Approach			Medium
Emotion Recognition	Audio, Text, Visual	3 Multimodals for Emotion Recognition		Medium
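The extractive approach to text summarization selects existing sentences rather than generating new text. Below is a minimal frequency-based sketch, one simple heuristic among many and not necessarily the method in the linked story. It makes three assumptions. Sentences are split naively on `.`, `!` or `?`. `STOP_WORDS` is a hypothetical minimal list. Each sentence is scored by the average document frequency of its content words.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; real systems use a fuller one (e.g. NLTK's).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}


def content_words(text: str) -> list[str]:
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text: str, num_sentences: int = 1) -> list[str]:
    """Pick the highest-scoring sentences and return them in document order."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        # Average document frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:num_sentences])]


text = "NLP models process text. Cats sleep. NLP models learn text patterns."
print(summarize(text))     # ['NLP models process text.']
print(summarize(text, 2))  # ['NLP models process text.', 'NLP models learn text patterns.']
```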

Acoustic Problem

Section	Sub-Section	Description	Research Lab	Story	Paper & Code
Feature Representation	Unsupervised Learning	Introduction to Audio Feature Learning		Medium	Paper 1 Paper 2 Paper 3
Feature Representation	Unsupervised Learning	Speech2Vec and Sentence Level Embeddings		Medium	Paper 1 Paper 2
Feature Representation	Unsupervised Learning	Wav2vec		Medium	Paper
Speech-to-text		Introduction to Speech-to-text		Medium

Text Distance Measurement

Section	Sub-Section	Description	Research Lab	Story	Paper & Code
Euclidean Distance, Cosine Similarity and Jaccard Similarity				Medium Github
Edit Distance	Levenshtein Distance			Medium Github
Word Mover's Distance (WMD)				Medium Github
Supervised Word Mover's Distance (S-WMD)				Medium
Manhattan LSTM				Medium	Paper
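The first two rows of this table can be sketched in a few lines of Python. Two assumptions apply. The vector-space measures (Euclidean, cosine) and Jaccard similarity work on lowercase whitespace tokens. Levenshtein distance works on characters, using the standard dynamic-programming recurrence and keeping only the previous row. The function names are illustrative.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def jaccard_similarity(a: str, b: str) -> float:
    """|A intersect B| / |A union B| over the sets of words."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def euclidean_distance(a: str, b: str) -> float:
    """Straight-line distance between word-count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    return math.sqrt(sum((va[w] - vb[w]) ** 2 for w in set(va) | set(vb)))


print(levenshtein("kitten", "sitting"))                    # 3
print(jaccard_similarity("the cat sat", "the cat ran"))    # 0.5
```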

Model Interpretation

Section	Sub-Section	Description	Research Lab	Story	Paper & Code
ELI5, LIME and Skater				Medium Github
SHapley Additive exPlanations (SHAP)				Medium Github
Anchors				Medium Github
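SHAP explains a prediction by assigning each feature its Shapley value: the weighted average of that feature's marginal contribution over all coalitions of the other features. With only a few features, this can be computed exactly by enumeration. The sketch below makes two assumptions. Features outside a coalition are set to a single baseline vector; this is one of several conventions, and the `shap` library instead averages over background data and uses faster estimators. The `model` function is hypothetical, chosen so the answer is easy to check by hand.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values by enumerating every coalition (illustrative, O(2^n))."""
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Assumption: features in the coalition keep their value, the rest use the baseline.
        return model([x[k] if k in coalition else baseline[k] for k in range(n)])

    phis = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


def model(z: Sequence[float]) -> float:
    """Hypothetical model: two additive terms plus one interaction."""
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x, baseline = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phis = shapley_values(model, x, baseline)
print([round(p, 6) for p in phis])  # [3.5, 6.0, 1.5]
```

Feature 1 enters the model additively, so it receives exactly its own term (3 × 2 = 6). The interaction `z[0] * z[2]`, worth 3, is split evenly between features 0 and 2. The three values sum to `model(x) - model(baseline) = 11`.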

Graph

Section	Sub-Section	Description	Research Lab	Story	Paper & Code
Embeddings		TransE, RESCAL, DistMult, ComplEx, PyTorch BigGraph		Medium	RESCAL(2011) TransE(2013) DistMult(2015) ComplEx(2016) PyTorch BigGraph(2019)
Embeddings		DeepWalk, node2vec, LINE, GraphSAGE		Medium	DeepWalk(2014) node2vec(2015) LINE(2015) GraphSAGE(2018)
Embeddings		WLG, GCN, GAT, GIN		Medium	WLG(2011) GCN(2017) GAT(2017) GraphSAGE(2018)
Embeddings		PinSAGE(2018)	Pinterest	Medium
Embeddings		HolE(2015), SimplE(2018)		Medium
Embeddings		ContE(2017), ETE(2017)		Medium
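TransE, from the first row of this table, treats a relation as a translation in embedding space. For a true triple (head, relation, tail) it aims for h + r ≈ t, and it scores a triple by the distance ‖h + r − t‖, where smaller means more plausible. Training uses a margin loss against corrupted triples. The sketch below covers only the scoring and the loss. Hand-picked 2-D vectors stand in for learned embeddings, and the entity and relation names are hypothetical.

```python
import math

Vector = list[float]


def transe_distance(h: Vector, r: Vector, t: Vector, norm: int = 2) -> float:
    """Dissimilarity ||h + r - t|| under the L1 or L2 norm; lower means more plausible."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diff)
    return math.sqrt(sum(d * d for d in diff))


def margin_loss(positive: tuple, negative: tuple, margin: float = 1.0) -> float:
    """Hinge loss pushing a true triple at least `margin` closer than a corrupted one."""
    return max(0.0, margin + transe_distance(*positive) - transe_distance(*negative))


# Hypothetical hand-picked embeddings, not learned ones.
paris, berlin, france = [1.0, 0.0], [3.0, 0.0], [1.0, 1.0]
capital_of = [0.0, 1.0]

print(transe_distance(paris, capital_of, france))   # 0.0
print(transe_distance(berlin, capital_of, france))  # 2.0
print(margin_loss((paris, capital_of, france), (berlin, capital_of, france)))  # 0.0
```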

Meta-Learning

Section	Sub-Section	Description	Story
Introduction		Matching Nets(2016) MANN(2016) LSTM-based meta-learner(2017) Prototypical Networks(2017) ARC(2017) MAML(2017) MetaNet(2017)	Medium
NLP	Dialog Generation	DAML(2019), PAML(2019), NTMS(2019)	Medium
	Classification	Intent Embeddings(2016) LEOPARD(2019)	Medium
CV	Unsupervised Learning	CACTUs(2018)	Medium
General		Siamese Network(1994), Triplet Network(2015)	Medium
		MAML++(2018)	Medium
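Prototypical Networks, listed in the introduction row, classify a query by its distance to class prototypes. Each prototype is the mean embedding of that class's support examples. The sketch below covers only that inference step. It assumes embeddings are already computed (the learned embedding network is omitted) and uses squared Euclidean distance. The episode data and function names are hypothetical.

```python
import math


def prototypes(support: dict[str, list[list[float]]]) -> dict[str, list[float]]:
    """Prototype of each class = element-wise mean of its support embeddings."""
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in support.items()
    }


def squared_euclidean(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_probabilities(query: list[float], protos: dict[str, list[float]]) -> dict[str, float]:
    """Softmax over negative squared distances to each prototype."""
    logits = {label: -squared_euclidean(query, p) for label, p in protos.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - top) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def classify(query: list[float], protos: dict[str, list[float]]) -> str:
    """Label of the nearest prototype."""
    return min(protos, key=lambda label: squared_euclidean(query, protos[label]))


# Hypothetical 2-D embeddings for a 2-way, 2-shot episode.
support = {"cat": [[1.0, 1.0], [3.0, 1.0]], "dog": [[-1.0, -1.0], [-3.0, -3.0]]}
protos = prototypes(support)
print(protos)                        # {'cat': [2.0, 1.0], 'dog': [-2.0, -2.0]}
print(classify([1.5, 1.0], protos))  # cat
```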

Image

Section	Sub-Section	Description	Research Lab	Story	Paper & Code
Object Detection		R-CNN		Medium	Paper(2013)
Object Detection		Fast R-CNN		Medium	Paper(2015)
Object Detection		Faster R-CNN		Medium	Paper(2015)
Image Classification		VGGNet		Medium	Paper(2014)
Instance Segmentation		Mask R-CNN	FAIR	Medium	Paper(2017)
Image Classification		ResNet(2015)	Microsoft	Medium
Image Classification		ResNeXt(2016)		Medium

Evaluation

Section	Sub-Section	Description	Story
Introduction			Medium
Classification		Confusion Matrix, ROC, AUC	Medium
Regression		MAE, MSE, RMSE, MAPE, WMAPE	Medium
Textual		Perplexity, BLEU, GER, WER, GLUE	Medium
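The classification and regression metrics listed in this table reduce to short formulas. Below is a minimal sketch. It assumes a binary classifier whose positive class is labelled 1, and it reports MAPE and WMAPE as fractions rather than percentages. The function names are illustrative.

```python
import math


def regression_metrics(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    errors = [t - p for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # MAPE divides by each true value, so it is undefined if any y_true is 0.
        "MAPE": sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n,
        # WMAPE divides total absolute error by total absolute true value.
        "WMAPE": sum(abs(e) for e in errors) / sum(abs(t) for t in y_true),
    }


def confusion_counts(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, int]:
    """The four cells of a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(t == positive and p == positive for t, p in pairs),
        "FP": sum(t != positive and p == positive for t, p in pairs),
        "FN": sum(t == positive and p != positive for t, p in pairs),
        "TN": sum(t != positive and p != positive for t, p in pairs),
    }


def precision_recall(counts: dict[str, int]) -> tuple[float, float]:
    predicted_pos = counts["TP"] + counts["FP"]
    actual_pos = counts["TP"] + counts["FN"]
    precision = counts["TP"] / predicted_pos if predicted_pos else 0.0
    recall = counts["TP"] / actual_pos if actual_pos else 0.0
    return precision, recall


print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 1}
```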

Source Code

Section	Sub-Section	Description	Link
Spellcheck			Github
InferSent			Github

Stars: ✭ 820

Visit Git Page 🔗Visit User Page 🔗Visit Issues Page (9) 🔗