ivan-bilan / The Nlp Pandect

Licence: cc0-1.0
A comprehensive reference for all topics related to Natural Language Processing

The-NLP-Pandect

This pandect (πανδέκτης is Ancient Greek for encyclopedia) was created to help you find almost anything related to Natural Language Processing that is available online.

The-NLP-Resources

Compendiums and awesome lists on the topic of NLP:

NLP Conferences, Paper Summaries and Paper Compendiums:

Papers and Paper Summaries
Conferences

NLP Progress and NLP Tasks:

NLP Datasets:

Word and Sentence embeddings:

Notebooks, Scripts and Repositories

Non-English resources and compendiums

Pre-trained NLP models

NLP Year in Review

2020

The-NLP-Podcasts

NLP-only podcasts

Many NLP episodes

Some NLP episodes

The-NLP-Newsletter

The-NLP-Meetups

The-NLP-Youtube

The-NLP-Benchmarks

General NLU

  • GLUE - General Language Understanding Evaluation (GLUE) benchmark
  • SuperGLUE - benchmark styled after GLUE with a new set of more difficult language understanding tasks
  • decaNLP - The Natural Language Decathlon (decaNLP) for studying general NLP models
  • RACE - ReAding Comprehension dataset collected from English Examinations
  • dialoglue - DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
  • DynaBench - Dynabench is a research platform for dynamic data collection and benchmarking

Summarization

  • WikiAsp - WikiAsp: Multi-document aspect-based summarization Dataset

Question Answering

  • SQuAD - Stanford Question Answering Dataset (SQuAD)
  • XQuad - XQuAD (Cross-lingual Question Answering Dataset) for cross-lingual question answering
  • GrailQA - Strongly Generalizable Question Answering (GrailQA)
  • CSQA - Complex Sequential Question Answering

Multilingual and Non-English Benchmarks

  • XTREME - Massively Multilingual Multi-task Benchmark
  • GLUECoS - A benchmark for code-switched NLP
  • IndoNLU Benchmark - collection of resources for training, evaluating, and analyzing NLP for Bahasa Indonesia
  • IndicGLUE - Natural Language Understanding Benchmark for Indic Languages
  • LinCE - Linguistic Code-Switching Evaluation Benchmark

Bio, Law, and other scientific domains

  • BLURB - Biomedical Language Understanding and Reasoning Benchmark
  • BLUE - Biomedical Language Understanding Evaluation benchmark

Transformer Efficiency

Other

  • CodeXGLUE - A benchmark dataset for code intelligence
  • CrossNER - CrossNER: Evaluating Cross-Domain Named Entity Recognition
  • MultiNLI - Multi-Genre Natural Language Inference corpus

The-NLP-Research

General

Embeddings

Repositories

Blogs

Cross-lingual Word Embeddings

  • vecmap - VecMap (cross-lingual word embedding mappings) [GitHub, 527 stars]

Byte Pair Encoding

  • bpemb - Pre-trained subword embeddings in 275 languages, based on Byte-Pair Encoding (BPE) [GitHub, 893 stars]
  • subword-nmt - Unsupervised Word Segmentation for Neural Machine Translation and Text Generation [GitHub, 1633 stars]
  • python-bpe - Byte Pair Encoding for Python [GitHub, 141 stars]
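As a rough illustration of what the libraries above implement, here is a minimal sketch of learning Byte Pair Encoding merges with the standard library only. The function name `learn_bpe` and the `{word: frequency}` input format are assumptions made for this example; they are not the API of bpemb, subword-nmt, or python-bpe, and real implementations add end-of-word markers and far more efficient pair counting.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to num_merges BPE merges from a {word: frequency} mapping.

    Hypothetical helper for illustration only.
    """
    # Represent each word as a tuple of symbols, starting from single characters.
    vocab = {tuple(word): freq for word, freq in words.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair.
        new_vocab = {}
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab


merges, vocab = learn_bpe({"abab": 3, "abc": 1}, num_merges=2)
print(merges)
print(vocab)
```

The learned merge list is what a tokenizer would later replay, in order, to segment unseen words into subword units.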

Transformer-based Architectures

General

Transformer

BERT

Other Transformer Variants

T5
BigBird
Reformer / Linformer / Longformer / Performers
Switch Transformer

GPT-family

General
GPT-3
Learning Resources
Applications
  • Awesome GPT-3 - list of all resources related to GPT-3 [GitHub, 2946 stars]
  • GPT-3 Projects - a map of all GPT-3 start-ups and commercial projects
  • OpenAI API - API Demo to use GPT-3 for commercial applications
Open-source Efforts
  • GPT-Neo - in-progress GPT-3 open source replication

Other

Distillation, Pruning and Quantization

Automated Summarization

Rule-based NLP

  • LemmInflect - A python module for English lemmatization and inflection

The-NLP-Industry

Best Practices for NLP

Transformer-based Architectures

Embeddings as a Service

NLP Recipes Industrial Applications:

NLP Applications in Bio, Finance, Legal and other industries

Model and Data testing

  • WildNLP - Corrupt an input text to test NLP models' robustness [GitHub, 64 stars]
  • Great Expectations - Write tests for your data [GitHub, 3721 stars]
  • CheckList - Beyond Accuracy: Behavioral Testing of NLP models [GitHub, 1245 stars]
  • TextAttack - framework for adversarial attacks, data augmentation, and model training in NLP [GitHub, 1263 stars]

The-NLP-Speech

General Speech Recognition

  • wav2letter - Automatic Speech Recognition Toolkit [GitHub, 5655 stars]
  • DeepSpeech - Baidu's DeepSpeech architecture [GitHub, 16623 stars]
  • Acoustic Word Embeddings by Maria Obedkova [Blog, 2020]
  • kaldi - Kaldi is a toolkit for speech recognition [GitHub, 10155 stars]
  • awesome-kaldi - resources for using Kaldi [GitHub, 381 stars]
  • ESPnet - End-to-End Speech Processing Toolkit [GitHub, 3540 stars]

Text to Speech

  • FastSpeech - The Implementation of FastSpeech based on pytorch [GitHub, 587 stars]

The-NLP-Topics

Blogs

Frameworks for Topic Modeling

  • gensim - framework for topic modeling [GitHub, 11751 stars]
  • Spark NLP [GitHub, 1963 stars]

Repositories

Keyword-Extraction

Text Rank

  • PyTextRank - PyTextRank is a Python implementation of TextRank as a spaCy pipeline extension [GitHub, 1471 stars]
  • textrank - TextRank implementation for Python 3 [GitHub, 996 stars]

RAKE - Rapid Automatic Keyword Extraction

  • rake-nltk - Rapid Automatic Keyword Extraction algorithm using NLTK [GitHub, 792 stars]
  • yake - Single-document unsupervised keyword extraction [GitHub, 577 stars]
  • RAKE-tutorial - A python implementation of the Rapid Automatic Keyword Extraction [GitHub, 346 stars]
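For readers new to RAKE, the core idea can be sketched in a few lines: split the text into candidate phrases at stopwords and punctuation, score each word by degree divided by frequency, and score each phrase as the sum of its word scores. The tiny `STOPWORDS` set and the `rake` function below are assumptions for this example, not the interface of rake-nltk or any other package listed here.

```python
import re
from collections import defaultdict

# Deliberately tiny stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "of", "the", "is", "for", "in", "to", "on", "with"}


def rake(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs sorted by descending RAKE score."""
    phrases = []
    # Punctuation ends a candidate phrase, and so does any stopword.
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))

    # freq: occurrences of a word; degree: total length of phrases it occurs in.
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for phrase, score in rake(
    "Keyword extraction is the task of finding important phrases in a document."
):
    print(f"{score:4.1f}  {phrase}")
```

Because the score grows with phrase length, this sketch favours longer candidates; the packaged implementations add options such as length limits and language-specific stopword lists.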

Other

Responsible-NLP

NLP and ML Interpretability

  • Language Interpretability Tool (LIT) [GitHub, 2380 stars]
  • WhatLies - Toolkit to help visualise what lies in word embeddings [GitHub, 236 stars]
  • Interpret-Text - Interpretability techniques and visualization dashboards for NLP models [GitHub, 214 stars]
  • InterpretML - Fit interpretable models. Explain blackbox machine learning [GitHub, 3491 stars]
  • ecco - Tools to visualize and explore NLP language models [GitHub, 693 stars]
  • NLP Profiler - A simple NLP library that allows profiling datasets with text columns [GitHub, 179 stars]

Ethics, Bias, and Equality in NLP

Adversarial Attacks for NLP

The-NLP-Frameworks

General Purpose

  • spaCy by Explosion AI [GitHub, 19664 stars]
  • flair by Zalando [GitHub, 9971 stars]
  • AllenNLP by AI2 [GitHub, 9705 stars]
  • stanza (formerly Stanford NLP) [GitHub, 5201 stars]
  • spaCy stanza [GitHub, 502 stars]
  • nltk [GitHub, 9651 stars]
  • gensim - framework for topic modeling [GitHub, 11751 stars]
  • pororo - Platform of neural models for natural language processing [GitHub, 780 stars]
  • NLP Architect - A Deep Learning NLP/NLU library by Intel® AI Lab [GitHub, 2601 stars]
  • polyglot - Multi-lingual NLP Framework [GitHub, 1773 stars]
  • FARM [GitHub, 1108 stars]
  • gobbli by RTI International [GitHub, 251 stars]
  • headliner - training and deployment of seq2seq models [GitHub, 221 stars]
  • SyferText - A privacy preserving NLP framework [GitHub, 164 stars]
  • DeText - Text Understanding Framework for Ranking and Classification Tasks [GitHub, 1023 stars]
  • TextHero - Text preprocessing, representation and visualization [GitHub, 2097 stars]
  • textblob - TextBlob: Simplified Text Processing [GitHub, 7553 stars]
  • AdaptNLP - A high level framework and library for NLP [GitHub, 269 stars]
  • textacy - NLP, before and after spaCy [GitHub, 1609 stars]
  • texar - Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow [GitHub, 2113 stars]

Data Augmentation

  • WildNLP - Text manipulation library to test NLP models [GitHub, 64 stars]
  • snorkel - Framework to generate training data [GitHub, 4490 stars]
  • NLPAug - Data augmentation for NLP [GitHub, 1638 stars]
  • SentAugment - Data augmentation by retrieving similar sentences from larger datasets [GitHub, 268 stars]
  • faker - Python package that generates fake data for you [GitHub, 12133 stars]

Adversarial NLP Attacks

  • TextAttack - framework for adversarial attacks, data augmentation, and model training in NLP [GitHub, 1263 stars]
  • CleverHans - adversarial example library for constructing NLP attacks and building defenses [GitHub, 4957 stars]

Non-English oriented

  • textblob-de - TextBlob: Simplified Text Processing for German [GitHub, 80 stars]
  • Kashgari - Transfer Learning with focus on Chinese [GitHub, 2032 stars]
  • Underthesea - Vietnamese NLP Toolkit [GitHub, 814 stars]

Transformer-oriented

  • transformers by HuggingFace [GitHub, 41314 stars]
  • Adapter Hub and its documentation - Adapter modules for Transformers [GitHub, 323 stars]
  • haystack - Transformers at scale for question answering & neural search. [GitHub, 1417 stars]

Dialog Systems and Speech

  • DeepPavlov by MIPT [GitHub, 5021 stars]
  • ParlAI by FAIR [GitHub, 6999 stars]
  • rasa - Framework for Conversational Agents [GitHub, 10826 stars]
  • wav2letter - Automatic Speech Recognition Toolkit [GitHub, 5655 stars]
  • ChatterBot - conversational dialog engine for creating chat bots [GitHub, 10900 stars]

Word-embeddings oriented

  • MUSE - A library for Multilingual Unsupervised or Supervised word Embeddings [GitHub, 2690 stars]
  • vecmap - A framework to learn cross-lingual word embedding mappings [GitHub, 527 stars]

Distributed NLP

Machine Translation

  • COMET - A Neural Framework for MT Evaluation [GitHub, 53 stars]
  • marian-nmt - Fast Neural Machine Translation in C++ [GitHub, 764 stars]
  • argos-translate - Open source neural machine translation in Python [GitHub, 453 stars]
  • Opus-MT - Open neural machine translation models and web services [GitHub, 104 stars]

Entity and String Matching

  • PolyFuzz - Fuzzy string matching, grouping, and evaluation [GitHub, 285 stars]
  • pyahocorasick - Python module implementing Aho-Corasick algorithm for string matching [GitHub, 585 stars]
  • fuzzywuzzy - Fuzzy String Matching in Python [GitHub, 7900 stars]
  • jellyfish - approximate and phonetic matching of strings [GitHub, 1400 stars]
  • textdistance - Compute distance between sequences [GitHub, 1900 stars]
  • DeepMatcher - Python package for entity and text matching using deep learning [GitHub, 276 stars]
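Most of the fuzzy matching libraries above build on some form of edit distance. The sketch below shows the classic Levenshtein distance with a two-row dynamic programme, plus a normalised similarity score. The names `levenshtein` and `similarity` are chosen for this example and are not the API of any listed package; those packages provide optimised implementations and many other metrics.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Edit distance rescaled to the range 0.0 to 1.0, where 1.0 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
print(round(similarity("kitten", "sitting"), 3))
```

Only two rows of the distance table are kept at a time, so memory use is proportional to the shorter string while the running time remains proportional to the product of both lengths.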

Discourse Analysis

  • ConvoKit - Cornell Conversational Analysis Toolkit [GitHub, 244 stars]

The-NLP-Learning

General

Books

Courses

Tutorials

The-NLP-Communities

Other-NLP-Topics

General

Tokenization

  • tokenizers - Fast State-of-the-Art Tokenizers optimized for Research and Production [GitHub, 4274 stars]
  • SentencePiece - Unsupervised text tokenizer for Neural Network-based text generation [GitHub, 4832 stars]
  • SoMaJo - A tokenizer and sentence splitter for German and English web and social media texts [GitHub, 84 stars]

Data Augmentation and Weak Supervision

Libraries and Frameworks
  • WildNLP - Text manipulation library to test NLP models [GitHub, 64 stars]
  • snorkel - Framework to generate training data [GitHub, 4490 stars]
  • NLPAug - Data augmentation for NLP [GitHub, 1638 stars]
  • SentAugment - Data augmentation by retrieving similar sentences from larger datasets [GitHub, 268 stars]
  • TextAttack - framework for adversarial attacks, data augmentation, and model training in NLP [GitHub, 1263 stars]
Blogs and Tutorials

Named Entity Recognition (NER)

Relation Extraction

  • tacred-relation - TACRED: position-aware attention model for relation extraction [GitHub, 268 stars]
  • tacrev - TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task [GitHub, 34 stars]
  • tac-self-attention - Relation extraction with position-aware self-attention [GitHub, 57 stars]

Domain Adaptation

Low Resource NLP

Spell Correction

  • NeuSpell - A Neural Spelling Correction Toolkit [GitHub, 76 stars]
  • SymSpellPy - Python port of SymSpell [GitHub, 412 stars]
  • Speller100 by Microsoft [Blog, Feb 2021]

Automata Theory for NLP

  • pyahocorasick - Python module implementing Aho-Corasick algorithm for string matching [GitHub, 585 stars]

Obscene words detection

  • LDNOOBW - List of Dirty, Naughty, Obscene, and Otherwise Bad Words [GitHub, 1275 stars]

Reinforcement Learning for NLP

  • nlp-gym - NLPGym - A toolkit to develop RL agents to solve NLP tasks [GitHub, 80 stars]

AutoML

  • TPOT - Python Automated Machine Learning tool [GitHub, 7835 stars]
  • Auto-PyTorch - Automatic architecture search and hyperparameter optimization for PyTorch [GitHub, 1154 stars]
  • HungaBunga - Brute-Force all sklearn models with all parameters using .fit .predict [GitHub, 610 stars]
  • AutoML Natural Language - Google's paid AutoML NLP service

License CC0

Attributions

Resources

  • All linked resources belong to original authors

Icons

Fonts
