hscspring / All4nlp
All4NLP
Framework & Toolkit
- facebookresearch/pytext: A natural language modeling framework based on PyTorch
  - deep learning NLP with PyTorch
  - Text classifiers, sequence taggers, joint intent-slot model and contextual intent-slot models
  - C++ server example
- zalandoresearch/flair: A very simple framework for state-of-the-art Natural Language Processing (NLP)
  - NER, POS, sense disambiguation and classification
  - on top of PyTorch
- pytorch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
  - Seq2Seq modeling
  - on top of PyTorch
- BrikerMan/Kashgari: Kashgari is a Production-ready NLP Transfer learning framework for text-labeling and text-classification, includes Word2Vec, BERT, and GPT2 Language Embedding.
  - Text labeling, classification, pre-trained
  - on top of TensorFlow
- asyml/texar: Toolkit for Machine Learning, Natural Language Processing, and Text Generation, in TensorFlow
  - NLP Toolkit
  - on top of TensorFlow
- stanfordnlp/stanza: Official Stanford NLP Python Library for Many Human Languages
  - on top of PyTorch
  - fast; suited to production systems
- nltk/nltk: NLTK Source
  - education and research tool
  - learning and exploring NLP concepts
- sloria/TextBlob: Simple, Pythonic, text processing--Sentiment analysis, part-of-speech tagging, noun phrase extraction, translation, and more.
  - on top of NLTK
  - fast prototyping
  - for applications that don't require high performance
- spaCy · Industrial-strength Natural Language Processing in Python
  - fast
  - streamlined
  - production-ready
- chartbeat-labs/textacy: NLP, before and after spaCy
Tokenizer
- OpenNMT/Tokenizer: Fast and customizable text tokenization library with BPE and SentencePiece support
- google/sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation.
- huggingface/tokenizers: 💥Fast State-of-the-Art Tokenizers optimized for Research and Production
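Several of the tokenizers above mention BPE (byte-pair encoding). As a rough illustration of the idea only, and not the implementation used by any of the listed libraries, the sketch below learns merge rules from a toy word list by repeatedly fusing the most frequent adjacent symbol pair. The function name `learn_bpe` and its parameters are hypothetical.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words.

    Illustrative sketch only: real tokenizers add end-of-word markers,
    byte-level fallbacks, and much faster pair bookkeeping.
    """
    # Each distinct word becomes a tuple of single-character symbols,
    # mapped to how often that word occurs.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite each word, fusing every occurrence of the best pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


if __name__ == "__main__":
    print(learn_bpe(["aab", "aab", "aac"], 2))
```

The learned merges would then be applied in order to split unseen words into subword units; for real workloads, prefer one of the libraries listed above.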
Seq2Seq
- OpenNMT - Open-Source Neural Machine Translation
- google-research/text-to-text-transfer-transformer: Code for the paper "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer"
- tensorflow/tensor2tensor: Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Text First
- pytorch/text: Data loaders and abstractions for text and NLP
- tensorflow/text: Making text a first-class citizen in TensorFlow.
- textpipe/textpipe: Textpipe: clean and extract metadata from text
Task & Model
Language Model
- [2020 Chinese-Bert] CLUEbenchmark/CLUEPretrainedModels
- [2019 GPT2+Chinese] Morizeyao/GPT2-Chinese: Chinese version of GPT2 training code, using BERT tokenizer.
- [2019 Bert-wwm] ymcui/Chinese-BERT-wwm: Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
- [2019 Toolkit] huggingface/pytorch-transformers: 👾 A library of state-of-the-art pretrained models for Natural Language Processing (NLP)
- [2019 MASK] google-research/bert: TensorFlow code and pre-trained models for BERT
- [2019 Permutation] zihangdai/xlnet: XLNet: Generalized Autoregressive Pretraining for Language Understanding
- [2019 MultiTask] PaddlePaddle/ERNIE: An Implementation of ERNIE For Language Understanding
- [2019 Attention] kimiyoung/transformer-xl
- [2019 LM] openai/gpt-2: Code for the paper "Language Models are Unsupervised Multitask Learners"
- [2018 TwoLMs] ELMo: Deep contextualized word representations
- [2018 Co-occurrence] stanfordnlp/GloVe: GloVe model for distributed word representation
- [2019] facebookresearch/fastText: Library for fast text representation and classification.
- [2019 Word2vec] Embedding/Chinese-Word-Vectors: 100+ pretrained Chinese word vectors
- [2018] Word2vec Chinese-Word-Vectors
- [2018 LSTM] Recurrent Neural Networks | TensorFlow
- [2013] Google Code Archive - Long-term storage for Google Code Project Hosting.
Text Generation
- [2020 Toolkit] RUCAIBox/TextBox: TextBox is an open-source library for building text generation systems.
- [2020 Awesome] tokenmill/awesome-nlg: A curated list of resources dedicated to Natural Language Generation (NLG)
- [2018 Benchmark] geek-ai/Texygen: A text generation benchmarking platform
- [2018 RNN] docs/text_generation.ipynb at master · tensorflow/docs
- [2019 Toolkit on top of TF] asyml/texar: Toolkit for Text Generation and Beyond
Classification
- [Collection] brightmart/text_classification: all kinds of text classification models and more with deep learning
NLU & IE
- [2019 Framework] RasaHQ/rasa_nlu: 💬 Open source library for natural language understanding and machine learning-based dialogue management.
  - All things around intent classification, entity extraction and action predictions
  - DIY NLP and chatbot framework
- [2018 Chi] crownpku/Rasa_NLU_Chi: Turn Chinese natural language into structured data (Chinese natural language understanding)
- [2019 Toolkit] snipsco/snips-nlu: Snips Python library to extract meaning from text
QA
- [2020 Toolkit] RUCAIBox/CRSLab: CRSLab is an open-source toolkit for building Conversational Recommender System (CRS).
- [2018] 5hirish/adam_qas: ADAM - A Question Answering System. Inspired from IBM Watson
Similarity
- [2019 Sentence] UKPLab/sentence-transformers: Sentence Embeddings with BERT & XLNet
- [2019 Sentence] hanxiao/bert-as-service: Mapping a variable-length sentence to a fixed-length vector using BERT model
- [2018 Sentence] explosion/sense2vec: 🦆 Use NLP to go beyond vanilla word2vec
- [2019 Sentence] gensim: models.doc2vec – Doc2vec paragraph embeddings
- [2014 Sentence] klb3713/sentence2vec: Tools for mapping a sentence with arbitrary length to vector space
- [2019 Doc+Sentence+Word] gensim: Topic modelling for humans
- [2019 MinHash] ekzhu/datasketch: MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++
- [2019 LevenshteinDistance] ztane/python-Levenshtein: The Levenshtein Python C extension module contains functions for fast computation of Levenshtein distance and string similarity
- [2018 Graph] caesar0301/graphsim: Graph similarity algorithms based on NetworkX.
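For readers unfamiliar with the Levenshtein distance mentioned above, here is a minimal pure-Python sketch of the standard dynamic-programming recurrence. It is for illustration only and does not reflect the API or internals of python-Levenshtein; the function name `levenshtein` is a hypothetical helper.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`; start with the empty prefix.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

This runs in O(len(a) * len(b)) time with a single row of extra memory; a C extension such as the one listed above is the better choice for large inputs.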
Pinyin
- [2019 Pinyin] mozillazg/python-pinyin: Convert Chinese characters to pinyin (pypinyin)
Visualization
- [2020 Explain] jalammar/ecco: Visualize and explore NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2).
- [2019 Word] JasonKessler/scattertext: Beautiful visualizations of how language differs among document types.
- [2019 Bert GPT] jessevig/bertviz: Tool for visualizing attention in the Transformer model (BERT and OpenAI GPT-2)
- [2019 MLC] marcotcr/lime: Lime: Explaining the predictions of any machine learning classifier
- [2019 Graph Visualization Framework] antvis/G6: ♾ A Graph Visualization Framework in JavaScript
- [2017 Neo4j D3] eisman/neo4jd3: Neo4j graph visualization using D3.js
- [2019 Neo4j browser] neo4j-contrib/neovis.js: Neo4j + vis.js = neovis.js. Graph visualizations in the browser with data from Neo4j.
- [2019 Neo4j 3D] jexp/neo4j-3d-force-graph: Experiments with Neo4j & 3d-force-graph https://github.com/vasturiano/3d-force-graph
- [2019 Interactive Graphviz] magjac/graphviz-visual-editor: A web application for interactive visual editing of Graphviz graphs described in the DOT language.
- [2019 Graphviz Python] mapio/GraphvizAnim: A tool to create animated graph visualizations, based on graphviz.
Readability
- [2019 Kinds of indexes] shivam5992/textstat: python package to calculate readability statistics of a text object - paragraphs, sentences, articles.
- [2019 in spaCy] mholtzscher/spacy_readability: spaCy pipeline component for adding text readability meta data to Doc objects.
Translation
- [2019 XLM] facebookresearch/XLM: PyTorch original implementation of Cross-lingual Language Model Pretraining.
- [2018 Microsoft, phrase-based] Microsoft/NPMT: Towards Neural Phrase-based Machine Translation
- [2019 Google, based on Seq2Seq and attention] tensorflow/nmt: TensorFlow Neural Machine Translation Tutorial
- [2019 Google, based on pure attention] models/official/transformer at master · tensorflow/models
- [2019 Facebook, based on CNN] pytorch/fairseq: Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
- [2019 Facebook, unsupervised] facebookresearch/UnsupervisedMT: Phrase-Based & Neural Unsupervised Machine Translation
- [2019 DeepL, based on CNN (not open source)] DeepL Translator: DeepL's CNN-based translation tool
- [2019 OpenNMT] OpenNMT/OpenNMT: Open Source Neural Machine Translation
Style Transfer
Tricks
Dataset
- datasets
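- Chinese task benchmark evaluation
- Chinese pre-training corpus
- cluebenchmarks.com/dataSet_search.html
- Offline Baidu Baike download (2012 edition with text and images)
- Baidu Baike, 2012 edition with text and images
- The most complete database of classical Chinese poetry
- Kinds of Resources
- Chinese diachronic corpus
- Chinese natural language processing datasets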
Learn Here
- google/trax: Trax — your path to advanced deep learning
- tensorflow/models
- Transformers
- OpenNMT/OpenNMT-py
- OpenNMT/OpenNMT-tf
- microsoft/nlp-recipes: Natural Language Processing Best Practices & Examples
Leading Track
- Google Research
- Facebook Research
- microsoft/unilm: UniLM - Unified Language Model Pre-training / Pre-training for NLP and Beyond
Research
Experts
- Michael Collins, Michael Collins - Google Scholar Citations ☆
- Jason Eisner - Home Page (JHU), Jason Eisner - Google Scholar Citations ☆
- David Yarowsky, David Yarowsky - Google Scholar Citations
- Dan Jurafsky - Home Page, Dan Jurafsky - Google Scholar Citations ☆
- Christopher Manning, Stanford NLP, Christopher D Manning - Google Scholar Citations ☆
- Dan Klein's Home Page, The Berkeley NLP Group ☆
- Dan Roth - Main Page, Dan Roth - Google Scholar Citations ☆
- ChengXiang Zhai - Home Page, ChengXiang Zhai - Google Scholar Citations
- Eugene Charniak's Home Page, Eugene Charniak - Google Scholar Citations
- Joakim Nivre's Home Page, Joakim Nivre - Google Scholar Citations ☆
- Philipp Koehn, Philipp Koehn - Google Scholar Citations
- James H. Martin, James H. Martin - Google Scholar Citations
- Julia Hirschberg, Julia Hirschberg - Google Scholar Citations
- Fernando Pereira – Google AI, Fernando Pereira - Google Scholar Citations ☆
- Ryan McDonald, Ryan McDonald - Google Scholar Citations
- Slav Petrov - Слав Петров, Slav Petrov - Google Scholar Citations ☆
- Kenneth Church HomePage, Kenneth Ward Church - Google Scholar Citations