Lingua - 👄 The most accurate natural language detection library for Java and the JVM, suitable for long and short text alike
Matchzoo - Facilitating the design, comparison and sharing of deep text matching models.
Nndial - NNDial is an open source toolkit for building end-to-end trainable task-oriented dialogue models. It is released by Tsung-Hsien (Shawn) Wen from Cambridge Dialogue Systems Group under Apache License 2.0.
Adam qas - ADAM: A Question Answering System, inspired by IBM Watson
Chakin - Simple downloader for pre-trained word vectors
Displacy - 💥 displaCy.js: An open-source NLP visualiser for the modern web
Gcn Over Pruned Trees - Graph Convolution over Pruned Dependency Trees Improves Relation Extraction (authors' PyTorch implementation)
Biosentvec - BioWordVec & BioSentVec: pre-trained embeddings for biomedical words and sentences
Trankit - Trankit is a Light-Weight Transformer-based Python Toolkit for Multilingual Natural Language Processing
Ltp - Language Technology Platform
Zhihu - This repo contains the source code for my personal column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented using Python 3.6. It includes Natural Language Processing and Computer Vision projects, such as text generation, machine translation, deep convolutional GAN, and other hands-on code.
Awesome Arabic - A curated list of awesome projects and dev/design resources for supporting Arabic computational needs.
Nlprule - A fast, low-resource Natural Language Processing and Text Correction library written in Rust.
Nlp101 - NLP 101: a resource repository for Deep Learning and Natural Language Processing
Nlp - Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang
Pyresparser - A simple resume parser used for extracting information from resumes
Textfooler - A Model for Natural Language Attack on Text Classification and Inference
Libpostal - A C library for parsing/normalizing street addresses around the world. Powered by statistical NLP and open geo data.
Autogluon - AutoGluon: AutoML for Text, Image, and Tabular Data
Gector - Official implementation of the paper “GECToR – Grammatical Error Correction: Tag, Not Rewrite” // Published at the BEA15 Workshop (co-located with ACL 2020) https://www.aclweb.org/anthology/2020.bea-1.16.pdf
Ner - Named Entity Recognition
Medacy - 🏥 Medical Text Mining and Information Extraction with spaCy
Trade Dst - Source code for transferable dialogue state generator (TRADE, Wu et al., 2019). https://arxiv.org/abs/1905.08743
Text2sql Data - A collection of datasets that pair questions with SQL queries.
Textract - extract text from any document. no muss. no fuss.
Oie Resources - A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Swem - The Tensorflow code for this ACL 2018 paper: "Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms"
Adaptnlp - An easy to use Natural Language Processing library and framework for predicting, training, fine-tuning, and serving up state-of-the-art NLP models.
Bluebert - BlueBERT, pre-trained on PubMed abstracts and clinical notes (MIMIC-III).
Pyswip - PySwip is a Python/SWI-Prolog bridge that lets you query SWI-Prolog from your Python programs. It features an (incomplete) SWI-Prolog foreign language interface, a utility class that makes it easy to query with Prolog, and a Pythonic interface.
Nlp tasks - Natural Language Processing Tasks and References
Autonlp - 🤗 AutoNLP: train state-of-the-art natural language processing models and deploy them in a scalable environment automatically
Chatbot ner - chatbot_ner: Named Entity Recognition for chatbots.
Olivia - 💁‍♀️ Your new best friend powered by an artificial neural network
Tacred Relation - PyTorch implementation of the position-aware attention model for relation extraction
Nlpython - This repository contains the code for Natural Language Processing using the Python scripting language. All of the code relates to my book, "Python Natural Language Processing".
Lingua Rs - 👄 The most accurate natural language detection library in the Rust ecosystem, suitable for long and short text alike
Lda - LDA topic modeling for node.js
Bist Parser - Graph-based and Transition-based dependency parsers based on BiLSTMs
Fakenewscorpus - A dataset of millions of news articles scraped from a curated list of data sources.
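Articutapi - API of Articut, a Chinese word segmenter with semantic part-of-speech tagging. Word segmentation (斷詞, also called 分詞) is the foundation of Chinese information processing. Articut uses no machine learning and needs no data model; relying only on the grammar rules of modern vernacular Chinese, it achieves an F1-measure above 94% and a recall above 96% on SIGHAN 2005.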