Top 1910 nlp open source projects

A privacy preserving NLP framework
Open Sesame
A frame-semantic parsing system based on a softmax-margin SegRNN.
Pytorch Acnn Model
Code for Relation Classification via Multi-Level Attention CNNs
Simple State-of-the-Art BERT-Based Sentence Classification with Keras / TensorFlow 2. Built with HuggingFace's Transformers.
Revisiting Pre-trained Models for Chinese Natural Language Processing (Findings of EMNLP)
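Jiagu: a deep learning natural language processing toolkit. Knowledge graph relation extraction, Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, text clustering.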
Unify Emotion Datasets
A Survey and Experiments on Annotated Corpora for Emotion Classification in Text
Rat Sql
A relation-aware semantic parsing model from English to SQL
Text vectorization tool to outperform TFIDF for classification tasks
Jiayan (甲言), an NLP toolkit designed for Classical Chinese. Supports lexicon construction, tokenizing, POS tagging, sentence segmentation and punctuation.
C++ model training and inference framework
Rouge 2.0
ROUGE automatic summarization evaluation toolkit. Supports ROUGE-[N, L, S, SU], stemming and stopwords in different languages, Unicode text evaluation, and CSV output.
Node Postal
NodeJS bindings to libpostal for fast international address parsing/normalization
Indic Bert
BERT-based Multilingual Model for Indian Languages
Turkish Stemmer Python
🐍 Turkish Language Stemmer for Python
Improved Dynamic Memory Networks Dmn Plus
Theano implementation of DMN+ (Improved Dynamic Memory Networks) from the paper "Dynamic Memory Networks for Visual and Textual Question Answering" by Xiong, Merity, and Socher at MetaMind
Our goal is to build an open source writing assistant and spell checker that tackles many different problems in the Turkish NLP literature together, proposes novel approaches, and addresses the shortcomings of existing work. It corrects spelling mistakes in users' texts with a deep learning approach and also performs semantic analysis of the texts, so that errors arising from context can be detected and corrected as well.
Metalearning4nlp Papers
A list of recent papers about Meta / few-shot learning methods applied in NLP areas.
Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
A word-spacing model made by Pingpong that works well with chat-style Korean text!
spaCy pipeline object for negating concepts in text
Xk Time
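xk-time is a tool for time conversion, time calculation, time formatting, time parsing, calendars, cron expressions, time NLP and more. It uses Java 8, is thread-safe and easy to use, provides more than 70 common date format templates, supports both the Java 8 time classes and Date, and is lightweight with no third-party dependencies.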
A text tagger based on Lucene / Solr, using FST technology
The pluggable natural language linter for text and markdown.
SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, without loss of accuracy, as tested on many tasks.
Multi rake
Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI)
Fast, Consistent Tokenization of Natural Language Text
Library to scrape and clean web pages to create massive datasets.
A "WYSIWYG" (sort of) scripting language and runtime for browser automation
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
PYthon Automated Term Extraction
Ai Hackathon 2018
"Go beyond the limits and challenge your imagination!" Naver AI Hackathon 2018 (competition ended)
Nlp bahasa resources
A Curated List of Datasets and Usable Library Resources for NLP in Bahasa Indonesia
A painless way to pick future time.
Ruijin round2
Keras Xlnet
Implementation of XLNet that can load pretrained checkpoints
Python library for Natural Language Preprocessing (NLPre)
Implementation of Very Deep Convolutional Neural Network for Text Classification
Parser for Attempto Controlled English (ACE)
Awesome Nlp
📖 A curated list of resources dedicated to Natural Language Processing (NLP)
Linguistic Annotation and Visualization Tool for PDF Documents
Speech signal processing and classification
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. These, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian mixture model classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
Links to Russian corpora + Python functions for loading and parsing
Neural Paraphrase Generation
Neural Paraphrase Generation
Lemmatization Lists
Machine-readable lists of lemma-token pairs in 23 languages.
text2vec: Chinese text to vector. A text vectorization tool covering word vectors, sentence vectors, and sentence similarity computation.
A Hierarchical Latent Structure For Variational Conversation Modeling
PyTorch Implementation of "A Hierarchical Latent Structure for Variational Conversation Modeling" (NAACL 2018 Oral)
Natural Language Processing Specialization
This repo contains my coursework, assignments, and slides for the Natural Language Processing Specialization on Coursera