Top 1910 NLP open source projects

Syfertext
A privacy preserving NLP framework
Open Sesame
A frame-semantic parsing system based on a softmax-margin SegRNN.
Pytorch Acnn Model
Code for Relation Classification via Multi-Level Attention CNNs
Ernie
Simple State-of-the-Art BERT-Based Sentence Classification with Keras / TensorFlow 2. Built with HuggingFace's Transformers.
Macbert
Revisiting Pre-trained Models for Chinese Natural Language Processing (Findings of EMNLP)
Jiagu
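Jiagu, a deep learning NLP toolkit: knowledge graph relation extraction, Chinese word segmentation, POS tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, and text clustering.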
Unify Emotion Datasets
A Survey and Experiments on Annotated Corpora for Emotion Classification in Text
Rat Sql
A relation-aware semantic parsing model from English to SQL
Textvec
Text vectorization tool to outperform TFIDF for classification tasks
Jiayan
Jiayan (甲言), an NLP toolkit designed for Classical Chinese. It supports lexicon construction, tokenization, POS tagging, sentence segmentation, and punctuation.
✭ 167
python, nlp
Radish
C++ framework for model training and inference
Rouge 2.0
ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.
Node Postal
NodeJS bindings to libpostal for fast international address parsing/normalization
Indic Bert
BERT-based Multilingual Model for Indian Languages
Turkish Stemmer Python
🐍 Turkish Language Stemmer for Python
Improved Dynamic Memory Networks Dmn Plus
Theano Implementation of DMN+ (Improved Dynamic Memory Networks) from the paper by Xiong, Merity, & Socher at MetaMind, http://arxiv.org/abs/1603.01417 (Dynamic Memory Networks for Visual and Textual Question Answering)
Fixy
Our goal is to build an open source spelling assistant/checker that tackles many different problems in the Turkish NLP literature together, proposes novel approaches, and fills the gaps left by existing work. It aims to correct spelling mistakes in users' text with a deep learning approach while also performing semantic analysis of the text, so that errors arising in that context can be detected and corrected as well.
Metalearning4nlp Papers
A list of recent papers about Meta / few-shot learning methods applied in NLP areas.
Prosodic
Prosodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
Chatspace
A word-spacing model made by Pingpong that works well with chat-style text!
Negspacy
spaCy pipeline object for negating concepts in text
Xk Time
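xk-time is a tool for time conversion, time calculation, time formatting, time parsing, calendars, cron time expressions, time NLP, and more. It uses Java 8, is thread-safe and easy to use, offers more than 70 common date-formatting templates, supports the Java 8 time classes and Date, and is lightweight with no third-party dependencies.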
Solrtexttagger
A text tagger based on Lucene / Solr, using FST technology
Textlint
The pluggable natural language linter for text and markdown.
Sru
SRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, without loss of accuracy tested on many tasks.
Multi rake
Multilingual Rapid Automatic Keyword Extraction (RAKE) for Python
Denspi
Real-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI)
Tokenizers
Fast, Consistent Tokenization of Natural Language Text
Lazynlp
Library to scrape and clean web pages to create massive datasets.
Kasaya
A "WYSIWYG" (sort of) scripting language and runtime for browser automation
Udpipe
R package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
Pyate
PYthon Automated Term Extraction
✭ 161
html, nlp, ai
Ai Hackathon 2018
"Go beyond the limits and challenge your imagination!" NAVER AI Hackathon 2018 (competition ended)
Nlp bahasa resources
A Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Datetimeseer
A painless way to pick future time.
Ruijin round2
Ruijin Hospital MMC competition on AI-assisted knowledge graph construction, second round
Keras Xlnet
Implementation of XLNet that can load pretrained checkpoints
Nlpre
Python library for Natural Language Preprocessing (NLPre)
Vdcnn
Implementation of Very Deep Convolutional Neural Network for Text Classification
Ape
Parser for Attempto Controlled English (ACE)
✭ 156
prolog, nlp, ace
Awesome Nlp
📖 A curated list of resources dedicated to Natural Language Processing (NLP)
Pdfanno
Linguistic Annotation and Visualization Tool for PDF Documents
Speech signal processing and classification
Front-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian mixture model classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as deep neural networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
Corus
Links to Russian corpora + Python functions for loading and parsing
Neural Paraphrase Generation
Neural Paraphrase Generation
Lemmatization Lists
Machine-readable lists of lemma-token pairs in 23 languages.
✭ 154
nlp
Text2vec
text2vec: Chinese text to vector. A text vectorization tool covering word vectorization, sentence vectorization, and sentence similarity computation.
A Hierarchical Latent Structure For Variational Conversation Modeling
PyTorch Implementation of "A Hierarchical Latent Structure for Variational Conversation Modeling" (NAACL 2018 Oral)
Natural Language Processing Specialization
This repo contains my coursework, assignments, and slides for the Natural Language Processing Specialization by deeplearning.ai on Coursera.