Open SesameA frame-semantic parsing system based on a softmax-margin SegRNN.
ErnieSimple State-of-the-Art BERT-Based Sentence Classification with Keras / TensorFlow 2. Built with HuggingFace's Transformers.
MacbertRevisiting Pre-trained Models for Chinese Natural Language Processing (Findings of EMNLP)
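JiaguJiagu, a deep learning natural language processing toolkit: knowledge graph relation extraction, Chinese word segmentation, part-of-speech tagging, named entity recognition, sentiment analysis, new word discovery, keyword extraction, text summarization, text clustering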
Rat SqlA relation-aware semantic parsing model from English to SQL
TextvecText vectorization tool to outperform TFIDF for classification tasks
JiayanJiayan (甲言), an NLP toolkit designed for Classical Chinese. It supports lexicon construction, tokenization, POS tagging, sentence segmentation, and punctuation.
RadishC++ model training and inference framework
Rouge 2.0ROUGE automatic summarization evaluation toolkit. Support for ROUGE-[N, L, S, SU], stemming and stopwords in different languages, unicode text evaluation, CSV output.
Node PostalNodeJS bindings to libpostal for fast international address parsing/normalization
Indic BertBERT-based Multilingual Model for Indian Languages
Improved Dynamic Memory Networks Dmn PlusTheano Implementation of DMN+ (Improved Dynamic Memory Networks) from the paper by Xiong, Merity, & Socher at MetaMind, http://arxiv.org/abs/1603.01417 (Dynamic Memory Networks for Visual and Textual Question Answering)
FixyOur goal is to build an open-source spelling assistant/checker that can solve many different problems in the Turkish NLP literature together, proposes unique approaches, and addresses the shortcomings of existing work. It aims to fix spelling errors in users' texts with a deep learning approach and, by also performing semantic analysis of the texts, to detect and correct the errors that arise in that context.
ProsodicProsodic: a metrical-phonological parser, written in Python. For English and Finnish, with flexible language support.
NegspacyspaCy pipeline object for negating concepts in text
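Xk Timexk-time is a tool for time conversion, time calculation, time formatting, time parsing, calendars, cron expressions, and time NLP. It uses Java 8, is thread-safe and easy to use, provides more than 70 common date formatting templates, supports the Java 8 time classes and Date, and is lightweight with no third-party dependencies.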
SolrtexttaggerA text tagger based on Lucene / Solr, using FST technology
TextlintThe pluggable natural language linter for text and markdown.
SruSRU is a recurrent unit that can run over 10 times faster than cuDNN LSTM, without loss of accuracy tested on many tasks.
Multi rakeMultilingual Rapid Automatic Keyword Extraction (RAKE) for Python
DenspiReal-Time Open-Domain Question Answering with Dense-Sparse Phrase Index (DenSPI)
TokenizersFast, Consistent Tokenization of Natural Language Text
LazynlpLibrary to scrape and clean web pages to create massive datasets.
KasayaA "WYSIWYG" (sort of) scripting language and runtime for browser automation
UnilmLarge-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
UdpipeR package for Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing Based on the UDPipe Natural Language Processing Toolkit
PyatePYthon Automated Term Extraction
Nlp bahasa resourcesA Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Keras XlnetImplementation of XLNet that can load pretrained checkpoints
Pytorch NlpBasic Utilities for PyTorch Natural Language Processing (NLP)
NlprePython library for Natural Language Preprocessing (NLPre)
VdcnnImplementation of Very Deep Convolutional Neural Network for Text Classification
ApeParser for Attempto Controlled English (ACE)
GensimTopic Modelling for Humans
Awesome Nlp📖 A curated list of resources dedicated to Natural Language Processing (NLP)
Awesome Pytorch ListA comprehensive list of PyTorch-related content on GitHub, such as different models, implementations, helper libraries, tutorials, etc.
SlingSLING - A natural language frame semantics parser
PdfannoLinguistic Annotation and Visualization Tool for PDF Documents
Speech signal processing and classificationFront-end speech processing aims at extracting proper features from short-term segments of a speech utterance, known as frames. It is a prerequisite step toward any pattern recognition problem employing speech or audio (e.g., music). Here, we are interested in voice disorder classification, that is, in developing two-class classifiers that can discriminate between utterances of a subject suffering from, say, vocal fold paralysis and utterances of a healthy subject. The mathematical modeling of the speech production system in humans suggests that an all-pole system function is justified [1-3]. As a consequence, linear prediction coefficients (LPCs) constitute a first choice for modeling the magnitude of the short-term spectrum of speech. LPC-derived cepstral coefficients are guaranteed to discriminate between the system (e.g., vocal tract) contribution and that of the excitation. Taking into account the characteristics of the human ear, the mel-frequency cepstral coefficients (MFCCs) emerged as descriptive features of the speech spectral envelope. Similarly to MFCCs, the perceptual linear prediction coefficients (PLPs) could also be derived. The aforementioned, so to speak, traditional features will be tested against agnostic features extracted by convolutional neural networks (CNNs) (e.g., auto-encoders) [4]. The pattern recognition step will be based on Gaussian Mixture Model based classifiers, K-nearest neighbor classifiers, Bayes classifiers, as well as Deep Neural Networks. The Massachusetts Eye and Ear Infirmary Dataset (MEEI-Dataset) [5] will be exploited. At the application level, a library for feature extraction and classification in Python will be developed. Credible publicly available resources, such as KALDI, will be used toward achieving our goal. Comparisons will be made against [6-8].
CorusLinks to Russian corpora + Python functions for loading and parsing
Text2vectext2vec, Chinese text to vector. (A text vectorization tool, including word vectorization, sentence vectorization, and sentence similarity computation.)