Nlpaug - Data augmentation for NLP
Electra - ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
Vmf vae nlp - Code for the EMNLP18 paper "Spherical Latent Spaces for Stable Variational Autoencoders"
Ie Survey - A survey of the information extraction field by Zhang Richong's research team at Beihang University's Advanced Innovation Center for Big Data. It covers subtasks including entity recognition, relation extraction, and attribute extraction, and surveys both academia and industry for each subtask.
Segmentit - A Chinese word segmentation package usable in any JS environment, forked from leizongmin/node-segment
Parselawdocuments - Runs a series of analyses on collected legal documents, including automatic segmentation according to a specification, case similarity computation, case clustering, and recommendation of legal provisions (experiments are currently based on marriage cases and can be extended to other domains).
Kaggle Crowdflower - 1st Place Solution for CrowdFlower Product Search Results Relevance Competition on Kaggle.
Mnemonicreader - A PyTorch implementation of Mnemonic Reader for the Machine Comprehension task
Deeplearningfornlpinpytorch - An IPython Notebook tutorial on deep learning for natural language processing, including structure prediction.
Medcat - Medical Concept Annotation Tool
Finbert - BERT for Finance: UC Berkeley MIDS w266 Final Project
Jprocessing - Japanese Natural Language Processing Libraries
Rasa - 💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Tmtoolkit - Text Mining and Topic Modeling Toolkit for Python with parallel processing power
Question Answering - TensorFlow implementation of Match-LSTM and Answer pointer for the popular SQuAD dataset.
Uda - Unsupervised Data Augmentation (UDA)
Snowball - Implementation, with some extensions, of the paper "Snowball: Extracting Relations from Large Plain-Text Collections" (Agichtein and Gravano, 2000)
Awesome Bert - BERT NLP papers, applications, and GitHub resources, including the newest XLNet (papers and GitHub projects related to BERT and XLNet)
Nlp estimator tutorial - Educational material on using the TensorFlow Estimator framework for text classification
Prenlp - Preprocessing Library for Natural Language Processing
Konoha - 🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
Kaggle Quora Dup - Solution to Kaggle's Quora Duplicate Question Detection Competition
Id Cnn Cws - Source code and corpora for the paper "Iterated Dilated Convolutions for Chinese Word Segmentation"
Abstractive Summarization - Implementation of abstractive summarization using LSTM in the encoder-decoder architecture with local attention.
Ml Projects - ML-based projects such as Spam Classification, Time Series Analysis, and Text Classification using Random Forest, Deep Learning, Bayesian, and XGBoost in Python
Bnlp - BNLP is a natural language processing toolkit for the Bengali language.
Rdrpostagger - A fast and accurate POS and morphological tagging toolkit (EACL 2014)
Neuro - 🔮 Neuro.js is a machine learning library for building AI assistants and chatbots (WIP).
Fusionnet Nli - An example of applying FusionNet to Natural Language Inference
Hash Embeddings - PyTorch implementation of Hash Embeddings (NIPS 2017). Submission to the NIPS Implementation Challenge.
Matilda - LIDA: Lightweight Interactive Dialogue Annotator (in EMNLP 2019)
Echo - Python package containing all custom layers used in Neural Networks (Compatible with PyTorch, TensorFlow and MegEngine)
Fugashi - A Cython MeCab wrapper for fast, pythonic Japanese tokenization and morphological analysis.
Camel tools - A suite of Arabic natural language processing tools developed by the CAMeL Lab at New York University Abu Dhabi.
Cutlet - Japanese to romaji converter in Python
Spacy Js - 🎀 JavaScript API for spaCy with Python REST API
Syntok - Text tokenization and sentence segmentation (segtok v2)
Stog - AMR Parsing as Sequence-to-Graph Transduction
Files2rouge - Calculating ROUGE score between two files (line-by-line)