Ua GecUA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Stars: ✭ 108 (-22.3%)
CoarijCorpus of Annual Reports in Japan
Stars: ✭ 55 (-60.43%)
Nlp bahasa resourcesA Curated List of Dataset and Usable Library Resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (+13.67%)
FakenewscorpusA dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (+83.45%)
Dataset ListLists of text corpora and more (mainly Japanese)
Stars: ✭ 84 (-39.57%)
Pytorch NlpBasic Utilities for PyTorch Natural Language Processing (NLP)
Stars: ✭ 1,996 (+1335.97%)
Text2sql DataA collection of datasets that pair questions with SQL queries.
Stars: ✭ 287 (+106.47%)
GectorOfficial implementation of the paper “GECToR – Grammatical Error Correction: Tag, Not Rewrite” // Published at the BEA15 Workshop (co-located with ACL 2020) https://www.aclweb.org/anthology/2020.bea-1.16.pdf
Stars: ✭ 287 (+106.47%)
DoccanoOpen source annotation tool for machine learning practitioners.
Stars: ✭ 5,600 (+3928.78%)
NcrfppNCRF++, a Neural Sequence Labeling Toolkit. Easy to apply to any sequence labeling task (e.g. NER, POS, Segmentation). It includes character LSTM/CNN, word LSTM/CNN and softmax/CRF components.
Stars: ✭ 1,767 (+1171.22%)
BondBOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
Stars: ✭ 96 (-30.94%)
Pytreebank😡😇 Stanford Sentiment Treebank loader in Python
Stars: ✭ 93 (-33.09%)
Mams For AbsaA Multi-Aspect Multi-Sentiment Dataset for aspect-based sentiment analysis.
Stars: ✭ 135 (-2.88%)
Efaqa Corpus Zh❤️Emotional First Aid Dataset: a psychological counseling Q&A and chatbot corpus
Stars: ✭ 170 (+22.3%)
Oie ResourcesA curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (+103.6%)
PororoPORORO: Platform Of neuRal mOdels for natuRal language prOcessing
Stars: ✭ 812 (+484.17%)
Dialog corpusDatasets for training Chinese and English chatbot (dialogue) systems
Stars: ✭ 1,662 (+1095.68%)
Jsut LabHTS-style full-context labels for JSUT v1.1
Stars: ✭ 28 (-79.86%)
MtntCode for the collection and analysis of the MTNT dataset
Stars: ✭ 48 (-65.47%)
Char Rnn TensorflowMulti-layer Recurrent Neural Networks for character-level language models, implemented in TensorFlow
Stars: ✭ 58 (-58.27%)
Typing AssistantTyping Assistant provides the ability to autocomplete words and suggests predictions for the next word. This makes typing faster, more intelligent and reduces effort.
Stars: ✭ 32 (-76.98%)
ClueChinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+1644.6%)
NlvrCornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
Stars: ✭ 192 (+38.13%)
WikisqlA large annotated semantic parsing corpus for developing natural language interfaces.
Stars: ✭ 965 (+594.24%)
NeuronblocksNLP DNN Toolkit - Building Your NLP DNN Models Like Playing Lego
Stars: ✭ 1,356 (+875.54%)
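Chinese Names CorpusChinese personal name corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Useful for Chinese word segmentation and person-name entity recognition.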
Stars: ✭ 3,053 (+2096.4%)
ChazutsuThe tool to make NLP datasets ready to use
Stars: ✭ 238 (+71.22%)
AnagoBidirectional LSTM-CRF and ELMo for Named-Entity Recognition, Part-of-Speech Tagging and so on.
Stars: ✭ 1,392 (+901.44%)
Spokestack PythonSpokestack is a library that allows a user to easily incorporate a voice interface into any Python application.
Stars: ✭ 103 (-25.9%)
Awesome Persian Nlp IrCurated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+230.94%)
Neuronlp2Deep neural models for core NLP tasks (Pytorch version)
Stars: ✭ 397 (+185.61%)
Nlp chinese corpusLarge Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+4688.49%)
Cluener2020CLUENER2020: Fine-Grained Named Entity Recognition for Chinese
Stars: ✭ 689 (+395.68%)
QuantedaAn R package for the Quantitative Analysis of Textual Data
Stars: ✭ 647 (+365.47%)
SeqevalA Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)
Stars: ✭ 508 (+265.47%)
Ja.text8Japanese text8 corpus for word embedding.
Stars: ✭ 79 (-43.17%)
FlairA very simple framework for state-of-the-art Natural Language Processing (NLP)
Stars: ✭ 11,065 (+7860.43%)
PrenlpPreprocessing Library for Natural Language Processing
Stars: ✭ 130 (-6.47%)
Sluice NetworksCode for Sluice networks: Learning what to share between loosely related tasks
Stars: ✭ 135 (-2.88%)
TextacyNLP, before and after spaCy
Stars: ✭ 1,849 (+1230.22%)
Legacy straightA vocoder framework that has been widely used in the research community since 1999.
Stars: ✭ 130 (-6.47%)
Datasets🎁 3,000,000+ Unsplash images made available for research and machine learning
Stars: ✭ 1,805 (+1198.56%)
Rasa💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Stars: ✭ 13,219 (+9410.07%)
Tvqa[EMNLP 2018] PyTorch code for TVQA: Localized, Compositional Video Question Answering
Stars: ✭ 130 (-6.47%)
Konoha🌿 An easy-to-use Japanese Text Processing tool, which makes it possible to switch tokenizers with small changes of code.
Stars: ✭ 130 (-6.47%)
Chars2vecCharacter-based word embeddings model based on RNN for handling real world texts
Stars: ✭ 130 (-6.47%)
Hpatches BenchmarkPython & Matlab code for local feature descriptor evaluation with the HPatches dataset.
Stars: ✭ 129 (-7.19%)
Kaggle Crowdflower1st Place Solution for CrowdFlower Product Search Results Relevance Competition on Kaggle.
Stars: ✭ 1,708 (+1128.78%)