Prosody
Helsinki Prosody Corpus and a System for Predicting Prosodic Prominence from Text
Stars: ✭ 139 (+152.73%)
Ua Gec
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language
Stars: ✭ 108 (+96.36%)
Nlp bahasa resources
A Curated List of Datasets and Usable Library Resources for NLP in Bahasa Indonesia
Stars: ✭ 158 (+187.27%)
Fakenewscorpus
A dataset of millions of news articles scraped from a curated list of data sources.
Stars: ✭ 255 (+363.64%)
Text2sql Data
A collection of datasets that pair questions with SQL queries.
Stars: ✭ 287 (+421.82%)
Char Rnn Tensorflow
Multi-layer Recurrent Neural Networks for character-level language models, implemented in TensorFlow
Stars: ✭ 58 (+5.45%)
Dataset List
Lists of text corpora and more (mainly Japanese)
Stars: ✭ 84 (+52.73%)
Dialog corpus
Corpus for training Chinese and English dialogue systems (datasets for training chatbot systems)
Stars: ✭ 1,662 (+2921.82%)
Mams For Absa
A Multi-Aspect Multi-Sentiment Dataset for aspect-based sentiment analysis.
Stars: ✭ 135 (+145.45%)
Pytreebank
😡😇 Stanford Sentiment Treebank loader in Python
Stars: ✭ 93 (+69.09%)
Ja.text8
Japanese text8 corpus for word embedding.
Stars: ✭ 79 (+43.64%)
Efaqa Corpus Zh
❤️ Emotional First Aid Dataset: a psychological counseling Q&A and chatbot corpus
Stars: ✭ 170 (+209.09%)
Bond
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
Stars: ✭ 96 (+74.55%)
Nlvr
Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.
Stars: ✭ 192 (+249.09%)
Pytorch Nlp
Basic Utilities for PyTorch Natural Language Processing (NLP)
Stars: ✭ 1,996 (+3529.09%)
Pandas Datareader
Extract data from a wide range of Internet sources into a pandas DataFrame.
Stars: ✭ 2,183 (+3869.09%)
Chazutsu
A tool to make NLP datasets ready to use
Stars: ✭ 238 (+332.73%)
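Chinese Names Corpus
Chinese personal names corpus and name generator. Covers Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, and English names. Usable for Chinese word segmentation and person-name entity recognition.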
Stars: ✭ 3,053 (+5450.91%)
Oie Resources
A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.
Stars: ✭ 283 (+414.55%)
Awesome Persian Nlp Ir
Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources
Stars: ✭ 460 (+736.36%)
Mtnt
Code for the collection and analysis of the MTNT dataset
Stars: ✭ 48 (-12.73%)
Quanteda
An R package for the Quantitative Analysis of Textual Data
Stars: ✭ 647 (+1076.36%)
Clue
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Stars: ✭ 2,425 (+4309.09%)
Nlp chinese corpus
Large Scale Chinese Corpus for NLP
Stars: ✭ 6,656 (+12001.82%)
Wikisql
A large annotated semantic parsing corpus for developing natural language interfaces.
Stars: ✭ 965 (+1654.55%)
Awesome Financial Nlp
Research on Natural Language Processing for the Financial Domain
Stars: ✭ 220 (+300%)
Doccano
Open source annotation tool for machine learning practitioners.
Stars: ✭ 5,600 (+10081.82%)
Typing Assistant
Typing Assistant autocompletes words and suggests predictions for the next word, making typing faster, smarter, and less effortful.
Stars: ✭ 32 (-41.82%)
Market Reporter
Automatic Generation of Brief Summaries of Time-Series Data
Stars: ✭ 54 (-1.82%)
Codar
✅ CODAR is a framework built with PyTorch to analyze posts (text and media) and predict cyberbullying and offensive content. 💬📷
Stars: ✭ 52 (-5.45%)
Cdqa Annotator
⛔ [NOT MAINTAINED] A web-based annotator for closed-domain question answering datasets in SQuAD format.
Stars: ✭ 48 (-12.73%)
Multidigitmnist
Combine multiple MNIST digits to create datasets with 100/1000 classes for few-shot learning/meta-learning
Stars: ✭ 48 (-12.73%)
Nltk Book Resource
Notes and solutions to complement the official NLTK book
Stars: ✭ 54 (-1.82%)
Covid 19
Novel Coronavirus 2019 time series data on cases
Stars: ✭ 1,060 (+1827.27%)
Spacy Lookups Data
📂 Additional lookup tables and data resources for spaCy
Stars: ✭ 48 (-12.73%)
Paymint
The Paymint Wallet is a secure and user-friendly Bitcoin wallet
Stars: ✭ 48 (-12.73%)
Iob2corpus
Japanese IOB2 tagged corpus for Named Entity Recognition.
Stars: ✭ 51 (-7.27%)
Stocksight
Stock market analyzer and predictor using Elasticsearch, Twitter, news headlines, and Python natural language processing and sentiment analysis
Stars: ✭ 1,037 (+1785.45%)
Greynir
The greynir.is natural language processing website for Icelandic
Stars: ✭ 47 (-14.55%)
Scdv
Text classification with Sparse Composite Document Vectors.
Stars: ✭ 54 (-1.82%)
Thot
Thot toolkit for statistical machine translation
Stars: ✭ 53 (-3.64%)
Images Web Crawler
This package is a complete tool for creating a large dataset of images (designed especially, but not only, for machine learning enthusiasts). It can crawl the web, download images, rename, resize, and convert the images, and merge folders.
Stars: ✭ 51 (-7.27%)
Pujangga
Pujangga - Indonesian Natural Language Processing Tool with REST API, an Interface for InaNLP and Deeplearning4j's Word2Vec
Stars: ✭ 47 (-14.55%)
Exemplar
An open relation extraction system
Stars: ✭ 46 (-16.36%)
Msgarch
MSGARCH R Package
Stars: ✭ 51 (-7.27%)
Py Nltools
A collection of basic Python modules for spoken natural language processing
Stars: ✭ 46 (-16.36%)
Finance.js
A JavaScript library for common financial calculations
Stars: ✭ 1,070 (+1845.45%)
Lingua Franca
Mycroft's multilingual text parsing and formatting library
Stars: ✭ 51 (-7.27%)