Cornell NLVR and NLVR2 are natural language grounding datasets. Each example shows a visual input and a sentence describing it, and is annotated with the truth-value of the sentence.

Stars: ✭ 192 (+249.09%)

Mutual labels: corpus, natural-language-processing

Pytorch Nlp

Basic Utilities for PyTorch Natural Language Processing (NLP)

Stars: ✭ 1,996 (+3529.09%)

Mutual labels: dataset, natural-language-processing

Pandas Datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.

Stars: ✭ 2,183 (+3869.09%)

Mutual labels: dataset, finance

Chazutsu

The tool to make NLP datasets ready to use

Stars: ✭ 238 (+332.73%)

Mutual labels: dataset, natural-language-processing

Medical-Names-Corpus

Medical corpus. Medical institution name corpus. Drug standard codes.

Stars: ✭ 26 (-52.73%)

Mutual labels: corpus, dataset

Chinese Names Corpus

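Chinese personal name corpus. Name generator. Chinese full names, surnames, given names, forms of address, Japanese names, transliterated names, English names. Useful for Chinese word segmentation and person-name entity recognition.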

Stars: ✭ 3,053 (+5450.91%)

Mutual labels: dataset, corpus

Oie Resources

A curated list of Open Information Extraction (OIE) resources: papers, code, data, etc.

Stars: ✭ 283 (+414.55%)

Mutual labels: dataset, natural-language-processing

Awesome Persian Nlp Ir

Curated List of Persian Natural Language Processing and Information Retrieval Tools and Resources

Stars: ✭ 460 (+736.36%)

Mutual labels: corpus, natural-language-processing

Mtnt

Code for the collection and analysis of the MTNT dataset

Stars: ✭ 48 (-12.73%)

Mutual labels: dataset, natural-language-processing

Quanteda

An R package for the Quantitative Analysis of Textual Data

Stars: ✭ 647 (+1076.36%)

Mutual labels: corpus, natural-language-processing

Clue

Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard

Stars: ✭ 2,425 (+4309.09%)

Mutual labels: dataset, corpus

Nlp chinese corpus

Large Scale Chinese Corpus for NLP

Stars: ✭ 6,656 (+12001.82%)

Mutual labels: dataset, corpus

Company Names Corpus

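Company name corpus. Organization name corpus. Company short names, abbreviations, brand terms, enterprise names. Useful for Chinese word segmentation and organization-name entity recognition.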

Stars: ✭ 868 (+1478.18%)

Mutual labels: dataset, corpus

Wikisql

A large annotated semantic parsing corpus for developing natural language interfaces.

Stars: ✭ 965 (+1654.55%)

Mutual labels: dataset, natural-language-processing

Awesome Financial Nlp

Research on Natural Language Processing for the Financial Domain

Stars: ✭ 220 (+300%)

Mutual labels: finance, natural-language-processing

Doccano

Open source annotation tool for machine learning practitioners.

Stars: ✭ 5,600 (+10081.82%)

Mutual labels: dataset, natural-language-processing

Gossiping Chinese Corpus

Chinese question-answer corpus from the PTT Gossiping board

Stars: ✭ 137 (+149.09%)

Mutual labels: dataset, corpus

Weixin public corpus

WeChat official accounts corpus

Stars: ✭ 465 (+745.45%)

Mutual labels: corpus, natural-language-processing

Typing Assistant

Typing Assistant autocompletes words and suggests predictions for the next word. This makes typing faster and more intelligent, and reduces effort.

Stars: ✭ 32 (-41.82%)

Mutual labels: corpus, natural-language-processing

Market Reporter

Automatic Generation of Brief Summaries of Time-Series Data

Stars: ✭ 54 (-1.82%)

Mutual labels: finance, natural-language-processing

Codar

✅ CODAR is a framework built using PyTorch to analyze posts (text and media) and predict cyberbullying and offensive content. 💬📷

Stars: ✭ 52 (-5.45%)

Mutual labels: dataset

Cdqa Annotator

⛔ [NOT MAINTAINED] A web-based annotator for closed-domain question answering datasets with SQuAD format.

Stars: ✭ 48 (-12.73%)

Mutual labels: natural-language-processing

Multidigitmnist

Combine multiple MNIST digits to create datasets with 100/1000 classes for few-shot learning/meta-learning

Stars: ✭ 48 (-12.73%)

Mutual labels: dataset

Nltk Book Resource

Notes and solutions to complement the official NLTK book

Stars: ✭ 54 (-1.82%)

Mutual labels: natural-language-processing

Covid 19

Novel Coronavirus 2019 time series data on cases

Stars: ✭ 1,060 (+1827.27%)

Mutual labels: dataset

Spacy Lookups Data

📂 Additional lookup tables and data resources for spaCy

Stars: ✭ 48 (-12.73%)

Mutual labels: natural-language-processing

Paymint

The Paymint Wallet is a secure and user friendly Bitcoin wallet

Stars: ✭ 48 (-12.73%)

Mutual labels: finance

Iob2corpus

Japanese IOB2 tagged corpus for Named Entity Recognition.

Stars: ✭ 51 (-7.27%)

Mutual labels: natural-language-processing

Stocksight

Stock market analyzer and predictor using Elasticsearch, Twitter, News headlines and Python natural language processing and sentiment analysis

Stars: ✭ 1,037 (+1785.45%)

Mutual labels: natural-language-processing

Greynir

The greynir.is natural language processing website for Icelandic

Stars: ✭ 47 (-14.55%)

Mutual labels: natural-language-processing

Scdv

Text classification with Sparse Composite Document Vectors.

Stars: ✭ 54 (-1.82%)

Mutual labels: natural-language-processing

Thot

Thot toolkit for statistical machine translation

Stars: ✭ 53 (-3.64%)

Mutual labels: natural-language-processing

Images Web Crawler

This package is a complete tool for creating a large dataset of images (designed especially, but not only, for machine learning enthusiasts). It can crawl the web, download images, rename / resize / convert the images and merge folders.

Stars: ✭ 51 (-7.27%)

Mutual labels: dataset

Pujangga

Pujangga - Indonesian Natural Language Processing Tool with REST API, an Interface for InaNLP and Deeplearning4j's Word2Vec

Stars: ✭ 47 (-14.55%)

Mutual labels: natural-language-processing

Exemplar

An open relation extraction system

Stars: ✭ 46 (-16.36%)

Mutual labels: natural-language-processing

Msgarch

MSGARCH R Package