
explosion / ml-datasets

License: MIT
🌊 Machine learning dataset loaders for testing and example scripts

Programming Languages

Python

Projects that are alternatives of or similar to ml-datasets

cifair
A duplicate-free variant of the CIFAR test set.
Stars: ✭ 13 (-67.5%)
Mutual labels:  datasets, machine-learning-datasets
Projects
πŸͺ End-to-end NLP workflows from prototype to production
Stars: ✭ 397 (+892.5%)
Mutual labels:  spacy, datasets
spacy-server
🦜 Containerized HTTP API for industrial-strength NLP via spaCy and sense2vec
Stars: ✭ 58 (+45%)
Mutual labels:  spacy
parlitools
A collection of useful tools for UK politics
Stars: ✭ 22 (-45%)
Mutual labels:  datasets
time-series-classification
Classifying time series using feature extraction
Stars: ✭ 75 (+87.5%)
Mutual labels:  datasets
json2python-models
Generate Python model classes (pydantic, attrs, dataclasses) based on JSON datasets with typing module support
Stars: ✭ 119 (+197.5%)
Mutual labels:  datasets
spacy-french-models
French models for spacy
Stars: ✭ 22 (-45%)
Mutual labels:  spacy
dw-jdbc
JDBC driver for data.world
Stars: ✭ 17 (-57.5%)
Mutual labels:  datasets
datasets
πŸ€— The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools
Stars: ✭ 13,870 (+34575%)
Mutual labels:  datasets
ake-datasets
Large, curated set of benchmark datasets for evaluating automatic keyphrase extraction algorithms.
Stars: ✭ 125 (+212.5%)
Mutual labels:  datasets
Dataset-Sentimen-Analisis-Bahasa-Indonesia
This repository is a collection of datasets for Indonesian-language sentiment analysis. If you use the datasets in this repository for research, please cite the journal article associated with each dataset. The available datasets have been used in several studies whose results have been published…
Stars: ✭ 38 (-5%)
Mutual labels:  datasets
datasets
The primary repository for all of the CORGIS Datasets
Stars: ✭ 19 (-52.5%)
Mutual labels:  datasets
airy
πŸ’¬ Open source conversational platform to power conversations with an open source Live Chat, Messengers like Facebook Messenger, WhatsApp and more - πŸ’Ž UI from Inbox to dashboards - πŸ€– Integrations to Conversational AI / NLP tools and standard enterprise software - ⚑ APIs, WebSocket, Webhook - πŸ”§ Create any conversational experience
Stars: ✭ 299 (+647.5%)
Mutual labels:  spacy
spacy hunspell
✏️ Hunspell extension for spaCy 2.0.
Stars: ✭ 94 (+135%)
Mutual labels:  spacy
multi-task-defocus-deblurring-dual-pixel-nimat
Reference github repository for the paper "Improving Single-Image Defocus Deblurring: How Dual-Pixel Images Help Through Multi-Task Learning". We propose a single-image deblurring network that incorporates the two sub-aperture views into a multitask framework. Specifically, we show that jointly learning to predict the two DP views from a single …
Stars: ✭ 29 (-27.5%)
Mutual labels:  datasets
DaCy
DaCy: The State of the Art Danish NLP pipeline using SpaCy
Stars: ✭ 66 (+65%)
Mutual labels:  spacy
alter-nlu
Natural language understanding library for chatbots with intent recognition and entity extraction.
Stars: ✭ 45 (+12.5%)
Mutual labels:  spacy
agile
🌌 Global State and Logic Library for JavaScript/Typescript applications
Stars: ✭ 90 (+125%)
Mutual labels:  spacy
bert-tensorflow-pytorch-spacy-conversion
Instructions for how to convert a BERT Tensorflow model to work with HuggingFace's pytorch-transformers, and spaCy. This walk-through uses DeepPavlov's RuBERT as an example.
Stars: ✭ 26 (-35%)
Mutual labels:  spacy
dataset
dataset is a command line tool, Go package, shared library and Python package for working with JSON objects as collections
Stars: ✭ 21 (-47.5%)
Mutual labels:  datasets

Machine learning dataset loaders for testing and examples

Loaders for various machine learning datasets for testing and example scripts. Previously in thinc.extra.datasets.

Setup and installation

The package can be installed via pip:

pip install ml-datasets

Loaders

Loaders can be imported directly or used via their string name (which is useful if they're set via command line arguments). Some loaders may take arguments – see the source for details.

# Import directly
from ml_datasets import imdb
train_data, dev_data = imdb()

# Load via registry
from ml_datasets import loaders
imdb_loader = loaders.get("imdb")
train_data, dev_data = imdb_loader()
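
Because loaders can be looked up by their string name, the dataset choice can be wired to a script's arguments. A minimal sketch (the --dataset flag and argparse wiring are illustrative, not part of the package):

# Hypothetical script: pick a loader by its string name (e.g. "imdb", "dbpedia")
import argparse
from ml_datasets import loaders

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", default="imdb")
args = parser.parse_args()

loader = loaders.get(args.dataset)  # look up the loader in the registry
train_data, dev_data = loader()     # call it just like the imported function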

Available loaders

NLP datasets

ID / Function      | Description                                  | NLP task                                   | From URL
imdb               | IMDB sentiment dataset                       | Binary classification: sentiment analysis  | βœ“
dbpedia            | DBPedia ontology dataset                     | Multi-class single-label classification    | βœ“
cmu                | CMU movie genres dataset                     | Multi-class, multi-label classification    | βœ“
quora_questions    | Duplicate Quora questions dataset            | Detecting duplicate questions              | βœ“
reuters            | Reuters dataset (texts not included)         | Multi-class multi-label classification     | βœ“
snli               | Stanford Natural Language Inference corpus   | Recognizing textual entailment             | βœ“
stack_exchange     | Stack Exchange dataset                       | Question answering                         |
ud_ancora_pos_tags | Universal Dependencies Spanish AnCora corpus | POS tagging                                | βœ“
ud_ewtb_pos_tags   | Universal Dependencies English EWT corpus    | POS tagging                                | βœ“
wikiner            | WikiNER data                                 | Named entity recognition                   |

Other ML datasets

ID / Function | Description | ML task           | From URL
mnist         | MNIST data  | Image recognition | βœ“

Dataset details

IMDB

Each instance contains the text of a movie review, and a sentiment expressed as 0 or 1.

train_data, dev_data = ml_datasets.imdb()
for text, annot in train_data[0:5]:
    print(f"Review: {text}")
    print(f"Sentiment: {annot}")

Property            | Training         | Dev
# Instances         | 25000            | 25000
Label values        | {0, 1}           | {0, 1}
Labels per instance | Single           | Single
Label distribution  | Balanced (50/50) | Balanced (50/50)
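
To confirm the balanced split, the label counts can be tallied directly from the returned (text, annotation) tuples; a quick sketch using collections.Counter:

from collections import Counter
import ml_datasets

train_data, dev_data = ml_datasets.imdb()
print(Counter(annot for _, annot in train_data))  # expect roughly equal counts for 0 and 1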

DBPedia

Each instance contains an ontological description, and a classification into one of 14 distinct labels.

train_data, dev_data = ml_datasets.dbpedia()
for text, annot in train_data[0:5]:
    print(f"Text: {text}")
    print(f"Category: {annot}")

Property            | Training | Dev
# Instances         | 560000   | 70000
Label values        | 1-14     | 1-14
Labels per instance | Single   | Single
Label distribution  | Balanced | Balanced
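
Like the other text loaders, dbpedia returns lists of (text, annotation) tuples, which can be unzipped into parallel lists if a model expects texts and labels separately; a small sketch:

import ml_datasets

train_data, dev_data = ml_datasets.dbpedia()
train_texts, train_labels = zip(*train_data)  # parallel tuples of texts and category labels
dev_texts, dev_labels = zip(*dev_data)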

CMU

Each instance contains a movie description, and a classification into a list of appropriate genres.

train_data, dev_data = ml_datasets.cmu()
for text, annot in train_data[0:5]:
    print(f"Text: {text}")
    print(f"Genres: {annot}")

Property            | Training                                                                                       | Dev
# Instances         | 41793                                                                                          | 0
Label values        | 363 different genres                                                                           | -
Labels per instance | Multiple                                                                                       | -
Label distribution  | Imbalanced: 147 labels with fewer than 20 examples, while Drama occurs more than 19000 times   | -
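
Because each annotation is a list of genres, the imbalance can be inspected by counting genre occurrences across all instances; a sketch, assuming the annotations are iterable lists of genre strings as shown above:

from collections import Counter
import ml_datasets

train_data, _ = ml_datasets.cmu()
genre_counts = Counter(genre for _, genres in train_data for genre in genres)
print(genre_counts.most_common(5))            # "Drama" should dominate
rare = [g for g, n in genre_counts.items() if n < 20]
print(len(rare))                              # roughly 147 rarely occurring genres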

Quora

train_data, dev_data = ml_datasets.quora_questions()
for questions, annot in train_data[0:50]:
    q1, q2 = questions
    print(f"Question 1: {q1}")
    print(f"Question 2: {q2}")
    print(f"Similarity: {annot}")

Each instance contains two Quora questions and a label indicating whether or not they are duplicates (0: no, 1: yes). The ground-truth labels contain some noise: they are not guaranteed to be perfect.

Property            | Training                 | Dev
# Instances         | 363859                   | 40429
Label values        | {0, 1}                   | {0, 1}
Labels per instance | Single                   | Single
Label distribution  | Imbalanced: 63% label 0  | Imbalanced: 63% label 0
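
Since label 1 marks duplicates, the training data can be filtered down to just the duplicate question pairs; a short sketch:

import ml_datasets

train_data, dev_data = ml_datasets.quora_questions()
duplicates = [questions for questions, label in train_data if label == 1]
print(len(duplicates), "duplicate question pairs")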

Registering loaders

Loaders can be registered externally using the loaders registry as a decorator. For example:

@ml_datasets.loaders("my_custom_loader")
def my_custom_loader():
    return load_some_data()

assert "my_custom_loader" in ml_datasets.loaders
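
Once registered, the custom loader behaves like the built-in ones and can be retrieved by name, a usage sketch built on the registry calls shown above:

my_loader = ml_datasets.loaders.get("my_custom_loader")
data = my_loader()
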
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].