sarnthil / Unify Emotion Datasets

License: MIT
A Survey and Experiments on Annotated Corpora for Emotion Classification in Text

Programming language: Python

Requirements:

System packages

  • Python 3.6+
  • git

Installing Python dependencies

  • pip3 install requests sh click
  • pip3 install regex docopt numpy scikit-learn scipy, only needed if you want to use classify_xvsy_logreg.py (note that the PyPI package is named scikit-learn, not sklearn)
  • git clone git@github.com:sarnthil/unify-emotion-datasets.git

This will create a new folder called unify-emotion-datasets.

Running the two scripts

First run the script that downloads all obtainable datasets:

  • cd unify-emotion-datasets # go inside the repository
  • python3 download_datasets.py

Please read the instructions carefully: you will be asked to read and confirm the license and terms of use of each dataset. If a dataset cannot be downloaded directly, you will be given instructions on how to obtain it.

Then run the script that unifies the downloaded datasets, which will be located in unify-emotion-datasets/datasets/:

python3 create_unified_dataset.py

This will create a new file called unified-dataset.jsonl in the same folder.
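Each line of unified-dataset.jsonl is one JSON object. As a minimal sketch of how to consume it, the snippet below reads a JSON-lines file with the standard library; the field names (source, emotions) are assumptions based on the jq examples later in this README, so adjust them to the actual schema:

```python
import json

def iter_instances(path):
    """Yield one JSON object per line of a JSON-lines file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)

# Demonstration on inline sample lines (field names are assumptions):
sample = [
    '{"source": "tec", "emotions": {"surprise": 1.0}}',
    '{"source": "ssec", "emotions": {"surprise": null}}',
]
sources = [json.loads(line)["source"] for line in sample]
```

In practice you would call iter_instances("unified-dataset.jsonl") and filter or group the yielded objects as needed.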

We also advise you to cite the papers corresponding to the datasets you use. You can find the corresponding BibTeX entries in the file datasets/README.md or while running download_datasets.py.

Paper/Reference

An Analysis of Annotated Corpora for Emotion Classification in Text

If you plan to use this corpus, please use this citation:

@inproceedings{Bostan2018,
  author = {Bostan, Laura Ana Maria and Klinger, Roman},
  title = {An Analysis of Annotated Corpora for Emotion Classification in Text},
  booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
  year = {2018},
  publisher = {Association for Computational Linguistics},
  pages = {2104--2119},
  location = {Santa Fe, New Mexico, USA},
  url = {http://aclweb.org/anthology/C18-1179},
  pdf = {http://aclweb.org/anthology/C18-1179.pdf}
}

Experimenting with classification

If you want to reuse the code for the emotion classification task, see the script classify_xvsy_logreg.py:

python3 classify_xvsy_logreg.py --help will show you the following:

Classify using MaxEnt algorithm

Usage:
    classify_xvsy_logreg.py [options] <first> <second>
    classify_xvsy_logreg.py [options] --all-vs <second>

Options:
    -j --json=<JSONFILE>  Filename of the json file [default: ../unified.jsonl]
    -a --all-vs=<dataset>  Dataset name of the testing data
    -d --debug            Use a small word list and a fast classifier
    -o --output=<OUTPUT>  Output folder [default: .]
    -m --force-multi      Force using multi-label classification
    -k --keep-last        Quit immediately if results file found

For example, if you want to train on TEC and test on EmoInt, do the following:

python3 classify_xvsy_logreg.py -d tec emoint 

The dataset names are the ones used in the source field of unified-dataset.jsonl.
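To run several train/test combinations in a row, you can build the command lines programmatically. A small sketch, where the dataset pairs are hypothetical examples and must match values of the source field:

```python
# Hypothetical (train, test) dataset pairs; names must match the
# "source" field values in unified-dataset.jsonl.
pairs = [("tec", "emoint"), ("tec", "crowdflower")]

# Build one command line per pair, mirroring the example invocation above.
cmds = [
    ["python3", "classify_xvsy_logreg.py", "-d", train, test]
    for train, test in pairs
]
# Each command can then be executed with subprocess.run(cmd, check=True).
```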

Tip

Use jq for easy interaction with unified-dataset.jsonl.

Examples of how to use it for various tasks:

  • select the instances whose source is crowdflower or tec: jq 'select(.source=="crowdflower" or .source=="tec")' <unified-dataset.jsonl | less
  • count how often instances are annotated with high surprise, per dataset: jq 'select(.emotions.surprise > 0.5) | .source' <unified-dataset.jsonl | sort | uniq -c
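The second jq filter can also be mirrored in plain Python. A sketch under the same assumptions as the jq examples (an emotions mapping with a surprise score that may be null, and a 0.5 threshold):

```python
import json
from collections import Counter

def count_high_surprise(lines, threshold=0.5):
    """Count, per source, instances whose surprise score exceeds threshold."""
    counts = Counter()
    for line in lines:
        obj = json.loads(line)
        surprise = (obj.get("emotions") or {}).get("surprise")
        if surprise is not None and surprise > threshold:
            counts[obj["source"]] += 1
    return counts

# Demonstration on inline sample lines (schema is an assumption):
sample = [
    '{"source": "tec", "emotions": {"surprise": 1.0}}',
    '{"source": "tec", "emotions": {"surprise": 0.0}}',
    '{"source": "crowdflower", "emotions": {"surprise": 0.8}}',
]
counts = count_high_surprise(sample)
```

On the real file, pass the open file handle instead of the sample list.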