
ChatGirl is an AI chatbot based on the TensorFlow Seq2Seq model. It includes a preprocessed English Twitter dataset along with training, inference, and utility code; stars are welcome. QQ group: 167122861

Stars: ✭ 105 (-2.78%)

Mutual labels: dataset

Cubicasa5k

CubiCasa5k floor plan dataset

Stars: ✭ 98 (-9.26%)

Mutual labels: dataset

Iso 3166 Countries With Regional Codes

ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets

Stars: ✭ 1,372 (+1170.37%)

Mutual labels: dataset

Bond

BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision

Stars: ✭ 96 (-11.11%)

Mutual labels: dataset

Deepweeds

A Multiclass Weed Species Image Dataset for Deep Learning

Stars: ✭ 96 (-11.11%)

Mutual labels: dataset

Scientificsummarizationdatasets

Datasets I have created for scientific summarization, and a trained BertSum model

Stars: ✭ 100 (-7.41%)

Mutual labels: dataset

Persian Swear Words

A dataset of inappropriate and offensive Persian words for filtering text

Stars: ✭ 95 (-12.04%)

Mutual labels: dataset

Fma

FMA: A Dataset For Music Analysis

Stars: ✭ 1,391 (+1187.96%)

Mutual labels: dataset

Ml Pyxis

Tool for reading and writing datasets of tensors in a Lightning Memory-Mapped Database (LMDB). Designed to manage machine learning datasets with fast reading speeds.

Stars: ✭ 93 (-13.89%)

Mutual labels: dataset

Objectron

Objectron is a dataset of short, object-centric video clips. In addition, the videos also contain AR session metadata including camera poses, sparse point-clouds and planes. In each video, the camera moves around and above the object and captures it from different views. Each object is annotated with a 3D bounding box. The 3D bounding box describes the object’s position, orientation, and dimensions. The dataset contains about 15K annotated video clips and 4M annotated images in the following categories: bikes, books, bottles, cameras, cereal boxes, chairs, cups, laptops, and shoes

Stars: ✭ 1,352 (+1151.85%)

Mutual labels: dataset

Ua Gec

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Stars: ✭ 108 (+0%)

Mutual labels: dataset

Faceaging By Cyclegan

Stars: ✭ 105 (-2.78%)

Mutual labels: dataset

Investigating Prior Knowledge for Challenging Chinese Machine Reading Comprehension

Stars: ✭ 101 (-6.48%)

Mutual labels: dataset


RACE Reading Comprehension Task

Code for the paper: RACE: Large-scale ReAding Comprehension Dataset From Examinations. Guokun Lai*, Qizhe Xie*, Hanxiao Liu, Yiming Yang and Eduard Hovy. EMNLP 2017

Leaderboard of RACE

Dependencies

Python 2.7
Theano >= 0.7
Lasagne 0.2.dev1

Datasets

RACE: Please submit a data request here. The data will be sent to you automatically. Create a "data" directory alongside the "src" directory and download the data into it.
Word embeddings:
- glove.6B.zip: http://nlp.stanford.edu/data/glove.6B.zip
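
The directory layout described above can be sketched in Python. This is a minimal sketch, not part of the repository: the helper names `ensure_data_dir` and `download_glove` are hypothetical, the code is written for Python 3 rather than the Python 2.7 the project itself targets, and placing the GloVe archive inside "data" is an assumption, since the location the scripts expect for the embeddings is not stated here.

```python
import os
import urllib.request

# URL taken from the Datasets section above.
GLOVE_URL = "http://nlp.stanford.edu/data/glove.6B.zip"


def ensure_data_dir(repo_root):
    """Create the "data" directory alongside "src" and return its path."""
    data_dir = os.path.join(repo_root, "data")
    os.makedirs(data_dir, exist_ok=True)
    return data_dir


def download_glove(repo_root, url=GLOVE_URL):
    """Download the GloVe archive into data/ unless it is already there.

    Storing the archive under data/ is an assumption, not a documented
    requirement of the project.
    """
    data_dir = ensure_data_dir(repo_root)
    target = os.path.join(data_dir, os.path.basename(url))
    if not os.path.exists(target):
        urllib.request.urlretrieve(url, target)
    return target
```

Calling `download_glove(".")` from the directory that contains "src" would create "data" if needed and fetch the archive once. The RACE data itself still has to be requested as described above.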

Usage

Preprocessing

* python preprocess.py

Stanford AR

* test pre-trained model: bash test_SAR.sh
* train: bash train_SAR.sh (The pre-trained model will be replaced)

GA

* test pre-trained model: bash test_GA.sh
* train: bash train_GA.sh (The pre-trained model will be replaced)
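
The commands above can be chained from a small driver. This is a minimal sketch under stated assumptions: `build_steps` and `run_steps` are hypothetical helpers that are not part of the repository, the scripts are assumed to be run from the directory that contains them, and running preprocessing before every test or training run is an assumption rather than a documented requirement.

```python
import subprocess

# Script names as listed in the Usage section; keys are labels chosen here.
TEST_SCRIPTS = {"SAR": "test_SAR.sh", "GA": "test_GA.sh"}
TRAIN_SCRIPTS = {"SAR": "train_SAR.sh", "GA": "train_GA.sh"}


def build_steps(model, train=False):
    """Return the commands for preprocessing followed by testing or training."""
    scripts = TRAIN_SCRIPTS if train else TEST_SCRIPTS
    if model not in scripts:
        raise ValueError("unknown model: %s" % model)
    # "python" must resolve to an interpreter with the listed dependencies.
    return [["python", "preprocess.py"], ["bash", scripts[model]]]


def run_steps(steps, cwd="."):
    """Run each command in order, stopping at the first failure."""
    for cmd in steps:
        subprocess.check_call(cmd, cwd=cwd)
```

For example, `run_steps(build_steps("GA"))` would preprocess the data and then test the pre-trained GA model. Passing `train=True` runs training instead, which replaces the pre-trained model as noted above.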

Reference

@inproceedings{lai2017large,
  title={RACE: Large-scale ReAding Comprehension Dataset From Examinations},
  author={Lai, Guokun and Xie, Qizhe and Liu, Hanxiao and Yang, Yiming and Hovy, Eduard},
  booktitle={EMNLP},
  year={2017}
}

Acknowledgement

The code is adapted from Stanford AR https://github.com/danqi/rc-cnn-dailymail and GA https://github.com/bdhingra/ga-reader

Contact

Please contact Qizhe Xie (qzxie AT cs DOT cmu DOT edu) if you find bugs or missing info

License

MIT
