iterative / aita_dataset

Licence: other
AITA dataset based on r/AmItheAsshole/

AITA Dataset

Great news! Since the original blog post was shared, we discovered that the API used to collect post scores excluded ~30K posts from AITA in 2018-2019. These have been added to the dataset in the latest release. We will be sharing an update to some of the metrics calculated in the blog shortly.

This repo contains code to replicate our scrape of the r/AmItheAsshole subreddit, as well as .dvc files linking this GitHub repo to an S3 bucket hosting the dataset.

The dataset is built by three scripts, run in order:

  1. 0_scraper_push_api.py collects Reddit post ids and scores from within a desired timeframe.
  2. 1_scraper_praw.py uses the praw library to query each post by id and retrieve its text and metadata.
  3. 2_clean_and_consolidate.py cleans the scraped data and does some general tidying.
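Because each script consumes the output of the one before it, the build can be driven from a single entry point. A minimal sketch follows. It assumes, hypothetically, that each script takes no command-line arguments and reads and writes its intermediate files relative to the repository root; `run_pipeline` and `PIPELINE` are illustrative names, not part of the repo.

```python
import subprocess
import sys
from pathlib import Path

# The three build scripts named in this README, in the order they must run.
PIPELINE = (
    "0_scraper_push_api.py",
    "1_scraper_praw.py",
    "2_clean_and_consolidate.py",
)


def run_pipeline(repo_dir, scripts=PIPELINE):
    """Run each build script in order from repo_dir; stop at the first failure.

    Hypothetical helper: assumes the scripts need no command-line arguments
    and resolve their input/output files relative to the repository root.
    """
    # Resolve to an absolute path so the script path and cwd never double up.
    repo = Path(repo_dir).resolve()
    completed = []
    for name in scripts:
        script = repo / name
        if not script.is_file():
            raise FileNotFoundError(f"missing build script: {script}")
        # check=True raises CalledProcessError on a non-zero exit, so a later
        # step never runs on missing or partial output from an earlier one.
        subprocess.run([sys.executable, str(script)], cwd=repo, check=True)
        completed.append(name)
    return completed
```

From a checkout of the repo, `run_pipeline(".")` would run all three steps and return their names in order. Failing fast is the main design choice here: a half-finished scrape should not be silently cleaned and consolidated.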

The dataset contained in aita_clean.csv has 9 features:

  • id, a unique string provided by Reddit's API to index every post
  • timestamp, the time of post creation, in epoch/Unix format
  • title, a string
  • body, a string
  • edited, the timestamp at which a post was edited. If no edits occurred, this field is False.
  • verdict, a string in the set {"asshole", "not the asshole", "everyone sucks", "no assholes here"}
  • score, an integer corresponding to the difference between upvotes and downvotes
  • num_comments, an integer corresponding to the total number of comments (including nested discussion) to the post
  • is_asshole, a boolean indicating whether the verdict is in the set {"asshole", "everyone sucks"}
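The field descriptions above map directly onto typed Python values, using only the standard library. This sketch assumes the CSV header uses exactly the feature names listed, that `timestamp` and a non-False `edited` are epoch seconds (possibly written as floats such as "1546300800.0"), and that `score` and `num_comments` are plain integer strings; `parse_row`, `load_rows`, and `ASSHOLE_VERDICTS` are hypothetical names. It recomputes `is_asshole` from `verdict`, which also makes the relationship between the two columns explicit.

```python
import csv
import io
from datetime import datetime, timezone

# Verdicts that make is_asshole true, per the feature list in this README.
ASSHOLE_VERDICTS = {"asshole", "everyone sucks"}


def _epoch(value):
    """Convert an epoch-seconds string (int or float form) to an aware UTC datetime."""
    return datetime.fromtimestamp(float(value), tz=timezone.utc)


def parse_row(row):
    """Convert one aita_clean.csv row (a dict of strings) into typed values.

    Hypothetical helper: assumes the header matches the README's feature
    names and that an unedited post has the literal string "False" in `edited`.
    """
    edited_raw = row["edited"]
    return {
        "id": row["id"],
        "created": _epoch(row["timestamp"]),
        "title": row["title"],
        "body": row["body"],
        # None (rather than False) when the post was never edited.
        "edited": None if edited_raw in ("False", "false", "") else _epoch(edited_raw),
        "verdict": row["verdict"],
        "score": int(row["score"]),  # upvotes minus downvotes; may be negative
        "num_comments": int(row["num_comments"]),  # includes nested replies
        # Derived from the verdict instead of parsing the stored boolean text.
        "is_asshole": row["verdict"] in ASSHOLE_VERDICTS,
    }


def load_rows(text):
    """Parse CSV text into a list of typed rows."""
    return [parse_row(r) for r in csv.DictReader(io.StringIO(text))]
```

With pandas, the same label could be derived in one step as `df["verdict"].isin(["asshole", "everyone sucks"])`.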

To get this dataset, install DVC and run:

$ dvc get https://github.com/iterative/aita_dataset aita_clean.csv

or, to also create a .dvc file that records the dataset's source repository and revision for versioning (run this inside a DVC project):

$ dvc import https://github.com/iterative/aita_dataset aita_clean.csv
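As an alternative to the CLI commands above, DVC's Python API can stream the file without a separate download step. The sketch below uses `dvc.api.open`, which accepts `repo` and `rev` arguments, and assumes DVC is installed with S3 support (for example `pip install "dvc[s3]"`). `iter_aita_rows` and its `opener` parameter are hypothetical; `opener` exists only so the function can be exercised without network access.

```python
import csv


def iter_aita_rows(
    repo="https://github.com/iterative/aita_dataset",
    path="aita_clean.csv",
    rev=None,
    opener=None,
):
    """Yield aita_clean.csv rows as dicts of strings, streamed via DVC.

    Hypothetical helper. By default it calls dvc.api.open; `rev` can pin a
    Git tag or commit so reads stay reproducible across dataset releases.
    """
    if opener is None:
        # Imported lazily so this module loads even where DVC is not installed.
        import dvc.api

        opener = dvc.api.open
    # dvc.api.open is a context manager yielding a text-mode file object.
    with opener(path, repo=repo, rev=rev) as fd:
        for row in csv.DictReader(fd):
            yield row
```

Pinning `rev` matters for this dataset in particular: the ~30K recovered 2018-2019 posts changed `aita_clean.csv` between releases, so an unpinned read may not match earlier results.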
