
yumoxu / Stocknet Dataset

Licence: MIT
A comprehensive dataset for stock movement prediction from tweets and historical stock prices.


stocknet-dataset

This repository releases a comprehensive dataset for stock movement prediction from tweets and historical stock prices. Please cite the following paper [bib] if you use this dataset:

Yumo Xu and Shay B. Cohen. 2018. Stock Movement Prediction from Tweets and Historical Prices. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics. Melbourne, Australia, volume 1.

Stock movement prediction is a challenging problem: the market is highly stochastic, and we make temporally-dependent predictions from chaotic data. We treat these three complexities and present a novel deep generative model jointly exploiting text and price signals for this task. Unlike the case with discriminative or topic modeling, our model introduces recurrent, continuous latent variables for a better treatment of stochasticity, and uses neural variational inference to address the intractable posterior inference. We also provide a hybrid objective with temporal auxiliary to flexibly capture predictive dependencies. We demonstrate the state-of-the-art performance of our proposed model on a new stock movement prediction dataset which we collected.

You might also be interested in our code for stock movement prediction.

If you have any queries, please contact me at [email protected].

Dataset Overview

The dataset targets the two-year price movements, from 01/01/2014 to 01/01/2016, of 88 stocks: all 8 stocks in the Conglomerates sector and the top 10 stocks by capital size in each of the other 8 sectors. The full list of the 88 stocks and their companies, drawn from 9 sectors, is available in StockTable, a facsimile of the paper appendix appendix_table_of_target_stocks.pdf.

Data Component

This dataset comprises two main components: tweet data and historical price data.

Each component contains raw data and preprocessed data, organized by stock:

  • ./tweet/raw
  • ./tweet/preprocessed

and

  • ./price/raw
  • ./price/preprocessed
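The four directories listed above can be enumerated programmatically. The snippet below is a minimal sketch: the helper name `component_dirs` and the checkout location `stocknet-dataset` are assumptions made for illustration, and only the `tweet`/`price` and `raw`/`preprocessed` path segments come from this README.

```python
from pathlib import Path


def component_dirs(root="."):
    """Map (component, stage) to the corresponding data directory.

    `component_dirs` is a hypothetical helper; only the directory
    names themselves are taken from the README.
    """
    root = Path(root)
    return {
        (component, stage): root / component / stage
        for component in ("tweet", "price")
        for stage in ("raw", "preprocessed")
    }


# "stocknet-dataset" is an assumed location of the local checkout.
dirs = component_dirs("stocknet-dataset")
for key, path in sorted(dirs.items()):
    print(key, path.as_posix())
```

Keying the mapping by `(component, stage)` makes it easy to loop over one component or one processing stage without hard-coding paths in several places.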

Data Format

Raw Tweet Data

Format: JSON
Keys: see Introduction to Tweet JSON

Preprocessed Tweet Data

Format: JSON
Keys: 'text', 'user_id_str', 'created_at'
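
As a rough illustration of reading the preprocessed tweets, the sketch below keeps only the three keys listed above. It assumes each file stores one JSON object per line, which this README does not state; if a file holds a single JSON array instead, `json.load` on the whole file would be the right call. The function name `read_preprocessed_tweets` and the sample record are made up for the example.

```python
import io
import json

# The three keys documented for preprocessed tweet data.
REQUIRED_KEYS = ("text", "user_id_str", "created_at")


def read_preprocessed_tweets(lines):
    """Parse tweets from an iterable of lines.

    Assumption: one JSON object per line. Blank lines are skipped.
    """
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = [key for key in REQUIRED_KEYS if key not in record]
        if missing:
            raise KeyError(f"tweet record is missing keys: {missing}")
        tweets.append({key: record[key] for key in REQUIRED_KEYS})
    return tweets


# Made-up sample record, not taken from the dataset.
sample = io.StringIO(
    '{"text": "example tweet", "user_id_str": "123", '
    '"created_at": "2014-01-02"}\n'
)
tweets = read_preprocessed_tweets(sample)
print(tweets[0]["user_id_str"])
```

With a real file, pass an open file handle instead of the in-memory sample, for example `with open(path, encoding="utf-8") as f: read_preprocessed_tweets(f)`.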

Raw Price Data

Format: CSV
Entries: date, open price, high price, low price, close price, adjusted close price, volume
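
A raw price file could be read along the following lines. This is a sketch under assumptions: it reads columns by position in the order listed above, assumes the first row is a header (set `has_header=False` if it is not), and uses a made-up sample row. The names `RawPrice` and `read_raw_prices` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class RawPrice:
    """One row of raw price data, in the column order listed above."""
    date: str
    open: float
    high: float
    low: float
    close: float
    adj_close: float
    volume: float


def read_raw_prices(lines, has_header=True):
    """Parse raw price rows from an iterable of CSV lines.

    Assumption: the first row is a header unless has_header is False.
    """
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)  # skip the header row
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        date, *numbers = fields[:7]
        rows.append(RawPrice(date, *map(float, numbers)))
    return rows


# Made-up sample, not taken from the dataset.
sample = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2014-01-02,10.0,11.0,9.5,10.5,10.4,1000\n"
)
prices = read_raw_prices(sample)
print(prices[0])
```

Reading by position rather than by header name avoids depending on the exact header spelling, which this README does not specify.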

Preprocessed Price Data

Format: TXT
Entries: date, movement percent, open price, high price, low price, close price, volume
Note: open, high, low, close prices are normalized values.
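
The preprocessed price files can be parsed in a similar way. The sketch below assumes the seven entries appear on each line in the order listed above, separated by whitespace (tabs or spaces), and that the date contains no spaces; the delimiter is not stated in this README, so adjust the `split` call if the files differ. `PreprocessedPrice` and `read_preprocessed_prices` are hypothetical names, and the sample line is made up.

```python
import io
from typing import NamedTuple


class PreprocessedPrice(NamedTuple):
    """One row of preprocessed price data; prices are normalized values."""
    date: str
    movement_percent: float
    open: float
    high: float
    low: float
    close: float
    volume: float


def read_preprocessed_prices(lines):
    """Parse preprocessed price rows from an iterable of text lines.

    Assumption: fields are whitespace-separated.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 7:
            raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
        rows.append(
            PreprocessedPrice(fields[0], *(float(x) for x in fields[1:]))
        )
    return rows


# Made-up sample line, not taken from the dataset.
sample = io.StringIO("2014-01-02\t0.01\t0.5\t0.6\t0.4\t0.55\t1000\n")
movements = read_preprocessed_prices(sample)
print(movements[0].movement_percent)
```

Because the open, high, low, and close values are normalized, they should not be compared directly with the raw CSV prices.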
