thepanacealab / Covid19_twitter

Licence: other
Covid-19 Twitter dataset for non-commercial research use and pre-processing scripts - under active development

Projects that are alternatives of or similar to Covid19 twitter

Data Science Resources
👨🏽‍🏫You can learn about what data science is and why it's important in today's modern world. Are you interested in data science?🔋
Stars: ✭ 171 (-43.75%)
Mutual labels:  jupyter-notebook, dataset
Weatherbench
A benchmark dataset for data-driven weather forecasting
Stars: ✭ 227 (-25.33%)
Mutual labels:  jupyter-notebook, dataset
Fifa18 All Player Statistics
A complete catalog of all the players in Fifa 18 and their complete statistics.
Stars: ✭ 185 (-39.14%)
Mutual labels:  jupyter-notebook, dataset
Motion Sense
MotionSense Dataset for Human Activity and Attribute Recognition ( time-series data generated by smartphone's sensors: accelerometer and gyroscope)
Stars: ✭ 159 (-47.7%)
Mutual labels:  jupyter-notebook, dataset
Dataset Api
The ApolloScape Open Dataset for Autonomous Driving and its Application.
Stars: ✭ 260 (-14.47%)
Mutual labels:  jupyter-notebook, dataset
Cifar 10.1
Release of CIFAR-10.1, a new test set for CIFAR-10.
Stars: ✭ 166 (-45.39%)
Mutual labels:  jupyter-notebook, dataset
Covid19za
Coronavirus COVID-19 (2019-nCoV) Data Repository and Dashboard for South Africa
Stars: ✭ 208 (-31.58%)
Mutual labels:  jupyter-notebook, dataset
Coronawatchnl
Numbers concerning COVID-19 disease cases in The Netherlands by RIVM, LCPS, NICE, ECML, and Rijksoverheid.
Stars: ✭ 135 (-55.59%)
Mutual labels:  jupyter-notebook, dataset
Taco
🌮 Trash Annotations in Context Dataset Toolkit
Stars: ✭ 243 (-20.07%)
Mutual labels:  jupyter-notebook, dataset
Covid Chestxray Dataset
We are building an open database of COVID-19 cases with chest X-ray or CT images.
Stars: ✭ 2,759 (+807.57%)
Mutual labels:  jupyter-notebook, dataset
Lacmus
Lacmus is a cross-platform application that helps to find people who are lost in the forest using computer vision and neural networks.
Stars: ✭ 142 (-53.29%)
Mutual labels:  jupyter-notebook, dataset
Tehran Stocks
A python package to access tsetmc data
Stars: ✭ 282 (-7.24%)
Mutual labels:  jupyter-notebook, dataset
Gossiping Chinese Corpus
PTT 八卦版問答中文語料
Stars: ✭ 137 (-54.93%)
Mutual labels:  jupyter-notebook, dataset
Shape Detection
🟣 Object detection of abstract shapes with neural networks
Stars: ✭ 170 (-44.08%)
Mutual labels:  jupyter-notebook, dataset
Datasets
🎁 3,000,000+ Unsplash images made available for research and machine learning
Stars: ✭ 1,805 (+493.75%)
Mutual labels:  jupyter-notebook, dataset
Trump Lies
Tutorial: Web scraping in Python with Beautiful Soup
Stars: ✭ 201 (-33.88%)
Mutual labels:  jupyter-notebook, dataset
Contactpose
Large dataset of hand-object contact, hand- and object-pose, and 2.9 M RGB-D grasp images.
Stars: ✭ 129 (-57.57%)
Mutual labels:  jupyter-notebook, dataset
Real Time Sentiment Tracking On Twitter For Brand Improvement And Trend Recognition
A real-time interactive web app based on data pipelines using streaming Twitter data, automated sentiment analysis, and MySQL&PostgreSQL database (Deployed on Heroku)
Stars: ✭ 127 (-58.22%)
Mutual labels:  jupyter-notebook, tweets
Datasets
source{d} datasets ("big code") for source code analysis and machine learning on source code
Stars: ✭ 231 (-24.01%)
Mutual labels:  jupyter-notebook, dataset
Data Science Hacks
Data Science Hacks consists of tips, tricks to help you become a better data scientist. Data science hacks are for all - beginner to advanced. Data science hacks consist of python, jupyter notebook, pandas hacks and so on.
Stars: ✭ 273 (-10.2%)
Mutual labels:  jupyter-notebook, dataset

Latest Updates:

03/18/21 Daily data (under the /dailies/ folder) has been added for 3/17 and 3/16. Note that some tweets will bleed into the following day because the captured tweets span different time zones.

03/16/21 Daily data (under the /dailies/ folder) has been added for 3/15 and 3/14. Note that some tweets will bleed into the following day because the captured tweets span different time zones.

03/14/21 Version 53 of the dataset. This release marks one full year of releases from us. In addition to our weekly update, this release includes a new set of 2.4 million Russian tweets provided by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU). Daily data has been added for 3/13, 3/12, and 3/11.

03/11/21 Daily data (under the /dailies/ folder) has been added for 3/10 and 3/09. Note that some tweets will bleed into the following day because the captured tweets span different time zones.

03/09/21 Daily data (under the /dailies/ folder) has been added for 3/08 and 3/07. Note that some tweets will bleed into the following day because the captured tweets span different time zones.

03/07/21 Version 52 of the dataset. Dailies have been added for 3/06, 3/05, and 3/04. New: we added a Colab Notebook tutorial with some code to help you hydrate and pre-process the dataset. Note that the notebook is for illustration only and will not download and process the whole dataset for you.

Covid-19 Twitter chatter dataset for scientific use

Due to the relevance of the COVID-19 global pandemic, we are releasing our dataset of tweets acquired from the Twitter Stream related to COVID-19 chatter. The first 9 weeks of data (from January 1st, 2020 to March 11th, 2020) contain very low tweet counts because they were filtered from data we were collecting for other research purposes. Even so, one can see the dramatic increase as awareness of the virus spread. Dedicated data gathering started on March 11th, yielding over 4 million tweets a day.

The data collected from the stream captures all languages, but the most prevalent are English, Spanish, and French. We release all tweets and retweets in the full dataset, and a cleaned version with no retweets. There are several practical reasons for keeping the retweets; tracing important tweets and their dissemination is one of them. For NLP tasks we provide the top 1000 frequent terms, the top 1000 bigrams, and the top 1000 trigrams. Some general statistics per day are included for both datasets.
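As an illustration of what the term, bigram, and trigram lists contain, here is a minimal sketch of n-gram frequency counting over tweet texts. It is not the code used to produce the released files: the function name `top_ngrams` and the simple tokenizer are assumptions made for this example only.

```python
from collections import Counter
import re


def top_ngrams(texts, n=1, k=1000):
    """Return the k most frequent n-grams across an iterable of tweet texts.

    Hypothetical helper for illustration; not part of the released scripts.
    """
    counts = Counter()
    for text in texts:
        # Lowercase and keep word-like tokens. A real pipeline would also
        # need to handle URLs, mentions, hashtags, and stop words.
        tokens = re.findall(r"\w+", text.lower())
        # Zip n shifted copies of the token list to form sliding windows.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


sample = ["Stay home, stay safe", "stay home and wash your hands"]
print(top_ngrams(sample, n=2, k=1))  # [(('stay', 'home'), 2)]
```

Setting `n` to 1, 2, or 3 yields terms, bigrams, or trigrams respectively, and `k=1000` matches the size of the lists described above.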

We will continue to update the dataset every two days here and weekly on Zenodo.

For more information on processing and visualizations please visit: www.panacealab.org/covid19

Usage

All tweet IDs found in full_dataset.tsv and full_dataset-clean.tsv need to be hydrated using a tool like get_metadata.py from the Social Media Mining Toolkit (SMMT) released by our lab, or Twarc.
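Hydration tools generally take a plain list of tweet IDs as input, so a common first step is to pull the ID column out of the TSV file. The sketch below does this with the Python standard library. The column names `tweet_id` and `lang` are assumptions about the file header, and `extract_tweet_ids` is a hypothetical helper, so check the header of your copy of the file before relying on it.

```python
import csv
import os
import tempfile


def extract_tweet_ids(tsv_path, out_path, lang=None):
    """Write one tweet ID per line to out_path; return the number written.

    Assumes a header row with a `tweet_id` column and, when filtering
    by language, a `lang` column (both names are assumptions).
    """
    written = 0
    with open(tsv_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter="\t")
        for row in reader:
            # Skip rows in other languages when a filter is requested.
            if lang is not None and row.get("lang") != lang:
                continue
            dst.write(row["tweet_id"] + "\n")
            written += 1
    return written


# Demo on a tiny made-up file so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    tsv_file = os.path.join(tmp, "sample.tsv")
    ids_file = os.path.join(tmp, "ids.txt")
    with open(tsv_file, "w", encoding="utf-8") as f:
        f.write("tweet_id\tlang\n111\ten\n222\tes\n333\ten\n")
    print(extract_tweet_ids(tsv_file, ids_file, lang="en"))  # prints 2
```

The file is streamed row by row rather than loaded into memory, which matters for a dataset of this size. The resulting ID list can then be handed to whichever hydration tool you use.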

Note: All the code in the /processing_code folder is provided as-is. It was used to generate the provided files from the source Tweet JSON files. Documentation will be gradually added for these scripts.

Maintained by:

Panacea Lab - Georgia State University - Juan M. Banda, Ramya Tekumalla, and Gerardo Chowell-Puente. Additional data provided by: Guanyu Wang (Missouri School of Journalism, University of Missouri), Jingyuan Yu (Department of Social Psychology, Universitat Autònoma de Barcelona), Tuo Liu (Department of Psychology, Carl von Ossietzky Universität Oldenburg), Yuning Ding (Language Technology Lab, Universität Duisburg-Essen), Katya Artemova (NRU HSE), and Elena Tutubalina (KFU).

Version 53.0 release notes

Version 53 of the dataset. This release marks one full year of releases from us. In addition to our weekly update, this release includes a new set of 2.4 million Russian tweets provided by Katya Artemova (NRU HSE) and Elena Tutubalina (KFU).

How to cite this dataset:

Our paper:

@misc{banda2020largescale,
      title={A large-scale COVID-19 Twitter chatter dataset for open scientific research -- an international collaboration}, 
      author={Banda, Juan M. and Tekumalla, Ramya and Wang, Guanyu and Yu, Jingyuan and Liu, Tuo and Ding, Yuning and Artemova, Katya and Tutubalina, Elena and Chowell, Gerardo},
      year={2020},
      eprint={2004.03688},
      archivePrefix={arXiv},
      primaryClass={cs.SI},
      url={https://arxiv.org/abs/2004.03688}
}

Version 53.0

@dataset{banda_juan_m_2020_3757272,
  author       = {Banda, Juan M. and
                  Tekumalla, Ramya and
                  Wang, Guanyu and
                  Yu, Jingyuan and
                  Liu, Tuo and
                  Ding, Yuning and
                  Artemova, Katya and
                  Tutubalina, Elena and
                  Chowell, Gerardo},
  title        = {{A large-scale COVID-19 Twitter chatter dataset for 
                   open scientific research - an international
                   collaboration}},
  month        = may,
  year         = 2020,
  note         = {{This dataset will be updated bi-weekly at least 
                   with additional tweets, look at the github repo
                   for these updates. Release: We have standardized
                   the name of the resource to match our pre-print
                   manuscript and to not have to update it every
                   week.}},
  publisher    = {Zenodo},
  version      = {53.0},
  doi          = {10.5281/zenodo.3723939},
  url          = {https://doi.org/10.5281/zenodo.3723939}
}
