
Snape is a convenient artificial dataset generator that wraps sklearn's make_classification and make_regression and then adds 'realism' features such as complex formatting, varying scales, categorical variables, and missing values.

Stars: ✭ 155 (-2.52%)

Mutual labels: dataset

Lapa Dataset

A large-scale dataset for face parsing (AAAI2020)

Stars: ✭ 149 (-6.29%)

Mutual labels: dataset

Covid 19 Timeline

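A timeline of the COVID-19 outbreak's progress since late 2019, compiled in a standardized way following the format of a sociological yearbook.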

Stars: ✭ 1,887 (+1086.79%)

Mutual labels: news

Rt gene

RT-GENE: Real-Time Eye Gaze and Blink Estimation in Natural Environments

Stars: ✭ 157 (-1.26%)

Mutual labels: dataset

Isic Archive Downloader

A script to download the ISIC Archive of lesion images

Stars: ✭ 153 (-3.77%)

Mutual labels: dataset

Evoskeleton

Official project website for the CVPR 2020 paper (Oral Presentation) "Cascaded Deep Monocular 3D Human Pose Estimation With Evolutionary Training Data"

Stars: ✭ 154 (-3.14%)

Mutual labels: dataset

Quickdraw Appendix

Dataset of 25k penises: an appendix to the Quick, Draw! Dataset

Stars: ✭ 153 (-3.77%)

Mutual labels: dataset

Newswatch React Native

📺 A news app using YouTube playlists, built with React Native

Stars: ✭ 155 (-2.52%)

Mutual labels: news

Music Dance Video Synthesis

(ACM MM 20 Oral) PyTorch implementation of Self-supervised Dance Video Synthesis Conditioned on Music

Stars: ✭ 150 (-5.66%)

Mutual labels: dataset

Omr Datasets

Collection of datasets used for Optical Music Recognition

Stars: ✭ 158 (-0.63%)

Mutual labels: dataset

Census Data Downloader

Download U.S. census data and reformat it for humans

Stars: ✭ 149 (-6.29%)

Mutual labels: news

Awesome Biomechanics

A curated, public list collecting resources for biomechanics and human motion: datasets, processing tools, software for simulation, educational videos, lectures, etc.

Stars: ✭ 154 (-3.14%)

Mutual labels: dataset

Dem.net

Digital Elevation model library in C#. 3D terrain models, line/point Elevations, intervisibility reports

Stars: ✭ 153 (-3.77%)

Mutual labels: dataset

Motion Sense

MotionSense Dataset for Human Activity and Attribute Recognition (time-series data generated by smartphone sensors: accelerometer and gyroscope)

Stars: ✭ 159 (+0%)

Mutual labels: dataset

Pytorch Nlp

Basic Utilities for PyTorch Natural Language Processing (NLP)

Stars: ✭ 1,996 (+1155.35%)

Mutual labels: dataset


Reuters-full-data-set

Unofficial full data set of Reuters news, comprising 8,551,441 news titles, links and timestamps (Jan 2007 - Aug 2016).

NB: To generate it from scratch (from 2007 up to today), please scroll down.

Using the pre-existing one

git clone https://github.com/philipperemy/Reuters-full-data-set.git
cd Reuters-full-data-set
python3 read.py

ts = 20070228 11:46 AM EST, t = European stocks hit 7-week low amid new sell-off, h= http://www.reuters.com/article/companyNewsAndPR/idUSWEB277620070228
ts = 20070228 11:46 AM EST, t = Schering-Plough announces Ismail Kola as VP and Chief Scientific Officer, h= http://www.reuters.com/article/inPlayBriefing/idUSIN20070228164651SGP20070228
ts = 20070228 11:46 AM EST, t = O'Reilly Automotive forecasts 2007 earnings growth, h= http://www.reuters.com/article/marketsNews/idUSN2845320220070228
ts = 20070228 11:42 AM EST, t = Market Wrap, h= http://www.reuters.com/article/inPlayBriefing/idUSIN20070228164235WRAPX20070228
ts = 20070228 11:42 AM EST, t = Chile's CMPC net profit falls 13 pct in 2006, h= http://www.reuters.com/article/tnBasicIndustries-SP/idUSN2844077020070228
ts = 20070228 11:42 AM EST, t = Toyota Venezuela to halt March ops on currency woes, h= http://www.reuters.com/article/tnBasicIndustries-SP/idUSN2827887820070228

Each pickle file in the data directory represents one day (e.g. 20160102.pkl is for Jan 2, 2016).

Each day's file contains a list of news items.

Each news item is a dict of the form:

ts: timestamp of the form 20070228 11:46 AM EST
title: title of the news
href: link to the article to get the full content
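
The layout described above can be read with a few lines of Python. This is a minimal sketch, not the project's own read.py: the function names are hypothetical, and it assumes each .pkl file unpickles to a list of dicts with the ts, title and href keys listed above. Only unpickle files you trust, since pickle can execute arbitrary code on load.

```python
import pickle
from datetime import datetime
from pathlib import Path


def day_from_filename(path):
    """Turn a file name such as 20160102.pkl into a date (2016-01-02)."""
    return datetime.strptime(Path(path).stem, "%Y%m%d").date()


def load_day(path):
    """Load one day's pickle file; assumed to hold a list of news dicts."""
    with open(path, "rb") as f:
        return pickle.load(f)


def format_item(item):
    """Render one news dict using the ts/title/href keys."""
    return f"ts = {item['ts']}, t = {item['title']}, h = {item['href']}"


if __name__ == "__main__":
    # Assumed location: a data/ directory next to this script.
    day_file = Path("data") / "20160102.pkl"
    if day_file.exists():
        print(day_from_filename(day_file))
        for item in load_day(day_file):
            print(format_item(item))
```

Keeping the loader separate from the formatter makes it easy to feed the same list of dicts into other tools instead of printing it.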

Generate your own data set

Nothing could be easier. Just run these commands to generate pickle and CSV files.

I get the data from http://www.reuters.com/resources/archive/us.

git clone https://github.com/philipperemy/Reuters-full-data-set.git
cd Reuters-full-data-set
pip3 install beautifulsoup4 requests
python3 generate.py
python3 dump_to_csv.py DATA_DIR # where DATA_DIR is the directory containing your pickle files from generate.py

Other languages exist

Japanese: http://jp.reuters.com/resources/archive/jp/20160414.html
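
The Japanese link above suggests one archive page per day, named after the date. A small sketch of building such a URL follows; archive_url is a hypothetical helper, not part of the repository, and whether other editions (including the US one) follow the same per-day pattern, or whether these pages are still online, is an assumption.

```python
from datetime import date


def archive_url(day, host="jp.reuters.com", edition="jp"):
    """Build a per-day archive URL following the Japanese example's pattern."""
    return f"http://{host}/resources/archive/{edition}/{day:%Y%m%d}.html"


if __name__ == "__main__":
    print(archive_url(date(2016, 4, 14)))
```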
