sahitpj / India-WhatsAppFakeNews-Dataset

Licence: GPL-2.0
News articles on WhatsApp-related deaths, along with other articles from across India during that period

Programming Languages

Python (139,335 projects; #7 most used programming language)
TeX (3,793 projects)

Projects that are alternatives to or similar to India-WhatsAppFakeNews-Dataset

cl-torrents
Searching torrents on popular trackers - CLI, readline, GUI, web client. Tutorial and binaries (issue tracker on https://gitlab.com/vindarel/cl-torrents/)
Stars: ✭ 83 (+102.44%)
Mutual labels:  web-scraping
rreddit
𝐫⟋ Get Reddit data
Stars: ✭ 49 (+19.51%)
Mutual labels:  web-scraping
automation-scripts
Simple scripts that I'm using to automate the boring things.
Stars: ✭ 14 (-65.85%)
Mutual labels:  web-scraping
grailer
web scraping tool for grailed.com
Stars: ✭ 30 (-26.83%)
Mutual labels:  web-scraping
extractnet
A Dragnet that also extracts the author, headline, date, and keywords from context
Stars: ✭ 52 (+26.83%)
Mutual labels:  web-scraping
Final Project
Using Twitter Ego Network Analysis to Detect Sources of Fake News
Stars: ✭ 44 (+7.32%)
Mutual labels:  fake-news
selectorlib
A library to read a YML file with Xpath or CSS Selectors and extract data from HTML pages using them
Stars: ✭ 53 (+29.27%)
Mutual labels:  web-scraping
IMDB-Scraper
Scrapy project for scraping data from IMDb; the resulting movie dataset includes data on 58,623 movies.
Stars: ✭ 37 (-9.76%)
Mutual labels:  web-scraping
browser-pool
A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+73.17%)
Mutual labels:  web-scraping
fake-news-datasets
This repository contains list of available fake news datasets for data mining.
Stars: ✭ 28 (-31.71%)
Mutual labels:  fake-news
htmlunit
🕸🧰☕️Tools to Scrape Dynamic Web Content via the 'HtmlUnit' Java Library
Stars: ✭ 39 (-4.88%)
Mutual labels:  web-scraping
iww
AI based web-wrapper for web-content-extraction
Stars: ✭ 61 (+48.78%)
Mutual labels:  web-scraping
leetcode-compensation
Compensation analysis on the posts scraped from leetcode.com/discuss/compensation. At present, the reports have been generated only for Indian cities.
Stars: ✭ 83 (+102.44%)
Mutual labels:  web-scraping
Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (+119.51%)
Mutual labels:  web-scraping
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads for a requested product and dumps them into NoSQL MongoDB.
Stars: ✭ 15 (-63.41%)
Mutual labels:  web-scraping
rymscraper
Python API to extract data from rateyourmusic.com.
Stars: ✭ 63 (+53.66%)
Mutual labels:  web-scraping
WaWebSessionHandler
(DISCONTINUED) Save WhatsApp Web Sessions as files and open them everywhere!
Stars: ✭ 27 (-34.15%)
Mutual labels:  web-scraping
bullshit-detector
🔍 Protect your loved ones from untrustworthy 🇸🇰 and 🇨🇿 content
Stars: ✭ 24 (-41.46%)
Mutual labels:  fake-news
feedIO
A Feed Aggregator that Knows What You Want to Read.
Stars: ✭ 26 (-36.59%)
Mutual labels:  fake-news
Node-js-functionalities
This repository contains useful RESTful APIs and Node.js functionality, with tutorial code for mastering Node.js; all tutorials have been published on medium.com, and the links are given below.
Stars: ✭ 69 (+68.29%)
Mutual labels:  web-scraping

India WhatsApp Fake News Dataset

This repository consists of about 1 million+ news articles scraped from the Times of India website between late 2017 and June 2018.
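
The project's actual spider ships with the repository (see Details below); purely as an illustration of the general approach, here is a minimal Scrapy spider sketch. The start URL, the "articleshow" link filter, and the CSS selectors are assumptions for illustration, not the project's actual code.

```python
import scrapy


class ToiArchiveSpider(scrapy.Spider):
    """Minimal sketch of a Times of India archive spider.

    The start URL and the CSS selectors below are placeholders;
    inspect the real page structure before relying on them.
    """

    name = "toi_archive"
    # Hypothetical archive listing page.
    start_urls = ["https://timesofindia.indiatimes.com/archive.cms"]

    def parse(self, response):
        # Follow links that look like article pages (assumed URL pattern).
        for href in response.css("a::attr(href)").getall():
            if "articleshow" in href:
                yield response.follow(href, callback=self.parse_article)

    def parse_article(self, response):
        # Extract a headline and body text with placeholder selectors.
        yield {
            "url": response.url,
            "headline": response.css("h1::text").get(),
            "text": " ".join(response.css("div.Normal::text").getall()),
        }
```

Saved as toi_archive.py, a spider of this shape could be run with `scrapy runspider toi_archive.py -o articles.json`.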

The data was then checked for keywords that could point us to news articles covering WhatsApp fake-news cases in India, which were a growing concern at the time. To be clear, this is not a dataset of fake articles.

Details

  • The file Data.csv lists the matched articles, along with the date, place, and the keywords mentioned (a loading sketch follows this list).
  • The web scraper consists of a Scrapy spider which can be used to extract articles from the news site.
  • The files archivelist_finder.py and extract_csv_data.py can be used as references for the process.
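
To get started with Data.csv, here is a minimal loading sketch; the column names are assumptions based on the description above, so check the actual header row first.

```python
import pandas as pd

# Load the index of matched articles. The column names ("date", "place",
# "keywords") are assumed from the description above -- verify them
# against the file's actual header before relying on this.
df = pd.read_csv("Data.csv")
print(df.columns.tolist())

# Example: if a "place" column exists, count matched articles per place.
if "place" in df.columns:
    print(df["place"].value_counts().head(10))
```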

Labelling Data and Insights

After the text files were preprocessed, they were searched for a selected set of keywords chosen to find news articles with a good probability of being about fake news. The matches were then cross-checked to confirm that the stories did correspond to fake-news cases.
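
As a minimal sketch of that keyword pass, assuming the article text files sit in a local directory (the keyword list here is purely illustrative, not the list the project actually used):

```python
from pathlib import Path

# Illustrative keywords only; the project's real list is not reproduced here.
KEYWORDS = {"whatsapp", "lynching", "rumour", "mob"}


def find_candidates(article_dir):
    """Yield (filename, matched keywords) for articles mentioning any keyword."""
    for path in Path(article_dir).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        matched = {kw for kw in KEYWORDS if kw in text}
        if matched:
            yield path.name, matched


for name, matched in find_candidates("articles"):
    print(name, sorted(matched))
```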

This data was used by the BBC to help generate insights about fake-news trends in India: what types of fake news were being spread, and how they were being spread.

The complete set of articles, as .txt files, can be found at the following link: https://drive.google.com/file/d/19IbOlTO18BAXYRQoVkWfQ6paad4v9sfB/view?usp=sharing

Interesting things which could be done with this dataset

A few ideas, in case you were wondering how this dataset could be useful:

  1. Understanding fake-news trends (the initial idea of this project). This could probably be extended to identifying broader trends and interesting facts by studying the articles contemporary to them.

  2. Understanding news headlines: do the headlines of these articles accurately portray their content? (See the similarity sketch after this list.)

  3. Is there a particular trend in newspaper articles? Newspaper websites have been known to publish clickbait articles to increase the CTR of their websites, even though the content can be questionable.

  4. Understanding the quality of news articles. What makes a good news article?

These are a few ideas which I believe have large potential in the field of data journalism. Feel free to open an issue regarding any query related to this dataset.
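
For idea 2, one possible starting point (a sketch assuming scikit-learn, with toy strings standing in for a real headline/body pair from the dataset) is to compare each headline against its article body using TF-IDF cosine similarity; a low score may flag headlines that do not reflect their content.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for a real (headline, body) pair from the dataset.
headline = "Mob violence linked to WhatsApp rumours in rural district"
body = (
    "Police said the attack followed rumours spread on WhatsApp. "
    "Officials appealed for calm and warned against forwarding "
    "unverified messages."
)

# Fit a shared vocabulary over both texts, then compare the two vectors.
vectorizer = TfidfVectorizer(stop_words="english")
vectors = vectorizer.fit_transform([headline, body])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]
print(f"headline/body similarity: {score:.2f}")
```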

DOI: https://zenodo.org/badge/latestdoi/157810232

Please do cite this repository if you happen to use this dataset in your research. 😃
