sahitpj / India-WhatsAppFakeNews-Dataset

Licence: GPL-2.0
News articles on WhatsApp-related deaths, along with other articles from across India during that period

Programming Languages

Python (139,335 projects; #7 most used programming language)
TeX (3,793 projects)

Projects that are alternatives to or similar to India-WhatsAppFakeNews-Dataset

cl-torrents
Searching torrents on popular trackers - CLI, readline, GUI, web client. Tutorial and binaries (issue tracker on https://gitlab.com/vindarel/cl-torrents/)
Stars: ✭ 83 (+102.44%)
Mutual labels:  web-scraping
rreddit
𝐫⟋ Get Reddit data
Stars: ✭ 49 (+19.51%)
Mutual labels:  web-scraping
automation-scripts
Simple scripts that I'm using to automate the boring things.
Stars: ✭ 14 (-65.85%)
Mutual labels:  web-scraping
grailer
web scraping tool for grailed.com
Stars: ✭ 30 (-26.83%)
Mutual labels:  web-scraping
extractnet
A Dragnet that also extracts the author, headline, date, and keywords from context
Stars: ✭ 52 (+26.83%)
Mutual labels:  web-scraping
Final Project
Using Twitter Ego Network Analysis to Detect Sources of Fake News
Stars: ✭ 44 (+7.32%)
Mutual labels:  fake-news
selectorlib
A library to read a YML file with Xpath or CSS Selectors and extract data from HTML pages using them
Stars: ✭ 53 (+29.27%)
Mutual labels:  web-scraping
IMDB-Scraper
Scrapy project for scraping data from IMDb; the resulting movie dataset includes data on 58,623 movies.
Stars: ✭ 37 (-9.76%)
Mutual labels:  web-scraping
browser-pool
A Node.js library to easily manage and rotate a pool of web browsers, using any of the popular browser automation libraries like Puppeteer, Playwright, or SecretAgent.
Stars: ✭ 71 (+73.17%)
Mutual labels:  web-scraping
fake-news-datasets
This repository contains list of available fake news datasets for data mining.
Stars: ✭ 28 (-31.71%)
Mutual labels:  fake-news
htmlunit
🕸🧰☕️Tools to Scrape Dynamic Web Content via the 'HtmlUnit' Java Library
Stars: ✭ 39 (-4.88%)
Mutual labels:  web-scraping
iww
AI based web-wrapper for web-content-extraction
Stars: ✭ 61 (+48.78%)
Mutual labels:  web-scraping
leetcode-compensation
Compensation analysis on the posts scraped from leetcode.com/discuss/compensation. At present, the reports have been generated only for Indian cities.
Stars: ✭ 83 (+102.44%)
Mutual labels:  web-scraping
Data-Wrangling-with-Python
Simplify your ETL processes with these hands-on data sanitation tips, tricks, and best practices
Stars: ✭ 90 (+119.51%)
Mutual labels:  web-scraping
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads for a requested product and dumps them into NoSQL MongoDB.
Stars: ✭ 15 (-63.41%)
Mutual labels:  web-scraping
rymscraper
Python API to extract data from rateyourmusic.com.
Stars: ✭ 63 (+53.66%)
Mutual labels:  web-scraping
WaWebSessionHandler
(DISCONTINUED) Save WhatsApp Web Sessions as files and open them everywhere!
Stars: ✭ 27 (-34.15%)
Mutual labels:  web-scraping
bullshit-detector
🔍 Protect your loved ones from untrustworthy 🇸🇰 and 🇨🇿 content
Stars: ✭ 24 (-41.46%)
Mutual labels:  fake-news
feedIO
A Feed Aggregator that Knows What You Want to Read.
Stars: ✭ 26 (-36.59%)
Mutual labels:  fake-news
Node-js-functionalities
This repository contains useful RESTful APIs and Node.js functionality, with tutorial code for mastering Node.js; all tutorials have been published on medium.com, and the links are given below.
Stars: ✭ 69 (+68.29%)
Mutual labels:  web-scraping

India WhatsApp Fake News Dataset

This repository consists of about 1 million+ news articles scraped from the Times of India website between late 2017 and June 2018.
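
The project's actual spider ships with the repository (see Details below); purely as an illustration of the general approach, here is a minimal Scrapy spider sketch. The start URL, the "articleshow" link filter, and the CSS selectors are assumptions for illustration, not the project's actual code.

```python
import scrapy


class ToiArchiveSpider(scrapy.Spider):
    """Minimal sketch of a Times of India archive spider.

    The start URL and the CSS selectors below are placeholders;
    inspect the real page structure before relying on them.
    """

    name = "toi_archive"
    # Hypothetical archive listing page.
    start_urls = ["https://timesofindia.indiatimes.com/archive.cms"]

    def parse(self, response):
        # Follow links that look like article pages (assumed URL pattern).
        for href in response.css("a::attr(href)").getall():
            if "articleshow" in href:
                yield response.follow(href, callback=self.parse_article)

    def parse_article(self, response):
        # Extract a headline and body text with placeholder selectors.
        yield {
            "url": response.url,
            "headline": response.css("h1::text").get(),
            "text": " ".join(response.css("div.Normal::text").getall()),
        }
```

Saved as toi_archive.py, a spider of this shape could be run with `scrapy runspider toi_archive.py -o articles.json`.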

The data was then checked for keywords that could point us to news articles covering WhatsApp fake-news cases in India, which were a growing concern at the time. To be clear, this is not a dataset of fake articles.

Details

  • The file Data.csv lists the matched articles, along with the date, place, and the keywords mentioned (a loading sketch follows this list).
  • The web scraper consists of a Scrapy spider which can be used to extract articles from the news site.
  • The files archivelist_finder.py and extract_csv_data.py can be used as references for the process.
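
To get started with Data.csv, here is a minimal loading sketch; the column names are assumptions based on the description above, so check the actual header row first.

```python
import pandas as pd

# Load the index of matched articles. The column names ("date", "place",
# "keywords") are assumed from the description above -- verify them
# against the file's actual header before relying on this.
df = pd.read_csv("Data.csv")
print(df.columns.tolist())

# Example: if a "place" column exists, count matched articles per place.
if "place" in df.columns:
    print(df["place"].value_counts().head(10))
```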

Labelling Data and Insights

After the text files were preprocessed, they were searched for a selected set of keywords chosen to find news articles with a good probability of being about fake news. The matches were then cross-checked to confirm that the stories did correspond to fake-news cases.
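
As a minimal sketch of that keyword pass, assuming the article text files sit in a local directory (the keyword list here is purely illustrative, not the list the project actually used):

```python
from pathlib import Path

# Illustrative keywords only; the project's real list is not reproduced here.
KEYWORDS = {"whatsapp", "lynching", "rumour", "mob"}


def find_candidates(article_dir):
    """Yield (filename, matched keywords) for articles mentioning any keyword."""
    for path in Path(article_dir).glob("*.txt"):
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        matched = {kw for kw in KEYWORDS if kw in text}
        if matched:
            yield path.name, matched


for name, matched in find_candidates("articles"):
    print(name, sorted(matched))
```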

This data was used by the BBC to help generate insights about fake-news trends in India: what types of fake news were being spread, and how they were being spread.

The complete set of articles, as .txt files, can be found at the following link: https://drive.google.com/file/d/19IbOlTO18BAXYRQoVkWfQ6paad4v9sfB/view?usp=sharing

Interesting things which could be done with this dataset

A few ideas, in case you were wondering how this dataset could be useful:

  1. Understanding fake-news trends (the initial idea of this project). This could probably be extended to identifying broader trends and interesting facts by studying the articles contemporary to them.

  2. Understanding news headlines: do the headlines of these articles accurately portray their content? (See the similarity sketch after this list.)

  3. Is there a particular trend in newspaper articles? Newspaper websites have been known to publish clickbait articles to increase the CTR of their websites, even though the content can be questionable.

  4. Understanding the quality of news articles. What makes a good news article?

These are a few ideas which I believe have large potential in the field of data journalism. Feel free to open an issue regarding any query related to this dataset.
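
For idea 2, one possible starting point (a sketch assuming scikit-learn, with toy strings standing in for a real headline/body pair from the dataset) is to compare each headline against its article body using TF-IDF cosine similarity; a low score may flag headlines that do not reflect their content.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for a real (headline, body) pair from the dataset.
headline = "Mob violence linked to WhatsApp rumours in rural district"
body = (
    "Police said the attack followed rumours spread on WhatsApp. "
    "Officials appealed for calm and warned against forwarding "
    "unverified messages."
)

# Fit a shared vocabulary over both texts, then compare the two vectors.
vectorizer = TfidfVectorizer(stop_words="english")
vectors = vectorizer.fit_transform([headline, body])
score = cosine_similarity(vectors[0], vectors[1])[0, 0]
print(f"headline/body similarity: {score:.2f}")
```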

DOI: https://zenodo.org/badge/latestdoi/157810232

Please do cite this repository if you happen to use this dataset in your research. 😃
