several27 / Fakenewsrecognition

Projects that are alternatives of or similar to Fakenewsrecognition

Ephys Analysis
Scripts and utilities for processing electrophysiology data
Stars: ✭ 7 (-56.25%)
Mutual labels:  jupyter-notebook
Chinese data analysis
An Analysis of the Distribution Law of Word Frequency and Stroke Number in Chinese
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Autovega
Jupyter widget for quick visualization of Pandas dataframes using Vega and Altair
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Misc
misc
Stars: ✭ 7 (-56.25%)
Mutual labels:  jupyter-notebook
Aicrystallographer
Here, we will upload our deep/machine learning models and 'workflows' (such as AtomNet, DefectNet, SymmetryNet, etc) that aid in automated analysis of atomically resolved images
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Rebalance.portfolio.python
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Neurosleeve
Stars: ✭ 7 (-56.25%)
Mutual labels:  jupyter-notebook
Nyu Intro To Python Spring 2018
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Machine Learning For Telecommunications
A base solution that helps customers generate insights from their data. The solution provides a framework for an end-to-end machine learning process, including ad-hoc data exploration, data processing and feature engineering, and model training and evaluation. This baseline provides the foundation for applying industry-specific data and creating models to release industry-specific ML solutions.
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Deepem For Weakly Supervised Detection
MICCAI18 DeepEM: Deep 3D ConvNets with EM for Weakly Supervised Pulmonary Nodule Detection
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Credit Card Fraud Detection Dataset
Classification Problem to detect credit card fraud
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Hass Google Coral
RETIRED - instead use https://github.com/robmarkcole/HASS-Deepstack-object
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Fundamentals Of Deep Learning For Computer Vision Nvidia
The repository includes Notebook files and documents from the course I completed at the NVIDIA Deep Learning Institute. Feel free to access and work with the Notebooks and other files.
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Rubik research
Experiments with using neural nets to solve a Rubik's Cube - read README first
Stars: ✭ 7 (-56.25%)
Mutual labels:  jupyter-notebook
Kaggle Start
A Kaggle tutorial for beginners
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Gcdri ts cat ml
Stars: ✭ 7 (-56.25%)
Mutual labels:  jupyter-notebook
Code Demos
Code exercises and demos complementing lecture materials.
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Webinar Titanic
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Julia Machine Learning Review
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook
Tensorflow Tutorial
Stars: ✭ 16 (+0%)
Mutual labels:  jupyter-notebook

Project Brief

Student’s Name: Maciej Szpakowski (ms3u14 - 28170911)
Project Title: Fake News Recognition
Supervisor’s Name: Dr. Jonathon Hare

Problem

Nowadays, with the increased ease of online communication and social media, people have immediate access to many different types of information. Some of it comes from reliable portals; however, plenty of news comes from gossip magazines and clickbait sites that spread scientifically dubious claims, hate speech and biased reporting. The problem becomes even more prominent as people often share articles with controversial titles, increasing their popularity. Unfortunately, with ever more sources appearing on the internet, manual validation is becoming impossible.

Goals

The goal of this project is to explore the issue of fake news and create a deep learning based algorithm for automatic labelling of texts that are potentially deceptive. As this is a far-reaching problem that affects many areas and different types of media, the first step is to narrow it down based on available data and previous research.

The second step is to use this data to create or choose the most effective algorithm for determining the veracity of news. Possible approaches include context-based (e.g., using the link network), knowledge-based (e.g., using information from Wikipedia), style-based (e.g., using text classification), or an amalgamation of these.
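
As a feel for what the style-based route looks like, here is a minimal sketch of a text classifier using TF-IDF features and logistic regression via scikit-learn. This is only an illustrative baseline, not the deep learning model the project targets, and the example texts and labels are made up:

```python
# Minimal style-based baseline: TF-IDF features + logistic regression.
# The texts and labels below are illustrative toy data, not the real corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Scientists publish peer-reviewed study on vaccine efficacy",
    "Government confirms infrastructure budget in official statement",
    "SHOCKING: this one weird trick doctors don't want you to know",
    "You won't BELIEVE what this celebrity said about aliens",
]
labels = ["reliable", "reliable", "fake", "fake"]

# Word and bigram TF-IDF capture stylistic cues (sensationalist phrasing).
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), lowercase=True),
    LogisticRegression(),
)
model.fit(texts, labels)

print(model.predict(["SHOCKING secret trick they don't want you to know"])[0])
```

A real style-based model would replace the linear classifier with, for example, an LSTM over word embeddings, but the train/predict shape of the task stays the same.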

Additionally, if time allows, the last step is to create a simple browser extension that automatically detects fake news and informs the user about it (e.g., by showing an alert on the feed or on the actual news page). It will also allow for gathering feedback from users to assess the real-world performance of the created approach.

Scope

One of the most significant limitations of this work is the data that is openly available. The focus of this work is on using novel deep-learning-based approaches for natural language processing. Some of these (such as LSTMs built on word embeddings) usually require large quantities of data; therefore, additional website crawling may be necessary.

Finally, the term “fake news” is sometimes associated with political views that some people disagree with. While we will do our best to ensure the datasets used for training contain factual information and are accurately classified, the purpose of this work is not to fact-check all the input data to guarantee the initial labels are 100% correct.

Data pipeline

  1. Run the scraper. From inside the spider directory, run:

    export PATH_FNR=/home/several27/FakeNewsRecognition/
    scrapy crawl news -s LOG_FILE="../data/7_opensources_co/news_spider.log.1" -s JOBDIR="../data/7_opensources_co/news_spider_job_1/"
    
  2. Parse the scraped dataset by creating another DB with title and content limiters

  3. Download the webhose dataset and convert to a DB (Webhose analytics.ipynb)

  4. Download the nytimes dataset (scrape_nytimes.py)

  5. Append the webhose and nytimes datasets to the scraped one setting the type to reliable and source to webhose and nytimes (news_cleaned + webhose + nytimes merge.ipynb)

  6. Convert the dataset to csv (data_full.py)

  7. Create a dataset sample (data_sample.py)

  8. Generate fasttext embeddings (Generate fastText.ipynb)
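
Step 6 above (the role data_full.py plays) can be sketched as a streaming SQLite-to-CSV export. The database path, table name, and column list here are assumptions for illustration; the actual schema lives in the repository:

```python
# Sketch of the DB-to-CSV conversion step. Table and column names are
# illustrative assumptions, not necessarily those used by data_full.py.
import csv
import sqlite3

def export_articles_to_csv(db_path, csv_path, table="articles",
                           columns=("id", "title", "content", "type", "source"),
                           batch_size=10_000):
    """Stream rows from SQLite into a CSV without loading the table into memory."""
    conn = sqlite3.connect(db_path)
    cursor = conn.execute(f"SELECT {', '.join(columns)} FROM {table}")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        while True:
            rows = cursor.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
    conn.close()
```

Fetching in batches keeps memory flat, which matters at the multi-million-row scale of the scraped corpus.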

Running machines

To run the machines, the supplied CSV from FakeNewsCorpus needs to be preprocessed using data_generator.py. The fastText embeddings are also necessary.

Look into those counts

    [email protected]:~/FakeNewsRecognition$ python3 2018-02-21\ -\ count_unique_articles.py
    8529090it [14:14, 9987.15it/s]
    Counter all: 8529090
    Counter short_content: 0
    Unique hashes count content: 6248655
    Unique hashes count url: 8462558
    Unique hashes count title_url: 8465409
    Unique hashes count title_content: 7297002
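
The deduplication counts above come from hashing each field (or field pair) and counting distinct hashes. A minimal sketch of that idea, with illustrative records in place of the 8.5M-row database:

```python
# Sketch of the unique-article counting: hash a field (or concatenated
# fields) per article and count distinct hashes. Records are illustrative.
import hashlib

def md5(text):
    """Hash a field so dedup counts don't require keeping full texts in memory."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

articles = [
    {"title": "A", "content": "same body"},
    {"title": "B", "content": "same body"},  # duplicate content, new title
    {"title": "A", "content": "same body"},  # exact duplicate
]

unique_content = {md5(a["content"]) for a in articles}
unique_title_content = {md5(a["title"] + a["content"]) for a in articles}

print("all:", len(articles))                                # 3
print("unique content:", len(unique_content))               # 1
print("unique title+content:", len(unique_title_content))   # 2
```

Comparing the per-field counts, as in the log above, shows which fields carry the duplication (here, many articles share content but fewer share both title and content).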

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].