
N2ITN / Are You Fake News

Licence: agpl-3.0
Bias detection in the news. Back and front end for areyoufakenews.com

Projects that are alternatives of or similar to Are You Fake News

Homeless Arrests Analysis
A Los Angeles Times analysis of arrests of the homeless by the LAPD
Stars: ✭ 53 (-49.52%)
Mutual labels:  journalism, jupyter-notebook, media
utopia-crm
Utopía is an open source platform for community based newsrooms to manage their subscriptions
Stars: ✭ 15 (-85.71%)
Mutual labels:  media, journalism
Notebooks
All of our computational notebooks
Stars: ✭ 292 (+178.1%)
Mutual labels:  journalism, jupyter-notebook
California Coronavirus Data
The Los Angeles Times' independent tally of coronavirus cases in California.
Stars: ✭ 188 (+79.05%)
Mutual labels:  journalism, jupyter-notebook
Aws Security Workshops
A collection of the latest AWS Security workshops
Stars: ✭ 332 (+216.19%)
Mutual labels:  lambda, jupyter-notebook
Hack The Media
This repo collects examples of intentional and unintentional hacks of media sources
Stars: ✭ 1,194 (+1037.14%)
Mutual labels:  journalism, media
Web Publisher
Superdesk Publisher - the next generation publishing platform for journalists and newsrooms.
Stars: ✭ 82 (-21.9%)
Mutual labels:  journalism, media
Ipywidgets Static
[obsolete] Static Widgets for IPython Notebooks
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Tensorflow 2.0 Quick Start Guide
Tensorflow 2.0 Quick Start Guide, published by Packt
Stars: ✭ 106 (+0.95%)
Mutual labels:  jupyter-notebook
Openomni
Documentation and library for decoding omnipod communications.
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Cgoes
Research by Carlos Góes
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Makeittalk
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Research Methods For Data Science With Python
Research Methods for Data Science with Python
Stars: ✭ 106 (+0.95%)
Mutual labels:  jupyter-notebook
Intro machine learning
Introduction to Machine Learning, a series of IPython Notebook and accompanying slideshow and video
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Sklearn tutorial
Materials for my scikit-learn tutorial
Stars: ✭ 1,521 (+1348.57%)
Mutual labels:  jupyter-notebook
Intro To Deep Learning For Nlp
The repository contains code walkthroughs which introduces Deep Learning in the field of Natural Language Processing.
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook
Msu Datascience Ml Tutorial 2018
Machine learning with Python tutorial at MSU Data Science 2018
Stars: ✭ 106 (+0.95%)
Mutual labels:  jupyter-notebook
Cc6204
Material del curso de Deep Learning de la Universidad de Chile
Stars: ✭ 106 (+0.95%)
Mutual labels:  jupyter-notebook
Harry potter nlp
Harry Potter and the Allocation of Dirichlet
Stars: ✭ 106 (+0.95%)
Mutual labels:  jupyter-notebook
Mcmc pydata london 2019
PyData London 2019 Tutorial on Markov chain Monte Carlo with PyMC3
Stars: ✭ 105 (+0%)
Mutual labels:  jupyter-notebook

Fake News Detector

In an era increasingly defined by the proliferation of misinformation and polarized politics, it's important for internet users to have context for what's on their screen. This microservice uses natural language processing and deep learning to analyze patterns of bias on any news website in real time. Each time a URL is submitted, dozens of the most recent articles are collected and analyzed for a variety of factors, from political bias to journalistic accuracy.

Microservice Architecture [WIP]

Each of the directories in the ./Docker/ folder will contain the ingredients for a microservice. These services work together to form the app. Each microservice comprises several elements that together make a stable, well-defined functional unit of code.

Docker/
    example_microservice/

        README.MD - High level overview
            Purpose
                Explain functionality
            Connections
                What services does this connect to, how, and why
            API Definition
                Outline service agreement + protocol

        Unit tests - Formally test the API Definition

        Dockerfile
            Lightweight officially supported image
            Specified version numbers of libraries

        Source Code
            Design
                Efficient, flexible, lightweight
            Quality
                Organized, commented, formatted.



Front end services

These microservices make up the production website that serves predictions to the user.

Control Flow

This service controls the logic of the production website. It directs the backend execution of web requests initiated by user activity.

Web

The web interface. Uses Flask, nginx, and gunicorn to host the dynamic and static pages.

Web scraper

Several small functions for gathering text and metadata from news sites. Includes a domain spider that inventories the article URLs on a site, and a map/reduce scraper pattern for asynchronous scraping of those URLs.
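The map/reduce pattern described above can be sketched roughly as follows. This is a minimal illustration, not the project's scraper: `fetch_article`, `reduce_articles`, and the `FAKE_PAGES` table are hypothetical names, and the download is simulated so that the example runs without network access.

```python
import asyncio

# Hypothetical stand-in for real article downloads; no network access happens.
FAKE_PAGES = {
    "https://example.com/a": "First article text.",
    "https://example.com/b": "Second article text, slightly longer.",
}

async def fetch_article(url):
    """Map step: return (url, text) for one article, or (url, None) on failure."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return url, FAKE_PAGES.get(url)

def reduce_articles(results):
    """Reduce step: keep the successful downloads in a single {url: text} dict."""
    return {url: text for url, text in results if text is not None}

async def scrape(urls):
    # Run every map step concurrently, then reduce the collected results.
    results = await asyncio.gather(*(fetch_article(u) for u in urls))
    return reduce_articles(results)

def scrape_sync(urls):
    """Convenience wrapper for callers that are not already async."""
    return asyncio.run(scrape(urls))
```

Calling `scrape_sync` with a list of URLs returns only the articles that were fetched successfully, so a failed download does not stop the rest of the batch.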

Plotter

Matplotlib code for generating plots from prediction data.

Predict

Lightweight TensorFlow/Keras NLP container for generating predictions from text.


Data Persistence


Mongo

Contains a MongoDB image with a few custom queries. Serves as the central source of state on the site.


Model Training


These services are used to collect data from labelled sources for training the convolutional neural net. The resulting model files are then used in the production site.

Gather Data

Downloads articles from websites into a common format. Articles are stored in MongoDB.

Train

Trains the model on the collected data and generates the neural network weights and word vectors.


Site Background

Data Collection


OpenSources maintains a downloadable database of news sites with tags related to journalistic accuracy.

Media Bias Fact Check maintains an online directory of news sites, categorized by political bias and accuracy.

Using a customized fork of the excellent Newspaper library, this project spiders ~3000 labelled websites for new articles and stores them by their bias tag in MongoDB. Article texts are minimally preprocessed with Unicode cleaning.
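The README does not spell out what the Unicode cleaning involves. One plausible minimal version, shown here purely as an assumption, normalizes the text to NFKC, drops control and format characters, and collapses whitespace within each line. The function name `clean_text` is hypothetical.

```python
import unicodedata

def clean_text(text):
    """Normalize to NFKC, drop control/format characters, collapse whitespace per line."""
    normalized = unicodedata.normalize("NFKC", text)
    kept = []
    for ch in normalized:
        if ch in "\n\t":
            kept.append(ch)  # whitespace controls are handled by the split below
        elif unicodedata.category(ch)[0] != "C":
            kept.append(ch)  # keep everything outside the "Other" (C*) categories
    # Collapse runs of spaces and tabs on each line while preserving line breaks.
    lines = [" ".join(line.split()) for line in "".join(kept).split("\n")]
    return "\n".join(lines).strip()
```

NFKC is chosen here because it folds ligatures and non-breaking spaces into their plain equivalents, which keeps downstream tokenization consistent; a different normalization form may suit other corpora better.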

Modeling


Using the collected data, a TF-IDF vectorizer is fitted on the article collection. A custom-built convolutional neural network is trained in a multi-label classification scheme using a binary cross-entropy loss function with a sigmoid output layer. The model is deployed to AWS Lambda.
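To make the two building blocks named above concrete, here is a small pure-Python sketch of TF-IDF weighting and of binary cross-entropy over sigmoid outputs. It is not the project's training code and does not implement the convolutional network. The unsmoothed `tf * log(N / df)` weighting and the function names are assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_crossentropy(y_true, logits):
    """Mean binary cross-entropy, treating each label as an independent yes/no."""
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for y, z in zip(y_true, logits):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

Because each label gets its own sigmoid rather than sharing a softmax, an article can score high on several tags at once, which is what makes the scheme multi-label.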

Deployment


The website is published via Flask. After a user enters a news site URL, the webserver scans the site for the 150 most recent articles and gathers their URLs. Asynchronously, the text at each URL is downloaded using AWS Lambda. The article text is then sent to another AWS Lambda function that holds the trained neural network model. Results are plotted via matplotlib and rendered in the webpage.
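The text above does not say how per-article predictions are combined before plotting. A simple possibility, offered only as an assumption, is to average each label's score across all articles from the site. The function name `aggregate_site_scores` and the label names in the usage note are hypothetical.

```python
from statistics import mean

def aggregate_site_scores(article_scores):
    """Average per-article label scores into one score per label for the site.

    article_scores is a list of {label: score} dicts, one per article.
    A label missing from an article counts as 0.0 for that article.
    """
    labels = sorted({label for scores in article_scores for label in scores})
    return {
        label: mean(scores.get(label, 0.0) for scores in article_scores)
        for label in labels
    }
```

For example, `aggregate_site_scores([{"left": 0.8, "satire": 0.1}, {"left": 0.4}])` yields one averaged value each for `left` and `satire`, which could then be passed to the plotting service.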

Deeper


For a much more detailed discussion of the project, please see this living presentation on Google Slides: https://docs.google.com/presentation/d/1wwnTx0hKB2MJXGPBHbAzElQnCPKH4UFicfnrzsxQG2g/edit?usp=sharing

Open Source

This is GNU GPL licensed, so anyone can use it as long as it remains open source. Anyone who is interested in contributing is welcome to head over to the Data For Democracy repo, where issues are being tracked. https://github.com/Data4Democracy/are-you-fake-news

Contact

aracel.io
