the-markup / investigation-youtube-ad-placements

Licence: other
Data and code from our stories, "Google Has a Secret Blocklist that Hides YouTube Hate Videos from Advertisers—But It’s Full of Holes," and "Google Blocks Advertisers from Targeting Black Lives Matter YouTube Videos."

YouTube Ad Placements

This repository contains code and data to reproduce the findings featured in our stories, "Google Has a Secret Blocklist that Hides YouTube Hate Videos from Advertisers—But It’s Full of Holes," and "Google Blocks Advertisers from Targeting Black Lives Matter YouTube Videos" from our series Google the Giant.

Our methodology is described in "How We Discovered Google’s Hate Blocklist for Ad Placements on YouTube," and "How We Discovered Google’s Social Justice Blocklist for YouTube Ad Placements."

Data that we collected and analyzed are in the data folder.

Jupyter notebooks used for data collection, preprocessing and analysis are in the notebooks folder.

Advertisers can use Google's ad portal to search for YouTube videos and channels to advertise on, using keywords such as "hiking gear reviews".

Warning: this repository contains many offensive terms and expletives.

Installation

Python

Make sure you have Python 3.6+ installed. We used Miniconda to create a Python 3.8 virtual environment.

Then install the Python packages:
pip install -r requirements.txt

Notebooks

These notebooks are intended to be run sequentially, but they are not dependent on one another.

0-data-collection.ipynb

This notebook shows how we interacted with the "PlacementSuggestionService" API from "ads.google.com". We sent each term from terms.py through the API. Use this notebook for reference only: it is not functional because the request parameters have expired or been redacted.
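
The collection loop can be sketched roughly as follows. This is an illustration only, not the notebook's code: `fetch_suggestions` is a hypothetical stand-in for the authenticated request to the API (whose parameters are redacted), and the one-JSON-file-per-term layout is an assumption based on file names such as "we wuz kangz.json" in the data folder.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Iterable, List


def collect_responses(
    terms: Iterable[str],
    fetch_suggestions: Callable[[str], dict],
    out_dir: Path,
) -> List[Path]:
    """Send each term through `fetch_suggestions` and save one JSON file per term."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for term in terms:
        path = out_dir / f"{term}.json"
        if path.exists():  # skip terms that were already collected
            continue
        response = fetch_suggestions(term)
        path.write_text(json.dumps(response), encoding="utf-8")
        saved.append(path)
    return saved


def fake_fetch(term: str) -> dict:
    # Hypothetical placeholder: the real request needs parameters not shown here.
    return {"term": term, "suggestions": []}


with tempfile.TemporaryDirectory() as tmp:
    written = collect_responses(["hiking gear reviews"], fake_fetch, Path(tmp))
    written_names = [p.name for p in written]
```

Passing the request function in as an argument keeps the loop testable without credentials; the real notebook may be organized differently.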

1-data-preprocessing.ipynb

Parsing the API responses and fetching the suggested videos and channels for each term we sent to the API.

2-data-analysis-hate.ipynb

The bulk of stats and tables for our hate methodology.

3-suggestion-analysis.ipynb

This notebook looks at videos and channels suggested by the API for hate terms. We cross-reference these suggestions with channels identified as "extremist" or "alternative" by researchers from Dartmouth College, Northeastern University, and University of Exeter and provided to The Markup by the ADL, as well as channels that researchers at EPFL and UFMG identified as "alt-right" or "alt-lite".
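
As a rough sketch of what cross-referencing means here, the comparison boils down to a set intersection between suggested channel IDs and each labeled channel list. The function name, the data shapes, and the channel IDs below are all made up for illustration; the notebook itself may work differently.

```python
from typing import Dict, Iterable, Set


def channel_overlap(
    suggested_channel_ids: Iterable[str],
    labeled_channels: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, for each label, the suggested channels that appear in that list."""
    suggested = set(suggested_channel_ids)  # de-duplicate repeated suggestions
    return {label: suggested & ids for label, ids in labeled_channels.items()}


# Made-up channel IDs, for illustration only.
suggested = ["UC_aaa", "UC_bbb", "UC_ccc", "UC_aaa"]
labeled = {
    "alt-right": {"UC_aaa", "UC_zzz"},
    "alt-lite": {"UC_ccc"},
}
overlap = channel_overlap(suggested, labeled)
counts = {label: len(ids) for label, ids in overlap.items()}
```

Counting unique matches per label, as in `counts`, gives the kind of tally described for data/output/channel_overlap.csv.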

4-data-analysis-social-justice.ipynb

The bulk of stats and tables for our social justice methodology.

5-rerun-and-check-status.ipynb

After we shared our findings with Google, we re-ran the analysis on data collected on March 31, 2021. This notebook checks what changed from the original data we collected on November 20, 2020.
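
One way to picture the comparison between the two collection dates is a per-term diff of API statuses. The sketch below assumes each run has been reduced to a mapping from term to status (using the status names from the example responses: blocked, partially_blocked, full, empty). The function name and the sample terms are hypothetical.

```python
from typing import Dict, Tuple


def status_changes(
    original: Dict[str, str], rerun: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Map each term whose status differs to (original_status, rerun_status)."""
    return {
        term: (original[term], rerun[term])
        for term in sorted(original.keys() & rerun.keys())  # terms in both runs
        if original[term] != rerun[term]
    }


# Placeholder terms and statuses, not real results.
original = {"term a": "full", "term b": "blocked", "term c": "empty"}
rerun = {"term a": "blocked", "term b": "blocked", "term c": "empty"}
changes = status_changes(original, rerun)
```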

utils.py

Contains functions to parse and decipher the API responses.

terms.py

This contains lists of terms used in the series: hate terms sourced from the SPLC, RationalWiki.org, and Muslim Advocates; social_justice terms sourced from Color of Change, MediaJustice, Mijente, and Muslim Advocates; adhoc terms that were submitted for comparison against terms in the other lists; and noise, which contains randomly generated strings.
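
For readers curious how a noise list of this kind could be produced, here is a minimal sketch. The string length, the lowercase alphabet, and the fixed seed are assumptions chosen for the example, not the settings used for the actual list.

```python
import random
import string
from typing import List


def make_noise_terms(n: int, length: int = 12, seed: int = 0) -> List[str]:
    """Generate `n` random lowercase strings to use as nonsense search terms."""
    rng = random.Random(seed)  # seeded so the same list can be regenerated
    return [
        "".join(rng.choices(string.ascii_lowercase, k=length)) for _ in range(n)
    ]


noise = make_noise_terms(3)
```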

Refer to the "Data" section below for the API status of each of these terms.

Data

This directory is where inputs, intermediaries and outputs are saved.

data
├── reference
│   ├── placements_api_example_responses
│   │   ├── blocked.json
│   │   ├── empty.json
│   │   ├── full.json
│   │   └── partially_blocked.json
│   └── what_is_blocked.xlsx
├── input
│   ├── channel_lists
│   ├── hate_terms_background_info.csv
│   ├── placements_api
│   │   ├── adhoc
│   │   ├── blocked
│   │   ├── blocked_basewords
│   │   ├── hate
│   │   ├── noise
│   │   ├── policy
│   │   └── social_justice
│   ├── placements_api_deep3
│   │   ├── we wuz kangz.json
│   │   ├── white ethnostate.json
│   │   └── whitegenocide.json
│   └── video_metadata
│       ├── deep_catalog_wwk_wg_we.csv
│       └── topline_hate_videos.csv
└── output
    ├── channel_overlap.csv
    ├── tables
    ├── placements_api_keyword_status
    │   ├── adhoc.csv
    │   ├── basewords.csv
    │   ├── hate.csv
    │   ├── policy.csv
    │   └── social_justice.csv
    └── placements_api_suggestions
        ├── channels_for_hate_terms.csv
        ├── channels_for_social_justice_terms.csv
        ├── videos_for_hate_terms.csv
        └── videos_for_social_justice_terms.csv
| filename or directory | description |
| --- | --- |
| data/reference/placements_api_example_responses/ | Examples of blocked, partially_blocked, full and empty responses from the "PlacementSuggestionService" API. See the methodology for details, and determine_status in notebooks/utils.py for implementation. |
| data/reference/what_is_blocked.xlsx | A spreadsheet listing the type of API response for every term in our investigation. |
| data/output/tables/ | CSVs of tables that are in the methodology. |
| data/input/placements_api/ | Responses for keywords from notebooks/terms.py that we submitted to the "PlacementSuggestionService" API. Each subdirectory is organized by the keyword list used. blocked are terms that we resubmitted after removing spaces, and blocked_basewords are terms that were blocked and resubmitted word-by-word. |
| data/output/placements_api_keyword_status/ | The API status of keywords from notebooks/terms.py after being sent through the "PlacementSuggestionService" API. |
| data/output/placements_api_suggestions/ | The suggested YouTube videos and channels for each search term. |
| data/input/placements_api_deep3/ | API responses for the hate terms "we wuz kangz", "white ethnostate" and "white genocide" beyond the topline 20 video suggestions. |
| data/input/video_metadata/ | Video metadata for suggested videos from the YouTube Data API (v3). Collected with the unofficial Python client (YouTube Data API). |
| data/input/channel_lists/ | Channel names and IDs that researchers at EPFL and UFMG identified as "alt-right" and "alt-lite" in their 2020 ACM FAT* paper "Auditing radicalization pathways on YouTube". We also used a supplementary "extremist" and "alternative" channel list created by researchers for the ADL report "Exposure to Alternative & Extremist Content on YouTube"; however, that list is available only by request. |
| data/output/channel_overlap.csv | The count of unique "extremist", "alternative", "alt-right" and "alt-lite" videos and channels from the topline suggestions for hate terms we sent through the "PlacementSuggestionService" API. The suggested channels are listed in the channels column. |
| data/input/hate_terms_background_info.csv | Links for more information about each of the terms in the hate list. |
| data/z_data_rerun/ | API responses from re-running the experiment on March 31, 2021. Organized identically to data/input/placements_api/ and data/output/placements_api_keyword_status/. |
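
The two resubmission variants described for data/input/placements_api/ (blocked and blocked_basewords) can be illustrated with a short sketch. The helper names are hypothetical, and the example term is the harmless one used earlier in this README rather than a term from our lists.

```python
from typing import List


def without_spaces(term: str) -> str:
    """Variant resubmitted for the `blocked` list: the term with spaces removed."""
    return term.replace(" ", "")


def basewords(term: str) -> List[str]:
    """Variants resubmitted for `blocked_basewords`: each word on its own."""
    return term.split()


term = "hiking gear reviews"
joined = without_spaces(term)
words = basewords(term)
```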