irworkshop / campfin

License: CC-BY-4.0
R package to help wrangle campaign finance data 💸

campfin

The campfin package was created to support work on The Accountability Project, a tool created by The Investigative Reporting Workshop in Washington, DC. The Accountability Project curates, cleans, and indexes public data to give journalists, researchers, and others a simple way to search across otherwise siloed records.

The data focuses on people, organizations and locations. This package was created specifically to help with state-level campaign finance data, although the tools included are useful in general database exploration and normalization.

Installation

You can install the released version of campfin from CRAN with:

install.packages("campfin")

The development version can be installed from GitHub with:

# install.packages("remotes")
remotes::install_github("irworkshop/campfin")

Normalize

The package was originally built to normalize geographic data using the normal_*() functions, which take the messy self-reported geographic data of a contributor, vendor, candidate, or committee and return normalized text that is more searchable. They are largely wrappers around the stringr package, and can call other sub-functions to streamline normalization.

  • normal_address() takes a street address and reduces inconsistencies.
  • normal_zip() takes ZIP Codes and aims to return a valid 5-digit code.
  • normal_state() takes US states and returns a two-letter abbreviation.
  • normal_city() takes cities and reduces inconsistencies.
  • normal_phone() consistently formats US telephone numbers.
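
To illustrate the kind of cleanup normal_zip() performs, here is a minimal base-R sketch. The helper sketch_zip() is hypothetical and is not campfin's implementation. It assumes that normalizing a ZIP Code means stripping non-digit characters and keeping the first five digits of a ZIP or ZIP+4 value, with anything shorter treated as missing.

```r
# Hypothetical helper illustrating ZIP normalization; not part of campfin.
sketch_zip <- function(zip) {
  digits <- gsub("[^0-9]", "", zip)          # drop spaces, dashes, letters
  out <- substr(digits, 1, 5)                # keep the 5-digit base code
  out[which(nchar(out) < 5)] <- NA_character_  # too short to be a valid ZIP
  out
}

sketch_zip(c("20016-1234", " 60640", "Zip: 94020", "123"))
```

Here the ZIP+4 value is truncated to 20016 and the three-digit entry becomes NA. campfin's own normal_zip() may differ in its details; consult ?normal_zip after installing the package.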

Please see the vignette on normalization for an example of how these functions are used to fix a wide variety of string inconsistencies and make campaign finance data more consistent.

Data

library(campfin)
library(tidyverse)

The campfin package contains a number of built-in data frames and strings used to help wrangle campaign finance data.

The /data-raw directory contains the code used to create the objects.

zipcodes

The zipcodes (plural) table is a new version of the zipcode (singular) table from the archived zipcode R package.

This database was composed using ZIP code gazetteers from the US Census Bureau from 1999 and 2000, augmented with additional ZIP code information. The database is believed to contain over 98% of the ZIP Codes in current use in the United States. The remaining ZIP Codes absent from this database are entirely PO Box or Firm ZIP codes added in the last five years, which are no longer published by the Census Bureau, but in any event serve a very small minority of the population (probably on the order of 0.1% or less). Although every attempt has been made to filter them out, this data set may contain up to 0.5% false positives, that is, ZIP codes that do not exist or are no longer in use but are included due to erroneous data sources.

The included valid_city and valid_zip vectors are the sorted, unique values of the city and zip columns of the zipcodes data frame.
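
Because valid_city and valid_zip are plain sorted character vectors, checking whether a normalized value is known reduces to a %in% lookup. The sketch below rebuilds such vectors from a small, made-up stand-in for zipcodes so that it runs without campfin installed; the three rows and the *_demo names are hypothetical.

```r
# Made-up three-row stand-in for campfin's zipcodes table
zips_demo <- data.frame(
  city  = c("CHICAGO", "ATHENS", "CHICAGO"),
  state = c("IL", "IL", "IL"),
  zip   = c("60640", "62613", "60601"),
  stringsAsFactors = FALSE
)

# Sorted, unique values of each column, mirroring valid_city / valid_zip
valid_city_demo <- sort(unique(zips_demo$city))
valid_zip_demo  <- sort(unique(zips_demo$zip))

# Flag incoming values that do not match a known city
incoming <- c("CHICAGO", "CHICAGOO", "ATHENS")
incoming %in% valid_city_demo
```

With campfin loaded, the same check against the full reference list is simply x %in% valid_city, which makes it easy to count how many values in a column still fail to match after normalization.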

sample_frac(zipcodes)
#> # A tibble: 44,336 × 3
#>    city       state zip  
#>    <chr>      <chr> <chr>
#>  1 SAN JUAN   PR    00914
#>  2 BRANCHDALE PA    17923
#>  3 ATHENS     IL    62613
#>  4 ALBANY     GA    31706
#>  5 HULL       IA    51239
#>  6 CHICAGO    IL    60640
#>  7 WASHINGTON DC    20380
#>  8 LA HONDA   CA    94020
#>  9 POMONA     CA    91767
#> 10 OSHKOSH    NE    69190
#> # … with 44,326 more rows

usps_* and valid_*

The usps_* data frames were scraped from the official United States Postal Service (USPS) Postal Addressing Standards. These data frames are designed to work with the abbreviation functionality of normal_address() and normal_city() to replace common abbreviations with their full equivalents.
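
A two-column full/abb table like usps_street can drive abbreviation expansion: each whole-word abbreviation is replaced with its full form. The sketch below uses base R and a hypothetical three-row table in the same layout. It illustrates the idea only and is not how normal_address() or normal_city() is implemented.

```r
# Hypothetical lookup in the same full/abb layout as usps_street
abbs_demo <- data.frame(
  full = c("STREET", "AVENUE", "NORTH"),
  abb  = c("ST", "AVE", "N"),
  stringsAsFactors = FALSE
)

# Replace whole-word abbreviations with their full equivalents
expand_abbs <- function(x, abbs) {
  x <- toupper(x)
  for (i in seq_len(nrow(abbs))) {
    # \\b anchors the match to word boundaries, so "ST" inside "EAST" is untouched
    pattern <- paste0("\\b", abbs$abb[i], "\\b")
    x <- gsub(pattern, abbs$full[i], x, perl = TRUE)
  }
  x
}

expand_abbs(c("123 n main st", "45 East Ave"), abbs_demo)
```

A blanket rule like ST to STREET would be wrong in a city name such as ST LOUIS, which illustrates why a separate, curated table for city names can be useful.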

usps_city is a curated subset of usps_street whose full versions appear at least once in the valid_city vector from zipcodes. The valid_state and valid_name vectors contain the columns from usps_state and include territories not found in R's built-in state.abb and state.name vectors.

sample_n(usps_street, 3)
#> # A tibble: 3 × 2
#>   full   abb  
#>   <chr>  <chr>
#> 1 PLAIN  PLN  
#> 2 COVE   CV   
#> 3 ARCADE ARC
sample_n(usps_state, 3)
#> # A tibble: 3 × 2
#>   full      abb  
#>   <chr>     <chr>
#> 1 UTAH      UT   
#> 2 ALABAMA   AL   
#> 3 WISCONSIN WI
setdiff(valid_state, state.abb)
#>  [1] "AS" "AA" "AE" "AP" "DC" "FM" "GU" "MH" "MP" "PW" "PR" "VI"
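
The usps_state table pairs full state names with abbreviations, which is what a function like normal_state() needs to turn spelled-out names into two-letter codes. The sketch below builds a comparable lookup from base R's state.name and state.abb, adds the District of Columbia and Puerto Rico by hand (an assumption standing in for the extra rows in usps_state), and returns NA for anything it cannot match. The helper sketch_state() and the state_lookup vector are hypothetical.

```r
# Comparable full/abb lookup built from base R, plus two non-state entries
state_lookup <- c(
  setNames(state.abb, toupper(state.name)),
  "DISTRICT OF COLUMBIA" = "DC",
  "PUERTO RICO" = "PR"
)

# Hypothetical helper: map full names or existing codes to a 2-letter code
sketch_state <- function(x, lookup = state_lookup) {
  x <- toupper(trimws(x))
  out <- unname(lookup[x])          # full name -> abbreviation (NA if unknown)
  is_code <- x %in% lookup          # value is already a known abbreviation
  out[is_code] <- x[is_code]
  out
}

sketch_state(c("Wisconsin", " utah ", "DC", "Puerto Rico", "Atlantis"))
```

The unmatched "Atlantis" comes back as NA, making invalid entries easy to find. campfin's valid_state already covers the territories and military codes listed in the setdiff() output above.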

The campfin project is released with a Contributor Code of Conduct. By contributing, you agree to abide by its terms.