
jamesqo / Gun Violence Data

A comprehensive, accessible database that contains records of over 260k US gun violence incidents from January 2013 to March 2018.

Programming Languages

python

Projects that are alternatives to or similar to Gun Violence Data

Datacamp
🍧 A repository that contains courses I have taken on DataCamp
Stars: ✭ 69 (-43.9%)
Mutual labels:  data-science, statistics
Openml R
R package to interface with OpenML
Stars: ✭ 81 (-34.15%)
Mutual labels:  data-science, statistics
Metriculous
Measure and visualize machine learning model performance without the usual boilerplate.
Stars: ✭ 71 (-42.28%)
Mutual labels:  data-science, statistics
Pycm
Multi-class confusion matrix library in Python
Stars: ✭ 1,076 (+774.8%)
Mutual labels:  data-science, statistics
Tennis Crystal Ball
Ultimate Tennis Statistics and Tennis Crystal Ball - Tennis Big Data Analysis and Prediction
Stars: ✭ 107 (-13.01%)
Mutual labels:  data-science, statistics
Lifetimes
Lifetime value in Python
Stars: ✭ 1,082 (+779.67%)
Mutual labels:  data-science, statistics
Tsv Utils
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Stars: ✭ 1,215 (+887.8%)
Mutual labels:  data-science, statistics
Mlj.jl
A Julia machine learning framework
Stars: ✭ 982 (+698.37%)
Mutual labels:  data-science, statistics
Papers Literature Ml Dl Rl Ai
Highly cited and useful papers related to machine learning, deep learning, AI, game theory, reinforcement learning
Stars: ✭ 1,341 (+990.24%)
Mutual labels:  data-science, statistics
Probflow
A Python package for building Bayesian models with TensorFlow or PyTorch
Stars: ✭ 95 (-22.76%)
Mutual labels:  data-science, statistics
25daysinmachinelearning
I will update this repository to learn Machine learning with python with statistics content and materials
Stars: ✭ 53 (-56.91%)
Mutual labels:  data-science, statistics
Sweetviz
Visualize and compare datasets, target values and associations, with one line of code.
Stars: ✭ 1,851 (+1404.88%)
Mutual labels:  data-science, statistics
Ppd599
USC urban data science course series with Python and Jupyter
Stars: ✭ 1,062 (+763.41%)
Mutual labels:  data-science, statistics
Data Science Best Resources
Carefully curated resource links for data science in one place
Stars: ✭ 1,104 (+797.56%)
Mutual labels:  data-science, statistics
Datumbox Framework
Datumbox is an open-source Machine Learning framework written in Java which allows the rapid development of Machine Learning and Statistical applications.
Stars: ✭ 1,063 (+764.23%)
Mutual labels:  data-science, statistics
Hyperlearn
50% faster, 50% less RAM Machine Learning. Numba rewritten Sklearn. SVD, NNMF, PCA, LinearReg, RidgeReg, Randomized, Truncated SVD/PCA, CSR Matrices all 50+% faster
Stars: ✭ 1,204 (+878.86%)
Mutual labels:  data-science, statistics
Socrat
A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization
Stars: ✭ 26 (-78.86%)
Mutual labels:  data-science, statistics
Pandas Profiling
Create HTML profiling reports from pandas DataFrame objects
Stars: ✭ 8,329 (+6671.54%)
Mutual labels:  data-science, statistics
Bayesian Cognitive Modeling In Pymc3
PyMC3 codes of Lee and Wagenmakers' Bayesian Cognitive Modeling - A Practical Course
Stars: ✭ 93 (-24.39%)
Mutual labels:  data-science, statistics
Scikit Learn
scikit-learn: machine learning in Python
Stars: ✭ 48,322 (+39186.18%)
Mutual labels:  data-science, statistics

Gun Violence Data

What is this repository?

This repository contains data for all recorded gun violence incidents in the US between January 2013 and March 2018, inclusive.

Why was it created?

There is currently a lack of large, easily accessible, detailed data on gun violence. This project aims to change that: it makes records of more than 260k gun violence incidents, with detailed information about each incident, available in CSV format. We hope this will make it easier for data scientists and statisticians to study gun violence and predict future trends.

Where did you get the data?

The data was downloaded from Gun Violence Archive's website. From the organization's description:

Gun Violence Archive (GVA) is a not for profit corporation formed in 2013 to provide free online public access to accurate information about gun-related violence in the United States. GVA will collect and check for accuracy, comprehensive information about gun-related violence in the U.S. and then post and disseminate it online.

All credits for the data go to Gun Violence Archive.

How did you get the data?

Because GVA limits the number of incidents that are returned from a single query, and because the website's "Export to CSV" functionality was missing crucial fields, it was necessary to obtain this dataset using web scraping techniques.

Stage 1: For each date between 1/1/2013 and 3/31/2018, a Python script queried all incidents that happened on that date, scraped the results, and wrote them to a CSV file. Each month got its own CSV file, except in 2013, when few incidents were recorded.
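A minimal sketch of the Stage 1 loop structure, in Python since the original scripts are Python. It only enumerates the dates and groups them by output file; the actual query and scraping step is omitted. The file names, and the choice to put all of 2013 into one file, are hypothetical assumptions, not necessarily what the original scripts do.

```python
from __future__ import annotations

from datetime import date, timedelta
from typing import Iterator

START = date(2013, 1, 1)
END = date(2018, 3, 31)  # inclusive


def each_date(start: date, end: date) -> Iterator[date]:
    """Yield every calendar date from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)


def output_file_for(day: date) -> str:
    """Hypothetical naming scheme: one CSV per month, one shared CSV for 2013."""
    if day.year == 2013:
        return "stage1/2013.csv"
    return f"stage1/{day.year}-{day.month:02d}.csv"


def plan_queries(start: date = START, end: date = END) -> dict[str, list[date]]:
    """Group the per-date queries by the CSV file their results would go to.

    The query/scrape call for each date is intentionally left out.
    """
    plan: dict[str, list[date]] = {}
    for day in each_date(start, end):
        plan.setdefault(output_file_for(day), []).append(day)
    return plan
```

Under these assumptions, `plan_queries()` covers 1,916 dates spread over 52 files.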

Stage 2: Each entry was augmented with additional data not directly viewable from the query results page, such as participant information, geolocation data, etc.
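The augmentation step can be sketched as a merge keyed on `incident_id`. Here `fetch_details` is a hypothetical stand-in for the per-incident scraper, and the rule that values already present from the query results page win over scraped ones is an assumption of this sketch.

```python
from __future__ import annotations

from typing import Callable, Dict

Row = Dict[str, str]


def augment(rows: list[Row], fetch_details: Callable[[str], Row]) -> list[Row]:
    """Return copies of the Stage 1 rows with extra per-incident fields merged in.

    fetch_details is hypothetical: given an incident_id, it returns the additional
    fields (participants, coordinates, ...) scraped from that incident's page.
    """
    augmented: list[Row] = []
    for row in rows:
        merged = dict(row)  # copy so the input rows are left unchanged
        for key, value in fetch_details(row["incident_id"]).items():
            # Keep any value that the query results page already provided.
            merged.setdefault(key, value)
        augmented.append(merged)
    return augmented
```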

Stage 3: The entries were sorted in order of increasing date, then merged into a single CSV file.
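A sketch of Stage 3 using only the standard library. Because the example in this README shows dates like 3/29/2018, the sketch accepts both that form and ISO dates such as 2018-03-29; which format the stage files actually use is an assumption to check. The helper names are hypothetical.

```python
from __future__ import annotations

import csv
from datetime import date, datetime
from pathlib import Path
from typing import Dict, List

Row = Dict[str, str]


def parse_date(text: str) -> date:
    """Accept ISO dates (2018-03-29) or US-style dates (3/29/2018)."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    raise ValueError(f"unrecognized date: {text!r}")


def merge_sorted(batches: List[List[Row]]) -> List[Row]:
    """Concatenate per-file batches of rows and sort them by increasing date.

    list.sort is stable, so incidents on the same date keep their input order.
    """
    merged = [row for batch in batches for row in batch]
    merged.sort(key=lambda row: parse_date(row["date"]))
    return merged


def write_csv(rows: List[Row], output: Path) -> None:
    """Write the merged rows to a single CSV file, using the first row's columns."""
    if not rows:
        raise ValueError("nothing to write")
    with output.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```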

Click here to download the tarball containing the data. You can decompress it with the 7-Zip utility on Windows or with the tar executable on macOS/Linux.
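Python's standard `tarfile` module can also unpack the archive on any platform. In this sketch the archive and destination paths are placeholders, since the actual download name is not shown here; the `filter="data"` safety option is only used on Python versions that provide it.

```python
import tarfile
from pathlib import Path
from typing import List


def extract_tarball(archive: Path, dest: Path) -> List[str]:
    """Extract a (possibly compressed) tarball into dest and return its member names."""
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect gzip/bz2/xz compression (or none) on its own.
    with tarfile.open(archive, "r:*") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons support extraction filters; "data" rejects unsafe paths.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names
```

From a macOS/Linux shell, `tar -xf <archive>` does the same; both GNU tar and bsdtar detect the compression automatically when extracting.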

Data format

The data is stored in a single CSV file sorted by increasing date. It has the following fields:

| field | type | description | required? |
|---|---|---|---|
| incident_id | int | gunviolencearchive.org ID for incident | yes |
| date | str | date of occurrence | yes |
| state | str | US state where incident took place | yes |
| city_or_county | str | city or county where incident took place | yes |
| address | str | address where incident took place | yes |
| n_killed | int | number of people killed | yes |
| n_injured | int | number of people injured | yes |
| incident_url | str | link to gunviolencearchive.org webpage containing details of incident | yes |
| source_url | str | link to online news story concerning incident | no |
| incident_url_fields_missing | bool | ignore, always False | yes |
| congressional_district | int | US congressional district where incident took place | no |
| gun_stolen | dict[int, str] | key: gun ID, value: 'Unknown' or 'Stolen' | no |
| gun_type | dict[int, str] | key: gun ID, value: description of gun type | no |
| incident_characteristics | list[str] | list of incident characteristics | no |
| latitude | float | latitude of incident location | no |
| location_description | str | description of location where incident took place | no |
| longitude | float | longitude of incident location | no |
| n_guns_involved | int | number of guns involved | no |
| notes | str | additional notes about the incident | no |
| participant_age | dict[int, int] | key: participant ID, value: age | no |
| participant_age_group | dict[int, str] | key: participant ID, value: description of age group, e.g. 'Adult 18+' | no |
| participant_gender | dict[int, str] | key: participant ID, value: 'Male' or 'Female' | no |
| participant_name | dict[int, str] | key: participant ID, value: name | no |
| participant_relationship | dict[int, str] | key: participant ID, value: relationship of participant to other participants | no |
| participant_status | dict[int, str] | key: participant ID, value: 'Arrested', 'Killed', 'Injured', or 'Unharmed', possibly combined (e.g. 'Unharmed, Arrested') | no |
| participant_type | dict[int, str] | key: participant ID, value: 'Victim' or 'Subject-Suspect' | no |
| sources | list[str] | links to online news stories concerning incident | no |
| state_house_district | int | state house district where incident took place | no |
| state_senate_district | int | state senate district where incident took place | no |
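Every CSV cell is read back as text, so numeric and boolean columns need casting. This sketch groups the columns by the types in the field table; treating empty cells as `None` and tolerating values like "3.0" in integer columns are assumptions about how the file may have been written. The function name is hypothetical.

```python
from __future__ import annotations

from typing import Dict, Optional, Union

# Column groups taken from the field table above.
INT_FIELDS = {"incident_id", "n_killed", "n_injured", "congressional_district",
              "n_guns_involved", "state_house_district", "state_senate_district"}
FLOAT_FIELDS = {"latitude", "longitude"}
BOOL_FIELDS = {"incident_url_fields_missing"}

Value = Optional[Union[int, float, bool, str]]


def convert_scalars(row: Dict[str, str]) -> Dict[str, Value]:
    """Cast int/float/bool columns of one CSV row; empty cells become None.

    str, list and dict columns are returned unchanged (still encoded as strings).
    """
    out: Dict[str, Value] = {}
    for field, raw in row.items():
        if raw == "":
            out[field] = None
        elif field in INT_FIELDS:
            # float() first tolerates cells like "3.0", in case ints were written as floats.
            out[field] = int(float(raw))
        elif field in FLOAT_FIELDS:
            out[field] = float(raw)
        elif field in BOOL_FIELDS:
            out[field] = raw == "True"
        else:
            out[field] = raw
    return out
```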

Important notes:

  • Each list is encoded as a string with separator ||. For example, "a||b" represents ['a', 'b'].

  • Each dict is encoded as a string with outer separator || and inner separator ::. For example, "0::a||1::b" represents {0: 'a', 1: 'b'}.

  • The "gun ID" and "participant ID" are numbers specific to a given incident that refer to a particular gun/person involved in that incident. For example, this:

    participant_age_group = 0::Teen 12-17||1::Adult 18+
    participant_status = 0::Killed||1::Injured
    participant_type = 0::Victim||1::Victim
    

    corresponds to this:

    |  | Age Group | Status | Type |
    |---|---|---|---|
    | Participant #0 | Teen 12-17 | Killed | Victim |
    | Participant #1 | Adult 18+ | Injured | Victim |
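A minimal sketch of decoders for these two encodings, plus a helper that pivots the `participant_*` columns into one record per participant ID (the same view as the participant table above). The function names are hypothetical, and a malformed entry (for example a key that is not an integer) raises `ValueError` rather than being skipped.

```python
from __future__ import annotations

from typing import Dict, List


def decode_list(text: str) -> List[str]:
    """'a||b' -> ['a', 'b']; an empty cell -> []."""
    return text.split("||") if text else []


def decode_dict(text: str) -> Dict[int, str]:
    """'0::a||1::b' -> {0: 'a', 1: 'b'}; an empty cell -> {}."""
    result: Dict[int, str] = {}
    for item in decode_list(text):
        # Split on the first "::" only; the value keeps any other characters.
        key, _, value = item.partition("::")
        result[int(key)] = value
    return result


def participants(row: Dict[str, str]) -> Dict[int, Dict[str, str]]:
    """Pivot the participant_* columns of one row into one record per participant ID."""
    people: Dict[int, Dict[str, str]] = {}
    for column, cell in row.items():
        if not column.startswith("participant_"):
            continue
        attribute = column[len("participant_"):]
        for pid, value in decode_dict(cell).items():
            people.setdefault(pid, {})[attribute] = value
    return people
```

Values stay strings, so `participant_age` entries need `int()` if you want numbers.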

Example

The incident described here resulted in the following fields:

    incident_id = 1081561
    date = 3/29/2018
    state = Colorado
    city_or_county = Pueblo
    address = 617 W Northern Ave
    n_killed = 0
    n_injured = 0
    incident_url = http://www.gunviolencearchive.org/incident/1081561
    source_url = https://www.chieftain.com/news/crime/pueblo-sheriff-seizes-illegal-guns-drugs-cash-in-bessemer-building/article_436d713a-4be6-565f-a919-747ab83e66df.html
    incident_url_fields_missing = False
    congressional_district = 3
    gun_stolen = 0::Stolen||1::Unknown||2::Unknown||3::Unknown||4::Unknown||5::Unknown||6::Unknown||7::Unknown||8::Unknown||9::Unknown||10::Unknown||11::Unknown||12::Unknown||13::Unknown||14::Unknown||15::Unknown||16::Unknown||17::Unknown||18::Unknown||19::Unknown||20::Unknown||21::Unknown||22::Unknown||23::Unknown||24::Unknown
    gun_type = 0::Handgun||1::Handgun||2::Unknown||3::Unknown||4::Unknown||5::Unknown||6::Unknown||7::Unknown||8::Unknown||9::Unknown||10::Unknown||11::Unknown||12::Unknown||13::Unknown||14::Unknown||15::Unknown||16::Unknown||17::Unknown||18::Unknown||19::Unknown||20::Unknown||21::Unknown||22::Unknown||23::Unknown||24::Unknown
    incident_characteristics = Non-Shooting Incident||Drug involvement||ATF/LE Confiscation/Raid/Arrest||Possession (gun(s) found during commission of other crimes)||Possession of gun by felon or prohibited person||Stolen/Illegally owned gun{s} recovered during arrest/warrant
    latitude = 38.2442
    location_description = Bessemer
    longitude = -104.618
    n_guns_involved = 25
    notes = Guns and drugs recovered from residence.
    participant_age = 0::43
    participant_age_group = 0::Adult 18+
    participant_gender = 0::Male
    participant_name = 0::Phillip W. Key
    participant_relationship =
    participant_status = 0::Unharmed, Arrested
    participant_type = 0::Subject-Suspect
    sources = https://www.chieftain.com/news/crime/pueblo-sheriff-seizes-illegal-guns-drugs-cash-in-bessemer-building/article_436d713a-4be6-565f-a919-747ab83e66df.html
    state_house_district = 46
    state_senate_district = 3
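As a worked check on this example, decoding the two gun fields yields 25 gun IDs, one marked Stolen and two marked Handgun, which agrees with the example's n_guns_involved value of 25. The sketch rebuilds the two cells programmatically instead of pasting them, and repeats a hypothetical `decode_dict` helper so it runs on its own.

```python
from collections import Counter
from typing import Dict


def decode_dict(text: str) -> Dict[int, str]:
    """'0::a||1::b' -> {0: 'a', 1: 'b'}; an empty cell -> {}."""
    result: Dict[int, str] = {}
    items = text.split("||") if text else []
    for item in items:
        key, _, value = item.partition("::")
        result[int(key)] = value
    return result


# Rebuild the example's gun_stolen and gun_type cells (gun IDs 0 to 24).
gun_stolen = "||".join(["0::Stolen"] + [f"{i}::Unknown" for i in range(1, 25)])
gun_type = "||".join(["0::Handgun", "1::Handgun"] + [f"{i}::Unknown" for i in range(2, 25)])

stolen = decode_dict(gun_stolen)
types = decode_dict(gun_type)

n_guns = len(types)  # one entry per gun ID
n_stolen = sum(value == "Stolen" for value in stolen.values())
type_counts = Counter(types.values())
print(n_guns, n_stolen, dict(type_counts))  # prints: 25 1 {'Handgun': 2, 'Unknown': 23}
```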

Additional notes

  • The list of incidents from 2013 is not exhaustive; only 279 incidents from that year were catalogued.

  • Two incidents were manually removed from the dataset: the Las Vegas mass shooting and incident 1081885.

    • The Las Vegas mass shooting had to be removed because information about the incident was stored in a PDF, which caused scraping to fail since the scraper expects an HTML webpage.
    • Incident 1081885 had to be removed because its location details could not be parsed correctly.
    • Pull requests that manually add back either or both of these incidents are welcome (please edit the stage1 files in intermediate/).
  • Known issue: the address field should be required, but is missing for ~16k incidents.

  • Please provide credit to and notify Gun Violence Archive if you intend to use this dataset in your project. Read their terms here.

    All we ask is to please provide proper credit for use of Gun Violence Archive data and advise us of its use.
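Given these caveats, a trend analysis may want to set 2013 aside and check how many rows lack an address before relying on that column. A minimal sketch with hypothetical helper names; it accepts both ISO and US-style dates because the exact format of the `date` column is an assumption here.

```python
from __future__ import annotations

from datetime import datetime
from typing import Dict, Iterable, List, Set

Row = Dict[str, str]


def incident_year(row: Row) -> int:
    """Year of an incident, accepting ISO (2018-03-29) or US-style (3/29/2018) dates."""
    text = row["date"]
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).year
        except ValueError:
            pass
    raise ValueError(f"unrecognized date: {text!r}")


def exclude_years(rows: Iterable[Row], years: Set[int]) -> List[Row]:
    """Drop incidents from years whose coverage is known to be incomplete."""
    return [row for row in rows if incident_year(row) not in years]


def count_missing(rows: Iterable[Row], field: str) -> int:
    """Count rows where a column is absent or empty (e.g. the missing addresses)."""
    return sum(1 for row in rows if not row.get(field))
```

For example, `exclude_years(rows, {2013})` removes the 279 catalogued 2013 incidents before computing year-over-year figures.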

Contact us

If you're interested in using this dataset for your research, feel free to contact us at [email protected] with questions or to let us know about your work. Please note that we are not affiliated with gunviolencearchive.org in any way.
