Police-Data-Accessibility-Project / PDAP-Scrapers

License: GPL-3.0
Code relating to scraping public police data.

Programming Languages

python
javascript

Projects that are alternatives of or similar to PDAP-Scrapers

bigquery-kafka-connect
☁️ nodejs kafka connect connector for Google BigQuery
Stars: ✭ 17 (-76.39%)
Mutual labels:  etl
DaFlow
Apache-Spark based Data Flow(ETL) Framework which supports multiple read, write destinations of different types and also support multiple categories of transformation rules.
Stars: ✭ 24 (-66.67%)
Mutual labels:  etl
cobrix
A COBOL parser and Mainframe/EBCDIC data source for Apache Spark
Stars: ✭ 109 (+51.39%)
Mutual labels:  etl
angel.co-companies-list-scraping
No description or website provided.
Stars: ✭ 54 (-25%)
Mutual labels:  scraper
diosts
A Go scraper that validates security.txt files and outputs them in the disclose.io JSON format.
Stars: ✭ 18 (-75%)
Mutual labels:  scraper
scraped-tvtime-api
A free TVTime API based on scraping TVTime website. No API key required
Stars: ✭ 23 (-68.06%)
Mutual labels:  scraper
singer-runner
A CLI and library to run Singer Taps and Targets
Stars: ✭ 33 (-54.17%)
Mutual labels:  etl
DataBridge.NET
Configurable data bridge for permanent ETL jobs
Stars: ✭ 16 (-77.78%)
Mutual labels:  etl
wrangle
A data transformation package for deep learning with Autonomio, Keras and TensorFlow.
Stars: ✭ 15 (-79.17%)
Mutual labels:  etl
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (-27.78%)
Mutual labels:  scraper
esaj
Scrapers for many e-SAJ systems
Stars: ✭ 35 (-51.39%)
Mutual labels:  scraper
mik
The Move to Islandora Kit is an extensible PHP command-line tool for converting source content and metadata into packages suitable for importing into Islandora (or other digital repository and preservations systems).
Stars: ✭ 32 (-55.56%)
Mutual labels:  etl
VK-Scraper
Scrapes VK user's photos
Stars: ✭ 42 (-41.67%)
Mutual labels:  scraper
oge
Page metadata as a service
Stars: ✭ 22 (-69.44%)
Mutual labels:  scraper
MangaReaderScraper
Search and download mangas from the command line
Stars: ✭ 23 (-68.06%)
Mutual labels:  scraper
architect big data solutions with spark
code, labs and lectures for the course
Stars: ✭ 40 (-44.44%)
Mutual labels:  etl
cubetl
CubETL - Framework and tool for data ETL (Extract, Transform and Load) in Python (PERSONAL PROJECT / SELDOM MAINTAINED)
Stars: ✭ 21 (-70.83%)
Mutual labels:  etl
scraper
A simple web scraper built around the JavaFX WebEngine
Stars: ✭ 13 (-81.94%)
Mutual labels:  scraper
go-bqloader
bqloader is a simple ETL framework to load data from Cloud Storage into BigQuery.
Stars: ✭ 16 (-77.78%)
Mutual labels:  etl
ScrapeM
A monadic web scraping library
Stars: ✭ 17 (-76.39%)
Mutual labels:  scraper

Police Data Accessibility Project Scrapers

This repo contains the data scrapers for the Police Data Accessibility Project. Thank you for being here!

How to run a scraper

Right now, this requires some Python knowledge and patience. We're in the early stages: there's no automated scraper farm or fancy GUI yet.

  1. Install Python.
  2. Clone this repo.
  3. Find the scraper you wish to run. These are sorted geographically, so start by looking in /USA/....
  4. Run the scraper.py file with something like python3 <scraper path>. The exact command (for example, python instead of python3) depends on how you installed Python.
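
If you would rather not browse the folders by hand, steps 3 and 4 can be sketched in a few lines of Python. This is only an illustration, not part of the repo: the helper names find_scrapers and run_scraper are made up for this example, and it assumes each scraper lives in a file named scraper.py somewhere under the USA folder of your clone.

```python
import subprocess
import sys
from pathlib import Path


def find_scrapers(repo_root, region="USA"):
    """Return every scraper.py under <repo_root>/<region>, sorted by path.

    Hypothetical helper: assumes scrapers are files named scraper.py.
    """
    region_dir = Path(repo_root) / region
    if not region_dir.is_dir():
        return []
    return sorted(region_dir.rglob("scraper.py"))


def run_scraper(scraper_path):
    """Run one scraper with the current Python interpreter.

    Equivalent to typing `python3 <scraper path>` in a terminal.
    Returns the process exit code (0 usually means success).
    """
    result = subprocess.run([sys.executable, str(scraper_path)])
    return result.returncode
```

Calling find_scrapers(".") from the root of your clone lists the candidates, and run_scraper(path) runs the one you pick. Using sys.executable avoids guessing whether your system calls the interpreter python or python3. A scraper may need extra packages installed first; this sketch does not handle that.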

Did it work?

If it worked, discuss your findings in our Discord. If it didn't, make an issue in this repo or reach out in Discord.

How to contribute

To write a scraper, start with CONTRIBUTING.md. Be sure to check out the /common folder!

For everything else, start with docs.pdap.io.

What data are we scraping?

The datasets listed here are our to-do list. If we should be targeting a new data type, suggest it in Discord or make a DoltHub PR!

Resources

Potentially useful tools. If you find something useful, or if one of these is out of date, make a PR!
