
City-Bureau / City Scrapers

License: MIT
Scrape, standardize and share public meetings from local government websites

Programming Languages

python

Projects that are alternatives of or similar to City Scrapers

Faster Than Requests
Faster requests on Python 3
Stars: ✭ 639 (+190.45%)
Mutual labels:  open-data, scrapy, web-scraping
Scrapy Fake Useragent
Random User-Agent middleware based on fake-useragent
Stars: ✭ 520 (+136.36%)
Mutual labels:  scrapy, web-scraping
Scrapple
A framework for creating semi-automatic web content extractors
Stars: ✭ 464 (+110.91%)
Mutual labels:  scrapy, web-scraping
Querido Diario
📰 Brazilian government gazettes, accessible to everyone.
Stars: ✭ 681 (+209.55%)
Mutual labels:  hacktoberfest, open-data
restaurant-finder-featureReviews
A Flask web application that helps users retrieve key restaurant information and feature-based reviews, generated by applying a market-basket model (the Apriori algorithm) and NLP to user reviews.
Stars: ✭ 21 (-90.45%)
Mutual labels:  web-scraping, scrapy
Alltheplaces
A set of spiders and scrapers to extract location information from places that post their location on the internet.
Stars: ✭ 277 (+25.91%)
Mutual labels:  hacktoberfest, scrapy
Sentinelsat
Search and download Copernicus Sentinel satellite images
Stars: ✭ 576 (+161.82%)
Mutual labels:  hacktoberfest, open-data
scrapy-wayback-machine
A Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
Stars: ✭ 92 (-58.18%)
Mutual labels:  web-scraping, scrapy
Scrapy Craigslist
Web Scraping Craigslist's Engineering Jobs in NY with Scrapy
Stars: ✭ 54 (-75.45%)
Mutual labels:  scrapy, web-scraping
Scrapyd Cluster On Heroku
Set up a free and scalable Scrapyd cluster for distributed web crawling with just a few clicks. DEMO 👉
Stars: ✭ 106 (-51.82%)
Mutual labels:  scrapy, web-scraping
Maria Quiteria
Backend for collecting and publishing the data 📜
Stars: ✭ 115 (-47.73%)
Mutual labels:  hacktoberfest, scrapy
scraping-ebay
Scraping eBay products using the Scrapy web-crawling framework
Stars: ✭ 79 (-64.09%)
Mutual labels:  web-scraping, scrapy
IMDB-Scraper
Scrapy project for scraping data from IMDb, producing a movie dataset covering 58,623 movies.
Stars: ✭ 37 (-83.18%)
Mutual labels:  web-scraping, scrapy
Serenata De Amor
🕵 Artificial Intelligence for social control of public administration
Stars: ✭ 4,367 (+1885%)
Mutual labels:  hacktoberfest, open-data
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads for a requested product and stores them in MongoDB (NoSQL).
Stars: ✭ 15 (-93.18%)
Mutual labels:  web-scraping, scrapy
Netflix Clone
Netflix-like full-stack application with an SPA client and a backend implemented in a service-oriented architecture
Stars: ✭ 156 (-29.09%)
Mutual labels:  scrapy, web-scraping
Scrapy Splash
Scrapy+Splash for JavaScript integration
Stars: ✭ 2,666 (+1111.82%)
Mutual labels:  hacktoberfest, scrapy
Place2live
Analysis of the characteristics of different countries
Stars: ✭ 30 (-86.36%)
Mutual labels:  hacktoberfest, scrapy
Juno crawler
Scrapy crawler to collect data on the back catalog of songs listed for sale.
Stars: ✭ 150 (-31.82%)
Mutual labels:  scrapy, web-scraping
Scrapy Training
Scrapy Training companion code
Stars: ✭ 157 (-28.64%)
Mutual labels:  scrapy, web-scraping

City Scrapers

CI build status Cron build status

Who are the City Bureau Documenters, and why do they want to scrape websites?

Public meetings are important spaces for democracy where any resident can participate and hold public figures accountable. City Bureau's Documenters program pays community members an hourly wage to inform and engage their communities by attending and documenting public meetings.

How does the Documenters program know when meetings are happening? It isn’t easy! These events are spread across dozens of websites, rarely in useful data formats.

That’s why City Bureau is working together with a team of civic coders to develop and coordinate the City Scrapers, a community open source project to scrape and store these meetings in a central database.

What are the City Scrapers?

The City Scrapers collect information about public meetings. Every day, they automatically fetch details about upcoming meetings from the Chicago City Council's website, the local school councils' websites, the Chicago Police Department's website, and many more, so that no one has to do it by hand. The scrapers store all of the meeting information in a central database that journalists at City Bureau use for reporting.
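For a rough sense of what one of these scrapers does under the hood, here is a minimal sketch of a Scrapy spider that pulls meetings off a hypothetical agency calendar page. The URL, CSS selectors, and field names are illustrative assumptions for this sketch, not the project's actual code, which builds on shared base classes and a richer meeting schema.

```python
# Illustrative only: a bare-bones Scrapy spider in the spirit of a City Scraper.
# The URL, selectors, and field names below are hypothetical placeholders.
import scrapy


class ExampleAgencySpider(scrapy.Spider):
    name = "example_agency"
    start_urls = ["https://example.gov/meetings"]  # hypothetical agency calendar

    def parse(self, response):
        # Each row of the (hypothetical) meetings table becomes one record.
        for row in response.css("table.meetings tr"):
            yield {
                "title": row.css("td.title::text").get(),
                "start": row.css("td.date::text").get(),
                "location": row.css("td.location::text").get(),
                "source": response.url,
            }
```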

Community members are also welcome to use this code to set up their own databases.

What can I learn from working on the City Scrapers?

A lot about the City of Chicago! What is City Council talking about this week? What are the local school councils, and what community power do they have? What neighborhoods is the police department doing outreach in? Who governs our water?

From building a scraper, you'll gain experience with:

  • how the web works (HTTP requests and responses, reading HTML)
  • writing functions and tests in Python
  • version control and collaborative coding (Git and GitHub)
  • a basic data file format (JSON), working with a schema and data validation (see the sketch after this list)
  • problem-solving, finding patterns, designing robust code
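
As a concrete, purely illustrative take on the JSON and validation point above, a standardized meeting record and a minimal completeness check could look roughly like this; the field names are assumptions for the sketch, not the project's exact schema.

```python
# Illustrative only: a simplified meeting record and a basic validation check.
# The required fields are assumptions for this sketch, not the real schema.
REQUIRED_FIELDS = {"title", "start", "location", "source"}

meeting = {
    "title": "Board of Education Regular Meeting",
    "start": "2023-05-24T18:30:00",
    "location": "123 Example St, Chicago, IL",
    "source": "https://example.gov/meetings",
}


def missing_fields(record):
    """Return the required fields that a scraped record is missing."""
    return sorted(REQUIRED_FIELDS - record.keys())


assert missing_fields(meeting) == []
```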

How can I set up City Scrapers for my area?

This repo is focused on Chicago, but you can set up City Scrapers for your area by following the instructions in the City-Bureau/city-scrapers-template repo.

Community Mission

The City Bureau Labs community welcomes contributions from everyone. We prioritize learning and leadership opportunities for under-represented individuals in tech and journalism.

We hope that working with us will fill experience gaps (like using git/GitHub, working with data, or having your ideas taken seriously), so that more under-represented people will become decision-makers in both our community and Chicago’s tech and media scenes at large.

Ready to code with us?

  1. Fill out this form to join our Slack channel and meet the community.
  2. Read about how we collaborate and review our Code of Conduct.
  3. Check out our documentation, and get started with Installation and Contributing a spider.

We ask all new contributors to start by writing a spider and its documentation or fixing a bug in an existing one in order to gain familiarity with our code and culture. Reach out on Slack for support if you need it.
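
For orientation, a first spider contribution is usually paired with tests that run against a saved copy of the agency's page rather than the live site. The sketch below continues the hypothetical spider from the earlier example and uses an invented module and fixture path; it is not the project's actual test layout.

```python
# Illustrative only: testing a spider against a locally saved HTML fixture.
from pathlib import Path

from scrapy.http import HtmlResponse, Request

from example_agency import ExampleAgencySpider  # hypothetical module from the earlier sketch


def load_response(path, url):
    """Build a Scrapy response from a saved copy of an agency page."""
    body = Path(path).read_bytes()
    return HtmlResponse(url=url, request=Request(url=url), body=body)


def test_parse_yields_meetings():
    response = load_response("tests/files/example_agency.html", "https://example.gov/meetings")
    items = list(ExampleAgencySpider().parse(response))
    assert items, "expected at least one meeting from the fixture"
    assert all("title" in item and "start" in item for item in items)
```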

Don't want to code?

Join our Slack channel (chatroom) to discuss ideas and meet the community!

  • We have ongoing conversations about what sort of data we should collect and how it should be collected. Help us make these decisions by commenting on issues with a non-coding label.
  • Research sources for public meetings. Answer questions like: Are we scraping events from the right websites? Are there local agencies that we're missing? Should events be updated manually or by a scraper? Triage event sources on these issues.

Support this work

This project is organized and maintained by City Bureau.
