
douglasnavarro / sp-subway-scraper

Licence: other
🚆This web scraper builds a dataset for São Paulo subway operation status

Programming Languages

python

Projects that are alternatives to or similar to sp-subway-scraper

Sillynium
Automate the creation of Python Selenium Scripts by drawing coloured boxes on webpage elements
Stars: ✭ 100 (+316.67%)
Mutual labels:  scraper, web-scraping
Scrape Linkedin Selenium
`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
Stars: ✭ 239 (+895.83%)
Mutual labels:  scraper, web-scraping
Rod
A Devtools driver for web automation and scraping
Stars: ✭ 1,392 (+5700%)
Mutual labels:  scraper, web-scraping
Autoscraper
A Smart, Automatic, Fast and Lightweight Web Scraper for Python
Stars: ✭ 4,077 (+16887.5%)
Mutual labels:  scraper, web-scraping
TikTokDownloader PyWebIO
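🚀 "Douyin_TikTok_Download_API" is a ready-to-use, high-performance asynchronous Douyin/TikTok data scraping tool that supports API calls and online batch parsing and downloading.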
Stars: ✭ 919 (+3729.17%)
Mutual labels:  scraper, web-scraping
Hockey Scraper
Python Package for scraping NHL Play-by-Play and Shift data
Stars: ✭ 93 (+287.5%)
Mutual labels:  scraper, web-scraping
Phpscraper
PHP Scraper - a highly opinionated web interface for PHP
Stars: ✭ 148 (+516.67%)
Mutual labels:  scraper, web-scraping
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+2633.33%)
Mutual labels:  scraper, web-scraping
saveddit
Bulk Downloader for Reddit
Stars: ✭ 130 (+441.67%)
Mutual labels:  scraper, web-scraping
BookingScraper
🌎 🏨 Scrape Booking.com 🏨 🌎
Stars: ✭ 68 (+183.33%)
Mutual labels:  scraper, web-scraping
papercut
Papercut is a scraping/crawling library for Node.js built on top of JSDOM. It provides basic selector features together with features like Page Caching and Geosearch.
Stars: ✭ 15 (-37.5%)
Mutual labels:  scraper, web-scraping
Linkedin-Client
Web scraper for grabbing data from LinkedIn profiles or company pages (personal project)
Stars: ✭ 42 (+75%)
Mutual labels:  scraper, web-scraping
Zillow
Zillow Scraper for Python using Selenium
Stars: ✭ 141 (+487.5%)
Mutual labels:  scraper, web-scraping
lopez
Crawling and scraping the Web for fun and profit
Stars: ✭ 20 (-16.67%)
Mutual labels:  scraper, web-scraping
rymscraper
Python API to extract data from rateyourmusic.com.
Stars: ✭ 63 (+162.5%)
Mutual labels:  scraper, web-scraping
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them into MongoDB (NoSQL).
Stars: ✭ 15 (-37.5%)
Mutual labels:  scraper, web-scraping
plugin.video.covenant
Covenant Kodi Addon Development - Kodi is a registered trademark of the XBMC Foundation. We are not connected to or in any other way affiliated with Kodi - DMCA: [email protected]
Stars: ✭ 24 (+0%)
Mutual labels:  scraper
scrapy facebooker
Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.
Stars: ✭ 22 (-8.33%)
Mutual labels:  scraper
Scraper-Projects
🕸 List of mini projects that involve web scraping 🕸
Stars: ✭ 25 (+4.17%)
Mutual labels:  scraper
peeling-onions
A repository to store Deep Web (onion domain) crawler, scraper, and NLP tools for Tor network.
Stars: ✭ 18 (-25%)
Mutual labels:  scraper

What it is

This project basically consists of a single Python script that writes the status of the São Paulo subway lines to a Google Sheets worksheet.

The sheets can be viewed (and freely used for any data science project) here.

How it works

Every 5 minutes, the script fetches the official subway company page using the 'requests' module and, with the 'Beautiful Soup' module, extracts the operation status shown in the column on the right side of the page. The last-update time shown on the page is also stored and later associated with each subway line.
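The parsing step can be sketched as follows. This is not the project's code: the real page markup is not documented here, so the class names ('line', 'name', 'status', 'update-time') and the sample HTML are hypothetical, and to stay dependency-free the sketch swaps Beautiful Soup for the standard library's 'html.parser'. It returns one row per subway line, each tagged with the page's last-update time.

```python
from html.parser import HTMLParser

# HYPOTHETICAL markup and made-up sample data (the real page structure is not documented here):
#   <span class="update-time">...</span>
#   <li class="line"><span class="name">...</span><span class="status">...</span></li>
SAMPLE_HTML = """
<div><span class="update-time">10:45</span>
<ul>
  <li class="line"><span class="name">Linha 1 - Azul</span><span class="status">Operação Normal</span></li>
  <li class="line"><span class="name">Linha 3 - Vermelha</span><span class="status">Velocidade Reduzida</span></li>
</ul></div>
"""


class StatusParser(HTMLParser):
    """Collect [line name, status] pairs and the page's last-update time."""

    def __init__(self):
        super().__init__()
        self.lines = []          # one [name, status] pair per subway line
        self.update_time = None  # text of the (hypothetical) update-time element
        self._field = None       # field the next text chunk belongs to

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "line" in classes:
            self.lines.append(["", ""])  # start a new subway line entry
        elif "name" in classes:
            self._field = "name"
        elif "status" in classes:
            self._field = "status"
        elif "update-time" in classes:
            self._field = "update-time"

    def handle_endtag(self, tag):
        self._field = None  # text after a closing tag belongs to no field

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "update-time":
            self.update_time = text
        elif self.lines:
            index = 0 if self._field == "name" else 1
            self.lines[-1][index] += text


def parse_status(html):
    """Return one (update_time, line_name, status) tuple per subway line."""
    parser = StatusParser()
    parser.feed(html)
    parser.close()
    return [(parser.update_time, name, status) for name, status in parser.lines]


if __name__ == "__main__":
    # In the real script the HTML would come from the official page via 'requests'.
    for row in parse_status(SAMPLE_HTML):
        print(row)
```

Attaching the single update time to every row keeps each spreadsheet row self-describing, which is what makes the dataset easy to filter per line later.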

Once everything is properly parsed, the information is stored in the worksheet using the 'gspread' module.
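A minimal sketch of the storage step, assuming each parsed row is a tuple such as (update time, line name, status); that column layout is an assumption, not the project's documented schema. The function only relies on an 'append_rows' method (which gspread's Worksheet provides), so the example runs with a stand-in object instead of a real spreadsheet.

```python
def store_rows(worksheet, rows):
    """Append all parsed rows in a single call; return how many were written.

    `worksheet` is expected to behave like a gspread Worksheet. Only its
    append_rows() method is used, so any object with that method works.
    """
    values = [list(row) for row in rows]  # gspread expects a list of lists
    if values:
        worksheet.append_rows(values)
    return len(values)


class FakeWorksheet:
    """Stand-in used only for this example; records what would be appended."""

    def __init__(self):
        self.rows = []

    def append_rows(self, values):
        self.rows.extend(values)


# Real usage would need gspread plus a service-account key (sheet title is hypothetical):
#   import gspread
#   worksheet = gspread.service_account().open("sp-subway-status").sheet1
#   store_rows(worksheet, rows)
if __name__ == "__main__":
    ws = FakeWorksheet()
    store_rows(ws, [("10:45", "Linha 1 - Azul", "Operação Normal")])
    print(ws.rows)
```

Sending all rows in one 'append_rows' call, rather than one request per line, keeps each 5-minute run to a single write against the Sheets API.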

The script runs indefinitely on Heroku.
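The actual scheduling loop is not shown in this README; one way to get "every 5 minutes, indefinitely" behavior in a long-running process is sketched below. The function name and the 'max_runs', 'clock' and 'sleep' parameters are hypothetical, added so the loop can be exercised without waiting.

```python
import logging
import time

INTERVAL_SECONDS = 5 * 60  # "every 5 minutes"


def run_forever(job, interval=INTERVAL_SECONDS, max_runs=None,
                clock=time.monotonic, sleep=time.sleep):
    """Call job() every `interval` seconds; max_runs=None means run forever.

    Exceptions from job() are logged instead of re-raised, so one failed
    scrape does not stop the process. Time spent inside job() is subtracted
    from the pause so runs stay roughly `interval` seconds apart.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started = clock()
        try:
            job()
        except Exception:
            logging.exception("scrape failed; retrying at the next interval")
        runs += 1
        elapsed = clock() - started
        sleep(max(0.0, interval - elapsed))
    return runs
```

Injecting 'clock' and 'sleep' keeps the production call as simple as run_forever(scrape_once) while letting a test run a few iterations instantly.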

Unavailability or other issues

If for some reason the registered data points are empty, an e-mail is sent with the page attached, so I can inspect the page and, if necessary, the logs to find out what happened.
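A minimal sketch of such an alert using only the standard library. The trigger rule (no rows, or any row with an empty status), the subject line, the addresses and the SMTP settings are all assumptions for illustration; the README does not say how the project actually sends its e-mail.

```python
import smtplib
from email.message import EmailMessage


def needs_alert(rows):
    """True when the scrape produced no rows or any row's last field (status) is empty."""
    return not rows or any(not row[-1] for row in rows)


def build_alert(page_html, sender, recipient):
    """Build an e-mail with the fetched page attached as page.html."""
    msg = EmailMessage()
    msg["Subject"] = "sp-subway-scraper: empty data points"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The scraper registered empty data points. The fetched page is attached.")
    msg.add_attachment(page_html, subtype="html", filename="page.html")
    return msg


def send_alert(msg, host, port, user, password):
    """Send over SMTP with STARTTLS; host, port and credentials are placeholders."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()
        smtp.login(user, password)
        smtp.send_message(msg)
```

Keeping message construction separate from sending means the alert can be checked without a mail server; only send_alert touches the network.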

If this data is ever useful to you, let me know. Enjoy! 🍻

Data Analysis

Paulo analyzed the data! You can read the analysis here.
