testdrivenio / concurrent-web-scraping

Licence: other

Building a Concurrent Web Scraper with Python and Selenium

Programming languages: HTML, Python

Projects that are alternatives to or similar to concurrent-web-scraping

Web Scraping

Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, SHFE and news data crawlers on BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist

Stars: ✭ 153 (+446.43%)

Mutual labels: web-scraping

Trump Lies

Tutorial: Web scraping in Python with Beautiful Soup

Stars: ✭ 201 (+617.86%)

Mutual labels: web-scraping

Wayback Machine Scraper

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.

Stars: ✭ 230 (+721.43%)

Mutual labels: web-scraping

Scrapy Training

Scrapy Training companion code

Stars: ✭ 157 (+460.71%)

Mutual labels: web-scraping

Twitter Intelligence

The Twitter Intelligence OSINT project performs tracking and analysis of Twitter

Stars: ✭ 179 (+539.29%)

Mutual labels: web-scraping

Short Jokes Dataset

Python scripts for building 'Short Jokes' dataset, featured on Kaggle

Stars: ✭ 215 (+667.86%)

Mutual labels: web-scraping

Juno crawler

Scrapy crawler to collect data on the back catalog of songs listed for sale.

Stars: ✭ 150 (+435.71%)

Mutual labels: web-scraping

UofT-Timetable-Generator

A web application that generates timetables for university students at the University of Toronto

Stars: ✭ 34 (+21.43%)

Mutual labels: web-scraping

Bet On Sibyl

Machine Learning Model for Sport Predictions (Football, Basketball, Baseball, Hockey, Soccer & Tennis)

Stars: ✭ 190 (+578.57%)

Mutual labels: web-scraping

Docbao

A tool for scanning and analyzing keywords from Vietnamese online news sites

Stars: ✭ 230 (+721.43%)

Mutual labels: web-scraping

Learnpythonforresearch

This repository provides everything you need to get started with Python for (social science) research.

Stars: ✭ 163 (+482.14%)

Mutual labels: web-scraping

Grab

Web Scraping Framework

Stars: ✭ 2,147 (+7567.86%)

Mutual labels: web-scraping

Selenium Python Helium

Selenium-python but lighter: Helium is the best Python library for web automation.

Stars: ✭ 2,732 (+9657.14%)

Mutual labels: web-scraping

Netflix Clone

Netflix-like full-stack application with an SPA client and a backend implemented in a service-oriented architecture

Stars: ✭ 156 (+457.14%)

Mutual labels: web-scraping

Scrape Linkedin Selenium

`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.

Stars: ✭ 239 (+753.57%)

Mutual labels: web-scraping

Helena

A Chrome extension for writing custom web scraping programs and web automation programs. Just demonstrate how to collect the first row of data, then let the extension write the program for collecting all rows.

Stars: ✭ 151 (+439.29%)

Mutual labels: web-scraping

R Web Scraping Cheat Sheet

Guide, reference and cheatsheet on web scraping using rvest, httr and Rselenium.

Stars: ✭ 207 (+639.29%)

Mutual labels: web-scraping

wayback

⏪ Tools to Work with the Various Internet Archive Wayback Machine APIs

Stars: ✭ 52 (+85.71%)

Mutual labels: web-scraping

Quora Api

An unofficial API for Quora.

Stars: ✭ 250 (+792.86%)

Mutual labels: web-scraping

City Scrapers

Scrape, standardize and share public meetings from local government websites

Stars: ✭ 220 (+685.71%)

Mutual labels: web-scraping


Concurrent Web Scraping with Python and Selenium

Want to learn how to build this project?

Check out the blog post.

Want to use this project?

Fork or clone the repository
Create and activate a virtual environment
Install the requirements

Run the scrapers:

# sync
(env)$ python script.py headless

# parallel with multiprocessing
(env)$ python script_parallel_1.py headless

# parallel with concurrent.futures
(env)$ python script_parallel_2.py headless

# concurrent with concurrent.futures (should be the fastest!)
(env)$ python script_concurrent.py headless

# parallel with concurrent.futures and concurrent with asyncio
(env)$ python script_asyncio.py headless
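
The variants above differ mainly in how the work is distributed. As a rough sketch (not the repository's code), the following compares a sequential loop with `concurrent.futures.ThreadPoolExecutor` on a stand-in I/O-bound task. `fake_scrape`, `run_sync`, `run_threaded`, and the page numbers are hypothetical names chosen for illustration, and `time.sleep` only simulates the wait for a page to load.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_scrape(page: int) -> str:
    """Hypothetical stand-in for a Selenium page fetch (I/O-bound)."""
    time.sleep(0.1)  # simulate waiting on the network
    return f"page-{page}"


def run_sync(pages):
    """Fetch pages one after another."""
    return [fake_scrape(page) for page in pages]


def run_threaded(pages, max_workers=5):
    """Fetch pages concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns results in the same order as the inputs
        return list(executor.map(fake_scrape, pages))


if __name__ == "__main__":
    pages = range(1, 11)
    for runner in (run_sync, run_threaded):
        start = time.perf_counter()
        results = runner(pages)
        elapsed = time.perf_counter() - start
        print(f"{runner.__name__}: {len(results)} pages in {elapsed:.2f}s")
```

Threads tend to suit scraping because most of the time is spent waiting, so the waits can overlap. For CPU-bound work, `ProcessPoolExecutor` offers the same interface and could be swapped in.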

Run the tests:

(env)$ python -m pytest test/test_scraper.py
(env)$ python -m pytest test/test_scraper_mock.py
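
The name of the second test file suggests that the scraper is also tested against mocked responses rather than a live site. A minimal sketch of that idea follows, assuming a hypothetical `get_title` helper that receives its fetch function as an argument. None of these names come from the repository.

```python
from unittest.mock import Mock


def get_title(fetch, url):
    """Hypothetical scraper helper: extract the <title> text from a page."""
    html = fetch(url)
    start = html.find("<title>")
    end = html.find("</title>")
    if start == -1 or end == -1:
        return None
    return html[start + len("<title>"):end].strip()


def test_get_title_with_mock():
    # The mock replaces the real network call, so the test runs offline.
    fetch = Mock(return_value="<html><title> Example </title></html>")
    assert get_title(fetch, "https://example.com") == "Example"
    fetch.assert_called_once_with("https://example.com")


def test_get_title_missing():
    # A page without a <title> element yields None.
    fetch = Mock(return_value="<html></html>")
    assert get_title(fetch, "https://example.com") is None
```

Saved to a file whose name starts with `test_`, these functions would be collected by `python -m pytest` like the test files above.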
