
jamesturk / Scrapelib

Licence: bsd-2-clause
⛏ a library for scraping things

Projects that are alternatives of or similar to Scrapelib

Proxyscrape
Python library for retrieving free proxies (HTTP, HTTPS, SOCKS4, SOCKS5).
Stars: ✭ 134 (-18.29%)
Mutual labels:  scraper
Youtube Projects
This repository contains all the code I use in my YouTube tutorials.
Stars: ✭ 144 (-12.2%)
Mutual labels:  scraper
Demeter
Demeter is a tool for scraping the calibre web ui
Stars: ✭ 155 (-5.49%)
Mutual labels:  scraper
Udemycoursegrabber
Your will to enroll in Udemy course is here, but the money isn't? Search no more! This python program searches for your desired course in more than [insert big number here] websites, compares the last updated date, and gives you the download link of the latest one back, but you also have the choice to see the other ones as well!
Stars: ✭ 137 (-16.46%)
Mutual labels:  scraper
Zillow
Zillow Scraper for Python using Selenium
Stars: ✭ 141 (-14.02%)
Mutual labels:  scraper
Scraperwiki Python
ScraperWiki Python library for scraping and saving data
Stars: ✭ 146 (-10.98%)
Mutual labels:  scraper
Mwoffliner
Scrape any online Mediawiki motorised wiki (like Wikipedia) to your local filesystem
Stars: ✭ 121 (-26.22%)
Mutual labels:  scraper
Opensanctions
An open database of international sanctions data, persons of interest and politically exposed persons
Stars: ✭ 157 (-4.27%)
Mutual labels:  scraper
Google Play Scraper
Google play scraper for Python inspired by <facundoolano/google-play-scraper>
Stars: ✭ 143 (-12.8%)
Mutual labels:  scraper
Serpscrap
SEO Python scraper to extract data from major search engine result pages. Extract data like URL, title, snippet, rich snippet and the type from search results for given keywords. Detect ads or make automated screenshots. You can also fetch the text content of URLs provided in search results or on your own. It's useful for SEO and business-related research tasks.
Stars: ✭ 153 (-6.71%)
Mutual labels:  scraper
Onegram
This repository is no longer maintained.
Stars: ✭ 137 (-16.46%)
Mutual labels:  scraper
Go Jd
Automatic login to JD.com and automatic ordering of online products
Stars: ✭ 139 (-15.24%)
Mutual labels:  scraper
Phpscraper
PHP Scraper - a highly opinionated web-interface for PHP
Stars: ✭ 148 (-9.76%)
Mutual labels:  scraper
Newspaper
News, full-text, and article metadata extraction in Python 3. Advanced docs:
Stars: ✭ 11,545 (+6939.63%)
Mutual labels:  scraper
Instagram Scraper
scrapes medias, likes, followers, tags and all metadata. Inspired by instagram-php-scraper,bot
Stars: ✭ 2,209 (+1246.95%)
Mutual labels:  scraper
Scraper
A scraper that switches between normal mode and gentleman mode, built on Electron, React
Stars: ✭ 127 (-22.56%)
Mutual labels:  scraper
Google2csv
Google2Csv a simple google scraper that saves the results on a csv/xlsx/jsonl file
Stars: ✭ 145 (-11.59%)
Mutual labels:  scraper
Datmusic Api
Alternative for VK Audio API
Stars: ✭ 160 (-2.44%)
Mutual labels:  scraper
Covid19 mobility
COVID-19 Mobility Data Aggregator. Scraper of Google, Apple, Waze and TomTom COVID-19 Mobility Reports🚶🚘🚉
Stars: ✭ 156 (-4.88%)
Mutual labels:  scraper
Nooverviewavailable.com
A survey of Apple developer documentation.
Stars: ✭ 152 (-7.32%)
Mutual labels:  scraper

=========
scrapelib
=========

.. image:: https://github.com/jamesturk/scrapelib/workflows/Test/badge.svg
    :target: https://github.com/jamesturk/scrapelib/actions

.. image:: https://coveralls.io/repos/jamesturk/scrapelib/badge.png?branch=master
    :target: https://coveralls.io/r/jamesturk/scrapelib

.. image:: https://img.shields.io/pypi/v/scrapelib.svg
    :target: https://pypi.python.org/pypi/scrapelib

.. image:: https://readthedocs.org/projects/scrapelib/badge/?version=latest
    :target: https://readthedocs.org/projects/scrapelib/?badge=latest
    :alt: Documentation Status

scrapelib is a library for making requests to less-than-reliable websites. As of version 0.7, it is implemented as a wrapper around `requests <http://python-requests.org>`_.

scrapelib originated as part of the `Open States <http://openstates.org/>`_ project to scrape the websites of all 50 state legislatures. As a result, it was designed with features desirable when dealing with sites that have intermittent errors or require rate-limiting.

Advantages of using scrapelib over alternatives like httplib2 or simply using requests as-is:

  • All of the power of the superb `requests <http://python-requests.org>`_ library.
  • HTTP, HTTPS, and FTP requests via an identical API
  • support for simple caching with pluggable cache backends
  • request throttling
  • configurable retries for non-permanent site failures
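
To make "configurable retries for non-permanent site failures" concrete, here is a minimal standard-library sketch of the general technique. It is not scrapelib's own code: ``TransientError``, ``fetch_with_retries`` and its parameter names are hypothetical, chosen only for this illustration, and the doubling wait is an assumption about a reasonable backoff policy.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a non-permanent failure (e.g. a timeout)."""


def fetch_with_retries(fetch, url, retry_attempts=3, retry_wait_seconds=1.0,
                       sleep=time.sleep):
    """Call fetch(url), retrying on TransientError.

    The wait doubles after each failed attempt. Once the initial try and
    all retry_attempts retries have failed, the last error is re-raised.
    """
    wait = retry_wait_seconds
    for attempt in range(retry_attempts + 1):
        try:
            return fetch(url)
        except TransientError:
            if attempt == retry_attempts:
                raise  # out of retries: surface the failure to the caller
            sleep(wait)
            wait *= 2
```

Passing ``sleep`` as an argument keeps the helper easy to test without real delays. Permanent failures should be raised as a different exception type so they are not retried.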

Written by James Turk, thanks to Michael Stephens for the initial urllib2/httplib2 version.

See https://github.com/jamesturk/scrapelib/graphs/contributors for contributors.

Requirements
============

  • python 2.7, >=3.3
  • requests >= 2.0 (earlier versions may work but aren't tested)

Example Usage
=============

Documentation: http://scrapelib.readthedocs.org/en/latest/

::

    import scrapelib
    s = scrapelib.Scraper(requests_per_minute=10)

    # Grab Google front page
    s.get('http://google.com')

    # Will be throttled to 10 HTTP requests per minute
    while True:
        s.get('http://example.com')
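
For intuition about what a limit of 10 requests per minute means in practice, the sketch below spaces calls at least ``60 / requests_per_minute`` seconds apart (6 seconds for a limit of 10). It is a standard-library illustration of the idea only, not scrapelib's implementation; the ``Throttle`` class and its injectable ``clock`` and ``sleep`` arguments are hypothetical.

```python
import time


class Throttle:
    """Space calls evenly so at most requests_per_minute occur per minute."""

    def __init__(self, requests_per_minute, clock=time.monotonic,
                 sleep=time.sleep):
        self.min_interval = 60.0 / requests_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request; None before the first

    def wait(self):
        """Block until min_interval has elapsed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would invoke ``throttle.wait()`` immediately before each request. The first call returns at once, and later calls sleep only for whatever part of the interval has not already passed.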
