estin / pomp

Licence: other
Screen scraping and web crawling framework

Programming Languages

python

Projects that are alternatives to or similar to pomp

feedsearch-crawler
Crawl sites for RSS, Atom, and JSON feeds.
Stars: ✭ 23 (-62.3%)
Mutual labels:  scraping, crawling, asyncio
Easy Scraping Tutorial
Simple but useful Python web scraping tutorial code.
Stars: ✭ 583 (+855.74%)
Mutual labels:  scraping, crawling, asyncio
scrape-github-trending
Tutorial for web scraping / crawling with Node.js.
Stars: ✭ 42 (-31.15%)
Mutual labels:  scraping, crawling
double-agent
A test suite of common scraper detection techniques. See how detectable your scraper stack is.
Stars: ✭ 123 (+101.64%)
Mutual labels:  scraping, crawling
scrapy-fieldstats
A Scrapy extension to log items coverage when the spider shuts down
Stars: ✭ 17 (-72.13%)
Mutual labels:  scraping, crawling
Antch
Antch, a fast, powerful and extensible web crawling & scraping framework for Go
Stars: ✭ 198 (+224.59%)
Mutual labels:  scraping, crawling
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+25367.21%)
Mutual labels:  scraping, crawling
diffbot-php-client
[Deprecated - Maintenance mode - use APIs directly please!] The official Diffbot client library
Stars: ✭ 53 (-13.11%)
Mutual labels:  scraping, crawling
Scrapy
Scrapy, a fast high-level web crawling & scraping framework for Python.
Stars: ✭ 42,343 (+69314.75%)
Mutual labels:  scraping, crawling
proxycrawl-python
ProxyCrawl Python library for scraping and crawling
Stars: ✭ 51 (-16.39%)
Mutual labels:  scraping, crawling
zcrawl
An open source web crawling platform
Stars: ✭ 21 (-65.57%)
Mutual labels:  scraping, crawling
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (-14.75%)
Mutual labels:  scraping, crawling
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+180.33%)
Mutual labels:  scraping, crawling
Linkedin Learning Downloader
Linkedin Learning videos downloader
Stars: ✭ 171 (+180.33%)
Mutual labels:  scraping, asyncio
Memorious
Distributed crawling framework for documents and structured data.
Stars: ✭ 248 (+306.56%)
Mutual labels:  scraping, crawling
Awesome Puppeteer
A curated list of awesome puppeteer resources.
Stars: ✭ 1,728 (+2732.79%)
Mutual labels:  scraping, crawling
socials
👨‍👩‍👦 Social account detection and extraction in Python, e.g. for crawling/scraping.
Stars: ✭ 37 (-39.34%)
Mutual labels:  scraping, crawling
Grawler
Grawler is a tool written in PHP which comes with a web interface that automates the task of using google dorks, scrapes the results, and stores them in a file.
Stars: ✭ 98 (+60.66%)
Mutual labels:  scraping, crawling
Dotnetcrawler
DotnetCrawler is a straightforward, lightweight web crawling/scraping library for Entity Framework Core output based on dotnet core. This library is designed like other strong crawler libraries such as WebMagic and Scrapy, but is extendable to fit your custom requirements. Medium link: https://medium.com/@mehmetozkaya/creating-custom-web-crawler-with-dotnet-core-using-entity-framework-core-ec8d23f0ca7c
Stars: ✭ 100 (+63.93%)
Mutual labels:  scraping, crawling
crawling-framework
Easily crawl news portals or blog sites using Storm Crawler.
Stars: ✭ 22 (-63.93%)
Mutual labels:  scraping, crawling

Pomp

Pomp is a screen scraping and web crawling framework. Pomp is inspired by and similar to Scrapy, but has a simpler implementation that lacks the hard Twisted dependency.

Features:

  • Pure python
  • Only one dependency for Python 2.x: concurrent.futures (a backport of the standard-library package)
  • Supports single-file applications; Pomp doesn't force a specific project layout or other restrictions.
  • Pomp is a meta framework like Paste: you may use it to create your own scraping framework.
  • Extensible networking: you may use any sync or async method.
  • No parsing libraries in the core; use your preferred approach.
  • Pomp instances may be distributed and are designed to work with an external queue.
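
The separation described above (networking and parsing both live outside the core) can be illustrated with a minimal sketch. It does not use Pomp's API: `InMemoryDownloader`, `LinkCrawler`, `run`, and the `page:/` link format are hypothetical names invented for this illustration. See the Pomp docs for the real base classes.

```python
from collections import deque


class InMemoryDownloader:
    """Hypothetical stand-in for a network layer: serves pages from a dict."""

    def __init__(self, pages):
        self.pages = pages

    def fetch(self, url):
        # A real downloader would perform an HTTP request here (sync or async).
        return self.pages.get(url, "")


class LinkCrawler:
    """Hypothetical crawler: owns all parsing, so the loop never inspects bodies."""

    entry_url = "page:/a"

    def extract_items(self, url, body):
        # One item per page: the URL and the body length.
        return [{"url": url, "size": len(body)}]

    def next_urls(self, url, body):
        # Toy format: whitespace-separated tokens starting with "page:/" are links.
        return [tok for tok in body.split() if tok.startswith("page:/")]


def run(crawler, downloader):
    """Breadth-first crawl loop that is agnostic to networking and parsing."""
    queue = deque([crawler.entry_url])
    seen = {crawler.entry_url}
    items = []
    while queue:
        url = queue.popleft()
        body = downloader.fetch(url)
        items.extend(crawler.extract_items(url, body))
        for nxt in crawler.next_urls(url, body):
            if nxt not in seen:  # visit each page once
                seen.add(nxt)
                queue.append(nxt)
    return items


PAGES = {
    "page:/a": "hello page:/b page:/c",
    "page:/b": "back to page:/a",
    "page:/c": "leaf",
}
ITEMS = run(LinkCrawler(), InMemoryDownloader(PAGES))
```

Swapping `InMemoryDownloader` for any object with a compatible `fetch` method changes the transport without touching the crawler or the loop, which is the kind of flexibility the feature list describes.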

Pomp makes no attempt to accommodate:

  • redirects
  • proxies
  • caching
  • database integration
  • cookies
  • authentication
  • etc.

If you want proxies, redirects, or similar, you may use the excellent requests library as the Pomp downloader.
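
As a rough sketch of what such a fetch step might look like with requests, the helper below follows redirects and optionally routes through proxies. The function name `fetch_with_requests` and its return shape are assumptions made for this illustration. How it plugs into Pomp's downloader interface is not shown here and should be taken from the Pomp docs.

```python
def fetch_with_requests(url, session=None, proxies=None, timeout=10.0):
    """Fetch `url`, following redirects and optionally using `proxies`.

    `session` may be any object with a requests-compatible `get` method;
    when omitted, a `requests.Session` is created.
    """
    if session is None:
        # Third-party dependency, imported lazily so the sketch loads without it.
        import requests
        session = requests.Session()
    response = session.get(
        url,
        timeout=timeout,
        allow_redirects=True,  # redirects are handled by requests, not by Pomp
        proxies=proxies,       # e.g. {"http": "http://proxy.example:8080"} (placeholder)
    )
    return {
        "url": response.url,  # final URL after any redirects
        "status": response.status_code,
        "body": response.text,
    }
```

Reusing one session across calls also keeps cookies between requests, which covers another item from the list above.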

Pomp examples

Pomp docs

Pomp is written and maintained by Evgeniy Tatarkin and is licensed under the BSD license.