40 open source projects by scrapinghub

1. Webstruct
NER toolkit for HTML data
2. Price Parser
Extract price amount and currency symbol from a raw text string
✭ 180
python
3. Testspiders
Useful test spiders for Scrapy
✭ 172
python
4. Python Scrapinghub
A client interface for Scrapinghub's API
✭ 169
python
5. Adblockparser
Python parser for Adblock Plus filters
✭ 158
python
6. Scrapy Training
Scrapy Training companion code
7. Dateparser
Python parser for human-readable dates
8. Js2xml
Convert JavaScript code to an XML document
9. Skinfer
Skinfer is a tool for inferring and merging JSON schemas
✭ 117
python
10. Aile
Automatic Item List Extraction
11. Pydepta
A Python implementation of DEPTA
12. Wappalyzer Python
UNMAINTAINED Python wrapper for Wappalyzer (utility that uncovers the technologies used on websites)
✭ 75
python
13. Frontera
A scalable frontier for web crawlers
✭ 1,147
python
14. Slackbot
A chat bot for Slack (https://slack.com).
✭ 1,131
python
15. Portia
Visual scraping for Scrapy
16. Page clustering
A simple algorithm for clustering web pages, suitable for crawlers
17. Crawlera Tools
Crawlera tools
✭ 26
python
18. Python Crfsuite
A Python binding for crfsuite
19. Scrapyrt
HTTP API for Scrapy spiders
20. Extruct
Extract embedded metadata from HTML markup
21. Splash
Lightweight, scriptable browser as a service with an HTTP API
22. Spidermon
Scrapy extension for monitoring spider execution.
23. aduana
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).
24. scrapylib
Collection of Scrapy utilities (extensions, middlewares, pipelines, etc)
✭ 27
python
25. scrapy-autounit
Automatic unit test generation for Scrapy.
✭ 49
python
26. scrapinghub-entrypoint-scrapy
Scrapy entrypoint for Scrapinghub job runner
✭ 23
python
27. flatson
Tool to flatten stream of JSON-like objects, configured via schema
28. scmongo
MongoDB extensions for Scrapy
✭ 43
python
29. kafka-scanner
High Level Kafka Scanner
✭ 18
python
30. webpager
Paginating the web
✭ 35
c, python
31. scrapinghub-stack-scrapy
Software stack with latest Scrapy and updated deps
✭ 58
Dockerfile
32. shub
Scrapinghub Command Line Client
✭ 118
python
33. exporters
Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations
34. scrapy-mosquitera
Restrict crawl and scraping scope using matchers.
35. web-poet
Web scraping Page Objects core library
36. number-parser
Parse numbers written in natural language
37. mdr
A Python library to detect and extract listing data from HTML pages.
39. python-cld2
Python bindings for CLD2.
✭ 17
python, C++
40. docker-devpi
PyPI caching service using devpi and Docker
✭ 27
shell