
scrapinghub / aduana

Licence: other
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even when making big crawls (one billion pages).

Programming Languages

C, Python, CMake

Description

A library to guide a web crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even for very large crawls (one billion pages).
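To make the idea of link-based ranking concrete, here is a minimal, purely illustrative sketch of PageRank by power iteration over a small in-memory link graph. It is not aduana's implementation or API: the function names `pagerank` and `crawl_order`, the dict-based graph, and the damping factor of 0.85 are all assumptions chosen for the example.

```python
# Illustrative sketch only: NOT aduana's code or API.
# `pagerank` and `crawl_order` are hypothetical helper names.

def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by power iteration.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict mapping every known page to its score.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Score held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}

        # Each page splits its damped score equally among its outlinks.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        rank = new_rank

    return rank


def crawl_order(links):
    """Return pages sorted from highest to lowest score."""
    scores = pagerank(links)
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Tiny made-up graph: page "d" has no inbound links, so it ranks last.
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(crawl_order(graph))
```

A crawler guided this way would request the highest-scoring uncrawled pages first and recompute scores as new links are discovered. Holding the whole graph in Python dicts as above only works for small graphs and is meant to show the ranking idea, not how to scale it.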

Warning: I only test regularly under Linux, my development platform. From time to time I also test on OS X and on Windows 8 using MinGW64.

Installation

pip install aduana

Documentation

Available at Read the Docs.

I have started documenting plans and ideas on the wiki.

Example

Single spider example:

cd example
pip install -r requirements.txt
scrapy crawl example

To run the distributed crawler, see the docs.
