A series of distributed components for Scrapy, including RabbitMQ-based, Kafka-based, and RedisBloom-based components.

Stars: ✭ 38 (+72.73%)

Mutual labels: spider, scraping, scrapy

wget-lua

Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.

Stars: ✭ 52 (+136.36%)

Mutual labels: scraper, spider, scraping

LinkedIn scraper using Selenium WebDriver, headless Chromium, Docker and Scrapy

Stars: ✭ 309 (+1304.55%)

Mutual labels: scraper, scraping, scrapy

OpenScraper

An open source webapp for scraping: towards a public service for webscraping

Stars: ✭ 80 (+263.64%)

Mutual labels: scraper, spider, scrapy

Geziyor

Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.

Stars: ✭ 1,246 (+5563.64%)

Mutual labels: scraper, spider, scraping

Seleniumcrawler

An example that uses Selenium WebDriver for Python and the Scrapy framework to build a web scraper that crawls an ASP site

Stars: ✭ 117 (+431.82%)

Mutual labels: scraper, scraping, scrapy

Linkedin Profile Scraper

🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.

Stars: ✭ 171 (+677.27%)

Mutual labels: scraper, spider, scraping

facebook-discussion-tk

A collection of tools to (semi-)automatically collect and analyze data from online discussions on Facebook groups and pages.

Stars: ✭ 33 (+50%)

Mutual labels: scraper, facebook, scraping

Colly

Elegant Scraper and Crawler Framework for Golang

Stars: ✭ 15,535 (+70513.64%)

Mutual labels: scraper, spider, scraping

Email Extractor

Its main function is to extract all email addresses from one or more URLs.

Stars: ✭ 81 (+268.18%)

Mutual labels: scraper, scraping, scrapy

Goribot

[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.

Stars: ✭ 190 (+763.64%)

Mutual labels: scraper, spider, scrapy

crawler-chrome-extensions

Chrome extensions commonly used by crawler developers

Stars: ✭ 53 (+140.91%)

Mutual labels: scraper, spider, scraping

V2EX Spider

V2EX crawler

Stars: ✭ 21 (-4.55%)

Mutual labels: spider, scrapy

elves

🎊 Design and implementation of a lightweight crawler framework.

Stars: ✭ 322 (+1363.64%)

Mutual labels: spider, scrapy

OLX Scraper

📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them into MongoDB (NoSQL).

Stars: ✭ 15 (-31.82%)

Mutual labels: scraper, scrapy

163Music

A 163Music spider built with Scrapy.

Stars: ✭ 60 (+172.73%)

Mutual labels: spider, scrapy


scrapy_facebooker

scrapy_facebooker is a collection of Scrapy spiders that can scrape posts, images, and more from public Facebook Pages.

These spiders are intended for archiving public Facebook pages; use them at your own risk!

Some spiders work without a Facebook account, while others work only with a Facebook Graph API access token.

How to prepare

Before using these spiders, install all of their dependencies with a single command:

pip install -r requirements.txt

This project is intended to run in Python 3.

How to run

To run a spider, first choose which spider you want to use; the spiders available in this project are in /scrapy_facebooker/spiders/.

For example, to run the facebook_post spider against the public Facebook page with the username RHWEBsites and write the output to a file named output.json:

$ scrapy crawl facebook_post -a target_username=RHWEBsites -o output.json
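The -o option uses Scrapy's feed exports, and with a .json extension the scraped items are written as a single JSON array. A minimal sketch for inspecting that file afterwards, assuming the default JSON feed format; load_scraped_items is a hypothetical helper, not part of this project, and the item fields depend on which spider produced the file:

```python
import json
from pathlib import Path


def load_scraped_items(path):
    """Load items from a Scrapy JSON feed file (assumed: one JSON array of objects)."""
    text = Path(path).read_text(encoding="utf-8")
    items = json.loads(text)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(items, list):
        raise ValueError(f"{path} does not contain a JSON array")
    return items


if __name__ == "__main__":
    output = Path("output.json")
    if output.exists():
        items = load_scraped_items(output)
        print(f"{len(items)} items")
        # Field names depend on the spider, so just show the keys of the first item.
        if items:
            print(sorted(items[0].keys()))
```

Scrapy's -o appends to an existing file rather than replacing it, so running the command twice against the same output.json leaves two concatenated JSON arrays, which json.loads rejects. Scrapy 2.4 and later also accept -O to overwrite the file instead.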

The following spiders are available in this repository:

facebook_event_graph
facebook_post_graph
facebook_photo_graph
facebook_video_graph
facebook_event
facebook_post
facebook_photo
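To script the documented command, for example to run several spiders in turn, the argument list can be built in Python and handed to subprocess.run. A minimal sketch: build_crawl_command and SPIDERS are hypothetical names, the spider names are copied from the list above, and only the target_username argument of facebook_post is documented here, so check each spider's source for the arguments it actually accepts:

```python
import shlex

# Spider names as listed in this README.
SPIDERS = {
    "facebook_event_graph",
    "facebook_post_graph",
    "facebook_photo_graph",
    "facebook_video_graph",
    "facebook_event",
    "facebook_post",
    "facebook_photo",
}


def build_crawl_command(spider, output, **spider_args):
    """Return the argv for `scrapy crawl SPIDER -a KEY=VALUE ... -o OUTPUT`."""
    if spider not in SPIDERS:
        raise ValueError(f"unknown spider: {spider!r}")
    argv = ["scrapy", "crawl", spider]
    # Each keyword becomes one `-a key=value` spider argument.
    for key, value in spider_args.items():
        argv += ["-a", f"{key}={value}"]
    argv += ["-o", output]
    return argv


if __name__ == "__main__":
    cmd = build_crawl_command("facebook_post", "output.json", target_username="RHWEBsites")
    # Prints the same command as the facebook_post example above.
    print(shlex.join(cmd))
    # To execute it from the project root: subprocess.run(cmd, check=True)
```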

License

The license is available in LICENSE.txt at the root of this project.


refeed / scrapy_facebooker
