refeed / scrapy_facebooker

Licence: MIT License
Collection of scrapy spiders which can scrape posts, images, and so on from public Facebook Pages.

Programming Languages: HTML, Python

Projects that are alternatives to or similar to scrapy_facebooker

Fbcrawl
A Facebook crawler
Stars: ✭ 536 (+2336.36%)
Mutual labels:  scraper, facebook, spider, scrapy
Django Dynamic Scraper
Creating Scrapy scrapers via the Django admin interface
Stars: ✭ 1,024 (+4554.55%)
Mutual labels:  scraper, spider, scraping, scrapy
Crawly
Crawly, a high-level web crawling & scraping framework for Elixir.
Stars: ✭ 440 (+1900%)
Mutual labels:  scraper, spider, scraping
Mailinglistscraper
A python web scraper for public email lists.
Stars: ✭ 19 (-13.64%)
Mutual labels:  scraper, spider, scrapy
scrapy-distributed
A series of distributed components for Scrapy. Including RabbitMQ-based components, Kafka-based components, and RedisBloom-based components for Scrapy.
Stars: ✭ 38 (+72.73%)
Mutual labels:  spider, scraping, scrapy
wget-lua
Wget-AT is a modern Wget with Lua hooks, Zstandard (+dictionary) WARC compression and URL-agnostic deduplication.
Stars: ✭ 52 (+136.36%)
Mutual labels:  scraper, spider, scraping
Linkedin
Linkedin Scraper using Selenium Web Driver, Chromium headless, Docker and Scrapy
Stars: ✭ 309 (+1304.55%)
Mutual labels:  scraper, scraping, scrapy
OpenScraper
An open source webapp for scraping: towards a public service for webscraping
Stars: ✭ 80 (+263.64%)
Mutual labels:  scraper, spider, scrapy
Geziyor
Geziyor, a fast web crawling & scraping framework for Go. Supports JS rendering.
Stars: ✭ 1,246 (+5563.64%)
Mutual labels:  scraper, spider, scraping
Seleniumcrawler
An example using Selenium webdrivers for python and Scrapy framework to create a web scraper to crawl an ASP site
Stars: ✭ 117 (+431.82%)
Mutual labels:  scraper, scraping, scrapy
Linkedin Profile Scraper
🕵️‍♂️ LinkedIn profile scraper returning structured profile data in JSON. Works in 2020.
Stars: ✭ 171 (+677.27%)
Mutual labels:  scraper, spider, scraping
facebook-discussion-tk
A collection of tools to (semi-)automatically collect and analyze data from online discussions on Facebook groups and pages.
Stars: ✭ 33 (+50%)
Mutual labels:  scraper, facebook, scraping
Colly
Elegant Scraper and Crawler Framework for Golang
Stars: ✭ 15,535 (+70513.64%)
Mutual labels:  scraper, spider, scraping
Email Extractor
The main functionality is to extract all the emails from one or several URLs.
Stars: ✭ 81 (+268.18%)
Mutual labels:  scraper, scraping, scrapy
Goribot
[Crawler/Scraper for Golang] 🕷 A lightweight, distributed-friendly Golang crawler framework.
Stars: ✭ 190 (+763.64%)
Mutual labels:  scraper, spider, scrapy
crawler-chrome-extensions
Chrome extensions commonly used by crawler developers
Stars: ✭ 53 (+140.91%)
Mutual labels:  scraper, spider, scraping
V2EX Spider
V2EX crawler
Stars: ✭ 21 (-4.55%)
Mutual labels:  spider, scrapy
elves
🎊 Design and implementation of a lightweight crawler framework.
Stars: ✭ 322 (+1363.64%)
Mutual labels:  spider, scrapy
OLX Scraper
📻 An OLX scraper using Scrapy + MongoDB. It scrapes recent ads posted for a requested product and dumps them into a NoSQL MongoDB database.
Stars: ✭ 15 (-31.82%)
Mutual labels:  scraper, scrapy
163Music
A 163music spider built with Scrapy.
Stars: ✭ 60 (+172.73%)
Mutual labels:  spider, scrapy

scrapy_facebooker

scrapy_facebooker is a collection of Scrapy spiders that can scrape posts, images, and more from public Facebook Pages.

These spiders are intended for archiving public Facebook Pages. Use them at your own risk!

Some spiders work without a Facebook account, while others work only with a Facebook Graph API access token.
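The README does not say which spiders need the token. The spider names listed further down suggest that the `*_graph` spiders are the ones that use the Graph API, but that is only an inference. Under that assumption, the helper below classifies a spider by its name and looks up a token from an environment variable. The variable name `FB_GRAPH_ACCESS_TOKEN` and both function names are hypothetical and are not part of this project.

```python
import os
from typing import Mapping, Optional

# ASSUMPTION: inferred from the spider naming convention only; check each
# spider's source in /scrapy_facebooker/spiders/ to confirm what it needs.
GRAPH_SUFFIX = "_graph"

# Hypothetical environment variable name; this project does not define it.
TOKEN_ENV_VAR = "FB_GRAPH_ACCESS_TOKEN"


def needs_graph_token(spider_name: str) -> bool:
    """Return True if the spider name follows the *_graph convention."""
    return spider_name.endswith(GRAPH_SUFFIX)


def lookup_token(spider_name: str, env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the access token for Graph spiders, or None for the others.

    Raises RuntimeError when a Graph spider is requested but no token is set.
    """
    if not needs_graph_token(spider_name):
        return None
    token = env.get(TOKEN_ENV_VAR)
    if not token:
        raise RuntimeError(
            f"{spider_name} appears to need a Graph API token; set {TOKEN_ENV_VAR}"
        )
    return token


if __name__ == "__main__":
    for name in ("facebook_post", "facebook_post_graph"):
        print(name, "needs token:", needs_graph_token(name))
```

Passing `env` explicitly keeps the check testable without changing the real environment.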

How to prepare

Before using these spiders, install their dependencies with a single command:

pip install -r requirements.txt

This project is intended to run on Python 3.
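After installation, you can check that the interpreter you will run the spiders with can actually import the dependencies. The script below is a minimal sketch of such a check. It probes only `scrapy`, which is the one dependency this README confirms; extend `REQUIRED_MODULES` with whatever `requirements.txt` lists in your checkout. The names `missing_modules` and `check_environment` are hypothetical. The sketch itself uses f-strings, so it needs Python 3.6 or newer.

```python
import importlib.util
import sys
from typing import Iterable, List

# Module names to probe. "scrapy" is the only one this README confirms;
# add whatever else requirements.txt lists in your checkout.
REQUIRED_MODULES = ("scrapy",)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Collect human-readable problems with the current setup."""
    return [
        f"missing module: {name} (run pip install -r requirements.txt)"
        for name in missing_modules(names)
    ]


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "environment looks ready")
    sys.exit(1 if issues else 0)
```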

How to run

To run a spider, first choose which spider you want to use. The available spiders are in /scrapy_facebooker/spiders/.

For example, to use the facebook_post spider to scrape the public Facebook Page with the username RHWEBsites and write the output to a file named output.json:

$ scrapy crawl facebook_post -a target_username=RHWEBsites -o output.json
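You can also start the same crawl from Python instead of the command line. The script below is a minimal sketch that uses Scrapy's `CrawlerProcess` and `get_project_settings`. It assumes Scrapy 2.1 or newer, which added the `FEEDS` setting. It also assumes the script is started from the repository root, which is where `scrapy crawl` has to be run as well. `feed_settings` and `run_spider` are hypothetical helper names, not part of this project.

```python
from typing import Any, Dict


def feed_settings(output_path: str, fmt: str = "json") -> Dict[str, Dict[str, Any]]:
    """Build a FEEDS setting roughly equivalent to `-o <output_path>`."""
    return {output_path: {"format": fmt}}


def run_spider(spider_name: str, output_path: str, **spider_args: str) -> None:
    """Run one spider in-process, like `scrapy crawl <name> -a k=v -o <path>`."""
    # Imported lazily so feed_settings() can be used without Scrapy installed.
    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    settings = get_project_settings()  # locates scrapy.cfg from the working directory
    settings.set("FEEDS", feed_settings(output_path))
    process = CrawlerProcess(settings)
    # A spider name string is resolved through the project's spider loader;
    # keyword arguments reach the spider just like `-a key=value` does.
    process.crawl(spider_name, **spider_args)
    process.start()  # blocks until the crawl finishes


if __name__ == "__main__":
    run_spider("facebook_post", "output.json", target_username="RHWEBsites")
```

Running the spider in-process is convenient when the crawl is one step of a larger Python job. For one-off runs, the `scrapy crawl` command is simpler.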

The following spiders are available in this repository:

  • facebook_event_graph
  • facebook_post_graph
  • facebook_photo_graph
  • facebook_video_graph
  • facebook_event
  • facebook_post
  • facebook_photo
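The helper below shows how these names fit into the `scrapy crawl` syntax from the example. It builds the argument list for a crawl and rejects any name that is not on this list. `crawl_command` and `AVAILABLE_SPIDERS` are hypothetical helpers, not part of the repository. The name set is copied from this README, so it may drift from what /scrapy_facebooker/spiders/ actually contains.

```python
import shlex
from typing import List

# Spider names exactly as listed in this README.
AVAILABLE_SPIDERS = frozenset({
    "facebook_event_graph",
    "facebook_post_graph",
    "facebook_photo_graph",
    "facebook_video_graph",
    "facebook_event",
    "facebook_post",
    "facebook_photo",
})


def crawl_command(spider: str, output: str, **spider_args: str) -> List[str]:
    """Build the argv for `scrapy crawl`, rejecting unknown spider names."""
    if spider not in AVAILABLE_SPIDERS:
        raise ValueError(
            f"unknown spider {spider!r}; choose from {sorted(AVAILABLE_SPIDERS)}"
        )
    argv = ["scrapy", "crawl", spider]
    for key, value in spider_args.items():
        argv += ["-a", f"{key}={value}"]  # each -a pair becomes a spider argument
    argv += ["-o", output]
    return argv


if __name__ == "__main__":
    cmd = crawl_command("facebook_post", "output.json", target_username="RHWEBsites")
    # Shell-quoted form, printable and copy-pasteable.
    print(" ".join(shlex.quote(part) for part in cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

For facebook_post, the printed line matches the example command from the "How to run" section.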

License

The license is available in LICENSE.txt at the root of this project.
