SoloSynth1 / wordpress-scraper

License: MIT
A simple, easy-to-use scraper for pulling data from the WordPress JSON API

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives to or similar to wordpress-scraper

wikipedia-reference-scraper
Wikipedia API wrapper for references
Stars: ✭ 34 (+54.55%)
Mutual labels:  scraper
gochanges
**[ARCHIVED]** website changes tracker 🔍
Stars: ✭ 12 (-45.45%)
Mutual labels:  scraper
scrapers
scrapers for building your own image databases
Stars: ✭ 46 (+109.09%)
Mutual labels:  scraper
tinyPornManager
Made for pornhub. Fork from tinyMediaManager v3
Stars: ✭ 57 (+159.09%)
Mutual labels:  scraper
tripadvisor-scraper
Scrape Tripadvisor restaurants, hotels, and places.
Stars: ✭ 40 (+81.82%)
Mutual labels:  scraper
ColegaDondeEstaMiTFM
A Twitter bot that shares a TFM (master's thesis) every hour until Cristina Cifuentes shows hers.
Stars: ✭ 14 (-36.36%)
Mutual labels:  scraper
file-extensions
JSON collection of scraped file extensions, along with their description and type, from FileInfo.com
Stars: ✭ 15 (-31.82%)
Mutual labels:  scraper
fansly
Simply scrape / download all the media from a fansly account
Stars: ✭ 351 (+1495.45%)
Mutual labels:  scraper
crawler-chrome-extensions
Chrome extensions commonly used by crawler developers
Stars: ✭ 53 (+140.91%)
Mutual labels:  scraper
cat-message
Finds cat images/videos/gifs on reddit, sends them to my mom via applescript
Stars: ✭ 35 (+59.09%)
Mutual labels:  scraper
Facebook-Profile-Pictures-Downloader
😆 Download public profile pictures from Facebook.
Stars: ✭ 23 (+4.55%)
Mutual labels:  scraper
web-crawler
Python Web Crawler with Selenium and PhantomJS
Stars: ✭ 19 (-13.64%)
Mutual labels:  scraper
stweet
Advanced Python library to scrape Twitter (tweets, users) via an unofficial API
Stars: ✭ 287 (+1204.55%)
Mutual labels:  scraper
nyt-first-said
Tweets when words are published for the first time in the NYT
Stars: ✭ 222 (+909.09%)
Mutual labels:  scraper
INMET-API-temperature
Crawler for meteorological data from INMET's conventional stations (BDMEP)
Stars: ✭ 32 (+45.45%)
Mutual labels:  scraper
lopez
Crawling and scraping the Web for fun and profit
Stars: ✭ 20 (-9.09%)
Mutual labels:  scraper
scrapy-LBC
LeBonCoin spider built with Scrapy and ElasticSearch
Stars: ✭ 14 (-36.36%)
Mutual labels:  scraper
scripts
A collection of random scripts I coded up
Stars: ✭ 17 (-22.73%)
Mutual labels:  scraper
PyScholar
A 'supervised' parser for Google Scholar
Stars: ✭ 74 (+236.36%)
Mutual labels:  scraper
barclayscrape
A small app to programmatically manipulate Barclays online banking
Stars: ✭ 57 (+159.09%)
Mutual labels:  scraper

wordpress-scraper

Description

A simple, easy-to-use scraper for pulling data from the WordPress JSON API.

Features

  • Stores crawled documents as MongoDB documents or JSON files
  • Retries automatically on errors
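
To give a feel for what retrying on errors can look like, here is a minimal sketch of a retry helper with exponential backoff. It is not taken from this project: the name call_with_retry, its parameters, and the backoff schedule are all assumptions made for illustration, and the scraper's real retry logic may differ.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on any Exception with exponential backoff.

    Hypothetical helper: not part of wordpress-scraper.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... between tries.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The sleep function is injectable so that the helper can be exercised without actually waiting.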

Requirements

  • Python 3.7+

Installation

pip install -r requirements.txt

How to use

Basic

Just run crawl.py with the site's URL supplied:

python3 crawl.py https://your.website.here

This will crawl the site using DefaultCrawlSession, which attempts to crawl all posts, categories & tags from the site.
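
Crawling "all" posts, categories and tags comes down to walking each resource page by page until nothing is left. The sketch below illustrates that loop only; it is not the code of DefaultCrawlSession. The names crawl_all and fetch_page are hypothetical, and the fetch function is assumed to return an empty list once the pages run out.

```python
from typing import Callable, Dict, List, Sequence

# Resource names as exposed under the WordPress REST API's wp/v2 namespace.
RESOURCES = ("posts", "categories", "tags")

# Assumed contract: fetch_page(resource, page) returns the JSON objects on
# that 1-based page, or an empty list when the page does not exist.
FetchPage = Callable[[str, int], List[dict]]


def crawl_all(
    fetch_page: FetchPage, resources: Sequence[str] = RESOURCES
) -> Dict[str, List[dict]]:
    """Collect every item of every resource by paging until a page is empty."""
    results: Dict[str, List[dict]] = {}
    for resource in resources:
        items: List[dict] = []
        page = 1
        while True:
            batch = fetch_page(resource, page)
            if not batch:
                break  # no more pages for this resource
            items.extend(batch)
            page += 1
        results[resource] = items
    return results
```

A real fetch function would have to map the API's "page out of range" error response to an empty list to satisfy the contract assumed here.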

The crawled JSON files will be stored in the directory ./data/<domain-name>.
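
As a rough illustration of the ./data/<domain-name> layout, the sketch below derives the directory from a site URL and writes one JSON file per resource. The helper names output_dir_for and save_items, and the <resource>.json file naming, are assumptions for this example; the files the scraper actually produces may be organized differently.

```python
import json
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def output_dir_for(site_url: str, root: Path = Path("data")) -> Path:
    """Map a site URL to <root>/<domain-name> (hypothetical helper)."""
    host = urlparse(site_url).hostname
    if not host:
        raise ValueError(f"cannot extract a domain from {site_url!r}")
    return root / host


def save_items(
    site_url: str, resource: str, items: List[dict], root: Path = Path("data")
) -> Path:
    """Write items to <root>/<domain-name>/<resource>.json and return the path."""
    target = output_dir_for(site_url, root)
    target.mkdir(parents=True, exist_ok=True)
    path = target / f"{resource}.json"
    # ensure_ascii=False keeps non-ASCII post content readable in the file.
    path.write_text(
        json.dumps(items, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path
```

Note that the URL must include a scheme such as https://, otherwise no domain can be extracted and a ValueError is raised.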

Most of the time, this will suffice for sites that:

  1. do not require signing in
  2. do not block the JSON API paths
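
Before starting a crawl, it can be handy to check whether a site meets both conditions. The sketch below builds a WordPress REST API URL under the standard /wp-json/wp/v2/ prefix and tries one anonymous request. The function names endpoint_url and api_is_open are hypothetical and are not part of this project; sites without pretty permalinks may expose the API under a different path.

```python
import json
import urllib.request
from urllib.parse import urlencode


def endpoint_url(site_url: str, resource: str, page: int = 1, per_page: int = 100) -> str:
    """Build a wp/v2 collection URL, e.g. for 'posts', 'categories' or 'tags'."""
    query = urlencode({"per_page": per_page, "page": page})
    return f"{site_url.rstrip('/')}/wp-json/wp/v2/{resource}?{query}"


def api_is_open(site_url: str, timeout: float = 10.0) -> bool:
    """Return True if an anonymous request for one post yields a JSON list."""
    url = endpoint_url(site_url, "posts", page=1, per_page=1)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return isinstance(json.load(response), list)
    except (OSError, ValueError):
        # OSError covers HTTP errors (401, 403, 404), network failures and
        # timeouts; ValueError covers a body that is not valid JSON.
        return False
```

For example, endpoint_url("https://your.website.here", "tags", page=3, per_page=10) yields https://your.website.here/wp-json/wp/v2/tags?per_page=10&page=3.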

Advanced

For advanced usage and customization, look at wpscraper/session.py for the actual crawling procedures, then write your own CrawlSession.

Upcoming Features

  • Rewrite/Refactor
  • MongoDB Connector
  • Async session
  • Authentication Module
  • Cloudflare circumvention
  • Configurable retry policies
  • Full WPv2 API resources support