
JLospinoso / abrade

License: AGPL-3.0
A fast Web API scraper written in C++ and built on Boost ASIO

Programming Languages

C++
36643 projects - #6 most used programming language
CMake
9771 projects

Projects that are alternatives to or similar to abrade

bmcweb
A do-everything Redfish, KVM, GUI, and DBus webserver for OpenBMC
Stars: ✭ 109 (+147.73%)
Mutual labels:  boost-asio, boost-beast
Getsy
A simple browser/client-side web scraper.
Stars: ✭ 238 (+440.91%)
Mutual labels:  web-scraper
Project Tauro
A Router WiFi key recovery/cracking tool with a twist.
Stars: ✭ 52 (+18.18%)
Mutual labels:  web-scraper
Soup
Web Scraper in Go, similar to BeautifulSoup
Stars: ✭ 1,685 (+3729.55%)
Mutual labels:  web-scraper
Social Media Profile Scrapers
Fetch user's data across social media
Stars: ✭ 60 (+36.36%)
Mutual labels:  web-scraper
Awesome Web Scraper
A collection of awesome web scrapers and crawlers.
Stars: ✭ 147 (+234.09%)
Mutual labels:  web-scraper
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+1390.91%)
Mutual labels:  web-scraper
yellowpages-scraper
Yellowpages.com Web Scraper written in Python and LXML to extract business details available based on a particular category and location.
Stars: ✭ 56 (+27.27%)
Mutual labels:  web-scraper
Summarizer
A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.
Stars: ✭ 213 (+384.09%)
Mutual labels:  web-scraper
Daftlistings
A library that enables programmatic interaction with daft.ie. Daft.ie has nationwide coverage and contains about 80% of the total available properties in Ireland.
Stars: ✭ 86 (+95.45%)
Mutual labels:  web-scraper
Detect Cms
PHP Library for detecting CMS
Stars: ✭ 78 (+77.27%)
Mutual labels:  web-scraper
Cascadia
Go cascadia package command line CSS selector
Stars: ✭ 67 (+52.27%)
Mutual labels:  web-scraper
Phpscraper
PHP Scraper - a highly opinionated web interface for PHP
Stars: ✭ 148 (+236.36%)
Mutual labels:  web-scraper
Scrapy Craigslist
Web Scraping Craigslist's Engineering Jobs in NY with Scrapy
Stars: ✭ 54 (+22.73%)
Mutual labels:  web-scraper
Scrape Linkedin Selenium
`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
Stars: ✭ 239 (+443.18%)
Mutual labels:  web-scraper
Stealth
🚀 Stealth - Secure, Peer-to-Peer, Private and Automateable Web Browser/Scraper/Proxy
Stars: ✭ 659 (+1397.73%)
Mutual labels:  web-scraper
Html Metadata
MetaData html scraper and parser for Node.js (supports Promises and callback style)
Stars: ✭ 129 (+193.18%)
Mutual labels:  web-scraper
onlyfans-dl
OnlyFans content downloader
Stars: ✭ 592 (+1245.45%)
Mutual labels:  web-scraper
100projectsofcode
A list of practical knowledge-building projects.
Stars: ✭ 1,183 (+2588.64%)
Mutual labels:  web-scraper
Web Scraping
Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, SHFE and news data crawlers on BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist
Stars: ✭ 153 (+247.73%)
Mutual labels:  web-scraper

Abrade

Abrade is a coroutine-based web scraper suitable for querying the existence (a HEAD request) or the contents (a GET request) of web resources whose URLs follow a sequential, numerical pattern.

Check out the blog post at http://lospi.net for usage and examples.

> abrade -h
Usage: abrade host pattern:
  --host arg                            host name (eg example.com)
  --pattern arg (=/)                    format of URL (eg ?mynum={1:5}&myhex=0x
                                        {hhhh}). See documentation for
                                        formatting of patterns.
  --agent arg (=Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0)
                                        User-agent string (default: Firefox 47)
  --out arg                             output path. dir if contents enabled.
                                        (default: HOSTNAME)
  --err arg                             error path (file). (default:
                                        HOSTNAME-err.log)
  --proxy arg                           SOCKS5 proxy address:port. (default:
                                        none)
  --screen arg                          omits 200-level response if contents
                                        contains screen (default: none)
  -d [ --stdin ]                        read from stdin (default: no)
  -t [ --tls ]                          use tls/ssl (default: no)
  -s [ --sensitive ]                    complain about rude TCP teardowns
                                        (default: no)
  -o [ --tor ]                          use local proxy at 127.0.0.1:9050
                                        (default: no)
  -r [ --verify ]                       verify ssl (default: no)
  -l [ --leadzero ]                     output leading zeros in URL (default:
                                        no)
  -e [ --telescoping ]                  do not telescope the pattern (default:
                                        no)
  -f [ --found ]                        print when resource found (default:
                                        no). implied by verbose
  -v [ --verbose ]                      prints gratuitous output to console
                                        (default: no)
  -c [ --contents ]                     read full contents (default: no)
  --test                                no network requests, just write
                                        generated URIs to console (default: no)
  -p [ --optimize ]                     Optimize number of simultaneous
                                        requests (default: no)
  -i [ --init ] arg (=1000)             Initial number of simultaneous requests
  --min arg (=1)                        Minimum number of simultaneous requests
  --max arg (=25000)                    Maximum number of simultaneous requests
  --ssize arg (=50)                     Size of velocity sliding window
  --sint arg (=1000)                    Size of sampling interval
  -h [ --help ]                         produce help message
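The pattern syntax shown in the help text (e.g. ?mynum={1:5}&myhex=0x{hhhh}) can be sketched in Python. This is an illustrative reading of the help output only, not Abrade's actual implementation: {a:b} is taken as the inclusive decimal range a..b, a run of h characters as that many hexadecimal digits, and lead_zero mirrors the --leadzero flag.

```python
import itertools
import re

def expand_pattern(pattern, lead_zero=False):
    """Expand a URL pattern such as '?mynum={1:5}&myhex=0x{hh}'.

    Illustrative sketch: '{a:b}' is read as the inclusive decimal range
    a..b, and a run of 'h' characters as that many hex digits. With
    lead_zero=True, decimal values are padded to the width of the upper
    bound, mirroring the --leadzero flag.
    """
    token = re.compile(r'\{(\d+):(\d+)\}|\{(h+)\}')
    literals, choices, pos = [], [], 0
    for m in token.finditer(pattern):
        literals.append(pattern[pos:m.start()])
        pos = m.end()
        if m.group(1) is not None:                      # decimal range token
            lo, hi = int(m.group(1)), int(m.group(2))
            width = len(m.group(2)) if lead_zero else 0
            choices.append([str(i).zfill(width) for i in range(lo, hi + 1)])
        else:                                           # hex-digit token
            digits = len(m.group(3))
            choices.append(['{:0{}x}'.format(i, digits)
                            for i in range(16 ** digits)])
    literals.append(pattern[pos:])
    for combo in itertools.product(*choices):
        yield ''.join(lit + sub for lit, sub in zip(literals, combo + ('',)))
```

For example, expand_pattern('/item/{1:3}') yields /item/1, /item/2, /item/3; abrade's --test flag can be used to confirm what the real tool generates for a given pattern.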

v0.2

You can now pipe URLs to Abrade via the --stdin option:

echo /anything/a/b/c?d=123 | abrade httpbin.org --stdin --contents --verbose

You must omit the pattern positional argument to pipe from stdin.

You can also use the --screen option to detect error landing pages that still return 200 responses. Such responses are screened out and will not be written to disk during a --contents scrape.
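As I read the description above, the screening decision amounts to a small predicate. The sketch below is hypothetical (should_write is not a real Abrade function) and only illustrates the stated behaviour:

```python
def should_write(status, body, screen=None):
    """Decide whether a fetched resource is written to disk.

    Hypothetical sketch of the --screen behaviour described above:
    only 200-level responses are kept, and a 200-level response whose
    body contains the screen string is treated as a disguised error
    page and dropped.
    """
    if not 200 <= status < 300:
        return False
    return screen is None or screen not in body
```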

Linux ELF

Windows EXE

Docker Image

docker pull jlospinoso/abrade:v0.2.0

or

docker pull quay.io/jlospinoso/abrade:v0.2.0

v0.1

Linux ELF

Windows EXE

Docker Image

docker pull jlospinoso/abrade:v0.1.0

or

docker pull quay.io/jlospinoso/abrade:v0.1.0

Building Abrade

  1. Abrade uses CMake, so you'll need to install it.
  2. Clone abrade.
  3. Navigate to the checked out directory.
  4. Make a build subdirectory.
  5. Navigate to the build directory.
  6. Invoke cmake.
  7. Use make (*nix) or Visual Studio (Windows) to build the project.

For example, on *nix:

git clone git@github.com:JLospinoso/abrade.git
cd abrade
mkdir build
cd build
cmake ..
make

On Windows, you'll need to open the abrade.sln file and build.
