
JLospinoso / abrade

License: AGPL-3.0
A fast Web API scraper written in C++ and built on Boost ASIO

Programming Languages

C++
36643 projects - #6 most used programming language
CMake
9771 projects

Projects that are alternatives to or similar to abrade

bmcweb
A do-everything Redfish, KVM, GUI, and DBus webserver for OpenBMC
Stars: ✭ 109 (+147.73%)
Mutual labels:  boost-asio, boost-beast
Getsy
A simple browser/client-side web scraper.
Stars: ✭ 238 (+440.91%)
Mutual labels:  web-scraper
Project Tauro
A Router WiFi key recovery/cracking tool with a twist.
Stars: ✭ 52 (+18.18%)
Mutual labels:  web-scraper
Soup
Web Scraper in Go, similar to BeautifulSoup
Stars: ✭ 1,685 (+3729.55%)
Mutual labels:  web-scraper
Social Media Profile Scrapers
Fetch user's data across social media
Stars: ✭ 60 (+36.36%)
Mutual labels:  web-scraper
Awesome Web Scraper
A collection of awesome web scrapers and crawlers.
Stars: ✭ 147 (+234.09%)
Mutual labels:  web-scraper
Spidr
A versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely. Spidr is designed to be fast and easy to use.
Stars: ✭ 656 (+1390.91%)
Mutual labels:  web-scraper
yellowpages-scraper
Yellowpages.com Web Scraper written in Python and LXML to extract business details available based on a particular category and location.
Stars: ✭ 56 (+27.27%)
Mutual labels:  web-scraper
Summarizer
A Reddit bot that summarizes news articles written in Spanish or English. It uses a custom built algorithm to rank words and sentences.
Stars: ✭ 213 (+384.09%)
Mutual labels:  web-scraper
Daftlistings
A library that enables programmatic interaction with daft.ie. Daft.ie has nationwide coverage and contains about 80% of the total available properties in Ireland.
Stars: ✭ 86 (+95.45%)
Mutual labels:  web-scraper
Detect Cms
PHP Library for detecting CMS
Stars: ✭ 78 (+77.27%)
Mutual labels:  web-scraper
Cascadia
Go cascadia package command line CSS selector
Stars: ✭ 67 (+52.27%)
Mutual labels:  web-scraper
Phpscraper
PHP Scraper - a highly opinionated web interface for PHP
Stars: ✭ 148 (+236.36%)
Mutual labels:  web-scraper
Scrapy Craigslist
Web Scraping Craigslist's Engineering Jobs in NY with Scrapy
Stars: ✭ 54 (+22.73%)
Mutual labels:  web-scraper
Scrape Linkedin Selenium
`scrape_linkedin` is a python package that allows you to scrape personal LinkedIn profiles & company pages - turning the data into structured json.
Stars: ✭ 239 (+443.18%)
Mutual labels:  web-scraper
Stealth
🚀 Stealth - Secure, Peer-to-Peer, Private and Automateable Web Browser/Scraper/Proxy
Stars: ✭ 659 (+1397.73%)
Mutual labels:  web-scraper
Html Metadata
MetaData html scraper and parser for Node.js (supports Promises and callback style)
Stars: ✭ 129 (+193.18%)
Mutual labels:  web-scraper
onlyfans-dl
OnlyFans content downloader
Stars: ✭ 592 (+1245.45%)
Mutual labels:  web-scraper
100projectsofcode
A list of practical knowledge-building projects.
Stars: ✭ 1,183 (+2588.64%)
Mutual labels:  web-scraper
Web Scraping
Detailed web scraping tutorials for dummies with financial data crawlers on Reddit WallStreetBets, CME (both options and futures), US Treasury, CFTC, LME, SHFE and news data crawlers on BBC, Wall Street Journal, Al Jazeera, Reuters, Financial Times, Bloomberg, CNN, Fortune, The Economist
Stars: ✭ 153 (+247.73%)
Mutual labels:  web-scraper

Abrade

Abrade is a coroutine-based web scraper suitable for querying the existence (a HEAD request) or the contents (a GET request) of web resources whose URLs follow a sequential, numerical pattern.

Check out the blog post at http://lospi.net for usage and examples.

> abrade -h
Usage: abrade host pattern:
  --host arg                            host name (eg example.com)
  --pattern arg (=/)                    format of URL (eg ?mynum={1:5}&myhex=0x
                                        {hhhh}). See documentation for
                                        formatting of patterns.
  --agent arg (=Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0)
                                        User-agent string (default: Firefox 47)
  --out arg                             output path. dir if contents enabled.
                                        (default: HOSTNAME)
  --err arg                             error path (file). (default:
                                        HOSTNAME-err.log)
  --proxy arg                           SOCKS5 proxy address:port. (default:
                                        none)
  --screen arg                          omits 200-level response if contents
                                        contains screen (default: none)
  -d [ --stdin ]                        read from stdin (default: no)
  -t [ --tls ]                          use tls/ssl (default: no)
  -s [ --sensitive ]                    complain about rude TCP teardowns
                                        (default: no)
  -o [ --tor ]                          use local proxy at 127.0.0.1:9050
                                        (default: no)
  -r [ --verify ]                       verify ssl (default: no)
  -l [ --leadzero ]                     output leading zeros in URL (default:
                                        no)
  -e [ --telescoping ]                  do not telescope the pattern (default:
                                        no)
  -f [ --found ]                        print when resource found (default:
                                        no). implied by verbose
  -v [ --verbose ]                      prints gratuitous output to console
                                        (default: no)
  -c [ --contents ]                     read full contents (default: no)
  --test                                no network requests, just write
                                        generated URIs to console (default: no)
  -p [ --optimize ]                     Optimize number of simultaneous
                                        requests (default: no)
  -i [ --init ] arg (=1000)             Initial number of simultaneous requests
  --min arg (=1)                        Minimum number of simultaneous requests
  --max arg (=25000)                    Maximum number of simultaneous requests
  --ssize arg (=50)                     Size of velocity sliding window
  --sint arg (=1000)                    Size of sampling interval
  -h [ --help ]                         produce help message
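The pattern syntax shown in the help text (e.g. ?mynum={1:5}&myhex=0x{hhhh}) can be sketched in Python. This is an illustrative reading of the help output only, not Abrade's actual implementation: {a:b} is taken as the inclusive decimal range a..b, a run of h characters as that many hexadecimal digits, and lead_zero mirrors the --leadzero flag.

```python
import itertools
import re

def expand_pattern(pattern, lead_zero=False):
    """Expand a URL pattern such as '?mynum={1:5}&myhex=0x{hh}'.

    Illustrative sketch: '{a:b}' is read as the inclusive decimal range
    a..b, and a run of 'h' characters as that many hex digits. With
    lead_zero=True, decimal values are padded to the width of the upper
    bound, mirroring the --leadzero flag.
    """
    token = re.compile(r'\{(\d+):(\d+)\}|\{(h+)\}')
    literals, choices, pos = [], [], 0
    for m in token.finditer(pattern):
        literals.append(pattern[pos:m.start()])
        pos = m.end()
        if m.group(1) is not None:                      # decimal range token
            lo, hi = int(m.group(1)), int(m.group(2))
            width = len(m.group(2)) if lead_zero else 0
            choices.append([str(i).zfill(width) for i in range(lo, hi + 1)])
        else:                                           # hex-digit token
            digits = len(m.group(3))
            choices.append(['{:0{}x}'.format(i, digits)
                            for i in range(16 ** digits)])
    literals.append(pattern[pos:])
    for combo in itertools.product(*choices):
        yield ''.join(lit + sub for lit, sub in zip(literals, combo + ('',)))
```

For example, expand_pattern('/item/{1:3}') yields /item/1, /item/2, /item/3; abrade's --test flag can be used to confirm what the real tool generates for a given pattern.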

v0.2

You can now pipe URLs to Abrade via the --stdin option:

echo /anything/a/b/c?d=123 | abrade httpbin.org --stdin --contents --verbose

You must omit the pattern positional argument to pipe from stdin.

You can also use the --screen option to detect error landing pages that still return 200 responses. Such responses are screened out and will not be written to disk during a --contents scrape.
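As I read the description above, the screening decision amounts to a small predicate. The sketch below is hypothetical (should_write is not a real Abrade function) and only illustrates the stated behaviour:

```python
def should_write(status, body, screen=None):
    """Decide whether a fetched resource is written to disk.

    Hypothetical sketch of the --screen behaviour described above:
    only 200-level responses are kept, and a 200-level response whose
    body contains the screen string is treated as a disguised error
    page and dropped.
    """
    if not 200 <= status < 300:
        return False
    return screen is None or screen not in body
```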

Linux ELF

Windows EXE

Docker Image

docker pull jlospinoso/abrade:v0.2.0

or

docker pull quay.io/jlospinoso/abrade:v0.2.0

v0.1

Linux ELF

Windows EXE

Docker Image

docker pull jlospinoso/abrade:v0.1.0

or

docker pull quay.io/jlospinoso/abrade:v0.1.0

Building Abrade

  1. Abrade uses CMake, so you'll need to install it.
  2. Clone abrade.
  3. Navigate to the checked out directory.
  4. Make a build subdirectory.
  5. Navigate to the build directory.
  6. Invoke cmake.
  7. Use make (*nix) or Visual Studio (Windows) to build the project.

For example, on *nix:

git clone git@github.com:JLospinoso/abrade.git
cd abrade
mkdir build
cd build
cmake ..
make

On Windows, you'll need to open the abrade.sln file and build.
