grafted-in / web-scraping-engine

License: BSD-3-Clause

A simple web scraping engine supporting concurrent and anonymous scraping

Programming Languages

Haskell, Nix, Shell

Projects that are alternatives to or similar to web-scraping-engine

Ultra Runner

🏃⛰ Ultra fast monorepo script runner and build tool

Stars: ✭ 496 (+1737.04%)

Mutual labels: parallel, concurrent

Suman

🌇 🌆 🌉 Advanced, user-friendly, language-agnostic, super-high-performance test runner. http://sumanjs.org

Stars: ✭ 57 (+111.11%)

Mutual labels: parallel, concurrent

Cloe

Cloe programming language

Stars: ✭ 398 (+1374.07%)

Mutual labels: parallel, concurrent

node-bogota

🚀 Run tape tests concurrently with tap-spec output

Stars: ✭ 15 (-44.44%)

Mutual labels: parallel, concurrent

pareach

a tiny function that "parallelizes" work in NodeJS

Stars: ✭ 19 (-29.63%)

Mutual labels: parallel, concurrent

Rubico

[a]synchronous functional programming

Stars: ✭ 133 (+392.59%)

Mutual labels: parallel, concurrent

Hamsters.js

100% Vanilla Javascript Multithreading & Parallel Execution Library

Stars: ✭ 517 (+1814.81%)

Mutual labels: parallel, concurrent

PTTmineR

Parallel Searching and Crawling Data from PTT 🚀

Stars: ✭ 31 (+14.81%)

Mutual labels: scraper, parallel

Util

A collection of useful utility functions

Stars: ✭ 201 (+644.44%)

Mutual labels: parallel, concurrent

Pytest Parallel

A pytest plugin for parallel and concurrent testing

Stars: ✭ 146 (+440.74%)

Mutual labels: parallel, concurrent

YACLib

Yet Another Concurrency Library

Stars: ✭ 193 (+614.81%)

Mutual labels: parallel, concurrent

java-multithread

Code written for the Multithreading with Java course on the RinaldoDev YouTube channel.

Stars: ✭ 24 (-11.11%)

Mutual labels: parallel, concurrent

impartus-downloader

Download Impartus lectures, convert to mkv for offline viewing.

Stars: ✭ 19 (-29.63%)

Mutual labels: scraper

OpenScraper

An open source webapp for scraping: towards a public service for webscraping

Stars: ✭ 80 (+196.3%)

Mutual labels: scraper

OLX Scraper

📻 An OLX Scraper using Scrapy + MongoDB. It Scrapes recent ads posted regarding requested product and dumps to NOSQL MONGODB.

Stars: ✭ 15 (-44.44%)

Mutual labels: scraper

stock-market-scraper

Scraps historical stock market data from Yahoo Finance (https://finance.yahoo.com/)

Stars: ✭ 110 (+307.41%)

Mutual labels: scraper

tieba-zhuaqu

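A distributed crawler for Baidu Tieba, used for Tieba data mining. It analyzes the data along both the forum dimension and the user dimension.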

Stars: ✭ 56 (+107.41%)

Mutual labels: scraper

document-dl

Command line program to download documents from web portals.

Stars: ✭ 14 (-48.15%)

Mutual labels: scraper

FoldsCUDA.jl

Data-parallelism on CUDA using Transducers.jl and for loops (FLoops.jl)

Stars: ✭ 48 (+77.78%)

Mutual labels: parallel

InstagramLocationScraper

No description or website provided.

Stars: ✭ 13 (-51.85%)

Mutual labels: scraper


Web Scraping Engine

Usage

To run:

stack exec example -- --cache-dir cache -a user-agents.txt -o output.csv

During testing/development, you can run the scraper from within GHCi:

cd example
stack ghci
mainTest "--cache-dir cache --cache-only -a user-agents.txt -o output.csv"

To run the scraper with anonymization:

cd example
bash build-proxies.sh > torrc-file
tor -f torrc-file &
(wait until the Tor logs report success)
stack exec example -- --cache-dir cache -a user-agents.txt --torrc torrc-file -o outdata.csv -m 8111 +RTS -N15

where:

* 8111 is the port to an EKG monitor on localhost
* -N15 is how many cores to use

After a long time you will need to kill the process manually.
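
The "wait until logs report success" step can be scripted rather than watched by hand. The following is a minimal sketch, not part of the project. The helper name wait_for_line and the log file name tor.log are hypothetical. It also assumes that Tor writes its notices to standard output and that a successful start is reported with a line containing "Bootstrapped 100%"; the torrc generated by build-proxies.sh may configure logging differently.

```shell
#!/usr/bin/env bash

# wait_for_line FILE PATTERN TIMEOUT_SECONDS
# Hypothetical helper: polls FILE once per second until a line matching
# PATTERN appears. Returns 0 on a match, 1 if TIMEOUT_SECONDS elapse first.
wait_for_line() {
  local file=$1 pattern=$2 timeout=$3 waited=0
  while ! grep -q -- "$pattern" "$file" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Example use (left commented out; tor.log and the 120 s timeout are assumptions):
#
#   tor -f torrc-file > tor.log 2>&1 &
#   if wait_for_line tor.log "Bootstrapped 100%" 120; then
#     stack exec example -- --cache-dir cache -a user-agents.txt \
#       --torrc torrc-file -o outdata.csv -m 8111 +RTS -N15
#   else
#     echo "Tor did not report success within 120 seconds" >&2
#   fi
```

Polling a log file keeps the sketch independent of how Tor is started. If the generated torrc sends logs elsewhere, point the helper at that file instead.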

Development

Develop with one of:

stack ghci
nix-shell --run 'cabal repl'

Build with one of:

stack build
nix-shell --run 'cabal build'
nix-build
