temoto / heroshi

Licence: other
Heroshi – open source web crawler.

Programming Languages

go
python
shell

Projects that are alternatives of or similar to heroshi

Grab
Web Scraping Framework
Stars: ✭ 2,147 (+4109.8%)
Mutual labels:  http-client, web-scraping
Php Curl Class
PHP Curl Class makes it easy to send HTTP requests and integrate with web APIs
Stars: ✭ 2,903 (+5592.16%)
Mutual labels:  http-client, web-scraping
zenith
⚡ Functional Scala HTTP server, client, and toolkit.
Stars: ✭ 15 (-70.59%)
Mutual labels:  http-client
text-mining-corona-articles
Text Mining for Indonesian Online News Articles About Corona
Stars: ✭ 15 (-70.59%)
Mutual labels:  web-scraping
scraping-ebay
Scraping Ebay's products using Scrapy Web Crawling Framework
Stars: ✭ 79 (+54.9%)
Mutual labels:  web-scraping
hunt-http
http library for D, support http 1.1 / http 2.0 (http2) / websocket server and client.
Stars: ✭ 29 (-43.14%)
Mutual labels:  http-client
java-restify
Java Restify - Simple interface-based HTTP client for Java
Stars: ✭ 31 (-39.22%)
Mutual labels:  http-client
csharp-http-client
Twilio SendGrid's C# HTTP Client for calling APIs
Stars: ✭ 25 (-50.98%)
Mutual labels:  http-client
tableau-scraping
Tableau scraper python library. R and Python scripts to scrape data from Tableau viz
Stars: ✭ 91 (+78.43%)
Mutual labels:  web-scraping
watermelon-http-client
GitHub Action to perform HTTP requests. Supports GraphQL!
Stars: ✭ 21 (-58.82%)
Mutual labels:  http-client
foxy
Session-based Beast/Asio wrapper requiring C++14
Stars: ✭ 61 (+19.61%)
Mutual labels:  http-client
India-WhatsAppFakeNews-Dataset
News articles on WhatsApp-related deaths, along with other articles from across India during that period
Stars: ✭ 41 (-19.61%)
Mutual labels:  web-scraping
domhttpx
domhttpx is a Google search engine dorker with an HTTP toolkit, built with Python, that makes it easier to find many URLs/IPs at once, quickly.
Stars: ✭ 59 (+15.69%)
Mutual labels:  http-client
http-accept
Parse Accept and Accept-Language HTTP headers in Ruby.
Stars: ✭ 69 (+35.29%)
Mutual labels:  http-client
http-requests
An HTTP client abstraction that provides a common interface to several different client implementations.
Stars: ✭ 22 (-56.86%)
Mutual labels:  http-client
libashttp
A C++ async HTTP client library to use in asynchronous applications while communicating with REST services.
Stars: ✭ 51 (+0%)
Mutual labels:  http-client
requester
The package provides a very thin wrapper (no external dependencies) for http.Client allowing the use of layers (middleware).
Stars: ✭ 14 (-72.55%)
Mutual labels:  http-client
IMDB-Scraper
Scrapy project for scraping data from IMDB with Movie Dataset including 58,623 movies' data.
Stars: ✭ 37 (-27.45%)
Mutual labels:  web-scraping
malloy
A C++ library providing embeddable server & client components for both HTTP and WebSocket.
Stars: ✭ 29 (-43.14%)
Mutual labels:  http-client
top-github-scraper
Scrape top GitHub repositories and users based on keywords
Stars: ✭ 40 (-21.57%)
Mutual labels:  web-scraping
Heroshi, open source web crawler.

Motivation 1: learn HTTP, libraries, and real-world quirks.
Motivation 2: build a collection of libraries and tools for building custom crawlers.
Motivation 3: provide access to a representative subset of the Web for educational and research purposes.

As of 2012-10-12, work on the last goal (Motivation 3) has not even started, but Common Crawl has done an amazing job at it: http://commoncrawl.org/

See http://temoto.github.com/heroshi/ for more information.