
crawler-commons / url-frontier

License: Apache-2.0
API definition, resources and reference implementation of URL Frontiers

Programming languages: Java, Dockerfile


URL Frontier


Discovering content on the web is possible thanks to web crawlers. Luckily, there are many excellent open-source solutions for this; however, most of them have their own way of storing and accessing information about URLs.

The aim of the URL Frontier project is to develop a crawler- and language-neutral API for the operations that web crawlers perform when communicating with a web frontier, e.g. getting the next URLs to crawl, updating the information about URLs already processed, changing the crawl rate for a particular hostname, getting the list of active hosts, or getting statistics. Such an API can be used by a variety of web crawlers, regardless of whether they are implemented in Java, like StormCrawler and Heritrix, or in Python, like Scrapy.
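To make these operations concrete, here is a minimal in-memory sketch in plain Java. It is not the URL Frontier gRPC API or its reference implementation: the class `ToyFrontier`, all of its method names, and the 1000 ms default delay are assumptions made up for illustration. It only shows the kind of state a frontier keeps (per-host queues, per-host crawl delay) and how "get next URLs", "update a processed URL", "change the crawl rate", "list active hosts", and "get statistics" might map onto it.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory frontier for illustration; NOT the URL Frontier API. */
final class ToyFrontier {
    // One FIFO queue of pending URLs per hostname, kept in insertion order.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();
    // Minimum delay between two URLs handed out for the same host, in ms.
    private final Map<String, Long> delayMs = new LinkedHashMap<>();
    // Earliest time (ms) at which each host may be served again.
    private final Map<String, Long> nextAllowed = new LinkedHashMap<>();
    private long completed = 0;

    /** Adds a discovered URL to the queue of its host (assumes a well-formed URL). */
    void put(String url) {
        String host = URI.create(url).getHost();
        queues.computeIfAbsent(host, h -> new ArrayDeque<>()).add(url);
    }

    /** Changes the crawl rate for one hostname. */
    void setDelay(String host, long millis) {
        delayMs.put(host, millis);
    }

    /** Returns at most max URLs, at most one per host that is due at time now. */
    List<String> getNext(int max, long now) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Deque<String>> e : queues.entrySet()) {
            if (out.size() >= max) break;
            String host = e.getKey();
            boolean due = nextAllowed.getOrDefault(host, 0L) <= now;
            if (e.getValue().isEmpty() || !due) continue;
            out.add(e.getValue().poll());
            // Assumed default delay of 1000 ms when none was set for the host.
            nextAllowed.put(host, now + delayMs.getOrDefault(host, 1000L));
        }
        return out;
    }

    /** Records that a URL has been processed; a real frontier would store more. */
    void markDone(String url) {
        completed++;
    }

    /** Lists the hosts that still have URLs queued. */
    List<String> activeHosts() {
        List<String> hosts = new ArrayList<>();
        for (Map.Entry<String, Deque<String>> e : queues.entrySet()) {
            if (!e.getValue().isEmpty()) hosts.add(e.getKey());
        }
        return hosts;
    }

    /** Statistics: number of URLs still queued across all hosts. */
    int queuedCount() {
        int n = 0;
        for (Deque<String> q : queues.values()) n += q.size();
        return n;
    }

    /** Statistics: number of URLs reported as processed. */
    long completedCount() {
        return completed;
    }

    public static void main(String[] args) {
        ToyFrontier f = new ToyFrontier();
        f.put("https://example.com/a");
        f.put("https://example.com/b");
        f.put("https://example.org/x");
        f.setDelay("example.com", 5000);
        // First call hands out one URL per host.
        for (String url : f.getNext(10, 0)) {
            System.out.println("fetch " + url);
            f.markDone(url);
        }
        System.out.println("active hosts: " + f.activeHosts());
        System.out.println("queued=" + f.queuedCount() + " done=" + f.completedCount());
    }
}
```

A crawler built against a shared API would issue the equivalent calls over the network rather than on a local object, which is what lets crawlers written in different languages share one frontier service.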

The outcomes of the project are to:

  • design an API with gRPC, provide Java stubs for the API and instructions on how to achieve the same for other languages
  • deliver a robust reference implementation of the URL Frontier service
  • implement a command line client for basic interactions with a service
  • provide a test suite to check that any implementation of the API behaves as expected

One of the objectives of URL Frontier is to involve as many actors in the web crawling community as possible and get real users to give continuous feedback on our proposals.

Please use the project mailing list or Discussions section for questions, comments or suggestions.

There are many ways to get involved if you want to.

This project is funded through the NGI0 Discovery Fund, a fund established by NLnet with financial support from the European Commission's Next Generation Internet programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 825322.


License information

This project is available as open source under the terms of the Apache License 2.0. For accurate information, please check individual files.
