alaz / legitbot

Licence: other
🤔 Is this Web request from a real search engine🕷 or from an impersonating agent 🕵️‍♀️?

Legitbot

Ruby gem to make sure that an IP really belongs to a bot, typically a search engine.

Usage

Suppose you have a Web request and you would like to check that it is not disguised:

bot = Legitbot.bot(user_agent, ip)

bot will be nil if no bot signature was found in the User-Agent. Otherwise, it will be an object with the following methods:

bot.detected_as # => :google
bot.valid? # => true
bot.fake? # => false
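
The three possible outcomes (no bot signature, a verified bot, an impersonator) can be handled in one place. The following is a minimal sketch, not part of the gem: StubBot and classify are hypothetical names, and StubBot only mimics the three methods listed above so that the example runs without the gem or any network lookups. In a real application you would pass the result of Legitbot.bot(user_agent, ip) instead.

```ruby
# Hypothetical stand-in for the object returned by Legitbot.bot.
# It only imitates the detected_as / valid? / fake? interface.
StubBot = Struct.new(:detected_as, :legit) do
  def valid?
    legit
  end

  def fake?
    !legit
  end
end

# Map the three possible outcomes to a decision symbol.
def classify(bot)
  return :no_bot_signature if bot.nil? # User-Agent did not look like a bot
  return :impersonator if bot.fake?    # claims to be a bot, but failed validation

  bot.detected_as                      # validated bot, e.g. :google
end

puts classify(nil)
puts classify(StubBot.new(:google, true))
puts classify(StubBot.new(:google, false))
```

Checking for nil first matters: a request from an ordinary browser yields no bot object at all, so calling fake? on it directly would raise an error.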

Sometimes you already know which search engine to expect. For example, you might be using rack-attack:

Rack::Attack.blocklist("fake Googlebot") do |req|
  req.user_agent =~ %r(Googlebot) && Legitbot::Google.fake?(req.ip)
end

Or if you do not like all those ghoulish crawlers stealing your content, evaluating it and getting ready to invade your site with spammers, then block them all:

Rack::Attack.blocklist 'fake search engines' do |request|
  Legitbot.bot(request.user_agent, request.ip)&.fake?
end

Versioning

Semantic versioning with the following clarifications:

  • MINOR version is incremented when support for new bots is added.
  • PATCH version is incremented when validation logic for a bot changes (IP list updated, for example).
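
Because IP list updates arrive as PATCH releases and new bots as MINOR releases, the version constraint you choose decides which of those updates you pick up automatically. The sketch below uses RubyGems' own Gem::Requirement to show what two common pessimistic constraints accept. The version numbers are hypothetical and chosen only for illustration, and allows? is a helper name introduced here.

```ruby
require 'rubygems'

# Hypothetical version numbers, for illustration only.
# In a Gemfile these would read: gem 'legitbot', '~> 1.2.0'  (or '~> 1.2')
PATCH_ONLY = Gem::Requirement.new('~> 1.2.0') # >= 1.2.0 and < 1.3
MINOR_TOO  = Gem::Requirement.new('~> 1.2')   # >= 1.2 and < 2.0

# True when the given version string satisfies the requirement.
def allows?(requirement, version)
  requirement.satisfied_by?(Gem::Version.new(version))
end

puts allows?(PATCH_ONLY, '1.2.5') # validation-logic update
puts allows?(PATCH_ONLY, '1.3.0') # release adding a new bot
puts allows?(MINOR_TOO, '1.3.0')
puts allows?(MINOR_TOO, '2.0.0')
```

Under these assumptions, the narrower constraint still receives updated IP lists, while the wider one also receives support for newly added bots.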

Supported

License

Apache 2.0

Other projects

  • Play Framework variant in Scala: play-legitbot
  • Article When (Fake) Googlebots Attack Your Rails App
  • Voight-Kampff is a Ruby gem that detects bots by User-Agent
  • crawler_detect is a Ruby gem and Rack middleware that detects crawlers by a few different request headers, including User-Agent
  • Project Honeypot's http:BL can not only classify an IP as a search engine, but also label it as suspicious and report the number of days since its last activity. My implementation of the protocol in Scala is here.
  • CIDRAM is a PHP routing manager with built-in support to validate bots.