All Projects → zix99 → Rare

zix99 / Rare

Fast, realtime regex-extraction, and aggregation into common formats such as histograms, numerical summaries, tables, and more!

Programming Languages

go
31211 projects - #10 most used programming language
awk
318 projects

Projects that are alternatives of or similar to Rare

Kataribe
Access log profiler based on response time
Stars: ✭ 298 (+292.11%)
Mutual labels:  analyzer, nginx, apache
Engintron
Engintron for cPanel/WHM is the easiest way to integrate Nginx on your cPanel/WHM server. Engintron will improve the performance & web serving capacity of your server, while reducing CPU/RAM load at the same time, by installing & configuring the popular Nginx webserver to act as a reverse caching proxy in front of Apache.
Stars: ✭ 587 (+672.37%)
Mutual labels:  nginx, apache
Docker Alpine
Docker containers running Alpine Linux and s6 for process management. Solid, reliable containers.
Stars: ✭ 574 (+655.26%)
Mutual labels:  nginx, apache
Ripgrep
ripgrep recursively searches directories for a regex pattern while respecting your gitignore
Stars: ✭ 28,564 (+37484.21%)
Mutual labels:  regex, grep
Ezhttp
The bash shell script stack for installation of Nginx OpenResty Tengine lua_nginx_module nginx_concat_module nginx_upload_module ngx_substitutions_filter_module Apache-2.2 Apache-2.4 MySQL-5.1 MySQL-5.5 MySQL-5.6 MySQL-5.7 PHP-5.2 PHP-5.3 PHP-5.4 PHP-5.5 PHP-5.6 ZendOptimizer ZendGuardLoader Xcache Eaccelerator Imagemagick IonCube Memcache Memcached Redis Mongo Xdebug Mssql Memcached PureFtpd PhpMyAdmin Redis Mongodb PhpRedisAdmin MemAdmin RockMongo Jdk7 Jdk8 Tomcat7 Tomcat8
Stars: ✭ 443 (+482.89%)
Mutual labels:  nginx, apache
Modsecurity
ModSecurity is an open source, cross platform web application firewall (WAF) engine for Apache, IIS and Nginx that is developed by Trustwave's SpiderLabs. It has a robust event-based programming language which provides protection from a range of attacks against web applications and allows for HTTP traffic monitoring, logging and real-time analys…
Stars: ✭ 5,015 (+6498.68%)
Mutual labels:  nginx, apache
Sakura
SAKURA Editor (Japanese text editor for MS Windows)
Stars: ✭ 689 (+806.58%)
Mutual labels:  regex, grep
Netkiller.github.io
Netkiller Free ebook - 免费电子书
Stars: ✭ 861 (+1032.89%)
Mutual labels:  nginx, apache
Docker Testing
Stars: ✭ 18 (-76.32%)
Mutual labels:  nginx, apache
Ansible Role Htpasswd
Ansible Role - htpasswd
Stars: ✭ 17 (-77.63%)
Mutual labels:  nginx, apache
Server Error Pages
Easy to use, professional error pages to replace the plaintext error pages that come with any server software like Nginx or Apache
Stars: ✭ 338 (+344.74%)
Mutual labels:  nginx, apache
Ansible Config encoder filters
Ansible role used to deliver the Config Encoder Filters.
Stars: ✭ 48 (-36.84%)
Mutual labels:  nginx, apache
Devilbox
A modern Docker LAMP stack and MEAN stack for local development
Stars: ✭ 3,598 (+4634.21%)
Mutual labels:  nginx, apache
H5ai
HTTP web server index for Apache httpd, lighttpd and nginx.
Stars: ✭ 4,650 (+6018.42%)
Mutual labels:  nginx, apache
Config
Armbian configuration utility
Stars: ✭ 317 (+317.11%)
Mutual labels:  nginx, apache
Ugrep
🔍NEW ugrep v3.1: ultra fast grep with interactive query UI and fuzzy search: search file systems, source code, text, binary files, archives (cpio/tar/pax/zip), compressed files (gz/Z/bz2/lzma/xz/lz4), documents and more. A faster, user-friendly and compatible grep replacement.
Stars: ✭ 626 (+723.68%)
Mutual labels:  regex, grep
Yii2 Advanced One Domain Config
A template configuration without separation on the frontend and backend parts on different domains.
Stars: ✭ 258 (+239.47%)
Mutual labels:  nginx, apache
Rhit
A nginx log explorer
Stars: ✭ 231 (+203.95%)
Mutual labels:  analyzer, nginx
Medusa
🐈Medusa是一个红队武器库平台,目前包括扫描功能(200+个漏洞)、XSS平台、协同平台、CVE监控等功能,持续开发中 http://medusa.ascotbe.com
Stars: ✭ 796 (+947.37%)
Mutual labels:  nginx, apache
Docs4dev
后端开发常用框架文档及中文翻译,包含 Spring 系列文档(Spring, Spring Boot, Spring Cloud, Spring Security, Spring Session),大数据(Apache Hive, HBase, Apache Flume),日志(Log4j2, Logback),Http Server(NGINX,Apache),Python,数据库(OpenTSDB,MySQL,PostgreSQL)等最新官方文档以及对应的中文翻译。
Stars: ✭ 974 (+1181.58%)
Mutual labels:  nginx, apache

rare

GitHub Workflow Status codecov

A file scanner/regex extractor and realtime summarizor.

Supports various CLI-based graphing and metric formats (histogram, table, etc).

rare gif

Features

  • Multiple summary formats including: filter (like grep), histogram, and numerical analysis
  • File glob expansions (eg /var/log/* or /var/log/*/*.log) and -R
  • Optional gzip decompression (with -z)
  • Following -f or re-open following -F (use --poll to poll)
  • Ignoring lines that match an expression
  • Aggregating and realtime summary (Don't have to wait for all data to be scanned)
  • Multi-threaded reading, parsing, and aggregation
  • Color-coded outputs (optionally)
  • Pipe support (stdin for reading, stdout will disable color) eg. tail -f | rare ...

Installation

Manual

Download appropriate binary from Releases, unzip, and put it in /bin

Homebrew

brew tap zix99/rare
brew install rare

From code

Clone the repo, and:

Requires GO 1.11 or higher (Uses go modules)

go get ./...

# Pack documentation (Only necessary for release builds)
go run github.com/gobuffalo/packr/v2/packr2

# Build binary
go build .

# OR, with experimental features
go build -tags experimental .

Docs

All documentation may be found here, in the docs/ folder, and by running rare docs (embedded docs/ folder)

You can also see a dump of the CLI options at cli-help.md

Example

Extract status codes from nginx logs

$ rare histo -m '"(\w{3,4}) ([A-Za-z0-9/.]+).*" (\d{3})' -e '{3} {1}' access.log
200 GET                          160663
404 GET                          857
304 GET                          53
200 HEAD                         18
403 GET                          14

Extract number of bytes sent by bucket, and format

This shows an example of how to bucket the values into size of 1000. In this case, it doesn't make sense to see the histogram by number of bytes, but we might want to know the ratio of various orders-of-magnitudes.

$ rare histo -m '"(\w{3,4}) ([A-Za-z0-9/.]+).*" (\d{3}) (\d+)' -e "{bucket {4} 10000}" -n 10 access.log -b
0                   144239     ||||||||||||||||||||||||||||||||||||||||||||||||||
190000              2599       
10000               1290       
180000              821        
20000               496        
30000               445        
40000               440        
200000              427        
140000              323        
70000               222        
Matched: 161622 / 161622
Groups:  1203

Output Formats

Histogram (histo)

The histogram format outputs an aggregation by counting the occurences of an extracted match. That is to say, on every line a regex will be matched (or not), and the matched groups can be used to extract and build a key, that will act as the bucketing name.

NAME:
   rare histogram - Summarize results by extracting them to a histogram

USAGE:
   rare histogram [command options] <-|filename|glob...>

DESCRIPTION:
   Generates a live-updating histogram of the extracted information from a file
    Each line in the file will be matched, any the matching part extracted
    as a key and counted.
    If an extraction expression is provided with -e, that will be used
    as the key instead

OPTIONS:
   --follow, -f                 Read appended data as file grows
   --reopen, -F                 Same as -f, but will reopen recreated files
   --poll                       When following a file, poll for changes rather than using inotify
   --posix, -p                  Compile regex as against posix standard
   --match value, -m value      Regex to create match groups to summarize on (default: ".*")
   --extract value, -e value    Expression that will generate the key to group by (default: "{0}")
   --gunzip, -z                 Attempt to decompress file when reading
   --batch value                Specifies io batching size. Set to 1 for immediate input (default: 1000)
   --workers value, -w value    Set number of data processors (default: 5)
   --readers value, --wr value  Sets the number of concurrent readers (Infinite when -f) (default: 3)
   --ignore value, -i value     Ignore a match given a truthy expression (Can have multiple)
   --recursive, -R              Recursively walk a non-globbing path and search for plain-files
   --bars, -b                   Display bars as part of histogram
   --num value, -n value        Number of elements to display (default: 5)
   --reverse                    Reverses the display sort-order
   --sortkey, --sk              Sort by key, rather than value

Filter (filter)

Filter is a command used to match and (optionally) extract that match without any aggregation. It's effectively a grep or a combination of grep, awk, and/or sed.

NAME:
   rare filter - Filter incoming results with search criteria, and output raw matches

USAGE:
   rare filter [command options] <-|filename|glob...>

DESCRIPTION:
   Filters incoming results by a regex, and output the match or an extracted expression.
    Unable to output contextual information due to the application's parallelism.  Use grep if you
    need that

OPTIONS:
   --follow, -f                 Read appended data as file grows
   --reopen, -F                 Same as -f, but will reopen recreated files
   --poll                       When following a file, poll for changes rather than using inotify
   --posix, -p                  Compile regex as against posix standard
   --match value, -m value      Regex to create match groups to summarize on (default: ".*")
   --extract value, -e value    Expression that will generate the key to group by (default: "{0}")
   --gunzip, -z                 Attempt to decompress file when reading
   --batch value                Specifies io batching size. Set to 1 for immediate input (default: 1000)
   --workers value, -w value    Set number of data processors (default: 5)
   --readers value, --wr value  Sets the number of concurrent readers (Infinite when -f) (default: 3)
   --ignore value, -i value     Ignore a match given a truthy expression (Can have multiple)
   --recursive, -R              Recursively walk a non-globbing path and search for plain-files
   --line, -l                   Output line numbers

Numerical Analysis

This command will extract a number from logs and run basic analysis on that number (Such as mean, median, mode, and quantiles).

NAME:
   rare analyze - Numerical analysis on a set of filtered data

USAGE:
   rare analyze [command options] <-|filename|glob...>

DESCRIPTION:
   Treat every extracted expression as a numerical input, and run analysis
    on that input.  Will extract mean, median, mode, min, max.  If specifying --extra
    will also extract std deviation, and quantiles

OPTIONS:
   --follow, -f                 Read appended data as file grows
   --reopen, -F                 Same as -f, but will reopen recreated files
   --poll                       When following a file, poll for changes rather than using inotify
   --posix, -p                  Compile regex as against posix standard
   --match value, -m value      Regex to create match groups to summarize on (default: ".*")
   --extract value, -e value    Expression that will generate the key to group by (default: "{0}")
   --gunzip, -z                 Attempt to decompress file when reading
   --batch value                Specifies io batching size. Set to 1 for immediate input (default: 1000)
   --workers value, -w value    Set number of data processors (default: 5)
   --readers value, --wr value  Sets the number of concurrent readers (Infinite when -f) (default: 3)
   --ignore value, -i value     Ignore a match given a truthy expression (Can have multiple)
   --recursive, -R              Recursively walk a non-globbing path and search for plain-files
   --extra                      Displays extra analysis on the data (Requires more memory and cpu)
   --reverse, -r                Reverses the numerical series when ordered-analysis takes place (eg Quantile)
   --quantile value, -q value   Adds a quantile to the output set. Requires --extra (default: "90", "99", "99.9")

Example:

$ go run *.go --color analyze -m '"(\w{3,4}) ([A-Za-z0-9/[email protected]_-]+).*" (\d{3}) (\d+)' -e "{4}" testdata/access.log 
Samples:  161,622
Mean:     2,566,283.9616
Min:      0.0000
Max:      1,198,677,592.0000

Median:   1,021.0000
Mode:     1,021.0000
P90:      19,506.0000
P99:      64,757,808.0000
P99.9:    395,186,166.0000
Matched: 161,622 / 161,622

Tabulate

Create a 2D view (table) of data extracted from a file. Expression needs to yield a two dimensions separated by a tab. Can either use \x00 or the {$ a b} helper. First element is the column name, followed by the row name.

NAME:
   rare tabulate - Create a 2D summarizing table of extracted data

USAGE:
   rare tabulate [command options] <-|filename|glob...>

DESCRIPTION:
   Summarizes the extracted data as a 2D data table.
    The key is provided in the expression, and should be separated by a tab \x00
    character or via {$ a b} Where a is the column header, and b is the row

OPTIONS:
   --follow, -f                 Read appended data as file grows
   --reopen, -F                 Same as -f, but will reopen recreated files
   --poll                       When following a file, poll for changes rather than using inotify
   --posix, -p                  Compile regex as against posix standard
   --match value, -m value      Regex to create match groups to summarize on (default: ".*")
   --extract value, -e value    Expression that will generate the key to group by (default: "{0}")
   --gunzip, -z                 Attempt to decompress file when reading
   --batch value                Specifies io batching size. Set to 1 for immediate input (default: 1000)
   --workers value, -w value    Set number of data processors (default: 5)
   --readers value, --wr value  Sets the number of concurrent readers (Infinite when -f) (default: 3)
   --ignore value, -i value     Ignore a match given a truthy expression (Can have multiple)
   --recursive, -R              Recursively walk a non-globbing path and search for plain-files
   --delim value                Character to tabulate on. Use {$} helper by default (default: "\x00")
   --num value, -n value        Number of elements to display (default: 20)
   --cols value                 Number of columns to display (default: 10)
   --sortkey, --sk              Sort rows by key name rather than by values

Example:

$ rare tabulate -m "(\d{3}) (\d+)" -e "{$ {1} {bucket {2} 100000}}" -sk access.log

         200      404      304      403      301      206      
0        153,271  860      53       14       12       2                 
1000000  796      0        0        0        0        0                 
2000000  513      0        0        0        0        0                 
7000000  262      0        0        0        0        0                 
4000000  257      0        0        0        0        0                 
6000000  221      0        0        0        0        0                 
5000000  218      0        0        0        0        0                 
9000000  206      0        0        0        0        0                 
3000000  202      0        0        0        0        0                 
10000000 201      0        0        0        0        0                 
11000000 190      0        0        0        0        0                 
21000000 142      0        0        0        0        0                 
15000000 138      0        0        0        0        0                 
8000000  137      0        0        0        0        0                 
22000000 123      0        0        0        0        0                 
14000000 121      0        0        0        0        0                 
16000000 110      0        0        0        0        0                 
17000000 99       0        0        0        0        0                 
34000000 91       0        0        0        0        0                 
Matched: 161,622 / 161,622
Rows: 223; Cols: 6

Performance Benchmarking

I know there are different solutions, and rare accomplishes summarization in a way that grep, awk, etc can't, however I think it's worth analyzing the performance of this tool vs standard tools to show that it's at least as good.

It's worth noting that in many of these results rare is just as fast, but part of that reason is that it consumes CPU in a more efficient way (go is great at parallelization). So take that into account, for better or worse.

All tests were done on ~200MB of gzip'd nginx logs spread acorss 10 files.

Each program was run 3 times and the last time was taken (to make sure things were cached equally).

zcat & grep

$ time zcat testdata/* | grep -Poa '" (\d{3})' | wc -l
1131354

real	0m0.990s
user	0m1.480s
sys	0m0.080s

$ time zcat testdata/* | grep -Poa '" 200' > /dev/null

real	0m1.136s
user	0m1.644s
sys	0m0.044s

I believe the largest holdup here is the fact that zcat will pass all the data to grep via a synchronous pipe, whereas rare can process everything in async batches. Using pigz instead didn't yield different results, but on single-file results they did perform comparibly.

Silver Searcher (ag)

$ ag --version
ag version 0.31.0

Features:
  +jit +lzma +zlib

$ time ag -z '" (\d{3})' testdata/* | wc -l
1131354

real	0m3.944s
user	0m3.904s
sys	0m0.152s

rare

$ rare -v
rare version 0.1.16, 11ca2bfc4ad35683c59929a74ad023cc762a29ae

$ time rare filter -m '" (\d{3})' -e "{1}" -z testdata/* | wc -l
Matched: 1,131,354 / 3,638,594
1131354

real	0m0.927s
user	0m1.764s
sys	0m1.144s

$ time rare histo -m '" (\d{3})' -e "{1}" -z testdata/*
200                 1,124,767 
404                 6,020     
304                 371       
403                 98        
301                 84        

Matched: 1,131,354 / 3,638,594
Groups:  6

real	0m0.284s
user	0m1.648s
sys	0m0.048s

License

Copyright (C) 2019  Christopher LaPointe

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].