
RediSearch / ftsb

License: MIT
Full-Text Search Benchmark, a tool for comparing and evaluating full-text search engines.

Programming Languages

  • Python
  • Go
  • Makefile

Projects that are alternatives to or similar to ftsb

glDelegateBenchmark
quick and dirty benchmark for TFLite gles delegate on iOS
Stars: ✭ 13 (+8.33%)
Mutual labels:  benchmark
typescript-orm-benchmark
⚖️ ORM benchmarking for Node.js applications written in TypeScript
Stars: ✭ 106 (+783.33%)
Mutual labels:  benchmark
react-benchmark
A tool for benchmarking the render performance of React components
Stars: ✭ 99 (+725%)
Mutual labels:  benchmark
nowplaying-RS-Music-Reco-FM
#nowplaying-RS: Music Recommendation using Factorization Machines
Stars: ✭ 23 (+91.67%)
Mutual labels:  benchmark
Face-Renovation
Official repository of the paper "HiFaceGAN: Face Renovation via Collaborative Suppression and Replenishment".
Stars: ✭ 245 (+1941.67%)
Mutual labels:  benchmark
benchmarking-fft
choosing FFT library...
Stars: ✭ 74 (+516.67%)
Mutual labels:  benchmark
LuaJIT-Benchmarks
LuaJIT Benchmark tests
Stars: ✭ 20 (+66.67%)
Mutual labels:  benchmark
facies classification benchmark
The repository includes PyTorch code, and the data, to reproduce the results for our paper titled "A Machine Learning Benchmark for Facies Classification" (published in the SEG Interpretation Journal, August 2019).
Stars: ✭ 79 (+558.33%)
Mutual labels:  benchmark
node-vs-ruby-io
Node vs Ruby I/O benchmarks when resizing images with libvips.
Stars: ✭ 11 (-8.33%)
Mutual labels:  benchmark
httpit
A rapid http(s) benchmark tool written in Go
Stars: ✭ 156 (+1200%)
Mutual labels:  benchmark
hood
The plugin to manage benchmarks on your CI
Stars: ✭ 17 (+41.67%)
Mutual labels:  benchmark
cpm
Continuous Performance Monitor (CPM) for C++ code
Stars: ✭ 39 (+225%)
Mutual labels:  benchmark
revl
Helps to benchmark code for Autodesk Maya.
Stars: ✭ 14 (+16.67%)
Mutual labels:  benchmark
beapi-bench
Tool for benchmarking APIs. Uses ApacheBench (ab) to generate data and gnuplot for graphing. Adding new features almost daily
Stars: ✭ 16 (+33.33%)
Mutual labels:  benchmark
rpc-bench
RPC Benchmark of gRPC, Aeron and KryoNet
Stars: ✭ 59 (+391.67%)
Mutual labels:  benchmark
gl-bench
⏱ WebGL performance monitor with CPU/GPU load.
Stars: ✭ 146 (+1116.67%)
Mutual labels:  benchmark
caliper-benchmarks
Sample benchmark files for Hyperledger Caliper https://wiki.hyperledger.org/display/caliper
Stars: ✭ 69 (+475%)
Mutual labels:  benchmark
php-simple-benchmark-script
Very simple script for testing PHP operation speed (rusoft repo mirror)
Stars: ✭ 50 (+316.67%)
Mutual labels:  benchmark
map benchmark
Comprehensive benchmarks of C++ maps
Stars: ✭ 132 (+1000%)
Mutual labels:  benchmark
sets
Benchmarks for set data structures: hash sets, DAWGs, bloom filters, etc.
Stars: ✭ 20 (+66.67%)
Mutual labels:  benchmark

Full-Text Search Benchmark (FTSB)

This repo contains code for benchmarking full-text search databases, including RediSearch. The code is based on a fork of work initially made public by TSBS at https://github.com/timescale/tsbs.

Overview

The Full-Text Search Benchmark (FTSB) is a collection of Python and Go programs used to generate datasets (Python) and then benchmark (Go) the read and write performance of various databases. The intent is to make FTSB extensible so that a variety of use cases (e.g., ecommerce, jsondata, logs, etc.), query types, and databases can be included and benchmarked. To this end, we hope to help SAs and prospective database administrators find the best database for their needs and workloads.

What the FTSB tests

FTSB is used to benchmark bulk load performance and query execution performance. To accomplish this in a fair way, the data to be inserted and the queries to run are always pre-generated, and native Go clients are used wherever possible to connect to each database.

Current databases supported

  • RediSearch

Current use cases

Currently, FTSB supports four use cases:

  • nyc_taxis [details here]. This benchmark focuses on write performance, using the TLC Trip Record Data of rides performed in yellow cab taxis in New York in 2015. The benchmark loads over 12M documents.

  • enwiki-abstract [details here], from English-language Wikipedia:Database page abstracts. This use case generates 3 TEXT fields per document and focuses on full-text query performance.

  • enwiki-pages [details here], from English-language Wikipedia:Database last page revisions, containing processed metadata extracted from the full Wikipedia XML dump. This use case generates 3 TEXT fields per document and focuses on full-text query performance.

  • ecommerce-inventory [details here], from a base dataset of 10K fashion products on Amazon.com, which is then multiplexed by categories, sellers, and countries to produce larger datasets (> 1M documents). This benchmark focuses on update and aggregate performance, splitting the performance numbers into Reads (FT.AGGREGATE), Cursor Reads (FT.CURSOR), and Updates (FT.ADD). The use case generates an index with 10 TAG fields (3 sortable and 1 non-indexed) and 16 sortable, non-indexed NUMERIC fields per document. The aggregate queries are designed to be extremely costly in both computation and network TX, given that each query aggregates and filters a large portion of the dataset while additionally loading 21 fields. Both the update and read rates can be adjusted.
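
To give a sense of what the aggregate queries look like on the wire, here is a RediSearch aggregation of the same general shape (the index and field names are illustrative, not the actual ones the generator emits):

FT.AGGREGATE inventory "@market:{US}" LOAD 2 @sku @availableQty GROUPBY 1 @seller REDUCE SUM 1 @availableQty AS totalQty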

Installation

FTSB is a collection of Go programs (with some auxiliary bash and Python scripts). The easiest way to get and install the Go programs is to use go get and then issue make:

# Fetch FTSB and its dependencies
go get github.com/RediSearch/ftsb
cd $GOPATH/src/github.com/RediSearch/ftsb

# Install desired binaries. At a minimum this includes ftsb_redisearch binary:
make

How to use it?

Using FTSB for benchmarking involves two phases: data and query generation, and query execution.

Data and query generation (one-time step)

So that benchmarking results are not affected by generating data or queries on-the-fly, with FTSB you generate the data and queries you want to benchmark first, and then you can (re-)use them as input to the benchmarking phase. You can either use one of the pre-baked benchmark suites or develop your own. The only requirement is that the generated benchmark input file(s) respect the following format:

  • CSV format, with one command per line.

  • On each line, the first three columns describe the query type (READ, WRITE, UPDATE, DELETE, SETUP_WRITE), the query group (any unique identifier you like, e.g. Q1), and the key position.

  • The remaining columns are the command and its arguments, with one column per command argument.

Here is an example of a CSV line:

WRITE,U1,2,FT.ADD,idx,doc1,1.0,FIELDS,title,hello world

which will translate to the following command being issued:

FT.ADD idx doc1 1.0 FIELDS title "hello world"
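
A read query follows the same layout. Assuming the key position points at the index name (the first command argument in this case), a search query could be encoded as:

READ,Q1,1,FT.SEARCH,idx,hello

which would translate to:

FT.SEARCH idx hello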

The following links dive deeper into:

  • Generating inputs from pre-baked benchmark suites (ecommerce-inventory, enwiki-abstract, enwiki-pages)

  • Generating your own use cases (a minimal generator sketch follows below)
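
As an example of the latter, here is a minimal sketch of a custom input generator in Python (the output file name, document keys, and field values are illustrative, not part of FTSB). It emits WRITE commands in the CSV format described above, one FT.ADD per document:

# generate_custom.py - minimal sketch of a custom FTSB input generator
import csv

documents = [
    ("doc1", "hello world"),
    ("doc2", "benchmarking full-text search engines"),
]

with open("custom.redisearch.commands.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for key, title in documents:
        # query type, query group, key position, then the command and its arguments
        writer.writerow(["WRITE", "U1", 2, "FT.ADD", "idx", key, "1.0", "FIELDS", "title", title])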

Apart from the CSV files, there is an optional benchmark suite specification that enables you to describe the benchmark in detail: what key metrics it provides, and how to automatically run more complex suites (with several steps, etc.). For a simple benchmark, you just need to feed the CSV file as input.

Query execution (benchmarking)

So that benchmarking results are not affected by generating data or queries on-the-fly, you are always required to feed the benchmark runner an input file that respects the specification format above. The overall idea is that the benchmark runner concerns itself only with executing the queries as fast as possible, while enabling client runtime variations that influence performance (and are not related to the use case itself), such as command pipelining (auto-pipelining based on time or number of commands), cluster support, number of concurrent clients, and rate limiting (to find sustainable throughputs).

Running a benchmark is as simple as feeding an input file to the DB benchmark runner (in this case ftsb_redisearch):

ftsb_redisearch --file ecommerce-inventory.redisearch.commands.BENCH.csv

The resulting stdout output will look similar to this:

$ ftsb_redisearch --file ecommerce-inventory.redisearch.commands.BENCH.csv 
    setup writes/sec          writes/sec         updates/sec           reads/sec    cursor reads/sec         deletes/sec     current ops/sec           total ops     TX BW/s     RX BW/s
          0 (0.000)           0 (0.000)        1571 (2.623)         288 (7.451)           0 (0.000)           0 (0.000)        1859 (3.713)                1860             3.1KB/s  1.4MB/s
          0 (0.000)           0 (0.000)        1692 (2.627)         287 (7.071)           0 (0.000)           0 (0.000)        1979 (3.597)                3839             3.3KB/s  1.4MB/s
          0 (0.000)           0 (0.000)        1571 (2.761)         293 (7.087)           0 (0.000)           0 (0.000)        1864 (3.679)                5703             3.1KB/s  1.4MB/s
          0 (0.000)           0 (0.000)        1541 (2.983)         280 (7.087)           0 (0.000)           0 (0.000)        1821 (3.739)                7524             3.1KB/s  1.4MB/s
          0 (0.000)           0 (0.000)        1441 (2.989)         255 (7.375)           0 (0.000)           0 (0.000)        1696 (3.773)                9220             2.8KB/s  1.3MB/s

Summary:
Issued 9885 Commands in 5.455sec with 8 workers
        Overall stats:
        - Total 1812 ops/sec                    q50 lat 3.819 ms
        - Setup Writes 0 ops/sec                q50 lat 0.000 ms
        - Writes 0 ops/sec                      q50 lat 0.000 ms
        - Reads 276 ops/sec                     q50 lat 7.531 ms
        - Cursor Reads 0 ops/sec                q50 lat 0.000 ms
        - Updates 1536 ops/sec                  q50 lat 3.117 ms
        - Deletes 0 ops/sec                     q50 lat 0.000 ms
        Overall TX Byte Rate: 3KB/sec
        Overall RX Byte Rate: 1.4MB/sec

Apart from the input file, you should also always specify the name of a JSON output file for the benchmark results, so that you can do more complex analysis or store them. Here is the full list of supported options:

$ ftsb_redisearch -h
Usage of ftsb_redisearch:
  -cluster-mode
        If set to true, it will run the client in cluster mode.
  -debug int
        Debug printing (choices: 0, 1, 2). (default 0)
  -do-benchmark
        Whether to write databuild. Set this flag to false to check input read speed. (default true)
  -file string
        File name to read databuild from
  -host string
        The host:port for Redis connection (default "localhost:6379")
  -json-out-file string
        Name of json output file to output benchmark results. If not set, will not print to json.
  -max-rps uint
        enable limiting the rate of queries per second, 0 = no limit. By default no limit is specified and the binaries will stress the DB up to the maximum. A normal "modus operandi" would be to initially stress the system ( no limit on RPS) and afterwards that we know the limit vary with lower rps configurations.
  -metadata-string string
        Metadata string to add to json-out-file. If -json-out-file is not set, will not use this option.
  -pipeline-max-size int
        If limit is zero then no limit will be used and pipelines will only be limited by the specified time window (default 100)
  -pipeline-window-ms float
        If window is zero then implicit pipelining will be disabled (default 0.5)
  -reporting-period duration
        Period to report write stats (default 1s)
  -requests uint
        Number of total requests to issue (0 = all of the present in input file).
  -workers uint
        Number of parallel clients inserting (default 8)
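
For example, a run that uses more workers, caps the request rate, and persists results to JSON might look like this (the input file name is illustrative):

ftsb_redisearch --file ecommerce-inventory.redisearch.commands.BENCH.csv \
    --workers 16 \
    --max-rps 1000 \
    --json-out-file results.json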