
oliver006 / Elasticsearch Test Data

License: MIT
Generate and upload test data to Elasticsearch for performance and load testing

Programming Languages

python
139335 projects - #7 most used programming language

Projects that are alternatives of or similar to Elasticsearch Test Data

Databook
A facebook for data
Stars: ✭ 26 (-86.6%)
Mutual labels:  elasticsearch, data
Elasticsearch Hn
Index & Search Hacker News using Elasticsearch and the HN API
Stars: ✭ 92 (-52.58%)
Mutual labels:  elasticsearch, tornado
Elasticsearch Gmail
Index your Gmail Inbox with Elasticsearch
Stars: ✭ 1,964 (+912.37%)
Mutual labels:  elasticsearch, tornado
Mahuta
IPFS Storage service with search capability
Stars: ✭ 185 (-4.64%)
Mutual labels:  elasticsearch
Elastiflow
Network flow analytics (Netflow, sFlow and IPFIX) with the Elastic Stack
Stars: ✭ 2,322 (+1096.91%)
Mutual labels:  elasticsearch
Elasticsearch Jest Example
ElasticSearch Java Rest Client Examples
Stars: ✭ 189 (-2.58%)
Mutual labels:  elasticsearch
Spandex
Elasticsearch client for Clojure (built on new ES 7.x java client)
Stars: ✭ 195 (+0.52%)
Mutual labels:  elasticsearch
Prometheus Es Exporter
Prometheus Elasticsearch Exporter
Stars: ✭ 184 (-5.15%)
Mutual labels:  elasticsearch
Hrshell
HRShell is an HTTPS/HTTP reverse shell built with flask. It is an advanced C2 server with many features & capabilities.
Stars: ✭ 193 (-0.52%)
Mutual labels:  tornado
Awesome Es
A curated Elasticsearch resource list: quality resources on Jianshu can be submitted to the "elasticsearch" topic; resources from outside Jianshu are welcome as pull requests to this awesome list
Stars: ✭ 188 (-3.09%)
Mutual labels:  elasticsearch
Vue Smooth Picker
🏄🏼 A SmoothPicker for Vue 2 (like native datetime picker of iOS)
Stars: ✭ 188 (-3.09%)
Mutual labels:  data
Mirador
Tool for visual exploration of complex data.
Stars: ✭ 186 (-4.12%)
Mutual labels:  data
Free Ai Resources
🚀 FREE AI Resources - 🎓 Courses, 👷 Jobs, 📝 Blogs, 🔬 AI Research, and many more - for everyone!
Stars: ✭ 192 (-1.03%)
Mutual labels:  data
Mongo Es
A MongoDB to Elasticsearch connector
Stars: ✭ 185 (-4.64%)
Mutual labels:  elasticsearch
Bitglitter
⚡ Embed data payloads inside of ordinary images or video with high-performance animated 2-D barcodes. (Python library)
Stars: ✭ 193 (-0.52%)
Mutual labels:  data
Wazuh
Wazuh - The Open Source Security Platform
Stars: ✭ 3,154 (+1525.77%)
Mutual labels:  elasticsearch
Rediscompare
rediscompare is a command-line tool for comparing and verifying data consistency across multiple Redis databases, supporting single-instance to single-instance, single-instance to native cluster, and multi-instance/multi-database to single-instance scenarios.
Stars: ✭ 194 (+0%)
Mutual labels:  data
California Coronavirus Data
The Los Angeles Times' independent tally of coronavirus cases in California.
Stars: ✭ 188 (-3.09%)
Mutual labels:  data
Volbx
Graphical tool for data manipulation written in C++/Qt
Stars: ✭ 187 (-3.61%)
Mutual labels:  data
Opendata
CRAN OpenData Task View
Stars: ✭ 188 (-3.09%)
Mutual labels:  data

Elasticsearch For Beginners: Generate and Upload Randomized Test Data

Because everybody loves test data.

Ok, so what is this thing doing?

es_test_data.py lets you generate and upload randomized test data to your ES cluster so you can start running queries, see what performance is like, and verify your cluster is able to handle the load.

It makes it easy to configure what the test documents look like, which data types they contain, and how the fields are named.

Cool, how do I use this?

Let's assume you have an Elasticsearch cluster running.

The script is written in Python and uses Tornado. If you don't have Tornado already, install it with pip:
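$ pip install tornado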

Let's get started

It's as simple as this:

$ python es_test_data.py --es_url=http://localhost:9200
[I 150604 15:43:19 es_test_data:42] Trying to create index http://localhost:9200/test_data
[I 150604 15:43:19 es_test_data:47] Guess the index exists already
[I 150604 15:43:19 es_test_data:184] Generating 10000 docs, upload batch size is 1000
[I 150604 15:43:19 es_test_data:62] Upload: OK - upload took:    25ms, total docs uploaded:    1000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    25ms, total docs uploaded:    2000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    19ms, total docs uploaded:    3000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    18ms, total docs uploaded:    4000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    27ms, total docs uploaded:    5000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    19ms, total docs uploaded:    6000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    15ms, total docs uploaded:    7000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    24ms, total docs uploaded:    8000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    32ms, total docs uploaded:    9000
[I 150604 15:43:20 es_test_data:62] Upload: OK - upload took:    31ms, total docs uploaded:   10000
[I 150604 15:43:20 es_test_data:216] Done - total docs uploaded: 10000, took 1 seconds
[I 150604 15:43:20 es_test_data:217] Bulk upload average:           23 ms
[I 150604 15:43:20 es_test_data:218] Bulk upload median:            24 ms
[I 150604 15:43:20 es_test_data:219] Bulk upload 95th percentile:   31 ms
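To check that the documents actually landed, you can ask Elasticsearch for a document count (this assumes the default index name test_data):

$ curl http://localhost:9200/test_data/_count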

Without any command line options, it will generate and upload 10,000 documents (as in the run above) of the format

{
    "name":<<str>>,
    "age":<<int>>,
    "last_updated":<<ts>>
}

to an Elasticsearch cluster at http://localhost:9200, into an index called test_data.
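A generated document could look like this (the values are random; these are purely illustrative):

{
    "name": "K7hB2xQ",
    "age": 4872,
    "last_updated": 1433425399000
}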

Not bad but what can I configure?

python es_test_data.py --help gives you the full set of command line options; here are the most important ones:

  • --es_url=http://localhost:9200 the base URL of your ES node; don't include the index name
  • --username=<username> the username when basic auth is required
  • --password=<password> the password when basic auth is required
  • --count=### the number of documents to generate and upload
  • --index_name=test_data the name of the index to upload the data to. If it doesn't exist, it'll be created with these options:
    • --num_of_shards=2 the number of shards for the index
    • --num_of_replicas=0 the number of replicas for the index
  • --batch_size=### docs are sent to ES via bulk upload; this option controls how many are sent at a time
  • --force_init_index=False if True, the index is deleted and re-created
  • --dict_file=filename.dic if provided, the dict data type will use words from this dictionary file (one word per line). The entire file is loaded at start-up, so be careful with (very) large files.
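For example, a run that exercises several of these options could look like this (the values are illustrative, not recommendations):

$ python es_test_data.py --es_url=http://localhost:9200 \
    --count=100000 --index_name=my_test_data \
    --num_of_shards=2 --num_of_replicas=0 \
    --batch_size=5000 --force_init_index=True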

What about the document format?

Glad you asked; let's get to the doc format.

The doc format is configured via --format=<<FORMAT>> with the default being name:str,age:int,last_updated:ts.

The general syntax looks like this:

<<field_name>>:<<field_type>>,<<field_name>>:<<field_type>>, ...

For every document, es_test_data.py will generate random values for each of the fields configured.

Currently supported field types are:

  • bool returns a random true or false
  • ts a timestamp (in milliseconds), randomly picked between now +/- 30 days
  • ipv4 returns a random IPv4 address
  • tstxt a timestamp in the "%Y-%m-%dT%H:%M:%S.000-0000" format, randomly picked between now +/- 30 days
  • int:min:max a random integer between min and max. If min and max are not provided, they default to 0 and 100000
  • str:min:max a word (as in, a string) made up of min to max random upper/lowercase letters and digits; min and max are optional, defaulting to 3 and 10
  • words:min:max a random number of strs, separated by spaces; min and max are optional, defaulting to 2 and 10
  • dict:min:max a random number of entries from the dictionary file, separated by spaces; min and max are optional, defaulting to 2 and 10
  • text:words:min:max a random number of words, separated by spaces, drawn from a given list of hyphen-separated words; the word list is optional, defaulting to text1, text2 and text3, and min and max are optional, defaulting to 1 and 1
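Putting it together, a custom format could look like this (the field names and ranges here are just examples):

$ python es_test_data.py --es_url=http://localhost:9200 \
    --format=name:str,age:int:18:99,active:bool,ip:ipv4,bio:words:2:5

which might produce documents along the lines of

{
    "name": "xK3fpa",
    "age": 42,
    "active": true,
    "ip": "10.14.2.200",
    "bio": "fjQ7a bbK3"
}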

Todo

  • document the remaining command line options
  • more format types
  • ...

All suggestions, comments, ideas and pull requests are welcome!
