
xros / jsonpyes

Licence: other
A tool that imports raw JSON into ElasticSearch with a single command

Programming Languages

python

Projects that are alternatives of or similar to jsonpyes

android-downloader
A powerful download library for Android.
Stars: ✭ 375 (+459.7%)
Mutual labels:  multi-threading
soda-for-java
SODA (Simple Oracle Document Access) for Java is an Oracle library for writing Java apps that work with JSON (and not only JSON!) in the Oracle Database. SODA allows your Java app to use the Oracle Database as a NoSQL document store.
Stars: ✭ 61 (-8.96%)
Mutual labels:  json-data
superfast
⚡ SuperFast codecs for fre:ac
Stars: ✭ 59 (-11.94%)
Mutual labels:  multi-threading
DataStore
Visual development tool for creating mocked JSON
Stars: ✭ 30 (-55.22%)
Mutual labels:  json-data
thread-pool
BS::thread_pool: a fast, lightweight, and easy-to-use C++17 thread pool library
Stars: ✭ 1,043 (+1456.72%)
Mutual labels:  multi-threading
table2pojo
Generate POJOs for database table/columns
Stars: ✭ 16 (-76.12%)
Mutual labels:  multi-threading
krates
📦 A free HTTP based JSON storage service
Stars: ✭ 36 (-46.27%)
Mutual labels:  json-data
TAOMP
Implementations of the example code from the book "The Art of Multiprocessor Programming", with comments and unit tests
Stars: ✭ 39 (-41.79%)
Mutual labels:  multi-threading
Master-Thesis
Deep Reinforcement Learning in Autonomous Driving: the A3C algorithm used to make a car learn to drive in TORCS; Python 3.5, Tensorflow, tensorboard, numpy, gym-torcs, ubuntu, latex
Stars: ✭ 33 (-50.75%)
Mutual labels:  multi-threading
cucumber-performance
A performance testing framework for cucumber
Stars: ✭ 28 (-58.21%)
Mutual labels:  multi-threading
Octavo
Verilog FPGA Parts Library. Old Octavo soft-CPU project.
Stars: ✭ 66 (-1.49%)
Mutual labels:  multi-threading
api-data
Static JSON data from the API, plus a JSON Schema
Stars: ✭ 88 (+31.34%)
Mutual labels:  json-data
PowerJSON
Powerjson is json's improved data format.
Stars: ✭ 24 (-64.18%)
Mutual labels:  json-data
graphql-cli-load
A graphql-cli data import plugin to call mutations with data from JSON/CSV files
Stars: ✭ 63 (-5.97%)
Mutual labels:  json-data
Fibrous
Concurrency library for .Net
Stars: ✭ 47 (-29.85%)
Mutual labels:  multi-threading
jsonfiddle
JSON Fiddling
Stars: ✭ 14 (-79.1%)
Mutual labels:  json-data
postal-codes-json-xml-csv
Collection of postal codes in different formats, ready for importing.
Stars: ✭ 181 (+170.15%)
Mutual labels:  json-data
covid-19
Current and historical coronavirus covid-19 confirmed, recovered, deaths and active case counts segmented by country and region. Includes csv, json and sqlite data along with an interactive website explorer.
Stars: ✭ 15 (-77.61%)
Mutual labels:  json-data
synapse
Non-intrusive C++ signal programming library
Stars: ✭ 48 (-28.36%)
Mutual labels:  multi-threading
brackit
Query processor with proven optimizations, ready to use for your document store to query semi-structured data with a JSONiq like extension of XQuery. Can also be used as an ad-hoc in-memory query processor.
Stars: ✭ 28 (-58.21%)
Mutual labels:  json-data

json-py-es


Alexander Liu

  • Import raw JSON data files into ElasticSearch with a single command

[Image: jsonpyes diagram]

Very fast: with multi-threading enabled, imports can run 4 to 10 times faster when processing big data.

Installation

pip install jsonpyes

Note: Before installing jsonpyes with pip, you first need to install python-pip on your system. (Supports Python 2.7, 3.4, 3.5, 3.6)

jsonpyes

[Image: user interface]

Instructions:

There are 3 modes for importing raw JSON data into ElasticSearch:
1. Validate the raw JSON data only
2. Import the data into ElasticSearch without validating
3. Validate first, then import the data into ElasticSearch if validation succeeds

A valid JSON file here means a file that contains many lines of data, with one JSON document per line.

Example: the file valid_data.json and its contents

{"key1": "valueA", "key2": {"sub_key1": "value2A", "sub_key2": ["Good", "Morning"]}}
{"key1": "valueB", "key2": {"sub_key1": "value2B", "sub_key2": ["Good", "Afternoon"]}}
...
{"key1": "valueC", "key2": {"sub_key1": "value2C", "sub_key2": ["Good", "Evening"]}}

Functions included

1. Validating JSON format data

jsonpyes --data raw_data.json --check

If the json data file is valid:

json valid

If the json data file is invalid:

json invalid
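
To get a feel for what a line-by-line check involves, here is a rough sketch using only the standard library. It is not the jsonpyes implementation; the function name check_json_lines and the choice to skip blank lines are assumptions made for illustration.

```python
import json


def check_json_lines(lines):
    """Return (True, None) if every non-blank line parses as JSON,
    otherwise (False, line_number) for the first line that fails."""
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            return False, number
    return True, None


good = ['{"key1": "valueA"}', '{"key1": "valueB"}']
bad = ['{"key1": "valueA"}', '{"key1": ']

print(check_json_lines(good))
print(check_json_lines(bad))
```

Reporting the failing line number makes it easier to locate the bad record in a large file.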

2. Only importing without validating

jsonpyes --data raw_data.json --bulk http://localhost:9200 --import --index myindex2 --type mytype2

Notice: If the raw JSON data file is invalid, jsonpyes will not import it.

Or enable multi-threading:

jsonpyes --data raw_data.json --bulk http://localhost:9200 --import --index myindex2 --type mytype2 --thread 8

[Image: no threads]

jsonpyes supports multi-threading when importing data into ElasticSearch

Multi-threading comparison
  1. Without multi-threading

    [Image: benchmarks]

  2. With 8 threads: jsonpyes cuts the file into pieces, then distributes them fairly among the workers

    [Image: use helpers.bulk API with multi-threads]

As you can see, these two containers have the same docs loaded. With --thread 8 the import can be several times faster, usually 5 to 10 times, depending on your computer/server resources. This was tested on a Linux x64 laptop with 4 GB RAM and a 2.4 GHz Intel i5.
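
The "cut into pieces, distribute fairly" idea can be sketched in Python 3 as follows. This is an illustration under assumptions, not the code jsonpyes actually runs: split_fairly, run_workers and send_chunk are hypothetical names, and the worker here just counts lines instead of calling ElasticSearch.

```python
from concurrent.futures import ThreadPoolExecutor


def split_fairly(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)  # the first `extra` chunks get one more item
        chunks.append(items[start:start + size])
        start += size
    return chunks


def run_workers(lines, threads, send_chunk):
    """Hand each chunk to a worker thread and collect the results in chunk order."""
    chunks = split_fairly(lines, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(send_chunk, chunks))


lines = ['{"n": %d}' % i for i in range(10)]

# Stand-in worker: a real one would send its chunk to ElasticSearch.
counts = run_workers(lines, 4, len)
print(counts)  # [3, 3, 2, 2]
```

In a real import, send_chunk would be the place to call a bulk API on each piece; since those calls mostly wait on network I/O, threads can overlap them.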

And it works.

[Image: it works]

3. Both validating and importing

jsonpyes --data raw_data.json --bulk http://localhost:9200 --import --index myindex1 --type mytype1 --check

[Image: validating and importing]

And it works.

[Image: the results]
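
The validate-then-import flow of mode 3 might look roughly like this in Python. The sketch only builds the action dictionaries in the shape accepted by helpers.bulk from the elasticsearch Python client; it does not contact a server. The function name build_actions is hypothetical, and whether jsonpyes structures its actions exactly this way is an assumption.

```python
import json


def build_actions(lines, index, doc_type):
    """Parse every line first; only if all of them are valid JSON,
    return one bulk action per document."""
    docs = []
    for number, line in enumerate(lines, start=1):
        try:
            docs.append(json.loads(line))
        except ValueError:
            raise ValueError("invalid JSON on line %d" % number)
    return [{"_index": index, "_type": doc_type, "_source": doc} for doc in docs]


lines = ['{"key1": "valueA"}', '{"key1": "valueB"}']
actions = build_actions(lines, "myindex1", "mytype1")
print(actions[0])

# With the elasticsearch client installed and a server running, the actions
# could then be sent with elasticsearch.helpers.bulk(client, actions).
```

Because all lines are parsed before any action is returned, nothing is imported when a single line is invalid.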

Reference

  • Algorithm handwriting

[Image: handwriting]

Happy hacking!