
moshe / Elasticsearch_loader

License: MIT
A tool for batch loading data files (JSON, parquet, CSV, TSV) into Elasticsearch

Programming Languages

python


elasticsearch_loader

Main features

  • Batch upload CSV (or any other delimiter-separated, *SV) files to Elasticsearch
  • Batch upload JSON files / JSON lines to Elasticsearch
  • Batch upload parquet files to Elasticsearch
  • Predefine custom mappings
  • Delete the index before upload
  • Index documents with _id taken from the document itself
  • Load data directly from a URL
  • SSL and basic auth
  • Unicode support ✌️

Plugins

To install a plugin, run pip install plugin-name

  • esl-redis - Read continuously from one or more Redis lists and index to Elasticsearch
  • esl-s3 - Plugin for listing and indexing files from S3

Test matrix

python / es    5.6.16    6.8.0    7.1.1
2.7            V         V        V
3.7            V         V        V

Installation

pip install elasticsearch-loader
To add parquet support, run pip install elasticsearch-loader[parquet]

Usage

(venv)/tmp $ elasticsearch_loader --help
Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --config-file TEXT          Load default configuration file from esl.yml
  --bulk-size INTEGER             How many docs to collect before writing to
                                  Elasticsearch (default 500)
  --es-host TEXT                  Elasticsearch cluster entry point. (default
                                  http://localhost:9200)
  --verify-certs                  Make sure we verify SSL certificates
                                  (default false)
  --use-ssl                       Turn on SSL (default false)
  --ca-certs TEXT                 Provide a path to CA certs on disk
  --http-auth TEXT                Provide username and password for basic auth
                                  in the format of username:password
  --index TEXT                    Destination index name  [required]
  --delete                        Delete index before import? (default false)
  --update                        Merge and update existing doc instead of
                                  overwrite
  --progress                      Enable progress bar - NOTICE: in order to
                                  show progress the entire input should be
                                  collected and can consume more memory than
                                  without progress bar
  --type TEXT                     Docs type. TYPES WILL BE DEPRECATED IN APIS
                                  IN ELASTICSEARCH 7, AND COMPLETELY REMOVED
                                  IN 8.  [required]
  --id-field TEXT                 Specify field name that be used as document
                                  id
  --as-child                      Insert _parent, _routing field, the value is
                                  same as _id. Note: must specify --id-field
                                  explicitly
  --with-retry                    Retry if ES bulk insertion failed
  --index-settings-file FILENAME  Specify path to json file containing index
                                  mapping and settings, creates index if
                                  missing
  --timeout FLOAT                 Specify request timeout in seconds for
                                  Elasticsearch client
  --encoding TEXT                 Specify content encoding for input files
  --keys TEXT                     Comma separated keys to pick from each
                                  document
  -h, --help                      Show this message and exit.

Commands:
  csv
  json     FILES with the format of [{"a": "1"}, {"b": "2"}]
  parquet
  redis
  s3
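
The --bulk-size option above describes collecting a number of documents before each write. The idea can be sketched in Python as follows. This is a minimal illustration only: chunk_docs is a hypothetical helper and is not taken from elasticsearch_loader's source.

```python
from itertools import islice


def chunk_docs(docs, bulk_size=500):
    """Yield lists of at most bulk_size documents from any iterable.

    Hypothetical helper: it only illustrates batching, not the tool's
    actual implementation.
    """
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, bulk_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # A generator stands in for documents streamed from an input file.
    docs = ({"n": i} for i in range(1200))
    print([len(batch) for batch in chunk_docs(docs)])  # [500, 500, 200]
```

Because the helper consumes an iterator lazily, only one batch is held in memory at a time, which is the usual reason for batching writes.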

Examples

Load two CSV files into Elasticsearch

elasticsearch_loader --index incidents --type incident csv file1.csv file2.csv

Load JSON files into Elasticsearch

elasticsearch_loader --index incidents --type incident json *.json

Load all git commits into Elasticsearch

git log --pretty=format:'{"sha":"%H","author_name":"%aN", "author_email": "%aE","date":"%ad","message":"%f"}' | elasticsearch_loader --type git --index git json --json-lines -
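
The command above emits one JSON object per line, which is why it passes --json-lines, whereas the plain json command expects a single top-level array such as [{"a": "1"}, {"b": "2"}]. The difference between the two input shapes can be sketched in Python. Both function names are hypothetical and only illustrate the formats, not how the tool parses them.

```python
import io
import json


def read_json_lines(stream):
    """Yield one document per non-empty line of a JSON-lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_json_array(stream):
    """Return the documents held in a single top-level JSON array."""
    return json.load(stream)


if __name__ == "__main__":
    lines = io.StringIO('{"a": "1"}\n{"b": "2"}\n')
    array = io.StringIO('[{"a": "1"}, {"b": "2"}]')
    # Both inputs describe the same two documents.
    print(list(read_json_lines(lines)) == read_json_array(array))  # True
```

The line-oriented form can be processed as it arrives, which suits piped input like the git log example.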

Load parquet into Elasticsearch

elasticsearch_loader --index incidents --type incident parquet file1.parquet

Load JSON from a GitHub repo (any http/https URL works)

elasticsearch_loader --index data --type avg_height --id-field country json https://raw.githubusercontent.com/samayo/country-data/master/src/country-avg-male-height.json

Load data from stdin

generate_data | elasticsearch_loader --index data --type incident csv -
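
Here generate_data stands for any program that writes CSV to standard output. A hypothetical Python version might look like the sketch below. The field names incident_id and severity are invented for illustration, and the header row is included on the assumption that the csv command takes field names from it.

```python
import csv
import sys


def write_incidents(rows, out=sys.stdout):
    """Write rows (dicts) as CSV with a header row to the given stream."""
    # Field names are hypothetical; use whatever your documents contain.
    writer = csv.DictWriter(out, fieldnames=["incident_id", "severity"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    write_incidents(
        [
            {"incident_id": 1, "severity": "low"},
            {"incident_id": 2, "severity": "high"},
        ]
    )
```

Saved as an executable script, its output could then be piped into elasticsearch_loader exactly as in the command above.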

Read the document id from the incident_id field

elasticsearch_loader --id-field incident_id --index incidents --type incident csv file1.csv file2.csv
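
What --id-field does can be pictured as copying one field of each document into the _id of its bulk action. The sketch below uses the action-dict shape accepted by the bulk helpers of the official Python client. Whether elasticsearch_loader builds its actions exactly this way is an assumption, and to_action is a hypothetical name.

```python
def to_action(doc, index, doc_type, id_field=None):
    """Build a bulk-style action dict for one document.

    Hypothetical helper: when id_field is given, the value of that
    field becomes the document id; otherwise no _id is set and
    Elasticsearch generates one.
    """
    action = {"_index": index, "_type": doc_type, "_source": doc}
    if id_field is not None:
        action["_id"] = doc[id_field]
    return action


if __name__ == "__main__":
    doc = {"incident_id": "42", "severity": "low"}
    print(to_action(doc, "incidents", "incident", id_field="incident_id"))
```

Using a field from the data as the id makes repeated loads overwrite the same documents instead of creating duplicates.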

Load custom mappings

elasticsearch_loader --index-settings-file samples/mappings.json --index incidents --type incident csv file1.csv file2.csv
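
The contents of samples/mappings.json are not shown here. The sketch below assumes the file holds a standard Elasticsearch create-index body with settings and mappings keys, and writes one from Python. The field names and the build_index_settings and write_index_settings helpers are hypothetical. Elasticsearch 5.x and 6.x nest properties under the type name, while 7.x omits that level by default.

```python
import json


def build_index_settings(doc_type=None):
    """Return a create-index body; field names are illustrative only."""
    mappings = {
        "properties": {
            "incident_id": {"type": "keyword"},
            "created_at": {"type": "date"},
        }
    }
    if doc_type is not None:
        # Typed layout used by Elasticsearch 5.x / 6.x.
        mappings = {doc_type: mappings}
    return {
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": mappings,
    }


def write_index_settings(path, doc_type=None):
    """Write the body as JSON to the given path."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_index_settings(doc_type), fh, indent=2)


if __name__ == "__main__":
    write_index_settings("mappings.json", doc_type="incident")
```

The resulting file could then be passed to --index-settings-file in place of samples/mappings.json.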

Tests and sample data

End-to-end and regression tests are located under the test directory and can be run with ./test.py. Sample input formats can be found under samples.
