moshe / Elasticsearch_loader

License: MIT

A tool for batch loading data files (JSON, parquet, CSV, TSV) into Elasticsearch

elasticsearch_loader

Main features

Batch upload CSV (actually any *SV) files to Elasticsearch
Batch upload JSON files / JSON lines to Elasticsearch
Batch upload parquet files to Elasticsearch
Predefine custom mappings
Delete index before upload
Index documents with _id from the document itself
Load data directly from a URL
SSL and basic auth
Unicode Support ✌️

Plugins

To install a plugin, run pip install plugin-name

esl-redis - Read continuously from one or more Redis lists and index into Elasticsearch
esl-s3 - Plugin for listing and indexing files from S3

Test matrix

python / es    5.6.16    6.8.0    7.1.1
2.7            V         V        V
3.7            V         V        V

Installation

pip install elasticsearch-loader
To add parquet support, run pip install elasticsearch-loader[parquet]

Usage

(venv)/tmp $ elasticsearch_loader --help
Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --config-file TEXT          Load default configuration file from esl.yml
  --bulk-size INTEGER             How many docs to collect before writing to
                                  Elasticsearch (default 500)
  --es-host TEXT                  Elasticsearch cluster entry point. (default
                                  http://localhost:9200)
  --verify-certs                  Make sure we verify SSL certificates
                                  (default false)
  --use-ssl                       Turn on SSL (default false)
  --ca-certs TEXT                 Provide a path to CA certs on disk
  --http-auth TEXT                Provide username and password for basic auth
                                  in the format of username:password
  --index TEXT                    Destination index name  [required]
  --delete                        Delete index before import? (default false)
  --update                        Merge and update existing doc instead of
                                  overwrite
  --progress                      Enable progress bar - NOTICE: in order to
                                  show progress the entire input should be
                                  collected and can consume more memory than
                                  without progress bar
  --type TEXT                     Docs type. TYPES WILL BE DEPRECATED IN APIS
                                  IN ELASTICSEARCH 7, AND COMPLETELY REMOVED
                                  IN 8.  [required]
  --id-field TEXT                 Specify field name that be used as document
                                  id
  --as-child                      Insert _parent, _routing field, the value is
                                  same as _id. Note: must specify --id-field
                                  explicitly
  --with-retry                    Retry if ES bulk insertion failed
  --index-settings-file FILENAME  Specify path to json file containing index
                                  mapping and settings, creates index if
                                  missing
  --timeout FLOAT                 Specify request timeout in seconds for
                                  Elasticsearch client
  --encoding TEXT                 Specify content encoding for input files
  --keys TEXT                     Comma separated keys to pick from each
                                  document
  -h, --help                      Show this message and exit.

Commands:
  csv
  json     FILES with the format of [{"a": "1"}, {"b": "2"}]
  parquet
  redis
  s3
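
The --bulk-size option above says how many docs are collected before each write. As a rough illustration of that idea (not the tool's actual code), the sketch below groups any iterable of documents into lists of at most bulk_size items. The function name batches is made up for this example.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(docs: Iterable[T], bulk_size: int = 500) -> Iterator[List[T]]:
    """Yield successive lists of at most bulk_size items from docs.

    Hypothetical helper: it only illustrates what "collect N docs before
    writing" means, and is not taken from elasticsearch_loader.
    """
    it = iter(docs)
    while True:
        chunk = list(islice(it, bulk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # 1200 documents with the default bulk size give three writes.
    print([len(b) for b in batches(range(1200))])  # [500, 500, 200]
```

Because the input is consumed lazily, only one batch is held in memory at a time. That is also why --progress costs more memory: the total count is not known until the whole input has been read.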

Examples

Load two CSV files into Elasticsearch

elasticsearch_loader --index incidents --type incident csv file1.csv file2.csv

Load JSON files into Elasticsearch

elasticsearch_loader --index incidents --type incident json *.json

Load all git commits into Elasticsearch

git log --pretty=format:'{"sha":"%H","author_name":"%aN", "author_email": "%aE","date":"%ad","message":"%f"}' | elasticsearch_loader --type git --index git json --json-lines -
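
The git example uses --json-lines because git log prints one JSON object per line, whereas the plain json command expects a single array such as [{"a": "1"}, {"b": "2"}]. The sketch below shows the difference between the two layouts using only the standard library. The reader functions are hypothetical and are not the loader's own parsers.

```python
import io
import json
from typing import Dict, Iterator, TextIO


def read_json_array(stream: TextIO) -> Iterator[Dict]:
    """Parse one JSON array and yield its elements (whole input is loaded)."""
    yield from json.load(stream)


def read_json_lines(stream: TextIO) -> Iterator[Dict]:
    """Parse one JSON object per non-empty line, streaming line by line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


if __name__ == "__main__":
    array_input = io.StringIO('[{"a": "1"}, {"b": "2"}]')
    lines_input = io.StringIO('{"a": "1"}\n{"b": "2"}\n')
    # Both layouts describe the same two documents.
    print(list(read_json_array(array_input)) == list(read_json_lines(lines_input)))
```

A line that is not valid JSON (for instance an author name containing an unescaped double quote) raises json.JSONDecodeError in this sketch, so it is worth checking the generated lines before loading them.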

Load a parquet file into Elasticsearch

elasticsearch_loader --index incidents --type incident parquet file1.parquet

Load JSON from a GitHub repo (any http/https URL works)

elasticsearch_loader --index data --type avg_height --id-field country json https://raw.githubusercontent.com/samayo/country-data/master/src/country-avg-male-height.json

Load data from stdin

generate_data | elasticsearch_loader --index data --type incident csv -
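
generate_data in the command above stands for any program that writes CSV to stdout. A minimal Python stand-in might look like the sketch below. The column names and rows are invented for illustration, and it assumes the first row is read as a header of field names.

```python
import csv
import sys
from typing import Iterable, Sequence, TextIO


def write_csv(rows: Iterable[Sequence], header: Sequence[str], out: TextIO) -> None:
    """Write a header row followed by the data rows as CSV to out."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


if __name__ == "__main__":
    # Hypothetical sample data; replace with a real source of rows.
    sample_rows = [(i, f"incident {i}") for i in range(1, 4)]
    write_csv(sample_rows, header=["incident_id", "title"], out=sys.stdout)
```

Saved as, say, generate_data.py, it could be piped in the same way: python generate_data.py | elasticsearch_loader --index data --type incident csv -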

Use the incident_id field as the document _id

elasticsearch_loader --id-field incident_id --index incidents --type incident csv file1.csv file2.csv
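
To make the effect of --id-field (and of --keys) more concrete, the sketch below turns parsed rows into action dictionaries of the shape that the bulk helper in the official Python Elasticsearch client accepts (_index, _id, _source). It is an assumption about how such options could work, not a copy of the loader's implementation, and to_actions is a made-up name.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Sequence


def to_actions(
    docs: Iterable[Dict[str, Any]],
    index: str,
    id_field: Optional[str] = None,
    keys: Optional[Sequence[str]] = None,
) -> Iterator[Dict[str, Any]]:
    """Build one bulk action per document.

    id_field: take the document id from this field instead of letting
              Elasticsearch generate one.
    keys:     if given, keep only these fields in the indexed document.
    """
    for doc in docs:
        source = {k: doc[k] for k in keys if k in doc} if keys else dict(doc)
        action: Dict[str, Any] = {"_index": index, "_source": source}
        if id_field is not None:
            action["_id"] = doc[id_field]
        yield action


if __name__ == "__main__":
    rows = [{"incident_id": "42", "title": "disk full", "noise": "x"}]
    for action in to_actions(rows, "incidents", id_field="incident_id",
                             keys=["incident_id", "title"]):
        print(action)
```

Using a field from the data as _id makes reloading the same file overwrite existing documents rather than create duplicates, which is the usual reason to set it.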

Load custom mappings

elasticsearch_loader --index-settings-file samples/mappings.json --index incidents --type incident csv file1.csv file2.csv
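
The contents of samples/mappings.json are not reproduced here. As a hedged sketch only, the script below builds an index body with settings and mappings in the typeless layout used by Elasticsearch 7 and prints it as JSON. The field names are invented, and whether your Elasticsearch version and the loader expect this layout or the older per-type layout should be checked against the sample file in the repository.

```python
import json
from typing import Any, Dict


def build_index_body() -> Dict[str, Any]:
    """Return a hypothetical index body: settings plus field mappings.

    Field names (incident_id, title, created_at) are illustrative only.
    """
    return {
        "settings": {"number_of_shards": 1, "number_of_replicas": 0},
        "mappings": {
            "properties": {
                "incident_id": {"type": "keyword"},
                "title": {"type": "text"},
                "created_at": {"type": "date"},
            }
        },
    }


if __name__ == "__main__":
    # Redirect the output to a file to use it with --index-settings-file.
    print(json.dumps(build_index_body(), indent=2))
```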

Tests and sample data

End-to-end and regression tests are located under the test directory and can be run with ./test.py. Sample input files for each format can be found under samples.
