
fauna-labs / faunadb-importer

License: other
Importer tool to load data into FaunaDB

Programming Languages

  • scala: 5932 projects
  • java: 68154 projects (#9 most used programming language)

Projects that are alternatives of or similar to faunadb-importer

daru-io
daru-io is a plugin gem for the existing daru gem, which aims to add support for importing DataFrames from, and exporting DataFrames to, multiple formats.
Stars: ✭ 21 (-36.36%)
Mutual labels:  importer
Trakt2Letterboxd
Script to export your movies from Trakt to Letterboxd
Stars: ✭ 27 (-18.18%)
Mutual labels:  importer
redisgraph-bulk-loader
A Python utility for building RedisGraph databases from CSV inputs
Stars: ✭ 59 (+78.79%)
Mutual labels:  bulk-loader
VGltf
A glTF 2.0 importer/exporter library written in pure C# with support for use in Unity
Stars: ✭ 53 (+60.61%)
Mutual labels:  importer
MidiAnimImporter
A custom importer that imports a .mid file (SMF; Standard MIDI File) into an animation clip.
Stars: ✭ 69 (+109.09%)
Mutual labels:  importer
Mod3-MHW-Importer
Blender Mod3 Import-Exporter for Monster Hunter World
Stars: ✭ 44 (+33.33%)
Mutual labels:  importer
Embulk
Embulk: Pluggable Bulk Data Loader.
Stars: ✭ 1,609 (+4775.76%)
Mutual labels:  bulk-loader
local-docker-db
A bunch o' Docker Compose files used to quickly spin up local databases.
Stars: ✭ 251 (+660.61%)
Mutual labels:  faunadb
vim-js-file-import
Import/require files in JavaScript and TypeScript with a single button!
Stars: ✭ 130 (+293.94%)
Mutual labels:  importer
Excel-Timesheet
⏰ This Add-In is used to produce a timesheet file with functionality to import your Google Timeline. The standard timesheet has options for start and end dates, day of week and default start, end and break times. The Google timeline options are start and end dates, UTC selection, daylight savings time parameters and title filter for timeline ent…
Stars: ✭ 25 (-24.24%)
Mutual labels:  bulk-loader
UnityTexture3DAtlasImportPipeline
A Texture3D Atlas Import Pipeline for Unity 2019.3 and newer.
Stars: ✭ 24 (-27.27%)
Mutual labels:  importer
beancount-plugins
A collection of my custom beancount importers & price sources, written in Python
Stars: ✭ 14 (-57.58%)
Mutual labels:  importer
sublime-simple-import
A Sublime Text Plugin that helps you to import your modules.
Stars: ✭ 15 (-54.55%)
Mutual labels:  importer
fluxbb to flarum
🚀 FluxBB to Flarum importer
Stars: ✭ 14 (-57.58%)
Mutual labels:  importer
buttercup-importer
🎣 3rd-party archive importer for Buttercup
Stars: ✭ 39 (+18.18%)
Mutual labels:  importer
neo4j doc manager
Doc manager for Neo4j
Stars: ✭ 95 (+187.88%)
Mutual labels:  importer
laravel-json-syncer
A JSON importer and exporter for Laravel.
Stars: ✭ 22 (-33.33%)
Mutual labels:  importer
shopnote
shopnote is a JAMstack application that helps in creating notes with shopping items. This application is built to showcase the JAMstack concept using Fauna, Netlify Serverless Functions and GatsbyJS.
Stars: ✭ 15 (-54.55%)
Mutual labels:  faunadb
auth-email
🔐 Lightweight authentication specifically designed for Next.js
Stars: ✭ 73 (+121.21%)
Mutual labels:  faunadb
couchbase-java-importer
This is a pluggable importer for Couchbase
Stars: ✭ 13 (-60.61%)
Mutual labels:  importer

FaunaDB Importer

FaunaDB Importer is a command line utility to help you import static data into FaunaDB. It can import data into FaunaDB Cloud or an on-premises FaunaDB Enterprise cluster.

Supported input file formats:

  • JSON
  • CSV
  • TSV

Requirements:

  • Java 8

Usage

Download the latest version and extract the zip file. Inside the extracted folder, run:

./bin/faunadb-importer \
  import-file \
  --secret <keys-secret> \
  --class <class-name> \
  <file-to-import>

NOTE: The command line arguments are the same on Windows, but you must use a different startup script. For example:

.\bin\faunadb-importer.bat import-file --secret <keys-secret> --class <class-name> <file-to-import>

For example:

./bin/faunadb-importer \
  import-file \
  --secret "abc" \
  --class users \
  data/users.json

The importer will load all data into the specified class, preserving the field names and types as described in the import file.
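
For instance, a data/users.json along the following lines (hypothetical contents, assuming one JSON object per record) would create instances in the users class with the same id, username, and vip fields:

{ "id": "1", "username": "alice", "vip": true }
{ "id": "2", "username": "bob", "vip": false }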

You can also type ./bin/faunadb-importer --help for more detailed information.

How it works

The importer is a stateful process separated into two phases: ID generation and data import.

First, the importer will parse all records and generate unique IDs by calling the next_id function for each record. Pre-generating IDs allows us to import schemas containing relational data while keeping foreign keys consistent, and it ensures that we can safely re-run the process without the risk of duplicating information.

In order to map legacy IDs to newly generated Fauna IDs, the importer will:

  • Check if there is a field configured with the ref type. The field's value will be used as the lookup term for the new Fauna ID.
  • If no field is configured with the ref type, the importer will assign a sequential number for each record as the lookup term for the new Fauna ID.

Once this phase completes, the pre-generated IDs are stored in the cache directory. On a re-run, the importer will load the IDs from disk and skip this phase.

Second, the importer will insert all records into FaunaDB, using the pre-generated IDs from the first step as their ref field.

During this phase, if the import fails to run due to data inconsistency, it is:

  • SAFE to fix data inconsistencies in any field except fields configured with the ref type.
  • NOT SAFE to change fields configured with the ref type as they will be used as the lookup term for the pre-generated ID from the first phase.
  • NOT SAFE to remove entries from the import file if you don't have a field configured as a ref field; this will alter the sequential number assigned to the record.

As long as you keep the cache directory intact, it is safe to re-run the process until the import completes. If you want to use the importer again with a different input file, you must empty the cache directory first.
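
To make the lookup concrete: given input records with a field configured as ref, such as the hypothetical users above, the first phase produces a mapping along these lines (placeholders shown, since the actual values come from next_id):

"1" -> <pre-generated Fauna ID A>
"2" -> <pre-generated Fauna ID B>

The second phase then inserts each record under the ref built from its pre-generated ID, which is why changing or removing ref values between runs breaks the mapping.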

File structure

.
├── README.md                    # This file
├── bin                          #
│   ├── faunadb-importer         # Unix startup script
│   └── faunadb-importer.bat     # Windows startup script
├── cache                        # Where the importer saves its cache
├── data                         # Where you should copy the files you wish to import
├── lib                          #
│   └── faunadb-importer-1.0.jar # The importer library
└── logs                         # Logs for each execution

Advanced usage

Configuring fields

When importing JSON files, field names and types are optional; when importing text files, you must specify each field's name and type in order using the --format option:

./bin/faunadb-importer \
  import-file \
  --secret "<your-keys-secret-here>" \
  --class <your-class-name> \
  --format "<field-name>:<field-type>,..." \
  <file-to-import>

For example:

./bin/faunadb-importer \
  import-file \
  --secret "abc" \
  --class users \
  --format "id:ref, username:string, vip:bool" \
  data/users.csv

Supported types:

Name    Description
string  A string value.
long    A numeric value.
double  A double-precision numeric value.
bool    A boolean value.
ref     A ref value. It can mark the field as a primary key, or reference another class when importing multiple files. For example: city:ref(cities)
ts      A numeric value representing the number of milliseconds since 1970-01-01 00:00:00. You can also specify your own format as a parameter. For example: ts("dd/MM/yyyyTHH:mm:ss.000Z")
date    A date value formatted as yyyy-MM-dd. You can also specify your own format as a parameter. For example: date("dd/MM/yyyy")
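
For instance, a custom date format from the table above can be combined with other fields in a single --format string (hypothetical field names; single quotes avoid escaping the inner double quotes):

./bin/faunadb-importer \
  import-file \
  --secret "abc" \
  --class users \
  --format 'id:ref, username:string, birthday:date("dd/MM/yyyy")' \
  data/users.csv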

Renaming fields

You can rename fields from the input file as they are inserted into FaunaDB with the following syntax:

<field-name>-><new-field-name>:<field-type>

For example:

./bin/faunadb-importer \
  import-file \
  --secret "abc" \
  --class users \
  --format "id:ref, username->userName:string, vip->VIP:bool" \
  data/users.csv

Ignoring root element

When importing a JSON file where the root element of the file is an array, or when importing a text file where the first line is the file header, you can skip the root element with the --skip-root option. For example:

./bin/faunadb-importer \
  import-file \
  --secret "abc" \
  --class users \
  --skip-root true \
  data/users.csv
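
For reference, either of these hypothetical inputs would call for --skip-root true. A JSON file whose root element is an array:

[
  { "id": "1", "username": "alice", "vip": true },
  { "id": "2", "username": "bob", "vip": false }
]

Or a CSV file whose first line is a header:

id,username,vip
1,alice,true
2,bob,false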

Ignoring fields

You can ignore fields with the --ignore-fields option. For example:

./bin/faunadb-importer \
  import-file \
  --secret "abc" \
  --class users \
  --format "id:ref, username->userName:string, vip->VIP:bool" \
  --ignore-fields "id" \
  data/users.csv

NOTE: In the above example, we omit the id field when importing the data into FaunaDB, but we still use the id field as the ref type so that the importer tool will properly map the newly-generated Fauna ID for that specific user.

How to maintain data in chronological order

You can maintain chronological order when importing data by using the --ts-field option. For example:

./bin/faunadb-importer \
  import-file \
  --secret "abc" \
  --class users \
  --ts-field "created_at" \
  data/users.csv

The value configured in the --ts-field option will be used as the ts field for the imported instance.

Importing to your own cluster

By default, the importer will load your data into FaunaDB Cloud. If you wish to import the data to your own cluster, you can use the --endpoints option. For example:

./bin/faunadb-importer \
  import-file \
  --secret "abc" \
  --class users \
  --endpoints "http://10.0.0.120:8443, http://10.0.0.121:8443" \
  data/users.csv

NOTE: The importer will load balance requests across all configured endpoints.

Importing multiple files

In order to import multiple files, you must run the importer with a schema definition file. For example:

./bin/faunadb-importer \
  import-schema \
  --secret "abc" \
  data/my-schema.yaml

Schema definition syntax

<file-address>:
  class: <class-name>
  skipRoot: <boolean>
  tsField: <field-name>
  fields:
    - name: <field-name>
      type: <field-type>
      rename: <new-field-name>
  ignoredFields:
    - <field-name>

For example:

data/users.json:
  class: users
  fields:
    - name: id
      type: ref

    - name: name
      type: string

  ignoredFields:
    - id

data/tweets.csv:
  class: tweets
  tsField: created_at
  fields:
    - name: id
      type: ref

    - name: user_id
      type: ref(users)
      rename: user_ref

    - name: text
      type: string
      rename: tweet

  ignoredFields:
    - id
    - created_at

Performance considerations

The importer's default settings should provide good performance in most cases. Still, a few things are worth mentioning:

Memory

You can set the maximum amount of memory available to the import tool with -J-Xmx. For example:

./bin/faunadb-importer \
  -J-Xmx10G \
  import-schema \
  --secret "abc" \
  data/my-schema.yaml

NOTE: Parameters prefixed with -J must be placed as the first parameters for the import tool.

Batch sizes

The size of each individual batch is controlled by the --batch-size parameter.

In general, individual requests will have a higher latency with a larger batch size. However, the overall throughput of the import process may increase by inserting more records in a single request.

Large batches can exceed the maximum size of an HTTP request, forcing the import tool to split the batch into smaller requests and degrading overall performance.

Default: 50 records per batch.
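
For example, to try larger batches (the value here is arbitrary; tune it against your own record and request sizes):

./bin/faunadb-importer \
  import-file \
  --secret "abc" \
  --class users \
  --batch-size 100 \
  data/users.csv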

Managing concurrency

Concurrency is configured using the --concurrent-streams parameter.

A large number of concurrent streams can cause timeouts. When timeouts happen, the import tool retries the failing requests, applying exponential backoff to each one.

Default: the number of available processors * 2
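
For example, to lower concurrency on a constrained network (the value here is arbitrary):

./bin/faunadb-importer \
  import-file \
  --secret "abc" \
  --class users \
  --concurrent-streams 4 \
  data/users.csv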

Backoff configuration

Exponential backoff is a combination of the following parameters:

  • network-errors-backoff-time: The number of seconds to delay new requests when the network is unstable. Default: 1 second.
  • network-errors-backoff-factor: The factor by which network-errors-backoff-time is multiplied for each network issue detected; the resulting delay will not exceed max-network-errors-backoff-time. Default: 2.
  • max-network-errors-backoff-time: The maximum number of seconds to delay new requests when applying exponential backoff. Default: 60 seconds.
  • max-network-errors: The maximum number of network errors tolerated within the configured timeframe. Default: 50 errors.
  • reset-network-errors-period: The number of seconds the import tool will wait for a new network error before resetting the error count. Default: 120 seconds.
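
As a rough worked example with the defaults above, and assuming the delay is multiplied once per detected issue: successive delays grow as 1 s, 2 s, 4 s, 8 s, and so on, until they reach the 60-second cap, while the error count resets whenever no new network error is seen for 120 seconds.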

License

All projects in this repository are licensed under the Mozilla Public License.
