All Projects → jf-tech → Omniparser

jf-tech / Omniparser

Licence: mit
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.

Programming Languages

javascript
184084 projects - #8 most used programming language
go
31211 projects - #10 most used programming language
golang
3204 projects

Projects that are alternatives of or similar to Omniparser

Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+151.35%)
Mutual labels:  json, xml, csv, parser, etl
Scobot
SCORM API for Content. JavaScript library, QUnit tests and examples.
Stars: ✭ 128 (-13.51%)
Mutual labels:  json, xml, schema, schemas
Stream Parser
⚡ PHP7 / Laravel Multi-format Streaming Parser
Stars: ✭ 391 (+164.19%)
Mutual labels:  json, xml, csv, parser
Etl.net
Mass processing data with a complete ETL for .net developers
Stars: ✭ 129 (-12.84%)
Mutual labels:  csv, etl, transform
Ps Webapi
(Migrated from CodePlex) Let PowerShell Script serve or command-line process as WebAPI. PSWebApi is a simple library for building ASP.NET Web APIs (RESTful Services) by PowerShell Scripts or batch/executable files out of the box.
Stars: ✭ 24 (-83.78%)
Mutual labels:  json, xml, csv
Xml Js
Converter utility between XML text and Javascript object / JSON text.
Stars: ✭ 874 (+490.54%)
Mutual labels:  json, xml, parser
Fsharp.data
F# Data: Library for Data Access
Stars: ✭ 631 (+326.35%)
Mutual labels:  json, xml, csv
Countries States Cities Database
🌍 World countries, states, regions, provinces, cities, towns in JSON, SQL, XML, PLIST, YAML, and CSV. All Countries, States, Cities with ISO2, ISO3, Country Code, Phone Code, Capital, Native Language, Timezones, Latitude, Longitude, Region, Subregion, Flag Emoji, and Currency. #countries #states #cities
Stars: ✭ 1,130 (+663.51%)
Mutual labels:  json, xml, csv
Fast Xml Parser
Validate XML, Parse XML to JS/JSON and vise versa, or parse XML to Nimn rapidly without C/C++ based libraries and no callback
Stars: ✭ 1,021 (+589.86%)
Mutual labels:  json, xml, parser
Magento2 Import Export Sample Files
Default Magento 2 CE import / export CSV files & sample files for Firebear Improved Import / Export extension
Stars: ✭ 68 (-54.05%)
Mutual labels:  json, xml, csv
Filecontextcore
FileContextCore is a "Database"-Provider for Entity Framework Core and adds the ability to store information in files instead of being limited to databases.
Stars: ✭ 91 (-38.51%)
Mutual labels:  json, xml, csv
Dasel
Query, update and convert data structures from the command line. Comparable to jq/yq but supports JSON, TOML, YAML, XML and CSV with zero runtime dependencies.
Stars: ✭ 759 (+412.84%)
Mutual labels:  json, xml, parser
Sheetjs
📗 SheetJS Community Edition -- Spreadsheet Data Toolkit
Stars: ✭ 28,479 (+19142.57%)
Mutual labels:  json, xml, csv
Structured Text Tools
A list of command line tools for manipulating structured text data
Stars: ✭ 6,180 (+4075.68%)
Mutual labels:  json, xml, csv
Dbwebapi
(Migrated from CodePlex) DbWebApi is a .Net library that implement an entirely generic Web API (RESTful) for HTTP clients to call database (Oracle & SQL Server) stored procedures or functions in a managed way out-of-the-box without any configuration or coding.
Stars: ✭ 84 (-43.24%)
Mutual labels:  json, xml, csv
Metayaml
A powerful schema validator!
Stars: ✭ 92 (-37.84%)
Mutual labels:  json, xml, schema
Parsrs
CSV, JSON, XML text parsers and generators written in pure POSIX shellscript
Stars: ✭ 56 (-62.16%)
Mutual labels:  json, xml, csv
Schema Registry
Confluent Schema Registry for Kafka
Stars: ✭ 1,647 (+1012.84%)
Mutual labels:  json, schema, schemas
Servicestack
Thoughtfully architected, obscenely fast, thoroughly enjoyable web services for all
Stars: ✭ 4,976 (+3262.16%)
Mutual labels:  json, xml, csv
Countries
World countries in JSON, CSV, XML and Yaml. Any help is welcome!
Stars: ✭ 5,379 (+3534.46%)
Mutual labels:  json, xml, csv

omniparser

CI codecov Go Report Card PkgGoDev Mentioned in Awesome Go

Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in streaming fashion and transforms data into desired JSON output based on a schema written in JSON.

Golang Version: 1.14

Documentation

Docs:

References:

Examples:

In the example folders above you will find pairs of input files and their schema files. Then in the .snapshots sub directory, you'll find their corresponding output files.

Online Playground

Use https://omniparser.herokuapp.com/ (may need to wait for a few seconds for heroku instance to wake up) for trying out schemas and inputs, yours or existing samples, to see how ingestion and transform work.

Why

  • No good ETL transform/parser library exists in Golang.
  • Even looking into Java and other languages, choices aren't many and all have limitations:
    • Smooks is dead, plus its EDI parsing/transform is too heavyweight, needing code-gen.
    • BeanIO can't deal with EDI input.
    • Jolt can't deal with anything other than JSON input.
    • JSONata still only JSON -> JSON transform.
  • Many of the parsers/transforms don't support streaming read, loading entire input into memory - not acceptable in some situations.

Requirements

  • Golang 1.14

Recent Major Feature Additions/Changes

  • Added Transform.RawRecord() for caller of omniparser to access the raw ingested record.
  • Deprecated custom_parse in favor of custom_func (custom_parse is still usable for back-compatibility, it is just removed from all public docs and samples).
  • Added NonValidatingReader EDI segment reader.
  • Added fixed-length file format support in omniv21 handler.
  • Added EDI file format support in omniv21 handler.
  • Major restructure/refactoring
    • Upgrade omni schema version to omni.2.1 due a number of incompatible schema changes:
      • 'result_type' -> 'type'
      • 'ignore_error_and_return_empty_str -> 'ignore_error'
      • 'keep_leading_trailing_space' -> 'no_trim'
    • Changed how we handle custom functions: previously we always use strings as in param type as well as result param type. Not anymore, all types are supported for custom function in and out params.
    • Changed the way how we package custom functions for extensions: previously we collect custom functions from all extensions and then pass all of them to the extension that is used; This feels weird, now changed to only the custom functions included in a particular extension are used in that extension.
    • Deprecated/removed most of the custom functions in favor of using 'javascript'.
    • A number of package renaming.
  • Added CSV file format support in omniv2 handler.
  • Introduced IDR node cache for allocation recycling.
  • Introduced IDR for in-memory data representation.
  • Added trie based high performance times.SmartParse.
  • Command line interface (one-off transform cmd or long-running http server mode).
  • javascript engine integration as a custom_func.
  • JSON stream parser.
  • Extensibility:
    • Ability to provide custom functions.
    • Ability to provide custom schema handler.
    • Ability to customize the built-in omniv2 schema handler's parsing code.
    • Ability to provide a new file format support to built-in omniv2 schema handler.

Footnotes

Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].