jf-tech / Omniparser
Licence: MIT
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
Stars: ✭ 148
Programming Languages
javascript
184084 projects - #8 most used programming language
go
31211 projects - #10 most used programming language
golang
3204 projects
Projects that are alternatives to or similar to Omniparser
Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+151.35%)
Mutual labels: json, xml, csv, parser, etl
Scobot
SCORM API for Content. JavaScript library, QUnit tests and examples.
Stars: ✭ 128 (-13.51%)
Mutual labels: json, xml, schema, schemas
Stream Parser
⚡ PHP7 / Laravel Multi-format Streaming Parser
Stars: ✭ 391 (+164.19%)
Mutual labels: json, xml, csv, parser
Etl.net
Mass processing data with a complete ETL for .net developers
Stars: ✭ 129 (-12.84%)
Mutual labels: csv, etl, transform
Ps Webapi
(Migrated from CodePlex) Let PowerShell Script serve or command-line process as WebAPI. PSWebApi is a simple library for building ASP.NET Web APIs (RESTful Services) by PowerShell Scripts or batch/executable files out of the box.
Stars: ✭ 24 (-83.78%)
Mutual labels: json, xml, csv
Xml Js
Converter utility between XML text and Javascript object / JSON text.
Stars: ✭ 874 (+490.54%)
Mutual labels: json, xml, parser
Countries States Cities Database
🌍 World countries, states, regions, provinces, cities, towns in JSON, SQL, XML, PLIST, YAML, and CSV. All Countries, States, Cities with ISO2, ISO3, Country Code, Phone Code, Capital, Native Language, Timezones, Latitude, Longitude, Region, Subregion, Flag Emoji, and Currency. #countries #states #cities
Stars: ✭ 1,130 (+663.51%)
Mutual labels: json, xml, csv
Fast Xml Parser
Validate XML, Parse XML to JS/JSON and vice versa, or parse XML to Nimn rapidly without C/C++ based libraries and no callback
Stars: ✭ 1,021 (+589.86%)
Mutual labels: json, xml, parser
Magento2 Import Export Sample Files
Default Magento 2 CE import / export CSV files & sample files for Firebear Improved Import / Export extension
Stars: ✭ 68 (-54.05%)
Mutual labels: json, xml, csv
Filecontextcore
FileContextCore is a "Database"-Provider for Entity Framework Core and adds the ability to store information in files instead of being limited to databases.
Stars: ✭ 91 (-38.51%)
Mutual labels: json, xml, csv
Dasel
Query, update and convert data structures from the command line. Comparable to jq/yq but supports JSON, TOML, YAML, XML and CSV with zero runtime dependencies.
Stars: ✭ 759 (+412.84%)
Mutual labels: json, xml, parser
Sheetjs
📗 SheetJS Community Edition -- Spreadsheet Data Toolkit
Stars: ✭ 28,479 (+19142.57%)
Mutual labels: json, xml, csv
Structured Text Tools
A list of command line tools for manipulating structured text data
Stars: ✭ 6,180 (+4075.68%)
Mutual labels: json, xml, csv
Dbwebapi
(Migrated from CodePlex) DbWebApi is a .Net library that implements an entirely generic Web API (RESTful) for HTTP clients to call database (Oracle & SQL Server) stored procedures or functions in a managed way out-of-the-box without any configuration or coding.
Stars: ✭ 84 (-43.24%)
Mutual labels: json, xml, csv
Parsrs
CSV, JSON, XML text parsers and generators written in pure POSIX shellscript
Stars: ✭ 56 (-62.16%)
Mutual labels: json, xml, csv
Schema Registry
Confluent Schema Registry for Kafka
Stars: ✭ 1,647 (+1012.84%)
Mutual labels: json, schema, schemas
Servicestack
Thoughtfully architected, obscenely fast, thoroughly enjoyable web services for all
Stars: ✭ 4,976 (+3262.16%)
Mutual labels: json, xml, csv
Countries
World countries in JSON, CSV, XML and Yaml. Any help is welcome!
Stars: ✭ 5,379 (+3534.46%)
Mutual labels: json, xml, csv
omniparser
Omniparser is a native Golang ETL parser that ingests input data of various formats (CSV, txt, fixed length/width, XML, EDI/X12/EDIFACT, JSON, and custom formats) in a streaming fashion and transforms the data into the desired JSON output based on a schema written in JSON.
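To make the idea of a schema-driven transform concrete, here is a minimal, stdlib-only Go sketch. The `fieldSchema` type, its `fields` key, and `transformRecord` are hypothetical and far simpler than the real omni.2.1 schema format; the sketch only shows a schema written in JSON steering how one input record becomes one JSON output object.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fieldSchema is a hypothetical, much-simplified schema: it maps each output
// JSON field name to the index of the input column that supplies its value.
// It is NOT the omni.2.1 schema format.
type fieldSchema struct {
	Fields map[string]int `json:"fields"`
}

// transformRecord builds one JSON object from one input record as the schema directs.
func transformRecord(s fieldSchema, record []string) ([]byte, error) {
	out := make(map[string]string, len(s.Fields))
	for name, col := range s.Fields {
		if col < 0 || col >= len(record) {
			return nil, fmt.Errorf("field %q: column %d out of range", name, col)
		}
		out[name] = record[col]
	}
	// encoding/json sorts map keys, so the output is deterministic.
	return json.Marshal(out)
}

func main() {
	var s fieldSchema
	schemaJSON := `{"fields": {"city": 1, "name": 0}}`
	if err := json.Unmarshal([]byte(schemaJSON), &s); err != nil {
		panic(err)
	}
	out, err := transformRecord(s, []string{"Alice", "Paris"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // {"city":"Paris","name":"Alice"}
}
```

Keeping the mapping in data rather than code is what lets the same engine handle new layouts by swapping schemas instead of recompiling.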
Golang Version: 1.14
Documentation
Docs:
- Getting Started: a tutorial for writing your first omniparser schema.
- IDR: in-memory data representation of ingested data for omniparser.
- XPath Based Record Filtering and Data Extraction: xpath queries are essential to omniparser schema writing. Learn the concept and tricks in depth.
- All About Transforms: everything about `transform_declarations`.
- Use of `custom_func`, Specially `javascript`: an in-depth look at how `custom_func` is used, especially the almighty `javascript` (and `javascript_with_context`).
- CSV Schema in Depth: everything about schemas for CSV input.
- Fixed-Length Schema in Depth: everything about schemas for fixed-length (e.g. TXT) input.
- JSON/XML Schema in Depth: everything about schemas for JSON or XML input.
- EDI Schema in Depth: everything about schemas for EDI input.
- Programmability: Advanced techniques for using omniparser (or some of its components) in your code.
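The IDR and XPath docs above revolve around one idea: ingested data becomes an in-memory tree, and path queries pick records and values out of it. The toy Go sketch below illustrates only that idea. The `node` type and `query` function are hypothetical, not omniparser's IDR API, and `query` supports plain slash-separated child names rather than real XPath.

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, toy stand-in for an in-memory tree of ingested data.
// omniparser's real IDR is richer; this only illustrates the concept.
type node struct {
	Name     string
	Text     string
	Children []*node
}

// query follows a slash-separated path of child names (e.g. "order/item/sku")
// starting below n and returns every node reached, in document order.
func query(n *node, path string) []*node {
	current := []*node{n}
	for _, step := range strings.Split(path, "/") {
		var next []*node
		for _, c := range current {
			for _, child := range c.Children {
				if child.Name == step {
					next = append(next, child)
				}
			}
		}
		current = next
	}
	return current
}

func main() {
	root := &node{Name: "root", Children: []*node{
		{Name: "order", Children: []*node{
			{Name: "item", Children: []*node{{Name: "sku", Text: "A1"}}},
			{Name: "item", Children: []*node{{Name: "sku", Text: "B2"}}},
		}},
	}}
	for _, n := range query(root, "order/item/sku") {
		fmt.Println(n.Text) // prints A1, then B2
	}
}
```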
References:
- Custom Functions: a complete reference of all built-in custom functions.
Examples:
- CSV Examples
- Fixed-Length Examples
- JSON Examples
- XML Examples
- EDI Examples
- Custom File Format
- Custom Funcs
In the example folders above, you will find pairs of input files and their schema files. Then, in the `.snapshots` subdirectory, you'll find their corresponding output files.
Online Playground
Use https://omniparser.herokuapp.com/ (you may need to wait a few seconds for the Heroku instance to wake up) to try out schemas and inputs, your own or the existing samples, and see how ingestion and transform work.
Why
- No good ETL transform/parser library exists in Golang.
- Even in Java and other languages, there aren't many choices, and all have limitations:
  - Many of the parsers/transforms don't support streaming reads; they load the entire input into memory, which is not acceptable in some situations.
Requirements
- Golang 1.14
Recent Major Feature Additions/Changes
- Added `Transform.RawRecord()` for callers of omniparser to access the raw ingested record.
- Deprecated `custom_parse` in favor of `custom_func` (`custom_parse` is still usable for backward compatibility; it is just removed from all public docs and samples).
- Added `NonValidatingReader` EDI segment reader.
- Added fixed-length file format support in omniv21 handler.
- Added EDI file format support in omniv21 handler.
- Major restructure/refactoring
  - Upgrade omni schema version to `omni.2.1` due to a number of incompatible schema changes:
    - 'result_type' -> 'type'
    - 'ignore_error_and_return_empty_str' -> 'ignore_error'
    - 'keep_leading_trailing_space' -> 'no_trim'
  - Changed how we handle custom functions: previously we always used strings as both the input and result param types. Not anymore: all types are supported for custom function in and out params.
  - Changed the way we package custom functions for extensions: previously we collected custom functions from all extensions and then passed all of them to the extension in use, which felt odd. Now only the custom functions included in a particular extension are used in that extension.
  - Deprecated/removed most of the custom functions in favor of using 'javascript'.
  - A number of package renamings.
- Added CSV file format support in omniv2 handler.
- Introduced IDR node cache for allocation recycling.
- Introduced IDR for in-memory data representation.
- Added trie-based high-performance `times.SmartParse`.
- Command line interface (one-off `transform` cmd or long-running HTTP server mode).
- `javascript` engine integration as a `custom_func`.
- JSON stream parser.
- Extensibility:
  - Ability to provide custom functions.
  - Ability to provide custom schema handler.
  - Ability to customize the built-in omniv2 schema handler's parsing code.
  - Ability to provide a new file format support to built-in omniv2 schema handler.
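The "IDR node cache for allocation recycling" item above refers to reusing tree nodes across records instead of allocating fresh ones each time. A common Go building block for this is `sync.Pool`; the sketch below assumes that approach purely for illustration. The `node` type, `allocNode`, and `releaseTree` are hypothetical and are not omniparser's implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// node is a hypothetical tree node used only to demonstrate pooling.
type node struct {
	Name     string
	Children []*node
}

// nodePool hands out recycled nodes, allocating a new one only when empty.
var nodePool = sync.Pool{
	New: func() interface{} { return &node{} },
}

// allocNode takes a node from the pool and initializes it.
func allocNode(name string) *node {
	n := nodePool.Get().(*node)
	n.Name = name
	return n
}

// releaseTree clears n and all its descendants and returns them to the pool,
// so no stale data from one record leaks into the next.
func releaseTree(n *node) {
	for _, c := range n.Children {
		releaseTree(c)
	}
	for i := range n.Children {
		n.Children[i] = nil // drop references held by the backing array
	}
	n.Name = ""
	n.Children = n.Children[:0] // keep the backing array's capacity for reuse
	nodePool.Put(n)
}

func main() {
	for i := 0; i < 3; i++ {
		root := allocNode("record")
		root.Children = append(root.Children, allocNode("field"))
		fmt.Println(root.Name, len(root.Children)) // record 1
		releaseTree(root)                          // nodes become reusable for the next record
	}
}
```

The key design point is that release must fully reset each node; a pool only saves allocations, it never guarantees you get a particular object back.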