uhop / Stream Json

Licence: other
A micro-library of Node.js stream components for creating custom JSON processing pipelines with a minimal memory footprint. It can parse JSON files far exceeding available memory, streaming individual primitives using a SAX-inspired API.

Programming Languages

javascript
184084 projects - #8 most used programming language

Projects that are alternatives to or similar to Stream Json

Vector
A reliable, high-performance tool for building observability data pipelines.
Stars: ✭ 8,736 (+1790.91%)
Mutual labels:  parser, stream-processing
Tinyrb
A tiny subset of Ruby with a Lua-esque VM
Stars: ✭ 452 (-2.16%)
Mutual labels:  parser
Jsonparser
One of the fastest alternative JSON parsers for Go that does not require a schema
Stars: ✭ 4,323 (+835.71%)
Mutual labels:  parser
Picofeed
PHP library to parse and write RSS/Atom feeds
Stars: ✭ 439 (-4.98%)
Mutual labels:  parser
Javalang
Pure Python Java parser and tools
Stars: ✭ 408 (-11.69%)
Mutual labels:  parser
Ksql
The database purpose-built for stream processing applications.
Stars: ✭ 4,668 (+910.39%)
Mutual labels:  stream-processing
Php Parser
🌿 NodeJS PHP Parser - extract AST or tokens (PHP5 and PHP7)
Stars: ✭ 400 (-13.42%)
Mutual labels:  parser
Hazelcast
Open-source distributed computation and storage platform
Stars: ✭ 4,662 (+909.09%)
Mutual labels:  stream-processing
Exifr
📷 The fastest and most versatile JS EXIF reading library.
Stars: ✭ 448 (-3.03%)
Mutual labels:  parser
Tiny Compiler
A tiny compiler for a language featuring LL(2) with Lexer, Parser, ASM-like codegen and VM. Complex enough to give you a flavour of how the "real" thing works whilst not being a mere toy example
Stars: ✭ 425 (-8.01%)
Mutual labels:  parser
Seafox
A blazing-fast, 100% spec-compliant, self-hosted JavaScript parser written in TypeScript
Stars: ✭ 425 (-8.01%)
Mutual labels:  parser
Dev Blog
Translations, development insights, and study notes
Stars: ✭ 3,929 (+750.43%)
Mutual labels:  parser
Mwparserfromhell
A Python parser for MediaWiki wikicode
Stars: ✭ 440 (-4.76%)
Mutual labels:  parser
Crossplane
Quick and reliable way to convert NGINX configurations into JSON and back.
Stars: ✭ 407 (-11.9%)
Mutual labels:  parser
Form
🚂 Decodes url.Values into Go value(s) and Encodes Go value(s) into url.Values. Dual Array and Full map support.
Stars: ✭ 454 (-1.73%)
Mutual labels:  parser
Tomlplusplus
Header-only TOML config file parser and serializer for C++17 (and later!).
Stars: ✭ 403 (-12.77%)
Mutual labels:  parser
Binary Parser
Blazing-fast declarative parser builder for binary data
Stars: ✭ 422 (-8.66%)
Mutual labels:  parser
Anystyle
Fast and smart citation reference parsing
Stars: ✭ 438 (-5.19%)
Mutual labels:  parser
Compiler
The Hoa\Compiler library.
Stars: ✭ 458 (-0.87%)
Mutual labels:  parser
Minigo
minigo🐥is a small Go compiler made from scratch. It can compile itself.
Stars: ✭ 456 (-1.3%)
Mutual labels:  parser

stream-json

stream-json is a micro-library of node.js stream components with minimal dependencies for creating custom data processors focused on processing huge JSON files while requiring a minimal memory footprint. It can parse JSON files far exceeding available memory. Even individual primitive data items (keys, strings, and numbers) can be streamed piece-wise. A streaming, SAX-inspired, event-based API is included as well.
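
For a taste of that token stream, here is a minimal sketch (not part of the original documentation; verify the token names against the Wiki for your installed version) that pipes a tiny in-memory document through the parser and logs every SAX-like token it emits:

const {parser} = require('stream-json');
const {Readable} = require('stream');

// feed a small in-memory document through the parser and log each token;
// real inputs would normally be file or network streams
const tokens = parser();
tokens.on('data', token => console.log(token.name, token.value === undefined ? '' : token.value));
// with default settings the output includes tokens such as startObject, startKey,
// stringChunk, endKey, keyValue, startNumber, numberChunk, endNumber, numberValue, endObject
tokens.on('end', () => console.log('--- done ---'));

Readable.from(['{"answer": 42}']).pipe(tokens);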

Available components:

  • Streaming JSON Parser.
    • It produces a SAX-like token stream.
    • Optionally it can pack keys, strings, and numbers (controlled separately).
    • The main module provides helpers to create a parser.
  • Filters to edit a token stream:
    • Pick selects desired objects.
      • It can produce multiple top-level objects, just like the JSON Streaming protocol.
      • Don't forget to use StreamValues when picking several subobjects!
    • Replace substitutes objects with a replacement.
    • Ignore removes objects.
    • Filter filters tokens while maintaining the stream's validity.
  • Streamers to produce a stream of JavaScript objects.
    • StreamValues can handle a stream of JSON objects.
      • Useful to stream objects selected by Pick, or generated by other means.
      • It supports the JSON Streaming protocol, where individual values are separated semantically (as in "{}[]") or with whitespace (as in "true 1 null").
    • StreamArray takes an array of objects and produces a stream of its components.
      • It streams array components individually, taking care of assembling them automatically (see the sketch after this list).
      • Created initially to deal with JSON files similar to Django-produced database dumps.
      • Only one top-level array per stream is valid!
    • StreamObject takes an object and produces a stream of its top-level properties.
      • Only one top-level object per stream is valid!
  • Essentials:
    • Assembler interprets a token stream creating JavaScript objects.
    • Disassembler produces a token stream from JavaScript objects.
    • Stringer converts a token stream back into a JSON text stream.
    • Emitter reads a token stream and emits each token as an event.
      • It can greatly simplify data processing.
  • Utilities:
    • emit() makes any stream component emit tokens as events.
    • withParser() helps to create stream components with a parser.
    • Batch batches items into arrays to simplify their processing.
    • Verifier reads a stream and verifies that it is valid JSON.
    • Utf8Stream sanitizes multibyte utf8 text input.
  • Special helpers:
    • JSONL AKA JSON Lines:
      • jsonl/Parser parses a JSONL file producing objects similar to StreamValues.
        • Useful when we know that individual items can fit in memory.
        • Generally it is faster than the equivalent combination of Parser({jsonStreaming: true}) + StreamValues.
      • jsonl/Stringer produces a JSONL file from a stream of JavaScript objects.
        • Generally it is faster than the equivalent combination of Disassembler + Stringer.
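
As an illustration of the streamers above, here is a minimal sketch (the file name and its contents are hypothetical) that streams a huge top-level JSON array with StreamArray; every chunk it emits has the shape {key, value}, where key is the array index and value is the fully assembled item:

const {chain} = require('stream-chain');

const {parser} = require('stream-json');
const {streamArray} = require('stream-json/streamers/StreamArray');

const fs = require('fs');

// count the items of a large top-level array, e.g. a database dump
const pipeline = chain([
  fs.createReadStream('db-dump.json'), // hypothetical file: [{...}, {...}, ...]
  parser(),
  streamArray()
]);

let count = 0;
pipeline.on('data', ({key, value}) => {
  // key is the array index, value is one fully assembled array item
  ++count;
});
pipeline.on('end', () => console.log(`Streamed ${count} items.`));

jsonl/Parser produces objects similar to StreamValues, so downstream code like the counter above stays essentially unchanged when switching to JSON Lines input.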

All components are meant to be building blocks to create flexible custom data processing pipelines. They can be extended and/or combined with custom code. They can be used together with stream-chain to simplify data processing.

This toolkit is distributed under the New BSD license.

Introduction

const {chain}  = require('stream-chain');

const {parser} = require('stream-json');
const {pick}   = require('stream-json/filters/Pick');
const {ignore} = require('stream-json/filters/Ignore');
const {streamValues} = require('stream-json/streamers/StreamValues');

const fs   = require('fs');
const zlib = require('zlib');

const pipeline = chain([
  fs.createReadStream('sample.json.gz'),
  zlib.createGunzip(),
  parser(),
  pick({filter: 'data'}),
  ignore({filter: /\b_meta\b/i}),
  streamValues(),
  data => {
    const value = data.value;
    // keep data only for the accounting department
    return value && value.department === 'accounting' ? data : null;
  }
]);

let counter = 0;
pipeline.on('data', () => ++counter);
pipeline.on('end', () =>
  console.log(`The accounting department has ${counter} employees.`));

See the full documentation in the Wiki.

Companion projects:

  • stream-csv-as-json streams huge CSV files in a format compatible with stream-json: rows as arrays of string values. If a header row is used, it can stream rows as objects with named fields.

Installation

npm install --save stream-json
# or: yarn add stream-json

Use

The whole library is organized as a set of small components, which can be combined to produce the most effective pipeline. All components are based on node.js streams and events. They implement all required standard APIs. It is easy to add your own components to solve your unique tasks.

The code of all components is compact and simple. Please take a look at their source code to see how things are implemented, so you can produce your own components in no time.
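
For instance, here is a minimal sketch (file and field names are hypothetical) of a hand-rolled object-mode Transform dropped into a pipeline between stock components; since every component is a regular node.js stream, nothing special is required:

const {Transform} = require('stream');
const {chain} = require('stream-chain');

const {parser} = require('stream-json');
const {streamArray} = require('stream-json/streamers/StreamArray');

const fs = require('fs');

// a custom component: an ordinary object-mode Transform that maps
// each {key, value} chunk coming from StreamArray to a single number
const orderTotal = new Transform({
  objectMode: true,
  transform({value}, _encoding, callback) {
    // `price` and `quantity` are hypothetical fields of the input records
    callback(null, value.price * value.quantity);
  }
});

const pipeline = chain([
  fs.createReadStream('orders.json'), // hypothetical input: an array of order objects
  parser(),
  streamArray(),
  orderTotal
]);

let total = 0;
pipeline.on('data', amount => total += amount);
pipeline.on('end', () => console.log(`Grand total: ${total}`));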

If you find a bug, see a way to simplify existing components, or create new generic components that can be reused in a variety of projects, don't hesitate to open a ticket and/or create a pull request.

Release History

  • 1.7.1 minor bugfix and improved error reporting.
  • 1.7.0 added utils/Utf8Stream to sanitize utf8 input, all parsers support it automatically. Thx john30 for the suggestion.
  • 1.6.1 the technical release, no need to upgrade.
  • 1.6.0 added jsonl/Parser and jsonl/Stringer.
  • 1.5.0 Disassembler and streamers now follow JSON.stringify() and JSON.parse() protocols respectively including replacer and reviver.
  • 1.4.1 bugfix: Stringer with makeArray should produce empty array if no input.
  • 1.4.0 added makeArray functionality to Stringer. Thx all who asked for it!
  • 1.3.3 bugfix: very large/infinite streams with garbage didn't fail. Thx Arne Marschall!
  • 1.3.2 bugfix: filters could fail with packed-only token streams. Thx Trey Brisbane!
  • 1.3.1 bugfix: reverted the last bugfix in Verifier, a bugfix in tests, thx Guillermo Ares.
  • 1.3.0 added Batch, a bugfix in Verifier.
  • 1.2.1 the technical release.
  • 1.2.0 added Verifier.
  • 1.1.4 fixed Filter going haywire, thx @codebling!
  • 1.1.3 fixed Parser streaming numbers when shouldn't, thx Grzegorz Lachowski!
  • 1.1.2 fixed Stringer not escaping some symbols, thx Pavel Bardov!
  • 1.1.1 minor updates in docs and comments.
  • 1.1.0 added Disassembler.
  • 1.0.3 minor tweaks, added TypeScript typings and the badge.
  • 1.0.2 minor tweaks, documentation improvements.
  • 1.0.1 reorg to fix export problems.
  • 1.0.0 the first 1.0 release.
  • 0.6.1 the technical release.
  • 0.6.0 added Stringer to convert event streams back to JSON.
  • 0.5.3 bug fix to allow empty JSON Streaming.
  • 0.5.2 bug fixes in Filter.
  • 0.5.1 corrected README.
  • 0.5.0 added support for JSON Streaming.
  • 0.4.2 refreshed dependencies.
  • 0.4.1 added StreamObject by Sam Noedel.
  • 0.4.0 new high-performant Combo component, switched to the previous parser.
  • 0.3.0 new even faster parser, bug fixes.
  • 0.2.2 refreshed dependencies.
  • 0.2.1 added utilities to filter objects on the fly.
  • 0.2.0 new faster parser, formal unit tests, added utilities to assemble objects on the fly.
  • 0.1.0 bug fixes, more documentation.
  • 0.0.5 bug fixes.
  • 0.0.4 improved grammar.
  • 0.0.3 the technical release.
  • 0.0.2 bug fixes.
  • 0.0.1 the initial release.