nilobarp / text2json

License: MIT
Performant parser for textual data (CSV parser)

Programming Languages

typescript
32286 projects
javascript
184084 projects - #8 most used programming language

Projects that are alternatives of or similar to text2json

tabular-stream
Detects tabular data (spreadsheets, dsv or json, 20+ different formats) and emits normalized objects.
Stars: ✭ 34 (+3.03%)
Mutual labels:  csv, stream
Fast Csv
CSV parser and formatter for node
Stars: ✭ 1,054 (+3093.94%)
Mutual labels:  csv, stream
Csv
CSV Decoding and Encoding for Elixir
Stars: ✭ 398 (+1106.06%)
Mutual labels:  csv, stream
Csv Stream
📃 Streaming CSV Parser for Node. Small and made entirely out of streams.
Stars: ✭ 98 (+196.97%)
Mutual labels:  csv, stream
Csvbuilder
Easily encode complex JSON objects to CSV with CsvBuilder's schema-like API
Stars: ✭ 128 (+287.88%)
Mutual labels:  csv, stream
React Papaparse
react-papaparse is the fastest in-browser CSV (or delimited text) parser for React. It is full of useful features such as CSVReader, CSVDownloader, readString, jsonToCSV, readRemoteFile, and more.
Stars: ✭ 116 (+251.52%)
Mutual labels:  csv, stream
Iostreams
IOStreams is an incredibly powerful streaming library that makes changes to file formats, compression, encryption, or storage mechanism transparent to the application.
Stars: ✭ 84 (+154.55%)
Mutual labels:  csv, stream
eec
A fast, low-memory Excel read/write tool. Not built on POI; supports streaming for efficient reads and writes with very low memory usage.
Stars: ✭ 93 (+181.82%)
Mutual labels:  csv, stream
pcap-processor
Read and process pcap files using this nifty tool
Stars: ✭ 36 (+9.09%)
Mutual labels:  csv, stream
data-models
Collection of various biomedical data models in parseable formats.
Stars: ✭ 23 (-30.3%)
Mutual labels:  csv
django-csv-export-view
Django class-based view for CSV exports
Stars: ✭ 17 (-48.48%)
Mutual labels:  csv
sample
Produce a sample of lines from files.
Stars: ✭ 17 (-48.48%)
Mutual labels:  stream
livego
Live streaming server supporting RTMP, AMF, HLS, and HTTP-FLV.
Stars: ✭ 30 (-9.09%)
Mutual labels:  stream
civ
A simple CSV interactive viewer written in Go
Stars: ✭ 23 (-30.3%)
Mutual labels:  csv
AndrOBD-Plugin
AndrOBD plugin development project
Stars: ✭ 38 (+15.15%)
Mutual labels:  csv
brain-brew
Automated Anki flashcard creation and extraction to/from Csv
Stars: ✭ 55 (+66.67%)
Mutual labels:  csv
videowall
Video wall with multiple tiles that enables synchronized video playback, mirrored or tiled.
Stars: ✭ 57 (+72.73%)
Mutual labels:  stream
YouPlot
A command line tool that draws plots on the terminal.
Stars: ✭ 412 (+1148.48%)
Mutual labels:  csv
destroy
destroy a stream if possible
Stars: ✭ 51 (+54.55%)
Mutual labels:  stream
streamdelay
A delay + dump button for live streams, allowing screening and redaction of explicit content.
Stars: ✭ 31 (-6.06%)
Mutual labels:  stream

text2json

Performant parser for textual data

  • Parse 100K rows in ~550 ms (may vary with data)
  • Very low memory footprint (~10 MB)
  • Supports parsing from file, string or buffers
  • Supports streaming output
  • Passes the CSV Acid Test suite, csv-spectrum

Parsing the following bit of data

id,firstName,lastName,jobTitle
1,Jed,Hoppe,Customer Markets Supervisor
2,Cristian,Miller,Principal Division Specialist
3,Kenyatta,Schimmel,Product Implementation Executive

will produce

[ { id: '1',
    firstName: 'Jed',
    lastName: 'Hoppe',
    jobTitle: 'Customer Markets Supervisor' },
  { id: '2',
    firstName: 'Cristian',
    lastName: 'Miller',
    jobTitle: 'Principal Division Specialist' },
  { id: '3',
    firstName: 'Kenyatta',
    lastName: 'Schimmel',
    jobTitle: 'Product Implementation Executive' } ]

Usage

Installation

npm install text2json --save

Quick start

  • Parse the entire file into JSON
'use strict'

let Parser = require('text2json').Parser
let rawdata = './data/file_100.txt'

let parse = new Parser({hasHeader : true})

parse.text2json (rawdata, (err, data) => {
  if (err) {
    console.error (err)
  } else {
    console.log(data)
  }
})
  • If parsing a large file, stream the output
'use strict'

let Parser = require('text2json').Parser
let rawdata = './data/file_500000.txt'

let parse = new Parser({hasHeader : true})

parse.text2json (rawdata)
   .on('error', (err) => {
     console.error (err)
   })
   .on('headers', (headers) => {
     console.log(headers)
   })
   .on('row', (row) => {
     console.log(row)
   })
   .on('end', () => {
     console.log('Done')
   })
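
The feature list above also mentions parsing from strings or buffers rather than a file path. The sketch below assumes the first argument to text2json can be a Buffer of raw CSV text; the exact accepted input forms are not spelled out here, so treat it as an illustration only.

'use strict'

let Parser = require('text2json').Parser

// Assumption: raw CSV text wrapped in a Buffer is accepted in place of a file path
let rawdata = Buffer.from('id,firstName\n1,Jed\n2,Cristian\n')

let parse = new Parser({hasHeader : true})

parse.text2json (rawdata, (err, data) => {
  if (err) {
    console.error (err)
  } else {
    console.log(data)
  }
})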

Options

The parser accepts the following options through its constructor (all are optional). A combined example follows the list below.

{
  hasHeader?: boolean,
  headers?: string[],
  newline?: string,
  separator?: string,
  quote?: string,
  encoding?: string,
  skipRows?: number,
  filters?: Filters,
  headersOnly?: boolean
}
  • hasHeader
    • If true, first line is treated as header row.
    • Defaults to false.
  • headers
    • An array of strings to be used as headers.
    • If specified, overrides header row in data.
    • Default is an empty array
  • newline
    • Choose between Unix and Windows line endings (\n or \r\n)
    • Defaults to \n
  • separator
    • Specify column separator
    • Default is , (comma)
  • quote
    • Specify quote character
    • Default is " (double quotes)
  • encoding (see https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings)
    • Use a different encoding while parsing
    • Defaults to utf-8
  • skipRows
    • Number of rows to skip from the top (excluding the header)
    • Default is zero rows
  • filters
    • Filter columns based on index or header name
  • headersOnly
    • Parse only header row
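
Taken together, several of these options can be set in a single constructor call. The sketch below is illustrative only: the pipe separator, the skipRows value, and the file path './data/pipes.txt' are made-up examples, not values required by any data shown above.

'use strict'

let Parser = require('text2json').Parser

// Illustrative option values; adjust to match the actual data
let parse = new Parser({
  hasHeader : true,
  separator : '|',       // columns are pipe-separated instead of the default comma
  quote     : '"',
  encoding  : 'utf-8',
  skipRows  : 2          // ignore the first two data rows after the header
})

parse.text2json ('./data/pipes.txt', (err, data) => {
  if (err) {
    console.error (err)
  } else {
    console.log(data)
  }
})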

Header fill

If hasHeader is false and custom headers are not specified, the parser will generate headers from the column positions, i.e. when the data has 5 columns, the generated headers will be ['_1', '_2', '_3', '_4', '_5']

Header fill will also occur when the number of headers given in the custom headers array is less than the actual number of columns in the data.
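
As a minimal sketch of header fill (the file name is hypothetical), parsing headerless data with the default options should emit the generated names on the headers event:

'use strict'

let Parser = require('text2json').Parser

// Hypothetical file with no header row, e.g. lines like "1,Jed,Hoppe"
let rawdata = './data/no_header.txt'

// hasHeader defaults to false and no custom headers are supplied,
// so the parser fills in generated names such as '_1', '_2', '_3'
let parse = new Parser()

parse.text2json (rawdata)
   .on('headers', (headers) => {
     console.log(headers)   // e.g. [ '_1', '_2', '_3' ]
   })
   .on('row', (row) => {
     console.log(row)
   })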

Events

  • headers - emitted after parsing the header row or once header fill has completed. The payload contains an array of header names.
  • row - emitted once for every row parsed. Payload is an object with properties corresponding to the header row.
  • error - emitted once for the first error encountered. Payload is an Error object with an indicative description of the problem.
  • end - emitted once, when the parser is done parsing. No payload is provided with this event.
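
As a small sketch combining the headersOnly option with these events, the example below reads only the header row of the quick-start file and then finishes:

'use strict'

let Parser = require('text2json').Parser

// With headersOnly set, only the header row should be parsed,
// so listening for 'headers', 'error' and 'end' is enough here
let parse = new Parser({hasHeader : true, headersOnly : true})

parse.text2json ('./data/file_100.txt')
   .on('headers', (headers) => {
     console.log(headers)
   })
   .on('error', (err) => {
     console.error (err)
   })
   .on('end', () => {
     console.log('Done')
   })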

Roadmap

  • Return columns selectively (either by column index or header name)
  • Ignore header row in data and use custom header names provided in options
  • Skip rows (start parsing from a given row number)