leandromoh / RecordParser

License: MIT
Zero Allocation Writer/Reader Parser for .NET Core

Programming Languages

C#

Projects that are alternatives of or similar to RecordParser

Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+140%)
Mutual labels:  csv, flat, reader
qwery
A SQL-like language for performing ETL transformations.
Stars: ✭ 28 (-81.94%)
Mutual labels:  tsv, csv, delimited
Flatfiles
Reads and writes CSV, fixed-length and other flat file formats with a focus on schema definition, configuration and speed.
Stars: ✭ 275 (+77.42%)
Mutual labels:  tsv, csv, mapper
Structured Text Tools
A list of command line tools for manipulating structured text data
Stars: ✭ 6,180 (+3887.1%)
Mutual labels:  tsv, csv
Swiftcsv
CSV parser for Swift
Stars: ✭ 511 (+229.68%)
Mutual labels:  tsv, csv
Csvtk
A cross-platform, efficient and practical CSV/TSV toolkit in Golang
Stars: ✭ 566 (+265.16%)
Mutual labels:  tsv, csv
Rainbow csv
🌈Rainbow CSV - Vim plugin: Highlight columns in CSV and TSV files and run queries in SQL-like language
Stars: ✭ 337 (+117.42%)
Mutual labels:  tsv, csv
Q
q - Run SQL directly on CSV or TSV files
Stars: ✭ 8,809 (+5583.23%)
Mutual labels:  tsv, csv
Pyexcel Io
One interface to read and write the data in various excel formats, import the data into and export the data from databases
Stars: ✭ 40 (-74.19%)
Mutual labels:  tsv, csv
Tsv Utils
eBay's TSV Utilities: Command line tools for large, tabular data files. Filtering, statistics, sampling, joins and more.
Stars: ✭ 1,215 (+683.87%)
Mutual labels:  tsv, csv
Intellij Csv Validator
CSV validator, highlighter and formatter plugin for JetBrains Intellij IDEA, PyCharm, WebStorm, ...
Stars: ✭ 198 (+27.74%)
Mutual labels:  tsv, csv
Vroom
Fast reading of delimited files
Stars: ✭ 462 (+198.06%)
Mutual labels:  tsv, csv
Pytablewriter
pytablewriter is a Python library to write a table in various formats: CSV / Elasticsearch / HTML / JavaScript / JSON / LaTeX / LDJSON / LTSV / Markdown / MediaWiki / NumPy / Excel / Pandas / Python / reStructuredText / SQLite / TOML / TSV.
Stars: ✭ 422 (+172.26%)
Mutual labels:  tsv, csv
Sqlitebiter
A CLI tool to convert CSV / Excel / HTML / JSON / Jupyter Notebook / LDJSON / LTSV / Markdown / SQLite / SSV / TSV / Google-Sheets to a SQLite database file.
Stars: ✭ 601 (+287.74%)
Mutual labels:  tsv, csv
Visidata
A terminal spreadsheet multitool for discovering and arranging data
Stars: ✭ 4,606 (+2871.61%)
Mutual labels:  tsv, csv
Faster Than Csv
Faster CSV on Python 3
Stars: ✭ 52 (-66.45%)
Mutual labels:  tsv, csv
Winmerge
WinMerge is an Open Source differencing and merging tool for Windows. WinMerge can compare both folders and files, presenting differences in a visual text format that is easy to understand and handle.
Stars: ✭ 2,358 (+1421.29%)
Mutual labels:  tsv, csv
Data Curator
Data Curator - share usable open data
Stars: ✭ 199 (+28.39%)
Mutual labels:  tsv, csv
Miller
Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON
Stars: ✭ 4,633 (+2889.03%)
Mutual labels:  tsv, csv
Sq
swiss-army knife for data
Stars: ✭ 275 (+77.42%)
Mutual labels:  tsv, csv


RecordParser - Simple, Fast, GC friendly & Extensible

RecordParser is an expression-tree-based parser that helps you write maintainable parsers with high performance and zero allocations, thanks to the Span type. It makes parsing easier by automating the non-relevant code, allowing you to focus on the essentials of your mapping.

🏆 3rd place in The fastest CSV parser in .NET blog post

Even though the focus of this library is mapping data to objects (classes or structs), it achieved an excellent result in the blog's benchmark, which tested how fast libraries can transform a CSV row into an array of strings. RecordParser took 3rd place by parsing a 1-million-line file in ~1.8 seconds.

RecordParser is a Zero Allocation Writer/Reader Parser for .NET Core

  1. It supports .NET Core 2.1, 3.1, 5.0, 6.0 and .NET Standard 2.1
  2. It has minimal heap allocations because it makes intensive use of the Span type, a .NET type designed for high performance and reduced memory allocations (see benchmark)
  3. It is even more performant because the relevant code is generated using expression trees, which, once compiled, run almost as fast as handwritten code
  4. It supports parsing both classes and structs, without boxing
  5. It is flexible: you can choose the most convenient way to configure each of your parsers: indexed or sequential configuration
  6. It is extensible: you can fully customize your parsing with lambdas/delegates
  7. It is even more extensible because you can easily create extension methods that wrap custom mappings
  8. It is not intrusive: all mapping configuration is done outside of the mapped type, keeping your classes with minimal dependencies and low coupling
  9. It provides a clean API with familiar methods: Parse, TryParse and TryFormat (see the quick-start sketch after this list)
  10. It is easily configured with a builder object, even programmatically, because it does not require defining a class each time you want to define a parser
  11. It is compliant with the RFC 4180 standard
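
As a quick start, here is a minimal sketch of how a reader could be used to parse a whole delimited file line by line. The builder configuration mirrors the Variable Length Reader example further below; the file name and the RecordParser.Builders.Reader namespace are assumptions made for illustration.

using System;
using System.IO;
using RecordParser.Builders.Reader; // namespace assumed for illustration

// Build the reader once and reuse it for every line
var reader = new VariableLengthReaderBuilder<(string Name, DateTime Birthday, decimal Money)>()
    .Map(x => x.Name, indexColumn: 0)
    .Map(x => x.Birthday, 1)
    .Map(x => x.Money, 2)
    .Build(";");

// "people.csv" is a hypothetical file with one delimited record per line
foreach (var line in File.ReadLines("people.csv"))
{
    var (name, birthday, money) = reader.Parse(line);
    Console.WriteLine($"{name} was born on {birthday:yyyy-MM-dd} and has {money}");
}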

Benchmark

Libraries always claim to have great performance, but how often do they show a benchmark comparing themselves with other libraries? Check the benchmark page to see how RecordParser compares. If you miss a comparison, a PR is welcome.

Third Party Benchmarks

Besides the project's own benchmark page, RecordParser also appears in third-party benchmarks, such as "The fastest CSV parser in .NET" blog post mentioned above.

Currently there are parsers for 2 record formats:

  1. Fixed length, common in positional files, e.g. financial services, mainframe use, etc
  2. Variable length, common in delimited files, e.g. CSV, TSV files, etc
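
For reference, these are the kinds of raw records each format represents. The sample lines below are the ones used in the reader examples that follow; they are shown here only for illustration.

// Fixed length: each field occupies a fixed position and width within the line
var fixedLengthRecord = "foo bar baz 2020.05.23 0123.45";

// Variable length: fields are separated by a delimiter (";" in the examples below)
var delimitedRecord = "foo bar baz ; 2020.05.23 ; 0123.45; LightBlue";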

Fixed Length Reader

There are 2 flavors for mapping: indexed or sequential.

Indexed is useful when you want to map columns by their position: start/length.

[Fact]
public void Given_value_using_standard_format_should_parse_without_extra_configuration()
{
    var reader = new FixedLengthReaderBuilder<(string Name, DateTime Birthday, decimal Money)>()
        .Map(x => x.Name, startIndex: 0, length: 11)
        .Map(x => x.Birthday, 12, 10)
        .Map(x => x.Money, 23, 7)
        .Build();

    var result = reader.Parse("foo bar baz 2020.05.23 0123.45");

    result.Should().BeEquivalentTo((Name: "foo bar baz",
                                    Birthday: new DateTime(2020, 05, 23),
                                    Money: 123.45M));
}

Sequential is useful when you want to map columns by their order, so you just need to specify the lengths.

[Fact]
public void Given_value_using_standard_format_should_parse_without_extra_configuration()
{
    var reader = new FixedLengthReaderSequentialBuilder<(string Name, DateTime Birthday, decimal Money)>()
        .Map(x => x.Name, length: 11)
        .Skip(1)
        .Map(x => x.Birthday, 10)
        .Skip(1)
        .Map(x => x.Money, 7)
        .Build();

    var result = reader.Parse("foo bar baz 2020.05.23 0123.45");

    result.Should().BeEquivalentTo((Name: "foo bar baz",
                                    Birthday: new DateTime(2020, 05, 23),
                                    Money: 123.45M));
}

Variable Length Reader

There are 2 flavors for mapping: indexed or sequential.

Indexed is useful when you want to map columns by their indexes.

[Fact]
public void Given_value_using_standard_format_should_parse_without_extra_configuration()
{
    var reader = new VariableLengthReaderBuilder<(string Name, DateTime Birthday, decimal Money, Color Color)>()
        .Map(x => x.Name, indexColumn: 0)
        .Map(x => x.Birthday, 1)
        .Map(x => x.Money, 2)
        .Map(x => x.Color, 3)
        .Build(";");
  
    var result = reader.Parse("foo bar baz ; 2020.05.23 ; 0123.45; LightBlue");
  
    result.Should().BeEquivalentTo((Name: "foo bar baz",
                                    Birthday: new DateTime(2020, 05, 23),
                                    Money: 123.45M,
                                    Color: Color.LightBlue));
}

Sequential is useful when you want to map columns by their order.

[Fact]
public void Given_ignored_columns_and_value_using_standard_format_should_parse_without_extra_configuration()
{
    var reader = new VariableLengthReaderSequentialBuilder<(string Name, DateTime Birthday, decimal Money)>()
        .Map(x => x.Name)
        .Skip(1)
        .Map(x => x.Birthday)
        .Skip(2)
        .Map(x => x.Money)
        .Build(";");
  
    var result = reader.Parse("foo bar baz ; IGNORE; 2020.05.23 ; IGNORE ; IGNORE ; 0123.45");
  
    result.Should().BeEquivalentTo((Name: "foo bar baz",
                                    Birthday: new DateTime(2020, 05, 23),
                                    Money: 123.45M));
}

Default Type Convert - Reader

You can define a default converter for a type if you have a custom format.
The following example defines that all decimal values will be divided by 100 before being assigned,
and that all dates will be parsed in the ddMMyyyy format.
This feature is available for both fixed and variable length.

[Fact]
public void Given_types_with_custom_format_should_allow_define_default_parser_for_type()
{
    var reader = new FixedLengthReaderBuilder<(decimal Balance, DateTime Date, decimal Debit)>()
        .Map(x => x.Balance, 0, 12)
        .Map(x => x.Date, 13, 8)
        .Map(x => x.Debit, 22, 6)
        .DefaultTypeConvert(value => decimal.Parse(value) / 100)
        .DefaultTypeConvert(value => DateTime.ParseExact(value, "ddMMyyyy", null))
        .Build();

    var result = reader.Parse("012345678901 23052020 012345");

    result.Should().BeEquivalentTo((Balance: 0123456789.01M,
                                    Date: new DateTime(2020, 05, 23),
                                    Debit: 123.45M));
}

Custom Property Convert - Reader

You can define a custom converter per field/property.
Custom converters take priority when a default type converter is also defined.
This feature is available for both fixed and variable length.

[Fact]
public void Given_members_with_custom_format_should_use_custom_parser()
{
    var reader = new VariableLengthReaderBuilder<(int Age, int MotherAge, int FatherAge)>()
        .Map(x => x.Age, 0)
        .Map(x => x.MotherAge, 1, value => int.Parse(value) + 3)
        .Map(x => x.FatherAge, 2)
        .Build(";");

    var result = reader.Parse(" 15 ; 40 ; 50 ");

    result.Should().BeEquivalentTo((Age: 15,
                                    MotherAge: 43,
                                    FatherAge: 50));
}

Nested Properties Mapping - Reader

Just like regular properties, you can also configure nested property mappings.
Nested objects are created only if they are mapped, which avoids stack overflow problems.
This feature is available for both fixed and variable length.

[Fact]
public void Given_nested_mapped_property_should_create_nested_instance_to_parse()
{
    var reader = new VariableLengthReaderBuilder<Person>()
        .Map(x => x.BirthDay, 0)
        .Map(x => x.Name, 1)
        .Map(x => x.Mother.BirthDay, 2)
        .Map(x => x.Mother.Name, 3)
        .Build(";");

    var result = reader.Parse("2020.05.23 ; son name ; 1980.01.15 ; mother name");

    result.Should().BeEquivalentTo(new Person
    {
        BirthDay = new DateTime(2020, 05, 23),
        Name = "son name",
        Mother = new Person
        {
            BirthDay = new DateTime(1980, 01, 15),
            Name = "mother name",
        }
    });
}
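
The Person type itself is not shown in this README; a minimal sketch compatible with the mapping above could be:

using System;

public class Person
{
    public string Name { get; set; }
    public DateTime BirthDay { get; set; }
    public Person Mother { get; set; }
}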

Fixed Length Writer

There are 2 flavors for mapping: indexed or sequential.

Both indexed and sequential builders accept the following optional parameters in their Map methods:

  • format
  • padding direction
  • padding character

Indexed is useful when you want to map columns by their position: start/length.

[Fact]
public void Given_value_using_standard_format_should_parse_without_extra_configuration()
{
    // Arrange

    var writer = new FixedLengthWriterBuilder<(string Name, DateTime Birthday, decimal Money)>()
        .Map(x => x.Name, startIndex: 0, length: 12)
        .Map(x => x.Birthday, 12, 11, "yyyy.MM.dd", paddingChar: ' ')
        .Map(x => x.Money, 23, 7, precision: 2)
        .Build();

    var instance = (Name: "foo bar baz",
                    Birthday: new DateTime(2020, 05, 23),
                    Money: 01234.567M);

    // create buffer with 50 positions, all set to white space by default
    Span<char> destination = Enumerable.Repeat(element: ' ', count: 50).ToArray();

    // Act

    var success = writer.TryFormat(instance, destination, out var charsWritten);

    // Assert

    success.Should().BeTrue();

    var result = destination.Slice(0, charsWritten).ToString();

    result.Should().Be("foo bar baz 2020.05.23 0123456");
}

Sequential is useful when you want to map columns by their order, so you just need to specify the lengths.

[Fact]
public void Given_value_using_standard_format_should_parse_without_extra_configuration()
{
    // Arrange

    var writer = new FixedLengthWriterSequentialBuilder<(string Name, DateTime Birthday, decimal Money)>()
        .Map(x => x.Name, length: 11)
        .Skip(1)
        .Map(x => x.Birthday, 10, "yyyy.MM.dd")
        .Skip(1)
        .Map(x => x.Money, 7, precision: 2)
        .Build();

    var instance = (Name: "foo bar baz",
                    Birthday: new DateTime(2020, 05, 23),
                    Money: 01234.567M);

    // create buffer with 50 positions, all set to white space by default
    Span<char> destination = Enumerable.Repeat(element: ' ', count: 50).ToArray();

    // Act

    var success = writer.TryFormat(instance, destination, out var charsWritten);

    // Assert

    success.Should().BeTrue();

    var result = destination.Slice(0, charsWritten).ToString();

    result.Should().Be("foo bar baz 2020.05.23 0123456");
}

Variable Length Writer

There are 2 flavors for mapping: indexed or sequential.

Both indexed and sequential builders accept the optional format parameter in the Map method.

Indexed is useful when you want to map columns by their indexes.

[Fact]
public void Given_value_using_standard_format_should_parse_without_extra_configuration()
{
    // Arrange 

    var writer = new VariableLengthWriterBuilder<(string Name, DateTime Birthday, decimal Money, Color Color)>()
        .Map(x => x.Name, indexColumn: 0)
        .Map(x => x.Birthday, 1, "yyyy.MM.dd")
        .Map(x => x.Money, 2)
        .Map(x => x.Color, 3)
        .Build(" ; ");

    var instance = ("foo bar baz", new DateTime(2020, 05, 23), 0123.45M, Color.LightBlue);

    Span<char> destination = new char[100];

    // Act

    var success = writer.TryFormat(instance, destination, out var charsWritten);

    // Assert

    success.Should().BeTrue();

    var result = destination.Slice(0, charsWritten).ToString();

    result.Should().Be("foo bar baz ; 2020.05.23 ; 123.45 ; LightBlue");
}

Sequential is useful when you want to map columns by their order.

[Fact]
public void Given_value_using_standard_format_should_parse_without_extra_configuration()
{
    // Arrange 

    var writer = new VariableLengthWriterSequentialBuilder<(string Name, DateTime Birthday, decimal Money)>()
        .Map(x => x.Name)
        .Skip(1)
        .Map(x => x.Birthday, "yyyy.MM.dd")
        .Map(x => x.Money)
        .Build(" ; ");

    var instance = ("foo bar baz", new DateTime(2020, 05, 23), 0123.45M);

    Span<char> destination = new char[100];

    // Act

    var success = writer.TryFormat(instance, destination, out var charsWritten);

    // Assert

    success.Should().BeTrue();

    var result = destination.Slice(0, charsWritten).ToString();

    result.Should().Be("foo bar baz ;  ; 2020.05.23 ; 123.45");
}
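
To write many records, for example to a file, a single buffer can be reused and only the written slice copied out on each iteration. The following is a minimal sketch, not an API of the library: the records collection, the output file name and the RecordParser.Builders.Writer namespace are assumptions; the writer configuration mirrors the indexed example above.

using System;
using System.IO;
using RecordParser.Builders.Writer; // namespace assumed for illustration

var writer = new VariableLengthWriterBuilder<(string Name, DateTime Birthday, decimal Money)>()
    .Map(x => x.Name, indexColumn: 0)
    .Map(x => x.Birthday, 1, "yyyy.MM.dd")
    .Map(x => x.Money, 2)
    .Build(" ; ");

// hypothetical records to write
var records = new[]
{
    (Name: "foo bar baz", Birthday: new DateTime(2020, 05, 23), Money: 123.45M),
    (Name: "qux", Birthday: new DateTime(1999, 12, 31), Money: 9.99M),
};

Span<char> buffer = new char[100];

using var file = new StreamWriter("output.csv"); // hypothetical output file

foreach (var record in records)
{
    if (!writer.TryFormat(record, buffer, out var charsWritten))
        throw new InvalidOperationException("Buffer too small for record.");

    // TextWriter.WriteLine(ReadOnlySpan<char>) is available on .NET Core 2.1+
    file.WriteLine(buffer.Slice(0, charsWritten));
}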

Default Type Convert - Writer

You can define a default converter for a type if you have a custom format.
The following example defines that all decimal values will be multiplied by 100 before writing (precision 2),
and that all dates will be written in the ddMMyyyy format.
This feature is available for both fixed and variable length.

[Fact]
public void Given_types_with_custom_format_should_allow_define_default_parser_for_type()
{
    // Arrange

    var writer = new FixedLengthWriterBuilder<(decimal Balance, DateTime Date, decimal Debit)>()
        .Map(x => x.Balance, 0, 12, padding: Padding.Left, paddingChar: '0')
        .Map(x => x.Date, 13, 8)
        .Map(x => x.Debit, 22, 6, padding: Padding.Left, paddingChar: '0')
        .DefaultTypeConvert<decimal>((span, value) => (((long)(value * 100)).TryFormat(span, out var written), written))
        .DefaultTypeConvert<DateTime>((span, value) => (value.TryFormat(span, out var written, "ddMMyyyy"), written))
        .Build();

    var instance = (Balance: 123456789.01M,
                    Date: new DateTime(2020, 05, 23),
                    Debit: 123.45M);

    // create buffer with 50 positions, all set to white space by default
    Span<char> destination = Enumerable.Repeat(element: ' ', count: 50).ToArray();

    // Act

    var success = writer.TryFormat(instance, destination, out var charsWritten);

    // Assert

    success.Should().BeTrue();

    var result = destination.Slice(0, charsWritten).ToString();

    result.Should().Be("012345678901 23052020 012345");
}

Custom Property Convert - Writer

You can define a custom converter per field/property.
Custom converters take priority when a default type converter is also defined.
This feature is available for both fixed and variable length.

[Fact]
public void Given_specified_custom_parser_for_member_should_have_priority_over_custom_parser_for_type()
{
    // Arrange

    var writer = new VariableLengthWriterBuilder<(int Age, int MotherAge, int FatherAge)>()
        .Map(x => x.Age, 0)
        .Map(x => x.MotherAge, 1, (span, value) => ((value + 2).TryFormat(span, out var written), written))
        .Map(x => x.FatherAge, 2)
        .Build(" ; ");

    var instance = (Age: 15,
                    MotherAge: 40,
                    FatherAge: 50);

    Span<char> destination = new char[50];

    // Act

    var success = writer.TryFormat(instance, destination, out var charsWritten);

    // Assert

    success.Should().BeTrue();

    var result = destination.Slice(0, charsWritten).ToString();

    result.Should().Be("15 ; 42 ; 50");
}