
red0124 / Ssp

Licence: mit
C++ CSV parser

Projects that are alternatives of or similar to Ssp

Docto
Simple command line utility for converting .doc & .xls files to any supported format such as Text, RTF, CSV or PDF
Stars: ✭ 220 (+633.33%)
Mutual labels:  csv, conversion
Length.js
📏 JavaScript library for length units conversion.
Stars: ✭ 292 (+873.33%)
Mutual labels:  parser, conversion
Pxi
🧚 pxi (pixie) is a small, fast, and magical command-line data processor similar to jq, mlr, and awk.
Stars: ✭ 248 (+726.67%)
Mutual labels:  csv, parser
Csv Parser
Fast, header-only, extensively tested, C++11 CSV parser
Stars: ✭ 90 (+200%)
Mutual labels:  csv, parser
Stream Parser
⚡ PHP7 / Laravel Multi-format Streaming Parser
Stars: ✭ 391 (+1203.33%)
Mutual labels:  csv, parser
Omniparser
omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
Stars: ✭ 148 (+393.33%)
Mutual labels:  csv, parser
Node Csv
Full featured CSV parser with simple api and tested against large datasets.
Stars: ✭ 3,068 (+10126.67%)
Mutual labels:  csv, parser
Dataclass factory
Modern way to convert python dataclasses or other objects to and from more common types like dicts or json-like structures
Stars: ✭ 116 (+286.67%)
Mutual labels:  deserialization, conversion
Choetl
ETL Framework for .NET / c# (Parser / Writer for CSV, Flat, Xml, JSON, Key-Value, Parquet, Yaml, Avro formatted files)
Stars: ✭ 372 (+1140%)
Mutual labels:  csv, parser
Csv Parser
A modern C++ library for reading, writing, and analyzing CSV (and similar) files.
Stars: ✭ 359 (+1096.67%)
Mutual labels:  csv, parser
Csvparser
C++ parser for CSV file format
Stars: ✭ 65 (+116.67%)
Mutual labels:  csv, parser
Structured Text Tools
A list of command line tools for manipulating structured text data
Stars: ✭ 6,180 (+20500%)
Mutual labels:  csv, conversion
Java
jsoniter (json-iterator) is fast and flexible JSON parser available in Java and Go
Stars: ✭ 1,308 (+4260%)
Mutual labels:  parser, deserialization
Node Csv Stringify
CSV stringifier implementing the Node.js `stream.Transform` API
Stars: ✭ 179 (+496.67%)
Mutual labels:  csv, parser
Go
A high-performance 100% compatible drop-in replacement of "encoding/json"
Stars: ✭ 10,248 (+34060%)
Mutual labels:  parser, deserialization
Sqlparser
Simple SQL parser meant for querying CSV files
Stars: ✭ 249 (+730%)
Mutual labels:  csv, parser
Gotenberg
A Docker-powered stateless API for PDF files.
Stars: ✭ 3,272 (+10806.67%)
Mutual labels:  conversion, csv
Js Quantities
JavaScript library for quantity calculation and unit conversion
Stars: ✭ 335 (+1016.67%)
Mutual labels:  parser, conversion
Swiftcsv
CSV parser for Swift
Stars: ✭ 511 (+1603.33%)
Mutual labels:  csv, parser
Node Csv Parse
CSV parsing implementing the Node.js `stream.Transform` API
Stars: ✭ 768 (+2460%)
Mutual labels:  csv, parser
   __________ ____ 
  / ___/ ___// __ \
  \__ \\__ \/ /_/ /
 ___/ /__/ / ____/ 
/____/____/_/      

A header-only CSV parser that is fast and versatile, with a modern C++ API. It requires a compiler with C++17 support. It can also be used to convert strings to specific types.

Conversion of floating-point values is done using fast-float.
Function traits are taken from qt-creator.

Example

Let's say we have a CSV file containing students in the format '$name,$age,$grade' and we want to parse and print all the valid rows:

$ cat students.csv
James Bailey,65,2.5
Brian S. Wolfe,40,1.9
Nathan Fielder,37,Really good grades
Bill (Heath) Gates,65,3.3

#include <iostream>
#include <ss/parser.hpp>

int main() {
    ss::parser p{"students.csv", ","};

    for(auto& [name, age, grade] : p.iterate<std::string, int, double>()) {
        if (p.valid()) {
            std::cout << name << ' ' << age << ' ' << grade << std::endl;
        }
    }

    return 0;
}

And if we compile and execute the program we get the following output:

$ ./a.out
James Bailey 65 2.5
Brian S. Wolfe 40 1.9
Bill (Heath) Gates 65 3.3

Features

Installation

$ git clone https://github.com/red0124/ssp
$ cd ssp
$ cmake .
$ sudo make install

Note, this will also install the fast_float library.
The library supports the CMake and Meson build systems.

Usage

Conversions

An alternate loop to the example above would look like:

while(!p.eof()) {
    auto [name, age, grade] = p.get_next<std::string, int, double>();
    if (p.valid()) {
        std::cout << name << ' ' << age << ' ' << grade << std::endl;
    }
}

The alternate example will be used to show some of the features of the library. The get_next method returns a tuple of objects specified inside the template type list.

If a conversion cannot be applied, the method returns a tuple of default-constructed objects and the valid method returns false. For example, the conversion fails if the third (grade) column in our CSV cannot be converted to a double.
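The failure behaviour described above can be pictured with a small standalone sketch. It does not use the library, and the parser's internals may well differ; `to_number`, `convert_student` and `student_row` are hypothetical names introduced only for illustration. The sketch shows an all-or-nothing row conversion: if any column fails, the caller gets a default-constructed tuple and a false validity flag.

```cpp
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

using student_row = std::tuple<std::string, int, double>;

// Hypothetical helper: parses the whole string as T, or reports failure.
template <typename T>
bool to_number(const std::string& text, T& out) {
    std::istringstream in(text);
    in >> std::noskipws >> out;
    // Success only if a value was read and no characters are left over.
    return !in.fail() && in.peek() == std::char_traits<char>::eof();
}

// Hypothetical helper: returns {row, valid}.
// On any failed column the row is default constructed and valid is false.
std::pair<student_row, bool> convert_student(const std::string& name,
                                             const std::string& age,
                                             const std::string& grade) {
    int a = 0;
    double g = 0.0;
    if (!to_number(age, a) || !to_number(grade, g)) {
        return {student_row{}, false};
    }
    return {student_row{name, a, g}, true};
}
```

Under these assumptions, converting the columns "Nathan Fielder", "37", "Really good grades" yields an empty name, 0, 0.0 and a false flag, which is why the example program skips that row.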

If get_next is called with a tuple as its template parameter, it behaves identically to passing the tuple's element types to get_next directly:

using student = std::tuple<std::string, int, double>;

// returns std::tuple<std::string, int, double>
auto [name, age, grade] = p.get_next<student>();

Note, it does not always return a student tuple, since the returned tuple's parameters may be altered as explained below (no void, no restrictions, ...).

Whole objects can be returned using the get_object function which takes the tuple, created in a similar way as get_next does it, and creates an object out of it:

struct student {
    std::string name;
    int age;
    double grade;
};
// returns student
// (the variable is named s so that it does not hide the student type)
auto s = p.get_object<student, std::string, int, double>();

This works with any object if the constructor could be invoked using the template arguments given to get_object:

// returns std::vector<std::string> containing 3 elements
auto vec = p.get_object<std::vector<std::string>, std::string, std::string, 
                        std::string>();

An iteration loop as in the first example which returns objects would look like:

for(auto& s : p.iterate_object<student, std::string, int, double>()) {
    // ...
}

And finally, something I personally like to do: a struct (class) with a tied method which returns a tuple of references to the members of the struct.

struct student {
    std::string name;
    int age;
    double grade;

    auto tied() { return std::tie(name, age, grade); }
};

The method can be used to compare the object, serialize it, deserialize it, etc. Now get_next can accept such a struct and deduce the types to which to convert the csv.

// returns student
auto s = p.get_next<student>();

This works with the iteration loop too. Note, the order in which the members of the tied method are returned must match the order of the elements in the csv.
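The remark that a tied method can also be used to compare an object can be sketched without the library at all. The `const` overload of `tied` and the two operators are additions made for this sketch, not something the README defines.

```cpp
#include <string>
#include <tuple>

struct student {
    std::string name;
    int age;
    double grade;

    // Tuple of references to the members, as in the text above.
    auto tied() { return std::tie(name, age, grade); }
    // Assumption for this sketch: a const overload so const objects can be compared.
    auto tied() const { return std::tie(name, age, grade); }
};

// Member-wise equality, delegated to the tuple comparison.
inline bool operator==(const student& a, const student& b) {
    return a.tied() == b.tied();
}

// Lexicographic ordering: name first, then age, then grade.
inline bool operator<(const student& a, const student& b) {
    return a.tied() < b.tied();
}
```

Because `tied()` returns references, assigning a tuple to it (for example `s.tied() = std::make_tuple(std::string("Bill"), 40, 1.9);`) writes straight into the members, which is the same mechanism that makes deserialization through a tied method convenient.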

Setup

By default, many of the features supported by the parser are disabled. They can be enabled within the template parameters of the parser. For example, to enable quoting and escaping the parser would look like:

ss::parser<ss::quote<'"'>, ss::escape<'\\'>> p0{file_name};

The order of the defined setup parameters is not important:

// equivalent to p0
ss::parser<ss::escape<'\\'>, ss::quote<'"'>> p1{file_name};

The setup can also be predefined:

using my_setup = ss::setup<ss::escape<'\\'>, ss::quote<'"'>>;
// equivalent to p0 and p1
ss::parser<my_setup> p2{file_name};

Invalid setups will be met with static_asserts. Note, each setup parameter defined comes with a slight performance loss, so use them only if needed.

Quoting

Quoting can be enabled by defining ss::quote within the setup parameters. A single character can be defined as the quoting character, for example to use " as a quoting character:

ss::parser<ss::quote<'"'>> p{file_name};

Double quote can be used to escape a quote inside a quoted row.

"James ""Bailey""" -> 'James "Bailey"'

Unterminated quotes result in an error (if multiline is not enabled).

"James Bailey,65,2.5 -> error
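The two quoting rules above (a doubled quote stands for one literal quote, and an unterminated quote is an error) can be sketched in a few lines of standalone C++. `unquote` is a hypothetical helper that handles a single, fully quoted field; it is not the library's implementation.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: strips the surrounding quotes from one quoted field
// and collapses doubled quotes. Returns std::nullopt when the field is not a
// well-formed quoted field (for example, when the quote is never closed).
std::optional<std::string> unquote(std::string_view field, char quote = '"') {
    if (field.size() < 2 || field.front() != quote || field.back() != quote) {
        return std::nullopt;
    }
    std::string out;
    // Walk the content between the outer quotes.
    for (std::size_t i = 1; i + 1 < field.size(); ++i) {
        if (field[i] == quote) {
            // A quote inside the field must be followed by a second quote
            // that is not the closing quote itself.
            if (i + 2 >= field.size() || field[i + 1] != quote) {
                return std::nullopt;
            }
            ++i;  // skip to the second quote of the pair and keep only that one
        }
        out.push_back(field[i]);
    }
    return out;
}
```

With this sketch, the field `"James ""Bailey"""` becomes `James "Bailey"`, while `"James Bailey,65,2.5` is rejected because the closing quote is missing.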

Escaping

Escaping can be enabled by defining ss::escape within the setup parameters. Multiple characters can be defined as escaping characters. Escaping simply removes any special meaning of the character following the escape character; anything can be escaped. For example, to use \ as an escaping character:

ss::parser<ss::escape<'\\'>> p{file_name};

Double escape can be used to escape an escape.

James \\Bailey -> 'James \Bailey'

Unterminated escapes result in an error.

James Bailey,65,2.5\\0 -> error

Its usage has more impact when used with quoting or spacing:

"James \"Bailey\"" -> 'James "Bailey"'
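A minimal sketch of the escaping rules above, assuming a single escape character and ignoring delimiters and quotes entirely. `unescape` is a hypothetical helper written for illustration; the library's splitter handles escaping together with the other setup parameters.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: removes escape characters and keeps the character
// that follows each of them verbatim. Returns std::nullopt when the input
// ends with an escape character that has nothing left to escape.
std::optional<std::string> unescape(std::string_view field, char escape = '\\') {
    std::string out;
    for (std::size_t i = 0; i < field.size(); ++i) {
        if (field[i] == escape) {
            if (i + 1 == field.size()) {
                return std::nullopt;  // unterminated escape
            }
            ++i;  // drop the escape, keep the next character as-is
        }
        out.push_back(field[i]);
    }
    return out;
}
```

Here `James \\Bailey` turns into `James \Bailey`, and a field ending in a lone `\` is reported as an error.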

Spacing

Spacing can be enabled by defining ss::trim, ss::trim_left or ss::trim_right within the setup parameters. Multiple characters can be defined as spacing characters; for example, to use ' ' as a spacing character, ss::trim<' '> needs to be defined. It removes the spacing characters from both sides of each column. To trim only the right side ss::trim_right can be used, and ss::trim_left to trim only the left side. If ss::trim is enabled, the following lines produce equivalent output:

James Bailey,65,2.5
  James Bailey  ,65,2.5
James Bailey,  65,    2.5   

Escaping and quoting can be used to leave the space if needed.

" James Bailey " -> ' James Bailey '
  " James Bailey "   -> ' James Bailey '
\ James Bailey\  -> ' James Bailey '
  \ James Bailey\    -> ' James Bailey '
"\ James Bailey\ " -> ' James Bailey '
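The three trimming modes can be pictured with `std::string_view`, which can drop characters from either end without copying. The functions below are a standalone sketch with hypothetical names that mirror the setup parameters; they operate on one already-split column and know nothing about quotes or escapes.

```cpp
#include <algorithm>
#include <string_view>

// Hypothetical helper: drops leading characters that appear in `spacing`.
std::string_view trim_left(std::string_view field, std::string_view spacing = " ") {
    // find_first_not_of returns npos for an all-spacing field; clamp to size.
    field.remove_prefix(std::min(field.find_first_not_of(spacing), field.size()));
    return field;
}

// Hypothetical helper: drops trailing characters that appear in `spacing`.
std::string_view trim_right(std::string_view field, std::string_view spacing = " ") {
    const auto last = field.find_last_not_of(spacing);
    return last == std::string_view::npos ? std::string_view{}
                                          : field.substr(0, last + 1);
}

// Hypothetical helper: drops spacing characters from both sides.
std::string_view trim(std::string_view field, std::string_view spacing = " ") {
    return trim_left(trim_right(field, spacing), spacing);
}
```

Passing `" \t"` as the second argument corresponds to defining both the space and the tab as spacing characters.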

Multiline

Multiline can be enabled by defining ss::multiline within the setup parameters. It makes it possible to have new line characters within rows. The new line character needs to be either escaped or within quotes, so either ss::escape or ss::quote needs to be enabled.

There is a specific problem when using multiline: if a row has an unterminated quote, the parser assumes the new line belongs to the row and keeps reading until another quote is found. This is usually fine, but it can cause the whole CSV file to be treated as a single line by mistake. To prevent this, ss::multiline_restricted can be used; it accepts an unsigned number representing the maximum number of lines allowed in a single multiline row. Examples:

ss::parser<ss::multiline, ss::quote<'\"'>, ss::escape<'\\'>> p{file_name};

"James\n\n\nBailey" -> 'James\n\n\nBailey'
James\\n\\n\\nBailey -> 'James\n\n\nBailey'
"James\n\n\n\n\nBailey" -> 'James\n\n\n\n\nBailey'

ss::parser<ss::multiline_restricted<4>, ss::quote<'\"'>, ss::escape<'\\'>> p{file_name};

"James\n\n\nBailey" -> 'James\n\n\nBailey'
James\\n\\n\\nBailey -> 'James\n\n\nBailey'
"James\n\n\n\n\nBailey" -> error
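To see why a line limit helps, here is a rough standalone sketch of a multiline reader for the quoted case only. It joins physical lines while a quote is still open and gives up once a limit is reached. `read_record` is hypothetical, escapes are ignored, and the exact way the library counts lines against the limit is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical helper: reads one logical row from `in`. Lines are joined
// while the number of quote characters seen so far is odd, i.e. while a
// quoted field is still open. Returns std::nullopt if the row is not closed
// within `max_lines` physical lines or the input ends first.
std::optional<std::string> read_record(std::istream& in, std::size_t max_lines,
                                       char quote = '"') {
    std::string record;
    std::string line;
    std::size_t quotes = 0;
    for (std::size_t lines = 0; lines < max_lines && std::getline(in, line); ++lines) {
        if (lines > 0) {
            record += '\n';  // restore the new line that getline removed
        }
        record += line;
        quotes += static_cast<std::size_t>(std::count(line.begin(), line.end(), quote));
        if (quotes % 2 == 0) {
            return record;  // every opened quote has been closed
        }
    }
    return std::nullopt;
}
```

Fed from a `std::istringstream`, a quoted field containing three new lines fits within a limit of 4, while one containing five new lines does not, so a single stray quote can no longer swallow the rest of the file.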

Example

An example with a more complicated setup:

ss::parser<ss::escape<'\\'>, 
           ss::quote<'"'>,
           ss::trim<' ', '\t'>,
           ss::multiline_restricted<5>> p{file_name};

while(!p.eof()) {
    auto [name, age, grade] = p.get_next<std::string, int, double>();
    if(!p.valid()) {
        continue;
    }
    std::cout << "'" << name << ' ' << age << ' ' << grade << "'" << std::endl;
}

input:

      "James Bailey"   ,  65  ,     2.5\t\t\t
\t \t Brian S. Wolfe, "40" ,  "\1.9"
   "\"Nathan Fielder"""   ,  37  ,   Really good grades
"Bill
\"Heath""
Gates",65,   3.3

output:

'James Bailey 65 2.5'
'Brian S. Wolfe 40 1.9'
'Bill
"Heath"
Gates 65 3.3'

Special types

Passing void makes the parser ignore a column. In the given example, void could be given as the second template parameter to ignore the second (age) column in the CSV; a tuple of only 2 elements would be returned:

// returns std::tuple<std::string, double>
auto [name, grade] = p.get_next<std::string, void, double>();

Works with different types of conversions too:

using student = std::tuple<std::string, void, double>;

// returns std::tuple<std::string, double>
auto [name, grade] = p.get_next<student>();

To ignore a whole row, ignore_next can be used; it returns false if the end of the file was reached:

bool parser::ignore_next();

std::optional could be passed if we wanted the conversion to proceed in the case of a failure returning std::nullopt for the specified column:

// returns std::tuple<std::string, int, std::optional<double>>
auto [name, age, grade] = p.get_next<std::string, int, std::optional<double>>();
if(grade) {
    // do something with grade
}

Similar to std::optional, std::variant could be used to try other conversions if the previous failed (Note, conversion to std::string will always pass):

// returns std::tuple<std::string, int, std::variant<double, char>>
auto [name, age, grade] = 
    p.get_next<std::string, int, std::variant<double, char>>();
if(std::holds_alternative<double>(grade)) {
    // grade set as double
} else if(std::holds_alternative<char>(grade)) {
    // grade set as char
}
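The fallback order of a std::variant column can be illustrated with a standalone sketch that tries each alternative in turn. `to_double`, `to_char` and `to_grade` are hypothetical helpers; the library performs its own conversions (fast-float for floating-point values), so this only shows the control flow.

```cpp
#include <cstdlib>
#include <optional>
#include <string>
#include <variant>

// Hypothetical helper: whole-string conversion to double using strtod.
// Note that strtod accepts leading whitespace.
std::optional<double> to_double(const std::string& text) {
    if (text.empty()) {
        return std::nullopt;
    }
    char* end = nullptr;
    const double value = std::strtod(text.c_str(), &end);
    // Reject the input unless every character was consumed.
    if (end != text.c_str() + text.size()) {
        return std::nullopt;
    }
    return value;
}

// Hypothetical helper: accepts exactly one character.
std::optional<char> to_char(const std::string& text) {
    if (text.size() != 1) {
        return std::nullopt;
    }
    return text[0];
}

// Tries double first and char second, in the order of the variant's
// alternatives. An empty optional means that every alternative failed.
std::optional<std::variant<double, char>> to_grade(const std::string& text) {
    if (auto d = to_double(text)) {
        return std::variant<double, char>{*d};
    }
    if (auto c = to_char(text)) {
        return std::variant<double, char>{*c};
    }
    return std::nullopt;
}
```

The order matters: a column such as 7 converts as a double and never reaches the char alternative.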

Restrictions

Restrictions can be used to reject unwanted values during conversion. ss::ir (in range) and ss::ne (not empty) are two of those:

// ss::ne makes sure that the name is not empty
// ss::ir makes sure that the grade will be in range [0, 10]
// returns std::tuple<std::string, int, double>
auto [name, age, grade] = 
    p.get_next<ss::ne<std::string>, int, ss::ir<double, 0, 10>>();

If the restrictions are not met, the conversion will fail. Other predefined restrictions include ss::ax (all except), ss::nx (none except), ss::oor (out of range) and ss::lt (less than); see restrictions.hpp for the rest:

// all ints except 10 and 20
ss::ax<int, 10, 20>
// only 10 and 20
ss::nx<int, 10, 20>
// all values except the range [0, 10]
ss::oor<int, 0, 10>

To define a restriction, create a class/struct with an ss_valid method which accepts one object and returns a bool. The type of the conversion will be the type of the object passed to ss_valid, not the restriction itself. Optionally, an error method can be added to describe the invalid conversion.

template <typename T>
struct even {
    bool ss_valid(const T& value) const {
        return value % 2 == 0;
    }

    // optional
    const char* error() const {
        return "number not even";
    }
};
// only even numbers will pass
// returns std::tuple<std::string, int>
auto [name, age] = p.get_next<std::string, even<int>, void>();

Custom conversions

Custom types can be used when converting values. A specialization of the ss::extract function needs to be made and you are good to go. A custom conversion for an enum would look like this:

enum class shape { circle, square, rectangle, triangle };

template <>
inline bool ss::extract(const char* begin, const char* end, shape& dst) {
    const static std::unordered_map<std::string, shape>
        shapes{{"circle", shape::circle},
               {"square", shape::square},
               {"rectangle", shape::rectangle},
               {"triangle", shape::triangle}};

    if (auto it = shapes.find(std::string(begin, end)); it != shapes.end()) {
        dst = it->second;
        return true;
    }
    return false;
}

The shape enum will be used in an example below. The inline is there just to prevent multiple definition errors. The function returns true if the conversion was a success, and false otherwise. The function uses const char* begin and end for performance reasons.

Error handling

Detailed error messages can be accessed via the error_msg method, and to enable them ss::string_error needs to be included in the setup. If ss::string_error is not defined, the error_msg method will not be defined either.

const std::string& parser::error_msg();
bool parser::valid();
bool parser::eof();

// ...
ss::parser<ss::string_error> parser;

An error can be detected using the valid method which would return false if the file could not be opened, or if the conversion could not be made (invalid types, invalid number of columns, ...). The eof method can be used to detect if the end of the file was reached.

Substitute conversions

The parser can also be used to effectively parse files whose rows are not always in the same format (not a classical csv but still csv-like). A more complicated example would be the best way to demonstrate such a scenario.

Supposing we have a file containing different shapes in given formats:

  • circle RADIUS
  • square SIDE
  • rectangle SIDE_A SIDE_B
  • triangle SIDE_A SIDE_B SIDE_C

rectangle 2 3
circle 10
triangle 3 4 5
...

The delimiter is " ", and the number of columns varies depending on which shape it is. We are required to read the file and to store information (shape and area) of the shapes into a data structure in the same order as they are in the file.

ss::parser<ss::string_error> p{"shapes.txt", " "};
if (!p.valid()) {
    std::cout << p.error_msg() << std::endl;
    exit(EXIT_FAILURE);
}

std::vector<std::pair<shape, double>> shapes;

while (!p.eof()) {
    // non negative double
    using udbl = ss::gte<double, 0>;

    auto [circle_or_square, rectangle, triangle] =
        p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>()
            .or_else<ss::nx<shape, shape::rectangle>, udbl, udbl>()
            .or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>()
            .values();

    if (circle_or_square) {
        auto& [s, x] = circle_or_square.value();
        double area = (s == shape::circle) ? x * x * M_PI : x * x;
        shapes.emplace_back(s, area);
    }

    if (rectangle) {
        auto& [s, a, b] = rectangle.value();
        shapes.emplace_back(s, a * b);
    }

    if (triangle) {
        auto& [s, a, b, c] = triangle.value();
        double sh = (a + b + c) / 2;
        if (sh >= a && sh >= b && sh >= c) {
            double area = sqrt(sh * (sh - a) * (sh - b) * (sh - c));
            shapes.emplace_back(s, area);
        }
    }
}

/* do something with the stored shapes */
/* ... */

It is quite hard to make an error this way since most things will be checked at compile time.
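The triangle branch of the loop above relies on Heron's formula, and the check `sh >= a && sh >= b && sh >= c` is the triangle inequality in disguise: with `sh = (a + b + c) / 2`, the condition `sh >= a` is equivalent to `b + c >= a`. A standalone sketch of that step, with `triangle_area` as a hypothetical helper name:

```cpp
#include <cmath>
#include <optional>

// Heron's formula. Returns std::nullopt when a side is negative or the
// sides cannot form a triangle, so the square root never sees a negative
// argument.
std::optional<double> triangle_area(double a, double b, double c) {
    const double sh = (a + b + c) / 2;  // semi-perimeter
    if (a < 0 || b < 0 || c < 0 || sh < a || sh < b || sh < c) {
        return std::nullopt;
    }
    return std::sqrt(sh * (sh - a) * (sh - b) * (sh - c));
}
```

For the `triangle 3 4 5` row the semi-perimeter is 6 and the area is sqrt(6 * 3 * 2 * 1) = 6.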

The try_next method works in a similar way to get_next but returns a composite which holds a tuple with an optional to the tuple returned by get_next. This composite has an or_else method (looks a bit like tl::expected) which is able to try additional conversions if the previous one failed. or_else also returns a composite, but its tuple holds the optionals of the previous conversions and an optional to the tuple of the new conversion (it sounds more complicated than it is).

To fetch the tuple from the composite the values method is used. The value of the above used conversion would look something like this:

std::tuple<
    std::optional<std::tuple<shape, double>>,
    std::optional<std::tuple<shape, double, double>>,
    std::optional<std::tuple<shape, double, double, double>>
    >

Similar to the way that get_next has a get_object alternative, try_next has a try_object alternative, and or_else has an or_object alternative. All rules that apply to get_next also apply to try_next, or_else, and all the other composite conversions.

Each of those composite conversions can accept a lambda (or anything callable) as an argument and invoke it in case of a valid conversion. That lambda itself need not have any arguments, but if it does, it must either accept the whole tuple/object as one argument or all the elements of the tuple separately. If the lambda returns something that can be interpreted as false the conversion will fail, and the next conversion will try to apply. Rewriting the whole while loop using lambdas would look like this:

// non negative double
using udbl = ss::gte<double, 0>;

p.try_next<ss::nx<shape, shape::circle, shape::square>, udbl>(
     [&](const auto& data) {
         const auto& [s, x] = data;
         double area = (s == shape::circle) ? x * x * M_PI : x * x;
         shapes.emplace_back(s, area);
     })
    .or_else<ss::nx<shape, shape::rectangle>, udbl, udbl>(
        [&](const shape s, const double a, const double b) {
            shapes.emplace_back(s, a * b);
        })
    .or_else<ss::nx<shape, shape::triangle>, udbl, udbl, udbl>(
        [&](auto&& s, auto& a, const double& b, double& c) {
            double sh = (a + b + c) / 2;
            if (sh >= a && sh >= b && sh >= c) {
                double area = sqrt(sh * (sh - a) * (sh - b) * (sh - c));
                shapes.emplace_back(s, area);
            }
        });

It is a bit less readable, but it removes the need to check which conversion was invoked. The composite also has an on_error method which accepts a lambda which will be invoked if no previous conversions were successful. The lambda can take no arguments or just one argument, an std::string, in which the error message is stored if string_error is enabled:

p.try_next<int>()
    .on_error([](const std::string& e) { /* int conversion failed */ })
    .or_object<x, double>()
    .on_error([] { /* int and x (all) conversions failed */ });

See unit tests for more examples.

Rest of the library

First of all, type_traits.hpp and function_traits.hpp contain many handy traits used in the parser. Most of them are operating on tuples of elements and can be utilized in projects.

The converter

ss::parser is used to operate on files. It has a built-in file reader, but the conversions themselves are done using the ss::converter.

To convert a string, the convert method can be used. It accepts a C-string as input and a delimiter, as std::string, and returns a tuple of objects in the same way get_next does for the parser. A whole object can be returned too using the convert_object method, again in the same way get_object does for the parser.

ss::converter c;

auto [x, y, z] = c.convert<int, double, char>("10::2.2::3", "::");
if (c.valid()) {
    // do something with x y z
}

auto s = c.convert_object<student, std::string, int, double>("name,20,10", ",");
if (c.valid()) {
    // do something with s
}

All setup parameters, special types and restrictions work on the converter too.
Error handling is also identical to error handling of the parser.

The converter can also just split a line without converting it. When neither quoting nor escaping is enabled, the line is not modified (it is split, kind of, statically, hence the name of the library). When either quoting or escaping is enabled, the converter may modify the line in place rather than create a copy, for performance reasons (so the name of the library does not apply anymore; I may change it). The split method returns an std::vector of pairs of pointers, begin and end, each pair representing a split segment (column) of the whole string. The vector can then be passed to an overloaded convert method. This allows the same line to be reused without splitting it on every conversion.

ss::converter c;
auto split_line = c.split("circle 10", " ");
auto [s, r] = c.convert<shape, int>(split_line);
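The idea of splitting without copying or modifying the line can be sketched in standalone C++: each column is described by a pair of pointers into the original buffer. The `segment` alias and this free function `split` are hypothetical and cover only the plain case (no quoting, escaping or trimming); the converter's own split is what the example above uses.

```cpp
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// [begin, end) pointers into the original line.
using segment = std::pair<const char*, const char*>;

// Hypothetical non-modifying splitter: the returned pointers refer to `line`,
// so `line` must outlive the result.
std::vector<segment> split(const char* line, const std::string& delim) {
    std::vector<segment> out;
    const char* begin = line;
    if (delim.empty()) {
        // Nothing to split on: the whole line is one segment.
        out.emplace_back(begin, begin + std::strlen(begin));
        return out;
    }
    // Each occurrence of the delimiter closes the current segment.
    while (const char* hit = std::strstr(begin, delim.c_str())) {
        out.emplace_back(begin, hit);
        begin = hit + delim.size();
    }
    out.emplace_back(begin, begin + std::strlen(begin));  // last segment
    return out;
}
```

A column can be materialized only when needed, for example with `std::string(seg.first, seg.second)`, which is what makes reusing one split for several conversion attempts cheap.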

Using the converter is also an easy and fast way to convert single values.

ss::converter c;
std::string s;
std::cin >> s;
int num = c.convert<int>(s.c_str());

The same setup parameters also apply to the converter, though multiline has no impact on it. Since escaping and quoting potentially modify the content of the given line, a converter which has those setup parameters defined does not have the same convert method: the input line cannot be const.

Using as a project dependency

CMake

If the repository is cloned within the CMake project, it can be added in the following way:

add_subdirectory(ssp)

Alternatively, it can be fetched from the repository:

include(FetchContent)
FetchContent_Declare(
  ssp
  GIT_REPOSITORY https://github.com/red0124/ssp.git
  GIT_TAG origin/master
  GIT_SHALLOW TRUE)

FetchContent_MakeAvailable(ssp)

Either way, once the target is available, you just have to link it in your project:

target_link_libraries(project PUBLIC ssp fast_float)

Meson

Create an ssp.wrap file in your subprojects directory with the following content:

[wrap-git]
url = https://github.com/red0124/ssp
revision = origin/master

Then simply fetch the dependency and it is ready to be used:

ssp_sub = subproject('ssp')
ssp_dep = ssp_sub.get_variable('ssp_dep')