All Projects → J-F-Liu → Pom

J-F-Liu / Pom

Licence: mit
PEG parser combinators using operator overloading without macros.

Programming Languages

rust
11053 projects

Projects that are alternatives of or similar to Pom

ParsecSharp
The faster monadic parser combinator library for C#
Stars: ✭ 23 (-92.58%)
Mutual labels:  parsing, parser-combinators, peg
Pegtl
Parsing Expression Grammar Template Library
Stars: ✭ 1,295 (+317.74%)
Mutual labels:  parser-combinators, peg, parsing
Lug
Parsing expression grammar (PEG) embedded domain specific language and parsing machine for C++17
Stars: ✭ 44 (-85.81%)
Mutual labels:  parser-combinators, peg, parsing
metal
A Java library for parsing binary data formats, using declarative descriptions.
Stars: ✭ 13 (-95.81%)
Mutual labels:  parsing, parser-combinators
Syntax
Write value-driven parsers quickly in Swift with an intuitive SwiftUI-like DSL
Stars: ✭ 134 (-56.77%)
Mutual labels:  parsing, parser-combinators
parson
Yet another PEG parser combinator library and DSL
Stars: ✭ 52 (-83.23%)
Mutual labels:  parsing, peg
Ramble
A R parser based on combinatory parsers.
Stars: ✭ 19 (-93.87%)
Mutual labels:  parsing, parser-combinators
parser-lang
A parser combinator library with declarative superpowers
Stars: ✭ 25 (-91.94%)
Mutual labels:  parsing, parser-combinators
arborist
Arborist is a PEG parser that supports left-associative left recursion
Stars: ✭ 17 (-94.52%)
Mutual labels:  parsing, peg
loquat
Monadic parser combinators for JavaScript / TypeScript
Stars: ✭ 47 (-84.84%)
Mutual labels:  parsing, parser-combinators
cppcombinator
parser combinator and AST generator in c++17
Stars: ✭ 20 (-93.55%)
Mutual labels:  parsing, peg
ohm-editor
An IDE for the Ohm language (JavaScript edition)
Stars: ✭ 78 (-74.84%)
Mutual labels:  parsing, peg
autumn
A Java parser combinator library written with an unmatched feature set.
Stars: ✭ 112 (-63.87%)
Mutual labels:  parsing, parser-combinators
Funcparserlib
Recursive descent parsing library for Python based on functional combinators
Stars: ✭ 250 (-19.35%)
Mutual labels:  parser-combinators, parsing
pyrser
A PEG Parsing Tool
Stars: ✭ 32 (-89.68%)
Mutual labels:  parsing, peg
Combine
A parser combinator library for Elixir projects
Stars: ✭ 174 (-43.87%)
Mutual labels:  parser-combinators, parsing
Parjs
JavaScript parser-combinator library
Stars: ✭ 145 (-53.23%)
Mutual labels:  parser-combinators, parsing
Parsing With Haskell Parser Combinators
🔍 A step-by-step guide to parsing using Haskell parser combinators.
Stars: ✭ 72 (-76.77%)
Mutual labels:  parser-combinators, parsing
latex2unicode
Convert LaTeX markup to Unicode (in Scala and Java)
Stars: ✭ 28 (-90.97%)
Mutual labels:  parsing, peg
pe
Fastest general-purpose parsing library for Python with a familiar API
Stars: ✭ 21 (-93.23%)
Mutual labels:  parsing, peg

pom

Crates.io Build Status Docs Discord

PEG parser combinators created using operator overloading without macros.

Document

What is PEG?

PEG stands for parsing expression grammar, is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. Unlike CFGs, PEGs cannot be ambiguous; if a string parses, it has exactly one valid parse tree. Each parsing function conceptually takes an input string as its argument, and yields one of the following results:

  • success, in which the function may optionally move forward or consume one or more characters of the input string supplied to it, or
  • failure, in which case no input is consumed.

Read more on Wikipedia.

What is parser combinator?

A parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing.

Parsers built using combinators are straightforward to construct, readable, modular, well-structured and easily maintainable. With operator overloading, a parser combinator can take the form of an infix operator, used to glue different parsers to form a complete rule. Parser combinators thereby enable parsers to be defined in an embedded style, in code which is similar in structure to the rules of the formal grammar. And the code is easier to debug than macros.

The main advantage is that you don't need to go through any kind of code generation step, you're always using the vanilla language underneath. Aside from build issues (and the usual issues around error messages and debuggability, which in fairness are about as bad with macros as with code generation), it's usually easier to freely intermix grammar expressions and plain code.

List of predefined parsers and combinators

Basic Parsers Description
empty() Always succeeds, consume no input.
end() Match end of input.
any() Match any symbol and return the symbol.
sym(t) Match a single terminal symbol t.
seq(s) Match sequence of symbols.
list(p,s) Match list of p, separated by s.
one_of(set) Success when current input symbol is one of the set.
none_of(set) Success when current input symbol is none of the set.
is_a(predicate) Success when predicate return true on current input symbol.
not_a(predicate) Success when predicate return false on current input symbol.
take(n) Read n symbols.
skip(n) Skip n symbols.
call(pf) Call a parser factory, can be used to create recursive parsers.
Parser Combinators Description
p | q Match p or q, return result of the first success.
p + q Match p and q, if both succeed return a pair of results.
p - q Match p and q, if both succeed return result of p.
p * q Match p and q, if both succeed return result of q.
p >> q Parse p and get result P, then parse q and return result of q(P).
-p Success when p succeeds, doesn't consume input.
!p Success when p fails, doesn't consume input.
p.opt() Make parser optional. Returns an Option.
p.repeat(m..n) p.repeat(0..) repeat p zero or more times
p.repeat(1..) repeat p one or more times
p.repeat(1..4) match p at least 1 and at most 3 times
p.repeat(5) repeat p exactly 5 times
p.map(f) Convert parser result to desired value.
p.convert(f) Convert parser result to desired value, fails in case of conversion error.
p.pos() Get input position after matching p.
p.collect() Collect all matched input symbols.
p.discard() Discard parser output.
p.name(_) Give parser a name to identify parsing errors.
p.expect(_) Mark parser as expected, abort early when failed in ordered choice.

The choice of operators is established by their operator precedence, arity and "meaning". Use * to ignore the result of first operand on the start of an expression, + and - can fulfill the need on the rest of the expression.

For example, A * B * C - D + E - F will return the results of C and E as a pair.

Example code

extern crate pom;
use pom::parser::*;

let input = b"abcde";
let parser = sym(b'a') * none_of(b"AB") - sym(b'c') + seq(b"de");
let output = parser.parse(input);
assert_eq!(output, Ok( (b'b', vec![b'd', b'e']) ) );

Example JSON parser

extern crate pom;
use pom::parser::*;
use pom::Parser;

use std::collections::HashMap;
use std::str::{self, FromStr};

#[derive(Debug, PartialEq)]
pub enum JsonValue {
	Null,
	Bool(bool),
	Str(String),
	Num(f64),
	Array(Vec<JsonValue>),
	Object(HashMap<String,JsonValue>)
}

fn space() -> Parser<u8, ()> {
	one_of(b" \t\r\n").repeat(0..).discard()
}

fn number() -> Parser<u8, f64> {
	let integer = one_of(b"123456789") - one_of(b"0123456789").repeat(0..) | sym(b'0');
	let frac = sym(b'.') + one_of(b"0123456789").repeat(1..);
	let exp = one_of(b"eE") + one_of(b"+-").opt() + one_of(b"0123456789").repeat(1..);
	let number = sym(b'-').opt() + integer + frac.opt() + exp.opt();
	number.collect().convert(str::from_utf8).convert(|s|f64::from_str(&s))
}

fn string() -> Parser<u8, String> {
	let special_char = sym(b'\\') | sym(b'/') | sym(b'"')
		| sym(b'b').map(|_|b'\x08') | sym(b'f').map(|_|b'\x0C')
		| sym(b'n').map(|_|b'\n') | sym(b'r').map(|_|b'\r') | sym(b't').map(|_|b'\t');
	let escape_sequence = sym(b'\\') * special_char;
	let string = sym(b'"') * (none_of(b"\\\"") | escape_sequence).repeat(0..) - sym(b'"');
	string.convert(String::from_utf8)
}

fn array() -> Parser<u8, Vec<JsonValue>> {
	let elems = list(call(value), sym(b',') * space());
	sym(b'[') * space() * elems - sym(b']')
}

fn object() -> Parser<u8, HashMap<String, JsonValue>> {
	let member = string() - space() - sym(b':') - space() + call(value);
	let members = list(member, sym(b',') * space());
	let obj = sym(b'{') * space() * members - sym(b'}');
	obj.map(|members|members.into_iter().collect::<HashMap<_,_>>())
}

fn value() -> Parser<u8, JsonValue> {
	( seq(b"null").map(|_|JsonValue::Null)
	| seq(b"true").map(|_|JsonValue::Bool(true))
	| seq(b"false").map(|_|JsonValue::Bool(false))
	| number().map(|num|JsonValue::Num(num))
	| string().map(|text|JsonValue::Str(text))
	| array().map(|arr|JsonValue::Array(arr))
	| object().map(|obj|JsonValue::Object(obj))
	) - space()
}

pub fn json() -> Parser<u8, JsonValue> {
	space() * value() - end()
}

fn main() {
	let input = br#"
	{
        "Image": {
            "Width":  800,
            "Height": 600,
            "Title":  "View from 15th Floor",
            "Thumbnail": {
                "Url":    "http://www.example.com/image/481989943",
                "Height": 125,
                "Width":  100
            },
            "Animated" : false,
            "IDs": [116, 943, 234, 38793]
        }
    }"#;

	println!("{:?}", json().parse(input));
}

You can run this example with the following command:

cargo run --example json

Benchmark

Parser Time to parse the same JSON file
pom: json_byte 621,319 ns/iter (+/- 20,318)
pom: json_char 627,110 ns/iter (+/- 11,463)
pest: json_char 13,359 ns/iter (+/- 811)

Lifetimes and files

String literals have a static lifetime so they can work with the static version of Parser imported from pom::Parser. Input read from a file has a shorter lifetime. In this case you should import pom::parser::Parser and declare lifetimes on your parser functions. So

fn space() -> Parser<u8, ()> {
    one_of(b" \t\r\n").repeat(0..).discard()
}

would become

fn space<'a>() -> Parser<'a, u8, ()> {
    one_of(b" \t\r\n").repeat(0..).discard()
}
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].