
paurkedal / ppx_regexp

Licenses: LGPL-3.0 (COPYING.LESSER), GPL-3.0 (COPYING), and an unidentified license (COPYING.LINKING)
Matching Regular Expressions with OCaml Patterns




Two PPXes for Working with Regular Expressions

This repository provides two PPXes for regular-expression-based routing:

  • ppx_regexp maps to re with the conventional last-match extraction into string and string option.
  • ppx_tyre maps to Tyre providing typed extraction into options, lists, tuples, objects, and polymorphic variants.

Another difference is that ppx_regexp works directly on strings, essentially hiding the library calls, while ppx_tyre provides Tyre.t and Tyre.route values, which can be composed and applied using the Tyre library.

ppx_regexp - Regular Expression Matching with OCaml Patterns

This syntax extension turns

function%pcre
| {|re1|} -> e1
...
| {|reN|} -> eN
| _ -> e0

into suitable invocations of the Re library, and likewise for match%pcre. The patterns are plain strings of the form accepted by Re_pcre, with the following additions:

  • (?<var>...) defines a group and binds whatever it matches as var. The type of var will be string if the match is guaranteed given that the whole pattern matches, and string option if the variable is bound to or nested below an optionally matched group.

  • ?<var> at the start of a pattern binds group 0 as var : string. This may not be the full string if the pattern is unanchored.

A variable is allowed for the universal (catch-all) case and is bound to the matched string. A regular as alias is currently not allowed for patterns, since it is not obvious whether it should bind the full string or group 0.
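As a sketch of the binding rules above, the following assumes ppx_regexp is in the preprocessing pipeline and the re library is linked. The function name describe and the key/value pattern are illustrative only. Here key is bound as string because it takes part in every successful match, value sits below an optional group and is therefore bound as string option, and the catch-all variable other receives the string being matched.

```ocaml
(* Assumes preprocessing with ppx_regexp and linking with re
   (plus re.pcre on older versions of re). *)

(* Hypothetical helper: classify a "key" or "key=value" setting. *)
let describe s =
  match%pcre s with
  | {|^(?<key>[a-z]+)(=(?<value>[0-9]+))?$|} ->
      (* key : string, since it is part of every successful match;
         value : string option, since it is nested below an optional group. *)
      (match value with
       | Some v -> Printf.sprintf "%s set to %s" key v
       | None -> Printf.sprintf "%s unset" key)
  | other ->
      (* A variable in the catch-all case is bound to the string being matched. *)
      Printf.sprintf "unrecognized: %S" other
```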

Example

The following prints out times and hosts for SMTP connections to the Postfix daemon:

(* Link with re, re.pcre, lwt, lwt.unix.
   Preprocess with ppx_regexp.
   Adjust to your OS. *)

open Lwt.Infix

let check_line =
  (function%pcre
   | {|(?<t>.*:\d\d) .* postfix/smtpd\[[0-9]+\]: connect from (?<host>[a-z0-9.-]+)|} ->
      Lwt_io.printlf "%s %s" t host
   | _ ->
      Lwt.return_unit)

let () = Lwt_main.run begin
  Lwt_io.printl "SMTP connections from:" >>= fun () ->
  Lwt_stream.iter_s check_line (Lwt_io.lines_of_file "/var/log/syslog")
end

ppx_tyre - Syntax Support for Tyre Routes

Typed regular expressions

This PPX compiles

[%tyre {|re|}]

into 'a Tyre.t.

For instance, we can define a pattern that recognizes strings of the form "dim:3x5" like so:

# open Tyre ;;
# let dim = [%tyre "dim:(?&int)x(?&int)"] ;;
val dim : (int * int) Tyre.t

The syntax (?&id) lets you call a typed regular expression named id of type 'a Tyre.t, such as Tyre.int.
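Any value of type 'a Tyre.t in scope can be referenced this way, not only the ones Tyre provides. A minimal sketch, assuming ppx_tyre and the tyre library are set up; point and segment are hypothetical names, and the expected types in the comments follow the same rule as dim above, where each (?&…) reference contributes one component of a tuple.

```ocaml
(* Assumes preprocessing with ppx_tyre and linking with tyre. *)
open Tyre

(* A hypothetical "x,y" pair of integers.
   Expected type: (int * int) Tyre.t *)
let point = [%tyre "(?&int),(?&int)"]

(* Reuses the user-defined point twice via (?&point).
   Expected type: ((int * int) * (int * int)) Tyre.t *)
let segment = [%tyre "(?&point) to (?&point)"]
```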

For convenience, you can also use named capture groups to name the captured elements.

# let dim = [%tyre "dim:(?<x>(?&int))x(?&y:int)"] ;;
val dim : < x : int; y : int > Tyre.t

Names given using the syntax (?<foo>re) will be used for the fields of the results. (?&y:int) is a shortcut for (?<y>(?&int)). This can also be used for alternatives, for instance:

# let id_or_name = [%tyre "id:(?&id:int)|name:(?<name>[[:alnum:]]+)"] ;;
val id_or_name : [ `id of int | `name of string ] Tyre.t

Expressions of type Tyre.t can then be composed as part of bigger regular expressions, or compiled with Tyre.compile. See tyre's documentation for details.
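To apply such an expression, it is compiled once with Tyre.compile and then run with Tyre.exec, which returns a result. The sketch below assumes ppx_tyre and the tyre library; parse_dim is a hypothetical helper that turns the result into an option.

```ocaml
(* Assumes preprocessing with ppx_tyre and linking with tyre. *)
open Tyre

let dim = [%tyre "dim:(?&int)x(?&int)"]

(* Compile once and reuse the compiled value for every input. *)
let dim_re = Tyre.compile dim

(* Hypothetical helper: Some (w, h) on a match, None otherwise. *)
let parse_dim s =
  match Tyre.exec dim_re s with
  | Ok (w, h) -> Some (w, h)
  | Error _ -> None
```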

Routes

ppx_tyre can also be used for routing, in the style of ppx_regexp:

    function%tyre
    | {|re1|} -> e1
    ...
    | {|reN|} -> eN

is turned into a 'a Tyre.route, where re1, ..., reN are regular expressions using the same syntax as above. A case "re" as v is treated like (?<v>re), and "re1" | "re2" is turned into a regular expression alternative.

Once routes are defined, matching is done with Tyre.exec.
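The routing that function%tyre sets up can be approximated by hand with Tyre's own route combinators, which shows how the pieces fit together. This sketch builds the routes explicitly with Tyre's (-->) operator and Tyre.route instead of the PPX case syntax, so it is not the literal expansion; dim, item and router are hypothetical names.

```ocaml
(* Assumes preprocessing with ppx_tyre and linking with tyre. *)
open Tyre

let dim = [%tyre "dim:(?&int)x(?&int)"]  (* (int * int) Tyre.t *)
let item = [%tyre "item:(?&int)"]        (* int Tyre.t *)

(* Each route pairs a typed regex with a handler for its extracted value;
   Tyre.route combines the routes into one compiled matcher. *)
let router =
  Tyre.route [
    (dim --> fun (w, h) -> Printf.sprintf "%d by %d" w h);
    (item --> fun n -> Printf.sprintf "item %d" n);
  ]
```

Matching is then a single Tyre.exec call on router, which yields the handler's result for whichever route matched.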

Details

The syntax follows Perl's syntax:

  • re? extracts an option of what re extracts.
  • re+, re*, re{n,m} extracts a list of what re extracts.
  • (?&qname) refers to any identifier bound to a typed regular expression of type 'a Tyre.t.
  • Normal parens are non-capturing.
  • There are two ways to capture:
    • Anonymous capture (+re)
    • Named capture (?<v>re)
  • One or more (?<v>re) at the top level can be used to bind variables instead of as ....
  • One or more (?<v>re) in a sequence extracts an object where each method v is bound to what re extracts.
  • An alternative with one (?<v>re) per branch extracts a polymorphic variant where each constructor `v receives what re extracts as its argument.
  • (?&v:qname) is a shortcut for (?<v>(?&qname)).
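The repetition and option rules above can be sketched as follows, assuming ppx_tyre and the tyre library. The names ints and version are hypothetical, and the types in the comments are the ones these rules predict rather than output copied from a compiler session.

```ocaml
(* Assumes preprocessing with ppx_tyre and linking with tyre. *)
open Tyre

(* An int followed by zero or more ",int" groups. The normal parens do not
   capture, and * turns what they extract into a list.
   Expected type: (int * int list) Tyre.t *)
let ints = [%tyre "(?&int)(,(?&int))*"]

(* "vMAJOR" or "vMAJOR.MINOR"; ? turns the minor part into an option.
   Expected type: (int * int option) Tyre.t *)
let version = [%tyre {|v(?&int)(\.(?&int))?|}]
```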

Limitations

No Pattern Guards

Pattern guards are not supported. This is because all match cases are combined into a single regular expression, so if one of the patterns succeeds, the match is committed before the guard condition can be checked.
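A common workaround is to move the condition into the branch body. The sketch below assumes ppx_regexp and the re library; classify is a hypothetical function. Because the match is committed once the number pattern succeeds, the body has to handle both outcomes of the condition itself.

```ocaml
(* Assumes preprocessing with ppx_regexp and linking with re. *)

(* A guard such as [when int_of_string n > 100] is not available, so the
   condition is tested inside the branch instead. *)
let classify s =
  match%pcre s with
  | {|^(?<n>[0-9]+)$|} ->
      (* int_of_string raises Failure for numbers too large for int *)
      if int_of_string n > 100 then "large" else "small"
  | _ -> "not a number"
```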

No Exhaustiveness Check

The syntax extension will always warn if no catch-all case is provided. No exhaustiveness check is attempted. Doing it right would require reimplementing full regular expression parsing and an algorithm which would ideally produce a counter-example.

Bug Reports

The processor is still new and not well tested. Please break it and file bug reports in the GitHub issue tracker. Any exception raised by generated code, except for Match_failure, is a bug.
