All Projects → telekons → one-more-re-nightmare

telekons / one-more-re-nightmare

Licence: BSD-2-Clause license
A fast regular expression compiler in Common Lisp

Programming Languages

common lisp
692 projects

Projects that are alternatives of or similar to one-more-re-nightmare

Regex
A pure Swift NFA implementation of a regular expression engine
Stars: ✭ 27 (-74.04%)
Mutual labels:  regex, regular-expression-engine
moar
Deterministic Regular Expressions with Backreferences
Stars: ✭ 19 (-81.73%)
Mutual labels:  regex, regular-expression-engine
dregex
Dregex is a JVM library that implements a regular expression engine using deterministic finite automata (DFA). It supports some Perl-style features and yet retains linear matching time, and also offers set operations.
Stars: ✭ 37 (-64.42%)
Mutual labels:  regex, regular-expression-engine
loadkit
Java 资源加载器,充分拓展ClassLoader#getResources(name)的能力,实现递归加载,支持普通风格 / 包名风格 / ANT风格 / 正则风格路径的资源加载同时支持自定义过滤器,通常作为框架的基础类库。
Stars: ✭ 39 (-62.5%)
Mutual labels:  regex
logwatch
日志采集工具
Stars: ✭ 22 (-78.85%)
Mutual labels:  regex
replace
Generic file search & replace tool, written in Python 3
Stars: ✭ 28 (-73.08%)
Mutual labels:  regex
lc-data-intro
Library Carpentry: Introduction to Working with Data (Regular Expressions)
Stars: ✭ 16 (-84.62%)
Mutual labels:  regex
django-redirects
↪️ ✅ redirects as they should be, with full control.
Stars: ✭ 32 (-69.23%)
Mutual labels:  regex
greptile
Fast grep implementation in python, with recursive search and replace
Stars: ✭ 17 (-83.65%)
Mutual labels:  regex
antk
Redkato, - Indonesian anime scraper
Stars: ✭ 14 (-86.54%)
Mutual labels:  regex
js-diacritic-regex
Creates the inverse of transliterated string to a regex. What? Basically, diacritic insensitiveness
Stars: ✭ 20 (-80.77%)
Mutual labels:  regex
java-core
Collections of solutions for micro-tasks created while building modules as part of project. Also has very fun stuffs :)
Stars: ✭ 35 (-66.35%)
Mutual labels:  regex
globrex
Glob to regular expression with support for extended globs.
Stars: ✭ 52 (-50%)
Mutual labels:  regex
ocaml-re-nfa
OCaml code to construct an NFA from a regular expression
Stars: ✭ 44 (-57.69%)
Mutual labels:  regex
VBA-JSON-parser
Backus-Naur Form JSON Parser based on RegEx for VBA
Stars: ✭ 75 (-27.88%)
Mutual labels:  regex
expand-brackets
Expand POSIX bracket expressions (character classes) in glob patterns.
Stars: ✭ 26 (-75%)
Mutual labels:  regex
PastaBean
Python Script to Scrape Pastebin with Regex
Stars: ✭ 0 (-100%)
Mutual labels:  regex
git-search-replace
A utility on top of Git for project-wide search-and-replace that includes filenames too
Stars: ✭ 42 (-59.62%)
Mutual labels:  regex
Socially
Socially is a textView which is able to create separate clickable views according to your requirements.
Stars: ✭ 28 (-73.08%)
Mutual labels:  regex
unmatcher
Regular expressions reverser for Python
Stars: ✭ 26 (-75%)
Mutual labels:  regex

one-more-re-nightmare

one-more-re-nightmare is a regular expression engine that uses the technique presented in Regular-expression derivatives revisited to interpret and compile regular expressions. We use a few tricks to make matching quite fast:

  • We use a deterministic finite automaton to have O(n) runtime.
  • We run the Common Lisp compiler to generate machine code, rather than interpreting a DFA or bytecode, or jumping through closures (like CL-PPCRE does).
  • We generate specialised code for each array type, so everything is inlined.
  • If you use the one-more-re-nightmare-simd system on SBCL 2.1.10 or newer with AVX2, we even use vectorised scanning of constant prefixes of regular expressions.

Thanks to Gilbert Baumann for suggesting I use derivatives to compile regular expressions, and then for informing me of how to handle submatching properly, and my discrete mathematics teachers for formally introducing me to finite state machines.

Please see the reference book for how to use one-more-re-nightmare, or an article on the history and theory involved.

While the syntax is admittedly wonky (but somewhat more like how regular expressions are presented in papers), one-more-re-nightmare makes its best effort to implement POSIX semantics for matching (as described in the specification for how regcomp works and regular expression definitions). Any behaviour contrary to POSIX is a bug.

A lousy benchmark

CL-USER> (let ((s (make-string 1000000 :initial-element #\a)))
           (setf (aref s 333333) #\b)
           (setf (aref s 555555) #\c)
           (the-cost-of-nothing:bench
            (all-string-matches "ab|ac" s)))

CL-USER> (let ((s (make-string 1000000 :initial-element #\a)))
           (setf (aref s 333333) #\b)
           (setf (aref s 555555) #\c)
           (the-cost-of-nothing:bench
            (cl-ppcre:all-matches-as-strings "ab|ac" s)))

Note that, by nature of calling the Common Lisp compiler, one-more-re-nightmare will take longer to compile a regular expression, so it is better suited for many matching operations with few expressions. It does cache compiled expressions when using the high-level interface, so the initial cost may amortize well over many calls; and constant regular expression strings are compiled at compile-time, with no runtime overhead whatsoever.

engine SBCL Clozure CL SBCL with AVX2 ditto, SIMPLE-BASE-STRING
o-m-r-n 0.57ms 2.93ms 0.18ms 55µs
compilation time 4.65ms 3.76ms 6.82ms 6.43ms
cl-ppcre 22.8ms 45.3ms 22.8ms 21.6ms
break even after 209kchars 88.7kchars 301kchars 305kchars
Note that the project description data, including the texts, logos, images, and/or trademarks, for each open source project belongs to its rightful owner. If you wish to add or remove any projects, please contact us at [email protected].