jaynetics / js_regex

Licence: MIT license
Converts Ruby regexes to JavaScript regexes.

JsRegex

This is a Ruby gem that translates Ruby's regular expressions to the JavaScript flavor.

It can handle far more of Ruby's regex capabilities than a search-and-replace approach, and if any incompatibilities remain, it returns helpful warnings to indicate them.

This means you'll have better chances of translating your regexes, and if there is still a problem, at least you'll know.

Installation

Add it to your Gemfile or run

gem install js_regex

Usage

In Ruby:

require 'js_regex'

ruby_hex_regex = /0x\h+/i

js_regex = JsRegex.new(ruby_hex_regex)

js_regex.warnings # => []
js_regex.source # => '0x[0-9A-Fa-f]+'
js_regex.options # => 'i'

An options: argument lets you force options:

JsRegex.new(/x/i, options: 'g').to_h
# => {source: 'x', options: 'gi'}

Set the g flag like this if you want to use the regex to find or replace multiple matches per string.

To inject the result directly into JavaScript, use #to_s or String interpolation. E.g. in inline JavaScript in HAML or SLIM you can simply do:

var regExp = #{js_regex};

Use #to_json if you want to send it as JSON or #to_h to include it as a data attribute of a DOM element.

render json: js_regex

js_regex.to_h # => {source: '0x[0-9A-Fa-f]+', options: 'i'}

To turn the data attribute or parsed JSON back into a regex in JavaScript, use the new RegExp() constructor:

var regExp = new RegExp(jsonObj.source, jsonObj.options);
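The data-attribute route can be sketched in plain Ruby as follows. This is only an illustration: REGEX_DATA stands in for a hash like the one js_regex.to_h returns, and regex_data_attribute and the data-regex attribute name are hypothetical, not part of the gem.

```ruby
require 'erb'
require 'json'

# Stand-in for a hash like the one returned by js_regex.to_h (assumption:
# same keys, values taken from the hex example above).
REGEX_DATA = { source: '0x[0-9A-Fa-f]+', options: 'i' }.freeze

# Hypothetical helper: serializes the hash to JSON and HTML-escapes it so it
# can be placed inside a double-quoted attribute value.
def regex_data_attribute(regex_data)
  escaped_json = ERB::Util.html_escape(JSON.generate(regex_data))
  %(data-regex="#{escaped_json}")
end

puts regex_data_attribute(REGEX_DATA)
```

On the JavaScript side, the attribute value can then be read with JSON.parse(element.dataset.regex) and passed to new RegExp() as shown above.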

Heed the Warnings

You might have noticed the empty warnings array in the example above:

js_regex = JsRegex.new(ruby_hex_regex)
js_regex.warnings # => []

If this array isn't empty, your Ruby regex contains features that can't be carried over to JavaScript. You can still use the result, but this is not recommended, because it most likely won't match the same strings as your Ruby regex.

advanced_ruby_regex = /(?<!fizz)buzz/

js_regex = JsRegex.new(advanced_ruby_regex)
js_regex.warnings # => ["Dropped unsupported negative lookbehind assertion '(?<!fizz)' at index 0"]
js_regex.source # => 'buzz'

There is also a strict initializer, JsRegex::new!, which raises a JsRegex::Error if there are incompatibilities. This is particularly useful if you use JsRegex to convert regex-like strings, e.g. strings entered by users, as a JsRegex::Error might also occur if the given regex is invalid:

begin
  user_input = '('
  JsRegex.new(user_input)
rescue JsRegex::Error => e
  e.message # => "Premature end of pattern (missing group closing parenthesis)"
end

Supported Features

In addition to the conversions supported by a plain search-and-replace approach, this gem correctly handles the following features:

Description                     Example
escaped meta chars              \\A
dot matching astral chars       /./ =~ '😋'
Ruby's multiline mode [1]       /.+/m
Ruby's free-spacing mode        / http (s?) /x
atomic groups [2]               a(?>bc|b)c
conditionals [2]                (?(1)b), (?('a')b|c)
option groups/switches          (?i-m:..), (?x)..
local encoding options          (?u:\w)
absence groups                  /\*(?~\*/)\*/
possessive quantifiers [2]      ++, *+, ?+
chained quantifiers             /A{4}{6}/ =~ 'A' * 24
hex types \h and \H             \H\h{6}
bell and escape shortcuts       \a, \e
all literals, including \n      eval("/\n/")
newline-ready anchor \Z         last word\Z
generic linebreak \R            data.split(/\R/)
meta and control escapes        /\M-\C-X/
numeric backreferences          \1, \k<1>
relative backreferences         \k<-1>
named backreferences            \k<foo>
numeric subexpression calls     \g<1>
relative subexpression calls    \g<-1>
named subexpression calls       \g<foo>
nested sets                     [a-z[A-Z]]
types in sets                   [a-z\h]
properties in sets              [a-z\p{sc}]
set intersections               [\w&&[^a]]
recursive set negation          [^a[^b]]
posix types                     [[:alpha:]]
posix negations                 [[:^alpha:]]
codepoint lists                 \u{61 63 1F601}
unicode properties              \p{Arabic}, \p{Dash}
unicode abbreviations           \p{Mong}, \p{Sc}
unicode negations               \p{^L}, \P{L}, \P{^L}
astral plane properties [2]     \p{emoji}
astral plane literals [2]       😁
astral plane ranges [2]         [😁-😲]

[1] Keep in mind that Ruby's multiline mode is more of a "dot-all mode" and totally different from JavaScript's multiline mode.

[2] See the "How it Works" section below for information about how this is achieved.
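The difference described in footnote [1] can be checked in plain Ruby. The following is a small illustration using only core Ruby, with made-up constant names. It does not show how JsRegex translates the flag.

```ruby
MULTILINE_TEXT = "a\nb"

DOT_DEFAULT = /a.b/   # "." does not match "\n" by default
DOT_M_FLAG  = /a.b/m  # Ruby's m flag lets "." match "\n" as well
LINE_ANCHOR = /^b/    # Ruby's ^ matches at every line start, with or without m

puts DOT_DEFAULT.match?(MULTILINE_TEXT) # false
puts DOT_M_FLAG.match?(MULTILINE_TEXT)  # true
puts LINE_ANCHOR =~ MULTILINE_TEXT      # 2
```

In JavaScript, by contrast, the m flag only changes how ^ and $ behave and has no effect on the dot.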

Unsupported Features

Currently, the following features can't be carried over to JavaScript. If you try to convert a regex that uses them, the corresponding parts of the pattern will be dropped from the result.

In most of these cases that will lead to a warning, but changes that are not considered risky happen without warning. E.g. comments are removed silently because that won't lead to any operational differences between the Ruby and JavaScript regexes.

Description                      Example              Warning
lookbehind                       (?<=, (?<!, \K       yes
whole pattern recursion          \g<0>                yes
backref by recursion level       \k<1+1>              yes
previous match anchor            \G                   yes
extended grapheme type           \X                   yes
variable length absence groups   (?~(a+|bar))         yes
working word boundary anchors    \b, \B               yes [3]
capturing group names            (?<a>, (?'a'         no
comment groups                   (?#comment)          no
inline comments (in x-mode)      /[a-z] # comment/x   no

[3] \b and \B are carried over, but generate a warning because they only recognize ASCII word chars in JavaScript. This holds true for all JavaScript versions and RegExp modes.

How it Works

JsRegex uses the gem regexp_parser to parse a Ruby Regexp.

It traverses the AST returned by regexp_parser depth-first, and converts it to its own tree of equivalent JavaScript RegExp tokens, marking some nodes for treatment in a second pass.

The second pass then carries out all modifications that require knowledge of the complete tree.

After the second pass, JsRegex flat-maps the final tree into a new source string.

Many Regexp tokens work in JavaScript just as they do in Ruby, or allow for a straightforward replacement, but some conversions are a little more involved.

Atomic groups and possessive quantifiers are missing in JavaScript, so the only way to emulate their behavior is by substituting them with backreferenced lookahead groups.
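Since Ruby supports both atomic groups and backreferenced lookaheads, the equivalence can be tried out directly in Ruby. This is a minimal sketch: emulate_atomic is a hypothetical helper written for illustration, not the gem's actual conversion code.

```ruby
# Hypothetical helper, not part of JsRegex: wraps `inner` in a capturing
# lookahead and then consumes the captured text via a backreference.
# `group_number` must be the number the new capturing group gets in the
# final pattern.
def emulate_atomic(inner, group_number)
  "(?=(#{inner}))\\#{group_number}"
end

ATOMIC_NATIVE   = /a(?>bc|b)c/
ATOMIC_EMULATED = Regexp.new("a#{emulate_atomic('bc|b', 1)}c")
NON_ATOMIC      = /a(?:bc|b)c/

POSSESSIVE_NATIVE   = /a++a/
POSSESSIVE_EMULATED = Regexp.new("#{emulate_atomic('a+', 1)}a")

puts ATOMIC_EMULATED.source          # a(?=(bc|b))\1c
puts NON_ATOMIC.match?('abc')        # true, backtracks from "bc" to "b"
puts ATOMIC_NATIVE.match?('abc')     # false, no backtracking into the group
puts ATOMIC_EMULATED.match?('abc')   # false, same as the atomic group
```

The trick relies on lookaheads being atomic themselves: once the lookahead has matched, the engine does not retry other alternatives inside it, and the backreference then consumes exactly what was captured.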

Astral plane characters convert to ranges of surrogate pairs, so they don't require ES6.
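The surrogate pair arithmetic is the standard UTF-16 calculation and can be sketched as follows. The method names are made up for this example and are not taken from the gem.

```ruby
# Splits an astral plane codepoint (U+10000..U+10FFFF) into its UTF-16
# high and low surrogate code units.
def surrogate_pair(codepoint)
  unless codepoint.between?(0x10000, 0x10FFFF)
    raise ArgumentError, 'not an astral plane codepoint'
  end

  offset = codepoint - 0x10000
  [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]
end

# Hypothetical helper: renders a single astral character as a pair of
# JavaScript \uXXXX escapes.
def js_surrogate_escape(char)
  format('\\u%04X\\u%04X', *surrogate_pair(char.ord))
end

puts js_surrogate_escape("\u{1F601}") # \uD83D\uDE01 (the "😁" literal)
```

A range such as [😁-😲], whose endpoints share the high surrogate, could then be written as \uD83D[\uDE01-\uDE32]. Ranges that span several high surrogates need an alternation of such pieces.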

Properties and posix classes expand to equivalent character sets, or surrogate pair alternations if necessary. The gem regexp_property_values helps by reading out their codepoints from Onigmo.

Character sets a.k.a. bracket expressions offer many more features in Ruby compared to JavaScript. To work around this, JsRegex calls on the gem character_set to calculate the matched codepoints of the whole set and build a completely new set string for all except the most simple cases.
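The idea of computing the matched codepoints and building a new set from them can be illustrated without the character_set gem. The sketch below brute-forces the ASCII range only, which is an assumption made to keep the example short, and all method names are hypothetical.

```ruby
# Collects all codepoints in `range` that a single-character regex matches.
def matched_codepoints(set_regex, range = 0..0x7F)
  range.select { |cp| set_regex.match?([cp].pack('U')) }
end

# Compresses a sorted list of codepoints into consecutive ranges.
def to_ranges(codepoints)
  codepoints.slice_when { |a, b| b != a + 1 }.map { |run| run.first..run.last }
end

# Builds a plain bracket expression using only \xNN escapes and ranges.
def js_set_source(ranges)
  body = ranges.map do |range|
    from, to = [range.first, range.last].map { |cp| format('\\x%02X', cp) }
    range.size == 1 ? from : "#{from}-#{to}"
  end
  "[#{body.join}]"
end

RUBY_SET     = /[\w&&[^a]]/ # set intersection, not available in classic JavaScript sets
JS_STYLE_SET = js_set_source(to_ranges(matched_codepoints(RUBY_SET)))

puts JS_STYLE_SET # [\x30-\x39\x41-\x5A\x5F\x62-\x7A]
```

The resulting string contains no nesting or intersection, only escapes and ranges, which is the kind of set that both flavors understand.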

Conditionals expand to equivalent expressions in the second pass, e.g. (<)?foo(?(1)>) expands to (?:<foo>|foo) (simplified example).
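Because Ruby understands both forms, the simplified expansion above can be compared against the original conditional in Ruby itself. The constant names are made up for this check.

```ruby
CONDITIONAL          = /(<)?foo(?(1)>)/ # Ruby-only conditional syntax
EXPANDED_CONDITIONAL = /(?:<foo>|foo)/  # simplified expansion from the text

%w[<foo> foo <foo foo> bar].each do |input|
  original = input[CONDITIONAL].inspect
  expanded = input[EXPANDED_CONDITIONAL].inspect
  puts "#{input.inspect}: #{original} vs. #{expanded}"
end
```

Both patterns match the same substrings here. As the text notes, the expansion is simplified: it drops the capturing group around "<", so any capture numbering would differ.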

Subexpression calls are replaced with the conversion result of their target, e.g. (.{3})\g<1> expands to (.{3})(.{3}).

The tricky bit here is that these expressions may be nested, and that their expansions may increase the capturing group count. This means that any following backreferences need an update. E.g. (.{3})\g<1>(.)\2 (which matches strings like "FooBarXX") converts to (.{3})(.{3})(.)\3.
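The renumbering example can be verified in Ruby, which accepts both the subexpression call and its expanded form. This is only a check of the equivalence stated above, with made-up constant names, and not a reimplementation of the expansion logic.

```ruby
WITH_CALL     = /(.{3})\g<1>(.)\2/   # \2 refers to the "(.)" group
EXPANDED_CALL = /(.{3})(.{3})(.)\3/  # "(.)" is now group 3, so \2 became \3

%w[FooBarXX FooBarXY abcdefgg].each do |input|
  puts "#{input}: #{WITH_CALL.match?(input)} / #{EXPANDED_CALL.match?(input)}"
end
```

Leaving the backreference as \2 in the expanded pattern would make it point at the copied (.{3}) group instead, which is why the update is needed.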

Contributions

Feel free to send suggestions, point out issues, or submit pull requests.

Outlook

Possible future improvements might include an "ES6 mode" using the u flag, which would allow for more concise representations of astral plane properties and sets.

As far as supported conversions are concerned, this gem is pretty much feature-complete. Most of the unsupported features listed above are either impossible or impractical to replicate in JavaScript.
