fent / Ret.js

Licence: MIT

Tokenizes a string that represents a regular expression.

Regular Expression Tokenizer

Tokenizes strings that represent regular expressions.

Usage

const ret = require('ret');

let tokens = ret(/foo|bar/.source);

tokens will contain the following object:

{
  "type": ret.types.ROOT,
  "options": [
    [ { "type": ret.types.CHAR, "value": 102 },
      { "type": ret.types.CHAR, "value": 111 },
      { "type": ret.types.CHAR, "value": 111 } ],
    [ { "type": ret.types.CHAR, "value":  98 },
      { "type": ret.types.CHAR, "value":  97 },
      { "type": ret.types.CHAR, "value": 114 } ]
  ]
}
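
As a rough illustration of how this structure can be consumed, the sketch below rebuilds the text of each alternative from its CHAR tokens. The local `types` object is a stand-in for ret.types so the snippet runs without the package (its numeric values are an assumption), and `branchToString` is a hypothetical helper, not part of ret.

```javascript
// Stand-in for ret.types so the sketch runs without the package installed.
// The numeric values are an assumption; with ret, use `ret.types` instead.
const types = { ROOT: 0, GROUP: 1, POSITION: 2, SET: 3, RANGE: 4, REPETITION: 5, REFERENCE: 6, CHAR: 7 };

// The token tree for /foo|bar/, written out by hand.
const tokens = {
  type: types.ROOT,
  options: [
    [102, 111, 111].map((value) => ({ type: types.CHAR, value })),
    [98, 97, 114].map((value) => ({ type: types.CHAR, value })),
  ],
};

// Hypothetical helper: turn one branch of CHAR tokens back into its text.
function branchToString(branch) {
  return branch
    .map((token) => {
      if (token.type !== types.CHAR) {
        throw new TypeError('branchToString only handles CHAR tokens');
      }
      return String.fromCharCode(token.value);
    })
    .join('');
}

const alternatives = tokens.options.map(branchToString);
console.log(alternatives); // [ 'foo', 'bar' ]
```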

Reconstructing Regular Expressions from Tokens

The reconstruct function accepts any token and returns, as a string, the component of the regular expression that is associated with that token.

// The default import is assumed to be the tokenizer, as with require('ret').
import ret, { reconstruct, types } from 'ret'

const tokens = ret(/foo|bar/.source)
const setToken = {
  "type": types.SET,
  "set": [
    { "type": types.CHAR, "value": 97 },
    { "type": types.CHAR, "value": 98 },
    { "type": types.CHAR, "value": 99 }
  ],
  "not": true
}

reconstruct(tokens)                               // 'foo|bar'
reconstruct({ "type": types.CHAR, "value": 102 }) // 'f'
reconstruct(setToken)                             // '^abc'

Token Types

ret.types is a collection of the various token types exported by ret.

ROOT

Only used in the root of the regexp. This is needed due to the possibility of the root containing a pipe | character. In that case, the token will have an options key that will be an array of arrays of tokens. If not, it will contain a stack key that is an array of tokens.

{
  "type": ret.types.ROOT,
  "stack": [token1, token2...],
}

{
  "type": ret.types.ROOT,
  "options": [
    [token1, token2...],
    [othertoken1, othertoken2...]
    ...
  ],
}

GROUP

Groups contain the tokens inside a pair of parentheses. If the group begins with ? followed by another character, it's a special type of group. A ':' tells the group not to be remembered when exec is used. '=' makes it a lookahead: the previous token matches only if followed by this group. '!' makes it a negative lookahead: the previous token matches only if NOT followed by this group.

Like root, it can contain an options key instead of stack if there is a pipe.

{
  "type": ret.types.GROUP,
  "remember": true,
  "followedBy": false,
  "notFollowedBy": false,
  "stack": [token1, token2...],
}

{
  "type": ret.types.GROUP,
  "remember": true,
  "followedBy": false,
  "notFollowedBy": false,
  "options": [
    [token1, token2...],
    [othertoken1, othertoken2...]
    ...
  ],
}
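
Because ROOT and GROUP tokens share the stack/options layout, a single recursive walk can visit both. The sketch below counts the groups whose remember flag is true, which is one way to check how many back references a pattern could support. The local `types` object stands in for ret.types (numeric values assumed), and `countCapturingGroups` and the hand-written tree are illustrative, not part of ret.

```javascript
// Stand-in for ret.types so the sketch runs without the package installed.
// The numeric values are an assumption; with ret, use `ret.types` instead.
const types = { ROOT: 0, GROUP: 1, POSITION: 2, SET: 3, RANGE: 4, REPETITION: 5, REFERENCE: 6, CHAR: 7 };

// Hypothetical helper: count the groups whose `remember` flag is true.
function countCapturingGroups(token) {
  let count = token.type === types.GROUP && token.remember ? 1 : 0;
  // ROOT and GROUP tokens keep their children in `options` or in `stack`.
  const branches = token.options || (token.stack ? [token.stack] : []);
  for (const branch of branches) {
    for (const child of branch) count += countCapturingGroups(child);
  }
  // REPETITION tokens wrap a single child token in `value`.
  if (token.type === types.REPETITION) count += countCapturingGroups(token.value);
  return count;
}

// Hand-written tree for /(a|(?:b))(c)+/.
const char = (c) => ({ type: types.CHAR, value: c.charCodeAt(0) });
const tree = {
  type: types.ROOT,
  stack: [
    {
      type: types.GROUP,
      remember: true,
      options: [[char('a')], [{ type: types.GROUP, remember: false, stack: [char('b')] }]],
    },
    {
      type: types.REPETITION,
      min: 1,
      max: Infinity,
      value: { type: types.GROUP, remember: true, stack: [char('c')] },
    },
  ],
};

console.log(countCapturingGroups(tree)); // 2
```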

POSITION

\b, \B, ^, and $ specify positions in the regexp.

{
  "type": ret.types.POSITION,
  "value": "^",
}

SET

Contains a key set specifying what tokens are allowed and a key not specifying if the set should be negated. A set can contain other sets, ranges, and characters.

{
  "type": ret.types.SET,
  "set": [token1, token2...],
  "not": false,
}

RANGE

Used in set tokens to specify a character range. from and to are character codes.

{
  "type": ret.types.RANGE,
  "from": 97,
  "to": 122,
}
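
To make the SET and RANGE semantics concrete, the sketch below tests whether a character code is accepted by a set token, handling characters, ranges, nested sets, and the not flag. The local `types` object stands in for ret.types (numeric values assumed), and `matchesSet` is a hypothetical helper, not part of ret.

```javascript
// Stand-in for ret.types so the sketch runs without the package installed.
// The numeric values are an assumption; with ret, use `ret.types` instead.
const types = { ROOT: 0, GROUP: 1, POSITION: 2, SET: 3, RANGE: 4, REPETITION: 5, REFERENCE: 6, CHAR: 7 };

// Hypothetical helper: does the SET token accept this character code?
function matchesSet(setToken, code) {
  const hit = setToken.set.some((member) => {
    switch (member.type) {
      case types.CHAR:
        return member.value === code;
      case types.RANGE:
        return member.from <= code && code <= member.to;
      case types.SET:
        return matchesSet(member, code); // sets may be nested
      default:
        return false;
    }
  });
  // A negated set accepts exactly the codes its members reject.
  return setToken.not ? !hit : hit;
}

// Hand-written token for [a-z_].
const lowerOrUnderscore = {
  type: types.SET,
  set: [
    { type: types.RANGE, from: 97, to: 122 },
    { type: types.CHAR, value: 95 },
  ],
  not: false,
};

console.log(matchesSet(lowerOrUnderscore, 'q'.charCodeAt(0))); // true
console.log(matchesSet(lowerOrUnderscore, 'Q'.charCodeAt(0))); // false
console.log(matchesSet({ ...lowerOrUnderscore, not: true }, 'Q'.charCodeAt(0))); // true
```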

REPETITION

{
  "type": ret.types.REPETITION,
  "min": 0,
  "max": Infinity,
  "value": token,
}
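
Judging by the examples later in this document, min and max bound how many times the token in value may repeat, with Infinity meaning no upper bound. The sketch below maps a min/max pair back to quantifier syntax. `quantifier` is a hypothetical helper, not part of ret, and it always picks the shortest spelling.

```javascript
// Hypothetical helper: render a min/max pair as regexp quantifier syntax.
function quantifier(min, max) {
  if (min === 0 && max === 1) return '?';
  if (min === 0 && max === Infinity) return '*';
  if (min === 1 && max === Infinity) return '+';
  if (max === Infinity) return `{${min},}`;
  if (min === max) return `{${min}}`;
  return `{${min},${max}}`;
}

console.log(quantifier(0, Infinity)); // *
console.log(quantifier(3, 5));        // {3,5}
console.log(quantifier(3, Infinity)); // {3,}
```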

REFERENCE

References a group token. value is 1-9.

{
  "type": ret.types.REFERENCE,
  "value": 1,
}

CHAR

Represents a single character token. value is the character code. Keeping each character as a separate token may seem more cluttered than concatenating characters into a string. But since repetition tokens only repeat the last token, and not the whole clause as the pipe does, it's simpler to do it this way.

{
  "type": ret.types.CHAR,
  "value": 123,
}

Errors

ret.js will throw an error if given a string that is not a valid regular expression. The possible errors are:

Invalid group. When a group with an immediate ? character is followed by an invalid character. It can only be followed by !, =, or :. Example: /(?_abc)/
Nothing to repeat. Thrown when a repetition token is used as the first token in the current clause: at the very beginning of the regexp or of a group, or right after a pipe. Example: /foo|?bar/, /{1,3}foo|bar/, /foo(+bar)/
Unmatched ). A group was not opened, but was closed. Example: /hello)2u/
Unterminated group. A group was not closed. Example: /(1(23)4/
Unterminated character class. A custom character set was not closed. Example: /[abc/
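
To illustrate the conditions behind three of these errors, the sketch below scans a regexp source string for unbalanced parentheses and an unclosed character class. It is a simplified, hypothetical checker and not ret's implementation; the returned strings only echo the error names above and are not the exact messages ret throws.

```javascript
// Hypothetical checker, not ret's implementation: returns the name of the
// first structural problem found in a regexp source string, or null.
function checkStructure(source) {
  let depth = 0;       // number of currently open groups
  let inClass = false; // true while inside [...]
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {
      i++; // skip the escaped character
      continue;
    }
    if (inClass) {
      if (ch === ']') inClass = false;
      continue; // parentheses inside a class are literal
    }
    if (ch === '[') inClass = true;
    else if (ch === '(') depth++;
    else if (ch === ')') {
      if (depth === 0) return 'Unmatched )';
      depth--;
    }
  }
  if (inClass) return 'Unterminated character class';
  return depth > 0 ? 'Unterminated group' : null;
}

console.log(checkStructure('hello)2u')); // Unmatched )
console.log(checkStructure('(1(23)4'));  // Unterminated group
console.log(checkStructure('[abc'));     // Unterminated character class
console.log(checkStructure('(a)[)]'));   // null
```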

Regular Expression Syntax

Regular expressions follow the JavaScript syntax.

The following recent JavaScript additions are not yet supported:

\p and \P: Unicode property escapes
(?<group>) and \k<group>: Named groups
(?<=) and (?<!): Lookbehind assertions
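
One way to guard against these constructs is to scan the source string before tokenizing it. The sketch below is a heuristic under stated assumptions: `findUnsupported` and its pattern table are hypothetical, and the patterns look at raw text only, so an escaped backslash or a character class containing the same characters can produce a false positive.

```javascript
// Heuristic patterns for the constructs listed above. They scan raw source
// text and do not account for escaped backslashes or character classes.
const unsupported = [
  { name: 'Unicode property escape', pattern: /\\[pP]\{/ },
  { name: 'named group', pattern: /\(\?<[^=!]/ },
  { name: 'named backreference', pattern: /\\k</ },
  { name: 'lookbehind assertion', pattern: /\(\?<[=!]/ },
];

// Hypothetical helper: list the unsupported constructs found in a source string.
function findUnsupported(source) {
  return unsupported
    .filter(({ pattern }) => pattern.test(source))
    .map(({ name }) => name);
}

console.log(findUnsupported('(?<year>\\d{4})-\\k<year>'));
// [ 'named group', 'named backreference' ]
console.log(findUnsupported('foo|bar')); // []
```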

Examples

/abc/

{
  "type": ret.types.ROOT,
  "stack": [
    { "type": ret.types.CHAR, "value": 97 },
    { "type": ret.types.CHAR, "value": 98 },
    { "type": ret.types.CHAR, "value": 99 }
  ]
}

/[abc]/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.SET,
    "set": [
      { "type": ret.types.CHAR, "value": 97 },
      { "type": ret.types.CHAR, "value": 98 },
      { "type": ret.types.CHAR, "value": 99 }
    ],
    "not": false
  }]
}

/[^abc]/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.SET,
    "set": [
      { "type": ret.types.CHAR, "value": 97 },
      { "type": ret.types.CHAR, "value": 98 },
      { "type": ret.types.CHAR, "value": 99 }
    ],
    "not": true
  }]
}

/[a-z]/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.SET,
    "set": [
      { "type": ret.types.RANGE, "from": 97, "to": 122 }
    ],
    "not": false
  }]
}

/\w/

// Similar logic for `\W`, `\d`, `\D`, `\s` and `\S`    
{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.SET,
    "set": [
      { "type": ret.types.CHAR, "value": 95 },
      { "type": ret.types.RANGE, "from": 97, "to": 122 },
      { "type": ret.types.RANGE, "from": 65, "to": 90 },
      { "type": ret.types.RANGE, "from": 48, "to": 57 }
    ],
    "not": false
  }]
}

/./

// any character but CR, LF, U+2028 or U+2029
{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.SET,
    "set": [ 
      { "type": ret.types.CHAR, "value": 10 },
      { "type": ret.types.CHAR, "value": 13 },
      { "type": ret.types.CHAR, "value": 8232 },
      { "type": ret.types.CHAR, "value": 8233 }
    ],
    "not": true
  }]
}

/a*/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 0,
    "max": Infinity,
    "value": { "type": ret.types.CHAR, "value": 97 }
  }]
}

/a+/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 1,
    "max": Infinity,
    "value": { "type": ret.types.CHAR, "value": 97 },
  }]
}

/a?/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 0,
    "max": 1,
    "value": { "type": ret.types.CHAR, "value": 97 }
  }]
}

/a{3}/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 3,
    "max": 3,
    "value": { "type": ret.types.CHAR, "value": 97 }
  }]
}

/a{3,5}/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 3,
    "max": 5,
    "value": { "type": ret.types.CHAR, "value": 97 }
  }]
}

/a{3,}/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.REPETITION, 
    "min": 3,
    "max": Infinity,
    "value": { "type": ret.types.CHAR, "value": 97 }
  }]
}

/(a)/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.GROUP, 
    "stack": [{ "type": ret.types.CHAR, "value": 97 }],
    "remember": true
  }]
}

/(?:a)/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.GROUP, 
    "stack": [{ "type": ret.types.CHAR, "value": 97 }],
    "remember": false
  }]
}

/(?=a)/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.GROUP, 
    "stack": [{ "type": ret.types.CHAR, "value": 97 }],
    "remember": false,
    "followedBy": true
  }]
}

/(?!a)/

{
  "type": ret.types.ROOT,
  "stack": [{ 
    "type": ret.types.GROUP, 
    "stack": [{ "type": ret.types.CHAR, "value": 97 }],
    "remember": false,
    "notFollowedBy": true
  }]
}

/a|b/

{
  "type": ret.types.ROOT,
  "options": [
    [{ "type": ret.types.CHAR, "value": 97 }], 
    [{ "type": ret.types.CHAR, "value": 98 }] 
  ]
}

/(a|b)/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.GROUP,
    "remember": true,
    "options": [
      [{ "type": ret.types.CHAR, "value": 97 }],
      [{ "type": ret.types.CHAR, "value": 98 }]
    ]
  }]
}

/^/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.POSITION,
    "value": "^"
  }]
}

/$/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.POSITION,
    "value": "$"
  }]
}

/\b/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.POSITION,
    "value": "b"
  }]
}

/\B/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.POSITION,
    "value": "B"
  }]
}

/\1/

{
  "type": ret.types.ROOT,
  "stack": [{
    "type": ret.types.REFERENCE,
    "value": 1
  }]
}

Install

npm install ret

Tests

Tests are written with vows.

npm test

Security

To report a security vulnerability, please use the Tidelift security contact. Tidelift will coordinate the fix and disclosure.
