curious-odd-man / RgxGen

Licence: Apache-2.0 license
Regex: generate matching and non matching strings based on regex pattern.

Regex: generate matching and non-matching strings

This is a Java library that, given a regex pattern, lets you:

  1. Generate matching strings
  2. Iterate through unique matching strings
  3. Generate non-matching strings

Table of contents

Try it now
Usage
Supported Syntax
Configuration
Limitations
Other similar libraries
Support

Try it now!!!

Follow the link to an online IDE with a ready-made project: JDoodle. Enter your pattern and see the results.

Usage

Maven dependency

The Latest RELEASE:

<dependency>
    <groupId>com.github.curious-odd-man</groupId>
    <artifactId>rgxgen</artifactId>
    <version>1.3</version>
</dependency>

The Latest SNAPSHOT:

<project>
    <repositories>
        <repository>
            <id>snapshots-repository</id>
            <url>https://oss.sonatype.org/content/repositories/snapshots/</url>
        </repository>
    </repositories>

    <!--  .... -->

    <dependencies>
        <dependency>
            <groupId>com.github.curious-odd-man</groupId>
            <artifactId>rgxgen</artifactId>
            <version>1.4-SNAPSHOT</version>
        </dependency>
    </dependencies>
</project>

Changes in snapshot:

  • Docs: Improved javadoc for StringIterator class. #59
  • Fixed: Incorrect unique values generation for pattern a?b|c #61

Code:

import com.github.curiousoddman.rgxgen.RgxGen;
import com.github.curiousoddman.rgxgen.iterators.StringIterator;

import java.math.BigInteger;
import java.util.Optional;

public class Main {
    public static void main(String[] args) {
        RgxGen rgxGen = new RgxGen("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");         // Create generator
        String s = rgxGen.generate();                                        // Generate a new random value
        Optional<BigInteger> estimation = rgxGen.getUniqueEstimation();      // Estimate how many unique values the pattern can produce (not accurate, see Limitations)
        StringIterator uniqueStrings = rgxGen.iterateUnique();               // Iterate over unique values (not accurate, see Limitations)
        String notMatching = rgxGen.generateNotMatching();                   // Generate a non-matching string
    }
}

Pass a seeded Random to get reproducible values:

import com.github.curiousoddman.rgxgen.RgxGen;

import java.util.Random;

public class Main {
    public static void main(String[] args) {
        RgxGen rgxGen = new RgxGen("[^0-9]*[12]?[0-9]{1,2}[^0-9]*");         // Create generator
        Random rnd = new Random(1234);                                       // Fixed seed
        String s = rgxGen.generate(rnd);                                     // Generate first value
        String s1 = rgxGen.generate(rnd);                                    // Generate second value
        String s2 = rgxGen.generate(rnd);                                    // Generate third value
        String notMatching = rgxGen.generateNotMatching(rnd);                // Generate a non-matching string
        // On each launch s, s1 and s2 will be the same
    }
}
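
To sanity-check generated values, they can be run back through java.util.regex. The sketch below is only an illustration: the class name MatchCheck and the two sample strings are made up here and stand in for the results of rgxGen.generate() and rgxGen.generateNotMatching().

```java
import java.util.regex.Pattern;

class MatchCheck {
    // Returns true when the whole candidate string matches the regex.
    static boolean matchesFully(String regex, String candidate) {
        return Pattern.compile(regex).matcher(candidate).matches();
    }

    public static void main(String[] args) {
        String regex = "[^0-9]*[12]?[0-9]{1,2}[^0-9]*";
        // Hand-written stand-ins for generated values (assumption, not real RgxGen output).
        String generated = "ab142cd";
        String notMatching = "abcd";
        System.out.println("generated matches:   " + matchesFully(regex, generated));
        System.out.println("notMatching matches: " + matchesFully(regex, notMatching));
    }
}
```

A check like this fits naturally into a unit test that feeds generated values to the code under test.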

Supported syntax

Pattern                     Description
--------------------------  -----------
.                           Any symbol
?                           One or zero occurrences
+                           One or more occurrences
*                           Zero or more occurrences
\r                          Carriage return (CR) character
\t                          Tab character
\n                          Line feed (LF) character
\d                          A digit. Equivalent to [0-9]
\D                          Not a digit. Equivalent to [^0-9]
\s                          Carriage return, space, tab, newline, vertical tab, form feed
\S                          Anything but carriage return, space, tab, newline, vertical tab, form feed
\w                          Any word character. Equivalent to [a-zA-Z0-9_]
\W                          Anything but a word character. Equivalent to [^a-zA-Z0-9_]
\i                          Places the same value as the capture group with index i. i is any integer number.
\Q and \E                   Any characters between \Q and \E, including metacharacters, are treated as literals.
\b and \B                   These characters are ignored. No validation is performed!
\xXX and \x{XXXX}           Unicode character given by its hexadecimal value (2 or 4 digits)
{a} and {a,b}               Repeat exactly a times; or at least a and at most b times. Use {n,} to repeat at least n times.
[...]                       Single character from those inside the brackets. Ranges with a dash, e.g. [a-zA-Z], are also supported.
[^...]                      Single character except those in the brackets. [^a] - any symbol except 'a'
()                          Groups multiple characters for repetition
foo(?=bar) and (?<=foo)bar  Positive lookahead and lookbehind. These are equivalent to foobar
foo(?!bar) and (?<!foo)bar  Negative lookahead and lookbehind.
(a|b)                       Alternatives
\                           Escape character (use \\ (double backslash) to generate a single \ character)

RgxGen treats any other characters as literals - those are generated as is.
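
For readers less familiar with some of the constructs above, the following sketch shows what backreferences, \Q...\E quoting and \xXX mean when matching with the standard java.util.regex engine. It does not call RgxGen; the class name SyntaxSemantics is made up for this illustration, and the assumption is that strings generated for these constructs are the ones such a matcher accepts.

```java
import java.util.regex.Pattern;

class SyntaxSemantics {
    // Full-string match using the JDK regex engine.
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        // \1 repeats whatever capture group 1 matched.
        System.out.println(matches("(a|b)\\1", "aa"));
        System.out.println(matches("(a|b)\\1", "ab"));
        // Everything between \Q and \E is literal, so ".*" here means a dot and a star.
        System.out.println(matches("\\Q.*\\E", ".*"));
        System.out.println(matches("\\Q.*\\E", "abc"));
        // \x41 is the character with hexadecimal code 41, i.e. 'A'.
        System.out.println(matches("\\x41", "A"));
    }
}
```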

Configuration

RgxGen can be configured at the global or the instance level.

Please refer to the following enum for all available properties: com.github.curiousoddman.rgxgen.config.RgxGenOption.

Each property value will be looked up in this order:

  1. Local RgxGen instance config
  2. Global RgxGen config
  3. Default values
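
The lookup order above can be pictured with plain java.util.Properties, which supports chained defaults. This is only a sketch of the fallback idea, not RgxGen's actual implementation; the class name LookupOrderSketch and the key name are hypothetical.

```java
import java.util.Properties;

class LookupOrderSketch {
    // Hypothetical key name, used only for this illustration.
    static final String KEY = "example.infinite.repetition";

    // Resolves KEY through local -> global -> default; pass null to leave a level unset.
    static String lookup(String localValue, String globalValue, String defaultValue) {
        Properties defaults = new Properties();
        defaults.setProperty(KEY, defaultValue);

        Properties global = new Properties(defaults);   // falls back to defaults
        if (globalValue != null) global.setProperty(KEY, globalValue);

        Properties local = new Properties(global);      // falls back to global
        if (localValue != null) local.setProperty(KEY, localValue);

        return local.getProperty(KEY);
    }

    public static void main(String[] args) {
        System.out.println(lookup(null, null, "100"));  // only the default is set
        System.out.println(lookup(null, "20", "100"));  // global overrides the default
        System.out.println(lookup("5", "20", "100"));   // local overrides both
    }
}
```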

Create Configuration

Use new RgxGenProperties() to create a properties object. RgxGenProperties extends java.util.Properties and can be used in all the same ways.

Code

import com.github.curiousoddman.rgxgen.config.RgxGenOption;
import com.github.curiousoddman.rgxgen.config.RgxGenProperties;

public class Main {
    public static void main(String[] args) {
        // Create properties object (RgxGenProperties extends java.util.Properties)
        RgxGenProperties properties = new RgxGenProperties();
        // Set value "20" for the INFINITE_PATTERN_REPETITION option in properties
        RgxGenOption.INFINITE_PATTERN_REPETITION.setInProperties(properties, 20);
        // ... now properties can be passed to RgxGen
    }
}

Set Global Configuration

Set a global configuration using RgxGen.setDefaultProperties(properties);

Code
public class Main {
    public static void main(String[] args) {
        // createAndConfigurePropertiesObject() is a placeholder for your own setup code
        RgxGenProperties properties = createAndConfigurePropertiesObject();

        RgxGen rgxGen_1 = new RgxGen("xxx");        // Created for example purposes
        // Set default properties.
        // NOTE! Only instances created after setDefaultProperties are affected.
        // E.g. rgxGen_1 will have the default value of the INFINITE_PATTERN_REPETITION option
        // and rgxGen_2 will have value "20" for the property, unless a local config is specified.
        RgxGen.setDefaultProperties(properties);
        RgxGen rgxGen_2 = new RgxGen("xxx");
    }
}

Set Local Configuration

Set a local configuration using rgxGen.setProperties(localProperties); on an existing RgxGen instance.

Code
public class Main {
    public static void main(String[] args) {
        // Both create...() calls are placeholders for your own setup code
        RgxGenProperties properties = createAndConfigurePropertiesObject();
        RgxGen.setDefaultProperties(properties);

        // Create properties object (RgxGenProperties extends java.util.Properties)
        RgxGenProperties localProperties = createAndConfigureLocalPropertiesObject();
        RgxGen rgxGen_3 = new RgxGen("xxx");
        // Set local configuration for rgxGen_3.
        // Note: options that are not defined in localProperties are looked up in properties,
        // since those were set globally before rgxGen_3 was created.
        rgxGen_3.setProperties(localProperties);
    }
}

Limitations

Lookahead and Lookbehind

Currently these two have very limited support. Please refer to #63. I'm working on a solution, but I cannot say when it will be ready.

Estimation

rgxGen.getUniqueEstimation() might not be accurate, because it does not count actual unique values; it only counts the different states of each building block of the expression. For example, "(a{0,2}|b{0,2})" is estimated as 6, though the actual number of unique values is 5, because the left and right alternatives can both produce the same value (the empty string). At the same time, "(|(a{1,2}|b{1,2}))" is correctly estimated as 5, even though it generates the same set of values.
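
The gap between the estimate and the real count can be reproduced without the library. The sketch below is a simplified model, assuming that the estimate for an alternation is the sum of the branch counts; it is not RgxGen's code, and the class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class EstimationSketch {
    // All strings produced by c{min,max}, e.g. a{0,2} -> "", "a", "aa".
    static List<String> repeat(char c, int min, int max) {
        List<String> out = new ArrayList<>();
        for (int n = min; n <= max; n++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < n; i++) sb.append(c);
            out.add(sb.toString());
        }
        return out;
    }

    // State-counting estimate for an alternation: the sum of the branch sizes.
    static int estimate(List<List<String>> branches) {
        int total = 0;
        for (List<String> branch : branches) total += branch.size();
        return total;
    }

    // Actual number of distinct strings across all branches.
    static int actualUnique(List<List<String>> branches) {
        Set<String> seen = new HashSet<>();
        for (List<String> branch : branches) seen.addAll(branch);
        return seen.size();
    }

    public static void main(String[] args) {
        // (a{0,2}|b{0,2}): both branches contain the empty string.
        List<List<String>> first = Arrays.asList(repeat('a', 0, 2), repeat('b', 0, 2));
        // (|(a{1,2}|b{1,2})): the empty string appears in one branch only.
        List<List<String>> second = Arrays.asList(Arrays.asList(""), repeat('a', 1, 2), repeat('b', 1, 2));
        System.out.println(estimate(first) + " estimated, " + actualUnique(first) + " actual");
        System.out.println(estimate(second) + " estimated, " + actualUnique(second) + " actual");
    }
}
```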

Uniqueness

For similar reasons as with estimation, the requested unique-values iterator can contain duplicates.
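
If strictly distinct values are needed, duplicates can be filtered on the caller's side. The helper below is a sketch that works on any java.util.Iterator<String>; it assumes the iterator returned by rgxGen.iterateUnique() can be consumed that way, and the class name DistinctValues is made up here. The limit keeps memory bounded for patterns with very many values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DistinctValues {
    // Collects up to `limit` distinct values, skipping duplicates and keeping first-seen order.
    static List<String> distinct(Iterator<String> source, int limit) {
        Set<String> seen = new LinkedHashSet<>();
        while (source.hasNext() && seen.size() < limit) {
            seen.add(source.next());
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Stand-in for an iterator that yields the empty string twice (assumption for the demo).
        Iterator<String> values = Arrays.asList("", "a", "aa", "", "b", "bb").iterator();
        System.out.println(distinct(values, 10));
    }
}
```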

Infinite patterns

By design, the patterns a+, a* and a{n,} match an unbounded number of characters. When generating data, that would mean values of unlimited length could be generated. It is highly doubtful anyone needs a string of infinite length, so when generating random values I've artificially limited repetitions in such patterns to 100 symbols. This value can be changed; please refer to the Configuration section.

In contrast, when generating unique values, the maximum number of repetitions is Integer.MAX_VALUE.

Use a{n,m} if you require a specific number of repetitions. It is best to avoid such infinite patterns when generating data from a regex.
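
The idea of capping an unbounded quantifier can be sketched as follows. This is an assumption-laden illustration, not RgxGen's internal logic: the class RepetitionCap, the UNBOUNDED marker and the rule "use the cap as the upper bound unless the minimum is larger" are all invented for the example.

```java
import java.util.Random;

class RepetitionCap {
    // Marker for quantifiers without an upper bound, such as a+, a* and a{n,}.
    static final int UNBOUNDED = -1;

    // Picks a repetition count in [min, max]; an unbounded max is replaced by the cap
    // (or by min, if min already exceeds the cap).
    static int pick(Random rnd, int min, int max, int cap) {
        int upper = (max == UNBOUNDED) ? Math.max(min, cap) : max;
        return min + rnd.nextInt(upper - min + 1);
    }

    public static void main(String[] args) {
        Random rnd = new Random(1234);
        System.out.println("a+     -> " + pick(rnd, 1, UNBOUNDED, 100) + " repetitions");
        System.out.println("a{2,5} -> " + pick(rnd, 2, 5, 100) + " repetitions");
    }
}
```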

Not matching values generation

As a general rule, I try to generate non-matching strings of the same length as the matching strings would have, though this is not always possible. For example, the pattern . (any symbol) yields an empty string as the non-matching string. Another example is a{0,2}: this pattern can match an empty string, so the non-matching strings are only 1 or 2 symbols long. I chose these approaches because they seem predictable and are easier to implement.
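
The two examples above can be checked with java.util.regex. The snippet is only a sketch with hand-picked candidate strings (they are not RgxGen output), and the class name NotMatchingLengths is made up for the illustration.

```java
import java.util.regex.Pattern;

class NotMatchingLengths {
    // Full-string match using the JDK regex engine.
    static boolean matches(String regex, String s) {
        return Pattern.matches(regex, s);
    }

    public static void main(String[] args) {
        // "." requires exactly one symbol, so the empty string is a valid non-match.
        System.out.println(matches(".", ""));
        // a{0,2} already matches the empty string, so a non-match needs at least one symbol.
        System.out.println(matches("a{0,2}", ""));
        System.out.println(matches("a{0,2}", "b"));
        System.out.println(matches("a{0,2}", "ab"));
    }
}
```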

Other tools to generate values by regex and why this might be better

Two other libraries pursue the same goal:

  1. https://github.com/mifmif/Generex
  2. http://code.google.com/p/xeger

However, I found that they have the following issues:

  1. Both build a graph, which can easily cause an out-of-memory error, for example with the pattern a{60000} or an IPv6 regex pattern.
  2. Alternatives: only a pattern with 2 alternatives gives each alternative an equal probability of appearing in generated values. For example, in (a|b) the probabilities of a and b are equal. For (a|b|c) one would expect a, b and c to appear with probability 33.(3)% each, but the actual probabilities are a=50%, b=25% and c=25%. With longer lists of alternatives you might never get the last alternative.
  3. They are quite slow.
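
One way such skewed probabilities can arise is when alternatives are handled as a chain of binary choices, a|(b|(c|...)), with a fair coin flip at each split. That mechanism is an assumption made for this sketch (only the resulting percentages are stated above), and the class name AlternativeProbabilities is made up.

```java
class AlternativeProbabilities {
    // Probability of each of n alternatives when every split is a fair binary choice
    // between "this alternative" and "all the remaining ones".
    static double[] nestedBinary(int n) {
        double[] p = new double[n];
        double remaining = 1.0;
        for (int i = 0; i < n - 1; i++) {
            p[i] = remaining / 2;   // take this alternative
            remaining /= 2;         // or continue down the chain
        }
        p[n - 1] = remaining;       // the last alternative gets whatever is left
        return p;
    }

    // Probability of each of n alternatives when one is picked uniformly.
    static double[] uniform(int n) {
        double[] p = new double[n];
        java.util.Arrays.fill(p, 1.0 / n);
        return p;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(nestedBinary(3)));
        System.out.println(java.util.Arrays.toString(uniform(3)));
        // With 10 alternatives the last two get 1/512 each under the nested scheme.
        System.out.println(nestedBinary(10)[9]);
    }
}
```

Under this model the chance of reaching the tail of a long alternation halves with every extra alternative, which would explain why the last one may practically never appear.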

Support

I plan to support this library, so you're welcome to open issues or reach me by e-mail in case of any questions. Any suggestions, feature requests or bug reports are welcome!

Please vote up my answer on StackOverflow to help others find this library.
