gliwka / Hyperscan Java

Licence: bsd-3-clause
Match tens of thousands of regular expressions within milliseconds - Java bindings for Intel's hyperscan 5

Programming Languages

java
68154 projects - #9 most used programming language

Projects that are alternatives of or similar to Hyperscan Java

Regulex
🚧 Regular Expression Excited!
Stars: ✭ 4,877 (+7289.39%)
Mutual labels:  regex, regular-expression, regexp
regexp-expand
Show the ELisp regular expression at point in rx form.
Stars: ✭ 18 (-72.73%)
Mutual labels:  regex, regexp, regular-expression
Regaxor
A regular expression fuzzer.
Stars: ✭ 35 (-46.97%)
Mutual labels:  regex, regexp, regular-expression
Regex For Regular Folk
🔍💪 Regular Expressions for Regular Folk — A visual, example-based introduction to RegEx [BETA]
Stars: ✭ 242 (+266.67%)
Mutual labels:  regex, regular-expression, regexp
Commonregex
🍫 A collection of common regular expressions for Go
Stars: ✭ 733 (+1010.61%)
Mutual labels:  regex, regular-expression, regexp
moar
Deterministic Regular Expressions with Backreferences
Stars: ✭ 19 (-71.21%)
Mutual labels:  regex, regexp, regular-expression
Onigmo
Onigmo is a regular expressions library forked from Oniguruma.
Stars: ✭ 536 (+712.12%)
Mutual labels:  regex, regular-expression, regexp
Regex Dos
👮 👊 RegEx Denial of Service (ReDos) Scanner
Stars: ✭ 143 (+116.67%)
Mutual labels:  regex, regular-expression, regexp
Regexr
For composing regular expressions without the need for double-escaping inside strings.
Stars: ✭ 53 (-19.7%)
Mutual labels:  regex, regular-expression, regexp
globrex
Glob to regular expression with support for extended globs.
Stars: ✭ 52 (-21.21%)
Mutual labels:  regex, regexp, regular-expression
Picomatch
Blazing fast and accurate glob matcher written JavaScript, with no dependencies and full support for standard and extended Bash glob features, including braces, extglobs, POSIX brackets, and regular expressions.
Stars: ✭ 393 (+495.45%)
Mutual labels:  regex, regular-expression, regexp
Rex
Your RegEx companion.
Stars: ✭ 283 (+328.79%)
Mutual labels:  regex, regular-expression, regexp
Regexpu
A source code transpiler that enables the use of ES2015 Unicode regular expressions in ES5.
Stars: ✭ 201 (+204.55%)
Mutual labels:  regex, regular-expression, regexp
Regexp2
A full-featured regex engine in pure Go based on the .NET engine
Stars: ✭ 389 (+489.39%)
Mutual labels:  regex, regular-expression, regexp
Grex
A command-line tool and library for generating regular expressions from user-provided test cases
Stars: ✭ 4,847 (+7243.94%)
Mutual labels:  regex, regular-expression, regexp
cregex
A small implementation of regular expression matching engine in C
Stars: ✭ 72 (+9.09%)
Mutual labels:  regex, regexp, regular-expression
Orchestra
One language to be RegExp's Successor. Visually readable and rich, technically safe and extended, naturally scalable, advanced, and optimized
Stars: ✭ 103 (+56.06%)
Mutual labels:  regex, regular-expression, regexp
expand-brackets
Expand POSIX bracket expressions (character classes) in glob patterns.
Stars: ✭ 26 (-60.61%)
Mutual labels:  regex, regexp, regular-expression
RgxGen
Regex: generate matching and non matching strings based on regex pattern.
Stars: ✭ 45 (-31.82%)
Mutual labels:  regex, regexp, regular-expression
Emoji Regex
A regular expression to match all Emoji-only symbols as per the Unicode Standard.
Stars: ✭ 1,134 (+1618.18%)
Mutual labels:  regex, regular-expression, regexp

hyperscan-java

hyperscan is a high-performance multiple regex matching library.

It uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.

This project is a third-party, JNA-based Java wrapper for hyperscan that lets developers integrate hyperscan into their Java (JVM) projects.

Add it to your project

This project is available on Maven Central.

Maven

<dependency>
    <groupId>com.gliwka.hyperscan</groupId>
    <artifactId>hyperscan</artifactId>
    <version>1.0.0</version>
</dependency>

Gradle

implementation group: 'com.gliwka.hyperscan', name: 'hyperscan', version: '1.0.0'

sbt

libraryDependencies += "com.gliwka.hyperscan" % "hyperscan" % "1.0.0"

Simple example

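import com.gliwka.hyperscan.wrapper.*;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.EnumSet;
import java.util.LinkedList;
import java.util.List;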

...

//we define a list containing all of our expressions
LinkedList<Expression> expressions = new LinkedList<Expression>();

//the first constructor argument is the regular expression pattern, the second is an expression flag (or an EnumSet of flags)
//read the original hyperscan documentation to learn more about flags
//or browse ExpressionFlag.java in this repo.
expressions.add(new Expression("[0-9]{5}", EnumSet.of(ExpressionFlag.SOM_LEFTMOST)));
expressions.add(new Expression("Test", ExpressionFlag.CASELESS));


//we precompile the expression into a database.
//you can compile single expression instances or lists of expressions

//since we're interacting with native handles always use try-with-resources or call the close method after use
try(Database db = Database.compile(expressions)) {
    //initialize scanner - one scanner per thread!
    //same here, always use try-with-resources or call the close method after use
    try(Scanner scanner = new Scanner())
    {
        //allocate scratch space matching the passed database
        scanner.allocScratch(db);


        //provide the database and the input string
        //returns a list with matches
        //synchronized method, only one execution at a time (use more scanner instances for multithreading)
        List<Match> matches = scanner.scan(db, "12345 test string");

        //matches always contain the expression causing the match and the end position of the match
        //the start position and the matched string itself are only part of a match if
        //SOM_LEFTMOST is set (for more details refer to the original hyperscan documentation)
    }

    // Save the database to the file system for later use
    try(OutputStream out = new FileOutputStream("db")) {
        db.save(out);
    }

    // Later, load the database back in. This is useful for large databases that take a long time to compile.
    // You can compile them offline, save them to a file, and then quickly load them in at runtime.
    // The load has to happen on the same type of platform as the save.
    try (InputStream in = new FileInputStream("db");
         Database loadedDb = Database.load(in)) {
        // Use the loadedDb as before.
    }
}
catch (CompileErrorException ce) {
    //thrown during compilation if something is wrong with an expression
    //you can retrieve the expression causing the exception like this:
    Expression failedExpression = ce.getFailedExpression();
}
catch(IOException ie) {
  //IO during serializing / deserializing failed
}
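
The "one scanner per thread" rule from the example can be followed in a multithreaded program by giving each thread its own lazily created instance. The following is a minimal sketch using only the standard library. `PerThread` is a hypothetical helper, not part of this wrapper, and the demo uses plain `Object` instances as stand-ins for scanners.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of hyperscan-java:
// hands every thread its own lazily created instance.
final class PerThread<T> {
    private final ThreadLocal<T> local;

    // With the wrapper, the factory could look like
    //   () -> { Scanner s = new Scanner(); s.allocScratch(db); return s; }
    // (assumption: built from the calls shown in the example above).
    PerThread(Supplier<T> factory) {
        this.local = ThreadLocal.withInitial(factory);
    }

    // Returns the calling thread's instance, creating it on first use.
    T get() {
        return local.get();
    }

    public static void main(String[] args) throws InterruptedException {
        PerThread<Object> holder = new PerThread<>(Object::new);
        Object mine = holder.get();

        Thread worker = new Thread(() ->
                System.out.println("worker got its own instance: " + (holder.get() != mine)));
        worker.start();
        worker.join();

        System.out.println("same thread reuses its instance: " + (holder.get() == mine));
    }
}
```

Note that instances held this way are not closed automatically. Since scanners wrap native handles, a real program would still need to call close on each one when its thread is done with it.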

Limitations of hyperscan-java

hyperscan only supports a subset of regular expression syntax. Notable unsupported features include backreferences and capture groups. Please read the hyperscan developer reference to get a good understanding of how hyperscan works and what its limitations are.
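
If capture groups are needed, one possible workaround is a two-stage approach: let hyperscan locate candidate matches, then re-run `java.util.regex` on just the matched region to extract the groups. The sketch below shows only the second stage, using the standard library. The class name `GroupExtractor`, the pattern, and the hard-coded offsets (standing in for positions reported by a match compiled with SOM_LEFTMOST) are illustrative assumptions. The offsets are treated as character indices with an exclusive end, so check the javadoc for how the wrapper actually reports end positions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of hyperscan-java.
final class GroupExtractor {
    private final Pattern pattern;

    GroupExtractor(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    // Re-runs the JDK regex on input[start, end) and returns capture groups 1..n,
    // or an empty list if the region does not match the whole pattern.
    List<String> extract(String input, int start, int end) {
        Matcher m = pattern.matcher(input).region(start, end);
        List<String> groups = new ArrayList<>();
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "order 12345-AB shipped";
        GroupExtractor extractor = new GroupExtractor("([0-9]{5})-([A-Z]{2})");
        // 6 and 14 stand in for the start and (exclusive) end of a reported match.
        System.out.println(extractor.extract(input, 6, 14));
    }
}
```

The JDK regex only runs on regions that were already flagged as matches, so the slower engine does a small share of the work.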

Native libraries

This wrapper ships with pre-compiled hyperscan binaries for Windows, Linux (glibc >= 2.12) and macOS on x86_64 CPUs. The native libraries are maintained in a separate repository.
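
Because binaries are only bundled for these platforms, an application may want to check the runtime environment before it first touches the wrapper. The following is a rough sketch based on the standard `os.name` and `os.arch` system properties. `PlatformCheck` is a hypothetical helper, not part of this wrapper, and it cannot verify the glibc version.

```java
import java.util.Locale;

// Hypothetical helper, not part of hyperscan-java.
final class PlatformCheck {

    // Returns true if the given os.name / os.arch values look like one of the
    // platforms listed above: Windows, Linux or macOS on x86_64.
    static boolean looksSupported(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        String arch = osArch.toLowerCase(Locale.ROOT);
        boolean knownOs = os.startsWith("windows") || os.startsWith("linux") || os.startsWith("mac");
        // JVMs report x86_64 as "amd64" on Windows and Linux and as "x86_64" on macOS.
        boolean x64 = arch.equals("amd64") || arch.equals("x86_64");
        return knownOs && x64;
    }

    public static void main(String[] args) {
        boolean ok = looksSupported(System.getProperty("os.name"), System.getProperty("os.arch"));
        System.out.println("bundled hyperscan binaries likely available: " + ok);
    }
}
```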

Documentation

The hyperscan developer reference explains hyperscan. The javadoc is located here.

Contributing

Feel free to raise issues or submit a pull request.

Credits

Shoutout to @eliaslevy, @krzysztofzienkiewicz and @swapnilnawale for all the great contributions.

Thanks to Intel for open-sourcing hyperscan!

License

BSD 3-Clause License
