alexflint / Go Restructure

License: MIT
Match regular expressions into struct fields

Projects that are alternatives of or similar to Go Restructure

Git Hound
Git plugin that prevents sensitive data from being committed.
Stars: ✭ 269 (-52.81%)
Mutual labels:  regular-expression
Commit Watcher
Find interesting and potentially hazardous commits in git projects
Stars: ✭ 345 (-39.47%)
Mutual labels:  regular-expression
Hae
HaE - BurpSuite Highlighter and Extractor
Stars: ✭ 397 (-30.35%)
Mutual labels:  regular-expression
Repren
Rename anything
Stars: ✭ 275 (-51.75%)
Mutual labels:  regular-expression
Generex
A Java library for generating String from a regular expression.
Stars: ✭ 316 (-44.56%)
Mutual labels:  regular-expression
Subconverter
Utility to convert between various subscription format
Stars: ✭ 4,912 (+761.75%)
Mutual labels:  regular-expression
mattermost-plugin-autolink
Automatically rewrite text matching a regular expression into a markdown link.
Stars: ✭ 100 (-82.46%)
Mutual labels:  regular-expression
Regulex
🚧 Regular Expression Excited!
Stars: ✭ 4,877 (+755.61%)
Mutual labels:  regular-expression
Lexmachine
Lex machinary for go.
Stars: ✭ 335 (-41.23%)
Mutual labels:  regular-expression
Stringr
A fresh approach to string manipulation in R
Stars: ✭ 397 (-30.35%)
Mutual labels:  regular-expression
Rex
Your RegEx companion.
Stars: ✭ 283 (-50.35%)
Mutual labels:  regular-expression
Regex
Regular expressions for swift
Stars: ✭ 306 (-46.32%)
Mutual labels:  regular-expression
Regexp2
A full-featured regex engine in pure Go based on the .NET engine
Stars: ✭ 389 (-31.75%)
Mutual labels:  regular-expression
Re Flex
The regex-centric, fast lexical analyzer generator for C++ with full Unicode support. Faster than Flex. Accepts Flex specifications. Generates reusable source code that is easy to understand. Introduces indent/dedent anchors, lazy quantifiers, functions for lex/syntax error reporting, and more. Seamlessly integrates with Bison and other parsers.
Stars: ✭ 274 (-51.93%)
Mutual labels:  regular-expression
Regexplain
🔍 An RStudio addin slash regex utility belt
Stars: ✭ 413 (-27.54%)
Mutual labels:  regular-expression
pcre-net
PCRE.NET - Perl Compatible Regular Expressions for .NET
Stars: ✭ 114 (-80%)
Mutual labels:  regular-expression
Minta
✳️  Electron app for generating regular expressions
Stars: ✭ 353 (-38.07%)
Mutual labels:  regular-expression
Onigmo
Onigmo is a regular expressions library forked from Oniguruma.
Stars: ✭ 536 (-5.96%)
Mutual labels:  regular-expression
Chinamobilephonenumberregex
Regular expressions that match the mobile phone number in mainland China. / 一组匹配中国大陆手机号码的正则表达式。
Stars: ✭ 4,440 (+678.95%)
Mutual labels:  regular-expression
Picomatch
Blazing fast and accurate glob matcher written JavaScript, with no dependencies and full support for standard and extended Bash glob features, including braces, extglobs, POSIX brackets, and regular expressions.
Stars: ✭ 393 (-31.05%)
Mutual labels:  regular-expression

Match regular expressions into struct fields

go get github.com/alexflint/go-restructure

This package allows you to express regular expressions by defining a struct, and then capture matched sub-expressions into struct fields. Here is a very simple email address parser:

import (
	"fmt"

	"github.com/alexflint/go-restructure"
)

type EmailAddress struct {
	_    struct{} `^`
	User string   `\w+`
	_    struct{} `@`
	Host string   `[^@]+`
	_    struct{} `$`
}

func main() {
	var addr EmailAddress
	restructure.Find(&addr, "joe@example.com")
	fmt.Println(addr.User) // prints "joe"
	fmt.Println(addr.Host) // prints "example.com"
}

(Note that the above is far too simplistic to be used as a serious email address validator.)

The regular expression that was executed was the concatenation of the struct tags:

^(\w+)@([^@]+)$

The first submatch was inserted into the User field and the second into the Host field.
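As a rough illustration of that concatenation, the sketch below joins the same tag fragments by hand and runs the result through the standard library regexp package. The names piece, emailPieces, buildPattern and parseEmail are hypothetical helpers invented for this sketch; it only mimics the observable behaviour described above and is not how go-restructure is implemented internally.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// piece is a hypothetical stand-in for one struct field: its tag
// pattern and whether the field captures (fields named "_" do not).
type piece struct {
	pattern string
	capture bool
}

// emailPieces mirrors the tags of the EmailAddress struct above.
func emailPieces() []piece {
	return []piece{
		{`^`, false},
		{`\w+`, true},
		{`@`, false},
		{`[^@]+`, true},
		{`$`, false},
	}
}

// buildPattern concatenates the pieces, wrapping capturing ones in parentheses.
func buildPattern(pieces []piece) string {
	var sb strings.Builder
	for _, p := range pieces {
		if p.capture {
			sb.WriteString("(" + p.pattern + ")")
		} else {
			sb.WriteString(p.pattern)
		}
	}
	return sb.String()
}

// parseEmail runs the concatenated pattern and returns the two submatches.
func parseEmail(s string) (user, host string, ok bool) {
	m := regexp.MustCompile(buildPattern(emailPieces())).FindStringSubmatch(s)
	if m == nil {
		return "", "", false
	}
	return m[1], m[2], true
}

func main() {
	fmt.Println(buildPattern(emailPieces())) // ^(\w+)@([^@]+)$
	user, host, _ := parseEmail("joe@example.com")
	fmt.Println(user) // joe
	fmt.Println(host) // example.com
}
```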

You may also use the regexp: tag key, but keep in mind that you must escape quotes and backslashes:

type EmailAddress struct {
	_    string `regexp:"^"`
	User string `regexp:"\\w+"`
	_    string `regexp:"@"`
	Host string `regexp:"[^@]+"`
	_    string `regexp:"$"`
}
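To see why the backslash has to be doubled, the sketch below reads a conventional key:"value" tag with the standard library reflect package. The type taggedEmail and the helper tagPattern are hypothetical names used only here. reflect.StructTag.Get unquotes the value, so the `\\w+` written in the tag comes back as the pattern `\w+`.

```go
package main

import (
	"fmt"
	"reflect"
)

// taggedEmail uses conventional key:"value" tags, so backslashes are doubled.
type taggedEmail struct {
	User string `regexp:"\\w+"`
	Host string `regexp:"[^@]+"`
}

// tagPattern returns the unquoted value of the regexp tag on the named field,
// or an empty string if the field does not exist.
func tagPattern(v interface{}, field string) string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("regexp")
}

func main() {
	fmt.Println(tagPattern(taggedEmail{}, "User")) // \w+
	fmt.Println(tagPattern(taggedEmail{}, "Host")) // [^@]+
}
```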

Nested Structs

Here is a slightly more sophisticated email address parser that uses nested structs:

type Hostname struct {
	Domain string   `\w+`
	_      struct{} `\.`
	TLD    string   `\w+`
}

type EmailAddress struct {
	_    struct{} `^`
	User string   `[a-zA-Z0-9._%+-]+`
	_    struct{} `@`
	Host *Hostname
	_    struct{} `$`
}

func main() {
	var addr EmailAddress
	success, _ := restructure.Find(&addr, "joe@example.com")
	if success {
		fmt.Println(addr.User)        // prints "joe"
		fmt.Println(addr.Host.Domain) // prints "example"
		fmt.Println(addr.Host.TLD)    // prints "com"
	}
}

Compare this to using the standard library's regexp package directly, via Regexp.FindStringSubmatchIndex:

func main() {
	content := "joe@example.com"
	expr := regexp.MustCompile(`^([a-zA-Z0-9._%+-]+)@((\w+)\.(\w+))$`)
	indices := expr.FindStringSubmatchIndex(content)
	if len(indices) > 0 {
		userBegin, userEnd := indices[2], indices[3]
		var user string
		if userBegin != -1 && userEnd != -1 {
			user = content[userBegin:userEnd]
		}

		domainBegin, domainEnd := indices[6], indices[7]
		var domain string
		if domainBegin != -1 && domainEnd != -1 {
			domain = content[domainBegin:domainEnd]
		}

		tldBegin, tldEnd := indices[8], indices[9]
		var tld string
		if tldBegin != -1 && tldEnd != -1 {
			tld = content[tldBegin:tldEnd]
		}

		fmt.Println(user)   // prints "joe"
		fmt.Println(domain) // prints "example"
		fmt.Println(tld)    // prints "com"
	}
}

Optional fields

When nesting one struct within another, you can make the nested struct optional by marking it with ?. The following example parses floating point numbers with optional sign and exponent:

// Matches "123", "1.23", "1.23e-4", "-12.3E+5", ".123"
type Float struct {
	Sign     *Sign     `?`      // sign is optional
	Whole    string    `[0-9]*`
	Period   struct{}  `\.?`
	Frac     string    `[0-9]+`
	Exponent *Exponent `?`      // exponent is optional
}

// Matches "e+4", "E6", "e-03"
type Exponent struct {
	_    struct{} `[eE]`
	Sign *Sign    `?`         // sign is optional
	Num  string   `[0-9]+`
}

// Matches "+" or "-"
type Sign struct {
	Ch string `[+-]`
}

When an optional sub-struct is not matched, it will be set to nil:

"1.23" -> {
  "Sign": nil,
  "Whole": "1",
  "Frac": "23",
  "Exponent": nil
}

"1.23e+45" -> {
  "Sign": nil,
  "Whole": "1",
  "Frac": "23",
  "Exponent": {
    "Sign": {
      "Ch": "+"
    },
    "Num": "45"
  }
}
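For comparison, the standard library reports an optional group that did not participate in the match with index -1, which plays the same role as the nil pointer above. The sketch below uses a hand-written pattern that approximates the Float struct; the pattern, the group numbering and the helper group are assumptions made for this illustration, not something generated by go-restructure.

```go
package main

import (
	"fmt"
	"regexp"
)

// floatExpr approximates the Float struct by hand. Groups:
// 1 = sign, 2 = whole, 3 = frac, 4 = exponent, 5 = exponent sign, 6 = exponent digits.
var floatExpr = regexp.MustCompile(`^([+-])?([0-9]*)\.?([0-9]+)([eE]([+-])?([0-9]+))?$`)

// group returns a pointer to the text of submatch n, or nil when the input
// does not match or when group n did not participate in the match.
func group(s string, n int) *string {
	idx := floatExpr.FindStringSubmatchIndex(s)
	if idx == nil || idx[2*n] < 0 {
		return nil
	}
	text := s[idx[2*n]:idx[2*n+1]]
	return &text
}

func main() {
	fmt.Println(group("1.23", 4) == nil)     // true: no exponent
	fmt.Println(*group("1.23e+45", 5))       // +
	fmt.Println(*group("1.23e+45", 6))       // 45
	fmt.Println(group("1.23e+45", 1) == nil) // true: no leading sign
}
```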

Finding multiple matches

The following example uses Regexp.FindAll to extract all floating point numbers from a string, using the same Float struct as in the example above.

src := "There are 10.4 cats for every 100 dogs in the United States."
floatRegexp := restructure.MustCompile(Float{}, restructure.Options{})
var floats []Float
floatRegexp.FindAll(&floats, src, -1)

To limit the number of matches, set the third parameter to a positive number.
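The standard library follows the same convention for its limit argument: a negative value returns every match and a positive value caps the count. The sketch below demonstrates this with regexp.FindAllString and a hand-written approximation of the Float pattern. The names floatToken and findFloats are hypothetical, and the example shows only the standard library's behaviour, not go-restructure's.

```go
package main

import (
	"fmt"
	"regexp"
)

// floatToken is a hand-written approximation of the Float struct's pattern.
var floatToken = regexp.MustCompile(`[+-]?[0-9]*\.?[0-9]+(?:[eE][+-]?[0-9]+)?`)

// findFloats returns at most limit matches, or all matches when limit is negative.
func findFloats(src string, limit int) []string {
	return floatToken.FindAllString(src, limit)
}

func main() {
	src := "There are 10.4 cats for every 100 dogs in the United States."
	fmt.Println(findFloats(src, -1)) // [10.4 100]
	fmt.Println(findFloats(src, 1))  // [10.4]
}
```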

Getting begin and end positions for submatches

To get the begin and end positions of submatches, use the restructure.Submatch struct in place of string. Here is an example that matches Python imports such as import foo as bar:

type Import struct {
	_       struct{}             `^import\s+`
	Package restructure.Submatch `\w+`
	_       struct{}             `\s+as\s+`
	Alias   restructure.Submatch `\w+`
}

var importRegexp = restructure.MustCompile(Import{}, restructure.Options{})

func main() {
	var imp Import
	importRegexp.Find(&imp, "import foo as bar")
	fmt.Printf("IMPORT %s (bytes %d...%d)\n", imp.Package.String(), imp.Package.Begin, imp.Package.End)
	fmt.Printf("    AS %s (bytes %d...%d)\n", imp.Alias.String(), imp.Alias.Begin, imp.Alias.End)
}

Output:

IMPORT foo (bytes 7...10)
    AS bar (bytes 14...17)
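The same byte offsets can be cross-checked with the standard library, where FindStringSubmatchIndex returns begin and end positions as pairs of ints. This is a minimal sketch: the pattern is written out by hand, and the span type and importSpans helper are hypothetical names that only loosely resemble restructure.Submatch.

```go
package main

import (
	"fmt"
	"regexp"
)

// importExpr is the hand-written equivalent of the Import struct's tags.
var importExpr = regexp.MustCompile(`^import\s+(\w+)\s+as\s+(\w+)`)

// span is a hypothetical local type holding a submatch and its byte offsets.
type span struct {
	Text       string
	Begin, End int
}

// importSpans returns the package and alias submatches with their positions.
func importSpans(s string) (pkg, alias span, ok bool) {
	idx := importExpr.FindStringSubmatchIndex(s)
	if idx == nil {
		return span{}, span{}, false
	}
	pkg = span{s[idx[2]:idx[3]], idx[2], idx[3]}
	alias = span{s[idx[4]:idx[5]], idx[4], idx[5]}
	return pkg, alias, true
}

func main() {
	pkg, alias, _ := importSpans("import foo as bar")
	fmt.Printf("IMPORT %s (bytes %d...%d)\n", pkg.Text, pkg.Begin, pkg.End)       // IMPORT foo (bytes 7...10)
	fmt.Printf("    AS %s (bytes %d...%d)\n", alias.Text, alias.Begin, alias.End) //     AS bar (bytes 14...17)
}
```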

Regular expressions inside JSON

To run a regular expression as part of JSON unmarshalling, implement the json.Unmarshaler interface from encoding/json. Here is an example that parses the following JSON string containing a quaternion:

{
	"name": "foo",
	"value": "1+2i+3j+4k"
}

First we define the expressions for matching quaternions in the form 1+2i+3j+4k:

// Matches "1", "-12", "+12"
type RealPart struct {
	Sign string `regexp:"[+-]?"`
	Real string `regexp:"[0-9]+"`
}

// Matches "+123", "-1"
type SignedInt struct {
	Sign string `regexp:"[+-]"`
	Real string `regexp:"[0-9]+"`
}

// Matches "+12i", "-123i"
type IPart struct {
	Magnitude SignedInt
	_         struct{} `regexp:"i"`
}

// Matches "+12j", "-123j"
type JPart struct {
	Magnitude SignedInt
	_         struct{} `regexp:"j"`
}

// Matches "+12k", "-123k"
type KPart struct {
	Magnitude SignedInt
	_         struct{} `regexp:"k"`
}

// matches "1+2i+3j+4k", "-1+2k", "-1", etc
type Quaternion struct {
	Real *RealPart
	I    *IPart `regexp:"?"`
	J    *JPart `regexp:"?"`
	K    *KPart `regexp:"?"`
}

// matches the quoted strings `"-1+2i"`, `"3-4i"`, `"12+34i"`, etc
type QuotedQuaternion struct {
	_          struct{} `regexp:"^"`
	_          struct{} `regexp:"\""`
	Quaternion *Quaternion
	_          struct{} `regexp:"\""`
	_          struct{} `regexp:"$"`
}

Next we implement UnmarshalJSON for the QuotedQuaternion type:

var quaternionRegexp = restructure.MustCompile(QuotedQuaternion{}, restructure.Options{})

func (c *QuotedQuaternion) UnmarshalJSON(b []byte) error {
	if !quaternionRegexp.Find(c, string(b)) {
		return fmt.Errorf("%s is not a quaternion", string(b))
	}
	return nil
}

Now we can define a struct and unmarshal JSON into it:

type Var struct {
	Name  string
	Value *QuotedQuaternion
}

func main() {
	src := `{"name": "foo", "value": "1+2i+3j+4k"}`
	var v Var
	json.Unmarshal([]byte(src), &v)
}

The result is:

{
  "Name": "foo",
  "Value": {
    "Quaternion": {
      "Real": {
        "Sign": "",
        "Real": "1"
      },
      "I": {
        "Magnitude": {
          "Sign": "+",
          "Real": "2"
        }
      },
      "J": {
        "Magnitude": {
          "Sign": "+",
          "Real": "3"
        }
      },
      "K": {
        "Magnitude": {
          "Sign": "+",
          "Real": "4"
        }
      }
    }
  }
}
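The json.Unmarshaler hook itself can be tried without this package. The sketch below is a reduced, standard-library-only version of the same idea for a two-part complex number. The Complex and variable types, the decode helper and the pattern are all assumptions made for illustration. Note that UnmarshalJSON receives the raw JSON value, so the pattern has to account for the surrounding quotes, just as QuotedQuaternion does.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
)

// complexExpr matches a quoted JSON string such as "1+2i" or "-3-4i".
var complexExpr = regexp.MustCompile(`^"([+-]?[0-9]+)([+-][0-9]+)i"$`)

// Complex is a hypothetical type for this sketch; parts are kept as strings.
type Complex struct {
	Real, Imag string
}

// UnmarshalJSON implements json.Unmarshaler. b is the raw JSON value, quotes included.
func (c *Complex) UnmarshalJSON(b []byte) error {
	m := complexExpr.FindSubmatch(b)
	if m == nil {
		return fmt.Errorf("%s is not a complex number", b)
	}
	c.Real, c.Imag = string(m[1]), string(m[2])
	return nil
}

// variable mirrors the Var struct above, with Complex in place of QuotedQuaternion.
type variable struct {
	Name  string
	Value *Complex
}

// decode unmarshals src and returns any error instead of discarding it.
func decode(src string) (variable, error) {
	var v variable
	err := json.Unmarshal([]byte(src), &v)
	return v, err
}

func main() {
	v, err := decode(`{"name": "foo", "value": "1+2i"}`)
	fmt.Println(v.Name, v.Value.Real, v.Value.Imag, err) // foo 1 +2 <nil>
}
```

Returning the error from json.Unmarshal, as decode does here, makes a failed match visible to the caller rather than leaving the struct silently unpopulated.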

Benchmarks

See benchmarks document
