This chapter shows you how to recursively rewrite a REL expression, and how to use Flavors to express your regexes in regex flavors and languages other than Scala/Java.
Besides reusing components, another advantage of having a manipulable expression tree is that you can transform it as you please.
REL offers a simple way to do such manipulation using Scala's powerful pattern matching. By passing a Rewriter to an RE object's map method, you can recursively rewrite this object's subtree. A Rewriter is actually a PartialFunction[RE, RE].
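To make the mechanism concrete, here is a minimal sketch in plain Scala of how a map method can drive a recursive rewrite with a PartialFunction. The tree (Node, Lit, Concat, Capture, NonCapture), render and toNonCapturing are hypothetical stand-ins, not REL's classes, and the sketch assumes a bottom-up traversal (children first); REL's own traversal order may differ.

```scala
object RewriteSketch {
  // Hypothetical mini expression tree: NOT REL's classes, only enough
  // structure to show a PartialFunction driving a recursive rewrite.
  sealed trait Node {
    // Rewrites the children first (bottom-up), then offers the rebuilt node
    // to the partial function; nodes it is not defined on are kept as-is.
    def map(rw: PartialFunction[Node, Node]): Node = {
      val rebuilt: Node = this match {
        case Concat(l, r)      => Concat(l.map(rw), r.map(rw))
        case Capture(name, re) => Capture(name, re.map(rw))
        case NonCapture(re)    => NonCapture(re.map(rw))
        case leaf              => leaf
      }
      rw.applyOrElse(rebuilt, (n: Node) => n)
    }
  }
  final case class Lit(s: String) extends Node
  final case class Concat(l: Node, r: Node) extends Node
  final case class Capture(name: String, re: Node) extends Node
  final case class NonCapture(re: Node) extends Node

  // Renders a tree in Java regex syntax so the effect of a rewrite is visible.
  def render(n: Node): String = n match {
    case Lit(s)            => s
    case Concat(l, r)      => render(l) + render(r)
    case Capture(name, re) => s"(?<$name>${render(re)})"
    case NonCapture(re)    => s"(?:${render(re)})"
  }

  val hex: Node  = Lit("[0-9a-f]")
  val expr: Node = Concat(Capture("id", hex), Lit("-"))

  // Same shape as a rewriter that turns captures into non-capturing groups.
  val toNonCapturing: PartialFunction[Node, Node] = {
    case Capture(_, content) => NonCapture(content)
  }
}
```

Because nodes outside the partial function's domain are returned unchanged, a rewriter only has to list the cases it cares about.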
For example, say we have a regex matching a UUID in its canonical form (lowercase hexadecimal digits in groups of 8-4-4-4-12). It is then used twice as a capturing group in a more complex expression.
val s = RE("-")
val h = RE("[0-9a-f]")
val uuid = h{8} - s - h{4} - s - h{4} - s - h{4} - s - h{12}
val complexExpression = /* … */ a ~ (uuid \ "uuid1") ~
b ~ (uuid \ "uuid2") ~ c /* … */
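For reference, the following hand-written java.util.regex sketch has the same shape as uuid and as two named captures of it. The names UuidRegexDemo and twoIds, and the single space standing in for the elided parts between the two groups, are illustrative assumptions; this is not REL's actual output.

```scala
import java.util.regex.Pattern

object UuidRegexDemo {
  // Hand-written equivalent of uuid: 8-4-4-4-12 lowercase hex digits.
  val h: String    = "[0-9a-f]"
  val uuid: String = s"$h{8}-$h{4}-$h{4}-$h{4}-$h{12}"

  // Two named captures of it, in the spirit of (uuid \ "uuid1") and
  // (uuid \ "uuid2"); the space is a placeholder for the elided b part.
  val twoIds: Pattern = Pattern.compile(s"(?<uuid1>$uuid) (?<uuid2>$uuid)")
}
```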
Say we want to match complexExpression elsewhere without capturing the UUIDs. We can simply turn our capturing "uuid" groups into non-capturing groups:
val toOther: Rewriter = {
case Group(_, uuid, _) => uuid.ncg
}
val other = complexExpression map toOther
Now, say we want uppercase hexadecimal throughout this expression, where h is also used in places other than uuid. We can extend our Rewriter:
val H = RE("[0-9A-F]")
val toOther: Rewriter = {
case `h` => H
case Group(_, uuid, _) => uuid.ncg
}
val other = complexExpression map toOther
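The backticks around h are what make this work: in a Scala pattern, a plain lowercase name is a fresh variable that matches anything, while a name in backticks is a stable identifier pattern that only matches values equal to the existing one. The sketch below shows the difference with plain strings standing in for RE terms; StableIdentifierDemo, toUpper and catchAll are illustrative names.

```scala
object StableIdentifierDemo {
  // Plain strings stand in for RE terms here.
  val h = "[0-9a-f]"
  val H = "[0-9A-F]"

  // Backticks: a stable identifier pattern, i.e. an equality test against
  // the existing value h. Only that exact value is rewritten.
  val toUpper: PartialFunction[String, String] = {
    case `h` => H
  }

  // No backticks: this lowercase h is a new variable that shadows the
  // outer one and matches every input.
  val catchAll: PartialFunction[String, String] = {
    case h => h.toUpperCase
  }
}
```

The same rule applies to Group(_, uuid, _) in the first Rewriter: uuid there is a fresh variable bound to whatever the matched group holds in that position, not a reference to the uuid value, so that case rewrites every capturing group it is applied to.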
Other languages and tools have their own regex flavors, with (sometimes subtle) implementation differences and additional or missing features compared to Java's regex flavor. If we want to use our regexes in other flavors, we can apply transformations to obtain compatible regexes, up to a point: features that the target flavor neither implements nor lets us replicate cannot be translated.
In .NET, \w matches all Unicode letters, including letters with diacritics (accented letters), whereas Java's \w only matches [a-zA-Z0-9_] by default. Thus, DotNETFlavor transforms \ws (when used with μ/Word) into [a-zA-Z0-9_] to avoid unwanted surprises.
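The difference is easy to observe from Scala with java.util.regex. The sketch below (WordClassDemo and its helpers are illustrative names) uses the embedded flag (?U), which enables UNICODE_CHARACTER_CLASS in Java 7 and later, to get a Unicode-aware \w roughly like .NET's default, and shows that the explicit class [a-zA-Z0-9_] keeps Java's default ASCII behavior.

```scala
import java.util.regex.Pattern

object WordClassDemo {
  val eAcute = "\u00e9" // "é", a letter with a diacritic

  // Java's default \w is ASCII-only: [a-zA-Z_0-9].
  def javaWord(s: String): Boolean = Pattern.matches("\\w", s)

  // (?U) turns on UNICODE_CHARACTER_CLASS, so \w also matches accented letters.
  def unicodeWord(s: String): Boolean = Pattern.matches("(?U)\\w", s)

  // The explicit class keeps ASCII semantics whatever the flavor's \w means.
  def asciiClass(s: String): Boolean = Pattern.matches("[a-zA-Z0-9_]", s)
}
```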
.NET does not support possessive quantifiers such as a++, but it does support atomic groups. DotNETFlavor therefore changes a++ into the equivalent expression (?>a+).
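A possessive quantifier is shorthand for an atomic group, which is why this translation is exact. The Java sketch below (AtomicDemo is an illustrative name) checks that a++a and (?>a+)a both fail on "aaa", because neither may give back an a once consumed, while the ordinary greedy a+a backtracks and succeeds.

```scala
import java.util.regex.Pattern

object AtomicDemo {
  // Each pattern must match the whole input: a run of a's, then one more 'a'.
  val greedy     = "a+a"     // a+ may backtrack and give one 'a' back
  val possessive = "a++a"    // Java syntax; not available in .NET
  val atomic     = "(?>a+)a" // atomic group, the form DotNETFlavor emits

  def fullMatch(regex: String, input: String): Boolean =
    Pattern.matches(regex, input)
}
```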
JavaScript supports neither possessive quantifiers nor atomic groups, so JavaScriptFlavor mimics a++ (or (?>a+)) with (?=(a+))\1. This is a stretch, since it adds a possibly undesired capturing group, but it is still better than no support.
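The trick relies on a lookahead not being backtracked into once it has succeeded, which holds in JavaScript and in Java alike. Since Java supports all of these forms, the sketch below (LookaheadEmulationDemo is an illustrative name) uses java.util.regex to compare the emulation with a++ and to show the extra capturing group it introduces.

```scala
import java.util.regex.Pattern

object LookaheadEmulationDemo {
  // The lookahead captures the longest run of a's, then \1 consumes exactly
  // that text. The lookahead is not re-entered on failure, so the run cannot
  // be shortened afterwards, which mimics a++ / (?>a+).
  val emulatedPlus = "(?=(a+))\\1"
  val emulated     = emulatedPlus + "a"
  val possessive   = "a++a"

  def fullMatch(regex: String, input: String): Boolean =
    Pattern.matches(regex, input)

  // Number of capturing groups the pattern defines.
  def groupCount(regex: String): Int =
    Pattern.compile(regex).matcher("").groupCount()
}
```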
JavaScriptFlavor throws an IllegalArgumentException when you try to convert an expression containing a look-behind.
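A minimal sketch of that kind of guard, written over a hypothetical mini tree (LookBehindCheck, Node, Lit, Concat, LookBehind and toJavaScript are illustrative names, not REL's API): rendering proceeds normally until a look-behind node is reached, at which point it fails with IllegalArgumentException instead of emitting a pattern the flavor does not accept.

```scala
object LookBehindCheck {
  // Hypothetical mini tree, NOT REL's classes.
  sealed trait Node
  final case class Lit(s: String) extends Node
  final case class Concat(l: Node, r: Node) extends Node
  final case class LookBehind(re: Node) extends Node

  // Renders the tree, refusing any look-behind found anywhere inside it.
  def toJavaScript(n: Node): String = n match {
    case Lit(s)        => s
    case Concat(l, r)  => toJavaScript(l) + toJavaScript(r)
    case LookBehind(_) =>
      throw new IllegalArgumentException("look-behind is not supported by this flavor")
  }
}
```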
DotNETFlavor (as well as Java7Flavor) inlines group names, both for captures, as (?<name>expr), and for references, as \k<name>.
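Java 7 and later accept this inline syntax directly, so its effect can be checked with java.util.regex. In the sketch below, NamedGroupDemo, doubledWord and repeatedWord are illustrative names: the pattern captures a word under the name word and then requires the same text again through \k<word>.

```scala
import java.util.regex.Pattern

object NamedGroupDemo {
  // (?<word>...) captures under a name; \k<word> refers back to that capture.
  val doubledWord: Pattern = Pattern.compile("(?<word>[a-z]+) \\k<word>")

  // Returns the repeated word if the whole input is "w w", otherwise None.
  def repeatedWord(input: String): Option[String] = {
    val m = doubledWord.matcher(input)
    if (m.matches()) Some(m.group("word")) else None
  }
}
```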
Flavors expose two main methods: .express(re: RE) and .translate(re: RE). The first returns a Tuple2[String, List[String]]: its first element is the translated regex string, and its second is the list of group names in order of appearance, which lets you map names to capturing group indexes (as Scala's Regex does) if needed. The second only translates an RE term into another RE term.
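To illustrate how the two methods relate, here is a minimal sketch of a flavor over a hypothetical mini tree, not REL's actual classes or signatures: translate rewrites the tree (here, turning a possessive + into an atomic group, as described for DotNETFlavor), and express renders the translated tree and collects group names in order of appearance. FlavorSketch, Flavor, DotNetLike and the node types are illustrative names.

```scala
object FlavorSketch {
  // Hypothetical mini tree, NOT REL's RE classes.
  sealed trait Node
  final case class Lit(s: String) extends Node
  final case class Concat(l: Node, r: Node) extends Node
  // One-or-more repetition of a single atom, greedy or possessive.
  final case class Plus(atom: Node, possessive: Boolean) extends Node
  final case class Atomic(re: Node) extends Node
  final case class Named(name: String, re: Node) extends Node

  trait Flavor {
    // Tree-to-tree translation into constructs this flavor supports.
    def translate(n: Node): Node

    // Renders the translated tree and lists group names in order of appearance.
    def express(n: Node): (String, List[String]) = {
      val t = translate(n)
      (render(t), names(t))
    }

    private def names(n: Node): List[String] = n match {
      case Lit(_)        => Nil
      case Concat(l, r)  => names(l) ++ names(r)
      case Plus(a, _)    => names(a)
      case Atomic(re)    => names(re)
      case Named(nm, re) => nm :: names(re)
    }

    private def render(n: Node): String = n match {
      case Lit(s)        => s
      case Concat(l, r)  => render(l) + render(r)
      case Plus(a, poss) => render(a) + (if (poss) "++" else "+")
      case Atomic(re)    => s"(?>${render(re)})"
      case Named(nm, re) => s"(?<$nm>${render(re)})"
    }
  }

  // .NET-like flavor: a possessive a++ becomes the atomic group (?>a+).
  object DotNetLike extends Flavor {
    def translate(n: Node): Node = n match {
      case Plus(a, poss) =>
        val greedy = Plus(translate(a), possessive = false)
        if (poss) Atomic(greedy) else greedy
      case Concat(l, r)  => Concat(translate(l), translate(r))
      case Atomic(re)    => Atomic(translate(re))
      case Named(nm, re) => Named(nm, translate(re))
      case lit: Lit      => lit
    }
  }
}
```

Keeping translate separate from express means the tree-to-tree step can be reused or inspected on its own, while express is just rendering plus bookkeeping.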
The flavors bundled with REL include DotNETFlavor, Java7Flavor and JavaScriptFlavor, described above.
For example, to express a regex in the .NET regex flavor:
val myRegex = ^^ - (α.++ \ "firstWord")
DotNETFlavor.translate(myRegex) // approximately* ^^ - (?>(α.+) \ "firstWord")
// Triple quotes keep \A as a regex escape rather than an (invalid) string escape
assert(DotNETFlavor.express(myRegex)._1 == """\A(?<firstWord>(?>[a-zA-Z]+))""")
assert(DotNETFlavor.express(myRegex)._2 == List("firstWord"))
* Approximately, because the named capturing group also carries an inline naming strategy, which has no short DSL syntax and is omitted here for simplicity.
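The name list returned by express is what lets you address groups by name when the regex string itself carries no names. The sketch below maps each name to its 1-based group index; GroupIndexDemo and the (pattern, names) pair are illustrative assumptions standing in for an express result, and the mapping assumes every capturing group in the pattern is named.

```scala
import java.util.regex.Pattern

object GroupIndexDemo {
  // Hypothetical (pattern, names) pair, as a flavor without inline names might return.
  val expressed: (String, List[String]) = ("(\\d{4})-(\\d{2})", List("year", "month"))

  // Names are listed in order of appearance, so name i maps to group i + 1.
  val groupIndex: Map[String, Int] =
    expressed._2.zipWithIndex.map { case (name, i) => name -> (i + 1) }.toMap

  // Looks up a named group's text, or None if no match or unknown name.
  def extract(input: String, name: String): Option[String] = {
    val m = Pattern.compile(expressed._1).matcher(input)
    if (m.matches()) groupIndex.get(name).map(i => m.group(i)) else None
  }
}
```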
But Flavors are not limited to other regex implementations. You can define your own for various uses, e.g.:
RE tree