Tree rewriting & Flavors

This chapter shows how to recursively rewrite a REL expression, and how to use Flavors to express your regex in flavors/languages other than Scala/Java.

Subtree rewriting

An advantage of having a manipulable expression tree, beyond reusing components, is that you can transform it as you please.

REL makes such manipulation simple by building on Scala’s pattern matching. By passing a Rewriter to an RE object’s map method, you can recursively rewrite this object’s subtree. A Rewriter is simply a PartialFunction[RE, RE].

For example, suppose we have a regex matching a UUID in its canonical form (lowercase hexadecimal, 8-4-4-4-12 digits). It is then used in a more complex expression, where it is wrapped in named capturing groups.

val s = RE("-")
val h = RE("[0-9a-f]")
val uuid = h{8} - s - h{4} - s - h{4} - s - h{4} - s - h{12}
val complexExpression = /* … */ a ~ (uuid \ "uuid1") ~
    b ~ (uuid \ "uuid2") ~ c /* … */

Say we want to match complexExpression elsewhere, without capturing the UUIDs. We can simply transform our capturing uuid groups into non-capturing groups:

val toOther: Rewriter = {
  case Group(_, uuid, _) => uuid.ncg
}
val other = complexExpression map toOther

Now, say we also want uppercase hexadecimal throughout this expression, given that h is used in other places than uuid. We can extend our Rewriter with one more case:

val H = RE("[0-9A-F]")
val toOther: Rewriter = {
  case `h` => H
  case Group(_, uuid, _) => uuid.ncg
}
val other = complexExpression map toOther
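
To make the mechanism concrete, here is a minimal, self-contained sketch of recursive rewriting driven by a PartialFunction. It is a toy model only: Node, Atom, Concat, Capture, NonCapture, rewrite and render are hypothetical names invented for this illustration, not REL's classes, and the bottom-up traversal order is an assumption rather than a description of how REL's map is implemented.

```scala
// Toy model only: these types are hypothetical and are NOT REL's classes.
object RewriteSketch {
  sealed trait Node
  final case class Atom(pattern: String) extends Node
  final case class Concat(left: Node, right: Node) extends Node
  final case class Capture(name: String, inner: Node) extends Node
  final case class NonCapture(inner: Node) extends Node

  type Rewriter = PartialFunction[Node, Node]

  // Bottom-up rewrite (an assumed order): rebuild the node from its
  // rewritten children first, then apply the rewriter if it is defined there.
  def rewrite(node: Node)(rw: Rewriter): Node = {
    val rebuilt: Node = node match {
      case Concat(l, r)      => Concat(rewrite(l)(rw), rewrite(r)(rw))
      case Capture(name, in) => Capture(name, rewrite(in)(rw))
      case NonCapture(in)    => NonCapture(rewrite(in)(rw))
      case a: Atom           => a
    }
    if (rw.isDefinedAt(rebuilt)) rw(rebuilt) else rebuilt
  }

  // Render the toy tree as a regex string.
  def render(node: Node): String = node match {
    case Atom(p)           => p
    case Concat(l, r)      => render(l) + render(r)
    case Capture(name, in) => s"(?<$name>${render(in)})"
    case NonCapture(in)    => s"(?:${render(in)})"
  }

  val hex      = Atom("[0-9a-f]")
  val upperHex = Atom("[0-9A-F]")
  val tree     = Concat(Atom("id="), Capture("uuid", Concat(hex, hex)))

  // Same shape as the Rewriter above: swap one atom, drop the capture.
  val toOther: Rewriter = {
    case `hex`             => upperHex
    case Capture(_, inner) => NonCapture(inner)
  }

  val rewritten = rewrite(tree)(toOther)
}
```

In this sketch, render(tree) gives id=(?<uuid>[0-9a-f][0-9a-f]) and render(rewritten) gives id=(?:[0-9A-F][0-9A-F]). Because the rewriter is partial, any node it does not mention is left untouched, which is what keeps such rewriters short.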

Flavors

Other languages and tools have their own regex flavors, with (sometimes subtle) differences in implementation and with features added or missing relative to Java’s regex flavor. To use our regexes in another flavor, we can apply transformations that yield a compatible regex. This works only up to a point: features that the target flavor neither implements nor allows us to emulate cannot be translated.

Flavors expose two main methods: .express(re: RE) and .translate(re: RE). The first returns a Tuple2[String, List[String]], whose first element is the translated regex string and whose second is the list of group names (in order of appearance), which lets you map names to capturing group indexes (like Scala does) if needed. The second only translates one RE term into another RE term.
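
As an illustration of the name-to-index mapping mentioned above, the following sketch builds such a map from a pair shaped like the result of express. The pair itself is made up for the example (it does not come from a real Flavor), the names GroupIndexSketch, indexOf and extract are hypothetical, and the sketch assumes that every capturing group in the regex string is named and listed.

```scala
import java.util.regex.Pattern

object GroupIndexSketch {
  // Stand-in for the pair returned by express: (regex string, group names).
  // Both values are invented for illustration.
  val expressed: (String, List[String]) =
    ("(\\d{4})-(\\d{2})", List("year", "month"))

  val (regex, groupNames) = expressed

  // Capturing groups are numbered from 1, in order of appearance.
  // Assumption: the regex has no capturing group missing from the list.
  val indexOf: Map[String, Int] =
    groupNames.zipWithIndex.map { case (name, i) => name -> (i + 1) }.toMap

  // Look up each named group by its index on the first match, if any.
  def extract(input: String): Option[Map[String, String]] = {
    val m = Pattern.compile(regex).matcher(input)
    if (m.find()) Some(indexOf.map { case (name, idx) => name -> m.group(idx) })
    else None
  }
}
```

For instance, GroupIndexSketch.extract("released 2013-07") yields Some(Map("year" -> "2013", "month" -> "07")). The same map can be used in a target flavor that only supports numbered groups.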

The following flavors are bundled with REL:

For example, to express a regex in the .NET regex flavor:

val myRegex = ^^ - (α.++ \ "firstWord")
DotNETFlavor.translate(myRegex) // approximately* ^^ - (?>(α.+) \ "firstWord")
DotNETFlavor.express(myRegex)._1 === "\\A(?<firstWord>(?>[a-zA-Z]+))"
DotNETFlavor.express(myRegex)._2.toString === "List(firstWord)"

* approximately because the named capturing group will also have an inline naming strategy (for which there is no short DSL syntax, thus skipped here for the sake of simplicity)
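
The translation above replaces the possessive quantifier (++) with an atomic group ((?>…)). The sketch below, which uses only java.util.regex and no REL code, suggests why that substitution preserves behavior: in Java's flavor, which supports both constructs, the two forms refuse to backtrack in the same way. The object and method names are hypothetical.

```scala
import java.util.regex.Pattern

object AtomicGroupSketch {
  // True when the whole input matches the regex.
  def matchesFully(regex: String, input: String): Boolean =
    Pattern.compile(regex).matcher(input).matches()

  // Greedy: [a-z]+ first takes "abcz", then gives back "z" so the match succeeds.
  val greedy = matchesFully("[a-z]+z", "abcz")

  // Possessive: [a-z]++ takes "abcz" and never gives anything back.
  val possessive = matchesFully("[a-z]++z", "abcz")

  // Atomic group: same refusal to backtrack as the possessive form.
  val atomic = matchesFully("(?>[a-z]+)z", "abcz")
}
```

Here greedy is true while possessive and atomic are both false, so the atomic group is a faithful stand-in for the possessive quantifier on this input.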

But Flavors are not limited to other regex implementations. You can define your own for various uses, e.g.: