DSL Syntax

Some examples are noted DSL expression → resulting regex.
All assume:

import fr.splayce.rel._
import Implicits._
val a = RE("aa")
val b = RE("bb")

Operators

Binary operators

Operation	REL Syntax	RE Output
Alternative	`a \| b`	`aa\|bb`
Concatenation (protected)	`a ~ b`	`(?:aa)(?:bb)`
Concatenation (unprotected)	`a - b`	`aabb`

Generally speaking, you should start with protected concatenation. It is harder to read once serialized, but it is far safer from unwanted side-effects when reusing regex parts.

Quantifiers / repeaters

When used in the table below, the dot syntax a.? is recommended for clearer priority.

Quantifier	Greedy	Reluctant / Lazy	Possessive	Output for (greedy)
Option	`a.?`	`a.??`	`a.?+`	`(?:aa)?`
≥ 1	`a.+`	`a.+?`	`a.++`	`(?:aa)+`
≥ 0	`a.*`	`a.*?`	`a.*+`	`(?:aa)*`
At most	`a < 3`	`a.<?(3)`*	`a <+ 3`	`(?:aa){0,3}`
At least	`a > 3`	`a >? 3`	`a >+ 3`	`(?:aa){3,}`
In range	`a(1, 3)`, `a{1 to 3}` or `a{1 -> 3}`	`a(1, 3, Reluctant)`	`a(1, 3, Possessive)`	`(?:aa){1,3}`
Exactly	`a{3}` or `a(3)`	N/A	N/A	`(?:aa){3}`

* For reluctant at-most repeater, dotted form a.<?(3) is mandatory, standalone <? being syntactically significant in Scala (XMLSTART).

Look-around

	Prefixed form	Dotted form	Output
Look-ahead	`?=(a)`	`a.?=`	`(?=aa)`
Look-behind	`?<=(a)`	`a.?<=`	`(?<=aa)`
Negative look-ahead	`?!(a)`	`a.?!`	`(?!aa)`
Negative look-behind	`?<!(a)`	`a.?<!`	`(?<!aa)`

Grouping

Type	REL Syntax	Output
Named capturing	`a \ "group_a"`	`(aa)`.
Unnamed capturing *	`a.g`	`(aa)`
Back-reference	`g!`	`\1`**
Non-capturing	`a.ncg` or `a.%`	`(?:aa)`
Non-capturing, with flags	`a.ncg("i-d")` or `"i-d" ?: a`	`(?i-d:aa)`
Atomic	`a.ag`, `?>(a)` or `a.?>`	`(?>aa)`

* A unique group name is generated internally.
** Back-reference on most recent (i.e. rightmost previous) group g. val g = (a|b).g; g - a - !g → (aa|bb)aa\1

In a named capturing group, the name group_a will be passed to the Regex constructor, and queryable on corresponding Matches. If you export the regex to a flavor that supports inline embedding of capturing group names (like Java 7 or .NET), the name will be included in the output: (?<group_a>aa).

In non-capturing groups, REL tries not to uselessly wrap non-breaking entities — like single characters (a, \u00F0), character classes (\w, [^a-z], \p{Lu}), other groups — in order to produce ever-so-slightly less unreadable output. Non-capturing groups with flags are combined when nested, giving priority to innermost flags: a.ncg("-d").ncg("id") → (?i-d:aa).

Constants

A few “constants” (expression terms with no repetitions, capturing groups, or unprotected alternatives) are also predefined. Some of them have a UTF-8 Greek symbol alias for conciseness (import rel.Symbols._ to use them), uppercase for negation. You can add your own by instancing case class RECst(expr).

Object name	Symbol	Output / Matches
`Epsilon`	`ε`	Empty string
`Dot`	`τ`	`.`
`MLDot`	`ττ`	`[\s\S]` (will match any char, including line terminators, even when the `DOTALL` or `MULTILINE` modes are disabled)
`LineTerminator`	`Τ`*	`(?:\r\n?\|[\u000A-\u000C\u0085\u2028\u2029])` (line terminators, PCRE/Perl’s `\R`)
`AlphaLower`	none	`[a-z]`
`AlphaUpper`	none	`[A-Z]`
`Alpha`	`α`	`[a-zA-Z]`
`NotAlpha`	`Α`*	`[^a-zA-Z]`
`Letter`	`λ`	`\p{L}` (unicode letters, including diacritics)
`NotLetter`	`Λ`	`\P{L}`
`LetterLower`	none	`\p{Ll}`
`LetterUpper`	none	`\p{Lu}`
`Digit`	`δ`	`\d`
`NotDigit`	`Δ`	`\D`
`WhiteSpace`	`σ`	`\s`
`NotWhiteSpace`	`Σ`	`\S`
`Word`	`μ`	`\w` (`Alpha` or `_`)
`NotWord`	`Μ`*	`\W`
`WordBoundary`	`ß`	`\b`
`NotWordBoundary`	`Β`*	`\B`
`LineBegin`	`^`	`^`
`LineEnd`	`$`	`$`
`InputBegin`	`^^`	`\A`
`InputEnd`	`$$`	`\z`

* Those are uppercase α/ß/μ/τ, not latin A/B/M/T

Next Page ❧ Extractors

REL