Some examples are noted DSL expression → resulting regex.
All assume:
import fr.splayce.rel._
import Implicits._
val a = RE("aa")
val b = RE("bb")
| Operation | REL Syntax | RE Output |
|---|---|---|
| Alternative | a | b |
aa|bb |
| Concatenation (protected) | a ~ b |
(?:aa)(?:bb) |
| Concatenation (unprotected) | a - b |
aabb |
Generally speaking, you should start with protected concatenation. It is harder to read once serialized, but it is far safer from unwanted side-effects when reusing regex parts.
When used in the table below, the dot syntax a.? is recommended for clearer priority.
| Quantifier | Greedy | Reluctant / Lazy | Possessive | Output for (greedy) |
|---|---|---|---|---|
| Option | a.? |
a.?? |
a.?+ |
(?:aa)? |
| ≥ 1 | a.+ |
a.+? |
a.++ |
(?:aa)+ |
| ≥ 0 | a.* |
a.*? |
a.*+ |
(?:aa)* |
| At most | a < 3 |
a.<?(3)* |
a <+ 3 |
(?:aa){0,3} |
| At least | a > 3 |
a >? 3 |
a >+ 3 |
(?:aa){3,} |
| In range | a(1, 3), a{1 to 3} or a{1 -> 3} |
a(1, 3, Reluctant) |
a(1, 3, Possessive) |
(?:aa){1,3} |
| Exactly | a{3} or a(3) |
N/A | N/A | (?:aa){3} |
* For reluctant at-most repeater, dotted form a.<?(3) is mandatory, standalone <? being syntactically significant in Scala (XMLSTART).
| Prefixed form | Dotted form | Output | |
|---|---|---|---|
| Look-ahead | ?=(a) |
a.?= |
(?=aa) |
| Look-behind | ?<=(a) |
a.?<= |
(?<=aa) |
| Negative look-ahead | ?!(a) |
a.?! |
(?!aa) |
| Negative look-behind | ?<!(a) |
a.?<! |
(?<!aa) |
| Type | REL Syntax | Output |
|---|---|---|
| Named capturing | a \ "group_a" |
(aa). |
| Unnamed capturing * | a.g |
(aa) |
| Back-reference | g! |
\1** |
| Non-capturing | a.ncg or a.% |
(?:aa) |
| Non-capturing, with flags | a.ncg("i-d") or "i-d" ?: a |
(?i-d:aa) |
| Atomic | a.ag, ?>(a) or a.?> |
(?>aa) |
* A unique group name is generated internally.
** Back-reference on most recent (i.e. rightmost previous) group g. val g = (a|b).g; g - a - !g → (aa|bb)aa\1
In a named capturing group, the name group_a will be passed to the Regex constructor, and queryable on corresponding Matches. If you export the regex to a flavor that supports inline embedding of capturing group names (like Java 7 or .NET), the name will be included in the output: (?<group_a>aa).
In non-capturing groups, REL tries not to uselessly wrap non-breaking entities — like single characters (a, \u00F0), character classes (\w, [^a-z], \p{Lu}), other groups — in order to produce ever-so-slightly less unreadable output. Non-capturing groups with flags are combined when nested, giving priority to innermost flags: a.ncg("-d").ncg("id") → (?i-d:aa).
A few “constants” (expression terms with no repetitions, capturing groups, or unprotected alternatives) are also predefined. Some of them have a UTF-8 Greek symbol alias for conciseness (import rel.Symbols._ to use them), uppercase for negation. You can add your own by instancing case class RECst(expr).
| Object name | Symbol | Output / Matches |
|---|---|---|
Epsilon |
ε |
Empty string |
Dot |
τ |
. |
MLDot |
ττ |
[\s\S] (will match any char, including line terminators, even when the DOTALL or MULTILINE modes are disabled) |
LineTerminator |
Τ* |
(?:\r\n?|[\u000A-\u000C\u0085\u2028\u2029]) (line terminators, PCRE/Perl’s \R) |
AlphaLower |
none | [a-z] |
AlphaUpper |
none | [A-Z] |
Alpha |
α |
[a-zA-Z] |
NotAlpha |
Α* |
[^a-zA-Z] |
Letter |
λ |
\p{L} (unicode letters, including diacritics) |
NotLetter |
Λ |
\P{L} |
LetterLower |
none | \p{Ll} |
LetterUpper |
none | \p{Lu} |
Digit |
δ |
\d |
NotDigit |
Δ |
\D |
WhiteSpace |
σ |
\s |
NotWhiteSpace |
Σ |
\S |
Word |
μ |
\w (Alpha or _) |
NotWord |
Μ* |
\W |
WordBoundary |
ß |
\b |
NotWordBoundary |
Β* |
\B |
LineBegin |
^ |
^ |
LineEnd |
$ |
$ |
InputBegin |
^^ |
\A |
InputEnd |
$$ |
\z |
* Those are uppercase α/ß/μ/τ, not latin A/B/M/T