Replace runs of any Unicode whitespace with a single space.
Split CamelCase words.
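
As a minimal sketch of CamelCase splitting, assuming the split happens at each lowercase-or-digit to uppercase boundary (the function name `split_camel_case` is hypothetical, and the actual cleaner may treat acronyms or digits differently):

```python
import re

# Zero-width boundary: preceded by a lowercase letter or digit, followed by an
# uppercase letter. Assumption: ASCII letters only; acronyms such as "HTTP"
# are left intact because no lowercase letter precedes their capitals.
_CAMEL_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")


def split_camel_case(text: str) -> str:
    """Insert a space at every CamelCase boundary (hypothetical helper)."""
    return _CAMEL_BOUNDARY.sub(" ", text)
```

With this rule, `split_camel_case("splitCamelCase")` yields `"split Camel Case"`.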
Pseudo ASCII folding: remove diacritical marks (and some common variants and ligatures) from characters.
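
One common way to approximate this kind of folding is Unicode compatibility decomposition followed by removal of combining marks. The sketch below uses Python's standard `unicodedata` module; the name `fold_to_ascii` is hypothetical, and this is not necessarily how the cleaner itself is implemented. Characters without a decomposition (for example `ø` or `ß`) pass through unchanged and would need an explicit mapping table.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Strip diacritics and expand compatibility ligatures (hypothetical helper)."""
    # NFKD splits "é" into "e" + U+0301 and expands ligatures such as U+FB01 to "fi".
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks left behind by the decomposition.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Note that NFKD also rewrites other compatibility characters (fullwidth forms, superscripts), which may be broader than the behavior described here.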
Normalize frequent Unicode double quotes to the ASCII quotation mark U+0022 (").
Normalize CJK Fullwidth characters to their ASCII equivalents.
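
The fullwidth forms U+FF01 to U+FF5E mirror ASCII U+0021 to U+007E at a fixed offset of 0xFEE0, so the mapping can be sketched with a translation table. The name `normalize_fullwidth` is hypothetical; whether the cleaner also maps the ideographic space U+3000 is not stated, so it is left out here.

```python
# Fullwidth "!" (U+FF01) through fullwidth "~" (U+FF5E) map to ASCII by
# subtracting 0xFEE0 from the code point.
_FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}


def normalize_fullwidth(text: str) -> str:
    """Replace fullwidth characters with ASCII equivalents (hypothetical helper)."""
    return text.translate(_FULLWIDTH_TO_ASCII)
```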
No-op cleaner
Normalize all Unicode line breaks and vertical tabs to the ASCII newline U+000A (\n).
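
A minimal sketch of line-break normalization, assuming the set of breaks is CR LF, CR, vertical tab, form feed, NEL (U+0085), line separator (U+2028) and paragraph separator (U+2029). The exact set handled by the cleaner is not listed in this text, and `normalize_line_breaks` is a hypothetical name.

```python
import re

# CR LF is matched first so that it becomes a single "\n" rather than two.
_LINE_BREAKS = re.compile("\r\n|[\r\x0b\x0c\x85\u2028\u2029]")


def normalize_line_breaks(text: str) -> str:
    """Map the assumed set of Unicode line breaks to "\\n" (hypothetical helper)."""
    return _LINE_BREAKS.sub("\n", text)
```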
Transform text to lowercase.
Combines SingleQuoteNormalizer and DoubleQuoteNormalizer
Normalize frequent Unicode single quotes and apostrophes to the ASCII apostrophe U+0027 (').
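
The single and double quote normalizations, and their combination, can be sketched with one translation table. Which characters count as "frequent" is not specified here, so the sets below (U+2018 to U+201B and U+201C to U+201F) are an assumption, and `normalize_quotes` is a hypothetical name.

```python
# Assumed "frequent" variants: left/right, low-9 and high-reversed-9 quotes.
_SINGLE_QUOTES = dict.fromkeys(map(ord, "\u2018\u2019\u201a\u201b"), "'")
_DOUBLE_QUOTES = dict.fromkeys(map(ord, "\u201c\u201d\u201e\u201f"), '"')

# Combined table, analogous to applying both normalizers in sequence.
_ALL_QUOTES = {**_SINGLE_QUOTES, **_DOUBLE_QUOTES}


def normalize_quotes(text: str) -> str:
    """Map typographic quotes to ASCII ' and " (hypothetical helper)."""
    return text.translate(_ALL_QUOTES)
```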
Trim text
Replace multiple instances of regular whitespace (\s+) with a single space.
Normalize all Unicode spaces and horizontal tabs to the ASCII space U+0020.
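
The whitespace-related cleaners differ mainly in which characters they match. As a rough sketch in Python (all function names are hypothetical): in `str` patterns `\s` matches Unicode whitespace by default, while the `re.ASCII` flag restricts it to space, tab, newline, carriage return, form feed and vertical tab. The space-normalization set below covers the horizontal tab plus the Unicode space separators, which is an assumption about what "all Unicode spaces" includes.

```python
import re

# Horizontal tab plus the Unicode space separators (category Zs) other than U+0020.
_UNICODE_SPACES = re.compile("[\t\u00a0\u1680\u2000-\u200a\u202f\u205f\u3000]")
_ASCII_WS_RUN = re.compile(r"\s+", re.ASCII)  # "regular" whitespace only
_UNICODE_WS_RUN = re.compile(r"\s+")          # any Unicode whitespace


def normalize_spaces(text: str) -> str:
    """Map each Unicode space or tab to U+0020, one for one."""
    return _UNICODE_SPACES.sub(" ", text)


def collapse_ascii_whitespace(text: str) -> str:
    """Collapse runs of ASCII whitespace into a single space."""
    return _ASCII_WS_RUN.sub(" ", text)


def collapse_unicode_whitespace(text: str) -> str:
    """Collapse runs of any Unicode whitespace into a single space."""
    return _UNICODE_WS_RUN.sub(" ", text)
```

The ASCII variant leaves a no-break space (U+00A0) untouched, whereas the Unicode variant folds it into the surrounding run.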