Replace each run of Unicode whitespace characters with a single space.
Split CamelCase words.
Pseudo ASCII folding: remove diacritical marks (and some common variants and ligatures) from characters.
Normalize frequent Unicode double quotes to ASCII quotation mark U+0022 / ".
Normalize CJK Fullwidth characters to their ASCII equivalents.
No-op cleaner: returns the text unchanged.
Normalize all Unicode line breaks and vertical tabs to ASCII new line U+000A / \n.
Transform text to lowercase.
Combines SingleQuoteNormalizer and DoubleQuoteNormalizer.
Normalize frequent Unicode single quotes / apostrophes to ASCII apostrophe U+0027 / '.
Trim leading and trailing whitespace from the text.
Replace each run of regular whitespace (\s+) with a single space.
Normalize all Unicode spaces and horizontal tabs to the ASCII space U+0020.
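
The descriptions above can be illustrated with a minimal Python sketch. It is not the library's implementation: every function name, the character sets chosen for quotes and line breaks, and the clean() helper are assumptions made for illustration only. The folding step uses plain NFD decomposition, so unlike the cleaner described above it does not handle ligatures or other variants.

```python
import re
import unicodedata

# Assumed (hypothetical) sets of "frequent" quote characters.
DOUBLE_QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u201e": '"'})
SINGLE_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201a": "'"})


def collapse_unicode_whitespace(text: str) -> str:
    # For str patterns, \s matches Unicode whitespace (e.g. U+00A0) by default.
    return re.sub(r"\s+", " ", text)


def split_camel_case(text: str) -> str:
    # Insert a space where a lowercase letter or digit precedes an uppercase letter.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)


def fold_diacritics(text: str) -> str:
    # Decompose, then drop combining marks: "é" becomes "e".
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_quotes(text: str) -> str:
    # Combines the single-quote and double-quote mappings.
    return text.translate(DOUBLE_QUOTES).translate(SINGLE_QUOTES)


def normalize_fullwidth(text: str) -> str:
    # Fullwidth forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII U+0021..U+007E.
    return "".join(
        chr(ord(ch) - 0xFEE0) if 0xFF01 <= ord(ch) <= 0xFF5E else ch
        for ch in text
    )


def normalize_line_breaks(text: str) -> str:
    # Assumed set: CRLF, CR, vertical tab, form feed, NEL, LS, PS.
    return re.sub(r"\r\n|[\r\x0b\x0c\x85\u2028\u2029]", "\n", text)


def clean(text: str, cleaners) -> str:
    # Apply each cleaner in order; an empty list behaves as a no-op.
    for cleaner in cleaners:
        text = cleaner(text)
    return text


if __name__ == "__main__":
    pipeline = [
        normalize_fullwidth,
        normalize_quotes,
        fold_diacritics,
        split_camel_case,
        collapse_unicode_whitespace,
        str.strip,
        str.lower,
    ]
    print(clean("  Caf\u00e9\u00a0 \u201cFooBar\u201d ", pipeline))
```

Order matters in such a chain: in this sketch, lowercasing runs last because the CamelCase split relies on case boundaries, and trimming runs after whitespace collapsing so that at most one leading or trailing space remains to remove.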