Unicode Property Escapes — \p{Letter}, \p{Script=Hiragana}
Use Unicode property escapes (\p{Letter}, \p{Script=...}, \p{Emoji}) to match by Unicode category, script, or property. Requires the u flag in JavaScript.
Detailed Explanation
Unicode Property Escapes
Unicode property escapes (ES2018) let you match characters by Unicode metadata rather than enumerating code points. They require the u flag (the newer v flag also enables them); without it, \p{...} is not treated as a property escape.
Basic Syntax
\p{PropertyName} // matches characters with this property
\P{PropertyName} // matches characters WITHOUT this property
\p{Property=Value} // for properties that take values
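The flag requirement is easy to miss because a pattern without it still compiles. A minimal sketch of the difference (the variable names are illustrative only):

```javascript
// With the u flag, \p{L} and \P{L} are property escapes.
const letter = /\p{L}/u;
const nonLetter = /\P{L}/u;

// Without the u flag, \p is just an escaped "p", so this pattern
// looks for the literal text "p{L}" instead of a letter.
const literal = /\p{L}/;

console.log(letter.test("ж"));      // true
console.log(nonLetter.test("ж"));   // false
console.log(nonLetter.test("7"));   // true
console.log(literal.test("ж"));     // false
console.log(literal.test("p{L}"));  // true
```

Because the flagless version fails silently rather than throwing, a missing u flag tends to show up as "the regex never matches" rather than as a syntax error.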
Common Properties
| Escape | Matches |
|---|---|
| \p{Letter} or \p{L} | Any letter (any script) |
| \p{Number} or \p{N} | Any digit or numeral |
| \p{White_Space} | Any whitespace |
| \p{Punctuation} | Any punctuation |
| \p{Emoji} | Any code point with the Emoji property (this includes the ASCII digits, # and *) |
| \p{Extended_Pictographic} | Pictographic symbols, including code points reserved for future emoji; excludes the ASCII digits, # and * |
| \p{Lowercase} | Lowercase letters |
| \p{Uppercase} | Uppercase letters |
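Two of these properties are wider or narrower than their names suggest. The following sketch probes single characters to show where \p{Emoji} and \p{Extended_Pictographic} disagree, and how \p{N} differs from a plain digit class. The helper names isEmoji and isPictographic are made up for this example.

```javascript
// Hypothetical helpers: test exactly one code point against a property.
const isEmoji = (ch) => /^\p{Emoji}$/u.test(ch);
const isPictographic = (ch) => /^\p{Extended_Pictographic}$/u.test(ch);

console.log(isEmoji("👋"), isPictographic("👋")); // true true
console.log(isEmoji("7"), isPictographic("7"));   // true false
console.log(isEmoji("#"), isPictographic("#"));   // true false

// \p{N} covers every numeric category; \p{Nd} is decimal digits only.
// "٣" is an Arabic-Indic digit, "Ⅷ" is a Roman numeral (a letter-number).
const mixedNumerals = "٣Ⅷ5";
console.log(/^\p{N}+$/u.test(mixedNumerals));  // true
console.log(/^\p{Nd}+$/u.test(mixedNumerals)); // false
```

For "does this text contain an emoji" checks, \p{Extended_Pictographic} is usually the safer starting point, since \p{Emoji} would also flag ordinary digits.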
Script Property
\p{Script=Hiragana} // matches ひらがな
\p{Script=Katakana} // matches カタカナ
\p{Script=Han} // matches 漢字 (Chinese characters / Kanji)
\p{Script=Latin} // matches abc, é
\p{Script=Cyrillic} // matches Привет
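Script escapes can also answer "which writing systems appear in this text?". The sketch below is one possible approach, not a complete detector: CANDIDATE_SCRIPTS is a hypothetical, hand-picked list, and detectScripts is an illustrative name.

```javascript
// Hypothetical candidate list; extend it with any Script value the engine supports.
const CANDIDATE_SCRIPTS = ["Latin", "Cyrillic", "Han", "Hiragana", "Katakana"];

// Build one non-global regex per script up front. The names are hard-coded,
// so interpolating them into the pattern source is safe here.
const SCRIPT_TESTS = CANDIDATE_SCRIPTS.map((name) => ({
  name,
  pattern: new RegExp(`\\p{Script=${name}}`, "u"),
}));

// Returns the candidate scripts that occur at least once, in list order.
function detectScripts(text) {
  return SCRIPT_TESTS
    .filter(({ pattern }) => pattern.test(text))
    .map(({ name }) => name);
}

console.log(detectScripts("hello 世界 こんにちは")); // ["Latin", "Han", "Hiragana"]
console.log(detectScripts("Привет, カタカナ"));      // ["Cyrillic", "Katakana"]
```

Spaces, digits and most punctuation belong to the Common script, so they do not contribute to the result.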
Tested Examples
| Pattern | Input | Match |
|---|---|---|
| \p{L}+ | "hello world 123" | hello, world |
| \p{Script=Hiragana}+ | "今日は" | は |
| \p{Script=Han}+ | "今日は" | 今日 |
| \p{Emoji}+ | "hi 👋✨" | 👋✨ |
| \P{ASCII}+ | "hello 世界" | 世界 |
Validating Multilingual Names
A name field that should accept letters in any script:
^[\p{L}\p{M}\s'-]+$
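\p{M} covers combining marks (accents and diacritics), so names typed with decomposed characters still pass. In JavaScript the pattern needs the u flag. Note that \s also admits tabs and line breaks; use a literal space instead if those should be rejected.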
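As a rough sketch of how the pattern might be wired into a validator (isValidName and NAME_PATTERN are illustrative names, and real name policies are usually more nuanced than a single regex):

```javascript
// The name pattern from above, compiled with the u flag.
const NAME_PATTERN = /^[\p{L}\p{M}\s'-]+$/u;

// Hypothetical helper: true when every character is a letter, a combining
// mark, whitespace, an apostrophe or a hyphen, and the input is non-empty.
function isValidName(input) {
  return NAME_PATTERN.test(input);
}

console.log(isValidName("O'Brien"));     // true
console.log(isValidName("Jean-Luc"));    // true
console.log(isValidName("山田 太郎"));    // true
console.log(isValidName("Jose\u0301"));  // true  ("e" followed by a combining acute accent)
console.log(isValidName("R2-D2"));       // false (digits are not letters)
console.log(isValidName(""));            // false (+ requires at least one character)
```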
Splitting Mixed-Script Text
"hello世界こんにちは".match(/\p{L}+/gu)
// ["hello世界こんにちは"] (a single match: \p{L} accepts letters from every script, so nothing separates the runs)
To separate scripts:
"hello世界".match(/\p{Script=Latin}+|\p{Script=Han}+/gu)
// ["hello", "世界"]
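When the caller also needs to know which script each run belongs to, named capture groups are one way to label the matches. This is a sketch under a few assumptions: the group names (latin, han, kana) and the function name splitByScript are invented here, Hiragana and Katakana are deliberately merged into one run type, and String.prototype.matchAll needs a newer runtime than property escapes themselves (Node 12+).

```javascript
// One named group per run type; extend the alternation for more scripts.
const RUN_PATTERN =
  /(?<latin>\p{Script=Latin}+)|(?<han>\p{Script=Han}+)|(?<kana>[\p{Script=Hiragana}\p{Script=Katakana}]+)/gu;

// Returns [{ script, text }, ...] in order of appearance.
// Characters outside the listed scripts (spaces, digits, ...) are skipped.
function splitByScript(text) {
  const runs = [];
  for (const match of text.matchAll(RUN_PATTERN)) {
    // Exactly one named group is defined for each match.
    const [script] = Object.entries(match.groups).find(([, value]) => value !== undefined);
    runs.push({ script, text: match[0] });
  }
  return runs;
}

console.log(splitByScript("hello世界こんにちは"));
// [ { script: "latin", text: "hello" },
//   { script: "han", text: "世界" },
//   { script: "kana", text: "こんにちは" } ]
```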
Browser Support
Supported in all modern engines: Node 10+, Chrome 64+, Safari 11.1+, Firefox 78+. For older targets, transpile with Babel or fall back to explicit character ranges.
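If transpiling is not an option, support can be probed at runtime. A minimal sketch, assuming a hypothetical helper named supportsPropertyEscapes and a deliberately narrow Latin-only fallback range chosen purely for illustration:

```javascript
// Hypothetical feature check. The RegExp constructor is used instead of a
// regex literal so that engines without support throw at call time rather
// than failing to parse the whole script.
function supportsPropertyEscapes() {
  try {
    return new RegExp("\\p{L}", "u").test("a");
  } catch (err) {
    return false;
  }
}

// Assumption: the fallback is only a rough Latin approximation
// (the range also contains a few non-letters such as U+00D7).
var letterPattern = supportsPropertyEscapes()
  ? new RegExp("\\p{L}", "u")
  : /[A-Za-z\u00C0-\u024F]/;

console.log(letterPattern.test("a"));
```

The sketch sticks to older syntax (var, a named catch parameter) so that the check itself still parses on the engines it is meant to detect.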
Why It Matters
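Without property escapes, internationalized validation requires hand-maintained character lists, which inevitably drift behind Unicode updates. Property escapes resolve against the Unicode data shipped with the engine, so patterns pick up newly added characters when the runtime is updated, with no change to the regex.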
Use Case
Validating user names in any script, splitting mixed-language text into runs of consistent script, detecting which writing systems appear in a document, or building emoji-aware tokenizers.
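As an illustration of the last use case, here is a deliberately simple tokenizer sketch. TOKEN_PATTERN and tokenize are hypothetical names, and the approach treats every pictographic code point as its own token, so ZWJ sequences, skin-tone modifiers and flag sequences are not handled.

```javascript
// Words (letters plus combining marks), runs of numerals, or single pictographs.
const TOKEN_PATTERN = /[\p{L}\p{M}]+|\p{N}+|\p{Extended_Pictographic}/gu;

// Returns an array of tokens; whitespace and punctuation are dropped.
function tokenize(text) {
  return text.match(TOKEN_PATTERN) || [];
}

console.log(tokenize("hi 👋✨ ok 42")); // ["hi", "👋", "✨", "ok", "42"]
console.log(tokenize("..."));           // []
```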