Unicode Property Escapes — \p{Letter}, \p{Script=Hiragana}

Use Unicode property escapes (\p{Letter}, \p{Script=...}, \p{Emoji}) to match by Unicode category, script, or property. Requires the u flag in JavaScript.

Detailed Explanation

Unicode Property Escapes

Unicode property escapes (ES2018) let you match characters by Unicode metadata rather than enumerating code points. They require the u flag (or the newer v flag); without it, \p is treated as an escaped literal p.

Basic Syntax

\p{PropertyName}      // matches characters with this property
\P{PropertyName}      // matches characters WITHOUT this property
\p{Property=Value}    // for properties that take values
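A minimal sketch of the three forms in use, and of what happens when the u flag is left off. The variable names are illustrative only.

```javascript
// \p{...} only acts as a property escape when the u flag is set.
const withFlag = /\p{Lu}/u;     // Lu = Uppercase_Letter
const withoutFlag = /\p{Lu}/;   // no u flag: matches the literal text "p{Lu}"

console.log(withFlag.test("É"));         // true
console.log(withoutFlag.test("É"));      // false
console.log(withoutFlag.test("p{Lu}"));  // true

// \P{...} is the negation: every character that is NOT a letter.
const nonLetters = "a1-b2".match(/\P{L}/gu);
console.log(nonLetters);                 // ["1", "-", "2"]

// Property=Value form, for properties that take a value.
const greek = /\p{Script=Greek}/u;
console.log(greek.test("λ"));            // true
```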

Common Properties

Escape                       Matches
\p{Letter} or \p{L}          Any letter (any script)
\p{Number} or \p{N}          Any digit or numeral (including Ⅷ and ½)
\p{White_Space}              Any whitespace
\p{Punctuation} or \p{P}     Any punctuation
\p{Emoji}                    Any code point with the Emoji property (this includes the digits 0-9, # and *)
\p{Extended_Pictographic}    Pictographic symbols, including code points reserved for future emoji (excludes digits, # and *)
\p{Lowercase}                Lowercase characters
\p{Uppercase}                Uppercase characters
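The following sketch applies a few of these properties to one sample string. It also shows a common surprise with \p{Emoji}: ASCII digits carry the Emoji property, so \p{Extended_Pictographic} is often the safer test for "looks like an emoji". The sample text is made up for illustration.

```javascript
const sample = "Привет, мир 2024!";

const letters = sample.match(/\p{L}+/gu);
const numbers = sample.match(/\p{N}+/gu);
const punctuation = sample.match(/\p{P}/gu);

console.log(letters);      // ["Привет", "мир"]
console.log(numbers);      // ["2024"]
console.log(punctuation);  // [",", "!"]

// Digits have the Emoji property because they are the base of keycap sequences.
const digitIsEmoji = /\p{Emoji}/u.test("7");
const digitIsPictographic = /\p{Extended_Pictographic}/u.test("7");
const waveIsPictographic = /\p{Extended_Pictographic}/u.test("👋");

console.log(digitIsEmoji);         // true
console.log(digitIsPictographic);  // false
console.log(waveIsPictographic);   // true
```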

Script Property

\p{Script=Hiragana}    // matches ひらがな
\p{Script=Katakana}    // matches カタカナ
\p{Script=Han}         // matches 漢字 (Chinese characters / Kanji)
\p{Script=Latin}       // matches abc, é
\p{Script=Cyrillic}    // matches Привет
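A short sketch of the Script property in practice. It uses the sc alias and the short script name Hira, and it goes one step beyond the list above by showing Script_Extensions, which is useful for characters shared between scripts. Treat the choice of the prolonged sound mark as an illustrative example.

```javascript
// Script can be abbreviated to sc, and script names have short aliases too.
const hiragana = /^\p{Script=Hiragana}+$/u;
const hiraganaShort = /^\p{sc=Hira}+$/u;

console.log(hiragana.test("ひらがな"));       // true
console.log(hiraganaShort.test("ひらがな"));  // true
console.log(hiragana.test("カタカナ"));       // false

// The prolonged sound mark "ー" (U+30FC) has Script=Common,
// but Script_Extensions lists it for both kana scripts.
const strictKatakana = /\p{Script=Katakana}/u.test("ー");
const extendedKatakana = /\p{Script_Extensions=Katakana}/u.test("ー");

console.log(strictKatakana);    // false
console.log(extendedKatakana);  // true
```

If a pattern built on Script unexpectedly breaks words apart at shared characters like this one, Script_Extensions (alias scx) may be the better fit.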

Tested Examples

Pattern                 Input                Match
\p{L}+                  "hello world 123"    hello, world
\p{Script=Hiragana}+    "今日は"             は
\p{Script=Han}+         "今日は"             今日
\p{Emoji}+              "hi 👋✨"            👋✨
\P{ASCII}+              "hello 世界"         世界

Validating Multilingual Names

A name field that should accept letters in any script:

^[\p{L}\p{M}\s'-]+$

\p{M} covers combining marks, so names typed in decomposed form (a base letter followed by a separate accent or diacritic) still match.
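A minimal sketch of the pattern above wrapped in a validator. The function name isValidName and the sample names are hypothetical; a real form would likely add length limits and trimming.

```javascript
// Same pattern as above, with the required u flag.
const NAME_PATTERN = /^[\p{L}\p{M}\s'-]+$/u;

// Hypothetical helper: true when every character is a letter,
// a combining mark, whitespace, an apostrophe, or a hyphen.
function isValidName(name) {
  return NAME_PATTERN.test(name);
}

console.log(isValidName("Mary O'Connor"));  // true
console.log(isValidName("Jean-Luc"));       // true
console.log(isValidName("山田 太郎"));      // true
console.log(isValidName("R2-D2"));          // false (digits are not letters)
console.log(isValidName(""));               // false (+ needs at least one character)

// "Jose" followed by U+0301 COMBINING ACUTE ACCENT: the mark is \p{M}, not \p{L}.
const decomposed = "Jose\u0301";
console.log(isValidName(decomposed));                  // true
console.log(/^[\p{L}\s'-]+$/u.test(decomposed));       // false without \p{M}
```

Note that \s also admits tabs and line breaks; swapping it for a literal space is a reasonable tightening if single-line names are expected.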

Splitting Mixed-Script Text

"hello世界こんにちは".match(/\p{L}+/gu)
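// ["hello世界こんにちは"]

\p{L}+ returns a single run here, because every character is a letter regardless of script. To separate scripts, alternate on the Script property: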

"hello世界".match(/\p{Script=Latin}+|\p{Script=Han}+/gu)
// ["hello", "世界"]
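Building on the alternation above, this sketch labels each run with its script by using named capture groups. The function name splitByScript and the set of three scripts are assumptions for illustration; text in any other script is simply skipped.

```javascript
// One named group per script; exactly one group is defined for each match.
const SCRIPT_RUNS =
  /(?<latin>\p{Script=Latin}+)|(?<han>\p{Script=Han}+)|(?<hiragana>\p{Script=Hiragana}+)/gu;

// Hypothetical helper: returns [{ script, text }, ...] in document order.
function splitByScript(text) {
  const runs = [];
  for (const match of text.matchAll(SCRIPT_RUNS)) {
    // Find the name of the group that actually matched.
    const [script] = Object.entries(match.groups).find(
      ([, value]) => value !== undefined
    );
    runs.push({ script, text: match[0] });
  }
  return runs;
}

const runs = splitByScript("hello世界こんにちは");
console.log(runs);
// [
//   { script: "latin", text: "hello" },
//   { script: "han", text: "世界" },
//   { script: "hiragana", text: "こんにちは" }
// ]
```

String.prototype.matchAll is newer than property escapes themselves (ES2020), so this version assumes a correspondingly recent engine.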

Browser Support

Supported in all modern engines. Node 10+, Chrome 64+, Safari 11.1+, Firefox 78+. For older targets, transpile with Babel or fall back to explicit character ranges.
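One way to choose between the two at runtime is feature detection. The sketch below builds the pattern with the RegExp constructor so that an unsupporting engine throws at call time instead of failing to parse the whole script. The helper name and the fallback range are assumptions; the fallback is a rough Latin-only approximation, not an equivalent of \p{L}.

```javascript
// Hypothetical helper: true when the engine accepts \p{...} with the u flag.
function supportsPropertyEscapes() {
  try {
    new RegExp("\\p{L}", "u");
    return true;
  } catch (error) {
    return false;
  }
}

// Pick the real property escape when available, else an explicit range.
const LETTERS = supportsPropertyEscapes()
  ? new RegExp("\\p{L}+", "gu")
  : /[A-Za-z\u00C0-\u024F]+/g; // illustrative fallback: basic and extended Latin only

console.log(supportsPropertyEscapes());
console.log("hello world".match(LETTERS));  // ["hello", "world"]
```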

Why It Matters

Without property escapes, internationalized validation requires hand-maintained character lists, which inevitably drift behind Unicode updates. Property escapes are backed by the Unicode data built into the engine, so they pick up new characters as engines adopt newer Unicode versions.

Use Case

Validating user names in any script, splitting mixed-language text into runs of consistent script, detecting which writing systems appear in a document, or building emoji-aware tokenizers.
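As a sketch of the third use case, the function below reports which writing systems appear in a string. The name detectScripts and the short list of candidate scripts are assumptions for illustration; a real detector would cover whichever scripts matter to the application.

```javascript
// Candidate scripts to look for (illustrative subset).
const SCRIPTS = ["Latin", "Cyrillic", "Han", "Hiragana", "Katakana"];

// Hypothetical helper: returns the candidate scripts that occur in the text.
function detectScripts(text) {
  return SCRIPTS.filter((name) =>
    new RegExp(`\\p{Script=${name}}`, "u").test(text)
  );
}

console.log(detectScripts("Tokyo 東京 トウキョウ"));  // ["Latin", "Han", "Katakana"]
console.log(detectScripts("Привет"));                 // ["Cyrillic"]
console.log(detectScripts("12345"));                  // []
```

Digits, spaces, and most punctuation have Script=Common, which is why the last call reports no scripts.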
