Regex to Extract Emoji from Text — Unicode Property Escapes
Regex to detect and extract emoji using Unicode property escapes \p{Extended_Pictographic}. Handles ZWJ sequences, skin tones, and flag emoji.
Detailed Explanation
Extracting Emoji from Text
Emoji detection looks simple, but many emoji are multi-codepoint sequences, and matching them correctly requires Unicode property escapes. JavaScript has supported property escapes since ES2018, and current engines accept \p{Extended_Pictographic} when the u flag is set.
Single Emoji
\p{Extended_Pictographic}
(Use the u flag.) Matches single-codepoint emoji like 😀 and 🎉.
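As a minimal sketch of the single-codepoint pattern in use (the helper name extractSingleEmoji is hypothetical, not part of any library):

```javascript
// Single-codepoint pattern from the text: g finds every match, u enables \p{...}.
const SINGLE_EMOJI = /\p{Extended_Pictographic}/gu;

// Hypothetical helper: returns every single-codepoint match, in order.
function extractSingleEmoji(text) {
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return text.match(SINGLE_EMOJI) ?? [];
}

console.log(extractSingleEmoji("hi \u{1F600}, party \u{1F389}")); // [ '😀', '🎉' ]
console.log(extractSingleEmoji("no emoji here"));                 // []
```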
Including Skin-Tone Modifiers and ZWJ Sequences
Emoji like 👩‍💻 (woman technologist) are sequences of several emoji joined by zero-width joiners (ZWJ, \u200D); this one is 👩 + ZWJ + 💻. To capture them as one match:
\p{Extended_Pictographic}(?:\u200D\p{Extended_Pictographic})*
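The joiner is invisible, so it helps to look at the code points directly. The sketch below builds the woman technologist sequence from explicit escapes (so the ZWJ cannot be lost in copy and paste) and checks that the ZWJ-aware pattern treats it as a single match; the variable names are illustrative only.

```javascript
// 👩 (U+1F469) + ZWJ (U+200D) + 💻 (U+1F4BB)
const womanTechnologist = "\u{1F469}\u200D\u{1F4BB}";

// Spreading a string iterates by code point, not by UTF-16 code unit.
const codePoints = [...womanTechnologist].map(
  (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(codePoints); // [ 'U+1F469', 'U+200D', 'U+1F4BB' ]

const SINGLE = /\p{Extended_Pictographic}/gu;
const ZWJ_AWARE = /\p{Extended_Pictographic}(?:\u200D\p{Extended_Pictographic})*/gu;

console.log(womanTechnologist.match(SINGLE).length);    // 2 (👩 and 💻 separately)
console.log(womanTechnologist.match(ZWJ_AWARE).length); // 1 (the whole sequence)
```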
For broader coverage, including emoji modifiers (skin tones), variation selectors, and flags (pairs of regional indicators, which \p{Extended_Pictographic} does not match on its own):
(?:\p{Regional_Indicator}{2}|\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?)*)
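A long pattern like this is easier to read when assembled from named pieces. The following is a sketch under stated assumptions: the names ELEMENT, EMOJI_SEQUENCE, and extractEmoji are hypothetical, and the \p{Regional_Indicator}{2} branch is an explicit addition for flags, since regional indicator symbols are not Extended_Pictographic.

```javascript
// One emoji element: base, optional variation selector or keycap mark, optional skin tone.
const ELEMENT = String.raw`\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?`;

// Assumption: a flag is matched as a pair of regional indicators, tried before the main branch.
const EMOJI_SEQUENCE = new RegExp(
  String.raw`\p{Regional_Indicator}{2}|${ELEMENT}(?:\u200D${ELEMENT})*`,
  "gu"
);

// Hypothetical helper: returns each emoji sequence as one string.
function extractEmoji(text) {
  return text.match(EMOJI_SEQUENCE) ?? [];
}

console.log(extractEmoji("thumbs up: \u{1F44D}\u{1F3FD}"));            // [ '👍🏽' ]
console.log(extractEmoji("\u{1F1EF}\u{1F1F5} Japan"));                 // [ '🇯🇵' ]
console.log(extractEmoji("\u{1F469}\u200D\u{1F4BB} is a developer"));  // [ '👩‍💻' ]
```

Keycap sequences such as 1️⃣ start with an ASCII digit, which is not Extended_Pictographic, so this sketch does not match them.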
Tested Examples
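| Input | Single | Sequence-Aware (full pattern) |
|---|---|---|
| "hi 😀" | 😀 | 😀 |
| "👩‍💻 is a developer" | 👩, 💻 | 👩‍💻 |
| "🇯🇵 Japan" | no match (regional indicators are not Extended_Pictographic) | 🇯🇵, but only when the pattern includes a \p{Regional_Indicator}{2} alternative |
| "thumbs up: 👍🏽" | 👍 (the modifier 🏽 is not Extended_Pictographic) | 👍🏽 |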
Counting Visible Emoji
JavaScript’s text.length counts UTF-16 code units, not user-perceived characters, so a single emoji can report a length of 2 or more. Use a sequence-aware regex to count emoji:
const count = [...text.matchAll(
/\p{Extended_Pictographic}(?:\u200D\p{Extended_Pictographic})*/gu
)].length;
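To make the gap between length and the emoji count concrete, here is a small sketch wrapping the same pattern in a hypothetical countEmoji helper:

```javascript
const ZWJ_SEQUENCE = /\p{Extended_Pictographic}(?:\u200D\p{Extended_Pictographic})*/gu;

// Hypothetical helper: counts ZWJ-joined sequences as one emoji each.
function countEmoji(text) {
  return [...text.matchAll(ZWJ_SEQUENCE)].length;
}

// Family emoji 👨‍👩‍👧: three emoji joined by two ZWJs.
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";

console.log(family.length);                                       // 8 UTF-16 code units
console.log(countEmoji(family));                                  // 1 visible emoji
console.log(countEmoji("good \u{1F44D} great \u{1F389}"));        // 2
```

This pattern does not count flags, because regional indicators are not Extended_Pictographic; a \p{Regional_Indicator}{2} alternative would be needed for those.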
Stripping Emoji
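// Remove whole sequences so that no orphaned ZWJ, variation selector, or skin-tone modifier is left behind.
text.replace(/\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?)*/gu, "")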
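Replacing only single code points with /\p{Extended_Pictographic}/gu removes the pictographs but leaves the invisible characters between them in the output. The sketch below shows that leftover and a sequence-aware alternative; stripEmoji is a hypothetical helper name.

```javascript
const womanTechnologist = "\u{1F469}\u200D\u{1F4BB}"; // 👩 + ZWJ + 💻

// Single-codepoint replace: both pictographs go, the joiner stays.
const leftover = womanTechnologist.replace(/\p{Extended_Pictographic}/gu, "");
console.log(leftover === "\u200D"); // true: an invisible ZWJ remains

// Sequence-aware replace: each whole sequence is removed in one go.
const EMOJI_SEQUENCE =
  /\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?)*/gu;

// Hypothetical helper: strips emoji sequences and leaves all other text untouched.
function stripEmoji(text) {
  return text.replace(EMOJI_SEQUENCE, "");
}

console.log(JSON.stringify(stripEmoji("ship it \u{1F469}\u200D\u{1F4BB}!"))); // "ship it !"
```

Surrounding whitespace is kept as-is; collapsing it afterwards is a separate decision for the consuming system.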
Practical Recommendation
For accurate emoji handling in production (segmentation, normalization), use Intl.Segmenter with granularity: "grapheme", which splits text into user-perceived characters according to the Unicode segmentation rules. Reserve the regex approach for scanning, not analysis.
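A minimal sketch of the Intl.Segmenter approach, assuming a runtime that ships it (recent browsers and Node.js releases do). The helper name emojiGraphemes and the "en" locale are illustrative choices, and the filter regex is an assumption about what should count as an emoji grapheme.

```javascript
// Grapheme segmentation groups ZWJ sequences, skin tones, and flag pairs into single segments.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// Assumption: a grapheme counts as emoji if it contains a pictograph or a regional indicator.
const LOOKS_LIKE_EMOJI = /\p{Extended_Pictographic}|\p{Regional_Indicator}/u;

// Hypothetical helper: returns the emoji graphemes found in the text.
function emojiGraphemes(text) {
  const found = [];
  for (const { segment } of segmenter.segment(text)) {
    if (LOOKS_LIKE_EMOJI.test(segment)) found.push(segment);
  }
  return found;
}

// 🇯🇵, then 👍🏽, then plain text.
console.log(emojiGraphemes("\u{1F1EF}\u{1F1F5} \u{1F44D}\u{1F3FD} ok"));
```

Here the segmenter decides where each character begins and ends, and the regex only classifies a segment, so the sequence logic no longer lives in the pattern.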
Use Case
Detecting emoji-only messages in chat support to escalate them appropriately, stripping emoji for legacy systems that mishandle wide characters, or counting emoji frequency in user reviews for sentiment analysis.
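The emoji-only check from the first use case can be sketched as follows. The helper name isEmojiOnly is hypothetical, and treating whitespace as ignorable is an assumption; a real escalation rule may need a different definition.

```javascript
// One emoji element: base, optional variation selector or keycap mark, optional skin tone.
const ELEMENT = String.raw`\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?`;

// Flags (regional indicator pairs) or ZWJ-joined runs of elements.
const EMOJI_SEQUENCE = new RegExp(
  String.raw`\p{Regional_Indicator}{2}|${ELEMENT}(?:\u200D${ELEMENT})*`,
  "gu"
);

// Hypothetical helper: true when the message has at least one emoji
// and nothing else except whitespace.
function isEmojiOnly(message) {
  if (message.match(EMOJI_SEQUENCE) === null) return false; // no emoji at all
  return message.replace(EMOJI_SEQUENCE, "").trim() === "";
}

console.log(isEmojiOnly("\u{1F44D}\u{1F3FD} \u{1F389}")); // true  (👍🏽 🎉)
console.log(isEmojiOnly("thanks \u{1F44D}"));             // false (contains text)
console.log(isEmojiOnly("   "));                          // false (no emoji)
```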