Regex to Extract Emoji from Text — Unicode Property Escapes

Regex to detect and extract emoji using Unicode property escapes \p{Extended_Pictographic}. Handles ZWJ sequences, skin tones, and flag emoji.


Detailed Explanation

Extracting Emoji from Text

Emoji detection looks deceptively simple, but many emoji are multi-codepoint sequences, so a character class of code-point ranges is not enough. Unicode property escapes, available in JavaScript since ES2018, let a regex match \p{Extended_Pictographic} when the u flag is set.

Single Emoji

\p{Extended_Pictographic}

(Use the u flag; without it, \p is not treated as a property escape.) Matches single-codepoint emoji like 😀 and 🎉.
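A minimal sketch of the single-codepoint pattern in use. The variable names are illustrative only, and the input is written with escapes so the code points are unambiguous:

```javascript
// Match each Extended_Pictographic code point on its own (u flag required).
const singleEmoji = /\p{Extended_Pictographic}/gu;

// "party 🎉 time 😀", with the emoji written as escapes
const partyText = "party \u{1F389} time \u{1F600}";

// String.prototype.match returns null when nothing matches, hence the fallback.
const singles = partyText.match(singleEmoji) ?? [];

console.log(singles.length); // 2
console.log(singles.map((e) => e.codePointAt(0).toString(16))); // [ '1f389', '1f600' ]
```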

Including Skin-Tone Modifiers and ZWJ Sequences

Emoji like 👩‍💻 (woman technologist) are sequences joined by zero-width joiners (\u200D). To capture them as one match:

\p{Extended_Pictographic}(?:\u200D\p{Extended_Pictographic})*

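For fuller coverage, also allow emoji modifiers (skin tones) and variation selectors after each pictograph. Regional indicators (flags) are not Extended_Pictographic, so they need their own alternative that matches them in pairs:

(?:\p{Regional_Indicator}{2}|\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?)*)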
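A rough sketch of applying a sequence-aware pattern of this kind in JavaScript. The names EMOJI_SEQUENCE and extractEmoji are hypothetical, and the pattern spells out a \p{Regional_Indicator}{2} alternative so that flag pairs are matched as one unit:

```javascript
// Sequence-aware pattern: a flag pair, or a pictograph with optional
// variation selector / skin-tone modifier, optionally chained by ZWJ.
const EMOJI_SEQUENCE =
  /\p{Regional_Indicator}{2}|\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?)*/gu;

// Hypothetical helper: returns every emoji sequence found in the text.
function extractEmoji(text) {
  return text.match(EMOJI_SEQUENCE) ?? [];
}

// "dev 👩‍💻 flag 🇯🇵 ok 👍🏽", with the emoji written as escapes
const sampleText =
  "dev \u{1F469}\u200D\u{1F4BB} flag \u{1F1EF}\u{1F1F5} ok \u{1F44D}\u{1F3FD}";

const extracted = extractEmoji(sampleText);
console.log(extracted.length); // 3
```

Each ZWJ sequence, flag, and modified emoji comes back as a single array element, which is usually what downstream code expects.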

Tested Examples

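Input | Single | Sequence-Aware
"hi 😀" | 😀 | 😀
"👩‍💻 is a developer" | 👩, 💻 | 👩‍💻
"🇯🇵 Japan" | (no match) | 🇯🇵
"thumbs up: 👍🏽" | 👍 | 👍🏽

The single pattern skips the flag and the skin-tone modifier because regional indicators and emoji modifiers are not Extended_Pictographic.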

Counting Visible Emoji

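JavaScript’s text.length returns UTF-16 code units, not user-perceived characters. Use a sequence-aware regex to count emoji, so that ZWJ sequences, skin tones, and flags each count once:

// Full sequence-aware pattern: flag pairs, modifiers, variation selectors, ZWJ chains.
const emojiPattern = /\p{Regional_Indicator}{2}|\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?)*/gu;
const count = [...text.matchAll(emojiPattern)].length;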
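To illustrate the gap between code units, code points, and visible emoji, here is a small sketch using one ZWJ emoji. The variable names are illustrative; for this ZWJ-only input the shorter ZWJ pattern is enough:

```javascript
// 👩‍💻 = WOMAN (U+1F469) + ZERO WIDTH JOINER (U+200D) + LAPTOP (U+1F4BB)
const technologist = "\u{1F469}\u200D\u{1F4BB}";

// .length counts UTF-16 code units: 2 + 1 + 2.
const codeUnits = technologist.length;

// Spreading a string iterates by code point.
const codePoints = [...technologist].length;

// The ZWJ-aware regex treats the whole sequence as one match.
const visible = [
  ...technologist.matchAll(
    /\p{Extended_Pictographic}(?:\u200D\p{Extended_Pictographic})*/gu
  ),
].length;

console.log(codeUnits, codePoints, visible); // 5 3 1
```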

Stripping Emoji

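text.replace(/\p{Regional_Indicator}{2}|\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?)*/gu, "")

Replacing only \p{Extended_Pictographic} would leave zero-width joiners, variation selectors, skin-tone modifiers, and regional indicators behind, so strip whole sequences instead.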

Practical Recommendation

For accurate emoji handling in production (segmentation, normalization), use Intl.Segmenter with granularity: "grapheme", which splits text into user-perceived characters. Reserve the regex approach for quick scanning rather than analysis.
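A minimal sketch of the Intl.Segmenter approach, assuming a runtime that ships it (recent browsers and Node.js releases do). The helper name emojiGraphemes is hypothetical, and the per-segment check is only one possible way to decide whether a grapheme is an emoji:

```javascript
// Split text into grapheme clusters (user-perceived characters).
const graphemeSegmenter = new Intl.Segmenter("en", { granularity: "grapheme" });

// No g flag, so .test() carries no lastIndex state between calls.
const looksLikeEmoji = /\p{Extended_Pictographic}|\p{Regional_Indicator}/u;

// Hypothetical helper: keep only the graphemes that contain an emoji code point.
function emojiGraphemes(text) {
  const result = [];
  for (const { segment } of graphemeSegmenter.segment(text)) {
    if (looksLikeEmoji.test(segment)) {
      result.push(segment);
    }
  }
  return result;
}

// "hi 👩🏽‍💻 🇯🇵!", with the emoji written as escapes
const greeting = "hi \u{1F469}\u{1F3FD}\u200D\u{1F4BB} \u{1F1EF}\u{1F1F5}!";
const clusters = emojiGraphemes(greeting);
console.log(clusters.length); // 2
```

Here the segmenter handles the sequence boundaries, so the regex only has to answer a yes/no question per grapheme.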

Use Case

Detecting emoji-only messages in chat support to escalate them appropriately, stripping emoji for legacy systems that mishandle wide characters, or counting emoji frequency in user reviews for sentiment analysis.
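As an illustration of the first use case, the sketch below flags messages that contain nothing but emoji and whitespace. The function name isEmojiOnly and the constant EMOJI_RUN are hypothetical, and the pattern includes a \p{Regional_Indicator}{2} alternative so that flags count as emoji:

```javascript
// Sequence-aware emoji pattern (flag pairs, modifiers, variation selectors, ZWJ chains).
const EMOJI_RUN =
  /\p{Regional_Indicator}{2}|\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?(?:\u200D\p{Extended_Pictographic}(?:\uFE0F|\u20E3)?\p{Emoji_Modifier}?)*/gu;

// Hypothetical helper: true when the message has at least one emoji
// and nothing else besides whitespace.
function isEmojiOnly(message) {
  const withoutSpace = message.replace(/\s+/gu, "");
  if (withoutSpace === "") {
    return false; // empty or whitespace-only messages are not emoji-only
  }
  return withoutSpace.replace(EMOJI_RUN, "") === "";
}

console.log(isEmojiOnly("\u{1F44D}\u{1F3FD} \u{1F389}")); // true  (👍🏽 🎉)
console.log(isEmojiOnly("ok \u{1F44D}"));                 // false (text plus 👍)
console.log(isEmojiOnly("   "));                          // false
```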
