Regex to Extract Hashtags from Text

Regex to extract #hashtags from social media posts, blog content, and notes. Supports ASCII, Unicode (Japanese, emoji), and underscore-allowed variants.


Detailed Explanation

Extracting Hashtags

Hashtags appear in social posts, note-taking apps, and blog content. The basic shape is # followed by a sequence of letters and digits, but the exact rules (which characters are allowed, whether digits-only tags count, maximum length) vary across platforms.

ASCII Hashtags

(?:^|\s)(#[A-Za-z0-9_]+)

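The non-capturing group (?:^|\s) ensures the # is at the start of the text or follows whitespace, so a # inside a word, such as the CSS color value in color:#abc, is not matched. Because that group consumes the preceding whitespace, the tag itself is in capture group 1.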
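A minimal JavaScript sketch of the ASCII pattern, assuming String.prototype.matchAll with the g flag; the helper name extractAsciiTags is hypothetical.

```javascript
// Hypothetical helper: returns ASCII hashtags that start the text or follow whitespace.
function extractAsciiTags(text) {
  // (?:^|\s) consumes the preceding whitespace (if any); group 1 holds the tag.
  const re = /(?:^|\s)(#[A-Za-z0-9_]+)/g;
  return [...text.matchAll(re)].map((m) => m[1]);
}

console.log(extractAsciiTags("#intro body color:#abc and #snake_case"));
// → [ '#intro', '#snake_case' ]
```

The # in color:#abc is preceded by a colon, not whitespace, so it is skipped.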

Unicode Hashtags (Japanese, Emoji-Friendly)

(?:^|\s)(#[\p{L}\p{N}_\p{Extended_Pictographic}]+)

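Requires the u flag (JavaScript only accepts \p{...} property escapes in Unicode mode). Matches #プログラミング and hashtags containing single-code-point emoji; a multi-code-point emoji sequence is cut at the first character outside the class, such as a zero-width joiner (U+200D).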
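A sketch of the Unicode pattern in JavaScript; extractUnicodeTags is a hypothetical name. The second call illustrates the joiner caveat.

```javascript
// Unicode-aware variant; \p{...} property escapes require the u flag.
function extractUnicodeTags(text) {
  const re = /(?:^|\s)(#[\p{L}\p{N}_\p{Extended_Pictographic}]+)/gu;
  return [...text.matchAll(re)].map((m) => m[1]);
}

console.log(extractUnicodeTags("今日は #プログラミング を勉強 #launch🚀 done"));
// → [ '#プログラミング', '#launch🚀' ]

// A ZWJ sequence (man + ZWJ + woman) stops at U+200D, keeping only the first emoji.
console.log(extractUnicodeTags("#family\u{1F468}\u200D\u{1F469}"));
```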

Tested Examples

Input                                     ASCII                 Unicode
"Loving #JavaScript today"                #JavaScript           #JavaScript
"Multiple #tags #here #and-here"          #tags, #here, #and    #tags, #here, #and
"#【news】 not a tag because of bracket"   (none)                (none)
"#プログラミング"                           (none)                #プログラミング
"order #1234"                             #1234                 #1234

Both base patterns accept the digits-only #1234, although many platforms exclude such tags.

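To check the table, a small script can run both patterns over the same inputs (the constant names are illustrative):

```javascript
// The two patterns from this section, with the g flag for matchAll.
const ASCII_RE = /(?:^|\s)(#[A-Za-z0-9_]+)/g;
const UNICODE_RE = /(?:^|\s)(#[\p{L}\p{N}_\p{Extended_Pictographic}]+)/gu;

const inputs = [
  "Loving #JavaScript today",
  "Multiple #tags #here #and-here",
  "#【news】 not a tag because of bracket",
  "#プログラミング",
  "order #1234",
];

// matchAll iterates over an internal copy of each regex, so the shared
// lastIndex of these global regexes is never advanced between inputs.
const results = inputs.map((input) => ({
  input,
  ascii: [...input.matchAll(ASCII_RE)].map((m) => m[1]),
  unicode: [...input.matchAll(UNICODE_RE)].map((m) => m[1]),
}));

console.log(results);
```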
Reject Numeric-Only Tags

Some platforms ignore #1234. Add a lookahead requiring at least one letter:

(?:^|\s)(#(?=\w*[A-Za-z])[A-Za-z0-9_]+)
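A sketch applying the lookahead in JavaScript. The Unicode counterpart is an assumption of this example, not a pattern from the original: [\p{N}_]* skips leading digits and underscores, then \p{L} must follow inside the tag.

```javascript
// ASCII: (?=\w*[A-Za-z]) requires at least one letter within the tag.
const ASCII_NO_NUMERIC = /(?:^|\s)(#(?=\w*[A-Za-z])[A-Za-z0-9_]+)/g;

// Hypothetical Unicode variant: requires at least one \p{L} letter.
const UNICODE_NO_NUMERIC = /(?:^|\s)(#(?=[\p{N}_]*\p{L})[\p{L}\p{N}_]+)/gu;

const tagsOf = (text, re) => [...text.matchAll(re)].map((m) => m[1]);

console.log(tagsOf("order #1234 and #2024goals #___", ASCII_NO_NUMERIC));
// → [ '#2024goals' ]
console.log(tagsOf("第 #123 回 #第3回", UNICODE_NO_NUMERIC));
// → [ '#第3回' ]
```

Note that the ASCII lookahead also rejects underscore-only tags such as #___, since they contain no letter.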

JavaScript Extraction

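// Sample input; replace with your own text. Group 1 is the tag without the leading space.
const text = "Notes on #regex and #プログラミング";
const tags = [...text.matchAll(/(?:^|\s)(#[\p{L}\p{N}_]+)/gu)]
  .map(m => m[1]);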
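Where the runtime supports lookbehind (ES2018 and later), a variant that swaps the consuming group for a lookbehind keeps the whitespace out of the match, so String.prototype.match with the g flag returns the tags directly. This is an alternative technique, not the pattern above; HASHTAG_RE and extractTags are hypothetical names.

```javascript
// (?<=^|\s) asserts start-of-text or whitespace without consuming it,
// so the whole match is the tag and no capture group is needed.
const HASHTAG_RE = /(?<=^|\s)#[\p{L}\p{N}_]+/gu;

function extractTags(text) {
  // match() with the g flag returns null when nothing matches.
  return text.match(HASHTAG_RE) ?? [];
}

console.log(extractTags("New post #regex #正規表現 (see color:#abc)"));
// → [ '#regex', '#正規表現' ]
```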

Counting Hashtag Frequency

// Tally each tag: start at 0, add 1; the comma operator returns acc to reduce.
const counts = tags.reduce((acc, t) => (acc[t] = (acc[t] ?? 0) + 1, acc), {});
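For trending-style reports, a sketch that counts with a Map and returns the n most frequent tags; topTags is a hypothetical helper.

```javascript
// Counts tags in a Map, then returns the n most frequent as [tag, count] pairs.
function topTags(tags, n) {
  const counts = new Map();
  for (const t of tags) counts.set(t, (counts.get(t) ?? 0) + 1);
  // Descending by count; Array.prototype.sort is stable, so ties keep first-seen order.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]).slice(0, n);
}

console.log(topTags(["#js", "#regex", "#js", "#css", "#regex", "#js"], 2));
// → [ [ '#js', 3 ], [ '#regex', 2 ] ]
```

Counting is case-sensitive here, so #JS and #js are tallied separately; if your platform treats them as the same tag, normalize with toLowerCase() before counting.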

Practical Notes

Match Twitter’s rules carefully if you are mirroring its behavior: hashtags can contain letters, digits, and underscores, but cannot be entirely numeric, and the maximum length is platform-defined.
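A sketch of a validator for the rules just listed. It is not any platform's official check: isValidHashtag is hypothetical, maxLength is a placeholder for whatever limit your platform defines, and counting length in code points is an assumption.

```javascript
// Hypothetical validator: letters, digits and underscores only, not entirely
// numeric, and at most maxLength characters (placeholder; platform-defined).
function isValidHashtag(tag, maxLength) {
  if (!tag.startsWith("#")) return false;
  const body = tag.slice(1);
  // Length in code points rather than UTF-16 units (an assumption).
  const len = [...body].length;
  if (len === 0 || len > maxLength) return false;
  if (!/^[\p{L}\p{N}_]+$/u.test(body)) return false;
  // Reject digits-only bodies such as "#1234".
  return !/^\p{N}+$/u.test(body);
}

console.log(isValidHashtag("#regex", 100));        // true
console.log(isValidHashtag("#1234", 100));         // false
console.log(isValidHashtag("#and-here", 100));     // false
console.log(isValidHashtag("#プログラミング", 5));  // false (7 characters)
```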

Use Case

Extracting topical tags from blog post bodies for tag-cloud generation, analyzing social media exports for trending hashtags, or auto-suggesting tags based on note content.
