Question 1

Text Segmentation with Intl.Segmenter (Word, Sentence, Grapheme)

Accepted Answer

## Intl.Segmenter: Locale-Aware Text Segmentation

Intl.Segmenter splits text into meaningful segments (words, sentences, or grapheme clusters) according to locale-specific rules. This is particularly important for languages that do not put spaces between words, such as Chinese, Japanese, and Thai.

### Word Segmentation

```javascript
// English: spaces separate words
const en = new Intl.Segmenter('en', { granularity: 'word' });
// Keep only word-like segments (drops the space and the "!")
const words = [...en.segment('Hello world!')].filter(s => s.isWordLike);
// words.map(s => s.segment) is ["Hello", "world"]
```
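
The same API applies to text written without spaces. The following is a minimal sketch, assuming a runtime that ships full ICU data (a recent Node.js release or a modern browser). The helper name `wordsOf` and the sample sentence are illustrative and not part of the original answer. Exact word boundaries depend on the engine's dictionary, so no specific segmentation is claimed.

```javascript
// Hypothetical helper: return the word-like segments of `text` for `locale`.
function wordsOf(text, locale) {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  return Array.from(segmenter.segment(text))
    .filter((s) => s.isWordLike) // drop whitespace and punctuation
    .map((s) => s.segment);
}

// Sample sentence (an assumption for illustration); it contains no spaces.
const japanese = '今日は天気がいいです。';

// Splitting on spaces finds a single "word" because there is nothing to split on.
const bySpaces = japanese.split(' ');
const bySegmenter = wordsOf(japanese, 'ja');

console.log(bySpaces.length); // 1
console.log(bySegmenter);     // several words; boundaries vary by engine
```

Comparing `bySpaces` with `bySegmenter` shows why a space-based word counter undercounts Japanese or Chinese text.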

Question 2

When is this useful?

Accepted Answer

Text segmentation is essential for search engines processing CJK text, word counters that need accurate counts for Japanese and Chinese, text editors implementing word-boundary navigation, spell checkers for languages written without spaces, and any application that must count user-perceived characters correctly, including emoji. A Twitter-like character counter that relies on string.length counts UTF-16 code units, so it reports the four-person family emoji (👨‍👩‍👧‍👦) as 11 characters instead of 1. A search engine indexing Japanese text needs word boundaries to build an inverted index.
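
The character-counting case can be sketched as follows. The function name `countGraphemes` is hypothetical, and the emoji is written with escape sequences so that its structure (four emoji joined by three zero-width joiners) is visible.

```javascript
// Count user-perceived characters (grapheme clusters) rather than UTF-16 code units.
function countGraphemes(text, locale = 'en') {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'grapheme' });
  let count = 0;
  for (const _segment of segmenter.segment(text)) count++;
  return count;
}

// Four-person family emoji: man, woman, girl, boy joined by U+200D (zero-width joiner).
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}';

console.log(family.length);          // 11 (UTF-16 code units)
console.log([...family].length);     // 7  (code points)
console.log(countGraphemes(family)); // 1  (grapheme cluster)
```

Spreading the string into code points is not enough either, since the joiners and each component emoji are still counted separately. Only grapheme segmentation matches what the user sees.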
