Count Words in Text Online

Count the number of words in any text instantly. Learn how word counting works, how different languages handle word boundaries, and common word-splitting algorithms used in text processing.

Detailed Explanation

How Word Counting Works

Word counting seems simple at first glance, but the definition of a "word" is surprisingly nuanced. At its core, a word counter splits text on whitespace boundaries and counts the resulting tokens.

The Basic Algorithm

The most common approach trims the text and then splits it on runs of whitespace using a regular expression:

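function countWords(text) {
  // Trim first so leading/trailing whitespace cannot produce empty tokens.
  const trimmed = text.trim();
  // "".split(/\s+/) returns [""], so blank input must be handled explicitly.
  if (trimmed.length === 0) return 0;
  // Split on runs of whitespace and count the pieces.
  return trimmed.split(/\s+/).length;
}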

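This splits on one or more whitespace characters (spaces, tabs, newlines) and returns the length of the resulting array. The empty-string check is needed because splitting an empty string yields an array with one empty element, which would otherwise report 1 word for blank input.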
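
The same count can also be obtained by matching runs of non-whitespace characters instead of splitting on whitespace. The following is a minimal sketch of that variant; the function name countWordsByMatch is illustrative and not part of any library:

```javascript
// Illustrative alternative: count matches of \S+ instead of splitting on \s+.
function countWordsByMatch(text) {
  // match() with the g flag returns an array of all matches,
  // or null when nothing matches (empty or whitespace-only input).
  const matches = text.match(/\S+/g);
  return matches === null ? 0 : matches.length;
}

console.log(countWordsByMatch("  The quick brown fox  ")); // 4
console.log(countWordsByMatch("   "));                     // 0
```

With this variant no trimming or empty-string check is required, because leading and trailing whitespace never produce a match.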

Handling Edge Cases

Real-world text introduces several complications:

  • Multiple spaces: "hello    world" should count as 2 words, not 5 (the result of splitting on every single space). The \s+ regex handles this by matching one or more consecutive whitespace characters.
  • Leading/trailing whitespace: Trimming the input before splitting prevents phantom empty strings at the start or end of the array.
  • Hyphenated words: Is "well-known" one word or two? Whitespace-based counters treat it as one word, since the hyphen is not whitespace.
  • Contractions: "don't" and "it's" are each counted as a single word because the apostrophe is not a whitespace character.
  • Numbers: "There are 42 items" contains 4 words. Numbers surrounded by whitespace are counted as words.
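
The edge cases listed above can be checked directly. This sketch repeats the whitespace-splitting counter so that it runs on its own; the sample strings are illustrative inputs chosen for this example:

```javascript
// Same whitespace-splitting counter as above, repeated so this snippet is self-contained.
function countWords(text) {
  const trimmed = text.trim();
  if (trimmed.length === 0) return 0;
  return trimmed.split(/\s+/).length;
}

// Each pair is [input, expected count under whitespace splitting].
const cases = [
  ["hello    world", 2],        // multiple spaces form a single boundary
  ["  padded text  ", 2],       // leading/trailing whitespace is trimmed
  ["a well-known fact", 3],     // the hyphenated word stays whole
  ["don't stop, it's fine", 4], // contractions stay whole
  ["There are 42 items", 4],    // numbers count as words
];

for (const [input, expected] of cases) {
  const actual = countWords(input);
  console.log(JSON.stringify(input), actual, actual === expected ? "ok" : "MISMATCH");
}

// Splitting on a single space instead produces phantom empty strings:
console.log("hello    world".split(" ").length); // 5
```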

CJK and Non-Latin Scripts

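Chinese and Japanese text does not separate words with spaces, so whitespace splitting can count an entire sentence as a single word. (Korean does use spaces, but is usually grouped with them under the CJK label.) A more robust approach uses the Intl.Segmenter API: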

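// "ja" is a locale hint; pass the locale that matches the text.
const segmenter = new Intl.Segmenter("ja", { granularity: "word" });
const segments = [...segmenter.segment(text)];
// isWordLike is false for segments such as punctuation and whitespace.
const wordCount = segments.filter(s => s.isWordLike).length;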

This API uses Unicode-aware segmentation rules that understand word boundaries in non-Latin scripts.
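
The snippet above can be wrapped in a reusable function. The following is a sketch under stated assumptions: the name countWordsLocale, the default locale "en", and the whitespace fallback for runtimes without Intl.Segmenter are choices made for this example, not part of the API:

```javascript
// Hypothetical helper: locale-aware word count with a whitespace fallback.
function countWordsLocale(text, locale = "en") {
  // Intl.Segmenter is missing in some older runtimes, so detect it first.
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
    let count = 0;
    // Iterate lazily instead of materializing every segment in an array.
    for (const segment of segmenter.segment(text)) {
      if (segment.isWordLike) count += 1;
    }
    return count;
  }
  // Fallback (assumption): whitespace matching, which undercounts unspaced scripts.
  const matches = text.match(/\S+/g);
  return matches === null ? 0 : matches.length;
}

console.log(countWordsLocale("Hello, world!")); // 2
// For Japanese the result depends on the engine's segmentation data.
console.log(countWordsLocale("吾輩は猫である", "ja"));
```

Counts from the segmenter can differ from whitespace splitting even for English: "well-known" is split at the hyphen into two word-like segments, while "don't" remains one.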

Unicode Word Boundaries

The Unicode standard defines default word boundary rules in Unicode Standard Annex #29 (UAX #29). These rules cover complex cases such as emoji sequences joined by a zero-width joiner (\u{1F468}\u{200D}\u{1F4BB}, man + ZWJ + laptop, is kept together as a single segment) and combining characters, which stay attached to their base letter. Word counters that aim for accurate counts across languages should respect these boundaries.
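
These boundary rules can be observed through Intl.Segmenter in engines whose implementation follows UAX #29, which is the common case but is left to the implementation. The sketch below assumes such a runtime; the helper name wordSegments is illustrative:

```javascript
// Sketch: list the segments that word-granularity segmentation produces.
function wordSegments(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  return [...segmenter.segment(text)].map(s => ({
    text: s.segment,
    isWordLike: s.isWordLike,
  }));
}

// "e" followed by U+0301 COMBINING ACUTE ACCENT stays inside one word.
const accented = wordSegments("cafe\u0301 au lait");
console.log(accented.filter(s => s.isWordLike).length); // 3

// Man + zero-width joiner + laptop: the joiner keeps the sequence in one segment.
const emoji = wordSegments("\u{1F468}\u{200D}\u{1F4BB}");
console.log(emoji.length); // 1
```

Whether the emoji segment is reported as word-like is a separate question, so a counter should decide explicitly whether emoji count as words.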

Use Case

Word counting is essential for writers meeting article length requirements, students checking essay word counts, SEO professionals optimizing content length, and social media managers crafting posts within platform limits. Bloggers often aim for roughly 1,500 to 2,500 words for SEO-oriented articles.
