Count Sentences in Text Accurately

Count sentences in any text using intelligent boundary detection. Learn how sentence segmentation works, including how to handle abbreviations, decimal numbers, and other edge cases.

Basic Counting

Detailed Explanation

Sentence Counting and Boundary Detection

Counting sentences requires identifying where one sentence ends and another begins. While periods seem like obvious markers, real-world text is full of ambiguities.

The Naive Approach

A simple regex counts each run of sentence-ending punctuation:

function countSentences(text) {
  const matches = text.match(/[.!?]+/g);
  return matches ? matches.length : 0;
}

This works for simple text like "Hello world. How are you?" (2 sentences) but fails on many real-world inputs.

Why Sentence Splitting Is Hard

Consider these problematic cases:

Abbreviations: "Dr. Smith went to Washington." (1 sentence, not 2)
Decimal numbers: "The price is $3.50." (1 sentence, not 2)
Ellipsis: "Wait... what happened?" (1 sentence, not 2; counting each period separately would give 4)
URLs: "Visit example.com." (1 sentence, not 2)
Initials: "J.K. Rowling wrote the series." (1 sentence, not 3)
Quotations: '"Really?" she asked.' (1 or 2, depending on interpretation)
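To see these failures concretely, the short sketch below runs the naive counter over the unambiguous cases from the list. The names countNaive, cases, and naiveCounts are illustrative only. Note that the + quantifier groups an ellipsis into a single match, so the naive counter reports 2 for the ellipsis example.

```javascript
// Naive counter from the section above: one match per run of . ! or ?
function countNaive(text) {
  const matches = text.match(/[.!?]+/g);
  return matches ? matches.length : 0;
}

// Each of these is a single sentence.
const cases = [
  "Dr. Smith went to Washington.",
  "The price is $3.50.",
  "Wait... what happened?",
  "Visit example.com.",
  "J.K. Rowling wrote the series.",
];

const naiveCounts = cases.map(countNaive);

cases.forEach((text, i) => {
  console.log(`${naiveCounts[i]} counted for: ${text}`);
});
// naiveCounts is [2, 2, 2, 2, 3], although every input is one sentence
```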

A Better Algorithm

A more robust approach combines regex with heuristics:

function countSentences(text) {
  if (!text.trim()) return 0;
  // Split where sentence-ending punctuation is followed by whitespace
  // and a capital letter; the last sentence runs to the end of the string.
  // \s+ (not \s*) keeps "J.K." from being split between the initials.
  const sentences = text.split(/(?<=[.!?])\s+(?=[A-Z])/);
  return sentences.filter(s => s.trim().length > 0).length;
}

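This looks for a period, exclamation mark, or question mark followed by whitespace and a capital letter, which is a stronger signal of a sentence boundary than punctuation alone. Decimal numbers, domain names, and ellipses followed by lowercase text no longer trigger a split. Abbreviations and initials followed by a capitalized word, such as "Dr. Smith" or "J.K. Rowling", are still split, and a sentence that starts with a lowercase letter or a digit is missed.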
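One way to add a heuristic on top of the regex is to merge a candidate boundary back into the following text when the word before it is a known abbreviation or a run of initials. The sketch below is a minimal illustration under that assumption. The function name countSentencesHeuristic and the ABBREVIATIONS list are hypothetical, and the list is deliberately short.

```javascript
// Hypothetical, deliberately short abbreviation list (lowercase, no trailing period).
const ABBREVIATIONS = new Set(["dr", "mr", "mrs", "ms", "prof", "st", "vs", "etc"]);

function countSentencesHeuristic(text) {
  // Candidate boundaries: . ! or ? followed by whitespace and a capital letter.
  const parts = text.trim().split(/(?<=[.!?])\s+(?=[A-Z])/);
  let count = 0;
  let pending = false; // true while an unfinished sentence is being accumulated

  for (const part of parts) {
    if (!part) continue;
    pending = true;
    // Last whitespace-delimited token before the candidate boundary, e.g. "Dr." or "J.K."
    const lastToken = part.split(/\s+/).pop();
    const bare = lastToken.replace(/\.$/, "").toLowerCase();
    const isAbbreviation = lastToken.endsWith(".") && ABBREVIATIONS.has(bare);
    const isInitials = /^(?:[A-Z]\.)+$/.test(lastToken);
    if (!isAbbreviation && !isInitials) {
      // A real boundary: the accumulated text is one complete sentence.
      count += 1;
      pending = false;
    }
  }
  // Text that ends on an abbreviation or initial still counts as a sentence.
  return pending ? count + 1 : count;
}

console.log(countSentencesHeuristic("Dr. Smith went to Washington."));
console.log(countSentencesHeuristic("J.K. Rowling wrote the series."));
```

The trade-off is that a sentence that genuinely ends with an abbreviation or a single capital letter, followed by another sentence, is merged with its successor. A longer abbreviation list reduces one kind of error while increasing the other.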

The Intl.Segmenter Approach

Modern browsers and Node.js support sentence-level segmentation through Intl.Segmenter:

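const segmenter = new Intl.Segmenter("en", { granularity: "sentence" });
// segment() returns an iterable of { segment, index, input } objects
const sentences = [...segmenter.segment(text)]
  .filter(s => s.segment.trim().length > 0); // skip whitespace-only segments
const count = sentences.length;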

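Engines typically implement this with ICU (International Components for Unicode), following the Unicode sentence-break rules (UAX #29), so decimal points, domain names, and ellipses followed by lowercase text do not end a sentence. Abbreviations followed by a capitalized word, such as "Dr. Smith", may still be split depending on the engine and its locale data, so test the cases that matter for your text.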
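Because Intl.Segmenter is not available in every runtime, a counter can check for it and fall back to the regex heuristic otherwise. The following is a sketch under that assumption. The function name countSentencesIntl and the choice of fallback are illustrative, not a prescribed pattern.

```javascript
function countSentencesIntl(text, locale = "en") {
  if (!text.trim()) return 0;

  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
    let count = 0;
    for (const { segment } of segmenter.segment(text)) {
      // Ignore segments that contain only whitespace.
      if (segment.trim().length > 0) count += 1;
    }
    return count;
  }

  // Fallback: count internal boundaries (punctuation, whitespace, capital letter)
  // and add one for the final sentence. Avoids lookbehind for older engines.
  const boundaries = text.match(/[.!?]+(?=\s+[A-Z])/g);
  return (boundaries ? boundaries.length : 0) + 1;
}

console.log(countSentencesIntl("Hello world. How are you?"));
console.log(countSentencesIntl("The price is $3.50. It went up."));
```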

Multi-Language Considerations

Different languages mark sentence ends differently. Thai typically separates sentences with a space rather than a punctuation mark, Japanese uses 。, and Greek uses ; as its question mark. Intl.Segmenter applies locale-specific rules when the appropriate locale is specified, though coverage varies by language and engine.
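As a rough illustration, the sketch below passes a locale to Intl.Segmenter for a Japanese and a Greek sample. The helper name countSentencesIn and the sample strings are assumptions made for this example. The Greek result depends on the locale data shipped with the runtime, so no expected value is given for it.

```javascript
function countSentencesIn(locale, text) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    // Ignore segments that contain only whitespace.
    if (segment.trim().length > 0) count += 1;
  }
  return count;
}

// Two Japanese sentences ending in 。 and a full-width question mark.
const japanese = "こんにちは。元気ですか？";
console.log("ja:", countSentencesIn("ja", japanese));

// A Greek question (ending in ;) followed by a statement.
const greek = "Τι κάνεις; Είμαι καλά.";
console.log("el:", countSentencesIn("el", greek));
```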

Use Case

Sentence counting is used by writers analyzing readability (shorter sentences tend to read more easily), NLP researchers preprocessing text corpora, language learners tracking writing complexity, and content tools that calculate average sentence length for readability scores such as Flesch-Kincaid.
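Average sentence length, one input to such readability scores, can be sketched by dividing a word count by a sentence count. The function name averageSentenceLength is hypothetical, and the example assumes a runtime with Intl.Segmenter. It does not compute a full Flesch-Kincaid score, which also needs syllable counts.

```javascript
function averageSentenceLength(text, locale = "en") {
  const sentenceSegmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  const wordSegmenter = new Intl.Segmenter(locale, { granularity: "word" });

  let sentences = 0;
  for (const { segment } of sentenceSegmenter.segment(text)) {
    if (segment.trim().length > 0) sentences += 1;
  }

  let words = 0;
  for (const part of wordSegmenter.segment(text)) {
    // isWordLike is false for punctuation and whitespace segments.
    if (part.isWordLike) words += 1;
  }

  // Avoid dividing by zero for empty or whitespace-only input.
  return sentences === 0 ? 0 : words / sentences;
}

console.log(averageSentenceLength("Hello world. How are you?")); // 5 words / 2 sentences = 2.5
```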
