Zalgo Text Detection and Content Moderation

Learn how to detect and moderate Zalgo text in user-generated content, including detection algorithms, threshold-based filtering, and sanitization strategies.

Moderating Zalgo Text

For platforms that handle user-generated content, Zalgo text can disrupt readability and break page layout. Effective moderation means detecting excessive combining marks and then responding, whether by blocking, stripping, limiting, or flagging the content.

Detection Algorithm

The key insight is that legitimate text rarely has more than 2–3 combining marks per base character. Zalgo text typically has 5+.

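```javascript
// Returns true if any base character is followed by more than
// `threshold` consecutive nonspacing combining marks (category Mn).
function detectZalgo(text, threshold = 3) {
  let consecutiveCombining = 0;
  // for...of iterates by code point, so astral characters stay intact
  for (const char of text) {
    if (/\p{Mn}/u.test(char)) {
      consecutiveCombining++;
      if (consecutiveCombining > threshold) return true;
    } else {
      // Any non-mark character starts a new run
      consecutiveCombining = 0;
    }
  }
  return false;
}
```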
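A quick sanity check for the detector is to run it on accented text in both Unicode normalization forms and on a synthetic Zalgo string. This sketch repeats `detectZalgo` so it runs on its own; `makeZalgo` is a hypothetical test helper (not a library function) that appends marks from the Combining Diacritical Marks block, U+0300 to U+036F, all of which are category Mn.

```javascript
// Copy of detectZalgo so this snippet runs standalone.
function detectZalgo(text, threshold = 3) {
  let consecutiveCombining = 0;
  for (const char of text) {
    if (/\p{Mn}/u.test(char)) {
      consecutiveCombining++;
      if (consecutiveCombining > threshold) return true;
    } else {
      consecutiveCombining = 0;
    }
  }
  return false;
}

// Hypothetical helper: appends `marksPerChar` combining marks
// (cycling through U+0300..U+036F) after every character of `text`.
function makeZalgo(text, marksPerChar) {
  let out = '';
  let next = 0x0300;
  for (const char of text) {
    out += char;
    for (let i = 0; i < marksPerChar; i++) {
      out += String.fromCodePoint(next);
      next = next === 0x036f ? 0x0300 : next + 1;
    }
  }
  return out;
}

// "Crème brûlée, señor" written with precomposed code points
const accented = 'Cr\u00E8me br\u00FBl\u00E9e, se\u00F1or';

console.log(detectZalgo(accented));                  // false (no combining marks)
console.log(detectZalgo(accented.normalize('NFD'))); // false (one mark per letter)
console.log(detectZalgo(makeZalgo('hello', 6)));     // true  (six marks per letter)
```

A run of exactly `threshold` marks is not flagged; with the default of 3, only the fourth consecutive mark on the same base character triggers the check.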

Scoring System

For more nuanced moderation, assign each text a "zalgo score": the ratio of combining marks to base (non-combining) characters.

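```javascript
// Ratio of combining marks (Mn) to all other characters. Spaces and
// punctuation count as base characters.
function zalgoScore(text) {
  let baseChars = 0;
  let combiningChars = 0;
  for (const char of text) {
    if (/\p{Mn}/u.test(char)) combiningChars++;
    else baseChars++;
  }
  // Marks with no base character at all are maximally suspicious,
  // not harmless; only the empty string scores 0.
  if (baseChars === 0) return combiningChars > 0 ? Infinity : 0;
  return combiningChars / baseChars;
}

// Score interpretation (heuristic; lower bound inclusive):
// 0 to <1:  Normal text (some accents)
// 1 to <3:  Light zalgo
// 3 to <8:  Medium zalgo
// 8+:       Heavy zalgo
```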
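The interpretation bands can be turned into a small classifier. In this sketch `zalgoBand` and its band names are hypothetical, the cutoffs are the heuristic ones from the score interpretation, and `zalgoScore` is repeated (scoring marks-only input as `Infinity`) so the snippet runs standalone.

```javascript
// Combining marks (Mn) per non-mark character.
function zalgoScore(text) {
  let baseChars = 0;
  let combiningChars = 0;
  for (const char of text) {
    if (/\p{Mn}/u.test(char)) combiningChars++;
    else baseChars++;
  }
  if (baseChars === 0) return combiningChars > 0 ? Infinity : 0;
  return combiningChars / baseChars;
}

// Hypothetical helper: maps a score onto the interpretation bands.
// Each band includes its lower bound and excludes its upper bound.
function zalgoBand(score) {
  if (score < 1) return 'normal';
  if (score < 3) return 'light';
  if (score < 8) return 'medium';
  return 'heavy';
}

// "señor" decomposed: 1 mark over 5 base characters = 0.2
console.log(zalgoBand(zalgoScore('se\u00F1or'.normalize('NFD')))); // 'normal'
// One letter carrying ten acute accents: 10 / 1 = 10
console.log(zalgoBand(zalgoScore('a' + '\u0301'.repeat(10))));     // 'heavy'
```

Because the score averages over the whole string, a short burst of heavy Zalgo inside a long message can still land in the "normal" band. Pairing the score with `detectZalgo`, which inspects each run of marks individually, covers that case.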

Moderation Strategies

  1. Block: Reject content with a zalgo score above a threshold
  2. Strip: Remove all combining marks (aggressive: removes every accent stored as a combining mark, and precomposed letters like é too if the text is first normalized to NFD)
  3. Limit: Cap combining marks at 2–3 per base character (preserves accents)
  4. Flag: Mark content for human review
  5. Render-limit: Display only the first N combining marks per character
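The strategies are not mutually exclusive. A minimal sketch, assuming a policy that blocks heavy Zalgo, flags medium Zalgo for review, and limits marks on everything it lets through: `POLICY`, `moderate`, and `stripCombining` are hypothetical names, and the cutoffs reuse the score bands as illustrative values rather than recommendations. The strip variant normalizes to NFD so precomposed letters lose their accents too, then recomposes with NFC so text that NFD splits apart, such as Hangul syllables, comes back intact.

```javascript
// Hypothetical policy: cutoffs borrowed from the score bands
// (medium starts at 3, heavy at 8); tune them for your platform.
const POLICY = { blockAt: 8, flagAt: 3, maxMarksPerChar: 2 };

// Combining marks (Mn) per non-mark character.
function zalgoScore(text) {
  let base = 0;
  let marks = 0;
  for (const char of text) {
    if (/\p{Mn}/u.test(char)) marks++;
    else base++;
  }
  if (base === 0) return marks > 0 ? Infinity : 0;
  return marks / base;
}

// Strip: decompose so precomposed letters (é) expose their marks,
// delete every Mn mark, then recompose (restores e.g. Hangul syllables).
function stripCombining(text) {
  return text.normalize('NFD').replace(/\p{Mn}/gu, '').normalize('NFC');
}

// Limit: keep at most `max` marks after each base character.
function limitCombining(text, max) {
  let result = '';
  let count = 0;
  for (const char of text) {
    if (/\p{Mn}/u.test(char)) {
      if (count < max) {
        result += char;
        count++;
      }
    } else {
      count = 0;
      result += char;
    }
  }
  return result;
}

// Block + Flag + Limit: reject heavy Zalgo outright, send medium Zalgo
// to human review, and cap marks on anything that is displayed.
function moderate(text, policy = POLICY) {
  const score = zalgoScore(text);
  if (score >= policy.blockAt) return { action: 'block', score, text: null };
  const action = score >= policy.flagAt ? 'flag' : 'allow';
  return { action, score, text: limitCombining(text, policy.maxMarksPerChar) };
}
```

With this policy, `moderate('señor')` is allowed unchanged, while a single letter carrying ten marks scores 10 and is blocked. Stripping stays a separate function because it also erases legitimate accents: `stripCombining('Crème')` returns `'Creme'`.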

Implementation Example (Limiter)

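```javascript
// Keeps at most `maxPerChar` combining marks after each base character
// and silently drops the excess.
function limitCombining(text, maxPerChar = 2) {
  let result = '';
  let count = 0; // marks kept since the last base character
  for (const char of text) {
    if (/\p{Mn}/u.test(char)) {
      if (count < maxPerChar) {
        result += char;
        count++;
      }
    } else {
      count = 0; // new base character: reset the budget
      result += char;
    }
  }
  return result;
}
```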

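This approach preserves legitimate diacritics such as é and ñ, whether they are stored as single precomposed code points (U+00E9, U+00F1) or decomposed into a base letter plus one combining mark, while removing excessive Zalgo stacking.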
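To check that claim, the snippet below runs the limiter on ñ in both normalization forms, on the Vietnamese letter ệ (which decomposes into two stacked marks), and on a synthetic six-mark stack. `limitCombining` is repeated so the snippet runs standalone; the sample strings are illustrative.

```javascript
// Copy of limitCombining so this snippet runs standalone.
function limitCombining(text, maxPerChar = 2) {
  let result = '';
  let count = 0;
  for (const char of text) {
    if (/\p{Mn}/u.test(char)) {
      if (count < maxPerChar) {
        result += char;
        count++;
      }
    } else {
      count = 0;
      result += char;
    }
  }
  return result;
}

const nfc = 'se\u00F1or';                 // ñ as one precomposed code point
const nfd = nfc.normalize('NFD');          // n + U+0303 COMBINING TILDE
const viet = '\u1EC7'.normalize('NFD');    // ệ: e + U+0323 + U+0302 (two marks)
const zalgo = 'e\u0300\u0301\u0302\u0303\u0304\u0305'; // six stacked marks

console.log(limitCombining(nfc) === nfc);              // true: nothing to drop
console.log(limitCombining(nfd) === nfd);              // true: one mark is within the cap
console.log(limitCombining(viet) === viet);            // true: two marks equals the cap
console.log(limitCombining(zalgo) === 'e\u0300\u0301'); // true: first two marks kept
```

Lowering `maxPerChar` to 1 would drop the circumflex from decomposed ệ (leaving ẹ), which is one reason not to cap below 2.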

Use Case

Content moderation for Zalgo text is critical for chat applications, forums, social media platforms, gaming communities, and any system that accepts user-generated text where excessive combining marks could disrupt readability or layout.
