How to Strip and Remove Zalgo from Text

Learn techniques to programmatically remove Zalgo combining marks from text using regex, Unicode categories, and various programming languages.

Removing Zalgo Text

Stripping Zalgo means removing the combining diacritical marks stacked onto each character, restoring the text to a clean, readable form. This is useful for content moderation, text processing, and data cleaning.

JavaScript Regex Approach

A common approach uses a regex that matches the Combining Diacritical Marks block (U+0300 to U+036F):

function stripZalgo(text) {
  return text.replace(/[\u0300-\u036f]/g, '');
}

// For broader coverage (extended combining marks):
function stripZalgoFull(text) {
  return text.replace(
    /[\u0300-\u036f\u1ab0-\u1aff\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]/g,
    ''
  );
}
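As a quick check of the difference between the two ranges, the sketch below runs both patterns over a small hand-built sample. The constant names and the sample string are illustrative assumptions, not part of any library.

```javascript
// Illustrative constants; the ranges mirror the two functions above.
const BASIC_MARKS = /[\u0300-\u036f]/g;
const EXTENDED_MARKS =
  /[\u0300-\u036f\u1ab0-\u1aff\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]/g;

// "h" + COMBINING ACUTE ACCENT (U+0301)
//     + COMBINING LONG VERTICAL LINE OVERLAY (U+20D2), then "i".
const sample = 'h\u0301\u20d2i';

const basicResult = sample.replace(BASIC_MARKS, '');
const extendedResult = sample.replace(EXTENDED_MARKS, '');

// The basic range leaves U+20D2 behind; the extended range removes it.
console.log(basicResult === 'h\u20d2i'); // true
console.log(extendedResult === 'hi');    // true
```

Marks outside all five listed ranges would still slip through, which is the main limitation of range-based matching.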

Python

import unicodedata

def strip_zalgo(text):
    return ''.join(
        c for c in text
        if unicodedata.category(c) != 'Mn'  # Mn = Mark, Nonspacing
    )

Using Unicode Categories

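The Unicode General Category Mn (Mark, Nonspacing) covers the nonspacing combining marks that Zalgo text is built from; spacing marks (Mc) and enclosing marks (Me) belong to separate categories. Matching on the category is more robust than listing code point ranges, since it does not depend on specific blocks: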

// Using Unicode property escapes (modern JS):
function stripZalgo(text) {
  return text.replace(/\p{Mn}/gu, '');
}
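Property escapes can also distinguish the three mark categories. The sketch below uses a hypothetical helper, `markCategory`, to show which category a code point falls into; `\p{M}` matches all three at once.

```javascript
// Hypothetical helper: returns the mark category of a single character,
// or null if the character is not a combining mark.
function markCategory(char) {
  if (/^\p{Mn}$/u.test(char)) return 'Mn'; // Mark, Nonspacing
  if (/^\p{Mc}$/u.test(char)) return 'Mc'; // Mark, Spacing Combining
  if (/^\p{Me}$/u.test(char)) return 'Me'; // Mark, Enclosing
  return null;
}

console.log(markCategory('\u0301')); // 'Mn' (COMBINING ACUTE ACCENT)
console.log(markCategory('\u0903')); // 'Mc' (DEVANAGARI SIGN VISARGA)
console.log(markCategory('\u20dd')); // 'Me' (COMBINING ENCLOSING CIRCLE)
console.log(markCategory('a'));      // null

// \p{M} is the union of Mn, Mc, and Me.
console.log(/^\p{M}+$/u.test('\u0301\u0903\u20dd')); // true
```

Stripping `\p{M}` is broader than stripping `\p{Mn}`, but removing Mc marks damages ordinary text in Indic scripts, which is one reason to target Mn only.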

Preserving Legitimate Diacritics

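A challenge: stripping every combining mark also removes legitimate accents when the text is in decomposed form (for example, e followed by U+0301 rather than the precomposed é), and matching on Mn removes marks that scripts such as Arabic, Hebrew, and Thai depend on. To preserve legitimate diacritics while removing the excess, cap the number of marks per base character: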

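function stripExcessCombining(text, maxPerChar = 2) {
  let result = '';
  let combiningCount = 0; // marks seen since the last base character
  // for...of iterates by code point, so astral characters stay intact
  for (const char of text) {
    if (/\p{Mn}/u.test(char)) {
      // Nonspacing mark: keep it only while under the per-character limit
      combiningCount++;
      if (combiningCount <= maxPerChar) result += char;
    } else {
      // Base character: reset the counter and always keep it
      combiningCount = 0;
      result += char;
    }
  }
  return result;
}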

This caps the number of combining marks after each base character, preserving normally accented text while removing the Zalgo excess.
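One wrinkle is that an accent may arrive either precomposed (é as U+00E9) or decomposed (e followed by U+0301), and only the decomposed form counts toward the limit. A possible variation, sketched below under the assumption that changing the text's normalization form is acceptable, decomposes the input first and recomposes the result. The name `limitCombiningNormalized` is hypothetical.

```javascript
// Hypothetical variant: count marks on NFD text so precomposed and
// decomposed accents are treated alike, then recompose with NFC.
function limitCombiningNormalized(text, maxPerChar = 2) {
  let result = '';
  let combiningCount = 0;
  // NFD splits precomposed letters into base + combining marks.
  for (const char of text.normalize('NFD')) {
    if (/\p{Mn}/u.test(char)) {
      combiningCount++;
      if (combiningCount <= maxPerChar) result += char;
    } else {
      combiningCount = 0;
      result += char;
    }
  }
  // NFC recombines the surviving marks where a precomposed form exists.
  return result.normalize('NFC');
}

// Ten stacked acute accents collapse to a single precomposed é.
const stacked = 'e' + '\u0301'.repeat(10);
console.log(limitCombiningNormalized(stacked, 1) === '\u00e9'); // true

// Ordinary accented text passes through unchanged.
console.log(limitCombiningNormalized('ma\u00f1ana') === 'ma\u00f1ana'); // true
```

Note that normalization can reorder combining marks, so which marks survive the cap may differ from the unnormalized version.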

Use Case

Stripping Zalgo is essential for content moderation systems, chat applications, forum software, and any text processing pipeline that needs to handle user-generated content that may contain malicious or disruptive Unicode combining marks.
