Unicode Normalizer

Normalize text to NFC, NFD, NFKC, or NFKD forms and compare results side by side with character-level breakdowns.

About This Tool

The Unicode Normalizer is a free browser-based tool that converts text into any of the four Unicode normalization forms defined by the Unicode Standard: NFC (Canonical Composition), NFD (Canonical Decomposition), NFKC (Compatibility Composition), and NFKD (Compatibility Decomposition).

Unicode allows the same visual character to be represented in multiple ways. For example, the letter é can be stored as a single precomposed code point (U+00E9, LATIN SMALL LETTER E WITH ACUTE) or as two code points (U+0065 + U+0301, the letter e followed by a combining acute accent). These two representations look identical on screen but consist of different code points, and therefore different bytes, which can cause subtle bugs in string comparison, search, database lookups, and security checks.
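To see the difference concretely, the minimal TypeScript sketch below compares the two spellings of é. It assumes a runtime that provides the standard String.prototype.normalize() and TextEncoder (modern browsers and Node.js). The helper name hexBytes is illustrative only and is not part of the tool.

```typescript
// Two spellings of "é": one precomposed code point vs. a base letter plus a combining mark.
const precomposed = "\u00E9"; // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301"; // U+0065 + U+0301 COMBINING ACUTE ACCENT

// Hypothetical helper: show the UTF-8 bytes of a string as space-separated hex pairs.
const utf8 = new TextEncoder();
function hexBytes(text: string): string {
  return Array.from(utf8.encode(text), (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" ");
}

console.log(precomposed === decomposed);            // false
console.log(precomposed.length, decomposed.length); // 1 2 (UTF-16 code units)
console.log(hexBytes(precomposed));                 // C3 A9
console.log(hexBytes(decomposed));                  // 65 CC 81

// Normalizing both sides to the same form makes them compare equal.
console.log(precomposed.normalize("NFC") === decomposed.normalize("NFC")); // true
```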

This tool lets you paste text and instantly see the result in all four normalization forms side by side, with character counts, byte sizes, and code point breakdowns. The Composition / Decomposition view visually shows how characters like é split into base letters and combining marks, helping you understand exactly what each form does.

If you need to inspect individual characters in detail, try the Unicode Inspector. To detect the encoding of a file or text blob, the Encoding Detector can identify its character set. For case and naming-convention conversions, use the Text Case Converter.

All normalization is performed using JavaScript’s built-in String.prototype.normalize() method. No data ever leaves your browser — processing is 100% client-side.
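The same four conversions can be reproduced with the built-in method in a few lines. This is a minimal TypeScript sketch, not the tool's actual source; UnicodeForm and normalizeAll are illustrative names. The sample word uses the U+FB01 ligature and a precomposed é, written as escapes so the source file's encoding cannot alter them.

```typescript
// The four form names accepted by String.prototype.normalize().
type UnicodeForm = "NFC" | "NFD" | "NFKC" | "NFKD";
const FORMS: UnicodeForm[] = ["NFC", "NFD", "NFKC", "NFKD"];

// Hypothetical helper: return the input normalized to every form, keyed by form name.
function normalizeAll(text: string): Record<UnicodeForm, string> {
  const result = {} as Record<UnicodeForm, string>;
  for (const form of FORMS) {
    result[form] = text.normalize(form);
  }
  return result;
}

const forms = normalizeAll("\uFB01anc\u00E9"); // "ﬁancé": ligature + precomposed é
for (const form of FORMS) {
  console.log(`${form}: ${forms[form]} (${forms[form].length} UTF-16 units)`);
}
// Lengths: NFC 5, NFD 6, NFKC 6, NFKD 7
```

NFD splits é into two code points, NFKC replaces the ligature with two letters, and NFKD does both, which is why the lengths differ.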

How to Use

  1. Paste or type text into the Input Text area. Text containing accented letters, ligatures, fullwidth characters, or Korean Hangul syllables shows normalization differences most clearly.
  2. View the Compare Forms tab to see the text normalized to NFC, NFD, NFKC, and NFKD simultaneously, with code point and byte size counts for each form.
  3. Look at the Same as input / Changed badges to quickly identify which forms alter the text.
  4. Check the Comparison Summary table for a side-by-side count of code points, UTF-8 bytes, and UTF-16 units across all forms.
  5. Switch to the Character Breakdown tab to see each grapheme decomposed into its component code points with names and byte sizes.
  6. Click the Copy button on any normalization form to copy the normalized text to your clipboard.
  7. Use the Clear button to reset the input and start over with new text.
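
The counts in the Comparison Summary and the Same as input / Changed badges can be approximated with standard APIs. The TypeScript sketch below is a hypothetical illustration of how such a summary could be computed, not the tool's own code; FormStats and summarize are made-up names. It counts code points by iterating the string, UTF-8 bytes with TextEncoder, and UTF-16 units with .length.

```typescript
type UnicodeForm = "NFC" | "NFD" | "NFKC" | "NFKD";

interface FormStats {
  form: UnicodeForm;
  codePoints: number; // code points, counted via the string iterator
  utf8Bytes: number;  // byte length after UTF-8 encoding
  utf16Units: number; // JavaScript string .length
  changed: boolean;   // true if the normalized text differs from the input
}

const encoder = new TextEncoder();

// Hypothetical helper: per-form statistics for one input string.
function summarize(input: string): FormStats[] {
  const forms: UnicodeForm[] = ["NFC", "NFD", "NFKC", "NFKD"];
  return forms.map((form) => {
    const out = input.normalize(form);
    return {
      form,
      codePoints: Array.from(out).length, // iterates by code point, not UTF-16 unit
      utf8Bytes: encoder.encode(out).length,
      utf16Units: out.length,
      changed: out !== input,
    };
  });
}

// "Café Ａ" typed with a decomposed é and a fullwidth A (U+FF21).
console.table(summarize("Cafe\u0301 \uFF21"));
```

For this input, NFD reports no change because the é is already decomposed, while NFKC shrinks the UTF-8 size from 10 to 7 bytes by composing é and replacing the 3-byte fullwidth A with a plain one.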

FAQ

What is Unicode normalization?

Unicode normalization is the process of converting text into a standard form so that equivalent character sequences are stored identically. The Unicode Standard defines four normalization forms: NFC, NFD, NFKC, and NFKD. This ensures that text comparisons, searches, and storage are consistent regardless of which equivalent code point sequence was originally used to write the text.

What is the difference between NFC and NFD?

NFC (Canonical Composition) combines base characters and combining marks into single precomposed characters when possible. For example, 'e' + combining acute accent (U+0065 + U+0301) becomes 'é' (U+00E9). NFD (Canonical Decomposition) does the opposite: it splits precomposed characters into a base character followed by combining marks. The NFC and NFD forms of the same text are canonically equivalent, meaning they represent the same characters and differ only in their code points.
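Listing the code points of each form makes the difference visible. In this minimal TypeScript sketch, codePoints is a hypothetical helper; the word "Ångström" is written with escapes so its letters are known to start out precomposed.

```typescript
// Hypothetical helper: format each code point as U+XXXX, iterating by code point.
function codePoints(text: string): string[] {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

const word = "\u00C5ngstr\u00F6m"; // "Ångström" with precomposed Å and ö

const nfd = word.normalize("NFD"); // Å -> A + ring above, ö -> o + diaeresis
const nfc = nfd.normalize("NFC");  // recomposes the pairs

console.log(codePoints(nfd).join(" "));
// U+0041 U+030A U+006E U+0067 U+0073 U+0074 U+0072 U+006F U+0308 U+006D
console.log(codePoints(nfc).join(" "));
// U+00C5 U+006E U+0067 U+0073 U+0074 U+0072 U+00F6 U+006D
console.log(nfc === word); // true
```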

When should I use NFKC or NFKD instead of NFC/NFD?

Use NFKC or NFKD when you also want compatibility decomposition, which maps compatibility characters (formatting variants of other characters) to their plain equivalents. For example, the ligature 'ﬁ' (U+FB01) becomes the two letters 'fi' in NFKC/NFKD, and fullwidth letters such as 'Ａ' (U+FF21) become their ASCII equivalents. This is useful for search indexing, username validation, and security checks, although NFKC does not unify look-alike letters from different scripts, such as Cyrillic 'а' and Latin 'a'. NFC/NFD preserve compatibility characters as-is.
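A minimal TypeScript sketch of the contrast between NFC and NFKC on a few compatibility characters follows. The samples are written as escapes; the third one is a Cyrillic letter included to show that NFKC leaves cross-script look-alikes alone.

```typescript
// Sample strings, written as escapes so the source encoding cannot alter them.
const samples: string[] = [
  "\uFB01le",           // "ﬁle" with U+FB01 LATIN SMALL LIGATURE FI
  "\uFF21\uFF22\uFF23", // "ＡＢＣ" in fullwidth Latin capital letters
  "\u0430",             // U+0430 CYRILLIC SMALL LETTER A (looks like Latin "a")
];

for (const s of samples) {
  const nfc = s.normalize("NFC");   // canonical form: compatibility characters kept as-is
  const nfkc = s.normalize("NFKC"); // compatibility form: folded where a mapping exists
  console.log(`${s} -> NFC: ${nfc}, NFKC: ${nfkc}, unchanged by NFKC: ${nfkc === s}`);
}
```

The ligature and fullwidth samples become "file" and "ABC" under NFKC, while the Cyrillic letter comes through unchanged, so a confusable-character check needs more than normalization.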

Which normalization form should I use for my application?

NFC is the most commonly recommended form for storage and interchange, and it is the form the W3C recommends for web content. Use NFKC for search and comparison tasks where compatibility variants such as ligatures, fullwidth letters, and superscripts should match their plain equivalents. NFD is less common for storage, but you may encounter it in filenames from macOS's older HFS+ file system, which stores names in a decomposed form based on NFD. The choice depends on your specific use case.

Is my data safe?

Yes. All normalization is performed entirely in your browser using JavaScript's built-in String.prototype.normalize() method. No text, characters, or any other data is ever sent to any server. You can verify this by checking the Network tab in your browser's developer tools.

Does normalization change the visual appearance of text?

NFC and NFD are not meant to change the visual appearance: canonically equivalent sequences should render identically, so these forms change only the underlying code point representation. NFKC and NFKD may change the appearance because they replace compatibility characters: for example, superscript digits become regular digits, and ligatures are split into individual letters.
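Another way to see the distinction is that canonical normalization is reversible between its two forms, while compatibility normalization discards information. A minimal TypeScript sketch, using a string with a superscript two (U+00B2) and the fi ligature (U+FB01):

```typescript
const original = "x\u00B2 \uFB01"; // "x² ﬁ"

// Canonical forms round-trip: NFD followed by NFC gives back the NFC form.
const roundTrip = original.normalize("NFD").normalize("NFC");
console.log(roundTrip === original.normalize("NFC")); // true

// Compatibility normalization replaces characters, so the original cannot be recovered.
const folded = original.normalize("NFKC");
console.log(folded); // x2 fi
console.log(folded.normalize("NFC") === original.normalize("NFC")); // false
```

Because of this, a common pattern is to keep the original text and use an NFKC copy only as a search or comparison key.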

How does normalization affect string comparison?

Without normalization, two visually identical strings may compare as unequal, both in a byte-by-byte comparison and in an ordinary string equality check. For example, 'é' (U+00E9) and 'é' (U+0065 + U+0301) look the same but have different code points and byte sequences. Normalizing both strings to the same form before comparing them ensures consistent results. This is critical for databases, search engines, and authentication systems.
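In practice this usually means wrapping comparisons in a small helper. The TypeScript sketch below is illustrative; canonicallyEqual and compatibilityEqual are hypothetical names, and real systems may also need case folding, which normalization does not perform.

```typescript
// Hypothetical helper: strict match that ignores only canonical differences (NFC on both sides).
function canonicallyEqual(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

// Hypothetical helper: looser match that also folds compatibility characters (NFKC on both sides).
function compatibilityEqual(a: string, b: string): boolean {
  return a.normalize("NFKC") === b.normalize("NFKC");
}

const stored = "Jos\u00E9"; // precomposed é
const typed = "Jose\u0301"; // e + combining acute accent
const fullwidth = "\uFF2A\uFF4F\uFF53\u00E9"; // "Ｊｏｓé" with fullwidth letters

console.log(stored === typed);                       // false
console.log(canonicallyEqual(stored, typed));        // true
console.log(canonicallyEqual(fullwidth, stored));    // false
console.log(compatibilityEqual(fullwidth, stored));  // true
```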
