Unicode Normalization in Databases
Learn how to handle Unicode normalization in PostgreSQL, MySQL, SQLite, and MongoDB. Understand why storing normalized text prevents data inconsistency and search failures.
Detailed Explanation
Normalization in Databases
Databases store text as bytes, and most databases do not automatically normalize Unicode on insert. This means the same visual text can be stored in different byte sequences, causing equality checks and unique constraints to behave unexpectedly.
The Duplicate Problem
Without normalization, a UNIQUE constraint on a username column might allow both:
- café (U+00E9, precomposed, NFC form)
- café (U+0065 + U+0301, 'e' plus combining acute accent, NFD form)
These look identical to users but are different byte sequences.
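The mismatch is easy to demonstrate in JavaScript, where String.prototype.normalize() exposes the Unicode normalization forms:

```javascript
// Two visually identical strings in different Unicode forms.
const nfc = "caf\u00E9";   // 'é' as a single code point (U+00E9)
const nfd = "cafe\u0301";  // 'e' + combining acute accent (U+0065 U+0301)

console.log(nfc === nfd);                                    // false: different code points
console.log(nfc.normalize("NFC") === nfd.normalize("NFC"));  // true: same form after NFC
```

A unique index built on raw bytes sees these as two distinct values, which is exactly how the invisible duplicates arise.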
PostgreSQL
PostgreSQL has supported ICU collations since version 10, and since version 12 a collation can be declared nondeterministic, which makes canonically equivalent strings (NFC vs. NFD) compare as equal:
CREATE COLLATION nfc_insensitive (
provider = icu,
locale = 'und',
deterministic = false
);
Note that nondeterministic collations cannot be used with pattern matching (LIKE, regular expressions).
For explicit normalization, use normalize() (PostgreSQL 13+):
SELECT normalize(U&'\00E9', NFC);       -- precomposed 'é' (NFC input)
SELECT normalize(U&'\0065\0301', NFC);  -- 'e' + combining acute (NFD input)
-- Both return 'é'
MySQL
MySQL and MariaDB handle comparison through collations but have no built-in NORMALIZE() function, so rows are stored with whatever byte sequence the client sent. Best practice is to normalize in the application layer before INSERT:
-- Application normalizes to NFC before this INSERT
INSERT INTO users (username) VALUES ('café');
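A minimal sketch of that application-layer step in Node.js; here `db.execute` is a placeholder for any parameterized MySQL client call (e.g. mysql2), not a real handle:

```javascript
// Normalize to NFC at the write boundary so the database only ever
// sees one byte sequence per visual string.
function toStorageForm(text) {
  return text.normalize("NFC");
}

// Hypothetical parameterized insert through a MySQL client:
// await db.execute("INSERT INTO users (username) VALUES (?)", [toStorageForm(input)]);

console.log(toStorageForm("cafe\u0301") === "caf\u00E9");  // true: NFD input composed to NFC
```

Because NFC is idempotent, applying it on every write is safe even for input that is already normalized.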
MongoDB
MongoDB compares strings byte-by-byte by default. Use a collation with strength: 1 (primary strength, which compares base characters only, ignoring accents and case) on queries or indexes, or normalize in the application layer.
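What strength: 1 means can be sketched in plain JavaScript: Intl.Collator with sensitivity: "base" similarly compares base characters only, ignoring accents and case (a rough analogue, not MongoDB's implementation):

```javascript
// sensitivity "base" ignores accents and case, roughly MongoDB's strength: 1
const collator = new Intl.Collator("en", { sensitivity: "base" });

console.log(collator.compare("café", "cafe"));  // 0: accents ignored, treated as equal
console.log(collator.compare("café", "CAFÉ"));  // 0: case ignored as well
```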
Best Practice: Normalize on Write
The safest approach is to normalize all text to NFC before writing to the database. This ensures consistency regardless of the source:
const normalized = userInput.normalize("NFC");
await db.insert({ name: normalized });
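A minimal sketch of how write-time NFC normalization keeps a uniqueness check honest, using an in-memory Set as a stand-in for a database unique index (the Set and function names are illustrative):

```javascript
// In-memory stand-in for a UNIQUE index over normalized usernames.
const usernames = new Set();

function register(name) {
  const key = name.normalize("NFC");      // normalize on write
  if (usernames.has(key)) return false;   // duplicate: reject
  usernames.add(key);
  return true;
}

console.log(register("caf\u00E9"));   // true: first insert succeeds
console.log(register("cafe\u0301"));  // false: NFD variant is the same name
```

Without the normalize() call, both spellings would be admitted, reproducing the invisible-duplicate problem inside the database.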
Use Case
Critical for any application storing user-generated text in a database, especially names, usernames, email addresses, and search terms. Without normalization, databases can contain invisible duplicates that bypass unique constraints and cause inconsistent query results.