Question 1

Latin Extended Characters and Multi-Byte UTF-8

Accepted Answer

## Beyond ASCII: Latin Extended Characters

Characters like é, ü, ñ, ç, and å are common in European languages. In precomposed form each one is a single code point and a single UTF-16 code unit, but unlike plain ASCII, which needs 1 byte per character in UTF-8, each of these takes 2 bytes.

### Example String

café naïve résumé

### Length Measurements

| Metric | Value |
|--------|-------|
| JavaScript `.length` | 17 |
| Code points | 17 |
| Grapheme clusters | 17 |
| UTF-8 bytes | 21 |
| UTF-16 bytes | 34 |
| UTF-32 bytes | 68 |
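
The figures in the table can be reproduced with a short script. This is a minimal sketch, assuming a runtime that provides `TextEncoder` and `Intl.Segmenter` (recent Node.js versions and modern browsers); the `measure` helper and its property names are illustrative, not part of any library. The string is written with escape sequences so that it is guaranteed to be in precomposed form.

```javascript
// "café naïve résumé" with precomposed accented letters (U+00E9, U+00EF).
const text = "caf\u00E9 na\u00EFve r\u00E9sum\u00E9";

// Hypothetical helper: measure one string in each unit from the table.
function measure(s) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  const codePoints = [...s].length;
  // Intl.Segmenter splits the string into user-perceived characters.
  const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
  return {
    utf16CodeUnits: s.length,                       // what .length reports
    codePoints,
    graphemes: [...segmenter.segment(s)].length,
    utf8Bytes: new TextEncoder().encode(s).length,  // TextEncoder always emits UTF-8
    utf16Bytes: s.length * 2,                       // 2 bytes per code unit
    utf32Bytes: codePoints * 4,                     // 4 bytes per code point
  };
}

console.log(measure(text));
```

The 4 extra UTF-8 bytes (21 versus 17) come from the four accented letters in the string (é, ï, é, é), each of which takes 2 bytes instead of 1.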

### Why UTF-8 Bytes Differ

Cha

Question 2

When is this useful?

Accepted Answer

When building applications for European markets (French, German, Spanish, Portuguese), knowing that each accented Latin character such as é or ñ takes 2 bytes in UTF-8, rather than the 1 byte of an ASCII letter, is essential for accurate storage estimation and for checking VARCHAR limits when those limits are counted in bytes.
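
As a rough illustration of such a check, the sketch below tests a value against a byte budget before it is stored. It assumes the limit is counted in UTF-8 bytes; whether a given VARCHAR length counts bytes or characters depends on the database and its configuration, so confirm that first. The names `utf8ByteLength` and `fitsByteLimit` are hypothetical helpers, not a database or driver API.

```javascript
// Hypothetical helper: number of bytes a string occupies when encoded as UTF-8.
function utf8ByteLength(s) {
  return new TextEncoder().encode(s).length;
}

// Hypothetical helper: true if the string fits a limit counted in UTF-8 bytes.
// Assumption: the storage limit is byte-based, not character-based.
function fitsByteLimit(s, maxBytes) {
  return utf8ByteLength(s) <= maxBytes;
}

const name = "caf\u00E9"; // "café": 4 characters, 5 UTF-8 bytes

console.log(name.length);            // 4
console.log(utf8ByteLength(name));   // 5
console.log(fitsByteLimit(name, 4)); // false: a 4-character value overflows a 4-byte limit
console.log(fitsByteLimit(name, 5)); // true
```

Comparing `.length` against a byte limit would accept the first case by mistake, which is why the byte count is computed explicitly.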
