Basic Latin Alphabet — A to Z in Unicode

Explore how Unicode encodes the basic Latin alphabet (A–Z, a–z): the code points, the single-byte UTF-8 representation, and how uppercase/lowercase mapping works.

Detailed Explanation

The Basic Latin Alphabet in Unicode

The Latin letters A through Z (uppercase) occupy code points U+0041 to U+005A, and their lowercase counterparts a through z occupy U+0061 to U+007A. These 52 characters form the core of the Basic Latin Unicode block (U+0000 to U+007F), which mirrors the ASCII standard exactly.
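
The two ranges above translate directly into a membership check. The following is a minimal sketch in Python; the function name is_basic_latin_letter and the constant names are illustrative assumptions, not part of any standard API.

```python
# Code point ranges taken from the text above.
UPPER_FIRST, UPPER_LAST = 0x41, 0x5A  # 'A'..'Z'
LOWER_FIRST, LOWER_LAST = 0x61, 0x7A  # 'a'..'z'


def is_basic_latin_letter(ch: str) -> bool:
    """Return True if ch is one character in A-Z or a-z (hypothetical helper)."""
    if len(ch) != 1:
        return False
    cp = ord(ch)
    return UPPER_FIRST <= cp <= UPPER_LAST or LOWER_FIRST <= cp <= LOWER_LAST


# Scan the whole Basic Latin block (U+0000 to U+007F) and keep only the letters.
letters = [chr(cp) for cp in range(0x00, 0x80) if is_basic_latin_letter(chr(cp))]
print(len(letters))      # 52
print("".join(letters))  # A..Z followed by a..z
```

Comparing code points rather than calling str.isalpha() keeps the check limited to these 52 letters; str.isalpha() also accepts letters outside Basic Latin.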

Code Point Structure

Each lowercase letter's code point is exactly 32 (0x20) higher than that of its uppercase counterpart:

A = U+0041 (65 decimal)    a = U+0061 (97 decimal)
B = U+0042 (66 decimal)    b = U+0062 (98 decimal)
...
Z = U+005A (90 decimal)    z = U+007A (122 decimal)

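This 0x20 offset is a deliberate design choice from ASCII that makes case conversion trivial with bitwise operations: toggling bit 5 (the bit with value 0x20, counting from bit 0) flips a letter between upper and lower case. The trick applies only to the letters themselves; toggling the same bit on digits or punctuation yields unrelated characters.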
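
As an illustration of the bit trick, here is a short Python sketch. The helper names (toggle_case, to_upper, to_lower) are hypothetical, and each one checks the letter ranges first so that non-letters pass through unchanged.

```python
CASE_BIT = 0x20  # bit 5 (counting from bit 0): the gap between 'A' and 'a'


def is_ascii_letter(cp: int) -> bool:
    """True for code points in A-Z (0x41-0x5A) or a-z (0x61-0x7A)."""
    return 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A


def toggle_case(ch: str) -> str:
    """Flip the case of a Basic Latin letter by XOR-ing bit 5."""
    cp = ord(ch)
    return chr(cp ^ CASE_BIT) if is_ascii_letter(cp) else ch


def to_upper(ch: str) -> str:
    """Clear bit 5 so a lowercase letter becomes uppercase."""
    cp = ord(ch)
    return chr(cp & ~CASE_BIT) if is_ascii_letter(cp) else ch


def to_lower(ch: str) -> str:
    """Set bit 5 so an uppercase letter becomes lowercase."""
    cp = ord(ch)
    return chr(cp | CASE_BIT) if is_ascii_letter(cp) else ch


print("".join(toggle_case(c) for c in "Hello, World!"))  # hELLO, wORLD!
```

Without the range check, toggle_case("@") would return "`", since U+0040 and U+0060 also differ only in bit 5.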

UTF-8 Encoding

Every Basic Latin character is encoded as a single byte in UTF-8, identical to its ASCII value. The letter A is byte 0x41, and z is byte 0x7A. As a result, text made up only of Basic Latin characters produces the same byte sequence, and therefore the same byte length, in ASCII, Latin-1, and UTF-8.
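
The single-byte claim is easy to check with Python's built-in codecs. This sketch encodes the 52 letters under each of the three encodings named above and compares the results; the variable names are illustrative.

```python
import string

# The 52 Basic Latin letters: A-Z followed by a-z.
alphabet = string.ascii_uppercase + string.ascii_lowercase

# Encode the same text with each codec mentioned in the text.
encoded = {codec: alphabet.encode(codec) for codec in ("ascii", "latin-1", "utf-8")}

for codec, data in encoded.items():
    print(f"{codec:8} {len(data)} bytes")

# Individual letters map to a single byte equal to their code point.
print("A".encode("utf-8").hex())  # 41
print("z".encode("utf-8").hex())  # 7a
```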

Beyond Basic Latin

Unicode extends the Latin script far beyond A–Z with blocks like Latin-1 Supplement (U+0080–U+00FF) for accented characters, Latin Extended-A (U+0100–U+017F), and Latin Extended-B (U+0180–U+024F). Characters in these three blocks each take two bytes in UTF-8 rather than one. The Unicode Inspector lets you compare how these extended characters differ in byte count and encoding from their basic counterparts.
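
To make the comparison concrete, the sketch below reports the block and UTF-8 byte count for the letter e and three accented variants, one from each extended block listed above. The BLOCKS table covers only the ranges named in this section, and block_of and describe are hypothetical helpers; Python's standard library has no built-in block lookup.

```python
# Block ranges copied from the text; this is not a complete Unicode block list.
BLOCKS = [
    ("Basic Latin", 0x0000, 0x007F),
    ("Latin-1 Supplement", 0x0080, 0x00FF),
    ("Latin Extended-A", 0x0100, 0x017F),
    ("Latin Extended-B", 0x0180, 0x024F),
]


def block_of(ch: str) -> str:
    """Return the name of the listed block containing ch, or 'other'."""
    cp = ord(ch)
    for name, first, last in BLOCKS:
        if first <= cp <= last:
            return name
    return "other"


def describe(ch: str) -> tuple:
    """Return (code point label, block name, UTF-8 byte count) for ch."""
    return (f"U+{ord(ch):04X}", block_of(ch), len(ch.encode("utf-8")))


# Plain e, then accented forms of e written as escapes to avoid ambiguity.
for ch in ("e", "\u00e9", "\u0113", "\u0205"):
    print(ch, *describe(ch))
```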

Fullwidth Latin Letters

East Asian text, Japanese in particular, often uses fullwidth Latin letters (U+FF21–U+FF3A for A–Z, U+FF41–U+FF5A for a–z) from the Halfwidth and Fullwidth Forms block. Each occupies 3 bytes in UTF-8 and is rendered wider than its Basic Latin counterpart, to match the width of CJK characters in monospaced layouts.
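
Because the fullwidth ranges are laid out in the same order as A–Z and a–z, a fixed offset maps one set onto the other. The Python sketch below shows the offset, the 3-byte UTF-8 length, and a conversion back to Basic Latin using NFKC normalization from the standard unicodedata module. The to_fullwidth helper is a hypothetical name used for illustration.

```python
import unicodedata

# Distance from 'A' (U+0041) to fullwidth 'A' (U+FF21); the same for every letter.
FULLWIDTH_OFFSET = 0xFF21 - 0x41  # 0xFEE0


def to_fullwidth(ch: str) -> str:
    """Map a Basic Latin letter to its fullwidth form; leave other input alone."""
    cp = ord(ch)
    if 0x41 <= cp <= 0x5A or 0x61 <= cp <= 0x7A:
        return chr(cp + FULLWIDTH_OFFSET)
    return ch


wide = "".join(to_fullwidth(c) for c in "Az")

print([f"U+{ord(c):04X}" for c in wide])    # ['U+FF21', 'U+FF5A']
print(len(wide.encode("utf-8")))            # 6 (3 bytes per letter)
print(unicodedata.normalize("NFKC", wide))  # Az
```

NFKC also rewrites other compatibility characters, so it is worth applying only where that broader folding is acceptable.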

Use Case

Use this reference when building character validation logic, implementing case-insensitive comparisons, or understanding why certain string operations behave differently for Basic Latin versus extended Latin characters.
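
As one example of the difference, the sketch below contrasts a comparison that folds only A–Z with one based on Python's str.casefold(). Both function names are illustrative assumptions, and neither handles Unicode normalization, so treat this as a starting point rather than a complete solution.

```python
def ascii_lower(s: str) -> str:
    """Lowercase only A-Z by setting bit 5; every other character is unchanged."""
    return "".join(chr(ord(c) | 0x20) if "A" <= c <= "Z" else c for c in s)


def equal_ascii_ci(a: str, b: str) -> bool:
    """Case-insensitive comparison that understands Basic Latin letters only."""
    return ascii_lower(a) == ascii_lower(b)


def equal_unicode_ci(a: str, b: str) -> bool:
    """Case-insensitive comparison using Unicode case folding."""
    return a.casefold() == b.casefold()


# Basic Latin text: both approaches agree.
print(equal_ascii_ci("Hello", "hELLO"), equal_unicode_ci("Hello", "hELLO"))

# U+00C9 / U+00E9 (E with acute): only Unicode case folding matches them.
print(equal_ascii_ci("\u00c9cole", "\u00e9cole"), equal_unicode_ci("\u00c9cole", "\u00e9cole"))
```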
