CJK Character Length: Chinese, Japanese, Korean Text

Learn how Chinese, Japanese, and Korean characters affect string length in UTF-8 (3 bytes for most characters), UTF-16 (2 bytes), and UTF-32 (4 bytes).


Detailed Explanation

CJK Characters: 3 Bytes in UTF-8

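Chinese characters (Hanzi, called Kanji in Japanese and Hanja in Korean) are encoded in the CJK Unified Ideographs block, U+4E00–U+9FFF. Japanese Hiragana (U+3040–U+309F) and Katakana (U+30A0–U+30FF) and Korean Hangul syllables (U+AC00–U+D7AF) have blocks of their own. All of these ranges fall between U+0800 and U+FFFF, so each character takes 3 bytes in UTF-8. Rarer ideographs in the supplementary CJK extension blocks (U+20000 and above) take 4 bytes.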
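As a sketch of why these characters land at 3 bytes, the TypeScript below maps a code point to its UTF-8 width using the standard UTF-8 ranges. `utf8Width` is a hypothetical helper name, and the sample characters are illustrative picks from each block, not an exhaustive list.

```typescript
// Number of bytes UTF-8 uses for a single Unicode code point.
// The thresholds come from the UTF-8 encoding form (RFC 3629).
function utf8Width(codePoint: number): number {
  if (codePoint < 0x80) return 1;    // U+0000–U+007F: ASCII
  if (codePoint < 0x800) return 2;   // U+0080–U+07FF: Latin-1 Supplement, Greek, Cyrillic, ...
  if (codePoint < 0x10000) return 3; // U+0800–U+FFFF: rest of the BMP, incl. common CJK
  return 4;                          // U+10000–U+10FFFF: supplementary planes
}

// One sample character per block (block ranges per the Unicode code charts).
const samples: Array<[string, string]> = [
  ["A", "Basic Latin (ASCII)"],
  ["東", "CJK Unified Ideographs"],
  ["あ", "Hiragana"],
  ["カ", "Katakana"],
  ["한", "Hangul Syllables"],
  ["𠮷", "CJK Unified Ideographs Extension B"],
];

for (const [ch, block] of samples) {
  const cp = ch.codePointAt(0)!;
  const hex = cp.toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${hex} ${ch} (${block}): ${utf8Width(cp)} bytes`);
}
```

Every sample except "A" and "𠮷" falls in the 3-byte band; "𠮷" (U+20BB7) is outside the BMP and needs 4.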

Example String

東京都渋谷区 (Tokyo Shibuya-ku)

Length Measurements

Metric               Value
JavaScript .length   6
Code points          6
Grapheme clusters    6
UTF-8 bytes          18
UTF-16 bytes         12
UTF-32 bytes         24
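The metrics in the table can be reproduced with a short TypeScript sketch. `measure` and `LengthReport` are hypothetical names; grapheme counting assumes a runtime with `Intl.Segmenter` (current browsers and recent Node.js), and the UTF-16/UTF-32 figures exclude any BOM.

```typescript
// Metrics from the table above, computed for any string.
interface LengthReport {
  jsLength: number;   // UTF-16 code units (what String.prototype.length returns)
  codePoints: number; // Unicode scalar values
  graphemes: number;  // user-perceived characters
  utf8Bytes: number;
  utf16Bytes: number; // code units × 2, no BOM
  utf32Bytes: number; // code points × 4, no BOM
}

function measure(text: string): LengthReport {
  const codePoints = Array.from(text).length; // the string iterator walks by code point
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  const graphemes = Array.from(segmenter.segment(text)).length;
  return {
    jsLength: text.length,
    codePoints,
    graphemes,
    utf8Bytes: new TextEncoder().encode(text).length,
    utf16Bytes: text.length * 2,
    utf32Bytes: codePoints * 4,
  };
}

console.log(measure("東京都渋谷区"));
// Expected values: jsLength 6, codePoints 6, graphemes 6,
// utf8Bytes 18, utf16Bytes 12, utf32Bytes 24
```

For BMP-only CJK text all three counts agree. They diverge once supplementary characters appear: `measure("𠮷野家")` gives a `.length` of 4 but 3 code points.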

UTF-8 vs UTF-16 for CJK

This is one of the few cases where UTF-16 is more compact than UTF-8. Each CJK character in the Basic Multilingual Plane costs 3 bytes in UTF-8 but only 2 bytes in UTF-16, so text made up entirely of such characters is about 33% (one third) smaller in UTF-16. The saving shrinks as ASCII characters are mixed in.

However, UTF-8 is still preferred for web content because:

  1. Mixed content (CJK + ASCII) is common, and ASCII characters are 1 byte in UTF-8 vs 2 in UTF-16
  2. UTF-8 is the standard encoding for HTML, JSON, and HTTP
  3. UTF-8 has no byte-order issues (no BOM needed)
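The trade-off behind point 1 can be made concrete with a small byte count. Assume, for simplicity, that a text contains only c BMP CJK characters (3 bytes in UTF-8, 2 in UTF-16) and a ASCII characters (1 byte in UTF-8, 2 in UTF-16), and ignore any BOM:

```latex
\begin{aligned}
B_{\text{UTF-8}}  &= 3c + a \\
B_{\text{UTF-16}} &= 2c + 2a \\
B_{\text{UTF-16}} < B_{\text{UTF-8}} &\iff 2c + 2a < 3c + a \iff a < c \\
\text{pure CJK } (a = 0):\quad
\frac{B_{\text{UTF-8}} - B_{\text{UTF-16}}}{B_{\text{UTF-8}}} &= \frac{3c - 2c}{3c} = \frac{1}{3} \approx 33\%
\end{aligned}
```

Under these assumptions, UTF-16 is smaller only while CJK characters outnumber ASCII ones; markup, JSON keys, and URLs often tip real web payloads the other way.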

Japanese Mixed Text

Japanese text typically mixes Kanji, Hiragana, Katakana, and ASCII:

こんにちは世界！Hello!

Here, the Hiragana (こんにちは) and Kanji (世界) are 3 bytes each in UTF-8, the full-width exclamation mark (！, U+FF01) is also 3 bytes, and "Hello!" is 6 bytes. The string is 14 characters long but takes 30 bytes in UTF-8 (5×3 + 2×3 + 3 + 6).
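A minimal way to check a per-character breakdown like this is to encode each code point on its own and sum the results. `utf8Breakdown` is a hypothetical helper, and the sample string assumes the exclamation mark after 世界 is the full-width U+FF01.

```typescript
// Break a string into code points and report each one's UTF-8 byte count.
function utf8Breakdown(text: string): Array<{ char: string; bytes: number }> {
  const encoder = new TextEncoder();
  // Array.from iterates by code point, so surrogate pairs stay together.
  return Array.from(text, (char) => ({ char, bytes: encoder.encode(char).length }));
}

const text = "こんにちは世界！Hello!";
const parts = utf8Breakdown(text);
const total = parts.reduce((sum, p) => sum + p.bytes, 0);

console.log(parts.map((p) => `${p.char}:${p.bytes}`).join(" "));
console.log(`${parts.length} characters, ${total} UTF-8 bytes`); // 14 characters, 30 UTF-8 bytes
```

Summing per-character widths gives the same result as encoding the whole string at once, since UTF-8 encodes each code point independently.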

Database Considerations

MySQL's legacy utf8 character set (an alias for utf8mb3, now deprecated) stores at most 3 bytes per character, which covers BMP CJK characters. Emoji and supplementary CJK characters (U+10000 and above) need 4 bytes, so they require utf8mb4. Always use utf8mb4 for modern applications.
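Before writing text to a legacy utf8 (utf8mb3) column, a check like the following can flag characters that would not fit. `charsNeedingUtf8mb4` is a hypothetical helper; it only inspects code points and does not connect to MySQL or reproduce its exact error behavior.

```typescript
// Return the characters that need 4 bytes in UTF-8 (code points above U+FFFF).
// A 3-byte utf8/utf8mb3 column cannot store these; a utf8mb4 column can.
function charsNeedingUtf8mb4(text: string): string[] {
  return Array.from(text).filter((ch) => ch.codePointAt(0)! > 0xffff);
}

const clean = charsNeedingUtf8mb4("東京都渋谷区");    // []
const flagged = charsNeedingUtf8mb4("𠮷野家の牛丼🍚"); // ["𠮷", "🍚"]
console.log(clean, flagged);
```

Here 𠮷 (U+20BB7, a supplementary CJK ideograph) and 🍚 (U+1F35A) are flagged, while every BMP character passes.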

Use Case

When building applications for Asian markets or handling multilingual content, knowing that CJK characters typically use 3 UTF-8 bytes each (4 for supplementary-plane characters) is essential for accurate storage planning, API payload size estimation, and database column sizing.
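For API payload estimation, one rough sketch is to serialize the body and count its UTF-8 bytes. `jsonPayloadBytes` is a hypothetical helper; the figure excludes HTTP headers and any compression, and it assumes the body is sent as UTF-8 JSON.

```typescript
// Estimate the size of a JSON request body as sent over HTTP in UTF-8.
function jsonPayloadBytes(value: unknown): number {
  // JSON.stringify leaves CJK characters unescaped, so each BMP CJK character
  // becomes 3 bytes once the string is UTF-8 encoded.
  return new TextEncoder().encode(JSON.stringify(value)).length;
}

const body = { city: "東京", ward: "渋谷区" };
const json = JSON.stringify(body);
console.log(json.length);            // 26 (UTF-16 code units)
console.log(jsonPayloadBytes(body)); // 36 (21 ASCII bytes + 5 CJK characters × 3)
```

Budgeting from `.length` alone would undercount this payload by 10 bytes, and the gap grows with the share of CJK text.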
