Calculate Byte Size of Text — UTF-8, UTF-16, ASCII

Calculate the byte size of any text in UTF-8, UTF-16, and ASCII encodings. Learn how different character encodings affect storage size and why the same text can have different byte sizes across encodings.

Detailed Explanation

Text Byte Size Calculation

The byte size of text depends entirely on the character encoding used. The same string can occupy dramatically different amounts of storage depending on whether it is encoded as UTF-8, UTF-16, or ASCII.

Calculating Byte Size in JavaScript

The TextEncoder API provides accurate UTF-8 byte counts:

function getByteSize(text) {
  const encoder = new TextEncoder(); // defaults to UTF-8
  const encoded = encoder.encode(text);
  return encoded.byteLength;
}

For multiple encodings:

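function getByteSizes(text) {
  const utf8 = new TextEncoder().encode(text).byteLength;
  // String length counts UTF-16 code units, so this is exact (BOM not included)
  const utf16 = text.length * 2;
  // Counts only the characters ASCII can represent; anything above U+007F is dropped
  const ascii = text.replace(/[^\x00-\x7F]/g, "").length;
  return { utf8, utf16, ascii };
}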

Encoding Comparison

Character      UTF-8    UTF-16   ASCII
A (U+0041)     1 byte   2 bytes  1 byte
é (U+00E9)     2 bytes  2 bytes  N/A
世 (U+4E16)    3 bytes  2 bytes  N/A
😀 (U+1F600)   4 bytes  4 bytes  N/A
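
The figures in the table can be checked with a few lines of JavaScript. This is a minimal sketch: charSizes is a hypothetical helper name, and returning null for "N/A" is an assumption made for illustration.

```javascript
// Hypothetical helper: byte sizes of a single character in each encoding.
function charSizes(ch) {
  const utf8 = new TextEncoder().encode(ch).byteLength;
  const utf16 = ch.length * 2; // 2 bytes per UTF-16 code unit
  // ASCII covers only U+0000 to U+007F; null stands in for "N/A" (assumption).
  const ascii = ch.codePointAt(0) <= 0x7f ? 1 : null;
  return { utf8, utf16, ascii };
}

// A, é (U+00E9), 世 (U+4E16), 😀 (U+1F600), written as escapes to avoid
// any ambiguity about the source file's own encoding.
for (const ch of ["A", "\u00E9", "\u4E16", "\u{1F600}"]) {
  console.log(ch, charSizes(ch));
}
```

The emoji takes 4 bytes in UTF-16 because it lies outside the Basic Multilingual Plane and is stored as a surrogate pair, which is two code units.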

UTF-8 Variable-Width Encoding

UTF-8 uses 1 to 4 bytes per code point:

  • 1 byte: U+0000 to U+007F (ASCII compatible) — English letters, digits, basic punctuation
  • 2 bytes: U+0080 to U+07FF — accented characters, Greek, Cyrillic, Arabic, Hebrew
  • 3 bytes: U+0800 to U+FFFF — CJK characters, most symbols
  • 4 bytes: U+10000 to U+10FFFF — emoji, historic scripts, musical notation

This variable width makes UTF-8 very compact for English-dominated text but less compact than UTF-16 for CJK-heavy content, where most characters take 3 bytes in UTF-8 and only 2 in UTF-16.
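
The ranges above translate directly into code. The following is a sketch rather than a replacement for TextEncoder; the function names utf8BytesForCodePoint and utf8ByteLength are hypothetical.

```javascript
// Bytes UTF-8 needs for one code point, following the ranges listed above.
function utf8BytesForCodePoint(cp) {
  if (cp <= 0x7f) return 1;   // U+0000 to U+007F
  if (cp <= 0x7ff) return 2;  // U+0080 to U+07FF
  if (cp <= 0xffff) return 3; // U+0800 to U+FFFF
  return 4;                   // U+10000 to U+10FFFF
}

// Sum the per-code-point sizes for a whole string.
function utf8ByteLength(text) {
  let total = 0;
  // for...of iterates by code point, so a surrogate pair is visited once.
  for (const ch of text) {
    total += utf8BytesForCodePoint(ch.codePointAt(0));
  }
  return total;
}

// "A" + "é" + "世" + "😀" = 1 + 2 + 3 + 4 bytes
console.log(utf8ByteLength("A\u00E9\u4E16\u{1F600}")); // 10
```

For well-formed strings the result should match new TextEncoder().encode(text).byteLength, which remains the simpler choice in practice.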

Why Byte Size Matters

  1. Database storage — VARCHAR(255) in MySQL holds up to 255 characters, not bytes; under utf8mb4 that can be as much as 1,020 bytes, and row-size and index-key limits are counted in bytes
  2. API payloads — many APIs limit request/response body size in bytes, not characters
  3. File size estimation — predicting storage requirements for text data
  4. Network bandwidth — byte size determines transmission time
  5. Cookie limits — browsers limit each cookie to roughly 4,096 bytes, counting name, value, and attributes
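
When a limit is expressed in bytes, cutting a string by character count can either overshoot the limit or split a multi-byte character. One possible approach is sketched below; truncateToBytes is a hypothetical name, and the sketch assumes the limit applies to the UTF-8 encoding.

```javascript
// Return the longest prefix of `text` whose UTF-8 encoding fits in `maxBytes`.
// Cuts only at code point boundaries, so no character is split mid-sequence.
function truncateToBytes(text, maxBytes) {
  const encoder = new TextEncoder();
  let used = 0;
  let result = "";
  for (const ch of text) { // iterate by code point
    const size = encoder.encode(ch).byteLength;
    if (used + size > maxBytes) break;
    used += size;
    result += ch;
  }
  return result;
}

console.log(truncateToBytes("h\u00E9llo", 2)); // "h"  (é would need 2 more bytes)
console.log(truncateToBytes("h\u00E9llo", 3)); // "hé"
```

This sketch can still separate a base character from a following combining mark or break up a multi-code-point emoji sequence; handling those would need grapheme-aware segmentation.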

BOM (Byte Order Mark)

UTF-8 files sometimes start with a BOM (\xEF\xBB\xBF, 3 bytes). UTF-16 files use \xFF\xFE (little-endian) or \xFE\xFF (big-endian), 2 bytes each. These markers are not visible characters, but they do add to a file's byte count.
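
A BOM can be detected by inspecting the first bytes of a file. The sketch below uses a hypothetical detectBom helper and handles only the three signatures mentioned above; it does not try to distinguish UTF-32, whose little-endian BOM also begins with \xFF\xFE.

```javascript
// Inspect the leading bytes of a Uint8Array and report which BOM, if any,
// is present. Returns null when no known signature matches.
function detectBom(bytes) {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return { encoding: "utf-8", length: 3 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return { encoding: "utf-16le", length: 2 };
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return { encoding: "utf-16be", length: 2 };
  }
  return null;
}

// Example: a UTF-8 file containing a BOM followed by the letter "A".
const fileBytes = new Uint8Array([0xef, 0xbb, 0xbf, 0x41]);
const bom = detectBom(fileBytes);
const contentBytes = fileBytes.length - (bom ? bom.length : 0);
console.log(bom, contentBytes); // { encoding: 'utf-8', length: 3 } 1
```

Subtracting the BOM length gives the size of the text itself, which is usually the number that matters when comparing against a content limit.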

Use Case

Backend developers calculating database storage requirements use byte size to choose appropriate column types. Frontend developers building form validation need byte-aware limits for API fields. DevOps engineers estimating log storage costs and data engineers designing ETL pipelines that process text in specific encodings also rely on accurate byte size calculations.
