Unicode Inspector

Paste text to inspect each character's Unicode code point, UTF-8/UTF-16 encoding, name, category, and block.

About This Tool

The Unicode Inspector is a free browser-based tool that provides detailed information about every character in a string. Whether you are debugging encoding issues, analyzing multilingual text, or learning about Unicode internals, this tool gives you a comprehensive character-by-character breakdown instantly.

For each character you enter, the inspector displays the Unicode code point (U+XXXX format), the raw UTF-8 byte sequence in hexadecimal, the UTF-16 code units, the official Unicode character name, its general category (Letter, Number, Punctuation, Symbol, Separator, Control, or Other), and the Unicode block it belongs to (e.g. Basic Latin, CJK Unified Ideographs, Emoticons). You can also see the byte count in UTF-8, which is essential when working with protocols or databases that enforce byte-length limits.
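
As a rough illustration of how those columns can be derived in a browser, here is a minimal sketch using only standard JavaScript APIs. The helper name describeChar and the shape of the returned object are assumptions for this example, not the tool's actual code.

```javascript
// Hypothetical helper: build one table row for a single code point.
function describeChar(ch) {
  const cp = ch.codePointAt(0);
  const hex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");

  // TextEncoder always encodes to UTF-8.
  const utf8 = Array.from(new TextEncoder().encode(ch), (b) => hex(b, 2));

  // JavaScript strings are sequences of UTF-16 code units, so charCodeAt
  // exposes them directly (two units when a surrogate pair is needed).
  const utf16 = [];
  for (let i = 0; i < ch.length; i++) utf16.push(hex(ch.charCodeAt(i), 4));

  return {
    char: ch,
    codePoint: "U+" + hex(cp, 4),
    utf8,
    utf16,
    utf8ByteCount: utf8.length,
  };
}

// Array.from iterates a string by code point, not by UTF-16 code unit.
const rows = Array.from("\u00E9\u{1F30D}", describeChar);
console.log(rows[0].codePoint, rows[0].utf8.join(" ")); // U+00E9 C3 A9
console.log(rows[1].codePoint, rows[1].utf8.join(" ")); // U+1F30D F0 9F 8C 8D
```

Name, category, and block are left out here because they need lookup data that standard string APIs do not provide.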

The summary statistics panel shows totals at a glance: total characters, code points, byte sizes in both UTF-8 and UTF-16 encodings, and the number of unique characters. This is useful for estimating storage requirements or validating string length constraints across different encodings.
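
The totals can be approximated with a few lines of JavaScript. This is a sketch under assumptions: summarize is a hypothetical name, and because the text does not say how "total characters" differs from code points, that figure is not reproduced here.

```javascript
// Hypothetical helper: compute summary totals for a string.
function summarize(text) {
  const codePoints = Array.from(text); // one entry per code point
  return {
    codePoints: codePoints.length,
    utf8Bytes: new TextEncoder().encode(text).length,
    utf16Bytes: text.length * 2, // each UTF-16 code unit is 2 bytes
    unique: new Set(codePoints).size,
  };
}

console.log(summarize("na\u00EFve \u{1F30D}\u{1F30D}"));
// { codePoints: 8, utf8Bytes: 15, utf16Bytes: 20, unique: 7 }
```

The gap between 8 code points and 15 UTF-8 bytes is the kind of difference that matters when a database column is limited by bytes rather than characters.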

You can search the character table by code point (e.g. U+0041), character name, category, or Unicode block. Click any row to open a detailed panel with all properties and quick-copy buttons. The tool correctly handles supplementary-plane characters such as emoji and CJK Extension B ideographs, which require surrogate pairs in UTF-16.

If you work with text processing, consider pairing this tool with the Word & Character Counter for document statistics, the String Escape/Unescape tool for encoding special characters in code, or the Text Case Converter for transforming letter case across scripts.

All processing runs entirely in your browser using JavaScript string APIs. No data is sent to any server — your text stays on your machine at all times. This makes it safe to inspect sensitive or proprietary content without privacy concerns.

How to Use

  1. Type or paste any text into the input area at the top of the page.
  2. View the summary statistics bar for total characters, code points, UTF-8 bytes, UTF-16 bytes, and unique character count.
  3. Browse the character table to see each character's code point, UTF-8 bytes, UTF-16 units, name, category, block, and byte count.
  4. Use the search bar to filter characters by code point (e.g. U+00E9), character name, category, or Unicode block.
  5. Click a row in the table to open the detailed character panel with all properties displayed in a card layout.
  6. Click the Copy buttons to copy the character, its code point, or UTF-8 bytes to your clipboard.
  7. Press Ctrl+Shift+C to copy the currently selected character. Click Clear to reset the input and start over.

FAQ

What information does the Unicode Inspector show?

For each character it displays: the rendered character, Unicode code point (U+XXXX), UTF-8 byte sequence in hex, UTF-16 code units in hex, the Unicode character name, general category (Letter, Number, Punctuation, Symbol, Separator, Control, Other), Unicode block name, and UTF-8 byte count. Summary statistics include totals for characters, code points, UTF-8 bytes, UTF-16 bytes, and unique characters.

How does it handle emoji and supplementary characters?

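The tool uses JavaScript's Unicode-aware string iteration (the string iterator, String.prototype[Symbol.iterator]) to split text into individual Unicode code points, even when a code point requires a surrogate pair in UTF-16. For example, the globe emoji (U+1F30D) is shown as a single character with its 4-byte UTF-8 encoding and 2 UTF-16 code units. Because the split is by code point, an emoji built from several code points (such as a flag or a ZWJ sequence) appears as several rows.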
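
The difference between code-unit and code-point iteration is easy to see in a browser console. A minimal sketch, with variable names chosen only for this example:

```javascript
const text = "A\u{1F30D}"; // "A" followed by the globe emoji U+1F30D

// split("") cuts between UTF-16 code units, so the surrogate pair is torn apart.
const units = text.split("");

// The spread operator uses the string iterator, which yields whole code points.
const points = [...text];

console.log(units.length);  // 3
console.log(points.length); // 2
console.log(points[1].codePointAt(0).toString(16)); // "1f30d"
```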

Can I search for a specific code point?

Yes. Type a code point in U+XXXX format (e.g. U+00E9 for e with acute accent), a hex value with 0x prefix, or a decimal number into the search bar. You can also search by character name, category, or Unicode block name.
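
A search box that accepts these three notations might normalize them as follows. This is a sketch only: parseCodePointQuery is a hypothetical function, and the exact matching rules the tool applies are not documented here.

```javascript
// Hypothetical parser: returns the code point as a number, or null when the
// query is not a code point and should be treated as a name/category/block search.
function parseCodePointQuery(query) {
  const q = query.trim();
  let value = null;
  let m;
  if ((m = /^U\+([0-9A-F]{1,6})$/i.exec(q))) {
    value = parseInt(m[1], 16); // U+00E9
  } else if ((m = /^0x([0-9A-F]{1,6})$/i.exec(q))) {
    value = parseInt(m[1], 16); // 0xE9
  } else if (/^[0-9]+$/.test(q)) {
    value = parseInt(q, 10); // 233
  }
  // Valid code points run from U+0000 to U+10FFFF.
  return value !== null && value <= 0x10ffff ? value : null;
}

console.log(parseCodePointQuery("U+00E9")); // 233
console.log(parseCodePointQuery("0xE9"));   // 233
console.log(parseCodePointQuery("233"));    // 233
console.log(parseCodePointQuery("acute"));  // null
```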

What is the difference between UTF-8 bytes and UTF-16 code units?

UTF-8 uses 1 to 4 bytes per code point: ASCII characters use 1 byte, most European accented letters use 2 bytes, most CJK ideographs use 3 bytes, and most emoji (along with other supplementary-plane characters) use 4 bytes. UTF-16 uses 2 or 4 bytes (1 or 2 code units of 16 bits each). Code points in the Basic Multilingual Plane (U+0000 to U+FFFF) use 1 code unit; supplementary code points above U+FFFF use a surrogate pair of 2 code units.
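
These rules follow directly from the code point's numeric range. The sketch below computes them by hand instead of calling an encoder; utf8Length and utf16Units are illustrative names, and surrogate code points (U+D800 to U+DFFF) are not treated specially.

```javascript
// Number of UTF-8 bytes needed for a code point, by range.
function utf8Length(cp) {
  if (cp <= 0x7f) return 1;   // ASCII
  if (cp <= 0x7ff) return 2;  // e.g. Latin letters with diacritics
  if (cp <= 0xffff) return 3; // rest of the Basic Multilingual Plane
  return 4;                   // U+10000 to U+10FFFF
}

// UTF-16 code units for a code point.
function utf16Units(cp) {
  if (cp <= 0xffff) return [cp];
  const offset = cp - 0x10000;           // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

console.log(utf8Length(0x41), utf8Length(0xe9), utf8Length(0x4e2d), utf8Length(0x1f30d));
// 1 2 3 4
console.log(utf16Units(0x1f30d).map((u) => u.toString(16)));
// [ 'd83c', 'df0d' ]
```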

How accurate are the character names?

The tool includes a built-in lookup table covering ASCII characters, common punctuation, currency symbols, and special Unicode characters (zero-width spaces, BOM, etc.). Names for CJK, Hiragana, Katakana, Hangul, and emoji are generated from Unicode block ranges. For less common characters, the tool falls back to a descriptive name based on the code point and block, so names outside the lookup table may differ from the official Unicode character name.
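
A table-plus-fallback lookup of this kind could be structured roughly as shown below. Everything here is an assumption for illustration: the table contents, the block list, the fallback wording, and the function name characterName are not taken from the tool.

```javascript
// Hypothetical lookup table (a tiny subset; the names are the official ones).
const KNOWN_NAMES = new Map([
  [0x0041, "LATIN CAPITAL LETTER A"],
  [0x00a0, "NO-BREAK SPACE"],
  [0x200b, "ZERO WIDTH SPACE"],
  [0xfeff, "ZERO WIDTH NO-BREAK SPACE"],
]);

// Illustrative subset of Unicode block ranges.
const BLOCKS = [
  { start: 0x3040, end: 0x309f, name: "Hiragana" },
  { start: 0x4e00, end: 0x9fff, name: "CJK Unified Ideographs" },
];

function characterName(cp) {
  const hex = cp.toString(16).toUpperCase().padStart(4, "0");
  if (KNOWN_NAMES.has(cp)) return KNOWN_NAMES.get(cp);

  const block = BLOCKS.find((b) => cp >= b.start && cp <= b.end);
  // CJK unified ideographs are named by code point in the Unicode Standard.
  if (block && block.name === "CJK Unified Ideographs") {
    return "CJK UNIFIED IDEOGRAPH-" + hex;
  }
  // Assumed fallback format: block name plus code point, or code point alone.
  return block ? `${block.name} character U+${hex}` : `U+${hex}`;
}

console.log(characterName(0x200b)); // ZERO WIDTH SPACE
console.log(characterName(0x4e2d)); // CJK UNIFIED IDEOGRAPH-4E2D
console.log(characterName(0x3042)); // Hiragana character U+3042
```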

Is my data safe?

Yes. All processing runs entirely in your browser using JavaScript. No data is sent to any server. You can verify this by checking the Network tab in your browser's developer tools while using the tool. Your text never leaves your machine.

Can I use this to debug encoding issues?

Absolutely. The tool is ideal for identifying invisible characters (zero-width spaces, byte order marks, non-breaking spaces), mojibake (incorrectly decoded text), and unexpected characters in data files. The UTF-8 byte display helps you verify whether characters are encoded as expected.
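
The same kind of check can be scripted when a file needs to be scanned in bulk. The following sketch flags a few well-known invisible characters; the list is deliberately short and findInvisible is a hypothetical helper, not part of the tool.

```javascript
// A small, non-exhaustive set of characters that are easy to miss by eye.
const INVISIBLE = new Map([
  [0x00a0, "NO-BREAK SPACE"],
  [0x200b, "ZERO WIDTH SPACE"],
  [0x200c, "ZERO WIDTH NON-JOINER"],
  [0x200d, "ZERO WIDTH JOINER"],
  [0xfeff, "ZERO WIDTH NO-BREAK SPACE (byte order mark)"],
]);

function findInvisible(text) {
  const hits = [];
  let index = 0; // position counted in code points
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    if (INVISIBLE.has(cp)) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      hits.push({ index, codePoint: "U+" + hex, name: INVISIBLE.get(cp) });
    }
    index++;
  }
  return hits;
}

// A CSV header with a leading BOM, a zero-width space, and a trailing no-break space.
console.log(findInvisible("\uFEFFid\u200B,name\u00A0"));
// hits at index 0 (U+FEFF), 3 (U+200B), and 9 (U+00A0)
```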
