HTML Special Character Escaping and Entity Encoding

Learn how to escape special characters in HTML using named entities, numeric references, and hex references. Covers the five mandatory escapes, attribute encoding, and preventing XSS through proper HTML escaping.


Detailed Explanation

HTML Special Character Escaping

HTML uses certain characters as part of its syntax — angle brackets for tags, ampersands for entities, and quotes for attributes. When these characters appear in content, they must be escaped to prevent the browser from misinterpreting them.

The Five Mandatory Escapes

&  → &amp;    (ampersand — starts an entity reference)
<  → &lt;     (less-than — starts a tag)
>  → &gt;     (greater-than — ends a tag)
"  → &quot;   (double quote — ends attribute values)
'  → &#39;    (single quote / apostrophe — ends attribute values)
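
The five replacements above can be sketched as a small function. This is an illustrative sketch in Python, not a library API: the name escape_html is hypothetical. The one detail that matters is ordering, since the ampersand has to be handled first.

```python
def escape_html(text: str) -> str:
    """Escape the five characters listed above (hypothetical helper)."""
    # The ampersand must be replaced first. If it ran last, the "&" that
    # the other replacements introduce would be escaped a second time.
    return (
        text.replace("&", "&amp;")
        .replace("<", "&lt;")
        .replace(">", "&gt;")
        .replace('"', "&quot;")
        .replace("'", "&#39;")
    )


print(escape_html('<a href="x">Tom & Jerry\'s</a>'))
# prints: &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&#39;s&lt;/a&gt;
```

Note that input which already contains an entity, such as &lt;, comes out as &amp;lt;. That is the correct result when the input is meant to be shown literally.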

Named vs. Numeric Entities

HTML supports three encoding formats:

&amp;     — named entity
&#38;     — decimal numeric reference
&#x26;    — hexadecimal numeric reference

All three represent the ampersand character. Named entities are more readable; numeric references cover any Unicode code point.
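
As a quick check of the equivalence, the sketch below decodes all three forms with Python's standard html.unescape and builds the two numeric forms for an arbitrary character. The helper name numeric_refs is an assumption made for illustration.

```python
import html

# All three spellings decode to the same character.
forms = ["&amp;", "&#38;", "&#x26;"]
print([html.unescape(form) for form in forms])  # prints: ['&', '&', '&']


def numeric_refs(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"


print(numeric_refs("&"))  # prints: ('&#38;', '&#x26;')
print(numeric_refs("€"))  # prints: ('&#8364;', '&#x20AC;')
```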

Context-Specific Escaping

The required escaping depends on where the content appears:

  • In text content: Escape & and < at minimum.
  • In double-quoted attributes: Also escape ".
  • In single-quoted attributes: Also escape '.
  • In unquoted attributes: Escape all whitespace, &, <, >, ", ', =, and backticks. (Best practice: always quote attributes.)
  • Inside <script> and <style>: HTML entities are not decoded; use JavaScript or CSS escaping instead.
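
The contexts above can be sketched as one small function per context. All three function names are hypothetical, and the script-context helper shows one possible approach (a JSON string literal with "<" rewritten as a JavaScript escape), not the only one.

```python
import json


def escape_text(value: str) -> str:
    """Minimum escaping for text content: & and <."""
    return value.replace("&", "&amp;").replace("<", "&lt;")


def escape_double_quoted_attr(value: str) -> str:
    """Escaping for a value placed inside a double-quoted attribute."""
    return escape_text(value).replace('"', "&quot;")


def js_string_literal(value: str) -> str:
    """Escaping for a string embedded in an inline <script> block.

    HTML entities are not decoded there, so JavaScript escaping is used.
    Rewriting "<" as \\u003c keeps "</script>" from closing the element.
    """
    return json.dumps(value).replace("<", "\\u003c")


print(escape_text('a < b & "c"'))             # prints: a &lt; b &amp; "c"
print(escape_double_quoted_attr('say "hi"'))  # prints: say &quot;hi&quot;
print(js_string_literal("</script>"))         # prints: "\u003c/script>"
```

Using the wrong function for a context is a common source of bugs: entity-escaped text placed inside a script block is not decoded, and a JavaScript-escaped string placed in an attribute can still break out of the quotes.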

XSS Prevention

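Proper HTML escaping is the primary defense against Cross-Site Scripting (XSS). User input inserted into HTML without escaping is parsed as markup, so it can execute arbitrary JavaScript. In the example below, the user has submitted a script tag in place of a name: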

<!-- Vulnerable: unescaped user input -->
<div>Hello, <script>alert('XSS')</script></div>

<!-- Safe: escaped user input -->
<div>Hello, &lt;script&gt;alert('XSS')&lt;/script&gt;</div>
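
A minimal server-side sketch of the safe case, using html.escape from Python's standard library. The function name render_greeting is hypothetical. html.escape also encodes quotes by default and writes the apostrophe as &#x27;, the hexadecimal equivalent of &#39;.

```python
import html


def render_greeting(user_name: str) -> str:
    """Build the greeting markup with the untrusted name escaped."""
    return f"<div>Hello, {html.escape(user_name)}</div>"


print(render_greeting("<script>alert('XSS')</script>"))
# prints: <div>Hello, &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;</div>
```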

Non-Breaking Spaces and Special Characters

&nbsp; (non-breaking space) renders as a space but prevents a line break at that position. Other common entities include &mdash; (em dash), &copy; (copyright), &euro; (euro sign), and &hellip; (ellipsis).
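
To see which code point each of these named entities stands for, Python's standard html.entities module exposes a name-to-code-point table. The sketch below prints the named, decimal, and hexadecimal forms side by side. The helper name describe_entity is an assumption made for illustration.

```python
from html.entities import name2codepoint


def describe_entity(name: str) -> str:
    """Return the named, decimal, and hex forms of one named entity."""
    code_point = name2codepoint[name]  # e.g. "nbsp" maps to 160
    return f"&{name};  &#{code_point};  &#x{code_point:X};"


for name in ["nbsp", "mdash", "copy", "euro", "hellip"]:
    print(describe_entity(name))
# prints:
# &nbsp;  &#160;  &#xA0;
# &mdash;  &#8212;  &#x2014;
# &copy;  &#169;  &#xA9;
# &euro;  &#8364;  &#x20AC;
# &hellip;  &#8230;  &#x2026;
```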

Programmatic Escaping

Most frameworks and languages provide built-in HTML escaping: React escapes values embedded in JSX by default, Django templates auto-escape variables by default and also offer the |escape filter, PHP has htmlspecialchars(), and Go provides html.EscapeString().

Use Case

HTML escaping is fundamental to web development. It is required wherever untrusted text is placed into a web page: rendering user-generated content, building email templates, generating HTML reports, writing static site generators, and inserting data into HTML attributes. Escaping correctly in each of these contexts is what prevents XSS vulnerabilities.
