HTML Special Character Escaping and Entity Encoding
Learn how to escape special characters in HTML using named entities, numeric references, and hex references. Covers the five mandatory escapes, attribute encoding, and preventing XSS through proper HTML escaping.
Detailed Explanation
HTML Special Character Escaping
HTML uses certain characters as part of its syntax — angle brackets for tags, ampersands for entities, and quotes for attributes. When these characters appear in content, they must be escaped to prevent the browser from misinterpreting them.
The Five Mandatory Escapes
& → &amp; (ampersand: starts an entity reference)
< → &lt; (less-than: starts a tag)
> → &gt; (greater-than: ends a tag)
" → &quot; (double quote: ends double-quoted attribute values)
' → &#39; (single quote / apostrophe: ends single-quoted attribute values)
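As a minimal sketch, the mapping above can be checked with `html.escape` from Python's standard library. The helper name `escape_all` is hypothetical and used only for illustration. Python emits the hexadecimal form `&#x27;` for the apostrophe, which is equivalent to `&#39;`.

```python
import html

def escape_all(text: str) -> str:
    """Escape the five HTML-significant characters (illustrative helper)."""
    # quote=True (the default) escapes " and ' in addition to &, < and >.
    return html.escape(text, quote=True)

if __name__ == "__main__":
    for ch in ["&", "<", ">", '"', "'"]:
        print(ch, "->", escape_all(ch))
```

The ampersand is replaced first inside `html.escape`, so the `&` characters introduced by the other replacements are not escaped a second time.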
Named vs. Numeric Entities
HTML supports three encoding formats:
&amp; (named entity)
&#38; (decimal numeric reference)
&#x26; (hexadecimal numeric reference)
All three represent the ampersand character. Named entities are more readable; numeric references cover any Unicode code point.
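The equivalence of the three formats can be illustrated with a short Python sketch. `html.unescape` is part of the standard library; `numeric_refs` is a hypothetical helper introduced here to show how the decimal and hexadecimal references are derived from a code point.

```python
import html
from typing import Tuple

def numeric_refs(ch: str) -> Tuple[str, str]:
    """Return the decimal and hexadecimal references for one character."""
    code_point = ord(ch)
    return f"&#{code_point};", f"&#x{code_point:X};"

# All three spellings decode to the same character.
decoded = [html.unescape(ref) for ref in ("&amp;", "&#38;", "&#x26;")]
print(decoded)            # ['&', '&', '&']
print(numeric_refs("&"))  # ('&#38;', '&#x26;')
```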
Context-Specific Escaping
The required escaping depends on where the content appears:
- In text content: Escape & and < at minimum.
- In double-quoted attributes: Also escape ".
- In single-quoted attributes: Also escape '.
- In unquoted attributes: Escape all whitespace, &, <, >, ", ', =, and backticks. (Best practice: always quote attributes.)
- Inside <script> and <style>: HTML entities are not decoded; use JavaScript or CSS escaping instead.
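One way to keep these contexts apart is to use a separate helper per context. The sketch below is an assumption-laden illustration, not a complete sanitizer: the three function names are hypothetical, and the script helper only covers the case of embedding a value as a JavaScript string literal.

```python
import html
import json

def escape_text(value: str) -> str:
    """For element text content: escape &, < and > but leave quotes alone."""
    return html.escape(value, quote=False)

def escape_attr(value: str) -> str:
    """For quoted attribute values: also escape double and single quotes."""
    return html.escape(value, quote=True)

def escape_script_string(value: str) -> str:
    """For a JavaScript string literal inside <script>."""
    # JSON string syntax provides the JavaScript escaping. Writing "<" as a
    # \u003c escape keeps a literal "</script>" out of the page source,
    # so the value cannot close the element early.
    return json.dumps(value).replace("<", "\\u003c")

name = 'Tom & "Jerry" <3'
print(f"<p>{escape_text(name)}</p>")
print(f'<input value="{escape_attr(name)}">')
print(f"<script>var name = {escape_script_string(name)};</script>")
```

Unquoted attributes are deliberately left out: quoting every attribute, as recommended above, removes the need for that rule set.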
XSS Prevention
Proper HTML escaping is the primary defense against Cross-Site Scripting (XSS). User input inserted into HTML without escaping can execute arbitrary JavaScript:
<!-- Vulnerable: unescaped user input -->
<div>Hello, <script>alert('XSS')</script></div>
<!-- Safe: escaped user input -->
<div>Hello, &lt;script&gt;alert('XSS')&lt;/script&gt;</div>
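A minimal Python sketch of the safe variant: the untrusted value is escaped at the moment it is interpolated into the markup. `render_greeting` and `payload` are hypothetical names used only for this example.

```python
import html

def render_greeting(user_name: str) -> str:
    """Build the greeting markup, escaping the untrusted name first."""
    return f"<div>Hello, {html.escape(user_name)}</div>"

payload = "<script>alert('XSS')</script>"
print(render_greeting(payload))
# <div>Hello, &lt;script&gt;alert(&#x27;XSS&#x27;)&lt;/script&gt;</div>
```

The browser displays the payload as literal text because no `<script>` element is ever parsed. `html.escape` also encodes the apostrophes by default, which is harmless in text content.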
Non-Breaking Spaces and Special Characters
&nbsp; (non-breaking space) prevents word wrapping at that position. Other common entities include &mdash; (em dash), &copy; (copyright), &euro; (euro sign), and &hellip; (ellipsis).
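For reference, the named entities for these characters (`nbsp`, `mdash`, `copy`, `euro`, `hellip`) can be looked up in Python's standard `html.entities` table. This sketch prints each name with its code point and the equivalent decimal reference.

```python
import html
from html.entities import name2codepoint

names = ["nbsp", "mdash", "copy", "euro", "hellip"]
for name in names:
    code_point = name2codepoint[name]
    # Named reference, its code point, and the equivalent numeric reference.
    print(f"&{name};  U+{code_point:04X}  &#{code_point};")

# Decoding a named reference yields the actual character (here U+00A0).
assert html.unescape("&nbsp;") == "\u00a0"
```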
Programmatic Escaping
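Most frameworks and languages provide built-in HTML escaping: React escapes values interpolated into JSX by default, Django templates auto-escape variables by default and also offer the |escape filter, PHP has htmlspecialchars(), and Go provides html.EscapeString().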
Use Case
HTML escaping is fundamental to web development. It is required when rendering user-generated content, building email templates, generating HTML reports, creating static site generators, and inserting data into HTML attributes. More generally, it applies in any context where untrusted text is displayed in a web page, where it prevents XSS vulnerabilities.