XML Character Escaping and CDATA Sections


Learn XML character escaping rules including the five predefined entities, numeric character references, CDATA sections, and the differences between XML attribute and element content escaping.


Detailed Explanation

XML Character Escaping

XML requires certain characters to be escaped so that they are not interpreted as markup. The rules are similar to HTML but stricter: XML parsers do not recover from errors and reject malformed documents entirely.

The Five Predefined Entities

XML defines exactly five named entities:

&amp;   → & (ampersand)
&lt;    → < (less-than)
&gt;    → > (greater-than)
&quot;  → " (double quote)
&apos;  → ' (apostrophe / single quote)

Unlike HTML, XML does not define named entities such as &copy; or &mdash;. To write those characters, use a numeric reference, define a custom entity in a DTD, or include the character directly if the document encoding supports it.
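As a rough illustration, the five predefined entities can be applied with Python's standard library. This is a minimal sketch, not the only way to do it: the helper names escape_xml and unescape_xml are hypothetical, and the quote entities are passed in explicitly because xml.sax.saxutils.escape only handles &, < and > by default.

```python
from xml.sax.saxutils import escape, unescape

# escape() covers &, < and > on its own; the two quote entities are
# supplied here so that all five predefined entities are handled.
QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_xml(text: str) -> str:
    """Replace the five XML special characters with predefined entities."""
    return escape(text, QUOTES)


def unescape_xml(text: str) -> str:
    """Reverse escape_xml for the five predefined entities."""
    return unescape(text, {entity: char for char, entity in QUOTES.items()})


raw = 'Tom & Jerry say "a < b" isn\'t > c'
encoded = escape_xml(raw)
print(encoded)
# Tom &amp; Jerry say &quot;a &lt; b&quot; isn&apos;t &gt; c
print(unescape_xml(encoded) == raw)  # True
```

The ampersand is replaced first when escaping and last when unescaping, which keeps text such as a literal &quot; from being decoded twice.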

Numeric Character References

XML supports decimal and hexadecimal references for any Unicode code point:

&#169;     → © (copyright, decimal)
&#xA9;     → © (copyright, hex)
&#x1F600;  → 😀 (emoji, hex)
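The mapping between a character and its numeric reference is just the Unicode code point written in decimal or hexadecimal. The sketch below assumes a hypothetical helper, to_numeric_refs, that rewrites non-ASCII characters as references; it does not escape markup characters such as & or <, which would still need the predefined entities.

```python
import xml.etree.ElementTree as ET


def to_numeric_refs(text: str, hex_form: bool = True) -> str:
    """Replace every non-ASCII character with a numeric character reference."""
    out = []
    for ch in text:
        cp = ord(ch)  # Unicode code point of the character
        if cp < 128:
            out.append(ch)
        elif hex_form:
            out.append(f"&#x{cp:X};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


print(to_numeric_refs("© 2024 😀"))          # &#xA9; 2024 &#x1F600;
print(to_numeric_refs("©", hex_form=False))  # &#169;

# A parser resolves the decimal and hex forms to the same character.
elem = ET.fromstring("<p>&#169; &#xA9; &#x1F600;</p>")
print(elem.text)  # © © 😀
```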

Element Content vs. Attribute Values

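In element content, you must escape & and <. A > only has to be escaped when it follows ]] (the sequence ]]> is reserved for closing CDATA sections), although escaping it everywhere is harmless: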

<message>Tom &amp; Jerry</message>
<code>if (a &lt; b)</code>

In attribute values, you must escape & and < as well, plus whichever quote character delimits the value:

<item name="3&quot; bolt"/>
<item name='3&apos; pole'/>
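One way to apply the attribute rule in code is sketched below. The helper attr_double_quoted is a hypothetical name; quoteattr is a real function in Python's xml.sax.saxutils that picks the quote style itself and returns the value already wrapped in quotes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def attr_double_quoted(value: str) -> str:
    """Return value as a double-quoted XML attribute literal."""
    # &, < and > are escaped by default; the delimiter " is added explicitly.
    return '"' + escape(value, {'"': "&quot;"}) + '"'


value = '3" bolt & <nut>'

tag = "<item name=" + attr_double_quoted(value) + "/>"
print(tag)  # <item name="3&quot; bolt &amp; &lt;nut&gt;"/>

# quoteattr() chooses a delimiter that needs the least escaping.
auto_tag = "<item name=" + quoteattr(value) + "/>"

# Both forms parse back to the original attribute value.
print(ET.fromstring(tag).get("name") == value)       # True
print(ET.fromstring(auto_tag).get("name") == value)  # True
```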

CDATA Sections

CDATA sections allow unescaped text (except for the closing sequence ]]>):

<script><![CDATA[
  if (a < b && c > d) {
    // No escaping needed inside CDATA
  }
]]></script>

CDATA is particularly useful for embedding code, SQL, or other content with many special characters. The only sequence you cannot include is ]]>. If the text contains it, split the text across two CDATA sections so that ]] ends the first section and > begins the next.
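The splitting trick can be sketched in a few lines of Python. The function name to_cdata is hypothetical; the replacement closes the current section after ]] and reopens a new one before >, so the forbidden sequence never appears inside a single section.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting the section wherever ]]> occurs."""
    # "]]>" becomes "]]" + "]]>" (close) + "<![CDATA[" (reopen) + ">"
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


source = "if (a[b[0]]> c && d < e) {}"
wrapped = to_cdata(source)
print(wrapped)
# <![CDATA[if (a[b[0]]]]><![CDATA[> c && d < e) {}]]>

# The parser joins adjacent CDATA sections back into the original text.
elem = ET.fromstring("<script>" + wrapped + "</script>")
print(elem.text == source)  # True
```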

Processing Instructions and Comments

Inside processing instructions (<? ?>) and comments (<!-- -->), entity references are not processed, so text such as &amp; is kept exactly as written. In comments, the sequence -- is forbidden by the XML specification.
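Both behaviours can be observed with Python's xml.dom.minidom, which keeps comment and processing-instruction nodes in the tree. This is a small sketch; is_well_formed is a hypothetical helper and the target name "render" is made up for the example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

doc = minidom.parseString("<r><!-- &amp; stays as typed --><?render a &amp; b?></r>")
comment, pi = doc.documentElement.childNodes

# Entity references are not expanded in either node.
print(repr(comment.data))  # ' &amp; stays as typed '
print(pi.target, repr(pi.data))  # render 'a &amp; b'


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text."""
    try:
        minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False


# A double hyphen inside a comment makes the document malformed.
print(is_well_formed("<r><!-- a - b --></r>"))   # True
print(is_well_formed("<r><!-- a -- b --></r>"))  # False
```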

XML vs. HTML Entity Handling

HTML5 parsers are lenient: they recover from unescaped ampersands and missing semicolons. XML parsers must treat any well-formedness error as fatal and stop processing the document. This strictness means XML escaping must be complete and correct.
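The strictness is easy to see with Python's xml.etree.ElementTree, which raises ParseError on the same input an HTML5 parser would tolerate. The helper try_parse is a hypothetical name used only for this sketch.

```python
import xml.etree.ElementTree as ET


def try_parse(xml_text: str) -> str:
    """Return the root element's text, or a message if parsing fails."""
    try:
        return "ok: " + (ET.fromstring(xml_text).text or "")
    except ET.ParseError as exc:
        return f"rejected: {exc}"


samples = [
    "<m>Tom &amp; Jerry</m>",  # correctly escaped
    "<m>Tom & Jerry</m>",      # bare ampersand
    "<m>Tom &amp Jerry</m>",   # missing semicolon
    "<m>&copy; 2024</m>",      # HTML-only named entity, undefined in XML
]

for sample in samples:
    print(sample, "->", try_parse(sample))
```

Only the first sample parses; the other three are rejected outright rather than repaired.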

Use Case

XML escaping is required when generating SOAP messages, building RSS/Atom feeds, creating SVG graphics, constructing XSLT stylesheets, working with Office XML formats (DOCX, XLSX), generating Android resource files, building Maven POM files, and any system that exchanges data in XML format.
