CDATA Sections in XML and JSON Mapping
Learn how XML CDATA sections preserve unescaped content and how they map to JSON strings. Covers when to use CDATA, escape sequences, and conversion best practices.
Detailed Explanation
CDATA (Character Data) sections allow you to include text in XML that would otherwise need escaping. They are enclosed in <![CDATA[...]]> markers and are commonly used to embed HTML, code snippets, or any text with special XML characters.
XML with CDATA:
<article>
<title>Introduction to HTML</title>
<content><![CDATA[
<p>Use <strong>bold</strong> for emphasis.</p>
<p>Entities like & are written literally here.</p>
]]></content>
<code><![CDATA[
if (x < 10 && y > 5) {
return true;
}
]]></code>
</article>
Converted to JSON:
{
"article": {
"title": "Introduction to HTML",
"content": "\n <p>Use <strong>bold</strong> for emphasis.</p>\n <p>Entities like & are written literally here.</p>\n ",
"code": "\n if (x < 10 && y > 5) {\n return true;\n }\n "
}
}
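The conversion above can be reproduced with a minimal sketch using Python's standard library. The helper name `element_to_value` is hypothetical, and the mapping is deliberately simplified: it ignores attributes, mixed content, and repeated sibling tags, which a real converter would need to handle.

```python
import json
import xml.etree.ElementTree as ET

xml_doc = """<article>
<title>Introduction to HTML</title>
<code><![CDATA[if (x < 10 && y > 5) { return true; }]]></code>
</article>"""


def element_to_value(elem):
    """Leaf elements become strings; elements with children become dicts.

    Simplified on purpose: attributes and repeated tags are not handled.
    """
    children = list(elem)
    if not children:
        # The parser has already stripped the CDATA wrapper; only text remains.
        return elem.text or ""
    return {child.tag: element_to_value(child) for child in children}


root = ET.fromstring(xml_doc)
result = {root.tag: element_to_value(root)}

# json.dumps applies JSON string escaping (newlines, quotes) where needed.
json_text = json.dumps(result, indent=2)
print(json_text)
```

The parser hands back CDATA content as ordinary text, so the JSON side never needs to know whether the source used CDATA or entity escaping.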
Key points about CDATA:
- CDATA preserves literal text. Inside a CDATA section, <, >, and & are not treated as XML markup. This is why you can embed raw HTML or code without escaping every special character.
- In JSON, CDATA becomes a plain string. The CDATA wrapper is stripped, and the content becomes a JSON string value. Newlines become \n, quotes become \", and the result is a standard escaped JSON string.
- The only forbidden sequence inside CDATA is ]]>, which terminates the CDATA section. If your content contains this exact sequence, you must split it across two CDATA sections (for example, ]]]]><![CDATA[>) so that the terminator never appears intact.
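As an illustration of the splitting rule, here is a small sketch of a wrapper that makes arbitrary text safe to place inside CDATA. The function name `wrap_cdata` is an assumption made for this example, not part of any library.

```python
import xml.etree.ElementTree as ET


def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any embedded "]]>" terminator.

    Each "]]>" is rewritten so that "]]" closes one section and ">"
    opens the next; the terminator never appears intact.
    """
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return "<![CDATA[" + safe + "]]>"


original = "a[b[0]]>c"
xml_fragment = "<code>" + wrap_cdata(original) + "</code>"

# Parsing joins the adjacent CDATA sections back into one text value.
round_tripped = ET.fromstring(xml_fragment).text
print(xml_fragment)
print(round_tripped == original)
```

Adjacent CDATA sections are concatenated by the parser, so the consumer sees the original string unchanged.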
When converting JSON to XML:
If a JSON string contains characters that are special in XML (<, >, &), the converter has two choices:
| Strategy | Output for "x < 10" |
|---|---|
| Entity escaping | <value>x &lt; 10</value> |
| CDATA wrapping | <value><![CDATA[x < 10]]></value> |
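Both rows of the table can be produced in a few lines. This sketch uses `xml.sax.saxutils.escape` from the Python standard library for the entity form; the element name `value` simply mirrors the table.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

value = "x < 10"

# Strategy 1: replace &, <, and > with their predefined entities.
entity_form = f"<value>{escape(value)}</value>"

# Strategy 2: leave the text untouched and wrap it in a CDATA section.
# (Safe here only because the text does not contain "]]>".)
cdata_form = f"<value><![CDATA[{value}]]></value>"

# Either form parses back to the same string.
same_text = ET.fromstring(entity_form).text == ET.fromstring(cdata_form).text
print(entity_form)
print(cdata_form)
print(same_text)
```

The two outputs are equivalent to an XML parser; the choice affects only how the serialized document looks and how large it is.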
CDATA is preferred when:
- The string contains significant amounts of HTML or code
- Readability of the XML matters (entity-escaped HTML is nearly unreadable)
- The content has many special characters that would require extensive escaping
Entity escaping is preferred when:
- The string is short and has few special characters
- You want the XML to be more compact
- You cannot rely on the consumer handling CDATA correctly (for example, a simplified, non-conforming parser)
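The preferences above could be turned into a simple heuristic. The sketch below is one possible policy, not a standard: the function name `encode_text` and the `max_escapes` threshold are assumptions chosen for illustration.

```python
from xml.sax.saxutils import escape

SPECIAL_CHARS = "<>&"


def encode_text(text: str, max_escapes: int = 3) -> str:
    """Pick entity escaping for light content and CDATA for heavy content.

    max_escapes is a hypothetical threshold: strings with more special
    characters than this are wrapped in CDATA to keep them readable.
    """
    specials = sum(text.count(ch) for ch in SPECIAL_CHARS)
    if specials <= max_escapes:
        return escape(text)
    # Split any embedded "]]>" so the CDATA section is not closed early.
    safe = text.replace("]]>", "]]]]><![CDATA[>")
    return f"<![CDATA[{safe}]]>"


print(encode_text("x < 10"))
print(encode_text("<p>Use <strong>bold</strong> for emphasis.</p>"))
```

A short comparison stays compact as an escaped string, while the HTML fragment keeps its markup readable inside CDATA.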
Use Case
Storing blog post content that contains HTML markup in an XML-based CMS feed, where the HTML must be preserved verbatim and stay readable instead of being entity-encoded.