XML Mixed Content and JSON Representation
Understand XML mixed content where elements contain both text and child elements. Learn the challenges of representing mixed content in JSON and common approaches.
Detailed Explanation
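Mixed content is an XML feature in which an element contains text and child elements interleaved. It is common in document-oriented markup (XHTML, DocBook, HTML) but has no natural JSON representation, because a JSON object is an unordered collection of named members and cannot hold bare text between them.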
XML with mixed content:
<paragraph>
This is <bold>important</bold> text with a
<link href="https://example.com">hyperlink</link> inside it.
</paragraph>
Here, the <paragraph> element contains, in document order:
- Text node: "This is "
- Element: <bold>important</bold>
- Text node: " text with a\n "
- Element: <link>
- Text node: " inside it."
JSON representation challenges:
There is no standard way to represent this in JSON. Common approaches:
Approach 1: Concatenate all text (lossy)
{
"paragraph": "This is important text with a hyperlink inside it."
}
This loses the element structure entirely.
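As a rough illustration of Approach 1, the sketch below flattens the paragraph with Python's standard xml.etree.ElementTree module. The function name flatten_text and the whitespace normalization are assumptions made for this example, not part of any converter's API.

```python
import xml.etree.ElementTree as ET

XML = (
    "<paragraph>\n"
    "This is <bold>important</bold> text with a\n"
    '<link href="https://example.com">hyperlink</link> inside it.\n'
    "</paragraph>"
)

def flatten_text(xml_text):
    """Concatenate every text fragment under the root and drop all markup (lossy)."""
    root = ET.fromstring(xml_text)
    # itertext() yields the text of the element and its descendants in document order.
    raw = "".join(root.itertext())
    # Collapse line breaks and repeated spaces into single spaces (an assumption
    # of this sketch; skip this step if whitespace is significant).
    return {root.tag: " ".join(raw.split())}

flattened = flatten_text(XML)
```

The result matches the JSON above: the <bold> and <link> elements and the href attribute are gone, so the original XML cannot be recovered from it.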
Approach 2: Use a special #text key
{
"paragraph": {
"#text": ["This is ", " text with a\n ", " inside it."],
"bold": "important",
"link": {
"@href": "https://example.com",
"#text": "hyperlink"
}
}
}
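This keeps the elements, attributes, and text fragments but loses their interleaving order: nothing in the object records which text fragment precedes <bold> or follows <link>, so the original paragraph cannot be rebuilt reliably.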
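A minimal sketch of how a converter might produce the Approach 2 shape, using Python's standard xml.etree.ElementTree. The function name to_text_key_dict is hypothetical, the "@" and "#text" conventions follow the example above, and the input is written on one line so the text fragments contain no line breaks.

```python
import xml.etree.ElementTree as ET

XML = (
    "<paragraph>This is <bold>important</bold> text with a "
    '<link href="https://example.com">hyperlink</link> inside it.</paragraph>'
)

def to_text_key_dict(elem):
    """Convert an element using '@attr' keys and a '#text' key (interleaving is lost)."""
    result = {"@" + name: value for name, value in elem.attrib.items()}
    texts = [elem.text] if elem.text else []
    for child in elem:
        value = to_text_key_dict(child)
        if child.tag in result:
            # A repeated tag is collected into a list under the same key.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
        # Text that follows the child is stored apart from the child itself,
        # which is exactly where the ordering information disappears.
        if child.tail:
            texts.append(child.tail)
    if not result:
        # No attributes and no children: collapse to a plain string.
        return "".join(texts)
    if texts:
        result["#text"] = texts[0] if len(texts) == 1 else texts
    return result

root = ET.fromstring(XML)
converted = {root.tag: to_text_key_dict(root)}
```

The "#text" list and the "bold" and "link" keys end up as siblings in one object, so a consumer has no way to tell where each element sat relative to the text.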
Approach 3: Ordered token array (JsonML-like)
["paragraph",
"This is ",
["bold", "important"],
" text with a\n ",
["link", {"href": "https://example.com"}, "hyperlink"],
" inside it."
]
This preserves both structure and order but produces JSON that is harder to query, because children are addressed by position in an array rather than by name.
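The ordered representation can be sketched in a few lines with Python's standard xml.etree.ElementTree, which stores the text before the first child in .text and the text after each child in that child's .tail. The names to_jsonml and find_all are illustrative only, and the input is written on one line so the text fragments contain no line breaks.

```python
import xml.etree.ElementTree as ET

XML = (
    "<paragraph>This is <bold>important</bold> text with a "
    '<link href="https://example.com">hyperlink</link> inside it.</paragraph>'
)

def to_jsonml(elem):
    """Return [tag, {attributes}?, child-or-text, ...] with children in document order."""
    node = [elem.tag]
    if elem.attrib:
        node.append(dict(elem.attrib))
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(to_jsonml(child))
        # The tail is the text that sits between this child and the next sibling.
        if child.tail:
            node.append(child.tail)
    return node

def find_all(node, tag):
    """Collect every nested array whose first item is `tag` (a positional search)."""
    found = [node] if node[0] == tag else []
    for item in node[1:]:
        if isinstance(item, list):
            found.extend(find_all(item, tag))
    return found

tokens = to_jsonml(ET.fromstring(XML))
links = find_all(tokens, "link")
```

The find_all helper hints at the querying cost: locating the link means walking the array and checking the first item of every nested array, whereas the object form allows a direct key lookup.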
Practical guidance:
- If you are converting data-oriented XML (configurations, API responses), mixed content is rare and you can use the simple key-value approach, with element names as keys as in Approach 2.
- If you are converting document-oriented XML (HTML, publishing formats), mixed content is the norm and you need an ordered representation.
- If full fidelity is needed, consider keeping mixed-content sections as raw XML strings in JSON, for example "paragraph": "This is <bold>important</bold> text". The consumer then has to parse that string as XML itself.
Most JSON-to-XML converters do not produce mixed content by default, since JSON has no natural way to express it.
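One way to combine the guidance above is to convert data-oriented elements to plain keys and fall back to a raw XML string only where mixed content is detected. The sketch below assumes Python's standard xml.etree.ElementTree. The helper names has_mixed_content, inner_xml, and convert, and the sample <article> document, are hypothetical, and the converter deliberately ignores attributes and repeated tags to stay short.

```python
import xml.etree.ElementTree as ET

DOC = (
    "<article><title>Mixed content</title>"
    "<paragraph>This is <bold>important</bold> text.</paragraph></article>"
)

def has_mixed_content(elem):
    """True if the element has child elements and non-whitespace text around them."""
    if len(elem) == 0:
        return False
    fragments = [elem.text] + [child.tail for child in elem]
    return any(fragment and fragment.strip() for fragment in fragments)

def inner_xml(elem):
    """Serialize the element's content (leading text plus children) as an XML string."""
    # ET.tostring() on a child includes the child's tail text, so order is kept.
    return (elem.text or "") + "".join(
        ET.tostring(child, encoding="unicode") for child in elem
    )

def convert(elem):
    """Simplified converter: keeps mixed content as raw XML, everything else as keys."""
    if has_mixed_content(elem):
        return inner_xml(elem)
    if len(elem) == 0:
        return elem.text or ""
    # Assumption: child tags are unique and attributes can be dropped.
    return {child.tag: convert(child) for child in elem}

root = ET.fromstring(DOC)
result = {root.tag: convert(root)}
```

Here the <title> becomes an ordinary string value, while the <paragraph> value keeps its inline <bold> markup verbatim for a later XML-aware consumer.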
Use Case
Parsing rich-text XHTML content from a content management system into a JSON structure that a React component can render, preserving the inline formatting tags within paragraph text.