XML Entity References — Predefined, Numeric, and Custom Entities

Master XML entity references: the 5 predefined entities, numeric character references, custom entity declarations in DTDs, and how entities are resolved during XML parsing and formatting.

Advanced

Detailed Explanation

XML Entity References

XML entities are placeholders that are replaced with their defined values during parsing. They are essential for including special characters, reusable text fragments, and external content in XML documents.

The 5 Predefined Entities

XML defines exactly five built-in entities that are always available:

Entity    Character    Description
&lt;      <            Less-than sign
&gt;      >            Greater-than sign
&amp;     &            Ampersand
&quot;    "            Double quote
&apos;    '            Apostrophe

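These entities must be used wherever the literal character would be misinterpreted as XML markup. Outside CDATA sections and comments, < and & must always be escaped in text content and attribute values. The > character needs escaping only in the sequence ]]>, and &quot; or &apos; is required only inside an attribute value delimited by the same quote character. For example, < in text content must be written as &lt;.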
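When XML is built programmatically, the escaping is usually delegated to a library rather than done by hand. The following is a minimal sketch using Python's standard xml.sax.saxutils module; the sample strings and the element and attribute names are made up for illustration.

```python
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Hypothetical text content containing characters that clash with markup.
raw_text = 'if a < b && b > c: print("ok")'

# escape() replaces &, < and > so the string is safe as text content.
escaped = escape(raw_text)

# quoteattr() escapes the value and wraps it in quotes suitable for an attribute.
attr = quoteattr('Tom & "Jerry"')

xml_doc = f"<code label={attr}>{escaped}</code>"

# Parsing resolves the references back to the literal characters.
element = ET.fromstring(xml_doc)
print(escaped)
print(element.text == raw_text)
```

Note that escape() leaves quotes alone by default, which is fine for text content; attribute values should go through quoteattr() or a serializer that handles quoting.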

Numeric Character References

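Any Unicode character that XML permits can be included using a numeric character reference (most control characters are excluded in XML 1.0, and &#0; is never valid):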

  • Decimal: &#169; produces © (copyright symbol)
  • Hexadecimal: &#xA9; produces © (same character)
  • Emoji: &#x1F600; produces 😀 (grinning face)

Numeric references are useful for characters not available on the keyboard or not supported by the document's encoding.
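The equivalence of the decimal and hexadecimal forms can be checked with a parser. This sketch uses Python's standard xml.etree.ElementTree and also shows one way to generate numeric references for characters that a target encoding (here ASCII, chosen only as an example) cannot represent.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references resolve to the same character.
decimal = ET.fromstring("<c>&#169;</c>").text
hexadecimal = ET.fromstring("<c>&#xA9;</c>").text

# A code point outside the Basic Multilingual Plane works the same way.
emoji = ET.fromstring("<c>&#x1F600;</c>").text

# Going the other way: the 'xmlcharrefreplace' error handler turns every
# character the target encoding cannot represent into a decimal reference.
ascii_safe = "© 2024 café".encode("ascii", "xmlcharrefreplace").decode("ascii")

print(decimal == hexadecimal)
print(ascii_safe)
```

The xmlcharrefreplace handler only deals with unencodable characters; it does not escape &, < or >, so markup-significant characters still need separate escaping.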

Custom Entities (DTD)

You can define custom entities in a Document Type Definition:

<!DOCTYPE document [
  <!ENTITY company "Acme Corporation">
  <!ENTITY copyright "&#169; 2024 Acme Corporation. All rights reserved.">
]>
<document>
  <header>&company; - Confidential</header>
  <footer>&copyright;</footer>
</document>

Custom entities act like text macros — each &company; reference is expanded to the full text during parsing.
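To see the expansion from the application's side, the document above can be fed to a parser. This is a sketch assuming Python's standard xml.etree.ElementTree, which expands entities declared in the internal DTD subset; other parsers may need explicit configuration to do the same.

```python
import xml.etree.ElementTree as ET

doc = """<!DOCTYPE document [
  <!ENTITY company "Acme Corporation">
  <!ENTITY copyright "&#169; 2024 Acme Corporation. All rights reserved.">
]>
<document>
  <header>&company; - Confidential</header>
  <footer>&copyright;</footer>
</document>"""

root = ET.fromstring(doc)

# The application only ever sees the replacement text, never the reference.
header = root.find("header").text
footer = root.find("footer").text

print(header)
print(footer)
```

Once parsed, the tree carries no trace of &company; or &copyright;, so serializing it again writes the expanded text rather than the original references.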

External Entities

External entities reference content from external files or URLs:

<!ENTITY chapter1 SYSTEM "chapter1.xml">

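Security warning: External entities can be exploited in XXE (XML External Entity) attacks, which can disclose local files or trigger requests to internal services. Many modern parsers disable external entity resolution by default, but not all do (Java's JAXP parsers, for example, resolve them unless configured otherwise), so disable it explicitly when parsing untrusted input.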
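One conservative defense is to refuse any document that carries a DOCTYPE declaration at all, since entities cannot be declared without one. The sketch below does this with Python's standard xml.parsers.expat module; the reject_doctype function and the DTDForbidden exception are hypothetical names introduced for illustration, not part of any library.

```python
from xml.parsers import expat


class DTDForbidden(ValueError):
    """Raised when a document carries a DOCTYPE declaration (hypothetical name)."""


def reject_doctype(xml_text: str) -> None:
    """Parse xml_text and raise DTDForbidden as soon as a DOCTYPE is seen."""
    parser = expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Raising here aborts the parse before any entity is declared or used.
        raise DTDForbidden(f"DOCTYPE not allowed (root element {name!r})")

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)


# A typical XXE probe: the entity points at a local file.
xxe_payload = """<!DOCTYPE data [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<data>&xxe;</data>"""

reject_doctype("<data>safe</data>")  # passes silently
try:
    reject_doctype(xxe_payload)
except DTDForbidden as exc:
    print("rejected:", exc)
```

This trades flexibility for safety: documents that legitimately rely on a DTD are rejected too. Where DTDs are required, the parser's own switches for external entity resolution are the place to look, and the third-party defusedxml package packages up similar checks.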

Parameter Entities

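Used within DTDs only, parameter entities (declared and referenced with %) allow reuse of DTD fragments. In the internal subset, a parameter entity reference may appear only between declarations, not inside one, so declarations like the following belong in an external DTD file: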

<!ENTITY % common-attrs "id ID #IMPLIED class CDATA #IMPLIED">
<!ELEMENT item (#PCDATA)>
<!ATTLIST item %common-attrs;>

Formatting Considerations

When formatting XML, entity references should be preserved as-is. The formatter should not expand entities (replacing &lt; with < would break the document) or introduce new entities where literal characters are valid.
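A formatter that parses the document into a tree and serializes it again will generally not preserve references. The sketch below illustrates this with Python's standard xml.etree.ElementTree; entity_refs is a hypothetical helper that lists references with a simplified regular expression (ASCII names only, and it does not skip CDATA sections or comments).

```python
import re
import xml.etree.ElementTree as ET

# Matches named, decimal, and hexadecimal references (simplified pattern).
ENTITY_REF = re.compile(r"&(?:#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_:][A-Za-z0-9_.:-]*);")


def entity_refs(xml_text: str) -> list[str]:
    """Return every entity or character reference, in document order."""
    return ENTITY_REF.findall(xml_text)


source = "<p>&quot;a&quot; &lt; b &#169;</p>"

# Parsing resolves every reference; serializing re-escapes only what the
# serializer considers necessary, so the original spelling is lost.
round_tripped = ET.tostring(ET.fromstring(source), encoding="unicode")

preserved = entity_refs(source) == entity_refs(round_tripped)

print(round_tripped)
print(preserved)
```

Comparing the reference lists before and after formatting is a cheap regression check for a formatter that is meant to leave references untouched.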

Use Case

Understanding XML entities is crucial for developers handling XML data that contains special characters (internationalized content, mathematical formulas, legal symbols), building XML documents programmatically, securing XML parsers against XXE attacks, and working with DTD-validated XML in publishing and document management systems.
