Infer Schema from HTML/XHTML Fragments
Generate JSON Schema from well-formed HTML/XHTML fragments with attributes, nested elements, and mixed text content.
Detailed Explanation
HTML/XHTML to JSON Schema
Well-formed XHTML can be processed as XML, making it possible to infer a JSON Schema from HTML structures. This is useful when you need to validate the structure of HTML templates or components.
Example XHTML
<div class="card" id="card-1">
  <header>
    <h2 class="title">Product Name</h2>
    <span class="badge">New</span>
  </header>
  <div class="body">
    <p>Product description goes here.</p>
    <ul class="features">
      <li>Feature one</li>
      <li>Feature two</li>
      <li>Feature three</li>
    </ul>
  </div>
  <footer>
    <button type="submit" disabled="disabled">Buy Now</button>
    <span class="price">$29.99</span>
  </footer>
</div>
Schema Highlights
- Attributes as properties: @class, @id, @type, and @disabled are all captured
- Mixed content: Elements with both text and children get #text properties
- Nested structure: The div > header > h2 hierarchy is fully represented
- Arrays: Multiple <li> elements become an array
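The conventions listed above can be illustrated with a short Python sketch built on the standard library's xml.etree.ElementTree. This is not the converter's actual implementation: the function name infer_schema is hypothetical, and the choices to treat attribute-free leaf elements as plain strings and to use the first repeated element as the template for array items are simplifying assumptions.

```python
import json
import xml.etree.ElementTree as ET

def infer_schema(element):
    """Infer a JSON-Schema-like dict for one element (hypothetical sketch)."""
    children = list(element)
    # Text directly inside this element: its own .text plus each child's .tail.
    fragments = [element.text or ""] + [child.tail or "" for child in children]
    has_text = any(fragment.strip() for fragment in fragments)

    # Assumption: a leaf without attributes is represented as a plain string.
    if not element.attrib and not children:
        return {"type": "string"}

    properties = {}
    # Attributes become "@name" string properties.
    for name in element.attrib:
        properties["@" + name] = {"type": "string"}
    # Text that sits next to attributes or children goes under "#text".
    if has_text:
        properties["#text"] = {"type": "string"}

    # Group children by tag so that repeated tags can become arrays.
    groups = {}
    for child in children:
        groups.setdefault(child.tag, []).append(child)
    for tag, items in groups.items():
        if len(items) > 1:
            # Simplification: the first item stands in for every item.
            properties[tag] = {"type": "array", "items": infer_schema(items[0])}
        else:
            properties[tag] = infer_schema(items[0])
    return {"type": "object", "properties": properties}

sample = """<ul class="features">
  <li>Feature one</li>
  <li>Feature two</li>
</ul>"""
print(json.dumps(infer_schema(ET.fromstring(sample)), indent=2))
```

Run against the card example, a sketch like this would yield @class and @id on the root, an array of strings for li, and @type, @disabled, and #text on the button. A real converter may differ in details such as required fields or type detection.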
Practical Considerations
HTML often contains mixed content, where text and elements are interleaved within the same parent. The converter handles this with the #text property. HTML attributes such as class, id, and disabled are treated the same as any other XML attribute.
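To make mixed content concrete, the following Python sketch shows where interleaved text lives once XHTML is parsed with xml.etree.ElementTree: text before the first child is in the parent's .text, and text after each child is in that child's .tail. The helper name mixed_text and the decision to join the fragments with single spaces are assumptions for illustration; the converter may combine them differently.

```python
import xml.etree.ElementTree as ET

def mixed_text(element):
    """Return the element's own text fragments, skipping text inside children."""
    parts = [element.text or ""]
    for child in element:
        # Text that follows a child element is stored on the child as .tail.
        parts.append(child.tail or "")
    return [part.strip() for part in parts if part.strip()]

p = ET.fromstring(
    "<p>Ships in <strong>two</strong> days, <em>free</em> of charge.</p>"
)

# Assumption: fragments are joined with a space to form the "#text" value.
node = {"#text": " ".join(mixed_text(p))}
for child in p:
    node[child.tag] = child.text
print(node)
```

Joining the fragments loses their position relative to the child elements, which is usually acceptable for structural validation but not for reconstructing the original markup.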
Limitations
- The XML must be well-formed (XHTML). Regular HTML5 with unclosed tags (like <br> or <img>) will fail to parse
- Named entities like &nbsp; need to be either defined or replaced with numeric entities
- Inline event handlers and script content may contain characters that break XML parsing
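These limitations can be checked before conversion. The Python sketch below uses only the standard library; the helper names is_well_formed and numeric_entities are hypothetical, and the entity replacement relies on html.entities.name2codepoint, which covers the HTML 4 entity names. It is a pre-processing idea under those assumptions, not something the converter is known to do itself.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities that XML defines without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def is_well_formed(markup):
    """Return True if the fragment parses as XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

def numeric_entities(markup):
    """Replace named HTML entities with numeric references."""
    def replace(match):
        name = match.group(1)
        # Leave XML's built-in entities and unknown names untouched.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return "&#{};".format(name2codepoint[name])
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)

print(is_well_formed("<p>Line one<br>Line two</p>"))     # unclosed <br>
print(is_well_formed("<p>Line one<br />Line two</p>"))   # self-closed
print(is_well_formed("<p>10&nbsp;kg</p>"))               # undefined entity
print(is_well_formed(numeric_entities("<p>10&nbsp;kg</p>")))
print(is_well_formed("<script>if (a < b) {}</script>"))  # raw "<" in script
print(is_well_formed("<script><![CDATA[if (a < b) {}]]></script>"))
```

Wrapping script bodies in a CDATA section, as in the last line, is one common way to keep characters such as < and & from breaking the parse.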
When to Use This
This approach works best for validating the structure of server-rendered HTML templates, email templates (which are often XHTML-based), or component markup in frameworks that output well-formed XML.
Use Case
Use this technique when building validation for HTML templates, email templates, or XHTML-based component libraries. The generated schema can verify that templates contain the expected structure and attributes.