Infer Schema from HTML/XHTML Fragments

Generate JSON Schema from well-formed HTML/XHTML fragments with attributes, nested elements, and mixed text content.

Detailed Explanation

HTML/XHTML to JSON Schema

Well-formed XHTML can be processed as XML, making it possible to infer a JSON Schema from HTML structures. This is useful when you need to validate the structure of HTML templates or components.

Example XHTML

<div class="card" id="card-1">
  <header>
    <h2 class="title">Product Name</h2>
    <span class="badge">New</span>
  </header>
  <div class="body">
    <p>Product description goes here.</p>
    <ul class="features">
      <li>Feature one</li>
      <li>Feature two</li>
      <li>Feature three</li>
    </ul>
  </div>
  <footer>
    <button type="submit" disabled="disabled">Buy Now</button>
    <span class="price">$29.99</span>
  </footer>
</div>

Schema Highlights

  • Attributes as properties: @class, @id, @type, @disabled are all captured
  • Mixed content: Elements that carry text alongside attributes or child elements (such as <h2 class="title">) get a #text property
  • Nested structure: The div > header > h2 hierarchy is fully represented
  • Arrays: Multiple <li> elements become an array
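The conventions listed above can be sketched in a few lines of Python. This is an illustrative sketch, not the converter's actual implementation: the function name infer_schema is hypothetical, every value is typed as a string, and a repeated tag takes its item schema from the first occurrence only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def infer_schema(elem):
    """Infer a JSON Schema fragment for one element (assumed conventions)."""
    text = (elem.text or "").strip()
    children = list(elem)

    # A leaf with no attributes is represented as a plain string.
    if not children and not elem.attrib:
        return {"type": "string"}

    props = {}
    # Attributes become "@name" properties.
    for name in elem.attrib:
        props["@" + name] = {"type": "string"}
    # Text next to attributes or children goes under "#text".
    if text:
        props["#text"] = {"type": "string"}

    counts = Counter(child.tag for child in children)
    for child in children:
        if child.tag in props:
            continue  # repeated tag, already recorded as an array
        child_schema = infer_schema(child)
        if counts[child.tag] > 1:
            # Simplification: items are described by the first occurrence.
            props[child.tag] = {"type": "array", "items": child_schema}
        else:
            props[child.tag] = child_schema
    return {"type": "object", "properties": props}


fragment = '<ul class="features"><li>Feature one</li><li>Feature two</li></ul>'
schema = infer_schema(ET.fromstring(fragment))
print(schema["properties"]["li"])
# {'type': 'array', 'items': {'type': 'string'}}
```

A fuller version would merge the schemas of all repeated siblings and could emit "required" lists; both are left out here to keep the sketch short.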

Practical Considerations

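HTML often contains mixed content, where text and elements are interleaved; the converter represents that text with the #text property. HTML attributes such as class, id, and disabled are treated like any other XML attribute. Boolean attributes must be written in full (disabled="disabled", as in the example), because minimized attributes such as a bare disabled are not well-formed XML.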
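To make the #text handling concrete, here is a minimal sketch of how a fragment with interleaved text might be turned into JSON-like data. The helper to_dict is hypothetical, and joining the separate text runs into one #text string is an assumption; the actual converter may keep them apart or order them differently.

```python
import xml.etree.ElementTree as ET


def to_dict(elem):
    """Convert an element to JSON-like data using "@attr" and "#text" keys."""
    node = {"@" + key: value for key, value in elem.attrib.items()}

    # Collect the element's leading text plus the text that follows each child.
    pieces = [elem.text or ""] + [child.tail or "" for child in elem]
    text = " ".join(p.strip() for p in pieces if p.strip())

    for child in elem:
        value = to_dict(child)
        if child.tag in node:
            # A repeated tag is promoted to a list on its second occurrence.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value

    if not node:
        return text  # text-only leaf without attributes collapses to a string
    if text:
        node["#text"] = text
    return node


fragment = '<p class="note">Read the <a href="/docs">docs</a> first.</p>'
print(to_dict(ET.fromstring(fragment)))
```

Note that this flattening loses the position of the <a> element within the sentence, which is acceptable for structural validation but not for round-tripping the markup.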

Limitations

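  • The input must be well-formed XML (XHTML). HTML5 with unclosed void elements (such as <br> or <img>) will fail to parse; write them as <br/> and <img/>
  • Named entities such as &nbsp; must either be declared or be replaced with numeric character references (for example &#160;)
  • Inline event handlers and script content may contain characters such as < and & that break XML parsing unless they are escaped or wrapped in a CDATA section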
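These limitations can be checked before running the converter. The sketch below uses Python's standard xml.etree.ElementTree parser as a stand-in for whatever XML parser the tool uses (an assumption), and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


cases = {
    "unclosed <br>": "<p>Line one<br>Line two</p>",
    "self-closed <br/>": "<p>Line one<br/>Line two</p>",
    "named entity": "<p>A&nbsp;B</p>",
    "numeric reference": "<p>A&#160;B</p>",
    "raw < in script": "<script>if (a < b) run();</script>",
    "script in CDATA": "<script><![CDATA[if (a < b) run();]]></script>",
}

for label, fragment in cases.items():
    print(f"{label}: {is_well_formed(fragment)}")
```

Each failing case has a passing counterpart directly after it, which shows the fix for that limitation.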

When to Use This

This approach works best for validating the structure of server-rendered HTML templates, email templates (which are often XHTML-based), or component markup in frameworks that output well-formed XML.

Use Case

Use this when building validation for HTML templates, email templates, or XHTML-based component libraries. The generated schema can verify that templates contain the expected structure and attributes.
