Infer Schema from XML Sitemap Files
Generate JSON Schema from XML sitemap files with URL entries, change frequencies, priorities, and sitemap index structures.
Real-World XML
Detailed Explanation
XML Sitemap to JSON Schema
XML sitemaps are used by search engines to discover and index website pages. They follow a well-defined structure, though which optional fields appear can vary from entry to entry.
Example Sitemap
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-01-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/blog</loc>
    <changefreq>weekly</changefreq>
    <priority>0.9</priority>
  </url>
</urlset>
Generated Schema
{
  "type": "object",
  "properties": {
    "urlset": {
      "type": "object",
      "properties": {
        "url": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "loc": { "type": "string" },
              "lastmod": { "type": "string" },
              "changefreq": { "type": "string" },
              "priority": { "type": "number" }
            }
          }
        }
      }
    }
  }
}
Schema Insights
- URL array: Multiple <url> elements correctly become an array
- Priority as number: Values like 1.0, 0.8, 0.9 are detected as numbers
- Optional fields: If lastmod is missing from some entries (like the third URL above), required-field detection will not list it as required
- Dates as strings: lastmod values remain strings since JSON Schema has no native date type
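The insights above can be illustrated with a minimal Python sketch. It is not the tool's actual implementation: the function names (`local_name`, `infer_scalar_type`, `infer_url_schema`) are hypothetical, and the inference rules (a value is a number if `float()` accepts it, a field is required only if every entry has it) are simplifying assumptions.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the example sitemap; the second entry has no <lastmod>.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog</loc>
    <changefreq>weekly</changefreq>
    <priority>0.9</priority>
  </url>
</urlset>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def infer_scalar_type(text):
    """Assumed rule: text that parses as a float is a number, anything else a string."""
    try:
        float(text)
        return "number"
    except ValueError:
        return "string"


def infer_url_schema(xml_text):
    """Infer an array schema for the <url> entries of a sitemap (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    entries = [child for child in root if local_name(child.tag) == "url"]
    properties = {}
    seen_in = {}  # field name -> number of entries containing it
    for entry in entries:
        for field in entry:
            name = local_name(field.tag)
            inferred = infer_scalar_type((field.text or "").strip())
            previous = properties.get(name, {}).get("type", inferred)
            # Fall back to string when entries disagree on the type.
            properties[name] = {"type": inferred if inferred == previous else "string"}
            seen_in[name] = seen_in.get(name, 0) + 1
    # A field is required only if every entry contains it.
    required = sorted(n for n, count in seen_in.items() if count == len(entries))
    return {
        "type": "array",
        "items": {"type": "object", "properties": properties, "required": required},
    }


schema = infer_url_schema(SITEMAP_XML)
```

In this sketch `lastmod` is inferred as a string and left out of `required`, because the second entry omits it, while `priority` is inferred as a number.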
Sitemap Index
The same approach works for sitemap index files:
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
</sitemapindex>
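As a rough illustration of why the same approach carries over, the Python sketch below inspects a sitemap file and reports its root element, its entry element, and the fields seen inside the entries. The helper name `describe_sitemap` is hypothetical and only standard-library calls are used. Because this index contains a single <sitemap> entry, an inference tool may or may not treat it as an array; supplying two or more entries makes the repetition explicit.

```python
import xml.etree.ElementTree as ET

INDEX_XML = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
</sitemapindex>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def describe_sitemap(xml_text):
    """Return (root name, entry element names, field names) for a sitemap file."""
    root = ET.fromstring(xml_text)
    entry_names = sorted({local_name(child.tag) for child in root})
    field_names = sorted({local_name(f.tag) for child in root for f in child})
    return local_name(root.tag), entry_names, field_names


kind, entries, fields = describe_sitemap(INDEX_XML)
```

A `urlset` file run through the same function would report `url` entries with up to four fields; the index differs only in its element names and its smaller field set.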
Enhancing the Schema
For production use, you would add:
- format: "uri" to loc
- enum: ["always", "hourly", "daily", "weekly", "monthly", "yearly", "never"] to changefreq
- minimum: 0.0, maximum: 1.0 to priority
- format: "date" to lastmod
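A sketch of what the enhanced schema for a single URL entry might look like, written as a Python dict, together with a deliberately small checker. The checker (`check_entry`) is hypothetical and is not a full JSON Schema validator: it covers only the keywords listed above. Marking `loc` as required is an assumption taken from the sitemap protocol, not from the generated schema. Note that `format: "date"` only matches `YYYY-MM-DD`; the sitemap protocol also allows full W3C Datetime timestamps in lastmod, which this sketch would reject.

```python
from datetime import date
from urllib.parse import urlparse

URL_ENTRY_SCHEMA = {
    "type": "object",
    "properties": {
        "loc": {"type": "string", "format": "uri"},
        "lastmod": {"type": "string", "format": "date"},
        "changefreq": {
            "type": "string",
            "enum": ["always", "hourly", "daily", "weekly", "monthly", "yearly", "never"],
        },
        "priority": {"type": "number", "minimum": 0.0, "maximum": 1.0},
    },
    "required": ["loc"],  # assumption: the sitemap protocol requires <loc>
}


def check_entry(entry, schema=URL_ENTRY_SCHEMA):
    """Return a list of problems for one URL entry; an empty list means it passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in entry:
            problems.append(f"missing required field: {name}")
    for name, value in entry.items():
        rules = schema["properties"].get(name)
        if rules is None:
            continue  # unknown fields are ignored in this sketch
        expected = (int, float) if rules["type"] == "number" else str
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"{name}: expected {rules['type']}")
            continue
        if "enum" in rules and value not in rules["enum"]:
            problems.append(f"{name}: {value!r} is not an allowed value")
        if "minimum" in rules and value < rules["minimum"]:
            problems.append(f"{name}: {value} is below {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            problems.append(f"{name}: {value} is above {rules['maximum']}")
        if rules.get("format") == "uri" and not urlparse(value).scheme:
            problems.append(f"{name}: {value!r} has no URI scheme")
        if rules.get("format") == "date":
            try:
                date.fromisoformat(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not an ISO date")
    return problems


good = check_entry({
    "loc": "https://example.com/",
    "lastmod": "2024-01-15",
    "changefreq": "daily",
    "priority": 1.0,
})
bad = check_entry({"changefreq": "sometimes", "priority": 1.5})
```

Here `good` comes back empty, while `bad` reports the missing `loc`, the unknown `changefreq` value, and the out-of-range `priority`. In production a full JSON Schema validator library would normally replace the hand-written checker.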
Use Case
This is useful when building sitemap generators, validators, or SEO tools that need to verify sitemap structure. Once a generated sitemap has been converted to JSON, the schema can validate it before submission to search engines, ensuring it conforms to the expected format.