Infer Schema from XML Sitemap Files

Generate JSON Schema from XML sitemap files with URL entries, change frequencies, priorities, and sitemap index structures.

Detailed Explanation

XML Sitemap to JSON Schema

Search engines use XML sitemaps to discover and index a site's pages. The format is well defined: each <url> entry must contain a <loc>, while <lastmod>, <changefreq>, and <priority> are optional, so real files vary in which fields they include.

Example Sitemap

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>daily</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/about</loc>
    <lastmod>2024-01-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://example.com/blog</loc>
    <changefreq>weekly</changefreq>
    <priority>0.9</priority>
  </url>
</urlset>

Generated Schema

{
  "type": "object",
  "properties": {
    "urlset": {
      "type": "object",
      "properties": {
        "url": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "loc": { "type": "string" },
              "lastmod": { "type": "string" },
              "changefreq": { "type": "string" },
              "priority": { "type": "number" }
            }
          }
        }
      }
    }
  }
}

Schema Insights

  • URL array: Multiple <url> elements correctly become an array
  • Priority as number: Values like 1.0, 0.8, 0.9 are detected as numbers
  • Optional fields: lastmod is missing from the third URL above, so required-field detection leaves it out of the required list
  • Dates as strings: lastmod values remain strings because JSON has no native date type; JSON Schema can only annotate them with a format keyword
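
The insights above can be illustrated with a minimal Python sketch of this kind of inference. It is not the tool's actual implementation: the function names (local_name, infer_text_type, infer_schema, schema_from_xml) are hypothetical, and treating any text that parses as a float as a number is a simplifying assumption.

```python
import json
import xml.etree.ElementTree as ET

SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://example.com/blog</loc>
    <priority>0.9</priority>
  </url>
</urlset>
"""

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]

def infer_text_type(text):
    """Assumption: text that parses as a float is a number, anything else a string."""
    try:
        float(text)
        return "number"
    except ValueError:
        return "string"

def infer_schema(elements):
    """Infer one schema from all elements that share the same tag."""
    if all(len(el) == 0 for el in elements):  # leaf elements carry only text
        types = {infer_text_type((el.text or "").strip()) for el in elements}
        return {"type": "number" if types == {"number"} else "string"}

    children = {}    # child name -> every child element with that name
    present_in = {}  # child name -> number of parents that contain it
    repeated = set() # child names seen more than once under a single parent
    for el in elements:
        counts = {}
        for child in el:
            name = local_name(child.tag)
            children.setdefault(name, []).append(child)
            counts[name] = counts.get(name, 0) + 1
        for name, n in counts.items():
            present_in[name] = present_in.get(name, 0) + 1
            if n > 1:
                repeated.add(name)

    properties = {}
    for name, group in children.items():
        item = infer_schema(group)
        properties[name] = {"type": "array", "items": item} if name in repeated else item

    schema = {"type": "object", "properties": properties}
    # A field is required only if every sampled parent contains it.
    required = [name for name in children if present_in[name] == len(elements)]
    if required:
        schema["required"] = required
    return schema

def schema_from_xml(xml_text):
    root = ET.fromstring(xml_text)
    return {"type": "object", "properties": {local_name(root.tag): infer_schema([root])}}

print(json.dumps(schema_from_xml(SITEMAP), indent=2))
```

In this sketch, url becomes an array because it repeats under urlset, priority is typed as a number, and lastmod is left out of required because the second entry omits it.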

Sitemap Index

The same approach works for sitemap index files:

<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
</sitemapindex>
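
As a rough illustration, the index above could be read into the JSON-like structure that such a schema would describe. The sketch below uses Python's standard xml.etree.ElementTree; the helper name index_to_dict is hypothetical, and always wrapping <sitemap> entries in a list (even when there is only one) is a design assumption rather than something the tool is known to do.

```python
import xml.etree.ElementTree as ET

# Namespace used by both sitemap and sitemap index files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

INDEX = """\
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
</sitemapindex>
"""

def index_to_dict(xml_text):
    """Convert a sitemap index into nested dicts and lists."""
    root = ET.fromstring(xml_text)
    entries = []
    for sm in root.findall("sm:sitemap", NS):
        entry = {"loc": sm.findtext("sm:loc", namespaces=NS)}
        lastmod = sm.findtext("sm:lastmod", namespaces=NS)
        if lastmod is not None:  # lastmod is optional, so only keep it when present
            entry["lastmod"] = lastmod
        entries.append(entry)
    # Always a list, so one entry and many entries share the same shape.
    return {"sitemapindex": {"sitemap": entries}}

print(index_to_dict(INDEX))
```

Keeping sitemap as a list matters because a sample with a single <sitemap> element gives an inference step no evidence that the element can repeat.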

Enhancing the Schema

For production use, you would add:

  • format: "uri" to loc
  • enum: ["always", "hourly", "daily", "weekly", "monthly", "yearly", "never"] to changefreq
  • minimum: 0.0, maximum: 1.0 to priority
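  • format: "date" to lastmod (or "date-time" if your sitemaps use full W3C Datetime timestamps, which the sitemap protocol also allows)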
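
A sketch of what the enhanced schema for a single URL entry might look like, written as a Python dict, follows. The check_entry helper is a hypothetical, hand-rolled stand-in that covers only the keywords used here (required, type, enum, minimum, maximum); it is not a full JSON Schema validator, and in practice a dedicated library would be the more likely choice. Marking loc as required is an assumption taken from the sitemap protocol, and format is treated as an annotation that this sketch does not enforce.

```python
URL_ENTRY_SCHEMA = {
    "type": "object",
    "properties": {
        "loc": {"type": "string", "format": "uri"},
        # "date" matches the YYYY-MM-DD values used in the examples.
        "lastmod": {"type": "string", "format": "date"},
        "changefreq": {
            "type": "string",
            "enum": ["always", "hourly", "daily", "weekly", "monthly", "yearly", "never"],
        },
        "priority": {"type": "number", "minimum": 0.0, "maximum": 1.0},
    },
    # Assumption: the sitemap protocol requires <loc> in every <url> entry.
    "required": ["loc"],
}

def check_entry(entry, schema=URL_ENTRY_SCHEMA):
    """Return a list of problems found in one URL entry (empty list means OK)."""
    errors = []
    for name in schema.get("required", []):
        if name not in entry:
            errors.append(f"missing required field: {name}")
    for name, value in entry.items():
        rules = schema["properties"].get(name)
        if rules is None:
            continue  # fields outside the schema are ignored in this sketch
        if rules["type"] == "number":
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}: expected a number")
                continue
        elif not isinstance(value, str):
            errors.append(f"{name}: expected a string")
            continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{name}: {value!r} is not an allowed value")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{name}: {value} is below {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{name}: {value} is above {rules['maximum']}")
    return errors

print(check_entry({"loc": "https://example.com/", "changefreq": "daily", "priority": 1.0}))
print(check_entry({"changefreq": "sometimes", "priority": 1.5}))
```

The second call reports three problems: the missing loc, the unknown changefreq value, and the out-of-range priority.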

Use Case

This schema is useful when building sitemap generators, validators, or SEO tools that need to verify sitemap structure. It can validate generated sitemaps before they are submitted to search engines, confirming that they conform to the expected format.
