Base64 Data in JSON
How and why Base64 is used to embed binary data in JSON. Covers encoding strategies, schema conventions, performance impact, and real-world API patterns.
Detailed Explanation
JSON only supports text strings, numbers, booleans, null, arrays, and objects. It has no native binary data type. When you need to include binary data (images, files, encrypted payloads) in a JSON document, Base64 encoding is the standard solution.
Basic pattern:
{
  "fileName": "report.pdf",
  "mimeType": "application/pdf",
  "content": "JVBERi0xLjQKJcOkw7zDtsO..."
}
A common convention is to include the MIME type alongside the Base64 content so the consumer knows how to interpret the decoded bytes.
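A minimal sketch of producing and consuming this pattern with Python's standard library (`base64` and `json`). The helper names `to_json_payload` and `from_json_payload` are hypothetical. The sample bytes are only the opening bytes of a PDF-style header, used for illustration.

```python
import base64
import json


def to_json_payload(file_name: str, mime_type: str, data: bytes) -> str:
    """Hypothetical helper: wrap raw bytes in the fileName/mimeType/content pattern."""
    return json.dumps({
        "fileName": file_name,
        "mimeType": mime_type,
        # b64encode returns bytes; decode to str so json.dumps can serialize it.
        "content": base64.b64encode(data).decode("ascii"),
    })


def from_json_payload(payload: str) -> tuple[str, str, bytes]:
    """Hypothetical helper: parse the document and recover the original bytes."""
    doc = json.loads(payload)
    return doc["fileName"], doc["mimeType"], base64.b64decode(doc["content"])


# Illustrative sample: the first bytes of a PDF-style file header.
sample = b"%PDF-1.4\n%\xc3\xa4\xc3\xbc\xc3\xb6"
payload = to_json_payload("report.pdf", "application/pdf", sample)
name, mime, data = from_json_payload(payload)
print(name, mime, data == sample)  # report.pdf application/pdf True
```

Keeping mimeType outside the encoded content lets a consumer route or reject the payload before paying the cost of decoding it.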
Real-world API examples:
Kubernetes Secrets expect each value in the data field to be Base64-encoded:
{
  "apiVersion": "v1",
  "kind": "Secret",
  "data": {
    "username": "YWRtaW4=",
    "password": "cDRzc3cwcmQ="
  }
}
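A short Python sketch of how a data map like this can be produced from plain strings and read back. The function names are hypothetical. The decode step needs no key, which is the point: Base64 is an encoding, not encryption, so anyone who can read the Secret can recover the values. (Kubernetes also accepts unencoded values in a separate stringData field.)

```python
import base64
import json


def encode_secret_data(values: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: Base64-encode each UTF-8 value for the "data" field."""
    return {k: base64.b64encode(v.encode("utf-8")).decode("ascii") for k, v in values.items()}


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: reverse the encoding; no key or password is involved."""
    return {k: base64.b64decode(v).decode("utf-8") for k, v in data.items()}


secret = {
    "apiVersion": "v1",
    "kind": "Secret",
    "data": encode_secret_data({"username": "admin", "password": "p4ssw0rd"}),
}
print(json.dumps(secret["data"]))          # {"username": "YWRtaW4=", "password": "cDRzc3cwcmQ="}
print(decode_secret_data(secret["data"]))  # {'username': 'admin', 'password': 'p4ssw0rd'}
```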
The GitHub API returns file contents as Base64, with an encoding field naming the scheme:
{
  "name": "README.md",
  "encoding": "base64",
  "content": "IyBNeSBQcm9qZWN0Cg..."
}
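A consumer should check the encoding field before decoding. This Python sketch assumes the response has already been parsed into a dict with the fields shown above; `decode_contents` is a hypothetical name, and the complete value "IyBNeSBQcm9qZWN0Cg==" is a short stand-in chosen for illustration. It also strips whitespace defensively, in case the Base64 text arrives line-wrapped.

```python
import base64


def decode_contents(doc: dict) -> bytes:
    """Hypothetical helper: decode a contents response shaped like the example above."""
    encoding = doc.get("encoding")
    if encoding != "base64":
        raise ValueError(f"unexpected encoding: {encoding!r}")
    # Drop any whitespace (such as line breaks), then decode strictly.
    compact = "".join(doc["content"].split())
    return base64.b64decode(compact, validate=True)


response = {"name": "README.md", "encoding": "base64", "content": "IyBNeSBQcm9qZWN0Cg=="}
print(decode_contents(response).decode("utf-8"))  # prints "# My Project"
```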
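Schema conventions: OpenAPI/Swagger (2.0 and 3.0) declares Base64-encoded binary data with format: "byte" and raw binary (typically file uploads) with format: "binary". JSON Schema has no "byte" format; from draft 7 onward it expresses the same idea with contentEncoding: "base64", an approach OpenAPI 3.1 also adopts. An OpenAPI 3.0 property declaration looks like this: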
{
  "type": "string",
  "format": "byte",
  "description": "Base64-encoded file content"
}
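A validator for format: "byte" values might accept only the standard, padded Base64 alphabet. The Python sketch below makes that strict assumption (real validators may be more lenient, for example about missing padding); `is_base64_byte_string` is a hypothetical name, not part of any schema library.

```python
import base64


def is_base64_byte_string(value: str) -> bool:
    """Accept only standard, padded Base64 (a strict assumption for illustration)."""
    try:
        # validate=True rejects characters outside A-Z, a-z, 0-9, +, / and =,
        # and incorrect padding also raises. binascii.Error subclasses ValueError,
        # and non-ASCII input raises ValueError, so one except clause covers both.
        base64.b64decode(value, validate=True)
    except ValueError:
        return False
    return True


for candidate in ["YWRtaW4=", "YWRtaW4", "YWRt aW4=", "not base64!"]:
    print(candidate, is_base64_byte_string(candidate))
```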
Performance considerations:
- Base64 adds approximately 33% size overhead, because every 3 input bytes become 4 output characters. A 1MB file becomes roughly 1.33MB of JSON string data.
- Most JSON parsers hold the entire Base64 string in memory, and decoding it allocates a second buffer for the raw bytes. For large payloads this can cause memory pressure; consider streaming JSON parsers or multipart uploads for files larger than a few megabytes.
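- JSON strings must escape only double quotes ("), backslashes (\), and control characters. The standard Base64 alphabet (A-Z, a-z, 0-9, +, /, and = for padding) contains none of these, so embedding Base64 normally adds no escaping overhead. The exception is serializers that optionally escape / as \/ (PHP's json_encode does this by default), which adds one byte per slash.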
Alternatives to Base64 in JSON:
- Multipart form data for file uploads (avoids the 33% overhead).
- Separate endpoints for binary data, with the JSON containing only a URL reference.
- MessagePack or CBOR instead of JSON, which support native binary types.
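The 33% overhead figure quoted above follows from Base64 emitting 4 output characters for every 3 input bytes, rounded up to a whole 4-character group. A Python sketch that checks this formula against a 1 MiB random payload and confirms that json.dumps adds only the two surrounding quotes; `base64_length` is a hypothetical helper name.

```python
import base64
import json
import os


def base64_length(n_bytes: int) -> int:
    """Length of padded Base64 output: 4 characters per started 3-byte group."""
    return 4 * ((n_bytes + 2) // 3)


payload = os.urandom(1024 * 1024)                 # 1 MiB of random bytes
encoded = base64.b64encode(payload).decode("ascii")

print(len(encoded), base64_length(len(payload)))  # 1398104 1398104
print(round(len(encoded) / len(payload), 3))      # 1.333

# Python's json.dumps escapes nothing in the Base64 alphabet, so only the quotes are added.
print(len(json.dumps(encoded)) - len(encoded))    # 2
```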
Common mistake: Double-encoding. If you Base64-encode data and then JSON-stringify the result, the Base64 string is properly included. But if you accidentally Base64-encode the entire JSON object, or Base64-encode an already-encoded string, you get nested encoding that consumers must decode multiple times.
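The difference between the correct flow and the double-encoding mistake, sketched in Python with a short sample string:

```python
import base64
import json

raw = b"hello"
once = base64.b64encode(raw).decode("ascii")                    # "aGVsbG8="
twice = base64.b64encode(once.encode("ascii")).decode("ascii")  # encoded a second time

good = json.dumps({"content": once})  # correct: encode once, then embed
bad = json.dumps({"content": twice})  # mistake: an encoded string encoded again

print(base64.b64decode(json.loads(good)["content"]))  # b'hello'
print(base64.b64decode(json.loads(bad)["content"]))   # b'aGVsbG8=' (still Base64)
```

Guessing on the consumer side whether a value was double-encoded is unreliable, because many ordinary strings (such as "abcd") are also valid Base64, so the fix belongs with the producer.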
Use Case
Transmitting user-uploaded avatar images through a REST API that only accepts JSON payloads, where the image is Base64-encoded in the request body.
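A server-side sketch in Python for this scenario. It assumes a request body that follows the mimeType/content pattern, PNG-only avatars, and a hypothetical 512 KiB limit; all names and the limit are illustrative. The decoded size is computed from the string length first, so an oversized upload is rejected before a second buffer is allocated for the decoded bytes.

```python
import base64

MAX_AVATAR_BYTES = 512 * 1024         # hypothetical limit
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decoded_size(b64: str) -> int:
    """Decoded length of padded Base64, computed without decoding."""
    padding = len(b64) - len(b64.rstrip("="))
    return len(b64) // 4 * 3 - padding


def parse_avatar(body: dict) -> bytes:
    """Hypothetical handler logic: validate and decode an avatar upload."""
    if body.get("mimeType") != "image/png":
        raise ValueError("only image/png avatars are accepted")
    content = body["content"]
    if decoded_size(content) > MAX_AVATAR_BYTES:
        raise ValueError("avatar too large")
    image = base64.b64decode(content, validate=True)  # raises on malformed Base64
    if not image.startswith(PNG_SIGNATURE):
        raise ValueError("content does not look like a PNG")
    return image


# Usage with a fake PNG: the signature followed by placeholder bytes.
fake_png = PNG_SIGNATURE + b"\x00" * 16
body = {"mimeType": "image/png", "content": base64.b64encode(fake_png).decode("ascii")}
print(len(parse_avatar(body)))  # 24
```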