Number vs String Type Inference in CSV
Handle numeric type detection when converting CSV to JSON. Learn about auto-detection rules, precision pitfalls, and when to keep numbers as strings.
Detailed Explanation
Type Inference: Numbers vs Strings
CSV is an untyped format -- every value is a string. When converting to JSON, you must decide whether to output "42" (string) or 42 (number). This decision has significant downstream consequences.
Auto-detection rules
A typical type inference engine applies these checks in order:
- Empty string → null or ""
- Boolean → if value is exactly "true" or "false", convert to boolean
- Integer → if value matches /^-?\d+$/ and is within safe range, convert to number
- Float → if value matches /^-?\d+\.\d+$/, convert to number
- Everything else → keep as string
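The checks above can be sketched as a single function. This is a minimal illustration, not a reference implementation: the name inferType and the emptyAs option are assumptions made for the example, and integers outside the safe range are simply left as strings.

```javascript
// Hypothetical helper: infer a JSON value from one raw CSV cell.
// The checks run in the same order as the rules listed above.
function inferType(raw, { emptyAs = null } = {}) {
  if (raw === "") return emptyAs;          // empty string → null (or "" if configured)
  if (raw === "true") return true;         // exact match only
  if (raw === "false") return false;
  if (/^-?\d+$/.test(raw)) {
    const n = Number(raw);
    // Outside the safe range, keep the original string to avoid rounding.
    return Number.isSafeInteger(n) ? n : raw;
  }
  if (/^-?\d+\.\d+$/.test(raw)) return Number(raw);
  return raw;                              // everything else stays a string
}

console.log(inferType("42"));      // 42
console.log(inferType("72.5"));    // 72.5
console.log(inferType("01234"));   // 1234 (the leading zero is lost)
console.log(inferType("1.2.3"));   // stays the string "1.2.3"
```

Note that "01234" still matches the integer pattern, so a rule set like this one needs an extra guard or a per-column override to protect identifiers.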
Example
id,zipCode,temperature,label
1,01234,72.5,active
2,90210,-3.2,inactive
3,00501,98.6,true
Without type inference:
{ "id": "1", "zipCode": "01234", "temperature": "72.5", "label": "active" }
With type inference:
{ "id": 1, "zipCode": 1234, "temperature": 72.5, "label": "active" }
Notice the problem: "01234" becomes 1234, losing the leading zero. This is a data corruption bug.
When NOT to infer numbers
Certain fields look numeric but must remain strings:
- ZIP codes: "01234" (leading zeros are significant)
- Phone numbers: "0035512345678"
- ID fields: "000042" (fixed-width identifiers)
- Credit card numbers: identifiers rather than quantities, and long ones (up to 19 digits) exceed JavaScript's Number.MAX_SAFE_INTEGER
- Version numbers: "1.2.3" is not a float
Safe integer range
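JavaScript numbers (IEEE 754 doubles) can exactly represent every integer only up to 2^53 - 1 (9,007,199,254,740,991), the value of Number.MAX_SAFE_INTEGER. Larger integers, such as 64-bit database IDs or numeric blockchain values, are silently rounded to the nearest representable double: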
Number("9007199254740993") // → 9007199254740992 (wrong!)
For such values, keep them as strings or use BigInt in post-processing.
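As a rough sketch of the BigInt route, the helper below returns a regular number when the value is safe and a BigInt otherwise. The names parseIntegerExact and toJson are assumptions for illustration, and the input is assumed to have already matched the integer pattern. Because JSON.stringify throws a TypeError when it meets a BigInt, the example serializes BigInt values back to strings.

```javascript
// Hypothetical helper: parse an integer string without losing precision.
// Assumes `raw` has already matched /^-?\d+$/.
function parseIntegerExact(raw) {
  const n = Number(raw);
  return Number.isSafeInteger(n) ? n : BigInt(raw);
}

// Hypothetical helper: serialize objects that may contain BigInt values.
// BigInt values are written out as strings so no digits are lost.
function toJson(value) {
  return JSON.stringify(value, (key, v) =>
    typeof v === "bigint" ? v.toString() : v
  );
}

const bigId = parseIntegerExact("9007199254740993");
console.log(typeof bigId);               // bigint
console.log((bigId + 1n).toString());    // 9007199254740994
console.log(toJson({ id: bigId }));      // {"id":"9007199254740993"}
```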
Best practice
Default to strings and let users opt into type inference per column. This prevents silent data corruption and gives consumers explicit control over type handling.
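A minimal sketch of this opt-in approach follows, assuming rows have already been parsed into objects of strings. The function name convertRow and the shape of the columnTypes map are assumptions for illustration, not a specific library's API.

```javascript
// Hypothetical converter: every column stays a string unless listed in `columnTypes`.
function convertRow(row, columnTypes = {}) {
  const out = {};
  for (const [key, raw] of Object.entries(row)) {
    switch (columnTypes[key]) {
      case "number":
        // Convert only plain decimal notation; anything else stays a string.
        out[key] = /^-?\d+(\.\d+)?$/.test(raw) ? Number(raw) : raw;
        break;
      case "boolean":
        out[key] = raw === "true" ? true : raw === "false" ? false : raw;
        break;
      default:
        out[key] = raw; // no opt-in: keep the original string
    }
  }
  return out;
}

const row = { id: "1", zipCode: "01234", temperature: "72.5", label: "active" };
console.log(convertRow(row, { id: "number", temperature: "number" }));
// { id: 1, zipCode: '01234', temperature: 72.5, label: 'active' }
```

Because zipCode is not listed in the map, it keeps its leading zero without any special-case logic.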
Use Case
Converting a bank transaction CSV export to JSON where account numbers and routing numbers must remain as strings despite being purely numeric, while transaction amounts need to be actual numbers for calculation.