CSV Delimiter Options: Tab, Semicolon, Pipe
Handle CSV files with non-comma delimiters including tab-separated (TSV), semicolon-separated, and pipe-delimited formats. Includes auto-detection tips.
Detailed Explanation
Working with Non-Comma Delimiters
Despite the name "Comma-Separated Values," CSV files often use other delimiters. Different regions, tools, and data sources prefer different separators.
Common delimiters
| Delimiter | Name | Common source |
|---|---|---|
| `,` | Comma | Default in US/UK locales, most programming tools |
| `;` | Semicolon | European locales (where comma is the decimal separator) |
| `\t` | Tab (TSV) | Database exports, Unix tools, clipboard from spreadsheets |
| `\|` | Pipe | |
Why semicolons in Europe?
In countries like Germany, France, and Brazil, the comma is used as a decimal separator (3,14 instead of 3.14). To avoid ambiguity, CSV files from these locales use semicolons:
Produkt;Preis;Menge
Widget;9,99;100
Gadget;24,50;75
If you parse this with a comma delimiter, the header line becomes a single field, and rows that contain decimal commas are split in the wrong place (`Widget;9` and `99;100`). Always check the locale context of your data source.
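A minimal Python sketch using the standard library `csv` module, which takes the separator through its `delimiter` parameter. It first shows the mis-split described above, then parses the sample correctly. The function name `parse_european_csv` is hypothetical, and turning `9,99` into a float by swapping the comma for a dot assumes the values contain no thousands separators (such as `1.234,50`).

```python
import csv
import io

# The semicolon-delimited sample from above, held in memory for the example.
SAMPLE = "Produkt;Preis;Menge\nWidget;9,99;100\nGadget;24,50;75\n"

def parse_european_csv(text):
    """Parse semicolon-delimited text and convert decimal-comma prices to floats.

    Assumes numbers have no thousands separators (e.g. no "1.234,50").
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for row in reader:
        rows.append({
            "Produkt": row["Produkt"],
            # Replace the decimal comma with a dot before converting.
            "Preis": float(row["Preis"].replace(",", ".")),
            "Menge": int(row["Menge"]),
        })
    return rows

# Parsing with the wrong delimiter reproduces the failure described above.
wrong = list(csv.reader(io.StringIO(SAMPLE), delimiter=","))
print(wrong[0])  # ['Produkt;Preis;Menge']
print(wrong[1])  # ['Widget;9', '99;100']

print(parse_european_csv(SAMPLE))
```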
Tab-separated values (TSV)
TSV files use tab characters as delimiters. They are popular because tabs rarely appear in data values, eliminating most quoting issues:
name\tage\tcity
Alice\t30\tNew York
Bob\t25\tSan Francisco
When you copy cells from a spreadsheet and paste them into a web form, the pasted text is typically tab-separated. If your tool detects tab characters in the input, it should automatically switch to TSV mode.
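A small Python sketch of the "switch to TSV when tabs are present" rule, again using the standard `csv` module. The function name `parse_pasted_table` is hypothetical, and treating any tab as a signal for TSV is a simple heuristic rather than a standard; input without tabs falls back to comma parsing.

```python
import csv
import io

def parse_pasted_table(text):
    """Parse pasted text, switching to tab-separated mode if any tab is present.

    The "any tab means TSV" rule is a heuristic; otherwise commas are assumed.
    """
    delimiter = "\t" if "\t" in text else ","
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Text as it might arrive from a spreadsheet copy-paste (cells joined by tabs).
pasted = "name\tage\tcity\nAlice\t30\tNew York\nBob\t25\tSan Francisco\n"
for row in parse_pasted_table(pasted):
    print(row)
```

Because values such as `New York` contain no tabs, no quoting is needed, which is the property that makes TSV convenient for clipboard data.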
Auto-detection strategy
A practical delimiter detection algorithm:
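- Read the first 5 lines of the file.
- For each candidate delimiter (`,`, `;`, `\t`, `|`), count how many fields each line produces.
- The delimiter that gives the most consistent field count across all lines, with more than one field per line, is likely the correct one. (A delimiter that never appears yields one field per line, which is perfectly consistent but wrong.)
- Break ties by preferring comma > tab > semicolon > pipe.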
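The steps above can be sketched in Python as follows. This is one possible reading of "most consistent": it scores each candidate by how many distinct field counts appear across the sampled lines (fewer is better), skips candidates that never split a line, and uses list order for the tie-break. The names `detect_delimiter` and `CANDIDATES`, the blank-line filtering, and the comma fallback are assumptions of this sketch. It samples lines naively, so a quoted field containing a line break can skew the counts.

```python
import csv

# Order doubles as the tie-break preference: comma > tab > semicolon > pipe.
CANDIDATES = [",", "\t", ";", "|"]

def detect_delimiter(text, sample_lines=5):
    """Guess the delimiter by field-count consistency over the first few lines."""
    # Skip blank lines, which would otherwise count as zero-field rows.
    lines = [ln for ln in text.splitlines() if ln.strip()][:sample_lines]
    best_delim, best_key = ",", None  # fall back to comma if nothing splits
    for rank, delim in enumerate(CANDIDATES):
        # Field count per sampled line; csv.reader respects quoted fields.
        counts = [len(row) for row in csv.reader(lines, delimiter=delim)]
        if not counts or max(counts) < 2:
            continue  # a delimiter that never splits a line is not the separator
        # Fewer distinct counts means more consistent; rank breaks ties.
        key = (len(set(counts)), rank)
        if best_key is None or key < best_key:
            best_delim, best_key = delim, key
    return best_delim

print(repr(detect_delimiter("Produkt;Preis;Menge\nWidget;9,99;100\nGadget;24,50;75\n")))
```

On the European sample, the comma splits only the data rows (counts 1, 2, 2) while the semicolon gives 3 fields on every line, so the semicolon wins. Python's standard library also offers `csv.Sniffer().sniff(sample, delimiters=",;\t|")`, which applies its own heuristic and may behave differently on edge cases.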
Converting between delimiters
Changing a file's delimiter is a common preprocessing step. Parse with the source delimiter, then serialize with the target delimiter, quoting any values that contain the new delimiter, a quote character, or a line break.
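A minimal Python sketch of parse-then-serialize using `csv.reader` and `csv.writer`. The writer's default `QUOTE_MINIMAL` mode quotes only the fields that need it. The function name `convert_delimiter` is hypothetical, and `lineterminator="\n"` is a choice made for this example (`csv.writer` defaults to `"\r\n"`).

```python
import csv
import io

def convert_delimiter(text, source, target):
    """Re-serialize delimited text with a different delimiter.

    csv.writer (QUOTE_MINIMAL by default) quotes any field containing the
    target delimiter, the quote character, or a line-terminator character.
    """
    reader = csv.reader(io.StringIO(text), delimiter=source)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=target, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()

european = "Produkt;Preis;Menge\nWidget;9,99;100\nGadget;24,50;75\n"
print(convert_delimiter(european, ";", ","), end="")
# Produkt,Preis,Menge
# Widget,"9,99",100
# Gadget,"24,50",75
```

Note that the decimal commas survive as quoted text (`"9,99"`). If the downstream system expects `9.99`, number normalization is a separate step from changing the delimiter.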
Use Case
Ingesting data from a European ERP system that exports semicolon-delimited CSV files and converting them to standard comma-delimited format for a US-based analytics pipeline.