Convert Data with Unicode and Special Characters
Handle Unicode characters, emoji, accented letters, and other non-ASCII content when converting between TSV and CSV formats.
Detailed Explanation
Unicode and Special Characters
Modern data frequently contains Unicode characters such as accented letters, CJK characters, emoji, and mathematical symbols. The TSV/CSV converter passes this content through unchanged because it operates on JavaScript strings, which can represent every Unicode code point.
Example: International Data (TSV)
Name	City	Notes
Jean-Pierre Dupré	Paris	Café owner ☕
田中太郎	東京	デベロッパー 💻
María García	México City	Estudiante 🌟
Михаил Иванов	Москва	Инженер
Generated CSV Output
Name,City,Notes
Jean-Pierre Dupré,Paris,Café owner ☕
田中太郎,東京,デベロッパー 💻
María García,México City,Estudiante 🌟
Михаил Иванов,Москва,Инженер
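The conversion above can be sketched in a few lines. This is only an illustration, not the converter's actual code: the function name `naiveTsvToCsv` is hypothetical, and the sketch assumes that no field contains a comma, a quote, or a line break, so it does no quoting at all.

```javascript
// Hypothetical helper, not the converter's real implementation.
// Assumes no field needs quoting (no commas, quotes, or line breaks in fields).
function naiveTsvToCsv(tsv) {
  return tsv
    .split(/\r?\n/) // one record per line
    .map((line) => line.split("\t").join(",")) // swap the delimiter
    .join("\n");
}

const sampleTsv = [
  "Name\tCity\tNotes",
  "Jean-Pierre Dupré\tParis\tCafé owner ☕",
  "田中太郎\t東京\tデベロッパー 💻",
].join("\n");

const sampleCsv = naiveTsvToCsv(sampleTsv);
console.log(sampleCsv);
```

The non-ASCII characters need no special treatment here: only the tab characters are replaced, and everything else is copied as-is.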
Character Encoding
The converter processes text as JavaScript strings, which use UTF-16 internally. This means:
- All Unicode code points are supported, including supplementary plane characters (emoji, rare CJK characters)
- No mojibake: Characters are not corrupted during conversion
- BOM handling: If your input starts with a byte order mark (BOM, U+FEFF), it is preserved in the output
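The points above can be checked directly in any JavaScript console. The snippet below is a small sketch using only standard string methods; the helper name `hasBom` is made up for this example. It shows that an emoji occupies two UTF-16 code units but is still a single code point, and that splitting on an ASCII delimiter leaves it intact.

```javascript
const laptop = "💻"; // U+1F4BB, a supplementary plane character

const codeUnits = laptop.length; // counts UTF-16 code units
const codePoints = [...laptop].length; // the string iterator walks code points
const hex = laptop.codePointAt(0).toString(16);

// Splitting on an ASCII delimiter cannot cut a surrogate pair:
// surrogates occupy 0xD800-0xDFFF, while a tab is 0x0009.
const fields = "デベロッパー 💻\t東京".split("\t");

// Hypothetical helper: a BOM, if present, is the first character U+FEFF.
const hasBom = (text) => text.charCodeAt(0) === 0xfeff;

console.log(codeUnits, codePoints, hex, fields, hasBom("\uFEFFName"));
```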
Special Characters That Trigger Quoting
Only the following characters trigger quoting in the output:
- The target delimiter (comma or tab)
- The quote character (double or single quote)
- Newline characters (\n or \r)
Non-ASCII characters such as é, ñ, ü, å, CJK characters, and emoji do not trigger quoting because they have no syntactic role in CSV or TSV.
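A minimal sketch of this quoting rule follows. The function `quoteField` and its parameter names are hypothetical, and the choice to escape an embedded quote by doubling it is an assumption based on the common RFC 4180 convention, not a documented behavior of the converter.

```javascript
// Hypothetical sketch of the quoting rule; not the converter's real code.
function quoteField(field, delimiter = ",", quote = '"') {
  const needsQuoting =
    field.includes(delimiter) ||
    field.includes(quote) ||
    field.includes("\n") ||
    field.includes("\r");
  if (!needsQuoting) return field;
  // Assumption: embedded quote characters are escaped by doubling them.
  return quote + field.split(quote).join(quote + quote) + quote;
}

console.log(quoteField("Café owner ☕")); // left unquoted
console.log(quoteField("Paris, France")); // wrapped in quotes
```

Because the check looks only for the delimiter, the quote character, and line breaks, accented letters and emoji pass through untouched.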
Downloaded File Encoding
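When you use the Download button, the file is saved as UTF-8. Most modern applications (Excel 2016+, Google Sheets, LibreOffice) can read UTF-8 CSV files. If Excel shows garbled characters when opening a file that has no BOM, import the file through Data > From Text/CSV and select UTF-8 as the file origin.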
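For applications that rely on a BOM to recognize UTF-8, one option is to prepend it before saving. The sketch below shows how CSV text could be encoded to UTF-8 bytes with the standard `TextEncoder` API. The helper `toUtf8Bytes` and its `withBom` flag are hypothetical, and nothing here describes how the Download button is actually implemented.

```javascript
// Hypothetical helper: encode CSV text as UTF-8 bytes, optionally with a BOM.
function toUtf8Bytes(text, withBom = false) {
  const encoder = new TextEncoder(); // TextEncoder always emits UTF-8
  // Avoid a double BOM if the text already starts with one.
  const needsPrefix = withBom && !text.startsWith("\uFEFF");
  return encoder.encode(needsPrefix ? "\uFEFF" + text : text);
}

const plainBytes = toUtf8Bytes("é"); // two bytes: 0xC3 0xA9
const bomBytes = toUtf8Bytes("é", true); // BOM bytes 0xEF 0xBB 0xBF, then 0xC3 0xA9

console.log(Array.from(plainBytes), Array.from(bomBytes));
```

In a browser, the resulting bytes could then be wrapped in `new Blob([bytes], { type: "text/csv;charset=utf-8" })` to offer them as a download.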
Use Case
Converting international customer data, multilingual content databases, or any dataset containing non-ASCII characters between TSV and CSV while preserving all Unicode characters correctly.