Detect and Remove Soft Hyphens (SHY)
Find soft hyphen characters (U+00AD) hidden in text from word processors and web pages. Remove them to prevent unexpected hyphenation and string issues.
Detailed Explanation
Soft Hyphens: The Conditional Break
A soft hyphen (SHY, U+00AD) is an invisible character that marks a position where a word may be hyphenated if it needs to break across lines. Unlike a regular hyphen (-), a soft hyphen is only visible when the word actually wraps at that point; otherwise, it is completely invisible.
Where Soft Hyphens Come From
- Word processors: Microsoft Word, LibreOffice, and Pages insert soft hyphens when you use their hyphenation features.
- Web content: The HTML entity &shy; renders as U+00AD in the DOM. When you copy text from a web page, these become actual SHY characters.
- PDF extraction: Text extracted from professionally typeset PDFs often contains soft hyphens at word-break points.
- CMS output: Content management systems may auto-hyphenate long words with soft hyphens.
How the Visualizer Shows SHY
Each soft hyphen appears as a purple [SHY] marker in the visualization. This makes them easy to spot in the output.
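The marker idea can be approximated in a few lines of JavaScript. This is only a sketch: `markSoftHyphens` is a hypothetical helper name, and it is not the Whitespace Visualizer's actual implementation (which also handles color and other characters).

```javascript
// Hypothetical helper, not the Whitespace Visualizer's real code.
// Replaces each soft hyphen (U+00AD) with a visible "[SHY]" placeholder.
function markSoftHyphens(text) {
  return text.replace(/\u00AD/g, "[SHY]");
}

console.log(markSoftHyphens("inter\u00ADnational")); // inter[SHY]national
```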
Problems Caused by Soft Hyphens
# Search doesn't match:
"inter\u00ADnational" !== "international"
# URL is broken:
https://example.com/inter\u00ADnational-guide
# Variable name is invalid:
const inter\u00ADval = 1000; // syntax error in most languages
Soft hyphens are particularly insidious because they only become visible when text wraps, so the same text might look different at different window widths.
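Because the character cannot be relied on to show up on screen, it may help to locate it programmatically. The sketch below assumes the text is already available as a JavaScript string; `findSoftHyphens` and the sample string are illustrative names, not part of any tool described here.

```javascript
// Illustrative helper: returns the index of every soft hyphen (U+00AD).
function findSoftHyphens(text) {
  const positions = [];
  for (let i = 0; i < text.length; i++) {
    if (text.charCodeAt(i) === 0x00ad) {
      positions.push(i);
    }
  }
  return positions;
}

const sample = "inter\u00ADnational co\u00ADoperation";
console.log(findSoftHyphens(sample)); // [ 5, 17 ]
console.log(sample.length);           // 27, although only 25 characters are visible
```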
Detection and Removal
- Paste your text into the Whitespace Visualizer.
- Enable the SHY toggle to see [SHY] markers where soft hyphens exist.
- Check the statistics panel for the count.
- In the Clean section, enable SHY and click Clean to remove all soft hyphens.
- Verify the output — the words should now be continuous without any hidden break points.
Soft hyphens are almost always safe to remove unless you specifically need them for typographic layout in HTML, where the CSS hyphens property is the modern alternative.
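If you would rather clean text in code than in the tool, the same removal can be sketched in JavaScript. `stripSoftHyphens` is a hypothetical name, and the sketch removes only U+00AD, leaving regular hyphens and other invisible characters untouched.

```javascript
// Hypothetical helper: removes every soft hyphen (U+00AD) from a string.
function stripSoftHyphens(text) {
  return text.replace(/\u00AD/g, "");
}

const cleaned = stripSoftHyphens("inter\u00ADnational");
console.log(cleaned === "international"); // true
console.log(cleaned.length);              // 13
```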
Use Case
A content editor notices that search results are inconsistent — some articles containing the word 'international' don't appear when searched. The Whitespace Visualizer reveals soft hyphens at 'inter[SHY]national' in the article body, copied from a Word document. Removing the soft hyphens fixes the search indexing.
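For a scenario like this one, one possible safeguard is to normalize both the article text and the query before comparing them. The sketch below is an assumption about how such a check might look, not a description of any particular search engine; `normalizeForSearch` and `articleMatches` are hypothetical names.

```javascript
// Hypothetical normalization step: drop soft hyphens and ignore case.
function normalizeForSearch(text) {
  return text.replace(/\u00AD/g, "").toLowerCase();
}

// Returns true if the query occurs in the body once both are normalized.
function articleMatches(body, query) {
  return normalizeForSearch(body).includes(normalizeForSearch(query));
}

const body = "An Inter\u00ADnational guide to shipping";
console.log(body.includes("international"));        // false
console.log(articleMatches(body, "international")); // true
```

Normalizing at index time, rather than editing the stored article, keeps the original text available if the soft hyphens are still wanted for layout.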