Bash Text Processing - grep, sed, awk, sort, cut
Master text processing in bash with grep for searching, sed for substitution, awk for column extraction, sort for ordering, and cut for field selection.
Detailed Explanation
Text Processing in Bash
Bash excels at text processing thanks to powerful utilities like grep, sed, awk, sort, and cut. These tools can be combined with pipes to build sophisticated data processing pipelines.
grep - Pattern Searching
grep searches files for lines matching a pattern. It supports basic and extended regular expressions:
grep "error" /var/log/syslog # simple string search
grep -ri "todo" src/ # recursive, case-insensitive
grep -n "function" app.js # show line numbers
grep -v "^#" config.ini # exclude comment lines
grep -c "import" *.ts # count matching lines per file
grep -E "error|warn|fatal" app.log # extended regex (OR)
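As a runnable sketch of the flags above, using throwaway sample data piped in with printf (the log lines are fabricated for the demo):

```shell
# Count lines matching error OR warn with an extended regex
printf 'info: started\nerror: disk full\nwarn: low memory\n' |
  grep -cE "error|warn"    # prints 2
```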
sed - Stream Editing
sed performs text transformations on a stream. It is most commonly used for search-and-replace operations:
sed 's/old/new/g' file.txt # replace all occurrences
sed -i 's/localhost/prod.server/g' cfg # in-place editing (GNU sed; BSD/macOS sed needs -i '')
sed '/^$/d' file.txt # delete empty lines
sed -n '10,20p' file.txt # print lines 10-20
sed 's/^/    /' file.txt # indent every line with four spaces
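Multiple sed commands can run in one pass with `-e`. A minimal sketch on fabricated config data, combining a substitution with blank-line deletion:

```shell
# Replace a hostname and drop empty lines in a single invocation
printf 'host=localhost\n\nport=8080\n' |
  sed -e 's/localhost/example.com/' -e '/^$/d'
# host=example.com
# port=8080
```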
awk - Column Processing
awk is a pattern-scanning language that excels at processing columnar data:
awk '{print $1, $3}' data.txt # print columns 1 and 3
awk -F',' '{print $2}' users.csv # CSV field extraction
awk '{sum += $3} END {print sum}' data # sum a column
awk '$3 > 1000' transactions.txt # filter by column value
awk 'NR==1 || $2 > 50' data.txt # header + matching rows
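The column-sum pattern above can be tried end to end on a few fabricated whitespace-separated records:

```shell
# Accumulate column 2 across all lines, print the total at EOF
printf 'alice 10\nbob 25\ncarol 5\n' |
  awk '{sum += $2} END {print sum}'    # prints 40
```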
sort - Ordering Lines
sort arranges lines in a specified order:
sort file.txt # alphabetical sort
sort -rn numbers.txt # reverse numeric sort
sort -t',' -k2 data.csv # sort by column 2 (CSV)
sort -u wordlist.txt # sort and deduplicate
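Flags combine: `-u` deduplicates while `-n` compares numerically, so a sorted unique list comes out of one command. A quick demo on made-up input:

```shell
# Numeric sort with duplicates removed in a single pass
printf '3\n1\n2\n3\n' | sort -un
# 1
# 2
# 3
```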
cut - Field Extraction
cut extracts specific fields or character ranges from each line:
cut -d',' -f1,3 data.csv # fields 1 and 3 from CSV
cut -d':' -f1 /etc/passwd # first field (usernames)
cut -c1-10 file.txt # first 10 characters
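A runnable sketch of CSV field selection, using fabricated records; note that cut keeps the delimiter between the selected fields:

```shell
# Keep fields 1 and 3 from comma-separated records
printf 'alice,30,nyc\nbob,25,sf\n' | cut -d',' -f1,3
# alice,nyc
# bob,sf
```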
Combining Tools
The real power comes from combining these tools with pipes:
grep "POST" access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -10
This pipeline extracts POST request paths from an access log, counts unique paths, and shows the top 10 most frequent ones.
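The same pipeline can be exercised on a tiny fabricated log (real access logs have more fields, so the path column differs; here it is field 2):

```shell
# Fabricated two-field log lines: method and path
printf 'GET /a\nPOST /login\nPOST /login\nPOST /api\n' > access_demo.log

# Most frequent POST paths, highest count first
grep '^POST' access_demo.log | awk '{print $2}' |
  sort | uniq -c | sort -rn | head -2
# prints "2 /login" then "1 /api" (counts left-padded by uniq -c)

rm access_demo.log
```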
Use Case
Text processing commands are indispensable for log analysis, data extraction, configuration management, and report generation. A developer might use grep to find all TODO comments in a codebase, sed to update configuration values across multiple files, or awk to extract metrics from CSV exports. These commands form the backbone of shell scripting for data manipulation.
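For instance, the TODO search mentioned above might look like the following sketch; the demo_src directory and its contents are fabricated for the example:

```shell
# Fabricated mini-codebase for the demo
mkdir -p demo_src
printf 'x = 1  # TODO: rename\n' > demo_src/a.py

# file:line:text for every TODO, searched recursively
grep -rn "TODO" demo_src

rm -r demo_src
```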
Related Topics
Bash File Operations - ls, cp, mv, rm, find, chmod
Bash Pipes and Redirects - |, >, >>, 2>, <, <<
Bash For Loops - Iterating Over Lists, Ranges, and Files
Bash String Manipulation - Substring, Replace, Trim, Case
Bash Script Basics - Shebang, Arguments, Exit Codes, set Options