Bash Text Processing - grep, sed, awk, sort, cut

Master text processing in bash with grep for searching, sed for substitution, awk for column extraction, sort for ordering, and cut for field selection.

Detailed Explanation

Text Processing in Bash

Bash excels at text processing thanks to powerful utilities like grep, sed, awk, sort, and cut. These tools can be combined with pipes to build sophisticated data processing pipelines.

grep - Pattern Searching

grep searches files for lines matching a pattern. It supports basic and extended regular expressions:

grep "error" /var/log/syslog         # simple string search
grep -ri "todo" src/                  # recursive, case-insensitive
grep -n "function" app.js            # show line numbers
grep -v "^#" config.ini              # exclude comment lines
grep -c "import" *.ts                # count matches per file
grep -E "error|warn|fatal" app.log   # extended regex (OR)
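Beyond matching whole lines, grep can print only the matched text with -o, which is handy for extracting values. A small sketch using inline sample data (the log lines here are made up for illustration):

```shell
# Extract just the trailing status codes from hypothetical log lines.
# -o prints only the matched portion; -E enables extended regex.
printf 'GET /home 200\nPOST /login 500\nGET /about 404\n' |
  grep -oE '[0-9]{3}$'
```

Each match is printed on its own line, which makes -o a natural feeder for sort and uniq further down a pipeline.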

sed - Stream Editing

sed performs text transformations on a stream. It is most commonly used for search-and-replace operations:

sed 's/old/new/g' file.txt              # replace all occurrences
sed -i 's/localhost/prod.server/g' cfg  # in-place edit (GNU sed; BSD/macOS needs -i '')
sed '/^$/d' file.txt                    # delete empty lines
sed -n '10,20p' file.txt               # print lines 10-20
sed 's/^/  /' file.txt                  # indent every line
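sed can also rearrange text with capture groups and backreferences. A minimal sketch with made-up names, using -E for extended-regex grouping (supported by both GNU and BSD sed):

```shell
# Swap "last,first" into "first last": \1 and \2 refer to the
# two parenthesized capture groups in the search pattern.
printf 'Doe,Jane\nSmith,John\n' |
  sed -E 's/([^,]+),([^,]+)/\2 \1/'
```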

awk - Column Processing

awk is a pattern-scanning language that excels at processing columnar data:

awk '{print $1, $3}' data.txt           # print columns 1 and 3
awk -F',' '{print $2}' users.csv        # CSV field extraction
awk '{sum += $3} END {print sum}' data  # sum a column
awk '$3 > 1000' transactions.txt        # filter by column value
awk 'NR==1 || $2 > 50' data.txt        # header + matching rows
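awk also supports associative arrays, which turn per-group aggregation into a one-liner. A sketch over inline sample data (the category/amount pairs are illustrative):

```shell
# Total the second column per category in the first column.
# sum[] is an associative array keyed by category; the END block
# prints the totals, and sort gives deterministic output order.
printf 'food 12\nrent 800\nfood 5\n' |
  awk '{sum[$1] += $2} END {for (k in sum) print k, sum[k]}' |
  sort
```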

sort - Ordering Lines

sort arranges lines in a specified order:

sort file.txt              # alphabetical sort
sort -rn numbers.txt       # reverse numeric sort
sort -t',' -k2 data.csv   # sort by column 2 (CSV)
sort -u wordlist.txt       # sort and deduplicate
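Keys can be combined for multi-level sorting: each -k takes a start,end field range with optional per-key flags. A sketch on inline data (team/score pairs are made up):

```shell
# Sort by team name (field 1), then numerically descending
# by score (field 2) within each team.
printf 'eng 90\nops 70\neng 120\n' |
  sort -k1,1 -k2,2nr
```

Restricting each key with an end field (`-k1,1` rather than `-k1`) is important; an open-ended key compares everything from that field to the end of the line.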

cut - Field Extraction

cut extracts specific fields or character ranges from each line:

cut -d',' -f1,3 data.csv          # fields 1 and 3 from CSV
cut -d':' -f1 /etc/passwd         # first field (usernames)
cut -c1-10 file.txt               # first 10 characters
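One subtlety worth knowing: cut always emits fields in their original order, no matter how the -f list is written, so reordering columns requires awk instead:

```shell
# -f3,1 behaves exactly like -f1,3: cut cannot reorder fields.
printf 'a,b,c\n' | cut -d',' -f3,1
```

To actually print field 3 before field 1, use something like `awk -F',' '{print $3, $1}'`.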

Combining Tools

The real power comes from combining these tools with pipes:

grep "POST" access.log | awk '{print $7}' | sort | uniq -c | sort -rn | head -10

This pipeline extracts the request paths from POST entries in an access log, counts how often each path occurs, and shows the 10 most frequent ones.
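The same sort | uniq -c | sort -rn idiom can be tried without a real log file by piping in sample lines. The method/path/status format below is invented for illustration:

```shell
# Count requests per HTTP method: extract field 1, sort so duplicates
# are adjacent, count them with uniq -c, then rank by count.
printf 'POST /login 200\nGET /home 200\nPOST /pay 500\nGET /about 404\nPOST /login 200\nPUT /item 201\n' |
  awk '{print $1}' | sort | uniq -c | sort -rn
```

The intermediate `sort` matters: uniq -c only collapses adjacent duplicate lines.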

Use Case

Text processing commands are indispensable for log analysis, data extraction, configuration management, and report generation. A developer might use grep to find all TODO comments in a codebase, sed to update configuration values across multiple files, or awk to extract metrics from CSV exports. These commands form the backbone of shell scripting for data manipulation.
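As a concrete sketch of the TODO-hunting use case, the snippet below fabricates a tiny source tree in a temp directory purely for illustration; in practice you would point grep at your real project instead:

```shell
# Rank files by TODO count: grep -c prints "file:count" for each file,
# then sort orders by the count field, highest first.
dir=$(mktemp -d)                                        # stand-in for src/
printf '// TODO: refactor\n// TODO: add tests\n' > "$dir/api.js"
printf '// TODO: document\n' > "$dir/util.js"
grep -c "TODO" "$dir"/*.js | sort -t':' -k2,2 -rn
```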
