Approximate String Matching

Submitted on August 18, 2021 – 11:15 am

On occasion I find myself searching for something in log files or in my Bash history, but I can’t quite remember what it is that I am looking for. Come to think of it, this happens a lot.

I usually get some of the keywords wrong and those I do get right may be out of order. In other words, grep just doesn’t cut it. There are a couple of excellent command-line utilities available for just this sort of task – fzf and agrep – that specialize in approximate string matching.

The fzf utility, short for “fuzzy finder”, is geared more toward interactive use. The most common way sysadmins use it is by running history | fzf, which lets you search your shell history very flexibly for commands long forgotten.
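
As a small sketch of that interactive use: Bash's history builtin prefixes every entry with a number, which can get in the way of a fuzzy query. The helper below is a hypothetical addition (the name strip_history_numbers is made up for this example) that drops those prefixes before the list reaches fzf. The --tac and --no-sort options shown in the comment are one reasonable choice for listing the most recent commands first, not the only one.

```shell
# strip_history_numbers: hypothetical helper, not part of fzf or Bash.
# Removes the leading entry number that `history` prints, e.g.
# "  501  ls -la" becomes "ls -la".
strip_history_numbers() {
    sed -E 's/^ *[0-9]+ +//'
}

# Interactive use, from an interactive Bash session:
#   history | strip_history_numbers | fzf --tac --no-sort
```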

But fzf also offers some non-interactive functionality that can come in handy in scripts. In the example below, we will download Cervantes’ “Don Quixote” and then try to find the oft-quoted line about battle wounds and honor. I can’t quite remember how it went, so this is a good test for fzf.

First, we will download the book to a temp file and do some basic text formatting: converting DOS line endings to Unix format and joining some of the wrapped lines to reduce the number of line breaks.

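f="$(mktemp)"
# Download the book (-s: silent, -0: HTTP/1.0, -k: skip certificate checks)
curl -s0 -k http://www.gutenberg.org/files/996/996.txt > "${f}"
# Strip DOS carriage returns, join each line that ends in a letter or digit
# with the line after it, and write the result back to the same file.
# sponge comes from the moreutils package.
tr -d '\r' < "${f}" | \
sed -r '/[[:alnum:]]$/N;s/\n/ /' | \
sponge "${f}"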
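
To see what the formatting step does without downloading the whole book, here is a minimal sketch on a made-up sample. The function name join_wrapped and the sample text are assumptions for illustration. tr -d '\r' stands in for the carriage-return stripping, and the sed expression is the same one used in the script.

```shell
# join_wrapped: hypothetical helper, for illustration only.
# Reads text on stdin, drops DOS carriage returns, then joins any line
# that ends in a letter or digit with the line that follows it.
join_wrapped() {
    tr -d '\r' | sed -r '/[[:alnum:]]$/N;s/\n/ /'
}

# Made-up three-line sample with DOS line endings
printf 'Wounds received in\r\nbattle confer honour.\r\nThe end.\r\n' | join_wrapped

# Wounds received in battle confer honour.
# The end.
```

Note that the sed expression joins at most two lines at a time: once a pair has been merged, processing moves on to the next input line. That is why only some of the line breaks disappear.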

And now we can try to find the relevant line:

fzf --filter 'battle wounds honor' < "${f}" | \
grep -i 'wounds' | head -1 | grep --color -Ei "wounds|$"

# To which Don Quixote replied, "Wounds received in battle confer honour 
# instead of taking it away; and so, friend Panza, say no more, but, as I

This is awesome. And in case the first hit found by fzf is not what you were looking for, just change head -1 to, say, head -10 and look through the matching lines.
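
If you expect to repeat this kind of search in scripts, the pipeline can be wrapped in a small function. This is only a sketch: the name fuzzy_lines and its argument order are hypothetical, and it leaves out the extra grep steps used above, so it returns whatever fzf ranks highest.

```shell
# fuzzy_lines: hypothetical wrapper around fzf's non-interactive mode.
#   $1 = file to search
#   $2 = fuzzy query (space-separated terms)
#   $3 = maximum number of lines to print (defaults to 1)
fuzzy_lines() {
    local file="$1" query="$2" count="${3:-1}"
    fzf --filter "${query}" < "${file}" | head -n "${count}"
}

# Example, assuming ${f} still points at the downloaded book:
#   fuzzy_lines "${f}" 'battle wounds honor' 10
```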

Now, agrep is quite a bit less flexible (not to mention slower), but it can also work:

agrep -i -k -E 5 'battle wounds honour' "${f}"

# To which Don Quixote replied, "Wounds received in battle confer honour 
# instead of taking it away; and so, friend Panza, say no more, but, as I

Note that I cheated a little by using the British English spelling, “honour”, which is how the word appears in the text. Unlike fzf, agrep is not good with spelling discrepancies.
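
One way to picture why a fuzzy query such as “honor” can still hit “honour” is to think of it as a subsequence test: the letters of the query only need to appear in the same order, with any number of other characters in between. The Bash function below is a simplified illustration of that idea, not fzf's actual matching code (fzf also scores and ranks its candidates). The name is_subsequence is made up for this example.

```shell
# is_subsequence: simplified illustration only, NOT fzf's real algorithm.
# Succeeds if every character of $1 appears in $2 in the same order,
# ignoring case, with any number of other characters in between.
is_subsequence() {
    local term text i=0 j
    term="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    text="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
    for (( j = 0; j < ${#text} && i < ${#term}; j++ )); do
        if [[ "${text:j:1}" == "${term:i:1}" ]]; then
            i=$(( i + 1 ))
        fi
    done
    [[ "${i}" -eq "${#term}" ]]
}

is_subsequence 'honor' 'confer honour' && echo 'honor: match'
is_subsequence 'honour' 'confer honor' || echo 'honour: no match'

# honor: match
# honour: no match
```

The test is one-directional: the shorter spelling is found inside the longer one, but not the other way around.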