Watch the Log

Submitted on June 18, 2013 – 8:06 pm

In the past few days my Postfix server has been having occasional problems talking to the mail gateway. The problem would come and go. The Postfix server would time out trying to connect to the gateway and keep retrying. In the end, the emails would be delivered, but not without some delays. Troubleshooting this sort of fleeting issue sometimes feels like trying to touch a mirage.

I needed something to watch the maillog and, upon detecting the tell-tale error message, launch tcpdump for a few minutes to see what exactly was going on with the network at the time of the problem, as opposed to the time when I decide to wake up. Here’s a fairly simple script that will monitor the maillog for a specific error message and run tcpdump the first time it finds a new match. The script will then exit, but this can be easily modified to run continuously. Just mind the disk space needed for tcpdump output.

Make the necessary adjustments, save the script as /var/adm/bin/maillog_tcpdump.sh (or whatever) and run like so:

nohup /var/adm/bin/maillog_tcpdump.sh &

Here’s the script. Don’t forget to change email addresses for notification and grep patterns.
#!/bin/bash
configure() {
	notify_email="email1@domain.com,email2@domain.com"
	logdir="/var/log"
	logfilename="maillog"
	logfile="${logdir}/${logfilename}"
	if [ ! -r "${logfile}" ]
	then
		echo "ERROR: Log file ${logfile} not found or not readable. Exiting..." >&2
		exit 1
	fi
	outfile="/tmp/$(hostname -s).$(date +'%s').pdump"
	if [ -f "${outfile}" ]
	then
		/bin/rm -f "${outfile}"
	fi
}

logmon() {
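	# Baseline count. Use the same gateway filter as the loop below so the two counts are comparable.
	OLDCOUNT=$(grep -Fw 'suspended' "${logfile}" | grep -Ec "mxgateway01|mxgateway02")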
	while :
	do
		COUNT=$(grep -Fw 'suspended' "${logfile}" | grep -Ec "mxgateway01|mxgateway02")
		DIFF=$((COUNT-OLDCOUNT))
		if [ $DIFF -gt 0 ]
		then
			echo "Running tcpdump"
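			# Capture SMTP traffic to either gateway; remember the PID so only this tcpdump is stopped later.
			nohup tcpdump -w "${outfile}" -s 0 '(host mxgateway01 or host mxgateway02) and port 25' &
			tcpdump_pid=$!
			echo "Sleeping 10 minutes"
			sleep 600
			echo "Killing tcpdump"
			kill "${tcpdump_pid}"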
			echo "Resetting count"
			OLDCOUNT=$(grep -Fw 'suspended' "${logfile}" | grep -Ec "mxgateway01|mxgateway02")
			echo "Sending notification"
			echo "Check ${outfile} on $(hostname -s)" | mailx -s "Tcpdump complete on $(hostname -s)" "${notify_email}"
			exit # remove this line to keep the script running indefinitely
		else
			# No new matches (or the log was rotated): just track the current count.
			OLDCOUNT=$COUNT
		fi
		sleep 15
	done
}

# RUNTIME
configure
logmon
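
If you do remove the exit and let the script run continuously, two things are worth handling: each capture should get its own output file, and the script should stop capturing when disk space runs low. Here is a rough sketch of two helpers for that. The names new_outfile and enough_space, the capture_dir variable, and the 1 GB threshold are all my own assumptions for illustration, so adjust them to your setup.

```shell
#!/bin/bash
# Hypothetical helpers for running the watcher continuously.
capture_dir="/tmp"      # assumption: same directory the script already writes to
min_free_kb=1048576     # assumption: refuse to capture with less than ~1 GB free

# Build a fresh output file name for every capture so earlier dumps are kept.
new_outfile() {
	echo "${capture_dir}/$(hostname -s).$(date +'%s').pdump"
}

# Succeed only if the capture directory has at least min_free_kb available.
enough_space() {
	local free_kb
	# df -P gives one line per filesystem; column 4 is the available space in KB.
	free_kb=$(df -Pk "${capture_dir}" | awk 'NR==2 {print $4}')
	[ "${free_kb}" -ge "${min_free_kb}" ]
}
```

Inside logmon, right before starting tcpdump, something like `enough_space || exit 1` followed by `outfile=$(new_outfile)` would use them. The timestamp in the file name has one-second resolution, which is plenty here since every capture runs for ten minutes.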
