Plotting Log Data
While I am on the subject of selecting time ranges from logs, a practical application presented itself. A lonely server in a far-away land with a habit of running out of memory could only be monitored via stunnel and reverse SSH. The server crashed again today and, after bringing it back to life, I looked at whatever meager data the SSH monitor collected to see if there was anything useful.
The SSH monitoring script is simple: every fifteen minutes a connection is made to the server via reverse SSH tunnel and the uptime command is executed. If connection cannot be established in a couple of minutes, an alert is dispatched. So the log file looks something like this:
12:00:08 up 2 days, 3:38, 1 user, load average: 2.81, 1.80, 2.80 12:15:09 up 2 days, 3:53, 1 user, load average: 1.15, 2.15, 2.78 12:30:10 up 2 days, 4:08, 1 user, load average: 6.30, 2.44, 2.42 12:45:07 up 2 days, 4:23, 1 user, load average: 2.38, 6.80, 5.01 13:00:06 up 2 days, 4:38, 1 user, load average: 2.57, 1.70, 2.84
I thought I saw a pattern in the system load numbers, but then it could’ve been just lack of sleep. Basically, I needed to plot the 15-minute average load and I had no fancy tools like Splunk. There are two problems: the log has no day – only hours and minutes – and the timestamp is in a timezone different from mine. So the solution was to select just those lines covering just the right number of 15-minute intervals since last midnight in the log’s local timezone.
(( d = $(date -d "1970-01-01 $(date +'%H:%MSGT')" +%s) / ( 15 * 60 ) ))
tail -${d} /var/log/ssh_monitor.log | gnuplot -e "set terminal dumb 84 28; unset key; set style data labels; set xdata time; set xlabel 'Time'; set ylabel 'Load'; set autoscale; set timefmt '%H:%M'; plot '-' using 5:12:ytic(12) with lines"And the result:
Load
6.77 ++*++-++-++-++-++-+++-++-++-++-++-++-+++-++-++-++-++-+++-++-++-++-++-++-++
+ * + + + + + + + + + +
| * |
6.09 ++ * * ++
5.90 ++ * * * ++
| * * ** |
5.46 ++ * * * ** ++
| * * * * * * |
| * ** * * * * |
| * * * * * * * |
| * * * * * * * |
| * * * * * * * |
| * * * * * * * |
| * * * * * * * |
| * * * * * * * |
| * * * * * * * |
3.26 ++ * * * * * *** * * ++
3.02 ++ *** * * * * ** * * * ++
2.96 ++ ***** * *** * * * * *** ++
2.65 ++ *** ** * * * * * * * * ++
2.52 ++ * ** * * * * ** * ++
2.20 ++ ** ** ** ** * ++
1.94 ++ + + + ** + + * + + *+ + * ++
1.83 ++-++-++-++-++-++-+++-++-++*++-++-++-+++-++-++-++-++-+++-++-++-++-++-++*++
09:30:000:00:0010:30:001:00:001:30:0012:00:002:30:003:00:003:30:0014:00:00:30:00
Time
So I wasn’t imagining things: there was a pattern.
And now a bit of explanation. The following command tells you how many seconds have elapsed since midnight in the specified timezone (SGT, in this case):
date -d "1970-01-01 $(date +'%H:%MSGT')" +%s
Then it’s a simple matter of dividing it into 15-minute intervals and then grabbing that many lines from the end of the log file. And the
gnuplot bit is pretty standard.
