18 June 2009


I have a computer with a dodgy onboard ethernet card.* It used to be that it would fail every year or so, printing a bunch of messages like this to /var/log/messages:

May 19 04:34:01 tashtego kernel: NETDEV WATCHDOG: eth0: transmit timed out
May 19 04:34:01 tashtego kernel: sky2 hardware hung? flushing

These days, it does that a lot more often. I ought to replace it, but since it's just (ha!) a backup box, I'm not very inclined. I had an equally dodgy solution: a watchdog script. Now you can have it, too:

#!/bin/bash
# Network watchdog: reboot only if three unrelated sites all fail in a row.
logger -p user.info "batman, robin here. monitoring network..."
wget --spider -q http://www.harvard.edu
if [[ $? != 0 ]] ; then
    logger -p user.warn "holy flapjacks, batman, harvard timed out!"
    sleep 60
    wget --spider -q http://www.mit.edu
    if [[ $? != 0 ]] ; then
        sleep 30
        logger -p user.warn "holy bran muffins batman, so did mit!"
        wget --spider -q http://www.google.com
        if [[ $? != 0 ]] ; then
            logger -p user.err "holy s---, batman, google is down too. I'm rebooting!"
            # Reboot (command assumed; full path because cron's PATH is minimal).
            /sbin/reboot
            # Only reached if the reboot did not take effect right away.
            logger -p user.err "batman, reboot happened with status $? ; wtf?"
            exit 1
        fi
    fi
fi
logger -p user.info "oh, we're fine..."

This is not an elegant solution by any stretch of the imagination, but it does the job. I trigger it from root's cron every five minutes, like this:

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /root/watchdog.sh
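
The same three-strikes logic could also be sketched as a loop over the sites, which avoids the nested ifs. This is only a rough sketch, not the script I actually run: the function name all_sites_down and the CHECK_CMD and RETRY_DELAY variables are invented for illustration, and it uses one delay for every retry instead of the 60/30 split above.

```shell
#!/bin/bash
# Hypothetical refactor of the watchdog check (names below are illustrative).

# Command used to probe a URL; defaults to the wget call from the script above.
CHECK_CMD=${CHECK_CMD:-"wget --spider -q"}
# Seconds to wait after each failed probe (assumed value).
RETRY_DELAY=${RETRY_DELAY:-30}

# Returns 0 (true) only if every URL passed as an argument fails its probe.
all_sites_down() {
    local url
    for url in "$@"; do
        # CHECK_CMD is left unquoted on purpose so its options split into words.
        if $CHECK_CMD "$url"; then
            return 1    # one site answered, so the network is up
        fi
        logger -p user.warn "watchdog: $url did not respond"
        sleep "$RETRY_DELAY"
    done
    return 0
}
```

In the watchdog it would be called with the three sites, rebooting only when the function returns true, along the lines of: all_sites_down http://www.harvard.edu http://www.mit.edu http://www.google.com && /sbin/reboot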

* I'm sure it's not made any longer, but if you're curious:

02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 19)
Subsystem: Giga-byte Technology Marvell 88E8053 Gigabit Ethernet Controller (Gigabyte)

About Me

blog at barillari dot org Older posts at http://barillari.org/blog