18 June 2009

Uptime fail

On Tuesday, I knew something was amiss when my ssh sessions on barillari.org suddenly stopped echoing. Unfortunately, I was 200 miles away at the time. I called Z., who kindly checked the cables and hit the reboot switch, but the machine remained unpingable. So much for a 400+ day uptime.

When I came to check it out, I discovered that I must have upgraded the kernel at some point during the year-plus period the machine was switched on. The system booted the upgraded kernel, which for some reason renamed all of the drives from /dev/hd* to /dev/sd*. This freaked out grub, which couldn't find /dev/hda1. I bypassed that by hitting 'e' on the grub screen and changing root=/dev/hda1 to root=/dev/sda1. When the system booted, the drive-mounting process (or maybe it was fsck) freaked out again, because all of the fstab entries still pointed at the old device names. Fortunately, it let me enter the root password, fix the fstab entries, and boot the machine. (The kernel upgrade also reordered the drives, so hdc became sdb. Gag.) I'm still not sure what brought the system down to begin with, but if there's a lesson in this, it's to never upgrade your kernel ever.
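
For the curious, the fstab fix boils down to a find-and-replace on device names. Here is a rough sketch of it in Python, not what I actually typed at the recovery prompt. The hda-to-sda and hdc-to-sdb pairs are the ones mentioned above; the function name, the DEVICE_MAP table, and the idea of feeding it the fstab contents as a string are all made up for illustration.

```python
import re

# Old-name -> new-name mapping. hda -> sda and hdc -> sdb are the renames
# described in the post; any other drive would need its own (assumed) entry.
DEVICE_MAP = {"hda": "sda", "hdc": "sdb"}

# Matches /dev/hdX optionally followed by a partition number.
_DEVICE_RE = re.compile(r"/dev/(hd[a-z])(\d*)\b")


def remap_fstab(text, device_map=DEVICE_MAP):
    """Return fstab text with /dev/hdX[N] names rewritten per device_map.

    Comment lines and drives missing from the map are left untouched.
    """
    def swap(match):
        disk, partition = match.group(1), match.group(2)
        new_disk = device_map.get(disk)
        if new_disk is None:
            return match.group(0)  # unknown drive: do not guess
        return f"/dev/{new_disk}{partition}"

    fixed_lines = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            fixed_lines.append(line)  # keep comments verbatim
        else:
            fixed_lines.append(_DEVICE_RE.sub(swap, line))
    return "\n".join(fixed_lines)


if __name__ == "__main__":
    # Hypothetical fstab contents, not the real file from this machine.
    sample = (
        "/dev/hda1 / ext3 defaults 0 1\n"
        "/dev/hdc1 /data ext3 defaults 0 2\n"
    )
    print(remap_fstab(sample))
```

Because the drives were reordered as well as renamed, a blanket "hd" to "sd" substitution would have pointed hdc at the wrong disk, hence the explicit table. Mounting by LABEL= or UUID= in fstab would sidestep the renaming altogether.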

(Man, I remember the days when I used to compile my own kernel. The things that one puts up with when one is younger...)

