28 November 2009

A problem with no solution

MySQL has recovered without a hiccup from every crash I've thrown at it, until this Thanksgiving when I started a long-running process and inadvertently knocked my laptop's power cord out of its socket before I went to bed. When I woke up, the computer was off, and the database refused to start. The error message was something I hope never to see on a production system:

091128 22:56:37 InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
InnoDB: Doing recovery: scanned up to log sequence number 21 3049941951
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 892204 row operations to undo
InnoDB: Trx id counter is 0 705024
091128 22:56:38 InnoDB: Starting an apply batch of log records to the database...
InnoDB: Progress in percents: 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 1472705.
InnoDB: You may have to recover from a backup.
091128 22:56:38 InnoDB: Page dump in ascii and hex (16384 bytes):
len 16384; hex (page dump snipped for brevity)
091128 22:56:38 InnoDB: Page checksum 125329116, prior-to-4.0.14-form checksum 707139577
InnoDB: stored checksum 431743609, prior-to-4.0.14-form stored checksum 742971059
InnoDB: Page lsn 21 3049098859, low 4 bytes of lsn at page end 3049675044
InnoDB: Page number (if stored to page already) 1472705,
InnoDB: space id (if created with >= MySQL-4.1.1 and stored already) 0
InnoDB: Page may be an index page where index id is 0 636
InnoDB: Database page corruption on disk or a failed
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 1472705.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: If the corrupt page is an index page
InnoDB: you can also try to fix the corruption
InnoDB: by dumping, dropping, and reimporting
InnoDB: the corrupt table. You can use CHECK
InnoDB: TABLE to scan your table for corruption.
InnoDB: See also http://dev.mysql.com/doc/refman/5.1/en/forcing-recovery.html
InnoDB: about forcing recovery.

Now, that was bad, but it wasn't a total disaster. I thought I'd just set innodb-force-recovery and drop the tables that the process had been editing. (The process was reading a very large log table and to turn it into a summary table.) Except that what followed foreclosed that possibility:

091128 22:56:38 - mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help diagnose
the problem, but since we have already crashed, something is definitely wrong
and this may fail.

It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 337721 K
bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

thd: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = (nil) thread_stack 0x20000
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

I updated to the latest stable MySQL from mysql.com, built it with
CC=gcc CFLAGS="-O2 -g" CXX=gcc CXXFLAGS="-O2 -g -felide-constructors -fno-exceptions -fno-rtti" ./configure --prefix=/usr/local/mysql --with-debug=full --with-extra-charsets=complex --with-plugins=innobase && make
, and ran it, to no avail.

As a postmortem, I whipped out GDB and did a trace, which I posted with a bug report. I don't know if anything will come of that, but I assume any segfault is considered to be a bug.

Today's lesson: have a backup.

Update: The MySQL people asked for the files. Alas, by that point, I had already deleted them.

No comments:

Post a Comment

About Me

blog at barillari dot org Older posts at http://barillari.org/blog