24 September 2009

Disks are slow

I have a simple SQL query. It does an inner join of a small (1000 row) table against three or four other tables. The indices are all properly configured, so looking up the joined rows should take close to constant time. But the query still took 30+ seconds.

Then I remembered: the other tables are huge. And disk seeks are really slow.

You can run hdparm -tT and learn that your disk can pull 150 MB/sec straight from the metal -- but that's _sequential_ access. Peter Norvig reminded budding programmers that seeks are four orders of magnitude slower than sequential fetches. You can download from this page a tiny C program that tells you how slow your disk really is. My vintage-2008 laptop can seek about 55 times a second.

In the above query, the joined tables were colossal and the joined rows widely separated, so every read took a seek. Even with O(1) lookup time, if the constant in front of the big O is about 18 ms, that's 18 seconds. Since the query joined several such tables, the actual duration was a few times that.

Since the data that this query interrogates change rarely, I can just cache the result. I wrote a little Python decorator for that. It uses cPickle, but could easily be adapted to use memcached. Note that this function doesn't check for staleness. It expects an external process to purge stale cache entries.
def diskcache_wrap(fcn):
""" wrap this function in a diskcache access """
def wrapper(*args, **kwargs):
" the wrapper iteself "
hashl = hashlib.sha1()
if args:
hashl.update(cPickle.dumps(args))
if kwargs:
hashl.update(cPickle.dumps(kwargs))
digest = hashl.hexdigest()
path = os.path.join(CACHEDIR, "wrap-%s-%s" % (fcn.__name__, digest))
if os.path.exists(path):
try:
return cPickle.load(open(path))
except (cPickle.UnpicklingError, EOFError, IOError, ValueError):
os.unlink(path) # probably broken, trash it
result = fcn(*args, **kwargs)
if not os.path.exists(CACHEDIR):
os.mkdir(CACHEDIR)
cPickle.dump(result, open(path,'w'))
return result
return wrapper

No comments:

Post a Comment

About Me

blog at barillari dot org Older posts at http://barillari.org/blog