08 August 2010


I have fourteen years of saved email -- about 290,000 messages. Most of this is useless junk: bulk mail, long-expired announcements, reminders about bills or bank statements, mailing lists, error messages, and spam. But buried in that muck is almost all of my online correspondence since 1996.

Accordingly, that message store can answer important questions. The ones I have in mind are: Who did I use to talk to? Who have I fallen out of touch with?

Superficially, this is simple: just write a program to list everyone whom I've emailed and who has emailed me back (or vice versa). Sort them by the date of last contact, and let me filter by the number of messages.
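As a rough illustration of that simple version, here is a minimal sketch. It is not the code from the download below: the function name and the (date, From header, To header) tuple format are assumptions made for the example, and it is written for a current Python 3 rather than Python 2.6.

```python
from collections import defaultdict
from datetime import datetime
from email.utils import getaddresses

def two_way_contacts(messages, me, min_messages=1):
    """Return (address, last_contact, sent, received) rows, oldest contact first.

    `messages` is an iterable of (date, from_header, to_header) tuples
    (a hypothetical format); `me` is a set of my own lowercase addresses.
    """
    sent = defaultdict(int)      # messages I sent to each address
    received = defaultdict(int)  # messages each address sent to me
    last = {}                    # most recent date seen per address

    for date, from_header, to_header in messages:
        senders = [a.lower() for _, a in getaddresses([from_header]) if a]
        recipients = [a.lower() for _, a in getaddresses([to_header]) if a]
        if any(a in me for a in senders):
            others, counter = recipients, sent
        elif any(a in me for a in recipients):
            others, counter = senders, received
        else:
            continue  # neither from me nor to me (e.g. list traffic)
        for addr in others:
            if addr in me:
                continue
            counter[addr] += 1
            if addr not in last or date > last[addr]:
                last[addr] = date

    # Keep only people with traffic in both directions.
    rows = [
        (addr, last[addr], sent[addr], received[addr])
        for addr in last
        if sent[addr] >= min_messages and received[addr] >= min_messages
    ]
    return sorted(rows, key=lambda row: row[1])
```

Requiring traffic in both directions is what drops most bulk mail: a sender I never wrote back to has a sent count of zero.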

That's the C- approach. A better solution would acknowledge that people use different email addresses. Multiple-emails-per-person creates two complications, one minor and one major. The minor complication is that some people will appear in the list multiple times, once for each address. The major one is that some people will get lost. If I only exchange one or two pieces of email with someone (for instance, because we were in a class together and tended to talk in person), but from different addresses, that approach will miss it entirely. I write to someone@corporate-mail.com. He responds from someone@personal-mail.com. The naive approach above won't connect those two messages.

Fortunately, email headers often contain names as well as addresses: "Lyndon Johnson <address>". If you consolidate addresses with the same name or similar names (e.g., normalize "Johnson, Lyndon B." and "Lyndon Baines Johnson" to "Lyndon Johnson"), you might be able to group addresses by person. I implemented another C- solution: it works, but it asks the user about each potential merge. Unfortunately, the low-frequency addresses that you want are also intermixed with spam, so it will take a little while to say "yes" or "no" to each one if your mail store has a fair bit of spam in it -- as mine does. I'll implement a fix for this (perhaps a spam filter) at some point.
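To make the name-matching idea concrete, here is a small sketch of one way to normalize display names and group addresses under them. The function names, the "first word plus last word" rule, and the example.com addresses are assumptions for illustration, not what the actual implementation does. It is written for Python 3.

```python
import re
from collections import defaultdict
from email.utils import parseaddr

def normalize_name(display_name):
    """Reduce a display name to 'first last' in lowercase."""
    name = display_name.strip().strip('"\'')
    if "," in name:
        # "Johnson, Lyndon B." -> "Lyndon B. Johnson"
        last, _, first = name.partition(",")
        name = first.strip() + " " + last.strip()
    # Keep runs of letters only, which also drops the dot after initials.
    words = re.findall(r"[^\W\d_]+", name.lower())
    if len(words) < 2:
        return " ".join(words)
    return words[0] + " " + words[-1]  # drop middle names and initials

def group_by_name(headers):
    """Map each normalized name to the set of addresses seen with it."""
    groups = defaultdict(set)
    for header in headers:
        display_name, address = parseaddr(header)
        if not address:
            continue
        # Fall back to the address itself when there is no usable name.
        key = normalize_name(display_name) or address.lower()
        groups[key].add(address.lower())
    return dict(groups)

if __name__ == "__main__":
    print(group_by_name([
        '"Johnson, Lyndon B." <lbj@work.example.com>',
        "Lyndon Baines Johnson <lyndon@home.example.com>",
    ]))
```

A rule this blunt will also merge two different people who happen to share a first and last name, which is one reason to confirm each merge with the user rather than apply it silently.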

I put this together, building on top of my college pal Mihai's super-cool Mail Trends. Here's what it looks like:

You can set the minimum days since the last message observed, minimum messages from, and minimum to. You can also hide an entry for a month, three months, a year (if you want to be reminded to contact someone, but not just yet), or forever (to filter out mail from, say, your cable company help desk).
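The filtering and hiding logic can be sketched roughly as follows. This is an assumption about how such a filter might work, not the tool's code: the row layout, the HIDE_PERIODS table, and the function names are all made up for the example, and it targets Python 3.

```python
from datetime import datetime, timedelta

# Hypothetical hide durations; None means "forever".
HIDE_PERIODS = {
    "month": timedelta(days=30),
    "three months": timedelta(days=90),
    "year": timedelta(days=365),
    "forever": None,
}

def hide(hidden, address, period, now):
    """Record in `hidden` the time until which `address` stays out of the list."""
    span = HIDE_PERIODS[period]
    hidden[address] = datetime.max if span is None else now + span

def visible(rows, hidden, now, min_days=0, min_from=1, min_to=1):
    """Filter (address, last_contact, n_to, n_from) rows.

    A row is shown if the last contact is at least `min_days` old, the
    message counts meet both minimums, and any hide period has expired.
    """
    cutoff = now - timedelta(days=min_days)
    return [
        row for row in rows
        if row[1] <= cutoff
        and row[2] >= min_to
        and row[3] >= min_from
        and hidden.get(row[0], datetime.min) <= now
    ]
```

Storing an expiry time per address, rather than a flag, lets "hide for a month" and "hide forever" share one code path.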

The system is a bit like etacts, but it isn't hosted---everything lives on your personal computer. (I'm pretty paranoid about email and don't like the idea of a random company having access to it, regardless of how trustworthy they may be. All of the code in this system is open-source, so you can see what it does for yourself.)

Right now, this is strictly nerd-ware: you will need to know a fair bit about basic Un*x tools and possibly a bit of Python programming to get it to work. If the response is positive, I can certainly put together a nicely-packaged version. I think it would make a nice mobile-phone app (so it can remind you "hey, it's been six months since you've talked with x").

If you would like to try it, here's how. Open a shell prompt. (On Windows, you will probably need Cygwin.)

1) Install Python 2.6 (or 2.7) and the Cheetah template language and (optionally) the CherryPy web development system, version 3. On a Debian/Ubuntu-based system, you should be able to just type:
$ sudo apt-get install python-cheetah python2.6 python-cherrypy3

2) Download mail-trends-lost-contacts.tar.gz. (If you prefer, you can also download mail-trends and apply my patch instead.)

3) Run the program. If you use Gmail, the command will be something like:

python2.6 main.py --server=imap.gmail.com --use_ssl --username=you@gmail.com --me=you@gmail.com,you@some.other.place.com --skip_labels

Be sure to replace you@gmail.com with your actual address and list all of the addresses from which you send or receive mail under --me=, separated with commas. (If you don't list them, the program can't figure out which messages are actually from or to you.)

If you use a non-Gmail IMAP server, the command is slightly different:

python2.6 main.py --me=you@school.edu,you@work.com,you@personal.net --server=your.imap.server.com --use_ssl --username=YOUR_IMAP_USERNAME --skip-mailboxes=spam,trash

Instead of specifying --skip-mailboxes=, you can also specify --include-mailboxes=, which will include only the mailboxes listed.

If you want to try the address-consolidation feature (which will ask you lots of questions), add the option --interactive-disambiguation. If you want to use the "remind me in x days" feature, add the option "--web-server=10000", where 10000 is the port on which to run the web server.

To use the system, point your browser at the web server running on your own machine, using the port you chose (if you used the --web-server option), or open the file out/index.html (if you didn't use that option).

Enjoy. Let me know what you think.

About Me

blog at barillari dot org Older posts at http://barillari.org/blog