Originally published at the Self blog.
Basic information here: https://github.com/Bystroushaak/self_mbox
I am a beginner, so naturally I want to learn as much about Self as is available. Existing web pages didn’t provide enough, so I decided to read the whole Self mailing-list archive to get a clearer idea of what Self is and what state it is in.
Mailing lists don’t work exactly like the classic web discussions and forums that people of my generation are used to. I don’t really mind the mail interface, but the lack of history and context for newcomers is discouraging.
The Self mailing-list archive is available via the Yahoo web interface, which really isn’t the best for reading big datasets, and also via Usenet, which is this weird old thing. Even if you can make it work and find a suitable server, you may find that it doesn’t archive the whole history. For example, news.gmane.org, which I am using, archives only 72 messages.
So I decided to put everything together into one .mbox file, which I then uploaded to my mail account, so I could read it from anywhere (home, notebook, mobile phone, e-book reader, ...).
Mbox is a file format used by mail programs to store multiple messages in a single file. It is supported (at least for import) by almost all desktop mail clients.
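To make the format a bit more concrete, here is a small sketch using Python’s standard mailbox module (my choice for illustration; the tools below don’t necessarily use it). It appends messages to a single file and reads their subjects back. Each stored message starts with a separator line beginning with "From ", which is what lets one file hold many messages. The file name and sample address are made up.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_example_mbox(path, subjects):
    """Append one small message per subject to the mbox file at `path`."""
    box = mailbox.mbox(path)  # creates the file if it does not exist yet
    try:
        for number, subject in enumerate(subjects):
            msg = EmailMessage()
            msg["From"] = "someone@example.com"
            msg["Subject"] = subject
            msg.set_content(f"Body of message {number}.")
            box.add(msg)  # mailbox writes a "From " separator line before it
        box.flush()
    finally:
        box.close()


def list_subjects(path):
    """Return the Subject header of every message stored in the mbox file."""
    box = mailbox.mbox(path)
    try:
        return [msg["Subject"] for msg in box]
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.mbox")
        build_example_mbox(path, ["Hello Self", "Re: Hello Self"])
        print(list_subjects(path))  # ['Hello Self', 'Re: Hello Self']
        with open(path, "rb") as raw:
            print(raw.readline())  # the "From ..." separator of the first message
```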
I found the 1990-1998 archive in HTML format here:
It was created by MHonArc, so it can easily be converted back to .mbox using mhn2mbox. The resulting .mbox file is here:
Archives from 1998-2016 can be found in the Self user group on Yahoo Groups, which has a really bad user interface. Luckily, the interface uses a REST backend, which can be used to obtain nearly original posts.
For this, you may use the grab_self_mail_list.py script:
The output of this script is a shelve file, used as a cache so that everything isn’t downloaded again every time you run the script. The cache file may then be converted to .mbox format by using
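The caching idea can be sketched roughly like this. It is not the actual code of grab_self_mail_list.py: fetch_post is a hypothetical stand-in for the real REST request, and using the post id as a string key is an assumption. The point is only that anything already stored in the shelve file is skipped on the next run.

```python
import os
import shelve
import tempfile


def fetch_post(post_id):
    """Hypothetical stand-in for one HTTP request to the archive backend."""
    return f"Subject: post {post_id}\n\nbody of post {post_id}\n"


def download_all(cache_path, post_ids, fetch=fetch_post):
    """Download every post id not yet cached; return how many were fetched."""
    fetched = 0
    with shelve.open(cache_path) as cache:  # default flag "c" creates it if missing
        for post_id in post_ids:
            key = str(post_id)  # shelve keys must be strings
            if key in cache:
                continue  # already downloaded during an earlier run
            cache[key] = fetch(post_id)
            fetched += 1
    return fetched


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache_path = os.path.join(tmp, "cache")
        print(download_all(cache_path, range(1, 4)))  # 3: everything is new
        print(download_all(cache_path, range(1, 6)))  # 2: ids 1-3 come from the cache
```

Passing fetch as a parameter keeps the sketch testable without any network access.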
This will give you a file containing only the Yahoo posts:
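The conversion step might look something like the following sketch. It assumes, hypothetically, that each cached value is already a raw RFC 822 message string keyed by its numeric post id. The real cache may store something else (for example the decoded REST response), in which case each message would have to be rebuilt first.

```python
import mailbox
import os
import shelve
import tempfile
from email import message_from_string


def cache_to_mbox(cache_path, mbox_path):
    """Write every cached message into one mbox file, ordered by numeric post id.

    Assumes each cache value is a raw RFC 822 message string (hypothetical).
    """
    box = mailbox.mbox(mbox_path)
    written = 0
    try:
        with shelve.open(cache_path, flag="r") as cache:  # read-only access
            for key in sorted(cache.keys(), key=int):  # so "10" sorts after "9"
                box.add(message_from_string(cache[key]))
                written += 1
        box.flush()
    finally:
        box.close()
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache_path = os.path.join(tmp, "cache")
        with shelve.open(cache_path) as cache:  # fake cache with two posts
            cache["10"] = "Subject: ten\n\nbody\n"
            cache["9"] = "Subject: nine\n\nbody\n"
        mbox_path = os.path.join(tmp, "yahoo.mbox")
        print(cache_to_mbox(cache_path, mbox_path))  # 2
        box = mailbox.mbox(mbox_path)
        print([msg["Subject"] for msg in box])  # ['nine', 'ten']
        box.close()
```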
You can join it with the Old Self interest file to get the full archive. If you don’t want to do that manually, the combined file can be downloaded from here:
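Joining the two archives by hand can be done with a sketch like this, again using Python’s mailbox module. merge_mboxes, write_demo_mbox and the file names are hypothetical. Messages are copied in the order the inputs are given, so passing the older archive first keeps the result roughly chronological. Inputs are opened with create=False, so a mistyped path raises an error instead of silently creating an empty file.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def merge_mboxes(output_path, *input_paths):
    """Append all messages from each input mbox, in order, to the output mbox."""
    out = mailbox.mbox(output_path)
    count = 0
    try:
        for path in input_paths:
            src = mailbox.mbox(path, create=False)  # missing input -> NoSuchMailboxError
            try:
                for msg in src:  # mboxMessage keeps its original "From " line
                    out.add(msg)
                    count += 1
            finally:
                src.close()
        out.flush()
    finally:
        out.close()
    return count


def write_demo_mbox(path, subjects):
    """Hypothetical helper that creates a tiny mbox file for the demo."""
    box = mailbox.mbox(path)
    try:
        for subject in subjects:
            msg = EmailMessage()
            msg["Subject"] = subject
            msg.set_content("body")
            box.add(msg)
    finally:
        box.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old = os.path.join(tmp, "old_self_interest.mbox")
        new = os.path.join(tmp, "yahoo.mbox")
        write_demo_mbox(old, ["1990 post"])
        write_demo_mbox(new, ["2016 post"])
        print(merge_mboxes(os.path.join(tmp, "self_full.mbox"), old, new))  # 2
```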
This file can then be loaded into your favorite mail client (after you unpack it from lzma), where you can process the messages as you wish.
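If your mail client can’t open the compressed file directly, a sketch like this unpacks it with Python’s lzma module and counts the messages as a sanity check. The file names are placeholders, and I am assuming the archive is either an .xz or a legacy .lzma stream; lzma.open detects both automatically when reading.

```python
import lzma
import mailbox
import os
import shutil
import tempfile


def unpack_lzma(src_path, dst_path):
    """Stream-decompress an .xz / .lzma file to dst_path in chunks."""
    with lzma.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # avoids loading the whole archive into memory


def count_messages(mbox_path):
    """Return the number of messages in an existing mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        return len(box)
    finally:
        box.close()


# Tiny fake archive with two messages, used only for the demo.
SAMPLE_MBOX = (
    b"From alice@example.com Mon Jan  1 00:00:00 1990\n"
    b"Subject: first\n\nhello\n\n"
    b"From bob@example.com Tue Jan  2 00:00:00 1990\n"
    b"Subject: second\n\nhi\n"
)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        packed = os.path.join(tmp, "self_archive.mbox.xz")
        with lzma.open(packed, "wb") as f:  # create the fake compressed archive
            f.write(SAMPLE_MBOX)
        unpacked = os.path.join(tmp, "self_archive.mbox")
        unpack_lzma(packed, unpacked)
        print(count_messages(unpacked))  # 2
```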
The resulting file contains 4334 emails, and it took me 26 days to read them all. What I found there will be the subject of my next blog post.