Folder Compaction

Folder compaction is the act of removing left-over messages from the local storage which are no longer needed. It’s particular issue for mbox stores, because each new message is just concatenated to the end of the mbox file and you can’t really delete them without rewriting the entire file. So deleted messages are usually just left in place and cleaned up later, via compaction.

NOTE: compaction is currently an mbox-specific operation. But it might also be required for other storage back-ends. For example, maildir stores could potentially end up with messages on disk which are no longer in the database.

But for the sake of simplicity, this document will just talk about mbox.

The Process

Compacting a folder means building a new mbox file, omitting any messages which are no longer in the database.

The .storeToken value for each message in the database must be updated to point at the offset to the message in the new file, as it will have changed.

We need to be careful that the mbox and database (.msf) files are updated in sync. A compacted mbox file will not work with the .storeToken values in the old database. Conversely, an updated database with new .storeToken values will not work against the old mbox file.

Ideally we’d like to install the new mbox file and the new database file atomically. But this can’t be done.

The most atomic operation we have available is being able to rename a file on the same filesystem. So we can write to temporary files, then use rename to install them into place.

But even then we could (for example) lose power or crash after installing the new database file, but before installing the new mbox file. This would be leave the two files out of sync. So we also need to include a method to detect and mitigate that state.

The Details

Say we’re compacting mbox foo and it’s corresponding foo.msf database file (in the same directory).

These are the steps we take:

  1. Create a .compact-temp subdirectory in the same location (i.e. on the same filesystem).

  2. Copy foo.msf to .compact-temp/foo.msf.compacting.

  3. Read messages out from foo, writing the ones we want to keep into .compact-temp/foo.compacting.

  4. Rename .compact-temp/foo.compacting to .compact-temp/foo.compacted.

  5. Update the .storeToken values in .compact-temp/foo.msf.compacting to correspond with .compact-temp/foo.compacted.

  6. Move foo to .compact-temp/foo.original.

  7. Rename .compact-temp/foo.msf.compacting to .compact-temp/foo.msf.compacted.

  8. Move foo.msf to .compact-temp/foo.msf.original.

  9. Rename .compact-temp/foo.msf.compacted to foo.msf

  10. Rename .compact-temp/foo.compacted to foo

  11. Done!

  12. Clean up by deleting .compact-temp/foo.original and .compact-temp/foo.msf.original, then by removing .compact-temp dir entirely.

So the filename conventions are:

  • .compacting - a file being compacted, but not yet complete.

  • .compacted - a compacted file, but not yet installed (uncommitted data).

  • .original - the original uncompacted file, moved out of the ‘live’ location.

If we crash or power-off before step 6:

  • the old foo and foo.msf will be left intact, so the app will continue working

  • there’ll be a residual .compact-temp dir left there, with partially-processed .compacting files. These aren’t any use and can be deleted.

  • There may be a completed “foo.compacted” mbox, but that’s no use without the “foo.msf.compacted” DB to go with it, and that doesn’t show up until step 7.

Steps 6 through 10 are all file renames and should be very rapid and simple. (Rough benchmark: they complete within 50 microseconds or so, but that was a debug build, with full logging).

So, we could be unlucky and crash or power-off, in which case we might be left in a bad state (e.g. we could crash at step 7, leaving the old foo.msf, but no foo).

However, the combination of .compacted and .original files gives us enough information to later detect what’s gone wrong and recover.

Recovery

This table gives the state of the relevant files and where they appear at each step of the compaction process.

after                  foo  foo.msf    foo foo.msf
step    foo  foo.msf   .compacted      .original
---------------------------------------------------------------
1       old  old       -    -          -    -
2       old  old       -    -          -    -
3       old  old       -    -          -    -
4       old  old       yes  -          -    -
5       old  old       yes  -          -    -
6       -    old       yes  -          yes  -
7       -    old       yes  yes        yes  -
8       -    -         yes  yes        yes  yes
9       -    new       yes  -          yes  yes
10      new  new       -    -          yes  yes
11      new  new       -    -          yes  yes
12      new  new       -    -          -    -

There’s not currently an automatic process to detect and recover from incomplete compaction. But we can use this table to map out the steps such a process would need to follow. Such a process could be applied manually if needed. Something like this:

if .compact-temp/ dir exists:
  if foo is missing:
    if .compact-temp/foo.compacted and .compact-temp/foo.msf.compacted exist:
      copy them to foo and foo.msf
    else if .compact-temp/foo.original and .compact-temp/foo.msf.original exist:
      copy them to foo and foo.msf
    else if foo.msf exists and .compact-temp/foo.compacted exists:
      copy .compact-temp/foo.compacted to foo

(We could refine this to avoid a rollback if we’d crashed after step 9, but probably best to keep things simple!)

Implementation

The compaction system is layered, to try and keep the moving parts decoupled.

From the top down:

The entry point for compacting folders is the bare function AsyncCompactFolders(). It takes a list of folders to compact sequentially, and a listener to call when the whole thing is done. Internally, it is implemented using the BatchCompactor class.

BatchCompactor orchestrates the whole operation, instantiating a FolderCompactor to compact each folder. It also handles GUI updates - taking progress and error reports from the FolderCompactor and presenting them appropriately.

FolderCompactor compacts a single folder. It handles updating the database and various folder housekeeping and notifications. The actual mbox compaction is delegated to the storage layer, via nsIMsgPluggableStore.AsyncCompact(). The FolderCompacter passes itself as a listener to AsyncCompact(), which invokes callbacks to determine which messages to keep and when the mbox files are complete and ready to install.

The mbox AsyncCompact() implementation of nsIMsgPluggableStore.AsyncCompact() is implemented by the MboxCompactor class.

MboxCompactor iterates over each message in the store. It uses the listener (provided by FolderCompactor) to ask which messages should be written into the new mbox file. When done, it’ll invoke the .onCompactionComplete on the FolderCompactor to let it know it should install it’s own changes (the new DB file). When that callback returns, the new mbox file will be installed. MboxCompactor can also update the X-Mozilla-* headers inserted into messages in local folders as it goes.