Folder Compaction
Folder compaction is the act of removing left-over messages from the local storage which are no longer needed. It’s particular issue for mbox stores, because each new message is just concatenated to the end of the mbox file and you can’t really delete them without rewriting the entire file. So deleted messages are usually just left in place and cleaned up later, via compaction.
NOTE: compaction is currently an mbox-specific operation. But it might also be required for other storage back-ends. For example, maildir stores could potentially end up with messages on disk which are no longer in the database.
But for the sake of simplicity, this document will just talk about mbox.
The Process
Compacting a folder means building a new mbox file, omitting any messages which are no longer in the database.
The .storeToken
value for each message in the database must be updated to point at the offset to the message in the new file, as it will have changed.
We need to be careful that the mbox and database (.msf) files are updated in sync.
A compacted mbox file will not work with the .storeToken
values in the old database.
Conversely, an updated database with new .storeToken
values will not work against the old mbox file.
Ideally we’d like to install the new mbox file and the new database file atomically. But this can’t be done.
The most atomic operation we have available is being able to rename a file on the same filesystem. So we can write to temporary files, then use rename to install them into place.
But even then we could (for example) lose power or crash after installing the new database file, but before installing the new mbox file. This would be leave the two files out of sync. So we also need to include a method to detect and mitigate that state.
The Details
Say we’re compacting mbox foo
and it’s corresponding foo.msf
database file (in the same directory).
These are the steps we take:
Create a
.compact-temp
subdirectory in the same location (i.e. on the same filesystem).Copy
foo.msf
to.compact-temp/foo.msf.compacting
.Read messages out from
foo
, writing the ones we want to keep into.compact-temp/foo.compacting
.Rename
.compact-temp/foo.compacting
to.compact-temp/foo.compacted
.Update the
.storeToken
values in.compact-temp/foo.msf.compacting
to correspond with.compact-temp/foo.compacted
.Move
foo
to.compact-temp/foo.original
.Rename
.compact-temp/foo.msf.compacting
to.compact-temp/foo.msf.compacted
.Move
foo.msf
to.compact-temp/foo.msf.original
.Rename
.compact-temp/foo.msf.compacted
tofoo.msf
Rename
.compact-temp/foo.compacted
tofoo
Done!
Clean up by deleting
.compact-temp/foo.original
and.compact-temp/foo.msf.original
, then by removing.compact-temp
dir entirely.
So the filename conventions are:
.compacting
- a file being compacted, but not yet complete..compacted
- a compacted file, but not yet installed (uncommitted data)..original
- the original uncompacted file, moved out of the ‘live’ location.
If we crash or power-off before step 6:
the old
foo
andfoo.msf
will be left intact, so the app will continue workingthere’ll be a residual
.compact-temp
dir left there, with partially-processed.compacting
files. These aren’t any use and can be deleted.There may be a completed “foo.compacted” mbox, but that’s no use without the “foo.msf.compacted” DB to go with it, and that doesn’t show up until step 7.
Steps 6 through 10 are all file renames and should be very rapid and simple. (Rough benchmark: they complete within 50 microseconds or so, but that was a debug build, with full logging).
So, we could be unlucky and crash or power-off, in which case we might be left in a bad state (e.g. we could crash at step 7, leaving the old foo.msf
, but no foo
).
However, the combination of .compacted
and .original
files gives us enough information to later detect what’s gone wrong and recover.
Recovery
This table gives the state of the relevant files and where they appear at each step of the compaction process.
after foo foo.msf foo foo.msf
step foo foo.msf .compacted .original
---------------------------------------------------------------
1 old old - - - -
2 old old - - - -
3 old old - - - -
4 old old yes - - -
5 old old yes - - -
6 - old yes - yes -
7 - old yes yes yes -
8 - - yes yes yes yes
9 - new yes - yes yes
10 new new - - yes yes
11 new new - - yes yes
12 new new - - - -
There’s not currently an automatic process to detect and recover from incomplete compaction. But we can use this table to map out the steps such a process would need to follow. Such a process could be applied manually if needed. Something like this:
if .compact-temp/ dir exists:
if foo is missing:
if .compact-temp/foo.compacted and .compact-temp/foo.msf.compacted exist:
copy them to foo and foo.msf
else if .compact-temp/foo.original and .compact-temp/foo.msf.original exist:
copy them to foo and foo.msf
else if foo.msf exists and .compact-temp/foo.compacted exists:
copy .compact-temp/foo.compacted to foo
(We could refine this to avoid a rollback if we’d crashed after step 9, but probably best to keep things simple!)
Implementation
The compaction system is layered, to try and keep the moving parts decoupled.
From the top down:
The entry point for compacting folders is the bare function AsyncCompactFolders().
It takes a list of folders to compact sequentially, and a listener to call when the whole thing is done.
Internally, it is implemented using the BatchCompactor
class.
BatchCompactor
orchestrates the whole operation, instantiating a FolderCompactor
to compact each folder.
It also handles GUI updates - taking progress and error reports from the FolderCompactor
and presenting them appropriately.
FolderCompactor
compacts a single folder. It handles updating the database and various folder housekeeping and notifications.
The actual mbox compaction is delegated to the storage layer, via nsIMsgPluggableStore.AsyncCompact()
.
The FolderCompacter
passes itself as a listener to AsyncCompact()
, which invokes callbacks to determine which messages to keep and when the mbox files are complete and ready to install.
The mbox AsyncCompact()
implementation of nsIMsgPluggableStore.AsyncCompact()
is implemented by the MboxCompactor
class.
MboxCompactor
iterates over each message in the store. It uses the listener (provided by FolderCompactor
) to ask which messages should be written into the new mbox file.
When done, it’ll invoke the .onCompactionComplete
on the FolderCompactor
to let it know it should install it’s own changes (the new DB file).
When that callback returns, the new mbox file will be installed.
MboxCompactor
can also update the X-Mozilla-*
headers inserted into messages in local folders as it goes.