Introduction
Thunderbird uses a per-folder database model and this document aims to describe the roles and interactions of a folder, its database, and the local storage.
Foundational concepts
It is important to note that locally each folder will contain both:
message database file - contains metadata about each message in the folder (relatively small)
message store file - contains the raw message information of each message in the folder (can be quite large). Currently this is either an mbox file or a maildir directory.
Message Database File
Folders group messages and each folder has a message database file ending in .msf (e.g. INBOX.msf) that describes what is in the folder. This file is a Mork database file that contains the header information for each message in the folder.
NOTE
Mork is a Mozilla-specific technology that will eventually be phased out by upcoming work to implement a global message database (Bug 1572000), which will replace it with a single SQLite database that spans all folders.
This database file contains metadata about each message. Some of it is parsed from the message headers, and some is added later (e.g. read status, tags, spam score). You can find a list of the parsed headers here. Some of the database header fields hold data that is “massaged” from one or more message headers, so it’s not always a case of verbatim copying values from the headers we’re interested in. It is also worth noting that some fields in the database don’t appear in the message headers at all, for example the spam rating we assign.
The way the database is updated depends on the folder’s protocol (IMAP, EWS, etc.) and the folder knows what protocol it’s based on. In this way, the folder and protocol are tightly intertwined. The folder also has to know about the protocol in order to issue commands to the server (“set the ‘read’ flag on message 1453”) and to receive data back from it (e.g. “a new message has arrived”).
So a folder always knows about the specific protocol it’s representing. The protocols tend to use “sink” interfaces to talk back to the folder (the folder implements the sink interfaces, e.g. nsIImapMessageSink). But all the sink interfaces are currently protocol-specific, so it’s not much of an abstraction.
For example, the database for IMAP folders is initially populated by requesting the raw RFC 5322 headers from the server (without the body) for all messages. So you can be in a situation where you know about a message (it’s in the DB), but there is no locally downloaded copy of it, and therefore no message body. For local folders, you’re always dealing with full, raw RFC 5322 messages. As they are streamed into the folder (from POP3, say, or by copying from another folder), the headers are parsed and the DB entries are added.
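As a rough illustration of how a database entry can be derived from raw RFC 5322 headers, here is a minimal Python sketch. It is not Thunderbird's implementation: the function name make_db_entry and the field names in the returned record are hypothetical, and the parsing is done with Python's standard email package.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def make_db_entry(raw_message, store_token=None):
    """Build a metadata record from a raw RFC 5322 message (or just its headers).

    Hypothetical record layout, for illustration only.
    """
    msg = message_from_string(raw_message)
    date_header = msg.get("Date")
    return {
        # Values parsed from the message headers.
        "subject": msg.get("Subject", ""),
        "author": msg.get("From", ""),
        "message_id": msg.get("Message-ID", ""),
        "date": parsedate_to_datetime(date_header) if date_header else None,
        # Values that never appear in the headers and are added later.
        "read": False,
        "tags": [],
        # None means "known to the database, but no local copy yet".
        "store_token": store_token,
    }


# Headers only, as an IMAP server might send them before any body is downloaded.
headers_only = (
    "From: Alice <alice@example.com>\n"
    "Subject: Hello\n"
    "Message-ID: <1@example.com>\n"
    "Date: Mon, 01 Jan 2024 10:00:00 +0000\n"
    "\n"
)
entry = make_db_entry(headers_only)
```

The entry exists even though store_token is None, which mirrors the case of a message that is listed in the folder but has no body available locally.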
This database file does not exist on the IMAP or EWS server; it is only stored locally. The server stores messages in whatever way it wants to. EWS or IMAP servers have to send Thunderbird the raw message (or just the raw headers) in RFC 5322 form. As long as they do that, we don’t care how they store things on their side.
NOTE
Other protocols could use a wire format other than RFC 5322 (e.g. JMAP uses JSON and RSS uses Atom or XML). But we come from an email-centric viewpoint and RFC 5322 is used for local storage and display. So non-RFC 5322 messages need to be converted (forcibly coerced!) into RFC 5322.
Our .msf database files are also used for the other folder types: local folders, NNTP, RSS, etc. The database “schema” is largely the same, although the different folder types do poke a few protocol-specific values in here and there. There are also a few annoying cases where the same fields have slightly different meanings for different protocols (e.g. Bug 1930003). But basically, the database has all the info needed to display a list of messages in a folder, regardless of folder type.
Local Message Storage
The local message storage is implemented in either an mbox file or a maildir directory. It uses the nsIMsgPluggableStore interface to represent the locally downloaded set of messages in a given folder. Both mbox files and maildir directories (set in per-connection preferences) will contain the complete raw message information for each message in that folder that has been locally downloaded. For the rest of this document, we will refer to this stored complete set of message information in the folder as the message archive.
NOTE
Auto downloading of messages is optional for IMAP, so it is possible to have a populated database while the message archive is empty. However, the IMAP folder default is to generate the database file and download all of the messages in the background (into the message archive). Conversely, EWS currently only stores database entries, and downloads messages as needed (e.g. when the user requests to display a message).
IMAP
When Thunderbird requests folder information from an IMAP server and the folder is set to not automatically download all of its messages, the server sends over just the message headers in order to populate the message database. In this case, the message archive remains empty and when a user selects a message to view, the message is downloaded “on demand”.
If an IMAP folder is set to automatically download all of the messages in the folder, then the server sends over the message headers and the database is populated. This makes the list of messages appear immediately to the user. Then the actual full downloading proceeds in the background to create a fully populated message archive.
EWS
When Thunderbird requests folder information from an EWS server, the server sends over the list of the message headers and the database file is generated. The message archive remains empty until a user views a message, triggering the downloading of that message and initially populating the message archive.
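The three behaviours described above (IMAP without automatic download, IMAP with automatic download, and EWS) can be contrasted with a toy model. This is only a sketch: the class FolderSketch and its download_all and keep_viewed parameters are hypothetical names invented for this illustration, and a plain dictionary stands in for the server.

```python
class FolderSketch:
    """Toy model of one folder: a metadata database plus a local message archive."""

    def __init__(self, server_messages, download_all, keep_viewed):
        self._server = server_messages  # stand-in for the server: key -> raw message
        self.download_all = download_all  # fetch every full message in the background
        self.keep_viewed = keep_viewed  # store a message locally once it is viewed
        self.database = {}  # key -> header block (always populated by sync)
        self.archive = {}  # key -> raw message that has been stored locally

    def sync(self):
        # The headers always arrive first, so the message list can be shown at once.
        for key, raw in self._server.items():
            headers, _, _ = raw.partition("\n\n")
            self.database[key] = headers
        if self.download_all:
            self.archive.update(self._server)

    def display(self, key):
        if key in self.archive:
            return self.archive[key]
        raw = self._server[key]  # downloaded on demand
        if self.keep_viewed:
            self.archive[key] = raw
        return raw


server = {1: "Subject: one\n\nbody one", 2: "Subject: two\n\nbody two"}
imap_on_demand = FolderSketch(server, download_all=False, keep_viewed=False)
imap_offline = FolderSketch(server, download_all=True, keep_viewed=False)
ews = FolderSketch(server, download_all=False, keep_viewed=True)
for folder in (imap_on_demand, imap_offline, ews):
    folder.sync()
    folder.display(1)
```

After viewing message 1, all three folders have two database entries, but the archive is empty for the on-demand IMAP folder, complete for the auto-downloading IMAP folder, and holds only message 1 for the EWS folder.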
Locating a specific message in a folder
The database file contains a column for the storeToken that points to the location of a specific message within the folder’s message store.
mbox: The storeToken value is a number that represents an offset from the start of the mbox file.
maildir: The storeToken value is the filename of the message.
The message database references local messages with this storeToken.
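To make the two kinds of storeToken concrete, here is a small Python sketch of both lookups. The helper names are hypothetical. It also assumes that the size of the message is known from elsewhere (passed here as size) and that maildir messages sit directly in the directory that is given; the real layouts may differ.

```python
import os
import tempfile


def read_from_mbox(mbox_path, store_token, size):
    """mbox: the storeToken is a byte offset from the start of the mbox file."""
    with open(mbox_path, "rb") as f:
        f.seek(int(store_token))
        return f.read(size)  # size is assumed to be known to the caller


def read_from_maildir(messages_dir, store_token):
    """maildir: the storeToken is the filename of the message."""
    with open(os.path.join(messages_dir, store_token), "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    first = b"From - \nSubject: one\n\nbody one\n\n"
    second = b"From - \nSubject: two\n\nbody two\n\n"

    # mbox: both messages live in one file; the second starts right after the first.
    mbox_path = os.path.join(tmp, "INBOX")
    with open(mbox_path, "wb") as f:
        f.write(first + second)
    from_mbox = read_from_mbox(mbox_path, str(len(first)), len(second))

    # maildir: one file per message; the filename here is made up for the example.
    messages_dir = os.path.join(tmp, "INBOX-maildir")
    os.mkdir(messages_dir)
    with open(os.path.join(messages_dir, "msg-0002.eml"), "wb") as f:
        f.write(second)
    from_maildir = read_from_maildir(messages_dir, "msg-0002.eml")
```

Both lookups return the same raw bytes for the second message; only the meaning of the token differs.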
Deleting a message from a folder
Since mbox is the default for local storage, we will focus on the mbox case.
If a folder has many messages, then the corresponding mbox file can become quite large, and rewriting this file frequently would be a major hit on performance. For this reason, we do not rewrite the mbox file every time a message is deleted. Instead, there are extra header fields that let us mark a message as deleted without needing to rewrite the entire mbox file.
When a message is deleted, the “deleted” flag is added to that message’s header in both the database and the mbox file. If the message exists on the server as well, then the server is told to kick off a deletion operation.
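The following Python sketch shows why marking a message as deleted is cheap: a fixed-width flag value is overwritten in place, so the file keeps its size and no other message moves. The header name X-Mozilla-Status and the flag bit 0x0008 reflect our understanding of the mbox status header and should be treated as assumptions here; the function name is hypothetical.

```python
import os
import tempfile

STATUS_HEADER = b"X-Mozilla-Status: "  # assumed name of the extra status header
DELETED_FLAG = 0x0008  # assumed bit used for the "deleted" flag


def mark_deleted_in_place(mbox_path, store_token):
    """Set the deleted bit in one message's status header without rewriting the file."""
    with open(mbox_path, "r+b") as f:
        f.seek(store_token)
        while True:
            line_start = f.tell()
            line = f.readline()
            if line in (b"", b"\n", b"\r\n"):
                return False  # reached the end of the headers without a status line
            if line.startswith(STATUS_HEADER):
                prefix = len(STATUS_HEADER)
                flags = int(line[prefix:prefix + 4], 16)
                # Overwrite exactly four hex digits, so nothing after them shifts.
                f.seek(line_start + prefix)
                f.write(b"%04x" % (flags | DELETED_FLAG))
                return True


with tempfile.TemporaryDirectory() as tmp:
    first = b"From - \nX-Mozilla-Status: 0001\nSubject: one\n\nbody one\n\n"
    second = b"From - \nX-Mozilla-Status: 0001\nSubject: two\n\nbody two\n\n"
    mbox_path = os.path.join(tmp, "INBOX")
    with open(mbox_path, "wb") as f:
        f.write(first + second)

    size_before = os.path.getsize(mbox_path)
    marked = mark_deleted_in_place(mbox_path, len(first))
    size_after = os.path.getsize(mbox_path)
    with open(mbox_path, "rb") as f:
        contents = f.read()
```

Only four bytes of the second message change. The message itself stays in the file until the folder is compacted.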
Folder Compaction
Folder compaction is simply the act of rebuilding the mbox file to remove any messages with the “deleted” flag. The frequency of auto-compaction can be set in the Thunderbird settings. When this operation is initiated, it goes through every message in the local storage and keeps all messages that have not been marked as deleted.
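A minimal sketch of compaction in Python is shown below. As a simplification, it walks the database entries rather than scanning the mbox file itself. The record layout (store_token, size, deleted), the function name compact, and the temporary file suffix are all hypothetical. The point to notice is that every surviving message gets a new offset, so its storeToken must be updated.

```python
import os
import tempfile


def compact(mbox_path, database):
    """Rewrite the mbox without deleted messages; return the updated database.

    database maps key -> {"store_token": int, "size": int, "deleted": bool}.
    """
    tmp_path = mbox_path + ".compacting"
    kept = {}
    by_offset = sorted(database.items(), key=lambda item: item[1]["store_token"])
    with open(mbox_path, "rb") as src, open(tmp_path, "wb") as dst:
        for key, entry in by_offset:
            if entry["deleted"]:
                continue  # dropped from both the file and the database
            src.seek(entry["store_token"])
            raw = src.read(entry["size"])
            # The message moves, so record its new offset as the storeToken.
            kept[key] = {**entry, "store_token": dst.tell()}
            dst.write(raw)
    os.replace(tmp_path, mbox_path)
    return kept


with tempfile.TemporaryDirectory() as tmp:
    messages = [
        b"From - \nSubject: one\n\nbody one\n\n",
        b"From - \nSubject: two\n\nbody two\n\n",
        b"From - \nSubject: three\n\nbody three\n\n",
    ]
    mbox_path = os.path.join(tmp, "INBOX")
    database, offset = {}, 0
    with open(mbox_path, "wb") as f:
        for key, raw in enumerate(messages, start=1):
            f.write(raw)
            # Message 2 is the one marked as deleted in this example.
            database[key] = {"store_token": offset, "size": len(raw), "deleted": key == 2}
            offset += len(raw)

    database = compact(mbox_path, database)
    with open(mbox_path, "rb") as f:
        compacted = f.read()
```

Writing to a temporary file and swapping it in at the end is a design choice of this sketch, so that an interrupted compaction does not leave a half-written mbox file behind.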
IMAP Example
Folder marked to not download messages
Consider the scenario where a new employee sets up Thunderbird with their corporate email and is given access to shared folders that contain thousands of emails. The corresponding mbox file of all of those messages in one folder could be massive, so the user would not want this to be automatically downloaded. (This is a case where the folder would be set to not download automatically.)
The user has freshly connected and has not clicked on any of these IMAP messages to download yet. In this case, the user would have a message database file and an empty mbox file.
Then the user clicks and reads 5 messages. Now their database file is the same and because the folder has been marked to not download, their mbox file remains empty.
Then the user decides to delete 2 messages. Now their database file has the “deleted” flag added to those 2 messages to be deleted and the mbox file is still empty.
Then either the user initiates a folder compaction operation or it happens on the scheduled frequency. Now their database file has those messages removed from it and a delete operation kicks off on the IMAP server.
Folder marked to download messages
Consider the scenario where a person sets up Thunderbird with their personal Gmail account. The Gmail inbox is set to download messages by default. When this account is connected, the message headers are gathered and the message database for that folder (the inbox, in this case) is generated. Then the message downloads begin in the background to form the mbox file that contains the raw message data from each message in the inbox.
Then the user decides to delete 2 messages. Now the “deleted” flag has been added to those 2 messages (in both the message database and mbox file).
Then either the user initiates a folder compaction operation or it happens on the scheduled frequency. Now their database file has those messages removed from it, their mbox file is rewritten to remove those messages, and a delete operation kicks off on the IMAP server.
EWS Example
Consider any case involving an EWS folder. When an EWS server is connected, the message headers are gathered to generate the message database for a given folder and the mbox file starts off being empty.
Then the user clicks and reads 5 messages. Now their database file is the same and they have an mbox file populated with the 5 raw messages.
Then the user decides to delete 2 messages. Now the “deleted” flag has been added to those 2 messages (in both the message database and mbox file) and a delete operation kicks off on the EWS server immediately.
Then either the user initiates a folder compaction operation or it happens on the scheduled frequency. Now their database file has those messages removed from it, and their mbox file is rewritten to remove those messages.