Background
There are several problems with our existing mail architecture.
- Current mail server pools seem larger than necessary. For example,
the POP/IMAP pool consists of 16 servers (of various sizes), while the
SMTP pool consists of 12.
- The central mail storage is on Network Appliance filers, which only
operate over NFS, and are expensive to purchase and maintain.
- Being dependent on NFS limits our ability to switch to more
efficient mailbox formats that may become more desirable as user
spools become larger.
- The most significant feature of Network Appliance filers is
snapshots, which allow us to restore users' spools. However,
there can be a gap of up to two hours between the time a spool is
damaged and the time a snapshot is taken.
There are several benefits to our existing mail architecture.
- The Network Appliance filers provide very high performance for
network-based storage, have very high availability, and have
an excellent track record of support.
- The complexity of the service is hidden from users, who generally need
only to know imap.columbia.edu and send.columbia.edu.
- The service is based entirely on free, open source software.
Proposed Architecture
The centerpiece of the proposed architecture is a small pool of
servers referred to here as local mail concentrators. These
are mail hubs that manage user spools (inboxes) locally and provide
access to NFS-based ~/mail via IMAP, POP, and SMTP.
The local spools would be stored in mbx or a similar indexed
format on some form of RAID. The mailboxes stored in ~/mail
would remain in mbox format.
Assumptions
- Demand for access to INBOX/mbox is significantly greater
than that for access to ~/mail.
- Duplicating on delivery provides an acceptable alternative to snapshots.
- IMAP referrals are not a solution, as they depend on proper
implementation in clients, and there is no equivalent for POP.
- Opt-in for POP will not solve all mailbox corruption issues.
Mail Delivery
- As currently happens, incoming mail from authenticated users is
delivered to send.columbia.edu and incoming mail from
unauthenticated users is delivered to columbia.edu.
- Incoming mail is filtered according to its authentication status.
Currently, filtering is performed on the incoming SMTP hosts. In
this proposal, the filtering can continue on the incoming SMTP
hosts, or it can be handed off to a separate pool.
- After filtering, some mail will be immediately queued for
outgoing delivery. This includes mail sent from authenticated
users, and potentially mail from other Columbia sources that is
routed through AcIS servers for filtering. Other mail will be
queued for internal delivery.
- Mail queued for internal delivery will be handed off to a
delivery relay. The delivery relay will determine if the
recipients of each message are local or forwarded. If forwarded,
the mail will be queued for outgoing delivery. If local, the
delivery relay will determine which local mail concentrator is
responsible for the recipient and will pass the message there.
- The local mail concentrator will write two copies of the message,
one to the user's active spool and the other to a write-only
redundant spool. The redundant spools are rotated nightly, and
purged after a fixed amount of time as determined by policy (for
example, after seven days).
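The delivery relay's decision described above can be sketched as
follows. This is only an illustration: the proposal does not say how
forwarding or concentrator assignments are stored, so the two lookup
tables, the host names, and the names Route and route_recipient are
all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical data: neither the table layout nor the host names come
# from the proposal; they only illustrate the routing decision.
FORWARDS = {"alice": "alice@example.org"}
CONCENTRATORS = {"bob": "lmc1.example.edu", "carol": "lmc2.example.edu"}


@dataclass(frozen=True)
class Route:
    queue: str     # "outgoing" or "local"
    next_hop: str  # forwarding address or concentrator host


def route_recipient(user: str) -> Route:
    """Decide where the delivery relay sends mail for one recipient."""
    if user in FORWARDS:
        # Forwarded mail is queued for outgoing delivery.
        return Route("outgoing", FORWARDS[user])
    try:
        # Local mail goes to the concentrator responsible for the user.
        return Route("local", CONCENTRATORS[user])
    except KeyError:
        raise LookupError(f"no concentrator assigned for {user!r}") from None
```

In this sketch the forwarding check happens on the relay, so forwarded
mail never reaches a concentrator. Moving that check to the
concentrator would change where the first branch runs, not the logic.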
Note: If a message for a local user is forwarded by the
delivery relay directly to the outgoing relay, no copy of the
message will be written to the redundant spool. If the
forwarding is handled by the local mail concentrator, a copy can
be maintained. Thus, where .forward and
.procmailrc are processed is significant.
Note: Because messages are copied to the redundant spool
on delivery and are erased every n days, messages older
than n days that are kept in the active spool cannot be
easily retrieved if the active spool is damaged. Restoration from
tape would be necessary (or see the description of mail reading
below).
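A minimal sketch of the duplicate-on-delivery and purge behavior,
under several assumptions that are not part of the proposal: nightly
rotation is modeled as one dated directory per day, spools are plain
files appended to in raw form, and the names deliver and purge are
invented for illustration. Real mbx or mbox delivery would also need
message framing and locking, which are omitted here.

```python
from __future__ import annotations

import shutil
from datetime import date, timedelta
from pathlib import Path


def deliver(message: bytes, user: str, active_dir: Path,
            redundant_dir: Path, today: date) -> None:
    """Append one message to the active spool and to today's redundant spool."""
    active = active_dir / user
    # Assumption: "rotated nightly" is modeled as a directory per date.
    redundant = redundant_dir / today.isoformat() / user
    for spool in (active, redundant):
        spool.parent.mkdir(parents=True, exist_ok=True)
        with open(spool, "ab") as fh:
            fh.write(message)


def purge(redundant_dir: Path, today: date, keep_days: int) -> list[str]:
    """Delete dated redundant spool directories older than keep_days."""
    cutoff = today - timedelta(days=keep_days)
    removed = []
    for day_dir in sorted(redundant_dir.iterdir()):
        try:
            day = date.fromisoformat(day_dir.name)
        except ValueError:
            continue  # not a dated spool directory; leave it alone
        if day_dir.is_dir() and day < cutoff:
            shutil.rmtree(day_dir)
            removed.append(day_dir.name)
    return removed
```

With keep_days set to 7, a message delivered nine days ago survives
only in the active spool, which is the exposure the note above
describes.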
Mail Reading
- The client connects to imap.columbia.edu or
pop.columbia.edu, both of which are proxy servers.
- The proxy server determines which local mail concentrator holds
the mail spool for the user, and relays the session there.
- The local mail concentrator retrieves the spool from the active copy
on its local filesystem, and retrieves any mailboxes stored in
~/mail from the filers over NFS.
Note: One option to address the issue of spool restoration
mentioned in the mail delivery section above would be to copy the
user's entire active spool to the redundant spool on the user's
first login of the day.
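The first-login copy could be sketched as below. The proposal does not
specify how "first login of the day" is detected; this sketch assumes
that the existence of a dated copy in the redundant spool serves as
the marker. The function name, the ".full" suffix, and the directory
layout are hypothetical, and locking against concurrent delivery is
omitted.

```python
import shutil
from datetime import date
from pathlib import Path


def snapshot_on_first_login(user: str, active_dir: Path,
                            redundant_dir: Path, today: date) -> bool:
    """Copy the user's whole active spool once per day.

    Returns True if a copy was made, False if today's copy already
    exists or the user has no active spool.
    """
    source = active_dir / user
    # Assumption: a dated ".full" file marks that today's copy was taken.
    target = redundant_dir / today.isoformat() / f"{user}.full"
    if target.exists() or not source.exists():
        return False
    target.parent.mkdir(parents=True, exist_ok=True)
    # Copy to a temporary name first so a partial copy is never
    # mistaken for a complete one.
    tmp = target.with_name(target.name + ".tmp")
    shutil.copy2(source, tmp)
    tmp.replace(target)
    return True
```

A concentrator would call this before handing the session to the IMAP
or POP server, so that the copy reflects the spool as it was before
the client changed it. Users who do not log in on a given day get no
full copy that day.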
Next Steps
- Evaluate the performance and cost of each proposed component
against the performance and cost of the current infrastructure.
- Evaluate the costs and benefits of using a commercial solution
for mail filtering and, if the evaluation is positive, evaluate
and select a vendor.