Mail Architecture Proposal

Status
This document is a proposal.

Content

Background

There are several problems with our existing mail architecture.
  1. Current mail server pools seem larger than necessary. For example, the POP/IMAP pool consists of 16 servers (of various sizes), while the SMTP pool consists of 12.

  2. The central mail storage is on Network Appliance filers, which only operate over NFS, and are expensive to purchase and maintain.

  3. Being dependent on NFS limits our ability to switch to more efficient mailbox formats that may become more desirable as user spools become larger.

  4. The most significant feature of the Network Appliance filers is snapshots, which allow us to restore users' spools. However, there can be a gap of up to two hours between the time a spool is damaged and the time a snapshot is taken.

There are several benefits to our existing mail architecture.
  1. The Network Appliance filers provide very high performance for network-based storage, have very high availability, and have an excellent track record of support.

  2. The complexity of the service is hidden from users, who generally need only to know imap.columbia.edu and send.columbia.edu.

  3. The service is based entirely on free, open source software.

Proposed Architecture

The centerpiece of the proposed architecture is a small pool of servers referred to here as local mail concentrators. These are mail hubs that manage user spools (inboxes) locally and provide access to the NFS-based ~/mail via IMAP, POP, and SMTP.

The local spools would be stored in mbx or a similar indexed format on some form of RAID. The mailboxes stored in ~/mail would remain in mbox format.

Assumptions

  1. Demand for access to INBOX/mbox is significantly greater than that for access to ~/mail.

  2. Duplicating on delivery provides an acceptable alternative to snapshots.

  3. IMAP referrals are not a solution, because they depend on proper implementation in clients and there is no equivalent for POP.

  4. Opt-in for POP will not solve all mailbox corruption issues.

Mail Delivery

  1. As currently happens, incoming mail from authenticated users is delivered to send.columbia.edu and incoming mail from unauthenticated users is delivered to columbia.edu.

  2. Incoming mail is filtered according to its authentication status. Currently, filtering is performed on the incoming SMTP hosts. In this proposal, the filtering can continue on the incoming SMTP hosts, or it can be handed off to a separate pool.

  3. After filtering, some mail will be immediately queued for outgoing delivery. This includes mail sent from authenticated users, and potentially mail from other Columbia sources that is routed through AcIS servers for filtering. Other mail will be queued for internal delivery.

  4. Mail queued for internal delivery will be handed off to a delivery relay. The delivery relay will determine if the recipients of each message are local or forwarded. If forwarded, the mail will be queued for outgoing delivery. If local, the delivery relay will determine which local mail concentrator is responsible for the recipient and will pass the message there.

  5. The local mail concentrator will write two copies of the message, one to the user's active spool and the other to a write-only redundant spool. The redundant spools are rotated nightly, and purged after a fixed amount of time as determined by policy (for example, after seven days).

    Note: If a message for a local user is forwarded by the delivery relay directly to the outgoing relay, no copy of the message will be written to the redundant spool. If the forwarding is handled by the local mail concentrator, a copy can be maintained. Thus, where .forward and .procmailrc are processed is significant.

    Note: Because messages are copied to the redundant spool on delivery and are erased after n days, messages older than n days that are kept in the active spool cannot be easily retrieved if the active spool is damaged. Restoration from tape would be necessary (or see the description of mail reading, below).
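
The delivery relay decision in step 4 and the duplicate write in step 5 can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and not part of the proposal: the concentrator host names, the hash-based mapping of users to concentrators, the directory layout (one redundant directory per day, standing in for nightly rotation), and the seven-day retention value taken from the example in step 5. The sketch appends raw bytes and ignores mailbox framing, locking, and .forward/.procmailrc processing.

```python
import hashlib
import shutil
from datetime import date, timedelta
from pathlib import Path

# Hypothetical values; the proposal does not name hosts or fix a retention period.
CONCENTRATORS = ["lmc1", "lmc2", "lmc3"]
RETENTION_DAYS = 7


def concentrator_for(user: str) -> str:
    """Map a user to a concentrator with a stable hash (an assumed scheme)."""
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    return CONCENTRATORS[int.from_bytes(digest[:4], "big") % len(CONCENTRATORS)]


def route(user: str, forwards: dict[str, str]) -> tuple[str, str]:
    """Delivery relay decision: forwarded mail goes out, local mail goes to a concentrator."""
    if user in forwards:
        return ("outgoing", forwards[user])
    return ("concentrator", concentrator_for(user))


def deliver(root: Path, user: str, message: bytes, today: date) -> None:
    """Append the message to the active spool and to today's redundant spool."""
    active = root / "active" / user
    redundant = root / "redundant" / today.isoformat() / user
    for spool in (active, redundant):
        spool.parent.mkdir(parents=True, exist_ok=True)
        with spool.open("ab") as fh:
            fh.write(message)


def purge(root: Path, today: date) -> list[str]:
    """Delete redundant spool directories older than RETENTION_DAYS; return their names."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    removed: list[str] = []
    redundant_root = root / "redundant"
    if not redundant_root.is_dir():
        return removed
    for day_dir in sorted(redundant_root.iterdir()):
        if date.fromisoformat(day_dir.name) < cutoff:
            shutil.rmtree(day_dir)
            removed.append(day_dir.name)
    return removed
```

In this sketch, a message that route() sends to "outgoing" never reaches deliver(), so no redundant copy is written. This is the situation described in the first note above, and it is why the place where forwarding is processed matters.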

Mail Reading

  1. The client connects to imap.columbia.edu or pop.columbia.edu, both of which are proxy servers.

  2. The proxy server determines which local mail concentrator holds the mail spool for the user, and relays the session there.

  3. The local mail concentrator retrieves the spool from the active copy on its local filesystem, and retrieves any mailboxes stored in ~/mail from the filers over NFS.

    Note: One option to address the issue of spool restoration mentioned in the mail delivery section above would be to copy the user's entire active spool to the redundant spool on the user's first login of the day.
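
The first-login copy described in the note above might look like the following Python sketch. The directory layout, the ".login-copy" file suffix, and the function name are hypothetical; the proposal does not specify how such a copy would be stored. The sketch also does not lock the spool, so a real implementation would need to coordinate with concurrent delivery.

```python
import shutil
from datetime import date
from pathlib import Path


def snapshot_on_first_login(root: Path, user: str, today: date) -> bool:
    """Copy the user's active spool into today's redundant area, at most once per day.

    Returns True if a copy was made, and False if today's copy already
    exists or the user has no active spool yet.
    """
    active = root / "active" / user
    # Hypothetical naming: a per-day directory with a suffix marking full copies.
    copy_path = root / "redundant" / today.isoformat() / f"{user}.login-copy"
    if copy_path.exists() or not active.exists():
        return False
    copy_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(active, copy_path)
    return True
```

Because the copy is taken only on the first login of the day, mail that arrives later that day is covered only by the copies written on delivery.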

Next Steps

  1. Evaluate the performance and cost of each proposed component against the performance and cost of the current infrastructure.

  2. Evaluate the costs and benefits of using a commercial solution for mail filtering and, if the evaluation is positive, evaluate and select a vendor.