[Postfixbuch-users] Hochverfügbares Mailsystem
stepken
stepken at web.de
Sun Feb 8 05:25:52 CET 2009
Very interesting is the "slots and stores" system used by Fastmail.fm,
a very, very large provider with excellent performance and several
hundred million mailboxes. It shows how to do it right: a system that
really scales, is highly available, and does not collapse under load
the way a RAID5, RAID10 or similar setup can... The approach is also
suitable for other highly available server systems. GlusterFS is also
very interesting.
Have fun, Guido Stepken
Rob Müller of Fastmail.fm on this:
We don't use a murder setup, for two main reasons: 1) Murder wasn't
very mature when we started. 2) The main advantage murder gives you is
a set of proxies (imap/pop/lmtp) to connect users to the appropriate
backends, which we ended up using other software for, and a unified
mailbox namespace if you want to do mailbox sharing, something we
didn't really need either. Also, the unified mailbox namespace needs a
global mailboxes.db somewhere. As it was, because the skiplist backend
mmaps the entire mailboxes.db file into memory, and we had multiple
machines with 100M+ mailboxes.db files, I didn't really like the idea
of dealing with a 500M+ mailboxes.db file.
We don't use shared SAN storage. When we started out we didn't have
that much money, so purchasing an expensive SAN unit wasn't an option.
What we have has evolved over time into our current setup. Basically we
now have a hardware set that is quite nicely balanced with regard to
spool IO vs metadata IO vs CPU, and a storage configuration that gives
us replication with good failure handling, but without having to waste
lots of hardware on machines that are only replicas.
IMAP/POP frontend - We used to use perdition, but have now changed to
nginx (http://blog.fastmail.fm/?p=592). As you can read from the linked
blog post, nginx is great.
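To illustrate how such a frontend routes users (a sketch only, not
Fastmail's configuration): nginx's mail proxy asks an auth_http service
which backend a given login lives on. A minimal Python responder for
that protocol could look like the following, where the listening port
and the user-to-backend map are made up for illustration:

# Hypothetical sketch (not Fastmail's code): a minimal auth_http backend
# for nginx's mail proxy. nginx sends the login as Auth-User/Auth-Pass
# headers and expects Auth-Status/Auth-Server/Auth-Port in the reply,
# which tell it which IMAP/POP backend to connect the user to.
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed mapping of users to backend servers; a real setup would
# query a user database instead.
USER_BACKEND = {"alice@example.com": ("10.0.0.11", 143)}

class AuthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        user = self.headers.get("Auth-User", "")
        backend = USER_BACKEND.get(user)
        self.send_response(200)
        if backend is None:
            self.send_header("Auth-Status", "Invalid login or password")
        else:
            host, port = backend
            self.send_header("Auth-Status", "OK")
            self.send_header("Auth-Server", host)
            self.send_header("Auth-Port", str(port))
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 9143), AuthHandler).serve_forever()

nginx then opens the IMAP/POP connection to whatever Auth-Server and
Auth-Port the service returns, so the per-user routing logic stays in
that one small service.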
LMTP delivery - We use a custom-written Perl daemon that forwards LMTP
deliveries from Postfix to the appropriate backend server. It also does
the spam scanning, virus checking and a bunch of other in-house stuff.
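The routing idea can be sketched in Python (the real daemon is in-house
Perl and also does the scanning mentioned above); the hostname, port
and lookup table below are assumptions for illustration only:

# Rough sketch only, not the in-house daemon: route a message to the
# backend that hosts the recipient's mailbox and hand it over via LMTP.
import smtplib

# Assumed recipient-to-backend map; the real daemon would query a
# user database and run spam/virus checks before delivery.
RCPT_BACKEND = {"alice@example.com": ("backend1.internal", 2003)}

def deliver(sender, rcpt, message_bytes):
    host, port = RCPT_BACKEND[rcpt]          # which store holds this user
    with smtplib.LMTP(host, port) as lmtp:   # LMTP, not SMTP, to the backend
        lmtp.sendmail(sender, [rcpt], message_bytes)

deliver("bob@example.org", "alice@example.com",
        b"From: bob@example.org\r\nSubject: hi\r\n\r\nhello\r\n")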
Servers - We use servers with attached SATA-to-SCSI RAID units with
battery-backed caches. We have a mix of large drives for the email
spool, and smaller, faster drives for metadata. That's the reason we
sponsored the metapartition config options
(http://cyrusimap.web.cmu.edu/imapd/changes.html).
Replication - We initially started with pairs of machines, with half
of each machine holding masters and half holding replicas, the two
replicating to each other. But that meant that on a failure, one
machine became fully loaded with masters, and masters take a much
bigger IO hit than replicas. Instead we went with a system we call
"slots" and "stores". Each machine is divided into a set of "slots".
"Slots" from different machines are then paired as a replicated
"store" with a master and a replica. So say you have 20 slots per
machine (half master, half replica) and 10 machines; then if one
machine fails, on average you only have to distribute one more master
slot to each of the other machines. Much better on IO. Some more
details in this blog post on our replication trials...
http://blog.fastmail.fm/?p=576
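A small toy model of that arithmetic (my assumptions, not Fastmail's
tooling): with 10 machines, 10 master slots each, and every replica
slot placed on a different machine, losing one machine promotes roughly
one replica per surviving machine:

# Toy model of "slots and stores": pair master/replica slots across
# machines, then count how many masters each surviving machine picks
# up when one machine fails.
import itertools
from collections import Counter

MACHINES = [f"m{i}" for i in range(10)]   # 10 machines
MASTERS_PER_MACHINE = 10                  # 20 slots each: 10 master, 10 replica

# Build stores: each master slot gets its replica on some *other*
# machine, spread round-robin over the remaining machines.
stores = []
for m in MACHINES:
    others = itertools.cycle(x for x in MACHINES if x != m)
    for _ in range(MASTERS_PER_MACHINE):
        stores.append({"master": m, "replica": next(others)})

failed = "m0"
# On failure, every store whose master was on the failed machine
# promotes its replica to master.
promoted = Counter(s["replica"] for s in stores if s["master"] == failed)
print(promoted)   # roughly one extra master per surviving machine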
Yep, this means we need quite a bit more software to manage the setup,
but now that it's done, it's quite nice and works well. For maintenance,
we can safely fail all masters off a server in a few minutes, about
10-30 seconds a store. Then we can take the machine down, do whatever we
want, bring it back up, wait for replication to catch up again, then
fail any masters we want back onto the server.
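As a purely hypothetical sketch of that drain procedure (the real
tooling is in-house and unreleased), the per-store failover can be
thought of as: wait for the replica to catch up, then swap roles:

# Hypothetical sketch of the maintenance drain described above; these
# Store objects are simulated stand-ins for the unreleased tooling.
import time

class Store:
    def __init__(self, name, master_host, replica_host):
        self.name = name
        self.master_host = master_host
        self.replica_host = replica_host
        self.replica_lag = 0   # pretend replication is already caught up

    def fail_over(self):
        # Wait for the replica to catch up, then swap roles (roughly
        # 10-30 s per store in the real system; instant here).
        while self.replica_lag > 0:
            time.sleep(1)
        self.master_host, self.replica_host = self.replica_host, self.master_host

def drain_machine(machine, stores):
    """Move every master off `machine` so it can be taken down safely."""
    for store in stores:
        if store.master_host == machine:
            store.fail_over()

stores = [Store("s1", "imap1", "imap2"), Store("s2", "imap2", "imap1")]
drain_machine("imap1", stores)
print([(s.name, s.master_host) for s in stores])   # no masters left on imap1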
Unfortunately most of this software is in-house and quite specific to
our setup. It's not very "generic" (e.g. it assumes particular disk
layouts and sizes, machines, database tables, hostnames, etc.) in how
it manages and tracks everything, so it's not something we're going to
release.
Rob
More information about the Postfixbuch-users mailing list