Every once in a while, people ask us something along the lines of “why do I need DRBD? Can’t I accomplish what it does by other means?” You mean build high availability clusters with block-level synchronous replication? Well, sure you can. But all of the available alternatives have serious drawbacks.
- Use a SAN. Well, SANs are great in terms of management, even though they’re sometimes prohibitively expensive. But your regular SAN box does not offer physically distributed redundancy at the data level. In other words, if your SAN box goes down, all your beautiful high availability infrastructure is torn to shreds. And even if you naïvely believe (and you shouldn’t) that a storage box can never crash, just think about the air conditioning going down in just that part of your data center where your storage shelf sits. A repair time of several hours means a downtime of several hours, even though your servers in a different cabinet may be up and running. You’re dealing with a single point of failure. Not high availability in my book.
- Use a SAN with native replication. This means using two separate storage boxes with synchronous block-level replication between them. This eliminates the above-mentioned SPOF and is available from just about any SAN vendor (using proprietary implementations under various product names). The downside is that it costs you serious bucks, and I am not referring to just the additional piece of hardware. Those firmware licenses can hit six figures. Plus, switchover times (changing the direction of replication) can be extremely long, up to 4 minutes in some cases. And there is little to no support for replication management from open source cluster management software.
- Use a SAN with host-based mirroring. This means that you have two separate SAN boxes, hosts import LUNs from both, and mirror those pairs using software RAID (such as md). Eliminates the SPOF and saves you dollars on firmware licensing. Downsides: you still need a SAN (and the associated infrastructure — fibre channel, for example, isn’t exactly cheap), and as such your clusters are still not shared-nothing. And the integration with open source cluster management is also lacking.
- Use host-based mirroring between a local device and a network block device. In this case, you have one disk that is local, and another that is exported from a remote host using NBD or iSCSI. Those two disks are then mirrored with software RAID. Now this one is really terrible in terms of management. Role reversal always requires some custom glue, no support from cluster managers is available whatsoever, and split brain detection is poor or non-existent. So if you really want to go down that path, then by all means do, but please don’t call it a high availability cluster.
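To illustrate what that last setup actually involves, here is a rough sketch of the NBD-plus-md approach on the importing host. Device names, the export name `backing`, and the hostname `remote-host` are made-up examples; the exact `nbd-client` invocation depends on your NBD server configuration, and all of this requires root:

```shell
# Attach the remote export (assumed name "backing") to /dev/nbd0.
# Hostname, export name, and device paths are assumptions for illustration.
nbd-client -N backing remote-host /dev/nbd0

# Mirror the local disk and the network block device with md software RAID.
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb1 /dev/nbd0

# From here on, /dev/md0 is the "replicated" device. But note what is missing:
# on failover you must re-assemble the array and re-point the NBD client by
# hand. That is the "custom glue" referred to above, and nothing here detects
# or resolves split brain for you.
```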
Compare this to DRBD: no need for a SAN, so you can use it to operate a fully shared-nothing cluster. No firmware licensing cost. No need for expensive infrastructure as everything can replicate over regular IP networks. Role switch in a matter of seconds. Tight integration with both Pacemaker and Red Hat Cluster Suite. Excellent split brain detection to make sure you don’t wreck your data accidentally. So if you are considering alternatives then that’s perfectly fine, but our soaring usage numbers are there for a reason.
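For contrast, here is roughly what the DRBD side of this looks like: one resource definition, identical on both nodes. The hostnames, backing device, and addresses below are made up for illustration; consult the DRBD User’s Guide for the authoritative syntax:

```
resource r0 {
  device    /dev/drbd0;
  disk      /dev/sdb1;
  meta-disk internal;

  on alice {
    address 10.1.1.31:7789;
  }
  on bob {
    address 10.1.1.32:7789;
  }
}
```

From there, the cluster manager (Pacemaker, for instance) handles promotion and demotion of the resource, which is why role switches complete in seconds rather than minutes.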