“Alternatives” to DRBD

Every once in a while, people ask us something along the lines of “why do I need DRBD? Can’t I accomplish what it does by other means?” You mean build high availability clusters with block-level synchronous replication? Well, sure you can. But all of the available alternatives have serious drawbacks.

Use a SAN. Well, SANs are great in terms of management, even though they’re sometimes prohibitively expensive. But your regular SAN box does not offer physically distributed redundancy at the data level. In other words, if your SAN box goes down, all your beautiful high availability infrastructure turns to shreds. And even if you naïvely believe (and you shouldn’t) that a storage box can never crash, just think about air conditioning going down in just that part of your data center where your storage shelf sits. A repair time of several hours means downtime of several hours, even though your servers in a different cabinet may be up and running. You’re dealing with a single point of failure. Not high availability in my book.
Use a SAN with native replication. This means using two separate storage boxes with synchronous block-level replication between them. This eliminates the above-mentioned SPOF and is available from just about any SAN vendor (using proprietary implementations under various product names). The downside is that it costs you serious bucks, and I am not referring to just the additional piece of hardware. Those firmware licenses can hit six figures. Plus, switchover times (changing the direction of replication) can be extremely long, up to 4 minutes in some cases. And there is little to no support for replication management from open source cluster management software.
Use a SAN with host-based mirroring. This means that you have two separate SAN boxes, hosts import LUNs from both, and mirror those pairs using software RAID (such as md). Eliminates the SPOF and saves you dollars on firmware licensing. Downsides: you still need a SAN (and the associated infrastructure — fibre channel, for example, isn’t exactly cheap), and as such your clusters are still not shared-nothing. And the integration with open source cluster management is also lacking.
Use host-based mirroring between a local device and a network block device. In this case, you have one disk that is local, and another that is exported from a remote host using NBD or iSCSI. Those two disks are then mirrored with software RAID. Now this one is really terrible in terms of management. Role reversal always requires some custom glue, no support from cluster managers is available whatsoever, and split brain detection is poor or nonexistent. So if you really want to go down that alley then do, but please don’t call it a high availability cluster.

Compare this to DRBD: no need for a SAN, so you can use it to operate a fully shared-nothing cluster. No firmware licensing cost. No need for expensive infrastructure as everything can replicate over regular IP networks. Role switch in a matter of seconds. Tight integration with both Pacemaker and Red Hat Cluster Suite. Excellent split brain detection to make sure you don’t wreck your data accidentally. So if you are considering alternatives then that’s perfectly fine, but our soaring usage numbers are there for a reason.
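
To make the split brain point a bit more concrete, here is a toy sketch in Python of one common way to detect it: each node tags its data with a generation ID and remembers the older ones, and on reconnect the two sides compare notes. This is an illustration only. The names `Node`, `start_new_generation` and `compare_on_reconnect` are made up for this example, and the logic is a simplification that is not taken from DRBD’s actual implementation.

```python
# Toy model of split brain detection via data generation IDs.
# Illustration only: all names here are hypothetical, and this is
# not DRBD's actual algorithm or metadata format.
import uuid
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    current: str                                  # ID of the generation being written now
    history: list = field(default_factory=list)   # older generation IDs, newest first

    def start_new_generation(self):
        """Called when this node starts accepting writes without its peer."""
        self.history.insert(0, self.current)
        self.current = uuid.uuid4().hex


def compare_on_reconnect(a: Node, b: Node) -> str:
    """Decide what to do when two nodes see each other again."""
    if a.current == b.current:
        return "in sync"
    if b.current in a.history:
        return f"{a.name} is ahead, resync {b.name}"
    if a.current in b.history:
        return f"{b.name} is ahead, resync {a.name}"
    # Neither side's data descends from the other's: both wrote independently.
    return "split brain, refuse automatic resync"


if __name__ == "__main__":
    gen0 = uuid.uuid4().hex
    node_a, node_b = Node("node-a", gen0), Node("node-b", gen0)

    node_a.start_new_generation()   # link drops, node-a keeps writing
    print(compare_on_reconnect(node_a, node_b))

    node_b.start_new_generation()   # node-b got promoted as well
    print(compare_on_reconnect(node_a, node_b))
```

The point of the sketch is the last branch: a plain software RAID mirror over NBD or iSCSI has no such record of who wrote what since the nodes last agreed, so it cannot tell a stale mirror half from a diverged one.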

This entry was posted on Wednesday, September 16th, 2009 at 8:59 and is filed under Pacemaker, Technical.

10 Responses to “Alternatives” to DRBD

Marc Cousin says:

September 16, 2009 at 11:41

Hi. While I totally agree with your conclusion, there is another costly SAN option: you can add an extra layer, such as FalconStor’s IPStor, to do the mirroring between two SAN arrays. It is not a cheap solution (six figures as well), but at least there is no switchover downtime when an array goes down (and I agree with you, it happens, and it is complete chaos when it does).

- Florian Haas says:
  
  September 16, 2009 at 11:45
  
  Yes, so that’s SAN with a replication that is neither “native” (as in running inside the SAN controller firmware) nor “host-based” (as in running on the cluster host). Still, as you say, cost is a huge factor there as well. Generally you can resolve almost any high availability challenge by throwing more dollars at it, the question is whether you’re still being cost effective.
  
bob says:

November 10, 2009 at 14:54

So… what’s the problem with sw RAID over two iSCSI?

* no licenses
* no fibrechannel infrastructure
* no problem?

It’s an honest question, it seems you left that one out.

- Florian Haas says:
  
  November 10, 2009 at 16:36
  
  If you want to use this for failover, it’s exactly the same manageability nightmare as the option with MD and NBD. And if you want to use iSCSI, why bother with host based mirroring on the initiator end when you can instead make your iSCSI target highly available, replicated, and redundant at the data level?
  
  - bob says:
    
    November 12, 2009 at 20:18
    
    What is more nightmarish about host-based replication (say RAID 1) to two iSCSI targets than about DRBD? Isn’t it basically the same model?
    
    Well, except that clustered FSs (GFS, OCFS2) may not work great on top of an individually but concurrently assembled RAID. But if you’re not running clustered FSs then that shouldn’t matter.
    
    And the point of not making the iSCSI target redundant is that redundancy is what makes traditional SANs expensive, isn’t it?
    
    - Florian Haas says:
      
      November 12, 2009 at 20:28
      
      It’s definitely not the same model, as MD over iSCSI is not cluster aware.
      And what I meant is not to purchase a proprietary redundant storage device and spend tons of bucks, but instead configure a DRBD-backed open source iSCSI target with Pacemaker to serve as your SAN. Using commodity hardware. With an 80% cost reduction.
      
      - bob says:
        
        November 15, 2009 at 13:29
        
        I’m not getting what the difference is though in practice and how “events” are handled.
        
        Say you have two separate iSCSI targets. You can use these in two ways:
        
        1) DRBD. Attach one on machine A, and the other on machine B. Connect them with DRBD and mount the filesystem on machine A only. Run the app or whatever on machine A. When machine A or the iSCSI target attached to machine A fails, machine B becomes master and will have to trigger a script to mount the filesystem and start the app.
        
        2) MD. Attach both iSCSI targets on machine A, mount and run app. When machine A fails, attach, mount & run on machine B instead. When one iSCSI target fails, continue running on the existing one.
        
        Now that I’ve written it down I see the problem with option 2. How do I make sure that only one of the machines has read/write access to the iSCSI targets at any one time? There is no “lock” here.
        
        Is that what you meant?
Adam says:

December 21, 2009 at 17:10

I’m with bob. Sounds like DRBD adds nothing to MD+NBD other than split-brain detection.

– a

- Florian Haas says:
  
  January 7, 2010 at 10:01
  
  I wonder why there’s zero integration in any cluster stack other than SteelEye (commercial) for the MD/NBD bundle then. Maybe you want to write one?
  
  - Aleš Kapica says:
    
    January 5, 2012 at 0:17
    
    In my cluster environment, I use software RAID 6 arrays built from NBD devices for the virtual machines’ system disks. DRBD is used too, as shared storage for it.
    

Florian's blog