No, DRBD doesn’t magically make your application crash safe

It is a common misconception that DRBD (or any block-level data replication solution) can magically make an application crash-safe when it intrinsically isn’t. Baron highlights that misconception in a recent blog post.

I want to reiterate and stress that point here: if your application can’t reliably survive a node crash, it won’t successfully fail over on a replicated (or shared, for that matter) data device. But if it can, and DRBD is replicating synchronously, then DRBD won’t break it. In other words: try pulling the power plug on your machine while your app is running, then power it back on. If your application recovers to a consistent state, you’re clear. If it doesn’t, don’t bother adding DRBD until you fix that.
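To make “recovers to a consistent state” a bit more concrete, here is a minimal sketch of one way an application can update a file so that a power cut leaves either the old or the new contents on disk, never a half-written mix. The function name atomic_write is made up for this illustration, and the sketch assumes a POSIX system and a journaling file system underneath; it is not taken from Baron’s post or from any DRBD documentation.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data`; a crash leaves the old or the new file intact.

    Hypothetical helper for illustration only. Assumes POSIX semantics.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory, so the final
    # rename stays within one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the file contents to stable storage
        os.replace(tmp_path, path)  # atomic rename on POSIX file systems
        # fsync the directory so the rename itself survives a crash.
        dir_fd = os.open(directory, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

An application that writes its state like this passes the power-plug test for that file, with or without DRBD underneath. One that overwrites the file in place does not, and replication won’t change that.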

You must fix any layer in your stack that isn’t crash safe, if you even want to start thinking about high availability. ext2, which Baron mentions in his post, isn’t crash safe. MySQL with a database using the MyISAM storage engine isn’t crash safe. KVM with virtual block devices in cache=writeback mode isn’t crash safe. Running on a RAID controller with the write cache enabled when its battery is dead isn’t crash safe.

Thus, if you want high availability, use ext3. Or ext4. Or any journaling file system. Use InnoDB for MySQL. Use cache=none for KVM. And check those batteries. It’s that simple.
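As a rough illustration of checking one of these layers, the sketch below scans a libvirt domain definition for virtual disks whose cache mode is anything other than none, writethrough or directsync. The function name risky_disks and the set of “safe” modes are assumptions made for this example; an unset cache attribute is reported as “default” and flagged too, since what the default means depends on your QEMU and libvirt versions.

```python
import xml.etree.ElementTree as ET

# Assumption for this sketch: these modes do not leave acknowledged
# writes sitting only in the host's page cache.
SAFE_CACHE_MODES = {"none", "writethrough", "directsync"}


def risky_disks(domain_xml: str) -> list:
    """Return (target device, cache mode) pairs for disks worth reviewing."""
    root = ET.fromstring(domain_xml)
    findings = []
    for disk in root.findall("./devices/disk"):
        driver = disk.find("driver")
        target = disk.find("target")
        # A missing cache attribute means "hypervisor default".
        cache = driver.get("cache", "default") if driver is not None else "default"
        dev = target.get("dev", "?") if target is not None else "?"
        if cache not in SAFE_CACHE_MODES:
            findings.append((dev, cache))
    return findings


# Hypothetical domain definition with one writeback disk and one cache=none disk.
EXAMPLE = """
<domain type='kvm'>
  <devices>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='writeback'/>
      <source dev='/dev/drbd0'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/drbd1'/>
      <target dev='vdb' bus='virtio'/>
    </disk>
  </devices>
</domain>
"""

print(risky_disks(EXAMPLE))  # prints [('vda', 'writeback')]
```

You could feed it the output of virsh dumpxml for each guest. It only reads the configuration; it says nothing about the file system, the storage engine or the controller battery, which still need checking by other means.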

5 Responses to No, DRBD doesn’t magically make your application crash safe

  1. Robert says:

    Nice point, some customers expect “the IT magic” to happen and forget about proper end-to-end system design.

    Just a short question about “…But if it can, then DRBD won’t break it…”: does this now also hold for asynchronous DRBD with multiple LUNs?

    I remember that some time ago, write-order consistency was an issue with multiple LUNs in a VG when running DRBD asynchronously. Does DRBD now provide consistency groups? On the “roadmap” page of the website, I found that they still need to be implemented.

    Regards,
    Robert

    • Florian Haas says:

      Robert, post updated to specifically mention synchronous replication. Thanks for pointing that out. Even when replicating asynchronously, though, the block device on the peer will always be in a “consistent” state from the block layer’s perspective. And if your app is crash safe, it will recover properly from that. But since the peer may have lagged slightly behind its master at the time of the crash, you may recover to, say, one transaction earlier than you were. Such is the nature of asynchronous replication.

      And no, we don’t guarantee write-after-write dependency across multiple devices. But if you want to carve your DRBD into volumes, just do it the other way round: make your DRBD an LVM PV, create a volume group from that, and split that into multiple logical volumes.

  2. Shlomi Noach says:

    Hi Florian,

    Having a failed battery does not affect DRBD, am I wrong? The data should be replicated regardless of battery.

    • Florian Haas says:

      Well, certainly the data will be replicated. The question is whether it ever makes it to the physical disk. This problem is actually rather easy to resolve: configure your controller to switch to write-through mode automatically once the battery dies, or use a controller with a flash-and-capacitor-backed write cache.

  3. [...] heartbeat monitoring script can make nearly any application highly available, as long as it can recover cleanly from a [...]
