I recently came across this blog post with the catchy title of “DRBD and MySQL: Just Say No”. Now while I have absolutely no issue with people not liking DRBD or finding that it doesn’t fit their needs, I couldn’t help but notice that the post recycles some persistent myths about DRBD, which could use some correction.
I tried to reply with a blog comment, but alas, it seems my reply was moderated to /dev/null. Enter the “Write Post” button on my trusted WordPress dashboard.
So let’s look at the alleged “MySQL with DRBD Minuses” listed in said post:
DRBD partition corruption means failover node would be unusable (disadvantage of shared storage) and failback could destroy original master too.
If the filesystem that sits on top of DRBD gets corrupted, it will be equally corrupted on the peer node after failover. DRBD is a block device; it is agnostic of layers above it. That much is correct. Which incidentally is also the reason why you can use DRBD to add HA not only to databases, but to file services, virtualization, storage, etc. But I’m going off on a tangent.
“Failback could destroy the original master too”, however, is plain false. DRBD won’t “destroy the original master” any more than it already was if the filesystem on top of DRBD was fried beforehand.
Now as for the actual partition (meaning the backing device DRBD resides upon), DRBD adds to data security, not the opposite. DRBD will automatically detach from backing devices that throw I/O errors. And if you happen to get random errors (“bit flips”) in local data blocks due to I/O subsystem malfunction, online verify will catch that.
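For reference, online verify is enabled by picking a checksum algorithm in the resource configuration and triggered with drbdadm. A minimal sketch, with a made-up resource name and assuming DRBD 8.3-style syntax (in 8.4 and later, verify-alg moved into the net section):

```
resource mysql {
  syncer {
    verify-alg sha1;   # checksum used to compare blocks between the peers
  }
  # ... device, disk, address, meta-disk definitions as usual ...
}
```

A periodic `drbdadm verify mysql` (a weekly cron job is typical) then compares the two replicas block by block and logs any out-of-sync blocks, which a disconnect/reconnect cycle will resynchronize.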
If the master panics, then after failover both fsck and transaction logs replay must be performed.
Transaction log replay, yes. But fsck? On a journaling filesystem, this amounts to a journal replay, which takes under a second in most circumstances.
NIC and network corruption is also propagated.
False again. We have end-to-end replication integrity checking to prevent just that.
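The feature in question is the data-integrity-alg network option: DRBD checksums every block as it leaves the Primary and verifies that checksum on the Secondary before writing. A sketch, again with an invented resource name:

```
resource mysql {
  net {
    data-integrity-alg sha1;  # verify each replicated block on arrival
  }
  # ...
}
```

If a block arrives with a checksum mismatch — think flaky NIC, bad cable, buggy offload engine — DRBD drops the connection and the block is retransmitted rather than silently written to disk.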
Failover node is a cold standby, cannot accept database traffic if that would change the DRBD partition.
The failover node is a hot standby; it’s just not a running slave node from the database’s standpoint. And nothing stops you from running two databases on two servers on two DRBD devices laid out in a “criss-cross” fashion, with both converging on one node in case of node failure.
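A criss-cross layout is simply two independent DRBD resources with the preferred Primary roles split across the nodes. A sketch with invented host, disk, and device names:

```
resource db1 {
  device /dev/drbd0;
  on alice { disk /dev/sdb1; address 10.0.0.1:7788; meta-disk internal; }
  on bob   { disk /dev/sdb1; address 10.0.0.2:7788; meta-disk internal; }
}
resource db2 {
  device /dev/drbd1;
  on alice { disk /dev/sdc1; address 10.0.0.1:7789; meta-disk internal; }
  on bob   { disk /dev/sdc1; address 10.0.0.2:7789; meta-disk internal; }
}
```

In normal operation, alice runs the database on db1 and bob the one on db2; if either node dies, the cluster manager promotes both resources on the survivor and runs both databases there.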
Could generate a lot of network traffic.
On a busy database, yes. But if you follow our design guidelines that always recommend a separate DRBD replication connection, this won’t hurt your application at all.
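A “separate replication connection” just means pointing the resource’s address statements at a dedicated back-to-back link rather than the LAN your application uses. For example (addresses invented):

```
resource mysql {
  on alice {
    address 192.168.100.1:7788;  # dedicated crossover link, not the LAN interface
    # ...
  }
  on bob {
    address 192.168.100.2:7788;
    # ...
  }
}
```

That way, replication traffic and client traffic never compete for the same wire.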
Cannot do maintenance on cold standby database.
But you can do anything you want with a database run off an LVM snapshot of the DRBD device. That works on a Secondary node, too.
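For the record, this is the sort of thing I mean — assuming the DRBD backing device sits on an LVM logical volume; the volume group, LV, and mount point names are made up:

```
# On the Secondary node:
lvcreate --snapshot --size 2G --name mysql-snap /dev/vg0/mysql
mount /dev/vg0/mysql-snap /mnt/mysql-snap
# Run backups, consistency checks, or a throwaway database
# instance against the snapshot, then clean up:
umount /mnt/mysql-snap
lvremove -f /dev/vg0/mysql-snap
```

The snapshot is writable, so the filesystem can replay its journal on mount even while the origin device stays under DRBD’s control.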
2 heartbeats needed on a reliable, local network.
So I don’t see how this would be a minus, but then maybe that’s just me.
So to sum up: yes, we’ve seen all this before, and surprise, surprise, Eric Bergen’s “DRBD in the real world” has been quoted in that post as well. Now while I concede that some of the points Eric made were valid at the time (and some continue to be), a lot of what he said then is now outdated, superseded, or has been addressed in DRBD releases made months ago. But to Eric’s credit, he fostered a lively discussion in the comments to his post, so I do encourage you to take a look.