DRBD limitations (or are they?)

I’ve recently received one database admin’s personal list of DRBD’s perceived limitations. While I’m certainly the last person to say that DRBD is limitation-free (hey, it’s software), I’d like to address these specifically — because really, in my humble opinion, most of them aren’t limitations at all.

The Secondary host sits idle, that’s wasted investment.

With DRBD and any active/passive failover solution, you always have two options:

  1. Run all resources on one host, and let the other take over when that host fails. This means one host is practically idle at any given time, but your application takes no performance hit after a failover.
  2. Run half of your resources on one host and the rest on the other. When one host fails, all resources converge on the surviving host. This means better hardware utilization, but increased load in the failed-host case and a possible performance penalty.

DRBD/Heartbeat is well suited for both scenarios. It is simply up to the administrator to decide whether the priority is on utilization when both hosts are alive, or on performance when one goes down.
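
To make the trade-off concrete, here is a small Python sketch of the two layouts. It is only a model: the host names, the resource names, and the load figures (percent of one host's capacity) are made up for illustration, and nothing here talks to DRBD or Heartbeat.

```python
def place(resources, hosts, split):
    """Return {host: [resources]}: everything on hosts[0], or spread round-robin."""
    layout = {h: [] for h in hosts}
    for i, res in enumerate(resources):
        target = hosts[i % len(hosts)] if split else hosts[0]
        layout[target].append(res)
    return layout


def fail_over(layout, failed):
    """Move the failed host's resources to the first surviving host."""
    survivors = [h for h in layout if h != failed]
    result = {h: list(layout[h]) for h in survivors}
    result[survivors[0]].extend(layout[failed])
    return result


def load(layout, cost):
    """Sum the (hypothetical) load of the resources on each host."""
    return {h: sum(cost[r] for r in rs) for h, rs in layout.items()}


# Hypothetical figures: percent of one host's capacity used by each resource.
cost = {"mysql": 40, "httpd": 30}
hosts = ["alice", "bob"]

for split in (False, True):
    before = place(list(cost), hosts, split)
    after = fail_over(before, "alice")
    print("split" if split else "one host", load(before, cost), load(after, cost))
```

In the first layout the survivor ends up carrying the same load the active host carried before, so the application sees no change. In the second, the survivor's load jumps from its own share to the full total, which is where the possible performance penalty comes from.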

Failover is not instantaneous, nor transparent.

Heartbeat failover is not instantaneous, but it’s good enough (usually less than 30 seconds) for most applications requiring failover capability. If you require sub-second failover for your database, get MySQL Cluster, Oracle RAC, etc.

But hey, Heartbeat failover is transparent. Your clients connect to a virtual IP address that fails over along with the rest of your resources. They’re unlikely to ever even find out that a failover situation occurred. How much more transparency do you need?

Recovery requires a lot of time to recover the database.

In case of manual switchover, no. Heartbeat will cleanly stop your application on one node, move it over, and start it on the other, along with the corresponding DRBD devices. Because the shutdown is clean, there is nothing for the database to recover.

In case of automatic failover, yes. But think about it: when do you automatically fail over? You guessed it: when your primary host goes down hard. Now, if computers nicely shut down all applications and relinquished all open resources immediately before going down hard, making any subsequent recovery process unnecessary, it would be a beautiful world. Seriously, though: failover after a hard crash of the writable node will trigger some sort of recovery process, no matter what.

Recovery process can fail — requires reload.

Database recovery processes are generally outside DRBD’s realm. So if a crash recovery process actually fails, DRBD isn’t really to blame — especially when the database doesn’t support transactions and data is being lost in the recovery process because of that.

DRBD requires database journal capability, MySQL MyISAM does not work.

Not true. DRBD doesn’t require a transaction-capable backend when used in conjunction with a database. It’s just that since you build clusters for high availability, and thus for the very purpose of not losing data, you wouldn’t sensibly use a non-transactional database in any production setting anyway.

With DRBD, database operation is not continuous: planned downtime is still required.

Yes. There will always be major modifications that affect your application as a whole, and will thus necessitate planned system downtime. This isn’t limited to database applications, by the way. However, see the next item.

DRBD does not address scaling or performance.

Quite the contrary. For example, to my knowledge, DRBD is the only solution for MySQL (besides MySQL Cluster, obviously) that enables scale-up without taking down the database master, following this procedure:

  1. Switch all resources over to backup server.
  2. Take down primary server.
  3. Pull primary server, upgrade CPU, stick in more RAM, add bigger disks, whatever. Or even replace the entire server with one with identical DRBD configuration.
  4. Bring up upgraded server.
  5. Resync.
  6. Switch back resources.
  7. Repeat on backup server.
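
The ordering of those steps can be sketched as a tiny Python simulation. This is a model of the procedure for a two-node cluster, not an automation script: the node names are placeholders, the step labels are plain text rather than real Heartbeat or DRBD commands, and the only thing it demonstrates is that the node being upgraded is never the one running the resources.

```python
def rolling_upgrade(nodes, active):
    """Return a list of (action, node, active_node) steps for a two-node cluster.

    `nodes` holds exactly two node names; `active` is the node that
    currently runs all resources.
    """
    first = active
    plan = []
    for node in [first] + [n for n in nodes if n != first]:
        peer = next(n for n in nodes if n != node)
        if active == node:
            # Steps 1: move everything off the node we are about to touch.
            active = peer
            plan.append(("switch resources to", peer, active))
        # Steps 2-3: the node is now passive and can be taken down.
        plan.append(("take down and upgrade", node, active))
        # Steps 4-5: bring it back and let DRBD resynchronize.
        plan.append(("bring up and resync", node, active))
        if node == first:
            # Step 6: return resources to the original primary.
            active = node
            plan.append(("switch resources back to", node, active))
    return plan


for action, node, active in rolling_upgrade(["primary", "backup"], "primary"):
    print(f"{action} {node} (resources on {active})")
```

At every "take down and upgrade" step the resources are on the other node, which is why the database stays available throughout.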

Pure scale-out on the other hand isn’t what DRBD adds, specifically, to MySQL. DRBD works very well in conjunction with MySQL Replication to form a complete scale-out solution, though.

OS Limitations – DRBD runs only on Linux.

Entirely correct.

6 Responses to DRBD limitations (or are they?)

  1. Limitations of DRBD for MySQL: Fact or Myth?

    Last week, someone lists me several limitations of DRBD for MySQL. They are,

    Idle resource – secondary host sits idle, wasted investment
    Failover is not instant, nor transparent
    -”Cold standby” failover
    Recovery requires time to start / …

  2. […] DRBD limitations (or are they?) « Florian’s blog – January 23rd ( tags: mysql drbd ha article database linux performance tips tricks ) […]

  3. Peet says:

    Thanks Florian,

    We run a cluster with mysql,httpd,etc on DRBD. Other than performance issues, I have to say it is amazing. We used to run Protocol C Primary/Primary, but backups would kill performance, so we now use Protocol A Primary/Secondary.


    • Florian Haas says:

      Peet, if performance during backup runs is an issue, that’s often due to I/O contention between the backup application and your cluster service (probably MySQL). This is frequently caused by using an older kernel together with the CFQ I/O scheduler.
      Under those circumstances, switching to Protocol A is a bit like throwing the baby out with the bath water. Instead, stick to protocol C and try the deadline scheduler.
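
As a rough illustration of the scheduler switch suggested above (this sketch is an addition, not part of the original comment), Linux exposes the I/O scheduler of each block device through `/sys/block/<device>/queue/scheduler`, with the active one shown in square brackets. The Python below parses that file and writes a new choice. The device name `sda` is an assumption, writing requires root, and on newer multi-queue kernels the scheduler is typically named `mq-deadline` rather than `deadline`.

```python
from pathlib import Path


def parse_schedulers(text):
    """Split sysfs text such as 'noop deadline [cfq]' into (available, active)."""
    names = text.split()
    active = next((n.strip("[]") for n in names if n.startswith("[")), None)
    return [n.strip("[]") for n in names], active


def scheduler_path(device):
    """Path of the sysfs scheduler file for a block device name like 'sda'."""
    return Path("/sys/block") / device / "queue" / "scheduler"


def set_scheduler(device, name):
    """Select `name` as the I/O scheduler for `device` (needs root)."""
    path = scheduler_path(device)
    available, active = parse_schedulers(path.read_text())
    if name not in available:
        raise ValueError(f"{name!r} not offered for {device}: {available}")
    if name != active:
        path.write_text(name)


# Example call, commented out because it changes system state:
# set_scheduler("sda", "deadline")   # "sda" is an assumed device name
```

A change made this way does not survive a reboot, so it is best treated as a way to test the effect before making it permanent through whatever mechanism your distribution provides.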

      • Peet says:

        Thank you Florian,

        I implemented your suggestion, did some tests, and it seems to be quite stable. I’ll monitor and let you know if I experience issues.

        Would you agree that replacing the disks with SSDs would further improve performance for MySQL (random read/write)?


        • Florian Haas says:

          Well, your app appears to be both I/O bound and latency critical, which makes this a storybook situation where SSDs are highly likely to help. If you’ve got the cash.
