Internal metadata, and why we recommend it

One of the things that repeatedly seem to puzzle DRBD users is whether to use internal or external metadata. Remember, DRBD sets aside a small area on a local disk (on every cluster node) where it keeps the Activity Log, the quick-sync bitmap, data generation UUIDs, and a few other bits and pieces for local housekeeping.

The aspect we want to discuss here is the Activity Log (AL). Without going into too much detail, let’s settle for the factoid that DRBD “occasionally” (it’s a little more involved in reality) writes to the AL, and has to wait for that write to complete before it can handle user data again. This wait is the crucial point. It’s usually on the order of just a few milliseconds, but on busy systems these waits can add up to the point where they throttle throughput a little.
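To see how these short waits can add up, here is a toy back-of-the-envelope model in Python. It assumes a strictly serialized writer (one write in flight at a time), and the function name, parameters, and latency figures are hypothetical illustrations, not DRBD internals or measurements. It simply adds the average AL wait to the per-write service time and converts the result into writes per second.

```python
# Toy model (hypothetical numbers): a single writer issues one write at a time
# and, for some of those writes, must first wait for an Activity Log (AL) update.


def effective_iops(data_write_ms: float, al_wait_ms: float, al_fraction: float) -> float:
    """Writes per second for a strictly serialized writer.

    data_write_ms: time to complete one user-data write (assumed)
    al_wait_ms:    time spent waiting for one AL write to complete (assumed)
    al_fraction:   fraction of user writes that trigger an AL update, 0..1 (assumed)
    """
    if not 0.0 <= al_fraction <= 1.0:
        raise ValueError("al_fraction must be between 0 and 1")
    # Average time per user write = data write + expected AL wait.
    avg_ms = data_write_ms + al_fraction * al_wait_ms
    return 1000.0 / avg_ms


if __name__ == "__main__":
    baseline = effective_iops(data_write_ms=5.0, al_wait_ms=5.0, al_fraction=0.0)
    for frac in (0.01, 0.05, 0.2):
        iops = effective_iops(data_write_ms=5.0, al_wait_ms=5.0, al_fraction=frac)
        loss = 100.0 * (1.0 - iops / baseline)
        print(f"AL fraction {frac:.2f}: {iops:6.1f} writes/s ({loss:.1f}% below baseline)")
```

With these made-up numbers, a 5 ms AL wait on every twentieth write costs roughly 5% of throughput. Workloads with several requests in flight will behave differently, so treat this only as an illustration of why a few milliseconds matter.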

Now, what makes I/O fast or slow (on rotational hard drives; solid state is a different matter)? That’s right: disk seeks. So with internal metadata, the theory goes, the read-write head has to do something in the data area, then move to the AL and do something there, then move back to the data area, and so forth. Intuitively, this can be sped up by putting user data and metadata on different spindles. Different “logical” disks won’t do; it has to be a separate spindle, so that the read-write heads can move in parallel. Again, this is how the naïve theory goes. Use external metadata, devise a clever scheme for keeping your metadata apart from your user data, and you’ll be fine. And you can call yourself a great wizard of storage subsystem tuning. Well, not quite, unfortunately.

The problem is that you’ve made a crucial mistake in performance tuning: you are completely ignoring the effects of a battery-backed write cache. If, as we always recommend, you use a reasonably capable storage controller with a decent write cache and a battery backup unit (BBU), then the whole issue is moot, because you are no longer waiting for actual disk seeks to complete. What you think you are writing into disk sectors actually goes into a piece of controller RAM, and completes pretty much instantaneously. It’s the controller’s job to get this data onto stable storage later, and to guarantee that it does so even in the face of a power failure. That’s what the BBU is for. So the whole idea of avoiding disk seeks for metadata writes becomes pretty much irrelevant.
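The same argument can be put into rough numbers. The Python sketch below is a toy model of the extra time one AL update costs under three setups: internal metadata on a single spindle without a write cache, external metadata on a separate spindle, and either layout behind a BBU-protected write-back cache. The seek, rotation, and cache-acknowledge latencies are assumed, hypothetical figures, and the model ignores command queuing, request reordering, and the controller eventually having to destage its cache. Read it as an illustration of the reasoning, not a benchmark.

```python
# Toy model of the extra latency one Activity Log (AL) update adds.
# All figures are hypothetical, chosen only to illustrate the argument.
SEEK_MS = 8.0                            # assumed average seek on a rotational disk
HALF_ROTATION_MS = 60_000 / 7_200 / 2    # average rotational latency at 7200 rpm (~4.17 ms)
CACHE_ACK_MS = 0.05                      # assumed ack time from BBU-backed controller RAM


def extra_ms_per_al_update(metadata: str, write_cache: bool) -> float:
    """Extra time one AL update costs in this toy model.

    metadata:    "internal" (same spindle as user data) or "external" (separate spindle)
    write_cache: True if a BBU-protected write-back cache acknowledges the write
    """
    if metadata not in ("internal", "external"):
        raise ValueError("metadata must be 'internal' or 'external'")
    if write_cache:
        # The write lands in controller RAM, so disk placement no longer matters here.
        return CACHE_ACK_MS
    if metadata == "internal":
        # The head seeks from the data area to the AL, waits for the sector
        # to rotate under it, and later has to seek back to the data area.
        return 2 * SEEK_MS + HALF_ROTATION_MS
    # Separate spindle: the metadata head stays near the small AL area, so only
    # rotational latency remains, and the data head never moves away.
    return HALF_ROTATION_MS


if __name__ == "__main__":
    for metadata in ("internal", "external"):
        for write_cache in (False, True):
            ms = extra_ms_per_al_update(metadata, write_cache)
            cache = "BBU write cache" if write_cache else "no write cache"
            print(f"{metadata:8} metadata, {cache:15}: {ms:6.2f} ms per AL update")
```

In this model, external metadata only pays off when there is no write cache. Once a BBU-backed cache acknowledges the AL write, both layouts cost the same fraction of a millisecond.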

Which means you can scrap your grand user data/metadata distribution scheme and focus on the issues that actually matter.

Bottom line: if using external metadata actually improves your performance versus internal metadata, you have underlying performance problems to fix. And you should fix those rather than patch them up at the DRBD level.

One Response to Internal metadata, and why we recommend it

  1. PieterB says:

    Nice article.

    I have a question about this article. You say the meta data is used for internal housekeeping. Is this used for internal housekeeping only? The docs mention the following disadvantage to internal metadata: “In case of the lower-level device being a single physical hard disk (as opposed to a RAID set), internal meta data may negatively affect write throughput. The performance of write requests by the application may trigger an update of the meta data in DRBD.”.

    Am I right that, in case of a disk crash on DRBD node A, the metadata housekeeping will be done on DRBD node B? And that it will obviously be slow, because of the network latency?

    It would be good to document the BBU argument in the DRBD docs as well.


