Important info for LVM-on-DRBD users added to User’s Guide

November 21, 2011

My former employer has managed to roll a fresh build of the DRBD User’s Guide. This contains an important addition to the chapter on LVM: when running LVM on top of DRBD (that is, a DRBD device acting as a Physical Volume to an LVM VG), don’t forget to update your initrd after modifying your LVM filter configuration.

This is something that many users tripped over previously, so I’m happy to finally see it available in the published Guide.

There are several other additions and fixes currently pending, so I’m hoping they will publish those soon, too.
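
For anyone wondering why the filter matters so much, here is a minimal Python sketch of how an LVM filter list is evaluated: entries are checked in order, the first matching pattern decides, and a device that matches nothing is accepted. The filter values and the names parse_filter and device_accepted are my own illustrative assumptions, not text from the Guide, and Python's re module only stands in for LVM's own regex matching. The initrd carries its own copy of the LVM configuration, which is why it needs rebuilding after a filter change; the command for that is distribution-specific and not shown here.

```python
import re


def parse_filter(entry):
    """Split an lvm.conf-style filter entry such as 'a|^/dev/drbd.*|'
    into its action ('a' = accept, 'r' = reject) and a compiled regex."""
    if len(entry) < 3 or entry[0] not in ("a", "r"):
        raise ValueError(f"not a valid filter entry: {entry!r}")
    delimiter = entry[1]
    if entry[-1] != delimiter:
        raise ValueError(f"missing closing delimiter in {entry!r}")
    return entry[0], re.compile(entry[2:-1])


def device_accepted(device, filter_entries):
    """Return True if LVM would scan this device under the given filter.

    The first matching pattern wins; a device matching no pattern is accepted.
    """
    for entry in filter_entries:
        action, regex = parse_filter(entry)
        if regex.search(device):
            return action == "a"
    return True


# Hypothetical filter: scan DRBD devices only, ignore the backing disks.
drbd_only = ["a|^/dev/drbd.*|", "r|.*|"]

for dev in ("/dev/drbd0", "/dev/sda1"):
    verdict = "scanned" if device_accepted(dev, drbd_only) else "ignored"
    print(dev, verdict)
```

With a filter like this, LVM finds the Physical Volume signature on the DRBD device only, not on the lower-level disk underneath it.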


Updated OCF Resource Agent Developer’s Guide now available

November 18, 2011

I have just published an updated OCF Resource Agent Developer’s Guide. This guide is the definitive handbook for authors of, and contributors to, resource agents for the Pacemaker-based Linux High Availability stack.


Asterisk High Availability coming to the Pacemaker cluster stack

November 17, 2011

The Pacemaker-based Linux cluster stack is gaining a freshly supported service: the Asterisk open-source PBX. hastexo’s Martin Loschwitz has contributed a resource agent for the popular telephony stack.



Now available: Slides from Percona Live and Linuxcon Europe

November 1, 2011

The slides from last week’s talks I (co-)presented at Percona Live and Linuxcon Europe are now available from our web site.

All slides are available entirely free of charge for logged-in users on our web site. To log in, you don’t even need to register — just use your Google Profile, or Google Apps account, or your WordPress account, or anything else that uses OpenID, and you’ll be good to go.

Comments on our slides are, of course, always highly appreciated.


Running DRBD on your Android

April 1, 2011

So you own an Android phone? I do too (mine is a Samsung GT-I9000), and it doubles as my Twitter client, guitar tuner, Ultimate rulebook and a ton of other things. And of course, I’ve DRBD’d it so I can sync everything on it to a logical volume on my desktop. You do know that DRBD can run a mixed-architecture cluster, so syncing from your ARM-based handset to your x86 (or x86_64) laptop is no problem at all.

How to do that? Not that hard. First, obviously, root your phone. Then, download DRBD off the Market.

You’ll need BusyBox and some way to get a shell on your phone. I prefer ConnectBot. And from there? As you normally would. Just use drbdadm (the -c flag may come in handy for non-default drbd.conf locations). Set up a logical volume on your laptop, sync, and poof, you’ve got your phone backup. Then, just disconnect or stop DRBD on your laptop.

Hint: if you use Advanced Task Killer (I do), make sure to put the DRBD app on the ignore list. The Android build has a special #define enabled that has DRBD wait patiently for incoming connections, but not initiate outbound connections to its peer. This also means that DRBD won’t mind at all if you suspend your phone. Just don’t kill the app.

A full Tech Guide, “Running DRBD on Android”, is available from LINBIT’s web site. The QR code at right will take you just there. No QR reader app? Try Barcode Scanner.


LINBIT Technical Guides now available on our web site

December 2, 2010

If you run (or plan to deploy) high availability clusters — with or without DRBD — you might find a new section on our web site handy. Our Technical Guides collection is a compilation of LINBIT expert HA knowledge, which we’re opening up to everyone.

Yes, this also includes PDF versions of the DRBD User’s Guide and the Linux-HA User’s Guide.

More Technical Guides will be added as we go along. LINBIT Cluster Stack support customers will receive new Tech Guides approximately one month before they pop up on the public web site.

Downloading the Tech Guides is free of charge, but requires prior registration. If you’ve already registered for trying out the DRBD Management Console, you may reuse your download credentials; there is no need to re-register.


DRBD != fsck != DIX

October 28, 2010

Every once in a while, we hear of users with corruption in a file system that sits on top of DRBD. That may be easy or tricky to resolve. If you’re lucky, a simple fsck will resolve the corruption. If you’re not quite that lucky, you may have to get out your backups.

But that’s typically not DRBD’s fault. Typically not at all, not in the least bit. DRBD is a block device, and as such it has no idea what rests on top of it. It has no concept of a filesystem, let alone its integrity. That of course is true for any other block device as well. If you have, say, RAID-1, and something corrupts the file system on top of it, then of course that corruption will be happily replicated across both component devices. DRBD is no different, except that its component devices are stored across distinct physical nodes.

And even if everything about your filesystem is logically correct, there’s still the chance that a user fat-fingers rm and nukes all your precious data, and DRBD will happily replicate that too. Just like RAID. In a nutshell: just like RAID, DRBD does not replace backups.

DRBD does bend over backwards in making sure that it is replicating data correctly, catching all sorts of network issues in the process and optionally doing an end-to-end checksum over everything it replicates. It can also immediately detach from a backing device if the latter is acting up in any way and throwing I/O errors. But it can only make sure that it correctly replicates whatever it’s being handed down from above — there is no way for it to second-guess whether that is actually good data.
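
To make that distinction concrete, here is a toy Python sketch of checksummed replication. It is only a model under my own assumptions: CRC32 stands in for whatever digest is configured, and send_block, receive_block and replicate are made-up names, not DRBD internals. Damage on the wire gets caught; garbage handed down from above does not, because the checksum is computed over the garbage.

```python
import zlib


def send_block(data, corrupt_in_transit=False):
    """Sender side: compute a checksum over the data as handed down from above.
    Optionally flip one bit afterwards to simulate a network fault."""
    checksum = zlib.crc32(data)
    payload = bytearray(data)
    if corrupt_in_transit and payload:
        payload[0] ^= 0x01  # single bit flip on the wire
    return bytes(payload), checksum


def receive_block(payload, checksum):
    """Receiver side: accept the block only if the checksum still matches."""
    if zlib.crc32(payload) != checksum:
        raise IOError("checksum mismatch: block damaged in transit")
    return payload


def replicate(data, corrupt_in_transit=False):
    """Ship one block to the peer and return what the peer would write."""
    payload, checksum = send_block(data, corrupt_in_transit)
    return receive_block(payload, checksum)


# Good data arrives intact.
print(replicate(b"valid filesystem block"))

# Garbage from above is replicated just as faithfully: the checksum
# protects the transfer, not the meaning of the data.
print(replicate(b"\x00\x00 scribbled over by a stray dd"))

# Damage in transit is the one case the checksum can catch.
try:
    replicate(b"valid filesystem block", corrupt_in_transit=True)
except IOError as err:
    print("rejected:", err)
```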

Likewise, when DRBD reads data, it does so from its underlying block device. And if it happens to be fed garbage from there, there’s nothing it can do about that either (unless the read actually produces an I/O error, in which case we can detach, read transparently from the peer over the network, and all is dandy). So if you have silent data corruption introduced by your controller, or by a disk that’s gone haywire, then DRBD will pass that garbage straight on to the application. However, and this is a big plus compared to going without DRBD, DRBD gives you the option of switching your application over to another node, with presumably better hardware, where that read corruption does not occur. And you can keep your users happy while you’re fixing the other box with the shot I/O stack.
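
The two read cases can be sketched in a few lines of Python. Again, this is a toy model under my own assumptions (the Disk class and replicated_read are invented for illustration and say nothing about how DRBD is implemented): a read that fails loudly can be served from the peer, while a read that silently returns garbage looks like a success and goes straight up to the application.

```python
class Disk:
    """Toy backing device: some blocks fail loudly, some lie quietly."""

    def __init__(self, blocks, failing=(), silently_corrupt=()):
        self.blocks = dict(blocks)
        self.failing = set(failing)
        self.silently_corrupt = set(silently_corrupt)

    def read(self, block_no):
        if block_no in self.failing:
            raise IOError(f"I/O error reading block {block_no}")
        data = self.blocks[block_no]
        if block_no in self.silently_corrupt:
            return b"\x00" * len(data)  # wrong data, but no error reported
        return data


def replicated_read(local, peer, block_no):
    """Read locally; on an I/O error, fall back to the peer's copy."""
    try:
        return local.read(block_no)
    except IOError:
        return peer.read(block_no)


blocks = {0: b"block zero", 1: b"block one"}
peer = Disk(blocks)

# Loud failure: the peer's good copy is returned transparently.
loud = Disk(blocks, failing={0})
print(replicated_read(loud, peer, 0))

# Silent corruption: no error is raised, so the garbage is returned as-is.
quiet = Disk(blocks, silently_corrupt={1})
print(replicated_read(quiet, peer, 1))
```

In the second case the only remedy is the one described above: move the application to the node whose hardware returns the right data.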

So no, DRBD does not replace the occasional fsck or whatever other data integrity features your filesystem may come with. DRBD also does not absolve you of adding a BBU (or capacitor-backed flash) to your controller write cache, or of having to turn off your disk write cache (which is always volatile). DRBD also does not protect against dd-ing a bunch of random data somewhere into the middle of the block device, causing your filesystem to jump and scream.

Now, if you want complete, end-to-end I/O integrity checking, check out Linux DIX (Data Integrity Extensions), brought to you by a team around Martin Petersen at Oracle. I had the pleasure of sitting in on his talk at LinuxCon this year. It’s in Linux as of 2.6.27; check out the project page for details. What’s nice about this is that it’s a Linux first: no other operating system, at this time, is known to have anything comparable.