On DRBD connection timeouts

Here is a question recently seen on drbd-user:

I cannot get the timeout parameter in [drbd.conf] to work (I set it up as in all the examples I saw). I set it low (say 1 second), kill the remote box, and IO [re]commences after 10 seconds (as the other parameters state).

Anything I’m doing wrong?

Well, sort of. The timeout parameter specifies the timeout DRBD uses for the blocking sockets it transmits data over. It is given in tenths of a second and defaults to 60, that is, 6 seconds. So if you issue I/O on the DRBD device while Connected, and a corresponding replication packet does not complete within timeout, then DRBD concludes that the peer has gone away and transitions to WFConnection, effectively switching into disconnected mode.

If however the connection is lost while the DRBD device is idle (not handling any write I/O), then there are no packets to replicate, and none to wait for. By itself, this would mean that while idle, DRBD would be unable to detect that its peer has gone away. Clearly, this would not be desirable.

Here’s where DRBD’s in-protocol “pings” come into play. Don’t confuse these with real ICMP echo requests. A DRBD “ping” is simply a no-op message inside the DRBD replication layer, as in the peers shouting at each other, “hello, I’m still here.” DRBD sends these “pings” at a configurable interval, specified by the ping-int configuration option and defaulting to 10 seconds. A DRBD “ping” times out if no reply arrives within the time specified as ping-timeout, which by default is 0.5 seconds.

So: while I/O is being issued on a device, it’s timeout that governs disconnection. While it is idle, however, disconnection is initiated by a “ping” packet (which is issued every 10 seconds, unless otherwise configured with ping-int) not being received within half a second (unless otherwise configured with ping-timeout).

Finally, for the sake of completeness, I should add that there is also a connect-int option, which is the interval DRBD uses for re-connecting to the peer in case of a connection failure. The timeout value must be lower than both connect-int and ping-int; otherwise it will be ignored. That is the issue that the user I quoted ran into.
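
As a rough sketch of how these four options sit together, here is a net section that sets timeout to 1 second, as in the question, and leaves the others at what I understand to be their defaults. The resource name r0 is made up, the syntax assumes DRBD 8.x, and the units (tenths of a second for timeout and ping-timeout, seconds for the other two) are as I read them in the drbd.conf man page; check the man page for your version before copying any of this.

```
resource r0 {
  # device, disk, and per-host sections omitted for brevity
  net {
    timeout       10;   # 1 second (unit: 0.1 s); governs disconnection while I/O is in flight
    ping-int      10;   # seconds between DRBD "pings" on an otherwise idle connection
    ping-timeout  5;    # 0.5 seconds (unit: 0.1 s) to wait for a "ping" reply
    connect-int   10;   # seconds between reconnection attempts
  }
}
```

With these values, an idle device may take up to roughly ping-int plus ping-timeout, about 10.5 seconds, to notice a dead peer, no matter how low timeout is set. Lowering ping-int is what shortens that window.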

All of this is, of course, explained in more detail in the drbd.conf man page.

