DRBD 8.2.0 introduces protocol integrity checksums

DRBD 8.2.0, released today, includes a much-requested new feature: DRBD protocol-level data integrity checksums, enabled through the new data-integrity-alg configuration option.

A few months ago, some users alerted us to DRBD replication issues where DRBD supposedly “ate their data”, i.e. corrupted replicated data in transit. Eventually we traced those problems not to DRBD errors but to network drivers messing up TCP checksums or segmentation. Typically this was related to using either TCP segmentation offloading (TSO) or TCP checksum offloading. At the time, however, DRBD had no way of detecting these errors: you would only find out after switching over to your Secondary and discovering that your data had not been replicated properly.
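If you suspect the same problem on your own replication link, a quick look at the NIC offload settings may help. The following is a minimal sketch and not part of the DRBD distribution: offload_status is a hypothetical helper name, eth0 is an assumed interface name, and the exact feature labels printed by ethtool differ between versions.

```shell
#!/bin/sh
# Hypothetical helper: filter `ethtool -k` output (read from stdin) down to
# the segmentation and checksum offload lines discussed above.
offload_status() {
    grep -i -E 'segmentation|checksum'
}

# Usage on a live system (eth0 is an assumed interface name):
#   ethtool -k eth0 | offload_status
#
# To switch TSO and checksum offloading off for a test run (needs root):
#   ethtool -K eth0 tso off tx off rx off
```

Turning the offloads off moves that work back to the CPU, so treat it as a diagnostic step rather than a general recommendation.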

With DRBD 8.2.0, you can check the integrity of replicated data in transit. To that end, enter the following in your /etc/drbd.conf:

net {
    data-integrity-alg sha1;
}

With this setting, DRBD generates (and validates) a SHA-1 hash for every replication packet transmitted. If a replicated packet fails to validate, DRBD simply disconnects and records an appropriate message in the kernel log.
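The generate-and-validate cycle can be illustrated outside the kernel. The sketch below is only an analogy built on the sha1sum utility, not DRBD's actual implementation; packet_digest and validate_packet are hypothetical names, and the payload strings are made up for the example.

```shell
#!/bin/sh
# Sender side: compute the SHA-1 digest of the payload read from stdin.
packet_digest() {
    sha1sum | cut -d' ' -f1
}

# Receiver side: recompute the digest of the payload on stdin and compare
# it with the digest that travelled alongside it ($1).
validate_packet() {
    [ "$(packet_digest)" = "$1" ]
}

digest=$(printf 'replicated block' | packet_digest)

# An intact payload validates; a payload altered in transit does not.
printf 'replicated block' | validate_packet "$digest" && echo "packet ok"
printf 'replicated blocK' | validate_packet "$digest" || echo "mismatch: would disconnect"
```

The single flipped character in the second payload is enough to change the digest, which is the property the integrity check relies on.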

Because computing a hash for every packet adds CPU overhead, you should only enable this in pre-production testing of your DRBD cluster. For example, you might enable the data integrity check, then run dd writing from /dev/random to your DRBD device for 24 hours or so; if you do not experience any disconnect during that time, you can safely disable checksums again.
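Such a soak test could be scripted roughly as follows. This is a sketch under several assumptions: all_connected and soak_test are hypothetical helper names, /dev/drbd0 is an assumed device name, the cs: field is parsed the way /proc/drbd prints it in DRBD 8.x, and the script reads from /dev/urandom rather than /dev/random on the assumption that /dev/random may block when the entropy pool runs low. It destroys all data on the device.

```shell
#!/bin/sh
# Succeeds only if the status file ($1, default /proc/drbd) lists at least
# one device and every listed device is in connection state "Connected".
all_connected() {
    status_file=${1:-/proc/drbd}
    grep -q 'cs:' "$status_file" || return 1
    ! grep 'cs:' "$status_file" | grep -qv 'cs:Connected'
}

# soak_test DEVICE HOURS
# WARNING: overwrites DEVICE with random data. Pre-production only.
soak_test() {
    device=$1
    end=$(( $(date +%s) + $2 * 3600 ))
    while [ "$(date +%s)" -lt "$end" ]; do
        # dd stops with "No space left on device" once the device is full;
        # that is expected here, so its exit status is ignored.
        dd if=/dev/urandom of="$device" bs=1M 2>/dev/null || true
        if ! all_connected; then
            echo "DRBD is not connected; check the kernel log" >&2
            return 1
        fi
    done
    echo "no disconnect observed"
}

# Example (hypothetical device name):
#   soak_test /dev/drbd0 24
```

The connection state is only sampled after each full pass over the device, so a disconnect that is followed by a quick reconnect may go unnoticed. It is worth also reviewing the kernel log (for example with dmesg | grep -i drbd) once the run has finished.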

Checksums are not limited to the SHA-1 digest algorithm. You can use any algorithm provided by your running kernel, which should include at least the following:

  • SHA-1,
  • SHA-256,
  • SHA-384,
  • SHA-512,
  • MD5,
  • CRC-32C.
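To see what your running kernel actually offers, you can inspect /proc/crypto. The helper below is a minimal sketch: kernel_crypto_names is a hypothetical name, and the lowercase spellings in the usage line (sha1, sha256, sha384, sha512, md5, crc32c) are the names the kernel crypto API normally uses for the algorithms listed above.

```shell
#!/bin/sh
# Print the unique algorithm names found in a /proc/crypto-style file
# ($1, default /proc/crypto).
kernel_crypto_names() {
    awk '$1 == "name" { print $3 }' "${1:-/proc/crypto}" | sort -u
}

# Example: show which of the digests listed above are currently registered.
#   kernel_crypto_names | grep -E '^(sha1|sha256|sha384|sha512|md5|crc32c)$'
```

/proc/crypto only lists algorithms that are currently registered, so an algorithm built as a module may be missing from the output until that module has been loaded.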

Our friends at MySQL played no small part in the creation of this feature. Thanks a lot to them.

One more thing about the versioning: we usually guarantee wire compatibility only between adjacent minor releases of DRBD. 8.2 is an exception; its protocol is wire compatible with DRBD 8.0. DRBD 8.1 is the internal branch name for the code being prepared for inclusion in mainline Linux.

4 Responses to DRBD 8.2.0 introduces protocol integrity checksums

  1. Alexander Rubin says:

    Do you recommend disabling checksumming in production because of the added CPU utilization, or because it is a new/untested feature?
    Do you think MD5 or CRC-32C will be better from a performance point of view?
    Do you recommend increasing al-extents to help with the performance of checksumming?

  2. Florian Haas says:

    Alexander,

    #1, due to added CPU utilization/computational overhead.

    #2, CRC-32C will most likely generate less overhead than MD5. However, MD5 is the stronger hash of the two and, strictly speaking, provides a more reliable validation of replicated data. It is up to you to define your priorities.

    #3, while increasing al-extents will improve your write performance due to a reduction in metadata operations on the Primary (at the price of a longer resync time after a Primary crash), this is unrelated to checksumming.

  3. Alexander Rubin says:

    Thanks, Florian.

    Is version 8.2.0 stable enough to run in production with checksumming disabled? Or do you recommend downgrading to 8.1.0 or 8.0.latest after the test (once we have made sure that everything is fine with the data consistency)?

  4. Florian Haas says:

    Alexander, from our perspective there are no arguments against using 8.2.0 in production. The two branches will continue to coexist for some time, with fixes from one branch being pushed to the other.
