It is a common misconception that DRBD (or any other block-level replication solution) can magically make an application crash-safe when it intrinsically isn’t. Baron highlights that misconception in a recent blog post.
I want to reiterate and stress that point here: if your application can’t reliably survive a node crash, it won’t successfully fail over on a replicated (or shared, for that matter) data device. But if it can, and DRBD is replicating synchronously, then DRBD won’t break it. In other words: try pulling the power plug on your machine while your app is running, then power it back on. If your application recovers to a consistent state, you’re in the clear. If it doesn’t, don’t bother adding DRBD until you fix that.
You must fix every layer in your stack that isn’t crash-safe before you even start thinking about high availability. ext2, which Baron mentions in his post, isn’t crash-safe. MySQL with a database using the MyISAM storage engine isn’t crash-safe. KVM with virtual block devices in cache=writeback mode isn’t crash-safe. Running on a RAID controller with the write cache enabled when its battery is dead isn’t crash-safe.
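To make "crash-safe" concrete at the application layer, here is a minimal sketch of one common pattern: write the new data to a temporary file, fsync it, atomically rename it over the old file, then fsync the directory. The function name `atomic_write` and the `.tmp-` prefix are my own illustration, not something from Baron's post or from DRBD, and the sketch assumes a POSIX system. It also only helps if every layer underneath actually honors flushes, which is exactly what a writeback cache without a working battery does not do.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so a crash leaves the old or the new file, never a mix.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file lives in the same directory (and so the same
    # filesystem) as the target, which keeps the final rename atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push file contents to stable storage
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Remove the temporary file if we failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself survives a power loss.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This does not replace the pull-the-plug test. It only shows the kind of discipline an application needs before that test has a chance of passing, with or without DRBD underneath.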