Last modified on 05/23/11 16:04:02

Multiple Quorum Disks

This page is a whiteboard for the project to make multiple quorum disks work.

I/O Timing

This project's quorum disk algorithm has real-time requirements on I/O to the shared disk. With a single disk this is rarely an issue, but with multiple disks it can become one.

I/O Queueing & Cancellation

One of the problems with disk access is its speed, particularly if replication is taking place over a slow WAN. However, qdiskd does not require every read or write to complete. As long as at least one write reaches half or more of the quorum disks within our allotted time, the master will not evict us. So, if there is a slowdown, we can queue up I/Os and replace an element on the queue when new data for the same device+offset comes in. This way, if access to a particular disk is slow from a host, we do not make the problem worse.
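The queue-and-replace idea can be sketched roughly as follows. This is only an illustration in Python, not qdiskd code: the class name, the helper enough_disks_written, and the device paths are all hypothetical, and "half or more" is read as a plain count over the configured quorum disks.

```python
from collections import OrderedDict


class CoalescingWriteQueue:
    """Pending writes keyed by (device, offset); newer data replaces older.

    Hypothetical sketch, not taken from qdiskd.
    """

    def __init__(self):
        self._pending = OrderedDict()

    def put(self, device, offset, data):
        # If a write for the same device+offset is still queued, overwrite
        # its data. The entry keeps its original position in the queue, so
        # a slow disk never accumulates a backlog of stale blocks.
        self._pending[(device, offset)] = data

    def pop(self):
        # Return the oldest queued entry as (device, offset, data),
        # or None if nothing is pending.
        if not self._pending:
            return None
        (device, offset), data = self._pending.popitem(last=False)
        return device, offset, data

    def __len__(self):
        return len(self._pending)


def enough_disks_written(disks_written, total_disks):
    # Assumed reading of "half or more of the quorum disks".
    return 2 * disks_written >= total_disks


queue = CoalescingWriteQueue()
queue.put("/dev/sdb1", 512, b"status-v1")  # hypothetical device names
queue.put("/dev/sdc1", 512, b"status-v1")
queue.put("/dev/sdb1", 512, b"status-v2")  # replaces the stale v1 block
```

After the three put calls only two entries are pending, and the entry for the slow device carries the newest data. Keying on (device, offset) keeps the queue bounded by the number of distinct slots rather than by how long the slowdown lasts.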

Disk Split Brain

With a cluster of networked computers, one of the most often mentioned topics is split brain and how to prevent it. A split brain in psychology is, quite literally, when the left hand does not know what the right hand is doing. A cluster of (storage-area) networked disks is no different. So, with multiple quorum disks, one of the key elements is preventing split brain among the quorum disks themselves: that is, deciding which disk(s) to use when the cluster members do not all have the same access to the disks.

Master-Wins

Currently, there's a bugzilla open about qdiskd. In the event of a split with no defined heuristics, "master-wins" is supposed to occur. That is, the master qdiskd node is supposed to remain "quorate" while all others become "inquorate" at or around the same time. This bug will need to be fixed in order to help resolve "quorum disk split brains", e.g. when there are two disks, with half of the cluster members seeing one disk and the other half seeing the other disk (EEEEK!).
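As a very rough sketch of the intended outcome (not of how qdiskd implements it), master-wins with no heuristics defined could be modeled like this. The function name and the node names are hypothetical.

```python
def master_wins(nodes, master):
    """Return {node: is_quorate} for the intended master-wins outcome.

    Assumes a split has occurred and no heuristics are defined.
    """
    if master not in nodes:
        raise ValueError("master must be one of the cluster nodes")
    # Only the master stays quorate; every other node becomes inquorate.
    return {node: node == master for node in nodes}


result = master_wins(["node1", "node2", "node3"], "node2")
```

Here only node2, the master, is left quorate.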

Algorithms

Paxos

Tree-Quorum Protocol