Last modified on 05/22/11 16:30:05

Note: Failover domains may be used with both traditional services and virtual machines.

Failover Domains

A failover domain is an ordered subset of members to which a service may be bound. The following terms and configuration options govern the behavior of a failover domain:

  • preferred node or preferred member: The preferred node was the member designated to run a given service whenever that member was online. This behavior can be emulated by specifying an unordered, unrestricted failover domain of exactly one member.
  • restricted domain: Services bound to the domain may only run on cluster members which are also members of the failover domain. If no members of the failover domain are available, the service is placed in the stopped state.
  • unrestricted domain: Services bound to this domain may run on all cluster members, but will run on a member of the domain whenever one is available. This means that if a service is running outside of the domain and a member of the domain comes online, the service will migrate to that member.
  • ordered domain: The order specified in the configuration dictates the order of preference of members within the domain. The highest-ranking member of the domain will run the service whenever it is online. This means that if member A has a higher rank than member B and the service is running on B, the service will migrate to A when A transitions from offline to online.
  • unordered domain: Members of the domain have no order of preference; any member may run the service. Even in an unordered domain, however, services will always migrate to a member of their failover domain whenever possible.
  • nofailback: Enabling this option for an ordered failover domain will prevent automated fail-back after a more-preferred node rejoins the cluster. Consequently, nofailback requires an ordered domain in order to be meaningful. When nofailback is used, the following two behaviors should be noted:
    • If a subset of cluster nodes forms a quorum, the online node with the highest priority in the failover domain is selected to run a service bound to the domain. After this point, a higher-priority member joining the cluster will not trigger a relocation.
    • When a service is running outside of its unrestricted failover domain and a cluster member that is part of the service's failover domain boots, the service will relocate to that member. That is, nofailback does not prevent transitions from outside of a failover domain to inside a failover domain. After this point, a higher-priority member joining the cluster will not trigger a relocation.

Ordering, restriction, and nofailback are flags and may be combined in almost any way (e.g., ordered+restricted, unordered+unrestricted, etc.). These combinations affect both where services start after initial quorum formation and which cluster members take over a service when it fails.
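
As one concrete combination, the preferred-node emulation described in the list above (an unordered, unrestricted domain of exactly one member) might be written roughly as follows. This is only a sketch: the domain name "prefer_node1" and the node name "node1" are hypothetical placeholders, and the node name is assumed to match an entry in the <clusternodes> section of cluster.conf.

```xml
<rm>
  <failoverdomains>
    <!-- Hypothetical names. Unordered (ordered="0") and unrestricted
         (restricted="0"), with exactly one member. -->
    <failoverdomain name="prefer_node1" restricted="0" ordered="0" nofailback="0">
      <!-- priority has no effect because the domain is not ordered -->
      <failoverdomainnode name="node1" priority="1" />
    </failoverdomain>
  </failoverdomains>
</rm>
```

With this configuration, a service bound to the domain would run on node1 whenever node1 is online and could run on any other cluster member otherwise.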

Behavior Examples

Given a cluster comprised of this set of members: {A, B, C, D, E, F, G}

Ordered, restricted failover domain {A, B, C}.

  • With nofailback unset: A service 'S' will run on member 'A' whenever 'A' is online and there is a quorum. If all members of {A, B, C} are offline, the service will not run. If the service is running on 'C' and 'A' transitions online, the service will migrate to 'A'.
  • With nofailback set: A service 'S' will run on the highest-priority online member of the domain when a quorum is formed. If all members of {A, B, C} are offline, the service will not run. If the service is running on 'C' and 'A' transitions online, the service will remain on 'C' unless 'C' fails, at which point it will fail over to 'A'.
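
A possible configuration for this ordered, restricted domain is sketched below. The domain name "ordered_restricted" is hypothetical, and the node names 'A', 'B', and 'C' are assumed to match entries in the <clusternodes> section of cluster.conf.

```xml
<rm>
  <failoverdomains>
    <!-- Hypothetical domain name. Set nofailback="1" to obtain the
         second behavior described above. -->
    <failoverdomain name="ordered_restricted" restricted="1" ordered="1" nofailback="0">
      <!-- 1 is the highest priority, so A is preferred over B, and B over C -->
      <failoverdomainnode name="A" priority="1" />
      <failoverdomainnode name="B" priority="2" />
      <failoverdomainnode name="C" priority="3" />
    </failoverdomain>
  </failoverdomains>
</rm>
```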

Unordered, restricted failover domain {A, B, C}.

  • A service 'S' will only run if there is a quorum and at least one member of {A, B, C} is online. If another member of the domain transitions online, the service does not relocate.
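
The unordered, restricted case differs from the ordered one only in the ordered flag. A minimal sketch, again with a hypothetical domain name and with node names assumed to match the <clusternodes> section:

```xml
<rm>
  <failoverdomains>
    <!-- Hypothetical domain name. Restricted, but with no order of preference. -->
    <failoverdomain name="unordered_restricted" restricted="1" ordered="0" nofailback="0">
      <!-- priority values have no effect because the domain is not ordered -->
      <failoverdomainnode name="A" priority="1" />
      <failoverdomainnode name="B" priority="1" />
      <failoverdomainnode name="C" priority="1" />
    </failoverdomain>
  </failoverdomains>
</rm>
```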

Ordered, unrestricted failover domain {A, B, C}.

  • With nofailback unset: A service 'S' will run whenever there is a quorum. If a member of the failover domain is online, the service will run on the highest-priority online member; otherwise, a member of the cluster will be chosen at random to run the service. That is, the service will run on 'A' whenever 'A' is online. If 'A' is offline, 'B' is preferred next.
  • With nofailback set: A service 'S' will run whenever there is a quorum. If a member of the failover domain is online at quorum formation, the service will run on the highest-priority member of the failover domain. That is, if 'B' is online (but 'A' is not), the service will run on 'B'. If, at some later point, 'A' joins the cluster, the service will not relocate to 'A'.
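
The nofailback variant of this ordered, unrestricted domain could be expressed along these lines. The domain name is a hypothetical placeholder, and the node names are assumed to match the <clusternodes> section of cluster.conf.

```xml
<rm>
  <failoverdomains>
    <!-- Hypothetical domain name. Unrestricted, ordered, with automated
         fail-back to a more-preferred node disabled. -->
    <failoverdomain name="ordered_unrestricted" restricted="0" ordered="1" nofailback="1">
      <failoverdomainnode name="A" priority="1" />
      <failoverdomainnode name="B" priority="2" />
      <failoverdomainnode name="C" priority="3" />
    </failoverdomain>
  </failoverdomains>
</rm>
```

Setting nofailback="0" instead would give the first behavior, in which the service relocates to 'A' as soon as 'A' comes online.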

Unordered, unrestricted failover domain {A, B, C}.

  • This is also called a "Set of Preferred Members". When one or more members of the failover domain are online, the service will run on a nonspecific online member of the failover domain. If another member of the failover domain transitions online, the service does not relocate.
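
A "Set of Preferred Members" of this kind might be declared as in the following sketch, where the domain name is hypothetical and the node names are assumed to match the <clusternodes> section:

```xml
<rm>
  <failoverdomains>
    <!-- Hypothetical domain name. Neither restricted nor ordered. -->
    <failoverdomain name="preferred_members" restricted="0" ordered="0" nofailback="0">
      <!-- priority values have no effect because the domain is not ordered -->
      <failoverdomainnode name="A" priority="1" />
      <failoverdomainnode name="B" priority="1" />
      <failoverdomainnode name="C" priority="1" />
    </failoverdomain>
  </failoverdomains>
</rm>
```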

Configuration File Format

<rm>
   <failoverdomains>

     <!--
         name       - Name of your failover domain
         restricted - 0 or 1, defines whether the domain is restricted
         ordered    - 0 or 1, defines whether the domain is ordered
         nofailback - 0 or 1, defines whether or not the nofailback option
                      is enabled for the domain
      -->
     <failoverdomain name="name" restricted="[0|1]" ordered="[0|1]" nofailback="[0|1]">

       <!--
           name     - Node name (from the <clusternodes> section of cluster.conf)
           priority - 1 to 100, inclusive.  Defines the node's priority (1 is highest, 
                      100 is lowest) in an ordered failover domain.  If the domain is not
                      ordered, this has no effect.
        -->
          
       <failoverdomainnode name="node1" priority="1..100" />
     </failoverdomain>
   </failoverdomains>
</rm>
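
For illustration, a filled-in version of the template with a service bound to the domain might look like the sketch below. The names "example_domain", "node1", "node2", and "S" are hypothetical. The domain attribute on the <service> element is not described on this page; it is included here as an assumption about how a service is bound to a failover domain in cluster.conf, and should be checked against the schema for your version.

```xml
<rm>
  <failoverdomains>
    <!-- Hypothetical ordered, restricted domain with two members -->
    <failoverdomain name="example_domain" restricted="1" ordered="1" nofailback="0">
      <failoverdomainnode name="node1" priority="1" />
      <failoverdomainnode name="node2" priority="2" />
    </failoverdomain>
  </failoverdomains>

  <!-- Assumption: the service refers to its failover domain by name -->
  <service name="S" domain="example_domain" />
</rm>
```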