Changes between Version 8 and Version 9 of Fence

05/22/11 18:22:02 (6 years ago)

Added table of contents macro. Backed all titles up one level.


  • Fence

    v8 v9  
    1 === What it Is === 
     3== What it Is == 
    24'''I/O fencing''', or '''fencing''', is an active countermeasure taken by a cluster to prevent a presumed-dead or misbehaving cluster member from writing data to a piece of critical shared media.  Cutting off this presumed-dead member prevents data corruption on the shared media. 
    79While not strictly required for something to be considered '''fencing''' in the classical sense, linux-cluster imposes an additional requirement on all supported fencing agents in order to protect your data: ''verification''.  If a fencing agent cannot verify that an action has completed, the action is presumed to have failed. 
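
The verification requirement above can be sketched roughly as follows. This is only an illustration, not how any linux-cluster fencing agent is actually implemented: `fence_node`, `power_off`, `is_powered_on`, and `retries` are hypothetical names introduced here, and the callables stand in for whatever a real agent would use to talk to a power switch or management board.

```python
from typing import Callable


def fence_node(node: str,
               power_off: Callable[[str], None],
               is_powered_on: Callable[[str], bool],
               retries: int = 3) -> bool:
    """Return True only if the node is positively confirmed to be off.

    All names here are hypothetical; the callables are placeholders for a
    real device interface.
    """
    try:
        power_off(node)
    except Exception:
        # The action itself could not be issued: presumed failed.
        return False

    for _ in range(retries):
        try:
            if not is_powered_on(node):
                return True  # verified: the node has no power
        except Exception:
            # A failed status query is not a confirmation; try again.
            pass

    # Could not verify completion, so the action is presumed to have failed.
    return False
```

The key design choice is that every uncertain outcome (an error, or a status query that never confirms the node is off) is reported as failure, so the cluster never proceeds on an unverified fence.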
    9 === What it is not === 
     11== What it is not == 
    1012'''Fencing''' is not synonymous with '''power cycling'''.  A host may be '''fenced''' from shared storage without its power being cut.  '''Power cycling''' is a form of fencing, but it is certainly not the only form of fencing. 
    12 === Why You Need It === 
     14== Why You Need It == 
    1315Fencing prevents data corruption and increases availability by reducing uncertainty in a cluster of computers. 
    3436In the second case, data corruption is ''prevented''. 
    36 === Technologies used by cluster software Today === 
     38== Technologies used by cluster software Today == 
    3739Note: this list may not be exhaustive. 
    39 ==== I/O Fencing Variants ==== 
     41=== I/O Fencing Variants === 
    4042 * '''power fencing''' - As the saying goes, "Dead Nodes corrupt no Data".  If a host does not have power, it cannot issue I/O.  This can be done using external power switches like those available from APC and WTI, or with integrated power management, such as iLO, IPMI, DRAC, RSA, etc. 
    4143 * '''fibre channel zoning''' - typically done on fibre channel switches, the host's paths to a shared SAN are cut off.  A reboot is required prior to restoring a node's connectivity to shared storage in order to ensure any backed-up I/Os are removed. 
    4648 * '''virtual machine destruction''' - Instruction is given to a hypervisor to destroy a given virtual machine.  This is functionally equivalent to '''power fencing''' 
    48 ==== Fencing: The Masquerade ==== 
     50=== Fencing: The Masquerade === 
    4951Various techniques are also used in some cluster solutions but do not qualify as '''I/O Fencing''', because the surviving cluster takes no active countermeasure.  Generally, there is a blind assumption in place. 
    5052 * '''timeout''' - After some amount of time, just assume the node is dead and not coming back. 
    5658Also, some methods which are not fencing are safe from a data integrity standpoint.  For example, manual override is safe as long as the administrator takes actions to ensure data integrity is preserved prior to issuing the override command. 
    58 === Data Corruption Prevention in Split-Site Clusters === 
     60== Data Corruption Prevention in Split-Site Clusters == 
    5961Split-site clusters introduce a problem for clustering and traditional fencing.  Effectively, if you are running a single cluster across two sites and the inter-site link is cut, you have no real way to take action to prevent I/O, nor do you have an automatic way to confirm the death of the remote site. 
    6567At this point, the ''winning'' site's copy of the shared data becomes authoritative and the ''losing'' site's copy of the shared data is overwritten when the inter-site link returns. 
    67 ==== Methodologies ==== 
     69=== Methodologies === 
    6870 * '''administrator intervention''' - an administrator makes the call as to which site wins 
    6971 * '''third-site arbitration''' - a third site picks one to survive and coordinates (Pacemaker 1.2)
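
As a rough illustration of third-site arbitration, the sketch below models an arbitrator that lets exactly one site survive. It is a toy model under stated assumptions, not Pacemaker's mechanism: the `Arbitrator` class, its `request` method, and the "first site to ask wins" rule are all hypothetical choices made for this example; the text above only says that a third site picks one site to survive.

```python
from typing import Optional


class Arbitrator:
    """Hypothetical third-site arbitrator: grants survival to one site only."""

    def __init__(self) -> None:
        self.winner: Optional[str] = None

    def request(self, site: str) -> bool:
        """Return True if `site` may keep running, False if it must stop.

        Assumption for this sketch: the first site to reach the arbitrator
        after the inter-site link is cut becomes the winner.
        """
        if self.winner is None:
            self.winner = site
        return self.winner == site


arbitrator = Arbitrator()
east_survives = arbitrator.request("east")  # first to ask: allowed
west_survives = arbitrator.request("west")  # second to ask: denied
```

A site that cannot reach the arbitrator at all never receives a grant, so under this model it must also treat itself as the losing site.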