Changes between Version 28 and Version 29 of FAQ/CMAN


Timestamp: 05/21/11 16:13:28
Author: digimer
Comment: Fixed the last copy from the old wiki to work on the new Trac syntax.

= CMAN Questions =
 * [#cman_what What is Cluster Manager (cman)?]
 * [#cman_quorum What does Quorum mean and why is it necessary?]
 * [#two_node How can I define a two-node cluster if a majority is needed to reach quorum?]
 * [#tie_breaker What is a tie-breaker, and do I need one in two-node clusters?]
 * [#two_node_dual If both nodes in a two-node cluster lose contact with each other, don't they try to fence each other?]
 * [#two_node_correct What is the best two-node network & fencing configuration?]
 * [#fence_wars What if the fenced node comes back up and still can't contact the other?  Will it corrupt my file system?]
 * [#cman_quorum2 I lost quorum on my six-node cluster, but my remaining three nodes can still write to my GFS volume.  Did you just lie?]
 * [#cman_u1u2 Can I have a mixed cluster with some nodes at RHEL4U1 and some at RHEL4U2?]
 * [#cman_2to3 How do I add a third node to my two-node cluster?]
 * [#cman_remove I removed a node from cluster.conf but the cluster software and services kept running.  What did I do wrong?]
 * [#cman_rename How can I rename my cluster?]
 * [#cman_shutdown What's the proper way to shut down my cluster?]
 * [#cman_mismatch Why does the cman daemon keep shutting down and reconnecting?]
 * [#cman_oddnodes I've heard there are issues with using an even/odd number of nodes.  Is it true?]
 * [#quorum What is a quorum disk/partition and what does it do for you?]
 * [#quorumdiskneeded Is a quorum disk/partition needed for a two-node cluster?]
 * [#quorumdiskhow How do I set up a quorum disk/partition?]
 * [#quorumdiskneeddisk Do I really need a shared disk to use QDisk?]
 * [#quorumdiskvotes Are the quorum disk votes reported in "Total_votes" from cman_tool nodes?]
 * [#quorumdisksize What's the minimum size of a quorum disk/partition?]
 * [#quorumdisknodes Is quorum disk/partition reserved for two-node clusters, and if not, how many nodes can it support?]
 * [#quorumdiskonly In a 2 node cluster, what happens if both nodes lose the heartbeat but they can still see the quorum disk?  Don't they still have quorum and cause split-brain?]
 * [#cman_quorum3 If my cluster is mission-critical, can I override quorum rules and have a "last-man-standing" cluster that's still functioning?]
 * [#cman_rejected My cluster won't come up.  It says: kernel: CMAN: Cluster membership rejected. What do I do?]
 * [#cman_grporder Is it a problem if node order isn't the same for all nodes in cman_tool services?]
 * [#cman_tool_leave Why does cman_tool leave say "cman_tool: Can't leave cluster while there are X active subsystems"?]
 * [#cman_tool_services What are these services/subsystems and how do I make sense of what cman_tool services prints?]
 * [#cman_leaving What can cause a node to leave the cluster?]
 * [#cman_hello_timer How do I change the time interval for the heartbeat messages?]
 * [#cman_deadnode_timer How do I change the time after which a non-responsive node is considered dead?]
 * [#split_brain What does "split-brain" mean?]
 * [#cman_heartbeat_nic What's the "right" way to get cman to use a different NIC, say, eth2 rather than eth0?]
 * [#broadcastmulticast Does cluster suite use multicast or broadcast?]
 * [#cman_subnets Is it possible to configure a cluster with nodes running on different networks (subnets)?]
 * [#broadcastmulticast2 How can I configure my RHEL4 cluster to use multicast rather than broadcast?]
 * [#rhel5_cman_wontstart On RHEL5, why do I get "cman not started: Can't bind to local cman socket /usr/sbin/cman_tool"?]
 * [#cman_wont_start_f8 On Fedora 8, CMAN won't start, complaining about "aisexec not started".  How do I fix it?]
 * [#cman_cisco_switches My RHEL5 or similar cluster won't work with my Cisco switch.]
 * [#cman_cisco_switches Some nodes can not see each other; ping works!  Why?]
 * [#cman_cisco_switches Two nodes in different blade frames can not see each other.  Why?]
 * [#cman_hp_switches My RHEL5 or similar cluster won't work with my HP switch.]
 * [#large_clusters I created a large RHEL5 cluster but it falls apart when I boot it.]
 * [#rejoin Killing node mynode01 because it has rejoined the cluster with existing state]
 * [#dirty What is the "Dirty" flag that cman_tool shows, and should I be worried?]
 * [#plea Chrissie's plea to people submitting logs for bug reports]
 * [#cman_name Limitations on cluster names]
==== What is Cluster Manager (cman)? ==== #cman_what
It depends on which version of the code you are running.  Basically, cluster manager is a component of the cluster project that handles communications between nodes in the cluster.

It also handles cluster membership messages, determining when a node enters or leaves the cluster.
==== What does Quorum mean and why is it necessary? ==== #cman_quorum
Quorum is a voting algorithm used by the cluster manager.

Quorum doesn't prevent split-brain situations, but it does decide who is dominant and allowed to function in the cluster.  Should split-brain occur, quorum prevents more than one cluster group from doing anything.
==== How can I define a two-node cluster if a majority is needed to reach quorum? ==== #two_node
We had to allow two-node clusters, so we made a special exception to the quorum rules.  There is a special setting "two_node" in the /etc/cluster/cluster.conf file that looks like this:
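A minimal sketch (assuming the usual attribute names; expected_votes="1" is the typical companion value):

{{{
<cman two_node="1" expected_votes="1"/>
}}}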
     
This will allow one node to be considered enough to establish a quorum.  Note that if you configure a quorum disk/partition, you don't want two_node="1".
==== What is a tie-breaker, and do I need one in two-node clusters? ==== #tie_breaker
Tie-breakers are additional heuristics that allow a cluster partition to decide whether or not it is quorate in the event of an even split, prior to fencing.  A typical tie-breaker construct is an ''IP tie-breaker'', sometimes called a ''ping node''.  With such a tie-breaker, nodes not only monitor each other, but also an upstream router that is on the same path as cluster communications.

A tie-breaker can be useful if you:
 * Have a two-node configuration where fencing is at the fabric level - especially for SCSI reservations
However, if you have a [#two_node_correct correct network & fencing configuration] in your cluster, a tie-breaker only adds complexity, except in pathological cases.

==== If both nodes in a two-node cluster lose contact with each other, don't they try to fence each other? ==== #two_node_dual
They do.  When each node recognizes that the other has stopped responding, it will try to fence the other.  It can be like a gunfight at the O.K. Corral, and the node that's quickest on the draw (first to fence the other) wins.  Unfortunately, both nodes can end up going down simultaneously, losing the whole cluster.

It's possible to avoid this by using a network power switch that serializes the two fencing operations.  That ensures that one node is rebooted and the second never fences the first.  For other configurations, see below.
==== What is the best two-node network & fencing configuration? ==== #two_node_correct
In a two-node cluster (where you are using two_node="1" in the cluster configuration, and without QDisk), there are several considerations you need to be aware of:
If you are using per-node power management of any sort where the device is not shared between cluster nodes, it must be connected to the same network used by CMAN for cluster communication.  Failure to do so can result in both nodes simultaneously fencing each other, leaving the entire cluster dead, or ending up in a [#fence_wars fence loop].  Typically, this includes all integrated power management solutions (iLO, IPMI, RSA, ERA, IBM Blade Center, Egenera Blade Frame, Dell DRAC, etc.), but also includes remote power switches (APC, WTI) if the devices are not shared between the two nodes.

It is best to use power-type fencing.  SAN or SCSI-reservation fencing might work, as long as it meets the above requirements.  If it does not, you should consider using a [#quorum quorum disk or partition].

If you cannot meet the above requirements, you can use a [#quorum quorum disk or partition].

==== What if the fenced node comes back up and still can't contact the other?  Will it corrupt my file system? ==== #fence_wars
The two_node cluster.conf option allows one node to have quorum by itself.  A network partition between the nodes won't result in a corrupt file system, because each node will try to fence the other when it comes up, prior to mounting GFS.

Strangely, if you have a persistent network problem and the fencing device is still accessible to both nodes, this can result in an "A reboots B, B reboots A" fencing loop.
This problem can be worked around by using a [#quorum quorum disk or partition] to [#tie_breaker break the tie], or by using [#two_node_correct a specific network & fencing] configuration.

==== I lost quorum on my six-node cluster, but my remaining three nodes can still write to my GFS volume.  Did you just lie? ==== #cman_quorum2
It's possible to still write to a GFS volume, even without quorum, but ONLY if the three nodes that left the cluster didn't have the GFS volume mounted.  It's not a problem because if a partitioned cluster is ever formed that gains quorum, it will fence the nodes in the inquorate partition before doing anything.

If, on the other hand, nodes failed while they had GFS mounted and quorum was lost, then GFS activity on the remaining nodes will be mostly blocked.  If it's not, it may be a bug.
==== Can I have a mixed cluster with some nodes at RHEL4U1 and some at RHEL4U2? ==== #cman_u1u2
You can't mix RHEL4 U1 and U2 systems in a cluster because there were changes between U1 and U2 that changed the format of internal messages that are sent around the cluster.

Since U2, we now require these messages to be backward-compatible, so mixing U2 and U3 or U3 and U4 shouldn't be a problem.
==== How do I add a third node to my two-node cluster? ==== #cman_2to3
Unfortunately, two-node clusters are a special case.  A two-node cluster needs two nodes to establish quorum, but only one node to maintain quorum.  This special status is set by the "two_node" option in the cman section of cluster.conf.  Unfortunately, this setting can only be reset by shutting down the cluster.  Therefore, the only way to add a third node is to:
     
 * Start the cluster software on the additional node.
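As an illustration only (not the full procedure): once the third node has been added, the cman element should no longer carry the two-node exception.  Assuming one vote per node, it would look something like this:

{{{
<cman two_node="0" expected_votes="3"/>
}}}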
==== I removed a node from cluster.conf but the cluster software and services kept running.  What did I do wrong? ==== #cman_remove
You're supposed to stop the node '''before''' removing it from the cluster.conf.
==== How can I rename my cluster? ==== #cman_rename
Here's the procedure:
     
 * Remount your GFS partitions
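For reference, the cluster name itself is the name attribute on the top-level cluster element in /etc/cluster/cluster.conf; a minimal sketch (the name and config_version values are illustrative):

{{{
<cluster name="new_cluster_name" config_version="2">
  ...
</cluster>
}}}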
==== What's the proper way to shut down my cluster? ==== #cman_shutdown
Halting a single node in the cluster will seem like a communication failure to the other nodes.  Errors will be logged, the fencing code will get called, and so on.  So there's a procedure for properly shutting down a cluster.  Here's what you should do:

Use the "cman_tool leave remove" command before shutting down each node.  That will force the remaining nodes to adjust quorum to accommodate the missing node and not treat it as an error.
Follow these steps:

{{{
for i in rgmanager gfs2 gfs; do service ${i} stop; done
fence_tool leave
cman_tool leave remove
}}}
==== Why does the cman daemon keep shutting down and reconnecting? ==== #cman_mismatch
Additional info: When I try to start cman, I see these messages in /var/log/messages:

This is almost always caused by a mismatch between the kernel and user space CMAN code.  Update the CMAN user tools to fix the problem.
==== I've heard there are issues with using an even/odd number of nodes.  Is it true? ==== #cman_oddnodes
No, it's not true.  There is only one special case: two-node clusters have special rules for determining quorum.  See [#two_node the two-node question] above.
==== What is a quorum disk/partition and what does it do for you? ==== #quorum
A quorum disk or partition is a section of a disk that's set up for use with components of the cluster project.  It has a couple of purposes.  Again, I'll explain with an example.

A node that has lost contact with the network or the quorum disk has lost a vote, and therefore may safely be fenced.
==== Is a quorum disk/partition needed for a two-node cluster? ==== #quorumdiskneeded
In older versions of the Cluster Project, a quorum disk was needed to break ties in a two-node cluster.  Early versions of Red Hat Enterprise Linux 4 (RHEL4) did not have quorum disks, but it was added back as an optional feature in RHEL4U4.

A quorum disk is not required for a two-node cluster, but you may want one:
 * If you have a special requirement to go down from X -> 1 nodes in a single transition.  For example, if you have a 3/1 network partition in a 4-node cluster - here the 1-node partition is the only node which still has network connectivity.  (Generally, the surviving node is not going to be able to handle the load...)
 * If you have a special situation causing a need for a [#tie_breaker tie-breaker] in general.
 * If you have a need to determine node-fitness based on factors which are not handled by CMAN.

In any case, please be aware that use of a quorum disk requires additional configuration information and testing.
==== How do I set up a quorum disk/partition? ==== #quorumdiskhow
The best way to start is to do "man qdisk" and read the qdisk.5 man page.  This has good information about the setup of quorum disks.
Note that if you configure a quorum disk/partition, you don't want two_node="1" or expected_votes="2", since the quorum disk solves the voting imbalance.  You want two_node="0" and expected_votes="3" (or the number of nodes + 1 if it's not a two-node cluster).  However, since 0 is the default value for two_node, you don't need to specify it at all.  If this is an existing two-node cluster and you're changing the two_node value from "1" to "0", you'll have to stop the entire cluster and restart it after the configuration is changed (normally the cluster doesn't have to be stopped and restarted for configuration changes, but two_node is a special case).  Basically, you want something like this in your /etc/cluster/cluster.conf:
{{{
  <quorumd device="/dev/mapper/lun01" votes="1"/>
}}}
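A fuller sketch that puts the attributes described above together (the device path and vote counts are illustrative, for a two-node cluster plus quorum disk):

{{{
<cman two_node="0" expected_votes="3"/>
...
<quorumd device="/dev/mapper/lun01" votes="1"/>
}}}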
Note: You don't have to use a disk or partition to prevent two-node fence-cycles; you can also [#two_node_correct set your cluster up this way].  You can set up a number of different heuristics for the qdisk daemon.  For example, you can set up a redundant NIC with a crossover cable and use ping operations to the local router/switch to break the tie (this is typical, actually, and is called an IP [#tie_breaker tie-breaker]).  A heuristic can be made to check anything, as long as it is a shared resource.
==== Do I really need a shared disk to use QDisk? ==== #quorumdiskneeddisk
Currently, yes.  There have been suggestions to make qdiskd operate in a 'diskless' mode in order to help prevent a fence-race (i.e. prevent a node from attempting to fence another node), but no work has been done in this area (yet).
==== Are the quorum disk votes reported in "Total_votes" from cman_tool nodes? ==== #quorumdiskvotes
Yes.  If the quorum disk is registered correctly with cman, you should see the votes it contributes, and also its "node name", in cman_tool nodes.
==== What's the minimum size of a quorum disk/partition? ==== #quorumdisksize
The official answer is 10MB.  The real number is something like 100KB, but we'd like to reserve 10MB for possible future expansion and features.
==== Is quorum disk/partition reserved for two-node clusters, and if not, how many nodes can it support? ==== #quorumdisknodes
Currently a quorum disk/partition may be used in clusters of up to 16 nodes.
==== In a 2 node cluster, what happens if both nodes lose the heartbeat but they can still see the quorum disk?  Don't they still have quorum and cause split-brain? ==== #quorumdiskonly
First of all, no, they don't cause split-brain.  As soon as heartbeat contact is lost, both nodes will realize something is wrong and lock GFS until it gets resolved and someone is fenced.

What actually happens depends on the configuration and the heuristics you build.  The qdisk code allows you to build non-cluster heuristics to determine the fitness of each node beyond the heartbeat.  With the heuristics in place, you can, for example, allow the node running a specific service to have priority over the other node.  It's a way of saying "This node should win any tie" in case of a heartbeat failure.  The winner fences the loser.
If both nodes still have a majority score according to their heuristics, then both nodes will try to fence each other, and the fastest node kills the other.  [#two_node_dual Showdown at the Cluster Corral].  The remaining node will have quorum along with the qdisk, and GFS will run normally under that node.  When the "loser" reboots, unlike with a cman operation, it will not become quorate with just the quorum disk/partition, so it cannot cause split-brain that way either.

At this point (4-Apr-2007), if there are no heuristics defined whatsoever, the QDisk master node wins (and fences the non-master node).
==== If my cluster is mission-critical, can I override quorum rules and have a "last-man-standing" cluster that's still functioning? ==== #cman_quorum3
This may not be a good idea in most cases because of the dangers of split-brain, but there is a way you can do this: you can adjust the "votes" for the quorum disk to be equal to the number of nodes in the cluster, minus 1.

For example, if you have a four-node cluster, you can set the quorum disk votes to 3 and expected_votes to 7.  That way, even if three of the four nodes die, the remaining node may still function.  That's because the quorum disk's 3 votes plus the remaining node's 1 vote makes a total of 4 votes out of 7, which is enough to establish quorum.  Additionally, all of the nodes can be online but not the qdiskd (which you might need to take down for maintenance or reconfiguration), and quorum is still maintained.
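A sketch of that four-node example (the quorum disk device path is illustrative):

{{{
<cman expected_votes="7"/>
...
<quorumd device="/dev/mapper/qdisk-lun" votes="3"/>
}}}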
==== My cluster won't come up.  It says: kernel: CMAN: Cluster membership rejected. What do I do? ==== #cman_rejected
One or more of the nodes in your cluster is rejecting the membership of this node.  Check the syslog (/var/log/messages) on all remaining nodes in the cluster for messages regarding why the membership was rejected.

It's because node E still thinks it's part of the cluster and still has a claim on the cluster name.  You still need to shut down the cluster software on E, or else reboot it, before the correct nodes can form a cluster.
==== Is it a problem if node order isn't the same for all nodes in cman_tool services? ==== #cman_grporder
No, this isn't a problem and can be ignored.  Some nodes may report [1 2 3 4 5] while others report a different order, like [4 3 5 2 1].  This merely has to do with the order in which cman join messages are received.
==== Why does cman_tool leave say "cman_tool: Can't leave cluster while there are X active subsystems"? ==== #cman_tool_leave
This message indicates that you tried to leave the cluster from a node that still has active cluster resources, such as mounted GFS file systems.

A node cannot leave the cluster if there are subsystems (e.g. DLM, GFS, rgmanager) active.  You should unmount all GFS filesystems, stop the rgmanager service, stop the clvmd service, stop fenced and anything else using the cluster manager before using cman_tool leave.  You can use cman_tool status and cman_tool services to see how many (and which) services are running.
==== What are these services/subsystems and how do I make sense of what cman_tool services prints? ==== #cman_tool_services
Although this may be an over-simplification, you can think of the services as a big membership roster for different special interest groups or clubs.  Each "service-name" pair corresponds to access to a unique resource, and each node corresponds to a voting member in the club.

The "state" of each service corresponds to its status in the group: "run" means it's a normal member.  There are also states corresponding to joining the group, leaving the group, recovering its locks, etc.
==== What can cause a node to leave the cluster? ==== #cman_leaving
A node may leave the cluster for many reasons.  Among them:

 * No response to messages: This usually happens during a state transition to add or remove another node from a group.  The reporting node sent a message five times (by default) to the named node and did not get a response.
==== How do I change the time interval for the heartbeat messages? ==== #cman_hello_timer
Just add hello_timer="value" to the cman section in your cluster.conf file.  For example:
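A sketch (the value is in seconds; 10 is purely illustrative):

{{{
<cman hello_timer="10"/>
}}}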
     
The default value is 5 seconds.
==== How do I change the time after which a non-responsive node is considered dead? ==== #cman_deadnode_timer
 * For RHEL4 and STABLE branches: Just add deadnode_timeout="value" to the cman section in your cluster.conf file (see the sketch below).
 * For RHEL5 and STABLE2 branches: the equivalent setting is the token timeout in the totem section of cluster.conf, given in milliseconds.
  . The default value is 10000 milliseconds (or 10 seconds).  ''It is important to change this value if you are using QDisk on RHEL5/STABLE2; 21000 should work if you left QDiskd's interval/tko at their default values.''
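A sketch of both forms (deadnode_timeout is given in seconds on RHEL4, the totem token in milliseconds; the values shown are illustrative):

{{{
<!-- RHEL4 / STABLE -->
<cman deadnode_timeout="21"/>

<!-- RHEL5 / STABLE2 -->
<totem token="21000"/>
}}}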
==== What does "split-brain" mean? ==== #split_brain
"Split brain" is a condition whereby two or more computers or groups of computers lose contact with one another but still act as if the cluster were intact.  This is like having two governments trying to rule the same country.  If multiple computers are allowed to write to the same file system without knowledge of what the other nodes are doing, it will quickly lead to data corruption and other serious problems.

Split-brain is prevented by enforcing quorum rules (which say that no group of nodes may operate unless they are in contact with a majority of all nodes) and fencing (which makes sure nodes outside of the quorum are prevented from interfering with the cluster).
==== What's the "right" way to get cman to use a different NIC, say, eth2 rather than eth0? ==== #cman_heartbeat_nic
There are several reasons for doing this.  First, you may want the cman heartbeat messages to be on a dedicated network so that a heavily used network doesn't cause heartbeat messages to be missed (and nodes in your cluster to be fenced).  Second, you may have security reasons for wanting to keep these messages off of an Internet-facing network.

The usual approach is to give each node an additional name that resolves to its address on the private network, for example in /etc/hosts:

{{{
192.168.0.1     node-01-p
}}}
Once you've done this, you need to make sure that your cluster.conf uses the name with the -p suffix rather than the old name.  Note that -p is just a suggestion; for names you could use -internal or anything else, really.
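As a sketch, the clusternode entries then reference those private names (the node name and vote count here are illustrative):

{{{
<clusternode name="node-01-p" votes="1">
  ...
</clusternode>
}}}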
If you're using RHEL4.4 or above, or 5.1 or above, that's all you need to do.  There is code in cman to look at all the active network interfaces on the node and find the one that corresponds to the entry in cluster.conf.  Note that this only works on IPv4 interfaces.
==== Does cluster suite use multicast or broadcast? ==== #broadcastmulticast
By default, the older cluster infrastructure (RHEL4, STABLE and so on) uses broadcast.  By default, the newer cluster infrastructure with openais (RHEL5, HEAD and so on) uses multicast.  You can [#broadcastmulticast2 configure a RHEL4 cluster to use multicast rather than broadcast].  However, you can't switch openais to use broadcast.

==== Is it possible to configure a cluster with nodes running on different networks (subnets)? ==== #cman_subnets
Yes, it is.  If you configure the cluster to use [#broadcastmulticast2 multicast rather than broadcast] (there is an option for this in system-config-cluster), then the nodes can be on different subnets.

Be careful that any switches and/or routers between the nodes are of good specification and are set to pass multicast traffic, though.
==== How can I configure my RHEL4 cluster to use multicast rather than broadcast? ==== #broadcastmulticast2
Put something like this in your cluster.conf file:

{{{
</clusternode>
}}}
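A fuller sketch of the usual shape (the multicast address and interface are illustrative; on RHEL4 a multicast element typically appears under cman and again under each clusternode to bind it to an interface):

{{{
<cman>
  <multicast addr="239.192.0.1"/>
</cman>
<clusternode name="node-01" votes="1">
  <multicast addr="239.192.0.1" interface="eth0"/>
</clusternode>
}}}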
==== On RHEL5, why do I get "cman not started: Can't bind to local cman socket /usr/sbin/cman_tool"? ==== #rhel5_cman_wontstart
There is currently a known problem with RHEL5 whereby system-config-cluster is trying to improperly access /usr/sbin/cman_tool (cman_tool currently resides in /sbin).  We'll correct the problem, but in the meantime you can work around it by creating a symlink from /sbin/cman_tool into /usr/sbin/.  For example:
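A sketch of the symlink command (run as root):

{{{
ln -s /sbin/cman_tool /usr/sbin/cman_tool
}}}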
     
Check that /var/run is writable and able to hold Unix domain sockets.
==== On Fedora 8, CMAN won't start, complaining about "aisexec not started".  How do I fix it? ==== #cman_wont_start_f8
On Fedora 8 and other distributions where the core supports multiple architectures (e.g. x86, x86_64), you must have a matched set of packages installed.  A cman package for x86_64 will not work with an x86 (i386/i686) openais package, and vice versa.  To see if you have a mixed set, run:
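One way to check, as a sketch (standard rpm query-format tags; the package names are the ones discussed above):

{{{
rpm -q --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' cman openais
}}}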
     
Note: If you were having trouble getting things up, there's a chance that an old aisexec process might be running on one of the nodes; make sure you kill it before trying to start again!
==== My RHEL5 or similar cluster won't work with my Cisco switch. ==== #cman_cisco_switches
'''Some nodes can not see each other; ping works!  Why?'''

Since openais uses multicast for cluster communications, you may have to enable it in the switch in order to use the cluster software.
Before making any changes to your Cisco switches it is advisable to contact your Cisco TAC to ensure the changes will have no negative consequences in your network.  Please visit this page for more information: [http://www.openais.org/doku.php?id=faq:cisco_switches OpenAIS - Cisco Switches]

===== Solution #2: Work around the switch =====
     
{{{
<cman ... >
  <multicast addr="225.0.0.13" />
</cman>
}}}
The address range '''225.0.0.x''' is known to work in some environments when the standard openais multicast address does not.  Note, however, that these addresses lie within a reserved range of multicast addresses and may not be suitable for use in the future:
{{{
225.000.000.000-231.255.255.255 Reserved                     [IANA]
}}}
Source: [http://www.iana.org/assignments/multicast-addresses IANA Multicast Addresses]
==== My RHEL5 or similar cluster won't work with my HP switch. ==== #cman_hp_switches
Some HP servers and switches do not play well together when using Linux.  More information, and a workaround, is available [http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c01843037&lang=en&cc=us&taskId=101&prodSeriesId=3355016&prodTypeId=12454 here].
==== I created a large RHEL5 cluster but it falls apart when I boot it. ==== #large_clusters
The default parameters for a RHEL5 cluster are usually enough to get a small to medium size cluster running, say up to around 16 nodes.

These numbers are not definitive and might not work perfectly at your site.  Other variables such as network and host load come into play.  But they should, I hope, be a good starting point for people wanting to run larger RHEL5 clusters.
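As a general sketch, the knobs usually involved for larger RHEL5 clusters are the openais totem settings in cluster.conf, for example the token timeout (the number below is purely illustrative, not a recommendation):

{{{
<totem token="30000"/>
}}}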
==== [MAIN ] Killing node mynode01 because it has rejoined the cluster with existing state ==== #rejoin
What this message means is that a node was a valid member of the cluster once; it then left the cluster (without being fenced) and rejoined automatically.  This can sometimes happen if the ethernet is disconnected for a time, usually a few seconds.

Another (more common) cause of this is the slow responsiveness of some Cisco switches, as documented above.
==== What is the "Dirty" flag that cman_tool shows, and should I be worried? ==== #dirty
The short answer is "No, you should not be worried".
All this flag indicates is that there are cman daemons running that have state which cannot be rebuilt without a node reboot.  This can be as simple (in concept!) as a DLM lockspace or a fence domain.  When a cluster has state, the dirty flag is set (it cannot be reset), and this prevents two stateful clusters merging, as the two states cannot be reconciled.  In some cases this can cause the message shown [#rejoin above].  Many daemons can set this flag, e.g. fence_tool join will set it, via fenced, as will clvmd (because it instantiates a lockspace).  Think of it as a "we have some daemons running" flag, if you like!

The main reason for the flag is to prevent state corruption where the cluster is evenly split (so that fencing cannot occur) and tries to merge back again.  Neither side of the cluster knows if the other side's state has changed, and there is no mechanism for performing a state merge.  So one side gets marked disallowed or is fenced, depending on quorum.  Fencing can only be done by a quorate partition.

This flag has been renamed to "HaveState" in STABLE3, so as to panic people less.  In general most users can ignore this flag.
==== Chrissie's plea to people submitting logs for bug reports ==== #plea
Please, please *always* attach full logs.  I'd much rather have 2GB of log files to wade through than 1K of truncated logs that don't show what I'm looking for.

I'm very good at filtering log files; it's my job and I've been doing it for a very long time now!  And it's quite possible that I might spot something important that looks insignificant to you.
==== Cluster name limitations ==== #cman_name
 * The name can be at most 15 non-NUL (ASCII 0) characters.
 * You can use the 'alias' attribute to make a more descriptive name.