
Resource Trees - Basics / Definitions

    <service name="foo" ...>
        <fs name="myfs" ...>
            <script name="script_child"/>
        </fs>
        <ip address="10.1.1.2" .../>
    </service>
  • Resource trees are XML representations of resources, their attributes, and their parent/child and sibling relationships. The root of a resource tree is almost always a special type of resource called a service; for that reason, resource tree, resource group, and service are used interchangeably on this wiki. From rgmanager's perspective, a resource tree is an atomic unit: all components of a resource tree are started on the same cluster node.
  • fs:myfs and ip:10.1.1.2 are siblings
  • fs:myfs is the parent of script:script_child
  • script:script_child is the child of fs:myfs

Parent / Child Relationships, Dependencies & Start Ordering

The rules for parent/child relationships in the resource tree are fairly simple:

  • Parents are started before children
  • Children must all stop (cleanly) before a parent may be stopped
  • From these two rules, it follows that a child resource is dependent on its parent resource (see the ordering sketch after this list)
  • In order for a resource to be considered in good health, all of its dependent children must also be in good health
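
Applied to the example tree at the top of this page, the rules yield the following ordering (a sketch; the relative order of the fs and ip siblings is determined by the child-type ordering described in the next section):

    Start: fs:myfs -> script:script_child -> ip:10.1.1.2
    Stop:  ip:10.1.1.2 -> script:script_child -> fs:myfs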

Sibling Start Ordering & Resource Child Ordering

RGManager allows specification of a start/stop ordering relationship for classes of child resources. At the top level, we have the service resource - a special resource which acts as the link between rgmanager's service placement and dependency handling on one side and the real resources themselves on the other. That is, the service resource is a container for other resources. Let's examine the service resource's defined child ordering:

    <special tag="rgmanager">
        <attributes root="1" maxinstances="1"/>
        <child type="lvm" start="1" stop="9"/>
        <child type="fs" start="2" stop="8"/>
        <child type="clusterfs" start="3" stop="7"/>
        <child type="netfs" start="4" stop="6"/>
        <child type="nfsexport" start="5" stop="5"/>
        <child type="nfsclient" start="6" stop="4"/>
        <child type="ip" start="7" stop="2"/>
        <child type="smb" start="8" stop="3"/>
        <child type="script" start="9" stop="1"/>
    </special>

The start attribute is the order (1..100) in which that class of resources is started. This means, as you have already guessed, that all lvm children are started first, followed by all fs children, and so on in order of their start values, with all script children started last. The stop attribute works the same way for stopping: script children (stop="1") are stopped first, and lvm children (stop="9") last. Ordering within a given resource type is preserved as it exists in cluster.conf. For example, consider the following service:

    <service name="foo">
        <script name="1" .../>
        <lvm name="1" .../>
        <ip address="10.1.1.1" .../>
        <fs name="1" .../>
        <lvm name="2" .../>
    </service>
  • The start ordering would be:
      lvm:1            # All lvms are started first... 
      lvm:2            #
      fs:1             # then file systems...
      ip:10.1.1.1      # then ip addresses...
      script:1         # finally, scripts.
    
  • The stop ordering would be:
      script:1
      ip:10.1.1.1
      fs:1
      lvm:2
      lvm:1
    

In addition to type-specified children, a resource may have untyped children - children of a given resource node which do not have a <child> definition in the resource agent metadata. Untyped children are started according to their order in cluster.conf and stopped in reverse order. They are started after all type-specified children and stopped before any typed children.

For example:

    <service name="foo">
        <script name="1" .../>
        <untypedresource name="foo"/>
        <lvm name="1" .../>
        <untypedresourcetwo name="bar"/>
        <ip address="10.1.1.1" .../>
        <fs name="1" .../>
        <lvm name="2" .../>
    </service>
  • The start ordering would be:
      lvm:1
      lvm:2
      fs:1
      ip:10.1.1.1
      script:1
      untypedresource:foo
      untypedresourcetwo:bar
    
  • The stop ordering would be:
      untypedresourcetwo:bar
      untypedresource:foo
      script:1
      ip:10.1.1.1
      fs:1
      lvm:2
      lvm:1
    

Inheritance, the <resources> Block, and Reusing Resources

Some resources benefit from inheriting values from a parent resource. The most common, practical example I can give you is an NFS service. Here's a typical NFS service configuration, set up for resource reuse and inheritance:

    <resources>
        <nfsclient name="bob" target="bob.test.com" options="rw,no_root_squash"/>
        <nfsclient name="jim" target="jim.test.com" options="rw,no_root_squash"/>
        <nfsexport name="exports"/>
    </resources>
    <service name="foo">
        <fs name="1" mountpoint="/mnt/foo" device="/dev/sdb1" fsid="12344">
            <nfsexport ref="exports">  <!-- nfsexport's path and fsid attributes
                                            are inherited from the mountpoint &
                                            fsid attribute of the parent fs 
                                            resource -->
                <nfsclient ref="bob"/> <!-- nfsclient's path is inherited from the
                                            mountpoint and the fsid is added to the
                                            options string during export -->
                <nfsclient ref="jim"/>
            </nfsexport>
        </fs>
        <fs name="2" mountpoint="/mnt/bar" device="/dev/sdb2" fsid="12345">
            <nfsexport ref="exports">
                <nfsclient ref="bob"/> <!-- Because all of the critical data for this
                                            resource is either defined in the 
                                            resources block or inherited, we can
                                            reference it again! -->
                <nfsclient ref="jim"/>
            </nfsexport>
        </fs>
        <ip address="10.2.13.20"/>
    </service>

If we were to have a flat service (a service with no parent/child relationships), there are a couple of things that would be needed:

  • We'd need four nfsclient resources - one per file system (2) * one per target machine (2) = 4
  • We would have to specify the export path & file system ID for each nfsclient, which introduces a greater chance for an error in the configuration (see the sketch after this list).
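
For comparison, here is a sketch of what that flat version might look like (the nfsclient names such as bob-foo are made up for illustration, and the explicit path/fsid attributes simply repeat the values that were inherited above):

    <service name="foo">
        <fs name="1" mountpoint="/mnt/foo" device="/dev/sdb1" fsid="12344"/>
        <fs name="2" mountpoint="/mnt/bar" device="/dev/sdb2" fsid="12345"/>
        <nfsexport name="exports"/>
        <nfsclient name="bob-foo" target="bob.test.com" path="/mnt/foo" fsid="12344" options="rw,no_root_squash"/>
        <nfsclient name="bob-bar" target="bob.test.com" path="/mnt/bar" fsid="12345" options="rw,no_root_squash"/>
        <nfsclient name="jim-foo" target="jim.test.com" path="/mnt/foo" fsid="12344" options="rw,no_root_squash"/>
        <nfsclient name="jim-bar" target="jim.test.com" path="/mnt/bar" fsid="12345" options="rw,no_root_squash"/>
        <ip address="10.2.13.20"/>
    </service>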

With the above configuration, however, the resources named nfsclient:bob and nfsclient:jim are defined once, as is nfsexport:exports. Everything those resources need to know is inherited. Because the inherited attributes are dynamic (and do not conflict with one another), it is possible to reuse these resources - which is why they are defined in the resources block. Some resources cannot be used in multiple places (e.g. fs resources - mounting a file system on 2 nodes is a bad idea!), but there's no harm in defining them in the resources block if that is your preference, as sketched below.
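
For instance, even a single-use resource like an fs can be declared in the resources block and pulled into the service by reference (a sketch, using the same ref= syntax as the nfsclient examples above):

    <resources>
        <fs name="1" mountpoint="/mnt/foo" device="/dev/sdb1" fsid="12344"/>
    </resources>
    <service name="foo">
        <fs ref="1">
            <nfsexport ref="exports"/>
        </fs>
    </service>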

Customizing Resource Actions

See ResourceActions

Failure Recovery & Independent Subtrees

When a start operation fails for any resource, the operation immediately fails for the whole service and no additional resource start operations are attempted.

When a stop operation fails for any resource, rgmanager still attempts to stop all other remaining resources in the service, even when the expected result is another stop failure. This is done in order to stop as many running resources as possible before placing the service into the failed state.

Independent Subtrees

When a status check fails for a resource, the normal course of action is to restart the whole service. Suppose we have the following service:

    <service name="foo">
        <script name="script_one" ...>
            <script name="script_two" .../>
            <script name="script_three" .../>
        </script>
        <script name="script_four" .../>
    </service>

If any of the scripts defined in this service fail, the normal course of action is to restart (or relocate/disable, according to the service recovery policy) the service. What if, however, we wanted parts of the service to be considered non-critical? What if we wanted to restart only part of the service in place - before attempting normal recovery? The solution is what we call the __independent_subtree attribute. It's used in the following way:

    <service name="foo">
        <script name="script_one" __independent_subtree="1" ...>
            <script name="script_two" __independent_subtree="1" .../>
            <script name="script_three" .../>
        </script>
        <script name="script_four" .../>
    </service>
  • If script:script_one fails, we restart script:script_two, script:script_three and script:script_one
  • If script:script_two fails, we restart just script:script_two
  • If script:script_three fails, we restart script:script_one, script:script_two, and script:script_three
  • If script:script_four fails, we restart the whole service

If an independent subtree is successfully restarted, rgmanager performs no other recovery actions.

Independent subtrees may also have per-subtree restart counters, similar to service restart counters. They are declared by adding __max_restarts and __restart_expire_time to a given __independent_subtree declaration, as sketched below. If a subtree's restart counters are exceeded, the service goes into recovery. Otherwise, successful restarts of an independent subtree are not considered errors.
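
A minimal sketch of such a declaration (the counter values are illustrative, and __restart_expire_time is assumed here to be expressed in seconds):

    <service name="foo">
        <script name="script_one" __independent_subtree="1"
                __max_restarts="3" __restart_expire_time="300" .../>
    </service>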

Non-Critical Subtrees

What if you want a subset of resources to be considered non-critical? For example, suppose script:script_one and its children are far less important than script:script_four, and we want to keep script:script_four up even if the others fail. You can tag a subtree in rgmanager not only as independent, but also as non-critical. This is done by setting the __independent_subtree attribute to 2:

    <service name="foo">
        <script name="script_one" __independent_subtree="2" ...>
            <script name="script_two" __independent_subtree="1" .../>
            <script name="script_three" .../>
        </script>
        <script name="script_four" .../>
    </service>
  • If script:script_one fails, we stop script:script_two, script:script_three and script:script_one
  • If script:script_two fails, we restart just script:script_two (Notice how script_two is a normal independent subtree!)
  • If script:script_three fails, we stop script:script_one, script:script_two, and script:script_three
  • If script:script_four fails, we restart the whole service

A non-critical subtree is immediately stopped if an error occurs at any level of the subtree and __max_restarts or __restart_expire_time are unset.

Whenever a non-critical subtree's maximum restart threshold is exceeded, the subtree is stopped, and the service gains a P flag (partial). It is possible to restore a service to full operation by using the clusvcadm -c (convalesce) operation, as sketched below.
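
For instance, assuming the partial service is named foo as in the earlier examples:

    clusvcadm -c foo     # attempt to restart foo's stopped non-critical
                         # resources; the P flag is cleared on success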

Testing your Configuration

We provide a utility for debugging/testing services and resource ordering called rg_test. rg_test can:

  • Show you the resource rules it understands:
    rg_test rules
    
  • Test your configuration (and /usr/share/cluster) for errors or redundant resource agents:
    rg_test test /etc/cluster/cluster.conf
    
  • Show you the start/stop ordering of a given service:
    rg_test noop /etc/cluster/cluster.conf start service <servicename>
    rg_test noop /etc/cluster/cluster.conf stop service <servicename>
    
  • Explicitly start/stop a service (NOTE: Only do this on one node, and always disable the service in rgmanager first!). This is useful for debugging configurations or looking for errors before putting a service into production:
    rg_test test /etc/cluster/cluster.conf start service <servicename>
    rg_test test /etc/cluster/cluster.conf stop service <servicename>
    
  • Explicitly start/stop a resource (NOTE: Only do this on one node, and always disable the parent service in rgmanager first! Also, this does NOT start the rest of the service(s) which reference this resource; only the resource itself):
    rg_test test /etc/cluster/cluster.conf start <resource_type> <primary_attribute>
    rg_test test /etc/cluster/cluster.conf stop <resource_type> <primary_attribute>
    
  • Calculate and display the resource tree delta between two cluster.confs:
    rg_test delta /etc/cluster/cluster.conf.bak /etc/cluster/cluster.conf