Using Heartbeat on ClearOS [ClearOS Documentation]

Using Heartbeat on ClearOS

This guide reviews the Heartbeat stack of services and covers the basics of implementation. Heartbeat is a set of services designed to provide fail-over capability for services under Linux and ClearOS. This open source software allows a backup server to take over services if the primary server fails.

Concepts

This guide uses the following high-availability terms:

Primary: this server is the default server
Secondary: the server providing redundancy if the primary fails
Master: the server or service containing the authoritative data
Slave: the server or service containing data replicated from the master
Split brain: a condition for data and services in a cluster where multiple servers believe that they are authoritatively in control. This condition can lead to corrupted data and requires intervention and recovery.
STONITH (Shoot The Other Node In The Head): a concept in server redundancy where a node is capable of physically preventing another server from providing data or services by interrupting its power or connectivity.
heartbeat: a communication or signalling method between peers in a cluster to inform the other members of that cluster as to the well-being of the node.

Prerequisites

You will need to decide how the heartbeat will communicate. Common methods are a serial link or a dedicated NIC. You should NEVER use the same NIC that carries the resources you are clustering.

To ease cluster administration we recommend that you unify the SSH key infrastructure so that you can make remote calls and copy files between servers over their shared network.
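
One way to set up the shared keys is sketched below. It assumes OpenSSH's ssh-keygen and ssh-copy-id are available, that you administer the nodes as root, and that the peer is reachable as firewall2.domain.lan (the second node name used in this guide's ha.cf example). The function names and the PEER and KEY variables are illustrative. Note that the key is created without a passphrase, which is convenient for scripted copies but should be weighed against your security policy.

```shell
#!/bin/sh
# Sketch only: PEER and KEY are illustrative defaults, adjust for your cluster.
PEER="${PEER:-firewall2.domain.lan}"
KEY="${KEY:-$HOME/.ssh/id_rsa}"

# Create an RSA key pair unless one already exists at $KEY.
make_key() {
    mkdir -p -m 700 "$(dirname "$KEY")" || return 1
    [ -f "$KEY" ] || ssh-keygen -q -t rsa -b 4096 -N "" -f "$KEY"
}

# Append the public key to root's authorized_keys on the peer.
# This prompts for the peer's root password once.
push_key() {
    ssh-copy-id -i "$KEY.pub" "root@$PEER"
}

# Typical use, run on each node with PEER set to the other node:
#   make_key && push_key
```

After both nodes have exchanged keys, `ssh root@firewall2.domain.lan uname -n` should return without a password prompt.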

Installation

yum --enablerepo=clearos-epel,clearos-core install heartbeat

Configuration

Three files need to be configured for basic heartbeat functionality:

ha.cf: main configuration file
haresources: resource configuration file
authkeys: authentication information

These files need to be created in /etc/ha.d/.

ha.cf

The ha.cf file determines how the cluster communicates. An example file is here:

## /etc/ha.d/ha.cf 
## This configuration is to be the same on both machines
 
keepalive 2
deadtime 10
warntime 5
initdead 15
serial /dev/ttyS0
baud 19200
auto_failback on
node firewall1.domain.lan
node firewall2.domain.lan

In this cluster the servers communicate over a serial rollover cable on COM1 (/dev/ttyS0). You can use Ethernet as well.
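
If you prefer a dedicated NIC to a serial cable, the serial and baud lines can be swapped for one of Heartbeat's network directives. The sketch below writes an Ethernet variant of the example to a scratch file for review rather than touching /etc/ha.d/ha.cf. The interface name eth1, the output path, and the choice of bcast are assumptions; 694 is the UDP port Heartbeat normally uses.

```shell
#!/bin/sh
# Sketch only: eth1 and the output path are assumptions, not values from this guide.
OUT="${OUT:-./ha.cf.ethernet}"

cat > "$OUT" <<'EOF'
## Ethernet variant of /etc/ha.d/ha.cf (same timing values as the serial example)
keepalive 2
deadtime 10
warntime 5
initdead 15
udpport 694
bcast eth1
auto_failback on
node firewall1.domain.lan
node firewall2.domain.lan
EOF

echo "Wrote $OUT; review it before copying it to /etc/ha.d/ha.cf"
```

A ucast line, which names the interface and the peer's address, is an alternative to bcast when only two nodes share the link.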

It is VITAL that the node names are correct. To determine the node name for the server you are logged in to, run the following:

uname -n
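
Because the names must match exactly, it can be worth checking the output of uname -n against ha.cf automatically. The helper below is a minimal sketch: the function name node_listed is hypothetical, and it assumes a node directive may list one or more names on a line.

```shell
#!/bin/sh
# Succeed if the given ha.cf (default /etc/ha.d/ha.cf) has a "node" line
# that lists this machine's `uname -n` output exactly.
node_listed() {
    cf="${1:-/etc/ha.d/ha.cf}"
    awk -v n="$(uname -n)" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }
    ' "$cf"
}

# Typical use:
#   node_listed || echo "$(uname -n) is not listed in ha.cf"
```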

For the most part, this file should be identical on ALL nodes in the cluster. The exception is settings for resources, such as communication devices, that differ between servers.

For specific information, refer to the manual.

haresources

The haresources file defines which resources are clustered. It also names the node that will be the primary node. Following the node name, services, IP addresses, and other resources are listed in the order in which they should start. When a node stops, its resources are stopped in the reverse of the listed order.

This file should be identical on all nodes in the cluster.

Here is an example of an haresources file:

firewall1.domain.lan 192.168.5.3 bypassd smbd nmbd dnsmasq
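
To make the ordering concrete, the following sketch splits the example line into its primary node, start order, and stop order. It is an illustration only: it handles a single whitespace-separated line and ignores the rest of Heartbeat's haresources syntax, such as resource arguments. The variable names are made up for this example.

```shell
#!/bin/sh
# The example haresources line from this guide.
line='firewall1.domain.lan 192.168.5.3 bypassd smbd nmbd dnsmasq'

set -- $line            # split the line on whitespace
primary=$1              # first field: the primary node
shift                   # remaining fields: resources, in start order
start_order="$*"

# Build the stop order by prepending each resource in turn.
stop_order=''
for res in "$@"; do
    stop_order="$res${stop_order:+ $stop_order}"
done

echo "primary node: $primary"
echo "start order:  $start_order"
echo "stop order:   $stop_order"
```

With the example line, the IP address 192.168.5.3 comes up first and is released last.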

For details on this file, refer to the manual.

authkeys

The authkeys file is simple and will look something like this:

auth 1
1 sha1 0123456789abcdef0123456789abcdef

You can generate this file by running the following from the command line:

cat > /etc/ha.d/authkeys <<'AUTH'
# Automatically generated authkeys file
auth 1
AUTH
# Append key 1: 32 hex characters derived from random data
echo "1 sha1 $(dd if=/dev/urandom count=4 2>/dev/null | md5sum | cut -c1-32)" >> /etc/ha.d/authkeys
chmod 600 /etc/ha.d/authkeys
echo
echo "New authkeys file generated."
echo "Ensure the following is distributed to all nodes in the cluster (/etc/ha.d/authkeys):"
echo
cat /etc/ha.d/authkeys
echo

This file must exist on all nodes in the cluster. Either copy the file to /etc/ha.d/ on those servers or create the file and copy the content verbatim.
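
The copy step can be sketched as follows, assuming root SSH access between the nodes and firewall2.domain.lan as the peer (from the ha.cf example above). The function names and the PEER variable are hypothetical. check_authkeys performs only a basic sanity check: an auth line, a key line with the matching number, and mode 600, since Heartbeat expects the file to be accessible by root only.

```shell
#!/bin/sh
# Sketch only: PEER and the function names are illustrative.
PEER="${PEER:-firewall2.domain.lan}"

# Basic sanity check of an authkeys file: an "auth N" line, a key line
# numbered N, and permissions of exactly rw-------.
check_authkeys() {
    f="${1:-/etc/ha.d/authkeys}"
    idx=$(awk '$1 == "auth" { print $2; exit }' "$f") || return 1
    [ -n "$idx" ] || return 1
    awk -v i="$idx" '$1 == i && NF >= 2 { ok = 1 } END { exit !ok }' "$f" || return 1
    [ "$(ls -l "$f" | cut -c1-10)" = "-rw-------" ]
}

# Copy the local file to the peer, preserving its mode, then re-apply 600.
push_authkeys() {
    check_authkeys /etc/ha.d/authkeys || return 1
    scp -p /etc/ha.d/authkeys "root@$PEER:/etc/ha.d/authkeys" &&
        ssh "root@$PEER" 'chmod 600 /etc/ha.d/authkeys'
}

# Typical use on the node where the file was generated:
#   push_authkeys
```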
