DRBD

DRBD is a data replication engine that works well in redundant ClearBOX configurations. With DRBD you can replicate the data storage so that in the event of a failover, data from the volumes will be available to the redundant server.

This page contains instructions for installing DRBD on ClearOS 5.x. For instructions on installing DRBD on ClearOS 6.x, click here.

Getting started

Hardware

ClearBOX 300 is an appropriate platform for DRBD because it has sufficient network interfaces to handle data replication and the heartbeat engine required for failover. For greater storage capacity, use ClearBOX 400 or contact ClearCenter for additional hardware options.

At a minimum, you will need a connection for each of the following:

DRBD connection requirements
LAN-facing interface. This is where you will offer services.
Replication interface. This should be a dedicated connection used for data transfer only.
Heartbeat interface. This supports the heartbeat services and can be a NIC or a serial interface.

You will need to designate one box as your primary and a separate box as your secondary. The disk(s) of the secondary box must meet or exceed the performance of the primary's so that replication does not bottleneck at the secondary.

Software

You will need to install both the DRBD software packages and a DRBD kernel module appropriate for your ClearOS kernel. Please contact ClearCenter support for this software.

Determine whether you are using the PAE kernel or the standard kernel by running the following:

uname -a
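
If "PAE" does not appear in the output, you are on the standard kernel. As a quick one-line check (a sketch using standard shell tools):

uname -r | grep -q PAE && echo "PAE kernel" || echo "standard kernel"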

If you are running the PAE kernel, install DRBD by running the following in the directory where the software is located:

rpm -Uvh drbd83-8.3.12-2.v5.i386.rpm kmod-drbd83-PAE-8.3.12-1.v5.i686.rpm

If you are using the standard (non-PAE) kernel, run the following in the directory where the software is located:

rpm -Uvh drbd83-8.3.12-2.v5.i386.rpm kmod-drbd83-8.3.12-1.v5.i686.rpm

Once this is installed, you will be able to set up DRBD as described in the next section.

ClearBOX

If you are using ClearBOX, you will probably want to decouple the /store/data0 partition and reuse it for your data. Be sure to back up this partition before performing the following steps.

Stop the following services and prevent them from starting automatically (we will add these services to our high-availability daemon instead):

service httpd stop
service smb stop
service nmb stop
service cyrus-imapd stop
service mysqld stop
chkconfig --level 2345 httpd off
chkconfig --level 2345 smb off
chkconfig --level 2345 nmb off
chkconfig --level 2345 cyrus-imapd off
chkconfig --level 2345 mysqld off
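
To confirm the services are stopped and disabled, you can review their runlevel settings (a quick sanity check):

chkconfig --list | egrep 'httpd|smb|nmb|cyrus-imapd|mysqld'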

Unmount the bind mounts (you must be logged in directly as root, not via su):

umount /home
umount /var/spool/imap
umount /var/lib/mysql
umount /root/support
umount /var/samba/drivers
umount /var/samba/netlogon
umount /var/samba/profiles
umount /var/flexshare/shares
umount /var/www/cgi-bin
umount /var/www/html
umount /var/www/virtual

Finally, unmount the data0 partition.

umount /store/data0
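
To confirm that nothing under the partition is still mounted, the following should produce no output:

mount | grep /store/data0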

You will also want to comment out the mount point for the /store/data0 partition in /etc/fstab (this will prevent the system from automatically mounting the volume on reboot). Also add the noauto option:

#/dev/data/data0         /store/data0            ext3    defaults,noauto        1 2

While you are in this file, change the bind mounts to not automatically mount the devices by adding the noauto option:

/store/data0/live/server1/home                  /home                   none bind,noauto,rw 0 0
/store/data0/live/server1/imap                  /var/spool/imap         none bind,noauto,rw 0 0
/store/data0/live/server1/mysql                 /var/lib/mysql          none bind,noauto,rw 0 0
/store/data0/live/server1/root-support          /root/support           none bind,noauto,rw 0 0
/store/data0/live/server1/samba-drivers         /var/samba/drivers      none bind,noauto,rw 0 0
/store/data0/live/server1/samba-netlogon        /var/samba/netlogon     none bind,noauto,rw 0 0
/store/data0/live/server1/samba-profiles        /var/samba/profiles     none bind,noauto,rw 0 0
/store/data0/live/server1/shares                /var/flexshare/shares   none bind,noauto,rw 0 0
/store/data0/live/server1/www-cgi-bin           /var/www/cgi-bin        none bind,noauto,rw 0 0
/store/data0/live/server1/www-default           /var/www/html           none bind,noauto,rw 0 0
/store/data0/live/server1/www-virtual           /var/www/virtual        none bind,noauto,rw 0 0

Configuring DRBD

Disk

For DRBD to function properly, you will need a volume to replicate. On ClearBOX, we will use the partition previously used by data0. This partition is a multi-disk array (even single-drive systems use an array, for easy replacement under predictive failure), typically /dev/md3. Confirm this by running the following and finding the largest mirror:

cat /proc/mdstat
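
The output will look something like the following (device names and sizes will vary; the figures here are illustrative). In this example, md3 is the largest mirror:

Personalities : [raid1]
md3 : active raid1 sdb4[1] sda4[0]
      461373440 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
      25599936 blocks [2/2] [UU]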

Naming

Name your servers with the DRBD replication network in mind: the servers should reference each other via friendly names associated with the DRBD network. For example, if your LAN segment is 192.168.1.x, you will have a separate network for DRBD replication; in our example we will use 172.31.255.x. Modify your /etc/hosts files (this can also be done in the DNS Server settings in Webconfig) so that you have a similar configuration.

/etc/hosts
192.168.1.10 volume.domain.lan volume
172.31.255.1 primary.domain.lan primary
172.31.255.2 secondary.domain.lan secondary

SSH trusting

Optionally, you can set up your servers to trust each other, which allows you to SSH between them without a password. This is convenient for transferring files via scp or rsync.
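
A minimal sketch using SSH keys (run on the primary; the hostname matches the example /etc/hosts above, and ssh-copy-id assumes the openssh-clients package is installed):

ssh-keygen -t rsa
ssh-copy-id root@secondary

Repeat from the secondary toward the primary if you want trust in both directions.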

DRBD configs

We will begin by creating or modifying the following configuration files.

/etc/drbd.conf

# You can find an example in  /usr/share/doc/drbd.../drbd.conf.example

include "drbd.d/global_common.conf";
include "drbd.d/*.res";

/etc/drbd.d/global_common.conf

global {
	usage-count yes;
	# minor-count dialog-refresh disable-ip-verification
}

common {
	protocol C;

	handlers {
		pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
		pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
		local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
		# fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
		# split-brain "/usr/lib/drbd/notify-split-brain.sh root";
		# out-of-sync "/usr/lib/drbd/notify-out-of-sync.sh root";
		# before-resync-target "/usr/lib/drbd/snapshot-resync-target-lvm.sh -p 15 -- -c 16k";
		# after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
	}

	startup {
		# wfc-timeout degr-wfc-timeout outdated-wfc-timeout wait-after-sb
	}

	disk {
		# on-io-error fencing use-bmbv no-disk-barrier no-disk-flushes
		# no-disk-drain no-md-flushes max-bio-bvecs
	}

	net {
		# sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
		# max-epoch-size ko-count allow-two-primaries cram-hmac-alg shared-secret
		# after-sb-0pri after-sb-1pri after-sb-2pri data-integrity-alg no-tcp-cork
	}

	syncer {
		# rate after al-extents use-rle cpu-mask verify-alg csums-alg
	}
}

/etc/drbd.d/data0.res

resource data0 {
  device    drbd0;
  disk      /dev/md3;
  meta-disk internal;
  on primary.domain.lan {
    address   172.31.255.1:7789;
  }
  on secondary.domain.lan {
    address   172.31.255.2:7789;
  }
}
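
The configuration must be identical on both nodes. A sketch of copying it to the secondary (hostname from the example above) and initializing the DRBD metadata, which must be done once on each node before the resource can be brought up:

scp /etc/drbd.conf secondary:/etc/
scp -r /etc/drbd.d secondary:/etc/
drbdadm create-md data0

Run the create-md command on both nodes; it will prompt for confirmation if it finds what looks like existing data on the disk.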

Start DRBD

Once the configuration is in place, you can start the DRBD daemon by running the following:

service drbd start

At this point the server will bring up the volume. You will need to perform the previous actions on both servers in order to proceed. Once both are in place, we will issue commands that coordinate between the two servers.

Run the following commands to view the DRBD status:

cat /proc/drbd 

drbd-overview 

If the disk you are working with is referenced by DRBD as disk 0, run the following:

drbdsetup 0 invalidate-remote
drbdsetup 0 primary
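
An equivalent sequence using drbdadm and the resource name from our example, rather than the minor number, would be:

drbdadm invalidate-remote data0
drbdadm primary data0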

This will start the synchronization process. To monitor the synchronization, run the following (use Ctrl+C to stop the monitor):

watch cat /proc/drbd

Set up the partition with LVM

You will want to configure the partition with LVM and activate it. The specifics of setting up a partition with LVM are beyond the scope of the DRBD documentation. Note that your device is now /dev/drbd0 and NOT /dev/md3.
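
As a rough sketch, creating a fresh LVM stack on the DRBD device would look like the following (the volume group and logical volume names match the example output below; the size is illustrative, and the mkfs step applies only to a brand-new volume, never to one holding data you intend to keep):

pvcreate /dev/drbd0
vgcreate data /dev/drbd0
lvcreate -L 426G -n data0 data
mkfs.ext3 /dev/data/data0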

If /dev/md3 is still in your LVM list, you will need to remove it and add /dev/drbd0 as your LVM physical volume. Assign it the name data0. You should get similar results from the following commands:

[root@primary ~]# pvs
  PV         VG   Fmt  Attr PSize   PFree 
  /dev/drbd0 data lvm2 a-   439.97G 13.22G
  /dev/md2   main lvm2 a-    24.41G  7.53G
[root@primary ~]# vgs
  VG   #PV #LV #SN Attr   VSize   VFree 
  data   1   1   0 wz--n- 439.97G 13.22G
  main   1   3   0 wz--n-  24.41G  7.53G
[root@primary ~]# lvs
  LV    VG   Attr   LSize   Origin Snap%  Move Log Copy%  Convert
  data0 data -wi-ao 426.75G                                      
  logs  main -wi-ao   4.88G                                      
  root  main -wi-ao  10.00G                                      
  swap  main -wi-ao   2.00G  

Centralized Storage

You will want to set up Centralized Storage for the data on the partition, and change the services so that they do not start unless the data is online. Uncomment the /store/data0 line in /etc/fstab, and make sure the noauto option is present:

/dev/data/data0         /store/data0            ext3    defaults,noauto        1 2

Mount the data partition and create the necessary structure:

mount /store/data0/
cd /store/data0/
mkdir live
cd live/
mkdir server1
cd server1/
mkdir home imap mysql root-support samba-drivers samba-netlogon samba-profiles shares www-default www-virtual www-cgi-bin
chown winadmin:domain_users samba*
chown mysql:mysql mysql
chown cyrus:mail imap
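
You can verify the directory structure and ownership before proceeding:

ls -l /store/data0/live/server1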

If your data previously resided on the root partition (i.e. the system was not previously a ClearBOX), you will need to sync the data over to the new partition and delete the original copies before mounting the bind mounts. Note the trailing slashes, which copy the contents of each directory rather than the directory itself:

rsync -av /home/ /store/data0/live/server1/home/
rsync -av /var/spool/imap/ /store/data0/live/server1/imap/
rsync -av /var/lib/mysql/ /store/data0/live/server1/mysql/
rsync -av /root/support/ /store/data0/live/server1/root-support/
rsync -av /var/samba/drivers/ /store/data0/live/server1/samba-drivers/
rsync -av /var/samba/netlogon/ /store/data0/live/server1/samba-netlogon/
rsync -av /var/samba/profiles/ /store/data0/live/server1/samba-profiles/
rsync -av /var/flexshare/shares/ /store/data0/live/server1/shares/
rsync -av /var/www/cgi-bin/ /store/data0/live/server1/www-cgi-bin/
rsync -av /var/www/html/ /store/data0/live/server1/www-default/
rsync -av /var/www/virtual/ /store/data0/live/server1/www-virtual/

Once the data is copied, you will need to delete it from the original locations before mounting the devices via bind mounts, or you will not be able to reclaim the disk space. Do this step with caution and make sure your data has been properly copied to /store/data0 before proceeding.
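
For example (repeat for each of the locations synced above, and only after verifying the copies under /store/data0 are complete):

rm -rf /home/*
rm -rf /var/spool/imap/*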

Mount the new virtual devices:

mount /store/data0/live/server1/home/
mount /store/data0/live/server1/imap/
mount /store/data0/live/server1/mysql/
mount /store/data0/live/server1/root-support/
mount /store/data0/live/server1/samba-drivers/
mount /store/data0/live/server1/samba-netlogon/
mount /store/data0/live/server1/samba-profiles/
mount /store/data0/live/server1/shares/
mount /store/data0/live/server1/www-cgi-bin/
mount /store/data0/live/server1/www-default/
mount /store/data0/live/server1/www-virtual/

Validate your mount points.

mount

At a minimum you should have:

/dev/mapper/data-data0 on /store/data0 type ext3 (rw)
/store/data0/live/server1/home on /home type none (rw,bind)
/store/data0/live/server1/imap on /var/spool/imap type none (rw,bind)
/store/data0/live/server1/mysql on /var/lib/mysql type none (rw,bind)
/store/data0/live/server1/root-support on /root/support type none (rw,bind)
/store/data0/live/server1/samba-drivers on /var/samba/drivers type none (rw,bind)
/store/data0/live/server1/samba-netlogon on /var/samba/netlogon type none (rw,bind)
/store/data0/live/server1/samba-profiles on /var/samba/profiles type none (rw,bind)
/store/data0/live/server1/shares on /var/flexshare/shares type none (rw,bind)
/store/data0/live/server1/www-cgi-bin on /var/www/cgi-bin type none (rw,bind)
/store/data0/live/server1/www-default on /var/www/html type none (rw,bind)
/store/data0/live/server1/www-virtual on /var/www/virtual type none (rw,bind)

Recovering from Split Brain

Split brain happens when both halves of your DRBD pair believe they hold the authoritative copy of the data. This causes problems because there is no way to get them to sync up properly.

This can happen frequently if both sides of your DRBD replication are primary (i.e. Primary/Primary). To avoid split brain, ensure that the backup copy is set to secondary before taking it down.
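
Before taking a node down, you can check its role and demote it if necessary (resource-name is data0 in our example):

drbdadm role data0
drbdadm secondary data0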

Sometimes, taking data down is unavoidable, and the boxes may disagree about which holds the authoritative data. If you know which one has the correct data (the survivor), or if you have already tested one side and are satisfied, you can cause the other side (the victim) to rebuild completely by following these steps.

On the survivor, make sure that it is Primary and its data is UpToDate. If this is the case, you can use it.

If it is not UpToDate, be cautious and concerned; perhaps you should not consider this node your survivor. Be careful, and consider taking backups first.

If it is not Primary, make it Primary:

drbdadm primary resource-name

Check and see if it became primary:

cat /proc/drbd

If it is Primary, use it to validate the data. If the victim is not in sync, make sure the two nodes are well and truly disconnected:

drbdadm disconnect resource-name

Now, on to the victim. Under ideal conditions there are ways to get resources with intact data maps to sync back up, but this procedure does not cover that case: in this scenario we are going to destroy the victim's data completely. Be sure this is what you want to do.

Be sure you are on the victim for the following destructive steps. Do not run them on a survivor that is Primary and UpToDate; they are very destructive to data.

On the victim perform the following:

drbdadm secondary resource-name
drbdadm disconnect resource-name
drbdadm create-md resource-name

At this point it will complain that you will be destroying data. If you want to destroy the data so that you can resync from the primary, answer 'yes'. Then continue:

drbdadm attach resource-name
drbdadm invalidate resource-name

Back on the primary, run the following:

drbdadm connect resource-name

Now watch the synchronization process:

watch cat /proc/drbd
