Some notes on how a high-availability Linux cluster can be set up
Most of the information is from the following links:
DRBD makes sure that there is one consistent filesystem (or raw device) shared by the two servers. It may only be mounted on the (currently) primary server, and all changes are mirrored to the secondary (slave) server. When the primary system goes down, the secondary becomes primary, mounts the filesystem and takes over the work of the failed node (see later - heartbeat section).
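Once DRBD is running (set up below), the current role of each node can be read from /proc/drbd. A minimal sketch over sample output - the sample line itself is illustrative, and note that the st: field is named ro: in later DRBD releases:

```shell
# Illustrative /proc/drbd line; on a live node use: cat /proc/drbd
sample='0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---'

# Extract the local/peer role pair from the st: field
printf '%s\n' "$sample" |
  awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^st:/) { sub(/^st:/, "", $i); print $i } }'
# prints: Primary/Secondary
```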
First we need the DRBD kernel module - http://oss.linbit.com/drbd/ - and some development packages (kernel-devel for the running kernel, gcc, perhaps rpm-build or the deb equivalent...). Download, unpack, ./configure, make... Then either copy the module into the kernel tree, or build packages (make rpm) and install them, and that's it.
The main configuration file is /etc/drbd.conf; the same file is placed on both servers. It could look like this (see the sample in /usr/share/doc/drbd-*/drbd.conf):
global {
    usage-count yes;
}
common {
    syncer { rate 100M; }
    protocol C;
}
resource r0 {
    device    /dev/drbd0;
    meta-disk internal;
    handlers {
        # shut down after sync errors that cannot be resolved automatically (default values)
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater -t 5";
    }
    startup {
        degr-wfc-timeout 120;
    }
    disk {
        on-io-error detach;
    }
    net {
        cram-hmac-alg "sha1";
        shared-secret "some-funny-password-here";
        # disconnect after sync errors that cannot be resolved automatically (default values)
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict   disconnect;
    }
    on server1 {
        disk    /dev/sdb1;
        address 192.168.5.128:7788;
    }
    on server2 {
        disk    /dev/sdb1;
        address 192.168.5.129:7788;
    }
}
Then we'll prepare the shared device:
server1:
# a warning/error may appear on the next command in case /dev/sdb1
# (or whichever device is used) already contains some filesystem...
drbdadm create-md r0
drbdadm attach r0
drbdadm connect r0

server2:
drbdadm create-md r0
drbdadm attach r0
drbdadm connect r0

server1:
drbdadm -- --overwrite-data-of-peer primary r0
# now the synchronization will take place; you may want to wait for it,
# but it shouldn't be necessary
mkfs.jfs /dev/drbd0
mkdir /mnt/data
mount /dev/drbd0 /mnt/data
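The progress of the initial synchronization can be watched in /proc/drbd. A small sketch that pulls the completed percentage out of a progress line - the sample line here is illustrative:

```shell
# Illustrative sync-progress line from /proc/drbd; live: watch cat /proc/drbd
sample="	[=====>..............] sync'ed: 27.3% (2735/3748)M"

# Extract the completed percentage
printf '%s\n' "$sample" | sed -n "s/.*sync'ed: *\([0-9.]*\)%.*/\1/p"
# prints: 27.3
```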
Now the two systems share the disk. However, mount /dev/drbd0 /mnt/data on server2 won't work - not even read-only, and not even when the filesystem is unmounted on the primary system. To test the functionality, you may switch the systems' roles:
server1:
touch /mnt/data/test
umount /mnt/data
drbdadm secondary r0

server2:
drbdadm primary r0
mount /dev/drbd0 /mnt/data
ls /mnt/data
... :-)
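The manual role switch above can be wrapped in a small script. This is only a sketch: the run() wrapper echoes each command instead of executing it (a dry run), and the resource/mount point names are the ones used above; change run() to execute "$@" to perform the switch for real.

```shell
#!/bin/sh
# Dry-run sketch of the manual switchover; resource and mount point as above.
RES=r0
MNT=/mnt/data

run() { echo "$@"; }   # dry-run wrapper; use run() { "$@"; } to execute for real

# On the current primary:
run umount "$MNT"
run drbdadm secondary "$RES"

# On the other node:
run drbdadm primary "$RES"
run mount /dev/drbd0 "$MNT"
# prints the four commands that would be executed
```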