Heartbeat Clustering in Linux

I first learnt Heartbeat clustering a long time ago, around March 2008, but until now I had never implemented it on production servers. This is my first attempt, and I have successfully implemented a two-node fail-over cluster. Clustering is a complex and advanced topic which I cannot cover in one post. In this post I will give you some basics of clustering, its advantages, and the configuration of a simple fail-over cluster.
Let’s start.
What is a Cluster anyway?
Ans :
A computer cluster is a group of linked computers, working together closely so that in many respects they form a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability.
(Source: www.wikipedia.org)
Cluster terminology.

Node : One of the systems/computers that participates with other systems to form a cluster.

Heartbeat : A pulse-like signal that every node sends at regular intervals in a UDP packet, so that each node knows whether the other nodes are available. It is a door-knocking activity, much like pinging a system.

Floating IP or Virtual IP : The IP address assigned to the cluster, through which users access its services. Client requests arrive at this IP, and clients do not know the actual (back-end) IP addresses of the nodes. The virtual IP hides the effect of individual nodes going down.

Master node : The node on which the services run most of the time in a high-availability cluster.

Slave node : The node that takes over in a high-availability cluster when the master node is down. It starts serving users when it stops receiving the heartbeat pulse from the master, and automatically gives control back once the master is up and running again. The slave learns the status of the master through the heartbeat signals.

Types of Clusters:
Clusters can be divided into two main types:
1. High availability :
These clusters are configured where downtime must be kept to a minimum. If one node in the cluster goes down, the second node takes over serving users without interrupting the service. Such clusters often target "five nines" availability, i.e. 99.999%.

2. Load balancing :
These clusters are configured where the load from users is high. The advantage of load balancing is that users do not see delays in their requests, because the load that would fall on a single system is shared by two or more nodes in the cluster.
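To put the five nines figure in perspective, the downtime it allows can be worked out from the percentage. This is only a small sketch: it assumes a 365-day year and that awk is installed, and the function name downtime_minutes_per_year is made up for this example.

```shell
#!/bin/sh
# Print the downtime budget, in minutes per year, for a given
# availability percentage (assumes a 365-day year).
downtime_minutes_per_year() {
    LC_ALL=C awk -v avail="$1" \
        'BEGIN { printf "%.2f\n", (100 - avail) / 100 * 365 * 24 * 60 }'
}

downtime_minutes_per_year 99.999   # five nines
downtime_minutes_per_year 99.9     # three nines, for comparison
```

Five nines leaves roughly five and a quarter minutes of downtime per year, which is why the fail-over has to be automatic.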

Advantages of Cluster :
1. Reduced Cost : It is cheaper to buy 10 normal servers and cluster them than to buy a single high-end server such as a blade server, and together they can do more work than that one server even though it has more processing power.
2. Processing Power
3. Scalability
4. Availability

Configuration files details :
Three main configuration files :

· /etc/ha.d/authkeys
· /etc/ha.d/ha.cf
· /etc/ha.d/haresources

Some other configuration files/folders to know :
/etc/ha.d/resource.d : The files in this directory are very important; they are the scripts that start/stop/restart the services run by the Heartbeat cluster.

Before configuring the Heartbeat cluster, note the following points.

Note1 : The contents of the ha.cf file are the same on all the nodes in a cluster, except for the ucast and bcast directives.

Note2 : The authkeys and haresources files must be exact replicas on all the nodes in a cluster.

Note3 : A cluster is used to provide a service with high availability/high performance; that service may be a web server, a reverse proxy or a database.
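Since Note2 depends on the files being byte-for-byte identical, it may help to check that with a script rather than by eye. The sketch below assumes you have already copied the other node's file to a local path (for example with scp); the path /tmp/haresources.node2 and the function name files_identical are placeholders for this example.

```shell
#!/bin/sh
# Return success (0) only if two files are byte-for-byte identical.
files_identical() {
    cmp -s "$1" "$2"
}

# Example: compare the local haresources with a copy fetched from the peer.
# /tmp/haresources.node2 is a placeholder path for this sketch.
if [ -f /etc/ha.d/haresources ] && [ -f /tmp/haresources.node2 ]; then
    if files_identical /etc/ha.d/haresources /tmp/haresources.node2; then
        echo "haresources matches"
    else
        echo "haresources differs between nodes"
    fi
fi
```

The same check applies to authkeys.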

Test scenario setup:
1. The cluster configuration which I am going to show is a two-node cluster with fail-over capability for a Squid reverse proxy.
2. For Squid reverse proxy configuration please click here.
3. Node details are as follows

Node1 :
IP address (eth0): 10.77.225.21
Subnet mask (eth0): 255.0.0.0
Default gateway (eth0): 10.0.0.1
IP address (eth1): 192.168.0.1 (used to send heartbeat signals to the other node)
Subnet mask (eth1): 255.255.255.0
Default gateway (eth1): None (leave the default gateway blank for this interface)

Node2 :
IP address (eth0): 10.77.225.22
Subnet mask (eth0): 255.0.0.0
Default gateway (eth0): 10.0.0.1
IP address (eth1): 192.168.0.2 (used to send heartbeat signals to the other node)
Subnet mask (eth1): 255.255.255.0
Default gateway (eth1): None (leave the default gateway blank for this interface)


4. Floating IP address: 10.77.225.20

Let's start configuring the Heartbeat cluster. Note that, for better understanding, every step in this configuration is divided into two parts:
1. (configuration on node1)
2. (configuration on node2)

Step1 :
Install the following packages in the order shown. If you cannot find the packages online, you can download them from our site.

Step1(a) : Install the following packages on node1
#rpm -ivh heartbeat-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-ldirectord-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-pils-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-stonith-2.1.2-2.i386.rpm

Step1(b) : Install the following packages on node2
#rpm -ivh heartbeat-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-ldirectord-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-pils-2.1.2-2.i386.rpm
#rpm -ivh heartbeat-stonith-2.1.2-2.i386.rpm



Step2 : By default the main configuration files (ha.cf, haresources and authkeys) are not present in the /etc/ha.d/ folder, so we have to copy these three files from /usr/share/doc/heartbeat-2.1.2 to /etc/ha.d/

Step2(a) : Copy main configuration files from /usr/share/doc/heartbeat-2.1.2 to /etc/ha.d/ on node 1
#cp /usr/share/doc/heartbeat-2.1.2/ha.cf /etc/ha.d/
#cp /usr/share/doc/heartbeat-2.1.2/haresources /etc/ha.d/
#cp /usr/share/doc/heartbeat-2.1.2/authkeys /etc/ha.d/

Step2(b) : Copy main configuration files from /usr/share/doc/heartbeat-2.1.2 to /etc/ha.d/ on node 2
#cp /usr/share/doc/heartbeat-2.1.2/ha.cf /etc/ha.d/
#cp /usr/share/doc/heartbeat-2.1.2/haresources /etc/ha.d/
#cp /usr/share/doc/heartbeat-2.1.2/authkeys /etc/ha.d/



Step3 : Edit ha.cf file
#vi /etc/ha.d/ha.cf

Step3(a) : Edit ha.cf file as follows on node1
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 25
warntime 10
initdead 50
udpport 694
bcast eth1
ucast eth1 192.168.0.2
auto_failback on
node rp1.linuxnix.com
node rp2.linuxnix.com

Step3(b) : Edit ha.cf file as follows on node2
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 25
warntime 10
initdead 50
udpport 694
bcast eth1
ucast eth1 192.168.0.1
auto_failback on
node rp1.linuxnix.com
node rp2.linuxnix.com

Let me explain each entry in detail:
Debugfile : The file where detailed debug information for your Heartbeat cluster is stored. It is very useful for any kind of troubleshooting.

Logfile : The file where general logging of the Heartbeat cluster takes place.

Logfacility : The syslog facility under which Heartbeat logs its messages (local0 in this example). Set it to none to disable logging through syslog. There are many other options; please explore them yourself.

Keepalive : The interval between heartbeat packets, i.e. how often the nodes check the availability of the other nodes. In this example I specified two seconds (keepalive 2).

Deadtime : The time in seconds after which a node is declared dead if the other node has not received any heartbeat from it (25 seconds in this example).

Warntime : Time in seconds before issuing a “late heartbeat” warning in the logs.

Initdead : With some configurations, the network takes some time to start working after a reboot. This is a separate “deadtime” to handle that case. It should be at least twice the normal deadtime.

Udpport : The port Heartbeat uses to send heartbeat packets/signals to other nodes to check availability (in this example I used the default port, 694).

Bcast : The device/interface on which to broadcast the heartbeat packets.

Ucast : The device/interface on which to unicast the heartbeat packets, followed by the IP address of the peer node to send them to.

auto_failback : This option determines whether a resource will automatically fail back to its “primary” node, or remain on whatever node is serving it until that node fails or an administrator intervenes. In my example I have set it to on, which means that if the failed node comes back online, control is given back to it automatically. Let me put it this way: I have two nodes, node1 and node2. Node1 is a high-end machine, and node2 serves only temporarily when node1 goes down. Suppose node1 goes down: node2 takes control and serves the service, and it keeps checking for node1. Once it finds that node1 is up, control is given back to node1.

Node : Specifies the nodes participating in the cluster. In my cluster only two nodes participate (rp1 and rp2), so I specify just those entries. If more nodes participate in your implementation, please specify all the nodes.
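The timing directives explained above relate to each other, and a small script can sanity-check them before Heartbeat is started. This is a rough sketch, not part of Heartbeat: it assumes the four directives are present and given in whole seconds, and the function names ha_value and check_ha_timing are made up here. The initdead rule comes from the explanation above; the rule that warntime should be smaller than deadtime is my own assumption, so that the warning appears before a node is declared dead.

```shell
#!/bin/sh
# Read one numeric directive (e.g. "deadtime 25") from an ha.cf-style file.
# Commented-out lines are skipped because their first field starts with "#".
ha_value() {
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Check the timing relationships. Prints one message per problem found
# and returns non-zero if there is at least one.
check_ha_timing() {
    cf=$1
    deadtime=$(ha_value "$cf" deadtime)
    warntime=$(ha_value "$cf" warntime)
    initdead=$(ha_value "$cf" initdead)
    rc=0
    if [ "$warntime" -ge "$deadtime" ]; then
        echo "warntime should be smaller than deadtime"
        rc=1
    fi
    if [ "$initdead" -lt $((deadtime * 2)) ]; then
        echo "initdead should be at least twice deadtime"
        rc=1
    fi
    return $rc
}

# Example: check the local configuration, if there is one.
if [ -f /etc/ha.d/ha.cf ]; then
    check_ha_timing /etc/ha.d/ha.cf && echo "timing values look consistent"
fi
```

With the values used in this post (deadtime 25, warntime 10, initdead 50) both checks pass.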



Step4 : Edit haresources file
#vi /etc/ha.d/haresources

Step4(a) : Just specify below entry in last line of this file on node1
rp1.linuxnix.com 10.77.225.20 squid

Step4(b) : Just specify below entry in last line of this file on node2 (the entry is the same on both nodes)
rp1.linuxnix.com 10.77.225.20 squid

Explanation of each entry :
rp1.linuxnix.com is the main node in the cluster.
10.77.225.20 is the floating IP address of this cluster.

Squid : This is the service offered by the cluster. Note that this is the name of a script file located in /etc/ha.d/resource.d/.

Note : By default the squid script file is not present in that folder; I created it according to my Squid configuration.
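To make the layout of the haresources line concrete, here is a small sketch that splits it into its parts. The function name describe_haresources_line is made up for this example, and it only handles the simple form used in this post (node name first, then the resources separated by spaces).

```shell
#!/bin/sh
# Split a simple haresources line into its parts.
describe_haresources_line() {
    # Rely on word splitting: the first word is the preferred node,
    # the remaining words are the resources, in the order listed.
    set -- $1
    node=$1
    shift
    echo "preferred node: $node"
    for res in "$@"; do
        echo "resource: $res"
    done
}

describe_haresources_line "rp1.linuxnix.com 10.77.225.20 squid"
```

For the line used in this post it reports rp1.linuxnix.com as the preferred node, followed by two resources: the floating IP and squid.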

What does this script file actually contain?
Ans :
It is just a start/stop/restart script for the particular service, so that the Heartbeat cluster can take care of starting/stopping/restarting the service (here, Squid).
Here is what the squid script file contains:
http://sites.google.com/site/surendra/Home/squid.txt.txt?attredirects=0&d=1
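The linked script is not reproduced here. As a rough idea of what such a resource script generally looks like, here is a minimal sketch and not my actual file. It assumes Squid already has an init script at /etc/init.d/squid (the SQUID_INIT variable is introduced only for this example) and simply passes the action on to it; the status action is included as an extra alongside start/stop/restart.

```shell
#!/bin/sh
# Sketch of a resource script such as /etc/ha.d/resource.d/squid.
# SQUID_INIT is an assumption: the path of the distribution's init script.
SQUID_INIT=${SQUID_INIT:-/etc/init.d/squid}

squid_resource() {
    case "$1" in
        start|stop|restart|status)
            # Hand the action over to the regular init script.
            "$SQUID_INIT" "$1"
            ;;
        *)
            echo "Usage: squid {start|stop|restart|status}" >&2
            return 1
            ;;
    esac
}

# Heartbeat calls the script with the action as its first argument.
if [ "$#" -gt 0 ]; then
    squid_resource "$1"
fi
```

Keeping the wrapper this thin means the service is started the same way whether Heartbeat or an administrator runs it.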

Step5 : Edit the authkeys file. The authkeys configuration file contains information for Heartbeat to use when authenticating cluster members. It must not be readable or writable by anyone other than root, so change the permissions of the file to 600 on both the nodes.

Two kinds of lines are required in the authkeys file:
A line which says which key to use in signing outgoing packets.
One or more lines defining how incoming packets might be signed.
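Instead of typing a short key by hand, the file can be generated with a random key and the right permissions in one go. The sketch below is only an illustration: the function name write_authkeys and the scratch path are made up, it uses key number 1 rather than the number 2 used in this post, and it assumes /dev/urandom, head -c and od are available.

```shell
#!/bin/sh
# Write an authkeys-style file with one sha1 key and owner-only permissions.
write_authkeys() {
    target=$1
    # 32 random bytes rendered as 64 hex characters.
    key=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    (
        umask 077    # a newly created file gets mode 0600
        printf 'auth 1\n1 sha1 %s\n' "$key" > "$target"
    )
    chmod 600 "$target"    # also covers a file that already existed
}

# Example: write to a scratch path, not to /etc/ha.d.
write_authkeys "${TMPDIR:-/tmp}/authkeys.example"
```

The generated file still has to be copied unchanged to the other node, since both nodes must share the same key.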

Step5 (a) : Edit authkeys file on node1
#vi /etc/ha.d/authkeys
auth 2
#1 crc
2 sha1 HI!
#3 md5 Hello!
Now save and exit the file

Step5 (b) : Edit authkeys file on node2
#vi /etc/ha.d/authkeys
auth 2
#1 crc
2 sha1 HI!
#3 md5 Hello!
Now save and exit the file



Step6 : Edit the /etc/hosts file to add host-name entries for the nodes

Step6(a) : Edit /etc/hosts file on node1 as below
10.77.225.21 rp1.linuxnix.com rp1
10.77.225.22 rp2.linuxnix.com rp2

Step6(b) : Edit /etc/hosts file on node2 as below
10.77.225.21 rp1.linuxnix.com rp1
10.77.225.22 rp2.linuxnix.com rp2
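One related point worth checking at this stage: Heartbeat identifies nodes by the name that uname -n prints, so that name should appear on a node line in ha.cf. A small sketch of such a check follows; the function name node_listed is made up for this example.

```shell
#!/bin/sh
# Succeed if the given name appears on a "node" line of an ha.cf-style file.
node_listed() {
    awk -v name="$2" '
        $1 == "node" { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

# Example: check this machine against the local configuration, if present.
if [ -f /etc/ha.d/ha.cf ]; then
    node_listed /etc/ha.d/ha.cf "$(uname -n)" \
        || echo "$(uname -n) is not listed as a node in ha.cf"
fi
```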

Step7 : Start Heartbeat cluster

Step7(a) : Start heartbeat cluster on node1
#service heartbeat start

Step7(b) : Start heartbeat cluster on node2
#service heartbeat start

Checking your Heartbeat cluster:
If your Heartbeat cluster is running fine, a virtual Ethernet interface (eth0:0) is created on node1 and assigned the floating IP 10.77.225.20.
Clipped output from my first node:
# ifconfig

eth0 Link encap:Ethernet HWaddr 00:02:A5:4C:AF:8E
inet addr:10.77.225.21 Bcast:10.77.231.255 Mask:255.255.248.0
inet6 addr: fe80::202:a5ff:fe4c:af8e/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:5714248 errors:0 dropped:0 overruns:0 frame:0
TX packets:19796 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1533278899 (1.4 GiB) TX bytes:4275200 (4.0 MiB)
Base address:0x5000 Memory:f7fe0000-f8000000

eth0:0 Link encap:Ethernet HWaddr 00:02:A5:4C:AF:8E
inet addr:10.77.225.20 Bcast:10.77.231.255 Mask:255.255.248.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Base address:0x5000 Memory:f7fe0000-f8000000

eth1 Link encap:Ethernet HWaddr 00:02:A5:4C:AF:8F
inet addr:192.168.0.1 Bcast:192.168.0.255 Mask:255.255.255.0
inet6 addr: fe80::202:a5ff:fe4c:af8f/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:145979 errors:0 dropped:0 overruns:0 frame:0
TX packets:103753 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:38966724 (37.1 MiB) TX bytes:27640765 (26.3 MiB)
Base address:0x5040 Memory:f7f60000-f7f80000
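The same check can be scripted, which is handy when you want to see quickly which node currently holds the floating IP. A small sketch, assuming your ifconfig prints addresses in the "inet addr:" style shown above; the function name holds_ip is made up for this example.

```shell
#!/bin/sh
# Read ifconfig-style output on stdin and succeed if the given IPv4
# address is configured on some interface.
holds_ip() {
    # -F matches the text literally; the trailing space keeps
    # 10.77.225.20 from also matching 10.77.225.200.
    grep -qF "inet addr:$1 "
}

# Example (only meaningful where ifconfig prints "inet addr:"):
# ifconfig | holds_ip 10.77.225.20 && echo "this node holds the floating IP"
```

On a healthy cluster this should succeed on exactly one node at a time.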

Try accessing the floating IP from your browser to check whether Squid is working. Please follow the upcoming posts on how to troubleshoot a Heartbeat cluster.

  • I have gone through your doc; you made it look really simple, and now I am all set to build a cluster for a web server, but there are some points to clear up.
    1) Does the web server service have to run on both machines or not?
    2) How would I connect those two Ethernet interfaces to each other and to the local switch (I need a diagram of the LAN cabling between servers and switch)?
    3) Configuration related to Squid.
    If you provide me this information I think I can set up a cluster with ease.

  • Please see inline..
    1) Does the web server service have to run on both machines or not?
    >>> YOU DO NOT NEED TO START THE SERVICE YOURSELF ON ANY NODE. THE HEARTBEAT CLUSTER TAKES CARE OF RUNNING THE SERVICE ON THE ACTIVE NODE AND OF STOPPING IT ON THE PASSIVE NODE.
    2) How would I connect those two Ethernet interfaces to each other and to the local switch (I need a diagram of the LAN cabling between servers and switch)?
    >>> ETH1 ON BOTH SYSTEMS ARE DIRECTLY CONNECTED TO EACH OTHER WITH A CROSSOVER CABLE, AND ETH0 ON BOTH ARE CONNECTED DIRECTLY TO A SWITCH.
    3) Configuration related to Squid.
    >>> EDITED THE POST TO POINT TO THE SQUID CONFIGURATION.
    If you provide me this information I think I can set up a cluster with ease.

  • Anonymous

    Hi Surender,
    First of all thank you very much for the detailed steps.

    I have configured a 2-node cluster as per the steps and everything looks good.
    But in my setup, both the nodes are giving Floating IP for eth0:0. Is it ok? or Something wrong in my config? Please clarify.

    One more thing how to say the setup is configured as Active-Active OR Active-Passive? Please clarify.

    Regards,
    UK

  • >>PLEASE SEE IN LINE..
    I have configured a 2-node cluster as per the steps and everything looks good.
    But in my setup, both the nodes are giving Floating IP for eth0:0. Is it ok? or Something wrong in my config? Please clarify.
    >>> IT SHOULD NOT HAPPEN. AT ANY GIVEN TIME ONLY THE ACTIVE NODE SHOULD HAVE eth0:0 CONFIGURED.
    TO TROUBLESHOOT THE CLUSTER, DO AS BELOW:
    1) STOP THE HEARTBEAT CLUSTER ON THE ACTIVE NODE. THE SECONDARY NODE SHOULD TAKE CARE OF STARTING THE DEPENDENT SERVICE,
    AND eth0:0 SHOULD COME UP THERE. IF THIS WORKS, I THINK EVERYTHING IS FINE.
    CHECK THE HEARTBEAT LOGS. YOU MAY GET SOME INFO THERE.

    One more thing how to say the setup is configured as Active-Active OR Active-Passive? Please clarify.
    >>> A LOAD BALANCING CLUSTER IS CALLED ACTIVE-ACTIVE, WHEREAS AN HA CLUSTER IS CALLED ACTIVE-PASSIVE.
    Regards,
    UK

  • Anonymous

    Excellent. Thanks for your quick response.

    Regd. issue#1, as you mentioned in your blog-step#7, started heartbeat on both node1 and node2. Somehow both the machines are showing eth0:0 and the last started service (i.e. node2) is active and able to access the server with clusterIP. But when I stop it on node2, though node1 show eth0:0 it is not picking up. looks like it is not switching properly…

    Start heartbeat on node1
    start heartbeat on node2
    observation:
    clusterIP on node2 is active. If I stop it on node2 still the clusterIP not accessible from node1.

    FYI: I am using heartbeat for smb service.

    Please let me know if there is any issue in my configs.

    -UK

  • Anonymous

    see /var/log/ha-log on node1 (1st started):
    info: Heartbeat generation: 1265931455
    info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
    info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 – Status: 1
    info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
    info: glib: ucast: bound send socket to device: eth1
    info: glib: ucast: bound receive socket to device: eth1
    info: glib: ucast: started on port 694 interface eth1 to 10.14.2.131
    info: G_main_add_TriggerHandler: Added signal manual handler
    info: G_main_add_TriggerHandler: Added signal manual handler
    info: G_main_add_SignalHandler: Added signal handler for signal 17
    info: Local status now set to: 'up'
    info: Link nft80fs01a:eth1 up.
    WARN: node nft80fs01b: is dead
    info: Comm_now_up(): updating status to active
    info: Local status now set to: 'active'
    WARN: No STONITH device configured.
    WARN: Shared disks are not protected.
    info: Resources being acquired from nft80fs01b.
    info: Running /etc/ha.d/rc.d/status status
    info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
    info: mach_down takeover complete for node nft80fs01b.
    info: mach_down takeover complete.
    info: Initial resource acquisition complete (mach_down)
    IPaddr[17702]: INFO: Resource is stopped
    heartbeat[17611]: info: Local Resource acquisition completed.
    harc[17753]: info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
    ip-request-resp[17753]: received ip-request-resp 172.25.41.153 OK yes
    ResourceManager[17774]: info: Acquiring resource group: nft80fs01a 172.25.41.153 smb
    IPaddr[17801]: INFO: Resource is stopped
    ResourceManager[17774]: info: Running /etc/ha.d/resource.d/IPaddr 172.25.41.153 start
    IPaddr[17874]: INFO: Using calculated nic for 172.25.41.153: eth0
    IPaddr[17874]: INFO: Using calculated netmask for 172.25.41.153: 255.255.255.0
    IPaddr[17874]: INFO: eval ifconfig eth0:0 172.25.41.153 netmask 255.255.255.0 broadcast 172.25.41.255
    IPaddr[17857]: INFO: Success
    ResourceManager[17774]: info: Running /etc/ha.d/resource.d/smb start
    info: Local Resource acquisition completed. (none)
    info: local resource transition completed.

  • Anonymous

    ********************************
    /var/log/ha-log on node2 (next started):
    info: Heartbeat generation: 1265931585
    info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth1
    info: glib: UDP Broadcast heartbeat closed on port 694 interface eth1 – Status: 1
    info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth1
    info: glib: ucast: bound send socket to device: eth1
    info: glib: ucast: bound receive socket to device: eth1
    info: glib: ucast: started on port 694 interface eth1 to 10.14.2.132
    info: G_main_add_TriggerHandler: Added signal manual handler
    info: G_main_add_TriggerHandler: Added signal manual handler
    info: G_main_add_SignalHandler: Added signal handler for signal 17
    info: Local status now set to: 'up'
    info: Link nft80fs01b:eth1 up.
    WARN: node nft80fs01a: is dead
    info: Comm_now_up(): updating status to active
    info: Local status now set to: 'active'
    WARN: No STONITH device configured.
    WARN: Shared disks are not protected.
    info: Resources being acquired from nft80fs01a.
    info: Running /etc/ha.d/rc.d/status status
    info: No local resources [/usr/share/heartbeat/ResourceManager listkeys nft80fs01b] to acquire.
    info: Taking over resource group 172.25.41.153
    ResourceManager[22155]: 2010/02/15_17:56:58 info: Acquiring resource group: nft80fs01a 172.25.41.153 smb
    IPaddr[22182]: 2010/02/15_17:56:58 INFO: Resource is stopped
    ResourceManager[22155]: 2010/02/15_17:56:58 info: Running /etc/ha.d/resource.d/IPaddr 172.25.41.153 start
    IPaddr[22255]: 2010/02/15_17:56:58 INFO: Using calculated nic for 172.25.41.153: eth0
    IPaddr[22255]: 2010/02/15_17:56:58 INFO: Using calculated netmask for 172.25.41.153: 255.255.255.0
    IPaddr[22255]: 2010/02/15_17:56:58 INFO: eval ifconfig eth0:0 172.25.41.153 netmask 255.255.255.0 broadcast 172.25.41.255
    IPaddr[22238]: 2010/02/15_17:56:58 INFO: Success
    ResourceManager[22155]: 2010/02/15_17:56:58 info: Running /etc/ha.d/resource.d/smb start
    mach_down[22129]: 2010/02/15_17:56:58 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
    mach_down[22129]: 2010/02/15_17:56:58 info: mach_down takeover complete for node nft80fs01a.
    info: mach_down takeover complete.
    info: Initial resource acquisition complete (mach_down)
    info: Local Resource acquisition completed. (none)
    info: local resource transition completed.

  • Anonymous

    Please update step 4 (a & b). Both are mentioned for node 1. I think 4(b) is for node 2,

  • Please see in line..

    Please update step 4 (a & b). Both are mentioned for node 1. I think 4(b) is for node 2,

    >>> THE CONFIGURATION SHOULD BE THE SAME ON BOTH NODES, BECAUSE THIS LINE NAMES MY MASTER NODE. LET ME PUT IT THIS WAY: SUPPOSE NODE1 (ACTIVE) WENT DOWN; NODE2 (PASSIVE) WILL TAKE CARE OF SERVING SMB. NODE2 WILL CONTINUOUSLY SEND HEARTBEAT PULSES TO THE MASTER NODE TO CHECK ITS STATUS. ONCE NODE1 IS UP, NODE2 WILL USE THE CONFIG IN THIS STEP 4 TO DECIDE HOW TO TRANSFER CONTROL BACK.

  • Please see in line..

    Regd. issue#1, as you mentioned in your blog-step#7, started heartbeat on both node1 and node2. Somehow both the machines are showing eth0:0 and the last started service (i.e. node2) is active and able to access the server with clusterIP. But when I stop it on node2, though node1 show eth0:0 it is not picking up. looks like it is not switching properly…

    >> THAT'S COOL, AND YOUR HEARTBEAT CLUSTER IS WORKING VERY WELL. FOR YOUR ISSUE, THINK OF IT THIS WAY. I HAVE TWO NODES, ONE ACTIVE AND THE OTHER PASSIVE. WHEN DOES THE PASSIVE NODE TAKE OVER? WHEN IT IS NOT RECEIVING THE HEARTBEAT PULSE FROM THE ACTIVE NODE (THROUGH ETH1). SO WHAT IS YOUR PASSIVE NODE THINKING? IT THINKS THE ACTIVE NODE WENT DOWN, SO IT (THE PASSIVE NODE) HAS TO START THE SMB SERVICE ITSELF, AND IT TAKES THE INITIATIVE OF CREATING ETH0:0 TOO. THAT IS THE REASON YOU ARE SEEING ETH0:0 ON BOTH NODES.
    SO HOW TO RESOLVE THIS ISSUE?
    FROM MY UNDERSTANDING THERE IS NO PROPER COMMUNICATION BETWEEN ETH1 OF THE TWO NODES. PLEASE CHECK THAT CONFIGURATION. DID YOU USE A CROSSOVER CABLE TO CONNECT ETH1 OF BOTH NODES?
    LET ME KNOW..

    I WILL PUBLISH ONE MORE POST ON HOW TO TROUBLESHOOT A HEARTBEAT CLUSTER BY THIS WEEKEND. MAYBE THAT WILL BE MORE USEFUL TO YOU.
    AND THANKS FOR WRITING TO LINUXNIX.COM

  • Anonymous

    Thanks Surender for timely response.

    I have fixed all the issues. The problem is my crossover cable IP's are not communicating properly. There is some issue with network. So I finished the cluster configuration with single NICs.

    Regards,
    UK

  • Anonymous

    How can i configure Heartbeat cluster web service not squid? Do i change rp1.linuxnix.com 10.77.225.20 squid to rp1.linuxnix.com 10.77.225.20 httpd ? Thanks!

  • @anony.. Yes, that's right, you can just put httpd there.
    Then check that you have an httpd script in /etc/ha.d/resource.d/; only then will it work.

  • bharat

    my virtual ip address is not stabling in n nodes to primary server as it first binds to other and then to primary server how can i stop it please tell
    Thanks

  • bharat

    I am having a 4 server private network with one public virtual ip address problem is this ip is not binding to primary server as it is floating to other node first and then binding to primary server as such loosing data
    using linux suse server

  • @Bharat.. Can you give more info? I am not able to understand your question; please give as much detail as possible so that I can help you in this regard.

    Thanks,
    Surendra.

  • Arunabh

    Dear Surendra,

    You made it very simple for those guys those wants to learn clustering in linux.
    One thing i’ve noticed in step 4 (Pls tell me if i m wrong ).

    Step4(a) : Just specify below entry in last line of this file on node1
    rp1.linuxnix.com 10.77.225.20 squid

    Step4(b) : Just specify below entry in last line of this file on node1
    rp1.linuxnix.com 10.77.225.20 squid

    Whether we need to write the same line twice in the node1 or we should put the same entry on both nodes !!!!

    Regards
    Arunabh

  • admin

    Hi Arun..
    Yes, that line should be an exact replica on both nodes. As we configured auto failback, what we are saying is: if rp1 is up, just fail back automatically from rp2 to rp1.
    That's it.

  • Avinash Amrutkar

    It’s very useful, can you please provide any video tutorial about this to learn more!!!!!!!

    Regards,
    Avinash Amrutkar

  • vatsa

    How do I open the UDP port (694)? When checked with nmap it always shows as closed…
    I added this entry in iptables:
    -A RH-Firewall-1-INPUT -p udp -m state --state NEW -m udp --dport 694 -j ACCEPT

  • Chittaranjan

    Its Very Useful. EXCELLENT!!
    Thanks a lot…

  • we have installed heartbeat on active/passive node and made DB oracle service HA. we have to work on primary server regarding some oracle data base services and activities including the restart of oracle DB services but we dont want our DB services to move on passive mode..

    is that possible ?

    What is in my mind is
    (1) to stop the heartbeat on second node
    (2) then stop the heartbeat service on 1st node
    (3) DB team will restart or do any activities regarding oracle
    (4) after finishing .. DB team start the oracle database services on primary node
    (5) we will start the heartbeat on 1st node
    (6) start the heartbeat on Second node.

    am i right in this case??

    • Yes, you can do that, but note that you are not serving your users during that time, so it is complete downtime for your HA service. Hope this helps.

  • thanks Surendra :)

  • Thank you so much for explaining each bit in detail and at the same keeping it so simple !!

    I have the same question as that of Arunabh, which also confused me in the first look. (i.e. Step 4(a) and (b)). I understood that the line should be same in both the node(s) but I think what makes it confusing is:

    Step4(b) : Just specify below entry in last line of this file on node1
    rp1.linuxnix.com 10.77.225.20 squid

    **While I think it should be:

    Step4(b) : Just specify below entry in last line of this file on node2
    rp1.linuxnix.com 10.77.225.20 squid

    Once thanks, it helped me a lot…

    Regards,
    Rahul.

  • arief

    I follow your tutorial but i have a strange result. I Can’t ping virtual IP address from outside and I have search in google but nothing. Can you help me why i can not ping virtual IP from outside?

  • I have two node cluster on RHEL 6.2 64bit and i am having problem with heartbeat warn lost packet frequently. below is some log for your reference. Please help me out.
    Jan 21 18:01:03 host01 heartbeat: [360]: WARN: 2 lost packet(s) for [host02] [413612:413615]
    Jan 21 18:01:03 host01 heartbeat: [360]: info: No pkts missing from host02!
    Jan 21 18:02:41 host01 heartbeat: [360]: WARN: 2 lost packet(s) for [host02] [413661:413664]
    Jan 21 18:02:41 host01 heartbeat: [360]: info: No pkts missing from host02!
    Jan 21 18:29:46 host01 SRI[1104]: *** glibc detected *** /prd/smsplatform/bin//sri: double free or corruption (fasttop): 0x00007f5b88014bc0 ***
    Jan 21 18:29:46 host01 abrt[15162]: abrt daemon is not running. If it crashed, /proc/sys/kernel/core_pattern contains a stale value, consider resetting it to ‘core’
    Jan 21 18:29:49 host01 abrt[15162]: saved core dump of pid 1104 to /data/logs/smsp/core.1104 (1208258560 bytes)
    Jan 21 18:30:17 ap

    Jan 21 21:39:02 host01 heartbeat: [360]: WARN: 1 lost packet(s) for [host02] [420151:420153]
    Jan 21 21:39:02 host01 heartbeat: [360]: info: No pkts missing from host02!
    Jan 21 21:39:36 host01 heartbeat: [360]: WARN: 8 lost packet(s) for [host02] [420161:420170]
    Jan 21 21:39:36 host01 heartbeat: [360]: WARN: Late heartbeat: Node host02: interval 18000 ms
    Jan 21 21:39:36 host01 heartbeat: [360]: info: No pkts missing from host02!
    Jan 22 01:16:01 host01 auditd[1991]: Audit daemon rotating log files

  • linuxdev

    Hi Surendra, I need to configure heartbeat to monitor tomcat application running on IP:PORT.
    so if tomcat on this particular IP:PORT is down then it should use other machines tomcat.
    Can you give some specifics required for this scenario.?

