This discussion is archived
5 Replies Latest reply: Apr 22, 2012 2:33 PM by Avi Miller

Best Practice Question on Heartbeat Issue

user897654321 Newbie
Our environment consists of 2 Fibre Channel hard drive enclosures. One is an HP P2000 with 12 2TB disks in it; piggybacked on its controllers is a D2700 with 24 SFF drives, 15,000 RPM and 146GB each. This enclosure has full RAID and volume/LUN creation capability, so I can put the disks together pretty much any way I want, though I cannot combine SFF and LFF (2TB) drives into a single RAID set.

My other devices (Texas Memory Systems 810) are 2 extremely fast SSD enclosures that have eight 500GB cards each and show up as 4TB of storage apiece. Neither device has RAID capability, so there is no redundancy, except that if a device detects a bad "chip" it migrates the data to one of the spare chips. There is a full card that is considered a spare, but from what I can tell your data never exists in more than one place. I can create any number of LUNs, and both devices are completely visible to my Oracle VM environment.

The LFF spinning disks mostly hold LUNs used for large data transfers (backups, etc.), and that enclosure is also the controller for the SFF disks. The SSDs are used for our database ASM, with normal redundancy across the 2 distinct TMS 810s. The SFF drives are used for the various filesystems that actually boot the servers and for other things that need faster disk.

My question is: Which of these 3 should I create my Cluster Heartbeat on? I currently have that LUN on an LFF LUN (one LUN of many on a RAID1 set of two 2TB drives). The LUN is only 20GB, but I do have other LUNs on that same RAID set, as I did not want to waste the whole 2TB on a single heartbeat. This way I knew that if one disk failed in that set, I could swap the disk and not lose my heartbeat and therefore all of the guests running in my cluster. We are looking for 99.9999% uptime.

Everything in my environment is redundant except for the heartbeat. Does OVM 3.1 expect to have redundant heartbeats perhaps?

If the P2000 goes down, I lose my heartbeat and all of my servers/guests go down too. It's my single point of failure.

I tried a large file copy, about 1TB of data to a 3TB filesystem on the LFF drives, and it seemed to lose heartbeat connectivity and fenced my server. I expect the redundant controllers were overloaded and OVM was not able to keep up. I have no other explanation for why the guest was down and the server needed to be fully rebooted. OVMM showed the server down and the guest down, but I could still ping the server.

I could place it on the extremely fast SSDs, but then it would only be in one location on one set of chips. If I need to replace a flash card in that device, I must take that single device down; my database would still be up from the other device via ASM, but I would lose my servers and guests. Not the ideal solution.

I am all ears as to how to 1) better configure the hardware we have or 2) buy additional hardware if absolutely necessary. I have 4 physical enclosures, all on separate redundant 8Gb FC cards in our 2 servers. It seems like that should be enough.

Thanks for all your help, and apologies for the long post.
  • 1. Re: Best Practice Question on Heartbeat Issue
    Avi Miller Guru
    user897654321 wrote:
    My question is: Which of these 3 should I create my Cluster Heartbeat on?
    The slowest, smallest, redundantly-backed disk. You only need 10-15GB for the pool filesystem disk, which stores the cluster heartbeat. As you've noticed, it should be isolated so that massive data transfers don't impact the performance. However, you also need to match OCFS2 timeouts with the timeouts for your FC-SAN. You should contact Oracle Support and your hardware vendor to determine the timeout value for the FC fabric (HBA/switch, etc) and configure OCFS2 accordingly.
  • 2. Re: Best Practice Question on Heartbeat Issue
    user897654321 Newbie
    >
    The slowest, smallest, redundantly-backed disk. You only need 10-15GB for the pool filesystem disk, which stores the cluster heartbeat. As you've noticed, it should be isolated so that massive data transfers don't impact the performance. However, you also need to match OCFS2 timeouts with the timeouts for your FC-SAN. You should contact Oracle Support and your hardware vendor to determine the timeout value for the FC fabric (HBA/switch, etc) and configure OCFS2 accordingly.
    Avi, can you expand on this more? I am trying to better understand the last two sentences you wrote ("OCFS2 timeouts with the timeouts for my SAN"). You also say slowest, smallest, and redundant disk, but the enclosure running those disks cannot be in a redundant state. Can you address my other question about whether there are plans to make a heartbeat accessible in two locations? That way, if my enclosure goes down, I don't lose all servers and guests.
  • 3. Re: Best Practice Question on Heartbeat Issue
    Avi Miller Guru
    user897654321 wrote:
    Avi, can you expand on this more? I am trying to better understand the last two sentences you wrote ("OCFS2 timeouts with the timeouts for my SAN"). You also say slowest, smallest, and redundant disk, but the enclosure running those disks cannot be in a redundant state. Can you address my other question about whether there are plans to make a heartbeat accessible in two locations? That way, if my enclosure goes down, I don't lose all servers and guests.
    OCFS2's timeout needs to be larger than the timeout for your SAN. If your SAN takes 120 seconds to fail from one path to another, but OCFS2 is set to a 60-second disk heartbeat timeout, then your servers will fence halfway through a potential fabric failover. So, it's very important to match the fabric/hardware failover time with the heartbeat interval on your clustered filesystem. In this case, you need OCFS2 to wait longer than 120 seconds before it takes any action on fencing a node, because you don't want a premature fence while the fabric is failing over.
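
    As a rough way to sanity-check that on a server (a sketch only -- the 120 seconds is just the example figure above; get the real failover time from your SAN/HBA vendor and confirm the threshold maths with Oracle Support):

        #!/bin/sh
        # Sketch: compare the vendor-quoted fabric failover time against the
        # OCFS2 disk heartbeat window currently configured on this server.
        FABRIC_FAILOVER_SECS=120   # assumption: replace with your SAN/HBA vendor's figure

        # Read the configured heartbeat dead threshold from the o2cb config.
        THRESHOLD=$(grep '^O2CB_HEARTBEAT_THRESHOLD=' /etc/sysconfig/o2cb | cut -d= -f2)

        # Each threshold unit corresponds to a 2-second heartbeat interval.
        HEARTBEAT_SECS=$((THRESHOLD * 2))

        if [ "$HEARTBEAT_SECS" -le "$FABRIC_FAILOVER_SECS" ]; then
            echo "WARNING: heartbeat window ${HEARTBEAT_SECS}s <= fabric failover ${FABRIC_FAILOVER_SECS}s -- nodes may fence during a path failover"
        else
            echo "OK: heartbeat window ${HEARTBEAT_SECS}s exceeds fabric failover ${FABRIC_FAILOVER_SECS}s"
        fi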

    OCFS2 v1.8 does support multiple global heartbeat regions and there are plans to allow for multiple heartbeat devices in some future version of Oracle VM. However, I have no idea when this will be. Keep in mind however, that if the enclosure hosting the heartbeat goes down, you will lose everything else hosted on that enclosure as well. If you put it on the large storage repository, all your VM virtual disks disappear too, so you're offline anyway. If you put it on the fast SSDs, all your data has gone away, so you're hosed anyway. Both enclosures appear (to me) to be fairly critical for the running of your VMs, so losing either of them during normal operation would probably cause an outage. Unless I'm missing something?
  • 4. Re: Best Practice Question on Heartbeat Issue
    user897654321 Newbie
    Avi Miller wrote:
    >
    OCFS2's timeout needs to be larger than the timeout for your SAN. If your SAN takes 120 seconds to fail from one path to another, but OCFS2 is set to a 60-second disk heartbeat timeout, then your servers will fence halfway through a potential fabric failover. So,
    Do you know how to check this setting for the Server Pool Heartbeat? Did you say OCFS2 is 60 seconds by default?

    >
    OCFS2 v1.8 does support multiple global heartbeat regions and there are plans to allow for multiple heartbeat devices in some future version of Oracle VM. However, I have no idea when this will be. Keep in mind however, that if the enclosure hosting the heartbeat goes down, you will lose everything else hosted on that enclosure as well. If you put it on the large storage repository, all your VM virtual disks disappear too, so you're offline anyway. If you put it on the fast SSDs, all your data has gone away, so you're hosed anyway. Both enclosures appear (to me) to be fairly critical for the running of your VMs, so losing either of them during normal operation would probably cause an outage. Unless I'm missing something?
    Yes. Since we have multiple enclosures, I have separated a lot of the servers: 2-node RAC DB servers running on each enclosure (primary on the P2000, which is RAIDed; secondary on the SSD, which is unRAIDed, but it's a backup), and 2 different web/app servers on both as well. So if one enclosure goes down, yes, I would lose one set of servers, but one DB server and one web server would still be up. No single point of failure. Even if one of the SSDs holding the database files went down, those are 2 distinct physical devices made redundant with ASM, and ASM handles having one side of the failure group down until it can be brought back online. But if I lose the enclosure with the heartbeat, I lose all my servers and nothing stays up. It's my only point of frustration in my design.
  • 5. Re: Best Practice Question on Heartbeat Issue
    Avi Miller Guru
    user897654321 wrote:
    Do you know how to check this setting for the Server Pool Heartbeat? Did you say OCFS2 is 60 seconds by default?
    It's in /etc/sysconfig/o2cb and is set using service o2cb configure -- you need to change it on all servers and reboot the entire pool for the change to take effect. The heartbeat setting is in multiples of 2-second timeouts, so the default setting of 31 implies 62 seconds before a fence. To get it to fence after 120 seconds, set it to 61 (122 seconds total).
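    As a sketch of that process (the 61 is only the example for a 120-second fabric; confirm the right value for your environment with Oracle Support before touching a production pool):

        # On every server in the pool (run as root):
        # 'service o2cb configure' prompts interactively; the heartbeat dead
        # threshold is the value to change (e.g. 61 for the 120-second case above).
        service o2cb configure

        # Verify the resulting value in the config file on each server:
        grep O2CB_HEARTBEAT_THRESHOLD /etc/sysconfig/o2cb
        # expected (example): O2CB_HEARTBEAT_THRESHOLD=61

        # All servers in the pool must use the same value, and the pool has to
        # be restarted (reboot the servers) for the new timeout to take effect.
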
    But if I lose the enclosure with the heartbeat, I lose all my servers and nothing stays up. It's my only point of frustration in my design.
    My advice would probably be to create two server pools, one with a pool filesystem on each array, perhaps? That would require at least four servers though, at least until we have multiple global heartbeat regions. Unfortunately, I can't really think of anything else.
