11 Replies Latest reply: Jul 30, 2013 4:14 AM by robinsc

    Can Cloud Control 12c be on the Exadata client network?

    marksmithusa

      Hi, there,

       

      We’re trying to configure our brand new X3 machine but we’re having a problem connecting our Cloud Control OMS to it (the install of the agents fails because it can't connect to the hostname we provided on the management interface). We’re running the latest and greatest Exadata stack (11.2.3.2.1, 11.2.0.3.18) and Cloud Control 12.1.0.2.

       

      The OMS server – oem12oms – is on the same subnet as the client network on the Exadata comp nodes. However, when we try to ping the management hostname of the comp nodes (the docs say using the client hostname is not supported until 12.1.0.3) from the OMS server, we have no success:

       

      For instance:

       

      • oem12oms – 10.50.50.122 (running on eth0)
      • exadbclient1 – 10.50.50.123 (running on eth1 on the first comp node)
      • exadbclient2 – 10.50.50.124 (running on eth1 on the second comp node)
      • exadbadm1 – 10.75.100.23 (running on eth0 on the first comp node)
      • exadbadm2 – 10.75.100.24 (running on eth0 on the second comp node)

       

      From oem12oms, when both eth0 and eth1 are up on the comp nodes, it fails:

      traceroute exadbadm1 (10.75.100.23)

      10.50.50.1 (the default gateway for both the client network and the Exadata comp node itself)

      Then goes nowhere

       

      When we shut down the eth1 interface, I am able to access the management interface from my OMS server:

      traceroute exadbadm1 (10.75.100.23)

      10.50.50.1 (the default gateway for both the client network and the Exadata comp node itself)

      exadbadm1 (10.75.100.23)

       

      We have rules/routes which are set on the comp node (by default) and dictate that all traffic from:

      • the management interface (eth0) uses routing table 220 and comes from 10.75.100.xxx and goes out the same interface on 10.75.100.xxx.
      • the client (eth1) uses routing table 221 and comes from 10.50.50.xxx and goes out the same interface on 10.50.50.xxx
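      To see exactly which rules and tables are in play, the policy routing state can be dumped on the comp node. This is only a minimal sketch: it assumes the iproute2 `ip` command is available and that the tables are numbered 220 and 221 as described above. The function name `dump_policy_routing` is made up for illustration.

```shell
# Print the policy routing rules, then the routes held in each table.
# Table IDs default to 220 and 221 (the ones described in this post).
dump_policy_routing() {
    [ "$#" -gt 0 ] || set -- 220 221
    echo "== ip rule show =="
    ip rule show
    for table in "$@"; do
        echo "== ip route show table $table =="
        ip route show table "$table"
    done
}

# Example (run as root on the comp node):
#   dump_policy_routing
# To ask the kernel which way a reply from the management IP to the OMS would go:
#   ip route get 10.50.50.122 from 10.75.100.23
```

      Comparing the output with eth1 up and eth1 down should show which rule captures the OMS server's traffic.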

       

      Essentially, while we want to tell incoming traffic NOT to go out a different interface from the one it came in on, I think we want to work around this for one exception: traffic from the OMS server’s IP address (10.50.50.122) should come in on the MANAGEMENT interface (not the client) and go out the MANAGEMENT interface.

       

      There are a lot of references to advanced Linux network policies, etc. And we’ve read 1306154.1. However, it doesn’t seem to solve our problem and we’ve tried all the ways that we can think of: essentially, it looks like Exadata cannot talk over its management interface to clients which live on its client network (obviously, it can still be accessed through the SCAN addresses), so your Cloud Control cannot be on the client network.

       

      I haven’t read anything which states this, but that’s how it looks to us.

       

      Anyone had this issue before? Our ACS engineer is stumped and has asked around internally to no avail.

       

      Mark

        • 1. Re: Can Cloud Control 12c be on the Exadata client network?
          alvaromiranda

          Looks like a routing issue to me.

          Can you post the output of route -n before and after you shut down the eth1 interface?

          Is there any chance of adding an eth1 interface on the 10.75.100.x network to oem12oms?

          • 2. Re: Can Cloud Control 12c be on the Exadata client network?
            marksmithusa

            I can't right now because it's the weekend. But I do remember what we saw and, basically, the output was the same apart from the obvious absence of the eth1 interface entry.

             

            Tracerouting showed what we would expect to see: the connection goes from the OMS server to the gateway (10.50.50.1) and then on to the comp nodes. If we enable the eth1 interface, the traceroute gets to the gateway and then no further, presumably because the comp nodes are saying 'if you're coming in from this subnet, go out this network interface'.

             

            If eth1 is up, it takes all traffic from the 'main' network (10.50.50.122) and forwards it onto the client network (eth1) instead of the management network (eth0).

             

            There is no chance to put an 'eth1' network on the OEM/OMS server, no. Instead of potentially complicating the networking of the OEM/OMS server, which is accessed by the majority of our Production databases, our network admins (and OEM guru) would much prefer that we solve a routing issue that only exists on the X3 comp nodes.

             

            They're annoying like that. Bah, humbug!

            • 3. Re: Can Cloud Control 12c be on the Exadata client network?
              Marc Fielding

              Hi Mark,

               

               

              I think the fact that the config works when eth1 is down but not when it's up is quite revealing.  I'd guess either an asymmetric routing situation, or that you're running afoul of the "ip rule" rules that Exadata uses to try to direct responses out the same interface they came in.  To troubleshoot, I'd suggest running tcpdump on a compute node to see the actual packets on the network, in the two situations: eth1 up and eth1 down.  tcpdump by default shows every packet on the network, so you probably want to narrow it down to your OMS server's traffic alone:

               

               

              tcpdump -nvi eth0 host 10.50.50.122

              And if eth1 is up check that one too:

              tcpdump -nvi eth1 host 10.50.50.122

               

               

              You'll be looking to see the traceroute requests and responses, and see how they change when eth1 comes up.  Do the requests come in the same interfaces?  Do the responses come out the same place?  Do you see the responses at all?  Are the source and destination IPs the same?
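              A sketch of how the two captures could be set up side by side, filtered on the OMS server's address (10.50.50.122 in the original post). The `-e` flag, which prints link-level headers, is an addition here; it can help to tell which MAC address each packet came from. `capture_cmds` is a hypothetical helper that only prints the commands and executes nothing.

```shell
# Print one tcpdump command per interface, filtered to a single peer host.
# Nothing is executed; paste each line into its own root shell.
capture_cmds() {
    peer="$1"; shift
    for ifname in "$@"; do
        echo "tcpdump -n -e -i $ifname host $peer"
    done
}

capture_cmds 10.50.50.122 eth0 eth1
```

              Running both captures while the traceroute is in progress makes it easier to spot a request arriving on one interface and the response leaving on the other.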

               

               

              And it never hurts to keep an eye on /var/log/messages in case the kernel starts complaining about rp_filter mismatches.

               

               

              Cheers!

               

               

              Marc

              • 4. Re: Can Cloud Control 12c be on the Exadata client network?
                marksmithusa

                Hi, Marc (great name!)

                 

                Yes, we believe we're hitting the 'ip rule' - all traffic coming in on eth1 has to go out on eth1 and all traffic coming in on eth0 has to go out on eth0.

                 

                Unfortunately, we have to come in on the management interface for OEM, BUT the OMS server is on the same subnet as the eth1 client interface.

                 

                Thus, epic fail.

                 

                I take it that tcpdump is a tool that's not part of the standard O/S installed on OEL 5.7? Should we use yum to get it? (I did read about using that but couldn't find it on my system).

                 

                Interesting that you say that about the /var/log/messages. When I try to do a 'ping' from oem12oms (10.50.50.122) to exadbadm1 (10.75.100.23), I see this in the logs on the comp node:

                 

                Jul  1 16:14:26 exadbadm1 kernel: martian source 10.75.100.23 from 10.50.50.122, on dev eth0

                Jul  1 16:14:26 exadbadm1 kernel: ll header: 00:10:e0:31:bb:06:6c:9c:ed:4d:20:c1:08:00

                 

                'Martian source'? Are we able to blame this on aliens?

                 

                Mark

                • 5. Re: Can Cloud Control 12c be on the Exadata client network?
                  Marc Fielding

                  Hi Mark,

                   

                   

                  Ahh yes, martians.  This means that the kernel dropped the packet because it expected the packet to come in the other interface.  It also means that you can probably do a hackish fix by editing /etc/sysctl.conf and changing the rp_filter lines for the appropriate interfaces from 1 to 0, and then running "sysctl -p" to activate changes.  Without RP filters though you're slightly more susceptible to network attacks involving "forged" IP addresses, but then again in an isolated internal network it isn't all that likely.  The "real" fix would be to get the response packet to come back in the same interface the request came from.
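                  For reading such log lines: to the best of my knowledge the kernel prints the packet's destination address first and its source second, and on Ethernet the "ll header" line is the destination MAC, the source MAC and the EtherType (08:00 for IPv4). A small sketch that pulls the fields out of the line quoted in the previous post; `parse_martian` is a made-up helper name.

```shell
# Split a kernel "martian source" log line into its fields.
# The first address is treated as the packet's destination, the second as its source.
parse_martian() {
    echo "$1" | sed -n 's/.*martian source \([0-9.]*\) from \([0-9.]*\), on dev \([^ ]*\).*/dst=\1 src=\2 dev=\3/p'
}

parse_martian "Jul  1 16:14:26 exadbadm1 kernel: martian source 10.75.100.23 from 10.50.50.122, on dev eth0"
# prints: dst=10.75.100.23 src=10.50.50.122 dev=eth0
```

                  Read that way, the line says a packet from the OMS (10.50.50.122) addressed to the management IP arrived on eth0, while the kernel expected traffic from that source on a different interface.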

                   

                   

                  tcpdump is indeed a package you need to install via "yum".  It's probably my favourite network tool, showing the actual packets that are being transmitted and received on the network.  A great way to create testcases for bug reports.

                   

                   

                  HTH!

                   

                   

                  Marc

                  • 6. Re: Can Cloud Control 12c be on the Exadata client network?
                    marksmithusa

                    Marc,

                     

                    Do I need a system restart for the rp_filter change to come into effect? I'm thinking no, but I can always check the interweb.

                     

                    Ultimately, you're right: I need to figure out a way to add an exception for extraterrestrial traffic. Found plenty of articles on the web but they explain the problem - but don't show you how to add the exception.

                     

                    At least nothing I've seen, anyway.

                     

                    Looks like I'll be tcpdump'ing tomorrow...

                     

                    Thanks for the tips!

                     

                    Mark

                    • 7. Re: Can Cloud Control 12c be on the Exadata client network?
                      Marc Fielding

                      Hi Mark,

                       

                      When you run "sysctl -p" the changes take effect immediately.
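                      For reference, a sketch of what the rp_filter lines look like. The keys follow the standard `net.ipv4.conf.<interface>.rp_filter` pattern; which interfaces need changing (eth0 and eth1 here) is an assumption, and on some kernels the `all` setting is combined with the per-interface one, so it is worth checking both. `rp_filter_lines` is a hypothetical helper that only prints text and modifies nothing.

```shell
# Print sysctl.conf-style lines that set rp_filter for the given interfaces.
# Usage: rp_filter_lines VALUE IFACE...
rp_filter_lines() {
    value="$1"; shift
    for ifname in "$@"; do
        printf 'net.ipv4.conf.%s.rp_filter = %s\n' "$ifname" "$value"
    done
}

rp_filter_lines 0 eth0 eth1

# To inspect the live values instead (read-only):
#   sysctl -a 2>/dev/null | grep '\.rp_filter'
```

                      Comparing the printed lines against /etc/sysctl.conf before editing avoids adding duplicate or conflicting entries.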

                       

                      I wouldn't so much say adding an exception for this type of traffic, but rather identifying where it's going and where you'd like it to go.  Definitely do the tcpdump to see what's happening, but I suspect the "right" solution will involve adding the same kind of "ip rule" policy routing on your OMS as on the Exadata.

                       

                      Marc

                      • 8. Re: Can Cloud Control 12c be on the Exadata client network?
                        marksmithusa

                        Marc,

                         

                         

                         

                        I read your article about this - it's the best article out there by a LONG way, but it still confuses me because of my eye-popping ignorance. Well done, though, because I almost understood it so you're clearly skilled in explaining complex stuff to untrained monkeys

                         

                        Oracle don't appear to know about this issue, which astounds me. We can't be the only customer who has their client network on their 'main' network. We've had two ACS engineers who have been flummoxed by this and have tried to reach out to the 'internal guru' network in Oracle with no joy.

                         

                        I tried to type out what I should do (the configuration is in a mess right now, so I haven't tried this out yet). This is to get things clear in my mind.

                         

                        • OMS server: 10.50.50.122 (client interface - uses the same subnet as the 'general Production' network)
                        • DB Comp Node 1: 10.75.100.23 (management interface - uses an exclusive subnet for Exadata management hostnames)

                         

                        First, we add a custom routing table 230 to the /etc/iproute2/rt_tables

                             echo 230 >> /etc/iproute2/rt_tables
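                        A note on the format: as far as I can tell, entries in rt_tables are "ID name" pairs, so a bare number on its own line may not be enough (and numeric table IDs can be used directly in ip commands without any entry at all). A minimal sketch that adds a named entry only if the ID isn't already listed. The table name `oms_mgmt` and the helper `add_rt_table` are made up for illustration.

```shell
# Append "ID<TAB>NAME" to an rt_tables-style file unless the ID is already listed.
# Usage: add_rt_table ID NAME [FILE]
add_rt_table() {
    id="$1"; name="$2"; file="${3:-/etc/iproute2/rt_tables}"
    if grep -Eq "^[[:space:]]*${id}[[:space:]]" "$file" 2>/dev/null; then
        return 0    # already present, leave the file alone
    fi
    printf '%s\t%s\n' "$id" "$name" >> "$file"
}

# Example (as root): add_rt_table 230 oms_mgmt
```

                        The optional third argument lets the function be tried against a scratch copy of the file first.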

                         

                        Then we need to inform the magical, mysterious routing database of this new table and what we want to do with it:

                        • Add a route to direct traffic from the OMS server (client network) to the comp node (management network) using table 230
                        • Add a rule which tells traffic from the OMS server to use table 230 (with higher priority than other rules)
                        • Add a rule which tells traffic from the comp node to use table 230 (with even higher priority)

                         

                                  ip route add 10.50.50.122/32 via 10.75.100.23 dev eth0 table 230

                                  ip rule add from 10.50.50.122 prio 32750 table 230

                                  ip rule add to 10.75.100.23 prio 32751 table 230

                         

                        The ifcfg files look OK so I didn't see the need to change them.

                         

                        I thought I'd add a route associated with routing table 230 to the rule-eth0 file so that it has two entries (added the last two lines)

                        • The first entry uses table 220 to route traffic coming in on the management interface to the management interface
                        • The second (new) entry uses table 230 to route traffic coming from the OMS server (client network) to the management interface

                         

                                  from 10.75.100.23 table 220

                                  to 10.75.100.23 table 220

                                  from 10.50.50.122 table 230

                                  to 10.75.100.23 table 230

                         

                        Then I'd change the route-eth0 file to add the new route (added the last line)

                        • The first entry tells anything coming in from the management interface to use the eth0 NIC and table 220 (which instructs anything coming from the management network to stay on the management network)
                        • The second entry tells anything coming in from the management interface gateway to do the same (eth0 NIC and table 220)
                        • The third entry (new) tells anything coming from that specific IP address to route to the management IP of the comp node via eth0 but using our new table 230

                         

                                  10.75.100.0/24 dev eth0 table 220

                                  default via 10.75.100.1 dev eth0 table 220

                                  10.50.50.122/32 via 10.75.100.23 dev eth0 table 230


                        This should mean that the rules/routes will be statically configured so when we restart the network services or the entire box, it should save them?


                        Does that look like it would make sense? The whole thing SEEMS to make perfect sense and seems easy when reading about it, but doesn't seem to play nicely when you come to actually implement it.


                        Mark

                        • 9. Re: Can Cloud Control 12c be on the Exadata client network?
                          marksmithusa

                          We have discovered the solution to this problem! Check out our networking SKILLZ!

                           

                          (OK, so we got a great Support Engineer, but let us have our moment as we did almost all of the grunt work).

                           

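                          The problem is that the rule for the eth1 interface is too broad. Oracle are being TOO strict in how they apply rules to eth1: instead of matching only the comp node's own IP address on that interface, the eth1 rule applies to the entire subnet. That subnet includes our OMS server (10.50.50.122), so its traffic was matched by the eth1 rule (table 221) even when it came in on eth0.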

                           

                          So the rule for the eth0 (management) interface looks like this (note the specific IP address - NOT the subnet - is used in this)

                               more /etc/sysconfig/network-scripts/rule-eth0

                                    from 10.75.100.23 table 220

                                    to 10.75.100.23 table 220

                           

                          Take down the eth1 interface

                               ifdown eth1

                           

                          Modify the rule-eth1 file

                               cp /etc/sysconfig/network-scripts/rule-eth1 /etc/sysconfig/network-scripts/rule-eth1.bkup.070313

                               vi /etc/sysconfig/network-scripts/rule-eth1

                           

                               Change this:

                                    from 10.50.50.0/21 table 221

                                    to 10.50.50.0/21 table 221

                           

                               To this:

                                    from 10.50.50.123 table 221

                                    to 10.50.50.123 table 221

                           

                          Restart the eth1 interface:

                               ifup eth1

                           

                          And now connections from the 'general' network (10.50.50.xxx) should work. At least, they do for us.
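                          One way to see why narrowing the rule matters: the OMS address 10.50.50.122 falls inside 10.50.50.0/21, so the old eth1 rule matched traffic to and from the OMS as well, whereas a rule on 10.50.50.123 alone only matches the comp node's own client address. A small sketch of that check in plain shell arithmetic; `ip_to_int` and `in_cidr` are hypothetical helpers, not part of any Exadata tooling.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr ADDRESS NETWORK/PREFIX: exit status 0 if ADDRESS lies inside the block.
in_cidr() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    prefix=${2#*/}
    mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

in_cidr 10.50.50.122 10.50.50.0/21   && echo "OMS matches the old /21 rule"
in_cidr 10.50.50.122 10.50.50.123/32 || echo "OMS does not match the new single-address rule"
```

                          Both messages are printed, which fits what we saw: with the /21 rule the OMS server's traffic was caught by the eth1 rule, and with the single-address rule it no longer is.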

                           

                          Now it's time for a happy dance. And to possibly teach the fix to the network engineers - as slowly as possible, obviously - as we're so good with networks now.

                           

                          Mark

                          • 10. Re: Can Cloud Control 12c be on the Exadata client network?
                            robinsc

                            My guess is that if you revert to the non-UEK kernel it will start working too, because the non-UEK kernel somehow routes things differently... But I think your solution is better, though I still haven't grasped the whole thing.  Like, why 10.50.50.123? Your OMS server was 122...

                            • 11. Re: Can Cloud Control 12c be on the Exadata client network?
                              robinsc

                              We faced a similar issue because we reinstalled OEM 12c 12.1.0.3 on a machine that is in the same subnet as our client network and found we were unable to access the management IP address from the OEM server. However, by directly adding static broadcast routes to both the db nodes and the OEM server, we could then ping the db nodes. In this case the packet is probably still coming and going on the wrong interface, but it's no longer rejected by the db nodes.