Creating Physical Standby Hits Error at Step 3

4196196 Member Posts: 6
edited Mar 9, 2020 1:02PM in Enterprise Manager

Trying to create a physical standby using OEM 13.3 hits a brick wall at step 3, the step where you select the standby host and standby OH.

Basically, OEM will not display ANY compatible OH on any other host apart from the OH on the primary host. The standby host has a compatible OS and compatible OH. Both the standby host and the standby OH are visible via OEM.

Here's my setup:

Primary / standby host: orasvr01 / orasvr03

Primary / standby host OS: OL 7.7 / OL 7.7

Primary / standby DB version: 18.3 / 18.3

In a previous configuration a few years ago, using OEM 12.5 and DB version 12.1, I had the same issue but managed to fix it. When it's fixed and working correctly, you see the screen shot "OEM 12c Screen Shot.png" when you click the magnifying glass to select the standby host and OH. Unfortunately I can't remember what I did to get around this. Clicking the magnifying glass in OEM 13.3 shows me the screen shot "OEM 13c screen shot.png". Notice how it doesn't even display the OS or OS version. Seems suspicious, like there's a bug stopping the discovery from working correctly.

Here's what I've tried:

- deployed the database plug-in to the agent;

- decommissioned the agent, uninstalled, re-installed & re-discovered all relevant targets;

- followed MOS Doc ID 1988557.1 (which is riddled with errors/typos and still doesn't fix it);

- tried it with a 12.2 OH, with the same result.

Any suggestions, workarounds or fixes would be gratefully received.

Many thanks in advance.

Answers

  • 4196196
    4196196 Member Posts: 6
    edited Feb 26, 2020 6:40PM

    I also tried changing the boot kernel from uek to non-uek. Made no difference. It's now Oracle Linux Server 7.7, with Linux 3.10.0-1062.12.1.el7.x86_64, which is certified with 18c. So that's not the issue.

    I found a blog which walks through this exact scenario using OEM 13 and rather conveniently they used the IP address of the standby host instead of selecting it from a list of compatible hosts/OHs. The IP address trick didn't work for me.

    Still searching for the answer....

  • Courtney Llamas-Oracle
    Courtney Llamas-Oracle Member Posts: 782 Employee
    edited Feb 27, 2020 10:04AM

    A couple of thoughts, and I'll do some research to see if I can find anything on this.

    First, the fact that your host doesn't show the OS is strange. It makes me think that the configuration collection is incomplete on that host. Can you go to the host and validate that it shows the OS and version in the top left corner? Then go to the Host dropdown menu and select Configuration -> Latest. Check the Last Collected date, and that the OS and Version are displayed there. Also, from the Host menu, select Monitoring -> Metric Collection Errors and see if there are any collection errors.

    Then do the same on the Oracle Home. Use the Refresh Configuration button on the OH target window. If the OH is not a target, then you might still need to promote it.

  • 4196196
    4196196 Member Posts: 6
    edited Feb 27, 2020 1:07PM

    Hello Courtney -

    Thank you so much for replying. I really appreciate your interest and your time. You are definitely poking in the right areas because the results of your suggested checks are interesting. Here's what I found:

    orasvr01 (primary host):

    1. Top left corner, Version and Primary IP Address are blank. Other areas of the screen are populated though. I see values for CPU Utilization, Memory Utilization, Root File System % full, etc. So the agent is reporting something.

    2. Host -> Configuration -> Latest does show the correct information for Target Version (7.7.0.0.0), Operating System (Linux), Platform (x86_64). Last collected date is Feb 26, 2020 5:18:26PM. So it's pretty recent, which seems encouraging.

    3. Host -> Monitoring -> Metric Collection Errors shows "There are no errors!".

    orasvr03 (standby host):

    1. Same result as orasvr01.

    2. Same result as orasvr01, except Last collected date is Feb 26, 2020 4:10:53PM.

    3. Same result as orasvr01.

    The version & IP address not showing is odd. The respective local /etc/hosts files are fully populated and DNS is setup and working correctly. Everything is resolving fine on the network.

    The OHs are a different matter. Both the primary and standby OHs are visible via Targets -> All Targets and I can click into their respective home pages. So they are both a known quantity to OEM. I think that gets us over the discover and promote hurdle. However, the Summary section on both home pages looks incomplete to me. Host, Home Location and Central Inventory are all correct. Owner, Operating System Group and Owner Groups are all blank. Collection Status says Complete and Last Collected is blank.

    OraDB18Home1_1_orasvr01.<domain>_1909 (primary OH):

    1. Oracle Home -> Configuration -> Latest shows Oracle Home Type O, the Path to Oracle Home and OUI Inventory this home belongs to are both correct. Operating System is Linux and Platform is x86_64.

    Last collected at is Feb 26, 2020 1:15:26PM.

    2. Oracle Home -> Monitoring -> Metric Collection Errors shows no errors.

    OraDB18Home1_1_orasvr03.<domain>_1909 (standby OH):

    1. Same result as the primary OH, except Last collected at is Feb 26, 2020 1:36:22PM.

    2. Same result as the primary OH

    Here's the fun part, which may be the root cause. On the OH home page, when I click the Refresh Configuration button, a little window appears telling me it's processing, then that disappears and at the top of the screen I see an error message. Both OHs behave in the same way. The error is this:

    The refresh operation failed for the target OraDB18Home1_1_orasvr01.<domain>_1909. Either the agent could be down or the target was not reachable to the agent. Check the logs for more details.

    The agents are not down and are communicating with the OMS with no problems. Reloading and uploading the agents return no errors. However, when I check the emoms.log file on the OMS server, I see an error which is being triggered by the Refresh Configuration attempt. Here's some of the text of the error message:

    2020-02-27 11:41:59,774 [[ACTIVE] ExecuteThread: '30' for queue: 'weblogic.kernel.Default (self-tuning)'] ERROR view.OHMenuBean logp.251 - RemoteAccessException occured during refreshAction

    oracle.sysman.eml.ecm.track.RemoteAccessException: Encountered the following problem(s) during collection and loading process: [null: Could not send some or all data over to repository: Failed to upload file: xferFile failed with INVALID_CONFIG (TIMEOUT)]

    What follows this in the emoms.log file is a bunch of text lines which look like a Java exception stack trace. Lines like:

    at oracle.sysman.eml.ecm.track.TrackUtilities.doCollection(TrackUtilities.java:680)

    at oracle.adfinternal.view.faces.lifecycle.LifecycleImpl.execute(LifecycleImpl.java:225)

    at javax.faces.webapp.FacesServlet.service(Unknown Source)

    at weblogic.servlet.internal.StubSecurityHelper$ServletServiceAction.run(StubSecurityHelper.java:280)

    at oracle.security.jps.ee.http.JpsFilter.doFilter(JpsFilter.java:81)

    at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:79)

    at oracle.dms.servlet.DMSServletFilter.doFilter(DMSServletFilter.java:220)

    at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:79)

    Searching MOS, I found MOS Doc ID 1623527.1, but it's not for the version I'm using. I also found MOS Doc ID 2414699.1. This seems closer, but the gcagent.log file on orasvr01 does not contain the error text mentioned in that Doc ID. It does contain this though:

    2020-02-27 11:48:25,565 [9538:GC.Executor.159 (Streaming real-time metric collection of NetworkSummary on target orasvr01.<domain>)] ERROR - Aborting task - unable to write streamed GetMetricData response: org.eclipse.jetty.io.EofException

    The INVALID_CONFIG (TIMEOUT) clause suggests the refresh process is not being given enough time to complete before a timeout threshold is reached. I don't know what that could be or where it would be set.
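
    One way to see whether the same failure code turns up elsewhere is to pull every xferFile failure out of a log and count the distinct codes. The following is only a sketch: the function name summarize_xfer_failures is made up for illustration, and the pattern assumes the message keeps the same shape as the emoms.log line quoted above.

    ```shell
    # Hypothetical helper (not part of OEM): count distinct xferFile failure codes.
    # $1 is the path to a log file, for example emoms.log or gcagent.log.
    summarize_xfer_failures() {
        # -o prints only the matching part of each line; in a basic regex the
        # parentheses are literal, so this matches e.g. "INVALID_CONFIG (TIMEOUT)".
        grep -o 'xferFile failed with [A-Z_]* ([A-Z_]*)' "$1" | sort | uniq -c
    }
    ```

    Run against the OMS log and an agent log in turn, it would show at a glance whether both sides report the same code or different ones.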

    That's the latest. I'm still working on this. I hope you can help.

    Thanks,

    Sean.

  • Courtney Llamas-Oracle
    Courtney Llamas-Oracle Member Posts: 782 Employee
    edited Feb 27, 2020 4:15PM

    Nice work. Can you do a quick check and see if there are any invalid objects in the OEM repository?

    Also, 13.3 EM - any bundle patches applied? And what version are the agents involved? I'm assuming this refresh config fails on any agent?

  • 4196196
    4196196 Member Posts: 6
    edited Feb 27, 2020 4:50PM

    Sure can. There are zero invalid objects in the OEM repository.

    Bundle patches? I don't think so. This was a straightforward 13.3 OTN download and install using a pre-configured template for a 12.1 repository database. I have not applied anything to the installation after it was installed. The installation went fine. No errors at the time. I have just reviewed my installation notes and am reminded the installation documentation tells you to set this underscored parameter:

    _allow_insert_with_update_check=true

    However, that's not valid with DB 12.1, so I couldn't set it. I don't think this is relevant, but thought I'd mention it just in case.

    Agent status shows this:

    [[email protected] bin]$ ./emctl status agent

    Oracle Enterprise Manager Cloud Control 13c Release 3 

    Copyright (c) 1996, 2018 Oracle Corporation.  All rights reserved.

    ---------------------------------------------------------------

    Agent Version          : 13.3.0.0.0

    OMS Version            : 13.3.0.0.0

    Protocol Version       : 12.1.0.1.0
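
    When the same check is repeated on several hosts, it can help to reduce this output to just the agent and OMS versions. A minimal sketch, assuming the status output keeps the "Name : value" layout shown above; the function name agent_oms_versions is hypothetical and not part of emctl.

    ```shell
    # Hypothetical helper: read `emctl status agent` output on stdin and
    # print the agent and OMS versions on one line.
    agent_oms_versions() {
        # Split each line on the colon and any spaces around it.
        awk -F' *: *' '
            /^ *Agent Version/ { agent = $2 }
            /^ *OMS Version/   { oms = $2 }
            END { print "agent=" agent " oms=" oms }
        '
    }
    ```

    It would be used as ./emctl status agent | agent_oms_versions on each host, making a version mismatch between an agent and the OMS easy to spot.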

    We know the OH Refresh Configuration fails on the primary (orasvr01) and standby (orasvr03) hosts. I tried it on the 12.1 OH on the OMS/repo database server and it fails too. The only other server in this configuration is orasvr02. I tried an 18.3 OH refresh on that server. It also failed with the exact same error. Good that it's consistent. Bad that it's still broken. Needless to say, I haven't modified any timeouts or messed with any properties files.

    Thanks,

    Sean.

  • Courtney Llamas-Oracle
    Courtney Llamas-Oracle Member Posts: 782 Employee
    edited Mar 2, 2020 11:40AM

    Can you get the output from the following... 

    1. export EMDCTL_TIMEOUT=4000

    2. emctl control agent runCollection <oracle home target name>:oracle_home oracle_home_config

  • 4196196
    4196196 Member Posts: 6
    edited Mar 2, 2020 2:42PM

    Hi Courtney:

    Thanks for your follow up. Here's the output you asked for:

    [[email protected] bin]$ pwd

    /u01/app/oracle/product/agent/agent_13.3.0.0.0/bin

    [[email protected] bin]$ export EMDCTL_TIMEOUT=4000

    [[email protected] bin]$ echo $EMDCTL_TIMEOUT

    4000

    [[email protected] bin]$ ./emctl control agent runCollection OraDB18Home1_1_orasvr03.<domain>_1909:oracle_home oracle_home_config

    Oracle Enterprise Manager Cloud Control 13c Release 3 

    Copyright (c) 1996, 2018 Oracle Corporation.  All rights reserved.

    ---------------------------------------------------------------

    EMD runCollection error:

    oracle_home_config:Could not send some or all data over to repository: Failed to upload file: xferFile failed with FAIL_CONT (TIMEOUT)

    It takes about 6 or 7 seconds from the time I press enter before the failed to upload message appears. So it's giving the appearance of at least trying to upload.
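
    To put a number on that delay rather than estimating it, the command can be wrapped in a small timer. This is a sketch only: timed_run is a hypothetical helper, not an emctl feature, and it measures whole seconds using date +%s.

    ```shell
    # Hypothetical helper: run any command, report how long it took and its
    # exit status on stderr, and pass the exit status through unchanged.
    timed_run() {
        tr_start=$(date +%s)
        "$@"
        tr_status=$?
        tr_end=$(date +%s)
        echo "elapsed_seconds=$((tr_end - tr_start)) exit_status=$tr_status" >&2
        return "$tr_status"
    }
    ```

    For example: timed_run ./emctl control agent runCollection <oracle home target name>:oracle_home oracle_home_config. Comparing the elapsed time across different EMDCTL_TIMEOUT values (in whatever unit that variable uses) may help show whether that setting has any bearing on the failure.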

    Thanks,

    Sean.

  • Courtney Llamas-Oracle
    Courtney Llamas-Oracle Member Posts: 782 Employee
    edited Mar 5, 2020 10:24PM

    Sorry, I haven't been able to find anything yet. Do you already have an SR? If not, can you please create one, then email me at first.last @ oracle.com and I'll make sure that a bug gets created on it and continues to get worked. I'm going to be out of the office next week and don't want to lose track.

  • 4196196
    4196196 Member Posts: 6
    edited Mar 9, 2020 1:02PM

    Hi Courtney -

    Thanks for getting back to me. Unfortunately, I do not have a valid CSI for creating SRs. Hence my asking the community for help. I can't find anything about this error either, which is strange. I will follow up with you via email so you don't lose track of this issue. In the meantime, if I stumble across the root cause and/or solution I will post it here for the community.

    Thanks - Sean.