4 Replies. Latest reply: May 11, 2007 12:29 PM by 345314

Oracle Cluster and libnnz10 problem - SOLVED!

345314 Newbie
Hi to all

I have some machines (XServe) with OSX 10.4.9 and Oracle 10.1.0.3.

On all the machines I followed the instructions found in this forum (the libnnz10.dylib relink trick), and all the databases run very well.

Now I'm trying to install two machines in cluster mode.

Node 1 is a clean machine. Node 2 is a machine that already has an Oracle single instance running.

The whole cluster-mode installation ran smoothly, and I did the same things as for a single instance (such as the relink trick).

With everything installed, the CRS daemon is running and the listeners are running (I ran NETCA before running DBCA).

DBCA works fine until the last step: for the database creation, the last script issues a shutdown/startup to run a full recompile of all packages and procedures in the database.

At this point the installation fails (with end-of-file on communication channel). According to the logs, this is the dyld libnnz10.dylib error.

Even after the installation crashed, I quit DBCA and ran SQL*Plus. I connected to the database, tried a startup (which failed with the error mentioned), reconnected, and issued two commands: alter database mount; alter database open; (and this works).

I did this on both nodes, and the CRS daemon recognizes the running instances (crs_stat -t shows everything online). The database is running as a cluster (what I do on one instance appears on the other, and the datafiles are the same), including failover (for example, when I simulate a node going down, the database keeps running).
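
The manual workaround described above could be scripted roughly as follows. This is only a sketch: the helper name `manual_open_sql` is hypothetical, and it assumes that `sqlplus` is on the PATH, that ORACLE_HOME and ORACLE_SID point at the local instance, that the connection is made as SYSDBA, and that the failed startup left the instance started but not mounted.

```shell
#!/bin/sh
# manual_open_sql is a hypothetical helper name, not an Oracle tool.
# It only prints the two statements issued by hand after the failed STARTUP,
# followed by "exit" so that SQL*Plus terminates when reading from a pipe.
manual_open_sql() {
    cat <<'EOF'
alter database mount;
alter database open;
exit
EOF
}

# Usage on each node (assumption: OS authentication as SYSDBA is available):
#   manual_open_sql | sqlplus -S "/ as sysdba"
```

Printing the statements separately from running them makes it possible to review what will be sent to the instance before piping it into SQL*Plus.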

But every attempt to run the startup/shutdown commands fails. It always fails when the LMON processes try to start.

I rechecked the installation, focusing on the libnnz10 relink trick. When I use DBCA to install a single instance on node 1 or node 2 in the same ORACLE_HOME as the cluster, it runs fine (including startup/shutdown), which makes me think that all the database relinks are fine.

I tried to relink the cluster. The installation uses 4 makes, and I redid these makes without libnnz10. The makes ran well too, but the error persists (only when using startup/shutdown on the cluster database).

I'm documenting the process and will post it, but I want to resolve this issue first.

Any ideas?

Thanks to all

Message was edited by:
alliance01
  • 1. Re: Oracle Cluster and libnnz10 problem
    345314 Newbie
    I ran some tests today.

    Test 1: put libnnz10.dylib back and run a relink all, on both nodes
    Result 1: the error changes on both nodes to the classic

    dyld: Symbol not found: SSLALG_CLIENT_AUTH_MODE_RSA_SIGN_CLIENTSIDE_BS
    Referenced from: /Volumes/Shared/oracle/product/10.1.0/db_1/lib/libnnz10.dylib
    Expected in: flat namespace

    Test 2: move libnnz10.dylib away and run a relink all, on both nodes
    Result 2: the error returns to the classic one

    dyld: lazy symbol binding failed: Symbol not found: _lxldfcb
    Referenced from: /oracle/app/product/10.1.0.3/db_1/lib/libhasgen10.dylib
    Expected in: flat namespace

    dyld: Symbol not found: _lxldfcb
    Referenced from: /oracle/app/product/10.1.0.3/db_1/lib/libhasgen10.dylib
    Expected in: flat namespace

    ORA-03113: end-of-file on communication channel

    I think that some of my environment variables are wrong and do not work in the relink all procedure when the cluster is active (the relink procedure with a single instance works fine).
  • 2. Re: Oracle Cluster and libnnz10 problem
    345314 Newbie
    I'm still running tests.

    The database is fully operational, including OEM (Oracle Enterprise Manager). I made exports from single instances (10.1.0.3) and imported them into the cluster database without trouble. I loaded more than 400 GB of data.

    The only exception is functions that need a startup/shutdown of the database, even from a host terminal (such as startup/shutdown in sqlplus, crs_start, srvctl, and startup/shutdown in OEM).

    Sorry for my post yesterday: I copied the errors from other sources because I was not posting from the server, so the paths shown in the test messages do not match.

    For Test 1, the correct message is:

    dyld: Symbol not found: SSLALG_CLIENT_AUTH_MODE_RSA_SIGN_CLIENTSIDE_BS
    Referenced from: /ORAROOT/db/lib/libnnz10.dylib
    Expected in: flat namespace

    For Test 2, the correct message is:

    dyld: lazy symbol binding failed: Symbol not found: _lxldfcb
    Referenced from: /ORAROOT/db/lib/libhasgen10.dylib
    Expected in: flat namespace

    dyld: Symbol not found: _lxldfcb
    Referenced from: /ORAROOT/db/lib/libhasgen10.dylib
    Expected in: flat namespace
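
When comparing messages like the ones above across several logs, it can help to reduce them to symbol/library pairs. The following is a minimal sketch, not part of the original troubleshooting: the function name `dyld_missing_symbols` is hypothetical, and it assumes the log contains dyld messages in exactly the three-line form quoted in this thread.

```shell
#!/bin/sh
# dyld_missing_symbols is a hypothetical helper name.
# It reads log text on stdin and prints one "symbol<TAB>library" line for each
# distinct "Symbol not found" / "Referenced from" pair.
dyld_missing_symbols() {
    awk '
        /Symbol not found:/ {
            # Keep only the symbol name after the marker text.
            sym = $0
            sub(/.*Symbol not found:[ \t]*/, "", sym)
        }
        /Referenced from:/ && sym != "" {
            # Keep only the library path after the marker text.
            lib = $0
            sub(/.*Referenced from:[ \t]*/, "", lib)
            print sym "\t" lib
            sym = ""
        }
    ' | sort -u
}

# Usage (the log file name is an assumption):
#   dyld_missing_symbols < startup_error.log
```

For the Test 2 message, the lazy-binding error and the plain error name the same symbol and library, so they collapse into a single pair.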

    I'm using Xcode 2.0. I'm now going to try other versions.

    I also found some information about the abend: the only executable that needs _lxldfcb is orapwd.

    And I'm still trying to understand how the rac* executables work.
  • 3. Re: Oracle Cluster and libnnz10 problem
    345314 Newbie
    I tried Xcode 1.5 and Xcode 2.4, with the same results.

    The database has now been running for 3 days with 400 GB of data.

    OEM is running well too.

    I ran many of our routines on the database and it responds normally. I do not yet know what this problem might interfere with.

    I will leave the database running and keep testing it.

    Now I'm trying to move to 10.1.0.5.

    Let's see.
  • 4. Re: Oracle Cluster and libnnz10 problem
    345314 Newbie
    Hi to all.

    I downloaded 10.1.0.5 as soon as I could. Thanks to damjanp!

    After the download, I followed all the instructions.

    I stopped the database, all the services, and the cluster, then ran the installer, and it worked fine.

    When I tried the post-installation tasks, I could not start the database (and not with the error I had before). I received an Oracle Cluster Configuration Error...

    I read all the logs and found 3 errors. I do not know whether they occur because of wrong environment variables.

    The commands awk, touch, and basename fail: the scripts look for these commands in /bin, but on OS X they live in /usr/bin. I preferred to create links (ln -s) as root:

    ln -s /usr/bin/awk /bin/awk
    ln -s /usr/bin/touch /bin/touch
    ln -s /usr/bin/basename /bin/basename
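
The three links above can also be created with a small helper that skips anything already present, which makes the step safe to repeat. This is a sketch only: the function name `link_missing_tools` is hypothetical, and the directories are passed as arguments rather than hard-coded.

```shell
#!/bin/sh
# link_missing_tools is a hypothetical helper name.
# Arguments: source directory, destination directory, then the tool names.
# For each tool it creates dst/tool -> src/tool unless dst/tool already exists.
link_missing_tools() {
    src_dir=$1
    dst_dir=$2
    shift 2
    for tool in "$@"; do
        if [ -e "$dst_dir/$tool" ]; then
            echo "skip: $dst_dir/$tool already exists"
        elif [ -x "$src_dir/$tool" ]; then
            ln -s "$src_dir/$tool" "$dst_dir/$tool" &&
                echo "linked: $dst_dir/$tool -> $src_dir/$tool"
        else
            echo "missing: $src_dir/$tool not found" >&2
        fi
    done
}

# Usage as root, matching the three ln -s commands in this post:
#   link_missing_tools /usr/bin /bin awk touch basename
```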

    Once I did this, the cluster started (ocssd.bin, crsd.bin, and evmd.bin). But when I tried to start the database, it failed (ORA-29702: error occurred in Cluster Group Service operation), and srvctl gave me PRKH-1010: Unable to communicate with CRS services. [OCR Error(Native : prsr_initCLSS:[21])].

    To resolve this I used localconfig (as root):

    localconfig reset /<ORACLE_HOME>

    After this, I shut everything down and restarted the CRS services, and to my surprise CRS automatically started all the services (including the instances).

    And the libnnz10 problem was gone!

    I read the documentation and found that the "TNS - lost contact" problem on Tiger (yeah, Tiger) was resolved.

    I think that 10.1.0.5 runs on Tiger more easily.

    The cluster is running fine now. I shut down and started up the instance several times as a test, with no errors.
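
A repeated shutdown/startup test like the one described above could be scripted along these lines. This is a sketch under assumptions: the function name `cycle_database` and the database name `racdb` are hypothetical, the `SRVCTL` variable is introduced here only so that the command can be swapped out for a dry run, and it assumes the database is registered with `srvctl` under the given name.

```shell
#!/bin/sh
# cycle_database is a hypothetical helper name.
# Arguments: database name, number of stop/start cycles (default 3).
# SRVCTL may be set to another command (e.g. echo) for a dry run.
cycle_database() {
    db_name=$1
    cycles=${2:-3}
    srvctl_cmd=${SRVCTL:-srvctl}
    i=1
    while [ "$i" -le "$cycles" ]; do
        # Stop on the first failure so the failing step stays visible.
        "$srvctl_cmd" stop database -d "$db_name" || return 1
        "$srvctl_cmd" start database -d "$db_name" || return 1
        echo "cycle $i ok"
        i=$((i + 1))
    done
}

# Dry run that only prints the srvctl arguments:
#   SRVCTL=echo cycle_database racdb 2
# Real run (racdb is a placeholder database name):
#   cycle_database racdb 3
```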

    I will prepare documentation about this installation to post on the Web. When I finish, I will post the link in this forum.

    Thanks to all.