13 Replies Latest reply on Dec 22, 2009 12:43 PM by tychos

    connection lost contact error

    465291
      Hello,

      I get the following error:

      RMAN-06900: WARNING: unable to generate V$RMAN_STATUS or V$RMAN_OUTPUT row
      RMAN-06901: WARNING: disabling update of the V$RMAN_STATUS and V$RMAN_OUTPUT rows
      ORACLE error from target database:
      ORA-03135: connection lost contact

      during a full database backup with the command: backup as copy incremental level 0 database tag level_0;

      The backup continues and further datafiles are copied.
      Then the backup ends with this error:

      RMAN-00571: ===========================================================
      RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
      RMAN-00571: ===========================================================
      RMAN-00601: fatal error in recovery manager
      RMAN-03004: fatal error during execution of command

      The first error seems to be a timeout problem, so I modified the parameters in sqlnet.ora.
      Current settings:
      SQLNET.EXPIRE_TIME=5
      SQLNET.INBOUND_CONNECT_TIMEOUT=300
      SQLNET.SEND_TIMEOUT=300
      SQLNET.RECV_TIMEOUT=300

      These changes haven't fixed the error.
      The second error is too generic, and I don't know whether it is related to the first one. Where can I find more information about the second error on my database server?


      DB version 10gR2 on RedHat Linux ES4


      Best regards,

      Mathias
        • 1. Re: connection lost contact error
          tychos
          Hi Mathias,

          Did you check the alert.log to see if there are entries around the time the ORA-03135 happened?

          It could be related to limited resources on the server running the target dbs.
          Do you have some server monitoring available?

          Regards,

          Tycho
          • 2. Re: connection lost contact error
            465291
            Hi Tycho,

            thanks for your fast reply.

            No entries in the alert.log of the instance. Does RMAN have a log file?

            Server monitoring via EM.



            Regards,

            Mathias
            • 3. Re: connection lost contact error
              tychos
              Hi Mathias,
              > Does RMAN have a log file?
              Only if you specify one with <spool log to '/path/filename'> (but it will contain the same output as you posted).
              > Server monitoring via EM.
              And what is the load on the system around the time of the backup?

              Regards,

              Tycho
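To follow up on the log file question: besides `spool log to` as mentioned above, RMAN output can, as far as I know, also be written to a file with the `log=` option on the `rman` command line. Once the output is in a file, the distinct error codes can be pulled out with standard tools. This is only a minimal sketch; the function name `rman_error_codes` and the log path in the usage comment are hypothetical and not taken from the setup in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct ORA-/RMAN- codes found in an RMAN log.
# Assumed usage (the path is an example only):
#   rman target / log=/tmp/rman_backup.log
#   rman_error_codes /tmp/rman_backup.log
rman_error_codes() {
    # -o prints only the matched code, -E enables extended regular expressions.
    # sort -u collapses repeats such as the RMAN-00571 banner lines.
    grep -oE '(ORA|RMAN)-[0-9]{5}' "$1" | sort -u
}
```

Each code can then be looked up individually, which is usually more informative than the generic RMAN-00601 / RMAN-03004 pair at the end of the stack.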
              • 4. Re: connection lost contact error
                465291
                Hi Tycho,

                ok, no special rman log file.

                The load on the system around the time of the backup was between 8 and 16, mostly over 10. During normal operation the load is under 2, very often under 1.



                Regards,

                Mathias
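If EM is the only monitoring available, the load during the backup window can also be sampled directly on the server. The following is a minimal sketch, assuming a Linux host (as here) where `/proc/loadavg` is available; the function name `sample_load` is hypothetical.

```shell
#!/bin/sh
# sample_load COUNT INTERVAL
# Print COUNT lines of "HH:MM:SS load1 load5 load15", INTERVAL seconds apart.
sample_load() {
    count=$1
    interval=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        # The first three fields of /proc/loadavg are the 1-, 5- and
        # 15-minute load averages; the remaining fields are ignored.
        read -r l1 l5 l15 _rest < /proc/loadavg
        printf '%s %s %s %s\n' "$(date +%H:%M:%S)" "$l1" "$l5" "$l15"
        i=$((i + 1))
        # Sleep only between samples, not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `sample_load 120 30 > /tmp/load_during_backup.txt` would record one sample every 30 seconds for an hour (the output path is just an example). The timestamps can then be lined up with the time at which ORA-03135 appears in the RMAN output.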
                • 5. Re: connection lost contact error
                  tychos
                  Hi Mathias,
                  Could the backup script or your RMAN settings be causing the high load on the machine?
                  Do you use compression or multiple channels?

                  Can you downsize these settings and retry?

                  Regards,

                  Tycho
                  • 6. Re: connection lost contact error
                    465291
                    Hi Tycho,

                    I assume that the backup causes the high load, because during normal operation the load is between 0.2 and 2.
                    I use no compression and only one channel.

                    RMAN configuration parameters are:
                    CONFIGURE RETENTION POLICY TO REDUNDANCY 3;
                    CONFIGURE BACKUP OPTIMIZATION OFF; # default
                    CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
                    CONFIGURE CONTROLFILE AUTOBACKUP ON;
                    CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default
                    CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET PARALLELISM 1;
                    CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
                    CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
                    CONFIGURE MAXSETSIZE TO UNLIMITED; # default
                    CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
                    CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
                    CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default

                    backup command:
                    allocate channel c1 type disk format '...';
                    backup as copy incremental level 0 database tag level_0;

                    I can retry at any time, but I have no idea what I can configure to get a better result (no errors).
                    Does 'CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET PARALLELISM 1;' mean that compression is used for my backup command?



                    Regards,

                    Mathias
                    • 7. Re: connection lost contact error
                      tychos
                      Hi Mathias,
                      > Does 'CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO COMPRESSED BACKUPSET PARALLELISM 1;' mean that compression is used for my backup command?
                      Yes.

                      How many CPUs do you have?

                      Regards,

                      Tycho
                      • 8. Re: connection lost contact error
                        465291
                        Hi Tycho,

                        I have 2 CPUs.
                        I will disable compression and retry the database backup.



                        Regards,

                        Mathias
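For reference, a sketch of how the configured compression could be switched off. It assumes that `rman target /` connects with OS authentication on this server; the function name `rman_no_compression_cmds` is hypothetical. One caveat, stated with some caution: as far as I understand, the configured backup type is only a default for backups that do not name a type themselves, and image copies created with `backup as copy` are not compressed, so this change may make no difference for the command used in this thread.

```shell
#!/bin/sh
# Hypothetical helper: print the RMAN commands that show the current disk
# device configuration, switch the default to uncompressed backup sets,
# and show the configuration again for comparison.
rman_no_compression_cmds() {
    cat <<'EOF'
SHOW DEVICE TYPE;
CONFIGURE DEVICE TYPE DISK BACKUP TYPE TO BACKUPSET PARALLELISM 1;
SHOW DEVICE TYPE;
EOF
}

# Assumed usage (not executed here): feed the commands to RMAN on stdin.
#   rman_no_compression_cmds | rman target /
```

Printing the commands first makes it easy to review them before anything is changed in the RMAN configuration.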
                        • 9. Re: connection lost contact error
                          465291
                          Hi Tycho,

                          Same errors as in my first post.
                          The first error now occurs a little later, after a few more datafiles have been copied.
                          CPU load was high again; the highest value was 16.5.



                          Regards,

                          Mathias
                          • 10. Re: connection lost contact error
                            tychos
                            Hi Mathias,

                            It seems that even a normal RMAN operation puts a high load on the server.
                            Can you contact a sysadmin to investigate what is going on?

                            Regards,

                            Tycho
                            • 11. Re: connection lost contact error
                              465291
                              Hi Tycho,

                              Yes, I can contact a sysadmin. I will also look again for other processes running on the instance during the backup.
                              After that I will check the backup that ended with the errors; maybe RMAN has already copied all datafiles.

                              The problem is that I can't take an offline backup to rule out other instance processes. Only this one Oracle instance runs on the server.



                              Regards,

                              Mathias
                              • 12. Re: connection lost contact error
                                465291
                                Hi Tycho,

                                Same errors when I back up archivelogs, again with a high load (sometimes over 20!).
                                I have found out that an mview refresh process was running during the backup. If this was the reason, I can look for a maintenance window for the backup.

                                But I can't believe that this is the only reason.



                                Regards,

                                Mathias
                                • 13. Re: connection lost contact error
                                  tychos
                                  Hi Mathias,

                                  Run some AWR reports to check what is going on while you are running the backup.
                                  Also check the system together with a sysadmin while running a backup.
                                  The information should give you a lead.

                                  Has it ever worked without problems in the past?
                                  Any recent changes?
                                  > If this was the reason, I can look for a maintenance window for the backup.
                                  Please do, to confirm that you can run a backup on this system.

                                  Regards,

                                  Tycho
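One way to get an AWR report that covers exactly the backup window is to take a manual snapshot right before and right after the backup. This is a minimal sketch under several assumptions: `sqlplus` is on the PATH, OS authentication as SYSDBA works on this server, and AWR may be used on this database (it requires the Diagnostics Pack license). The function name `take_awr_snapshot` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: take a manual AWR snapshot via SQL*Plus.
# Returns 127 if sqlplus is not available on this machine.
take_awr_snapshot() {
    if ! command -v sqlplus >/dev/null 2>&1; then
        echo "sqlplus not found on PATH" >&2
        return 127
    fi
    # -s suppresses the banner; the quoted here-document is passed to
    # SQL*Plus unchanged.
    sqlplus -s "/ as sysdba" <<'EOF'
WHENEVER SQLERROR EXIT FAILURE
EXEC DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT;
EXIT
EOF
}
```

Assumed usage: call `take_awr_snapshot`, run the backup, call `take_awr_snapshot` again, then run `@?/rdbms/admin/awrrpt.sql` in SQL*Plus and pick the two newest snapshot IDs as the report range.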