15 Replies. Latest reply: 13.08.2019 14:34, by pwiseman

    Issue with system() call with libclntsh.so.19.1

    user11763611

      Hi all,

       

      I am testing Oracle Client 19.3 on Linux 64b and face a strange side effect with the system() function:

       

      From time to time, after connecting to the server, the system() call returns -1/fail, and in other cases it returns 0/success.

       

      This happens in a QA test that has been running for years, and the only difference is the use of Oracle Client 19.3 instead of 18.3.

      When using 18.3 I do not get the problem.

       

      I suspect that some system calls have changed in libclntsh.so.19.1, so I have compared strace outputs, but it's not easy to figure out what has changed.

      So at least for now I wanted to warn Oracle here that something is not working as expected. I will continue to investigate.

       

      Maybe Oracle Support can try to reproduce this with a simple OCI program, in which the parent connects to the server and then executes a child process with system().

      The child command does not matter; it can be a simple "ls -l > /tmp/zz" ...

       

      Regards,

      Seb

        • 1. Re: Issue with system() call with libclntsh.so.19.1
          user11763611

          Ok, now I need some official answer/help from Oracle:

           

          I have installed Oracle Instant Client 18.5 and 19.3 on an Oracle Linux 7.3:

           

          [f4gl@saphir sf]$ cat /etc/oracle-release

          Oracle Linux Server release 7.3

           

          [f4gl@saphir sf]$ uname -a

          Linux saphir 4.1.12-61.1.28.el7uek.x86_64 #2 SMP Thu Feb 23 19:55:12 PST 2017 x86_64 x86_64 x86_64 GNU/Linux

           

          No problem with the system("ls ...") call with the following combinations (all through TCP to a remote server):

          - Oracle Instant Client 18.5 to remote Oracle Server 18.3

          - Oracle Instant Client 18.5 to remote Oracle Server 19.3

          - Oracle Instant Client 19.3 to remote Oracle Server 18.3

           

          But when connecting with Oracle Client 19.3 to a remote Oracle Server 19.3, system("ls ...") returns -1 from time to time !

           

          I will provide strace details later here...

           

          Seb

          • 2. Re: Issue with system() call with libclntsh.so.19.1
            user11763611

            Here are the strace outputs:

             

            strace-client-18c-server-19c-1.txt     :  18.5 client => 18.3 server (OK)

            strace-client-19c-server-18c-1.txt     :  19.3 client => 18.3 server (OK)

            strace-client-19c-server-19c-1.txt     :  19.3 client => 19.3 server (system() fails)

             

            Seb

            • 3. Re: Issue with system() call with libclntsh.so.19.1
              user11763611

              It appears that it's related to the following mix:

               

              1) signal(SIGCHLD, handler)   - BTW, yes, we should use sigaction() now.

              2) Oracle Client 19c starting a new thread (we see an additional call to clone() in strace when comparing to Oracle Client 18c)

              3) system() getting confused because SIGCHLD is received by the wrong thread, whose handler calls waitpid()/wait4() and reaps the child before system() can.
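
              To tell this failure mode apart from other system() errors, the return value and errno can be inspected. A minimal sketch, assuming glibc on Linux, where system() returns -1 with errno set to ECHILD when its child has already been reaped by another waitpid() call. The helper name run_and_explain() is hypothetical and does not come from the original program:

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: run a command with system() and report why it failed.
 * Returns the child's exit code (0-255), -1 if system() itself failed
 * (errno is preserved), or -2 if the child did not exit normally. */
static int run_and_explain(const char *cmd)
{
    errno = 0;
    int status = system(cmd);

    if (status == -1) {
        int saved = errno;
        /* ECHILD suggests the child was reaped elsewhere, for example by a
         * SIGCHLD handler that calls waitpid(-1, ...). */
        fprintf(stderr, "system(\"%s\") failed: %s%s\n", cmd, strerror(saved),
                saved == ECHILD ? " (child already reaped elsewhere)" : "");
        errno = saved;
        return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);

    fprintf(stderr, "\"%s\" was terminated by signal %d\n", cmd,
            WIFSIGNALED(status) ? WTERMSIG(status) : 0);
    return -2;
}
```

              Calling run_and_explain("ls -l > /tmp/zz") in place of the plain system() call would log the errno value each time the -1 case occurs.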

               

              While we figure out how to avoid the SIGCHLD signal handler, can someone from Oracle explain why the Oracle Client creates a thread???

               

              Our current OCI program is not prepared for a multi-threaded context, so is it possible to disable the thread creation with some option?

               

              Seb

              • 4. Re: Issue with system() call with libclntsh.so.19.1
                cj

                I'll ask someone to take a look.  Have you got some compilable, sharable code that reproduces the issue ?

                • 5. Re: Issue with system() call with libclntsh.so.19.1
                  user11763611

                  Hello CJ and thanks for considering this!

                   

                  Sorry but I could not repro with a simple OCI program...

                   

                  Here is a new summary that I have reported to my Oracle contact in France, with new strace outputs:

                   

                  ==========================================================================

                   

                  We may have found an issue with the Oracle Client lib libclntsh.so.19.1

                  on Oracle Linux 7.x regarding threads, signal handlers and the system()

                  function, this is new to us and was not occurring with 18c clients.

                   

                  The problem appears with the following configuration:

                   

                  A) The OCI client is 19c, and connects to a remote 19c server.

                   

                  WARNING: The problem DOES NOT occur in the following cases:

                   

                  B) The OCI client is 19c, and connects to 19c server ON THE SAME COMPUTER.

                  C) The OCI client is 19c, and connects to a remote 18c server.

                  D) The OCI client is 18c, and connects to a remote 19c server.

                   

                   

                  Symptom:

                   

                  From time to time, after connecting to Oracle, a call to the system()

                  function returns -1 (fail) for a simple "ls -l" command, when the program

                  installs a SIGCHLD signal handler that calls waitpid()...

                   

                   

                  Looking at the strace outputs, we suspect that the libclntsh.so.19.1 client

                  creates a thread (with the clone() function).

                   

                  In our code, we create a signal handler for SIGCHLD (to avoid zombies):

                   

                  static void hSIGCHLD(int sig)

                  {

                      int result;

                      while (waitpid(-1, &result, WUNTRACED | WNOHANG) > 0);

                      signal(sig, hSIGCHLD);

                  }

                   

                  ...

                   

                  signal(SIGCHLD, hSIGCHLD);

                   

                  Note: To work around the system() -1 issue with the Oracle 19c client, we no longer

                  use this code, because we have no more cases where zombies can occur, but that code

                  reveals the issue when mixing threads and system().

                   

                  When using the good old signal() API, the program is not prepared to handle

                  signals as a multi-threaded process:

                   

                  When using threads, a process-directed signal such as SIGCHLD is delivered to an arbitrary thread that does not block it!!!

                   

                  When calling system(), the implementation of this function starts another

                  process with clone(), and then waits until it ends with wait4() ...

                   

                  Normally system() delays any SIGCHLD delivery with:

                   

                      rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0

                   

                  But we see something wrong happens, probably because the wrong thread gets

                  the SIGCHLD signal, then the wrong wait4()/waitpid() is continued and then

                  system() gets confused and returns -1 ...

                   

                  Attached you find the strace -f output:

                   

                  strace-19c-threads-fail.txt:  when system() returns -1

                  strace-19c-threads-ok.txt:    when system() returns 0

                   

                  We think that OCI should not create threads, unless the program has explicitly

                  stated that it is prepared for that, for example with a parameter in OCIEnvCreate().

                  • 6. Re: Issue with system() call with libclntsh.so.19.1
                    user11763611

                    See this vimdiff... depending which thread gets the SIGCHLD signal, system() returns 0 or -1 ...

                    Maybe an implementation bug in system()??? That would surprise me...

                    For sure, mixing threads and basic signal() handling without controlling which thread gets signals (pthread_sigmask()) is NOT safe.

                    Seb

                     

                    strace-diff-1.png

                    • 7. Re: Issue with system() call with libclntsh.so.19.1
                      user11763611

                      My point of view:

                       

                      Assuming this is the reason for the issue, the Oracle client lib should NOT create threads, because legacy C applications might use non-thread-safe APIs like signal()...

                       

                      The Oracle Client lib could possibly create threads if the OCI program explicitly declares that it is thread-safe with an OCIEnvCreate() option.

                       

                      We expect Oracle Client lib to be lightweight, like many other DB client libs are.

                      I am surprised that it has to create a thread... for what feature???

                       

                      Seb

                      • 8. Re: Issue with system() call with libclntsh.so.19.1
                        cj

                        I will prod the dev team again. Did you log an SR with Support on this?

                         

                        As I understand it, threads are here to stay and there are plans for others in future.

                        • 9. Re: Issue with system() call with libclntsh.so.19.1
                          user11763611

                          If the OCI client lib is intended to work only in a multi-threaded context, then the OCI documentation needs to mention this, and legacy code using signal() needs to be reviewed.

                           

                          I assume that the dev team understands this, and that it was a deliberate decision to introduce threads usage in OCI...

                          I am surprised that this kind of feature was added in version 19c, which is in reality from the 12.x family and should introduce minimal backward-compatibility issues.

                           

                          Maybe I am wrong and we should not mix signal() with OCI at all from the beginning.

                          I could not find any topic about signal handling in the OCI documentation:

                          https://docs.oracle.com/apps/search/search.jsp?word=signals&product=b28359-01&book=b28395

                           

                          Seb

                          • 10. Re: Issue with system() call with libclntsh.so.19.1
                            user11763611

                            About the SR: Sorry, but we only have a silver partnership and no permission to create an SR.

                            We are a dev tool company and we do not go to production with Oracle, so we don't want to spend too much money on partnership contracts (we support other DB engines).

                            But we have large customers using our product with Oracle.

                             

                            Seb

                            • 11. Re: Issue with system() call with libclntsh.so.19.1
                              user11763611

                              I have found something similar to our issue: https://stackoverflow.com/questions/17550217/linux-system-sigchld-handling-multithreading

                               

                              system() calls sigprocmask() to block SIGCHLD, but POSIX leaves the behavior of sigprocmask() unspecified in a multi-threaded process (pthread_sigmask() should be used instead).

                               

                              As stated before, we no longer use a SIGCHLD handler.

                               

                              However, my conclusion is that since the Oracle client lib is now multi-threaded, it's no longer possible to use standard C APIs like system().

                              At least on Linux.

                               

                              Seb

                              • 12. Re: Issue with system() call with libclntsh.so.19.1
                                user11763611

                                Some more thinking:

                                 

                                While reading: https://devarea.com/linux-handling-signals-in-a-multithreaded-application/#.XOKEiJyzJhF

                                 

                                I realized that proper signal handling is possible in a multi-threaded process, but it needs to be managed by a single piece of code/component.

                                 

                                Having different components (the libclntsh.so lib and my OCI program code) installing signal handlers can end up in a big mess, even if my code properly handled signals for a multi-threaded process.

                                 

                                The idea in the above link is to dedicate a thread to receive signals and block the signals for all other threads.

                                 

                                How can I be sure that Oracle client will not overwrite my signal handlers and settings for multi-thread context?

                                (I want my program code to master the signal handling)

                                 

                                Where can I find documentation about Oracle Call Interface and signal handling?

                                I found this https://docs.oracle.com/en/database/oracle/oracle-database/18/unxar/using-oracle-precompilers-and-oracle-call-interface.…

                                What is Oracle two-task communication? Is this always enabled?

                                 

                                I also found this:

                                 

                                https://stackoverflow.com/questions/17124881/oracle-proc-oci-install-handlers-for-sigsegv-sigabrt-and-friends-why-and-ho…

                                 

                                The sqlnet.ora parameter

                                 

                                DIAG_SIGHANDLER_ENABLED=FALSE

                                 

                                ... prevents the OCI client from installing signal handlers (tested)

                                 

                                Is this sufficient?

                                 

                                Seb

                                • 13. Re: Issue with system() call with libclntsh.so.19.1
                                  user11763611

                                  This section of the doc states that OCI is thread-safe:

                                  https://docs.oracle.com/en/database/oracle/oracle-database/18/unxar/using-oracle-precompilers-and-oracle-call-interface.…

                                   

                                  However, this does not mean that the user application actually needs to be thread-safe.

                                  To me it says that the user application may be multi-threaded if it chooses to... that's a big difference.

                                   

                                  Seb

                                  • 14. Re: Issue with system() call with libclntsh.so.19.1
                                    cj

                                    The developers asked me to log a bug on this so it could be tracked.  It is bug 29865658.
