9 Replies Latest reply: Jun 12, 2013 11:35 AM by Todd Little-Oracle RSS

    Tuxedo server going down with TPESVCERR error when Client ran for 2nd time

    908950
      Hi there,

      I am trying a program with 2 Tuxedo servers to write into a file, where the 1 server during its startup requests the 2nd server to open a file (create if required) and then return file descriptor to it. the servers are coded in C and the request and response information are in C structures. Here, I am populating the FML buffer from the structure before sending the request/response and getting the values from the FML buffer after receiving request /response using FML functions.

      The FML buffers are allocated using tpalloc function, since these buffers are to be used in Tuxedo calls (tpacall,tpgetrply,tpreturn) for communication.

      And when the client sends a message to Server1 to log it into a file, Server1 sends this request to Server2 based on the file descriptor that it gets during startup, server2 writes into the file and replies back to Server1, based on the reply that server1 got from server2 , it sends response to the Client.

      Here, I have below issue:

      When the client is run for the 2nd time, the request flow is fine from Server1 to Server2 and server2 is able to write into the file and sending the response to Server1. However, Server2 is going down and tpgetrply call in Server1 is returning "TPESVCERR" error.

      tpreturn used in Server2:
      tpreturn (TPSUCCESS,0,replyFML,replySize,0);*

      tpgetrply in Server1:
      int ab = tpgetrply(&cd,(char *)&replyfml,&replysize,0);
      +if(ab == -1){+
      printf("EMRAPI-LOG_WriteLogComplete:tpgetrply failed with %s\n",tpstrerror(tperrno));
      +}+

      From Logs:
      102723.dtcs-ndaxdb01!cleanupsrv.12934.1.-2: TRACE:tr: trace("*:ulog:dye")
      102723.dtcs-ndaxdb01!cleanupsrv.12934.1.-2: TRACE:tr: dye
      102723.dtcs-ndaxdb01!emr_server.12606.1.0: TRACE:at: } +tpgetrply = -1 [tperrno TPESVCERR]+*
      ...
      EMRAPI-LOG_WriteLogComplete:*tpgetrply failed with TPESVCERR - server error while handling request*

      Can anyone please help to resolve this issue? Thanks!

      Regards,
      905947
        • 1. Re: Tuxedo server going down with TPESVCERR error when Client ran for 2nd time
          Todd Little-Oracle
          Hi,

          My guess is that server2 is exiting for some reason or another (unhandled exception most likely) before it gets a chance to issue the tpreturn(). In that case, the Tuxedo BBL will generate a reply to server1 indicating that the service didn't return successfully. If you enable ATMI tracing in both servers, you should be able to get a better idea of what is happening. You might also want to try to attach to server2 with a debugger and see what's happening in server2. Also note that file handles are generally not something you pass around between processes, although in this case it sounds like you are passing a file handle from server2 back to server1 and that server1 doesn't use the file handle except to pass it back to server2. That should be OK as long as server2 doesn't close the file in the mean time as then the file handle will no longer be valid. Tuxedo communication is very different than domain sockets and you can't in general share file handles using Tuxedo communication.

          Regards,
          Todd Little
          Oracle Tuxedo Chief Architect
          • 2. Re: Tuxedo server going down with TPESVCERR error when Client ran for 2nd time
            908950
            Thanks Todd for the reply.


            Yes, during the startup of the server1, it sends the request to server2 to open the file. Server2 will then open the file and send the file handle, but it does not closes it. While sending the request to write into that file, server1 will send this filehandle to server2 using which it writes into the file and send the response.


            I have enabled trace for server2 as well, the tpreturn call is being retruned with TPSUCESS as shown in the below trace logs when the client is run 2nd second time:

            072622.dtcs-ndaxdb01!log_server.12055.1.0: Exiting from ConvertLOG_WRITE_REPLY
            072622.dtcs-ndaxdb01!log_server.12055.1.0: TRACE:at: { +_tpreturn(2, 0, 0x17ac80, 2048, 0x0)_+
            072632.dtcs-ndaxdb01!emr_server.12058.1.0: TRACE:at: { tpgetrply(0xffbfe8c4, 0xffbfe654, 0xffbfe650, 0x0)

            Note:
            log_server - is executable of server2
            emr_server - is executable of server1


            Request you to please help to resolve this.Thanks!

            Regards,
            905947
            • 3. Re: Tuxedo server going down with TPESVCERR error when Client ran for 2nd time
              Per Lindström
              Hi,

              you're not likely to be able to successfully reuse a file handle opened in one process (Server2) in another process (Server1). File handles are as far as I know private to the process they´re created in.

              You might consider letting Server2 "own" the file all the time (possibly opening it in tpsvrinit() and then just send messages to Server2 whenever you want to write something to it. In short - let one process and one only be responsible for the low-level handling of the file.

              Hope this helps,
              /Per
              • 4. Re: Tuxedo server going down with TPESVCERR error when Client ran for 2nd time
                Todd Little-Oracle
                Hi,

                Can you post the entire trace of both servers for both calls?

                Regards,
                Todd Little
                Oracle Tuxedo Chief Architect
                • 5. Re: Tuxedo server going down with TPESVCERR error when Client ran for 2nd time
                  908950
                  Thanks Per and Todd.

                  Hi Todd,

                  Please find the trace of both servers for both calls as below:



                  +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

                  .....

                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: EMR_API_SEND_ELM_MSG_2
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: EMRAPI: About to call ProcessSendELMRequest Function
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: Entered ConvertEMR_API_SEND_ELM_RQST function
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertEMR_API_SEND_ELM_RQST: Received = 1
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: Exiting from ConvertEMR_Do_Reply
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: { tpalloc("FML32", "", 1024)
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: } tpalloc = 0x2465a0
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: Entered ConvertLOG_API_WRITE_RQST function
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST: Received = 0
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->RequestType
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->FileHandle
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->ProductId
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->SubProductId
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->ProductRev
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->Severity
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->DataGeographic
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->pid
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->DateTimeOccurred
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->DateTimeOccurredUsec
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->MessageRef
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->ParamCount
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_API_WRITE_RQST:Fadd32 success for s->ByteCount
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: Exiting from ConvertEMR_Do_Reply
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: entered ELF_ClientSend function
                  *070743.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: { tpacall("LOGAPI", 0x2465a0, 1024, 0x8)*
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: } tpacall = 4 [CLIENTID {1368428863, 0, 2}]
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: { tpfree(0x2465a0)
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: TRACE:tr: dye
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: } tpfree
                  *070743.dtcs-ndaxdb01!log_server.24450.1.0: TRACE:at: { tpservice({"LOGAPI", 0x0, 0x160c48, 4096, 0, 0, {1368428863, 0, 2}})*
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: { tpalloc("FML32", "", 2048)
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: TRACE:at: { tpalloc("FML32", "", 1024)
                  070743.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: } tpalloc = 0x249238
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: TRACE:at: } tpalloc = 0x1721c8
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: TRACE:at: { tpalloc("FML32", "", 2048)
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: TRACE:at: } tpalloc = 0x17ac58
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: Entered ConvertLOG_API_WRITE_RQST function
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: ConvertLOG_API_WRITE_RQST: Received = 1
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: ConvertLOG_API_WRITE_RQST:Fget32 failed for s->DataArea with LIBFML_CAT:3: ERROR: N
                  o space in fielded buffer
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: Exiting from ConvertEMR_Do_Reply
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: Entered ConvertLOG_WRITE_REPLY function
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: ConvertLOG_WRITE_REPLY: Received = 0
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: ConvertLOG_WRITE_REPLY:Fadd32 success for s->ErrorCode
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: ConvertLOG_WRITE_REPLY:Fadd32 success for s->errno
                  070743.dtcs-ndaxdb01!log_server.24450.1.0: Exiting from ConvertLOG_WRITE_REPLY
                  *070743.dtcs-ndaxdb01!log_server.24450.1.0: TRACE:at: { tpreturn(2, 0, 0x17ac58, 2048, 0x0)*
                  *070753.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: { tpgetrply(0xffbfe8c4, 0xffbfe654, 0xffbfe650, 0x0)*
                  070833.dtcs-ndaxdb01!BBL.24440.1.0: LIBTUX_CAT:541: WARN: Server GROUP1/1 terminated
                  070833.dtcs-ndaxdb01!BBL.24440.1.0: LIBTUX_CAT:551: WARN: Cleaning up server GROUP1/1
                  070833.dtcs-ndaxdb01!cleanupsrv.24700.1.-2: 05-13-2013: Tuxedo Version 11.1.1.2.0, 32-bit
                  070833.dtcs-ndaxdb01!cleanupsrv.24700.1.-2: TRACE:tr: trace("*:ulog:dye")
                  070833.dtcs-ndaxdb01!cleanupsrv.24700.1.-2: TRACE:tr: dye
                  070833.dtcs-ndaxdb01!cleanupsrv.24700.1.-2: TRACE:ia: { tpchkauth()
                  070833.dtcs-ndaxdb01!cleanupsrv.24700.1.-2: TRACE:ia: } tpchkauth = 0
                  070833.dtcs-ndaxdb01!cleanupsrv.24700.1.-2: TRACE:tr: trace("*:ulog:dye")
                  070833.dtcs-ndaxdb01!cleanupsrv.24700.1.-2: TRACE:tr: dye
                  *070833.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: } tpgetrply = -1 [tperrno TPESVCERR]*
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: Entered ConvertLOG_WRITE_REPLY function
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_WRITE_REPLY: Received = 1
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_WRITE_REPLY:Fget32 failed for s->ErrorCode with LIBFML_CAT:4: ERROR: Fie
                  ld not present
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_WRITE_REPLY:Fget32 failed for s->errno with LIBFML_CAT:4: ERROR: Field n
                  ot present
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: ConvertLOG_WRITE_REPLY:Fget32 failed for s->SqlErrorText with LIBFML_CAT:4: ERROR:
                  Field not present
                  070833.dtcs-ndaxdb01!cleanupsrv.24700.1.-2: server GROUP1/1: CMDTUX_CAT:551: INFO: server removed
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: Exiting from ConvertLOG_WRITE_REPLY
                  070833.dtcs-ndaxdb01!cleanupsrv.24700.1.-2: TRACE:ia: { tpgetctxt(0xffbfe2ac, 0x0)
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: { tpfree(0x249238)
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: } tpfree
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: { tpalloc("FML32", "", 1024)
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: } tpalloc = 0x249238
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: { tpalloc("FML32", "", 1024)
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: } tpalloc = 0x246688
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: { tpalloc("STRING", "", 100)
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: } tpalloc = 0x249a48
                  070833.dtcs-ndaxdb01!emr_server.24453.1.0: TRACE:at: { tpreturn(2, 0, 0x249a48, 4, 0x0)
                  ....


                  ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


                  Regards,
                  905947
                  • 6. Re: Tuxedo server going down with TPESVCERR error when Client ran for 2nd time
                    Todd Little-Oracle
                    Hi,

                    Although without your UBBCONFIG file I can't be certain, but from the log, it appears the log_server crashed while trying to process the request. Also, I don't see a successful tpgetrply() call in the log, just the unsuccessful one. But in any case, you need to fix the log_server to not crash.

                    Regards,
                    Todd Little
                    Oracle Tuxedo Chief Architect
                    • 7. Re: Tuxedo server going down with TPESVCERR error when Client ran for 2nd time
                      908950
                      Hi Todd,

                      The above logs, I have pasted when the client was run 2nd time,

                      Also, I have provided the Groups, Servers and Services section from the UBBCONFIG file that I am using.

                      ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

                      *GROUPS+
                      +#DEFAULT: TMSNAME=TMS_SQL+
                      +#DEFAULT: TMSNAME=Oracle_XA+
                      GROUP1
                      LMID=simple     GRPNO=1
                      TMSNAME=Oracle_XA

                      *SERVERS+
                      DEFAULT:
                      CLOPT="-A"

                      log_server       SRVGRP=GROUP1 SRVID=1
                      emr_server       SRVGRP=GROUP1 SRVID=2

                      *SERVICES+
                      LOGAPI
                      EMRAPI

                      +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

                      Regards,
                      905947
                      • 8. Re: Tuxedo server going down with TPESVCERR error when Client ran for 2nd time
                        908950

                        Hi there,

                         

                        Can anyone please help me to resolve the issue explained above? Thanks very much!

                         

                        Regards,

                        908950

                        • 9. Re: Tuxedo server going down with TPESVCERR error when Client ran for 2nd time
                          Todd Little-Oracle

                          Hi,

                           

                          As I replied before, my guess is that the log_server is dying, probably due to an access violation or something similar.  You can see this from the ULOG line:

                           

                          070833.dtcs-ndaxdb01!BBL.24440.1.0: LIBTUX_CAT:541: WARN: Server GROUP1/1 terminated


                          which indicates the server with server ID 1 and group ID of 1 (which from your UBBCONFIG is the log_server) exited unexpectedly.  This is almost always caused by an unhandled exception.  Attach a debugger to to the server before trying the test and the debugger should tell you what is happening.  Alternatively you could possibly use dtrace or strace depending on the platform you are using.


                          My guess is that your tpreturn() call in log_server is passing one or more bad parameters as it pass TPSUCCESS in rval, but the caller received TPSVCFAIL.


                          Regards,

                          Todd Little

                          Oracle Tuxedo Chief Architect