5 Replies. Latest reply: Jan 22, 2014 3:49 AM by Chrisjenkins-Oracle

    ttRepSubscriberWait - Clarification

    sgracelin

      Replication is configured between three DSNs on three hosts (each DSN is a master, with the other two as subscribers). Any update on any DSN is replicated to the other two nodes.

       

      When a table is altered in one of these DSNs on one host, the expected behaviour in TimesTen is that the change replicates to the other two nodes. However, the ALTER TABLE might fail to replicate to one of the subscribers. To guard against this during upgrades, we use the built-in procedure ttRepSubscriberWait to verify that the ALTER TABLE has been replicated to all other DSNs.

       

      Consider the following scenario:

      Master DSN A has Connection1, which has prepared a query [statement1] on Table1. This statement has been executed but not yet committed, so there is a lock on Table1 in Master DSN A.

      In Master DSN B, column col1 is dropped from Table1 and the built-in ttRepSubscriberWait is then invoked. This call returns 00 even though column col1 has not been dropped in Master DSN A, because of the lock held there.

       

      The TimesTen documentation says:

      "This procedure causes the caller to wait until all transactions that committed before the call have been transmitted to the subscriber subscriberStoreName. It also waits until the subscriber has acknowledged that the updates have been durably committed at the subscriber database"

       

      There is no error on the transmitter side. The following errors are seen on the receiver side:

       

      2014-01-23 13:00:12.67 Err : REP: 30007: DG_222221:receiver.c(11832): TT16094: Failed to execute SQL: ALTER TABLE "DG"."SUBS_LOC_SVC_INFO" DROP ("LOC_REQ_EXPIRY")

      2014-01-23 13:00:12.69 Err : REP: 30007: DG_222221:receiver.c(7100): TT16187: Transaction 1390083931/185874; Error: transient 0, permanent 1 state 2 error message :

       

      Any idea why TimesTen returns 00 for the ttRepSubscriberWait call when the ALTER TABLE ... DROP was not durably committed at the subscriber database?

       

      Is there any way in TimesTen to confirm that an ALTER TABLE done on one node has been fully replicated to all of its subscriber DSNs?

       

      Regards,

      Gracelin Priya

        • 1. Re: ttRepSubscriberWait - Clarification
          Chrisjenkins-Oracle

          Hi Gracelin,

           

          What exact version of TimesTen is this (ttVersion output please)? Also, prior to the error in the logs, do you see any messages indicating that the subscriber is retrying the ALTER TABLE operation (you may need to look in the message log rather than the error log to see those)?

           

          The issue here is that this failure has been categorised as a 'permanent' error, so replication reports it and then continues; i.e. it considers this operation 'finished'. Most locking errors are considered 'transient' and will be retried several times (holding up the entire replication flow) before being declared 'failed', but it may be that we are not doing that in this case (which would be a bug).
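
          Since a permanent error is reported and then skipped, a 00 from ttRepSubscriberWait may not be enough on its own to show that the ALTER TABLE was applied. One possible cross-check is to scan the receiver's ttmesg.log for the TT16094 / TT16187 pair quoted in this thread. The following is only a rough sketch: the regular expressions are assumptions modelled on the two log lines above (not a documented log format), and the function name find_rejected_operations is hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Assumption: patterns are modelled on the two receiver log lines quoted
# in this thread; they are not taken from any documented log format.
FAILED_SQL = re.compile(r"TT16094: Failed to execute SQL: (?P<sql>.+)$")
TXN_ERROR = re.compile(
    r"TT16187: Transaction (?P<txn>[^;\s]+); Error: "
    r"transient (?P<transient>\d+), permanent (?P<permanent>\d+)"
)


def find_rejected_operations(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Pair each TT16094 line with the next TT16187 line that follows it."""
    rejected = []
    pending_sql = None  # SQL text from the most recent TT16094 line, if any
    for raw in lines:
        line = raw.rstrip("\n")
        failed = FAILED_SQL.search(line)
        if failed:
            pending_sql = failed.group("sql").strip()
            continue
        txn = TXN_ERROR.search(line)
        if txn:
            rejected.append({
                "txn": txn.group("txn"),
                "sql": pending_sql,
                "transient": int(txn.group("transient")),
                "permanent": int(txn.group("permanent")),
            })
            pending_sql = None
    return rejected


# The two error lines from the receiver log in this thread.
sample = [
    '2014-01-23 13:00:12.67 Err : REP: 30007: DG_222221:receiver.c(11832): '
    'TT16094: Failed to execute SQL: ALTER TABLE "DG"."SUBS_LOC_SVC_INFO" '
    'DROP ("LOC_REQ_EXPIRY")',
    '2014-01-23 13:00:12.69 Err : REP: 30007: DG_222221:receiver.c(7100): '
    'TT16187: Transaction 1390083931/185874; Error: transient 0, '
    'permanent 1 state 2 error message :',
]

for op in find_rejected_operations(sample):
    print(op)
```

          In practice the lines would come from the open log file rather than a list; any entry with permanent set to 1 would suggest the subscriber rejected the operation and that the schema there needs to be checked by hand.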

           

          Please provide the information I requested above and then we can see if this is a bug or not.

           

          Thanks,

           

          Chris

          • 2. Re: ttRepSubscriberWait - Clarification
            sgracelin

            Chris,

             

            Here are the requested details.

             

            The TimesTen version is:

             

            TimesTen Release 11.2.1.9.8 (64 bit Linux/x86_64) (kodiak:53388) 2012-11-19T10:01:17Z

              Instance admin: root

              Instance home directory: /opt/TimesTen/kodiak

              Group owner: kodiakgroup

              Daemon home directory: /var/TimesTen/kodiak

              PL/SQL enabled.

             

            There is no message on the receiver side about retrying the operation. Please see the ttmesg.log output from the receiver side below:

             

            2014-01-23 13:00:04.98 Info:    : 12348: 30141 ------------------: process exited

            2014-01-23 13:00:04.98 Info:    : 12348: Finished daRecovery for pid 30141.

            2014-01-23 13:00:11.58 Info: REP: 30007: DG_222221:receiver.c(8508): TT16193: Adding definition for table: DG.SUBS_LOC_SVC_INFO

            2014-01-23 13:00:11.58 Info: REP: 30007: DG_222221:meta.c(6634):DG.SUBS_LOC_SVC_INFO ptn 0: srcoff 0, destoff 0, length 64

            2014-01-23 13:00:11.58 Info: REP: 30007: DG_222221:receiver.c(8972): TT16203: Passed extended comparison for table DG.SUBS_LOC_SVC_INFO

            2014-01-23 13:00:12.67 Err : REP: 30007: DG_222221:receiver.c(11832): TT16094: Failed to execute SQL: ALTER TABLE "DG"."SUBS_LOC_SVC_INFO" DROP ("LOC_REQ_EXPIRY")

            2014-01-23 13:00:12.69 Err : REP: 30007: DG_222221:receiver.c(7100): TT16187: Transaction 1390083931/185874; Error: transient 0, permanent 1 state 2 error message :

            2014-01-23 13:00:17.78 Info:    : 12348: 19999/0x1cd4eb0: sbCmdCompAllDepsRemove(): Dependency list length high water: 228

            2014-01-23 13:01:00.37 Info: SRV: 18937: EventID=37| Client protocol version 41 is not supported by this server, which uses version 36. Attempting to renegotiate protocol level.

            2014-01-23 13:01:00.40 Info:    : 12348: maind got #15538.103502, hello: pid=18937 type=library payload=%00%00%00%00 protocolID=TimesTen 11.2.1.9.8.kodiak ident=%00%00%00%00

             

            Please help us to resolve this issue.

             

            Regards,

            Priya

            • 3. Re: ttRepSubscriberWait - Clarification
              Chrisjenkins-Oracle

              Since this definitely looks like a bug, you need to log an SR with support so they can do any further diagnosis required, log the bug and follow it through to resolution. There really isn't anything more we can do via the forum.

               

              Regards,

               

              Chris

              • 4. Re: ttRepSubscriberWait - Clarification
                sgracelin

                Chris,

                 

                Can you please point me to documentation, or list, which errors the TimesTen replication agent treats as transient (and retries multiple times) and which it categorises as permanent? Is it the case that the replication agent never retries after a permanent error? For retries, what is the maximum number of attempts the replication agent makes to apply the transaction?

                 

                Thanks and Regards,

                Gracelin

                • 5. Re: ttRepSubscriberWait - Clarification
                  Chrisjenkins-Oracle

                  The classification of errors as transient versus permanent, and the associated retry policy, is not currently documented. Please log an SR and request this information via the SR.

                   

                  What I can say is that for transient errors, if the error persists after the retry policy has been applied, then it is treated as permanent. For permanent errors, the operation is logged and rejected and will never be retried.
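
                  The behaviour described above can be illustrated with a small model. This is purely a sketch of the stated rule, not TimesTen's implementation: the exception classes, the apply_with_retry function and the max_retries value of 3 are all hypothetical, since the real classification and retry limits are not documented.

```python
from typing import Callable, Tuple


class TransientError(Exception):
    """Hypothetical stand-in for an error the agent would retry."""


class PermanentError(Exception):
    """Hypothetical stand-in for an error the agent would not retry."""


def apply_with_retry(operation: Callable[[], None],
                     max_retries: int = 3) -> Tuple[str, int]:
    """Return (outcome, number of times the operation was executed).

    Assumption: max_retries is an illustrative value only.
    """
    executions = 0
    while True:
        executions += 1
        try:
            operation()
            return ("applied", executions)
        except PermanentError:
            # Permanent errors are logged and rejected, never retried.
            return ("rejected", executions)
        except TransientError:
            # A transient error that outlives the retry policy is then
            # treated as permanent.
            if executions > max_retries:
                return ("rejected", executions)


def make_flaky(failures: int) -> Callable[[], None]:
    """Build an operation that fails transiently `failures` times."""
    state = {"left": failures}

    def operation() -> None:
        if state["left"] > 0:
            state["left"] -= 1
            raise TransientError("simulated lock timeout")

    return operation


def always_permanent() -> None:
    raise PermanentError("simulated non-retryable failure")


print(apply_with_retry(make_flaky(2)))
print(apply_with_retry(make_flaky(10)))
print(apply_with_retry(always_permanent))
```

                  In this model the first case eventually succeeds, the second exhausts its retries and is rejected, and the third is rejected on the first attempt, which is what the receiver log in this thread appears to show for the ALTER TABLE.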

                   

                  Chris