1 2 Previous Next 17 Replies Latest reply: Jan 23, 2013 3:25 AM by Chrisjenkins-Oracle RSS

    group commit warning(error 6003)

    581561
      hi, chris:
      Recently, we got some warning message from ttmesglog, Does this effect the performance of applications?? our log buffer size is 1G, the ckptfrequecy is 10 min.
      from the log, we can see all the warning is the checkpoint time. Does it has relation with our configurations?? thank you.
      2009-01-10 12:41:47.55 Warn: : 139890: 790600/1104f0750: sbLogBufGCLockENQNoErr: The commit is performed without the benefit of g
      roup commit optimization (error 6003)
      2009-01-10 17:51:49.19 Warn: : 139890: 790600/1104f0750: sbLogBufGCLockENQNoErr: The commit is performed without the benefit of g
      roup commit optimization (error 6003)
      2009-01-10 18:01:50.66 Warn: : 139890: 2212210/1112d6e90: sbLogBufGCLockENQNoErr: The commit is performed without the benefit of
      group commit optimization (error 6003)
      2009-01-10 18:21:40.05 Warn: : 139890: 790600/1104f0750: sbLogBufGCLockENQNoErr: The commit is performed without the benefit of g
      roup commit optimization (error 6003)

      from the error pdf, we know 6003 is lock request timeout, can I ask, when flush the log buffer it need what type of locks?? or lock which system tables??
        • 1. Re: group commit warning(error 6003)
          Chrisjenkins-Oracle
          Hi,

          This is not an error and TT does not lock system tables when comitting. Please see the following description from docNote 402403.1


          When transactions are being DurableCommited against a data store, it
          is sometimes possible to see processes awaiting the "GrpComm" (Group
          Commit) lock. This effect can be seen either by running ttXactAdmin or
          as warnings in the message log. These warnings will typicaly have the
          format "Unable to acquire group commit lock (error 6003)" in 6.0 and
          5.1. In version 5.0 a number of retries for this lock may also be
          displayed.


          The Group Commit lock is a peformance
          optimisation. It allows one transaction to durably sync all
          transactions that have already written their records to the kernel I/O
          buffer but not durably committed them to disk.


          Processes
          waiting on the Group Commit lock does not indicate a problem with this
          lock. When durable commits are occurring, most transactions spend the
          majority of their time waiting to be committed durably, so it is to be
          expected that there will be many waiters on this lock. In fact, in an
          application where transactions are short, you will frequently find
          nearly every connection waiting for the Group Commit lock. The idea is
          that all of the waiters on the lock will be flushed together with a
          single disk sync. This situation should not be seen as “contention” as
          there is no such thing as “contention” on the Group Commit lock. It is
          a performance optimization, rather than a protector of a shared
          resources.

          It is true that transactions can be seen waiting
          for locks that are held by other transactions that in turn either hold,
          or are waiting for, the Group Commit lock. This also does not
          (necessarily) indicate a problem with the Group Commit lock. A
          transaction cannot release its regular (db, tbl, row, etc.) locks until
          it has committed, so you will frequently see this sort of lock wait
          pattern in applications that use durable commits.


          As such,
          the group commit warning is frequently harmless. Sometimes though
          failure to acquire the group-commit lock can indicate a slow disk (lock
          waiters time out while the previous durable commit being synced to
          disk) or high cpu contention (the lock holder can’t get enough cycles
          to release the lock before the waiters time out).

          Because the
          lock is not needed for correctness, the transaction will not retry
          after reporting a failure to get the lock. Having waited 10 seconds for
          a group sync, instead it will then issue a 'sync' for itself.


          The
          MONITOR table records the number of durable commits (XACT_D_COMMITS),
          and also the number of syncs (LOG_FORCES). When group commit is in
          effect, you will see that XACT_D_COMMITS is much larger than
          LOG_FORCES as many transactions are being durably committed in one
          write to disk.


          Note: in 7.0 the message was changed to this:

          09:24:47.96
          Warn: : 4228: 5908/00D6D098: sbLogBufGCLockENQNoErr: The commit is
          performed without the benefit of group commit optimization (error 6003)

          Chris
          • 2. Re: group commit warning(error 6003)
            581561
            hi, chris:
            last day I saw this document, I have an doubt yet. our dsn configurations set the DurableCommit = 0, this means the log will hold on log buffer, not flush to disk immediate. And In our application, we never call the method "call ttDurableCommit" . So I don't know when will happen group commit? In checkpoint or log buffer full??
            And alse from monitor result, we can see XACT_D_COMMITS is smaller than LOG_FORCES. So I guess the group commit failed is because can not get the lock of sys.tables because that time application hold the lock of that table, May I right??

            Command> monitor;

            TIME_OF_1ST_CONNECT: Wed Nov 19 03:38:45 2008
            DS_CONNECTS: 765880
            DS_DISCONNECTS: 765572
            DS_CHECKPOINTS: 7976
            DS_CHECKPOINTS_FUZZY: 7976
            DS_COMPACTS: 0
            PERM_ALLOCATED_SIZE: 47185920
            PERM_IN_USE_SIZE: 9466840
            PERM_IN_USE_HIGH_WATER: 10595889
            TEMP_ALLOCATED_SIZE: 2097152
            TEMP_IN_USE_SIZE: 49467
            TEMP_IN_USE_HIGH_WATER: 1071532
            SYS18: 0
            XACT_BEGINS: 3832097991
            XACT_COMMITS: 3653172180
            XACT_D_COMMITS: 164258
            XACT_ROLLBACKS: 178904469
            LOG_FORCES: 258747556
            DEADLOCKS: 0
            LOCK_TIMEOUTS: 349
            LOCK_GRANTS_IMMED: 37650983544
            LOCK_GRANTS_WAIT: 175732
            SYS19: 3874656
            CMD_PREPARES: 390128
            CMD_REPREPARES: 3219
            CMD_TEMP_INDEXES: 5
            LAST_LOG_FILE: 8574
            REPHOLD_LOG_FILE: 8574
            REPHOLD_LOG_OFF: 691708344
            REP_XACT_COUNT: 1457457905
            REP_CONFLICT_COUNT: 0
            REP_PEER_CONNECTIONS: 9
            REP_PEER_RETRIES: 1604
            FIRST_LOG_FILE: 8571
            LOG_BYTES_TO_LOG_BUFFER: 8773404946216
            LOG_FS_READS: 12551
            LOG_FS_WRITES: 262781614
            LOG_BUFFER_WAITS: 0
            CHECKPOINT_BYTES_WRITTEN: 1753562376
            SYS1: 18519283303
            SYS2: 18519089347
            SYS3: 0
            SYS4: 0
            SYS5: 0
            SYS6: 0
            SYS7: 45499
            SYS8: 0
            REQUIRED_RECOVERY: 0
            SYS11: 0
            SYS12: 1
            TYPE_MODE: 0
            SYS13: 0
            SYS14: 0
            SYS15: 12028
            SYS16: 0
            SYS17: 0
            • 3. Re: group commit warning(error 6003)
              581561
              I think I understand the lock request timeout is means get group commit lock failed.
              and still we have no durablecommit, which situation can cause group commit? when the group commit happened, application need to wait the precommit on disk??
              whether it effect the performance of application? because sometimes we found the operation is slowly, most of time is normal.

              btw: when the operation was slowly, we found some pagein on system, so does we need the set the parameter v_pinshm =1 to make sure all memory hold by timesten not swap . thank you...
              • 4. Re: group commit warning(error 6003)
                Jspalmer-Oracle
                Please keep in mind that Replication will perform a durable commit every 200ms. This is a hardcoded setting and cannot be changed or turned off. You state that the group commit optimization warnings coincide with a checkpoint. This sounds to me like the checkpoint is using I/O resources that otherwise the group commit optimization would benefit from using.

                A few comments:

                - Are your checkpoint and log files on different disks?
                - LOG_FS_READS is 12551. Do you see this regularly increasing? It would be optimum if all log reads by replication and xla readers were done from the log buffer rather than the filesystem.
                - you could use ttCkptHistory to check the amount of bytes written out at each checkpoint, and determine if those which coincided with the group commit warning, were also writing out more bytes than usual.
                • 5. Re: group commit warning(error 6003)
                  581561
                  o, I understand, we have replications. I guess the warning message was cause by checkpoint, because from the call ttckpthistroy, the ckpt time is that time, and our ckpt frequency is 10 min. thank you...
                  from the monitor result, the log_fs_reads is not vary all the time.
                  • 6. Re: group commit warning(error 6003)
                    581561
                    every time the ckpt vary column is 1572035848
                    • 7. Re: group commit warning(error 6003)
                      Chrisjenkins-Oracle
                      Okay, it looks very much like these warnings are coming from the regular durable commits done by the repagent and are likely caused by heavy I/O contention ocurring when a checkpoint runs. Although the warnings are just warnings and are not in themselves a problem it sounds like your system is not optimally configured and that could lead to other probelms in the future. As previously mentioned, it is stringly recommended that the TimesTen datastore checkpoint and transaction log files be placed on seperate disks from each other and also sepaarte to other high activity files. Bo9th checkpointing and loggign are I/O intensive operations and, unless your disks are on a high-end RAID-0 disk array then a checkpoint is likely to overload the disk and impact performance.

                      Just to correct a couple of other points as well:

                      1. Commit does not need to get any locks on system tables etc. the group commit lock is a special lock only used by durable commits.

                      2. The log buffer is not flushed only when it gets full. It is continually flushed by the flusher thread in the manging sub-daemon and in general should never become 'full'.

                      Chris
                      • 8. Re: group commit warning(error 6003)
                        581561
                        thank you, our dastastore files and log files are on different disks.
                        and from the nmon result the IO is normal.
                        May I ask, how often does the subdaemon flush logs ?

                        Edited by: user578558 on 2009-1-14 上午12:29
                        • 9. Re: group commit warning(error 6003)
                          Chrisjenkins-Oracle
                          The sub-daemon flusher thread flushes log records continually. As records are posted into the log buffer by application transactions the flusher thread writes them to disk. if at any time there are no records to flush, the flusher goes to sleep on a semaphore. As soon as some records are placed in the buffer, the flusher is woken up and starts flushing again. So, flushing is continuous, not time interval based.

                          What kind of disk hardware are you using (rotational speed, interface type etc.)? Are both disks connected to the same controller? Are there any other files on the log and checkpoint disks or are they dedicated devices? What O/S platform are you running on?

                          Chris
                          • 10. Re: group commit warning(error 6003)
                            581561
                            the system is SAM, DS8000, IBM p595
                            they are many controllers, and many logic disks .
                            they are dedicated devices
                            AIX Version 5.3
                            logs:
                            PV COPIES IN BAND DISTRIBUTION
                            vpath13 286:000:000 0% 161:000:000:125:000
                            vpath42 286:000:000 0% 161:000:000:125:000
                            vpath20 286:000:000 0% 161:000:000:125:000
                            vpath49 286:000:000 0% 161:000:000:125:000
                            vpath26 286:000:000 0% 161:000:000:125:000
                            vpath55 286:000:000 0% 161:000:000:125:000
                            vpath12 286:000:000 0% 161:000:000:125:000
                            vpath41 286:000:000 0% 161:000:000:125:000
                            vpath27 286:000:000 0% 161:000:000:125:000
                            vpath56 286:000:000 0% 161:000:000:125:000
                            vpath14 286:000:000 0% 161:000:000:125:000
                            vpath43 286:000:000 0% 161:000:000:125:000
                            vpath28 286:000:000 0% 161:000:000:125:000
                            vpath57 286:000:000 0% 161:000:000:125:000

                            chk:
                            ttchk2:/ttchk2
                            PV COPIES IN BAND DISTRIBUTION
                            vpath27 172:000:000 0% 000:000:000:067:105
                            vpath56 172:000:000 0% 000:000:000:067:105
                            vpath20 172:000:000 0% 000:000:000:067:105
                            vpath49 172:000:000 0% 000:000:000:067:105
                            vpath26 172:000:000 0% 000:000:000:067:105
                            vpath55 172:000:000 0% 000:000:000:067:105
                            vpath13 172:000:000 0% 000:000:000:067:105
                            vpath42 172:000:000 0% 000:000:000:067:105
                            vpath12 172:000:000 0% 000:000:000:067:105
                            vpath41 172:000:000 0% 000:000:000:067:105
                            vpath14 172:000:000 0% 000:000:000:067:105
                            vpath43 172:000:000 0% 000:000:000:067:105
                            vpath28 172:000:000 0% 000:000:000:067:105
                            vpath57 172:000:000 0% 000:000:000:067:105

                            Edited by: user578558 on 2009-1-15 上午1:12

                            Edited by: user578558 on 2009-1-15 上午1:17
                            • 11. Re: group commit warning(error 6003)
                              Chrisjenkins-Oracle
                              Thanks for the information, but it does not tell me a great deal in terms of how disks are arranged into RAID groups and how these map to filesystems on this, and maybe other, machines. If these warnings always coinciode with checkpoints then it seems like some I/O or CPU resource issues is the underlying cause (checkpoint operations can be both I/O and CPU intensive). To get to the bottom of this would likely require detailed investigation in conjunction with an AIX expert. If the only symptoms you are seeing is these warnings but otherwise everything is working fine then I would not worry about them and just ignore them. If you really feel you need to eliminate them then you should open an SR with Oracle support.

                              Chris
                              • 12. Re: group commit warning(error 6003)
                                986081
                                Hi Chris, All,

                                One of our application processes got stuck when accessing timesten, the function back trace is as below:

                                fe5cc8ac nanosleep (ffb87390, ffb87388)
                                fc094f24 sbSleep (2a, 0, 0, 0, 0, 2) + 70
                                fc1c6218 sbSpinLatchGet (e4403390, 28c5320, 0, 7, fc92373c, 70) + 1a8
                                fc1e235c sbLogBufInsertFast (ffb8781c, 28c63b8, 0, 1, 0, 0) + 554
                                fc204a98 sbLogRecInsert (e440ce98, 8d4598, 0, 1, 1c, e440cea0) + 4b8
                                fc203e5c sbLogWrite (8, 8d4598, 0, 0, 28c5320, e4c4e928) + 12c
                                fc278dec sbTblStorLTupDelete (e5eaae50, 1, 0, 90bb78, e440ce0c, e48c) + 5fc
                                fc288edc sbTblDelete (e5eaae50, 16, 0, 8d47c8, 0, 1) + 2f0
                                fc3a0298 sbEeDeleteEval (fa0c93a0, f9e4c150, fa0c9410, 28c5320, fc917a2c, 0) + 478
                                fc38b9f4 sbEeStmtEval (ecb0, f9e4c150, 28c5320, ef88, ef8c, ef90) + 1b4
                                fc7a7d0c sbSqlCmdExec (0, 14017550, 0, 0, 142947a0, 2cdd238) + 338
                                fc0f5020 sb_sqlExec (2cdbf60, 2cdd238, 0, ffb890c4, 0, 0) + 348
                                fc070140 cxlSqlExecute (2cdb5b8, 1, 12c4, 2cde1e8, fc917a2c, 0) + 54
                                fc03f3c8 stmtExecute (2cdb5b8, ffffdffb, 16, 0, 8000, c) + 1d0
                                fc035f20 SQLExecute (2cdb5b8, 4c, 0, fc035efc, 27438d8, 0) + 24
                                ...

                                The process stuck there for 20 seconds and finally was killed by monitor process.
                                Before the process was killed, in ttmesglog there were 5 occurrence of the 6003 error.

                                03:39:28.87 Timesten error 6003
                                03:39:44.65 Timesten error 6003
                                03:39:59.53 Timesten error 6003
                                03:40:11.39 Timesten error 6003
                                03:40:13.20 Timesten error 6003
                                03:40:58.25 Timesten error 6003
                                03:41:23 the application process was detected that it has been stuck at nanosleep/sbSleep/sbSpinLatchGet() for 20s and was killed.

                                There are two processes accessing the DB, and the other one was killed later.

                                Could anyone help me out? what's the possible reasons and how to avoid it happening again?

                                Thanks,
                                Jack.Guo

                                Edited by: 983078 on Jan 21, 2013 2:00 AM
                                • 13. Re: group commit warning(error 6003)
                                  Chrisjenkins-Oracle
                                  The thread in the process that was killed was blocked waiting to post a log record into the in-memory log buffer. It's hard to be definitive with just the pstack but the associated group commit timeout warnings in the message log suggest that you have a logging bottleneck. TimesTen is not able to flush the log buffer to disk fast enough to make space for new records. Since posting the log record is a 'must do' operation and generally should complete very quickly it is not valid to 'give up after a timeout'.

                                  What are your DSN settings for this database? What kind of disk storage are the transaction logs located on and is that storage shared with anything else (such as TT checkpoint files)? Do you observe any non-zero/increasing values in SYS.MONITOR.LOG_BUFFER_WAITS? What are the details of the hardware (#CPUs, type pf CPUs) and workload?

                                  Chris
                                  • 14. Re: group commit warning(error 6003)
                                    986081
                                    Hi Chris,

                                    Thanks for your quick response.

                                    Our system is running on Netra-T2000. 7 application service instances share the hardware with the support of virtualization. Just before the process was killed, we had cron jobs to clean the application log files, which are located on the same disk of the DB log files. We were running stress testing when the issue was fired, so the system was quite busy at that time -- average cpu usage was over 90%. We run similar test in each weekend, and we clean the log files at each mid-night, but we never hit such issue (application process got stuck for a long time) before, though we occasionally had 6003 errors in ttmesglog.

                                    Here is the DB settings (direct DB access):
                                    [ApplicationNameDataDir]
                                    Driver=/opt/TimesTen/spatial/lib/libtten.so
                                    DataStore=/CompanyName/db/ApplicationNameData
                                    DurableCommits=0
                                    ExclAccess=0
                                    LockLevel=0
                                    PermSize=320
                                    TempSize=32
                                    ThreadSafe=1
                                    WaitForConnect=1
                                    Preallocate=1
                                    Authenticate=1
                                    UID=spatial
                                    PWD=spatial1
                                    LogBuffSize=16384
                                    CkptFrequency=60
                                    Connections=128
                                    LogFileSize=8
                                    DatabaseCharacterSet=TIMESTEN8
                                    TypeMode=1


                                    And the ttstatus output:

                                    TimesTen status report as of Tue Jan 22 03:13:13 2013

                                    Daemon pid 7726 port 17000 instance xxxxxx
                                    TimesTen server pid 7731 started on port 17002
                                    No TimesTen webserver running

                                    ------------------------------------------------------------------------
                                    Data store /xxxxxx/CompanyName/db/ApplicationNameData
                                    There are 20 connections to the data store
                                    Data store is in shared mode
                                    Shared Memory KEY 0x0400b436 ID 1476395024
                                    Type PID Context Connection Name ConnID
                                    Process 10436 0x01aa9140 APP1 12
                                    Process 11375 0x00cd8398 APP2 13
                                    ...
                                    Process 9706 0x0345bbe0 APP6 9
                                    Process 9706 0x0ad6f738 APP6 10
                                    Process 9706 0x0afeec98 APP6 11
                                    Replication 8509 0x00164100 APP7 4
                                    Replication 8509 0x00183270 APP8 6
                                    ...
                                    Subdaemon 7727 0x000718b8 Worker 2042
                                    Subdaemon 7727 0x000e25b0 Flusher 2043
                                    Subdaemon 7727 0x000f1710 Checkpoint 2047
                                    ...
                                    Replication policy : Always
                                    Replication agent is running.
                                    Cache Agent policy : Manual
                                    ------------------------------------------------------------------------
                                    End of report


                                    Regards,
                                    Jack
                                    1 2 Previous Next