This discussion is archived
1 2 Previous Next 17 Replies Latest reply: Jan 23, 2013 1:25 AM by ChrisJenkins RSS

group commit warning(error 6003)

581561 Newbie
Currently Being Moderated
hi, chris:
Recently, we got some warning message from ttmesglog, Does this effect the performance of applications?? our log buffer size is 1G, the ckptfrequecy is 10 min.
from the log, we can see all the warning is the checkpoint time. Does it has relation with our configurations?? thank you.
2009-01-10 12:41:47.55 Warn: : 139890: 790600/1104f0750: sbLogBufGCLockENQNoErr: The commit is performed without the benefit of g
roup commit optimization (error 6003)
2009-01-10 17:51:49.19 Warn: : 139890: 790600/1104f0750: sbLogBufGCLockENQNoErr: The commit is performed without the benefit of g
roup commit optimization (error 6003)
2009-01-10 18:01:50.66 Warn: : 139890: 2212210/1112d6e90: sbLogBufGCLockENQNoErr: The commit is performed without the benefit of
group commit optimization (error 6003)
2009-01-10 18:21:40.05 Warn: : 139890: 790600/1104f0750: sbLogBufGCLockENQNoErr: The commit is performed without the benefit of g
roup commit optimization (error 6003)

from the error pdf, we know 6003 is lock request timeout, can I ask, when flush the log buffer it need what type of locks?? or lock which system tables??
  • 1. Re: group commit warning(error 6003)
    ChrisJenkins Guru
    Currently Being Moderated
    Hi,

    This is not an error and TT does not lock system tables when comitting. Please see the following description from docNote 402403.1


    When transactions are being DurableCommited against a data store, it
    is sometimes possible to see processes awaiting the "GrpComm" (Group
    Commit) lock. This effect can be seen either by running ttXactAdmin or
    as warnings in the message log. These warnings will typicaly have the
    format "Unable to acquire group commit lock (error 6003)" in 6.0 and
    5.1. In version 5.0 a number of retries for this lock may also be
    displayed.


    The Group Commit lock is a peformance
    optimisation. It allows one transaction to durably sync all
    transactions that have already written their records to the kernel I/O
    buffer but not durably committed them to disk.


    Processes
    waiting on the Group Commit lock does not indicate a problem with this
    lock. When durable commits are occurring, most transactions spend the
    majority of their time waiting to be committed durably, so it is to be
    expected that there will be many waiters on this lock. In fact, in an
    application where transactions are short, you will frequently find
    nearly every connection waiting for the Group Commit lock. The idea is
    that all of the waiters on the lock will be flushed together with a
    single disk sync. This situation should not be seen as “contention” as
    there is no such thing as “contention” on the Group Commit lock. It is
    a performance optimization, rather than a protector of a shared
    resources.

    It is true that transactions can be seen waiting
    for locks that are held by other transactions that in turn either hold,
    or are waiting for, the Group Commit lock. This also does not
    (necessarily) indicate a problem with the Group Commit lock. A
    transaction cannot release its regular (db, tbl, row, etc.) locks until
    it has committed, so you will frequently see this sort of lock wait
    pattern in applications that use durable commits.


    As such,
    the group commit warning is frequently harmless. Sometimes though
    failure to acquire the group-commit lock can indicate a slow disk (lock
    waiters time out while the previous durable commit being synced to
    disk) or high cpu contention (the lock holder can’t get enough cycles
    to release the lock before the waiters time out).

    Because the
    lock is not needed for correctness, the transaction will not retry
    after reporting a failure to get the lock. Having waited 10 seconds for
    a group sync, instead it will then issue a 'sync' for itself.


    The
    MONITOR table records the number of durable commits (XACT_D_COMMITS),
    and also the number of syncs (LOG_FORCES). When group commit is in
    effect, you will see that XACT_D_COMMITS is much larger than
    LOG_FORCES as many transactions are being durably committed in one
    write to disk.


    Note: in 7.0 the message was changed to this:

    09:24:47.96
    Warn: : 4228: 5908/00D6D098: sbLogBufGCLockENQNoErr: The commit is
    performed without the benefit of group commit optimization (error 6003)

    Chris
  • 2. Re: group commit warning(error 6003)
    581561 Newbie
    Currently Being Moderated
    hi, chris:
    last day I saw this document, I have an doubt yet. our dsn configurations set the DurableCommit = 0, this means the log will hold on log buffer, not flush to disk immediate. And In our application, we never call the method "call ttDurableCommit" . So I don't know when will happen group commit? In checkpoint or log buffer full??
    And alse from monitor result, we can see XACT_D_COMMITS is smaller than LOG_FORCES. So I guess the group commit failed is because can not get the lock of sys.tables because that time application hold the lock of that table, May I right??

    Command> monitor;

    TIME_OF_1ST_CONNECT: Wed Nov 19 03:38:45 2008
    DS_CONNECTS: 765880
    DS_DISCONNECTS: 765572
    DS_CHECKPOINTS: 7976
    DS_CHECKPOINTS_FUZZY: 7976
    DS_COMPACTS: 0
    PERM_ALLOCATED_SIZE: 47185920
    PERM_IN_USE_SIZE: 9466840
    PERM_IN_USE_HIGH_WATER: 10595889
    TEMP_ALLOCATED_SIZE: 2097152
    TEMP_IN_USE_SIZE: 49467
    TEMP_IN_USE_HIGH_WATER: 1071532
    SYS18: 0
    XACT_BEGINS: 3832097991
    XACT_COMMITS: 3653172180
    XACT_D_COMMITS: 164258
    XACT_ROLLBACKS: 178904469
    LOG_FORCES: 258747556
    DEADLOCKS: 0
    LOCK_TIMEOUTS: 349
    LOCK_GRANTS_IMMED: 37650983544
    LOCK_GRANTS_WAIT: 175732
    SYS19: 3874656
    CMD_PREPARES: 390128
    CMD_REPREPARES: 3219
    CMD_TEMP_INDEXES: 5
    LAST_LOG_FILE: 8574
    REPHOLD_LOG_FILE: 8574
    REPHOLD_LOG_OFF: 691708344
    REP_XACT_COUNT: 1457457905
    REP_CONFLICT_COUNT: 0
    REP_PEER_CONNECTIONS: 9
    REP_PEER_RETRIES: 1604
    FIRST_LOG_FILE: 8571
    LOG_BYTES_TO_LOG_BUFFER: 8773404946216
    LOG_FS_READS: 12551
    LOG_FS_WRITES: 262781614
    LOG_BUFFER_WAITS: 0
    CHECKPOINT_BYTES_WRITTEN: 1753562376
    SYS1: 18519283303
    SYS2: 18519089347
    SYS3: 0
    SYS4: 0
    SYS5: 0
    SYS6: 0
    SYS7: 45499
    SYS8: 0
    REQUIRED_RECOVERY: 0
    SYS11: 0
    SYS12: 1
    TYPE_MODE: 0
    SYS13: 0
    SYS14: 0
    SYS15: 12028
    SYS16: 0
    SYS17: 0
  • 3. Re: group commit warning(error 6003)
    581561 Newbie
    Currently Being Moderated
    I think I understand the lock request timeout is means get group commit lock failed.
    and still we have no durablecommit, which situation can cause group commit? when the group commit happened, application need to wait the precommit on disk??
    whether it effect the performance of application? because sometimes we found the operation is slowly, most of time is normal.

    btw: when the operation was slowly, we found some pagein on system, so does we need the set the parameter v_pinshm =1 to make sure all memory hold by timesten not swap . thank you...
  • 4. Re: group commit warning(error 6003)
    jspalmer Journeyer
    Currently Being Moderated
    Please keep in mind that Replication will perform a durable commit every 200ms. This is a hardcoded setting and cannot be changed or turned off. You state that the group commit optimization warnings coincide with a checkpoint. This sounds to me like the checkpoint is using I/O resources that otherwise the group commit optimization would benefit from using.

    A few comments:

    - Are your checkpoint and log files on different disks?
    - LOG_FS_READS is 12551. Do you see this regularly increasing? It would be optimum if all log reads by replication and xla readers were done from the log buffer rather than the filesystem.
    - you could use ttCkptHistory to check the amount of bytes written out at each checkpoint, and determine if those which coincided with the group commit warning, were also writing out more bytes than usual.
  • 5. Re: group commit warning(error 6003)
    581561 Newbie
    Currently Being Moderated
    o, I understand, we have replications. I guess the warning message was cause by checkpoint, because from the call ttckpthistroy, the ckpt time is that time, and our ckpt frequency is 10 min. thank you...
    from the monitor result, the log_fs_reads is not vary all the time.
  • 6. Re: group commit warning(error 6003)
    581561 Newbie
    Currently Being Moderated
    every time the ckpt vary column is 1572035848
  • 7. Re: group commit warning(error 6003)
    ChrisJenkins Guru
    Currently Being Moderated
    Okay, it looks very much like these warnings are coming from the regular durable commits done by the repagent and are likely caused by heavy I/O contention ocurring when a checkpoint runs. Although the warnings are just warnings and are not in themselves a problem it sounds like your system is not optimally configured and that could lead to other probelms in the future. As previously mentioned, it is stringly recommended that the TimesTen datastore checkpoint and transaction log files be placed on seperate disks from each other and also sepaarte to other high activity files. Bo9th checkpointing and loggign are I/O intensive operations and, unless your disks are on a high-end RAID-0 disk array then a checkpoint is likely to overload the disk and impact performance.

    Just to correct a couple of other points as well:

    1. Commit does not need to get any locks on system tables etc. the group commit lock is a special lock only used by durable commits.

    2. The log buffer is not flushed only when it gets full. It is continually flushed by the flusher thread in the manging sub-daemon and in general should never become 'full'.

    Chris
  • 8. Re: group commit warning(error 6003)
    581561 Newbie
    Currently Being Moderated
    thank you, our dastastore files and log files are on different disks.
    and from the nmon result the IO is normal.
    May I ask, how often does the subdaemon flush logs ?

    Edited by: user578558 on 2009-1-14 上午12:29
  • 9. Re: group commit warning(error 6003)
    ChrisJenkins Guru
    Currently Being Moderated
    The sub-daemon flusher thread flushes log records continually. As records are posted into the log buffer by application transactions the flusher thread writes them to disk. if at any time there are no records to flush, the flusher goes to sleep on a semaphore. As soon as some records are placed in the buffer, the flusher is woken up and starts flushing again. So, flushing is continuous, not time interval based.

    What kind of disk hardware are you using (rotational speed, interface type etc.)? Are both disks connected to the same controller? Are there any other files on the log and checkpoint disks or are they dedicated devices? What O/S platform are you running on?

    Chris
  • 10. Re: group commit warning(error 6003)
    581561 Newbie
    Currently Being Moderated
    the system is SAM, DS8000, IBM p595
    they are many controllers, and many logic disks .
    they are dedicated devices
    AIX Version 5.3
    logs:
    PV COPIES IN BAND DISTRIBUTION
    vpath13 286:000:000 0% 161:000:000:125:000
    vpath42 286:000:000 0% 161:000:000:125:000
    vpath20 286:000:000 0% 161:000:000:125:000
    vpath49 286:000:000 0% 161:000:000:125:000
    vpath26 286:000:000 0% 161:000:000:125:000
    vpath55 286:000:000 0% 161:000:000:125:000
    vpath12 286:000:000 0% 161:000:000:125:000
    vpath41 286:000:000 0% 161:000:000:125:000
    vpath27 286:000:000 0% 161:000:000:125:000
    vpath56 286:000:000 0% 161:000:000:125:000
    vpath14 286:000:000 0% 161:000:000:125:000
    vpath43 286:000:000 0% 161:000:000:125:000
    vpath28 286:000:000 0% 161:000:000:125:000
    vpath57 286:000:000 0% 161:000:000:125:000

    chk:
    ttchk2:/ttchk2
    PV COPIES IN BAND DISTRIBUTION
    vpath27 172:000:000 0% 000:000:000:067:105
    vpath56 172:000:000 0% 000:000:000:067:105
    vpath20 172:000:000 0% 000:000:000:067:105
    vpath49 172:000:000 0% 000:000:000:067:105
    vpath26 172:000:000 0% 000:000:000:067:105
    vpath55 172:000:000 0% 000:000:000:067:105
    vpath13 172:000:000 0% 000:000:000:067:105
    vpath42 172:000:000 0% 000:000:000:067:105
    vpath12 172:000:000 0% 000:000:000:067:105
    vpath41 172:000:000 0% 000:000:000:067:105
    vpath14 172:000:000 0% 000:000:000:067:105
    vpath43 172:000:000 0% 000:000:000:067:105
    vpath28 172:000:000 0% 000:000:000:067:105
    vpath57 172:000:000 0% 000:000:000:067:105

    Edited by: user578558 on 2009-1-15 上午1:12

    Edited by: user578558 on 2009-1-15 上午1:17
  • 11. Re: group commit warning(error 6003)
    ChrisJenkins Guru
    Currently Being Moderated
    Thanks for the information, but it does not tell me a great deal in terms of how disks are arranged into RAID groups and how these map to filesystems on this, and maybe other, machines. If these warnings always coinciode with checkpoints then it seems like some I/O or CPU resource issues is the underlying cause (checkpoint operations can be both I/O and CPU intensive). To get to the bottom of this would likely require detailed investigation in conjunction with an AIX expert. If the only symptoms you are seeing is these warnings but otherwise everything is working fine then I would not worry about them and just ignore them. If you really feel you need to eliminate them then you should open an SR with Oracle support.

    Chris
  • 12. Re: group commit warning(error 6003)
    986081 Newbie
    Currently Being Moderated
    Hi Chris, All,

    One of our application processes got stuck when accessing timesten, the function back trace is as below:

    fe5cc8ac nanosleep (ffb87390, ffb87388)
    fc094f24 sbSleep (2a, 0, 0, 0, 0, 2) + 70
    fc1c6218 sbSpinLatchGet (e4403390, 28c5320, 0, 7, fc92373c, 70) + 1a8
    fc1e235c sbLogBufInsertFast (ffb8781c, 28c63b8, 0, 1, 0, 0) + 554
    fc204a98 sbLogRecInsert (e440ce98, 8d4598, 0, 1, 1c, e440cea0) + 4b8
    fc203e5c sbLogWrite (8, 8d4598, 0, 0, 28c5320, e4c4e928) + 12c
    fc278dec sbTblStorLTupDelete (e5eaae50, 1, 0, 90bb78, e440ce0c, e48c) + 5fc
    fc288edc sbTblDelete (e5eaae50, 16, 0, 8d47c8, 0, 1) + 2f0
    fc3a0298 sbEeDeleteEval (fa0c93a0, f9e4c150, fa0c9410, 28c5320, fc917a2c, 0) + 478
    fc38b9f4 sbEeStmtEval (ecb0, f9e4c150, 28c5320, ef88, ef8c, ef90) + 1b4
    fc7a7d0c sbSqlCmdExec (0, 14017550, 0, 0, 142947a0, 2cdd238) + 338
    fc0f5020 sb_sqlExec (2cdbf60, 2cdd238, 0, ffb890c4, 0, 0) + 348
    fc070140 cxlSqlExecute (2cdb5b8, 1, 12c4, 2cde1e8, fc917a2c, 0) + 54
    fc03f3c8 stmtExecute (2cdb5b8, ffffdffb, 16, 0, 8000, c) + 1d0
    fc035f20 SQLExecute (2cdb5b8, 4c, 0, fc035efc, 27438d8, 0) + 24
    ...

    The process stuck there for 20 seconds and finally was killed by monitor process.
    Before the process was killed, in ttmesglog there were 5 occurrence of the 6003 error.

    03:39:28.87 Timesten error 6003
    03:39:44.65 Timesten error 6003
    03:39:59.53 Timesten error 6003
    03:40:11.39 Timesten error 6003
    03:40:13.20 Timesten error 6003
    03:40:58.25 Timesten error 6003
    03:41:23 the application process was detected that it has been stuck at nanosleep/sbSleep/sbSpinLatchGet() for 20s and was killed.

    There are two processes accessing the DB, and the other one was killed later.

    Could anyone help me out? what's the possible reasons and how to avoid it happening again?

    Thanks,
    Jack.Guo

    Edited by: 983078 on Jan 21, 2013 2:00 AM
  • 13. Re: group commit warning(error 6003)
    ChrisJenkins Guru
    Currently Being Moderated
    The thread in the process that was killed was blocked waiting to post a log record into the in-memory log buffer. It's hard to be definitive with just the pstack but the associated group commit timeout warnings in the message log suggest that you have a logging bottleneck. TimesTen is not able to flush the log buffer to disk fast enough to make space for new records. Since posting the log record is a 'must do' operation and generally should complete very quickly it is not valid to 'give up after a timeout'.

    What are your DSN settings for this database? What kind of disk storage are the transaction logs located on and is that storage shared with anything else (such as TT checkpoint files)? Do you observe any non-zero/increasing values in SYS.MONITOR.LOG_BUFFER_WAITS? What are the details of the hardware (#CPUs, type pf CPUs) and workload?

    Chris
  • 14. Re: group commit warning(error 6003)
    986081 Newbie
    Currently Being Moderated
    Hi Chris,

    Thanks for your quick response.

    Our system is running on Netra-T2000. 7 application service instances share the hardware with the support of virtualization. Just before the process was killed, we had cron jobs to clean the application log files, which are located on the same disk of the DB log files. We were running stress testing when the issue was fired, so the system was quite busy at that time -- average cpu usage was over 90%. We run similar test in each weekend, and we clean the log files at each mid-night, but we never hit such issue (application process got stuck for a long time) before, though we occasionally had 6003 errors in ttmesglog.

    Here is the DB settings (direct DB access):
    [ApplicationNameDataDir]
    Driver=/opt/TimesTen/spatial/lib/libtten.so
    DataStore=/CompanyName/db/ApplicationNameData
    DurableCommits=0
    ExclAccess=0
    LockLevel=0
    PermSize=320
    TempSize=32
    ThreadSafe=1
    WaitForConnect=1
    Preallocate=1
    Authenticate=1
    UID=spatial
    PWD=spatial1
    LogBuffSize=16384
    CkptFrequency=60
    Connections=128
    LogFileSize=8
    DatabaseCharacterSet=TIMESTEN8
    TypeMode=1


    And the ttstatus output:

    TimesTen status report as of Tue Jan 22 03:13:13 2013

    Daemon pid 7726 port 17000 instance xxxxxx
    TimesTen server pid 7731 started on port 17002
    No TimesTen webserver running

    ------------------------------------------------------------------------
    Data store /xxxxxx/CompanyName/db/ApplicationNameData
    There are 20 connections to the data store
    Data store is in shared mode
    Shared Memory KEY 0x0400b436 ID 1476395024
    Type PID Context Connection Name ConnID
    Process 10436 0x01aa9140 APP1 12
    Process 11375 0x00cd8398 APP2 13
    ...
    Process 9706 0x0345bbe0 APP6 9
    Process 9706 0x0ad6f738 APP6 10
    Process 9706 0x0afeec98 APP6 11
    Replication 8509 0x00164100 APP7 4
    Replication 8509 0x00183270 APP8 6
    ...
    Subdaemon 7727 0x000718b8 Worker 2042
    Subdaemon 7727 0x000e25b0 Flusher 2043
    Subdaemon 7727 0x000f1710 Checkpoint 2047
    ...
    Replication policy : Always
    Replication agent is running.
    Cache Agent policy : Manual
    ------------------------------------------------------------------------
    End of report


    Regards,
    Jack
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points