This discussion is archived
1 2 Previous Next 19 Replies Latest reply: Feb 17, 2012 6:00 AM by __OUTSIDER___ Go to original post RSS
  • 15. Re: Problem with ASM diskgroup
    __OUTSIDER___ Newbie
    Currently Being Moderated
    Levi Pereira wrote:
    Hi,
    and last question. How do you think is this problem with time sync is bug ? What will happen if my time sync server will be unavailable is this reason that my node should have any problems ? or maybe I was wrong to understand how works Cluster Time Synchronization Service?
    CTSSD is only designed to work with mainstream time synchronization software solutions like NTP.
    Check if CTSSD is running in ACTIVE or OBSERVER mode.

    Also try this note:
    CTSSD Runs in Active Mode When Non-Mainstream Vendor Time Sync Software is Running [ID 1077289.1]

    *CTSSD Runs in Observer Mode Even Though No Time Sync Software is Running [ID 1054006.1]*
    In both nodes "The Cluster Time Synchronization Service is in Active mode".
    Unfortunately I still don't have access to MOS. (still waiting)

    So finality do you have any ideas about this situation ?
    Is this situation happened because I added the new diskgroup when my second node was powered off ?

    or

    this is only because of data and time ?
  • 16. Re: Problem with ASM diskgroup
    Levi-Pereira Guru
    Currently Being Moderated
    Hi,
    So finality do you have any ideas about this situation ?
    Is this situation happened because I added the new diskgroup when my second node was powered off ?
    or

    this is only because of data and time ?
    Yes .. I have idea about it.

    Once I was doing tests with ORACLE Clusterware on VMWARE. When I restarted the VM the same situation happens.. date and time was wrong and clusterware fails at startup.
    The reason behind was the VM take old configuration(date) and not synchronized time of VM-RAC with Server Host where VM was, due option of time syncronization was disabled.
    CTSS does not work if the difference between the local time and the time on the reference node was too much, so the error "CRS-02411" will be raised.
    CRS-02411: The Cluster Time Synchronization Service will take a long time to perform time synchronization as local time is significantly different from mean cluster time.
    Cause: The difference between the local time and the time on the reference node was too much to be synchronized in a short period.
    Action: (Optional) Shutdown and restart the Oracle Clusterware on this node to instantaneously synchronize the time with the reference node.
    As you are using Oracle VM you should check how Oracle VM time Synchronization is configured.

    Regards,
    Levi Pereira
  • 17. Re: Problem with ASM diskgroup
    __OUTSIDER___ Newbie
    Currently Being Moderated
    Levi Pereira wrote:
    Yes .. I have idea about it.

    Once I was doing tests with ORACLE Clusterware on VMWARE. When I restarted the VM the same situation happens.. date and time was wrong and clusterware fails at startup.
    The reason behind was the VM take old configuration(date) and not synchronized time of VM-RAC with Server Host where VM was, due option of time syncronization was disabled.
    CTSS does not work if the difference between the local time and the time on the reference node was too much, so the error "CRS-02411" will be raised.
    CRS-02411: The Cluster Time Synchronization Service will take a long time to perform time synchronization as local time is significantly different from mean cluster time.
    Cause: The difference between the local time and the time on the reference node was too much to be synchronized in a short period.
    Action: (Optional) Shutdown and restart the Oracle Clusterware on this node to instantaneously synchronize the time with the reference node.
    As you are using Oracle VM you should check how Oracle VM time Synchronization is configured.

    Regards,
    Levi Pereira
    Yes you are right, I was thought so. The OVM time is old because this is new server and
    I forget configure ntp in this server.

    Thank for help again Levi and if you know can you suggest me some article or documentation
    about how to add new node to existing RAC environment ?
  • 18. CTSSD not working as expected !!
    Levi-Pereira Guru
    Currently Being Moderated
    Hi Tiran,

    I was not satisfied with my conclusion on this issue. For that reason I'm changing my anwser..
    After a deep analysis I discovered some issue that have not been noted:
    CTSSD not working as I expected!! Maybe a bug or I'm missing something. I tested on Clusterware 11.2.0.3.


    Look this:

    I have 3 nodes ( lnxora01,lnxora02,lnxora03)

    Test executed on lnxora03:
    $ crsctl query crs activeversion
    Oracle Clusterware active version on the cluster is [11.2.0.3.0]
    $  crsctl query crs releaseversion
    Oracle High Availability Services release version on the local node is [11.2.0.3.0]
    I wil adjust date and time from "2012-02-16 17:26" to "2010-01-01 00:00" and let us see the behavior of the cluster.
    $ date
    Thu Feb 16 17:26:16 BRST 2012
    $ sudo date +%Y%m%d -s "20100101"
    20100101
    
    
    $ date
    Fri Jan  1 00:00:00 BRST 2010
    CTTS detect the difference date/time between nodes of cluster.
    cat /u01/app/11.2.0/grid_1/log/lnxora03/alertlnxora03.log
    [ctssd(7455)]CRS-2404:The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time. Details in /u01/app/11.2.0/grid_1/log/lnxora03/ctssd/octssd.log.
    But ...CTTS is not able to adjust date/time because magnitude of the systime diff is larger than max adjtime limit
    cat /u01/app/11.2.0/grid_1/log/lnxora03/ctssd/octssd.log
    2012-02-16 17:26:15.155: [    CTSS][1107089728]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
    2010-01-01 00:00:06.933: [    CTSS][1111292224]ctsselect_msm: CTSS mode is [0xc4]
    2010-01-01 00:00:06.933: [    CTSS][1111292224]ctssslave_swm1_2: Ready to initiate new time sync process.
    2010-01-01 00:00:06.934: [    CTSS][1111292224]ctssslave_swm2_1: Waiting for time sync message from master. sync_state[2].
    2010-01-01 00:00:06.936: [    CTSS][1107089728]ctsscomm_recv_cb2: Receive incoming message event. Msgtype [2].
    2010-01-01 00:00:06.936: [    CTSS][1107089728]ctssslave_msg_handler4_1: Waiting for slave_sync_with_master to finish sync process. sync_state[3].
    2010-01-01 00:00:06.936: [    CTSS][1111292224]ctssslave_swm2_3: Received time sync message from master.
    *2010-01-01 00:00:06.936: [    CTSS][1111292224]ctssslave_swm: The magnitude [67109176224199] of the offset [-67109176224199 usec] is larger than [86400000000 usec] sec which is the CTSS limit.*
    *2010-01-01 00:00:06.937: [    CTSS][1111292224]ctssslave_swm15: The CTSS master is ahead this node. The local time offset [131071 usec] is being adjusted. Sync method [2]*
    After 10 minutes waiting ... nothing happens.. date/time still wrong and octssd...keep logging..
    $ date
    Fri Jan  1 00:10:06 BRST 2010
    
    
    2010-01-01 00:10:39.790: [    CTSS][1111292224]ctssslave_swm: The magnitude [67109175600524] of the offset [-67109175600524 usec] is larger than [86400000000 usec] sec which is the CTSS limit.
    2010-01-01 00:10:39.790: [    CTSS][1111292224]ctssslave_swm: The magnitude of the systime diff is larger than max adjtime limit. Offset [-67109175600524] usec will be changed to max adjtime limit [+/- 131071].
    2010-01-01 00:10:39.790: [    CTSS][1111292224]ctssslave_swm15: The CTSS master is ahead this node. The local time offset [131071 usec] is being adjusted. Sync method [2]
    2010-01-01 00:10:39.790: [    CTSS][1111292224]ctssslave_swm17: LT [1262311839sec 788957usec], MT [1329421015sec 389481usec], Delta [2056usec]
    2010-01-01 00:10:39.790: [    CTSS][1111292224]ctssslave_swm19: The offset is [-131071 usec] and sync interval set to [4]
    2010-01-01 00:10:39.790: [    CTSS][1111292224]ctssslave_swm: Received from master (mode [0xcc] nodenum [1] hostname [lnxora01] )
    2010-01-01 00:10:39.790: [    CTSS][1111292224]ctsselect_msm: Sync interval returned in [4]
    2010-01-01 00:10:39.790: [    CTSS][1107089728]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
    The documentation said:

    CRS-02411: The Cluster Time Synchronization Service will take a long time to perform time synchronization as local time is significantly different from mean cluster time.
    Cause: The difference between the local time and the time on the reference node was too much to be synchronized in a short period.
    Action: (Optional) Shutdown and restart the Oracle Clusterware on this node to instantaneously synchronize the time with the reference node.

    So, If I restart Oracle Clusterware, will adjust the date/time with the reference node.

    Stopping Clusterware
    $ sudo crsctl stop crs
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'lnxora03'
    CRS-2673: Attempting to stop 'ora.crsd' on 'lnxora03'
    .
    .
    .
    CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'lnxora03' has completed
    CRS-4133: Oracle High Availability Services has been stopped.
    The date/time still wrong
    $date
    Fri Jan  1 00:16:14 BRST 2010
    Let us start clusterware
    $ sudo crsctl start crs
    
    cat /u01/app/11.2.0/grid_1/log/lnxora03/alertlnxora03.log
    
    2010-01-01 00:17:06.152
    [ohasd(9712)]CRS-2112:The OLR service started on node lnxora03.
    2010-01-01 00:17:06.173
    [ohasd(9712)]CRS-1301:Oracle High Availability Service started on node lnxora03.
    2010-01-01 00:17:06.176
    The CSSD (Voting Files) know wich date/time is wrong
    [ohasd(9712)]CRS-8013:reboot advisory message text: Rebooting after limit 27740 exceeded; disk timeout 0, network timeout 27740, last heartbeat from CSSD at epoch seconds 1323619546.873, 27750 milliseconds ago based on invariant clock value of 3502334
    I expect which date/time will be adjusted, but this not happens, the CTSSD fails and abort. (I believe we have a bug here).
    The clusterware can neither start... (What happened in your case..too)
    [ctssd(9985)]CRS-2401:The Cluster Time Synchronization Service started on host lnxora03.
    2010-01-01 00:17:45.252
    [ctssd(9985)]CRS-2404:The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time. Details in /u01/app/11.2.0/grid_1/log/lnxora03/ctssd/octssd.log.
    2010-01-01 00:17:45.898
    [ctssd(9985)]CRS-2402:The Cluster Time Synchronization Service aborted on host lnxora03. Details at  in /u01/app/11.2.0/grid_1/log/lnxora03/ctssd/octssd.log.
    2010-01-01 00:17:47.016
    [ctssd(10006)]CRS-2405:The Cluster Time Synchronization Service on host lnxora03 is shutdown by user
    
    [ctssd(10198)]CRS-2404:The Cluster Time Synchronization Service detects that the local time is significantly different from the mean cluster time. Details in /u01/app/11.2.0/grid_1/log/lnxora03/ctssd/octssd.log.
    2010-01-01 00:19:04.943
    [ctssd(10198)]CRS-2402:The Cluster Time Synchronization Service aborted on host lnxora03. Details at  in /u01/app/11.2.0/grid_1/log/lnxora03/ctssd/octssd.log.
    2010-01-01 00:19:05.962
    [ohasd(9712)]CRS-2807:Resource 'ora.asm' failed to start automatically.
    2010-01-01 00:19:05.962
    [ohasd(9712)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
    2010-01-01 00:19:05.962
    [ohasd(9712)]CRS-2807:Resource 'ora.ctssd' failed to start automatically.
    2010-01-01 00:19:05.962
    [ohasd(9712)]CRS-2807:Resource 'ora.evmd' failed to start automatically.
    The CTSSD keep logging on octssd.... Time offset is too much to be corrected
    cat /u01/app/11.2.0/grid_1/log/lnxora03/ctssd/octssd.log
    
    2010-01-01 00:17:38.939: [    CTSS][2780695840]Oracle Database CTSS Release 11.2.0.3.0 Production Copyright 2006, 2011 Oracle.  All rights reserved.
    2010-01-01 00:17:38.939: [    CTSS][2780695840]ctss_main: The Cluster Time Synchronization Service is started with option [reboot].
    2010-01-01 00:18:56.170: [    CTSS][1110415680]ctssslave_swm: The magnitude [67109175205299 usec] of the offset [-67109175205299 usec] is larger than [86400000000 usec] sec which is the CTSS limit.
    2010-01-01 00:18:56.171: [    CTSS][1110415680]ctsselect_monitor_steysync_mode: Failed in clsctssslave_sync_with_master [12]: Time offset is too much to be corrected
    2010-01-01 00:18:56.171: [    CTSS][1104111936]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
    Solution I corrected the date time manually and started my cluster

    Question (to Oracle) What's maximum time difference that CTSS can correct?


    Another situation... I will advance the clock in few seconds and see what happens...

    My current time:
    $ dsh date
    lnxora01:
    Thu Feb 16 18:07:09 BRST 2012
    lnxora02:
    Thu Feb 16 18:07:09 BRST 2012
    lnxora03:
    Thu Feb 16 18:07:09 BRST 2012
    After advance the clock on node lnxora03:
    $ dsh date
    lnxora01:
    Thu Feb 16 18:17:56 BRST 2012
    lnxora02:
    Thu Feb 16 18:17:56 BRST 2012
    lnxora03:
    Thu Feb 16 18:18:55 BRST 2012
    To my suppressed, the CTSS do not adjusted the clock as I expected, because The magnitude of the systime diff is larger than max adjtime limit.
    2012-02-16 18:19:50.605: [    CTSS][1111132480]ctssslave_swm: The magnitude [59148443] of the offset [59148443 usec] is larger than [10000000 usec] whic     h is the CTSS limit that can be synced up in a short period.
    2012-02-16 18:19:50.605: [    CTSS][1111132480]ctssslave_swm: The magnitude of the systime diff is larger than max adjtime limit. Offset [59148443] usec      will be changed to max adjtime limit [+/- 131071].
    2012-02-16 18:19:50.605: [    CTSS][1111132480]ctssslave_swm15: The CTSS master is behind this node. The local time offset [-131071 usec] is being adjus     ted. Sync method [2]
    2012-02-16 18:19:50.605: [    CTSS][1111132480]ctssslave_swm17: LT [1329423590sec 605432usec], MT [1329423531sec 456989usec], Delta [2296usec]
    2012-02-16 18:19:50.605: [    CTSS][1111132480]ctssslave_swm19: The offset is [131071 usec] and sync interval set to [4]
    2012-02-16 18:19:50.605: [    CTSS][1111132480]ctssslave_swm: Received from master (mode [0xcc] nodenum [1] hostname [alemanha] )
    2012-02-16 18:19:50.605: [    CTSS][1111132480]ctsselect_msm: Sync interval returned in [4]
    2012-02-16 18:19:50.605: [    CTSS][1106929984]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
    So, let us restart the Clusterware and see what happen...

    $ sudo crsctl stop crs
    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'lnxora03'
    CRS-2673: Attempting to stop 'ora.crsd' on 'lnxora03'
    Nice.... Oracle adjusted the clock at startup of Clusterware...
    ####################
    lnxora01:
    Thu Feb 16 18:24:22 BRST 2012
    lnxora02:
    Thu Feb 16 18:24:22 BRST 2012
    lnxora03:
    Thu Feb 16 18:25:21 BRST 2012
    ####################
    lnxora01:
    Thu Feb 16 18:24:23 BRST 2012
    lnxora02:
    Thu Feb 16 18:24:23 BRST 2012
    lnxora03:
    Thu Feb 16 18:24:23 BRST 2012
    ####################
    
    
    2012-02-16 18:25:21.663: [    CTSS][1116731712]ctssslave_swm2_3: Received time sync message from master.
    2012-02-16 18:25:21.663: [    CTSS][1116731712]clsctssslave.c:679:CLSCTSS_CVAL_SET: &cval[12164e7c] prev_value[0] new_value[2408] start_tm [0] curr_tm [3484934] period [1800000].
    2012-02-16 18:24:22.745: [    CTSS][1116731712]ctssslave_swm15: The CTSS master is behind this node. The local time offset [-58918430 usec] is being adjusted. Sync method [1]
    2012-02-16 18:24:22.745: [    CTSS][1116731712]ctssslave_swm17: LT [1329423921sec 663301usec], MT [1329423862sec 744871usec], Delta [2186usec]
    2012-02-16 18:24:22.745: [    CTSS][1116731712]ctssslave_swm19: The offset is [58918430 usec] and sync interval set to [1]
    2012-02-16 18:24:22.745: [    CTSS][1116731712]ctssslave_swm: Received from master (mode [0xcc] nodenum [1] hostname [lnxora01] )
    2012-02-16 18:24:22.745: [    CTSS][1090578752]ctssslave_msg_handler4_3: slave_sync_with_master finished sync process. Exiting clsctssslave_msg_handler
    So, Why Oracle adjusted the clock only at startup of clusterware...Is it a rule?

    My conclusion:

    Never trust the CTSS if you need precision time synchronization.
    The fact of the CTSS be enabled in ACTIVE mode does not mean that the time will be synchronized between servers ....There are a number of variables without explanation ...This need to be documented for a better understanding.

    Documentation said:
    Cluster Time Synchronization Service

    Cluster node times should be synchronized. With this release, Oracle Clusterware provides Cluster Time Synchronization Service (CTSS), which ensures that there is a synchronization service in the cluster. If Network Time Protocol (NTP) is not found during cluster configuration, then CTSS is configured to ensure time synchronization.

    http://docs.oracle.com/cd/E11882_01/install.112/e22489/whatsnew.htm#CWLIN123

    Cool ... but how far the CTSS will ensure time synchronization.Because this is not what I saw.

    What means "max adjtime limit" or "adjtime"? I believe it's acronyms to "adjust time" .. .. there is default values to adjtime?


    Another point:
    On first test case changing date/time from 2012 to 2010, I expected clusterware adjust date or reboot that node...but the node still running with all resources online... that scared.


    Tip: The problem was not the ASM but CTSSD, so the title of this thread should be "Clusterware not starting due wrong DATE/TIME and CTSSD not working" instead "Problem with ASM diskgroup"


    Please Ignore my questions in this reply .. It's just a relief, which should be open an SR.

    Hope it help others....

    Regards,
    Levi Pereira
  • 19. Re: CTSSD not working as expected !!
    __OUTSIDER___ Newbie
    Currently Being Moderated
    Hi Levi,

    I read this two times, its really amazing.
    Congratulation! this is great investigation thanks a lot and thanks for your time.
    So now everything is clear with situation and I think Oracle will fix this
    soon (I hope).

    It is pleasure deal with you !
1 2 Previous Next

Legend

  • Correct Answers - 10 points
  • Helpful Answers - 5 points