8 Replies Latest reply: Oct 14, 2013 11:44 PM by tien86

    Failed to open Inter-Controller Communication Channels! Stk6000

    tien86

      Hi pros,

       

      I ran into a problem while replacing a controller. My storage 6180 has 2 controllers: A (OK) and B (failed). I powered off the storage, then replaced controller B with a new controller.

       

      Now I cannot access controller A through the serial console. F/W controller A: 7.60.x.x

       

      I can access controller B through the serial console and CAM, which show all drives as incompatible and unassigned, with no volumes and no virtual disks. F/W controller B: 7.80.x.x

       

      Controller B shows the message: Failed to open Inter-Controller Communication Channels!

       

      I searched Google and MOS but still have not found any solution for this problem. I would appreciate any help.

       

       

      This is the log file from the serial console of controller B:

      09/22/02-12:03:37 (tRAID): WARN:  Failed to open Inter-Controller Communication Channels!

      09/22/02-12:03:37 (tRAID): NOTE:  LockMgr Role is Master

      09/22/02-12:03:38 (utlTimer): NOTE:  fcnChannelReport ==>  0  1

      09/22/02-12:03:40 (tHckReset): NOTE:  HealthCheck: Alt Ctl: 1 Reset_Failure, state: 0 Start

      0x8f9084 (tNetCfgInit): miiPhyInit check cable connection

      09/22/02-12:03:49 (tNetCfgInit): NOTE:  eth1: LinkDown event

      09/22/02-12:03:49 (tNetCfgInit): NOTE:  Network Ready

      09/22/02-12:04:07 (tRAID): NOTE:  WWN baseName 00040080-e5185e0e (valid==>SigMatch)

      09/22/02-12:04:07 (tRAID): NOTE:  spmEarlyData: No data available

      09/22/02-12:04:08 (tRAID): SOD: Pre-Initialization Phase Complete

      09/22/02-12:04:09 (utlTimer): NOTE:  fcnChannelReport ==>  0  1 ~2 -3

      09/22/02-12:04:14 (utlTimer): NOTE:  fcnChannelReport ==>  0  1 ~2 =3

      09/22/02-12:04:39 (utlTimer): NOTE:  fcnChannelReport ==>  0  1  2 =3

      09/22/02-12:04:40 (tHckReset): NOTE:  HealthCheck: Alt Ctl: 4 Norun_Failure, state: 0 Start

      09/22/02-12:04:40 (tHckReset): NOTE:  HealthCheckManager: Notify Event 6 Ctl_Not_Running

      09/22/02-12:04:40 (cmgrEvent): WARN:  Alt Ctl Reboot:

                                      Reboot CompID: 0x407

                                      Reboot reason: 0x6

                                      Reboot reason extra: 0x0

      09/22/02-12:04:40 (cmgrEvent): NOTE:  holding alt ctl in reset

      09/22/02-12:04:40 (cmgrEvent): NOTE:  HealthCheck: Alt Ctl: 1 Reset_Failure, state: 0 Start

      09/22/02-12:04:42 (utlTimer): NOTE:  fcnChannelReport ==> +0  1  2 =3

      09/22/02-12:04:43 (utlTimer): NOTE:  fcnChannelReport ==>  0  1  2 =3

      09/22/02-12:04:46 (tRAID): WARN:  pcm::CapabilityManager::validateAdoption exception  (DbmWriteException: recType: 85, status: -12)

      09/22/02-12:04:46 (tRAID): WARN:  dbm::RWFileSystem::initialize: Exception caught, ConstructorIOException: -16

      09/22/02-12:04:46 (tRAID): ERROR: In PersistenceManager::initialize: catch DbmNoFileSystemException: recType: 84

      09/22/02-12:04:46 (tRAID): ERROR: ADM Load Reservations failed with error (5) Exception

      09/22/02-12:04:46 (tRAID): NOTE:  ACS: Icon ping to alternate failed: -2, resp: 0

      09/22/02-12:04:46 (tRAID): NOTE:  ACS: autoCodeSync(): Process start. Comm Mode: 0, Status: 0

      09/22/02-12:04:46 (tRAID): WARN:  ACS: autoCodeSync(): Skipped since alt not communicating.

      09/22/02-12:04:46 (tRAID): WARN:  DbmNoFileSystemException: recType: 59 Line 1819 File cmgrControllerMgr.cc

      09/22/02-12:04:46 (tRAID): WARN:  DbmNoFileSystemException: recType: 59 Line 1819 File cmgrControllerMgr.cc

      09/22/02-12:04:46 (tRAID): SOD: Code Synchronization Initialization Phase Complete

      09/22/02-12:04:46 (sntpEvent): NOTE:  sntpEventHandler: VNI_GET_SYNC_TIME failed

      09/22/02-12:04:47 (tRAID): WARN:  USM  Exception caught in processUsmHeader - DbmNoFileSystemException: recType: 30

      09/22/02-12:04:47 (tRAID): WARN:  USM  Error allocating UsmStableStorageHeader in processUsmHeader() - DbmNoFileSystemException: recType: 30

      09/22/02-12:04:47 (tRAID): NOTE:  SOD failure in evf::VolumeCfgManager::initialize

      09/22/02-12:04:47 (tRAID): NOTE:  DbmNoFileSystemException in evf::VolExtentMgr::initialize

      09/22/02-12:04:47 (tRAID): NOTE:  DbmNoFileSystemException in safe::initialize

      09/22/02-12:04:47 (tRAID): ERROR: Caught DbmNoFileSystemException: recType: 95 in initialize

      09/22/02-12:04:47 (tRAID): WARN:  edrSOD: No Config File System

      09/22/02-12:04:48 (tRAID): WARN:  snrProcessDatabase: No File System Found

      09/22/02-12:04:48 (tRAID): WARN:  spm::SPMManager::initialize NoFileSystem

      09/22/02-12:04:48 (tRAID): NOTE:  fcn: Peering Disabled (Alt Unavailable)

      09/22/02-12:04:48 (tRAID): NOTE:  ion: Peering Disabled (Alt Unavailable)

      09/22/02-12:04:48 (tRAID): WARN:  Error in updating in memory dq cfg!

      09/22/02-12:04:48 (tRAID): WARN:  Caught DbmNoFileSystemException: recType: 65 in fbm::initialize

      09/22/02-12:04:48 (tRAID): WARN:  MelMgr::initialize FAILED:  Exception  Line 2197 File mlmMelMgr.cc

      09/22/02-12:04:48 (tRAID): WARN:  readDatabaseSyslogConfig:  Caught Exception DbmNoFileSystemException: recType: 74 Line 420 File mlmSyslogMgr.cc

      09/22/02-12:04:48 (tRAID): WARN:  Syslog database access failed:  Exception at Line 324 File mlmSyslogMgr.cc

      09/22/02-12:04:49 (tRAID): WARN:  Unable to intialize mirror device

      09/22/02-12:04:49 (tRAID): NOTE:  CacheMgr::cacheOpenMirrorDevice:: mirror device 0xfffffff

      09/22/02-12:04:49 (tRAID): WARN:  CacheStore::read(,): Exception DbmNoFileSystemException: recType: 24

      09/22/02-12:04:49 (tRAID): WARN:  CCM: readAndValidateCacheStore() caught CacheStoreDataException

      09/22/02-12:04:49 (tRAID): NOTE:  CCM: readAndValidateCacheStore() recovering

      09/22/02-12:04:49 (tRAID): NOTE:  CCM: readAndValidateCacheStore() partitioning for no mirroring

      09/22/02-12:04:49 (tRAID): WARN:  CacheStoreExt::read(,): Exception DbmNoFileSystemException: recType: 82

      09/22/02-12:04:49 (tRAID): WARN:  CCM: readAndValidateCacheStore() cacheStoreExt read exception

      09/22/02-12:04:49 (tRAID): NOTE:  CCM: readAndValidateCacheStore() initializes with default values

      09/22/02-12:04:49 (tRAID): NOTE:  CCM:  Changing default demand cache flush values

      09/22/02-12:04:49 (tRAID): NOTE:  CCM: validateCacheMem() cache memory is invalid

      09/22/02-12:04:49 (tRAID): NOTE:  CCM: validateCacheMem() Initializing my partition

      09/22/02-12:04:50 (tRAID): NOTE:  CacheStore::write(): Exception DbmNoFileSystemException: recType: 24

      09/22/02-12:04:50 (tRAID): WARN:  CCM: initialize() caught exception(2) CacheStoreDataException

      09/22/02-12:04:50 (tRAID): NOTE:  CCM: sodClearMOSIntentsAlt(), failure clearing MOS intents on alt

      09/22/02-12:04:52 (tRAID): WARN:  Exception caught in presMgrInit - DbmNoFileSystemException: recType: 12

      09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::CrushDriveManager::initialize

      09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::DriveManager::initialize

      09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::VolumeGroupManager::initialize

      09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::DacstoreVolManager::initialize

      09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::PieceManager::initialize

      09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::CrushStripeManager::initialize

      09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::exop::ExclOpManager::initialize

      09/22/02-12:04:52 (tRAID): WARN:  CCM: initComplete() - isRestoreInProgressAlt caught IconSendInfeasibleException Error

      09/22/02-12:04:52 (tRAID): NOTE:  CacheStore::write(): Exception DbmNoFileSystemException: recType: 24

      09/22/02-12:04:52 (tRAID): WARN:  CCM: initComplete() caught(2) CacheStoreDataException

      09/22/02-12:04:53 (tRAID): NOTE:  DiagVolManager::initialize: Exception - Alt controller not ready

      09/22/02-12:04:53 (tRAID): ERROR: dbm: SubRecInterface::save caught DbmNoFileSystemException: recType: 58 - recType: 58, txn: 0x03b0de7c

      09/22/02-12:04:53 (tRAID): ERROR: dbm: SubRecInterface::save caught DbmNoFileSystemException: recType: 58 - recType: 58, txn: 0x03b0de7c

      09/22/02-12:04:53 (tRAID): WARN:  Caught DbmNoFileSystemException: recType: 57 in sam::StorageArrayManager::initialize

      09/22/02-12:04:55 (tRAID): SOD: Initialization Phase Complete

      09/22/02-12:04:53 (ProcessHandlers): WARN:  Exception caught in WWNStorage iosReleased: DbmNoFileSystemException: recType: 9

      09/22/02-12:04:53 (ProcessHandlers): WARN:  dbm::WWNStorage::iosReleased() FAILED.

      ==============================================

      Title:     Disk Array Controller

                 Copyright 2008-2011 LSI Logic Corporation, All Rights Reserved.

       

      Name:      RC

      Version:   07.80.51.10

      Date:      10/03/2011

      Time:      15:49:14 CDT

      Models:    4980 4981 4985 4988

      Manager:   devmgr.v1080api06.Manager

      ==============================================

       

      09/22/02-12:04:56 (tRAID): sodMain Normal sequence finished, elapsed time = 112 seconds

      09/22/02-12:04:56 (tRAID): sodMain complete

      09/22/02-12:05:06 (ProcessHandlers): WARN:  Drive 0x10000 will be marked incompatible as DSM detected error: 12

      09/22/02-12:05:06 (ProcessHandlers): ERROR: dbm: SubRecInterface::read caught DbmNoFileSystemException: recType: 84 - recType: 84, txn: 0x03b0df9c

      09/22/02-12:05:08 (utlTimer): WARN:  Extended Link Down Timeout on channel 3

      09/22/02-12:05:10 (ProcessHandlers): WARN:  DbmNoFileSystemException: recType: 35 Line 6404 File cmgrControllerMgr.cc

      09/22/02-12:05:10 (ProcessHandlers): WARN:  DbmNoFileSystemException: recType: 35 Line 5444 File cmgrControllerMgr.cc

      09/22/02-12:05:10 (ProcessHandlers): WARN:  CCM: backupStorageAvailableAlt() caught IconSendInfeasibleException Error

      09/22/02-12:05:10 (ProcessHandlers): NOTE:  SYMbol available

      09/22/02-12:05:12 (ProcessHandlers): SOD: sodComplete Notification Complete

      09/22/02-12:05:10 (PersistentRestore): NOTE:  PSTOR: PstorRecordManager::readRecord data block not found

      09/22/02-12:05:10 (PersistentRestore): WARN:  IOManager::readBackupStatus - Pstor record does not exsit

      09/22/02-12:05:10 (PersistentRestore): NOTE:  IOManager::getBackupDataSize - read to pstor failed

      09/22/02-12:05:10 (PersistentRestore): WARN:  ddcDq & ddcTrace restore abandoned: nothing to recover

      09/22/02-12:05:10 (PersistentRestore): WARN:  PSTOR: PstorRecordMgr: removeRecord failed

      09/22/02-12:05:10 (PersistentRestore): WARN:  IOManager::readBackupStatus - Pstor record does not exist

      09/22/02-12:05:10 (PersistentRestore): NOTE:  DDC Restore Failed

       

      09/22/02-12:05:10 (PersistentRestore): NOTE:  IOManager::restoreData - m_DataSize:0x500000, m_StartAddress:0x3b0e320

      09/22/02-12:05:10 (PersistentRestore): NOTE:  IOManager::restoreData - Successful

      09/22/02-12:05:10 (PersistentRestore): NOTE:  ncb::IOManager::restoreData - Successful

      09/22/02-12:05:10 (PersistentRestore): NOTE:  DQ Restore Completed

      09/22/02-12:06:10 (ProcessEvents): WARN:  RAIDVolumeManager::updateAltMountStates, caught IconSendInfeasibleException Error

      09/22/02-12:18:25 (utlTimer): NOTE:  fcnChannelReport ==>  0 +1  2 =3

      09/22/02-12:18:32 (utlTimer): NOTE:  fcnChannelReport ==>  0 -1  2 =3

      09/22/02-12:18:35 (IOSched): NOTE:  Extended Link Down  ==> Chan 1

      09/22/02-12:18:42 (utlTimer): NOTE:  fcnChannelReport ==>  0 +1  2 =3

      09/22/02-12:21:23 (utlTimer): NOTE:  fcnChannelReport ==> +0  1  2 =3

      09/22/02-12:21:38 (utlTimer): NOTE:  fcnChannelReport ==> +0  1  2 =3

      09/22/02-12:22:37 (utlTimer): NOTE:  fcnChannelReport ==> +0  1  2 =3

      09/22/02-12:22:55 (tSubSys): NOTE:  HealthCheck: Alternate controller removal

      09/22/02-12:22:55 (tSubSys): NOTE:  HealthCheckManager: Notify Event 5 Ctl_Removed

      09/22/02-12:22:56 (utlTimer): NOTE:  fcnChannelReport ==> -0  1  2 =3

      09/22/02-12:23:00 (IOSched): NOTE:  Extended Link Down  ==> Chan 0

      09/22/02-12:23:01 (utlTimer): NOTE:  fcnChannelReport ==> =0  1  2 =3

      09/22/02-12:23:24 (IOSched): WARN:  Extended Link Down is over on channel 0 - lasted 29 seconds

      09/22/02-12:23:25 (utlTimer): NOTE:  fcnChannelReport ==> +0  1  2 =3
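
A log this long is easier to read once it is tallied. The following is a minimal Python sketch, not a vendor tool: it assumes the `MM/DD/YY-HH:MM:SS (task): LEVEL: message` line shape seen above, and the names `parse_line` and `summarize` are made up for illustration. It counts entries per level and groups the `DbmNoFileSystemException` hits by `recType`.

```python
import re
from collections import Counter

# Assumed line shape, inferred only from the log pasted above:
#   MM/DD/YY-HH:MM:SS (task): LEVEL: message
LINE_RE = re.compile(
    r"^\s*(?P<ts>\d{1,2}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}) "
    r"\((?P<task>[^)]+)\): "
    r"(?P<level>[A-Z]+):\s*(?P<msg>.*)$"
)
RECTYPE_RE = re.compile(r"DbmNoFileSystemException: recType: (\d+)")


def parse_line(line):
    """Return a dict for a timestamped, levelled log line, else None."""
    m = LINE_RE.match(line)
    return m.groupdict() if m else None


def summarize(lines):
    """Count entries per level and DbmNoFileSystemException hits per recType."""
    levels, rectypes = Counter(), Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # banner text, continuation lines, lines without a level
        levels[entry["level"]] += 1
        m = RECTYPE_RE.search(entry["msg"])
        if m:
            rectypes[int(m.group(1))] += 1
    return levels, rectypes


# A few lines copied from the log above.
sample = [
    "09/22/02-12:03:37 (tRAID): WARN:  Failed to open Inter-Controller Communication Channels!",
    "09/22/02-12:04:46 (tRAID): WARN:  DbmNoFileSystemException: recType: 59 Line 1819 File cmgrControllerMgr.cc",
    "09/22/02-12:04:46 (tRAID): WARN:  DbmNoFileSystemException: recType: 59 Line 1819 File cmgrControllerMgr.cc",
    "09/22/02-12:04:47 (tRAID): ERROR: Caught DbmNoFileSystemException: recType: 95 in initialize",
    "09/22/02-12:04:48 (tRAID): NOTE:  fcn: Peering Disabled (Alt Unavailable)",
    "                                Reboot reason: 0x6",
]

levels, rectypes = summarize(sample)
print(dict(levels))
print(dict(rectypes))
```

Run over the full log, a tally like this makes it obvious that almost every subsystem fails with the same missing-file-system exception, which points at one root cause rather than many independent faults.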

        • 1. Re: Failed to open Inter-Controller Communication Channels! Stk6000
          McW (Oracle)

          This is, unfortunately, the expected result of doing the replacement with the unit powered off.

           

          The array is designed so that components (especially a controller) are replaced while the unit is operating. The redundant (surviving) controller can then detect the new component and do the needful, such as syncing the firmware!

           

          In your case, the unit was powered off and a controller with a different firmware was introduced, so when the unit was powered on, the "new" controller happened to take control. The old controller may be in a lockdown state now, and the new controller figured out that none of the disks inside this array belonged to it to begin with!
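
The firmware mismatch is visible from the version strings alone. Below is a small Python sketch of that comparison. It assumes, purely for illustration, that the first two dotted fields identify the release level; the helper names are hypothetical and this is not how the controllers themselves decide whether to sync.

```python
def firmware_prefix(version, fields=2):
    """Return the first `fields` numeric components of a dotted version.

    Elided components such as "x" are not numeric, so the function
    refuses to read past what is actually known.
    """
    parts = version.split(".")[:fields]
    if len(parts) < fields or not all(p.isdigit() for p in parts):
        raise ValueError(f"need {fields} numeric fields in {version!r}")
    return tuple(int(p) for p in parts)


def same_release(a, b, fields=2):
    """Assumption: two controllers match if their leading fields agree."""
    return firmware_prefix(a, fields) == firmware_prefix(b, fields)


controller_a = "7.60.x.x"  # as reported in the first post
controller_b = "7.80.x.x"  # as reported in the first post

print(same_release(controller_a, controller_b))  # False
```

With the array running, the surviving controller would normally handle this difference itself; the check is only a quick way to spot the situation before swapping parts.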

           

          You might try to power off the unit, pull out the new controller, and power on with the original good controller. Use your management software (CAM / SANtricity) to see if the array is back; if not, your controller might be in a lockdown state (look at the 7-segment display) due to the previous event. If you get your array back online, then follow the management software's service advisor to replace the controller. If not, you probably need help from support service to get out of the lockdown state, because customer access to the serial port is not supported.

           

          I see that you have already taken this question to MOS Community,  which is actively monitored by a lot more experts,   so let's continue from there...

          • 2. Re: Failed to open Inter-Controller Communication Channels! Stk6000
            tien86

            Hi McW,

             

            Is there any way to use only this new controller B, such as resetting the array to factory defaults?

            • 3. Re: Failed to open Inter-Controller Communication Channels! Stk6000
              McW (Oracle)

              If you ask a support engineer, possibly yes (close to, but never quite, a real factory setting). If you are thinking of doing this yourself, the standard answer has to be "not supported". It is probably easier to get the original good controller A working again; wiping the disks to adapt to a new controller with a new firmware version is likely more tedious.

               

              Each disk in the array has an area set aside for configuration info and, among many other things, identifiers are written into it, which is how the controller avoids accidentally corrupting user data. There is a supported feature to "import" disks with data volumes intact from one array to another, but in a controlled manner, meaning you need to "export" them first, i.e. using the original controller. The controller firmware is pretty smart about preventing someone from randomly pulling a disk out of one array and plugging it into another.
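
As a rough mental model of that check, here is a toy Python sketch. Everything in it is hypothetical (the `Disk` type, the `array_id` field, and the three labels); it is not the firmware's actual data layout or logic, only an illustration of the idea that an identifier stored on the disk decides how a controller treats it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disk:
    slot: int
    # Hypothetical stand-in for the identifiers kept in the disk's
    # configuration area. None models a disk never used in any array.
    array_id: Optional[str]


def classify(disk, my_array_id):
    """Toy rule: decide how a controller would treat a disk."""
    if disk.array_id is None:
        return "blank"    # nothing written yet, free to adopt
    if disk.array_id == my_array_id:
        return "native"   # belongs to this array
    return "foreign"      # marked by another array, left untouched


disks = [Disk(0, None), Disk(1, "array-1"), Disk(2, "array-2")]
for d in disks:
    print(d.slot, classify(d, "array-1"))
```

In this model, an export followed by an import is the controlled way to turn a "foreign" disk into a "native" one without losing the volumes on it.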

               

              With your new controller B,  you transplanted half a brain into another body while the other half was asleep,  so it is rejecting all the parts :-(.

              • 4. Re: Failed to open Inter-Controller Communication Channels! Stk6000
                tien86

                Today I tried removing all disks, inserting some new disks, removing controller A, and installing only the new controller B, aiming to build a totally new storage with the new controller.

                 

                Then I went to Administration > reset storage to default factory configuration, but got an error.

                 

                I found another discussion at https://forums.oracle.com/message/10531531

                in which both controllers are moved to another storage, one by one, to synchronize with an existing controller.

                • 5. Re: Failed to open Inter-Controller Communication Channels! Stk6000
                  McW (Oracle)

                  Being an employee here, I really shouldn't encourage "unsupported" operations; they are unsupported for good reasons...

                  I think CAM's option says "Reset Configuration"? That's not quite "factory", and it expects a normal working array, and I think you figured that out by getting the error.

                  And the "other thread" talks about splitting a duplex 2500 back into 2 simplex 2500s; again, only the "upgrade" from simplex to duplex is supported.

                  So... if you want a new storage... buy one... (Remember, I'm an employee... just joking.)

                   

                  You need "real" new (as in virgin) disks.   Power everything off,  load the controller,  load your virgin disks,  then power on.  It will look brand new.  

                  If it doesn't, then your disks are not as virgin as you thought and have been inserted in an array before, and the controller has left its marks already.

                   

                  (Disclaimer: All words used are absolutely computer technical in nature and draw no human physiological analogy. I fear the wrath of censorship... )

                  • 6. Re: Failed to open Inter-Controller Communication Channels! Stk6000
                    tien86

                    Hi,

                     

                    Today, following your suggestion, I was able to bring up the storage with the new controller. The virgin disks really saved my life. The system with 4 other ESXi servers is running now. I really appreciate your help.

                     

                    But I still have to find a workaround for the locked-down controller, since it did not sync with the new controller.

                    • 7. Re: Failed to open Inter-Controller Communication Channels! Stk6000
                      McW (Oracle)

                      Now that your array is running like a new array with a newer firmware version,  have you tried plugging in the original old working controller into the empty slot WITH THE ARRAY RUNNING?

                       

                      The result, however, is a bit difficult to predict: sync might happen, or not. When engineering tests a replacement or upgrade procedure, certain assumptions have to be made to narrow things down to reasonable test scenarios, such as which fw versions will be in spare stock and the cross-sync compatibility between fw versions.

                       

                      Because of the unsupported nature of what already happened in your array, your best bet is to get another new controller and plug it in.

                       

                      Btw,  single controller in a 6180 is also not supported,  there must be 2.

                       

                      Good luck.

                      • 8. Re: Failed to open Inter-Controller Communication Channels! Stk6000
                        tien86

                        Hi,

                         

                        I tried inserting the locked-down controller and manually placing it offline and then online. It showed OK in CAM for a few minutes before going back to ERROR.

                         

                        I feel this controller is now in an ERROR state rather than locked down as before. Perhaps we should ask for a replacement, because there is nothing we can do to the hardware ourselves. This is the log file:

                         

                        09/29/02-07:37:15 (symTask2): NOTE:  setControllerToFailed_1: Failing alternate controller

                        09/29/02-07:37:25 (symTask1): NOTE:  setControllerToOptimal_1: Setting alternate to optimal

                        09/29/02-07:37:25 (symTask1): NOTE:  buc controllerAltStateChanged

                        09/29/02-07:37:25 (symTask1): NOTE:  releasing alt ctl from reset

                        09/29/02-07:37:26 (tHckReset): NOTE:  HealthCheck: Alt Ctl: 1 Reset_Failure, state: 0 Start

                        09/29/02-07:37:28 (ccmEventTask): NOTE:  vdm::syncRequired(): Begin

                        09/29/02-07:37:28 (ccmEventTask): NOTE:  vdm::syncRequired(): Complete, elapsed time = 0 seconds

                        09/29/02-07:37:29 (ccmEventTask): WARN:  CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error

                        09/29/02-07:37:29 (ccmEventTask): WARN:  CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error

                        09/29/02-07:38:26 (tHckReset): NOTE:  HealthCheck: Alt Ctl: 4 Norun_Failure, state: 0 Start

                        09/29/02-07:38:26 (tHckReset): NOTE:  HealthCheckManager: Notify Event 6 Ctl_Not_Running

                        09/29/02-07:38:26 (cmgrEvent): WARN:  Alt Ctl Reboot:

                                                        Reboot CompID: 0x407

                                                        Reboot reason: 0x6

                                                        Reboot reason extra: 0x0

                        09/29/02-07:38:26 (cmgrEvent): NOTE:  holding alt ctl in reset

                        09/29/02-07:38:27 (cmgrEvent): NOTE:  HealthCheck: Alt Ctl: 1 Reset_Failure, state: 0 Start

                        09/29/02-07:38:28 (utlTimer): NOTE:  fcnChannelReport ==>  0 +1  2 =3

                        09/29/02-07:38:33 (ccmEventTask): WARN:  CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error

                        09/29/02-07:38:34 (ccmEventTask): WARN:  CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error
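
The log above shows the alternate controller being released from reset and then held in reset again shortly afterwards. Here is a minimal Python sketch for measuring that interval. It assumes the timestamp format seen in the log, and `reset_cycles` is a made-up helper name; it only pairs the two messages and does not interpret what the firmware was doing in between.

```python
from datetime import datetime

# Timestamp format as it appears in the pasted log: MM/DD/YY-HH:MM:SS
TS_FORMAT = "%m/%d/%y-%H:%M:%S"


def timestamp(line):
    """Parse the leading timestamp of a log line."""
    return datetime.strptime(line.split()[0], TS_FORMAT)


def reset_cycles(lines):
    """Pair each 'releasing alt ctl from reset' with the next
    'holding alt ctl in reset' and return the elapsed seconds."""
    cycles, released_at = [], None
    for line in lines:
        if "releasing alt ctl from reset" in line:
            released_at = timestamp(line)
        elif "holding alt ctl in reset" in line and released_at is not None:
            elapsed = timestamp(line) - released_at
            cycles.append(int(elapsed.total_seconds()))
            released_at = None
    return cycles


# Lines copied from the log above.
sample = [
    "09/29/02-07:37:25 (symTask1): NOTE:  releasing alt ctl from reset",
    "09/29/02-07:38:26 (tHckReset): NOTE:  HealthCheck: Alt Ctl: 4 Norun_Failure, state: 0 Start",
    "09/29/02-07:38:26 (cmgrEvent): NOTE:  holding alt ctl in reset",
]

print(reset_cycles(sample))  # [61]
```

A hold that arrives about a minute after every release, each time with the same `Norun_Failure` health check, suggests the inserted controller never comes up far enough to answer its peer.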