8 Replies Latest reply: Oct 14, 2013 9:44 PM by tien86

Failed to open Inter-Controller Communication Channels! Stk6000

tien86 Newbie

Hi pros,

I ran into a problem while replacing a controller. My 6180 storage array has two controllers: A (OK) and B (failed). I powered off the array and then replaced controller B with a new controller.

Now I cannot access controller A through the serial console. Controller A firmware: 7.60.x.x

I can access controller B through the serial console and CAM, which show all drives as incompatible and unassigned, with no volumes and no virtual disks. Controller B firmware: 7.80.x.x

Controller B shows the message: Failed to open Inter-Controller Communication Channels!

I searched Google and MOS but have not found a solution to this problem. I would appreciate any help.

This is the log from controller B's serial console:

09/22/02-12:03:37 (tRAID): WARN:  Failed to open Inter-Controller Communication Channels!

09/22/02-12:03:37 (tRAID): NOTE:  LockMgr Role is Master

09/22/02-12:03:38 (utlTimer): NOTE:  fcnChannelReport ==>  0  1

09/22/02-12:03:40 (tHckReset): NOTE:  HealthCheck: Alt Ctl: 1 Reset_Failure, state: 0 Start

0x8f9084 (tNetCfgInit): miiPhyInit check cable connection

09/22/02-12:03:49 (tNetCfgInit): NOTE:  eth1: LinkDown event

09/22/02-12:03:49 (tNetCfgInit): NOTE:  Network Ready

09/22/02-12:04:07 (tRAID): NOTE:  WWN baseName 00040080-e5185e0e (valid==>SigMatch)

09/22/02-12:04:07 (tRAID): NOTE:  spmEarlyData: No data available

09/22/02-12:04:08 (tRAID): SOD: Pre-Initialization Phase Complete

09/22/02-12:04:09 (utlTimer): NOTE:  fcnChannelReport ==>  0  1 ~2 -3

09/22/02-12:04:14 (utlTimer): NOTE:  fcnChannelReport ==>  0  1 ~2 =3

09/22/02-12:04:39 (utlTimer): NOTE:  fcnChannelReport ==>  0  1  2 =3

09/22/02-12:04:40 (tHckReset): NOTE:  HealthCheck: Alt Ctl: 4 Norun_Failure, state: 0 Start

09/22/02-12:04:40 (tHckReset): NOTE:  HealthCheckManager: Notify Event 6 Ctl_Not_Running

09/22/02-12:04:40 (cmgrEvent): WARN:  Alt Ctl Reboot:

                                Reboot CompID: 0x407

                                Reboot reason: 0x6

                                Reboot reason extra: 0x0

09/22/02-12:04:40 (cmgrEvent): NOTE:  holding alt ctl in reset

09/22/02-12:04:40 (cmgrEvent): NOTE:  HealthCheck: Alt Ctl: 1 Reset_Failure, state: 0 Start

09/22/02-12:04:42 (utlTimer): NOTE:  fcnChannelReport ==> +0  1  2 =3

09/22/02-12:04:43 (utlTimer): NOTE:  fcnChannelReport ==>  0  1  2 =3

09/22/02-12:04:46 (tRAID): WARN:  pcm::CapabilityManager::validateAdoption exception  (DbmWriteException: recType: 85, status: -12)

09/22/02-12:04:46 (tRAID): WARN:  dbm::RWFileSystem::initialize: Exception caught, ConstructorIOException: -16

09/22/02-12:04:46 (tRAID): ERROR: In PersistenceManager::initialize: catch DbmNoFileSystemException: recType: 84

09/22/02-12:04:46 (tRAID): ERROR: ADM Load Reservations failed with error (5) Exception

09/22/02-12:04:46 (tRAID): NOTE:  ACS: Icon ping to alternate failed: -2, resp: 0

09/22/02-12:04:46 (tRAID): NOTE:  ACS: autoCodeSync(): Process start. Comm Mode: 0, Status: 0

09/22/02-12:04:46 (tRAID): WARN:  ACS: autoCodeSync(): Skipped since alt not communicating.

09/22/02-12:04:46 (tRAID): WARN:  DbmNoFileSystemException: recType: 59 Line 1819 File cmgrControllerMgr.cc

09/22/02-12:04:46 (tRAID): WARN:  DbmNoFileSystemException: recType: 59 Line 1819 File cmgrControllerMgr.cc

09/22/02-12:04:46 (tRAID): SOD: Code Synchronization Initialization Phase Complete

09/22/02-12:04:46 (sntpEvent): NOTE:  sntpEventHandler: VNI_GET_SYNC_TIME failed

09/22/02-12:04:47 (tRAID): WARN:  USM  Exception caught in processUsmHeader - DbmNoFileSystemException: recType: 30

09/22/02-12:04:47 (tRAID): WARN:  USM  Error allocating UsmStableStorageHeader in processUsmHeader() - DbmNoFileSystemException: recType: 30

09/22/02-12:04:47 (tRAID): NOTE:  SOD failure in evf::VolumeCfgManager::initialize

09/22/02-12:04:47 (tRAID): NOTE:  DbmNoFileSystemException in evf::VolExtentMgr::initialize

09/22/02-12:04:47 (tRAID): NOTE:  DbmNoFileSystemException in safe::initialize

09/22/02-12:04:47 (tRAID): ERROR: Caught DbmNoFileSystemException: recType: 95 in initialize

09/22/02-12:04:47 (tRAID): WARN:  edrSOD: No Config File System

09/22/02-12:04:48 (tRAID): WARN:  snrProcessDatabase: No File System Found

09/22/02-12:04:48 (tRAID): WARN:  spm::SPMManager::initialize NoFileSystem

09/22/02-12:04:48 (tRAID): NOTE:  fcn: Peering Disabled (Alt Unavailable)

09/22/02-12:04:48 (tRAID): NOTE:  ion: Peering Disabled (Alt Unavailable)

09/22/02-12:04:48 (tRAID): WARN:  Error in updating in memory dq cfg!

09/22/02-12:04:48 (tRAID): WARN:  Caught DbmNoFileSystemException: recType: 65 in fbm::initialize

09/22/02-12:04:48 (tRAID): WARN:  MelMgr::initialize FAILED:  Exception  Line 2197 File mlmMelMgr.cc

09/22/02-12:04:48 (tRAID): WARN:  readDatabaseSyslogConfig:  Caught Exception DbmNoFileSystemException: recType: 74 Line 420 File mlmSyslogMgr.cc

09/22/02-12:04:48 (tRAID): WARN:  Syslog database access failed:  Exception at Line 324 File mlmSyslogMgr.cc

09/22/02-12:04:49 (tRAID): WARN:  Unable to intialize mirror device

09/22/02-12:04:49 (tRAID): NOTE:  CacheMgr::cacheOpenMirrorDevice:: mirror device 0xfffffff

09/22/02-12:04:49 (tRAID): WARN:  CacheStore::read(,): Exception DbmNoFileSystemException: recType: 24

09/22/02-12:04:49 (tRAID): WARN:  CCM: readAndValidateCacheStore() caught CacheStoreDataException

09/22/02-12:04:49 (tRAID): NOTE:  CCM: readAndValidateCacheStore() recovering

09/22/02-12:04:49 (tRAID): NOTE:  CCM: readAndValidateCacheStore() partitioning for no mirroring

09/22/02-12:04:49 (tRAID): WARN:  CacheStoreExt::read(,): Exception DbmNoFileSystemException: recType: 82

09/22/02-12:04:49 (tRAID): WARN:  CCM: readAndValidateCacheStore() cacheStoreExt read exception

09/22/02-12:04:49 (tRAID): NOTE:  CCM: readAndValidateCacheStore() initializes with default values

09/22/02-12:04:49 (tRAID): NOTE:  CCM:  Changing default demand cache flush values

09/22/02-12:04:49 (tRAID): NOTE:  CCM: validateCacheMem() cache memory is invalid

09/22/02-12:04:49 (tRAID): NOTE:  CCM: validateCacheMem() Initializing my partition

09/22/02-12:04:50 (tRAID): NOTE:  CacheStore::write(): Exception DbmNoFileSystemException: recType: 24

09/22/02-12:04:50 (tRAID): WARN:  CCM: initialize() caught exception(2) CacheStoreDataException

09/22/02-12:04:50 (tRAID): NOTE:  CCM: sodClearMOSIntentsAlt(), failure clearing MOS intents on alt

09/22/02-12:04:52 (tRAID): WARN:  Exception caught in presMgrInit - DbmNoFileSystemException: recType: 12

09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::CrushDriveManager::initialize

09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::DriveManager::initialize

09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::VolumeGroupManager::initialize

09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::DacstoreVolManager::initialize

09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::PieceManager::initialize

09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::CrushStripeManager::initialize

09/22/02-12:04:52 (tRAID): NOTE:  DbmNoFileSystemException in vdm::exop::ExclOpManager::initialize

09/22/02-12:04:52 (tRAID): WARN:  CCM: initComplete() - isRestoreInProgressAlt caught IconSendInfeasibleException Error

09/22/02-12:04:52 (tRAID): NOTE:  CacheStore::write(): Exception DbmNoFileSystemException: recType: 24

09/22/02-12:04:52 (tRAID): WARN:  CCM: initComplete() caught(2) CacheStoreDataException

09/22/02-12:04:53 (tRAID): NOTE:  DiagVolManager::initialize: Exception - Alt controller not ready

09/22/02-12:04:53 (tRAID): ERROR: dbm: SubRecInterface::save caught DbmNoFileSystemException: recType: 58 - recType: 58, txn: 0x03b0de7c

09/22/02-12:04:53 (tRAID): ERROR: dbm: SubRecInterface::save caught DbmNoFileSystemException: recType: 58 - recType: 58, txn: 0x03b0de7c

09/22/02-12:04:53 (tRAID): WARN:  Caught DbmNoFileSystemException: recType: 57 in sam::StorageArrayManager::initialize

09/22/02-12:04:55 (tRAID): SOD: Initialization Phase Complete

09/22/02-12:04:53 (ProcessHandlers): WARN:  Exception caught in WWNStorage iosReleased: DbmNoFileSystemException: recType: 9

09/22/02-12:04:53 (ProcessHandlers): WARN:  dbm::WWNStorage::iosReleased() FAILED.

==============================================

Title:     Disk Array Controller

           Copyright 2008-2011 LSI Logic Corporation, All Rights Reserved.

 

Name:      RC

Version:   07.80.51.10

Date:      10/03/2011

Time:      15:49:14 CDT

Models:    4980 4981 4985 4988

Manager:   devmgr.v1080api06.Manager

==============================================

 

09/22/02-12:04:56 (tRAID): sodMain Normal sequence finished, elapsed time = 112 seconds

09/22/02-12:04:56 (tRAID): sodMain complete

09/22/02-12:05:06 (ProcessHandlers): WARN:  Drive 0x10000 will be marked incompatible as DSM detected error: 12

09/22/02-12:05:06 (ProcessHandlers): ERROR: dbm: SubRecInterface::read caught DbmNoFileSystemException: recType: 84 - recType: 84, txn: 0x03b0df9c

09/22/02-12:05:08 (utlTimer): WARN:  Extended Link Down Timeout on channel 3

09/22/02-12:05:10 (ProcessHandlers): WARN:  DbmNoFileSystemException: recType: 35 Line 6404 File cmgrControllerMgr.cc

09/22/02-12:05:10 (ProcessHandlers): WARN:  DbmNoFileSystemException: recType: 35 Line 5444 File cmgrControllerMgr.cc

09/22/02-12:05:10 (ProcessHandlers): WARN:  CCM: backupStorageAvailableAlt() caught IconSendInfeasibleException Error

09/22/02-12:05:10 (ProcessHandlers): NOTE:  SYMbol available

09/22/02-12:05:12 (ProcessHandlers): SOD: sodComplete Notification Complete

09/22/02-12:05:10 (PersistentRestore): NOTE:  PSTOR: PstorRecordManager::readRecord data block not found

09/22/02-12:05:10 (PersistentRestore): WARN:  IOManager::readBackupStatus - Pstor record does not exsit

09/22/02-12:05:10 (PersistentRestore): NOTE:  IOManager::getBackupDataSize - read to pstor failed

09/22/02-12:05:10 (PersistentRestore): WARN:  ddcDq & ddcTrace restore abandoned: nothing to recover

09/22/02-12:05:10 (PersistentRestore): WARN:  PSTOR: PstorRecordMgr: removeRecord failed

09/22/02-12:05:10 (PersistentRestore): WARN:  IOManager::readBackupStatus - Pstor record does not exist

09/22/02-12:05:10 (PersistentRestore): NOTE:  DDC Restore Failed

 

09/22/02-12:05:10 (PersistentRestore): NOTE:  IOManager::restoreData - m_DataSize:0x500000, m_StartAddress:0x3b0e320

09/22/02-12:05:10 (PersistentRestore): NOTE:  IOManager::restoreData - Successful

09/22/02-12:05:10 (PersistentRestore): NOTE:  ncb::IOManager::restoreData - Successful

09/22/02-12:05:10 (PersistentRestore): NOTE:  DQ Restore Completed

09/22/02-12:06:10 (ProcessEvents): WARN:  RAIDVolumeManager::updateAltMountStates, caught IconSendInfeasibleException Error

09/22/02-12:18:25 (utlTimer): NOTE:  fcnChannelReport ==>  0 +1  2 =3

09/22/02-12:18:32 (utlTimer): NOTE:  fcnChannelReport ==>  0 -1  2 =3

09/22/02-12:18:35 (IOSched): NOTE:  Extended Link Down  ==> Chan 1

09/22/02-12:18:42 (utlTimer): NOTE:  fcnChannelReport ==>  0 +1  2 =3

09/22/02-12:21:23 (utlTimer): NOTE:  fcnChannelReport ==> +0  1  2 =3

09/22/02-12:21:38 (utlTimer): NOTE:  fcnChannelReport ==> +0  1  2 =3

09/22/02-12:22:37 (utlTimer): NOTE:  fcnChannelReport ==> +0  1  2 =3

09/22/02-12:22:55 (tSubSys): NOTE:  HealthCheck: Alternate controller removal

09/22/02-12:22:55 (tSubSys): NOTE:  HealthCheckManager: Notify Event 5 Ctl_Removed

09/22/02-12:22:56 (utlTimer): NOTE:  fcnChannelReport ==> -0  1  2 =3

09/22/02-12:23:00 (IOSched): NOTE:  Extended Link Down  ==> Chan 0

09/22/02-12:23:01 (utlTimer): NOTE:  fcnChannelReport ==> =0  1  2 =3

09/22/02-12:23:24 (IOSched): WARN:  Extended Link Down is over on channel 0 - lasted 29 seconds

09/22/02-12:23:25 (utlTimer): NOTE:  fcnChannelReport ==> +0  1  2 =3
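
For anyone triaging a similar boot log, a short script can summarise a dump like the one above. This is only a minimal sketch, not an Oracle or LSI tool: the line format is inferred solely from the lines posted here, and the function name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Line shape inferred from the log posted above (an assumption, not a spec):
#   MM/DD/YY-HH:MM:SS (task): LEVEL: message
LINE_RE = re.compile(
    r"^\s*(\d{1,2}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}) \((\w+)\): "
    r"(WARN|NOTE|ERROR|SOD):\s*(.*)$"
)
REC_TYPE_RE = re.compile(r"DbmNoFileSystemException.*?recType: (\d+)")


def summarize(log_text):
    """Count lines per severity and collect recType values tied to DbmNoFileSystemException."""
    levels = Counter()
    rec_types = set()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # banner, blank, continuation, or level-less lines
        _, _, level, message = match.groups()
        levels[level] += 1
        hit = REC_TYPE_RE.search(message)
        if hit:
            rec_types.add(int(hit.group(1)))
    return levels, sorted(rec_types)
```

Passing the whole dump to `summarize` gives a per-severity count and the list of record types that could not be read, which is easier to compare between boots than the raw text.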

  • 1. Re: Failed to open Inter-Controller Communication Channels! Stk6000
    McW (Oracle) Explorer

    Unfortunately, this is the expected result of doing the replacement with the unit powered off.

    The array is designed so that components (especially controllers) are replaced while the unit is operating. The redundant (surviving) controller can then detect the new component and do whatever is needed, such as syncing the firmware.

    In your case, the unit was powered off and a controller with different firmware was introduced. When the unit was powered on, the "new" controller happened to take control. The old controller may now be in a lockdown state, and the new controller concluded that none of the disks in this array belong to it.

    You might try powering off the unit, pulling out the new controller, and powering on with the original good controller. Use your management software (CAM / SANtricity) to see whether the array is back. If it is not, your controller might be in a lockdown state (look at the 7-segment display) because of the previous event. If you get your array back online, follow the management software's Service Advisor to replace the controller. If not, you will probably need help from support to get out of the lockdown state, because customer access to the serial port is not supported.

    I see that you have already taken this question to the MOS Community, which is actively monitored by many more experts, so let's continue from there...

  • 2. Re: Failed to open Inter-Controller Communication Channels! Stk6000
    tien86 Newbie

    Hi McW,

     

    Is there any way to use only this new controller B, such as by resetting the array to factory defaults?

  • 3. Re: Failed to open Inter-Controller Communication Channels! Stk6000
    McW (Oracle) Explorer

    If you ask a support engineer, possibly yes (close to, but never quite, a real factory setting). If you are thinking of doing this yourself, the standard answer has to be "not supported". It is probably easier to get the original good controller A working again; wiping the disks to adapt them to a new controller with a new firmware version is likely more tedious.

    Each disk in the array has an area set aside for configuration information, and among many other things, identifiers are written into it. This is how the controller avoids accidentally corrupting user data. There is a supported feature to "import" disks with data volumes intact from one array into another, but only in a controlled manner, meaning you need to "export" them first, i.e. using the original controller. The controller firmware is pretty smart about preventing someone from randomly pulling a disk out of one array and plugging it into another.

    With your new controller B, you transplanted half a brain into another body while the other half was asleep, so it is rejecting all the parts :-(.
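
The configuration-area check described above can be pictured with a toy model. This is purely illustrative and is not the actual firmware logic: the `Drive` class, the `array_id` field, and the `classify` function are all invented for this sketch, and the real on-disk format is not documented in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Drive:
    slot: int
    # Hypothetical identifier stored in the drive's reserved configuration area.
    # None models a drive that has never been used in any array.
    array_id: Optional[str] = None


def classify(drive, controller_array_id):
    """Return how a controller would treat a drive in this toy model."""
    if drive.array_id is None:
        return "unassigned"    # blank drive, free to be used
    if drive.array_id == controller_array_id:
        return "member"        # identifier matches this array
    return "incompatible"      # carries another array's identifier
```

In this model, a drive written by the old array looks "incompatible" to a controller that does not share its identifier, while a never-used drive is simply "unassigned".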

  • 4. Re: Failed to open Inter-Controller Communication Channels! Stk6000
    tien86 Newbie

    Today I tried removing all the disks, inserting some new disks, removing controller A, and installing only the new controller B, aiming to build a completely new storage array around the new controller.

    Then I went to Administration > reset storage to default factory configuration, but got an error.

    I found another discussion at https://forums.oracle.com/message/10531531
    in which both controllers are moved to another storage array, one at a time, to synchronize with an existing controller.

  • 5. Re: Failed to open Inter-Controller Communication Channels! Stk6000
    McW (Oracle) Explorer

    Being an employee here, I really shouldn't encourage "unsupported" operations; they are unsupported for good reasons...

    I think CAM's option says "Reset Configuration"? That's not quite "factory", and it expects a normally working array, which I think you figured out by getting the error.

    And the "other thread" talks about splitting a duplex 2500 back into two simplex 2500s; again, only the "upgrade" from simplex to duplex is supported.

    So... if you want a new storage array... buy one... (Remember, I'm an employee... just joking.)

    You need "real" new (as in virgin) disks. Power everything off, load the controller, load your virgin disks, then power on. It will look brand new.

    If it doesn't, then your disks are not as virgin as you thought: they have been inserted in an array before, and the controller has already left its marks.

    (Disclaimer: All words used are absolutely computer-technical in nature and draw no human physiological analogy. I fear the wrath of censorship...)

  • 6. Re: Failed to open Inter-Controller Communication Channels! Stk6000
    tien86 Newbie

    Hi,

    Today, following your suggestion, I was able to bring up the storage with the new controller. The virginity really saved my life. The system, with 4 other ESXi servers, is running now. I really appreciate your help.

    But I still have to work around the locked-down controller, since it didn't sync with the new controller.

  • 7. Re: Failed to open Inter-Controller Communication Channels! Stk6000
    McW (Oracle) Explorer

    Now that your array is running like a new array with a newer firmware version, have you tried plugging the original old working controller into the empty slot WITH THE ARRAY RUNNING?

    The result, however, is a bit difficult to predict: a sync might happen, or not. When engineering tests a replacement or upgrade procedure, certain assumptions have to be made to limit it to a reasonable set of test scenarios, such as which firmware versions will be in spare stock and the cross-sync compatibility between firmware versions.

    Because of the unsupported nature of what has already happened in your array, your best bet is to get another new controller and plug it in.

    Btw, a single controller in a 6180 is also not supported; there must be two.

     

    Good luck.

  • 8. Re: Failed to open Inter-Controller Communication Channels! Stk6000
    tien86 Newbie

    Hi,

     

    I tried inserting the locked-down controller and manually placing it offline and then online. It showed OK in CAM for some minutes before going back to ERROR.

    I feel this controller has now become failed (ERROR) rather than being in lockdown as before. Perhaps we should ask for a replacement, because there is nothing we can do to the hardware ourselves. This is the log file.

     

    9/29/02-07:37:15 (symTask2): NOTE:  setControllerToFailed_1: Failing alternate controller

    09/29/02-07:37:25 (symTask1): NOTE:  setControllerToOptimal_1: Setting alternate to optimal

    09/29/02-07:37:25 (symTask1): NOTE:  buc controllerAltStateChanged

    09/29/02-07:37:25 (symTask1): NOTE:  releasing alt ctl from reset

    09/29/02-07:37:26 (tHckReset): NOTE:  HealthCheck: Alt Ctl: 1 Reset_Failure, state: 0 Start

    09/29/02-07:37:28 (ccmEventTask): NOTE:  vdm::syncRequired(): Begin

    09/29/02-07:37:28 (ccmEventTask): NOTE:  vdm::syncRequired(): Complete, elapsed time = 0 seconds

    09/29/02-07:37:29 (ccmEventTask): WARN:  CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error

    09/29/02-07:37:29 (ccmEventTask): WARN:  CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error

    09/29/02-07:38:26 (tHckReset): NOTE:  HealthCheck: Alt Ctl: 4 Norun_Failure, state: 0 Start

    09/29/02-07:38:26 (tHckReset): NOTE:  HealthCheckManager: Notify Event 6 Ctl_Not_Running

    09/29/02-07:38:26 (cmgrEvent): WARN:  Alt Ctl Reboot:

                                    Reboot CompID: 0x407

                                    Reboot reason: 0x6

                                    Reboot reason extra: 0x0

    09/29/02-07:38:26 (cmgrEvent): NOTE:  holding alt ctl in reset

    09/29/02-07:38:27 (cmgrEvent): NOTE:  HealthCheck: Alt Ctl: 1 Reset_Failure, state: 0 Start

    09/29/02-07:38:28 (utlTimer): NOTE:  fcnChannelReport ==>  0 +1  2 =3

    09/29/02-07:38:33 (ccmEventTask): WARN:  CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error

    09/29/02-07:38:34 (ccmEventTask): WARN:  CCM Failed to notify the alternate to update Cache Store in pstor, writeCacheStoreToPstor() caught IconSendInfeasibleException Error
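
The excerpt above shows the surviving controller releasing the alternate from reset and later holding it in reset again. A minimal sketch for measuring that interval in a longer log follows; it assumes the timestamp format seen in the lines above and matches only the two message strings that appear there. The function name `reset_cycles` is hypothetical.

```python
from datetime import datetime

# Timestamp layout as it appears in the posted log lines (assumed, not documented).
STAMP_FORMAT = "%m/%d/%y-%H:%M:%S"


def reset_cycles(log_text):
    """Return the seconds between each release of the alternate controller and the next hold."""
    cycles = []
    released_at = None
    for line in log_text.splitlines():
        stamp, _, rest = line.strip().partition(" ")
        try:
            when = datetime.strptime(stamp, STAMP_FORMAT)
        except ValueError:
            continue  # continuation lines such as "Reboot reason: 0x6"
        if "releasing alt ctl from reset" in rest:
            released_at = when
        elif "holding alt ctl in reset" in rest and released_at is not None:
            cycles.append(int((when - released_at).total_seconds()))
            released_at = None
    return cycles
```

For the excerpt posted here, the release at 07:37:25 and the hold at 07:38:26 give a single cycle of 61 seconds. A controller that repeatedly fails to start would show up as a run of similar short cycles.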
