1 reply. Latest reply: Mar 1, 2013 9:50 AM by JimKlimov

    dsadm reports running servers as stopped

      We recently migrated an old setup onto a somewhat newer platform and found some strange behavior: "dsadm info" reports both the "dsins1" and "ads" instances as stopped, even though they are running. This happens whether or not the instances are integrated into SMF, and in both 32-bit and 64-bit mode.

      Platform: DSEE 6.3.1_sec, Solaris 10u8 x86_64

      Because of this:
      1) DSCC refuses to work (it thinks ADS is down); likewise, "dsccsetup status" returns an incorrect "stopped" result (no surprise: truss shows that it calls dsadm for the status query).
      2) The SMF service refuses to start: "dsadm start" exits immediately, although it does spawn a working LDAP instance. It is, however, possible to make SMF work properly by configuring it to call $INSTANCE_PATH/start-slapd and stop-slapd as the SMF methods.
      3) Subsequent calls to "dsadm start" while the server is running don't detect the running instance, and the program removes the "invalid" pid file from the instance's logs directory. This makes further stops and restarts trickier.

      I also found that the start/stop scripts rely too much on pid files and don't inspect SMF_FMRI (or the service instance named in $INSTANCE_PATH/config/state), so missing pid files lead to misdiagnosed errors with the default scripts. That is not difficult to fix, however...
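The pid-file problem above can be illustrated with a minimal Python sketch of a status check that treats a missing pid file as "unknown" rather than "stopped", and only calls a pid file stale when no process with that pid exists. This is not how dsadm or the DSEE scripts actually work; the function and status names are hypothetical, and the liveness probe assumes the POSIX convention that sending signal 0 only checks whether the target process exists.

```python
import os
from enum import Enum


class PidStatus(Enum):
    NO_PIDFILE = "no pid file"      # file absent: state unknown, not proof of "stopped"
    BAD_PIDFILE = "unreadable pid"  # file present but does not hold a valid pid
    STALE = "stale"                 # pid recorded, but no such process exists
    RUNNING = "running"             # some process with that pid exists


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid exists (POSIX signal-0 probe)."""
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False     # ESRCH: no such process
    except PermissionError:
        return True      # EPERM: process exists but belongs to another user
    return True


def check_pidfile(path: str) -> PidStatus:
    """Classify a pid file instead of deleting it on the first failed check."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except FileNotFoundError:
        return PidStatus.NO_PIDFILE
    try:
        pid = int(text.split()[0])
    except (ValueError, IndexError):
        return PidStatus.BAD_PIDFILE
    if pid <= 0:
        # kill() with pid <= 0 would target process groups, so reject it here
        return PidStatus.BAD_PIDFILE
    return PidStatus.RUNNING if pid_alive(pid) else PidStatus.STALE
```

Note that RUNNING only means that some process currently holds that pid; since pids can be reused, a robust script would also confirm that it is the directory server process, or query the SMF service instance state instead, as suggested above.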

      Access rights look normal compared to other similar systems. The old installation (cloned onto the new VM and live-updated) did not show this behaviour...

      So far I have failed to get anything useful out of truss... any ideas, short of reinstallation (which I'll try soon in another zone on the same box anyway)?

      //Jim Klimov

      Edited by: JimKlimov on Feb 14, 2013 11:36 PM
        • 1. Re: dsadm reports running servers as stopped
          It seems the problem arose because the server was installed in a 32-bit VM and then migrated onto a newer host capable of running 64-bit VMs, so Solaris switched to 64-bit mode. While there were no actual problems with the database files, the source architecture seems to have been recorded there and to have confused some verifier routines?

          I verified that a newly created DS instance works as expected. Its start/stop scripts referenced "amd64" in their paths, but changing the original dsins scripts accordingly did not help.
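To check which flavour of binary a start/stop script actually points at (for instance, a path with or without "amd64"), a small Python sketch can read the ELF identification header, where byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects. This is only a diagnostic aid, not a fix (as noted above, pointing the old scripts at the amd64 paths did not help), and the function name is hypothetical.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # offset of the class byte within e_ident
ELF_CLASSES = {1: "32-bit (ELFCLASS32)", 2: "64-bit (ELFCLASS64)"}


def elf_class(path: str) -> str:
    """Report whether the file at `path` is a 32-bit or 64-bit ELF object."""
    with open(path, "rb") as f:
        ident = f.read(16)  # e_ident is the first 16 bytes of an ELF file
    if len(ident) <= EI_CLASS or ident[:4] != ELF_MAGIC:
        return "not an ELF file"  # e.g. a shell wrapper script
    return ELF_CLASSES.get(ident[EI_CLASS], "unknown ELF class")
```

Call it as `print(elf_class("/path/to/binary"))` on each executable named in an instance's start-slapd script; a shell-script wrapper reports "not an ELF file", so follow it to the binary it execs.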

          Ultimately, the data and configuration were exported from the original instance, which was then destroyed. A new instance was created in its place, the data was reimported, replication was reconfigured, and customizations (plugins, ACLs) were reapplied; everything now works as it should. Rinse and repeat for the siblings in the MMR cluster.