3 Replies Latest reply: Jan 12, 2010 8:58 AM by 807567

    Trap syntax error in trap-schedule.dat alert; possibly SMC related

    807567
      Receiving many ID xxxxxx daemon.alert events every hour. The message is:
      [ID xxxxxx daemon.alert] syslog Nov 10 14:00:10 trap syntax error in trap-schedule.dat(200) at token '???'
      followed immediately (same timestamp) with:
      [ID yyyyyy daemon.alert] syslog Nov 10 14:00:10 trap *** aborting execution ***

      I couldn't find anything on the web specifically for this message, but several hits that looked similar seemed to indicate that this might be a problem in SMC. If it is not, I apologize in advance.

      System is an ldom:
      Solaris 10;
      SunOS clsol8 5.10 Generic_127127-11 sun4v sparc SUNW,T5140

      Current situation is:
      # /opt/SUNWsymon/sbin/es-validate

      This script will help you in validation of Sun (TM) Management Center.

      Validation Tool Version : 4.0
      Host name : clsol8
      Number of CPUs : 24
      Platform : SUNW,T5140
      Operating System : SunOS 5.10
      Memory size : 2048 Megabytes
      Swap space : 595480k used, 3727296k available
      JAVA VERSION : "1.5.0_14"

      Sun Management Center Production Environment Installation.

      Following layers are installed : SERVER, AGENT, CONSOLE
      Installation location : /opt/SUNWsymon
      ----
      Sun Management Center installation status:
      PRODUCT : Production Environment
      INSTALLATION STATUS : Setup.
      DATABASE SETUP : Setup.
      COMPLETELY INSTALLED PACKAGES : SUNWescom,SUNWesbui,SUNWesbuh,SUNWenesi,
      : SUNWesdb,SUNWesagt,SUNWessrv,SUNWessa,
      : SUNWesjp,SUNWesaxp,SUNWesse,SUNWesclt,
      : SUNWesjrm,SUNWmeta,SUNWenesf,SUNWesmdr,
      : SUNWesgui,SUNWesweb,SUNWessvc,SUNWesasc,
      : SUNWescix,SUNWsuagt,SUNWsusrv,SUNWesval,
      : SUNWesmc,SUNWessms,SUNWessdv,SUNWescdv,
      : SUNWlgsmc,SUNWeslac,SUNWesodbc,SUNWesmib,
      : SUNWesken,SUNWesmod,SUNWesae,SUNWesaem,
      : SUNWesmcp,SUNWesafm,SUNWescon,SUNWsucon,
      : SUNWescli,SUNWesclb

      <NOTE: I had to cut out a bunch of installed components to hit character limit>
      ----
      Sun Management Center Add-Ons and Versions:
      PRODUCT VERSION
      ----
      Production Environment 4.0
      Advanced System Monitoring 4.0_Build15
      Sun Fire Entry-Level Midrange S 3.5-v6
      Service Availability Manager 4.0_Build15
      Performance Reporting Manager 4.0_Build15
      Solaris Container Manager 4.0_Build15
      Sun Fire Midrange Systems Platf 3.5-v6
      System Reliability Manager 4.0_Build15
      Sun Management Center Integrati 4.0_Build15
      Workgroup Server 3.6
      Generic X86/X64 Config Reader 4.0_Build15

      Sun Management Center Patch installation details:
      No Sun Management Center patch is installed.
      --
      Sun Management Center disk-space consumption:
      ---
      PRODUCT APPROXIMATE DISK SPACE CONSUMED
      ---
      Production Environment : 54452 kB
      Advanced System Monitoring : 2391 kB
      Sun Fire Entry-Level Midrange S : 1738 kB
      Service Availability Manager : 1838 kB
      Performance Reporting Manager : 3371 kB
      Solaris Container Manager : 3688 kB
      Sun Fire Midrange Systems Platf : 3270 kB
      System Reliability Manager : 970 kB
      Sun Management Center Integrati : 540 kB
      Workgroup Server : 3707 kB
      Generic X86/X64 Config Reader : 608 kB
      ---------------
      TOTAL : 76573 kB

      Database is located at : /var/opt/SUNWsymon/db/data/SunMC
      Free space available on this partition is : 6493142 kB
      ---
      Following locales are installed :
      ---

      Information about upgrade from old versions is not available.

      Sun Management Center Ports:
      ----
      SUNMC COMPONENT PORT_ID
      ----
      agent service 1161
      trap service 162
      event service 163
      topology service 164
      cfgserver service 165
      cstservice service 167
      metadata service 168
      platform service 166
      grouping service 5600
      rmi service 2099
      webserver_HTTP service 8080
      webserver_HTTPS service 8443

      You are currently running SNMPDX.

      Sun Management Center Server Hosts definitions in domain-config.x:
      ---
      SUNMC COMPONENT SERVER_HOST
      ---
      agent service clsol8
      trap service clsol8
      event service clsol8
      topology service clsol8
      cfgserver service clsol8
      cstservice service clsol8
      metadata service clsol8
      platform service clsol8

      Sun Management Center Processes:
      ---
      SUNMC SERVICE STATUS
      ---
      Java Server Running.
      Database services Not Running.
      Grouping service Running.
      Event-handler service Running.
      Topology service Not Running.
      Trap-handler service Not Running.
      Configuration service Running.
      CST service Not Running.
      Metadata Services Running.
      Hardware service Not Running.
      Web server Running.
      Sun Management Center Agent Running.
      Platform Agent Not Running.
      ---
      Privilege level for Sun Management Center users :
      CATEGORY USERS
      esadm : smcadmin
      esdomadm : smcadmin
      esops :

      ALL USERS : smcadmin
      ----

      server is local host
      ---

      Web server package is installed correctly.
      Web Server is up and responding.

      Web Server servlet engine is up and responding.

      I have also read that patching SMC has caused problems for some people so I don't really want to try that until I get some feedback.
        • 1. Re: Trap syntax error in trap-schedule.dat alert; possibly SMC related
          Mike Kirk
          Hi m_nicholson,
          m_nicholson wrote:
          Receiving many ID xxxxxx daemon.alert events every hour. The message is:
          [ID xxxxxx daemon.alert] syslog Nov 10 14:00:10 trap syntax error in trap-schedule.dat(200) at token '???'
          followed immediately (same timestamp) with:
          [ID yyyyyy daemon.alert] syslog Nov 10 14:00:10 trap *** aborting execution ***
          Sounds like SunMC (or at least that trap service) was shut down incorrectly at some point: maybe a power failure... or a filesystem filled up? Either way the /var/opt/SUNWsymon/cfg/trap-schedule.dat file is corrupted, and Solaris is likely restarting the sunmctrap service over and over and over...

          I believe that's one of the .dat files that SunMC can recreate. Stop your SunMC Server, move that corrupt file to another location (i.e. make a backup by changing the filename) then restart SunMC. You should see a new file created within the first couple of minutes.
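
A minimal sketch of the "set the file aside" step, in case it helps anyone repeating this. The function name `set_aside` and the `.corrupt.<timestamp>` suffix are illustrative choices, not anything SunMC defines. It is written for ksh or bash and only renames the file, so nothing is deleted.

```shell
# Sketch only: rename a suspect file to a timestamped name and print the new name.
# set_aside and the ".corrupt.<timestamp>" suffix are hypothetical, chosen for
# illustration; run under ksh or bash.
set_aside() {
    src=$1
    if [ ! -f "$src" ]; then
        echo "set_aside: no such file: $src" >&2
        return 1
    fi
    dest="$src.corrupt.$(date +%Y%m%d%H%M%S)"
    mv "$src" "$dest" && echo "$dest"
}

# Intended use on the SunMC server, with the commands and paths quoted in this thread:
#   /opt/SUNWsymon/sbin/es-stop -A
#   set_aside /var/opt/SUNWsymon/cfg/trap-schedule.dat
#   /opt/SUNWsymon/sbin/es-start -Ac
```

Keeping the old copy under a distinct name means it can be compared with whatever file SunMC writes after the restart.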

          Also, keep an eye on /var/adm/messages for other "aborting execution" messages: you may have more than one bad file.
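
One possible way to check for more than one bad file is to pull the file names out of the "syntax error in FILE(LINE)" messages. This is a sketch that assumes the message layout quoted at the top of this thread; `list_bad_files` and `count_aborts` are made-up helper names, and the log path is passed in because messages may be forwarded to a loghost rather than kept locally.

```shell
# Sketch only: assumes messages of the form
#   ... syntax error in trap-schedule.dat(200) at token '???'
# list_bad_files and count_aborts are hypothetical helper names.

# Print each distinct file named in a "syntax error in FILE(LINE)" message.
list_bad_files() {
    sed -n 's/.*syntax error in \([^ (]*\)(.*/\1/p' "$1" | sort -u
}

# Count the lines that contain "aborting execution".
count_aborts() {
    grep -c 'aborting execution' "$1"
}

# Intended use:
#   list_bad_files /var/adm/messages
#   count_aborts /var/adm/messages
```
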
          I have also read that patching SMC has caused problems for some people
          so I don't really want to try that until I get some feedback.
          In general, patches fix more problems than they create: I recommend you install the latest set. But your current trap-schedule.dat problem isn't due to a problem that a patch would fix, just a config file that's formatted incorrectly.

          Regards,

          Mike.Kirk@HalcyonInc.com
          http://www.HalcyonInc.com
          • 2. Re: Trap syntax error in trap-schedule.dat alert; possibly SMC related
            807567
            Hi Mike,

            Tried your suggestion to rename the .dat and let SMC recreate it but I can't get the SMC database service to launch. So it looks like I am having trouble with the database. Any suggestions?

            To back up, you nailed it with the loss of power: we lost both power supplies on a Sunday night with no warning and nothing in /var/adm/messages (since we send them to a loghost). It was determined that chips within each power supply in the T5140 had failed, so we replaced both power supplies and fired the server up. That is when we started getting the trap errors, and only those two errors, on clsol8.

            I did check http://sun.com/msg/SMF-8000-KS but am not sure how that helps.

            What I tried:
            On clsol8 as root:
            # cd /opt/SUNWsymon/sbin
            # ls
            {db-memconfig.sh   es-details        es-imagetool      es-setup
            db-start          es-device         es-inst           es-start
            db-stop           es-dt             es-keys.sh        es-stop
            es-apps           es-gui-imagetool  es-lic            es-tool
            es-backup         es-guiinst        es-load-default   es-trapdest
            es-chelp          es-guisetup       es-makeagent      es-uninst
            es-cli            es-guistart       es-platform       es-validate
            es-common.sh      es-guistop        es-restore        esmultiip
            es-config         es-guiuninst      es-run            ports.config}
            # ./es-stop -A
            {Stopping metadata component
            Stopping cfgserver component
            Stopping topology component
            Stopping event component
            Stopping grouping service
            Stopping trap component
            Stopping java server
            Stopping webserver
            Stopping agent component
            Stopping platform component}
            <attempting Mike's solution suggestion>
            # cd /var/opt/SUNWsymon/cfg
            # ls
            {...
            trap-schedule.dat
            ...}
            # mv trap-schedule.dat trap-schedule.dat.maybeCorrupted
            # /opt/SUNWsymon/sbin/es-start -Ac
            {Some of the SunMC services are in maintenace state.
            Please check the corresponding SMF service log in /var/svc/log directory.
            Please disable the services in maintenance state and re-start the services again.}
            # svcs -vx
            {....

            svc:/application/management/sunmcdatabase:default (SunMC database service)
            State: maintenance since November 6, 2009 3:50:51 PM CST
            Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
            See: http://sun.com/msg/SMF-8000-KS
            See: /var/svc/log/application-management-sunmcdatabase:default.log
            Impact: This service is not running.

            svc:/application/management/sunmcwebserver:default (SunMC webserver service)
            State: maintenance since November 12, 2009 2:45:50 PM CST
            Reason: Start method failed repeatedly, last exited with status 103.
            See: http://sun.com/msg/SMF-8000-KS
            See: /var/svc/log/application-management-sunmcwebserver:default.log
            Impact: This service is not running.}
            # svcadm disable sunmcdatabase
            # svcadm disable sunmcwebserver
            # svcs -vx
            {...}
            # svcadm enable sunmcdatabase
            # svcadm enable sunmcwebserver
            # /opt/SUNWsymon/sbin/es-start -Ac
            {Failed to successfully perform Database Startup.}
            # svcs -vx
            {...

            svc:/application/management/sunmcdatabase:default (SunMC database service)
            State: maintenance since November 12, 2009 2:49:50 PM CST
            Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
            See: http://sun.com/msg/SMF-8000-KS
            See: /var/svc/log/application-management-sunmcdatabase:default.log
            Impact: This service is not running.}

            # cd /
            # svcadm disable sunmcdatabase
            # shutdown -y -g0 -i6
            (after reboot, I logged into clsol8 and su'd to root)
            # svcs -xv
            {...

            svc:/application/management/sunmcdatabase:default (SunMC database service)
            State: disabled since November 12, 2009 3:02:11 PM CST
            Reason: Disabled by an administrator.
            See: http://sun.com/msg/SMF-8000-05
            Impact: 1 dependent service is not running:
            svc:/application/management/sunmctopology:default}
            # svcadm enable sunmcdatabase
            # svcadm disable sunmcdatabase
            # svcadm clear sunmcdatabase
            svcadm: Instance "svc:/application/management/sunmcdatabase:default" is not in a maintenance or degraded state.
            # svcadm refresh sunmcdatabase
            # svcadm enable sunmcdatabase
            # svcs -xv

            svc:/application/management/sunmcdatabase:default (SunMC database service)
            State: offline since November 12, 2009 3:09:25 PM CST
            Reason: Start method is running.
            See: http://sun.com/msg/SMF-8000-C4
            See: /var/svc/log/application-management-sunmcdatabase:default.log
            Impact: 1 dependent service is not running:
            svc:/application/management/sunmctopology:default
            # svcs -xv

            svc:/application/management/sunmcdatabase:default (SunMC database service)
            State: maintenance since November 12, 2009 3:10:30 PM CST
            Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
            See: http://sun.com/msg/SMF-8000-KS
            See: /var/svc/log/application-management-sunmcdatabase:default.log
            Impact: 1 dependent service is not running:
            svc:/application/management/sunmctopology:default




            [ Nov 12 15:08:37 Leaving maintenance because disable requested. ]
            [ Nov 12 15:08:37 Disabled. ]
            [ Nov 12 15:09:07 Rereading configuration. ]
            [ Nov 12 15:09:25 Enabled. ]
            [ Nov 12 15:09:25 Executing start method ("/lib/svc/method/es-svc.sh start database") ]
            execution of verifyDatabaseUp failed


            exiting........................
            [ Nov 12 15:10:30 Method "start" exited with status 95 ]
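
For anyone repeating the checks above, here is a small sketch that pulls only the services in maintenance out of `svcs -xv` output. It assumes the block layout shown in this post (an FMRI line with a description in parentheses, followed by a `State:` line); `maintenance_fmris` is a made-up helper name.

```shell
# Sketch only: read "svcs -xv" style text on stdin and print the FMRI of every
# service whose State line says "maintenance". maintenance_fmris is a
# hypothetical helper name; the layout is assumed to match the output above.
maintenance_fmris() {
    awk '
        # Block header, e.g. "svc:/...:default (SunMC database service)".
        /^[ \t]*svc:\/[^ ]+ \(/ { fmri = $1 }
        # State line belonging to the most recent header.
        /^[ \t]*State: maintenance/ && fmri != "" { print fmri; fmri = "" }
    '
}

# Intended use:
#   svcs -xv | maintenance_fmris
```

Dependent services listed under "Impact:" are skipped because those lines carry no description in parentheses. Once the underlying cause is fixed, `svcadm clear` on each printed FMRI asks SMF to retry the start method.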
            • 3. Re: Trap syntax error in trap-schedule.dat alert; possibly SMC related
              807567
              To anyone finding this post:

              I ended up uninstalling SMC 4 and reinstalling it, but that did not get me going right away. I finally figured out that I needed to start the postgres database with the following (NOTE: output from commands is shown in curly braces {...}):
              su postgres
              initdb -D /var/lib/pgsql/data
              {The files belonging to this database system will be owned by user "postgres".
              This user must also own the server process.

              The database cluster will be initialized with locales
                COLLATE:  en_US.ISO8859-1
                CTYPE:    en_US.ISO8859-1
                MESSAGES: C
                MONETARY: en_US.ISO8859-1
                NUMERIC:  en_US.ISO8859-1
                TIME:     en_US.ISO8859-1
              The default database encoding has accordingly been set to LATIN1.

              initdb: directory "/var/lib/pgsql/data" exists but is not empty
              If you want to create a new database system, either remove or empty
              the directory "/var/lib/pgsql/data" or run initdb
              with an argument other than "/var/lib/pgsql/data".}
              pg_ctl -D /var/lib/pgsql/data status
              {pg_ctl: neither postmaster nor postgres running}
              pg_ctl -D /var/lib/pgsql/data -l /var/lib/pgsql/data/logfile start
              {pg_ctl: another postmaster may be running; trying to start postmaster anyway
              postmaster starting}
              pg_ctl -D /var/lib/pgsql/data status
              {pg_ctl: postmaster is running (PID: 29542)
              /usr/bin/postgres -D /var/lib/pgsql/data}
              exit (out of postgres user back to root user)

              Now that the postgres database was running, I could finish the setup of SMC and es-validate would show that SMC was running.
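
The status-then-start sequence above could be wrapped roughly as follows. This is a sketch, not a SunMC-supplied procedure: `ensure_postgres_running`, `PG_CTL`, and `PGDATA_DIR` are illustrative names, the data directory and log file paths are simply the ones used in this post, and it relies on `pg_ctl status` returning a non-zero exit code when no server is running. Like the commands above, it is meant to be run as the postgres user.

```shell
# Sketch only: start the postmaster if "pg_ctl status" reports it is not running.
# PG_CTL, PGDATA_DIR and ensure_postgres_running are hypothetical names; the
# default paths are the ones quoted in this post.
PG_CTL=${PG_CTL:-pg_ctl}
PGDATA_DIR=${PGDATA_DIR:-/var/lib/pgsql/data}

ensure_postgres_running() {
    if "$PG_CTL" -D "$PGDATA_DIR" status >/dev/null 2>&1; then
        echo "already running"
    else
        "$PG_CTL" -D "$PGDATA_DIR" -l "$PGDATA_DIR/logfile" start >/dev/null \
            && echo "start requested"
    fi
}

# Intended use, as the postgres user:
#   ensure_postgres_running
#   pg_ctl -D /var/lib/pgsql/data status
```

The sketch deliberately leaves out `initdb`: as the output above shows, it refuses to touch a data directory that is not empty.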

              Since I didn't get the database running before uninstalling, I don't know if Mike's suggestion would have gotten me going by itself but it is definitely a good thing to be aware of.