3 Replies Latest reply: Jan 12, 2010 6:58 AM by 807567

Trap syntax error in trap-schedule.dat alert; possibly SMC related

807567 Newbie
Receiving many ID xxxxxx daemon.alert events every hour. The message is:
[ID xxxxxx daemon.alert] syslog Nov 10 14:00:10 trap syntax error in trap-schedule.dat(200) at token '???'
followed immediately (same timestamp) with:
[ID yyyyyy daemon.alert] syslog Nov 10 14:00:10 trap *** aborting execution ***

I couldn't find anything on the web specifically for this message, but several hits that looked similar seemed to indicate that this might be a problem in SMC. If it is not, I apologize in advance.

System is an ldom:
Solaris 10;
SunOS clsol8 5.10 Generic_127127-11 sun4v sparc SUNW,T5140

Current situation is:
# /opt/SUNWsymon/sbin/es-validate

This script will help you in validation of Sun (TM) Management Center.

Validation Tool Version : 4.0
Host name : clsol8
Number of CPUs : 24
Platform : SUNW,T5140
Operating System : SunOS 5.10
Memory size : 2048 Megabytes
Swap space : 595480k used, 3727296k available
JAVA VERSION : "1.5.0_14"

Sun Management Center Production Environment Installation.

Following layers are installed : SERVER, AGENT, CONSOLE
Installation location : /opt/SUNWsymon
----
Sun Management Center installation status:
PRODUCT : Production Environment
INSTALLATION STATUS : Setup.
DATABASE SETUP : Setup.
COMPLETELY INSTALLED PACKAGES : SUNWescom,SUNWesbui,SUNWesbuh,SUNWenesi,
: SUNWesdb,SUNWesagt,SUNWessrv,SUNWessa,
: SUNWesjp,SUNWesaxp,SUNWesse,SUNWesclt,
: SUNWesjrm,SUNWmeta,SUNWenesf,SUNWesmdr,
: SUNWesgui,SUNWesweb,SUNWessvc,SUNWesasc,
: SUNWescix,SUNWsuagt,SUNWsusrv,SUNWesval,
: SUNWesmc,SUNWessms,SUNWessdv,SUNWescdv,
: SUNWlgsmc,SUNWeslac,SUNWesodbc,SUNWesmib,
: SUNWesken,SUNWesmod,SUNWesae,SUNWesaem,
: SUNWesmcp,SUNWesafm,SUNWescon,SUNWsucon,
: SUNWescli,SUNWesclb

<NOTE: I had to cut out a bunch of installed components to hit character limit>
----
Sun Management Center Add-Ons and Versions:
PRODUCT VERSION
----
Production Environment 4.0
Advanced System Monitoring 4.0_Build15
Sun Fire Entry-Level Midrange S 3.5-v6
Service Availability Manager 4.0_Build15
Performance Reporting Manager 4.0_Build15
Solaris Container Manager 4.0_Build15
Sun Fire Midrange Systems Platf 3.5-v6
System Reliability Manager 4.0_Build15
Sun Management Center Integrati 4.0_Build15
Workgroup Server 3.6
Generic X86/X64 Config Reader 4.0_Build15

Sun Management Center Patch installation details:
No Sun Management Center patch is installed.
--
Sun Management Center disk-space consumption:
---
PRODUCT APPROXIMATE DISK SPACE CONSUMED
---
Production Environment : 54452 kB
Advanced System Monitoring : 2391 kB
Sun Fire Entry-Level Midrange S : 1738 kB
Service Availability Manager : 1838 kB
Performance Reporting Manager : 3371 kB
Solaris Container Manager : 3688 kB
Sun Fire Midrange Systems Platf : 3270 kB
System Reliability Manager : 970 kB
Sun Management Center Integrati : 540 kB
Workgroup Server : 3707 kB
Generic X86/X64 Config Reader : 608 kB
---------------
TOTAL : 76573 kB

Database is located at : /var/opt/SUNWsymon/db/data/SunMC
Free space available on this partition is : 6493142 kB
---
Following locales are installed :
---

Information about upgrade from old versions is not available.

Sun Management Center Ports:
----
SUNMC COMPONENT PORT_ID
----
agent service 1161
trap service 162
event service 163
topology service 164
cfgserver service 165
cstservice service 167
metadata service 168
platform service 166
grouping service 5600
rmi service 2099
webserver_HTTP service 8080
webserver_HTTPS service 8443

You are currently running SNMPDX.

Sun Management Center Server Hosts definitions in domain-config.x:
---
SUNMC COMPONENT SERVER_HOST
---
agent service clsol8
trap service clsol8
event service clsol8
topology service clsol8
cfgserver service clsol8
cstservice service clsol8
metadata service clsol8
platform service clsol8

Sun Management Center Processes:
---
SUNMC SERVICE STATUS
---
Java Server Running.
Database services Not Running.
Grouping service Running.
Event-handler service Running.
Topology service Not Running.
Trap-handler service Not Running.
Configuration service Running.
CST service Not Running.
Metadata Services Running.
Hardware service Not Running.
Web server Running.
Sun Management Center Agent Running.
Platform Agent Not Running.
---
Privilege level for Sun Management Center users :
CATEGORY USERS
esadm : smcadmin
esdomadm : smcadmin
esops :

ALL USERS : smcadmin
----

server is local host
---

Web server package is installed correctly.
Web Server is up and responding.

Web Server servlet engine is up and responding.

I have also read that patching SMC has caused problems for some people, so I don't really want to try that until I get some feedback.
  • 1. Re: Trap syntax error in trap-schedule.dat alert; possibly SMC related
    MikeKirk Newbie
    Hi m_nicholson,
    m_nicholson wrote:
    Receiving many ID xxxxxx daemon.alert events every hour. The message is:
    [ID xxxxxx daemon.alert] syslog Nov 10 14:00:10 trap syntax error in trap-schedule.dat(200) at token '???'
    followed immediately (same timestamp) with:
    [ID yyyyyy daemon.alert] syslog Nov 10 14:00:10 trap *** aborting execution ***
    Sounds like SunMC (or at least that trap service) was shut down incorrectly at some point: maybe a power failure... or a filesystem filled up? Either way the /var/opt/SUNWsymon/cfg/trap-schedule.dat file is corrupted, and Solaris is likely restarting the sunmctrap service over and over and over...

    I believe that's one of the .dat files that SunMC can recreate. Stop your SunMC Server, move that corrupt file to another location (i.e. make a backup by changing the filename) then restart SunMC. You should see a new file created within the first couple of minutes.
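The rename step can be sketched as a small helper that moves the suspect file aside under a timestamped name instead of deleting it. `quarantine_file` is a hypothetical name, not part of SunMC, and the `.corrupt.<timestamp>` suffix is only an assumption. Backticks are used rather than `$(...)` so it also runs under the older Solaris 10 `/bin/sh`.

```shell
# Sketch only: quarantine_file is a hypothetical helper, not a SunMC tool.
# It renames a suspect file so it can be restored later, and prints the new name.
quarantine_file() {
    src=$1
    if [ ! -f "$src" ]; then
        echo "quarantine_file: $src not found" >&2
        return 1
    fi
    # The suffix format is an arbitrary choice made for this example.
    dest="$src.corrupt.`date +%Y%m%d%H%M%S`"
    mv "$src" "$dest" && echo "$dest"
}

# Intended sequence, run as root (paths and commands as they appear in this thread):
#   /opt/SUNWsymon/sbin/es-stop -A
#   quarantine_file /var/opt/SUNWsymon/cfg/trap-schedule.dat
#   /opt/SUNWsymon/sbin/es-start -Ac
```

Keeping the old file around makes it possible to compare it with the regenerated one, or to put it back if the rename turns out not to help.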

    Also, keep an eye on /var/adm/messages for other "aborting execution" messages: you may have more than one bad file.
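A rough way to check for more than one bad file is to pull the file name out of every "syntax error in ..." message. The sketch below assumes the messages look like the ones quoted at the top of this thread. `list_bad_dat_files` is a hypothetical helper name, and the log path is whatever file your syslog messages actually land in (on a host that forwards to a loghost, that may not be the local /var/adm/messages).

```shell
# Sketch only: list_bad_dat_files is a hypothetical helper.
# Usage: list_bad_dat_files LOGFILE
# Prints each distinct file named in a "syntax error in <file>(<line>)" message.
list_bad_dat_files() {
    # Capture the text between "syntax error in " and the opening parenthesis.
    sed -n 's/.*syntax error in \([^( ]*\)(.*/\1/p' "$1" | sort -u
}

# Example (path as mentioned above):
#   list_bad_dat_files /var/adm/messages
```

If this prints more than one name, each listed file is a candidate for the same stop, rename, restart treatment.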
    I have also read that patching SMC has caused problems for some people
    so I don't really want to try that until I get some feedback.
    In general, patches fix more problems than they create: I recommend you install the latest set. But your current trap-schedule.dat problem isn't due to a problem that a patch would fix, just a config file that's formatted incorrectly.

    Regards,

    Mike.Kirk@HalcyonInc.com
    http://www.HalcyonInc.com
  • 2. Re: Trap syntax error in trap-schedule.dat alert; possibly SMC related
    807567 Newbie
    Hi Mike,

    Tried your suggestion to rename the .dat and let SMC recreate it but I can't get the SMC database service to launch. So it looks like I am having trouble with the database. Any suggestions?

    To back up, you nailed it with the loss of power: we lost both power supplies on a Sunday night with no warning and nothing in /var/adm/messages (since we send them to a loghost). It was determined that chips within each power supply in the T5140 failed. So we replaced both power supplies and fired the server up. That is when we started getting the trap errors, and only those two errors, on clsol8.

    I did check http://sun.com/msg/SMF-8000-KS but am not sure how that helps.

    What I tried:
    On clsol8 as root:
    # cd /opt/SUNWsymon/sbin
    # ls
    {db-memconfig.sh   es-details        es-imagetool      es-setup
    db-start          es-device         es-inst           es-start
    db-stop           es-dt             es-keys.sh        es-stop
    es-apps           es-gui-imagetool  es-lic            es-tool
    es-backup         es-guiinst        es-load-default   es-trapdest
    es-chelp          es-guisetup       es-makeagent      es-uninst
    es-cli            es-guistart       es-platform       es-validate
    es-common.sh      es-guistop        es-restore        esmultiip
    es-config         es-guiuninst      es-run            ports.config}
    # ./es-stop -A
    {Stopping metadata component
    Stopping cfgserver component
    Stopping topology component
    Stopping event component
    Stopping grouping service
    Stopping trap component
    Stopping java server
    Stopping webserver
    Stopping agent component
    Stopping platform component}
    <attempting Mike's solution suggestion>
    # cd /var/opt/SUNWsymon/cfg
    # ls
    {...
    trap-schedule.dat
    ...}
    # mv trap-schedule.dat trap-schedule.dat.maybeCorrupted
    # /opt/SUNWsymon/sbin/es-start -Ac
    {Some of the SunMC services are in maintenace state.
    Please check the corresponding SMF service log in /var/svc/log directory.
    Please disable the services in maintenance state and re-start the services again.}
    # svcs -vx
    {....

    svc:/application/management/sunmcdatabase:default (SunMC database service)
    State: maintenance since November 6, 2009 3:50:51 PM CST
    Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
    See: http://sun.com/msg/SMF-8000-KS
    See: /var/svc/log/application-management-sunmcdatabase:default.log
    Impact: This service is not running.

    svc:/application/management/sunmcwebserver:default (SunMC webserver service)
    State: maintenance since November 12, 2009 2:45:50 PM CST
    Reason: Start method failed repeatedly, last exited with status 103.
    See: http://sun.com/msg/SMF-8000-KS
    See: /var/svc/log/application-management-sunmcwebserver:default.log
    Impact: This service is not running.}
    # svcadm disable sunmcdatabase
    # svcadm disable sunmcwebserver
    # svcs -vx
    {...}
    # svcadm enable sunmcdatabase
    # svcadm enable sunmcwebserver
    # /opt/SUNWsymon/sbin/es-start -Ac
    {Failed to successfully perform Database Startup.}
    # svcs -vx
    {...

    svc:/application/management/sunmcdatabase:default (SunMC database service)
    State: maintenance since November 12, 2009 2:49:50 PM CST
    Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
    See: http://sun.com/msg/SMF-8000-KS
    See: /var/svc/log/application-management-sunmcdatabase:default.log
    Impact: This service is not running.}

    # cd /
    # svcadm disable sunmcdatabase
    # shutdown -y -g0 -i6
    (after reboot, I logged into clsol8 and su'd to root)
    # svcs -xv
    {...

    svc:/application/management/sunmcdatabase:default (SunMC database service)
    State: disabled since November 12, 2009 3:02:11 PM CST
    Reason: Disabled by an administrator.
    See: http://sun.com/msg/SMF-8000-05
    Impact: 1 dependent service is not running:
    svc:/application/management/sunmctopology:default}
    # svcadm enable sunmcdatabase
    # svcadm disable sunmcdatabase
    # svcadm clear sunmcdatabase
    svcadm: Instance "svc:/application/management/sunmcdatabase:default" is not in a maintenance or degraded state.
    # svcadm refresh sunmcdatabase
    # svcadm enable sunmcdatabase
    # svcs -xv

    svc:/application/management/sunmcdatabase:default (SunMC database service)
    State: offline since November 12, 2009 3:09:25 PM CST
    Reason: Start method is running.
    See: http://sun.com/msg/SMF-8000-C4
    See: /var/svc/log/application-management-sunmcdatabase:default.log
    Impact: 1 dependent service is not running:
    svc:/application/management/sunmctopology:default
    # svcs -xv

    svc:/application/management/sunmcdatabase:default (SunMC database service)
    State: maintenance since November 12, 2009 3:10:30 PM CST
    Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
    See: http://sun.com/msg/SMF-8000-KS
    See: /var/svc/log/application-management-sunmcdatabase:default.log
    Impact: 1 dependent service is not running:
    svc:/application/management/sunmctopology:default




    [ Nov 12 15:08:37 Leaving maintenance because disable requested. ]
    [ Nov 12 15:08:37 Disabled. ]
    [ Nov 12 15:09:07 Rereading configuration. ]
    [ Nov 12 15:09:25 Enabled. ]
    [ Nov 12 15:09:25 Executing start method ("/lib/svc/method/es-svc.sh start database") ]
    execution of verifyDatabaseUp failed


    exiting........................
    [ Nov 12 15:10:30 Method "start" exited with status 95 ]
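The check repeated throughout the transcript above (run `svcs -xv`, look for services in maintenance, then read the SMF log) can be condensed into a small filter. This is a sketch that assumes the `svcs -x` output layout shown in this post. `maintenance_services` is a hypothetical helper name.

```shell
# Sketch only: maintenance_services is a hypothetical helper.
# Reads `svcs -x` style text on stdin and prints the FMRI of every
# service whose State line says "maintenance".
maintenance_services() {
    awk '
        $1 ~ /^svc:\// { fmri = $1 }                       # remember the latest FMRI seen
        $1 == "State:" && $2 == "maintenance" { print fmri }
    '
}

# Example:
#   svcs -x | maintenance_services
# Then read the log named on the matching "See:" line, for example:
#   tail /var/svc/log/application-management-sunmcdatabase:default.log
```

The filter only narrows down which services to look at. The actual failure reason, such as the verifyDatabaseUp message above, still has to be read from the service's log.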
  • 3. Re: Trap syntax error in trap-schedule.dat alert; possibly SMC related
    807567 Newbie
    To anyone finding this post:

    I ended up uninstalling SMC 4 and reinstalling it but that did not get me going right away. I finally figured out that I needed to start the postgres database with the following (NOTE: I put output from commands in curly braces {...} ):
    su postgres
    initdb -D /var/lib/pgsql/data
    {The files belonging to this database system will be owned by user "postgres".
    This user must also own the server process.

    The database cluster will be initialized with locales
      COLLATE:  en_US.ISO8859-1
      CTYPE:    en_US.ISO8859-1
      MESSAGES: C
      MONETARY: en_US.ISO8859-1
      NUMERIC:  en_US.ISO8859-1
      TIME:     en_US.ISO8859-1
    The default database encoding has accordingly been set to LATIN1.

    initdb: directory "/var/lib/pgsql/data" exists but is not empty
    If you want to create a new database system, either remove or empty
    the directory "/var/lib/pgsql/data" or run initdb
    with an argument other than "/var/lib/pgsql/data".}
    pg_ctl -D /var/lib/pgsql/data status
    {pg_ctl: neither postmaster nor postgres running}
    pg_ctl -D /var/lib/pgsql/data -l /var/lib/pgsql/data/logfile start
    {pg_ctl: another postmaster may be running; trying to start postmaster anyway
    postmaster starting}
    pg_ctl -D /var/lib/pgsql/data status
    {pg_ctl: postmaster is running (PID: 29542)
    /usr/bin/postgres -D /var/lib/pgsql/data}
    exit (out of postgres user back to root user)

    Now that the postgres database was running, I could finish the setup of SMC and es-validate would show that SMC was running.
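The status-then-start sequence above can be wrapped so that it is safe to rerun. This is a sketch under assumptions: it uses only the `pg_ctl` options shown in this post, the data directory is the one used above, and `ensure_postgres_running`, `PG_CTL` and `PGDATA_DIR` are hypothetical names introduced for this example. It is meant to be run as the postgres user, and it leaves out `initdb` because the data directory already existed here.

```shell
# Sketch only: names below are hypothetical; override them to match your system.
PG_CTL=${PG_CTL:-pg_ctl}
PGDATA_DIR=${PGDATA_DIR:-/var/lib/pgsql/data}

# Start the postmaster only if "pg_ctl status" reports that it is not running.
ensure_postgres_running() {
    if "$PG_CTL" -D "$PGDATA_DIR" status >/dev/null 2>&1; then
        echo "already running"
        return 0
    fi
    "$PG_CTL" -D "$PGDATA_DIR" -l "$PGDATA_DIR/logfile" start
}

# Example, as the postgres user:
#   ensure_postgres_running
```

This relies on `pg_ctl status` returning a non-zero exit code when no server is running, which is worth confirming on the installed PostgreSQL version before depending on it.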

    Since I didn't get the database running before uninstalling, I don't know if Mike's suggestion would have gotten me going by itself, but it is definitely a good thing to be aware of.