2 Replies. Latest reply: Feb 23, 2012 2:36 PM by 805322

Solaris 8 login problems: no UTMPX, tmchild: exec service failed...etc

805322 Newbie
Hi All,

I have this really strange problem on a Solaris 8 server. I've tried searching on Google but didn't come across anything useful. Hopefully someone here has more experience and can share some of their knowledge.

The basic problem is that users are unable to log into the Solaris 8 server after it has been up for a while (anywhere from a few hours to 10 hours). The server does not run any X servers.

The symptoms are as follows:
1. Server responds to pings from other machines
2. rlogin to the server produces the following errors:
No utmpx entry, you must exec "login" from lowest level "shell" or Protocol error
3. rsh will sometimes work but not always (I use rsh to reboot the server when it works)
4. login at the terminal at the physical server produces the following errors:
tmchild: exec service failed, errno=5
INIT: failed write of utmpx entry: "Co"
5. the following errors are printed on the terminal prior to this happening:
cannot open /var/spool/mqueue: Not a directory

I have checked disk space on the server and all mounts are at most 40% full.
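
Symptom 5 can be checked directly. As a minimal sketch (`check_dir` is a hypothetical helper name, not a system utility), the following reports whether /var/spool/mqueue is still a directory. It sticks to plain Bourne shell syntax on the assumption that /bin/sh on this release is a Bourne shell:

```shell
#!/bin/sh
# check_dir PATH: hypothetical helper that reports what sits at a path
# that is expected to be a directory.
check_dir() {
    if [ -d "$1" ]; then
        echo "$1: directory"
    elif [ -f "$1" ]; then
        echo "$1: regular file, expected a directory"
    else
        echo "$1: missing or not a directory"
    fi
}

# The path comes from the error message in symptom 5.
check_dir /var/spool/mqueue
```

If this reports anything other than "directory" on a system where sendmail's queue used to work, the directory entry itself has probably changed, which would point at the filesystem rather than at utmpx.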

For the utmpx error, the most commonly suggested fix seems to be to delete the /var/adm/utmpx file and create a new one. This only seems to delay the next failure.
  • 1. Re: Solaris 8 login problems: no UTMPX, tmchild: exec service failed...etc
    BryanWood Explorer
    What is the file size of utmpx and also wtmpx?
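
One rough way to answer the size question and spot a damaged file at the same time is to check whether each file holds a whole number of records. This is only a sketch: `check_records` is a hypothetical helper, and the 372-byte record size is an assumption taken from the `BS=372` value in the rotation script quoted later in this reply, so verify it for your release.

```shell
#!/bin/sh
# Assumed record size in bytes (from BS=372 in the rotation script
# quoted in this thread); verify it before trusting the result.
RECSIZE=372

# check_records FILE: hypothetical helper that prints the file size,
# the number of whole records, and any leftover bytes.
check_records() {
    size=`wc -c < "$1"`
    size=`expr $size + 0`     # normalize padding that some wc versions add
    count=`expr $size / $RECSIZE`
    left=`expr $size % $RECSIZE`
    echo "$1: $size bytes, $count records, $left leftover bytes"
}

for f in /var/adm/utmpx /var/adm/wtmpx
do
    if [ -f "$f" ]; then
        check_records "$f"
    fi
done
```

A nonzero leftover count would suggest a truncated or partially written record, assuming the record size is right.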

    Seems like something is corrupting utmpx/wtmpx, maybe an automated job that frequently does an "rsh" or "ssh".

    While the system is behaving normally (after you've removed the utmpx file as you mention in your post), and after say 1 hour (given you say it begins to fail after a few hours), run:
    root# last | more
    The above output should tell you which user is logging in and how frequently. If you are able to confirm this theory, then you would be looking for the 3rd column which is the IP address or DNS alias of the source machine performing the logins.

    Here is a perl script that will roll up the last output:
    root#
    root# cat rollup.pl
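    #!/usr/bin/perl
    # Roll up "last" output: total logins per user, then per source.
    use strict;
    my (%rollup) = ();
    open(LAST,"last|")
      or die "cannot execute last command: $!";
    while(<LAST>){
      next if (/ begins /);        # skip the trailing "wtmp begins ..." line
      my @fields = split;
      next unless @fields >= 3;    # skip blank or short lines
      # $fields[0] is the user, $fields[2] the source host or display
      $rollup{$fields[0]}{source}{$fields[2]}++;
      $rollup{$fields[0]}{all}++;
    }
    close(LAST);
    # Users sorted by login count, busiest first
    foreach my $user (sort
      {$rollup{$b}{all} <=> $rollup{$a}{all}}
      keys %rollup){
      next unless $user;
      print "user $user logins: $rollup{$user}{all}\n";
      # Sources for this user, busiest first
      foreach my $source (sort
        {$rollup{$user}{source}{$b} <=> $rollup{$user}{source}{$a}}
        keys %{$rollup{$user}{source}}){
        print "  $source logins: $rollup{$user}{source}{$source}\n";
      }
    }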
    root#
    root# chmod +x rollup.pl
    root# ./rollup.pl
    user bryan logins: 38
      :0.0 logins: 24
      :0 logins: 14
    user root logins: 21
      192.168.1.107 logins: 9
      192.168.1.20 logins: 7
      192.168.1.128 logins: 4
      :0.0 logins: 1
    user reboot logins: 15
      boot logins: 15
    root#
    Another suggestion would be to save a copy of the problematic utmpx file, and try to read its entries with "od -c":
    root# cd /var/adm
    root# cp utmpx utmpx.save
    root# od -c "utmpx.save"
    Lastly, here is a script that truncates the wtmpx file, taken from http://www.linuxmisc.com/3-linux/0c72ad22d625e643.htm:
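    #! /bin/sh -
    #
    # adm.weekly: once a week adm log rolling with wtmpx compression
    #
    PATH=/usr/bin:/bin:/usr/sbin
    umask 022
    LOG=wtmpx
    DIR=/var/adm
    cd $DIR || exit 1
    # Shift the compressed generations up by one: wtmpx.4.Z -> wtmpx.5.Z, etc.
    for GEN in 4 3 2 1 0
    do
            ROT=`expr $GEN + 1`
            test -f $LOG.$GEN && compress -f $LOG.$GEN
            test -f $LOG.$GEN.Z && mv $LOG.$GEN.Z $LOG.$ROT.Z
    done
    # BS is the assumed size of one wtmpx record in bytes; SK is the
    # number of records kept by the previous run.
    BS=372
    SK=0
    test -f $LOG.skip && SK=`cat $LOG.skip`
    # Copy only the records added since the previous run into wtmpx.0,
    # then make those the new contents of wtmpx.
    dd if=$LOG   of=$LOG.0 bs=$BS skip=$SK 2>/dev/null
    cp $LOG.0 $LOG
    # Record how many records wtmpx now holds so the next run skips them.
    SK=`wc -c <$LOG.0`
    SK=`expr $SK / $BS`
    echo "$SK" >$LOG.skip
    chmod 644    $LOG
    compress $LOG.0
    # end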
  • 2. Re: Solaris 8 login problems: no UTMPX, tmchild: exec service failed...etc
    805322 Newbie
    Thanks BryanWood. That was helpful. I've checked wtmpx and it's not very big, about 5MB. As far as I can tell, no automated jobs log in remotely; the only remote activity is users manually transferring files.

    Incidentally, the server failed to boot a few times with an error message saying no boot media was found. I thought the disk drive might be dying, so I looked in the messages log.
    I found a lot of SCSI messages like the following:

    SCSI: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1000,f@12(ncrs0)
    SCSI: [ID 107833 kern.warning] WARNING: /pci@0,0/pci1000,f@12/sd@6,0 invalid reselection (6:0)
    SCSI transport failed: reason 'reset': retrying command
    SCSI transport failed: reason 'unexpected_bus_free': retrying command

    I'm still looking up what these mean, but it would seem that the disk is failing. Can you (or anyone else) confirm?
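
To get a feel for how often each kind of failure occurs, the transport messages can be counted by reason. This is a rough sketch only: `scsi_summary` is a hypothetical helper, and /var/adm/messages is assumed to be the log in question. For what it's worth, errno=5 in the tmchild message is EIO (I/O error), which would at least be consistent with a disk or bus problem.

```shell
#!/bin/sh
# scsi_summary FILE: hypothetical helper that counts
# "SCSI transport failed" lines by reason, most frequent first.
scsi_summary() {
    grep 'SCSI transport failed' "$1" |
        sed "s/.*reason '\([a-z_]*\).*/\1/" |
        sort | uniq -c | sort -rn |
        awk '{ print $2 ": " $1 }'
}

# Assumed log location; adjust if the messages are kept elsewhere.
if [ -f /var/adm/messages ]; then
    scsi_summary /var/adm/messages
fi
```

On Solaris, `iostat -En` also prints per-device soft, hard, and transport error counters, which may help narrow down which device is affected.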

    Thanks.
