13 Replies Latest reply: Apr 7, 2014 9:11 AM by Paul In 'T Hout-Oracle RSS

    Understanding strace output

    user503699

      Hello,

       

      I am trying to troubleshoot an issue with a long delay while logging on to a Red Hat Linux 6.3 server.

      The issue occurs even when I try to do a "su - meuser" from the box itself, which makes me think it may not be related to remote connectivity or network.

      Anyways, I tried to use strace to find out what might be causing the delay. The strace output does show a particular call consuming 300 seconds to complete.

      However, I am not sure what exactly it means.

      Following are the relevant details:

      1)  strace -r -e trace=open,close,network,read,write -o sutrace.trc su - meuser

      The relevant lines from sutrace.trc:

           0.002662 socket(PF_FILE, SOCK_STREAM, 0) = 5

           0.000471 connect(5, {sa_family=AF_FILE, path="/var/lib/sss/pipes/pam"}, 110) = 0

           0.003306 write(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16

           0.000445 write(5, "\3\0\0\0", 4)   = 4

           0.000236 read(5, "\24\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0", 16) = 16

           0.000278 read(5, "\3\0\0\0", 4)    = 4

           0.000237 getsockopt(5, SOL_SOCKET, SO_PEERCRED, "\5\7\0\0\0\0\0\0\0\0\0\0", [12]) = 0

           0.000229 write(5, "]\0\0\0\364\0\0\0\0\0\0\0\0\0\0\0", 16) = 16

           0.000234 write(5, "IPAM\1\0\0\0\7\0\0\0oracle\0\2\0\0\0\5\0\0\0su-l\0"..., 77) = 77

         300.059967 close(5)                  = 0

       

      2) strace -r -e trace=open,close,network,read,write -o sshtrace.trc ssh meuser@localserver

      The relevant lines from sutrace.trc:

      0.000295 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3

      .

      .

      .

      300.951476 read(3, "b\343\205\370UI\376dq\244!I\6\214\203\301\205\354\323k\26\212\347\246t@\243\rpL\353\341"..., 8192) = 48

       

      3) strace -c ssh meuser@localserver

      % time     seconds  usecs/call     calls    errors syscall

      ------ ----------- ----------- --------- --------- ----------------

      37.04    0.001222           5       251       182 open

      30.31    0.001000          24        42           select

      12.76    0.000421          14        31           write

        7.79    0.000257          11        24        15 stat

        3.36    0.000111           1        88           close

        2.76    0.000091           2        53           fstat

        2.24    0.000074           1       132           read

        1.97    0.000065           3        23           munmap

        1.76    0.000058           1        41           rt_sigaction

        0.00    0.000000           0         5           poll

        0.00    0.000000           0        95           mmap

        0.00    0.000000           0        52           mprotect

        0.00    0.000000           0         6           brk

        0.00    0.000000           0         1           rt_sigprocmask

        0.00    0.000000           0        17           ioctl

        0.00    0.000000           0         1         1 access

        0.00    0.000000           0         3           dup

        0.00    0.000000           0        16           socket

        0.00    0.000000           0         9         6 connect

        0.00    0.000000           0         9           sendto

        0.00    0.000000           0         2           recvfrom

        0.00    0.000000           0        14           recvmsg

        0.00    0.000000           0         1           shutdown

        0.00    0.000000           0         7           bind

        0.00    0.000000           0         8           getsockname

        0.00    0.000000           0         1           getpeername

        0.00    0.000000           0         3           setsockopt

        0.00    0.000000           0         2           getsockopt

        0.00    0.000000           0         1           execve

        0.00    0.000000           0         1           uname

        0.00    0.000000           0        15           fcntl

        0.00    0.000000           0         1           umask

        0.00    0.000000           0         1           getrlimit

        0.00    0.000000           0         2           getuid

        0.00    0.000000           0         1           geteuid

        0.00    0.000000           0         1           setresuid

        0.00    0.000000           0         1           statfs

        0.00    0.000000           0         1           arch_prctl

        0.00    0.000000           0         9         1 futex

        0.00    0.000000           0         1           set_tid_address

        0.00    0.000000           0         1           set_robust_list

      ------ ----------- ----------- --------- --------- ----------------

      100.00    0.003299                   973       205 total

       

      Can you please help me understand what exactly does "socket(PF_FILE, SOCK_STREAM, 0)" do?

      In the above output, I am assuming that the calls taking most time are for the operations corresponding to this "socket" entry.

      I am no linux expert so can't really understand how to interpret this?

       

      Thanks in advance.

        • 1. Re: Understanding strace output
          Catch-22

          Strace may not be necessary to troubleshot your problem. You can also use ssh -vvv to debug the output.

           

          However you may want to check the following post:

           

          logon issue with OL 6.3

          • 2. Re: Understanding strace output
            user503699

            Hello Dude,

             

            Couple of points here:

            1) The issue is not just with ssh. The delay occurs even when I do a su - meuser from the server.

            2) The post you have mentioned had a symptom whereby there was a long delay to get even password prompt. In my case, the password prompt is displayed immediately.

            After I enter password, there is a long delay (almost 5+ minutes), before I get a shell prompt. All subsequent usage works fine.

            • 3. Re: Understanding strace output
              user503699

              Dude,

               

              I tried using ssh -vv as you suggested. Don't think it reveals anything but here is what I see

              meuser@myserver's password:

              debug2: we sent a password packet, wait for reply

              debug1: Authentication succeeded (password).

              debug1: channel 0: new [client-session]

              debug2: channel 0: send open

              debug1: Requesting no-more-sessions@openssh.com

              debug1: Entering interactive session.

              debug2: callback start

              debug2: client_session2_setup: id 0

              debug2: channel 0: request pty-req confirm 1

              debug1: Sending environment.

              debug1: Sending env LANG = en_US.UTF-8

              debug2: channel 0: request env confirm 0

              debug2: channel 0: request shell confirm 1

              debug2: fd 3 setting TCP_NODELAY

              debug2: callback done

              debug2: channel 0: open confirm rwindow 0 rmax 32768

              debug2: channel_input_status_confirm: type 99 id 0

              debug2: PTY allocation request accepted on channel 0

              debug2: channel 0: rcvd adjust 2097152

              debug2: channel_input_status_confirm: type 99 id 0

              debug2: shell request accepted on channel 0

               

              The delay occurs after the step (in BOLD) above.

              • 4. Re: Understanding strace output
                Catch-22

                Do you have the same issue when you use su meuser (without the - to setup the environment)? Do you have any fancy welcome or MOTD message?

                • 5. Re: Understanding strace output
                  user503699

                  Dude! wrote:

                   

                  Do you have the same issue when you use su meuser (without the - to setup the environment)? Do you have any fancy welcome or MOTD message?

                   

                  Yes. The issue is also there when I do "su meuser". There is a welcome message but it is not fancy and it is only displayed before it prompts for password, up to which stage everything works fine.

                  • 6. Re: Understanding strace output
                    Catch-22

                    How busy is the computer. Do you have any errors in /var/log/messages regarding PAM?

                    • 7. Re: Understanding strace output
                      user503699

                      Server is not busy. Unfortunately, I don't have access to /var/log/messages.

                      Does the strace output above make any sense?

                      • 8. Re: Understanding strace output
                        Catch-22

                        No. Why don't you have access to /var/log/messages?

                        • 9. Re: Understanding strace output
                          user503699

                          Dude! wrote:

                           

                          No. Why don't you have access to /var/log/messages?

                          Guess what??? It is "company security policy"....

                          Thanks for your help so far but I wanted to know if there is anything more I can do in order to investigate the issue and strace came to my mind. Because the strace output appeared to provide initial lead, I thought I would ask here.

                          • 10. Re: Understanding strace output
                            Catch-22

                            I have used strace myself a number of times for troubleshooting, but I cannot see the delays or details in the output you have provided.

                             

                            My guess is that you may have an invalid PAM module or configuration, which should be seen in the logs. If you are supposed to manage the system and do not get access to root, I would quit the job and pass the problem on to the policy makers to deal with the nonsense.

                             

                            Perhaps you can see more if you do not filter strace and instead use the defaults, like strace -f su - meuser and watch the screen.

                            • 11. Re: Understanding strace output
                              user503699

                              To be fair, I am not supposed to manage the system so I am not exactly responsible for this issue.

                              I am mainly a DBA but have great interest in learning linux and I tend to use problems like these to learn.

                              As for the strace output, if you see the details I have posted in my original post above, you will notice some system calls taking 300+ seconds to complete.

                              Having seen them in the output, I thought I would get some kind of clue as to where the problem might be.

                              • 12. Re: Understanding strace output
                                Catch-22

                                As far as I can tell, the relevant lines you marked as bold do not explain the reason for your problem. I suggest to ask the administrator to check for clues in the logs. That's pretty much the same a DBA should do first too, isn't it?

                                • 13. Re: Understanding strace output
                                  Paul In 'T Hout-Oracle

                                  Acutally, from the strace output is is quite clear that the delay happens when connecting to the PAM authentication service. The word 'PAM' or 'pam' appears twice in your initial strace output.

                                  With strace I found the -s flag also usefull at times to increase the lenght of the readable bits