1 Reply Latest reply on Jan 29, 2016 1:30 AM by User12616303-Oracle

    Getting "SIGPROF - Profiling timer expired error" when we invoke collect command on a program

    user7792861

      Hi team,

       

      We aren't able to record an experiment for a longer duration using the collect command. Whenever we invoke collect on our program, it runs for some time and then closes the experiment, but we would like to profile the program for a long time.

      We tried a couple of options, such as the -t flag to specify the duration, but collect doesn't record for that duration. To see what happens, we ran strace on the command and found a few errors in the console.

       

      command options used:

      ./collect -o test.1.er -d /opt/oracle/solarisstudio12.4/bin -t 600 -p on -S on -s calibrate -i on -H on /etc/init.d/haproxy start

       

      with strace

       

      strace -e open,close ./collect -o test.1.er -d /opt/oracle/solarisstudio12.4/bin -t 600 -p on -S on -s calibrate -i on -H on /etc/init.d/haproxy start

      [ Process PID=27383 runs in 32 bit mode. ]

      open("/opt/oracle/solarisstudio12.4/bin/../../rtlibs/tls/i686/sse2/liber_dbe.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)

      open("/opt/oracle/solarisstudio12.4/bin/../../rtlibs/tls/i686/liber_dbe.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)

      open("/opt/oracle/solarisstudio12.4/bin/../../rtlibs/tls/sse2/liber_dbe.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)

      open("/opt/oracle/solarisstudio12.4/bin/../../rtlibs/tls/liber_dbe.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)

      open("/opt/oracle/solarisstudio12.4/bin/../../rtlibs/i686/sse2/liber_dbe.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)

      open("/opt/oracle/solarisstudio12.4/bin/../../rtlibs/i686/liber_dbe.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)

      open("/opt/oracle/solarisstudio12.4/bin/../../rtlibs/sse2/liber_dbe.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)

      open("/opt/oracle/solarisstudio12.4/bin/../../rtlibs/liber_dbe.so.1", O_RDONLY) = -1 ENOENT (No such file or directory)

       

      =========

       

      open("/opt/oracle/solarisstudio12.4/bin/test.1.er/data.frameinfo", O_RDWR) = 3

      close(3)                                = 0

      --- SIGPROF (Profiling timer expired) @ 0 (0) ---

      --- SIGPROF (Profiling timer expired) @ 0 (0) ---

      --- SIGPROF (Profiling timer expired) @ 0 (0) ---

      --- SIGPROF (Profiling timer expired) @ 0 (0) ---

      --- SIGPROF (Profiling timer expired) @ 0 (0) ---

       

      I have listed above the errors shown in the console. Could you please review them and let us know how to record an experiment for a long duration?

       

      Thanks

      Sattish.

        • 1. Re: Getting "SIGPROF - Profiling timer expired error" when we invoke collect command on a program
          User12616303-Oracle

          Some thoughts:
           
            - Not all Linux distributions are fully supported.  Adding the distribution version and kernel version to this post might be helpful.
           
            - collect may not have been tested with daemons.  Are you actually interested in haproxy itself, or some other process?  It could be that collect is having problems "following" the various forks/spawns to get to the process of interest.

            - What is the username listed for the process you actually want to measure?  Is that process a descendant of haproxy?  If root is involved, collect might be running into file permission problems.  For example, if a user process creates the parent experiment and a root subprocess then tries to write a subexperiment, the write may fail with an access permission error.
           
            - In case you are tempted to try -P to attach to a running PID:  attaching to multithreaded processes only works on Solaris.  If possible, I'd avoid the -P <pid> option on Linux.

            - Running collect with several tracing options enabled, in particular -i and -H, may not give you very useful results since each introduces significant overhead.  Note that collect's most common use case is time profiling ("collect -p"), so I'd try to get that working first.
           
            - It can be helpful to write experiments to a fast file system (e.g. -d /tmp) in order to get more accurate performance results.
           
            - By default, a binary archiving step occurs during collect; it can be turned off to aid in debugging with -A off.  (Post-run archiving can be done with er_archive).
           
            - Inside each experiment is a file called log.xml. The data includes collect version, options, start/stop times, errors, and info on descendent processes.  If you have descendent processes, there will also be subexperiments, each with a log.xml.  Take a look at the log.xml files to see if they make sense.  You can ask for help decoding the contents.
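
          The log.xml check in the last point could be scripted roughly as follows. This is only a minimal sketch and not part of the Studio tools: the function name list_experiment_logs is made up for illustration, and the only thing it assumes is that the experiment directory and each subexperiment directory inside it contain a file named log.xml, as described above.

```shell
# Hypothetical helper: list every log.xml under an experiment directory,
# covering the parent experiment and any subexperiments nested inside it.
list_experiment_logs() {
    exp_dir="$1"
    if [ ! -d "$exp_dir" ]; then
        echo "no such experiment directory: $exp_dir" >&2
        return 1
    fi
    # One path per line, sorted so the parent log.xml and the
    # subexperiment logs appear in a stable order.
    find "$exp_dir" -type f -name log.xml | sort
}
```

          For example, if the experiment was written to /tmp/test.1.er, running `list_experiment_logs /tmp/test.1.er` would print one path per experiment and subexperiment. How many paths you should see depends on how many descendent processes you expect.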
           
           
          So... maybe start with the simplest data collection options on a simple app that just does some busy work. E.g.
           
              collect -d /tmp -A off <my_standalone_app>
           
          If that works, try your application (/etc/init.d/haproxy):

              collect -d /tmp -A off /etc/init.d/haproxy start

          If the application seems to stop early, take a look at the log.xml files, and compare that to the descendent processes you expect.
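
          To have something to compare the log.xml files against, one way to list the processes that actually descend from a given PID is sketched below. The function names filter_descendants and list_descendants are hypothetical, and the sketch assumes a ps that accepts `-eo pid=,ppid=,user=,args=` (as procps on Linux does). Keep in mind that a daemon which detaches from its parent is re-parented, so it may not show up as a descendant of the init script's PID.

```shell
# Hypothetical helper: read "PID PPID USER COMMAND..." lines on stdin and
# print only the lines for processes that descend from the PID in $1.
filter_descendants() {
    awk -v root="$1" '
        { pid[NR] = $1; ppid[NR] = $2; line[NR] = $0 }
        END {
            want[root] = 1
            # Repeat until no new descendants are found, so the result
            # does not depend on the order of the input lines.
            do {
                changed = 0
                for (i = 1; i <= NR; i++) {
                    if ((ppid[i] in want) && !(pid[i] in want)) {
                        want[pid[i]] = 1
                        changed = 1
                    }
                }
            } while (changed)
            for (i = 1; i <= NR; i++) {
                if ((pid[i] in want) && pid[i] != root) print line[i]
            }
        }'
}

# Hypothetical wrapper: apply the filter to the live process table.
list_descendants() {
    ps -eo pid=,ppid=,user=,args= | filter_descendants "$1"
}
```

          Calling `list_descendants <pid>` with the PID of the process that collect launched prints the PID, parent PID, user, and command line of each descendant. The user column is also a quick way to spot a root-owned subprocess of the kind mentioned earlier.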

          You can also look at the experiment in Analyzer's timeline to see if it jibes with the log.xml files.

          If the haproxy run succeeds and the process is actually multithreaded, you could then try adding "-s on":
           
              collect -d /tmp -A off -s on /etc/init.d/haproxy start

          Finally, if you really need the data, you could try one tracing option (-i or -H) at a time, and possibly without -s or -p.   Runs longer than a few minutes can create very large experiments that take a long time to load.
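
          Since long runs can produce very large experiments, it may be worth checking the on-disk size before loading one. The following is a rough sketch only: check_experiment_size is a made-up name, and the default limit of 1 GiB is an arbitrary illustrative value, not a documented threshold.

```shell
# Hypothetical helper: print an experiment's size in KiB and return
# non-zero if it exceeds a limit (default 1048576 KiB, chosen arbitrarily).
check_experiment_size() {
    exp_dir="$1"
    limit_kb="${2:-1048576}"
    # du -sk reports the total size of the directory tree in KiB.
    size_kb=$(du -sk "$exp_dir" | awk '{ print $1 }')
    echo "$exp_dir: ${size_kb} KiB"
    if [ "$size_kb" -gt "$limit_kb" ]; then
        echo "warning: experiment exceeds ${limit_kb} KiB; loading may be slow" >&2
        return 1
    fi
    return 0
}
```

          For instance, `check_experiment_size /tmp/test.1.er` would report the size of an experiment written to /tmp, and a second argument overrides the limit.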