2 Replies. Latest reply: Jul 31, 2009 1:34 PM by 807567

    Weird process hang/loop

    807567
      All -

      We are running Solaris 10 on a T5240 (2 CPUs, 8 cores, configured to run 128 virtual CPUs). One of the programs that runs successfully on our Solaris 8 platform is behaving oddly on the new box: some executions intermittently "hang". The high-level sequence of events is that a shell script calls a COBOL program, and the COBOL program hangs. I ran truss -p against the COBOL program's PID. Two different hang scenarios produced two different truss outputs.

      Output 1:

      stat("M$FST345.TMP", 0xFFFFFFFF7FFFD830) Err#2 ENOENT
      open("M$FST345.TMP", O_RDWR|O_CREAT, 0666) Err#13 EACCES [file_dac_write]
      stat("M$FST346.TMP", 0xFFFFFFFF7FFFD6E8) Err#2 ENOENT
      stat("M$FST346.TMP", 0xFFFFFFFF7FFFD830) Err#2 ENOENT
      open("M$FST346.TMP", O_RDWR|O_CREAT, 0666) Err#13 EACCES [file_dac_write]
      stat("M$FST347.TMP", 0xFFFFFFFF7FFFD6E8) Err#2 ENOENT
      stat("M$FST347.TMP", 0xFFFFFFFF7FFFD830) Err#2 ENOENT
      open("M$FST347.TMP", O_RDWR|O_CREAT, 0666) Err#13 EACCES [file_dac_write]


      This repeats until the number after "M$FST" reaches 999, and then it starts over. The output scrolls by extremely fast, and the process is using a lot of CPU time.
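The first trace is consistent with a numbered temp-file search: for each candidate name the runtime stats the file (ENOENT), tries to create it with open(O_RDWR|O_CREAT), and gets EACCES. The [file_dac_write] tag means the create failed a write-permission check, most likely because the process cannot write to its current directory. Since every name in that directory fails the same way, wrapping back around after 999 can never succeed, which would explain the CPU usage. The Python sketch below is a hypothetical reconstruction, not the COBOL runtime's actual code: the function name, the zero-padded name format and the use of O_EXCL are assumptions. It tries each name once and lets a permission error surface instead of looping forever.

```python
import os


def create_temp_file(directory, prefix="M$FST", limit=1000):
    """Hypothetical sketch: create prefix000.TMP .. prefix999.TMP in `directory`.

    Each name is tried at most once (no wrap-around). O_EXCL makes the
    create fail with EEXIST if the name is taken, so no separate stat() is
    needed. A PermissionError (EACCES) is not caught: if the directory is
    not writable, trying the next number cannot help.
    """
    for n in range(limit):
        path = os.path.join(directory, "%s%03d.TMP" % (prefix, n))
        try:
            fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o666)
        except FileExistsError:
            continue  # name already in use; try the next number
        return fd, path
    raise FileExistsError("no free temp-file name left in %s" % directory)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        fd, path = create_temp_file(d)
        os.close(fd)
        print("created", path)
```

Failing fast this way turns the spin seen in truss into an error the calling shell script can report, pointing straight at the directory's permissions.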

      The second hang scenario's truss output is:

      nanosleep(0xFFFFFFFF7FFF7670, 0xFFFFFFFF7FFF7660) = 0
      lseek(115, 0, SEEK_SET) = 0
      read(115, "FE S0202040403FF\0010702".., 241) = 241
      fcntl(114, F_SETLK, 0xFFFFFFFF7FFF7130) Err#11 EAGAIN
      nanosleep(0xFFFFFFFF7FFF7670, 0xFFFFFFFF7FFF7660) = 0
      lseek(115, 0, SEEK_SET) = 0
      read(115, "FE S0202040403FF\0010702".., 241) = 241
      fcntl(114, F_SETLK, 0xFFFFFFFF7FFF7130) Err#11 EAGAIN

      The same sequence repeats, although this output scrolls by much more slowly.
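The second trace looks like a polling lock rather than a spin: sleep, re-read a 241-byte record from descriptor 115, then attempt a non-blocking F_SETLK on descriptor 114, which fails with EAGAIN because another process holds a conflicting lock. The process is waiting on a lock holder that never releases. A minimal Python sketch of the same pattern with a timeout added, assuming fcntl.lockf (a wrapper around the fcntl() record-locking calls; with LOCK_NB it does not block) as a stand-in for the runtime's F_SETLK call. The function name, interval and timeout are hypothetical.

```python
import errno
import fcntl
import time


def lock_with_timeout(fd, timeout=30.0, interval=0.5):
    """Hypothetical polling lock with a deadline.

    Tries a non-blocking exclusive lock on the whole file, sleeps between
    attempts, and raises TimeoutError instead of retrying forever.
    Returns the number of attempts it took. `fd` must be open for writing.
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        attempts += 1
        try:
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return attempts
        except OSError as e:
            # A conflicting lock is reported as EAGAIN or EACCES,
            # depending on the operating system.
            if e.errno not in (errno.EAGAIN, errno.EACCES):
                raise
        if time.monotonic() >= deadline:
            raise TimeoutError("lock still held by another process after %.1fs" % timeout)
        time.sleep(interval)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryFile() as f:
        print("acquired after", lock_with_timeout(f.fileno()), "attempt(s)")
```

With a deadline, a stuck lock shows up as an error instead of a silent hang. Running pfiles against the hung PID lists its open files, which shows what descriptor 114 refers to.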

      I apologize for what is almost certainly a lack of detail here; I am a DBA helping out :-) Any advice is appreciated. The processes are still hung right now.

      Thanks -

      Mike
        • 1. Re: Weird process hang/loop
          807567
          The stack from the core file:

          core 'core.13085' of 13085: /fcicrd/vcs/prod/das/das2008/bin/das2008
          ffffffff7ebd7f90 stat (ffffffff7fffdae8, ffffffff7fffdbd4, 870b, ffffffff7fffdafc, ffffffff7f5b4cf0, 11d974) + 8
          ffffffff7f430378 disk_open (ffffffff7fffdae8, ffffffff7fffdbd4, 870b, ffffffff7fffdafc, 184a34, ffffffff7f5b4cf0) + c0
          ffffffff7f44dc24 mFt_rt_fsys_file_open (100b047b8, 870b, ffffffff7fffdfdc, 0, 13, 17ca4c) + 74
          ffffffff7f4963c8 osm_open (ffffffff7fffe0a8, ffffffff7fffdbd4, 3, 870b, ffffffff7fffe0b8, 1) + 230
          ffffffff7f492514 CBL_CREATE_FILE (122874, ffffffff7dd5fa32, ffffffff7dd5fa33, ffffffff7dd5fa34, ffffffff7dd6132c, 0) + a4
          ffffffff7dc0f81c ???????? (ffffffff7dd9a760, 1, ffffffff7dd98760, ffffffff7dc59b90, ffffffff7dd60760, ffffffff7dd5f8f0)
          0000000100163e6c ???????? (101, 100341e30, 10022c408, 100229118, 10033fe30, 10033e768)
          ffffffff7f4b29d0 ???????? (100002418, 0, ffffffff7ffff398, 8, ffffffff7ffff288, 100b0ec18)
          ffffffff7f48448c _mFgmain2 (100002418, 0, 10033edb8, ffffffff7ffff3a0, ffffffff7f5b4cf0, 68) + fc
          0000000100002404 ???????? (1, ffffffff7ffff538, ffffffff7ffff548, ffffffff7eb4b820, ffffffff7f100140, ffffffff7f700200)
          000000010000239c _start (0, 0, 0, 0, 0, 0) + 17c

          • 2. Re: Weird process hang/loop
            807567
            I think we figured this one out: it was an issue with COBOL and locking. The pfiles command was definitely useful.

            MM