13 Replies Latest reply on Jun 28, 2010 11:58 PM by 807559

    Not enough memory space on solaris 10

    807559
      Hi,
      I have various job submission clients pumping jobs to my application on Solaris 10 over a 24-hour period. After 6 hours I get a "Not enough memory space" exception.
      I am running Solaris 10 with the default configuration as shipped by Sun.
      Please help me out.
        • 1. Re: Not enough memory space on solaris 10
          807559
          Hi,

          we have an E25K that returns the same error a few hours after a Solaris migration (9 --> 10).
          We have installed the Solaris Resource Manager for managing sessions on the server. Currently the system is semi-stable, but the error

          prstat: not enough memory: Not enough space

          still occurs, for example when starting prstat.
          • 2. Re: Not enough memory space on solaris 10
            807559
            I found that a project had a huge number of file descriptors set, e.g. 10000.
            After ulimit -n 1024, prstat started working again:

            PROJID NPROC  SIZE   RSS MEMORY    TIME  CPU PROJECT
                 0    93 9495M 7860M    98% 0:03:04 1.1% system
                 1     6   28M   17M   0.2% 0:00:00 0.1% user.root
               200     1 3048K 2392K   0.0% 0:00:00 0.0% group.mqm
                 3    10   46M   25M   0.3% 0:00:00 0.0% #default
            • 3. Re: Not enough memory space on solaris 10
              807559
              The shell we use is tcsh. So I tried # limit descriptors 1024.
              ulimit -a
              time(seconds) unlimited
              file(blocks) unlimited
              data(kbytes) unlimited
              stack(kbytes) 8192
              coredump(blocks) 0
              nofiles(descriptors) 1024
              vmemory(kbytes) unlimited

              But prstat complained again: >prstat
              prstat: not enough memory: Not enough space

              Any ideas? If this problem concerned only prstat, it would not matter to us, but we are also having other problems (bind crashes) on this machine. According to vmstat, memory is not the problem:
              vmstat 5 3
              kthr memory page disk faults cpu
              r b w swap free re mf pi po fr de sr m0 m1 m2 m5 in sy cs us sy id
              0 0 0 11577560 1676136 5 17 10 0 0 0 0 1 1 1 0 262 105 235 47 17 36
              1 0 0 12419992 2493288 0 8 0 0 0 0 0 0 0 0 0 3560 24854 2456 70 29 1
              1 0 0 12419840 2493136 0 0 0 0 0 0 0 0 0 0 0 3750 26288 2154 68 32 1

              Any help would be appreciated...
              • 4. Re: Not enough memory space on solaris 10
                807559
                You could try running prstat with truss ( truss -f -o /tmp/out prstat ) and see which syscall is actually failing. Do you have a large number of projects/zones on this machine?
                • 5. Re: Not enough memory space on solaris 10
                  807559
                  truss -f -o /tmp/out prstat
                  prstat: not enough memory: Not enough space
                  I do not have any projects/zones on this machine
                  • 6. Re: Not enough memory space on solaris 10
                    807559
                    uh, so what did it say in /tmp/out? There should be an ENOMEM somewhere. Do you have swap configured? (what does swap -l say?)
                    • 7. Re: Not enough memory space on solaris 10
                      807559
                      swap -l
                      swapfile             dev  swaplo blocks   free
                      /dev/md/dsk/d5      85,5      16 20480096 20480096
                      and this is the output of /tmp/out:
                      root@ns> truss -f -o /tmp/out prstat
                      prstat: not enough memory: Not enough space
                      
                      root@ns> cat /tmp/out
                      10810:  execve("/usr/bin/prstat", 0xFFBFFD14, 0xFFBFFD1C)  argc = 1
                      10810:  resolvepath("/usr/lib/ld.so.1", "/usr/lib/ld.so.1", 1023) = 16
                      10810:  resolvepath("/usr/bin/prstat", "/usr/bin/prstat", 1023) = 15
                      10810:  stat("/usr/bin/prstat", 0xFFBFFAE0)             = 0
                      10810:  open("/var/ld/ld.config", O_RDONLY)             = 3
                      10810:  fstat(3, 0xFFBFF560)                            = 0
                      10810:  mmap(0x00000000, 152, PROT_READ, MAP_SHARED, 3, 0) = 0xFF3B0000
                      10810:  close(3)                                        = 0
                      10810:  stat("/lib/libc.so.1", 0xFFBFF5E8)              = 0
                      10810:  resolvepath("/lib/libc.so.1", "/usr/lib/libc.so.1", 1023) = 18
                      10810:  open("/lib/libc.so.1", O_RDONLY)                = 3
                      10810:  mmap(0x00010000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFF3A0000
                      10810:  mmap(0x00010000, 802816, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF280000
                      10810:  mmap(0xFF280000, 704380, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF280000
                      10810:  mmap(0xFF33C000, 24560, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 704512) = 0xFF33C000
                      10810:  mmap(0xFF342000, 6792, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFF342000
                      10810:  munmap(0xFF32C000, 65536)                       = 0
                      10810:  memcntl(0xFF280000, 117768, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
                      10810:  close(3)                                        = 0
                      10810:  stat("/lib/libdl.so.1", 0xFFBFF5E8)             = 0
                      10810:  resolvepath("/lib/libdl.so.1", "/usr/lib/libdl.so.1", 1023) = 19
                      10810:  open("/lib/libdl.so.1", O_RDONLY)               = 3
                      10810:  mmap(0xFF3A0000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3A0000
                      10810:  mmap(0x00002000, 8192, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFF3AA000
                      10810:  mmap(0xFF3AA000, 2210, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3AA000
                      10810:  close(3)                                        = 0
                      10810:  stat("/usr/platform/SUNW,Sun-Fire-V210/lib/libc_psr.so.1", 0xFFBFF2E8) = 0
                      10810:  resolvepath("/usr/platform/SUNW,Sun-Fire-V210/lib/libc_psr.so.1", "/usr/platform/sun4u-us3/lib/libc_psr.so.1", 1023) = 41
                      10810:  open("/usr/platform/SUNW,Sun-Fire-V210/lib/libc_psr.so.1", O_RDONLY) = 3
                      10810:  mmap(0xFF3A0000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFF3A0000
                      10810:  close(3)                                        = 0
                      10810:  mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFF390000
                      10810:  getustack(0xFFBFF924)
                      10810:  getrlimit(RLIMIT_STACK, 0xFFBFF91C)             = 0
                      10810:  getcontext(0xFFBFF758)
                      10810:  setustack(0xFF343A0C)
                      10810:  brk(0x00021150)                                 = 0
                      10810:  brk(0x00023150)                                 = 0
                      10810:  sysinfo(SI_ISALIST, "", 1)                      = 98
                      10810:  sysinfo(SI_ISALIST, "sparcv9+vis2 sparcv9+vis sparcv9 sparcv8plus+vis sparcv8plus sparcv8 sparcv8-fsmuld sparcv7 sparc", 98) = 98
                      10810:  open("/proc/self/auxv", O_RDONLY)               = 3
                      10810:  fstat(3, 0xFFBFFB00)                            = 0
                      10810:  read(3, "\0\007D8FFBFFFD7\0\007DE".., 152)      = 152
                      10810:  close(3)                                        = 0
                      10810:  access("/usr/bin/sparcv9+vis2/prstat", 1)       Err#2 ENOENT
                      10810:  access("/usr/bin/sparcv9+vis/prstat", 1)        Err#2 ENOENT
                      10810:  access("/usr/bin/sparcv9/prstat", 1)            = 0
                      10810:  execve("/usr/bin/sparcv9/prstat", 0xFFBFFD14, 0xFFBFFD1C)  argc = 1
                      10810:  mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFFFFFFFF7F500000
                      10810:  resolvepath("/usr/lib/sparcv9/ld.so.1", "/usr/lib/sparcv9/ld.so.1", 1023) = 24
                      10810:  resolvepath("/usr/bin/sparcv9/prstat", "/usr/bin/sparcv9/prstat", 1023) = 23
                      10810:  stat("/usr/bin/sparcv9/prstat", 0xFFFFFFFF7FFFF858) = 0
                      10810:  open("/var/ld/64/ld.config", O_RDONLY)          Err#2 ENOENT
                      10810:  stat("/usr/lib/64/libc.so.1", 0xFFFFFFFF7FFFEE70) = 0
                      10810:  resolvepath("/usr/lib/64/libc.so.1", "/usr/lib/sparcv9/libc.so.1", 1023) = 26
                      10810:  open("/usr/lib/64/libc.so.1", O_RDONLY)         = 3
                      10810:  mmap(0x00100000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_ALIGN, 3, 0) = 0xFFFFFFFF7F400000
                      10810:  mmap(0x00100000, 1859584, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFFFFFFFF7F200000
                      10810:  mmap(0xFFFFFFFF7F200000, 741688, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFFFFFFFF7F200000
                      10810:  mmap(0xFFFFFFFF7F3B6000, 52744, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 745472) = 0xFFFFFFFF7F3B6000
                      10810:  mmap(0xFFFFFFFF7F3C4000, 3960, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_ANON, -1, 0) = 0xFFFFFFFF7F3C4000
                      10810:  munmap(0xFFFFFFFF7F2B6000, 1048576)             = 0
                      10810:  memcntl(0xFFFFFFFF7F200000, 159008, MC_ADVISE, MADV_WILLNEED, 0, 0) = 0
                      10810:  close(3)                                        = 0
                      10810:  stat("/usr/lib/64/libdl.so.1", 0xFFFFFFFF7FFFEE70) = 0
                      10810:  resolvepath("/usr/lib/64/libdl.so.1", "/usr/lib/sparcv9/libdl.so.1", 1023) = 27
                      10810:  open("/usr/lib/64/libdl.so.1", O_RDONLY)        = 3
                      10810:  mmap(0xFFFFFFFF7F400000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFFFFFFFF7F400000
                      10810:  mmap(0x00002000, 8192, PROT_NONE, MAP_PRIVATE|MAP_NORESERVE|MAP_ANON|MAP_ALIGN, -1, 0) = 0xFFFFFFFF7F100000
                      10810:  mmap(0xFFFFFFFF7F100000, 2614, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFFFFFFFF7F100000
                      10810:  close(3)                                        = 0
                      10810:  stat("/usr/platform/SUNW,Sun-Fire-V210/lib/sparcv9/libc_psr.so.1", 0xFFFFFFFF7FFFE800) = 0
                      10810:  resolvepath("/usr/platform/SUNW,Sun-Fire-V210/lib/sparcv9/libc_psr.so.1", "/usr/platform/sun4u-us3/lib/sparcv9/libc_psr.so.1", 1023) = 49
                      10810:  open("/usr/platform/SUNW,Sun-Fire-V210/lib/sparcv9/libc_psr.so.1", O_RDONLY) = 3
                      10810:  mmap(0xFFFFFFFF7F400000, 8192, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED, 3, 0) = 0xFFFFFFFF7F400000
                      10810:  mmap(0x00000000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANON, -1, 0) = 0xFFFFFFFF7F000000
                      10810:  close(3)                                        = 0
                      10810:  getustack(0xFFFFFFFF7FFFF3E8)
                      10810:  getrlimit(RLIMIT_STACK, 0xFFFFFFFF7FFFF3D8)     = 0
                      10810:  getcontext(0xFFFFFFFF7FFFF0F0)
                      10810:  setustack(0xFFFFFFFF7F3C4EE8)
                      10810:  brk(0x100115250)                                = 0
                      10810:  brk(0x100119250)                                = 0
                      10810:  getrlimit(RLIMIT_NOFILE, 0xFFFFFFFF7FFFFA90)    = 0
                      10810:  setrlimit(RLIMIT_NOFILE, 0xFFFFFFFF7FFFFA90)    = 0
                      10810:  fstat(2, 0xFFFFFFFF7FFFE520)                    = 0
                      10810:  write(2, " p r s t a t", 6)                     = 6
                      10810:  write(2, " :  ", 2)                             = 2
                      10810:  write(2, " n o t   e n o u g h   m".., 17)      = 17
                      10810:  write(2, " :  ", 2)                             = 2
                      10810:  write(2, " N o t   e n o u g h   s".., 16)      = 16
                      10810:  write(2, "\n", 1)                               = 1
                      10810:  _exit(1)
                      Sorry, but I didn't see any ENOMEM in /tmp/out..
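
For anyone repeating the truss step: failed system calls in truss output are marked with "Err#" followed by the errno number and name (as in the "Err#2 ENOENT" lines above), so the log can be filtered instead of read by eye. A minimal sketch, assuming the log was written to /tmp/out as in this thread; the function name truss_errors is made up for illustration.

```shell
#!/bin/sh
# truss_errors FILE: print only the failed system calls from a truss log.
# truss marks failures with "Err#<number> <NAME>", e.g. "Err#2 ENOENT".
truss_errors() {
    grep 'Err#' "$1"
}

# Example: show only out-of-memory failures, if there are any.
# truss_errors /tmp/out | grep ENOMEM
```

Run against the log posted above, this would list only the three ENOENT lines, which is consistent with the observation that no ENOMEM appears in the trace.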
                      • 8. Re: Not enough memory space on solaris 10
                        807559
                        Hmmm. What does nofiles say if you do a "ulimit -H"? If it says unlimited, it looks like the prstat source will puke with that value. You could try doing a "ulimit -H -n 65536" and see if prstat works.
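
The soft and hard descriptor limits can be printed separately, which avoids any doubt about which resource is being reported: in sh-type shells, ulimit with no resource flag reports the file-size limit, not nofiles. A small sketch for sh-compatible shells (tcsh uses "limit" and "limit -h" instead); the function name is illustrative only.

```shell
#!/bin/sh
# show_nofile_limits: print the soft and hard open-file limits of this shell.
# -S selects the soft limit, -H the hard limit, -n the descriptor count.
show_nofile_limits() {
    echo "soft=$(ulimit -S -n) hard=$(ulimit -H -n)"
}

show_nofile_limits
```

If the hard value comes back as "unlimited", that is the case described above as potentially troublesome for prstat.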
                        • 9. Re: Not enough memory space on solaris 10
                          807559
                          Thanks for your reply.
                          I tried setting the hard limit:
                          root> ulimit -H
                          unlimited
                          root> ulimit -H -n 65536
                          root> prstat
                          prstat: not enough memory: Not enough space
                          But this didn't work either...
                          Any other ideas?
                          • 10. Re: Not enough memory space on solaris 10
                            807559
                            Sorry... guess I should have read things more carefully. Thought the problem was that it never ran, instead of it ran and stopped. Prstat seems to dump that error when it cannot reallocate some memory. So, either a) there's a bug in realloc or b) you are actually out of memory. So, try the following:
                            1) when it is failing, run vmstat and look at what it says for swap. That number should have more than 5 digits.
                            2) try ulimit -n 12; prstat
                            This should cause prstat to allocate much less memory on initialization
                            3) try running with a different malloc library - put this into a shell script and run it:

                            #!/bin/sh
                            LD_PRELOAD=watchmalloc.so.1
                            MALLOC_DEBUG=RW
                            export LD_PRELOAD MALLOC_DEBUG
                            prstat

                            -r
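
Step 1 above can be turned into a quick check. This is only a sketch: it assumes the Solaris vmstat column order "r b w swap free ...", so that swap is the fourth field, and it applies the "more than 5 digits" rule of thumb literally. The function name check_vmstat_swap is hypothetical.

```shell
#!/bin/sh
# check_vmstat_swap: read vmstat output on stdin and flag rows with low swap.
# Header lines are skipped because their first field is not a number.
# A swap value above 99999 (more than 5 digits) is reported as "ok".
check_vmstat_swap() {
    awk '$1 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ {
        if ($4 + 0 > 99999)
            print "swap=" $4 " ok"
        else
            print "swap=" $4 " LOW"
    }'
}

# Example: vmstat 5 3 | check_vmstat_swap
```

It should be run at the moment prstat is failing, since swap may look fine at other times.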
                            • 11. Re: Not enough memory space on solaris 10
                              807559
                              Below are the answers to your questions:
                              1)
                              prstat
                              prstat: not enough memory: Not enough space
                              vmstat
                              kthr memory page disk faults cpu
                              r b w swap free re mf pi po fr de sr m0 m1 m2 m5 in sy cs us sy id
                              0 0 0 11815656 1927072 4 19 3 0 0 0 0 1 1 1 0 890 176 1516 41 17 42

                              2)
                              ulimit -n 12; prstat
                              prstat: not enough memory: Not enough space

                              3)
                              cat prstat-test.sh
                              #!/bin/sh -x
                              LD_PRELOAD=watchmalloc.so.1
                              MALLOC_DEBUG=RW
                              export LD_PRELOAD MALLOC_DEBUG
                              prstat
                              ./prstat-test.sh;vmstat
                              LD_PRELOAD=watchmalloc.so.1
                              MALLOC_DEBUG=RW
                              + export LD_PRELOAD MALLOC_DEBUG
                              + prstat
                              prstat: not enough memory: Not enough space
                              kthr memory page disk faults cpu
                              r b w swap free re mf pi po fr de sr m0 m1 m2 m5 in sy cs us sy id
                              0 0 0 11815728 1927152 4 19 3 0 0 0 0 1 1 1 0 890 179 1516 41 17 42
                              • 12. Re: Not enough memory space on solaris 10
                                791266
                                mk789 wrote:
                                Solaris 10 by default places /tmp on swap. This is good for speed, but not so good on a general-purpose box where some applications may fill up /tmp. If you fill /tmp, you essentially reduce the amount of available swap to 0. This can lead to trouble: if the box then runs out of physical RAM, new processes may not start. You get lovely fork() errors in the shell, and interesting messages in dmesg:

                                # ps -ef
                                -bash: fork: Not enough space
                                # free
                                -bash: fork: Not enough space
                                # prstat
                                -bash: fork: Not enough space
                                ...
                                # dmesg
                                ...
                                Dec 7 02:56:27 w01.someserver.everycity.co.uk genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 8193 (munin-node)
                                Dec 7 02:56:51 w01.someserver.everycity.co.uk tmpfs: [ID 518458 kern.warning] WARNING: /tmp: File system full, swap space limit exceeded
                                Dec 7 02:56:57 w01.someserver.everycity.co.uk genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 8223 (exim)
                                Dec 7 02:57:26 w01.someserver.everycity.co.uk genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 563 (httpd)

                                The easiest way to fix this is to immediately disable any services that eat ram using svcadm disable, and clear out /tmp. You can then either move /tmp to a physical partition by editing /etc/vfstab, increase the amount of swap, or my favourite, limit the amount of swap /tmp can use by adding a mount option to /etc/vfstab:

                                # grep /tmp /etc/vfstab
                                swap - /tmp tmpfs - yes size=2048m

                                Unfortunately with this you have to reboot the box, which wasn’t an option with the machine I was running on. So I added a bunch more swap for the time being.
                                @mk789

                                It's great that you want to help, but please don't use this site as a link farm. I'm blocking your post, but what you posted (except for the link) can be seen in my quote.

                                Kaj
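
Since the quoted post is about /tmp filling up swap, a periodic check of how full /tmp is may catch the problem before fork() starts failing. A rough sketch, assuming the usual "df -k" layout in which the fifth column of the data row is the capacity; the function names and the 90% threshold are arbitrary choices for illustration.

```shell
#!/bin/sh
# tmp_capacity: read "df -k /tmp" output on stdin and print the percentage used.
# Expects a data row such as "swap 2097152 1992294 104858 95% /tmp";
# adding 0 turns the "95%" field into the number 95.
tmp_capacity() {
    awk 'NR > 1 && $5 ~ /%$/ { print $5 + 0 }'
}

# warn_if_full PCT LIMIT: print a warning when PCT is at or above LIMIT.
warn_if_full() {
    if [ "$1" -ge "$2" ]; then
        echo "WARNING: /tmp is ${1}% full"
    else
        echo "/tmp is ${1}% full"
    fi
}

# Example (90 is an arbitrary threshold):
# warn_if_full "$(df -k /tmp | tmp_capacity)" 90
```

This only reports on /tmp itself; total swap still needs to be checked separately with swap -l or vmstat.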
                                • 13. Re: Not enough memory space on solaris 10
                                  807559
                                  Hello, I had a similar problem where not even the 'reboot' command was working, and the server was a few miles away, so the problem was big, hehe.

                                  An awful workaround if you have no other choice:

                                  cd /proc
                                  echo *

                                  0 147 212 24021 24134 24168 24248 24317 24549 24563 3 318 425 489 7
                                  1 2 22656 24023 24138 24173 24251 24474 24551 24565 301 330 426 503 702
                                  120 202 22663 24025 24139 24174 24265 24476 24553 24567 306 333 455 506 726
                                  124 20881 22665 24080 24143 24175 24289 24520 24555 24569 307 337 458 508 884
                                  135 20888 22670 24085 24146 24176 24299 24524 24557 24571 308 350 459 538 9
                                  140 20890 24003 24126 24150 24241 24300 24546 24559 24573 313 352 469 542
                                  145 20902 24004 24132 24160 24242 24302 24547 24561 24574 316 354 481 547

                                  (This works like an ls/ps command.)
                                  These are the actual PIDs, so I sent signal 9 (kill -9) to four or five of the higher ones (for example kill -9 24302; kill -9 24242 ...). After that I was able to restart zones and applications. (DON'T KILL number 0!!! PLEASE!!! Use the higher PIDs, OK?)

                                  I don't recommend this unless you really really need to...
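
The reason "echo *" still works when ls and ps do not is that echo and filename globbing are built into the shell, so no new process has to be forked. The same idea can be pushed a little further to find the highest PID without starting any external command. This is a sketch only: it assumes a POSIX-style shell such as ksh or bash (the "${var##pattern}" expansion is not available in the original Bourne shell), and highest_pid is a made-up name. It only lists; it does not kill anything.

```shell
#!/bin/sh
# highest_pid DIR: print the largest all-digit entry name in DIR (default /proc).
# Uses only shell builtins (glob, case, test, echo), so nothing is forked.
highest_pid() {
    max=-1
    for entry in "${1:-/proc}"/*; do
        name=${entry##*/}                 # strip the leading directory part
        case "$name" in
            ''|*[!0-9]*) continue ;;      # skip names that are not all digits
        esac
        [ "$name" -gt "$max" ] && max=$name
    done
    echo "$max"
}

# Example: highest_pid /proc
```

Call it directly rather than inside $(...), since command substitution itself needs a fork in most shells.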