ORA-27102 SVR4 Error: 12: Not enough space

  • lrp
    lrp Member Posts: 85
    edited Jun 4, 2009 6:42PM
    to SB: This symptom has shown up across our dev and production databases every 2 months or so. Judging from the alert log, it tends to happen when an archive/backup coincides with a recompile, auto-stats gather, or snapshot. So "load" would appear to cause it, but nothing I would consider heavy I/O activity.

    orafad: The Metalink note for ORA-27063 does mention this, and I take it to be a rather generic note. The real culprit has to be the codes behind the 27063, which is what I'm trying to get to the bottom of:
    SVR4 = a header error indicating some OS 'thing'
    Error 12 = ???? Solaris memory error. This appears to show up for most 'capacity-related' things in Google searches: file descriptors, semaphores, swap, and shared memory.
    Additional information = -1 error state, you're most likely correct.
    Additional information: 8192 = could mean anything, and that's what I need to find out from either Metalink support (in progress) or Solaris's knowledge base.

    Update: I apologize, orafad -- you mentioned the ENOMEM, which I missed. Where did you find this? I'd love to have an additional resource to look up the error code. Our memory_target/memory_max_target were set at 5GB/8GB, with the shared memory set at 20GB and overall physical memory at 32GB. Our sysadmin logs showed no memory usage or swap errors, leading me to believe it was not a general 'out of memory' error so much as a kernel resource setting (semaphores, or a per-process limit of some sort).

    Edited by: lrp on Jun 1, 2009 5:50 PM

    Edited by: lrp on Jun 1, 2009 5:53 PM
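
One way to look up an OS error number without waiting on a support note is to ask the platform's own C library. The following is only a minimal sketch, and it assumes Python is available on the host (nothing in this thread uses it); `describe_errno` is a made-up helper name. It maps an error number to its symbolic name and to the message the local OS uses for it:

```python
import errno
import os


def describe_errno(code):
    """Return (symbolic name, platform message) for an OS error number."""
    # errno.errorcode maps numbers to names such as "ENOMEM".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror returns the local platform's wording for that number.
    return name, os.strerror(code)


for code in (12, 28):
    name, message = describe_errno(code)
    print(f"{code}: {name} - {message}")
```

The symbolic name is the portable part. The message wording is platform-specific, so on Solaris error 12 should read "Not enough space" as in the alert log, while other systems may phrase it differently.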
  • sb92075
    sb92075 Member Posts: 42,196 Blue Diamond
    This symptom has shown up across our dev
    If I were in your shoes, I'd do what I could to change the underlying file system.
    If problem still happens on different fs, then file system flavor can be ruled out as possible root cause.

    Again, I am fairly certain Oracle is the victim & not the culprit.
    Proving who or what is to blame will be a battle.

    Happy Hunting!
  • lrp
    lrp Member Posts: 85
    Because this happens so infrequently, I need a way to measure what is happening (be it stack or file descriptors). I've already got scripts tallying those resources per process on a 5-minute interval, so I'm hoping to be prepared for the next occurrence (could be next week, could be a month from now). Unfortunately, moving to another filesystem is going to be rather hard to make a case for, since we would have no way to really identify whether the experiment was successful.

    Thanks for your time in this.
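
For the per-process tallying described above, a minimal sketch of the file descriptor part might look like the following. It assumes a procfs that exposes one entry per open descriptor under `/proc/<pid>/fd` and enough privilege to read that directory for the Oracle processes; the helper names and the `proc_root` parameter are illustrative, not taken from lrp's actual scripts.

```python
import os


def count_open_fds(pid, proc_root="/proc"):
    """Count entries under <proc_root>/<pid>/fd, one per open descriptor."""
    return len(os.listdir(os.path.join(proc_root, str(pid), "fd")))


def fd_headroom(pid, limit, proc_root="/proc"):
    """Return how many descriptors remain before the per-process limit."""
    return limit - count_open_fds(pid, proc_root)
```

Logging `fd_headroom(pid, limit)` for each database process on every sampling run, with `limit` set to whatever `ulimit -n` reports for the oracle user, would show whether any process was close to its descriptor limit just before the next failure.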
  • sb92075
    sb92075 Member Posts: 42,196 Blue Diamond
    Are there any additional clues in OS messages or dmesg logs?
  • lrp
    lrp Member Posts: 85
    Absolutely no /var/adm/messages entries related to the failure (which is what keeps pissing me off, since Metalink's notes on the 27063 error routinely point me to the OS logs) -- I just spoke with the SA, and most of the messages get piped there. He did have sar available with a viewer, and I also had charts for the disk I/O at the time -- nothing stood out.
  • 561365
    561365 Member Posts: 132
    Solaris 10 requires a shared memory limit to be set for the project:

    projadd -U oracle -K "project.max-shm-memory=(priv,4096MB,deny)" user.oracle

    The memory specified above should be more than the actual SGA and PGA you are using.

    Thanks
    561365
  • lrp
    lrp Member Posts: 85
    edited Jun 2, 2009 4:32PM
    That was the first thing we increased -- our 11g memory_target (which is the combined SGA+PGA, automatically managed by the instance) is at 5GB, with memory_max_target = 8G. While it is the only database in this Solaris project, there ARE other databases on the prod server, each contained within its own project (i.e. oraproj2) and ORACLE_HOME.

    If I'm reading the results below correctly, I believe we have 16GB of shared memory available:
    [email protected]:PROD01:/var/adm> projects -l oraproj
    oraproj
            projid : 102
            comment: ""
            users  : oracle
            groups : (none)
            attribs: project.max-sem-ids=(priv,200,deny)
                     project.max-sem-nsems=(priv,512,deny)
                     project.max-shm-ids=(priv,200,deny)
                     project.max-shm-memory=(priv,17179869184,deny)
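
The 16GB reading can be checked by converting the `project.max-shm-memory` byte count. This is a small sketch that parses the `projects -l` output pasted above (copied into a string here) and converts it to GiB; `parse_attribs` is a made-up helper, and its pattern only handles plain numeric values as printed in this output, not values with a unit suffix.

```python
import re

# Output of `projects -l oraproj` as pasted in the post above.
PROJECTS_OUTPUT = """\
oraproj
        projid : 102
        comment: ""
        users  : oracle
        groups : (none)
        attribs: project.max-sem-ids=(priv,200,deny)
                 project.max-sem-nsems=(priv,512,deny)
                 project.max-shm-ids=(priv,200,deny)
                 project.max-shm-memory=(priv,17179869184,deny)
"""


def parse_attribs(text):
    """Map each resource control name to its numeric limit."""
    pattern = re.compile(r"(project\.[\w-]+)=\(\w+,(\d+),\w+\)")
    return {name: int(value) for name, value in pattern.findall(text)}


limits = parse_attribs(PROJECTS_OUTPUT)
shm_gib = limits["project.max-shm-memory"] / 2**30
print(f"max-shm-memory = {shm_gib:g} GiB")  # prints: max-shm-memory = 16 GiB
```

So the project cap is 16 GiB, well above the 8G maximum memory setting of this instance. Note that this is the configured cap for the project, not a measurement of how much shared memory is in use.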
  • jgarry
    jgarry Member Posts: 13,842
    lrp wrote:
    Good thoughts, but we had ruled that out early: Filesystem's not the issue, the disk had plenty of space. Metalink itself (472813.1) points to "the Unix error number (ERRNO) 12 during a Unix write()/open() system call, and this Unix error indicates a lack of *process memory* rather than physical disk space."

    - /var/adm/messages has no memory- or disk-related messages around the time of failure.
    - SAN administrator saw nothing in their logs at the time of failure

    We had already tried raising SGA and raising shared memory in the solaris project, but it seems like we're fishing for an answer by blindly raising a parameter when we don't know what OS limit Oracle had reached. The key numbers I'm looking for are those specified in the 'additional information' section. Oracle's knowledge base has nothing that I can use so far.

    On a different version and platform, I had rare issues when running RMAN. Eventually I came to the conclusion that: RMAN uses large pool; OS eventually fragments I/O buffers due to the way RMAN uses I/O on that platform. It is of course wild speculation that this has anything to do with your problem, but given the lack of real information, I'd say you perhaps want to shrink your SGA some, give more to large pool, and pray a lot. Oh, and sample the large pool SGA statistics when you are running RMAN, and wonder if the parallel automatic tuning is shooting your large_pool_size in the leg if you are using it.
    jgarry
  • sb92075
    sb92075 Member Posts: 42,196 Blue Diamond
    I just stumbled across this:

    Subject: Upon startup of Linux database get ORA-27102: out of memory Linux-X86_64 Error: 28: No space left on device
    Doc ID: 301830.1

    Let us know if it helped.
  • lrp
    lrp Member Posts: 85
    edited Jun 4, 2009 3:43PM
    sb92075 wrote:
    Subject: Upon startup of Linux database get ORA-27102: out of memory Linux-X86_64 Error: 28: No space left on device
    Doc ID: 301830.1

    Thank you. I looked up the article, and it shows a similar message but with a key difference in the error code -- that OS code was error: 28, while my error was error: 12. The Linux and Solaris error code tables are similar, so for OS error 28, the Solaris error code table shows:
    {code}
    28 ENOSPC No space left on device
    While writing an ordinary file or creating a directory entry, there is no free space
    left on the device. In the fcntl routine, the setting or removing of record locks
    on a file cannot be accomplished because there are no more record entries left
    on the system.
    {code}
    By the same token, the OS error code 12 that I'm seeing appears to mean:
    {code}
    12 ENOMEM Not enough space
    During execution of an exec, brk, or sbrk routine, a program asks for more space
    than the system is able to supply. This is not a temporary condition; the maximum
    size is a system parameter. On some architectures, the error may also occur if the
    arrangement of text, data, and stack segments requires too many
    segmentation registers, or if there is not enough swap space during the fork
    routine. If this error occurs on a resource associated with Remote File
    Sharing (RFS), it indicates a memory depletion which may be temporary,
    dependent on system activity at the time the call was invoked.
    {code}
    Emphasis on the clause "..*text, data, and stack segments requires too many segmentation registers, or if there is not enough swap space*.."

    ..So my clues from the OS documentation point to some maximum, like swap, segmentation, stack and other resources.
    I can only assume the "additional information" of -1 and 8192 are relevant numbers to those resources.

    The two things that are *8192* in my environment appear to be # of file descriptors and stack size.

    Therefore, my plan of attack is to change both in my oracle profile and see if this occurs again:

    ulimit -n 16384 (raise file descriptors per process from 8k to 16k)
    ulimit -s 32767 (raise stack from 8MB to 32MB)

    Hopefully, this will give Oracle more leeway to use OS resources AND give me extra clues if the error shows up again. In other words, if something like stack is truly the issue, then I expect to see another crash with additional information = *32767* instead of 8192:
    KCF: write/open error block=0x1571 online=1
    file=57 /datafile/DB_001.dbf
    error=27063 txt: 'SVR4 Error: 12: Not enough space
    Additional information: -1
    Additional information: 32767'
    Will update this thread with any relevant results..
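
Before waiting for the next occurrence, it may be worth confirming that the new limits are actually inherited by processes started from the oracle profile. The sketch below assumes that `ulimit -s` takes its value in kilobytes (as it does in ksh and bash) and that Python is available on the host; `kb_to_mb` and `current_limits` are illustrative names. It converts the stack values to MB and reads the soft limits of the current process:

```python
import resource


def kb_to_mb(kilobytes):
    """Convert a `ulimit -s` style value (kilobytes) to megabytes."""
    return kilobytes / 1024


def current_limits():
    """Return soft limits: (open files, stack size in KB or None if unlimited)."""
    nofile_soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    stack_soft, _ = resource.getrlimit(resource.RLIMIT_STACK)  # reported in bytes
    if stack_soft == resource.RLIM_INFINITY:
        stack_kb = None
    else:
        stack_kb = stack_soft // 1024
    return nofile_soft, stack_kb


print(kb_to_mb(8192))    # old stack limit in MB
print(kb_to_mb(32767))   # new stack limit in MB
print(current_limits())  # limits in effect for this process
```

Run from a shell that has sourced the oracle profile, `current_limits()` should report the raised values. Instances that were already running keep the limits they were started with, so a restart from the updated profile is needed for the change to reach the database processes. Under the kilobyte assumption, 8192 is exactly 8 MB and 32767 is one kilobyte short of 32 MB.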