
SAM-QFS 5.3 What Works With What

Brad Blasing (Oracle), Jun 23 2015, edited Jun 30 2015


(Revision 1.5)

April 20, 2015

1. Introduction

SAM-QFS is an integrated hierarchical storage manager (HSM) and storage area network (SAN) file system. SAM is the HSM storage and archive management component. QFS is the SAN scalable high performance file system component. SAM-QFS also has integrated disk volume management and tape volume management. QFS also has a write once, read many times (WORM) file system capability. QFS can be used independently of SAM when just a file system is needed. SAM requires QFS and cannot be used independently of QFS. This "What Works With What" covers the specifics of what works with SAM-QFS.

This What Works With What (WWWW or W4) covers the following products:

  • SAM-QFS 5.3
  • QFS 5.3

The build levels for the 5.3 releases are:

  • Base 5.3 release: build 5.3.2
  • 5.3-01 patch: build 5.3.9
  • 5.3-02 patch: build 5.3.14
  • 5.3-03 patch: build 5.3.18

2. Supported Products, Configurations, And Features

The following products, configurations, and features are supported with SAM-QFS and QFS.

2.1 Hardware Products

  • All Sun SPARC (64 bit) architecture servers and workstations.
  • All Sun x64 (64 bit) AMD architecture servers and workstations.
  • All Sun x64 (64 bit) Intel architecture servers and workstations.
  • All Sun x86 (32 bit) Intel architecture servers are only supported as Shared QFS clients.
  • All Fujitsu Prime Power SPARC servers.
  • All Sun SCSI, FC, and iSCSI protocols.
  • All Sun SCSI and FC HBAs.
  • All Sun FC switches.
  • All Sun RAID disk arrays (includes StorageTek RAID disk arrays).
  • All Sun hard disk drives (includes the X4500).
  • The tape libraries and tape drives listed at:
    SAM-QFS Tape Library and Drive Support
  • All non-Sun HBAs and switches supported by Sun (includes HBAs and switches listed in the Interop and PreQual Tools as well as via the Get To Yes program).
  • All non-Sun RAID disk arrays supported by Sun (includes RAID disk arrays listed in the Interop and PreQual Tools as well as via the Get To Yes program).

2.2 Software Products

  • Oracle Solaris Operating Systems: See section 4.1 below for certified versions
  • Linux Operating Systems: See section 4.2 below for certified versions
  • ZFS Volume Manager (ZVOLs)
  • Solaris Volume Manager (SVM)
  • Solaris Volume Manager (SVM) OBAN cluster capability with Solaris Cluster
  • Veritas Volume Manager (VxVM) with stand-alone SAM-QFS
  • NFS V4 (without delegation for shared file systems)
  • NFS V3
  • SAMBA
  • All StorageTek ACSLS releases through 8.2
  • Solaris Cluster HA-QFS
  • Solaris Cluster HA-SAM (SAM failover). It is only supported in a two-host active/passive configuration. There are several configuration restrictions regarding this feature, and the SAM Administration Guide should be read prior to using it.
  • Solaris Cluster HA-NFS with HA-SAM.  NFS v3 and v4 were tested.  NFS file systems must be exported from the current SAM-QFS MDS.  Minimum code levels for using HA-NFS with HA-SAM are Solaris 11.1 SRU 19.6 and SAM-QFS 5.3.
  • Solaris Cluster HA-Oracle
  • Solaris Cluster Scalable (Shared) QFS with Oracle Real Application Cluster (Oracle RAC)
  • IBM Tivoli SANergy with a Solaris SPARC server. IBM Tivoli SANergy file sharing software may be used to support hosts running different operating systems (heterogeneous support). This software is not supported with Solaris x64 servers, however. This software is now supported by Sun Support Services with IBM serving as backline support.

2.3 Configurations

  • Solaris mixed architecture (SPARC with x64) Shared QFS configurations require the use of EFI disk labels.
  • Shared QFS should be implemented using a private or dedicated ethernet switched network for metadata transmission. Performance problems can occur if this network is not private.
  • For Shared QFS configurations of 10 hosts (nodes) or greater, it is recommended that applications not be run on the metadata server, to provide more predictable client performance.
  • Sun StorageTek Enterprise Backup to a SAM-QFS file system.
  • Veritas NetBackup to a SAM-QFS file system.
  • Clients outside of Solaris Cluster in Solaris Cluster configurations. This includes clients of mixed architectures (SPARC with x64). This capability is only supported with the ma file system type.

2.4 Virtualization Configurations

  • Zones (containers). Only one non-global zone per file system is supported. There are several configuration restrictions regarding this feature and the QFS Administration Guide should be read prior to using this feature. For Solaris Cluster, Oracle RAC is supported in a Zone Cluster.
  • Oracle VM Server for SPARC (previously known as Sun Logical Domains, or LDOMs) is supported with SAM-QFS, with the following restrictions:
    - Minimum of 4 cores assigned to the domain.
    - Suggested minimum of 24GB of RAM.
    - SAM-QFS MDS must boot from a physical device, and therefore it must have at least 1 PCI root complex.
    - Disk I/O virtualization is not supported for LUNs which are used in a QFS file system.
    - Network virtualization is supported.
    - Tape devices must be attached via non-virtualized PCI slots attached to the SAM-QFS MDS server.
    - QFS clients may boot from a virtualized disk; however, they still need a PCI root complex to access file system devices via PCI controllers (FC, SAS, etc.).
  • For Solaris Cluster, virtual storage devices used by Shared QFS must be backed by whole SCSI FC LUN disk arrays, and must not be shared with any other guest domain on the same server; virtualized partial LUN disk arrays are not supported.

2.5 Feature Limitations

  • If running SAM with Shared QFS, SAM must be run on the metadata server.
  • A version 1 file system cannot be upgraded to a version 2 file system.
  • SAM-Remote servers and clients must be running the same revision level of SAM-QFS.
  • Remote disk archiving servers and clients must be running the same revision level of SAM-QFS.
  • Shared QFS clients must be running within one software revision of the Shared QFS metadata server. So if the Shared QFS metadata server is at revision level "N", the Shared QFS clients must be at revision level N or N-1. (Note that a Shared QFS client will be at revision level N+1 only when performing the first step of a rolling upgrade where the potential metadata server is upgraded.)
  • Multi-reader servers and clients must be running the same revision of SAM-QFS or QFS software.
  • Mixed architecture (SPARC with x64) metadata servers are not supported for failover purposes.
  • Mixed architecture (SPARC with x64) multi-reader configurations are not supported.
  • Online shrink is only supported for V2A ma file systems, and now includes support for Solaris Cluster.
  • NFSv4 ACLs are supported on Solaris 11 only.

3. Unsupported Products, Configurations, And Features

The following products, configurations, and features are not currently supported with SAM-QFS and QFS.

3.1 Unsupported Software Products

  • Sun StorageTek Enterprise Backup of a SAM file system
  • Veritas NetBackup of a SAM file system
  • Instant Image (II)
  • Sun StorageTek Network Data Replicator (SNDR)
  • NFS V4 delegations with Shared QFS
  • CIFS (Common Internet File System) - Ephemeral IDs are not supported. All Windows identities must have an explicit idmap entry (either directory- or name-map-based)

3.2 Unsupported Configurations

  • Solaris regular zones with Solaris Cluster
  • Solaris Cluster HA-SAM with clients outside of Solaris Cluster

3.3 Unsupported Features

  • Segmented files on a shared file system
  • Memory mapped segmented files
  • Mixed architecture (SPARC with x64) metadata server failover
  • Mixed architecture (SPARC with x64) multi-reader
  • The Linux Shared QFS client doesn't support:
    - access control lists (ACLs)
    - quotas
    - being used as a NFS or Samba server
    - sam-aio driver
    - SAM-QFS Manager (GUI)
    - 32 bit kernels on x64 systems
    - forced unmount (umount -f)
    - mdadm path failover

4. Certified Products And Configurations

The following products and configurations were certified (tested) for SAM-QFS and QFS:

4.1 SAM and QFS

  • Oracle Solaris 11 (base 11.0 release)
  • Oracle Solaris 11.1 (requires SAM-QFS 5.3-01 patch)
  • Oracle Solaris 11.2 (requires SAM-QFS 5.3-03 patch)
  • Oracle Solaris 10 10/08 or later updates of Solaris 10
  • The tape libraries and tape drives listed at:
    SAM-QFS Tape Library and Drive Support

4.2 Linux Shared QFS Clients

The following versions of Linux were certified for Sun x64 64 bit architecture systems:

  • Oracle Linux 5.6 (2.6.18-238.0.0.0.1.el5 kernel)
  • Oracle Enterprise Linux 5.4 (2.6.18-164.0.0.0.1.el5 kernel)
  • RedHat 5.6 SMP RHEL AS and ES (via OL 5.6)
  • RedHat 5.4 SMP RHEL AS and ES (via OEL 5.4)
  • RedHat 4.5 (2.6.9-55.ELsmp x86_64 kernel) SMP RHEL AS and ES
  • SUSE 11 Service Pack 1 (2.6.32.12-0.7-default kernel) SMP SLES
  • SUSE 10 Service Pack 3 (2.6.16.60-0.54.5-smp kernel) SMP SLES
  • SUSE 10 Service Pack 2 (2.6.16.60-0.21-smp x86_64 kernel) SMP SLES
  • SUSE 9 Service Pack 4 (2.6.5-7.308 x86_64 kernel) SMP SLES

4.2.1 Linux I/O Channel (Path) Failover

Device mapper path failover was tested with RedHat 4, SUSE 10, and SUSE 9 and is supported. mdadm path failover is not supported, due to its use of a superblock at the end of each disk slice that Solaris is unaware of.

4.3 Oracle Database

4.3.1 HA-Oracle

The following versions of HA-Oracle were certified with QFS and Solaris Cluster:

  • SPARC Oracle 11g Release 2 (11.2.0.3, 11.2.0.4 also supported)
  • SPARC Oracle 10g Release 2 (10.2.0.5)
  • x64 Oracle 11g Release 2 (11.2.0.3)
  • x64 Oracle 10g Release 2 (10.2.0.5)

4.3.2 Oracle Real Application Cluster (Oracle RAC)

The following versions of Oracle RAC were certified with Shared QFS and Solaris Cluster:

  • SPARC Oracle 11g Release 2 (11.2.0.3, 11.2.0.4 also supported)
  • SPARC Oracle 10g Release 2 (10.2.0.5)
  • x64 Oracle 11g Release 2 (11.2.0.3)
  • x64 Oracle 10g Release 2 (10.2.0.5)

4.3.3 Oracle Solaris Cluster

The following versions of OSC were certified with Shared QFS:

  • OSC 3.3 U1 with Solaris 10
  • OSC 4.0 with Solaris 11.0
  • OSC 4.1 with Solaris 11.1  (SAM-QFS 5.3-01 or a later patch is required for Solaris 11.1)
  • Clusters configured with Solaris Cluster should all be at the same major level of Solaris.

5. Uncertified Products And Configurations

The following products were not certified or tested with SAM-QFS and QFS, so customers may encounter problems with them:

  • Sun StorageTek Enterprise Backup of a QFS file system
  • Veritas NetBackup of a QFS file system
  • Veritas Cluster Services (VCS)
  • Veritas Cluster Volume Manager (VCVM)
  • Sun StorageTek Resource Management Suite
  • Sun StorADE
  • Any other Solaris Cluster agent with shared QFS configuration that hasn't been previously listed as supported
  • HA-NFS with HA-SAM with code levels less than Solaris 11.1 SRU 19.6 or SAM-QFS 5.3.

Note that backup of QFS file system data should work. Backup products may not be aware of all of the QFS metadata extended attribute information, however, and this should be taken into account when backing up a QFS file system.

Comments

sb92075
27063, 00000, "number of bytes read/written is incorrect"
// *Cause: the number of bytes read/written as returned by aiowait
// does not match the original number, additional information
// indicates both these numbers
// *Action: check errno

'SVR4 Error: 12: Not enough space'
Is the volume filling up & running out of free disk space?
lrp
Good thoughts, but we had ruled that out early: Filesystem's not the issue, the disk had plenty of space. Metalink itself (472813.1) points to "the Unix error number (ERRNO) 12 during a Unix write()/open() system call, and this Unix error indicates a lack of *process memory* rather than physical disk space."

- /var/adm/messages has no memory- or disk-related messages around the time of failure.
- SAN administrator saw nothing in their logs at the time of failure

We had already tried raising SGA and raising shared memory in the solaris project, but it seems like we're fishing for an answer by blindly raising a parameter when we don't know what OS limit Oracle had reached. The key numbers I'm looking for are those specified in the 'additional information' section. Oracle's knowledge base has nothing that I can use so far.
sb92075
http://www.lmgtfy.com/?q=oracle+SVR4+Error:+12:+Not+enough+space
lrp
Thanks! We'd definitely been checking Google for information for the past couple weeks before checking with Oracle Forums. In fact, there are several blogs and lazydba/experts-exchange threads (http://www.lazydba.com/oracle/0__125336.html) on the subject which point us in the right direction and were the basis for us looking at enlarging the shared memory kernel parameters to start.

At this point, it's more how to interpret how Oracle spits out information, since there wasn't any publicly available information on the format of the error code.

Much like how P1, P2, and P3 in v$session_wait will mean different things, I would guess that the "Additional Information" tokens after the "Error: 12 code" mean different things. That much is evident in my searches, where it appears that:
Error code 12 => memory related things
Error code 11 => resource is temporarily unavailable for whatever reason
Error code 5 => disk/IO issue

.. so drilling down further, Error code 12's two additional information items must mean something:
-1 => some return code?
8192 => the number at which it failed? Some bit-wise address?

At no point does the stack get mentioned in our diagnostic text, which is why I'm asking the larger Oracle community.
sb92075
Are file=6 & file=57 on same volume?

What is storage architecture containing Oracle's dbf files?
What flavor of file system supports Oracle's dbf files?
lrp
To elaborate, the filesystem is UFS-based, all configured in RAID-6 (striped disks with dual parity; see http://en.wikipedia.org/wiki/Redundant_array_of_independent_disks), including the redo logs and archive logs. The storage underneath the filesystem is a Hitachi 9985v SAN, meaning the physical disks themselves were grouped into logical partitions and then divvied up into filesystems. Files #6 & 57 are on different filesystems, but all of them had ample space at the time.
sb92075
Since errors are being logged into alertSID.log file, you know when this problem occurs.
When these errors occur, is it during HEAVY I/O activity?

I am not making any accusation, just an idle observation.
I have never used (or seen) UFS under Oracle.
An ever so slight possibility is that a file system bug is inflicting the damage.

From my experience, the root cause is outside Oracle, at the OS or a similar layer.
Oracle is just too dumb to lie about errors & it is detecting a SNAFU in the underlying system.

Good Luck with your Gremlin hunt.
lrp
I am entirely with you on this.
-- if it was exclusively an oracle memory error, we would have seen an ORA-4031 indicating exactly which pool was compromised.
-- if it was a dbwr error, we would have seen a more descriptive alert saying 'disk full' or 'crc did not match'
-- etc.

The only IO activity we did catch was the fact that RMAN archive log backups were running at the time. Our I/O usage charts for those filesystems (and solaris sar/vmstat/iostat counts) did not spike during those times.

Regarding the filesystem setup, our vote was to use ZFS, but this was a decision made beyond our heads. Unfortunately, without a real error to show the SAN administrator, we are unable to provide them effective evidence to support the claim. Getting proper diagnostic information was the point of this forum post -- notably, Metalink's own article (22080.1 - An Introduction to Error Message Articles) pretty much admits we need to look further than just the error codes:

Please note that the steps included under these additional headings
is generally very terse. Information may be incomplete and may use
abbreviations which are not self explanatory. It is also possible that
there may be references to articles which are not visible to customers.
These additional notes are intended to help give pointers and are NOT
intended to be a complete explanation of the error.

More complete descriptions may be included in separate problem/solution or
bulletin documents.
orafad
ora-27063 mentions "Cause: the number of bytes read/written as returned by aiowait does not match the original number, additional information indicates both these numbers"
So the OS returned -1 ("error state") while Oracle expected 8192 bytes (the block size, probably).

About the ENOMEM (error code 12) - it might be a good idea to check shared mem settings and system memory overall.
sb92075
When these errors occur, is it during HEAVY I/O activity?
or just randomly across 24 hours?
lrp
to SB: This symptom has shown up across our dev and production databases every 2 months or so. Interpreting from the alert log, it has also happened close to the time when an archive/backup occurs at the same time as either a recompile, auto-stats gather, or snapshot. So, "load" would appear to cause it, but nothing I would consider heavy I/O activity.

orafad: The metalink for ORA-27063 does mention this, and I guess it to be a rather generic post. The real culprit has to be the codes behind the 27063, which is what I'm trying to get to the bottom of:
SVR4 = a header error indicating some OS 'thing'
Error 12 = ???? Solaris memory error. This appears to show up for most 'capacity-related' things in Google searches: file descriptors, semaphores, swap, and shared memory.
Additional information = -1 error state, you're most likely correct.
Additional information: 8192 = could mean anything, and that's what I need to find out from either Metalink support (in progress) or Solaris's knowledge base.

Update: I apologize, orafad -- you mentioned the ENOMEM, which I missed. Where did you reference this? I'd love to have an additional resource to look up the error code. Our memory_target/memory_max_size were set at 5GB/8GB, with the shared memory set at 20GB and overall physical memory at 32GB. Our sysadmin logs showed no memory usage or swap errors, leading me to believe it was not a general 'out of memory' error so much as a kernel resource setting (semaphores, a per-process limit of some sort).

sb92075
This symptom has shown up across our dev
if I were in your shoes, I'd do what I could to change the underlying file system.
If the problem still happens on a different fs, then the file system flavor can be ruled out as a possible root cause.

Again, I am fairly certain Oracle is the victim & not the culprit.
Proving who or what is to blame will be a battle.

Happy Hunting!
lrp
Because this happens so infrequently, I need a way to measure what is happening (be it stack or file descriptors). I've already got scripts tallying those resources per process on a 5-minute interval, so I'm hoping to prepare myself for the next occurrence (could be next week, could be a month from now). Unfortunately, moving to another filesystem is going to be rather hard to make a case for, since we would have no real way of identifying whether the experiment was successful.

Thanks for your time in this.
sb92075
Are there any additional clues in OS messages or dmesg logs?
lrp
Absolutely no /var/adm/messages related to the failure (which is what keeps pissing me off, since Metalink's notes on the 27063 error routinely point me to the OS logs) -- I just spoke with the SA, and most of the messages get piped there. He did have sar available with a viewer, but I also had charts for the disk I/O at the time -- nothing stood out.
561365
Solaris 10 requires setting a memory limit for the project:

projadd -U oracle -K "project.max-shm-memory=(priv,4096MB,deny)" user.oracle

The memory specified above should be more than the actual SGA & PGA you are using.

Thanks
lrp
That was the first thing we increased -- our 11g memory target (which is the combined SGA+PGA, automatically managed by the instance) is at 5GB, with memory_max_size = 8GB. While it is the only database in the Solaris project, there ARE other databases on the prod server, contained within their own project (i.e. oraproj2) and ORACLE_HOME.

If I'm reading the results below correctly, I believe we have 16GB of shared memory available:
oracle@server1:PROD01:/var/adm> projects -l oraproj
oraproj
        projid : 102
        comment: ""
        users  : oracle
        groups : (none)
        attribs: project.max-sem-ids=(priv,200,deny)
                 project.max-sem-nsems=(priv,512,deny)
                 project.max-shm-ids=(priv,200,deny)
                 project.max-shm-memory=(priv,17179869184,deny)
jgarry
lrp wrote:
Good thoughts, but we had ruled that out early: Filesystem's not the issue, the disk had plenty of space. Metalink itself (472813.1) points to "the Unix error number (ERRNO) 12 during a Unix write()/open() system call, and this Unix error indicates a lack of *process memory* rather than physical disk space."

- /var/adm/messages has no memory- or disk-related messages around the time of failure.
- SAN administrator saw nothing in their logs at the time of failure

We had already tried raising SGA and raising shared memory in the solaris project, but it seems like we're fishing for an answer by blindly raising a parameter when we don't know what OS limit Oracle had reached. The key numbers I'm looking for are those specified in the 'additional information' section. Oracle's knowledge base has nothing that I can use so far.
On a different version and platform, I had rare issues when running RMAN. Eventually I came to the conclusion that: RMAN uses large pool; OS eventually fragments I/O buffers due to the way RMAN uses I/O on that platform. It is of course wild speculation that this has anything to do with your problem, but given the lack of real information, I'd say you perhaps want to shrink your SGA some, give more to large pool, and pray a lot. Oh, and sample the large pool SGA statistics when you are running RMAN, and wonder if the parallel automatic tuning is shooting your large_pool_size in the leg if you are using it.
sb92075
I just stumbled across this

Subject: Upon startup of Linux database get ORA-27102: out of memory Linux-X86_64 Error: 28: No space left on device
Doc ID: 301830.1

Let us know if it helped.
lrp
>
Subject: Upon startup of Linux database get ORA-27102: out of memory Linux-X86_64 Error: 28: No space left on device
Doc ID: 301830.1
>
Thank you. I looked up the article, and it shows a similar message but has a key distinction between error codes -- that OS code was error: 28, while my error was error: 12. The Linux and Solaris error code tables are similar, so for OS error 28, the Solaris error code table shows:

"28 ENOSPC No space left on device
While writing an ordinary file or creating a directory entry, there is no free space left on the device. In the fcntl routine, the setting or removing of record locks on a file cannot be accomplished because there are no more record entries left on the system."

By the same token, the OS error code 12 that I'm seeing appears to mean:

"12 ENOMEM Not enough space
During execution of an exec, brk, or sbrk routine, a program asks for more space than the system is able to supply. This is not a temporary condition; the maximum size is a system parameter. On some architectures, the error may also occur if the arrangement of text, data, and stack segments requires too many segmentation registers, or if there is not enough swap space during the fork routine. If this error occurs on a resource associated with Remote File Sharing (RFS), it indicates a memory depletion which may be temporary, dependent on system activity at the time the call was invoked."

Emphasis on the clause "..*text, data, and stack segments requires too many segmentation registers, or if there is not enough swap space*.."

..So my clues from the OS documentation point to some maximum, like swap, segmentation, stack and other resources.
I can only assume the "additional information" of -1 and 8192 are relevant numbers to those resources.

The two things that are *8192* in my environment appear to be # of file descriptors and stack size.

Therefore, my plan of attack is going to change both in my oracle profile to see if this occurs again:

ulimit -n 16384 (raise file descriptors per process from 8k to 16k)
ulimit -s 32767 (raise stack from 8MB to 32MB)

Hopefully, this will give Oracle more leeway to use OS resources AND give me extra clues if the error shows up again. In other words, if something like stack is truly the issue, then I expect to see another crash with additional information = *32767* instead of 8192:
KCF: write/open error block=0x1571 online=1
file=57 /datafile/DB_001.dbf
error=27063 txt: 'SVR4 Error: 12: Not enough space
Additional information: -1
Additional information: "32767"'
Will update this thread with any relevant results..
orafad
lrp wrote:
..So my clues from the OS documentation point to some maximum, like swap, segmentation, stack and other resources.
I can only assume the "additional information" of -1 and 8192 are relevant numbers to those resources.

The two things that are *8192* in my environment appear to be # of file descriptors and stack size.
As I tried to explain earlier: this means no more than "I (Oracle) asked to read/write 8192 bytes from the buffer but got back -1". Syscall returns number of bytes actually read/written, or -1 indicating that an error occurred.

error=27063 txt: 'SVR4 Error: 12: Not enough space
Is this from a 32-bit Oracle server?

As the software owner user, could you verify system parameters? (prctl)

Specifically, what are your parameters settings that corresponds to shmmax and shmall?
lrp
>
As I tried to explain earlier: this means no more than "I (Oracle) asked to read/write 8192 bytes from the buffer but got back -1". Syscall returns number of bytes actually read/written, or -1 indicating that an error occurred.
error=27063 txt: 'SVR4 Error: 12: Not enough space
>
I do recall the post. Is there a man page describing that error code that I can look into further? I recognize the error number, but I didn't find anywhere that stated the details behind 'errno' or the return code. The man page for syscall talks about returning -1 on error, but doesn't say anything about the second return code (unless you mean the variable errno).

>
Is this from a 32-bit Oracle server?
As the software owner user, could you verify system parameters? (prctl)
Specifically, what are your parameters settings that corresponds to shmmax and shmall?
>
It's Oracle 64-bit Enterprise on Solaris 10 SPARC 64-bit. I hate pasting the full text of a screendump, but since I cannot really figure out which are the important pieces of info, I'll paste the results of prctl on the project that my DB is running under. (If you can narrow down which params I'm looking for, I can cull the rest later in an edit.) This is only for one of the DBs, but the idea is the same. Note that project.max-shm-memory = 16 GB, well above my Oracle memory_max_size of 8GB. I'm not sure how to obtain the Solaris equivalent of shmall.
oracle@server1:PROD01:/fs1> prctl -i project oraproj
project: 100: oraproj
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-contracts
        privileged      10.0K       -   deny                                 -
        system          2.15G     max   deny                                 -
project.max-device-locked-memory
        privileged      1.95GB      -   deny                                 -
        system          16.0EB    max   deny                                 -
project.max-locked-memory
        system          16.0EB    max   deny                                 -
project.max-port-ids
        privileged      8.19K       -   deny                                 -
        system          65.5K     max   deny                                 -
project.max-shm-memory
        privileged      16.0GB      -   deny                                 -          <----------
        system          16.0EB    max   deny                                 -
project.max-shm-ids
        privileged        200       -   deny                                 -
        system          16.8M     max   deny                                 -
project.max-msg-ids
        privileged        258       -   deny                                 -
        system          16.8M     max   deny                                 -
project.max-sem-ids
        privileged        200       -   deny                                 -
        system          16.8M     max   deny                                 -
project.max-crypto-memory
        privileged      7.81GB      -   deny                                 -
        system          16.0EB    max   deny                                 -
project.max-tasks
        system          2.15G     max   deny                                 -
project.max-lwps
        system          2.15G     max   deny                                 -
project.cpu-cap
        system          4.29G     inf   deny                                 -
project.cpu-shares
        privileged          1       -   none                                 -
        system          65.5K     max   none                                 -
zone.max-swap
        system          16.0EB    max   deny                                 -
zone.max-locked-memory
        system          16.0EB    max   deny                                 -
zone.max-shm-memory
        system          16.0EB    max   deny                                 -
zone.max-shm-ids
        system          16.8M     max   deny                                 -
zone.max-sem-ids
        system          16.8M     max   deny                                 -
zone.max-msg-ids
        system          16.8M     max   deny                                 -
zone.max-lwps
        system          2.15G     max   deny                                 -
zone.cpu-cap
        system          4.29G     inf   deny                                 -
zone.cpu-shares
        privileged          1       -   none                                 -
lrp1
Hello -- I'm just updating the post with further information.
There is a Nov 9 2009 Sun blog post (http://blogs.sun.com/hippy/entry/problems_with_solaris_and_a) which describes symptoms similar to our problem and basically advises either:
a) upgrade to Solaris 10 update 8 (we are at update 4)
b) disable Oracle DISM

The article appears to advise turning off only SGA_MAX_SIZE. We currently have several memory settings:
memory_max_target                    big integer 8000M
memory_target                        big integer 5056M
shared_memory_address                integer     0
sga_max_size                         big integer 8000M
sga_target                           big integer 0
pga_aggregate_target                 big integer 0
If we were to disable DISM, does that mean disabling ONLY SGA_MAX_SIZE, or should we also remove the MEMORY_MAX_SIZE? If we wanted to maintain the same memory settings, would we then set memory_target to 8000M and leave SGA_TARGET/PGA_AGGREGATE_TARGET completely alone?

All in all, Oracle Support still does not give us many clues beyond saying that it is an OS file-resource error, not recognizing that it is a Solaris kernel memory limit.