I'm neither a database expert nor a professional DBA, but I have been using Oracle databases in my test environments for quite some time.
I have been comparing a single-instance 11gR2 database with a 12.1 database under similar workloads. Don't ask me why.
During a 4-hour load test, 12c fares far worse than 11gR2. The top wait events are "enq: US - contention" and "latch: enqueue hash chains".
On average my TPS is around 2500, and then it suddenly drops below 500. When I check the v$session_wait view while the TPS is down, about 80% of the sessions are waiting on the above events. Following is a snippet:
3683 enq: US - contention Other 0 -1
3684 enq: US - contention Other 733594 -1
3685 enq: US - contention Other 409950 -1
3686 buffer busy waits Concurrency 188612 -1
3687 enq: US - contention Other 997039 -1
3688 enq: US - contention Other 3736050 -1
3689 enq: US - contention Other 0 -1
3690 latch: enqueue hash chains Other 0 -1
3691 enq: US - contention Other 0 -1
3692 enq: US - contention Other 952676 -1
3693 enq: US - contention Other 1513398 -1
3694 enq: US - contention Other 109772 -1
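To put a number on that snippet, here is a rough Python sketch that tallies the rows by wait event. It only uses the rows pasted above. It assumes each row is a leading number (presumably the session ID), the event name, a single-word wait class, and two trailing numeric columns; a two-word wait class such as "User I/O" would need different parsing. The names ROWS, event_name and tally are made up for this illustration.

```python
from collections import Counter

# Rows copied verbatim from the snippet above.
ROWS = """\
3683 enq: US - contention Other 0 -1
3684 enq: US - contention Other 733594 -1
3685 enq: US - contention Other 409950 -1
3686 buffer busy waits Concurrency 188612 -1
3687 enq: US - contention Other 997039 -1
3688 enq: US - contention Other 3736050 -1
3689 enq: US - contention Other 0 -1
3690 latch: enqueue hash chains Other 0 -1
3691 enq: US - contention Other 0 -1
3692 enq: US - contention Other 952676 -1
3693 enq: US - contention Other 1513398 -1
3694 enq: US - contention Other 109772 -1
""".splitlines()


def event_name(row):
    """Return the event name from one row.

    Assumption: the first token is an ID, and the last three tokens are
    a single-word wait class followed by two numeric columns.
    """
    tokens = row.split()
    return " ".join(tokens[1:-3])


def tally(rows):
    """Count how many rows are waiting on each event."""
    return Counter(event_name(r) for r in rows if r.strip())


counts = tally(ROWS)
total = sum(counts.values())

# Print each event with its row count and share of the sampled rows.
for event, n in counts.most_common():
    print(f"{event:<28} {n:>3} {100 * n / total:5.1f}%")
```

On these 12 rows, 10 are in "enq: US - contention" (about 83%), with one row each for "buffer busy waits" and "latch: enqueue hash chains", which is in line with the roughly 80% I see across all sessions.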
I have moved my redo log files and undo datafile onto SSD-based storage and increased the number of redo log groups, but I still see the same wait events.
11gR2 does not seem to suffer from this. I installed an 11gR2 database instance on a different boot disk and ran the same workload. I do not see these wait events and find no sudden drops in TPS.
Undo retention values and undo tablespace sizes are the same across both instances.
My configuration is as follows:
64 GB RAM, 16 cores, general purpose database on VxFS file systems mounted with CIO and delaylog options.
DB block size 8KB and FS block size 8KB. DB running in dedicated server mode. No archive logging. No flashback recovery. 6 Redo log groups with 2 (1 GB) members each.
Values for the sessions, open_cursors and processes parameters were bumped up sufficiently.
Bumped up shmmax (to accommodate a memory_target of 48 GB), semmnu, nproc, nkthread, maxuprc, and a few others above the values recommended by OUI.
Swingbench 2.5, OE schema, 10 GB seed data, 600-user load.
Currently running the same load on 18.104.22.168 after moving the redo log files onto SSD-based storage to check throughput numbers. I have the AWR and ADDM reports collected during the stress runs.
Need some support to identify and resolve the bottleneck. This is a test environment, so feel free to suggest config changes as well.
Thanks in advance,