Unified Logging in RAC 12c


    Highly Available Unified Logging in RAC

     

    Introduction

     

    Problems spanning multiple Oracle RAC instances are encountered quite frequently. In such cases, the alert logs and trace files from multiple instances may be required for diagnosis. This can be quite difficult if the ADR is located on non-shared storage, as it requires logging on to each node. By placing the ADR base for all instances of an Oracle RAC database on a shared location, aggregated diagnostic data from all instances can be viewed from a single location. This article explores placing the ADR for an Oracle RAC database on an Oracle Cloud File System (ACFS) so that it is accessible by all the nodes in the cluster and can be exported over Highly Available Network File System (HANFS) to clients outside the cluster as well.

     

    Unified Logging in RAC

     

    The various diagnostic files for an Oracle database include:

    • Alert Log

    • Trace Files

    • Dumps and Core Files

    • Health monitor reports

    • Data repair records

    • SQL test cases

    • Incident packages, etc.

     

     

    From Oracle Database 11g onwards, these diagnostic files are stored in the Automatic Diagnostic Repository (ADR), a file-based repository outside the database. The location of the ADR, known as the ADR base, is set by the DIAGNOSTIC_DEST initialization parameter.
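
    The ADR locations in use by the current instance can be checked at any time via the V$DIAG_INFO view (a quick sanity check; the output simply reflects your environment and is omitted here):

    SQL> select name, value from v$diag_info where name in ('ADR Base', 'ADR Home', 'Diag Trace');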

     

    In the case of a RAC database, each instance has its own alert log and various other diagnostic files, which are stored underneath its own home directory within the ADR. Since the ADR resides outside the database, it can be accessed even when the database is down. However, if the ADR base is set to a non-shared location, one needs to log on to the machine hosting an Oracle RAC instance in order to view that instance's diagnostic files on the operating system. If a problem spans multiple Oracle RAC instances, you may need to correlate the alert logs and trace files from across those instances in order to diagnose it. This involves logging on to each node, which, besides being inconvenient, might not be possible if one or more nodes are down. Things become much easier if the diagnostic files are gathered in a central location accessible from every node in the cluster. This can be implemented by setting the ADR base on each node of an Oracle Real Application Clusters (Oracle RAC) environment to a location on shared storage, so that aggregated diagnostic data from all instances can be viewed from a single location.
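
    With a shared ADR base in place, the ADR homes of all instances can then be listed from any node with the adrci utility (a sketch; the base path assumes the shared location /mnt/acfsmounts/acfs1 configured later in this article):

    [oracle@host01 ~]$ adrci

    adrci> set base /mnt/acfsmounts/acfs1

    adrci> show homes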

     

    From Oracle 12c onwards, ACFS can be leveraged as a highly available network file system (HANFS). In this article, I will demonstrate storing the ADR for all instances of a RAC database on the same ACFS file system so that the ADR can be exported over HANFS. As a result, the log files and trace files of all the instances remain available to client server(s) outside the cluster as long as at least one node in the cluster is active.

     

    Current setup:

     

    Oracle 12.1.0.2 Flex Cluster with two nodes: host01 and host02

     

    Oracle 12.1.0.2 policy-managed RAC database pmdb with two instances: pmdb_1 and pmdb_2

     

    Demonstration:

     

    Check Prerequisites

     

    First, let's verify that all kernel modules needed for ACFS and ADVM are loaded on both nodes.

     

     

    [root@host01 ~]# lsmod |grep oracle

    oracleacfs           2837904  1

    oracleadvm            342512  1

    oracleoks             409560  2 oracleacfs,oracleadvm

    oracleasm              84136  1

     

    [root@host02 ~]# lsmod |grep oracle

    oracleacfs           2837904  1

    oracleadvm            342512  1

    oracleoks             409560  2 oracleacfs,oracleadvm

    oracleasm              84136  1
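
    If these modules are not loaded, they can usually be loaded as root with the acfsload utility shipped in the Grid Infrastructure home (the home path below is an assumption; substitute your own):

    [root@host01 ~]# /u01/app/12.1.0/grid/bin/acfsload start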

     

    Since HANFS requires a running NFS service on each node that can host the HANFS services, let's next verify that the NFS service is running on both nodes.

     

    [root@host01 ~]# service nfs status

     

    rpc.mountd (pid 4921) is running...

    nfsd (pid 4910 4909 4908 4907 4906 4905 4904 4903) is running...

    rpc.rquotad (pid 4882) is running...

     

    [root@host02 ~]# service nfs status

     

    rpc.mountd (pid 4985) is running...

    nfsd (pid 4982 4981 4980 4979 4978 4977 4976 4975) is running...

    rpc.rquotad (pid 4941) is running...
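
    If the NFS service is not running on a node, it can be started, and configured to start at boot, with the standard service tools of this platform (assuming a RHEL/OL 5 or 6 style init system, as the service output above suggests):

    [root@host01 ~]# service nfs start

    [root@host01 ~]# chkconfig nfs on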

     

     

     

    Create ADVM Volume

     

    Create and mount an ASM disk group called DIAG supporting all the new ASM Dynamic Volume Manager (ADVM) features included in release 12.1.

     

    SQL> create diskgroup DIAG external redundancy

        disk 'ORCL:ASMDISK014','ORCL:ASMDISK013','ORCL:ASMDISK012','ORCL:ASMDISK011'

        attribute 'compatible.asm'= '12.1','compatible.advm'='12.1';

     

    [root@host01 ~]# srvctl start diskgroup -diskgroup DIAG -n host01,host02

     

    [root@host01 ~]# crsctl stat res ora.DIAG.dg

    NAME=ora.DIAG.dg

    TYPE=ora.diskgroup.type

    TARGET=ONLINE          , ONLINE

    STATE=ONLINE on host01, ONLINE on host02

     

    Create a volume VOL1 of size 300 MB in the DIAG disk group.

     

    [grid@host01 root]$ asmcmd volcreate -G DIAG -s 300m VOL1

     

    Since ADVM/ACFS utilize an ASM proxy instance to connect to Flex ASM, an ASM proxy instance must be running on each hub node that can host HANFS services.

     

    Let us check that an ASM proxy instance is running on both nodes.

     

    [root@host02 ~]# srvctl status asm -proxy

    ADVM proxy is running on node host01,host02

     

    Start volume VOL1 and take note of the volume device associated with it.

     

    [root@host02 ~]# srvctl start volume -volume VOL1 -diskgroup DIAG 

    [root@host02 ~]# srvctl status volume -all

    Volume VOL1 of diskgroup DIAG for device /dev/asm/vol1-260 is enabled

    Volume VOL1 of diskgroup DIAG for device /dev/asm/vol1-260 is running on nodes: host01,host02

     

    [grid@host01 tmp]$ asmcmd volinfo -G DIAG VOL1 |grep Device

             Volume Device: /dev/asm/vol1-260

     

     

    Create and start a cloud file system resource

     

    Construct an ACFS file system on the volume VOL1. Note that the volume size reported below is 320 MB rather than the requested 300 MB, because ADVM allocates volume space in multiples of its volume allocation unit.

     

     

    [root@host01 ~]#  mkfs -t acfs /dev/asm/vol1-260

     

    mkfs.acfs: version                = 12.1.0.2.0

    mkfs.acfs: on-disk version        = 39.0

    mkfs.acfs: volume                 = /dev/asm/vol1-260

    mkfs.acfs: volume size            = 335544320  ( 320.00 MB )

    mkfs.acfs: Format complete.

     

     

    Create a mount point on both nodes for mounting the ACFS file system.

     

    [root@host01 ~]# mkdir -p /mnt/acfsmounts/acfs1

                    

    [root@host02 ~]# mkdir -p /mnt/acfsmounts/acfs1

     

     

    A cloud file system is an ACFS file system managed as an Oracle Clusterware resource, which can then be exposed through HANFS. Let us create a cloud file system resource which will be used to store the ADR. This resource uses the volume device of VOL1 along with the mount point /mnt/acfsmounts/acfs1. Starting the cloud file system resource results in the file system being mounted on both nodes.

     

     

    [root@host01 ~]# srvctl add filesystem -m /mnt/acfsmounts/acfs1 -d /dev/asm/vol1-260

    [root@host01 ~]# srvctl start filesystem -d /dev/asm/vol1-260

    [root@host01 ~]# srvctl status filesystem -d /dev/asm/vol1-260

    ACFS file system /mnt/acfsmounts/acfs1 is mounted on nodes host01,host02

     

     

    Configure an HAVIP resource

     

    An HAVIP resource is a special class of the standard Oracle node VIP Clusterware resource. Each HAVIP resource manages a unique IP address in the cluster on a single node at any given time, and this global resource is relocated to an active node in the cluster as necessary, with the objective of providing uninterrupted service of NFS-exported paths to its clients.

     

    Create an HAVIP resource called havip1 on a non-pingable, non-DHCP IP address 192.9.201.184, which will later be used to export the ADR to the clients.

     

    [root@host01 ~]# srvctl add havip -address 192.9.201.184 -id havip1

     

    [root@host01 ~]# srvctl config havip -id havip1

     

    HAVIP exists: /havip1/192.9.201.184, network number 1

    Description:

    Home Node:

    HAVIP is enabled.

    HAVIP is individually enabled on nodes:

    HAVIP is individually disabled on nodes:

     

    Configure and start an Export File System Resource

     

    An Export File System (Export FS) resource is a cluster resource associated with an HAVIP; together they publish one or more designated ACFS file systems to the cluster's clients using HANFS.

     

    Let us create and start an Export FS cluster resource called export1 associated with HAVIP havip1.

    This Export FS resource publishes the specified file system using HANFS.

     

    [root@host01 ~]# srvctl add exportfs -id havip1 -path /mnt/acfsmounts/acfs1 -name export1 -options rw -clients *.example.com

     

    [root@host01 ~]# srvctl start exportfs -name export1

     

     

    Confirm that the exportfs resource export1 is running and that the file system is exported on node host02.

     

    [root@host01 ~]# srvctl status exportfs -name export1

     

    export file system export1 is enabled

    export file system export1 is exported on node host02

     

     

     

    It can be observed that Clusterware has automatically started the associated HAVIP resource havip1 on the same host (host02) as the exportfs resource export1.

     

    [root@host01 ~]# srvctl status havip

    HAVIP ora.havip1.havip is enabled

    HAVIP ora.havip1.havip is running on nodes host02
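
    Incidentally, an HAVIP (together with its associated exports) can also be relocated manually rather than waiting for a failover; a sketch, assuming the 12c srvctl syntax:

    [root@host01 ~]# srvctl relocate havip -id havip1 -node host01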

     

     

    Modify the access privileges for the new ACFS file system to enable access by any user.

     

    [root@host01 ~]# ls -l /mnt/acfsmounts

    total 16

    drwxr-xr-x 4 root root 4096 May  6 19:50 acfs1

     

    [root@host01 ~]# chmod -R 777 /mnt/acfsmounts

     

    [root@host01 ~]# ls -l /mnt/acfsmounts

    total 16

    drwxrwxrwx 4 root root 4096 May  6 19:50 acfs1

     

     

    Mount the HANFS Exported File System

     

    Mount the HANFS exported file system on a HANFS client server named server1

     

    [root@server1 ~]# mkdir -p /mnt/hanfs1

    [root@server1 ~]# mount -t nfs 192.9.201.184:/mnt/acfsmounts/acfs1 /mnt/hanfs1

     

    [root@server1 ~]# df

    Filesystem           1K-blocks      Used Available Use% Mounted on

    /dev/sda5             71241024   6221804  61341944  10% /

    /dev/sda3              4956316    141336   4559148   4% /tmp

    /dev/sda1               101086     11378     84489  12% /boot

    tmpfs                   200776         0    200776   0% /dev/shm

    192.9.201.184:/mnt/acfsmounts/acfs1

                            327680    144416    183264  45% /mnt/hanfs1
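
    To have server1 remount the exported file system across reboots, an entry along the following lines could be added to /etc/fstab on the client (the mount options shown are illustrative assumptions, not part of the original setup):

    # NFS mount of the HANFS-exported ADR file system (illustrative)

    192.9.201.184:/mnt/acfsmounts/acfs1  /mnt/hanfs1  nfs  rw,hard,intr  0 0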

     

     

    Set the newly created ACFS file system as the ADR base for the RAC database instances pmdb_1 and pmdb_2. Since ALTER SYSTEM SET applies to all instances by default (SID='*'), a single statement is sufficient.

     

     

    SQL> sho parameter db_name

     

    NAME                                 TYPE        VALUE

    ------------------------------------ ----------- ------------------------------

    db_name                              string      pmdb

     

    SQL> sho parameter diagnostic_dest

     

    NAME                                 TYPE        VALUE

    ------------------------------------ ----------- ------------------------------

    diagnostic_dest                      string      /u01/app/oracle

     

    SQL> alter system set diagnostic_dest='/mnt/acfsmounts/acfs1';

     

    System altered.

     

    SQL> sho parameter diagnostic_dest

     

    NAME                                 TYPE        VALUE

    ------------------------------------ ----------- ------------------------------

    diagnostic_dest                      string      /mnt/acfsmounts/acfs1
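
    To double-check that both instances now point at the shared location, the GV$DIAG_INFO view can be queried from either instance (output omitted; it should show /mnt/acfsmounts/acfs1 as the ADR base for both instances):

    SQL> select inst_id, value from gv$diag_info where name = 'ADR Base';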

     

     

     

    Verify that the various folders for the diagnostic files of both RAC instances have been automatically created under the mount point /mnt/acfsmounts/acfs1.

     

     

    [root@host01 trace]# cd /mnt/acfsmounts/

     

    [root@host01 acfsmounts]# tree -d

    .

    |-- acfs1

    |   |-- diag

    |   |   `-- rdbms

    |   |       `-- pmdb

    |   |           |-- pmdb_1

    |   |           |   |-- alert

    |   |           |   |-- cdump

    |   |           |   |-- hm

    |   |           |   |-- incident

    |   |           |   |-- incpkg

    |   |           |   |-- ir

    |   |           |   |-- lck

    |   |           |   |-- log

    |   |           |   |   |-- ddl

    |   |           |   |   |-- debug

    |   |           |   |   |-- imdb

    |   |           |   |   `-- test

    |   |           |   |-- metadata

    |   |           |   |-- metadata_dgif

    |   |           |   |-- metadata_pv

    |   |           |   |-- stage

    |   |           |   |-- sweep

    |   |           |   `-- trace

    |   |           `-- pmdb_2

    |   |               |-- alert

    |   |               |-- cdump

    |   |               |-- hm

    |   |               |-- incident

    |   |               |-- incpkg

    |   |               |-- ir

    |   |               |-- lck

    |   |               |-- log

    |   |               |   |-- ddl

    |   |               |   |-- debug

    |   |               |   |-- imdb

    |   |               |   `-- test

    |   |               |-- metadata

    |   |               |-- metadata_dgif

    |   |               |-- metadata_pv

    |   |               |-- stage

    |   |               |-- sweep

    |   |               `-- trace

    |   `-- lost+found

     

     

    Initiate some activity on both instances so that the alert log of each instance gets populated in the new ADR location.

     

    SQL> sho parameter instance_name

     

    NAME                                 TYPE        VALUE

    ------------------------------------ ----------- ------------------------------

    instance_name                        string      pmdb_2

     

    SQL> alter system switch logfile;

     

    System altered.

     

    SQL> conn sys/oracle@host02-vip.cluster01.example.com:1521/pmdb as sysdba

    Connected.

     

    SQL> sho parameter instance_name

     

    NAME                                 TYPE        VALUE

    ------------------------------------ ----------- ------------------------------

    instance_name                        string      pmdb_1

     

    SQL> alter system switch logfile;

     

    System altered.

     

     

    Modify the access privileges for the ADR to enable access by any user.

     

    [root@host01 acfs1]# chmod -R 777 /mnt/acfsmounts/acfs1/diag

    [root@host01 acfs1]# ls -l /mnt/acfsmounts/acfs1

    total 88

    drwxrwxrwx 3 oracle asmadmin 8192 May  6 20:19 diag

    drwxrwxrwx 2 root   root     65536 May  6 19:50 lost+found

     

     

    Verify that HANFS server host02 has successfully exported the ADR of both instances of the RAC database to the client server server1.

     

     

    [root@server1 ~]# cd /mnt/hanfs1

     

    [root@server1 hanfs1]# tree -d

    .

    |-- diag

    |   `-- rdbms

    |       `-- pmdb

    |           |-- pmdb_1

    |           |   |-- alert

    |           |   |-- cdump

    |           |   |-- hm

    |           |   |-- incident

    |           |   |-- incpkg

    |           |   |-- ir

    |           |   |-- lck

    |           |   |-- log

    |           |   |   |-- ddl

    |           |   |   |-- debug

    |           |   |   |-- imdb

    |           |   |   `-- test

    |           |   |-- metadata

    |           |   |-- metadata_dgif

    |           |   |-- metadata_pv

    |           |   |-- stage

    |           |   |-- sweep

    |           |   `-- trace

    |           `-- pmdb_2

    |               |-- alert

    |               |-- cdump

    |               |-- hm

    |               |-- incident

    |               |-- incpkg

    |               |-- ir

    |               |-- lck

    |               |-- log

    |               |   |-- ddl

    |               |   |-- debug

    |               |   |-- imdb

    |               |   `-- test

    |               |-- metadata

    |               |-- metadata_dgif

    |               |-- metadata_pv

    |               |-- stage

    |               |-- sweep

    |               `-- trace

    `-- lost+found

     

     

    View the contents of the alert logs of instances pmdb_1 and pmdb_2 from the HANFS client server1.

     

     

     

    [root@server1 trace]# tail -2 /mnt/hanfs1/diag/rdbms/pmdb/pmdb_1/trace/alert_pmdb_1.log

     

      Current log# 4 seq# 2 mem# 0: +DATA/PMDB/ONLINELOG/group_4.288.911156637

      Current log# 4 seq# 2 mem# 1: +FRA/PMDB/ONLINELOG/group_4.260.911156645

     

    [root@server1 trace]# tail -2 /mnt/hanfs1/diag/rdbms/pmdb/pmdb_2/trace/alert_pmdb_2.log

     

      Current log# 1 seq# 9 mem# 0: +DATA/PMDB/ONLINELOG/group_1.282.911155419

      Current log# 1 seq# 9 mem# 1: +FRA/PMDB/ONLINELOG/group_1.257.911155437

     

     

    Verify that instance pmdb_1 of database pmdb is running on node host02, where havip1 is currently running.

     

    [root@host02 ~]# srvctl status havip

     

    HAVIP ora.havip1.havip is enabled

    HAVIP ora.havip1.havip is running on nodes host02

     

    [root@host02 ~]# ps -ef |grep pmon

    oracle    3794     1  0 19:08 ?        00:00:00 ora_pmon_pmdb_1

    root     13469 14328  0 20:47 pts/2    00:00:00 grep pmon

    grid     14560     1  0 19:31 ?        00:00:00 apx_pmon_+APX2

    grid     19589     1  0 18:27 ?        00:00:00 asm_pmon_+ASM2

     

     

     

    In order to verify the high availability of the HANFS exported file system, shut down host02, where havip1 is running.

     

    [root@host02 ~]# init 0 

     

    Verify that havip1 has failed over to host01 and that instance pmdb_1 has been terminated.

     

    [root@host01 acfs1]# srvctl status havip

    HAVIP ora.havip1.havip is enabled

    HAVIP ora.havip1.havip is running on nodes host01

     

    [root@host01 acfs1]# srvctl status database -d pmdb

    Instance pmdb_2 is running on node host01

    Instance pmdb_1 is not running on node host02

     

     

    Note that host01, where havip1 is now running, has taken over the role of HANFS server and is exporting the ADR over HANFS. As a result, the alert log of instance pmdb_1 continues to be accessible from HANFS client server1 even though node host02 is down.

     

    [root@server1 trace]# tail -5 /mnt/hanfs1/diag/rdbms/pmdb/pmdb_1/trace/alert_pmdb_1.log

     

    License high water mark = 7

    Fri May 06 20:48:13 2016

    USER (ospid: 13937): terminating the instance

    Fri May 06 20:48:17 2016

    Instance terminated by USER, pid = 13937

     

     

    Summary

     

      • Diagnostic files of an Oracle database are stored in the Automatic Diagnostic Repository (ADR), a file-based repository outside the database whose location is set by the DIAGNOSTIC_DEST initialization parameter.

     

      • In the case of a RAC database, each instance has its own alert log and various other diagnostic files, which are stored underneath its own home directory within the ADR.

     

      • If the ADR base is set to a non-shared location and you need to correlate the alert logs and trace files from across multiple instances, you will need to log on to each node, which, besides being inconvenient, might not be possible if one or more nodes are down.

     

      • By setting the ADR base on each node of an Oracle Real Application Clusters (Oracle RAC) environment to a location on shared storage, the aggregated diagnostic data from all instances can be viewed from a single location.

     

      • From Oracle Database 12c onwards, ACFS can be leveraged as a highly available network file system (HANFS). By storing the ADR on ACFS, the ADR can be exported over HANFS so that the log files and trace files of all instances remain available to client server(s) outside the cluster as long as at least one node in the cluster is active.

     

     

     

    About the Authors

     

     

    Nassyam Basha is an Oracle DBA, OCM 11g, Oracle ACE Director, author of the Data Guard 11gR2 book, blogger, OTN moderator and Super Hero, MOSC Guru, and writer with OTN and DELL, with around 9 years of hands-on experience in high availability technologies like Oracle RAC, Data Guard, Exadata, and much more. He currently works with The Pythian Group; in the past he worked for AT&T (Bell Labs), dbaDirect, and SLK Software Services.

     

    Anju Garg is an Oracle ACE with over 14 years of experience in the IT industry in various roles. Since 2010, she has been involved in teaching and has trained more than 100 DBAs from across the world in core DBA technologies like RAC, Data Guard, performance tuning, SQL statement tuning, and database administration. She has a keen interest in RAC and performance tuning and shares her knowledge via her technical blog at http://oracleinaction.com/.