Highly Available Unified Logging in RAC
Introduction
Problems that span multiple Oracle RAC instances are encountered frequently, and diagnosing them often requires the alert log and trace files from several instances. This can be quite difficult if the ADR is located on non-shared storage, as it requires logging on to each node. By placing the ADR base for all instances of an Oracle RAC database on a shared location, the aggregated diagnostic data from all instances can be viewed from a single place. This article explores placing the ADR for an Oracle RAC database on an Oracle cloud file system (ACFS) so that it is accessible from every node in the cluster and can also be exported over Highly Available Network File System (HANFS) to clients outside the cluster.
Unified Logging in RAC
The various diagnostic files for an Oracle database include:
- Alert log
- Trace files
- Dumps and core files
- Health Monitor reports
- Data repair records
- SQL test cases
- Incident packages, etc.
From Oracle Database 11g onwards, these diagnostic files are stored in the Automatic Diagnostic Repository (ADR)—a file-based repository outside the database. The location of ADR, known as ADR Base, is set by the DIAGNOSTIC_DEST initialization parameter.
In a RAC database, each instance has its own alert log and other diagnostic files, stored underneath its own home directory within the ADR. Since the ADR is outside the database, it can be accessed even when the database is down. However, if the ADR base is set to a non-shared location, you need to physically log on to the machine hosting an Oracle RAC instance in order to view that instance's diagnostic files on the operating system.
If a problem spans multiple Oracle RAC instances, diagnosing it may require correlating the alert logs and trace files from across those instances. This involves logging on to each node, which, besides being inconvenient, might not even be possible if one or more nodes are down. Things become much easier if the diagnostic files are gathered in a central location that is accessible from every node in the cluster. This can be implemented by setting the ADR base on each node of an Oracle Real Application Clusters (Oracle RAC) environment to a location on shared storage, so that the aggregated diagnostic data from all instances can be viewed from a single location.
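The ADR locations in effect for an instance can be confirmed at any time from V$DIAG_INFO; a quick query such as the following (a minimal sketch, run from SQL*Plus) shows the ADR base, ADR home and trace directory currently in use:
SQL> select name, value from v$diag_info where name in ('ADR Base', 'ADR Home', 'Diag Trace');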
From Oracle 12c onwards, ACFS can be leveraged as a highly available network file system (HANFS). In this article, I will demonstrate storing the ADR for all instances of a RAC database on the same ACFS file system so that the ADR can be exported over HANFS. As a result, the log and trace files of all the instances remain available to the client server(s) outside the cluster as long as at least one node in the cluster is active.
Current setup:
Oracle 12.1.0.2 Flex Cluster with two nodes: host01 and host02
Oracle 12.1.0.2 policy-managed RAC database pmdb with two instances: pmdb_1 and pmdb_2
Demonstration:
Check Prerequisites
First, let’s verify that all kernel modules needed for ACFS and ADVM are loaded on both the nodes
[root@host01 ~]# lsmod |grep oracle
oracleacfs 2837904 1
oracleadvm 342512 1
oracleoks 409560 2 oracleacfs,oracleadvm
oracleasm 84136 1
[root@host02 ~]# lsmod |grep oracle
oracleacfs 2837904 1
oracleadvm 342512 1
oracleoks 409560 2 oracleacfs,oracleadvm
oracleasm 84136 1
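If any of these modules were missing, they could normally be loaded as root with the acfsload utility shipped in the Grid Infrastructure home (a sketch; the Grid home path /u01/app/12.1.0/grid below is an assumption for this environment):
[root@host01 ~]# /u01/app/12.1.0/grid/bin/acfsload start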
Since HANFS requires a running NFS service on each node that can host the HANFS services, we’ll next verify if the NFS service is running on both nodes.
[root@host01 ~]# service nfs status
rpc.mountd (pid 4921) is running...
nfsd (pid 4910 4909 4908 4907 4906 4905 4904 4903) is running...
rpc.rquotad (pid 4882) is running...
[root@host02 ~]# service nfs status
rpc.mountd (pid 4985) is running...
nfsd (pid 4982 4981 4980 4979 4978 4977 4976 4975) is running...
rpc.rquotad (pid 4941) is running...
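Had the service not been running, it could be started and enabled across reboots on these nodes with the usual sysvinit commands (a sketch; systemd-based releases would use systemctl instead):
[root@host01 ~]# service nfs start
[root@host01 ~]# chkconfig nfs on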
Create ADVM Volume
Create and mount an ASM disk group called DIAG that supports all the ASM Dynamic Volume Manager (ADVM) features included in release 12.1.
SQL> create diskgroup DIAG external redundancy
     disk 'ORCL:ASMDISK014','ORCL:ASMDISK013','ORCL:ASMDISK012','ORCL:ASMDISK011'
     attribute 'compatible.asm'='12.1', 'compatible.advm'='12.1';
[root@host01 ~]# srvctl start diskgroup -diskgroup DIAG -n host01,host02
[root@host01 ~]# crsctl stat res ora.DIAG.dg
NAME=ora.DIAG.dg
TYPE=ora.diskgroup.type
TARGET=ONLINE , ONLINE
STATE=ONLINE on host01, ONLINE on host02
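Optionally, the disk group's mount state and free space can also be confirmed from ASMCMD before creating a volume on it (a quick sanity check, not strictly required):
[grid@host01 ~]$ asmcmd lsdg DIAG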
Create a volume VOL1 in the DIAG disk group having a size of 300 MB
[grid@host01 root]$ asmcmd volcreate -G DIAG -s 300m VOL1
Since ADVM/ACFS utilize an ASM proxy instance to connect to Flex ASM, an ASM proxy instance must be running on each hub node that can host HANFS services.
Let us check that an ASM proxy instance is running on both nodes.
[root@host02 ~]# srvctl status asm -proxy
ADVM proxy is running on node host01,host02
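The same proxy instances are represented by the ora.proxy_advm Clusterware resource, so their state can also be cross-checked with crsctl (an optional check; this is the resource name used by Flex ASM in 12.1):
[root@host01 ~]# crsctl stat res ora.proxy_advm -t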
Take note of the volume device associated with volume VOL1
[root@host02 ~]# srvctl start volume -volume VOL1 -diskgroup DIAG
[root@host02 ~]# srvctl status volume -all
Volume VOL1 of diskgroup DIAG for device /dev/asm/vol1-260 is enabled
Volume VOL1 of diskgroup DIAG for device /dev/asm/vol1-260 is running on nodes: host01,host02
[grid@host01 tmp]$ asmcmd volinfo -G DIAG VOL1 |grep Device
Volume Device: /dev/asm/vol1-260
Create and start a cloud file system resource
Construct an ACFS file system on the volume VOL1.
[root@host01 ~]# mkfs -t acfs /dev/asm/vol1-260
mkfs.acfs: version = 12.1.0.2.0
mkfs.acfs: on-disk version = 39.0
mkfs.acfs: volume = /dev/asm/vol1-260
mkfs.acfs: volume size = 335544320 ( 320.00 MB )
mkfs.acfs: Format complete.
Create a mount point on both nodes for mounting the ACFS file system
[root@host01 ~]# mkdir -p /mnt/acfsmounts/acfs1
[root@host02 ~]# mkdir -p /mnt/acfsmounts/acfs1
A cloud file system resource is the Oracle Clusterware resource for an ACFS file system that you want to expose through HANFS. Let us create a cloud file system resource which will be used to store the ADR. This resource uses the volume device of VOL1 (/dev/asm/vol1-260) along with the mount point /mnt/acfsmounts/acfs1. Starting the cloud file system resource mounts the file system on both nodes.
[root@host01 ~]# srvctl add filesystem -m /mnt/acfsmounts/acfs1 -d /dev/asm/vol1-260
[root@host01 ~]# srvctl start filesystem -d /dev/asm/vol1-260
[root@host01 ~]# srvctl status filesystem -d /dev/asm/vol1-260
ACFS file system /mnt/acfsmounts/acfs1 is mounted on nodes host01,host02
Configure an HAVIP resource
An HAVIP resource is a special class of the standard Oracle node VIP Clusterware resource. Each HAVIP resource manages a unique IP address in the cluster and runs on only one node at a time; this global resource is relocated to an active node in the cluster as necessary, with the objective of providing uninterrupted service of the NFS-exported paths to its clients.
Create an HAVIP resource called havip1 on a non-pingable, non-DHCP IP address 192.9.201.184, which will later be used to export the ADR to the clients.
[root@host01 ~]# srvctl add havip -address 192.9.201.184 -id havip1
[root@host01 ~]# srvctl config havip -id havip1
HAVIP exists: /havip1/192.9.201.184, network number 1
Description:
Home Node:
HAVIP is enabled.
HAVIP is individually enabled on nodes:
HAVIP is individually disabled on nodes:
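Although Clusterware fails the HAVIP over automatically, it can also be relocated manually between hub nodes, which is convenient for planned maintenance (a sketch using the 12.1 srvctl syntax):
[root@host01 ~]# srvctl relocate havip -id havip1 -node host02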
Configure and start an Export File System Resource
An Export File System (Export FS) resource is a cluster resource associated with an HAVIP; together they publish one or more designated ACFS file systems to the cluster's clients over HANFS.
Let us create and start an Export FS cluster resource called export1 associated with HAVIP havip1.
This Export FS resource publishes the specified file system using HANFS.
[root@host01 ~]# srvctl add exportfs -id havip1 -path /mnt/acfsmounts/acfs1 -name export1 -options rw -clients *.example.com
[root@host01 ~]# srvctl start exportfs -name export1
Confirm that the exportfs resource export1 is running and that the file system is exported on server host02
[root@host01 ~]# srvctl status exportfs -name export1
export file system export1 is enabled
export file system export1 is exported on node host02
It can be observed that Clusterware has automatically started the associated HAVIP resource havip1 on the same host (host02) as the exportfs resource export1.
[root@host01 ~]# srvctl status havip
HAVIP ora.havip1.havip is enabled
HAVIP ora.havip1.havip is running on nodes host02
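The export can also be confirmed at the operating system level on host02, which currently hosts both resources, by querying the NFS server directly (a quick check, assuming the standard nfs-utils client tools are installed); the output should list /mnt/acfsmounts/acfs1 exported to *.example.com.
[root@host02 ~]# showmount -e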
Modify the access privileges for the new ACFS file system to enable access by any user.
[root@host01 ~]# ls -l /mnt/acfsmounts
total 16
drwxr-xr-x 4 root root 4096 May  6 19:50 acfs1
[root@host01 ~]# chmod -R 777 /mnt/acfsmounts
[root@host01 ~]# ls -l /mnt/acfsmounts
total 16
drwxrwxrwx 4 root root 4096 May  6 19:50 acfs1
Mount the HANFS Exported File System
Mount the HANFS exported file system on a HANFS client server named server1
[root@server1 ~]# mkdir -p /mnt/hanfs1
[root@server1 ~]# mount -t nfs 192.9.201.184:/mnt/acfsmounts/acfs1 /mnt/hanfs1
[root@server1 ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda5 71241024 6221804 61341944 10% /
/dev/sda3 4956316 141336 4559148 4% /tmp
/dev/sda1 101086 11378 84489 12% /boot
tmpfs 200776 0 200776 0% /dev/shm
192.9.201.184:/mnt/acfsmounts/acfs1
                      327680    144416    183264  45% /mnt/hanfs1
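To have server1 remount the exported file system automatically after a reboot, an entry along the following lines could be added to /etc/fstab on the client (a sketch; the mount options shown are typical and should be tuned to your environment):
192.9.201.184:/mnt/acfsmounts/acfs1  /mnt/hanfs1  nfs  rw,hard,intr  0 0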
Set the newly created ACFS file system as the ADR base for the RAC database instances pmdb_1 and pmdb_2
SQL> sho parameter db_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_name                              string      pmdb

SQL> sho parameter diagnostic_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
diagnostic_dest                      string      /u01/app/oracle

SQL> alter system set diagnostic_dest='/mnt/acfsmounts/acfs1';

System altered.

SQL> sho parameter diagnostic_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
diagnostic_dest                      string      /mnt/acfsmounts/acfs1
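DIAGNOSTIC_DEST is modified here from a single session; to make the scope explicit for all instances and persist the change in the SPFILE, the statement could equivalently be issued as follows (a sketch, assuming the database uses an SPFILE):
SQL> alter system set diagnostic_dest='/mnt/acfsmounts/acfs1' sid='*' scope=both;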
Verify that the various folders for diagnostic files of both RAC instances have been automatically created under the mount point /mnt/acfsmounts/acfs1
[root@host01 trace]# cd /mnt/acfsmounts/
[root@host01 acfsmounts]# tree -d
.
|-- acfs1
|   |-- diag
|   |   `-- rdbms
|   |       `-- pmdb
|   |           |-- pmdb_1
|   |           |   |-- alert
|   |           |   |-- cdump
|   |           |   |-- hm
|   |           |   |-- incident
|   |           |   |-- incpkg
|   |           |   |-- ir
|   |           |   |-- lck
|   |           |   |-- log
|   |           |   |   |-- ddl
|   |           |   |   |-- debug
|   |           |   |   |-- imdb
|   |           |   |   `-- test
|   |           |   |-- metadata
|   |           |   |-- metadata_dgif
|   |           |   |-- metadata_pv
|   |           |   |-- stage
|   |           |   |-- sweep
|   |           |   `-- trace
|   |           `-- pmdb_2
|   |               |-- alert
|   |               |-- cdump
|   |               |-- hm
|   |               |-- incident
|   |               |-- incpkg
|   |               |-- ir
|   |               |-- lck
|   |               |-- log
|   |               |   |-- ddl
|   |               |   |-- debug
|   |               |   |-- imdb
|   |               |   `-- test
|   |               |-- metadata
|   |               |-- metadata_dgif
|   |               |-- metadata_pv
|   |               |-- stage
|   |               |-- sweep
|   |               `-- trace
|   `-- lost+found
Initiate some activity on both instances so that the alert log of each instance gets populated in the new ADR location
SQL> sho parameter instance_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
instance_name                        string      pmdb_2

SQL> alter system switch logfile;

System altered.

SQL> conn sys/oracle@host02-vip.cluster01.example.com:1521/pmdb as sysdba
Connected.

SQL> sho parameter instance_name

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
instance_name                        string      pmdb_1

SQL> alter system switch logfile;

System altered.
Modify the access privileges for the ADR to enable access by any user.
[root@host01 acfs1]# chmod -R 777 /mnt/acfsmounts/acfs1/diag
[root@host01 acfs1]# ls -l /mnt/acfsmounts/acfs1
total 88
drwxrwxrwx 3 oracle asmadmin  8192 May  6 20:19 diag
drwxrwxrwx 2 root   root     65536 May  6 19:50 lost+found
Verify that the HANFS server host02 has successfully exported the ADR of both instances of the RAC database to the client server server1
[root@server1 ~]# cd /mnt/hanfs1
[root@server1 hanfs1]# tree -d
.
|-- diag
|   `-- rdbms
|       `-- pmdb
|           |-- pmdb_1
|           |   |-- alert
|           |   |-- cdump
|           |   |-- hm
|           |   |-- incident
|           |   |-- incpkg
|           |   |-- ir
|           |   |-- lck
|           |   |-- log
|           |   |   |-- ddl
|           |   |   |-- debug
|           |   |   |-- imdb
|           |   |   `-- test
|           |   |-- metadata
|           |   |-- metadata_dgif
|           |   |-- metadata_pv
|           |   |-- stage
|           |   |-- sweep
|           |   `-- trace
|           `-- pmdb_2
|               |-- alert
|               |-- cdump
|               |-- hm
|               |-- incident
|               |-- incpkg
|               |-- ir
|               |-- lck
|               |-- log
|               |   |-- ddl
|               |   |-- debug
|               |   |-- imdb
|               |   `-- test
|               |-- metadata
|               |-- metadata_dgif
|               |-- metadata_pv
|               |-- stage
|               |-- sweep
|               `-- trace
`-- lost+found
View the contents of the alert logs of instances pmdb_1 and pmdb_2 from the HANFS client server1
[root@server1 trace]# tail -2 /mnt/hanfs1/diag/rdbms/pmdb/pmdb_1/trace/alert_pmdb_1.log
  Current log# 4 seq# 2 mem# 0: +DATA/PMDB/ONLINELOG/group_4.288.911156637
  Current log# 4 seq# 2 mem# 1: +FRA/PMDB/ONLINELOG/group_4.260.911156645
[root@server1 trace]# tail -2 /mnt/hanfs1/diag/rdbms/pmdb/pmdb_2/trace/alert_pmdb_2.log
  Current log# 1 seq# 9 mem# 0: +DATA/PMDB/ONLINELOG/group_1.282.911155419
  Current log# 1 seq# 9 mem# 1: +FRA/PMDB/ONLINELOG/group_1.257.911155437
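If an Oracle installation with the adrci utility is available on server1 (an assumption; the plain file access shown above does not require it), the consolidated ADR can also be browsed from the client by pointing the ADR base at the NFS mount:
[root@server1 ~]# adrci exec="set base /mnt/hanfs1; show homes"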
Verify that instance pmdb_1 of database pmdb is running on the node host02 where havip1 is currently executing
[root@host02 ~]# srvctl status havip
HAVIP ora.havip1.havip is enabled
HAVIP ora.havip1.havip is running on nodes host02
[root@host02 ~]# ps -ef |grep pmon
oracle    3794     1  0 19:08 ?        00:00:00 ora_pmon_pmdb_1
root     13469 14328  0 20:47 pts/2    00:00:00 grep pmon
grid     14560     1  0 19:31 ?        00:00:00 apx_pmon_+APX2
grid     19589     1  0 18:27 ?        00:00:00 asm_pmon_+ASM2
In order to verify the high availability of the HANFS-exported file system, shut down host02, where havip1 is executing
[root@host02 ~]# init 0
Verify that HAVIP1 has failed over to host01 and instance pmdb_1 has been terminated
[root@host01 acfs1]# srvctl status havip
HAVIP ora.havip1.havip is enabled
HAVIP ora.havip1.havip is running on nodes host01
[root@host01 acfs1]# srvctl status database -d pmdb
Instance pmdb_2 is running on node host01
Instance pmdb_1 is not running on node host02
Note that host01, where havip1 is currently executing, has taken over the role of HANFS server and is now exporting the ADR over HANFS. As a result, the alert log of the pmdb_1 instance continues to be accessible from the HANFS client server1 even though node host02 is down.
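As a final check, the earlier tail command can be rerun from server1 to confirm that the alert log remains readable after the failover (the NFS client may pause briefly while the HAVIP relocates to host01):
[root@server1 ~]# tail -2 /mnt/hanfs1/diag/rdbms/pmdb/pmdb_1/trace/alert_pmdb_1.log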