As stated above, I moved the controlfiles aside and then copied them back into place, like this:
mv control01.ctl control01-bak.ctl
cp control01-bak.ctl control01.ctl

Did that for each controlfile, and then the database mounted.
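For anyone following along, the move-aside-and-copy-back workaround can be scripted as a loop. This is only a sketch on throwaway files created with mktemp, not real controlfile paths, and the backup naming differs slightly from mine. The likely reason it works at all: mv keeps the same inode (which the stale NFS lock is attached to), while cp creates a fresh inode that carries no lock.

```shell
# Demonstration on throwaway files -- substitute your real controlfile
# directory in practice. Paths here are made up for illustration.
CTL_DIR=$(mktemp -d)
touch "$CTL_DIR"/control01.ctl "$CTL_DIR"/control02.ctl "$CTL_DIR"/control03.ctl

for f in "$CTL_DIR"/control0*.ctl; do
  mv "$f" "$f.bak"   # rename: same inode, the stale lock follows the old file
  cp "$f.bak" "$f"   # copy back: new inode, no lock attached
done

ls "$CTL_DIR"
```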
But after rebooting the machine, we discovered that all the locks were back in place. It looks like the system is locking the files on boot-up and not letting them go. The lock is held by PID 1, which is init.
sculkget: lock held by PID: 1

This is definitely looking like a major system issue, and not a DBA issue, and hence I have little right to expect assistance in this forum. But nonetheless I lay my situation out here before you all in case someone else recognizes my problem before the server bursts into flames!
The system is CentOS 4.5 (not my choice, but that's the way it is).
lk<SID> is a placeholder file that exists while the instance is running. If your database is down but you still have this file, it means your instance processes were killed instead of being shut down properly.
ORA-00202: control file: '/local/opt/oracle/product/10.2.0/dbs/lkFOOBAR'
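A quick sanity check before deleting lk<SID> (my own sketch, not from this thread): confirm that no pmon background process for the SID is still running. The SID and dbs path below are taken from the ORA-00202 message quoted above; adjust for your environment.

```shell
# Sketch: only consider removing lk<SID> when no ora_pmon_<SID> is alive.
# SID and path are taken from the lkFOOBAR error message in this thread.
SID=FOOBAR
LKFILE="/local/opt/oracle/product/10.2.0/dbs/lk$SID"

if [ -e "$LKFILE" ]; then
  # the [o] bracket trick keeps grep from matching its own command line
  if ps -ef | grep "[o]ra_pmon_$SID" >/dev/null; then
    echo "instance is up; do NOT remove $LKFILE"
  else
    echo "no pmon for $SID; lk file looks stale"
  fi
else
  echo "no lk file present for $SID"
fi
```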
Check your alert.log to see if there are any error messages.
What mount options do you have on the NFS mount point?
BTW, is your ORACLE_HOME /local/opt/oracle/product/10.2.0/ on an NFS mount point?
We can bypass the lock on the lk<SID> file simply by deleting that file. Then, upon restarting the instance, we encounter locks on the controlfiles. By performing the copying technique I describe above, we can get rid of the locks on the controlfiles and mount the instance. But at that point, we discover there are locks on all the datafiles.
ORACLE_HOME /local/opt/oracle/product/10.2.0/ on a NFS mount point?

Yes, it is.
Because all Oracle files (including ORACLE_HOME) are on NFS mounts, we unmounted these directories and mounted them on a new host. When we attempted to start the database, the lock errors persisted. Thus we know the locks are held at the level of the Isilon storage device rather than on the DB host itself.
I thought of RAC in the midst of this, and said to myself, "I'm sure glad we're not on RAC".
We opened a ticket with the vendor. They recommended we restart nfsd and lockd on the Isilon and use a different NFS node, which we did, and that solved our problem. We are not satisfied that we have resolved the issue in the long term, but at least our DB is back up, and we can go to sleep knowing it was not the DB's fault.
Well, it looks like you temporarily fixed the problem, but there are a few things you might want to take note of:
1. The mount options on your NFS mount point. Make sure they follow the Oracle Metalink doc recommendations.
2. Make sure you umount and release the NFS export from the old node before mounting it on the new host.
3. There should be a release-lock command on Isilon similar to NetApp's. Check with Isilon support.
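On point 1, the mount options Oracle's Metalink notes recommend for Linux NFS clients typically look like the fstab entry below. Treat the server name, export path, and exact option values here as assumptions; verify them against the current note for your platform and NFS version before using them.

```
# Sketch of an /etc/fstab entry with the commonly recommended options
# (server name and export path are made up; verify values on Metalink):
isilon1:/ifs/data/oracle  /local/opt/oracle  nfs  rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0  0 0
```

In particular, hard (with nointr) avoids silent I/O errors on a hung mount, and actimeo=0 disables attribute caching so the client always sees current file state.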
You can check the NFS locks on the Isilon using the command below:
isilon1-1# isi nfs locks list
svid@Hostname Lock Type Range Path
802@myhost exclusive [1, 18446744073709551615] /ifs/data/mydata1/database/pracle/oracle.changes.record
This will list the current NFS locks.
isilon1-1# isi nfs locks waiters
No waiters found.
This will list the waiting locks if any.
To release the locks, do the following:
isilon1-1# isi nfs clients remove myhost
Client myhost removed
isilon1-1# isi nfs locks list
No locks found
Let me know how it goes.