The error has nothing to do with your application code.
It has to do with one of two things:
1. Number of File Descriptors
2. Incorrect setup of the Cluster FileSystem
If you are running RAC and your database file is under $ORACLE_HOME/dbs, then I would expect $ORACLE_HOME to be a Cluster FileSystem visible from both nodes of the Cluster.
What sort of storage and filesystem have been configured?
Hi, I got the same problem with one of my databases running on RAC.
Are you sure that proper permissions have been granted to both instances on this datafile?
In my case, we forgot to grant permissions to the second instance, so whenever our application tried to access tables through the 2nd instance, it hit the same problem you faced.
So first check the OS-level permissions.
I have been working with Jason on this, as I was the one who set up the database cluster. I currently have the file descriptors set to 95536, higher than the minimum of 65536 required by Oracle. As mentioned, the other tablespace in use for our student management system never runs into errors or problems like the one Jason posted.
You are correct in thinking this is a cluster. We have the storage set up on a SAN, with each node connecting via iSCSI. I followed the walkthroughs and guides on configuring ASM & OCFS. We assumed the cluster was functioning properly because when we tested failover, all the data failed over correctly.
What "failover" are you talking about in RAC?
Both instances should be able to concurrently access the datafiles in RAC.
If you have a user connecting to instance "P1" and accessing datafile 6, and another user connecting to instance "P2", the second user should also be able to access datafile 6.
What seems to be happening is that the client is connecting to alternate instances using client-side load balancing, switching between instances after about 3 or so connections and being unable to read the datafile from the second instance.
This is our first time having and using an Oracle cluster & Oracle Database, so the way we tested failover was by taking one node down to see if node 2 would pick up the DB, similar to a Windows cluster. Our tablespace that is currently in use is called WALL. The webpage was set up to access the Oracle DB through the VIP associated with the two nodes. When we have physically taken one node offline, the other node picked up the VIP & database and kept the system running without any loss of data while multiple users were accessing the WALL tablespace. I used this guide to create the cluster. I had to get the system up and running because our current student management system was end of life and we were set to upgrade.
When you refer to instance P1 do you mean a user accessing node1?
Am I misunderstanding the way an Oracle cluster works? Does an Oracle cluster function the way a Windows one does, where you have active/passive or active/active? Does the second node just act as a standby?
I apologize if this is Oracle 101 kind of stuff, but this has been a learn-as-you-go process for me and I am doing my best to better understand the way Oracle and RAC function.
Edited by: Frank Spin on Oct 22, 2008 10:23 AM
Edited by: Frank Spin on Oct 22, 2008 10:34 AM
Yes, "P1" is "node1" (properly speaking "the database instance on node1") and "P2" would be the "database instance on node2".
In RAC you would have both database instances running concurrently, and a user can connect to either instance. The way the client session connects is defined by the tnsnames.ora and listener configurations. Thus, you can have "LOAD_BALANCE=ON" in the client-side tnsnames.ora, meaning that some sessions go to "P1" and other sessions (from the same client) go to "P2".
(The "client" is, of course, your web/application server or even a user's desktop using SQLNet to connect to the database).
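For illustration only, a client-side tnsnames.ora entry using connect-time load balancing across two VIPs might look like this (the alias, host names and service name here are placeholders, not taken from this thread):

```
ORCL =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = node1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = node2-vip)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = orcl)
    )
  )
```

With LOAD_BALANCE=ON the client picks one of the listed addresses at random for each new connection, which is why successive sessions from the same client can land on different instances.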
You can use RAC as Active-Passive -- aka "failover" -- but that is a big underutilization of the hardware and software licences.
What I suspect is happening is that the table is in a particular datafile (datafile 6) which has been added from one node but, apparently, is not visible on the other node. The datafile seems to be in $ORACLE_HOME/dbs, and I wonder if your $ORACLE_HOME/dbs is really a Clustered FileSystem accessible from both nodes.
The fact that 3 sessions are successful and the 4th session fails leads me to the strong suspicion that your application is attempting a LOAD_BALANCING connection. Those sessions that go to "P1" can access the data because that datafile is visible (accessible) to "node1". The 4th session that goes to "P2" fails to access the data because "node2" is unable to access the datafile.
1. Are you using a Cluster FileSystem?
2. Are all your database files (Control Files, RedoLogs and DatabaseFiles) visible (and with the correct permissions) to both nodes of the cluster?
3. Is your client attempting a Load_Balancing connection, either through the tnsnames.ora file OR by Server-Side load balancing (the listener on one node redirecting it to the other node) OR by defining static connection strings to "P1" and "P2"?
We could look at your listener.ora files (on both nodes) and the tnsnames.ora file on the client (i.e. the web/application server), AND the values for the instance parameters
"service_names" and "local_listener" in the two database instances.
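As a sketch, those values can be checked from SQL*Plus connected AS SYSDBA on each instance:

```sql
SHOW PARAMETER service_names
SHOW PARAMETER local_listener
```

Run this on both instances and compare the output; a mismatch would point at the listener/service registration side rather than the storage side.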
Hemant K Chitale
1. Are you using a cluster file system?
--Yes, I installed OCFS2.
2. Are your files visible?
--When I run oracleasm listdisks on both nodes all four ASM volumes show up. I can also hit the /u02 directory from both machines which is OCFS2 formatted and on a shared storage device.
3. Load balance or server side?
--I am not sure how the client is set up. I know my coworker (the original poster) created the tablespace from the Oracle Client (Windows), and the ODBC connector is set up to access the database through just ORCL, not ORCL1 or ORCL2, which would make me think it's trying to connect via load balancing; his ASP page is set to use the ODBC connector on the Web Server. Our functioning tablespace was set up by the vendor (Genesis Student Management System) through SQL*Plus right in the terminal, and the webserver accessing that data is on a Linux machine. Could the variables of Windows/Oracle Client & Linux/SQL*Plus have something to do with it as well?
Based on what you are saying, with the tablespace being created through a load-balanced connection, it's possible it was created on Node1 but Node2 does not have permission to read or see this tablespace.
Our current web application, which has been in place for the past month, is set to point at both VIP addresses. When I run an ls -l on /u02 (the directory configured for OCFS2), it shows that data was written today.
I apologize if the following questions are ignorant but as mentioned I'm very new to this:
1. How would I go about viewing the listener.ora and tnsnames.ora files? (I will look this up in the meantime, but if I'm unsuccessful any help would be appreciated.)
2. Based on your thought process of one node not being able to access/write data, how is it that when I have done "failover"/HA tests on our functioning tablespace, I've been able to access the database and data regardless of which node the database resides on? (When I run crs_stat -t -v it says ora.oracle.db online oracle# based on which node "owns" it.)
3. We installed RAC to be used in an active/passive sense; could this be overkill for what we're looking to do with Oracle? We basically want an HA database so that in the event of one server going down, the database stays online. As mentioned, it's for a student management system, and at any given time we'll have around 200 users accessing the database.
Is the datafile '/u01/app/oracle/product/11.1.0/db_1/dbs/ORCL_GENESIS' accessible from both nodes of the cluster? (You say that you can see /u02, but the datafile is in /u01.)
If you connect to sqlplus AS SYSDBA you could try
ALTER SYSTEM CHECK DATAFILES;
ALTER SYSTEM CHECKPOINT;
on both nodes (P1->node1 and P2->node2) and see if there are any error messages in the database alert log file.
Hemant K Chitale
Edited by: Hemant K Chitale on Oct 23, 2008 10:46 AM
From node1, if I browse to the directory I see the file, and it has -rw-r----- 1 oracle oinstall permissions. From node2 I do not see the file.
When I run crs_stat -t -v, ora.orcl.db shows as "ora.orcl.db application 0/0 0/1 ONLINE ONLINE oracle1". If I restart node1 and run the same crs_stat -t -v, that string will say oracle2 instead of oracle1.
When I ran the ALTER SYSTEM commands, no errors were returned.
Presumably, /u02 is your ClusterFileSystem. Certainly not /u01.
If you have a DBA who knows how to move a database file, that would be good.
1. Take the tablespace offline. ALTER TABLESPACE tablespace_name OFFLINE;
2. Move the file.
3. Issue the command ALTER DATABASE RENAME FILE 'old_location' TO 'new_location';
4. Bring the tablespace online. ALTER TABLESPACE tablespace_name ONLINE ;
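As a sketch, the sequence above in SQL*Plus would look like this (the tablespace name and paths are placeholders; step 2 is an OS-level copy done outside SQL*Plus):

```sql
ALTER TABLESPACE my_tablespace OFFLINE;
-- at the OS level: cp /u01/old_location/my_file.dbf /u02/new_location/my_file.dbf
ALTER DATABASE RENAME FILE '/u01/old_location/my_file.dbf'
                        TO '/u02/new_location/my_file.dbf';
ALTER TABLESPACE my_tablespace ONLINE;
```

Note that RENAME FILE only updates the controlfile; it does not move the file itself, which is why the OS copy must happen first.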
Since this is a Production database, you should log an SR with Oracle Support for the proper instructions if you don't have a DBA.
Yes, /u02 is the Cluster File System. The white paper (previously posted in the thread) I used had me installing Oracle to the /u01 directory. If you don't mind me asking, how would the currently functioning database work when node1 goes offline? When I've physically taken node1 offline, the database has continued to work without errors. One other thing I did notice when checking the tnsnames.ora file on the two Oracle servers and the web server with the ODBC connector: the web server had a different tnsnames.ora file despite it having been copied from the Oracle servers.
We do not have an Oracle DBA, so I will look into creating an SR about properly moving the DB.
Run
select file_name from dba_data_files ;
select member from v$logfile ;
select name from v$controlfile ;
to identify all the database files, online redo logs and controlfiles.
A database is the set of database files.
An instance is the set of processes running on a node.
In RAC, you have 2 (or more) instances running on the 2 (or more) nodes concurrently accessing the same database.
One node may go down or die or be shot in the head, yet the database continues to be available via the second instance on the other node. That is the whole rationale of RAC.
Was installing Oracle onto /u01 the wrong thing to do? Or do I still need to install to /u01 and then move the file(s) to a shared directory and re-point the database to look there instead of /u01/app/.....?
I apologize for the questions; I'm just trying to understand what I did wrong or if I missed something. From what has been discussed, it seems like RAC is in place but working in Active/Passive, so that when Node1 has the database you have to access it through Node1 and vice versa; what should have been done is move the db files to a shared location and point SQL*Plus at that directory, giving Active/Active?
I just got done speaking with the DBA at the vendor we use, and I was explaining to him what was going on. What we found was exactly what you were stating, and I apologize for not understanding it right away. When the original poster created the database, it was through an ODBC connector using SQL Developer. He created the parent_portal tablespace under the USERS tablespace location, which is not a shared directory; it's on the local machine. Our other functioning tablespaces, WALL_DATA & WALL_INDEX, are located on +ORCL_GENESIS, which is on the ASM. We moved the parent_portal tablespace to the ASM instance and are going to test it out tomorrow.
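For reference, on 11g a datafile is usually moved into ASM with RMAN rather than an OS copy. A rough sketch, with placeholder file and diskgroup names (the real ones would come from dba_data_files and your ASM configuration):

```
RMAN> SQL 'ALTER TABLESPACE parent_portal OFFLINE';
RMAN> BACKUP AS COPY DATAFILE '/u01/old_location/parent_portal01.dbf'
        FORMAT '+DISKGROUP';
RMAN> SWITCH DATAFILE '/u01/old_location/parent_portal01.dbf' TO COPY;
RMAN> SQL 'ALTER TABLESPACE parent_portal ONLINE';
```

The SWITCH command updates the controlfile to point at the copy inside ASM, after which both instances can reach the file through the shared diskgroup.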
I appreciate all the help you provided on the error we were receiving, and apologize for any confusion or frustration I caused by not giving enough information.