We have a complex issue.
Client (actually originally we had the issue in the appserver, but we also reproduced in a win box): Win, ODP.Net unmanaged 126.96.36.199
Server: Oracle 10.2.0.4.0-64b running on AIX.
The simptomps: symple select running for long time. Same select take seconds in SqlDeveloper.
The query is of the form:
"select * from <Phisical Table> where field1=:0 and field2 between :1 and :2"
there are indices defined, that are being used when we call from SqlDeveloper, however, when we call from .net, using the following pseudocode:
DbCommand liacmd = liaFactory.CreateCommand();
DbParameter lioParameter1 = liacmd.CreateParameter();
lioParameter1.ParameterName = ":0";
lioParameter1.Value = "70100100347";
DbParameter lioParameter2 = liacmd.CreateParameter();
lioParameter2.ParameterName = ":1";
lioParameter2.Value = "20130101000000";
lioParameter2 = liacmd.CreateParameter();
lioParameter2.ParameterName = ":2";
lioParameter2.Value = "20131017000000";
DbDataReader liareader = liacmd.ExecuteReader();
With the unmanaged .net provider, the ExecuteReader takes around 3 minutes, the Read() 15 minutes!
If i switch to the OleDb provider, the Execute+Read takes less than a second in total.
We have the ODP.Net trace level 63, it taking long time in actions like:
TIME:2013/10/18-08:38:58:186 TID: b54 (ENTRY) OpsGetAvailPhysMemory()
TIME:2013/10/18-08:38:58:186 TID: b54 (EXIT) OpsGetAvailPhysMemory(): returnValue=0 Line=1068
However with the old ODP.Net unmanaged 188.8.131.52 the time was spent in...
TIME:2013/10/18-08:01:07:453 TID: c80 (ENTRY) OpsSqlExecuteReader(): (1c0f38)=0
TIME:2013/10/18-08:09:57:013 TID: c80 (ENTRY) OpsMetAddRef(): (1c3668)=0
Any ideas on how to debug this mess (in the production db)?
I would recommend opening up a SR with Oracle Support. It's likely Oracle will need to debug in the underlying software layers ODP.NET is calling. It's most expedient if you can reproduce it in a non-production environment.
To narrow down where the problem may lie, I would look to answer the following questions: Does this problem only occur on one machine (i.e. production, not on test machines)? Does this behavior occur with other queries? If so, what do all those queries have in common? What if the parameters are replaced by literal values?
Thanks, answering the questions:
Does this problem only occur on one machine (i.e. production, not on test machines)?
> when it happens, it happens everywhere (production/test machines, but allways when accesing the production 10g db).
> Today the problem is not happening, so i cannot test, waiting until the next occurrence so I can continue testing alternatives.
Does this behavior occur with other queries? If so, what do all those queries have in common?
> Other queries behave ok, is just this particular simple query. It is weird, I have to recheck, but apparently even changing the parameter value will make the query work in some occasions.
What if the parameters are replaced by literal values?
> Good advice, will try that the next we experience the problem.
So now we are just waiting for the next time the problem appears, to be able to try more options.
In the afternoon, from another factory (now in Chile) the performance issue started to appear in two products (while the products that were not working last week continue to come fast)
Using that new window of opportunity and the access i had from a client to the production db, i tested a version with the sql and verified that the problem are the parameters.
As soon as i do the query without parameters, is as fast as OleDb.
We are in the process of migration of the prod db to ora 11g, so in the meantime we will test the oledb provider to see if we can use it in production until the new version.
Today there is another situation,
if i run the query using Prod=:0 and Tms between :1 and :2, it takes many minutes.
If i intercharge the order of the where, using Tms between : 0 and :1 and prod=:2, it runs in less than a second.
also if i run prod=:0 and tms>=:1 and tms<=:2, it runs fast.
this behavior in OleDb and in the .net provider... ?any ideas how to diagnose this???
What are the data types you are retrieving from the Oracle DB? What are the data types you are binding to the SQL statement?
If one side uses Unicode (.NET, which it does by default) and the other side does not (DB columns), that could cause the slow behavior you observe. What would happen is that the DB would need to convert all its data over to Unicode for the query, which would also affect your existing indexes for those columns.
If you want to diagnose more quickly, you need to either open a SR with Oracle Support or publish a more complete test case. There are further traces that can be done at the Oracle Net, OCI, and Oracle DB layer to diagnose, though I'm not as knowledgeable about these areas as Oracle Support is.