Forum Stats

  • 3,853,846 Users
  • 2,264,287 Discussions
  • 7,905,475 Comments

Discussions

Oracle.ManagedDataAccess and open DB Sessions

Jindo
Jindo Member Posts: 7
edited Feb 14, 2014 5:03PM in ODP.NET

Hello,

We're using the Oracle.ManagedDataAccess.dll (v4.121.1.0) on a 64-bit .net 4 server application to connect to an Oracle (v11.2.0.3) database via Entity Framework (v4.4.20627.0).

We have two instances of our service which run from the same machine, but we have had some trouble where a connection error occurs and then when attempting to reconnect, one of the instances does not release Sessions on the DB instance and our DBA has to restart the DB to kill these off.  Both instances show some sort of disconnection, and then when they try to reconnect one instance starts to maintain Sessions on the DB when the connection fails.

Strangely, the other instance does not do this, despite running from the same machine and connecting to the same DB with the same connection string.

Looking at the log files from our service, it's apparent that a slightly different error is issued on the attempt to retry the connection:

"Bad" instance (note - no ORA code in stack trace):

System.Data.EntityException: The underlying provider failed on Open. ---> Oracle.ManagedDataAccess.Client.OracleException: Connection request timed out
   at OracleInternal.ConnectionPool.PoolManager`3.CreateNewPR(Int32 reqCount, Boolean bForPoolPopulation, ConnectionString csWithDiffOrNewPwd, String instanceName)
   at OracleInternal.ConnectionPool.PoolManager`3.Get(ConnectionString csWithDiffOrNewPwd, Boolean bGetForApp, String affinityInstanceName, Boolean bForceMatch)
   at OracleInternal.ConnectionPool.OraclePoolManager.Get(ConnectionString csWithNewPassword, Boolean bGetForApp, String affinityInstanceName, Boolean bForceMatch)
   at OracleInternal.ConnectionPool.OracleConnectionDispenser`3.Get(ConnectionString cs, PM conPM, ConnectionString pmCS, SecureString securedPassword, SecureString securedProxyPassword)
   at Oracle.ManagedDataAccess.Client.OracleConnection.Open()
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   --- End of inner exception stack trace ---
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   at System.Data.EntityClient.EntityConnection.Open()
   at System.Data.Objects.ObjectContext.EnsureConnection()
   at System.Data.Objects.ObjectQuery`1.GetResults(Nullable`1 forMergeOption)
   at System.Data.Objects.ObjectQuery`1.System.Collections.Generic.IEnumerable<T>.GetEnumerator()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

"Good" instance (note ORA-12537 code in stack trace):

System.Data.EntityException: The underlying provider failed on Open. ---> Oracle.ManagedDataAccess.Client.OracleException: ORA-12537: Network Session: End of file ---> OracleInternal.Network.NetworkException: ORA-12537: Network Session: End of file
   at OracleInternal.Network.OracleCommunication.DoConnect(String tnsDescriptor)
   at OracleInternal.ServiceObjects.OracleConnectionImpl.Connect(ConnectionString cs, Boolean bOpenEndUserSession, String instanceName)
   --- End of inner exception stack trace ---
   at OracleInternal.ConnectionPool.PoolManager`3.CreateNewPR(Int32 reqCount, Boolean bForPoolPopulation, ConnectionString csWithDiffOrNewPwd, String instanceName)
   at OracleInternal.ConnectionPool.PoolManager`3.Get(ConnectionString csWithDiffOrNewPwd, Boolean bGetForApp, String affinityInstanceName, Boolean bForceMatch)
   at OracleInternal.ConnectionPool.OraclePoolManager.Get(ConnectionString csWithNewPassword, Boolean bGetForApp, String affinityInstanceName, Boolean bForceMatch)
   at OracleInternal.ConnectionPool.OracleConnectionDispenser`3.Get(ConnectionString cs, PM conPM, ConnectionString pmCS, SecureString securedPassword, SecureString securedProxyPassword)
   at Oracle.ManagedDataAccess.Client.OracleConnection.Open()
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   --- End of inner exception stack trace ---
   at System.Data.EntityClient.EntityConnection.OpenStoreConnectionIf(Boolean openCondition, DbConnection storeConnectionToOpen, DbConnection originalConnection, String exceptionCode, String attemptedOperation, Boolean& closeStoreConnectionOnFailure)
   at System.Data.EntityClient.EntityConnection.Open()
   at System.Data.Objects.ObjectContext.EnsureConnection()
   at System.Data.Objects.ObjectQuery`1.GetResults(Nullable`1 forMergeOption)
   at System.Data.Objects.ObjectQuery`1.System.Collections.Generic.IEnumerable<T>.GetEnumerator()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)

Do you know why the "Bad" instance doesn't seem to close its Sessions on the DB, wheras the "Good" does?

Many thanks

Jon

Tagged:
user9958911
«1

Answers

  • Alex Keh-Oracle
    Alex Keh-Oracle Posts: 3,129 Employee

    Are you disposing your the connection objects (or the associated EF objects) that error out?

    Is the same client code being used on both instances? Any notable differences in the client configuration between the two?

    How often does a connection error occur?

    Is this ODP.NET, Managed Driver from the ODAC release or from the main RDBMS client download?

  • Jindo
    Jindo Member Posts: 7

    Hi Alex,

    Yes - we enclose our EF objects in using (...) blocks, so they should get disposed automatically, and we also call the static OracleConnection.ClearPool(...) command when an OracleException is thrown.  Looking back at both of our stack traces though, the top-level exception thrown is not an Oracle exception, so I believe that ClearPool is not actually being called in this particular situation.

    Both instances are running identical code from the same deploy, and their configuration (for Oracle connection strings) is identical.  We've only seen this error occur once, for a few hours in the past two weeks, but because it consumed a lot of connections and required a DB restart, our DBA is keen for us to find a solution to prevent it happening again.


    The driver we're using ("Oracle Data Provider for .NET 4 12.1.0.1.0") came from the "ODTwithODAC121010.zip" file from http://www.oracle.com/technetwork/topics/dotnet/utilsoft-086879.html

    Thanks,

    Jon

  • Alex Keh-Oracle
    Alex Keh-Oracle Posts: 3,129 Employee

    Let me see if I can describe your situation. You have a managed ODP.NET EF app, which is orphaning sessions. Those orphaned sessions eventually produce a timeout error, which I assume is from hitting Max Pool Size or some session limit. To workaround the timeout error, you can increase the Max Pool Size and/or the DB session limit, depending on what is limiting a new sessions. Or you could call ClearPool and try to continue running the app after hitting a timeout error. You had indicated that ClearPool wasn't actually called yet.

    Both these solutions just treat the symptoms, but don't address the cause. The cause is the orphaned sessions. It is not not known yet why they are being orphaned or how quickly. The error happened once in the past two weeks. One question I would have is whether those orphaned sessions accumulated slowly or pretty rapidly. Knowing the triggering operation/circumstances would be the most helpful in deducing the root cause. Anything unusual happen to the DB instance at this time (i.e. lost network connectivity, but then restored)?

    Considering that the problem happens intermittently, I'm looking for unique circumstances around the time or right before the time the problem occurs.

  • Jindo
    Jindo Member Posts: 7
    edited Dec 19, 2013 5:32AM

    Alex,

    I think that summarises the situation well.  I'm guessing that we see the timeouts because the DB instance is maxed-out and doesn't return when we request a new connection.

    To add some more context, the code that is throwing the exceptions is called from a polling routine which checks the database every 15 seconds, if a poll fails it is logged and then retried again 15 seconds later.  Looking at the logs of the service, I think that the first exception was logged at 01:58 and other services using the same DB started reporting failures at 02:48, so I'm guessing that the accumulation took around 50 minutes.  We're aware that the "Database Change Notification" feature exists, but unfortunately we didn't have much luck getting it to work on our DB instances and due to time pressures we reverted to polling the DB until we can get some more development time to get it working.

    I'll email our DBA to ask if there was anything that happened to the DB instance around this time that may have triggered the situation.  In the meantime, I'll update our source code to call ClearPool when an EntityException is thrown.

    Thanks

  • Jindo
    Jindo Member Posts: 7

    Alex,

    I've spoken to our DBA, and there was apparently no issues logged on the DB instance around the time that the exceptions started at 01:58.  However, we did get some DB Instance log information as to when we hit our hard limit on connections:

    2013-12-12 00:02:18.149000 +00:00
    ORA-00020: maximum number of processes (200) exceeded
     ORA-20 errors will not be written to the alert log for  the next minute. Please look at trace files to see all the ORA-20 errors.
    

    We've been discussing the situation here today and we have a theory about what might be happening.  We've noticed before that if a connection errors in ODP.net, the faulted connection is returned to the pool and doesn't get removed unless you call ClearPool manually.  Assuming that that's still the case in this version of ODP.net, and given that our code currently doesn't call ClearPool, does the following process sound plausible:

    1. Code attempts to open a connection to DB Instance.
    2. An error occurs in the connection, and the faulted connection is returned to the Connection Pool.
    3. As the faulted connection is not cleaned up, is it possible that it maintains a connection to the DB Instance, depleting the connection count?
    4. Gradually as further connections are attempted from our polling code, and if they all fault, the pool will fill up with faulted connections, all holding open a connection to the DB Instance.
    5. Once the default pool size is hit (100, according to Connecting to Oracle Database), then all connections are faulted in the pool, and all further attempts to connect will fail from the service until ClearPool is called or the service is restarted.
    6. As we have two instances, perhaps they both get to this point, and they account for the 200 connections on the DB Instance


    I'm not sure why the connection would fault in the first place (Step 2), but we do have exceptions in the log indicating that the connection is broken.  Do you think that this behaviour might explain why the number of open connections on the DB Instance hits the hard limit of 200?

    Thanks,

    Jon

  • Alex Keh-Oracle
    Alex Keh-Oracle Posts: 3,129 Employee

    Is this an Oracle RAC setup with two instances? If so, do you have Fast Connection Failover (HA Events = true) turned on?

    One thing you can try is to dial down your cumulative Max Pool Size to be a little lower than the DB server connection max setting. That way, you ensure it's not too numerous valid connections being created causing the problem. It could explain why the problem happens intermittently, especially if it occurs during legitimately busy times for the app.

    Invalid connections are not supposed to be returned to the connection pool. ODP.NET checks connection validity before returning a connection back to the pool. If the connection is invalid, it's destroyed. Of course, this is all theoretical behavior. There always could be a bug. It's possible that the connection becomes invalid while in the pool, such as a loss of network connectivity or DB instance is bounced. Fast Connection Failover would address this latter scenario.

    Can you provide some additional data points?

    1) Turn ODP.NET tracing on to level 7. You can do this in the <oracle.manageddataaccess.client> of your .NET config file:

    <settings>

      <setting name="TraceLevel" value="7" />

      <setting name="TraceFileLocation" value="C:\"/>

    </settings>

    Your trace could be very long, depending on how quickly the problem reproduces.

    2) Provide your ODP.NET connection string. You can leave off the user id and password.

    3) Turn on ODP.NET performance counters. Any unusual behavior in the pool (i.e. only 100 connections in the pool, yet the DBA reports seeing much more than 100)?

    You can send 1) and 2) to dotnet_us (at) oracle.com. I don't think there's an easy way to share performance counters info over email.

  • Jindo
    Jindo Member Posts: 7

    Alex,

    Thanks for that - unfortunately we couldn't reproduce the issue today with the tracing in place, but we will try again later.  In terms of setup, I don't think we have a RAC setup in place.  I'll update our config to reduce the size of the connection pool too.

    We're down to minimum staffing over the New Year period now, so I won't be able to try this until early Jan 2014.  I'll reply once we're back to full staffing levels,

    Thanks

    Jon

  • Jindo
    Jindo Member Posts: 7

    Alex,

    We've managed to do a bit more work on this now.  I've not had much luck getting the performance counters to work on my PC, but we have got to the point where we are able to reproduce the issue (although not with 100% determinism) by producing a test app which fires many hundreds of requests at our service.  Our Connection String has the following parameters in it (I've removed source, username, password etc):

    persist security info=True;Self Tuning=True;Min Pool Size=3;Max Pool Size=25;Incr Pool Size=3;Decr Pool Size=1;Connection Timeout=20;Connection Lifetime=59;
    

    yet when we check the V$SESSION view on our database whilst the test is running we see that in some cases the number of connections from the process exceeds 25 (in one case we had 140 connections).  I've collected a trace file using the app.config values you provided, but unfortunately it's 1.6Gb in size.  I'll see if I can trim it down to the relevant points and then I'll email it to you.

  • Alex Keh-Oracle
    Alex Keh-Oracle Posts: 3,129 Employee
    edited Jan 22, 2014 9:06PM

    Reply to this thread when you send this. I haven't seen it in the inbox yet.

  • Jindo
    Jindo Member Posts: 7

    Alex,

    Apologies - I haven't had the opportunity to send it yet.  Can you let me know what sort of keywords you're looking for in the log file (is there a particular error message you are expecting to see for example) so I can trim it down to the right period please?

    Thanks,

    Jon

This discussion has been closed.