Just sharing my experience with the error "ORA-25408: can not safely replay call". Although we have not yet been able to find the exact cause of this error, I will share some of what my team and I have run into. I will keep updating this blog as we progress.
Let's first look at what the Oracle documentation has to say about this:
One may receive "ORA-25408: can not safely replay call" when using failover. If you are doing an insert and the database goes down and failover occurs, "ORA-25408: can not safely replay call" is expected, and the application should handle this exception and re-execute the insert.
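To make the "handle this exception and re-execute" advice concrete, here is a minimal sketch of a retry wrapper. It is not taken from our batch job. The names `run_with_retry` and `is_replay_error` are made up for illustration, and `operation` is assumed to be a callable that starts its own fresh transaction, does the insert, and commits, so that running it again is safe. The check simply looks for the text "ORA-25408" in the exception message rather than relying on any particular driver's error class.

```python
import time


def is_replay_error(exc):
    """Return True if the exception text mentions ORA-25408."""
    return "ORA-25408" in str(exc)


def run_with_retry(operation, max_attempts=3, delay_seconds=1.0):
    """Call operation(); re-run it if it fails with ORA-25408.

    `operation` is a hypothetical callable that opens its own transaction,
    performs the insert, and commits, so that re-running it is safe.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as exc:
            # Anything other than ORA-25408, or a failure on the last
            # attempt, is re-raised unchanged to the caller.
            if not is_replay_error(exc) or attempt == max_attempts:
                raise
            # Short pause before trying again (assumed value, tune as needed).
            time.sleep(delay_seconds)
```

A batch step would then be called as `run_with_retry(my_insert_step)`, where `my_insert_step` is whatever function wraps the insert. Whether a blind retry is acceptable depends on the job: it only makes sense if the failed attempt left nothing half-committed.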
Developers running a batch job came across this error repeatedly, though not very frequently, and each time their transaction got cancelled.
As per the Oracle documentation, we tried to find out whether any failover or node eviction was happening at the time of the error. After digging a lot into the alert log file, we did not find any signs of node eviction or failover of the sessions.
Maybe someone reading this blog can suggest the right way to look for a transaction failover. What we were doing was getting the exact timestamp of the failed transaction from the developer and searching the alert log file for that same timestamp.
But we could not find any error, or any sign of one, before or after the provided timestamp. This was really crazy.
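The manual search described above can be sketched as a small script that pulls out every alert log entry within a few minutes of the reported failure time, instead of looking for an exact match. This is only a sketch under assumptions: it assumes the alert log carries timestamps on their own line in the form "Mon Jan 05 10:15:30 2009" (the layout differs between Oracle versions and platforms, so `ALERT_TS_FORMAT` may need changing), it assumes an English locale for day and month names, and the function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout, e.g. "Mon Jan 05 10:15:30 2009".
# Adjust this if your alert log uses a different format.
ALERT_TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_alert_timestamp(line):
    """Return a datetime if the line is a timestamp line, else None."""
    try:
        return datetime.strptime(line.strip(), ALERT_TS_FORMAT)
    except ValueError:
        return None


def entries_near(lines, failure_time, window_minutes=5):
    """Collect alert log lines stamped within the window around failure_time."""
    window = timedelta(minutes=window_minutes)
    current_ts = None
    matches = []
    for line in lines:
        ts = parse_alert_timestamp(line)
        if ts is not None:
            # Lines that follow belong to this timestamp until the next one.
            current_ts = ts
            continue
        if current_ts is not None and abs(current_ts - failure_time) <= window:
            matches.append((current_ts, line.rstrip()))
    return matches
```

It could be used by opening the alert log with `open(path)` and passing the file object as `lines`, together with the failure time reported by the developer. A window is more forgiving than an exact timestamp because the application clock and the database server clock rarely agree to the second.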
This went on for more than a year while we looked for some clue to the cause.
Finally we saw some smoke in the room that could lead our search to new findings. What we saw was that the job they were running was interacting with a DB2 database through OTG (Oracle Transparent Gateway). One thing was certain: somehow the session was getting killed or was failing, if not by eviction then by something else.
So we started digging into this DB2 connection, and we found that the sessions to DB2 were somehow timing out. I am not sure that ORA-25408 has anything to do with these session timeouts, but we still thought it was worth working on.
After a discussion with the DB2 DBAs, we discovered that they had defined a parameter called IDTHTOIN (Idle Thread Timeout). It was set to a very small value, and it times out a session that has been idle for that long. We asked the DBAs to increase this value so that we could test our issue. After many requests, they agreed to increase it on the dev environment. After the parameter was increased, we did not face ORA-25408 again, at least on the dev environments. But we were not sure whether this parameter had really made the difference or not. We were still getting this error on the prod environment.
Presently we are in talks with the DB2 DBAs to increase this parameter on the prod environment so that we can confirm the issue is the same one. But the DB2 DBAs are very reluctant to change this parameter on prod. They want foolproof assurance that this parameter will do the job for us. Now my team is planning how we can give that assurance. We are fairly confident it will work, but we have not proven it yet.
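One way we could gather evidence for the DB2 DBAs is to log the time of every call the batch job makes through the gateway and then check whether the failures line up with idle gaps longer than the IDTHTOIN setting. The sketch below shows only that gap check. It is hypothetical: we have not run this on our systems, the function name is made up, and `timeout_seconds` stands for whatever IDTHTOIN value the DB2 DBAs report, converted to seconds if needed.

```python
from datetime import datetime


def idle_gaps_over_timeout(call_times, timeout_seconds):
    """Return (previous_call, next_call, gap_seconds) for every idle gap
    between consecutive calls that is longer than the timeout."""
    ordered = sorted(call_times)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        gap = (current - previous).total_seconds()
        if gap > timeout_seconds:
            gaps.append((previous, current, gap))
    return gaps
```

If every ORA-25408 on prod turned out to follow one of these long gaps, that would be a much stronger argument than our dev result alone. If the errors also appear without a long gap, then the timeout is probably not the whole story.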
I request readers of this blog to shed some light on this issue and share their experience.
---- Will update this blog once we have further developments on this issue.