Will the standby database take over the duty of the active database when the active database fails?
If not, is the standby only a read-only database, and how does high availability of the active standby pair work?
Could you please be more specific about what you mean by active database? Do you mean that the active database is the primary database?
If so, then yes. The standby database can start behaving as the primary database when the current primary database goes down. You need to perform a failover operation in that case.
After I shut down the active/primary database, I call ttRepStateGet on the standby database and the output shows it is still in standby state.
I have configured the client DSN to support failover. When I restart my application, this exception is thrown:
Caused by: java.sql.SQLException: [TimesTen][TimesTen 188.8.131.52.1 SERVER]Client failover configured but server is not in ACTIVE state
It seems that no active node is available?
The database (and client) failover is not automatic. TimesTen provides the mechanism but it has to be invoked by some external monitor process or HA framework (typically we recommend Oracle Clusterware).
If the active master fails or is shut down then the standby needs to be promoted by declaring a failover. This is done by executing the following builtin procedures at the standby:
call ttRepStateSet('ACTIVE');
call ttRepStateSave('FAILED', 'failed-store-name', 'name-of-host-of-failed-store');
Then the standby will become active and take over the various roles related to cache refresh, cache propagation, subscriber propagation etc. As part of this failover, the client failover processing will happen too.
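For a Java application, the promotion step could be issued over JDBC roughly as follows. This is only a minimal sketch: it assumes an open connection to the standby, and that the promotion consists of ttRepStateSet('ACTIVE') followed by ttRepStateSave('FAILED', ...) for the failed store. The class name StandbyPromotion, its methods, and the store/host arguments are hypothetical names used for illustration, not part of any TimesTen API.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

final class StandbyPromotion {

    // Builds the two builtin calls, in the order they are executed on the standby.
    static List<String> promotionStatements(String failedStore, String failedHost) {
        return Arrays.asList(
            "call ttRepStateSet('ACTIVE')",
            "call ttRepStateSave('FAILED', '" + quote(failedStore)
                + "', '" + quote(failedHost) + "')");
    }

    // Doubles single quotes so a name cannot break out of the SQL string literal.
    static String quote(String s) {
        return s.replace("'", "''");
    }

    // Executes the promotion on a connection that must point at the standby.
    static void promote(Connection standby, String failedStore, String failedHost)
            throws SQLException {
        try (Statement st = standby.createStatement()) {
            for (String sql : promotionStatements(failedStore, failedHost)) {
                st.execute(sql);
            }
        }
    }
}
```

Something outside the database still has to decide when to call promote; the sketch only covers the mechanics of issuing the calls.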
If you use Oracle Clusterware and the TimesTen Clusterware integration then all this is handled automatically (this is the recommended approach). Otherwise you need to provide your own HA monitor mechanism (possible but pretty complex to implement).
For my Java application, I usually use Apache Commons DBCP to pool JDBC connections.
I know that I can register a connection event listener on the TimesTen JDBC connection for asynchronous notification.
When my event listener receives a failover event, it destroys all pooled connections in DBCP and directs DBCP to initialize new connections,
and then issues the statements you mentioned above to promote the standby node to active.
I am not sure whether this works, or whether I have to use Oracle Clusterware to solve this problem. I hope Oracle Clusterware is easy to get started with...
As I mentioned before, there are two aspects to this: database failover and client failover. Database failover is initiated by calling the builtins as I previously described. Something outside of TimesTen has to do that when it detects a failure and decides to fail over. You can write/script your own framework to do that (very complex and hard to get right) or use Oracle Clusterware (a free of charge product, but it can be a bit complex to set up and needs certain resources). The client failover is predicated on the database failover; when the database failover occurs, any clients configured for client failover will fail over at that time. You can't have client failover without the database failover happening. Note that client failover only works for true client/server connections; it can't be used for direct mode connections.
Thanks Chris, I have got it.
In short, what you said is that client failover depends on database failover.
If no database failover happens, client failover will never happen.
So the key point is how to detect a database failure in time and then promote the standby to active.
Oracle Clusterware comes into play again...
I'm crazy now...
Please, don't be :) As Chris has already written, you can use Oracle Clusterware (Grid Infrastructure) for switching the roles in Active Standby Pair replication automatically in case of failure.
Basically the biggest problem is the Oracle Clusterware (GI) installation. You can find a lot of documentation about it. I would recommend the brilliant article by Jeffrey Hunter (http://www.oracle.com/technetwork/articles/hunter-rac11gr2-iscsi-088677.html).
The next step is setting up the active standby pair replication. The documentation is here (http://download.oracle.com/otn_hosted_doc/timesten/1121/aspair_cache.html).
There is one more example of setting up the AS pair replication by using Clusterware (184.108.40.206): http://ggsig.blogspot.co.uk/2010/07/tech-oracle-clusterware-and-oracle.html (sorry, it is in Russian).
I hope this helps.
Thanks for your links.
A faster and simpler solution might look like this, but I am not certain:
A detection program initially establishes connections to both databases,
and then periodically issues a "select 1 from dual" statement against each one as a health check.
Any failure, such as an I/O exception, implies that the database is shut down or has crashed.
If the failure occurs on the connection to the active node, my program will issue the built-in procedures
such as ttRepStateSet and ttRepStateSave on the standby database to switch its role to active.
When the standby database changes its role, it will notify all clients to fail over.
The above is my assumption. If it will not work, please let me know; the worst choice for me is to use Oracle Clusterware.
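The polling idea described in this post could be sketched in Java as below. It is a rough illustration only: the class name ActiveProbe, the threshold parameter, and the rule of requiring several consecutive failed probes before declaring the active node down are assumptions added here so that a single transient error does not trigger a role switch. It makes no attempt to handle partial network failures.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class ActiveProbe {
    private final int threshold;          // failed probes in a row needed to declare failure
    private int consecutiveFailures = 0;

    ActiveProbe(int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
    }

    // Runs the health-check query; any SQLException counts as a failed probe.
    static boolean ping(Connection conn) {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("select 1 from dual")) {
            return rs.next();
        } catch (SQLException e) {
            return false;
        }
    }

    // Records one probe result. Returns true once the number of consecutive
    // failures has reached the threshold; a successful probe resets the count.
    boolean record(boolean probeSucceeded) {
        consecutiveFailures = probeSucceeded ? 0 : consecutiveFailures + 1;
        return consecutiveFailures >= threshold;
    }
}
```

A scheduler would call record(ActiveProbe.ping(activeConnection)) at a fixed interval and consider promoting the standby only when record returns true. The polling interval multiplied by the threshold gives the worst-case detection delay.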
That will work for a simple test/demo environment but it is not robust enough to use for real. If implementing a robust HA monitor and management framework were that easy then we wouldn't need Clusterware... What about all the different failure conditions, combinations and corner cases? Please don't make the mistake of thinking that this is close to adequate for production usage. You will just create a heap of pain for yourself. State management and co-ordination across multiple nodes is a difficult and complex problem, which is why cluster managers have evolved into the things they are today.
Just one simple example: the network that connects your detection program to both nodes is working fine but the network between the TT nodes is not. How does your monitor (a) detect this and (b) handle it? What if the network between the monitor and one of the nodes fails but the connection to the other node and between the TT nodes is okay? How is that detected/handled? How do you ensure that you 'do the right thing' in each case and don't take any action that could result in loss of service? And these are the 'easy' cases...
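To make the two scenarios above concrete, here is a small Java sketch of the kind of decision a monitor would need to make when its own view and the standby's view of the active node disagree. Everything in it is hypothetical: the class FailoverDecision, the three boolean inputs, and the rule itself are illustrative assumptions, and how the monitor would learn the standby's view of the active is left open. Even when both observers agree, the active may still be alive but isolated, so this is nowhere near a complete solution.

```java
final class FailoverDecision {

    enum Action { NONE, PROMOTE_STANDBY, ALERT_ONLY }

    // monitorSeesActive:  the monitor can reach the active node
    // monitorSeesStandby: the monitor can reach the standby node
    // standbySeesActive:  the standby reports that it can reach the active node
    static Action decide(boolean monitorSeesActive,
                         boolean monitorSeesStandby,
                         boolean standbySeesActive) {
        if (!monitorSeesStandby) {
            // The monitor can neither query nor promote the standby.
            return Action.ALERT_ONLY;
        }
        if (monitorSeesActive && standbySeesActive) {
            return Action.NONE;             // everything reachable
        }
        if (!monitorSeesActive && !standbySeesActive) {
            return Action.PROMOTE_STANDBY;  // both observers agree the active is gone
        }
        // The observers disagree: only one link is broken. Promoting here
        // could leave two active databases, so raise an alert instead.
        return Action.ALERT_ONLY;
    }
}
```

In the first scenario (monitor reaches both nodes, inter-node link down) the call is decide(true, true, false) and the result is ALERT_ONLY. In the second (monitor loses only its link to the active) the call is decide(false, true, true), which also yields ALERT_ONLY rather than a promotion.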
Chris, your feedback is reasonable.
For the network outage problem, we can make sure that all servers, that is,
the active standby pair and the detection server where my detection program is deployed, are connected via the same switch or router.
This imposes some restrictions but may make the problem simpler.
There is a potential problem that the role switch will not happen in time because of the periodic scanning.
I have asked our DBA for help and hope she will master the usage of Oracle Clusterware.
Thank you again, Chris. Have a good weekend.