We had an interesting issue last week with high CPU usage (100%) on one of our servers. This caused quite a few operational issues and I wondered if there was any way around them from a JMS configuration point of view.
We run on Windows 64-bit, and have two clusters of four WebLogic JVMs configured in our domain. We use SAF to forward messages from one cluster to the other - this is purely one way.
When we had the high CPU usage, we had a very large backlog of messages on the SAF queue.
The server with the high CPU was also the first server in the remote JNDI context for SAF, so although the server wasn't down, just very busy, messages on the SAF queue were taking an age to transfer across. This caused the bulk of our problems.
Is there any way to change the SAF configuration so that, when a server is maxed out on CPU, SAF will try another server in the remote JNDI context?
I'd be grateful for any ideas.
So you're looking for a way to force clients to load balance to a different SAF Agent if the server's CPU is maxed out?
Nothing comes immediately to mind except two things:
1: brute force - for example, a script that monitors CPU usage and kills off connections, so that new connections can load balance to a less busy server
2: see if there's a way to prevent large backlogs from disproportionately accumulating on a single SAF agent in the first place
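To make option 1 concrete, here is a minimal monitoring skeleton in plain Java. It only shows the detection half - reading system load via the standard `OperatingSystemMXBean` and deciding whether a threshold is breached; the actual connection-killing step (e.g. via WLST or JMX against the server's runtime MBeans) is site-specific and left as a comment. The threshold value is an assumption, and note that `getSystemLoadAverage()` returns -1 on platforms without a load average (including Windows, where you'd need a different CPU source such as PerfMon).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class CpuWatch {

    // Assumed threshold: treat >90% average load per core as "maxed out".
    static final double MAX_LOAD_PER_CORE = 0.9;

    // Returns true when the 1-minute load average, normalized per core,
    // exceeds the threshold.
    static boolean overloaded(double loadAvg, int cores) {
        return loadAvg / cores > MAX_LOAD_PER_CORE;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load = os.getSystemLoadAverage(); // -1 if unsupported (e.g. Windows)
        int cores = os.getAvailableProcessors();

        if (load >= 0 && overloaded(load, cores)) {
            System.out.println("OVERLOADED");
            // Here the script would force-close the busy server's SAF/JMS
            // connections (e.g. via WLST or runtime MBeans) so that new
            // connections can balance to a less busy server. Not shown.
        } else {
            System.out.println("OK");
        }
    }
}
```

You'd typically run something like this from cron or a scheduled task and act only after several consecutive breaches, to avoid flapping.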
A note about load-balancing, just as background. There are at least four levels of load balancing:
-- A client URL determines the host for the client's JNDI context. T3 randomly chooses among the addresses in the URL to settle on this host (it doesn't start with the first host in the URL).
-- A client ConnectionFactory "createConnection" call essentially load balances at random across the CF's configured target servers. The resulting connection sticks for the life of the connection. (The policy is random unless you happen to have configured a different RMI load balancing policy - it's possible to set this so that it biases towards the JNDI context host, but that's rare...)
-- A client "createProducer" call to a SAF imported destination again load balances among the destination's members according to the client's connection factory "server affinity" and "load balance" settings -- where "affinity" is based on the connection host. It then sticks for the life of the producer. To avoid double hop routing from client->connection->member, it's probably best to make sure affinity is enabled. (The CF "affinity" and "load balance" setting only applies to the producer create step - SAF messages from a single producer currently cannot round-robin among servers.)
-- If a message has a "UOO" or "UOW", this forces messages to route from the client's connection host to the member that hosts the UOO or UOW.
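For reference, the "load balance" and "server affinity" knobs mentioned above are connection factory settings. In a WebLogic JMS module descriptor they look roughly like the sketch below (element names follow the weblogic-jms schema; the factory name and JNDI name are made up for illustration):

```xml
<connection-factory name="SafClientCF">
  <jndi-name>jms/SafClientCF</jndi-name>
  <load-balancing-params>
    <!-- balance producer creation across destination members -->
    <load-balancing-enabled>true</load-balancing-enabled>
    <!-- prefer a member on the client's connection host,
         avoiding a double hop from client to connection to member -->
    <server-affinity-enabled>true</server-affinity-enabled>
  </load-balancing-params>
</connection-factory>
```

The same settings are also editable per connection factory on the console's load balancing tab.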
Hope this helps,