We have a distributed JMS queue in our application with the Redelivery Limit set to 14 and a Redelivery Delay Override of 86400000 ms (24 hours). That means a message in this queue is retried once per day for up to 14 days and then moved to the error destination. Our application reads a JMS property to retrieve the current redelivery count of each message in the queue and logs it in a table.
We noticed the following behavior for a few messages in this JMS queue:
Suppose a message M1 is put in this queue on 1/5/2013. Ideally, this message should reach the error destination queue after 14 days, i.e. on 1/20/2013, and only if it is never successfully delivered. On 1/7/2013 the redelivery count is '2' and on 1/8/2013 it is '3', which is correct. But on 1/9/2013 it is still '3', and on, say, 1/15/2013 it has a redelivery count of '6', which is not correct.
So the question is: why are a few messages in a JMS queue not retried on some particular days? A message that is supposed to reach the error destination on 1/20/2013 will probably reach it on some later date instead. Is there any reason for this kind of behavior for JMS messages? Can anyone help me understand this?
I don't recall that there are any known problems with long term timers.
* Retry your testing with, say, 10-second timers instead of 1-day timers, to see if you can recreate the problem with a quick reproducer.
* Regardless, note that the redelivery count cannot be exact. There are conditions where a redelivery can occur even if the application never got the message, which would cause the message to appear to "disappear" for a day. Also, shutting down your server for long periods would throw off your current algorithm. In addition, it looks like a large backlog could throw off your algorithm: if, say, a message becomes visible but it still takes a long time for your system to retrieve and process it (due to a long wait in the queue and/or a lengthy message processing step), then the number of tries per day is necessarily going to be less than 1.
* I'm not sure, but I think it might help your design if you consider using a more frequent retry, plus some very simple application logic that checks the age of a message and reacts to 14-day-old messages, rather than introducing a dependency on the delivery count. Also, you might want to consider reducing the "MessagesMaximum" configuration setting on your connection factories to one (the default is 10). This will reduce or eliminate the chance that a message already sitting in the pipeline of an asynchronous consumer/MDB, but not yet seen by the application, is silently forced to redeliver on an error.
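
To illustrate the age-based check suggested in the last bullet, here is a minimal sketch in plain Java. It is only an assumption about how such logic might look, not anything WebLogic provides: the class name `MessageAgePolicy`, its methods, and the hard-coded 14-day limit are all hypothetical. It assumes the consumer passes in the send time it reads from the standard `Message.getJMSTimestamp()` call, and that the producer has not disabled message timestamps.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper (not a WebLogic or JMS API): decides whether a message
// is old enough to give up on, independent of its redelivery count.
final class MessageAgePolicy {
    // Assumed business rule from the thread: stop retrying after 14 days.
    static final Duration MAX_AGE = Duration.ofDays(14);

    // sentAtMillis is the epoch-millisecond send time, e.g. the value a
    // consumer would obtain from Message.getJMSTimestamp().
    static Duration age(long sentAtMillis, Instant now) {
        return Duration.between(Instant.ofEpochMilli(sentAtMillis), now);
    }

    // True once the message is at least MAX_AGE old.
    static boolean isExpired(long sentAtMillis, Instant now) {
        return age(sentAtMillis, now).compareTo(MAX_AGE) >= 0;
    }

    public static void main(String[] args) {
        long sentAtMillis = Instant.parse("2013-01-05T00:00:00Z").toEpochMilli();
        // 10 days after the send time
        System.out.println(isExpired(sentAtMillis, Instant.parse("2013-01-15T00:00:00Z")));
        // 14 days after the send time
        System.out.println(isExpired(sentAtMillis, Instant.parse("2013-01-19T00:00:00Z")));
    }
}
```

With something like this, a consumer could retry as often as it likes (for example every few minutes) and route a message to its own error handling once `isExpired` returns true, so missed or extra redeliveries no longer shift the 14-day cutoff.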