We have a project where we need to count transactions and group them by the time at which they occurred, for example the total number of transactions that happened in every 2-minute interval of each hour of each day.
Transactions can sometimes arrive out of order. For example, I can receive a transaction that occurred in minute 2 and then receive one that occurred in minute 1, because we have a queue and topics in place, and retries and multi-threaded consumption of messages can reorder them. We also have multiple timezones involved.
The number of transactions calculated per time unit is generated as a new event from CEP and sent to BAM.
I believe that for this scenario, using the application timestamp as the event timestamp is not a good idea, right?
Also, I believe the best approach would be to group the transactions by the time unit (for example, the minute) via CQL and send them to BAM to be summarized. Is this the best approach?
You are right: if the application timestamp included in the event may be out of order (in relation to arrival time), then you should use the arrival time, that is, the system timestamp, to derive time in CQL. This is because CQL does not support out-of-order events; if it accepted them, a late event could invalidate an event that had already been output.
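To make the problem with late events concrete, here is a minimal sketch in plain Python (not CQL, and not the CEP API). The function `eager_counts` is hypothetical: it emits the count for an application-time minute as soon as an event from a newer minute arrives, and so shows how a late event makes an already emitted count wrong.

```python
from collections import Counter


def eager_counts(app_minutes):
    """Emit (minute, count) as soon as an event from a newer minute arrives.

    app_minutes is the application-time minute of each event, in ARRIVAL order.
    Returns the emitted results and the true final counts per minute.
    """
    emitted = []
    counts = Counter()
    current = None  # newest application-time minute seen so far
    for minute in app_minutes:
        if current is not None and minute > current:
            # A newer minute has started, so the previous one is assumed closed.
            emitted.append((current, counts[current]))
        if current is None or minute > current:
            current = minute
        counts[minute] += 1
    return emitted, dict(counts)


# Arrival order: two events for minute 1, one for minute 2,
# then a late (retried) event that belongs to minute 1.
emitted, final = eager_counts([1, 1, 2, 1])
print(emitted)  # count already sent downstream for minute 1
print(final)    # true counts once the late event is included
```

In this run the count emitted for minute 1 is 2, while the true count is 3, so the downstream consumer would hold a stale value.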
The simplest option would be to continue using system timestamps. If the application cannot cope with the fact that some of the output (the number of transactions) may be inconsistent with the application timestamp, then your approach should work: group by the application-time minute (a single value) across a window large enough to include all the events for that minute of application time, even the late ones.
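The grouping approach described above can be sketched as follows. This is plain Python that illustrates the idea only; it is not CQL and does not use any CEP API. The names `bucket_start` and `LateTolerantCounter`, the 2-minute bucket, and the 5-minute allowed lateness are all assumptions chosen for the example. Timestamps are normalized to UTC, which is one possible way to handle the multiple timezones mentioned in the question.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

BUCKET = timedelta(minutes=2)            # assumed grouping interval
ALLOWED_LATENESS = timedelta(minutes=5)  # assumed maximum delay of a late event
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_start(app_ts):
    """Floor a timezone-aware application timestamp to its bucket start in UTC."""
    utc = app_ts.astimezone(timezone.utc)
    return utc - ((utc - EPOCH) % BUCKET)


class LateTolerantCounter:
    """Counts events per application-time bucket and holds each bucket open
    until ALLOWED_LATENESS of arrival (system) time has passed after its end."""

    def __init__(self):
        self.open_counts = defaultdict(int)

    def add(self, app_ts):
        # Events may arrive in any order; each one goes to its own bucket.
        self.open_counts[bucket_start(app_ts)] += 1

    def flush(self, arrival_now):
        """Return and remove (bucket_start, count) for buckets that are closed."""
        cutoff = arrival_now.astimezone(timezone.utc) - ALLOWED_LATENESS
        ready = sorted(b for b in self.open_counts if b + BUCKET <= cutoff)
        return [(b, self.open_counts.pop(b)) for b in ready]
```

A caller would invoke `add` for every incoming transaction and `flush` periodically with the current system time, forwarding the returned counts downstream. An event that arrives later than the allowed lateness would open its bucket again in this sketch, so the lateness bound has to be chosen to cover the worst retry delay that is expected.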