We have an application that synchronizes customer data at regular intervals. For that we use JMS (HornetQ) with Java EE MDBs as workers, because these synchronizations are asynchronous and run in the background. One JMS message contains one customer ID. When an MDB instance (worker) picks up a JMS message from the queue, it starts syncing data from the external system for that one customer, identified by the ID. This background synchronization can last from seconds up to several hours: it all depends on how much data the customer has (we fetch historical data for the last several years, and the external system has varying performance).
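For illustration, a minimal sketch of such a worker (the class name, queue name and sync method are hypothetical, not our actual code):

```java
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.JMSException;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.TextMessage;

// Hypothetical worker: one JMS message = one customer ID to synchronize.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination",
                              propertyValue = "queue/CustomerSyncQueue")
})
public class CustomerSyncWorker implements MessageListener {

    @Override
    public void onMessage(Message message) {
        try {
            // The message body carries a single customer ID.
            long customerId = Long.parseLong(((TextMessage) message).getText());
            // Fetching years of historical data can take seconds up to hours.
            syncCustomerFromExternalSystem(customerId);
        } catch (JMSException e) {
            throw new RuntimeException("Could not read customer ID", e);
        }
    }

    private void syncCustomerFromExternalSystem(long customerId) {
        // ... long-running calls to the external system ...
    }
}
```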
Our synchronization module mostly works without problems. But from time to time we saw that long-running synchronizations blocked other messages in the queue from being processed, although there were idle MDB instances left to process them. Even more interesting: these blocked messages were subsequently processed by the very MDB instances that had performed the long-running synchronizations.
The problem was that some messages stayed in the queue unnecessarily long although we had many idle workers. Annoying for small customers, who would otherwise be synchronized almost immediately.
After considering many possible causes and searching the Internet, we came across this thread: https://developer.jboss.org/thread/233664
I quote:
"HornetQ consumers buffer messages from the server in a client-side buffer before the client consumes them. This improves performance: otherwise every time the client consumes a message, HornetQ would have to go to the server to request the next message. A network round trip would be involved for every message and considerably reduce performance.
…
By default, the consumer-window-size is set to 1 MiB (1024 * 1024 bytes)."
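To put that default into perspective: our messages carry only a customer ID, so each one is tiny. Assuming roughly 100 bytes per message (a made-up figure; the real size depends on headers and encoding), a 1 MiB window can hold about 1024 * 1024 / 100 ≈ 10,000 messages in a single consumer's buffer.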
To sum up: each MDB instance can preload a batch of messages into its buffer, and the other instances, although they have nothing to do, are not allowed to process these messages. This meant that an MDB instance could pick up a large customer, spend a long time processing it, and at the same time hold small customers in its buffer, only to process them last. Of course, it depended on the order in which the messages were put into the queue and on how quickly, and in which order, the individual MDB instances picked them up; that is why this situation didn't always occur. The downside of parallelism is always the debugging…
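A concrete, made-up walk-through of the failure mode: suppose the queue holds messages for customer A (huge) and customers B and C (tiny), and two MDB instances are free. Instance 1 greedily buffers A, B and C into its 1 MiB window and starts the hours-long sync of A; instance 2 finds the queue empty and sits idle. B and C are processed only after A finishes, by the same instance that was busy all along. That is exactly the behaviour we observed.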
How we fixed it: for this specific queue we set the consumer window size to 0, so there is no buffer on the MDB side.
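In a standalone HornetQ JMS client, the same setting can be applied directly on the connection factory. A minimal sketch (the transport details are assumptions; host and port defaults apply):

```java
import javax.jms.Connection;

import org.hornetq.api.core.TransportConfiguration;
import org.hornetq.api.jms.HornetQJMSClient;
import org.hornetq.api.jms.JMSFactoryType;
import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;
import org.hornetq.jms.client.HornetQConnectionFactory;

public class NoBufferConnectionFactory {

    public static void main(String[] args) throws Exception {
        // Connect to a HornetQ server via the Netty transport.
        TransportConfiguration transport =
                new TransportConfiguration(NettyConnectorFactory.class.getName());
        HornetQConnectionFactory cf = HornetQJMSClient
                .createConnectionFactoryWithoutHA(JMSFactoryType.CF, transport);

        // 0 disables the client-side buffer: a consumer receives a message
        // only when it is actually ready to process one, so idle consumers
        // are never starved by messages parked in a busy consumer's buffer.
        cf.setConsumerWindowSize(0);

        Connection connection = cf.createConnection();
        try {
            // ... create session, consumer, etc. as usual ...
        } finally {
            connection.close();
        }
    }
}
```

For MDBs the buffer is owned by the container's resource adapter, so the setting has to be applied there; in JBoss-based servers this is typically the consumer-window-size attribute of the HornetQ pooled connection factory, though the exact location depends on the server version. The trade-off from the quote above still applies: with a window of 0, every message costs a network round trip, which is negligible when each message triggers seconds to hours of work.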