When Groovy, assert and threads gang together to fool you

At TransferWise, we use Kafka for our messaging needs, to communicate between our micro-services and with our legacy Grails monolith. A few days ago, I faced the strangest problem with the Kafka message processing in Grails.

The code that consumes Kafka messages works in the following way: It fetches the Kafka message, saves it to the database, acknowledges it to Kafka and then processes it in a separate thread. If the processing fails, there is a job that picks up unprocessed messages for a new try. With this approach, we are sure to never lose a message and we can also easily avoid duplicate processing.

The problem was that for some messages, the processing failed on the first attempt and worked on the second. But logs gave no explanation for the failure! There was no error message, no exception. The only thing we knew was it was not hanging, just failing silently.

Then I remembered, we process the message in a separate thread. With Spring or Grails, we are used to see every exception end up in the log file. But this does not apply to code executed in separate threads. Threads will die silently, if we don’t catch and log exceptions explicitly. Was this my problem?

No, that couldn’t be it, the try/catch in the run method was there:

try {
    // ...
} catch (Exception e) {
    log.error("Exception occurred during processing message….”, e)
}

In despair, I had this crazy idea:

[12:22] long shot, but we catch only Exception, maybe it's an exception that doesn't extend Exception...

I didn’t really believe at this theory myself. After all, things extending directly Throwable like Error are for problems we cannot handle at all, OutOfMemoryError, VirtualMachineError, etc. It's not something that would happen regularly for some events only.

Still I added catching Throwable there and almost forgot about it.

Later, a colleague released our Grails app and got alarmed by a very scary new exception:

org.codehaus.groovy.runtime.powerassert.PowerAssertionError: assert serviceUrl

That was caused by my two extra lines:

} catch (Throwable e) {
    log.error("Throwable occurred during processing message... ", e)
}

So what couldn’t happen was happening, an Error was thrown.

Deep in one of the method called by our event processing, there was an assert that was failing:

assert serviceUrl

Wait a minute, in Java assert are DISABLED by default.

But this is not Java code, it’s Groovy. And in Groovy, asserts are ENABLED by default.

And a failed assert throws an AssertionError, which extends Error!

Why did this affect only the first attempt at processing the event? Before hitting the assert, there is a check against the age of the event. Events over 1 minute old didn’t execute this code path at all, therefore they didn’t fail.

So worth remembering:

assert in Groovy != assert in Java

to catch all exceptions, at least in Groovy, catch Throwable