Blameless postmortems

We don’t know yet how to build the perfect money transfer service, but one thing we know is how to screw up!

What’s a screw-up at TransferWise? There are plenty of variants. The most obvious one is introducing a bug in production that requires an emergency release. It can also be a bug in a release that requires a rollback to the previous release. It can affect our web application (the one serving the customers), our back-office application, or one of the multitude of microservices we have these days.

The usual screw-up is that a page of the web application or the back office doesn’t load, or is missing an important part. There is also the type of screw-up that causes transfers to be delayed in our back office.

A screw-up doesn’t have to be code-related at all. For example, regulations can force us to close a currency.

When it hits the fan, we share a postmortem with the rest of the team, usually once the problem is fixed. In it, we briefly explain:

  • what happened;
  • what was the impact;
  • what caused it;
  • how we fixed it;
  • and what we will do to avoid such problems in the future.

The purpose of these postmortems is quite obvious: we want the whole organization to learn and improve, and not make the same mistakes again.

Most importantly, these are blameless postmortems. Screw-ups are expected to happen. In fact, if they don’t, it means we are not taking enough risks and moving too slowly. We can’t afford that. Nobody will blame you for screwing up. It’s actually almost the opposite: A well-written postmortem with good action points will often get you praise.

On the other hand, if the postmortem doesn’t get to the point, you will be challenged by others. For example, in the “how we will prevent it from happening again” section we sometimes see fixes that the team will clearly never implement. They would take too much time for too little benefit. We are a fast-moving startup and there are issues we will just not spend time trying to fix.

In addition to encouraging people to take risks, the blameless aspect is also very important for transparency. If people are afraid of being blamed, they will usually hide problems, at the risk of creating even bigger ones. Highly unhealthy.

Blameless postmortems are part of the culture we have at TransferWise. It’s with things like this that we build something with real impact.

You grow at the rate you solve problems. Growth is constrained by talent, intellect and culture, not marketing dollars.