Product Engineering Principles @ TransferWise
At TransferWise, we have seen phenomenal growth in the last few years - growth in users, transaction volumes, and team. Our engineering team has grown from 50 to 120 in the last 18 months. With this kind of growth, it’s easy for the company culture to evolve and degrade very quickly unless we reiterate and be mindful of our key principles and values we operate on.
I am often asked by engineers at conferences, potential hires, startup founders and others in the industry what are the key principles we organize around while building the TransferWise product. I thought I'd pen down some thoughts on this.
Before we hit the main principles we follow, here’s a quick primer on how our teams are organized. As of today, TransferWise has about 25 teams - each of which is autonomous and independent, that focus on customer-centric KPIs which eventually drive our growth. A mouthful but let’s break this down.
Teams are autonomous: Teams own their decisions. They decide how to evolve the product by talking to customers and by looking at data. The teams seek input and are challenged by others in the company on their decisions but eventually they are the final decision makers.
Teams are independent: Teams own their destiny. We try to keep cross team dependencies to a minimum. While there are some cases where a team may need help from another team to get something done, we try and avoid this as much as possible.
Teams focus on customer-centric KPIs: Teams solve customer problems. They are organized around a specific customer (team organized around a specific region, say the US) or a specific problem all customers face (making payments instant across the world). Given teams are autonomous and independent, they can pick what they want to work on but everything they work on has to drive a customer metric. Our product moves money around the world. Our customers tell us constantly that they care about their money moving super fast, conveniently, for cheap. So everything a team does looks to optimize these factors. The team should be able to stand up in front of the entire company and explain how their work impacts the customer and which metric it moves.
Now that we’ve got the team setup out of the way, let’s talk about how Product Engineering works at TransferWise. Here are the key principles we follow.
Hire smart people and empower them
Our product is a function of the people who build it. That means how we hire and who we hire has a massive impact on what ends up becoming our live product. For product engineering, our hiring process includes the following steps:
- Take home test
- Technical interview with 2 engineers
- Optional follow-up technical interview
- Product interview with a product manager and an engineer
- Final interview with our VP of engineering or co-founder
While this interview loop may seem familiar, most candidates comment about the product interview being a unique experience. In the product interview, we focus on your ability to put yourself in the shoes of a customer, understand what customer problem you are solving, how you would build something to get validation on an idea and then iterate on it to deliver a stellar customer experience. Read Mihkel’s post on demystifying product interviews to get more details on what we cover and why.
Once hired, product engineers are empowered to move mountains. Engineers chose which problem to solve, why, what the customer impact will be and the prioritization of their tasks. Of course, this should be in line with team goals and not solely based on individual goals.
Weak code ownership
As mentioned above, we believe in teams being independent. A big part of this is that teams don’t have dependencies on other teams. But how does this work in a large organization with an ever evolving and growing product?
Let’s take an example. As our product expands across the world, every country has different rules on what data we are required to verify on our customers. Let’s say as we launch in Australia. There is a new regulatory requirement to check some additional data on Australian customers. This requires Team Australia - an autonomous and independent team focused on the Australian customers - to make a change to our verification engine. But the verification engine is owned by the Verification team. In a lot of organizations, Team Australia would request the Verification team to pick up this change on their roadmap. But the Verification team also has a lot of such requests from other teams. They also have their own projects to improve the core verification engine to support all our different regions. So what usually ends up happening in other organizations is Team Australia can’t move as fast as they desire as they are dependent on the Verification team and their priorities.
This is why we follow the weak code ownership principle. In this setup, every part of the code is owned by a team but a team is allowed to change any part of other team's code. Sounds like chaos but there are some basic enforcement rules around this. The owning team sets the rules that other teams have to follow to play in their codebase.
In the above example, instead of the Verification team making the requested change, Team Australia is empowered to make the change in the verification codebase. But they have to follow the rules set by the Verification team to commit to their code base. These rules are up to the owning team to decide on. They could be something like below:
- Before taking on any major change, the team making the change must go through a design discussion on said changes with the owning team.
- The team making the change has to follow certain design patterns
- No code will be accepted without adequate test coverage
- All changes have to go through peer-reviewed pull requests
This setup allows product engineering teams to be independent and helps teams remove dependencies on other teams and allows teams to iterate at the pace they desire.
We compare this setup to an internal open source project. Owners define the rules to play with and own the codebase and others can commit and make changes as long as they follow the rules. As an additional benefit of this setup, owning teams are incentivized to make their code more modular and add relevant tests so that another team cannot easily break things. This leads to code readability and higher quality.
Product engineers focus on customer impact
In a lot of companies engineers never talk to real customers. Some engineers we talk to during our interview process don’t really know who they are building the product for.
Information flow in a lot of companies from customer to engineer:
Information is lost with every person introduced along the way.
At TransferWise, being a product engineer means you get close to customers. You talk directly to customers, look at data to make decisions and understand how the features you build are evolving and being used by different customers in production. We use Looker and Mixpanel as our analytics engine and this is available to everyone in the company. Anyone can run queries and slice and dice the data the way they desire.
Product engineers also take customer calls, chats and respond directly to customer emails. Here’s an example of a challenge our co-founder Kristo set out to inspire engineers to take more calls and get closer to our customers.
The resulting picture speaks for itself. :-)
No one else can build your product but you
Given how involved engineers are in analyzing the data, talking to customers, understanding the reason to make a change, and how fast our iteration cycles are, we believe that we cannot just write down our specifications and have someone outside our company build the product. We don’t do offshoring, outsourcing or use consultants to build our product. This doesn’t mean we don’t have independent workers (i.e. non-salaried employees who work at TransferWise engineering). We do. Some of them have been with us for a long time and are critical contributors. But they are embedded within our teams and operate the same way any other employee does. They get close to our customers, take part in all decisions.
Some rules, more common-sense
We have a few rules that are standard across the entire product engineering organization. We believe teams should be able to pick the tools to get the job done within certain limits (more below on limits). All our teams run sprints but it’s up to them to define their sprint cadence. It has just happened that most teams run a one week sprint but now we are seeing some teams looking to move to a two-week sprint as their projects get more complex. Similarly, some teams follow scrum to the book, while some do kanban and others run their own variation on scrum.
That said, we have a few common sense rules:
- Product engineers to own their code in production. This means managing your own releases, monitoring your code in production, getting alerts when something goes wrong and being present if your system is having an issue. We believe this incentivizes the right behavior. When you know you will be alerted at 2AM when something goes wrong in production, the quality of code that gets shipped tends to be better.
- We have weekly technical exchange sessions called “TeX”. It’s a forum where product engineers share knowledge on various technical topics. These can range from design discussions, changes made to a specific part of our system, new technologies we should be investigating.
- We are a JVM shop. We are open to other languages. We have some PHP, Node running around but our main stack has always been a JVM with our monolith application written in Groovy on Grails and our microservices written in Java 8 on Spring Boot. We believe language wars are good conversations over beers but try to avoid them at work and get on with building product.
- If you want to introduce a new language or that shiny new technology to our system, it’s simple! Do a TeX and invite your fellow engineers. Explain to them the specific benefits of introducing this technology. Do an honest pro and con analysis and explain why it’s worth the rest of the engineers to go through the learning curve to pick this technology up. This is crucial! As we have weak code ownership people need to be able to make changes to parts of the system they don’t own. So new technologies introduced not only impact the specific team owning the service but also impact other engineering teams.
Honest blameless postmortems
This one is probably our favorite principle. Everyone makes mistakes and when you move fast, things break. The key is how we recover from these mistakes, what we learn and how we prevent them in the future.
In most companies, an individual isn’t really incentivized to ship fast and learn with the fear of breaking things. One is rarely rewarded for taking massive risks to improve something tenfold as the risk of breaking something is much higher. People tend to get dinged on their annual reviews when they break something leading to a production outage. So what ends up happening is people triple check their work and put in more and more safeguards for that one possible failure that can happen.
We want to build a culture where we aren’t afraid to fail, but are accountable for our failures and make the cost of a failure smaller and smaller.
One major tool we use to reflect, learn and be accountable for our failures is public honest blameless postmortems.
Vincent wrote a great post on what makes a good postmortem. The intent of the post mortem is to go deep into why something happened, how many customers were impacted, what were our learnings and what measures did we put in place to make sure this doesn’t happen again. People challenge postmortems publicly if the postmortem isn’t deep enough or doesn’t have real learnings.
Culturally this is one of the most empowering and powerful tools we have in the company. We started this in product engineering but this has evolved where we do public postmortems across the company on most teams.
Like any model of organizing, this model has challenges too. Below are a few challenges we have learned along the way:
- Duplication of effort: With autonomous independent teams, we can have some overlap in work done by different teams. We try and counter this by having a few people in the organization who spend time across teams and have a view on what different teams are building. This would include engineering leads who spend time with different teams and get an understanding of successes and challenges each team has. So when a team starts building a new service with similarities to another service being worked on by another team, we try to consolidate effort and get both teams on the same page to hopefully not duplicate effort.
- Collective decision making: Sometimes it’s just hard to get the whole team to align on a decision taking varied opinions into consideration. We counter this some of this by running small teams so there are fewer people who need to get on the same page. Also when teams get stuck they seek out help from others in the organization who have been in a similar situation before or could help them break a gridlock.
- Singularity in vision: Given we have devolved decision making to teams, there’s no one person who calls all the shots. We have a company mission but teams can decide their own way to achieve the mission. This can be especially unnerving to some folks given they can't just go over to one person and ask for direction or say "I am doing this as the CEO wants it."
- Communication: With teams being independent and working on their specific problems, we tend to run the spectrum of teams that over communicate to make sure others know what they are working on and to those who under communicate. TransferWise runs primarily on Slack. We have specific channels for sharing things cross team. We also have horizontal virtual teams called guilds where engineers get together to work on a problem that cuts across the organization. For example, we have a performance guild which has representatives from different teams. This is a group of individuals who are interested in building tooling, standards, and practices to help all our teams improve the performance of our systems. They focus on building the required monitoring, alerting for everyone to use. That said, we are still learning how to improve communication across teams as our organization grows.
Why do we operate this way?
As a start up we have a major weapon - speed! When people closest to the customers are making the decisions, we can run more tests and iterate quicker as compared to a setup where teams rely on managers to make decisions. In the latter, managers become bottlenecks slowing down decision making. Additionally, they usually aren’t as close to the day to day operations and the customer feedback loop to make an informed decision. In our setup, we believe we get more fail and pass signals faster.
We fundamentally believe companies that iterate faster, fail faster and learn faster will succeed in the long run. That means to learn faster than others we need to optimize for speed with checks for building a high-quality customer experience that our customers love. This is the main reason for our setup.
We realize that running this way has its drawbacks as listed above but we believe we can take these drawbacks and solve for them while we optimize for speed.
This is, of course, something that has worked for us so far and we will have more learnings as our product and company evolves. We will share those learnings as we go along. We would love to hear your thoughts on what you optimize for, how you do it, and scenarios where you think the above setup doesn’t work.
Thanks to Harald, Jordan, Martin Magar, Taras, Vincent for their input.