There’s a lot of talk in the industry about microservices and monoliths. A few years ago microservices were still early in the hype cycle and being rapidly adopted. These days most companies use microservices in production, at least to some degree.
Many articles come out heavily favoring either microservices or monoliths, while in reality the choice is fraught with subtle tradeoffs and not at all easy.
This article explores those tradeoffs across a few different categories, and how we might make a more educated decision by carefully weighing the pros and cons for the particular circumstances at hand.
An important aspect of sustainable software engineering is the development of sensible interfaces between different parts of the system (“components”). This is not new wisdom, but with microservices you’re forced to create an API, which puts the question more top of mind than before.
But this is actually less of a tradeoff than it initially appears. If we take for granted that sensible interfaces between components are important, then we can just as easily create such interfaces between components within a single service as we can between services. In fact it’s even easier: we get help from the compiler to catch interface misuse, and we don’t need to version our interfaces within a service.
So sensible interfaces are just as important regardless of where you fall on the monolith – microservices spectrum, but they’re easier to change within a service than between services.
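To make that concrete, here is a minimal sketch of such an in-service boundary. Every name here (the billing package, UserStore, Service) is invented for illustration; treat it as one way the idea can look, not a prescription:

```go
// A minimal sketch of a component boundary inside a single service.
// All names (billing, UserStore, Service) are invented for illustration.
package billing

import (
	"context"
	"errors"
)

// ErrInactiveUser is returned when a charge is attempted for an inactive user.
var ErrInactiveUser = errors.New("user is not active")

// UserStore is the interface the billing component requires from the user
// component. The compiler checks every implementation and every call site,
// and there is no separate API version to maintain.
type UserStore interface {
	IsActive(ctx context.Context, userID string) (bool, error)
}

// Service depends only on the interface, not on the user component's
// internals, so the boundary stays explicit without any RPC machinery.
type Service struct {
	Users UserStore
}

// Charge refuses to bill users whose accounts aren't active.
func (s *Service) Charge(ctx context.Context, userID string, cents int) error {
	active, err := s.Users.IsActive(ctx, userID)
	if err != nil {
		return err
	}
	if !active {
		return ErrInactiveUser
	}
	// ... perform the charge ...
	return nil
}
```

If the user component later changes the IsActive signature, every caller fails to compile until it is updated, which is exactly the kind of change that would require a versioned rollout between separate services.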
Having well-defined areas of responsibility means that each component has a clear boundary and a well-defined owner. In my experience this is critical to sustainable software engineering, because with no owner or with “collective ownership” the quality of a component degrades over time (see Tragedy of the commons).
When it comes to the monolith – microservices spectrum, a distinct service generally lends itself to a clear area of responsibility and ownership. When you have several components with different owners within a larger service, it’s slightly easier to work on “both sides of the fence” at once, which has a tendency to muddle the ownership.
Having a quick feedback loop is critical to developer productivity. It’s why we value fast tests and build times and so on. When it comes to service oriented architectures, the speed of the feedback loop varies heavily depending on whether or not your change is local to one service.
When it is local to one service you have a great experience: type systems and compilers to catch errors, debuggers, refactoring support, and more. Testing within a service is generally reliable and provides high confidence during development. As soon as two or more services are involved, most of that goes out the window.
With two or more services, most developer tooling breaks down. RPCs (Remote Procedure Calls) lack type safety between client and server. No compiler catches issues like “you’re calling this endpoint with the wrong types or parameters”. Debugging (at least with a traditional debugger) ends at the service boundary. Refactoring RPC endpoints across client and server is a messy affair and usually requires several deployments to roll out. Testing across services is generally flaky, slow, and — especially if the services live in different repositories — poorly maintained.
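To make the type-safety point concrete, here is a hypothetical contrast; the endpoint, URL, and field names are all invented:

```go
// A hypothetical contrast between an in-process call and a cross-service
// call. The endpoint, URL, and field names are invented for illustration.
package example

import (
	"bytes"
	"encoding/json"
	"net/http"
)

type ChargeRequest struct {
	UserID string `json:"user_id"`
	Cents  int    `json:"cents"`
}

// chargeInProcess is the within-a-service version. Misuse is caught at
// compile time:
//
//	chargeInProcess(ChargeRequest{UserId: "u1"}) // compile error: unknown field UserId
func chargeInProcess(req ChargeRequest) error {
	// ... business logic ...
	_ = req
	return nil
}

// chargeOverRPC is the cross-service version. The same mistake is just a
// string in a JSON body: it compiles fine and only fails at runtime.
func chargeOverRPC() error {
	body, _ := json.Marshal(map[string]any{
		"userId": "u1", // typo: the server expects "user_id", and nothing here catches it
		"cents":  500,
	})
	_, err := http.Post("https://billing.internal/charge", "application/json", bytes.NewReader(body))
	return err
}
```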
These are real pain points when working with multiple services, and they mean that to optimize for developer productivity, you must be very careful about where you draw the line between services that will interact in nontrivial ways. To me this is one of the bigger dangers with the “micro” in microservices: it encourages a culture of not drawing the lines carefully by chanting “smaller is better.” Sometimes it is; sometimes it is not.
The larger your service, the larger the fraction of your entire system’s functionality that is running within it. While that may be an obvious statement, the implications are often overlooked.
It’s usually quite feasible to boot up multiple services and connect them to each other, so that you end up with the same amount of system functionality running locally regardless of where you fall on the microservices spectrum. The missing piece is the concept of friction.
It’s largely irrelevant whether something is feasible; what matters is whether it’s done. In my experience there is a large difference in practice in how much of the system you run locally depending on where you fall on the microservices spectrum. This means that with microservices, the majority of development tends to happen in an environment where many changes are never tested before merge against the components they interact with, because those components live in different services.
In an ideal world you would design your tests to validate the behaviors you want your application to perform — nothing more, nothing less. In practice many implementation details of the system need to be considered when designing tests. A particularly large factor that influences how you write tests is the size of your service.
While we have many options for how to write tests within a service (unit, integration, and so on), tests that span multiple services are generally reserved for system or end-to-end testing. Such tests tend to be slow and brittle. As a result, if there is a critical behavior of your application that needs to be thoroughly tested, it is desirable for that functionality to live within a single service.
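As a sketch of what that can look like in practice, here is an in-process test that exercises the hypothetical billing component from the earlier example against a fake user component, with no network involved:

```go
// An in-process test across two components, building on the hypothetical
// billing.Service sketched earlier. The fake user component is a plain
// in-memory value, so the test is fast and deterministic.
package billing

import (
	"context"
	"testing"
)

// fakeUsers is an in-memory stand-in for the user component.
type fakeUsers struct{ active map[string]bool }

func (f fakeUsers) IsActive(_ context.Context, id string) (bool, error) {
	return f.active[id], nil
}

func TestChargeRejectsInactiveUser(t *testing.T) {
	svc := &Service{Users: fakeUsers{active: map[string]bool{"u1": false}}}

	if err := svc.Charge(context.Background(), "u1", 500); err == nil {
		t.Fatal("expected an error when charging an inactive user")
	}
}
```

Testing the same behavior across two real services would require both services to be running (or mocked at the HTTP level), and any drift between their request schemas would only surface at runtime.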
My general philosophy about API versioning is that it’s the least bad approach to gradually migrating clients to a new release of the service. It’s particularly desirable when we don’t have full control over the clients, or there are so many of them that anything else would be infeasible.
I called it “least bad” because even though it’s the best way we know, it’s still expensive to introduce multiple versions, support them, work with clients to migrate, go back and clean up the old version once everyone has finally migrated, and so on. So if we can do without it, we should.
The easiest way to avoid API versioning and migrations is simply to avoid having APIs, or rather, to have fewer of them. We can accomplish this by moving toward the larger-services side of the spectrum, where what was previously an API between services is now often a function call between components. That way we sidestep the problem of API versioning altogether.
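As a rough illustration of the difference, with every name, path, and type invented for the example:

```go
// A hypothetical before/after of the same breaking change. All names,
// paths, and types are invented for illustration.
package payments

import "context"

// Across services, changing the request shape typically means running two
// versions side by side while clients migrate:
//
//	mux.HandleFunc("/v1/charge", chargeV1) // old clients
//	mux.HandleFunc("/v2/charge", chargeV2) // migrated clients
//
// Within a single service, the "API" is just a function signature. The old
// signature was:
//
//	func Charge(ctx context.Context, userID string, cents int) error
//
// Changing it to take a Money value is an ordinary refactor: update the
// function and its callers in one commit, and the compiler confirms nothing
// was missed. The two "versions" never need to coexist.

// Money is the hypothetical type introduced by the breaking change.
type Money struct {
	Cents    int
	Currency string
}

func Charge(ctx context.Context, userID string, amount Money) error {
	// ... perform the charge ...
	return nil
}
```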
As an industry we have largely discovered the key ingredients of horizontal scalability: distributed (meaning several replicas), stateless backend services, each handling many concurrent requests, backed by a data store suitable for the scale at hand.
Many people automatically substitute “distributed, eventually consistent data store” for “suitable data store”, though for many workloads that only buys you something you don’t need (additional scale) at the cost of more complexity in the application.
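A minimal sketch of that shape might look like the following; the names are invented, and the point is simply that the process holds no state of its own:

```go
// A minimal sketch of the "stateless replicas + suitable data store" shape,
// with invented names. The handler keeps no state in the process, so any
// number of identical replicas can serve any request behind a load balancer.
package visits

import (
	"context"
	"fmt"
	"net/http"
)

// Store is whatever data store fits the scale. For many workloads a plain
// relational database is enough; nothing here requires a distributed,
// eventually consistent store.
type Store interface {
	RecordVisit(ctx context.Context, userID string) (total int, err error)
}

// Handler returns an http.Handler with no in-memory state of its own:
// every request reads and writes through the store, so it doesn't matter
// which replica the load balancer picks.
func Handler(store Store) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		total, err := store.RecordVisit(r.Context(), r.URL.Query().Get("user"))
		if err != nil {
			http.Error(w, "internal error", http.StatusInternalServerError)
			return
		}
		fmt.Fprintf(w, "total visits: %d\n", total)
	})
}
```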
The main difference people focus on when it comes to microservices and scale is that when you break things apart into smaller services, they can be scaled independently. That’s true. What is often overlooked is how often the load on two would-be services is heavily correlated, either because they both scale with usage of the application as a whole, or more simply because one calls the other. As a result, the “independent scaling” is often less important than it’s made out to be, simply because both services would end up running at roughly the same scale anyway.
There are cases where different services genuinely require very different scales. These tend to be cases where the same service is used by many different parts of the system. A common example is a service providing information about the user, which many other services need (for example to check whether certain operations are allowed). As a result, a single logical request from the user that touches many services may generate many requests to the user service, as each of those services calls it in turn. Having such a service scale independently (and usually lean heavily on caching) is critical.
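As a hedged sketch of that pattern, here is roughly what such a cached user lookup can look like; every name here is invented for illustration:

```go
// A sketch of a read-through cache in front of user lookups, so the many
// per-request permission checks don't all turn into downstream calls.
// All names are invented for illustration.
package userinfo

import (
	"context"
	"sync"
	"time"
)

type User struct {
	ID    string
	Roles []string
}

// Fetcher is whatever actually talks to the user service or its database.
type Fetcher interface {
	FetchUser(ctx context.Context, id string) (User, error)
}

type cacheEntry struct {
	user    User
	expires time.Time
}

// CachedLookup answers repeated lookups for the same user from memory,
// trading a little staleness (the TTL) for far fewer downstream calls.
type CachedLookup struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cacheEntry
	fetcher Fetcher
}

func NewCachedLookup(f Fetcher, ttl time.Duration) *CachedLookup {
	return &CachedLookup{ttl: ttl, entries: make(map[string]cacheEntry), fetcher: f}
}

func (c *CachedLookup) Get(ctx context.Context, id string) (User, error) {
	c.mu.Lock()
	e, ok := c.entries[id]
	c.mu.Unlock()
	if ok && time.Now().Before(e.expires) {
		return e.user, nil // cache hit: no call to the user service
	}
	u, err := c.fetcher.FetchUser(ctx, id)
	if err != nil {
		return User{}, err
	}
	c.mu.Lock()
	c.entries[id] = cacheEntry{user: u, expires: time.Now().Add(c.ttl)}
	c.mu.Unlock()
	return u, nil
}
```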
Beyond scaling to handle a large amount of load, the performance and latency of any given request also matter. Since there is a relatively large overhead to making cross-service requests over the network, going far toward the “micro” end of the spectrum is generally associated with greater latency from the client’s perspective.
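As a rough, hedged back-of-the-envelope (exact numbers vary a lot by environment): an in-process function call costs on the order of nanoseconds, while a request to another service in the same datacenter typically lands somewhere between a few hundred microseconds and a few milliseconds once serialization, connection handling, and the network round trip are included. If a single user-facing request fans out through, say, five sequential service hops, those hops alone can add several milliseconds of latency that the in-process equivalent simply does not pay.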
Being able to isolate failure is a very valuable property, and it is an area where microservices shine. When something goes wrong it tends to take the whole service down with it, and the smaller that service is, the smaller the collateral damage. Failures of one service tend to cascade to the services that interact with it, so even if the failing service is small, it helps from a failure-isolation point of view that the other services are also small: even a cascading failure may be contained to a subset of the functionality rather than taking down the whole application.
There are a few aspects of microservices that make debugging harder. The primary one is that with microservices it’s less likely that a bug will be isolated to one tiny microservice, and bugs that span services are much more cumbersome to track down. We touched on this in the Feedback Loops section. Attaching a debugger to step through the code becomes harder. Your data is spread across multiple databases (since you don’t share a database across services, right?), so exploring the data is tedious. And you’re more exposed to distributed systems problems: cascading failures, flaky networks, race conditions, eventual consistency.
The tradeoffs along the monolith – microservices spectrum are many and complex. Simplifying slightly, the general conclusion is that development and debugging are easier with fewer services, as our development tooling is better suited for function calls and single processes.
Microservices have benefits at runtime, but these come at the cost of significant operational complexity. Scaling your application is simpler with a service oriented architecture. Splitting into services also introduces clearer boundaries of responsibility and ownership, while at the same time making those boundaries harder to change.
Ideally we could get the best of both worlds -- great development tools and great runtime properties. That's what we're trying to build here at Encore.
To conclude, as with all tradeoffs the extremes are rarely optimal. This is why, in my opinion, both the Microservices School and the Monoliths Camp are inadequate and harmful to our industry. They encourage adopting an extreme viewpoint ("super-tiny services are the only true way" or "a single service to rule them all") rather than a more nuanced approach.
I encourage everybody to carefully consider the particular challenges their application faces and place themselves somewhere on the spectrum with intentionality (or, even better, to consider different approaches for different parts of the application).
Thanks to Peter Seebach for reviewing a draft of this.