The Impending Problem - A Tangled Mess
The goal of microservices is to produce independent, loosely coupled systems, so that each one can be managed and upgraded without impacting other applications. However, many first implementations of microservices lead developers to believe that the only way to communicate with another API is through a client. Assuming this is simply part of the process, they design APIs that rely on other APIs, which rely on still other APIs, and in some cases the chain loops back to the original API for more data. As you can imagine, this becomes a tangled mess. If one API in that chain is slow, goes down, or sees a spike in traffic, the whole system could come to a halt. If a single request has to make two or more downstream calls, how long will it take to respond? Or, even worse, imagine deadlocking the system because of a circular dependency.
Beyond the tangled mess, client management becomes a nightmare: changes to one API can affect the APIs that depend on it, or we end up maintaining tons of versions to support old APIs. So in the end…why did we ever break up our monolithic application? Why even bother?
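The cost of chaining can be put in rough numbers. The sketch below assumes the calls are made one after another and that failures are independent: latency then adds up hop by hop, while availability multiplies, so a chain of five APIs that are each up 99% of the time is up only about 95% of the time. It also includes a simple depth-first check for the circular dependency case. The API names and latency figures are hypothetical.

```python
from math import prod
from typing import Dict, List, Sequence


def chain_latency_ms(hop_latencies_ms: Sequence[float]) -> float:
    """Sequential calls: the caller waits for every hop in turn."""
    return sum(hop_latencies_ms)


def chain_availability(hop_availabilities: Sequence[float]) -> float:
    """The request succeeds only if every API in the chain is up
    (assumes failures are independent)."""
    return prod(hop_availabilities)


def has_cycle(calls: Dict[str, List[str]]) -> bool:
    """calls maps an API name to the APIs it calls synchronously.
    Returns True if some API can end up (indirectly) calling itself."""
    visiting, done = set(), set()

    def visit(api: str) -> bool:
        if api in visiting:  # already on the current call path: a cycle
            return True
        if api in done:
            return False
        visiting.add(api)
        if any(visit(dep) for dep in calls.get(api, [])):
            return True
        visiting.remove(api)
        done.add(api)
        return False

    return any(visit(api) for api in calls)


if __name__ == "__main__":
    # Hypothetical per-hop latencies in milliseconds.
    print(chain_latency_ms([40, 120, 35, 80]))       # 275
    print(round(chain_availability([0.99] * 5), 3))  # 0.951
    print(has_cycle({"Notifications": ["Users"], "Users": ["Companies"],
                     "Companies": ["Notifications"]}))  # True
```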
Oh yeah…that loose coupling thing that will give us independence…but how?
Possible Solution - Event Driven Design
So the goal is to limit calls made to other APIs while still being able to use the data we need from them. The most obvious answer is to keep a copy of other APIs' relevant data. This can be done through events or messages: an event occurs, and a subscriber to that event takes the data it needs and stores it in whatever format suits it. There are many ways to implement this, from pushing messages onto a queue, to pub/sub through Redis, Kafka, or other technologies, to message passing within an actor model such as Erlang or Akka. All of these technologies provide a model where multiple APIs, workers, or methods listen for a new event, and when one occurs they take what they need so they can use it later or act on it now.
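To make the publish/subscribe model concrete, here is a minimal in-memory sketch. `EventBus` is a hypothetical stand-in for a real broker (a queue, Redis pub/sub, Kafka, or an actor mailbox) and only shows the shape of the interaction: the publisher does not know who is listening, and every subscriber to a topic receives each event. The topic names are assumptions as well.

```python
from typing import Callable, Dict, List

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """Hypothetical in-memory broker: delivers each published event to every
    handler subscribed to that topic, synchronously and in subscription order.
    A real broker would add persistence, retries, and network transport."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # The publisher does not know or care who is listening.
        for handler in self._subscribers.get(topic, []):
            handler(event)


if __name__ == "__main__":
    bus = EventBus()
    notification_copy: List[Event] = []
    training_copy: List[Event] = []
    bus.subscribe("user.created", notification_copy.append)
    bus.subscribe("user.created", training_copy.append)
    bus.publish("user.created", {"id": 1, "firstName": "Ada"})
    print(len(notification_copy), len(training_copy))  # 1 1
```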
This gives our systems the independence we want. An API could go down, be upgraded, or see a spike in traffic without having a major effect on other APIs. It also prevents chained API calls and reduces latency, because retrieving the data takes only as long as a query against the service's own database. The stored data also becomes more relevant to the service, since it can choose which fields to keep. For example, a Notification API may only care about a User’s id, emailAddress, company, firstName, and lastName instead of the 19+ fields we currently use to describe a User. Its only concern is to watch for create, update, and delete events on Users. Other than that, there is no need for it to call another API to retrieve user information.
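A Notification API's local copy of Users might look roughly like the sketch below. The five fields come from the example above; the class name, the event envelope (a `type` of `user.created`, `user.updated`, or `user.deleted` plus a `user` payload), and the assumption that each create/update event carries the full User are all hypothetical. The dictionary stands in for the service's own table.

```python
from typing import Dict, Optional

# The only User fields the Notification API keeps (from the example above).
NOTIFICATION_FIELDS = ("id", "emailAddress", "company", "firstName", "lastName")


class NotificationUserStore:
    """Hypothetical local projection of Users, fed by User events."""

    def __init__(self) -> None:
        self._users: Dict[int, dict] = {}  # stands in for the service's own table

    def handle(self, event: dict) -> None:
        kind, user = event["type"], event["user"]
        if kind in ("user.created", "user.updated"):
            # Keep only the fields this service cares about and drop the rest.
            self._users[user["id"]] = {f: user.get(f) for f in NOTIFICATION_FIELDS}
        elif kind == "user.deleted":
            self._users.pop(user["id"], None)
        # Any other event type is not this service's concern and is ignored.

    def get(self, user_id: int) -> Optional[dict]:
        # Served from the local copy: no call to the User API.
        return self._users.get(user_id)


if __name__ == "__main__":
    store = NotificationUserStore()
    store.handle({"type": "user.created",
                  "user": {"id": 7, "emailAddress": "ada@example.com",
                           "company": "Acme", "firstName": "Ada",
                           "lastName": "Lovelace", "timezone": "UTC"}})
    print(store.get(7))  # the timezone field was not stored
```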
With this level of loose coupling, impacts on other systems can be minimized, the system as a whole becomes more robust, and in the end we avoid all of the tangled wires that would otherwise be created.
Designs are not without trade-offs.
Adding a new database or tables per API adds complexity to both the API and the underlying system, which can make testing and deployments a bit more complicated. When we see complication like this arise, we generally worry that things might break. However, one could argue that microservices themselves add this complexity, and that adding a database or table is nothing compared to writing and maintaining client libraries.
If we look at what makes a microservice architecture successful, it comes down to how well each service is tested in isolation and how well it then works as part of the overall system. The overall system should then be tested too, and a structure should be put in place to allow for quick deployments and rollbacks.
So in the end, yes things are going to be more complex, but we knew that was going to happen when we switched to microservices. Didn’t we?
An obvious red flag is that we will now be replicating data across multiple systems, and those copies could get out of sync with one another. The solution to this is largely a matter of policy. One API should be the “system of record” for its domain: each domain, such as Company, User, or Licenses, needs an API that houses and maintains the data relating to that specific area. On a create, update, or delete, it is that API’s responsibility to push an event notifying its subscribers that something has changed. It is then essential that the underlying messaging system provide some form of delivery guarantee, whether a durable queue or a retained event history, so that a consumer that goes down can receive the data it missed when it comes back online.
When an API receives an event, it has the right to reshape that data in any way it wishes, as long as it stays consistent with the system of record. In exchange, you gain speed, control, and resilience, because your process no longer depends on a single API for the data it needs.
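One way to get that guarantee is a retained, append-only event history in which each consumer tracks its own position, roughly how a Kafka topic with consumer offsets behaves. The in-memory `EventLog` below is a hypothetical sketch of the idea, not any broker's API. Because a consumer acknowledges events only after processing them, one that crashes or goes offline simply re-reads from its last acknowledged position (at-least-once delivery), so its handlers should tolerate seeing an event twice. The consumer names are assumptions.

```python
from typing import Dict, List


class EventLog:
    """Hypothetical append-only history written by the system of record.
    Each consumer has its own offset, so one that goes offline can catch up."""

    def __init__(self) -> None:
        self._events: List[dict] = []
        self._offsets: Dict[str, int] = {}

    def append(self, event: dict) -> None:
        # Called by the owning API on every create, update, or delete.
        self._events.append(event)

    def read(self, consumer: str) -> List[dict]:
        """Events this consumer has not acknowledged yet, oldest first."""
        return self._events[self._offsets.get(consumer, 0):]

    def ack(self, consumer: str, count: int) -> None:
        """Advance the consumer's offset once `count` events are processed."""
        new_offset = self._offsets.get(consumer, 0) + count
        self._offsets[consumer] = min(new_offset, len(self._events))


if __name__ == "__main__":
    log = EventLog()
    log.append({"type": "user.created", "id": 1})
    batch = log.read("notifications")
    log.ack("notifications", len(batch))

    # The Notification API is down while two more changes happen...
    log.append({"type": "user.updated", "id": 1})
    log.append({"type": "user.deleted", "id": 1})

    # ...and it picks up exactly what it missed when it comes back.
    print(log.read("notifications"))
```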
But What About Circuit Breakers?
Microservices often implement circuit breakers to provide fail-over for clients, databases, and so on. So why isn't that a solution to the problem described at the beginning?
Let’s continue with the circuit breaker analogy:
The circuit breaker in your house flips if a set of outlets gets overloaded, to prevent your house from catching fire. It will also flip if there is a surge of power coming into your house. In either case, it protects against trouble flowing in either direction. The problem we face, however, is like plugging a surge protector into one outlet, then another surge protector into the previous surge protector, and so on and so on…eventually you have tons of wires and interconnected systems. The problem is not the circuit breaker on the outlet but everything that’s connected to the outlet.
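To make the distinction concrete, here is a minimal circuit breaker sketch (a generic version of the pattern, not any particular library). It opens after a number of consecutive failures, rejects calls immediately while open, and lets calls through again after a cool-down. Note what it does not do: when the License API behind it is down, the caller fails faster, but it still does not have the license data it needed. `max_failures`, `reset_after`, the injectable `clock`, and `fetch_license` are illustrative assumptions.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without trying."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open: failing fast")
            # Cool-down elapsed (half-open): allow a trial call. The failure
            # count is kept, so one more failure re-opens the circuit.
            self.opened_at = None
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0  # a success closes the circuit fully
        return result


if __name__ == "__main__":
    def fetch_license(company_id: int) -> dict:
        # Hypothetical remote call; simulates the License API being down.
        raise ConnectionError("License API unreachable")

    breaker = CircuitBreaker(max_failures=2, reset_after=30.0)
    for attempt in range(3):
        try:
            breaker.call(fetch_license, 42)
        except CircuitOpenError as exc:
            print(attempt, "rejected:", exc)
        except ConnectionError as exc:
            print(attempt, "failed:", exc)
```

Either way, the caller ends up without a license answer; the breaker only controls how quickly it finds out.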
Use Case - Licensing
All of our products use licenses, which are managed through our Admin App. Currently the Admin App calls a single API to add licenses for all of our products, and those licenses are stored in a single database. Other APIs either call that API to fetch a license or access its table directly. But shouldn’t each product know about its own license and make decisions based on it? If an application uses an API to check whether a license is valid and that API is down, should that part of the application grind to a halt?
Let’s refactor the flow:
First, the Admin App makes a POST to a License API with licenses for Training, Cyberstrength, ThreatSim Phishing, and Phish Alarm. After the license information is validated and persisted, the License API publishes an event saying that a company has new licenses. The Training API grabs the Training and Cyberstrength license information, because it needs to know whether it is allowed to send out Training and Cyberstrength assignments. ThreatSim and Phish Alarm grab their licenses in the same way. Each can now check whether actions from a company within its API are valid based on whether that company is licensed.
Now suppose the License API goes down while a user is in the system creating an assignment, and the user then navigates to ThreatSim to create a campaign. The user experiences no outage, because the ThreatSim App can see that the user is licensed based on the company in the user’s token, so everything continues as normal.
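The refactored flow can be sketched in a few lines. Everything here is hypothetical: `LicenseAPI` stands in for the real License API with validation and persistence elided, the `licenses.added` event shape is an assumption, and a plain list of handlers stands in for whatever messaging system actually carries the event. Each product's store keeps only the licenses for the products it owns.

```python
from typing import Callable, Dict, Iterable, List, Set

Handler = Callable[[dict], None]


class ProductLicenseStore:
    """A product API's local copy of the licenses it cares about."""

    def __init__(self, products: Iterable[str]) -> None:
        self.products = set(products)
        self.licensed: Dict[int, Set[str]] = {}  # company id -> licensed products

    def on_licenses_added(self, event: dict) -> None:
        # Ignore licenses for other products; they belong to other APIs.
        mine = self.products.intersection(event["products"])
        if mine:
            self.licensed.setdefault(event["companyId"], set()).update(mine)

    def is_licensed(self, company_id: int, product: str) -> bool:
        return product in self.licensed.get(company_id, set())


class LicenseAPI:
    """System of record for licenses (validation and persistence elided)."""

    def __init__(self) -> None:
        self.subscribers: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        self.subscribers.append(handler)

    def add_licenses(self, company_id: int, products: List[str]) -> None:
        # ... validate and persist the licenses here ...
        event = {"type": "licenses.added", "companyId": company_id,
                 "products": list(products)}
        for handler in self.subscribers:  # publish to every subscriber
            handler(event)


if __name__ == "__main__":
    license_api = LicenseAPI()
    training = ProductLicenseStore(["Training", "Cyberstrength"])
    threatsim = ProductLicenseStore(["ThreatSim Phishing"])
    phish_alarm = ProductLicenseStore(["Phish Alarm"])
    for store in (training, threatsim, phish_alarm):
        license_api.subscribe(store.on_licenses_added)

    # The Admin App's POST ends up here.
    license_api.add_licenses(42, ["Training", "Cyberstrength",
                                  "ThreatSim Phishing", "Phish Alarm"])
    print(training.is_licensed(42, "Cyberstrength"))        # True
    print(threatsim.is_licensed(42, "ThreatSim Phishing"))  # True
```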
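In code, the difference comes down to where the license check is answered. In the hypothetical sketch below, `check_license_remotely` stands in for the old flow and simply fails because the License API is down, while `ThreatSimService` answers from the licenses it already stored from earlier events. It assumes the user's token has already been verified and that its claims carry a `companyId`; that claim name is an assumption.

```python
from typing import Dict, Set


def check_license_remotely(company_id: int, product: str) -> bool:
    """Old flow: ask the License API on every request (simulated as down)."""
    raise ConnectionError("License API unreachable")


class ThreatSimService:
    """New flow: answer from the local copy built from License API events."""

    def __init__(self, local_licenses: Dict[int, Set[str]]) -> None:
        self.local_licenses = local_licenses  # company id -> licensed products

    def create_campaign(self, token_claims: dict, name: str) -> dict:
        company_id = token_claims["companyId"]  # from an already-verified token
        if "ThreatSim Phishing" not in self.local_licenses.get(company_id, set()):
            raise PermissionError(f"company {company_id} is not licensed")
        # No call to the License API, so its outage does not block this request.
        return {"companyId": company_id, "name": name, "status": "created"}


if __name__ == "__main__":
    claims = {"companyId": 42}
    try:
        check_license_remotely(claims["companyId"], "ThreatSim Phishing")
    except ConnectionError as exc:
        print("old flow:", exc)

    threatsim = ThreatSimService({42: {"ThreatSim Phishing"}})
    print("new flow:", threatsim.create_campaign(claims, "Quarterly phishing test"))
```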
Is this a silver bullet to solving all microservice problems? No. But if we start untangling the wires and start thinking in a new way we may be able to create a more stable and robust architecture for our products.