This isn’t a microservices diss. I think going the multiple services route is inevitable as organisations scale up and teams start splitting up into more focused roles and responsibilities.
The point is that the services should not be “micro”, and that startups shouldn’t invest in a multi-service architecture too soon. The focus should be on:
- shipping features in the shortest time possible, with
- a small team (at most 10 engineers per manager) while
- servicing as many users as possible
To be able to do so, the solution is to have “scalable” “monoliths”.
Capabilities of a “scalable monolith” architecture:
1. Application’s state should be in a database only
This is the bare minimum requirement of any scalable application. An application that holds state in memory is hard to scale, because its replicas won’t be aware of each other’s internal state.
For example, suppose the application is an API rate limiter with a limit of 100 requests per IP per minute. If the request count per IP is stored in a HashMap inside the application, the replicas cannot count holistically: one replica may see 80 requests for an IP while another sees 50, thereby defeating the purpose of the application.
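As a minimal sketch (names and numbers are illustrative), the fix is to keep the per-IP counter in a shared store such as Redis, so every replica increments and reads the same count:

```python
import time


def allow_request(store, ip, limit=100, window=60):
    """Return True if this IP is still under the limit for the current window.

    `store` is any client exposing atomic `incr` and `expire` operations
    (e.g. a redis-py client). Because the counter lives in the shared store,
    every replica sees the same per-IP count.
    """
    key = f"rate:{ip}:{int(time.time()) // window}"  # one key per time window
    count = store.incr(key)      # atomic increment, shared across replicas
    store.expire(key, window)    # old windows expire on their own
    return count <= limit
```

With a real Redis client this would be called as `allow_request(redis.Redis(), ip)`; the code behaves identically on every replica because the state lives in Redis, not in the process.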
2. The database should itself be scalable
Scaling should be built into any other technology leveraged by the application, not just the database. A non-scalable piece of technology added to a scalable application will become its bottleneck, most commonly through reduced performance under load.
For example, if 1000 instances of the API rate limiter are writing to just a single instance of Redis, it may not be able to handle the load - connections and reads/writes combined. Slow reads/writes will increase the latency of web requests reaching your customer’s application, thereby making your product worse than any competitor’s.
3. Any other services required (like a queueing system) should be scalable and versatile (doing more is better)
The “scalable” aspect of this heading is covered by the previous point. I want to focus on why I think versatility is important here -
Reduced DevOps overhead. Developers in a startup should focus on shipping features, rather than spending time maintaining infrastructure. The more a technology can do, the stronger the case for adopting it at this stage. As engineers, we generally believe that “doing one thing best” is the path to success. In this case, it’s not!
Nowadays, databases like Postgres have become versatile enough to behave both like a database (obviously) and a queueing system (with SKIP LOCKED in SQL), a good-enough cache (with the hstore extension), and more. Imagine the DevOps overhead we are saved from, and the additional time gained for - Shipping. Customer. Features!
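A rough sketch of the Postgres-as-queue pattern, assuming a hypothetical `jobs` table with `id`, `payload`, and `status` columns: `FOR UPDATE SKIP LOCKED` lets many workers poll the same table concurrently, because a row already locked by one worker is simply skipped by the others:

```python
# Hypothetical schema: jobs(id, payload, status). The SKIP LOCKED clause
# means a row already claimed (locked) by one worker is skipped by the
# rest, so workers never block each other while polling.
CLAIM_JOB_SQL = """
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'queued'
    ORDER BY id
    FOR UPDATE SKIP LOCKED
    LIMIT 1
)
RETURNING id, payload;
"""


def claim_next_job(cursor):
    """Atomically claim one queued job via any DB-API cursor.

    Returns (id, payload), or None when the queue is empty.
    """
    cursor.execute(CLAIM_JOB_SQL)
    return cursor.fetchone()
```

In production this would run against a real cursor (e.g. from psycopg2) inside a transaction; the sketch only shows the shape of the query.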
In the earlier example, the Redis cluster will be able to act as a Pub-Sub system for your application too!
Deployment strategy of “scalable monoliths”
Scaling “horizontally” means adding identical instances of your application - same CPU, same memory, same storage, same everything - to the infrastructure.
Every application will have its own complexities involved in deployment. With scalable monoliths, we want to minimise the manual intervention required to scale our application. So here’s the strategy:
1. Use “managed” services as much as possible
Your cloud provider offers a variety of “managed” services - at a small premium. “Small” when compared against the time your engineers would otherwise spend on menial DevOps issues, priced at their hourly rate.
Most managed services have the “scalability” aspect built into the solution they offer - which makes scaling just a click away.
2. Use “Alarm-based scaling”
Cloud services like CloudWatch from AWS and Monitor from Azure make it easy to scale based on the limiting factor of your application. These alarms can be set up with minimal effort, mostly at the time of your service’s creation - set up once and forget. Just be sure to double-check the parameters for scaling your application.
For instance, consider the Redis cluster in the example above: the limiting factor is memory, so your alarms must be set up to scale based on the memory usage of the cluster. Similarly, in Node.js applications the limiting factor is generally memory, but it could also be CPU usage - you will have to figure it out in a staging-like environment and then set up the production alarms.
If just one factor does not suffice, you can always set up alarms for multiple limiting factors. Then, just be sure to check that you aren’t scaling too fast - which is why you should have “grace periods” on your alarms.
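The grace-period idea can be sketched in a few lines (thresholds and names are illustrative, not tied to any particular cloud provider): scale out only when the metric breaches the threshold and the last scaling action is old enough.

```python
import time


class Scaler:
    """Toy alarm-style scaler with a grace period to prevent flapping.

    Threshold and grace period are illustrative defaults, not provider
    recommendations.
    """

    def __init__(self, threshold=70.0, grace_seconds=300):
        self.threshold = threshold
        self.grace_seconds = grace_seconds
        self.last_scaled_at = float("-inf")  # no scaling action yet

    def should_scale_out(self, metric_value, now=None):
        """Return True only on a breach that is outside the grace period."""
        now = time.time() if now is None else now
        if metric_value <= self.threshold:
            return False
        if now - self.last_scaled_at < self.grace_seconds:
            return False  # still in the grace period: don't scale again yet
        self.last_scaled_at = now
        return True
```

CloudWatch and Azure Monitor express the same idea natively through alarm evaluation periods and scaling cooldowns; the sketch only shows why the grace period stops you from scaling too fast.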
“Scalable monoliths” aren’t a new concept; they have existed for decades. I believe all budding startups should go this route until they hit serious scalability issues. Don’t chase whatever is hot in the market. Remember Segment?
Not all bleeding-edge technologies will solve problems at the scale your company is operating at. I believe engineers love to introduce complexity into systems before they are even ready to scale - interviews are the prime example, where interviewers want candidates to make the system design complex before even understanding the use cases of the application they are designing for.
So, dear Engineers, solve problems specific to the scale at which your company is, with the challenge being - keep things simple!