The excitement of launching a new venture often leads to hasty decisions and the accumulation of unnecessary resources, resulting in skyrocketing monthly bills. In this post, we look at the typical cloud path of startups, the challenges they face in managing costs, and the importance of setting up infrastructure properly from the start. We'll also share practical tips and strategies for optimizing cloud expenses, based on our experience at Devolut.
A typical startup's cloud path, in terms of the number of resources and the cloud costs they generate, looks like this:
The excitement of starting a company (or a new product) is high: let's just spin up a stack that supports the initial phase of our development and keep costs at a reasonable level.
Great, we now have the first version of our app/service/product running, but it runs on infrastructure set up who knows how (usually manually, through the cloud console?), and only in development mode. It's time to build a set of environments that can support our growth. Let's have a Production environment, a Pre-production environment, a UAT environment, and keep the current development environment. Maybe we even need a Staging environment. Why not? We are now mature and have a viable product!
Waaaait, it's great to have multiple environments and to be able to develop and test in multiple phases, but our AWS/GCP/Azure bill went from X to almost 10X! We are burning money at a rapid pace, like a first-time investor caught in a cryptocurrency bubble! This needs to stop: our product is still in its early phase and we don't have many paying customers, so let's, please, revisit our setup and infrastructure to bring costs back down to earth.
Now we get rid of one or two environments (we can live without UAT and Staging; let's keep Production, Pre-production, and Dev). We also reassess the sizes of our Kubernetes clusters and the instance types we use, check whether our apps can run with less CPU and RAM, and explore storage policies to minimize the data we store.
By this time, the company has burned tens or even hundreds of thousands of dollars on cloud costs and has spent additional resources on engineering efforts to bring those costs down. Moreover, acquiring several 'cloud cost optimization' tools or services has become almost a necessity in the process.
And all of this could have been mitigated if things had been set up properly from the start and the right metrics had been used to determine the necessity and cost of new infrastructure services.
Devolut takes pride in making customers' infrastructures optimal in size and resources, while always keeping the option to scale flexibly as traffic grows and end-users make more use of the services.
If you find yourself in the later stages of development, where multiple environments have been spun up and now need to be streamlined, our approach would involve profiling your stack. We would identify the bare minimum of essential infrastructure services you cannot do without, and remove or downscale everything else. This process includes working closely with developers to optimize their applications and reduce their CPU and RAM requirements. We also make sure that proper Kubernetes requests and limits are set for each application, and that meaningful resource-usage metrics are in place, accompanied by dashboards and alerts. Most importantly, we establish appropriate thresholds that allow the infrastructure to scale up or down while keeping services available throughout the entire process.
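To make the requests-and-limits step more concrete, here is a minimal Python sketch of one common right-sizing heuristic: take a high percentile of observed usage and add some headroom. The 95th percentile, the 20% headroom, the "memory limit equals memory request" convention, and the function names are our illustrative assumptions, not a fixed rule; the usage samples would come from whatever monitoring system you run.

```python
"""Sketch: size Kubernetes requests/limits from observed usage samples.

Assumptions (illustrative, not a prescribed method): samples come from your
monitoring stack, requests = 95th percentile + 20% headroom, and the memory
limit equals the memory request.
"""
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def with_headroom(value, headroom_pct):
    """Scale value up by headroom_pct percent and round up to a whole unit."""
    return math.ceil(value * (100 + headroom_pct) / 100)


def recommend_resources(cpu_millicores, memory_mib, pct=95, headroom_pct=20):
    """Return a dict shaped like a container's Kubernetes `resources` block."""
    cpu_request = with_headroom(percentile(cpu_millicores, pct), headroom_pct)
    mem_request = with_headroom(percentile(memory_mib, pct), headroom_pct)
    return {
        "requests": {"cpu": f"{cpu_request}m", "memory": f"{mem_request}Mi"},
        # Memory limit == request: the container cannot grow into memory the
        # scheduler never reserved for it. No CPU limit is set in this sketch,
        # because CPU is compressible (excess use is throttled, not killed).
        "limits": {"memory": f"{mem_request}Mi"},
    }


if __name__ == "__main__":
    # Hypothetical samples for one deployment, e.g. exported from Prometheus.
    cpu = [120, 135, 150, 180, 210, 160, 140, 130, 175, 190]
    mem = [300, 310, 320, 330, 350, 340, 325, 315, 345, 355]
    print(recommend_resources(cpu, mem))
```

Feeding the function representative samples (peak hours, batch jobs, deploys) matters more than the exact percentile; whether to set a CPU limit at all is a judgment call that depends on how noisy your neighbours are.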
If you are just starting your business or a new stack, we would be involved from day 1, working together with your engineering team to make sure your stack is never over-provisioned and that no dangling resources are spending your money.
The process includes, but is not limited to, the following:
Choosing the right instance types for your services (depending on whether a service needs more CPU, more RAM, or a balanced ratio of the two)
Making sure that each service has just the right amount of storage (databases, volumes provisioned for VMs or k8s clusters, etc.)
Making sure that apps do not generate unnecessary outgoing traffic; there are many stories of outbound cloud data transfer charges nearly bankrupting companies
Setting proper requests and limits (especially for RAM) for your apps, which control what they consume. If an app's RAM request is larger than what it actually needs, that memory is still reserved on the k8s node, so the node fills up and new apps can't be scheduled on it; your autoscaling system then brings additional nodes into the cluster, effectively increasing your cloud bill
Taking advantage of partial or full upfront payment options (for example, reserved or committed-use pricing) for environments/services that are expected to be live for an extended period
Using your cloud billing console and making sure we are alerted when certain events occur (we establish multiple thresholds based on various parameters)
Implementing tooling around general infrastructure and Kubernetes costs; check out tools like https://www.infracost.io/, https://www.kubecost.com/, and https://github.com/opencost/opencost
Implementing monitoring and logging systems to track resource usage. This can be done in many ways (we like to work with either Prometheus/Grafana/Alertmanager or Datadog, among other options)
Building new environments only when absolutely necessary. Although this sounds obvious, we often see a tendency to build new copies of environments just because someone suggests that having 4-5 environments (Prod, Pre-prod, Staging, UAT, Dev) is necessary for a serious company. However, the truth is that each company should make decisions based on its own needs and requirements.
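For the instance-type point in the list above, a minimal Python sketch of the idea: compare a workload's memory-to-vCPU ratio with the ratios of common instance categories and pick the closest. The ratios mirror typical AWS families (compute-optimized c-family at about 2 GiB per vCPU, general-purpose m-family at about 4 GiB, memory-optimized r-family at about 8 GiB); the category names, the log-scale "closest" rule, and the function name are our own assumptions for illustration.

```python
"""Sketch: suggest an instance category from a workload's GiB-per-vCPU ratio.

Assumed ratios follow common AWS families (c: ~2, m: ~4, r: ~8 GiB per vCPU);
other providers offer similar tiers. Verify against your provider's catalog.
"""
import math

CATEGORY_GIB_PER_VCPU = {
    "compute-optimized": 2,   # e.g. AWS c-family
    "general-purpose": 4,     # e.g. AWS m-family
    "memory-optimized": 8,    # e.g. AWS r-family
}


def suggest_category(peak_vcpu, peak_memory_gib):
    """Return the category whose ratio is closest to the workload's (log scale)."""
    if peak_vcpu <= 0 or peak_memory_gib <= 0:
        raise ValueError("usage figures must be positive")
    ratio = peak_memory_gib / peak_vcpu
    # Log scale, so 2x too much RAM and 2x too little count as equally far off.
    return min(
        CATEGORY_GIB_PER_VCPU,
        key=lambda name: abs(math.log2(ratio / CATEGORY_GIB_PER_VCPU[name])),
    )


if __name__ == "__main__":
    # Hypothetical peak usage figures for three services.
    print(suggest_category(4, 7))    # CPU-heavy API
    print(suggest_category(2, 8))    # balanced worker
    print(suggest_category(2, 20))   # in-memory cache
```

The category only narrows the choice; the size within it still comes from the peak figures you measure.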
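The requests-and-limits point above says that inflated RAM requests lead to extra nodes. A minimal Python sketch shows why under a deliberately simplified model: pods are packed first-fit-decreasing by memory *request* alone, because the scheduler reserves requested memory regardless of actual usage. The real kube-scheduler also weighs CPU and other factors; the node size, pod counts and function name here are hypothetical.

```python
"""Sketch: how inflated memory requests translate into extra nodes.

Simplified model (an assumption, not the real kube-scheduler): pods are placed
first-fit-decreasing by memory request only; a new node is added whenever a
pod fits nowhere, roughly what a cluster autoscaler does for pending pods.
"""


def nodes_needed(pod_requests_mib, node_allocatable_mib):
    """Count nodes needed to fit all pods by memory request."""
    free = []  # remaining allocatable memory on each node
    for req in sorted(pod_requests_mib, reverse=True):
        if req > node_allocatable_mib:
            raise ValueError(f"a pod requesting {req} MiB can never be scheduled")
        for i, remaining in enumerate(free):
            if req <= remaining:
                free[i] -= req
                break
        else:
            free.append(node_allocatable_mib - req)  # scale up: one more node
    return len(free)


if __name__ == "__main__":
    node_mib = 7_000   # hypothetical allocatable memory per node
    replicas = 20      # each replica is assumed to really use ~600 MiB
    right_sized = [700] * replicas     # request with modest headroom
    inflated = [2_000] * replicas      # "just in case" request
    print("right-sized:", nodes_needed(right_sized, node_mib))
    print("inflated:   ", nodes_needed(inflated, node_mib))
```

In this toy example the same workload needs 2 nodes with right-sized requests and 7 with inflated ones, even though actual memory usage is identical.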
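The upfront-payment point above comes down to simple arithmetic over the commitment term. The Python sketch below compares one long-lived instance under on-demand, partial-upfront and full-upfront pricing; every price in it is a hypothetical placeholder, not a real provider rate, and the function name is our own.

```python
"""Sketch: total cost of one long-lived instance under different payment options.

All prices are hypothetical placeholders; substitute the on-demand,
partial-upfront and full-upfront figures from your provider's pricing pages.
"""

HOURS_PER_YEAR = 8760


def total_cost(upfront, hourly_rate, years=1):
    """Total paid over the term: upfront fee plus hourly charges."""
    return upfront + hourly_rate * HOURS_PER_YEAR * years


if __name__ == "__main__":
    options = {
        "on-demand": total_cost(upfront=0, hourly_rate=0.10),
        "partial upfront": total_cost(upfront=300, hourly_rate=0.035),
        "full upfront": total_cost(upfront=560, hourly_rate=0.0),
    }
    baseline = options["on-demand"]
    for name, cost in options.items():
        saving = 100 * (1 - cost / baseline)
        print(f"{name:<16} ${cost:8.2f}  ({saving:4.1f}% vs on-demand)")
```

Commitments are typically billed for the whole term whether or not the instance keeps running, which is why they only make sense for environments you are confident will stay live.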
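For the billing-alert point above, here is a minimal Python sketch of "multiple thresholds": each percentage of a monthly budget fires an alert on actual month-to-date spend, or on a forecast if only the projection crosses it. The thresholds, the naive linear forecast, and the names are illustrative assumptions; in practice you would usually configure your provider's native budget alerts or run similar logic over exported billing data.

```python
"""Sketch: evaluating several budget thresholds against month-to-date spend.

Assumptions: thresholds of 50/80/100% and a linear spend forecast are
illustrative choices, not a provider API.
"""
from dataclasses import dataclass


@dataclass
class Alert:
    kind: str        # "actual" or "forecast"
    threshold: int   # percent of the monthly budget
    amount: float    # actual spend or projected spend


def evaluate_budget(budget, spend_to_date, day, days_in_month,
                    thresholds=(50, 80, 100)):
    """Return one alert per threshold crossed by actual or forecast spend."""
    if not 1 <= day <= days_in_month:
        raise ValueError("day must fall within the month")
    forecast = spend_to_date / day * days_in_month  # naive linear projection
    alerts = []
    for pct in thresholds:
        limit = budget * pct / 100
        if spend_to_date >= limit:
            alerts.append(Alert("actual", pct, spend_to_date))
        elif forecast >= limit:
            alerts.append(Alert("forecast", pct, forecast))
    return alerts


if __name__ == "__main__":
    # Hypothetical: $10,000 monthly budget, $5,200 spent by day 10 of 30.
    for alert in evaluate_budget(10_000, 5_200, day=10, days_in_month=30):
        print(alert)
```

Forecast-based alerts are the ones that buy you time: in the example, the 100% threshold fires on day 10, long before the budget is actually exhausted.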
These are just a few of the tips we employ when working on setting up cost-optimal cloud infrastructures for our customers. If you would like to learn more about how we can help reduce your cloud bills, please don't hesitate to reach out to us at hello@devolut.io.