The “cold start” problem

Danil Tolonbekov
5 min read · May 7, 2022

Content:

  1. What is the “cold start” problem?
  2. The basic/popular approach to solving the “cold start” problem
  3. Overview of one advanced approach
  4. Summary

Introduction

Nowadays, the Function-as-a-Service (FaaS) paradigm is one of the most popular approaches for building highly scalable applications. It enables efficient use of data center resources by allocating them on demand, at per-function-request granularity. This paradigm underpins the serverless computing model, in which cloud providers manage and deliver resources. FaaS offerings are now available from the majority of cloud platforms, including AWS Lambda, Google Cloud Functions, Azure Functions, and others.

The main benefits of using FaaS are:

  • Improved developer velocity. With FaaS, developers can spend more time writing application logic and less time worrying about servers and deployment. This typically means a much faster development turnaround.
  • Built-in scalability. Since serverless code is inherently scalable, developers do not have to worry about creating contingencies for high traffic or heavy use. The serverless provider handles all of the scaling concerns.
  • Cost efficiency. Unlike traditional cloud providers, serverless FaaS providers do not charge their clients for idle computation time. Because of this, clients only pay for as much computation time as they use, and do not need to waste money over-provisioning cloud resources.

As FaaS is a relatively new development paradigm, it comes with its own set of concerns that must be considered when implementing these services. The so-called “cold start” problem is one of the most significant challenges in the serverless era. A “cold start” is the initialization of a function’s container before execution: when your function has been idle long enough to be put into a “sleeping state”, the next invocation triggers a cold start.

The steps of program execution

Basic approach

There are a few techniques suggested by the community that can help reduce setup costs.

The most straightforward method is to keep the execution environment “warm” for a while. From the developer’s perspective, it means periodically sending a keep-alive request to the service provider. In this case, all future invocations of the same function will be executed in the “warm” environment, which will take substantially less time to complete. The interval time between two subsequent requests depends on the service provider and should be chosen appropriately.
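
As a rough illustration, the sketch below assumes an AWS Lambda deployment and uses boto3 to send a lightweight “warm-up” invocation on a fixed interval. The function name, interval, and payload are placeholders; in practice such pings are usually driven by the provider’s own scheduler (for example, a scheduled EventBridge rule) rather than a long-running loop:

```python
import json
import time

import boto3  # AWS SDK for Python

FUNCTION_NAME = "my-function"    # hypothetical function name
PING_INTERVAL_SECONDS = 5 * 60   # must be shorter than the provider's idle timeout

lambda_client = boto3.client("lambda")


def send_warmup_ping() -> None:
    # Asynchronous ("Event") invocation: we only want to touch the container,
    # not wait for a response.
    lambda_client.invoke(
        FunctionName=FUNCTION_NAME,
        InvocationType="Event",
        Payload=json.dumps({"warmup": True}),
    )


if __name__ == "__main__":
    while True:
        send_warmup_ping()
        time.sleep(PING_INTERVAL_SECONDS)
```

The handler itself should recognize the warm-up payload and return immediately, so that keeping the container alive stays cheap.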

Another recommended practice when employing functions is to keep your application’s dependencies to a minimum. Also, scripting languages (Python, Ruby, JavaScript) usually start up faster than compiled or managed runtimes such as Java or C#.
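
A small, language-level trick along the same lines is to defer heavy imports until a request actually needs them, so the cold-start path only pays for what it uses. The sketch below is a generic Python handler in the style of an AWS Lambda entry point; the action name and the pandas dependency are purely illustrative:

```python
def handler(event, context):
    """Entry point in the style of an AWS Lambda handler (names are illustrative)."""
    if event.get("action") == "report":
        # Heavy dependency imported lazily, only on the code path that needs it,
        # so cold starts for the common path avoid the import cost.
        import pandas as pd
        frame = pd.DataFrame(event.get("rows", []))
        return {"row_count": len(frame)}

    # Lightweight default path with no heavy imports.
    return {"message": "ok"}
```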

FaasCache approach

Serverless platforms do not have a long development history, and most of their design details originated in industry, which has proposed a wide range of solutions to this problem. In this post, I want to describe one of them that I find really interesting. This technique combines caching with a termination policy: FaasCache: keeping serverless computing alive with greedy-dual caching (you can find the link to the paper at the end of this post).

The central idea of the FaasCache approach is to treat warm containers as a cache and to define a keep-alive policy that balances priorities based on characteristics of the function and its environment. When all servers are fully utilized, the problem of shutting down a container is equivalent to evicting an object from a cache. This caching analogy makes it possible to reuse well-studied caching methods for FaaS systems.

Many caching strategies, such as LRU or LFU, can be applied to function keep-alive policies; however, these policies do not take object size into account and hence cannot be transferred directly to the keep-alive setting, where the resource footprint is critical. The policy proposed in the paper is based on Greedy-Dual-Size-Frequency object caching, which provides a general framework for designing and implementing keep-alive policies that consider the frequency and recency of invocations of the various functions, as well as their initialization overheads and sizes.

Essentially, the keep-alive policy presented in the paper is a function termination policy. This means that a warm function is kept alive as long as server resources are available. If a new container needs to be started and there is no capacity available, the policy decides which container to terminate. A “priority” is computed for each environment based on its “cold start” overhead and resource footprint, and the container with the lowest priority is terminated. In caching terms, reusing a warm container is a “cache hit”, because it avoids the initialization overhead; if there are insufficient resources to launch a new container and a new environment has to be created, that can be considered a “cache miss”. The priority value is calculated as follows:
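
Priority = Clock + (Frequency × Cost) / Size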

where,

Clock — a measure of recency, i.e. a “logical clock” per server that is updated on every eviction. For example, if a container i is terminated (because it has the lowest priority), then Clock = Priority(i). If multiple containers have to be terminated, then Clock = max(Priority(i)) for i ∈ E, where E is the set of terminated containers.

Frequency — the number of times a given function is invoked. If multiple containers are executed, then the frequency is the number of function invocations across all these containers.

Cost — the termination cost, which is equal to the total initialization time and shows the benefit of keeping a container alive.

Size — the amount of resources (CPU, network, storage) needed by the container. The priority is inversely proportional to the size, so larger containers are terminated before smaller ones.
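
To make the policy concrete, here is a minimal sketch of a Greedy-Dual-Size-Frequency keep-alive cache in Python. It is not the FaasCache implementation; the data structures and names (Container, GDSFKeepAlive, the example function names) are my own simplification of the scheme described above:

```python
from dataclasses import dataclass


@dataclass
class Container:
    function_name: str
    init_cost: float     # total initialization time, i.e. the cold-start overhead
    size: float          # resource footprint (e.g. memory in MB)
    frequency: int = 0   # invocations of this function so far
    priority: float = 0.0


class GDSFKeepAlive:
    """Simplified Greedy-Dual-Size-Frequency keep-alive policy (illustrative only)."""

    def __init__(self, capacity: float):
        self.capacity = capacity  # total resources available on the server
        self.used = 0.0
        self.clock = 0.0          # per-server "logical clock", advanced on eviction
        self.warm: dict[str, Container] = {}

    def _priority(self, c: Container) -> float:
        # Priority = Clock + (Frequency x Cost) / Size
        return self.clock + c.frequency * c.init_cost / c.size

    def invoke(self, name: str, init_cost: float, size: float) -> str:
        container = self.warm.get(name)
        if container is not None:
            # Cache hit: reuse the warm container and refresh its priority.
            container.frequency += 1
            container.priority = self._priority(container)
            return "warm start"

        # Cache miss: evict lowest-priority containers until the new one fits.
        while self.used + size > self.capacity and self.warm:
            victim = min(self.warm.values(), key=lambda c: c.priority)
            self.clock = max(self.clock, victim.priority)  # Clock = max Priority(i), i ∈ E
            self.used -= victim.size
            del self.warm[victim.function_name]

        new = Container(name, init_cost, size, frequency=1)
        new.priority = self._priority(new)
        self.warm[name] = new
        self.used += size
        return "cold start"


if __name__ == "__main__":
    cache = GDSFKeepAlive(capacity=512)
    print(cache.invoke("thumbnailer", init_cost=2.0, size=300))  # cold start
    print(cache.invoke("thumbnailer", init_cost=2.0, size=300))  # warm start
    print(cache.invoke("transcoder", init_cost=8.0, size=400))   # evicts thumbnailer, cold start
```

Frequently invoked functions with expensive initialization and small footprints end up with high priorities and stay warm, while rarely used, cheap-to-start, or large containers are evicted first.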

Summary

In this post, we had a look at basic and advanced approaches that can mitigate the “cold start” problem. One advanced approach is a function termination policy: a warm function is kept alive as long as server resources are available, and if a new container needs to be started with no capacity left, the policy uses the “priority” to decide which container to terminate. There is no ideal solution that suits every case; depending on the environment and the type of application, we can choose one or several techniques to mitigate the problem.

Further reading

  • Alexander Fuerst and Prateek Sharma. 2021. FaasCache: keeping serverless computing alive with greedy-dual caching. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2021). Association for Computing Machinery, New York, NY, USA, 386–400. https://doi.org/10.1145/3445814.3446757
