Scaling Kubernetes on Application and Infrastructure Levels
Once an internal project from Google, Kubernetes has changed the way software development is done these days. A white steering wheel on a blue background seems to be everywhere now. Business wants to grow and pay less, DevOps want a stable platform that can run applications at scale, developers want reliable and reproducible flows to write, test and debug code. Kubernetes promises it all. Now that Jelastic offers managed Kubernetes, spinning up a cluster can’t be easier.
However, have you given some thoughts on how to get such a powerful container orchestration platform and pay just for resources that you actually need? This article will shed some light on horizontal and vertical scaling both on Jelastic (infra) and Kubernetes (application) level.
Scaling Kubernetes on Infrastructure Level
A Kubernetes cluster typically consists of a master (or a couple of them) and multiple nodes where application pods are scheduled. The maths is quite simple here: the more applications you run in your cluster, the more resources (nodes) you need. Say, you have a microservices application consisting of 3 services, each started as an individual pod which requests 1GiB of RAM. It means that you will need a 4 GiB node (K8s components and OS will require some RAM too). What if you need additional RAM in case of high load, potential memory leaks or if you deploy more services to the cluster? Correct, you either need a larger node or add an additional node to the cluster. Usually in both cases, you will pay for the exact amount of resources that come with a VM (i.e. you will pay for, say, 3GiB of RAM even if half of it is unused). That’s not the case with Jelastic though.
Vertical Scaling of Kubernetes Nodes
Let’s get back to maths. If the application roughly needs 3GiB of RAM, and there’s not much going on in the cluster, you need just one node. Sure thing, having some extra free RAM is always a good idea, so a 5GiB node makes a lot of sense.
Again, not with Jelastic. What you can do is request a 3GiB node and have a 2GiB in stash. When your application (K8s pod) starts consuming more (which is configured on K8s side too) or you simply deploy more pods (as in the chart below), those 2 extra GiB become immediately available, and you start paying for those resources only when they are used.
As a result, you can do some simple math and figure out the best cluster topology: say, 3 nodes, with 4 GiB of reserved RAM and 3GiB of dynamic resources.
Horizontal Auto-Scaling of Kubernetes Nodes
Having one huge node in a Kubernetes cluster is not a good idea since all deployments will be affected in case of an outage or any other major incident. Having several nodes in a stand-by mode is not cost efficient. Is it possible that Kubernetes adds a node when it needs it? Yes, a Kubernetes cluster in Jelastic can be configured with horizontal node auto-scaling. New nodes will be added to a cluster when RAM, CPU, I/O or Disk usage reaches certain levels. Needless to say, you get billed for additional resources only when they are used. Newly added nodes will be created according to current topology, i.e. existing vertical scaling configurations will be applied. The system will scale down as soon as resource consumption gets back to expected levels. Your Kubernetes cluster will not starve, yet you will not pay for unused resources.
Scaling Kubernetes on Application Level
Kubernetes has its own horizontal pod auto-scalers (HPA). In simple words, HPA will replicate chosen deployments based on utilization of CPU. If CPU consumption of all pods grows more than, say, 70%, HPA will schedule more pods, and when CPU consumption gets back to normal, deployment is scaled back to the original number of replicas.
Why is this cool and how does it work with automatic horizontal scaling of Kubernetes nodes? Say, you have one node with a couple of running pods. All of a sudden, a particular service in the pod starts getting a lot of requests and performing some CPU costly operations. RAM utilization does not grow, and as a result at this point there is no mechanism to scale the application which will soon become unresponsive. Kubernetes HPA will scale up pods, and an internal K8s load balancer will redirect requests to healthy pods. Those new pods will require more resources, and this is where Jelastic horizontal and vertical scaling comes into play. New pods will be either placed on the same node and utilize dynamic RAM, or a new node will be added (in case there’s not enough resources on existing ones).
Kubernetes Scaling Out
Kubernetes Scaling In
On top of that, you may set resource caps on Kubernetes pods. For example, if you know for sure that a particular service should not consume more than 1GiB, and there’s a memory leak if it does, you instruct Kubernetes to kill the pod when RAM utilization reaches 1GiB. A new pod will start automatically. This gives you control over resources your Kubernetes deployments utilize.
Living Proof with WordPress Hosted in Kubernetes
Now, let’s deploy a real-life application to a Kubernetes cluster to show all of the above mentioned scaling features. A WordPress site would be a great example. Huge Kubernetes community is one of its biggest advantages and adoption factors, so it’s really easy to find tutorials on how to deploy popular applications. Let’s go through the official WordPress tutorial (non-production deployment is chosen for simplicity of this article, and you may deploy WordPress using helm charts).
Once done, there is one more thing to do. We will need to create an Ingress bound to a WordPress service since access to running applications in Jelastic Kubernetes cluster is provided by Traefik reverse proxy.
Create a file called wp-ingress.yaml with the following content:
- path: /
After ingress is created, a WordPress site is available at https://<your-env-name>.<jelastic_url>
Great, we have a running WordPress instance in a Kubernetes cluster. Let’s create horizontal pod auto-scalers (HPA) for WordPress deployment to make sure the service always responds despite high load. In your terminal, run:
kubectl autoscale deployment wordpress --cpu-percent=30 --min=1 --max=10 -n wp
Now, if WordPress pod starts utilizing >30% of CPU for all pods, autoscaler will modify deployment to add more pod replicas, so that an internal load balancer routes requests to different pods. Of course, chosen values are for demo purposes only and can be adjusted based on your needs.
Next step is to review vertical and horizontal scaling options of a Jelastic Kubernetes environment. There are two main goals here:
- Pay only for actually utilized resources
- Make sure Kubernetes cluster has a spare node when required
To make the demo simpler (i.e. run out of RAM faster), let’s configure the node to use up to 1.5GiB RAM and 4.8GHz CPU. Click Change Environment Topology and state Scaling Limit per node.
Configuration of an automatic horizontal scaling is the final step. Similar to vertical scaling, values are set to be promptly triggered to suit purposes of this demo. Let’s instruct Jelastic when to add and remove Kubernetes nodes. Open Settings > Auto Horizontal Scaling and Add a set of required triggers.
Let’s take a look at memory consumption. Kubernetes node (Workers) uses 8 cloudlets with 4 more cloudlets that can be dynamically added, i.e. there is some RAM to schedule a few more pods.
It’s time to put some stress on a WordPress site. There are multiple ways to initiate HTTP GET requests. We will use the simplest – wget in a while-true loop executed within WordPress pod itself (wordpress is a service name that will be resolved to an internal IP accessible from the cluster only):
while true; do wget -q -O- http://wordpress; done
A few moments later, we can observe the following data from HPA:
Kubernetes autoscaler has modified WordPress deployment to add more replicas. Expectedly, the cluster does not have enough RAM, so as Kubernetes Dashboard suggests, the remaining 5 pods cannot be scheduled:
A few pods were able to start though, as Jelastic dynamically added cloudlets (RAM&CPU) to the node. However, the remaining 2 cloudlets were not enough for at least one more pod to start.
This is where the magic begins. Since we have configured automatic node auto scaling, Jelastic is adding a new node now.
Let’s check if a new node is registered with master by running a simple command kubectl get nodes
A few moments later:
Fantastic! All pods have started which means WordPress can handle all those incoming requests that we have initiated a few minutes ago. Now, let’s abort this `while true` command and wait for a minute or so. Replica count gets back to 1 again which means we do not need an additional node.
Memory utilization figures suggest that the node can be deleted.
And Jelastic indeed removes it in about one minute.
What have we just witnessed? Let’s recap. Creating HPA for WordPress deployment initiates scheduling of more pod replicas to handle high load. It is up to application admin to configure triggers. Next, Jelastic dynamically allocates RAM within nodes. If you run out of RAM, a new node is added to the cluster. When resource utilization gets back to normal, HPA sets replica count back to 1, and Jelastic removes a node. When HPA scales up deployment again, Jelastic reacts accordingly. No extra RAM is in use at any time! No downtime!
With so many scaling options, both on cluster (infrastructure) and application (deployment/pod) level, a Kubernetes cluster in Jelastic becomes a smart platform that either grows or shrinks according to your application workloads. Kubernetes even checks if your application is up and running, and redeploys it if necessary with zero downtime when the application is being updated with new images, which makes continuous delivery a reality, not just a buzzword. Try out yourself at one of Jelastic public cloud service providers or request private cloud installation with Kubernetes cluster.