How We Designed Qovery To Manage Thousands of Kubernetes Clusters with a Single Control Plane

Deploying and managing applications in the cloud can be complex and time-consuming, especially with Kubernetes. Qovery makes this process much easier with a platform that abstracts away the complexity of Kubernetes and gives developers the tools they need to manage their deployments. But how does Qovery manage thousands of Kubernetes clusters with a single control plane? In this article, we'll take a closer look at how Qovery operates Kubernetes, how it handles thousands of deployments per day, and what happens when the control plane is unavailable. We'll also discuss the core technology behind Qovery's control plane and how it keeps the management of Kubernetes clusters reliable and efficient.

What is Qovery?

Qovery is a cloud platform that simplifies the deployment of applications on Kubernetes. With Qovery, developers can focus on writing code and leave the infrastructure management to the platform. Qovery provides a better developer experience on Kubernetes by streamlining the deployment process, automating repetitive tasks, and offering an intuitive web interface and CLI for managing the deployment lifecycle.

Qovery runs on top of Kubernetes, and on top of your cloud account.

Qovery is particularly useful for companies that want to leverage Kubernetes for their applications but don't have the expertise or resources to manage the infrastructure. Qovery enables companies to deploy their applications on Kubernetes with minimal effort, reducing the time and costs associated with infrastructure management.

Whether you're a junior or a senior developer, Qovery makes deploying your applications to the cloud on Kubernetes a breeze. It abstracts away the complexity of Kubernetes, allowing developers to focus on writing code and delivering value to their customers. With Qovery, you can deploy your applications confidently, knowing that the platform takes care of the underlying infrastructure and provides the tools you need to manage your deployments. Qovery also gives you access to your infrastructure and Kubernetes cluster when needed, so you keep full control and flexibility over your deployment.

How Qovery operates Kubernetes

Qovery uses a unique architecture to manage thousands of Kubernetes clusters with a single control plane. In Kubernetes itself, the control plane is the set of components (including the API server, the scheduler, and etcd) that manages the state of a cluster and performs operations such as deploying and scaling applications. Qovery's control plane plays a similar role one level up, across many clusters. It is written in Rust and Kotlin and handles API requests coming from Git providers and from the web and CLI interfaces. It then interprets and transforms those requests and routes them to the appropriate Kubernetes cluster.

The most significant part of the Qovery Control Plane is written in Kotlin, but we have satellite services (not represented here) written in Rust

On the customer side, Qovery runs a set of binaries, including the Engine and the Agent, that handle requests from the control plane and execute tasks. Each Kubernetes cluster managed by Qovery is autonomous and does not depend on the Qovery Control Plane to run. This means that if the Qovery control plane or any other Qovery component fails, the customer infrastructure and Kubernetes cluster are not impacted.

Qovery Engine pulls tasks from the Qovery Control Plane and executes those tasks on the Kubernetes cluster. Note that the Qovery Engine initiates the connection to the Qovery Control Plane.
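The pull model can be illustrated with a minimal in-memory sketch in Rust. The types `Task` and `ControlPlane` and the function `engine_poll_once` are hypothetical and stand in for a real network protocol: the point is only that the control plane queues work per cluster and never reaches into a cluster, while each Engine asks for the work addressed to its own cluster.

```rust
use std::collections::{HashMap, VecDeque};

/// A deployment task queued by the control plane for one cluster.
/// Hypothetical shape, for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub execution_id: String,
    pub action: String,
}

/// In-memory stand-in for the control plane: it only holds pending
/// tasks per cluster and never opens a connection to a cluster itself.
#[derive(Default)]
pub struct ControlPlane {
    queues: HashMap<String, VecDeque<Task>>,
}

impl ControlPlane {
    /// Called when a request arrives from a Git provider, the web console, or the CLI.
    pub fn enqueue(&mut self, cluster_id: &str, task: Task) {
        self.queues
            .entry(cluster_id.to_string())
            .or_default()
            .push_back(task);
    }

    /// Called by an Engine over the connection that the Engine initiated.
    /// Returns the oldest pending task for that cluster, if any.
    pub fn poll(&mut self, cluster_id: &str) -> Option<Task> {
        self.queues.get_mut(cluster_id)?.pop_front()
    }
}

/// One polling round of a hypothetical Engine: drain every task queued
/// for its own cluster and "execute" it (here we only record it).
pub fn engine_poll_once(cp: &mut ControlPlane, cluster_id: &str) -> Vec<String> {
    let mut executed = Vec::new();
    while let Some(task) = cp.poll(cluster_id) {
        // A real engine would apply the task to its Kubernetes cluster here.
        executed.push(format!("{}:{}", task.execution_id, task.action));
    }
    executed
}
```

In this sketch, adding a cluster only adds another queue, and since the Engine opens the connection, the control plane never needs inbound access to a customer's network.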

Qovery's architecture enables horizontal scaling out of the box: each Kubernetes cluster is independent and only picks up instructions from the control plane to execute. As a result, the Qovery control plane does not have to scale with the workloads running on customer clusters; its main resource consumption is the stream of events sent while a deployment happens.

Metadata is sent over gRPC/TLS

The control plane receives this deployment metadata and uses it to track deployment state and report what happened to the user.

Handling Thousands of Kubernetes Clusters

When a deployment occurs, metadata is sent from the Qovery Engine and Agent to the control plane. This metadata includes information such as the application name, version, environment, and other details that are relevant to the deployment.

Thousands of Kubernetes clusters can be connected to the Qovery Control Plane

Here's an example of what the metadata payload looks like:

```json
{
  "type": "info",
  "timestamp": "2023-03-02T07:59:44.405961310Z",
  "details": {
    "pool_id": "4ceb7649-ed84-4c52-a27b-e7fca06afaaa",
    "organization_id": "141c07cc-0dd9-4623-9999-3fdd61867555",
    "cluster_id": "a8ad0659-bbbb-4c83-ad77-092e97bb2cae",
    "execution_id": "fba2ac8b-6f78-4444-85c2-b137981056ff-65-1677743982",
    "stage": {
      "step": "PreCheck"
    },
    "transmitter": {
      "type": "Environment",
      "id": "fba2ac8b-6f78-430f-85c2-b1379810ffff",
      "name": "production"
    }
  },
  "error": null,
  "message": {
    "safe_message": "\uD83C\uDFC1 Deployment request fba2ac8b-6f78-430f-85c2-b137981056ff-65-1677743982 for stage 1 `first stage` has been sent to the engine",
    "full_details": null
  }
}
```

Each payload like this corresponds to one line in the Qovery deployment console, as shown below.

Deployment logs view from the Qovery Web Console
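To make the one-payload-per-line mapping concrete, here is a minimal Rust sketch that renders a subset of the fields from the payload above as a single console line. `EngineEvent` and `to_console_line` are hypothetical names, and the line layout is an assumption, not the format the Qovery console actually uses. A real implementation would deserialize the JSON with a library such as serde_json rather than take pre-extracted strings.

```rust
/// Subset of the payload fields; the comments give the matching JSON keys.
/// Hypothetical struct for illustration, not Qovery's actual type.
pub struct EngineEvent<'a> {
    pub kind: &'a str,             // "type"
    pub timestamp: &'a str,        // "timestamp"
    pub step: &'a str,             // "details.stage.step"
    pub transmitter_name: &'a str, // "details.transmitter.name"
    pub safe_message: &'a str,     // "message.safe_message"
    pub error: Option<&'a str>,    // "error" (null in the example above)
}

/// Render one event as one console line. The layout is an assumption.
pub fn to_console_line(e: &EngineEvent) -> String {
    // An attached error overrides the declared event type.
    let level = if e.error.is_some() { "error" } else { e.kind };
    format!(
        "{} [{}] [{}/{}] {}",
        e.timestamp,
        level.to_uppercase(),
        e.transmitter_name,
        e.step,
        e.safe_message
    )
}
```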

With thousands of deployments per day (and growing steadily), we receive hundreds of thousands of metadata payloads like the one above, each relatively compact. Despite this volume, the performance of Qovery's control plane is not impacted, because the control plane does not transform these payloads: it simply ingests them.
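"Simple ingestion" can be pictured as an append-only store. The sketch below is hypothetical: `DeploymentLogStore` is an illustrative name, and it assumes the execution id arrives alongside each payload so that the raw JSON never has to be parsed or rewritten on the way in.

```rust
use std::collections::HashMap;

/// Hypothetical ingestion store: raw payloads are appended as-is, grouped
/// by execution id, with no parsing or transformation on the write path.
#[derive(Default)]
pub struct DeploymentLogStore {
    logs: HashMap<String, Vec<String>>,
}

impl DeploymentLogStore {
    /// Append one payload (one console line) to its deployment's log.
    /// Assumes the execution id is sent alongside the payload.
    pub fn ingest(&mut self, execution_id: &str, raw_payload: String) {
        self.logs
            .entry(execution_id.to_string())
            .or_default()
            .push(raw_payload);
    }

    /// Lines of one deployment in arrival order, ready for the console.
    /// Unknown execution ids yield an empty slice.
    pub fn lines(&self, execution_id: &str) -> &[String] {
        self.logs
            .get(execution_id)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

The cost per payload here is one hash lookup and one push, independent of what the payload contains.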

Most of the Qovery Control Plane operations are handled by what we call the Core, written in Kotlin (JVM-based). Here is the Core load average for the last 30 days. Each colored slice corresponds to a new release of the Core.

This means that Qovery can handle large-scale deployments with ease. Simplicity is key.

Core Control Plane configuration

Most of the Qovery Control Plane operations are handled by what we call the Core, written in Kotlin (I will explain why we decided to use it in a future article). The Core runs on an instance in the AWS us-east-2 region with 4 GB of RAM and 4 vCPUs. It runs on Java 17 LTS (Java Virtual Machine) and uses Kotlin 1.8.

Qovery Control Plane Availability

Availability is one of the most important aspects of any cloud infrastructure, and Qovery is no exception. While Qovery's control plane is essential for managing Kubernetes clusters, it's important to note that the infrastructure still runs even if the control plane is unavailable.

Kubernetes clusters keep running without any downtime even if they can no longer connect to the Qovery Control Plane

When the Qovery control plane is unavailable, it's no longer possible to deploy applications or updates via Qovery. However, the infrastructure remains fully functional and independent since each Kubernetes cluster is autonomous and does not rely on Qovery.

Sometimes it's just better to wait and expect it will work again... 😅
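While the control plane is unreachable, an Engine only needs to keep trying to reconnect; nothing on the cluster itself has to change. This article does not describe Qovery's actual retry policy, so the following is an assumed example: a capped exponential backoff schedule, with `backoff_delay` as a hypothetical helper.

```rust
use std::time::Duration;

/// Hypothetical reconnection schedule for an Engine that cannot reach the
/// control plane: `base * 2^attempt`, capped at `max`. Workloads on the
/// cluster are untouched while this runs; only new deployments wait.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    // 2^attempt; shifting by 32 or more overflows, so fall back to u32::MAX.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    // saturating_mul avoids a panic on overflow before the cap is applied.
    base.saturating_mul(factor).min(max)
}
```

With a 1-second base and a 60-second cap, the delays grow 1 s, 2 s, 4 s, 8 s, and so on, then stay at 60 s until the control plane answers again.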

To use an analogy, Qovery relates to the remote Kubernetes clusters it manages the way a remote control relates to a TV. If the batteries in your remote run out, your TV keeps running, but you can't change the channel until you replace them. Similarly, if the Qovery control plane is unavailable, the infrastructure keeps running, but you can't use Qovery to deploy new updates or make changes until the control plane is back up and running.

If you are curious about our reliability, check out our status page 😄

Wrapping up

Qovery is a robust platform that simplifies Kubernetes cluster management and empowers developers to deploy and update their applications easily. Its single-control-plane architecture lets Qovery handle thousands of Kubernetes clusters worldwide, while cluster autonomy ensures that infrastructure remains operational even if the control plane is interrupted.

By providing a seamless developer experience, Qovery enables developers of all levels to manage their applications easily. Its reliability and scalability give them peace of mind to focus on their code and development work.

Our upcoming article will explore how Qovery manages the upgrade and maintenance of Kubernetes clusters at scale. Stay tuned to learn more about how Qovery can help developers streamline their workflows and focus on what matters most: building great applications.

Resources: