From Terraform to GitOps: Building a Self-Healing K3s Cluster

The GitOps Revolution: Achieving Zero Downtime with ArgoCD

Welcome! In the first article of our series, I explained step-by-step how we spun up a 3-node Kubernetes (K3s) cluster from scratch on Hyper-V using Terraform, and how we injected our enterprise applications (Nginx, Redis, Postgres) into this system.

So, are we done? No.

Automation is a wonderful thing, but simply deploying applications once and walking away is not enough in the modern DevOps world. What happens if someone inside the system accidentally deletes an application, or if a server crashes? Our infrastructure provisioning code cannot instantly detect this change and intervene. We needed a brain that not only “builds” but also continuously monitors and “self-heals” the system.

In this article, I will explain how we took our project one step further and transitioned to a fully-fledged GitOps architecture.

Before we begin, you can watch our 7-8 minute recorded video below to see the power of this new architecture live in action.

In the video, you will see the following sequence:

  1. Hardware and the K3s cluster spin up in seconds via Terraform.
  2. We see our new e-commerce interface, the “TechStore PRO” page, go live.
  3. We log into the ArgoCD Dashboard and examine how our system is directly “locked” to a GitHub repository.
  4. The Climax (Disaster): We intentionally trigger a disaster by ruthlessly deleting our Nginx application from the terminal. Our site crashes instantly.
  5. The Magic: Without any human intervention, ArgoCD detects the missing application within seconds, pulls the configuration from GitHub, and automatically restores our e-commerce site!

Now, let’s take a closer look at the architectural evolution and GitOps philosophy behind the show you just watched.

Paradigm Shift: Why GitOps?

In our initial project, Terraform installed our applications via SSH immediately after provisioning the infrastructure. This was a “Push” method. However, this approach had a blind spot: once the system was up and running, the connection between the state inside the cluster and the code we wrote was severed.

We needed to move to a “Pull” method: a structure where the system constantly asks itself, “Does my current state match the state I am supposed to be in?”

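This is exactly where ArgoCD stepped in. ArgoCD is a controller that runs inside the Kubernetes cluster and continuously compares the cluster’s live state with the manifests in a specified Git repository. When automated sync is enabled, it also brings the cluster back in line whenever the two diverge.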
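The “Pull” idea can be sketched in a few lines of Python. This is only a toy model of a reconciliation loop, not ArgoCD’s actual implementation: the names plan_sync and reconcile are hypothetical, and the state is simplified to a mapping from application name to replica count.

```python
from typing import Dict, List, Tuple

# Simplified state: application name -> replica count (an assumption for illustration).
State = Dict[str, int]


def plan_sync(desired: State, live: State) -> List[Tuple[str, str]]:
    """Compare desired (Git) and live (cluster) state; return the actions needed to converge."""
    actions = []
    for name, replicas in sorted(desired.items()):
        if name not in live:
            actions.append(("create", name))   # declared in Git, missing in the cluster
        elif live[name] != replicas:
            actions.append(("update", name))   # present, but drifted from Git
    for name in sorted(live):
        if name not in desired:
            actions.append(("prune", name))    # running, but not declared in Git
    return actions


def reconcile(desired: State, live: State) -> State:
    """Apply the planned actions to a copy of the live state and return the result."""
    healed = dict(live)
    for action, name in plan_sync(desired, live):
        if action == "prune":
            del healed[name]
        else:
            healed[name] = desired[name]
    return healed


desired = {"nginx-web": 3}
live = {}  # someone just deleted the Deployment
print(plan_sync(desired, live))
print(reconcile(desired, live))
```

A real controller runs this comparison repeatedly against the Kubernetes API instead of once against a dictionary, but the question it asks on every pass is the same.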

What Did We Change in the Architecture?

To build this self-healing structure, we made fundamental changes to our Terraform code and architecture:

1. Simplifying Terraform’s Role (Infrastructure Only)

We completely deleted the Nginx, Redis, and Postgres deployment blocks from our old code. Now, Terraform’s sole responsibility is to prepare the hardware (virtual machines), install the K3s cluster, inject ArgoCD (the brain) into it, and step aside.

2. Smart Check Mechanism

Before installing ArgoCD, we had to be certain the system was truly ready. Instead of waiting blindly, we added a smart check mechanism into Terraform: “Go to the K3s API and check if all 3 nodes have reached the ‘Ready’ state.” The system was updated to trigger the ArgoCD installation only after receiving this confirmation.
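As a rough illustration of such a check (our actual Terraform logic is not reproduced here), the sketch below inspects a node list in the JSON shape returned by kubectl get nodes -o json and reports whether every node has a Ready condition with status "True". The function name all_nodes_ready and the sample node names are hypothetical.

```python
import json


def all_nodes_ready(node_list: dict, expected: int = 3) -> bool:
    """Return True only if at least `expected` nodes exist and each one reports Ready=True."""
    items = node_list.get("items", [])
    if len(items) < expected:
        return False
    for node in items:
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            return False
    return True


# Hand-written sample in the same shape as the Kubernetes NodeList JSON (illustrative data).
sample = json.loads("""
{
  "items": [
    {"metadata": {"name": "node-1"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-2"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "node-3"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}}
  ]
}
""")

print(all_nodes_ready(sample))  # the third node is not Ready yet
```

In a provisioning script, a check like this would typically be retried in a loop with a timeout, and the ArgoCD installation would start only once it returns True.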

3. Building the Bridge (ArgoCD Bootstrap)

As its final move while installing ArgoCD, Terraform dropped a small YAML file inside. This file whispered the following to ArgoCD: “The only source of truth you will look at and remain faithful to, no matter what, is this GitHub repository.”
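For readers who have not seen one, such a bootstrap file is typically an ArgoCD Application resource. The sketch below builds one as a Python dictionary and prints it as JSON, which kubectl also accepts. It assumes our file follows the standard Application schema; the repository URL, the path, and the application name are placeholders, not the values from our project.

```python
import json


def build_application(repo_url: str, path: str, name: str = "techstore",
                      dest_namespace: str = "default") -> dict:
    """Build an ArgoCD Application manifest that tracks one path of a Git repository."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": "default",
            "source": {
                "repoURL": repo_url,        # the single source of truth
                "targetRevision": "HEAD",
                "path": path,
            },
            "destination": {
                # In-cluster API server address used when ArgoCD deploys to its own cluster.
                "server": "https://kubernetes.default.svc",
                "namespace": dest_namespace,
            },
            # Automated sync with selfHeal re-applies Git when the live state drifts.
            "syncPolicy": {"automated": {"selfHeal": True}},
        },
    }


# Placeholder values: replace with the real repository and manifest directory.
app = build_application(
    repo_url="https://github.com/example-org/techstore-gitops.git",
    path="manifests",
)
print(json.dumps(app, indent=2))
```

The important part is the last field: without an automated sync policy, ArgoCD would only report that the cluster is out of sync and wait for someone to press the sync button.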

4. Single Source of Truth

We moved all our application configurations (Nginx Deployment, Service ports) to a public GitHub repository. Moreover, instead of the default Nginx welcome page, we coded a sleek e-commerce interface (“TechStore PRO”) using a ConfigMap.

The new constitution of our system was written: GitHub is the absolute truth. Whatever is written in the GitHub repository must be running in the cluster every single second.
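To give a feel for what lives in that repository, here is a simplified sketch of the three resources involved, again generated from Python as JSON. The Deployment name nginx-web, the 3 replicas, and NodePort 30080 match our setup; the ConfigMap name, the labels, the image tag, and the HTML snippet are assumptions made for the example.

```python
import json

APP = "nginx-web"
LABELS = {"app": APP}  # assumed label scheme


def build_manifests(index_html: str) -> list:
    """Return a ConfigMap, Deployment and Service that serve index_html through Nginx."""
    config_map = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{APP}-html"},
        "data": {"index.html": index_html},
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": APP},
        "spec": {
            "replicas": 3,
            "selector": {"matchLabels": LABELS},
            "template": {
                "metadata": {"labels": LABELS},
                "spec": {
                    "containers": [{
                        "name": "nginx",
                        "image": "nginx:stable",
                        "ports": [{"containerPort": 80}],
                        # Mount the ConfigMap over Nginx's default web root.
                        "volumeMounts": [{"name": "html",
                                          "mountPath": "/usr/share/nginx/html"}],
                    }],
                    "volumes": [{"name": "html",
                                 "configMap": {"name": f"{APP}-html"}}],
                },
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": APP},
        "spec": {
            "type": "NodePort",
            "selector": LABELS,
            "ports": [{"port": 80, "targetPort": 80, "nodePort": 30080}],
        },
    }
    return [config_map, deployment, service]


manifests = build_manifests("<h1>TechStore PRO</h1>")
for manifest in manifests:
    print(json.dumps(manifest, indent=2))
```

Because the page itself is stored in the ConfigMap, changing the storefront becomes a Git commit like any other configuration change.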

The Climax: Disaster and Resurrection

The real show of the architecture begins towards the end of the video.

While our site was running flawlessly on port 30080, we executed a deliberate destruction command in the background: kubectl delete deployment nginx-web

In a traditional infrastructure, this command means phones ringing at midnight, panicking engineers, and “Site Unreachable” complaints. However, in a GitOps architecture, things work very differently.

Because we configured ArgoCD’s automated sync policy with selfHeal: true, it did not need a new commit to act: it watches the resources it manages through the Kubernetes API and reacts when the live state drifts from what Git declares. Shortly after we deleted the Deployment, ArgoCD flagged the mismatch: “Wait a second, GitHub says this application should be running with 3 replicas, but it’s not in the cluster!”

Before we could even understand what was happening, without requiring any human intervention or a new command, ArgoCD re-applied the configuration from GitHub and rebuilt the missing Deployment within seconds. When we refreshed the browser, our site was right there in front of us as if nothing had happened.

Conclusion

In our first project, we managed to dominate the hardware using muscle power (Terraform). In this project, we added a continuously active, environment-aware, and fault-tolerant intelligence (ArgoCD) to that hardware.

This structure is a small-scale simulation of how large production systems are kept alive during disasters (Disaster Recovery), where even seconds are critical. Adopting GitOps practices is one of the most direct ways to turn the “Servers are down” panic into the “No problem, the system is self-healing” comfort.