On the importance of Continuous Integration for GitOps

Continuous Integration is now a common practice in software engineering, but not yet in the more recent field of “GitOps”. Let’s see why, and how we can improve this situation.

Vincent Behar
Aug 2, 2021

First, what do I mean by “Continuous Integration for GitOps” exactly?

A “modern” definition of Continuous Integration in 2021 is to ensure that the changes pushed to the main branch are “valid”. We usually do that by proposing changes through a Pull Request — or Merge Request, depending on your git provider — and configuring one or more pipelines to build the project, execute various tests, and run static analyzers and security scanners. Any failure is seen early in the process and fixed before merging the changes into the main branch. The benefits are numerous and already well documented: an always-green main branch, being able to start a new feature without spending hours fixing the build/tests first, less context switching, more confidence, higher quality, and so on.
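
As an illustration, here is a minimal sketch of such a pipeline, written as a GitHub Actions workflow (the CI platform and the make targets are assumptions; the same idea applies to any CI system):

```yaml
# .github/workflows/ci.yml: a minimal pipeline validating pull requests
# before they reach the main branch (hypothetical example)
name: ci
on:
  pull_request:
    branches: [main]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      # build the project and run the tests...
      - run: make build
      - run: make test
      # ...then static analysis / security scanners
      - run: make lint
```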

Applied to GitOps, nothing is really different from software engineering. Instead of building software, we’re building infrastructure. Changes pushed to the main branch are automatically picked up by a “GitOps Operator”, which applies the latest state defined in the Git repository to the target infrastructure — usually one or more Kubernetes clusters, but it’s not restricted to the Kubernetes world. We still need to ensure that this infrastructure state is valid; otherwise, we might end up with a broken main branch, which will impact everybody working on it.
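
As an illustration, this is roughly what the operator side can look like with Flux: a Kustomization resource telling Flux what to apply, and from where (a sketch; the names and paths are made up):

```yaml
# a Flux Kustomization: the operator continuously applies whatever is in
# the Git repository at the given path (names and paths are made up)
apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
kind: Kustomization
metadata:
  name: production
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: environment-repo
  path: ./clusters/production
  prune: true
```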

It may sound obvious to you, and you may already be practicing CI in a gitops workflow. Great, in this case, you won’t learn much more here…

But if you’re doing gitops without applying the Continuous Integration practices to the git repositories that represent your environments/infrastructure, then you have a huge opportunity to improve your situation!

Why is CI for GitOps not more common?

CI has been a common practice in software engineering for more than 10 years now. Hudson/Jenkins, which played a key part in popularizing it, is more than 15 years old. But still, in 2021 you’ll find plenty of git repositories used in a gitops workflow — representing an environment/infrastructure — without any CI pipelines, or with people pushing directly to the main branch, without creating pull requests first. Why is there such a gap between our code repositories and our environment repositories?

I think one of the reasons is that it’s not always the same people working on these two different kinds of repositories/codebases. You’ll often find software engineers working on “code repositories” and people with a more operational background working on the “environment repositories”. “devops” has been lost to the marketing/recruitment worlds, and SRE is not applied everywhere — yet. People lacking software engineering skills — or just without enough practice — don’t see CI as a mandatory prerequisite. Instead, they’ll focus on different priorities, such as maintaining the production systems.

Another reason is that the main GitOps tools don’t promote CI. At all. Try to find any mention of Continuous Integration in the documentation of Flux or Argo CD. Oh sure, you’ll find something in the contribution guides, explaining how CI is configured to build and test these tools themselves, but nothing on how you can use it for your own needs. For example, Flux has a feature to automatically create a new branch/commit whenever there is a new container image in a registry, along with a documentation guide to automatically create a pull request for it, but nothing on how to configure a continuous integration pipeline to validate it. The Flux homepage even promises that you can “just push to Git and Flux does the rest”. Well, almost ;-) Note that I have nothing against these tools; I’m just pointing out that by not at least mentioning the benefits of CI, they are contributing to the abandonment of CI in the GitOps world.

And last but not least, GitOps has often been compared and opposed to “CIOps”. I think this might confuse people into believing that you have to choose between CI and GitOps. But not pushing to your target environment from your CI pipeline doesn’t mean you can’t use CI in your GitOps workflow…

What are the impacts of not doing Continuous Integration?

If you don’t validate the changes before applying them, there is a high risk that these changes won’t apply. In this case, your GitOps Operator will keep retrying — and failing — until at some point you notice the issue, understand it, and fix it by adding a new commit to the main branch. As a result:

  • you will have a poor developer experience: people will push changes with their fingers crossed, hoping that nothing will break, and then they’ll have to understand the issue and fix it, knowing they are blocking their coworkers from pushing new changes.
  • if you’re pushing changes through Pull Requests and asking people to review them, you’re forcing them to manually validate the YAML syntax and expecting them to know the full Kubernetes and/or Helm API — because you’re asking them to do the work of a computer: running validity checks.
  • breaking the main branch will impact everybody working on it. They will have to help you fix it, before starting their own new features.

And so on. If you’re doing gitops without CI, you must have a few stories of broken main branches, and of how they can quickly turn into a nightmare.

The benefits of applying CI practices for GitOps

So let’s agree you want to change this situation. What will you gain if you start applying CI practices to your gitops workflow?

  • your reviewers can focus on “business logic/knowledge”: they won’t need to review the YAML syntax anymore, because this will be done by a tool such as yamllint.
  • you will get faster feedback on your changes: yes, the pipeline will most likely be faster than your reviewers.
  • you will notice issues sooner, so you’ll be able to fix them faster, without context switching, and more importantly without impact.
  • you will gain confidence in your tooling: no more prayers and sacrifices to the YAML god or the Kubernetes god!
  • it will be easier/safer to onboard new people: you won’t need to find people with 10+ years of Kubernetes experience to ensure they never make any mistake.
  • you will get a better developer experience: people won’t be afraid to push changes anymore — you might even see application developers contributing.

But surely, there can’t be only benefits of applying CI? Otherwise, everybody would do it. Well, everybody is already doing it in software engineering ;-)

But you’re right, there are two things that can prevent people from adopting CI in their GitOps workflow:

  • if you have a quick and easy fix, without CI you can get it merged in seconds — provided you have a coworker available to review it right away, and who trusts you. With CI, you’ll have to wait for the pipeline to finish, and if your pipeline is slow, that might take a long time. In fact, in this case the issue is not the practice, it’s the implementation. Getting it right is not that easy.
  • and people might get lost in the complexity of configuring their CI pipelines, spending too much time trying to get everything pixel-perfect and fighting against tools. They don’t have time for that, they have production systems to maintain. So… no CI.

The challenges of Continuous Integration for GitOps

Doing CI in the context of a GitOps workflow is not as easy as building a Go application or running unit tests — because you might have a complex setup, such as:

  • multiple target environments — Kubernetes clusters. Maybe with different versions of Kubernetes.
  • strict network policies — such as no external communication allowed to your Kubernetes API server.

So the first challenge is to decide from where you will run your CI pipelines: from a unique “central” CI platform, or from multiple CI systems — running inside each of your target environments/clusters. This decision is often influenced by your infrastructure and internal rules: what you can do and what you can’t.

The second challenge is directly related to the content of your gitops repositories: what is stored there, in which format, and more importantly how it is processed before being applied.

  • if you store raw Kubernetes manifests and applying them is just running some variation of kubectl apply, then it will be easy to run any kind of validation on these resources: YAML and Kubernetes linters, security scanners, and so on (see the sketch after this list).
  • at the opposite end, if you store high-level definitions which are specific to your GitOps Operator, and only it can process them to produce the final result, then you’ll have an interesting challenge to solve ;-) As you can guess, replicating the logic of the GitOps operator is not an easy task.
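
For the first case, the validation steps can be as short as this sketch (the tool choice is an assumption: yamllint and kubeconform are common picks, but any equivalent linter or schema validator works):

```yaml
# CI steps validating raw Kubernetes manifests (illustrative)
steps:
  - uses: actions/checkout@v2
  # syntax-level check: is it valid, well-formatted YAML?
  - run: yamllint manifests/
  # schema-level check: are these valid Kubernetes resources?
  - run: kubeconform -strict -summary manifests/
```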

So you might not be able to build a CI pipeline that validates your changes with 100% confidence. But most of the time, you won’t need to. Of course, having 100% confidence is great. But do you have 100% uptime? Do you need 100% uptime? For some services, 99.999% is good; for others it can be 99%, or sometimes even less for internal services. It’s the same for your CI pipeline: it depends on your context. If a bug slips through your validation checks from time to time, and you know that it would take you weeks to change your platform/CI to catch it, well…

How to start applying CI practices for GitOps

Start easy, and proceed step by step. If most of your configuration files are YAML files, start by validating them with a YAML linter, for example. It’s easy, fast, and gives immediate results.
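
As a starting point, a yamllint configuration can be as small as this sketch (the rule values below are just examples; tune them to your repository):

```yaml
# .yamllint: a minimal configuration, used by running `yamllint .`
# (the rule values are just examples)
extends: default
rules:
  line-length:
    max: 120
  indentation:
    spaces: 2
```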

The next step is the “offline” validation — meaning without a direct connection to the target environment. Ideally, you should be able to render your final resources at this point, but if you can’t, try to validate the high-level resources you store. If the tool that processes these resources doesn’t provide some kind of syntax-validation CLI, then it’s time to change tools ;-)
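
For example, if your repository uses Helm charts or Kustomize overlays, the offline validation might look like this sketch (chart, overlay, and file names are made up):

```yaml
# "offline" validation: render the final resources locally, then validate
# the output, without any connection to the target cluster
# (chart, overlay and file names are made up)
steps:
  - uses: actions/checkout@v2
  # if you use Helm: render the chart with the environment's values...
  - run: helm template my-release ./charts/my-app -f values/production.yaml > rendered.yaml
  # ...or, if you use Kustomize: build the overlay
  - run: kustomize build overlays/production > rendered.yaml
  # then validate the rendered resources against the Kubernetes schemas
  - run: kubeconform -strict rendered.yaml
```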

And then, you can work your way towards the “online” validation — which connects to the target environment to validate your local state. This is where you’ll find most of the complexity — and also most of the rewards. Note that if connecting to the real target environment is not possible, you might be able to use a fake one. For example, in the Kubernetes world, you can run kind, minikube, k3s, or loft to get a small temporary Kubernetes cluster and use it to validate your changes. Sure, it won’t be 100% identical to your target environment, but it will get you halfway there — until you get more time to do the rest of the journey.
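
A sketch of such an online validation with kind, using the Kubernetes server-side dry-run so nothing is actually persisted (the cluster name and paths are made up):

```yaml
# "online" validation against a small, disposable cluster (illustrative)
steps:
  - uses: actions/checkout@v2
  # spin up a temporary Kubernetes cluster...
  - run: kind create cluster --name ci-validation
  # ...and let the real API server validate the resources, without
  # persisting anything, thanks to the server-side dry-run
  - run: kubectl apply --dry-run=server -f manifests/
  # always clean up, even if the dry-run failed
  - run: kind delete cluster --name ci-validation
    if: always()
```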

As you add more validations, make sure that the time to run your pipeline doesn’t grow out of hand. For example, you can run linters in parallel. And if you have multiple target environments, you can either write multiple pipelines — one per environment — or a single pipeline with parallel stages for each environment, as in the sketch below.
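
Here is what the parallel-stages option can look like with a GitHub Actions matrix (a sketch; the environment names and commands are assumptions):

```yaml
# one pipeline, validating every target environment in parallel
# (illustrative GitHub Actions matrix; environment names are made up)
jobs:
  validate:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [staging, production]
    steps:
      - uses: actions/checkout@v2
      # render and validate the overlay of each environment, in parallel
      - run: kustomize build overlays/${{ matrix.environment }} | kubeconform -strict -
```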

Also, as you extend your pipeline(s) with more steps, you should ensure that new checks don’t report false positives. What’s worse than a slow pipeline is an unreliable one, which fails for no reason: people will just restart it without even looking at the result/logs, and sometimes force-merge.

Conclusion

If there are a few points you should remember from this article, they are the following:

  • don’t forget the common practices that have proved successful, just because you’re working on something new with a cool marketing name.
  • if you’re new to gitops, think about how you’ll validate your changes while selecting a gitops vendor/tool/operator. Have a look at Jenkins X, which has out-of-the-box pipelines to validate your changes, its own basic — but very functional — operator, and the ability to integrate with Flux or Argo CD.
  • it’s never too late to improve your workflow, and you don’t have to spend weeks on it to see results. Go step-by-step, iterate.
