Jenkins X Pipelines Internals Part 3 — Stages

This is the third part of a series of blog posts on the internals of the Jenkins X Pipelines. We’ll focus on the “stages” that compose a pipeline, and we’ll see how they are implemented using Tekton.

Vincent Behar
7 min read · Feb 17, 2020

In the first part of this series, we walked through everything that happens in the cluster, from the incoming GitHub WebHook event to a running Tekton pipeline. In the second part, we talked about the “Meta Pipeline”, which is responsible for converting your Jenkins X Pipeline into a Tekton Pipeline. Now it’s time to dive into the internals of the Tekton pipelines, starting with how the Jenkins X Stages that compose a pipeline are implemented.

So a pipeline has stages, and each stage has steps. But why do we need stages? Well, for basic pipelines you might not really need them if you only have a couple of sequential steps to run. But if you want to run more complex pipelines, with parallelism for example, then you’ll need the power of the stages.

First, a few pointers:

As you can see, a stage has a name and a list of steps, but it can also have embedded stages to run sequentially or in parallel. We’ll start simple, with a basic pipeline that has a single stage and a single step:

    buildPack: none
    pipelineConfig:
      pipelines:
        pullRequest:
          pipeline:
            stages:
            - name: unit-tests
              steps:
              - name: unit-tests
                command: make test
                image: golang:1.13
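To make the shape of a stage more concrete, here is a minimal Python sketch of the data structure described above. It is only an illustration, not the Jenkins X implementation: the class names, the `stages` and `parallel` field names, and the `leaf_stages` helper are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    name: str
    command: str
    image: str


@dataclass
class Stage:
    name: str
    # A stage is assumed to carry either steps or nested stages.
    steps: List[Step] = field(default_factory=list)
    stages: List["Stage"] = field(default_factory=list)    # nested, run sequentially
    parallel: List["Stage"] = field(default_factory=list)  # nested, run in parallel


def leaf_stages(stage: Stage) -> List[str]:
    """Return the names of the stages that directly hold steps."""
    if stage.steps:
        return [stage.name]
    names: List[str] = []
    for child in stage.stages + stage.parallel:
        names.extend(leaf_stages(child))
    return names


# The single-stage, single-step pipeline from the YAML above.
pipeline = Stage(
    name="unit-tests",
    steps=[Step(name="unit-tests", command="make test", image="golang:1.13")],
)

# A hypothetical parent stage wrapping two parallel children.
nested = Stage(
    name="checks",
    parallel=[
        Stage(name="lint", steps=[Step("lint", "make lint", "golang:1.13")]),
        Stage(name="test", steps=[Step("test", "make test", "golang:1.13")]),
    ],
)

print(leaf_stages(pipeline))
print(leaf_stages(nested))
```

Only the stages that hold steps end up doing real work, which is why the sketch collects the leaves.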

As we saw in the previous blog post, stages are converted into Tekton Tasks, which are then converted into Pods. We can confirm that by running the jx get build pods -r yourGithubRepo command which lists the pods for the given GitHub repository.

Output of the `jx get build pods` command

In the output, we can see 2 pods: 1 for the meta-pipeline task (with the “meta-” prefix) and 1 for our task. If we inspect our task’s pod (using the kubectl get pod xxx -o yaml command, for example), we can see that it has a few containers:

There are a few init containers as well:

And some volumes, including:

  • workspace — an “emptyDir” volume, which is mounted on every container at /workspace
  • home — another “emptyDir” volume, which is mounted on every container at /builder/home
  • tools — another “emptyDir” volume, which is mounted on every container at /builder/tools
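If you would rather not read the full pod YAML, a small script can pull out just the container and volume names. The sketch below works on the JSON form of a pod (as returned by kubectl get pod xxx -o json) and only relies on the standard spec.initContainers, spec.containers and spec.volumes fields. The sample pod is a trimmed, hypothetical stand-in, and the init container name in it is a placeholder.

```python
from typing import Any, Dict, List


def summarize_pod(pod: Dict[str, Any]) -> Dict[str, List[str]]:
    """Extract the init container, container and volume names from a pod object."""
    spec = pod.get("spec", {})
    return {
        "initContainers": [c["name"] for c in spec.get("initContainers", [])],
        "containers": [c["name"] for c in spec.get("containers", [])],
        "volumes": [v["name"] for v in spec.get("volumes", [])],
    }


# Hypothetical, heavily trimmed pod object used only to exercise the function.
sample_pod = {
    "spec": {
        "initContainers": [{"name": "example-init"}],  # placeholder name
        "containers": [{"name": "step-git-merge"}, {"name": "step-unit-tests"}],
        "volumes": [
            {"name": "workspace", "emptyDir": {}},
            {"name": "home", "emptyDir": {}},
            {"name": "tools", "emptyDir": {}},
        ],
    }
}

summary = summarize_pod(sample_pod)
for kind, names in summary.items():
    print(kind, names)
```

Against a real cluster, you could feed it the parsed output of the kubectl command (for example with json.load on standard input) instead of the sample dictionary.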

The init containers are inserted in the pod by Tekton’s taskrun Controller — more specifically by the MakePod function. They are used to initialize resources used by the Task’s steps:

  • The credentials initializer is used to write git and/or docker credentials files in the home shared volume, which will be available for all further steps. It is called with the -basic-git=knative-git-user-pass=https://github.com argument, which means it will retrieve the Git credentials for GitHub from the knative-git-user-pass Kubernetes Secret. This Secret comes from the Tekton Helm chart for Jenkins X and is configured in your dev environment Git repository.
  • The working dir initializer is used to create the directory where our Git repository will be cloned, by running a simple mkdir -p /workspace/source in our case.
  • The tools initializer is used to copy the entrypoint binary from its container image into the tools shared volume, to make it available for all further steps. It will then be used as the container entry-point for all the steps — more on that in a later blog post.
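The credentials initializer argument shown above packs two values into one flag: the name of the Kubernetes Secret and the Git host it applies to. The following Python sketch shows one way to split it. It is not Tekton's own parsing code; the function and type names are made up for illustration.

```python
from typing import NamedTuple


class BasicGitCredential(NamedTuple):
    secret_name: str
    url: str


def parse_basic_git_flag(argument: str) -> BasicGitCredential:
    """Split '-basic-git=<secret>=<url>' into its secret name and URL."""
    flag, _, value = argument.partition("=")
    if flag != "-basic-git" or not value:
        raise ValueError(f"not a -basic-git argument: {argument!r}")
    secret_name, separator, url = value.partition("=")
    if not separator or not secret_name or not url:
        raise ValueError(f"expected <secret>=<url>, got: {value!r}")
    return BasicGitCredential(secret_name=secret_name, url=url)


credential = parse_basic_git_flag(
    "-basic-git=knative-git-user-pass=https://github.com"
)
print(credential.secret_name)
print(credential.url)
```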

The non-init containers are defined by Jenkins X. We can see that by looking at the task definition — using the kubectl get taskruns.tekton.dev xxx -o yaml command for example — which has the following steps:

So that explains 2 of the 3 containers:

But what about the step-git-source-... step/container? In fact, it comes from Tekton, which interprets the “input resource” defined by Jenkins X in the Task:

    inputs:
      resources:
      - name: workspace
        targetPath: source
        type: git

The git resource type is converted to a Tekton GitResource. Tekton resources can implement a GetInputTaskModifier function to modify the task on which they are defined. In our case, the GitResource is prepending a step to run the git-init command.
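The idea of a resource modifying the task it is attached to can be sketched in a few lines of Python. This is a simplified model of the behaviour described above, not Tekton's Go code: the classes, the steps_to_prepend method and the step name are assumptions, and the real git-init arguments are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    name: str
    command: List[str]


@dataclass
class Task:
    steps: List[Step] = field(default_factory=list)


@dataclass
class GitResource:
    name: str
    target_path: str

    def steps_to_prepend(self) -> List[Step]:
        # Real arguments (URL, revision, path) are intentionally omitted here.
        return [Step(name=f"git-source-{self.name}", command=["git-init"])]


def apply_input_resources(task: Task, resources: List[GitResource]) -> Task:
    """Return a new task whose steps start with the ones added by its inputs."""
    prepended: List[Step] = []
    for resource in resources:
        prepended.extend(resource.steps_to_prepend())
    return Task(steps=prepended + task.steps)


user_task = Task(steps=[Step(name="unit-tests", command=["make", "test"])])
final_task = apply_input_resources(
    user_task, [GitResource(name="workspace", target_path="source")]
)
print([step.name for step in final_task.steps])
```

The user-defined task is left untouched; the resource only contributes extra steps that run before it.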

It is interesting to have a look at the implementation of the git-init command because it doesn’t perform a basic git clone operation, but instead uses an optimized git fetch with the --depth=1 flag to retrieve only a single commit. It won’t retrieve the other branches or tags either: it’s up to you to retrieve them if you need them.
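To give a feel for what a shallow fetch looks like compared to a plain clone, here is a Python sketch that builds an equivalent sequence of git commands. It approximates the approach and is not a copy of the git-init implementation; the function name, the repository URL and the target path are placeholders.

```python
from typing import List


def shallow_fetch_commands(url: str, revision: str, path: str) -> List[List[str]]:
    """Build git commands that fetch a single commit instead of cloning."""
    return [
        ["git", "init", path],
        ["git", "-C", path, "remote", "add", "origin", url],
        # --depth=1 limits the history to one commit for the given revision.
        ["git", "-C", path, "fetch", "--depth=1", "origin", revision],
        ["git", "-C", path, "reset", "--hard", "FETCH_HEAD"],
    ]


commands = shallow_fetch_commands(
    url="https://github.com/example-org/example-repo.git",  # hypothetical repo
    revision="master",
    path="/workspace/source",
)
for command in commands:
    print(" ".join(command))
```

If a later step needs tags or more history, it can fetch them itself, for example with git fetch --tags origin or git fetch --unshallow.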

This means that as a user of Tekton, Jenkins X only requires a git workspace and lets Tekton handle the “git clone” operation. Jenkins X is then responsible for performing the right “checkout”, because it requires specific merge logic to merge the Pull Request branch commits on top of the master branch. See the previous blog post on the meta pipeline for more details.

Multiple stages

What happens if we use multiple stages instead of a single one?

    buildPack: none
    pipelineConfig:
      pipelines:
        pullRequest:
          pipeline:
            stages:
            - name: unit-tests-1
              steps:
              - name: unit-tests
                command: make test
                image: golang:1.13
            - name: unit-tests-2
              steps:
              - name: unit-tests
                command: make test
                image: golang:1.13

This Jenkins X Pipeline will result in 2 tasks — 1 per stage — and so 2 pods.

The pod for the first stage/task has 6 containers:

  • step-create-dir-workspace-b9c69 which runs the mkdir -p source command.
  • step-git-source-githubOrg-repoName-pr-1-vzkpx which runs the git-init command.
  • step-git-merge which runs the jx step git merge command.
  • step-unit-tests — our own step
  • step-source-mkdir-githubOrg-repoName-pr-1-c2lhw which runs the mkdir -p /pvc/unit-tests-1/workspace command.
  • step-source-copy-githubOrg-repoName-pr-1-chjlp which runs the cp -r source/. /pvc/unit-tests-1/workspace command.

The pod for the second stage/task has only 3 containers:

  • step-create-dir-workspace-p76f7 which runs the mkdir -p /workspace/source command.
  • step-source-copy-workspace-5gc5n which runs the cp -r /pvc/unit-tests-1/workspace/. /workspace/source command.
  • step-unit-tests — our own step

Both pods also have a Kubernetes Persistent Volume Claim (“PVC”) mounted at /pvc.

Why are our 2 pods so different, when the stage from which they are built is the same? If we inspect our 2 tasks, we can see that they are almost the same, except that the first one has the following output resource declared:

    outputs:
      resources:
      - name: workspace
        targetPath: source
        type: git

So Jenkins X will just ask Tekton to bind the output of the first task to the input of the second task — this is done in the stageToTask transformation function. On the Tekton side, this is handled by the AddOutputResources function which internally uses a PVC to store the workspace content.

When you have 2 consecutive stages, they are converted into 2 consecutive tasks, which the Tekton pipelinerun Controller schedules one after the other. The logic for it is in the PipelineRunState's GetNextTasks function. This implies that the second pod will only be created after the first pod has completed. This is why the only way to preserve data for the duration of the pipeline is to use a persistent volume.
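The scheduling rule can be illustrated with a short Python sketch: a task becomes eligible only once every task it runs after has succeeded. This is a simplified model written for this post, not a port of GetNextTasks; the function name, the dependency map and the status strings are assumptions.

```python
from typing import Dict, List


def next_tasks(run_after: Dict[str, List[str]], status: Dict[str, str]) -> List[str]:
    """Return the tasks that have not started and whose dependencies succeeded."""
    ready: List[str] = []
    for task, dependencies in run_after.items():
        if task in status:
            continue  # already running or finished
        if all(status.get(dep) == "Succeeded" for dep in dependencies):
            ready.append(task)
    return ready


# Two consecutive stages: the second task runs after the first one.
run_after = {"unit-tests-1": [], "unit-tests-2": ["unit-tests-1"]}

print(next_tasks(run_after, {}))
print(next_tasks(run_after, {"unit-tests-1": "Running"}))
print(next_tasks(run_after, {"unit-tests-1": "Succeeded"}))
```

While the first task is still running, nothing is eligible, so no pod exists yet for the second stage.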

Tekton will take care of inserting extra steps in your tasks to copy everything from your workspace to the persistent volume, and then from the persistent volume into the workspace. These are the step-source-copy-xxx steps, running cp commands. The PVC itself is managed by the pipelinerun Controller using the ArtifactStorage, and it has the same lifecycle as the pipeline. The PVC settings are retrieved from the config-artifact-pvc ConfigMap — see the createPVC function. This ConfigMap is created by the Tekton Helm chart for Jenkins X — and it defaults to using a 5Gi volume. You can change the volume size in the env/tekton/values.tmpl.yaml file of your dev environment Git repository.
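The copy steps observed in the two pods follow a simple pattern, which the Python sketch below reproduces. It is a reconstruction from the container lists shown earlier, not Tekton's actual logic: the function name and the "<your steps>" placeholder are made up, and the behaviour for three or more stages is an assumption extrapolated from the two-stage case.

```python
from typing import Dict, List


def workspace_commands(stages: List[str]) -> Dict[str, List[str]]:
    """List, per stage, the commands run around the user's own steps."""
    plan: Dict[str, List[str]] = {}
    for index, stage in enumerate(stages):
        commands: List[str] = []
        if index == 0:
            # The first task gets its workspace from Git.
            commands += ["mkdir -p source", "git-init", "jx step git merge"]
        else:
            # Later tasks restore the workspace saved by the previous stage.
            previous = stages[index - 1]
            commands += [
                "mkdir -p /workspace/source",
                f"cp -r /pvc/{previous}/workspace/. /workspace/source",
            ]
        commands.append("<your steps>")
        if index < len(stages) - 1:
            # Every task but the last saves its workspace to the PVC.
            commands += [
                f"mkdir -p /pvc/{stage}/workspace",
                f"cp -r source/. /pvc/{stage}/workspace",
            ]
        plan[stage] = commands
    return plan


plan = workspace_commands(["unit-tests-1", "unit-tests-2"])
for stage, commands in plan.items():
    print(stage)
    for command in commands:
        print("  " + command)
```

For the two stages used in this post, the sketch yields 6 commands for the first pod and 3 for the second, matching the container counts listed earlier.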

So if you want to split your pipeline’s steps into multiple stages, remember that it will introduce overhead for persisting the workspace between each stage.

In the next blog post, we’ll explore how the steps are implemented.
