Integrating OpenShift Pipelines (CI) with GitOps (CD)

Introduction

When organizations adopt GitOps there are many challenges to face: how do I manage secrets in git, what directory structure should I use for my repos, and so on. One of the more vexing challenges is how to integrate CI processes with GitOps, aka CD.

A CI pipeline is largely a synchronous process that runs from start to end: we compile our source code, build an image, push it out to deployments, run integration tests, etc. in a continuous flow. Conversely, GitOps follows an event-driven flow: a change in git or in cluster state drives the reconciliation process to synchronize the state. As anyone who has worked with messaging systems knows, getting synchronous and asynchronous systems to work well together can be akin to herding cats, hence the vexing challenge.

In this blog article I will cover the three approaches that are most often used, along with the pros and cons of each and the use cases where each makes sense. I will cover some of the implementation details through the lens of OpenShift Pipelines (Tekton) and OpenShift GitOps (Argo CD).

A quick note on terminology: the shorthand acronyms CI (Continuous Integration) and CD (Continuous Deployment) will be used throughout the remainder of this article. CI refers to pipelines for compiling applications, building images, running integration tests, and so on, covered by tools like Jenkins, Tekton, GitHub Actions, etc. CD refers to GitOps tools like Argo CD, Flux or RHACM that are used to deploy applications.

It’s important to keep this distinction in mind since tools like Jenkins are traditionally referred to as CI/CD tools; however, when referencing CD here the intent is specifically GitOps tools, which Jenkins is not.

Approaches

As mentioned there are three broad approaches typically used to integrate CI with CD.

1. CI Owned and Managed. In this model the CI tool completely owns and manages new deployments on its own, though GitOps can still manage the provisioning of the manifests. When used with GitOps this approach often uses floating tags (aka dev, test, etc.) or has the GitOps tool not track image changes so the CI tool can push new deployments without impacting GitOps.

CI Managed

The benefit of this approach is that it continues to follow the traditional CI/CD approach that organizations have gotten comfortable with using tools like Jenkins thus reducing the learning curve. As a result many organizations start with this approach at the earliest stages of their GitOps journey.

The drawback of this model is that it runs counter to the GitOps philosophy that git is the source of truth. With floating tags there is no precision as to which image is actually being used at any given moment, and if you opt to ignore image references, what’s in git is definitely not what’s in the cluster since image references are never updated.

As a result I often see this used in organizations which are new to GitOps and have existing pipelines they want to reuse; it’s essentially step 1 of the GitOps journey for many people.

2. CI Owned, CD Participates. Here the CI tool owns and fully manages the deployment of new images but engages the GitOps tool to do the actual deployment; once the GitOps process has completed, the CI tool validates the update. From an implementation point of view, the CI pipeline updates the manifests, say a Deployment, in git with a new image tag along with a corresponding commit. At this point the pipeline triggers and monitors the GitOps deployment via the GitOps tool’s APIs, keeping the entire process managed by the CI pipeline from start to end in a synchronous fashion.

CI Owned, CD Participates

The good part here is that it fully embraces GitOps in the sense that what is in git is what is deployed in the cluster. Another benefit is that it maintains a start-to-end pipeline which keeps the process easy to follow and troubleshoot.

The negative is the additional complexity of integrating the pipeline with the GitOps tool, essentially integrating a synchronous process (CI) with what is inherently an event-driven asynchronous activity (CD) can be challenging. For example, the GitOps tool may already be deploying a change when the CI tool attempts to initiate a new sync and the call fails.

This option makes sense in environments (dev/test) where there are no asynchronous gating requirements (i.e. human approval) and the organization has a desire to fully embrace GitOps.

3. CI Triggered, CD Owned. In this case the CI tool manages the build of the application and image, but for deployment it triggers an asynchronous event which causes the GitOps tool to perform the deployment. This can be done in a variety of ways, including a Pull Request (PR) or a fire-and-forget commit in git, at which point CD owns the deployment. Once the CD process has completed the deployment, it can trigger additional pipeline(s) to perform post-deployment tasks such as integration testing, notifications, etc.

CI Triggered, CD Owned

When looking at this approach the benefit is we avoid the messiness of integrating the two tools directly as each plays in their own swimlane. The drawback is the pipeline is no longer a simple start-to-end process but turns into a loosely coupled asynchronous event-driven affair with multiple pipelines chained together which can make troubleshooting more difficult. Additionally, the sync process can happen for reasons unrelated to an updated deployment so chained pipelines need to be able to handle this.

Implementation

In this section we will review how to implement each approach. Since I am using Red Hat OpenShift, my focus here will be on OpenShift Pipelines (Tekton) and OpenShift GitOps (Argo CD); however, the techniques should be broadly applicable to other toolsets. Additionally, I am deliberately not looking at tools which fall outside the purview of Red Hat supported products. So while tools like Argo Events, Argo Rollouts and Argo CD Image Updater are very interesting, they are not currently supported by Red Hat and thus not covered here.

Throughout the implementation discussion we will be referencing a Product Catalog application. This application is a three-tier application, as per the diagram below, that consists of a Node.js single page application (SPA) running in an NGINX container connecting to a Java Quarkus API application, which in turn performs CRUD actions against a MariaDB database.

Product Catalog Topology

As a result of this architecture there are separate pipelines for the client and server components, but they share many elements. The implementation approaches discussed below are encapsulated in sub-pipelines which are invoked as needed by the client and server pipelines for reuse. I’ve opted for sub-pipelines in order to better show the logic in demos, but the same logic could just as easily be encapsulated in a custom Tekton task.

From a GitOps perspective we have two GitOps instances deployed: a cluster-scoped instance and a namespace-scoped instance. The cluster-scoped instance is used to configure the cluster as well as resources required by tenants which need cluster-admin rights, things like namespaces, quotas and operators. In this use case the cluster-scoped instance deploys the environment namespaces (product-catalog-dev, product-catalog-test and product-catalog-prod) as well as the Product Catalog team’s namespace-scoped GitOps instance.

This is being mentioned because you will see references to two different repos, which could be confusing, specifically the following:

1. cluster-config. Managed by the Platform (Ops) team, this is the repo with the manifests deployed by the cluster-scoped instance, including the product-catalog team’s namespaces and GitOps instance in the tenants folder.
2. product-catalog. Managed by the application (product-catalog) team, this contains the manifests deployed by the namespace-scoped instance. It deploys the actual application and is the repo containing the application image references.

If you are interested in seeing more information about how I organize my GitOps repos you can view my standards document here.
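To make the two-instance split concrete, here is a minimal sketch of an Application that the cluster-scoped instance might use to deploy the tenant resources; the repo URL, path and names below are illustrative placeholders, not taken from the actual repos:

```yaml
# Illustrative Application managed by the cluster-scoped Argo CD instance.
# The repo URL, path and names are placeholders for this example.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: product-catalog-tenant
  namespace: openshift-gitops
spec:
  project: default
  source:
    repoURL: https://github.com/example/cluster-config.git
    targetRevision: main
    # Tenant resources: namespaces, quotas and the namespaced GitOps instance
    path: tenants/product-catalog
  destination:
    server: https://kubernetes.default.svc
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
```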

Common Elements

In all three approaches we will need to integrate with git for authentication purposes. In OpenShift Pipelines you can easily integrate a git token with pipelines via a secret, as per the docs. The first step is creating the secret and annotating it; below is an example for GitHub:

apiVersion: v1
data:
  email: XXXXXXXXXXXX
  password: XXXXXXXXX
  username: XXXXXXXXX
kind: Secret
metadata:
  annotations:
    tekton.dev/git-0: https://github.com
  name: github
  namespace: product-catalog-cicd
type: kubernetes.io/basic-auth

The second thing we need to do is link the secret to the pipeline service account. Since this account is created and managed by the Pipelines operator, I prefer doing the linking after the fact rather than overwriting the account with yaml deployed from git. This is done using a PostSync hook in Argo CD (aka a Job). Below is the basic Job; the serviceaccount and role needed to go with it are available here.

apiVersion: batch/v1
kind: Job
metadata:
  name: setup-local-credentials
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
      - image: registry.redhat.io/openshift4/ose-cli:v4.9
        command:
          - /bin/bash
          - -c
          - |
            echo "Linking github secret with pipeline service account"
            oc secrets link pipeline github
        imagePullPolicy: Always
        name: setup-local-credentials
      serviceAccount: setup-local-credentials
      serviceAccountName: setup-local-credentials
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      terminationGracePeriodSeconds: 30

CI Owned and Managed

This is the traditional approach that has been around since dinosaurs roamed the earth (T-Rex was a big fan of Hudson), so we will not go into great detail on this but cover some of the OpenShift Pipelines specifics from an implementation point of view.

In OpenShift, or for that matter Kubernetes, the different environments (dev/test/prod) will commonly be in different namespaces. OpenShift Pipelines creates a pipeline service account that the various pipelines use by default when running. In order to allow the pipeline to interact with the different environments in their namespaces, we need to give the pipeline SA the appropriate role to do so. Here is an example of a RoleBinding in the development environment (product-catalog-dev namespace) giving edit rights to the pipeline service account in the cicd namespace where the pipeline is running.

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cicd-pipeline-edit
  namespace: product-catalog-dev
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
- kind: ServiceAccount
  name: pipeline
  namespace: product-catalog-cicd

This would need to be done for each environment that the pipeline needs to interact with. Also note that I am taking the easy way and using the OOTB edit ClusterRole in OpenShift; more security-conscious organizations may wish to define a Role with more granular permissions.

If you are using floating tags in an enterprise registry (e.g. dev or test) you will need to set the imagePullPolicy to Always to ensure the new image gets deployed on a rollout. At this point a new deployment can be triggered in a task simply by running oc rollout restart.
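To sketch what this looks like in practice, a deploy task in this model can be as simple as a rollout restart; the task name, image tag and parameters below are illustrative assumptions:

```yaml
# Hypothetical Tekton Task for the CI-owned model: with a floating tag and
# imagePullPolicy: Always, restarting the Deployment pulls the new image.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: rollout-restart
spec:
  params:
    - name: deployment
      type: string
    - name: namespace
      type: string
  steps:
    - name: restart
      image: registry.redhat.io/openshift4/ose-cli:v4.9
      script: |
        #!/usr/bin/env bash
        set -e
        oc rollout restart deployment/$(params.deployment) -n $(params.namespace)
        # Wait for the rollout so the pipeline fails if the deployment does
        oc rollout status deployment/$(params.deployment) -n $(params.namespace) --timeout=300s
```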

CI Owned, CD Participates

As discussed previously, in this approach the CI pipeline manages the flow from start-to-finish but instead of doing the deployment itself it defers it to CD, aka GitOps. To accomplish this, the pipeline will clone the manifest repo with all of the yaml managed by GitOps, update the image tag for the new image and then commit it back to git. It will then wait for the GitOps tool to perform the deployment and validate the results. This flow is encapsulated in the following pipeline:

This sub-pipeline is invoked by the server and client pipelines via the tkn CLI, since we want to trigger this pipeline and wait for it to complete, maintaining a synchronous process. Here is an example that calls this pipeline to deploy a new server image in the dev environment:

tkn pipeline start gitops-deploy \
  --showlog --use-param-defaults \
  --prefix-name=server-gitops-deploy-dev \
  --param application=server \
  --param environment=dev \
  --param tag=$(tasks.generate-id.results.short-commit)-$(tasks.generate-id.results.build-uid) \
  --workspace name=manifest-source,claimName=manifest-source

Let’s look at this gitops deployment sub-pipeline in a little more detail for each individual task.

1. acquire-lease. Since this pipeline is called by other pipelines there is a lease which acts as a mutex to ensure only one instance of this pipeline can run at a time. The details are not relevant to this article however for those interested the implementation was based on an article found here.

2. clone. This task clones the manifest yaml files into a workspace to be used in subsequent steps.

3. update-image. There are a variety of ways to update the image reference depending on how you are managing yaml in GitOps. If GitOps is deploying raw yaml, you may need something like yq to patch a deployment. If you are using a helm chart, yq again could help you update the values.yaml. In my case I am using kustomize, which has the capability to override the image tag in an overlay with the following command:

kustomize edit set image <image-name>=<new-image>:<new-image-tag>
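For example, after running the command against a dev overlay, the overlay's kustomization.yaml would contain an images override along these lines; the image name and tag here are illustrative:

```yaml
# kustomization.yaml after "kustomize edit set image ..." (illustrative values)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
images:
  - name: quay.io/gnunn/server
    newName: quay.io/gnunn/server
    newTag: abc1234-1
```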

In this pipeline we have a kustomize task for updating the image reference. It takes parameters for the image name, the new image name and tag, as well as the path to the kustomize overlay. In my case the overlay is associated with the environment and cluster; you can see an example here for the dev environment in the home cluster.
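A minimal sketch of such a task might look like the following; the task name, container image and parameter names are assumptions for illustration:

```yaml
# Sketch of an update-image Task using kustomize; names are illustrative.
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: kustomize-set-image
spec:
  workspaces:
    - name: manifest-source
  params:
    - name: image
      type: string     # e.g. quay.io/gnunn/server
    - name: new-tag
      type: string
    - name: overlay-path
      type: string     # e.g. clusters/home/overlays/dev
  steps:
    - name: set-image
      image: quay.io/example/kustomize:latest  # any image with kustomize on the path
      workingDir: $(workspaces.manifest-source.path)/$(params.overlay-path)
      script: |
        #!/usr/bin/env bash
        set -e
        kustomize edit set image $(params.image)=$(params.image):$(params.new-tag)
```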

4. commit-change. Once we have updated the image we need to commit the change back to git using a git task and running the appropriate git commands. In this pipeline the following commands are used:

if git diff --exit-code;
then
  echo "No changes staged, skipping add/commit"
else
  echo "Changes made, committing"
  git config --global user.name "pipeline"
  git config --global user.email "pipelines@nomail.com"
  git add clusters/$(params.cluster)/overlays/$(params.environment)/kustomization.yaml
  git commit -m 'Update image in git to quay.io/gnunn/$(params.application):$(params.tag)'
  echo "Running 'git push origin HEAD:$(params.git_revision)'"
  git push origin HEAD:$(params.git_revision)
fi

One thing to keep in mind is that it is possible for the pipeline to be executed when no code has changed, for example when testing the pipeline with the same image reference. The if statement here exists as a guard for that case.

5. gitops-deploy. This is where you trigger OpenShift GitOps to perform the deployment. In order to accomplish this, the pipeline needs to use the argocd CLI to interact with OpenShift GitOps, which in turn requires a token before the pipeline runs.

Since we are deploying everything with a cluster-scoped GitOps instance, including the namespaced GitOps instance that is handling the deployment here, we can have the cluster-scoped GitOps create a local account and then generate a corresponding token for that account in the namespaced GitOps instance. A Job running as a PostSync hook does the work here: it checks if the local account already exists and, if not, creates it along with a token which is stored as a secret in the CICD namespace for the pipeline to consume.

apiVersion: batch/v1
kind: Job
metadata:
  name: create-pipeline-local-user
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
      - image: registry.redhat.io/openshift-gitops-1/argocd-rhel8:v1.4.2
        command:
          - /bin/bash
          - -c
          - |
            export HOME=/home/argocd
            echo "Checking if pipeline account already there..."
            HAS_ACCOUNT=$(kubectl get cm argocd-cm -o jsonpath={.data."accounts\.pipeline"})
            if [ -z "$HAS_ACCOUNT" ];
            then
                echo "Pipeline account doesn't exist, adding"
                echo "Getting argocd admin credential..."
                if kubectl get secret argocd-cluster;
                then
                  # Create pipeline user
                  kubectl patch cm argocd-cm --patch '{"data": {"accounts.pipeline": "apiKey"}}'
                  # Update password
                  PASSWORD=$(oc get secret argocd-cluster -o jsonpath="{.data.admin\.password}" | base64 -d)
                  argocd login --plaintext --username admin --password ${PASSWORD} argocd-server
                  TOKEN=$(argocd account generate-token --account pipeline)
                  kubectl create secret generic argocd-env-secret --from-literal=ARGOCD_AUTH_TOKEN=${TOKEN} -n ${CICD_NAMESPACE}
                else
                  echo "Secret argocd-cluster not available, could not interact with API"
                fi
            else
                echo "Pipeline account already added, skipping"
            fi
        env:
        # The CICD namespace where the token needs to be deployed to
        - name: CICD_NAMESPACE
          value: ""
        imagePullPolicy: Always
        name: create-pipeline-local-user
      serviceAccount: argocd-argocd-application-controller
      serviceAccountName: argocd-argocd-application-controller
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      terminationGracePeriodSeconds: 30

Since the argocd-argocd-application-controller service account already has access to the various namespaces, we simply reuse it for the Job, which needs to create a secret in the product-catalog-cicd namespace; again, more security-conscious organizations may wish to use more granular permissions.

Finally we also need to give this local account, pipeline, appropriate RBAC permissions in the namespaced GitOps instance as well. The following roles are defined in the ArgoCD CR for the namespaced instance:

spec:
  ...
  rbac:
    defaultPolicy: 'role:readonly'
    policy: |
      p, role:pipeline, applications, get, apps-product-catalog/*, allow
      p, role:pipeline, applications, sync, apps-product-catalog/*, allow
      g, product-catalog-admins, role:admin
      g, system:cluster-admins, role:admin
      g, pipeline, role:pipeline
    scopes: '[accounts,groups]'

Once we have the integration in place we can use a task in the pipeline to trigger a sync in GitOps via the argocd CLI. Unfortunately this part can be a bit tricky depending on how you have GitOps configured, given it is an asynchronous process and timing issues can occur. For example, if you are using webhooks with GitOps it’s quite possible that the deploy is already in progress and trying to trigger it again will fail.

In this pipeline we took the example Argo CD Sync and Wait task from Tekton Hub and modified it to make it somewhat more resilient. The key change was having the task execute argocd app wait first and then check whether the image was already updated before performing an explicit sync. The full task is available here, but here is the portion doing the work:

if [ -z "$ARGOCD_AUTH_TOKEN" ]; then
  yes | argocd login "$ARGOCD_SERVER" --username="$ARGOCD_USERNAME" --password="$ARGOCD_PASSWORD";
fi
# Application may already be syncing due to webhook
echo "Waiting for automatic sync if it was already triggered"
argocd app wait "$(params.application-name)" --health "$(params.flags)"
echo "Checking current tag in namespace $(params.namespace)"
CURRENT_TAG=$(oc get deploy $(params.deployment) -n $(params.namespace) -o jsonpath="{.spec.template.spec.containers[$(params.container)].image}" | cut -d ":" -f2)
if [ "$CURRENT_TAG" = "$(params.image-tag)" ]; then
  echo "Image has been synced, exiting"
  exit 0
fi
echo "Running argocd sync..."
argocd app sync "$(params.application-name)" --revision "$(params.revision)" "$(params.flags)"
argocd app wait "$(params.application-name)" --health "$(params.flags)"
CURRENT_TAG=$(oc get deploy $(params.deployment) -n $(params.namespace) -o jsonpath="{.spec.template.spec.containers[$(params.container)].image}" | cut -d ":" -f2)
if [ "$CURRENT_TAG" = "$(params.image-tag)" ]; then
  echo "Image has been synced"
else
  echo "Image failed to sync, requested tag is $(params.image-tag) but current tag is $CURRENT_TAG"
  exit 1;
fi

Also note that this task validates that the required image was deployed and fails the pipeline if it was not deployed for any reason. I suspect this task will likely require some further tuning based on comments in this Argo CD issue.

CI Triggered, CD Owned

As a refresher, in this model the pipeline triggers the deployment operation in GitOps, but at that point the pipeline completes, with the actual deployment being owned by GitOps. This can be done via a Pull Request (PR), a fire-and-forget commit, etc., but in the case we will look at it is done via a PR to support gating requirements that need human approval, which is an asynchronous process.

This pipeline does require a secret for GitHub in order to create the PR; however, we simply reuse the same secret that was provided earlier.

The pipeline that provides this capability in the product-catalog demo is as follows:

This pipeline is invoked by the server and client pipelines via a webhook since we are treating this as an event.
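As a rough sketch, the webhook side can be wired up with Tekton Triggers along these lines; the listener, template and pipeline names are illustrative assumptions:

```yaml
# Illustrative EventListener that starts the PR pipeline on a webhook call.
apiVersion: triggers.tekton.dev/v1beta1
kind: EventListener
metadata:
  name: gitops-pr-listener
spec:
  serviceAccountName: pipeline
  triggers:
    - name: gitops-pr-trigger
      template:
        ref: gitops-pr-trigger-template
---
apiVersion: triggers.tekton.dev/v1beta1
kind: TriggerTemplate
metadata:
  name: gitops-pr-trigger-template
spec:
  resourcetemplates:
    - apiVersion: tekton.dev/v1beta1
      kind: PipelineRun
      metadata:
        generateName: gitops-pr-
      spec:
        pipelineRef:
          name: gitops-pr-deploy
```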

In this pipeline the following steps are performed:

1. clone. Clone the git repo of manifest yaml that GitOps is managing

2. branch. Create a new branch in the git repo to generate the PR from; in the product catalog I use push-<build-id> as the branch identifier.

3. patch. Update the image reference using the same kustomize technique that we did previously.

4. commit. Commit the change and push it in the new branch to the remote repo.

5. prod-pr-deploy. This task creates a pull request in GitHub; the GitHub CLI makes this easy to do:

gh pr create -t "$(params.title)" -b "$(params.body)"
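Wrapped in a Tekton task, this step might look like the following sketch; the container image and the way the token is surfaced are assumptions (gh reads its token from the GITHUB_TOKEN environment variable):

```yaml
# Hypothetical Task step creating the PR with the GitHub CLI.
steps:
  - name: create-pr
    image: quay.io/example/github-cli:latest  # any image with gh installed
    workingDir: $(workspaces.manifest-source.path)
    env:
      - name: GITHUB_TOKEN        # gh authenticates with this variable
        valueFrom:
          secretKeyRef:
            name: github
            key: password
    script: |
      #!/usr/bin/env bash
      set -e
      gh pr create -t "$(params.title)" -b "$(params.body)"
```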

One thing to note in the pipeline is that it passes in links to all of the gating requirements, such as image vulnerabilities in Quay and RHACS as well as static code analysis from SonarQube.

Once the PR is created the pipeline ends. When the application is synced by Argo CD, it runs a PostSync hook Job to start a new pipeline which runs the integration tests and sends a notification if the tests fail.

apiVersion: batch/v1
kind: Job
metadata:
  name: post-sync-pipeline
  generateName: post-sync-pipeline-
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - image: registry.access.redhat.com/ubi8
          command:
          - "curl"
          - "-X"
          - "POST"
          - "-H"
          - "Content-Type: application/json"
          - "--data"
          - "{}"
          - "http://el-server-post-prod.product-catalog-cicd:8080"
          imagePullPolicy: Always
          name: post-sync-pipeline
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      terminationGracePeriodSeconds: 30

Conclusion

In this article we reviewed three different approaches to integrating OpenShift Pipelines and OpenShift GitOps and then examined a concrete implementation of each approach.
