Updating Kustomize Version in ArgoCD

I love kustomize, particularly when paired with ArgoCD, and have found it a great way to reduce yaml duplication. As much as I love it, though, there have been some annoying bugs with it over the months, particularly in how it handles remote repositories.

For those not familiar with remote repositories, a kustomization can import bases and resources from a git repository rather than requiring them to be on your local file system. This makes it possible to develop a common set of kustomizations that can be re-used across an organization, which is essentially what we do in the Red Hat Canada Catalog repo where we share common components across our team. Here is an example of using a remote repository, where my cluster-config repo imports the cost management operator from the Red Hat Canada Catalog:

kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

bases:
- github.com/redhat-canada-gitops/catalog/cost-management-operator/overlays/default

patchesJson6902:
  - path: patch-source-and-name.yaml
    target:
      group: koku-metrics-cfg.openshift.io
      kind: KokuMetricsConfig
      name: instance
      version: v1beta1
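
As an aside, kustomize 4.x deprecates the bases field in favour of resources; the same remote import can be written like so:

kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

resources:
- github.com/redhat-canada-gitops/catalog/cost-management-operator/overlays/default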

This works really well but, as mentioned previously, bugs prevail: the format for referencing the git repository has worked or not worked in different ways across previous versions and, most annoyingly, importing a kustomization which in turn has bases nested more than one level deep in the repo will fail with an evalsymlink error. A lot of these issues were tied to the usage of go-getter.

Fortunately this all seems to have been fixed in the 4.x versions of kustomize with the dropping of go-getter; unfortunately, ArgoCD was still using 3.7.3 last time I checked. The good news is that it is easy enough to create your own version of the ArgoCD image and include whatever version of kustomize you want. The ArgoCD documentation goes through the options for including custom tools, however at the moment the operator only supports embedding new tools in an image.
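
If you are not sure which version of kustomize your ArgoCD instance is bundling, a quick sanity check is to run kustomize directly in the repo-server; the deployment name below assumes a stock install, an operator-managed instance will prefix it with the name of the ArgoCD CR:

oc exec deployment/argocd-repo-server -n argocd -- kustomize version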

As a result, the first step to using a custom version of kustomize (lol alliteration!) is to create the image through a Dockerfile:

FROM docker.io/argoproj/argocd:v1.7.12
 
# Switch to root for the ability to perform install
USER root
 
ARG KUSTOMIZE_VERSION=v4.0.1
 
# Install tools needed for your repo-server to retrieve & decrypt secrets, render manifests 
# (e.g. curl, awscli, gpg, sops)
RUN apt-get update && \
    apt-get install -y \
        curl \
        awscli \
        gpg && \
    apt-get clean && \
    curl -o /tmp/kustomize.tar.gz -L https://github.com/kubernetes-sigs/kustomize/releases/download/kustomize%2F${KUSTOMIZE_VERSION}/kustomize_${KUSTOMIZE_VERSION}_linux_amd64.tar.gz && \
    tar -xvf /tmp/kustomize.tar.gz -C /usr/local/bin && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
 
# Switch back to non-root user
USER argocd

Note that in the Dockerfile above I have chosen to overwrite the existing kustomize version. As per the ArgoCD Custom Tooling documentation, you can add multiple versions of kustomize and reference specific versions in your applications. However, I see my fix here as a temporary measure until the ArgoCD image catches up with kustomize, so I would prefer to keep my application yaml unencumbered with kustomize version references.

To build the image, simply run the following, substituting your own registry and repository for mine:

docker build . -t quay.io/gnunn/argocd:v1.7.12
docker push quay.io/gnunn/argocd:v1.7.12

Once we have the image, we can just update the ArgoCD CR that the operator uses to reference our image as per the example below:

apiVersion: argoproj.io/v1alpha1
kind: ArgoCD
metadata:
  name: example-argocd
  namespace: argocd
spec:
  image: quay.io/gnunn/argocd
  version: v1.7.12

OpenShift User Application Monitoring and Grafana the GitOps way!

Update: All of the work outlined in this article is now available as a kustomize overlay in the Red Hat Canada GitOps repo here.

Traditionally in OpenShift, the monitoring stack provided out-of-the-box (OOTB) was only available for cluster monitoring. Administrators could not configure it to support their own application workloads, necessitating the deployment of a separate monitoring stack (typically community prometheus and grafana). However this has changed in OpenShift 4.6, as the cluster monitoring operator now supports deploying a separate prometheus instance for application workloads.

One great capability provided by the OpenShift cluster monitoring is that it deploys Thanos to aggregate metrics from both the cluster and application monitoring stacks, thus providing a central point for queries. At this point in time you still need to deploy your own Grafana stack for visualizations, but I expect a future version of OpenShift will support custom dashboards right in the console alongside the default ones. The monitoring stack architecture for OpenShift 4.6 is shown in the diagram (click for architecture documentation) below:

Monitoring Architecture

In this blog entry we cover deploying the user application monitoring feature (super easy) as well as a Grafana instance (not super easy) using GitOps, specifically in this case with ArgoCD. This blog post is going to assume some familiarity with Prometheus and Grafana and will concentrate on the more challenging aspects of using GitOps to deploy everything.

The first thing we need to do is deploy the user application monitoring in OpenShift; this would typically be done as part of your cluster configuration. To do this, as per the docs, we simply need to deploy the following configmap in the openshift-monitoring namespace:

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true

You can see this in my GitOps cluster-config here. Once deployed, you should see the user monitoring components running in the openshift-user-workload-monitoring project.
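
A quick way to verify this from the CLI is to list the pods in that namespace; you should see prometheus-operator, prometheus-user-workload and thanos-ruler-user-workload pods come up:

oc get pods -n openshift-user-workload-monitoring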

Now that the user monitoring is up and running, we can configure the monitoring of our applications by adding a ServiceMonitor object to define the monitoring targets. This is typically done as part of the application deployment by application teams; it is a separate activity from the deployment of the user monitoring itself, which is done in the cluster configuration by cluster administrators. Here is an example that I have for my product-catalog demo that monitors my quarkus back-end:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: server
  namespace: product-catalog-dev
spec:
  endpoints:
  - path: /metrics
    port: http
    scheme: http
  selector:
    matchLabels:
      quarkus-prometheus: "true"

The ServiceMonitor above specifies that any kubernetes service in the same namespace as the ServiceMonitor which has the label quarkus-prometheus set to 'true' will have its metrics collected on the port named 'http' using the path '/metrics'. Of course, your application needs to be enabled for prometheus metrics, and most modern frameworks like quarkus make this easy. From a GitOps perspective, deploying the ServiceMonitor is just another yaml to deploy along with the application, as you can see in my product-catalog manifests here.
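
For illustration, here is a sketch of a service that this ServiceMonitor would select; the service name, pod selector and port numbers are hypothetical, the important pieces are the label and the port name:

apiVersion: v1
kind: Service
metadata:
  name: server                     # hypothetical name
  namespace: product-catalog-dev
  labels:
    quarkus-prometheus: "true"     # matched by the ServiceMonitor selector
spec:
  selector:
    app: server                    # hypothetical pod selector
  ports:
  - name: http                     # matched by the ServiceMonitor endpoint port
    port: 8080
    targetPort: 8080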

As an aside, please note that the user monitoring in OpenShift does not support the namespace selector in ServiceMonitor for security reasons; as a result, the ServiceMonitor must be deployed in the same namespace as the targets it defines. Thus if you have the same application in three different namespaces (say dev, test and prod) you will need to deploy the ServiceMonitor in each of those namespaces independently.

Now if I were to stop here it would hardly merit a blog post, however for most folks once they deploy the user monitoring the next step is deploying something to visualize the metrics, and in this example that will be Grafana. Deploying the Grafana operator via GitOps in OpenShift is somewhat involved since we will use the Operator Lifecycle Manager (OLM) to do it, and OLM is asynchronous: you push the Subscription and OperatorGroup, and OLM installs and deploys the operator in the background. From a GitOps perspective, managing the deployment of the operator and its Custom Resources (CRs) becomes tricky since the CRs cannot be installed until the operator's Custom Resource Definitions (CRDs) are installed.
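
To make the OLM part concrete, here is a sketch of what the Subscription might look like; the channel and catalog source are assumptions, so verify them against the operator catalog in your own cluster:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: grafana-operator
  namespace: product-catalog-monitor
spec:
  name: grafana-operator
  channel: alpha                            # assumed channel, verify in your catalog
  source: community-operators
  sourceNamespace: openshift-marketplace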

Fortunately in ArgoCD there are a number of features available to work around this; specifically, adding the `argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true` annotation to our resources will instruct ArgoCD not to error out if some resources cannot be added initially. You can also combine this with retries in your ArgoCD application for more complex operators that take significant time to initialize; for Grafana though, the annotation seems to be sufficient. In my product-catalog example, I am adding this annotation across all resources using kustomize:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

namespace: product-catalog-monitor

commonAnnotations:
  argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true

bases:
- https://github.com/redhat-canada-gitops/catalog/grafana-operator/overlays/aggregate?ref=grafana
- ../../../manifests/app/monitor/base

resources:
- namespace.yaml
- operator-group.yaml
- cluster-monitor-view-rb.yaml

patchesJson6902:
- target:
    version: v1
    group: rbac.authorization.k8s.io
    kind: ClusterRoleBinding
    name: grafana-proxy
  path: patch-proxy-namespace.yaml
- target:
    version: v1alpha1
    group: integreatly.org
    kind: Grafana
    name: grafana
  path: patch-grafana-sar.yaml
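
As a sketch of what the first of those patches might contain, patch-proxy-namespace.yaml only needs to re-point the ServiceAccount subject of the grafana-proxy ClusterRoleBinding at our namespace (the path below assumes the ServiceAccount is the first subject in the binding); the grafana SAR patch is shown later in this post:

- op: replace
  path: /subjects/0/namespace
  value: product-catalog-monitor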

Now it’s beyond the scope of this blog to go into a detailed description of kustomize, but in a nutshell it’s a patching framework that enables you to aggregate resources from either local or remote bases as well as add new resources. In the kustomize file above, we are using the Red Hat Canada standard deployment of Grafana, which includes OpenShift OAuth integration, and combining it with my application-specific monitoring Grafana resources such as datasources and dashboards, which is what we will look at next.

Continuing along, we need to set up the plumbing to connect Grafana to the cluster monitoring Thanos instance in the openshift-monitoring namespace. The blog article Custom Grafana dashboards for Red Hat OpenShift Container Platform 4 does a great job of walking you through the process and I am not going to repeat it here, however please do read that article before carrying on.

The first step is to define a GrafanaDataSource object:

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDataSource
metadata:
  name: prometheus
spec:
  datasources:
    - access: proxy
      editable: true
      isDefault: true
      jsonData:
        httpHeaderName1: 'Authorization'
        timeInterval: 5s
        tlsSkipVerify: true
      name: Prometheus
      secureJsonData:
        httpHeaderValue1: 'Bearer ${BEARER_TOKEN}'
      type: prometheus
      url: 'https://thanos-querier.openshift-monitoring.svc.cluster.local:9091'
  name: prometheus.yaml

Notice that in httpHeaderValue1 we are expected to provide a bearer token. This token comes from the grafana-serviceaccount and can only be determined at runtime, which makes it a bit of a challenge from a GitOps perspective. To manage this, we deploy a kubernetes job as an ArgoCD PostSync hook in order to patch the GrafanaDataSource with the appropriate token:

apiVersion: batch/v1
kind: Job
metadata:
  name: patch-grafana-ds
  annotations:
    argocd.argoproj.io/hook: PostSync
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - image: registry.redhat.io/openshift4/ose-cli:v4.6
          command:
            - /bin/bash
            - -c
            - |
              set -e
              echo "Patching grafana datasource with token for authentication to prometheus"
              TOKEN=`oc serviceaccounts get-token grafana-serviceaccount -n product-catalog-monitor`
              oc patch grafanadatasource prometheus --type='json' -p='[{"op":"add","path":"/spec/datasources/0/secureJsonData/httpHeaderValue1","value":"Bearer '${TOKEN}'"}]'
          imagePullPolicy: Always
          name: patch-grafana-ds
      dnsPolicy: ClusterFirst
      restartPolicy: OnFailure
      serviceAccount: patch-grafana-ds-job
      serviceAccountName: patch-grafana-ds-job
      terminationGracePeriodSeconds: 30

This job runs using a special ServiceAccount which gives the job just enough access to retrieve the token and patch the datasource; once that’s done, the job is deleted by ArgoCD.
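
For completeness, here is a sketch of the kind of Role that ServiceAccount needs; the exact rules are an assumption based on what the script does, namely reading the service account’s token and patching the datasource:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: patch-grafana-ds-job
  namespace: product-catalog-monitor
rules:
# read the grafana-serviceaccount and its token secret
- apiGroups: [""]
  resources: ["serviceaccounts", "secrets"]
  verbs: ["get", "list"]
# patch the GrafanaDataSource with the bearer token
- apiGroups: ["integreatly.org"]
  resources: ["grafanadatasources"]
  verbs: ["get", "patch"]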

The other thing we want to do is control access to Grafana; essentially, we want to grant access to any OpenShift user who has view access on the Grafana route in its namespace. The Grafana operator uses the OpenShift OAuth Proxy to integrate with OpenShift. This proxy enables the definition of a Subject Access Review (SAR) to determine who is authorized to use Grafana; the SAR is simply a check on a particular object that acts as a way to determine access. For example, to only allow cluster administrators to have access to the Grafana instance we can specify that the user must have access to get namespaces:

-openshift-sar={"resource": "namespaces", "verb": "get"}

In our case we want anyone who has view access to the grafana route in the namespace where Grafana is hosted, product-catalog-monitor, to have access, so our SAR appears as follows:

-openshift-sar={"namespace":"product-catalog-monitor","resource":"routes","name":"grafana-route","verb":"get"}

To make this easy for kustomize to patch, the Red Hat Canada grafana implementation passes the SAR as an environment variable. To patch the value we can include a kustomize patch as follows:

- op: replace
  path: /spec/containers/0/env/0/value
  value: '-openshift-sar={"namespace":"product-catalog-monitor","resource":"routes","name":"grafana-route","verb":"get"}'

You can see this patch being applied at the environment level in my product-catalog example here. In my GitOps standards, environments is where the namespace is created and thus it makes sense that any namespace patching that is required is done at this level.

After this, it is simply a matter of including the other resources, such as the cluster-monitor-view rolebinding for the grafana-serviceaccount, so that grafana is authorized to retrieve the metrics.
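
That rolebinding simply grants the OOTB cluster-monitoring-view cluster role to the grafana-serviceaccount, something along these lines:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: cluster-monitor-view
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-monitoring-view
subjects:
- kind: ServiceAccount
  name: grafana-serviceaccount
  namespace: product-catalog-monitor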

If everything has gone well to this point you should be able to create a dashboard to view your application metrics.
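
Dashboards themselves are just another CR from the Grafana operator, so they fit naturally into GitOps as well. A minimal skeleton looks something like the following, assuming the operator’s dashboardLabelSelector matches on the app: grafana label:

apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: product-catalog
  labels:
    app: grafana            # must match the operator's dashboardLabelSelector (assumption)
spec:
  json: |
    {
      "title": "Product Catalog",
      "panels": []
    }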

Updated GitOps Standards

I maintain a small document in Github outlining the GitOps standards I use in my own repositories. I find that with kustomize it’s very important to have a standardized folder structure across an organization, or else it becomes challenging for everyone to understand what kustomize is doing. A common frame of reference makes all the difference.

I’ve recently tweaked my standards, feel free to check them out at https://github.com/gnunn-gitops/standards. Comments always welcome as I’m very interested in learning what other folks are doing.