Red Hat Single Sign-On (RH-SSO) is Red Hat’s productized version of Keycloak, a popular open source identity and access management project. RH-SSO is a core piece of application infrastructure at many organizations, and monitoring it effectively is critical to ensuring service goals are being met.
Out of the box, the RH-SSO 7.4 image exposes Prometheus metrics; however, these metrics are for the underlying JBoss EAP platform that RH-SSO runs on rather than Keycloak-specific metrics. While these low-level JBoss EAP metrics are very useful and we definitely want to capture them, wouldn’t it be great if we could get higher-level metrics on the number of logins, failed logins, client logins, etc. from Keycloak as well?
This is where the community Aerogear Keycloak Metrics SPI project comes into play. It is a Keycloak extension that provides these metrics by leveraging the Keycloak eventing capabilities. Using this extension with RH-SSO, while not directly supported by Red Hat, is easy and straightforward. Note that this article was written using RH-SSO 7.4; your mileage may vary on other versions, but conceptually it should follow the same process.
The first order of business is to create a container image that deploys the Aerogear extension. Here is the Containerfile that I am using:
    FROM registry.redhat.io/rh-sso-7/sso74-openshift-rhel8:latest

    ARG aerogear_version=2.5.0

    RUN cd /opt/eap/standalone/deployments && \
        curl -LO https://github.com/aerogear/keycloak-metrics-spi/releases/download/${aerogear_version}/keycloak-metrics-spi-${aerogear_version}.jar && \
        touch keycloak-metrics-spi-${aerogear_version}.jar.dodeploy && \
        cd -
This Containerfile references the default RH-SSO image from Red Hat and then downloads and installs the Aerogear SPI extension. I expect that many organizations using RH-SSO have already created their own image to support themes and other extensions. You can either put your own image in the FROM line or simply incorporate the above into your own Containerfile. Once you have created the custom image you can deploy it into your OpenShift cluster.
NOTE: Currently this metrics SPI exposes the Keycloak metrics on the default HTTPS port with no authentication, which is a significant security concern as documented here. There is a pull request (PR) in progress to mitigate this in OpenShift here; I will update this blog once the PR is merged.
One other thing that needs to be done as part of the deployment is to expose the EAP metrics, because we want to capture them as well. By default RH-SSO exposes these metrics on the management port, which only binds to localhost and so prevents Prometheus from scraping them. To enable Prometheus to scrape these metrics you will need to bind the management port to all IP addresses (0.0.0.0) so it can be read from the Pod IP. To do this, add -Djboss.bind.address.management=0.0.0.0 to the existing JAVA_OPTS_APPEND environment variable for the Deployment or StatefulSet you are using to deploy RH-SSO. If the variable doesn’t exist, just add it.
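If you script your deployments, the append-or-create logic above can be sketched in shell roughly as follows. This is only an illustration: append_java_opt is a helper name I made up, the existing value in the example is invented, and the StatefulSet name in the commented oc command is an assumption you would need to adjust.

```shell
#!/bin/sh
# Sketch: append a JVM option to an existing JAVA_OPTS_APPEND value
# without adding it a second time. append_java_opt is a hypothetical helper.
append_java_opt() {
  current="$1"   # existing JAVA_OPTS_APPEND value, may be empty
  opt="$2"       # option to add
  case " $current " in
    *" $opt "*) printf '%s\n' "$current" ;;       # already present, leave as-is
    "  ")       printf '%s\n' "$opt" ;;           # variable was empty or unset
    *)          printf '%s\n' "$current $opt" ;;  # append to what is there
  esac
}

# Example with a made-up existing value:
append_java_opt "-Dexample.flag=true" "-Djboss.bind.address.management=0.0.0.0"

# Assumed usage against a StatefulSet named "sso" (adjust kind and name):
# oc set env statefulset/sso JAVA_OPTS_APPEND="$(append_java_opt "$EXISTING" -Djboss.bind.address.management=0.0.0.0)"
```

The point of checking for the option first is simply that re-running the script does not keep growing the variable.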
Once the SPI is deployed you then need to configure the realms you want to monitor to route events to the metrics-listener. To do this go to Manage > Events > Config and make the change in Event Listeners as per the screenshot below. Be careful not to delete existing listeners.
This needs to be done on every realm for which you want to track metrics.
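I did this through the admin console, but if you have many realms it may be worth scripting. The sketch below only builds the listener list so that existing listeners are kept; listeners_arg is a made-up helper name, and the commented kcadm.sh call assumes you have already logged in with the Keycloak admin CLI and that the realm currently has just the jboss-logging listener.

```shell
#!/bin/sh
# Sketch: build an eventsListeners setting that keeps the listeners passed
# as arguments and adds metrics-listener if it is not already among them.
# listeners_arg is a hypothetical helper name.
listeners_arg() {
  new="metrics-listener"
  out=""
  seen=0
  for l in "$@"; do
    [ "$l" = "$new" ] && seen=1
    out="${out:+$out,}\"$l\""      # comma-separate and quote each listener
  done
  [ "$seen" -eq 1 ] || out="${out:+$out,}\"$new\""
  printf 'eventsListeners=[%s]\n' "$out"
}

# Existing listener list here is an assumption; check yours first.
listeners_arg jboss-logging
# prints eventsListeners=["jboss-logging","metrics-listener"]

# Assumed usage with the Keycloak admin CLI, once per realm:
# kcadm.sh update events/config -r master -s "$(listeners_arg jboss-logging)"
```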
Once you have the SPI deployed and have added the event listener to the realms to be monitored, you are ready to validate that it is working. The SPI works by adding a /metrics endpoint at the end of each realm URL. For example, to view the metrics from the master realm, you would use the path /auth/realms/master/metrics. To test the metrics, open a remote shell (rsh) into one of the SSO pods and run the following two curl commands:
    # Test keycloak metrics for master realm on pod IP
    $ curl -k https://$(hostname -i):8443/auth/realms/master/metrics
    # HELP keycloak_user_event_CLIENT_REGISTER_ERROR Generic KeyCloak User event
    # TYPE keycloak_user_event_CLIENT_REGISTER_ERROR counter
    # HELP keycloak_user_event_INTROSPECT_TOKEN Generic KeyCloak User event
    ...

    # Test EAP metrics on pod IP
    $ curl http://$(hostname -i):9990/metrics
    # HELP base_cpu_processCpuLoad Displays the "recent cpu usage" for the Java Virtual Machine process.
    # TYPE base_cpu_processCpuLoad gauge
    base_cpu_processCpuLoad 0.009113504556752278
    ...
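As an optional extra, rather than scrolling through the raw output you can pipe it through a small filter that counts samples per metric name. This is a rough sketch: count_samples is a name I made up, and the sample input below is invented apart from the base_cpu_processCpuLoad lines taken from the output above.

```shell
#!/bin/sh
# Sketch: count samples per metric name in Prometheus text-format input on stdin.
# count_samples is a hypothetical helper name.
count_samples() {
  # Skip comment (# HELP / # TYPE) and blank lines, strip labels and value,
  # then tally how many samples each metric name has.
  awk '!/^#/ && NF { sub(/[{ ].*/, ""); n[$0]++ }
       END { for (m in n) print m, n[m] }' | sort
}

# Made-up sample input in the same format as the curl output:
count_samples <<'EOF'
# HELP base_cpu_processCpuLoad Displays the "recent cpu usage" for the Java Virtual Machine process.
# TYPE base_cpu_processCpuLoad gauge
base_cpu_processCpuLoad 0.009113504556752278
# TYPE example_metric counter
example_metric{realm="master"} 3
example_metric{realm="openshift"} 1
EOF
```

Inside the pod you would feed it the real thing, for example curl -sk https://$(hostname -i):8443/auth/realms/master/metrics | count_samples.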
If everything worked you should see a lot of output after each of the curl commands, with the first few lines being similar to the outputs shown. Now comes the next step, having Prometheus scrape this data. In this blog I am using OpenShift’s User Workload monitoring feature that I blogged about here, so I will not go into the intricacies of setting up the Prometheus operator again.
To configure scraping of the EAP metrics we define a PodMonitor, since this port isn’t typically defined in the SSO service. For my deployment the PodMonitor appears as follows:
    apiVersion: monitoring.coreos.com/v1
    kind: PodMonitor
    metadata:
      name: eap
    spec:
      selector:
        matchLabels:
          app: sso
      podMetricsEndpoints:
      - targetPort: 9990
Note that my deployment of SSO has the pods labelled app: sso, so make sure to update the selector above to match a label on your SSO pods. After that we define a ServiceMonitor to scrape the Aerogear Keycloak SPI metrics:
    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: keycloak
    spec:
      jobLabel: keycloak
      selector:
        matchLabels:
          app: sso
      endpoints:
      - port: keycloak
        path: /auth/realms/master/metrics
        scheme: https
        tlsConfig:
          insecureSkipVerify: true
        relabelings:
        - targetLabel: job
          replacement: keycloak
        - targetLabel: provider
          replacement: keycloak
        - targetLabel: instance
          replacement: sso
      - port: keycloak
        path: /auth/realms/openshift/metrics
        scheme: https
        tlsConfig:
          insecureSkipVerify: true
        relabelings:
        - targetLabel: job
          replacement: keycloak
        - targetLabel: provider
          replacement: keycloak
        - targetLabel: instance
          replacement: sso
      - port: keycloak
        path: /auth/realms/3scale/metrics
        scheme: https
        tlsConfig:
          insecureSkipVerify: true
        relabelings:
        - targetLabel: job
          replacement: keycloak
        - targetLabel: provider
          replacement: keycloak
        - targetLabel: instance
          replacement: sso
A couple of items to note here. First, be aware that each realm’s metrics are on a separate path, so multiple endpoints must be defined, one per realm. Second, my SSO deployment is set to re-encrypt and is using a self-signed certificate at the service level. As a result we need to set insecureSkipVerify to true, otherwise Prometheus will not scrape it due to an invalid certificate. Similar to the PodMonitor, update the selector to match labels on your service.
I’m using relabelings to set various labels that will appear with the metrics. This is needed because the Grafana dashboard I am using from the Grafana library expects certain labels like job and provider to be set to keycloak, otherwise its queries will not find the metrics. Setting these labels here is easier than modifying the dashboard. Finally, I set the instance label to sso. If you don’t set this, the instance label will default to the IP and port, so this is a friendlier way of presenting it.
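Since every realm needs its own endpoint entry with the same settings, the endpoints list gets repetitive quickly. If you have more than a handful of realms, one option is to generate the entries. The following is just a sketch that prints the same endpoint block as above for each realm name passed in; realm_endpoints is a name I made up, and the output still has to be pasted under spec.endpoints of the ServiceMonitor.

```shell
#!/bin/sh
# Sketch: print one ServiceMonitor endpoint entry per realm name given as
# an argument, using the same settings as the ServiceMonitor above.
# realm_endpoints is a hypothetical helper name.
realm_endpoints() {
  for realm in "$@"; do
    cat <<EOF
  - port: keycloak
    path: /auth/realms/${realm}/metrics
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
    relabelings:
    - targetLabel: job
      replacement: keycloak
    - targetLabel: provider
      replacement: keycloak
    - targetLabel: instance
      replacement: sso
EOF
  done
}

# The three realms used in this article:
realm_endpoints master openshift 3scale
```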
At this point we can deploy some Grafana dashboards. Again, I covered deploying and connecting Grafana to the cluster monitoring in a previous article, so I will not be covering it again. To deploy the Keycloak dashboard we can reference the existing one in the Grafana library in a GrafanaDashboard object as follows:
    apiVersion: integreatly.org/v1alpha1
    kind: GrafanaDashboard
    metadata:
      name: sso-dashboard
      labels:
        app: grafana
    spec:
      url: https://grafana.com/api/dashboards/10441/revisions/1/download
      datasources:
      - inputName: "DS_PROMETHEUS"
        datasourceName: "Prometheus"
      plugins:
      - name: grafana-piechart-panel
        version: 1.3.9
When rendered the dashboard appears as follows. Note that my environment is not under load, so it’s not quite as interesting as it would be in a real production environment.
You can see that various metrics around heap, logins, and login failures, as well as other statistics, are presented, making it easier to understand what is happening with your SSO installation at any given time.
Next we do the same thing to create an EAP dashboard so we can visualize the EAP metrics:
    apiVersion: integreatly.org/v1alpha1
    kind: GrafanaDashboard
    metadata:
      name: eap-dashboard
      labels:
        app: grafana
    spec:
      url: https://grafana.com/api/dashboards/10313/revisions/1/download
      datasources:
      - inputName: "DS_PROMETHEUS"
        datasourceName: "Prometheus"
And here is the EAP dashboard in all its glory:
The dashboard displays detailed metrics on the JVM heap status, but you can also monitor other EAP platform components like databases and caches by customizing the dashboard. One of the benefits of Grafana is that it enables you to design dashboards that make the most sense for your specific use case and organization. You can start with an off-the-shelf dashboard and then modify it as needed to get the visualization that is required.
RH-SSO is a key infrastructure component for many organizations and monitoring it effectively is important to ensure that SLAs and performance expectations are being met. Hopefully this article will provide a starting point for your organization to define and create a monitoring strategy around RH-SSO.