Version: v3.21.x

Metrics & Observability

Observability

This section covers how to gather more detailed statistics about Gatekeeper's query performance. These statistics can help with diagnosis, for example when identifying a constraint template with a long execution time. Statistics are written to Gatekeeper's stdout logs.

Logging Constraint Execution Stats

Set --log-stats-audit to enable logging of stats for the audit process.
Set --log-stats-admission to enable logging of stats for the admission review process.

Example Log Line

To see how long it takes to review a constraint kind at admission time, enable the --log-stats-admission flag and watch the logs for entries for that kind. For example, for the constraint kind K8sRequiredLabels:

{
  "level": "info",
  "ts": 1683692576.9093642,
  "logger": "webhook",
  "msg": "admission review request stats",
  "hookType": "validation",
  "process": "admission",
  "event_type": "review_response_stats",
  "resource_group": "",
  "resource_api_version": "v1",
  "resource_kind": "Namespace",
  "resource_namespace": "",
  "request_username": "kubernetes-admin",
  "execution_stats": [
    {
      "scope": "template",
      "statsFor": "K8sRequiredLabels",
      "stats": [
        {
          "name": "templateRunTimeNS",
          "value": 762561,
          "source": {
            "type": "engine",
            "value": "Rego"
          },
          "description": "the number of nanoseconds it took to evaluate all constraints for a template"
        },
        {
          "name": "constraintCount",
          "value": 1,
          "source": {
            "type": "engine",
            "value": "Rego"
          },
          "description": "the number of constraints that were evaluated for the given constraint kind"
        }
      ],
      "labels": [
        {
          "name": "TracingEnabled",
          "value": false
        },
        {
          "name": "PrintEnabled",
          "value": false
        },
        {
          "name": "target",
          "value": "admission.k8s.gatekeeper.sh"
        }
      ]
    }
  ]
}

In the excerpt above, notice templateRunTimeNS and constraintCount. The former is the time, in nanoseconds, taken to evaluate all constraints of kind K8sRequiredLabels, while the latter is how many such constraints were evaluated for this template. Labels provide additional information about the execution environment, such as whether tracing was enabled (TracingEnabled).
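As an illustration of reading these entries programmatically, the following sketch extracts templateRunTimeNS per constraint kind from one JSON log line and converts it to milliseconds. The function name template_run_times and the trimmed sample entry are hypothetical; only the field names come from the log line shown above.

```python
import json


def template_run_times(log_line):
    """Map each template kind in a stats log line to its run time in milliseconds."""
    entry = json.loads(log_line)
    run_times = {}
    for block in entry.get("execution_stats", []):
        # Only template-scoped blocks carry templateRunTimeNS in the example above.
        if block.get("scope") != "template":
            continue
        for stat in block.get("stats", []):
            if stat.get("name") == "templateRunTimeNS":
                run_times[block["statsFor"]] = stat["value"] / 1_000_000
    return run_times


# Trimmed copy of the example log line (illustrative only).
sample_line = json.dumps({
    "msg": "admission review request stats",
    "execution_stats": [{
        "scope": "template",
        "statsFor": "K8sRequiredLabels",
        "stats": [
            {"name": "templateRunTimeNS", "value": 762561},
            {"name": "constraintCount", "value": 1},
        ],
    }],
})

print(template_run_times(sample_line))
```

Running this over a stream of log lines and sorting by value is one way to spot the slowest constraint kinds.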

Caveats

The additional log volume from enabling the stats logging can be quite high.

Metrics

For counter metrics, the _total suffix is recommended by Prometheus, and some Prometheus client libraries append it automatically to indicate that the metric represents a cumulative total.

Below is the list of metrics provided by Gatekeeper:

Constraint

Name: gatekeeper_constraints
Description: Current number of known constraints
Tags:
- enforcement_action: [deny, dryrun, warn]
- status: [active, error]
Aggregation: LastValue

Constraint Template

Name: gatekeeper_constraint_templates
Description: Number of observed constraint templates
Tags:
- status: [active, error]
Aggregation: LastValue

Name: gatekeeper_constraint_template_ingestion_count
Description: Total number of constraint template ingestion actions
Tags:
- status: [active, error]
Aggregation: Count

Name: gatekeeper_constraint_template_ingestion_duration_seconds
Description: Distribution of how long it took to ingest a constraint template in seconds
Tags:
- status: [active, error]
Aggregation: Distribution
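A Distribution metric can be reduced to a mean when the backend exposes it as a Prometheus-style histogram with _sum and _count series. The sketch below assumes that export shape and a hypothetical (name, labels, value) sample format; it is not part of Gatekeeper itself.

```python
from collections import defaultdict

METRIC = "gatekeeper_constraint_template_ingestion_duration_seconds"


def mean_ingestion_seconds(samples):
    """Return the mean ingestion duration per status from _sum/_count samples.

    samples is an iterable of (name, labels, value) tuples (assumed format).
    """
    sums = defaultdict(float)
    counts = defaultdict(float)
    for name, labels, value in samples:
        status = labels.get("status", "")
        if name == METRIC + "_sum":
            sums[status] += value
        elif name == METRIC + "_count":
            counts[status] += value
    # Skip statuses with no observations to avoid dividing by zero.
    return {status: sums[status] / count for status, count in counts.items() if count > 0}


# Made-up sample values for illustration.
example_samples = [
    (METRIC + "_sum", {"status": "active"}, 0.5),
    (METRIC + "_count", {"status": "active"}, 4.0),
    (METRIC + "_count", {"status": "error"}, 0.0),
]

print(mean_ingestion_seconds(example_samples))
```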

Expansion Template

Name: gatekeeper_expansion_templates
Description: Number of observed expansion templates
Tags:
- status: [active, error]
Aggregation: LastValue

Webhook

Name: gatekeeper_validation_request_count
Description: The number of requests that are routed to validation webhook
Tags:
- admission_status: [allow, deny]
- admission_dryrun: [true, false]
Aggregation: Count

Name: gatekeeper_validation_request_duration_seconds
Description: The validation webhook response time in seconds
Tags:
- admission_status: [allow, deny]
Aggregation: Distribution

Name: gatekeeper_mutation_request_count
Description: The number of requests that are routed to mutation webhook
Tags:
- admission_status: [allow, deny]
Aggregation: Count

Name: gatekeeper_mutation_request_duration_seconds
Description: The mutation webhook response time in seconds
Tags:
- admission_status: [allow, deny]
Aggregation: Distribution
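The admission_status tag makes it possible to derive a deny ratio from gatekeeper_validation_request_count. The following is a minimal sketch, assuming samples have already been collected as hypothetical (labels, value) pairs; because the metric is a cumulative count, the result covers the whole lifetime of the counters rather than a recent window.

```python
def deny_ratio(samples):
    """Return the fraction of validation requests that were denied.

    samples is an iterable of (labels, value) pairs for
    gatekeeper_validation_request_count (assumed format).
    """
    samples = list(samples)
    total = sum(value for _, value in samples)
    denied = sum(
        value for labels, value in samples
        if labels.get("admission_status") == "deny"
    )
    return denied / total if total else 0.0


# Made-up sample values for illustration.
example_counts = [
    ({"admission_status": "allow", "admission_dryrun": "false"}, 90.0),
    ({"admission_status": "deny", "admission_dryrun": "false"}, 10.0),
]

print(deny_ratio(example_counts))
```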

Audit

Name: gatekeeper_violations
Description: Total number of audited violations
Tags:
- enforcement_action: [deny, dryrun, warn]
Aggregation: LastValue

Name: gatekeeper_audit_duration_seconds
Description: Latency of audit operation in seconds
Aggregation: Distribution

Name: gatekeeper_audit_last_run_time
Description: Timestamp of last audit run starting time
Aggregation: LastValue

Name: gatekeeper_audit_last_run_end_time
Description: Timestamp of last audit run ending time
Aggregation: LastValue
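The two timestamp metrics can be compared to tell whether the most recently started audit has finished. This sketch assumes both values are Unix timestamps in seconds, which the list above does not state, and treats an end time earlier than the start time as "still running or did not complete". The function name audit_status is hypothetical.

```python
def audit_status(last_run_time, last_run_end_time, now):
    """Summarize audit progress from the two timestamp metrics.

    All arguments are assumed to be Unix timestamps in seconds.
    """
    completed = last_run_end_time >= last_run_time
    return {
        "completed": completed,
        # Only meaningful when the end timestamp belongs to the latest run.
        "last_duration_seconds": (last_run_end_time - last_run_time) if completed else None,
        "seconds_since_start": now - last_run_time,
    }


print(audit_status(1000.0, 1012.5, 1100.0))
```

A large seconds_since_start combined with completed being False may be worth alerting on, under the same assumptions.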

Mutation

Name: gatekeeper_mutator_ingestion_count
Description: Total number of Mutator ingestion actions
Tags:
- status: [active, error]
Aggregation: Count

Name: gatekeeper_mutator_ingestion_duration_seconds
Description: The distribution of Mutator ingestion durations
Tags:
- status: [active, error]
Aggregation: Distribution

Name: gatekeeper_mutators
Description: The current number of Mutator objects
Tags:
- status: [active, error]
Aggregation: Count

Name: gatekeeper_mutator_conflicting_count
Description: The current number of conflicting Mutator objects
Tags:
- status: [active, error]
Aggregation: Count

Sync

Name: gatekeeper_sync
Description: Total number of resources of each kind being cached
Tags:
- status: [active, error]
- kind (for example: pod, namespace, ...)
Aggregation: LastValue

Name: gatekeeper_sync_duration_seconds
Description: Latency of sync operation in seconds
Aggregation: Distribution

Name: gatekeeper_sync_last_run_time
Description: Timestamp of last sync operation
Aggregation: LastValue

Watch

Name: gatekeeper_watch_manager_watched_gvk
Description: Total number of watched GroupVersionKinds
Aggregation: LastValue

Name: gatekeeper_watch_manager_intended_watch_gvk
Description: Total number of GroupVersionKinds with a registered watch intent
Aggregation: LastValue
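One possible use of the two watch metrics is to compare them: a persistent gap between intended and watched GroupVersionKinds may indicate watches that have not been established. The helper below is a hypothetical sketch of that comparison, not a documented Gatekeeper check.

```python
def missing_watches(intended_gvk_count, watched_gvk_count):
    """Return how many intended GroupVersionKinds are not currently watched."""
    # Clamp at zero in case the two gauges are sampled at slightly different times.
    return max(intended_gvk_count - watched_gvk_count, 0)


print(missing_watches(12, 10))
```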

External Data

Name: gatekeeper_providers
Description: Number of external data providers by status
Tags:
- status: [active, error]
Aggregation: LastValue

Name: gatekeeper_provider_error_count
Description: Incremental counter for all provider errors occurring over time
Aggregation: Count

Metric Backends

This section covers how to configure different metric backends to export the metrics.

Set --metrics-backend to choose the backend used for metrics, for example prometheus, stackdriver, or opentelemetry. This flag can be declared more than once. If it is omitted, Gatekeeper defaults to prometheus.
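Because the flag can be repeated, enabling several backends means passing one --metrics-backend argument per backend. The helper below is a hypothetical sketch that builds such an argument list; the --flag=value spelling is an assumption based on common command-line conventions, and the helper does not validate backend names.

```python
def metrics_backend_args(backends):
    """Return one --metrics-backend argument per backend, preserving order."""
    # dict.fromkeys drops duplicates while keeping the first occurrence order.
    unique_backends = dict.fromkeys(backends)
    return ["--metrics-backend={}".format(backend) for backend in unique_backends]


print(metrics_backend_args(["prometheus", "opentelemetry", "prometheus"]))
```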

Prometheus

Gatekeeper exposes Prometheus metrics by default on port 8888 at the /metrics path. You can configure a Prometheus instance to scrape this endpoint. This is the default exporter.

Set --prometheus-port to choose the port used by the Prometheus metrics backend.
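To show what consuming the scraped output might look like, here is a minimal sketch that parses Prometheus text exposition lines and totals gatekeeper_violations by enforcement_action. The sample text is made up for illustration, the parser handles only simple sample lines, and in a real setup the text would be fetched from the /metrics endpoint (for example http://localhost:8888/metrics through a port-forward, which is an assumption about your environment).

```python
import re
from collections import defaultdict

# Matches "name{labels} value" or "name value"; a trailing timestamp is ignored.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def violations_by_action(text):
    """Total gatekeeper_violations per enforcement_action."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == "gatekeeper_violations":
            totals[labels.get("enforcement_action", "")] += value
    return dict(totals)


# Made-up scrape output for illustration.
sample_text = """# HELP gatekeeper_violations Total number of audited violations
# TYPE gatekeeper_violations gauge
gatekeeper_violations{enforcement_action="deny"} 3
gatekeeper_violations{enforcement_action="dryrun"} 1
gatekeeper_constraints{enforcement_action="deny",status="active"} 2
"""

print(violations_by_action(sample_text))
```

For anything beyond a quick check, a maintained Prometheus client or query API is likely a better choice than hand-rolled parsing.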

OpenTelemetry

Gatekeeper can be configured to export metrics to an OpenTelemetry collector. This is useful for integrating with a variety of observability backends that support OpenTelemetry.

Set --otlp-endpoint to the OpenTelemetry exporter endpoint (HTTP exporter only).
Set --otlp-metric-interval to the interval at which metrics are read for the OpenTelemetry exporter. Defaults to 10 seconds if unspecified.

Stackdriver

Gatekeeper can be configured to export metrics to Google Cloud's operations suite (formerly Stackdriver). This allows for monitoring Gatekeeper's performance and behavior within the Google Cloud ecosystem.

Set --stackdriver-only-when-available to attempt to start the Stackdriver exporter only if credentials are available.
Set --stackdriver-metric-interval to the interval at which metrics are read for the Stackdriver exporter. Defaults to 10 seconds if unspecified.
