Metrics & Observability
Observability
This section covers how to gather more detailed statistics about Gatekeeper's query performance. These statistics can help diagnose problems such as a constraint template with a long execution time. Statistics are written to Gatekeeper's stdout logs.
Logging Constraint Execution Stats
- set --log-stats-audit. This flag enables logging the stats for the audit process.
- set --log-stats-admission. This flag enables logging the stats for the admission review process.
Example Log Line
To see how long it takes to review a constraint kind at admission time, enable the --log-stats-admission flag and watch the logs for that constraint kind. For example, for K8sRequiredLabels:
{
  "level": "info",
  "ts": 1683692576.9093642,
  "logger": "webhook",
  "msg": "admission review request stats",
  "hookType": "validation",
  "process": "admission",
  "event_type": "review_response_stats",
  "resource_group": "",
  "resource_api_version": "v1",
  "resource_kind": "Namespace",
  "resource_namespace": "",
  "request_username": "kubernetes-admin",
  "execution_stats": [
    {
      "scope": "template",
      "statsFor": "K8sRequiredLabels",
      "stats": [
        {
          "name": "templateRunTimeNS",
          "value": 762561,
          "source": {
            "type": "engine",
            "value": "Rego"
          },
          "description": "the number of nanoseconds it took to evaluate all constraints for a template"
        },
        {
          "name": "constraintCount",
          "value": 1,
          "source": {
            "type": "engine",
            "value": "Rego"
          },
          "description": "the number of constraints that were evaluated for the given constraint kind"
        }
      ],
      "labels": [
        {
          "name": "TracingEnabled",
          "value": false
        },
        {
          "name": "PrintEnabled",
          "value": false
        },
        {
          "name": "target",
          "value": "admission.k8s.gatekeeper.sh"
        }
      ]
    }
  ]
}
In the excerpt above, notice templateRunTimeNS and constraintCount. The former is the time, in nanoseconds, taken to evaluate all constraints of kind K8sRequiredLabels, while the latter is the number of such constraints that were evaluated for this template. Labels provide additional information about the execution environment, such as whether tracing was enabled (TracingEnabled).
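As a minimal sketch of how such log lines could be consumed, the Python below totals templateRunTimeNS per constraint kind. The field names are taken from the example log line. The function name is hypothetical, and the code assumes each log entry arrives as one JSON object per line, which may not hold for every logging setup.

```python
import json
from collections import defaultdict


def summarize_template_runtime(lines):
    """Sum templateRunTimeNS per constraint kind across stats log lines.

    Assumes one JSON object per line, shaped like the example log entry.
    Lines that are not JSON objects or carry no execution_stats are skipped.
    """
    totals_ns = defaultdict(int)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON output mixed into the log stream
        if not isinstance(entry, dict):
            continue
        for scope in entry.get("execution_stats") or []:
            if scope.get("scope") != "template":
                continue
            kind = scope.get("statsFor", "<unknown>")
            for stat in scope.get("stats") or []:
                if stat.get("name") == "templateRunTimeNS":
                    totals_ns[kind] += stat["value"]
    return dict(totals_ns)


# A trimmed-down copy of the example entry, used only for illustration.
sample_line = json.dumps({
    "event_type": "review_response_stats",
    "execution_stats": [{
        "scope": "template",
        "statsFor": "K8sRequiredLabels",
        "stats": [
            {"name": "templateRunTimeNS", "value": 762561},
            {"name": "constraintCount", "value": 1},
        ],
    }],
})

for kind, ns in sorted(summarize_template_runtime([sample_line]).items()):
    print(f"{kind}: {ns / 1e6:.3f} ms")
```

Sorting the resulting totals in descending order is one way to surface the templates that dominate evaluation time.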
Caveats
The additional log volume from enabling the stats logging can be quite high.
Metrics
If you are using a Prometheus client library, for counter metrics, the _total suffix is recommended and sometimes automatically appended by client libraries to indicate that the metric represents a cumulative total.
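Because a client library may or may not append _total, tooling that looks up a counter by name can check both spellings. The sketch below illustrates that idea. The helper names are hypothetical, and whether a given Gatekeeper counter is actually exported with the suffix depends on the backend, so verify it against the real metrics output.

```python
def counter_name_candidates(name):
    """Return the spellings under which a counter may be exported."""
    suffix = "_total"
    base = name[: -len(suffix)] if name.endswith(suffix) else name
    return [base, base + suffix]


def find_counter(samples, name):
    """Look up a counter in a {metric_name: value} mapping under either spelling.

    Returns None when neither spelling is present.
    """
    for candidate in counter_name_candidates(name):
        if candidate in samples:
            return samples[candidate]
    return None


# Hypothetical scraped values, for illustration only.
scraped = {"gatekeeper_provider_error_count_total": 3.0}
print(find_counter(scraped, "gatekeeper_provider_error_count"))
```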
Below is the list of metrics provided by Gatekeeper:
Constraint
- Name: gatekeeper_constraints
  Description: Current number of known constraints
  Tags:
    - enforcement_action: [deny, dryrun, warn]
    - status: [active, error]
  Aggregation: LastValue
Constraint Template
- Name: gatekeeper_constraint_templates
  Description: Number of observed constraint templates
  Tags:
    - status: [active, error]
  Aggregation: LastValue
- Name: gatekeeper_constraint_template_ingestion_count
  Description: Total number of constraint template ingestion actions
  Tags:
    - status: [active, error]
  Aggregation: Count
- Name: gatekeeper_constraint_template_ingestion_duration_seconds
  Description: Distribution of how long it took to ingest a constraint template in seconds
  Tags:
    - status: [active, error]
  Aggregation: Distribution
Expansion Template
- Name: gatekeeper_expansion_templates
  Description: Number of observed expansion templates
  Tags:
    - status: [active, error]
  Aggregation: LastValue
Webhook
- Name: gatekeeper_validation_request_count
  Description: The number of requests that are routed to validation webhook
  Tags:
    - admission_status: [allow, deny]
    - admission_dryrun: [true, false]
  Aggregation: Count
- Name: gatekeeper_validation_request_duration_seconds
  Description: The validation webhook response time in seconds
  Tags:
    - admission_status: [allow, deny]
  Aggregation: Distribution
- Name: gatekeeper_mutation_request_count
  Description: The number of requests that are routed to mutation webhook
  Tags:
    - admission_status: [allow, deny]
  Aggregation: Count
- Name: gatekeeper_mutation_request_duration_seconds
  Description: The mutation webhook response time in seconds
  Tags:
    - admission_status: [allow, deny]
  Aggregation: Distribution
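To illustrate how the admission_status tag on the webhook counters can be used, the sketch below computes the fraction of denied validation requests from Prometheus text exposition output. The parser is deliberately simplified (it ignores timestamps and escape sequences in label values), the function names are hypothetical, and the sample values are made up for illustration. It accepts the metric name with or without a _total suffix.

```python
import re

# metric_name{labels} value   (labels are optional)
SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse exposition text into (name, labels, value) tuples (simplified)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_LINE.match(line)
        if not match:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def deny_ratio(text, metric="gatekeeper_validation_request_count"):
    """Return denied / (allowed + denied), or None if no requests were seen."""
    allowed = denied = 0.0
    for name, labels, value in parse_samples(text):
        if name not in (metric, metric + "_total"):
            continue
        status = labels.get("admission_status")
        if status == "deny":
            denied += value
        elif status == "allow":
            allowed += value
    total = allowed + denied
    return denied / total if total else None


# Made-up sample values, for illustration only.
SAMPLE = """
gatekeeper_validation_request_count{admission_dryrun="false",admission_status="allow"} 90
gatekeeper_validation_request_count{admission_dryrun="false",admission_status="deny"} 10
"""
print(deny_ratio(SAMPLE))
```

Since these counters are cumulative, a ratio over a recent window would need the difference between two scrapes rather than a single snapshot.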
Audit
- Name: gatekeeper_violations
  Description: Total number of audited violations
  Tags:
    - enforcement_action: [deny, dryrun, warn]
  Aggregation: LastValue
- Name: gatekeeper_audit_duration_seconds
  Description: Latency of audit operation in seconds
  Aggregation: Distribution
- Name: gatekeeper_audit_last_run_time
  Description: Timestamp of last audit run starting time
  Aggregation: LastValue
- Name: gatekeeper_audit_last_run_end_time
  Description: Timestamp of last audit run ending time
  Aggregation: LastValue
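The two audit timestamps can be combined to describe the most recent audit run. The sketch below assumes both gauges hold Unix timestamps in seconds and that a start time newer than the end time means a run is still in progress. Both are assumptions to check against the values your backend reports, and the function name is hypothetical.

```python
def last_audit_run(start_ts, end_ts):
    """Describe the latest audit run from the two timestamp gauges.

    start_ts: value of gatekeeper_audit_last_run_time
    end_ts:   value of gatekeeper_audit_last_run_end_time
    Both are assumed to be Unix timestamps in seconds.
    """
    if end_ts >= start_ts:
        return {"in_progress": False, "duration_seconds": end_ts - start_ts}
    # The start gauge is newer than the end gauge, so a run has
    # presumably started and not yet finished.
    return {"in_progress": True, "duration_seconds": None}


# Made-up timestamps, for illustration only.
print(last_audit_run(1000.0, 1042.5))
print(last_audit_run(2000.0, 1042.5))
```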
Mutation
- Name: gatekeeper_mutator_ingestion_count
  Description: Total number of Mutator ingestion actions
  Tags:
    - status: [active, error]
  Aggregation: Count
- Name: gatekeeper_mutator_ingestion_duration_seconds
  Description: The distribution of Mutator ingestion durations
  Tags:
    - status: [active, error]
  Aggregation: Distribution
- Name: gatekeeper_mutators
  Description: The current number of Mutator objects
  Tags:
    - status: [active, error]
  Aggregation: Count
- Name: gatekeeper_mutator_conflicting_count
  Description: The current number of conflicting Mutator objects
  Tags:
    - status: [active, error]
  Aggregation: Count
Sync
- Name: gatekeeper_sync
  Description: Total number of resources of each kind being cached
  Tags:
    - status: [active, error]
    - kind (examples: pod, namespace, ...)
  Aggregation: LastValue
- Name: gatekeeper_sync_duration_seconds
  Description: Latency of sync operation in seconds
  Aggregation: Distribution
- Name: gatekeeper_sync_last_run_time
  Description: Timestamp of last sync operation
  Aggregation: LastValue
Watch
- Name: gatekeeper_watch_manager_watched_gvk
  Description: Total number of watched GroupVersionKinds
  Aggregation: LastValue
- Name: gatekeeper_watch_manager_intended_watch_gvk
  Description: Total number of GroupVersionKinds with a registered watch intent
  Aggregation: LastValue
External Data
- Name: gatekeeper_providers
  Description: Number of external data providers by status
  Tags:
    - status: [active, error]
  Aggregation: LastValue
- Name: gatekeeper_provider_error_count
  Description: Incremental counter for all provider errors occurring over time
  Aggregation: Count
Metric Backends
This section covers how to configure different metric backends to export the metrics.
- set --metrics-backend. Backend used for metrics, e.g. prometheus, stackdriver, opentelemetry. This flag can be declared more than once. If it is omitted, Gatekeeper defaults to prometheus.
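Since the flag can be declared more than once, enabling several backends means repeating it once per backend. The sketch below builds such an argument list. The helper name is hypothetical, and the --flag=value spelling is an assumption about how the arguments are passed to the Gatekeeper binary.

```python
def metrics_backend_args(backends):
    """Build one --metrics-backend argument per backend, dropping duplicates.

    An empty list yields no arguments, in which case the default
    (prometheus) applies.
    """
    unique = []
    for backend in backends:
        if backend not in unique:
            unique.append(backend)
    # Assumption: the binary accepts the --flag=value form.
    return [f"--metrics-backend={backend}" for backend in unique]


print(metrics_backend_args(["prometheus", "opentelemetry"]))
```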
Prometheus
Prometheus is the default exporter. By default, Gatekeeper exposes Prometheus metrics on port 8888 at the /metrics path, and a Prometheus instance can be configured to scrape this endpoint.
- set --prometheus-port. Prometheus port for metrics backend.
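As a quick check that the endpoint is reachable, the metrics page can be fetched and filtered for Gatekeeper metric names. The sketch below assumes the metrics port has been made reachable on localhost (for example through a port-forward), which is an assumption about your environment. All function names are hypothetical, and the network call is defined but not executed here.

```python
import re
from urllib.request import urlopen


def metrics_url(host="localhost", port=8888):
    """Build the scrape URL; 8888 and /metrics are the documented defaults."""
    return f"http://{host}:{port}/metrics"


def names_from_text(text, prefix="gatekeeper_"):
    """Return the sorted, unique metric names that start with the prefix."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the first "{" or whitespace character.
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        if name.startswith(prefix):
            names.add(name)
    return sorted(names)


def fetch_metric_names(url, timeout=5):
    """Fetch the metrics page and list Gatekeeper metric names (needs network)."""
    with urlopen(url, timeout=timeout) as response:
        return names_from_text(response.read().decode("utf-8"))


print(metrics_url())
```

Calling fetch_metric_names(metrics_url()) against a running instance would list the exported names, which also shows whether counters carry a _total suffix in your setup.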
OpenTelemetry
Gatekeeper can be configured to export metrics to an OpenTelemetry collector. This is useful for integrating with a variety of observability backends that support OpenTelemetry.
- set --otlp-endpoint. OpenTelemetry exporter endpoint (HTTP exporter only).
- set --otlp-metric-interval. Interval to read metrics for OpenTelemetry exporter. Defaults to 10 seconds if unspecified.
Stackdriver
Gatekeeper can be configured to export metrics to Google Cloud's operations suite (formerly Stackdriver). This allows for monitoring Gatekeeper's performance and behavior within the Google Cloud ecosystem.
- set --stackdriver-only-when-available. Only attempt to start the stackdriver exporter if credentials are available.
- set --stackdriver-metric-interval. Interval to read metrics for stackdriver exporter. Defaults to 10 seconds if unspecified.