Version: Next

Performance Tuning

This page covers some of the considerations and options for tuning Gatekeeper's performance.

General Performance

GOMAXPROCS

GOMAXPROCS sets the maximum number of OS threads that can execute Go code simultaneously. Gatekeeper uses automaxprocs to default this value to the CPU limit set by the Linux cgroup (i.e. the limits passed to the Kubernetes container).

This value can be overridden by setting a GOMAXPROCS environment variable.

Generally speaking, running more threads than the CPU limit allows can lead to CPU throttling. Throttling increases webhook jitter and leaves less CPU available per operation, which in turn increases latency.
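To make the relationship between a CPU limit and GOMAXPROCS concrete, here is a minimal sketch. It is not Gatekeeper's or automaxprocs' actual code: the helper maxProcsFromQuota is hypothetical, and it assumes the limit is expressed as a CFS quota and period, rounded down, with a floor of one thread.

```go
package main

import (
	"fmt"
	"runtime"
)

// maxProcsFromQuota is a hypothetical helper that approximates how a CPU
// limit maps to a GOMAXPROCS value: divide the quota by the period, round
// down, and never go below 1. A non-positive quota is treated as "no limit",
// in which case the host CPU count is used.
func maxProcsFromQuota(quotaMicros, periodMicros int64, hostCPUs int) int {
	if quotaMicros <= 0 || periodMicros <= 0 {
		return hostCPUs
	}
	procs := int(quotaMicros / periodMicros)
	if procs < 1 {
		return 1
	}
	return procs
}

func main() {
	// runtime.GOMAXPROCS(0) reports the current setting without changing it.
	fmt.Println("current GOMAXPROCS:", runtime.GOMAXPROCS(0))

	// A 2500m CPU limit corresponds to 250000us of quota per 100000us period.
	fmt.Println("2500m limit ->", maxProcsFromQuota(250000, 100000, runtime.NumCPU()))

	// A 500m limit is below one full CPU but still yields one thread.
	fmt.Println("500m limit ->", maxProcsFromQuota(50000, 100000, runtime.NumCPU()))
}
```

Under these assumptions, a fractional limit such as 2500m yields two threads rather than three, which keeps the thread count at or below the CPU the container is allowed to use.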

Webhook Performance

Max Serving Threads

The --max-serving-threads command line flag caps the number of concurrent goroutines that are calling out to policy evaluation at any one time. This can be important for two reasons:

Excessive numbers of serving goroutines can lead to CPU starvation, which means there is not enough CPU to go around per goroutine, causing requests to time out.
Each serving goroutine can require a non-trivial amount of RAM, which will not be freed until the request is finished. This can increase the maximum memory used by the process, which can lead to out-of-memory (OOM) kills.

By default, the number of webhook threads is capped at the value of GOMAXPROCS. If your policies mostly rely on blocking calls (e.g. calling out to external services via http.send() or via external data), CPU starvation is less of a risk, though memory scaling could still be a concern.

Experimenting with this value may help maximize the throughput of Gatekeeper's validating webhook.
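The idea of capping concurrent evaluations can be illustrated with a counting semaphore. The sketch below is not Gatekeeper's implementation; peakConcurrency is a hypothetical function, and the sleep stands in for policy evaluation. It shows that however many requests arrive, no more than the configured limit are evaluated at once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// peakConcurrency runs `requests` simulated reviews while allowing at most
// `limit` of them to evaluate at once, and returns the highest number of
// evaluations observed in flight at the same time.
func peakConcurrency(limit, requests int) int64 {
	sem := make(chan struct{}, limit) // buffered channel used as a counting semaphore
	var inFlight, peak int64
	var wg sync.WaitGroup

	for i := 0; i < requests; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // blocks while `limit` evaluations are running
			defer func() { <-sem }() // free the slot once this evaluation is done

			n := atomic.AddInt64(&inFlight, 1)
			// Record a new peak if this is the most seen so far.
			for {
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			time.Sleep(time.Millisecond) // stand-in for policy evaluation
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	// 50 incoming requests, at most 4 evaluated concurrently.
	fmt.Println("peak in flight:", peakConcurrency(4, 50))
}
```

Requests beyond the cap wait for a slot instead of competing for CPU, which trades some queueing delay for more predictable per-request CPU and memory use.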

Audit

Audit Interval

The --audit-interval flag is used to configure how often audit runs on the cluster.

The time it takes for audit to run is dependent on the size of the cluster, any throttling the K8s API server may do, and the number and complexity of policies to be evaluated. As such, determining the ideal audit interval is use-case-specific.

If you have overlapping audits, the following things can happen:

There will be parallel calls to the policy evaluation backend, which can result in increased RAM usage and CPU starvation, leading to OOMs or to each audit taking longer than it otherwise would.
More requests will be sent to the K8s API server. If these requests are throttled, this can increase the time it takes for an audit to finish.
A newer audit run can pre-empt the reporting of audit results of a previous audit run on the status field of individual constraints. This can lead to constraints not having violation results in their status field. Reports via stdout logging should be unaffected by this.

Ideally, --audit-interval should be set long enough that no more than one audit is running at any time, though occasional overlap should not be harmful.

Constraint Violations Limit

Memory usage scales with --constraint-violations-limit: raising the limit increases memory usage, and lowering it decreases memory usage.

Audit Chunk Size

The --audit-chunk-size flag tells Gatekeeper to request paginated lists of objects from the API server rather than listing all instances at once. Setting this can reduce maximum memory usage, particularly if you have a cluster with a lot of objects of a specific kind, or a particular kind that has very large objects (say config maps).

One caveat about --audit-chunk-size is that the K8s API server returns a resumption token for list requests. This token is only valid for a short window (on the order of minutes), and the listing of all objects for a given kind must be completed before that token expires. Decreasing --audit-chunk-size should decrease maximum memory usage, but may also lead to an increase in requests to the API server. In cases where this leads to throttling, it's possible the resumption token could expire before object listing has completed.
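The trade-off between chunk size, request count, and peak memory can be sketched with a simulated paginated listing. This example does not call the Kubernetes API; the page type, listPage, and listAll are hypothetical stand-ins, and an integer index plays the role of the resumption token.

```go
package main

import "fmt"

// page is a hypothetical stand-in for one paginated list response.
type page struct {
	items     []int
	nextToken int // index to resume from; -1 when the listing is complete
}

// listPage simulates a server holding `total` objects that returns at most
// `limit` of them per request, starting at `token`.
func listPage(total, limit, token int) page {
	end := token + limit
	next := end
	if end >= total {
		end, next = total, -1
	}
	items := make([]int, 0, end-token)
	for i := token; i < end; i++ {
		items = append(items, i)
	}
	return page{items: items, nextToken: next}
}

// listAll walks every page and reports how many requests were made and the
// largest number of objects held in memory at once. A non-positive chunkSize
// models an unpaginated list: everything in one request.
func listAll(total, chunkSize int) (requests, peakHeld int) {
	if chunkSize <= 0 {
		chunkSize = total
	}
	token := 0
	for token >= 0 {
		p := listPage(total, chunkSize, token)
		requests++
		if len(p.items) > peakHeld {
			peakHeld = len(p.items)
		}
		token = p.nextToken
	}
	return requests, peakHeld
}

func main() {
	for _, size := range []int{0, 500, 100} {
		requests, peak := listAll(1000, size)
		fmt.Printf("chunk size %d: %d requests, peak %d objects held\n", size, requests, peak)
	}
}
```

Smaller chunks lower the peak number of objects held at once but multiply the number of requests, and every one of those requests has to complete before the resumption token expires.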

Match Kind Only

The --audit-match-kind-only flag can be helpful in reducing audit runtime, outgoing API requests and memory usage if your constraints are only matching against a specific subset of kinds, particularly if there are large volumes of config that can be ignored due to being out-of-scope. Some caveats:

If the bulk of the K8s objects are resources that are already in-scope for constraints, the benefit will be reduced.
If a constraint is added that matches against all kinds (say a label constraint), the benefit will be eliminated. If you are relying on this flag, it's important to make sure all constraints added to the cluster have spec.match.kinds specified.
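The second caveat can be illustrated with a short sketch of how the set of kinds to audit might be derived from constraints. This is a simplified model, not Gatekeeper's code: the constraint type and kindsToAudit are hypothetical, and a constraint with no kinds listed is assumed to match everything.

```go
package main

import (
	"fmt"
	"sort"
)

// constraint is a hypothetical, simplified view of a constraint that keeps
// only the kinds named in its match block.
type constraint struct {
	name  string
	kinds []string // empty means the constraint matches every kind
}

// kindsToAudit returns the sorted set of kinds that would need to be listed,
// and whether the audit must fall back to listing every kind because at
// least one constraint does not restrict its kinds.
func kindsToAudit(constraints []constraint) (kinds []string, listAll bool) {
	seen := map[string]bool{}
	for _, c := range constraints {
		if len(c.kinds) == 0 {
			return nil, true // one unrestricted constraint removes the benefit
		}
		for _, k := range c.kinds {
			seen[k] = true
		}
	}
	for k := range seen {
		kinds = append(kinds, k)
	}
	sort.Strings(kinds)
	return kinds, false
}

func main() {
	scoped := []constraint{
		{name: "require-probes", kinds: []string{"Pod"}},
		{name: "restrict-lb", kinds: []string{"Service", "Pod"}},
	}
	fmt.Println(kindsToAudit(scoped))

	// Adding a constraint without kinds forces a listing of everything.
	withLabels := append(scoped, constraint{name: "require-labels"})
	fmt.Println(kindsToAudit(withLabels))
}
```

In this model a single constraint without a kinds restriction is enough to force a full listing, which is why every constraint needs its kinds specified for the flag to help.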
