External Data
Feature State: Gatekeeper version v3.7+ (alpha)
❗ This feature is still in alpha stage, so the final form can still change (feedback is welcome!).
✅ Mutation is supported with external data starting from v3.8.0.
Motivation
Gatekeeper provides various means to mutate and validate Kubernetes resources. However, in many of these scenarios the data involved is either built-in, static, or user-defined. The external data feature enables Gatekeeper to interface with external data sources, such as image registries, using a provider-based model.
A similar connection to an external data source can be made today using OPA's built-in http.send functionality. However, this approach has limitations:
- Gatekeeper does not support Rego policies for mutation, so mutation cannot use OPA's built-in http.send function.
- Security concerns:
  - If template authors are not trusted, http.send potentially gives them access to the in-cluster network.
  - If template authors are trusted, they still need to write Rego carefully to avoid injection attacks.
Key benefits provided by the external data solution:
- Addresses security concerns by:
  - Restricting which hosts a user can access.
  - Providing an interface for making requests, which allows Gatekeeper to better handle things like escaping strings.
- Addresses common patterns with a single provider, e.g. image tag-to-digest mutation, which can be leveraged by multiple scenarios (e.g. validating image signatures or vulnerabilities).
- Provider model creates a common interface for extending Gatekeeper with external data.
  - It allows for separation of concerns between the implementation that allows access to external data and the actual policy being evaluated.
  - Developers and consumers of data sources can rely on that common protocol to ease authoring of both constraint templates and data sources.
  - Makes change management easier, as users of an external data provider should be able to tell whether upgrading it will break existing constraint templates. Once the external data API is stable, our goal is to have that answer always be "no".
- Performance benefits, as Gatekeeper can now directly control caching and which values are significant for caching, which increases the likelihood of cache hits.
  - For mutation, we can batch requests via lazy evaluation.
  - For validation, we make batching easier via the design of the external_data Rego function.
Enabling external data support
YAML
You can enable external data support by adding --enable-external-data to the gatekeeper audit and controller-manager deployment arguments.
Helm
You can also enable external data by setting enableExternalData=true when installing or upgrading the Helm chart:
helm install gatekeeper/gatekeeper --name-template=gatekeeper --namespace gatekeeper-system --create-namespace \
--set enableExternalData=true
Dev/Test
For dev/test deployments, use make deploy ENABLE_EXTERNAL_DATA=true
Providers
Providers are designed to be in-cluster components that can communicate with external data sources (such as image registries, Active Directory/LDAP directories, etc.) and return data in a format that can be processed by Gatekeeper.
An example provider template can be found at: https://github.com/open-policy-agent/gatekeeper-external-data-provider
Providers maintained by the community
If you have built an external data provider and would like to add it to this list, please submit a PR to update this page.
If you have any issues with a specific provider, please open an issue in the applicable provider's repository.
The following external data providers are maintained by the community:
Sample providers
The following external data providers are samples and are not supported/maintained by the community:
API (v1alpha1)
Provider
The Provider resource defines the provider and its configuration.
apiVersion: externaldata.gatekeeper.sh/v1alpha1
kind: Provider
metadata:
  name: my-provider
spec:
  url: http://<service-name>.<namespace>:<port>/<endpoint> # URL to the external data source (e.g., http://my-provider.my-namespace:8090/validate)
  timeout: <timeout> # timeout value in seconds (e.g., 1). This is the timeout on the Provider custom resource, not the provider implementation.
ProviderRequest
ProviderRequest is the API request that is sent to the external data provider.
// ProviderRequest is the API request for the external data provider.
type ProviderRequest struct {
	// APIVersion is the API version of the external data provider.
	APIVersion string `json:"apiVersion,omitempty"`
	// Kind is kind of the external data provider API call. This can be "ProviderRequest" or "ProviderResponse".
	Kind ProviderKind `json:"kind,omitempty"`
	// Request contains the request for the external data provider.
	Request Request `json:"request,omitempty"`
}
// Request is the struct that contains the keys to query.
type Request struct {
	// Keys is the list of keys to send to the external data provider.
	Keys []string `json:"keys,omitempty"`
}
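As a rough illustration of what this request looks like on the wire, the sketch below marshals a ProviderRequest to JSON. The struct definitions mirror the ones above; newProviderRequest and encodeRequest are hypothetical helpers, and reusing the externaldata.gatekeeper.sh/v1alpha1 string for the request's apiVersion is an assumption based on the Provider resource shown earlier.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ProviderKind, Request and ProviderRequest mirror the types documented above.
type ProviderKind string

type Request struct {
	Keys []string `json:"keys,omitempty"`
}

type ProviderRequest struct {
	APIVersion string       `json:"apiVersion,omitempty"`
	Kind       ProviderKind `json:"kind,omitempty"`
	Request    Request      `json:"request,omitempty"`
}

// newProviderRequest is a hypothetical helper (not a Gatekeeper API).
// The apiVersion value is an assumption taken from the Provider resource.
func newProviderRequest(keys []string) ProviderRequest {
	return ProviderRequest{
		APIVersion: "externaldata.gatekeeper.sh/v1alpha1",
		Kind:       "ProviderRequest",
		Request:    Request{Keys: keys},
	}
}

// encodeRequest returns the JSON body for a batch of keys.
func encodeRequest(keys []string) (string, error) {
	b, err := json.Marshal(newProviderRequest(keys))
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := encodeRequest([]string{"nginx", "busybox:latest"})
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

All keys travel in a single request.keys array, which is what makes batching several lookups into one call possible.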
ProviderResponse
ProviderResponse is the API response that a provider must return.
// ProviderResponse is the API response from a provider.
type ProviderResponse struct {
	// APIVersion is the API version of the external data provider.
	APIVersion string `json:"apiVersion,omitempty"`
	// Kind is kind of the external data provider API call. This can be "ProviderRequest" or "ProviderResponse".
	Kind ProviderKind `json:"kind,omitempty"`
	// Response contains the response from the provider.
	Response Response `json:"response,omitempty"`
}
// Response is the struct that holds the response from a provider.
type Response struct {
	// Idempotent indicates that the responses from the provider are idempotent.
	// Applies to mutation only and must be true for mutation.
	Idempotent bool `json:"idempotent,omitempty"`
	// Items contains the key, value and error from the provider.
	Items []Item `json:"items,omitempty"`
	// SystemError is the system error of the response.
	SystemError string `json:"systemError,omitempty"`
}
// Item is the struct that contains the key, value or error from a provider response.
type Item struct {
	// Key is the key that was sent to the provider.
	Key string `json:"key,omitempty"`
	// Value is the response from the provider.
	Value interface{} `json:"value,omitempty"`
	// Error is the error from the provider.
	Error string `json:"error,omitempty"`
}
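To make the roles of the response fields more concrete, here is a minimal sketch of how a consumer might decode a ProviderResponse and decide whether it is usable for mutation. The function checkForMutation is hypothetical and is not how Gatekeeper itself is implemented; it only encodes the rules stated in the comments above (a system error fails the whole response, and Idempotent must be true for mutation).

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// These types mirror the structs documented above.
type ProviderKind string

type Item struct {
	Key   string      `json:"key,omitempty"`
	Value interface{} `json:"value,omitempty"`
	Error string      `json:"error,omitempty"`
}

type Response struct {
	Idempotent  bool   `json:"idempotent,omitempty"`
	Items       []Item `json:"items,omitempty"`
	SystemError string `json:"systemError,omitempty"`
}

type ProviderResponse struct {
	APIVersion string       `json:"apiVersion,omitempty"`
	Kind       ProviderKind `json:"kind,omitempty"`
	Response   Response     `json:"response,omitempty"`
}

// checkForMutation is a hypothetical helper: it decodes a raw response and
// returns the key/value pairs only if the response is usable for mutation.
func checkForMutation(raw []byte) (map[string]interface{}, error) {
	var pr ProviderResponse
	if err := json.Unmarshal(raw, &pr); err != nil {
		return nil, fmt.Errorf("decoding provider response: %w", err)
	}
	if pr.Response.SystemError != "" {
		return nil, errors.New("system error: " + pr.Response.SystemError)
	}
	if !pr.Response.Idempotent {
		return nil, errors.New("mutation requires an idempotent response")
	}
	values := make(map[string]interface{}, len(pr.Response.Items))
	for _, item := range pr.Response.Items {
		if item.Error != "" {
			return nil, fmt.Errorf("key %q: %s", item.Key, item.Error)
		}
		values[item.Key] = item.Value
	}
	return values, nil
}

func main() {
	raw := []byte(`{"response":{"idempotent":true,"items":[{"key":"nginx","value":"nginx:v1.2.3"}]}}`)
	values, err := checkForMutation(raw)
	fmt.Println(values, err)
}
```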
Implementation
A provider is an HTTP server that listens on a port and responds to a ProviderRequest with a ProviderResponse.
As part of the ProviderResponse, the provider can return a list of items. Each item is a JSON object with the following fields:
- Key: the key that was sent to the provider
- Value: the value that was returned from the provider for that key
- Error: an error message if the provider returned an error for that key
If there is a system error, the provider should return the system error message in the SystemError field.
📎 Provider implementations should keep their own response timeout short, for example a maximum of 1-2 seconds.
Example provider implementation: https://github.com/open-policy-agent/gatekeeper/blob/master/test/externaldata/dummy-provider/provider.go
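The request/response flow described in this section can be sketched as a small HTTP handler. This is an illustrative sketch only, not the dummy provider linked above: the tags map is hypothetical sample data standing in for a real external source, buildResponse and handle are made-up names, and main exercises the handler through an in-process test server instead of listening on a real port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
	"strings"
)

const apiVersion = "externaldata.gatekeeper.sh/v1alpha1"

// The types below mirror the API structs documented earlier.
type Request struct {
	Keys []string `json:"keys,omitempty"`
}

type ProviderRequest struct {
	APIVersion string  `json:"apiVersion,omitempty"`
	Kind       string  `json:"kind,omitempty"`
	Request    Request `json:"request,omitempty"`
}

type Item struct {
	Key   string      `json:"key,omitempty"`
	Value interface{} `json:"value,omitempty"`
	Error string      `json:"error,omitempty"`
}

type Response struct {
	Idempotent  bool   `json:"idempotent,omitempty"`
	Items       []Item `json:"items,omitempty"`
	SystemError string `json:"systemError,omitempty"`
}

type ProviderResponse struct {
	APIVersion string   `json:"apiVersion,omitempty"`
	Kind       string   `json:"kind,omitempty"`
	Response   Response `json:"response,omitempty"`
}

// tags is hypothetical sample data. The resolved value maps to itself so
// that looking up an already-resolved image returns the same answer.
var tags = map[string]string{
	"nginx":        "nginx:v1.2.3",
	"nginx:v1.2.3": "nginx:v1.2.3",
}

// buildResponse answers every key with either a value or a per-key error.
func buildResponse(req ProviderRequest) ProviderResponse {
	resp := ProviderResponse{APIVersion: apiVersion, Kind: "ProviderResponse"}
	resp.Response.Idempotent = true
	for _, key := range req.Request.Keys {
		item := Item{Key: key}
		if v, ok := tags[key]; ok {
			item.Value = v
		} else {
			item.Error = "no entry for key " + key
		}
		resp.Response.Items = append(resp.Response.Items, item)
	}
	return resp
}

// handle decodes a ProviderRequest and writes a ProviderResponse.
// A request that cannot be decoded is reported through SystemError.
func handle(w http.ResponseWriter, r *http.Request) {
	var req ProviderRequest
	var resp ProviderResponse
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		resp = ProviderResponse{APIVersion: apiVersion, Kind: "ProviderResponse"}
		resp.Response.SystemError = "unable to decode request: " + err.Error()
	} else {
		resp = buildResponse(req)
	}
	w.Header().Set("Content-Type", "application/json")
	if err := json.NewEncoder(w).Encode(resp); err != nil {
		log.Printf("writing response: %v", err)
	}
}

func main() {
	srv := httptest.NewServer(http.HandlerFunc(handle))
	defer srv.Close()

	body := `{"apiVersion":"externaldata.gatekeeper.sh/v1alpha1","kind":"ProviderRequest","request":{"keys":["nginx","unknown"]}}`
	res, err := http.Post(srv.URL, "application/json", strings.NewReader(body))
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	out, err := io.ReadAll(res.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out))
}
```

Keeping the lookup logic in buildResponse, separate from the HTTP plumbing, makes the per-key error handling easy to test without a running server.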
External data for Gatekeeper validating webhook
External data adds a custom OPA built-in function called external_data to Rego. This function is used to query external data providers.
external_data is a function that takes a request and returns a response. The request is a JSON object with the following fields:
- provider: the name of the provider to query
- keys: the list of keys to send to the provider
e.g.,
# build a list of keys containing images for batching
my_list := [img | img = input.review.object.spec.template.spec.containers[_].image]
# send external data request
response := external_data({"provider": "my-provider", "keys": my_list})
Response example: [["my-key", "my-value", ""], ["another-key", 42, ""], ["bad-key", "", "error message"]]
📎 To avoid multiple calls to the same provider, batch the keys into a single list and send one request.
Example template: https://github.com/open-policy-agent/gatekeeper/blob/master/test/externaldata/dummy-provider/policy/template.yaml
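The shape of the response example above can be explored outside of Rego with a short Go sketch. It assumes each inner array is a [key, value, error] triple, which is how the example reads but is not spelled out in the text; splitResponse is a hypothetical helper, not part of Gatekeeper or OPA.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// splitResponse separates successful lookups from failed ones.
// Assumption: every row is a [key, value, error] triple, and a non-empty
// third element means the lookup for that key failed.
func splitResponse(raw string) (map[string]interface{}, map[string]string, error) {
	var rows [][]interface{}
	if err := json.Unmarshal([]byte(raw), &rows); err != nil {
		return nil, nil, fmt.Errorf("decoding response: %w", err)
	}
	values := make(map[string]interface{})
	failures := make(map[string]string)
	for _, row := range rows {
		if len(row) != 3 {
			return nil, nil, fmt.Errorf("expected 3 elements per row, got %d", len(row))
		}
		key, ok := row[0].(string)
		if !ok {
			return nil, nil, fmt.Errorf("key %v is not a string", row[0])
		}
		if msg, _ := row[2].(string); msg != "" {
			failures[key] = msg
			continue
		}
		values[key] = row[1]
	}
	return values, failures, nil
}

func main() {
	raw := `[["my-key", "my-value", ""], ["another-key", 42, ""], ["bad-key", "", "error message"]]`
	values, failures, err := splitResponse(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("values:", values)
	fmt.Println("failures:", failures)
}
```

Note that values are not restricted to strings: the second row carries a number, which Go's JSON decoder represents as a float64.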
External data for Gatekeeper mutating webhook
External data can be used in conjunction with Gatekeeper mutating webhook.
API
You can specify the details of the external data provider in the spec.parameters.assign.externalData field of AssignMetadata and Assign.
Note: spec.parameters.assign.externalData, spec.parameters.assign.value and spec.parameters.assign.fromMetadata are mutually exclusive.
Field | Type | Description |
---|---|---|
provider | String | The name of the external data provider. |
dataSource | DataSource | Specifies where to extract the data that will be sent to the external data provider. ValueAtLocation (default): extracts an array of values from the path that will be modified; see mutation intent for more details. Username: the name of the Kubernetes user who initiated the admission request. |
failurePolicy | FailurePolicy | The policy to apply when the external data provider returns an error. UseDefault: use the default value specified in spec.parameters.assign.externalData.default. Ignore: ignore the error and do not perform any mutations. Fail (default): do not perform any mutations and return the error to the user. |
default | String | The default value to use when the external data provider returns an error and the failure policy is set to UseDefault. |
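The three failure policies in the table can be summarized as a small decision function. The sketch below is an illustration of the documented behavior under simplifying assumptions, not Gatekeeper's implementation: resolve, its return values, and errProvider are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// FailurePolicy mirrors the values listed in the table above.
type FailurePolicy string

const (
	UseDefault FailurePolicy = "UseDefault"
	Ignore     FailurePolicy = "Ignore"
	Fail       FailurePolicy = "Fail"
)

// errProvider is a hypothetical error standing in for a provider failure.
var errProvider = errors.New("provider returned an error")

// resolve decides what to do with a provider result. It returns the value
// to assign, whether the mutation should be applied, and the error to
// surface to the user (if any).
func resolve(policy FailurePolicy, def, value string, providerErr error) (string, bool, error) {
	if providerErr == nil {
		return value, true, nil
	}
	switch policy {
	case UseDefault:
		// Fall back to spec.parameters.assign.externalData.default.
		return def, true, nil
	case Ignore:
		// Swallow the error and leave the object untouched.
		return "", false, nil
	case Fail, "":
		// Fail is the default when no policy is set.
		return "", false, providerErr
	default:
		return "", false, fmt.Errorf("unknown failure policy %q", policy)
	}
}

func main() {
	for _, p := range []FailurePolicy{UseDefault, Ignore, Fail} {
		value, mutate, err := resolve(p, "busybox:latest", "", errProvider)
		fmt.Printf("%s: value=%q mutate=%t err=%v\n", p, value, mutate, err)
	}
}
```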
AssignMetadata
apiVersion: mutations.gatekeeper.sh/v1beta1
kind: AssignMetadata
metadata:
  name: annotate-owner
spec:
  match:
    scope: Namespaced
    kinds:
    - apiGroups: ["*"]
      kinds: ["Pod"]
  location: "metadata.annotations.owner"
  parameters:
    assign:
      externalData:
        provider: my-provider
        dataSource: Username
Provider response
{
  "apiVersion": "externaldata.gatekeeper.sh/v1alpha1",
  "kind": "ProviderResponse",
  "response": {
    "idempotent": true,
    "items": [
      {
        "key": "kubernetes-admin",
        "value": "admin@example.com"
      }
    ]
  }
}
Mutated object
...
metadata:
  annotations:
    owner: admin@example.com
...
Assign
apiVersion: mutations.gatekeeper.sh/v1beta1
kind: Assign
metadata:
  name: mutate-images
spec:
  applyTo:
  - groups: [""]
    kinds: ["Pod"]
    versions: ["v1"]
  match:
    scope: Namespaced
  location: "spec.containers[name:*].image"
  parameters:
    assign:
      externalData:
        provider: my-provider
        dataSource: ValueAtLocation
        failurePolicy: UseDefault
        default: busybox:latest
Provider response
{
  "apiVersion": "externaldata.gatekeeper.sh/v1alpha1",
  "kind": "ProviderResponse",
  "response": {
    "idempotent": true,
    "items": [
      {
        "key": "nginx",
        "value": "nginx:v1.2.3"
      }
    ]
  }
}
Mutated object
...
spec:
  containers:
  - name: nginx
    image: nginx:v1.2.3
...
Limitations
There are several limitations when using external data with the mutating webhook:
- Only mutation of string fields is supported (e.g. .spec.containers[name:*].image).
- AssignMetadata only supports dataSource: Username because AssignMetadata only supports creation of metadata.annotations and metadata.labels. dataSource: ValueAtLocation will not return any data.
- ModifySet does not support external data.
- Multiple mutations to the same object are applied alphabetically based on the name of the mutation CRDs. If you have an external data mutation and a non-external data mutation with the same spec.location, the final result might not be what you expected. Currently, there is no way to enforce custom ordering of mutations, but the issue is being tracked.
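The ordering limitation can be illustrated with a deliberately simplified model: each mutation overwrites the value at a shared location, and mutations are applied in order of their names. Everything in this sketch is hypothetical, including the mutation type, the pin-registry mutation, and the use of plain byte-wise string comparison as a stand-in for "alphabetically".

```go
package main

import (
	"fmt"
	"sort"
)

// mutation is a simplified stand-in for a mutation CRD that assigns a
// value to one fixed location.
type mutation struct {
	Name  string
	Value string
}

// applyInNameOrder applies all mutations sorted by name and returns the
// final value at the shared location. The last name in sort order wins.
func applyInNameOrder(initial string, muts []mutation) string {
	sorted := append([]mutation(nil), muts...) // copy so the input is not reordered
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Name < sorted[j].Name })
	result := initial
	for _, m := range sorted {
		result = m.Value
	}
	return result
}

func main() {
	muts := []mutation{
		{Name: "pin-registry", Value: "registry.example.com/nginx"}, // hypothetical static mutation
		{Name: "mutate-images", Value: "nginx:v1.2.3"},              // external data mutation
	}
	fmt.Println(applyInNameOrder("nginx", muts))
}
```

In this model, renaming a mutation changes which one takes effect last, which is the kind of surprise the limitation warns about.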