ADR-0028 Federated Authentication for AWS
Publication Date | 2021-11-24 |
---|---|
Last Update | 2021-11-24 |
Status | Accepted |
Context
In several places in the architecture, Cloud Functions triggered by writes to the asset store call AWS Step Functions to run tasks provided by Metail, such as avatar creation and composition. These calls go to an AWS API that requires AWS credentials for authentication.
Our initial thought was to create an AWS IAM user for eTryOn and store the AWS access keys in the Google Secret Manager where they could be read by the Cloud Function. From a security point of view, it is not desirable to share keys across systems in this way as there is a risk of exposure and it makes key rotation and revocation difficult.
A better approach is to use AWS’s AssumeRoleWithWebIdentity. This allows the Cloud Function to authenticate with AWS using a JSON Web Token issued by GCP and retrieve temporary access keys that can be used to call AWS services.
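To make the exchange concrete, the following is a minimal sketch of what an AssumeRoleWithWebIdentity call looks like on the wire: a set of query parameters carrying the token, and an XML response carrying the temporary credentials. In practice the AWS SDK does this for us. The helper names, the session name and the sample response values are assumptions made for illustration.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"net/url"
)

// sampleResponse is a made-up response body in the shape returned by STS.
const sampleResponse = `<AssumeRoleWithWebIdentityResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">
  <AssumeRoleWithWebIdentityResult>
    <Credentials>
      <AccessKeyId>ASIAEXAMPLEKEYID</AccessKeyId>
      <SecretAccessKey>example-secret</SecretAccessKey>
      <SessionToken>example-session-token</SessionToken>
      <Expiration>2021-11-24T13:00:00Z</Expiration>
    </Credentials>
  </AssumeRoleWithWebIdentityResult>
</AssumeRoleWithWebIdentityResponse>`

// tempCredentials holds the temporary credentials found in the response.
type tempCredentials struct {
	AccessKeyID     string `xml:"AccessKeyId"`
	SecretAccessKey string `xml:"SecretAccessKey"`
	SessionToken    string `xml:"SessionToken"`
	Expiration      string `xml:"Expiration"`
}

// assumeRoleResponse picks the Credentials element out of the response document.
type assumeRoleResponse struct {
	Credentials tempCredentials `xml:"AssumeRoleWithWebIdentityResult>Credentials"`
}

// buildAssumeRoleParams (hypothetical helper) returns the query parameters for
// the call. The request is not signed with AWS keys: the web identity token is
// the proof of identity.
func buildAssumeRoleParams(roleARN, sessionName, token string) url.Values {
	return url.Values{
		"Action":           {"AssumeRoleWithWebIdentity"},
		"Version":          {"2011-06-15"},
		"RoleArn":          {roleARN},
		"RoleSessionName":  {sessionName},
		"WebIdentityToken": {token},
	}
}

// parseCredentials (hypothetical helper) extracts the temporary credentials
// from a response body.
func parseCredentials(body []byte) (tempCredentials, error) {
	var r assumeRoleResponse
	if err := xml.Unmarshal(body, &r); err != nil {
		return tempCredentials{}, fmt.Errorf("parsing STS response: %w", err)
	}
	if r.Credentials.AccessKeyID == "" {
		return tempCredentials{}, errors.New("response contains no credentials")
	}
	return r.Credentials, nil
}

func main() {
	params := buildAssumeRoleParams("arn:aws:iam::123456789012:role/etryon", "etryon-gcf", "header.payload.signature")
	fmt.Println(params.Encode())

	creds, err := parseCredentials([]byte(sampleResponse))
	if err != nil {
		panic(err)
	}
	fmt.Println(creds.AccessKeyID, "expires", creds.Expiration)
}
```

The point of the sketch is that no long-lived AWS secret appears anywhere in the request; everything that comes back has an expiry time.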
An application running in Google Cloud runs under a specific Service Account and can request an identity token for the Service Account from the GCP metadata service. The AWS Security Token Service has built-in support for verifying these tokens - all we need to do is create an AWS IAM role with a trust policy that allows the Google Service Account to assume the role.
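When setting up the trust policy it helps to see which claims a token actually carries. The sketch below decodes the payload segment of a JSON Web Token without verifying the signature, so it is only suitable for inspection and debugging. The function names and the sample claim values are assumptions; a real token comes from the metadata service.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// idTokenClaims lists the claims that matter for the AWS trust policy.
type idTokenClaims struct {
	Issuer   string `json:"iss"`
	Audience string `json:"aud"`
	Subject  string `json:"sub"`
}

// decodeClaims (hypothetical helper) reads the claims from the middle segment
// of a JWT. It does NOT verify the signature; AWS STS does that for real calls.
func decodeClaims(token string) (idTokenClaims, error) {
	var c idTokenClaims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("token does not have three dot-separated segments")
	}
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return c, fmt.Errorf("decoding payload: %w", err)
	}
	if err := json.Unmarshal(payload, &c); err != nil {
		return c, fmt.Errorf("parsing claims: %w", err)
	}
	return c, nil
}

// fakeToken (hypothetical helper) builds an unsigned token around the given
// JSON payload so the example can run without access to GCP.
func fakeToken(payload string) string {
	header := base64.RawURLEncoding.EncodeToString([]byte(`{}`))
	body := base64.RawURLEncoding.EncodeToString([]byte(payload))
	return header + "." + body + ".signature"
}

func main() {
	// The claim values below are invented for illustration.
	token := fakeToken(`{"iss":"https://accounts.google.com","aud":"aws-trust-1","sub":"100000000000000000001"}`)
	claims, err := decodeClaims(token)
	if err != nil {
		panic(err)
	}
	fmt.Printf("iss=%s aud=%s sub=%s\n", claims.Issuer, claims.Audience, claims.Subject)
}
```

The aud and sub values printed here are the ones the trust policy conditions in the appendix are compared against.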
Decision
We will create an AWS IAM role for eTryOn Cloud Functions and configure the trust policy to allow federated authentication from GCP. The trust policy will be configured to allow only specific GCP Service Accounts to assume the role. We will grant the IAM role permission to invoke the Step Functions that Metail is providing for the eTryOn project.
Consequences
We will not need to store AWS access keys in GCP.
Cloud Functions running in GCP will be able to retrieve temporary credentials for invoking AWS services.
Cloud Functions can use the AWS SDK to create a credentials cache to manage credentials. This means they only need to authenticate and retrieve new credentials when the temporary credentials expire.
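The caching behaviour described above can be illustrated with a small stand-alone sketch. This is not the AWS SDK's implementation, only the idea behind it: reuse credentials until they are close to expiry, then fetch new ones. All type and function names, the five-minute refresh window and the one-hour lifetime are assumptions.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// tempCreds is a stand-in for temporary AWS credentials.
type tempCreds struct {
	AccessKeyID string
	Expires     time.Time
}

// credsCache returns cached credentials until they are within window of
// expiring, then calls fetch again.
type credsCache struct {
	mu     sync.Mutex
	fetch  func() (tempCreds, error) // e.g. a token exchange with AWS
	now    func() time.Time          // injectable clock, for testing
	window time.Duration             // refresh this long before expiry
	cur    tempCreds
}

// Get returns valid credentials, fetching new ones only when needed.
func (c *credsCache) Get() (tempCreds, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.cur.AccessKeyID != "" && c.now().Add(c.window).Before(c.cur.Expires) {
		return c.cur, nil
	}
	creds, err := c.fetch()
	if err != nil {
		return tempCreds{}, err
	}
	c.cur = creds
	return creds, nil
}

// demoFetchCount simulates three calls over roughly an hour and reports how
// many of them had to fetch new credentials.
func demoFetchCount() int {
	now := time.Date(2021, 11, 24, 12, 0, 0, 0, time.UTC)
	fetches := 0
	cache := &credsCache{
		window: 5 * time.Minute,
		now:    func() time.Time { return now },
		fetch: func() (tempCreds, error) {
			fetches++
			return tempCreds{AccessKeyID: fmt.Sprintf("key-%d", fetches), Expires: now.Add(time.Hour)}, nil
		},
	}
	cache.Get()                     // 12:00, nothing cached yet: fetch
	now = now.Add(30 * time.Minute) // 12:30, credentials valid until 13:00
	cache.Get()                     // served from the cache
	now = now.Add(31 * time.Minute) // 13:01, credentials have expired
	cache.Get()                     // fetch again
	return fetches
}

func main() {
	fmt.Println("fetches:", demoFetchCount()) // two of the three calls fetch
}
```

Because Cloud Function instances are reused between invocations, a cache held in a package-level client lets most invocations skip the token exchange.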
If we require more granular permissions, we can create an AWS IAM role per Cloud Function, each with just the permissions it needs.
Appendix
Terraform for AWS IAM Role
The following Terraform snippet creates an AWS IAM role with a trust policy allowing specific GCP Service Accounts to assume the role. The Service Account IDs, which are matched against the sub claim of the identity token, can be found in the Google Cloud Console.
variable "allow_gcp_service_accounts" {
  type        = list(string)
  description = "List of GCP service account ids that are allowed to assume the eTryOn role"
}

data "aws_iam_policy_document" "gcp_assume_role" {
  statement {
    principals {
      type        = "Federated"
      identifiers = ["accounts.google.com"]
    }
    actions = [
      "sts:AssumeRoleWithWebIdentity"
    ]
    condition {
      test     = "StringEquals"
      variable = "accounts.google.com:oaud"
      values   = ["aws-trust-1"]
    }
    condition {
      test     = "StringEquals"
      variable = "accounts.google.com:sub"
      values   = var.allow_gcp_service_accounts
    }
  }
}

resource "aws_iam_role" "etryon" {
  name               = "etryon"
  assume_role_policy = data.aws_iam_policy_document.gcp_assume_role.json
}
Golang Example
The following snippet shows how to retrieve an identity token from the GCP metadata service. It creates a custom type that implements the GetIdentityToken() method of the stscreds.IdentityTokenRetriever interface:
// MetadataServiceTokenRetriever satisfies the stscreds.IdentityTokenRetriever interface.
// It retrieves an identity token for the specified audience from the Google metadata service.
type MetadataServiceTokenRetriever struct {
	Audience string
}

// tokenURL is the URL for retrieving a token from the metadata service
const tokenURL = "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience="

// GetIdentityToken retrieves a token for the service account from the metadata service
func (p *MetadataServiceTokenRetriever) GetIdentityToken() ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, tokenURL+url.QueryEscape(p.Audience), nil)
	if err != nil {
		return nil, err
	}
	// The metadata service requires this header on every request.
	req.Header.Add("Metadata-Flavor", "Google")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	// Do not pass an error page to AWS as if it were a token.
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("metadata service returned %s", resp.Status)
	}
	b, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	return b, nil
}
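The metadata service is only reachable from inside GCP, so it can be useful to exercise the retrieval logic against a local stand-in. The following sketch uses net/http/httptest for that. The fetchIdentityToken function with its injectable base URL and the fake server are assumptions for illustration; the fake only models the Metadata-Flavor header requirement and the audience parameter, not the real service's full behaviour.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// fetchIdentityToken (hypothetical helper) requests an identity token for the
// given audience. Unlike the snippet above, the base URL is a parameter so
// that a test server can be substituted.
func fetchIdentityToken(client *http.Client, baseURL, audience string) ([]byte, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"?audience="+url.QueryEscape(audience), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	resp, err := client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("metadata service returned %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// newFakeMetadataServer returns a local server that rejects requests without
// the Metadata-Flavor header and otherwise echoes the audience in a fake token.
func newFakeMetadataServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("Metadata-Flavor") != "Google" {
			http.Error(w, "missing Metadata-Flavor header", http.StatusForbidden)
			return
		}
		fmt.Fprintf(w, "fake-token-for-%s", r.URL.Query().Get("audience"))
	}))
}

func main() {
	srv := newFakeMetadataServer()
	defer srv.Close()

	token, err := fetchIdentityToken(srv.Client(), srv.URL, "aws-trust-1")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(token))
}
```

Making the URL injectable is a design choice for testability only; inside a Cloud Function the fixed metadata URL shown earlier is what would be used.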
The MetadataServiceTokenRetriever is passed as an argument to stscreds.NewWebIdentityRoleProvider. The role is the ARN of the role we want to assume, and the audience must match the audience configured in the assume role policy.
// LoadConfig returns an AWS configuration with temporary credentials for the
// specified role using an id token for the Cloud Function service account.
// The AWS role must be configured with a trust policy that allows this federated
// identity to assume the desired role.
func LoadConfig(region string, audience string, role string) (aws.Config, error) {
	// The first config only provides the region for the STS client.
	cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithRegion(region))
	if err != nil {
		return aws.Config{}, fmt.Errorf("error loading default config: %w", err)
	}
	// The cache reuses the temporary credentials until they expire.
	appCreds := aws.NewCredentialsCache(stscreds.NewWebIdentityRoleProvider(
		sts.NewFromConfig(cfg),
		role,
		&MetadataServiceTokenRetriever{Audience: audience},
	))
	cfg, err = config.LoadDefaultConfig(context.TODO(), config.WithRegion(region), config.WithCredentialsProvider(appCreds))
	if err != nil {
		return aws.Config{}, fmt.Errorf("error loading config with WebIdentityRoleProvider: %w", err)
	}
	return cfg, nil
}
This functionality has been encapsulated in the gitlab.com/etryon/shared/gcf-utils.git/aws package so a Cloud Function that needs access to AWS resources is easily implemented:
package assumerole

import (
	"context"
	"fmt"
	"io"
	"log"
	"net/http"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/service/s3"
	awsutil "gitlab.com/etryon/shared/gcf-utils.git/aws"
)

const (
	awsRegion = "eu-west-1"
	roleARN   = "arn:aws:iam::123456789012:role/etryon"
	audience  = "aws-trust-1"
	outBucket = "some-s3-bucket"
	outKey    = "test/plaiceholder.jpg"
)

// s3cli is created once per function instance and reused across invocations.
var s3cli *s3.Client

func init() {
	cfg, err := awsutil.LoadConfig(awsRegion, audience, roleARN)
	if err != nil {
		panic(err)
	}
	s3cli = s3.NewFromConfig(cfg)
}

// Handler streams the S3 object to the caller as a JPEG image.
func Handler(w http.ResponseWriter, _ *http.Request) {
	obj, err := s3cli.GetObject(context.TODO(), &s3.GetObjectInput{
		Bucket: aws.String(outBucket),
		Key:    aws.String(outKey),
	})
	if err != nil {
		http.Error(w, fmt.Sprintf("Error retrieving S3 object: %v", err), http.StatusInternalServerError)
		return
	}
	defer obj.Body.Close()
	w.Header().Set("Content-Type", "image/jpeg")
	w.WriteHeader(http.StatusOK)
	// The status has already been sent, so a copy failure can only be logged.
	if _, err := io.Copy(w, obj.Body); err != nil {
		log.Printf("Error writing S3 object to response: %v", err)
	}
}