ADR-0028 Federated Authentication for AWS

Date: 2021-11-24
Status: Accepted

Context

In several places in the architecture, Cloud Functions triggered by writes to the asset store call AWS Step Functions to run tasks provided by Metail, such as avatar creation and composition. These calls go to an AWS API that requires AWS credentials (access keys) for authentication.

Our initial thought was to create an AWS IAM user for eTryOn and store its AWS access keys in Google Secret Manager, where the Cloud Function could read them. From a security point of view, sharing long-lived keys across systems in this way is undesirable: it risks exposing the keys and makes key rotation and revocation difficult.

A better approach is to use the AWS Security Token Service (STS) AssumeRoleWithWebIdentity API. This allows the Cloud Function to authenticate with AWS using a JSON Web Token (JWT) issued by GCP and retrieve temporary access keys that can be used to call AWS services.

An application running in Google Cloud runs as a specific Service Account and can request an identity token for that Service Account from the GCP metadata service. The AWS Security Token Service has built-in support for verifying these tokens, so all we need to do is create an AWS IAM role with a trust policy that allows the Google Service Account to assume the role.

sequenceDiagram
    participant Cloud Function
    participant GCP Metadata Service
    participant AWS Security Token Service
    participant AWS Service
    Cloud Function ->> GCP Metadata Service: Request id token for Service Account
    GCP Metadata Service -->> Cloud Function: id token
    Cloud Function ->> AWS Security Token Service: AssumeRoleWithWebIdentity using id token
    AWS Security Token Service -->> Cloud Function: temporary credentials
    Cloud Function ->> AWS Service: API call using temporary credentials

Decision

We will create an AWS IAM role for eTryOn Cloud Functions and configure the trust policy to allow federated authentication from GCP. The trust policy will be configured to allow only specific GCP Service Accounts to assume the role. We will grant the IAM role permission to invoke the Step Functions that Metail is providing for the eTryOn project.
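The trust policy described here amounts to two exact-match checks on the token's claims. The sketch below models them locally to make the logic concrete; trustPolicy and allows are hypothetical names, and this illustrates the intent of the policy rather than how AWS actually evaluates it. It assumes, per our reading of the AWS documentation for Google federation, that the accounts.google.com:oaud condition key is compared with the token's aud claim and accounts.google.com:sub with its sub claim.

```go
package main

// trustPolicy mirrors the two StringEquals conditions in the role's trust
// policy. Local illustration only; AWS evaluates the real policy.
type trustPolicy struct {
	Audience        string   // expected value for accounts.google.com:oaud
	AllowedSubjects []string // allowed values for accounts.google.com:sub
}

// allows reports whether a token with the given aud and sub claims would
// satisfy both conditions: the audience must match exactly AND the subject
// must be one of the allowed Service Account IDs.
func (p trustPolicy) allows(aud, sub string) bool {
	if aud != p.Audience {
		return false
	}
	for _, s := range p.AllowedSubjects {
		if s == sub {
			return true
		}
	}
	return false
}
```

Both conditions must hold, so adding a new Cloud Function means both requesting the agreed audience and adding its Service Account ID to the allowed list.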

Consequences

We will not need to store AWS access keys in GCP.

Cloud Functions running in GCP will be able to retrieve temporary credentials for invoking AWS services.

Cloud Functions can use the AWS SDK's credentials cache (aws.NewCredentialsCache in the AWS SDK for Go v2) to manage credentials, so they only need to authenticate and retrieve new credentials when the temporary credentials expire.

If we require more granular permissions, we can create an AWS IAM role per Cloud Function, each with just the permissions it needs.
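With one role per Cloud Function, each function needs to know which role ARN to assume. A minimal sketch, assuming a hypothetical naming convention of etryon-<function-name> that this ADR does not decide; functionRoleARN is a hypothetical helper that also checks IAM's role-name rules (at most 64 characters drawn from letters, digits and +=,.@_-) and a 12-digit AWS account ID.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	accountIDPattern = regexp.MustCompile(`^[0-9]{12}$`)
	roleNamePattern  = regexp.MustCompile(`^[A-Za-z0-9+=,.@_-]+$`)
)

// functionRoleARN builds the ARN of a per-function role under the
// hypothetical "etryon-<function>" naming convention.
func functionRoleARN(accountID, function string) (string, error) {
	if !accountIDPattern.MatchString(accountID) {
		return "", fmt.Errorf("AWS account ID must be 12 digits, got %q", accountID)
	}
	name := "etryon-" + function
	if len(name) > 64 || !roleNamePattern.MatchString(function) {
		return "", fmt.Errorf("invalid IAM role name %q", name)
	}
	return fmt.Sprintf("arn:aws:iam::%s:role/%s", accountID, name), nil
}
```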

Appendix

Terraform for AWS IAM Role

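The following Terraform snippet creates an AWS IAM role with a trust policy that allows specific GCP Service Accounts to assume the role. The Service Account IDs can be found in the Google Cloud Console; they are the numeric unique IDs that appear in the token's sub claim, not the Service Account email addresses.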

variable "allow_gcp_service_accounts" {
  type        = list(string)
  description = "List of GCP service account ids that are allowed to assume the eTryOn role"
}

data "aws_iam_policy_document" "gcp_assume_role" {
  statement {
    principals {
      type        = "Federated"
      identifiers = ["accounts.google.com"]
    }
    actions = [
      "sts:AssumeRoleWithWebIdentity"
    ]
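    # Only accept tokens requested for this audience; it must match the
    # audience the Cloud Function requests from the metadata service. For
    # Google tokens that carry an azp claim, AWS compares
    # accounts.google.com:oaud with the token's aud claim.
    condition {
      test     = "StringEquals"
      variable = "accounts.google.com:oaud"
      values   = ["aws-trust-1"]
    }
    # Only accept tokens whose sub claim is one of the allowed Service
    # Account unique IDs.
    condition {
      test     = "StringEquals"
      variable = "accounts.google.com:sub"
      values   = var.allow_gcp_service_accounts
    }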
  }
}

resource "aws_iam_role" "etryon" {
  name               = "etryon"
  assume_role_policy = data.aws_iam_policy_document.gcp_assume_role.json
}

Golang Example

The following snippet shows how to retrieve an identity token from the GCP metadata service. It creates a custom type that implements the GetIdentityToken() method of the stscreds.IdentityTokenRetriever interface:

import (
        "fmt"
        "io"
        "net/http"
        "net/url"
)

// MetadataServiceTokenRetriever satisfies the stscreds.IdentityTokenRetriever interface.
// It retrieves an identity token for the specified audience from the Google metadata service.
type MetadataServiceTokenRetriever struct {
        Audience string
}

// tokenURL is the URL for retrieving an identity token for the default service account
// from the metadata service; the URL-escaped audience is appended to it.
const tokenURL = "http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/identity?audience="

// GetIdentityToken retrieves a token for the service account from the metadata service
func (p *MetadataServiceTokenRetriever) GetIdentityToken() ([]byte, error) {
        req, err := http.NewRequest(http.MethodGet, tokenURL+url.QueryEscape(p.Audience), nil)
        if err != nil {
                return nil, err
        }
        // The metadata service rejects requests without this header.
        req.Header.Add("Metadata-Flavor", "Google")
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
                return nil, err
        }
        defer resp.Body.Close()
        // Without this check, an error response body would be passed to AWS STS as if it were a token.
        if resp.StatusCode != http.StatusOK {
                return nil, fmt.Errorf("metadata service returned %s", resp.Status)
        }
        b, err := io.ReadAll(resp.Body)
        if err != nil {
                return nil, err
        }
        return b, nil
}
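Because MetadataServiceTokenRetriever hard-codes the metadata URL, it can only be exercised on GCP. A hypothetical variant with an overridable endpoint and HTTP client makes it possible to unit-test the request logic against an in-process fake built with net/http/httptest. endpointTokenRetriever and newFakeMetadataServer are illustrative names; the fake only imitates the two behaviours we rely on (the Metadata-Flavor header and the audience parameter) and is not a faithful emulation of the metadata service. The variant still has a GetIdentityToken() ([]byte, error) method, so it could be passed to stscreds.NewWebIdentityRoleProvider in the same way.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// endpointTokenRetriever is a hypothetical, testable variant of
// MetadataServiceTokenRetriever with an overridable endpoint and client.
type endpointTokenRetriever struct {
	Endpoint string // identity endpoint URL without the query string
	Audience string
	Client   *http.Client
}

func (r *endpointTokenRetriever) GetIdentityToken() ([]byte, error) {
	u := r.Endpoint + "?audience=" + url.QueryEscape(r.Audience)
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Metadata-Flavor", "Google")
	resp, err := r.Client.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("metadata server returned %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// newFakeMetadataServer starts an in-process fake that requires the
// Metadata-Flavor header and a non-empty audience, and returns a fake token.
func newFakeMetadataServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		if req.Header.Get("Metadata-Flavor") != "Google" {
			http.Error(w, "missing Metadata-Flavor header", http.StatusForbidden)
			return
		}
		aud := req.URL.Query().Get("audience")
		if aud == "" {
			http.Error(w, "audience required", http.StatusBadRequest)
			return
		}
		fmt.Fprintf(w, "fake-token-for-%s", aud)
	}))
}
```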

The MetadataServiceTokenRetriever is passed as an argument to stscreds.NewWebIdentityRoleProvider. The role is the ARN of the role we want to assume, and the audience must match the audience in the trust policy's accounts.google.com:oaud condition (aws-trust-1 in the Terraform above).

import (
        "context"
        "fmt"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/config"
        "github.com/aws/aws-sdk-go-v2/credentials/stscreds"
        "github.com/aws/aws-sdk-go-v2/service/sts"
)

// LoadConfig returns an AWS configuration with temporary credentials for the
// specified role using an id token for the Cloud Function service account.
// The AWS role must be configured with a trust policy that allows this federated
// identity to assume the desired role.
func LoadConfig(region string, audience string, role string) (aws.Config, error) {
        cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithRegion(region))
        if err != nil {
                return aws.Config{}, fmt.Errorf("error loading default config: %w", err)
        }
        // Exchange the GCP id token for temporary credentials for the role; the
        // cache only calls STS again once the credentials have expired.
        cfg.Credentials = aws.NewCredentialsCache(stscreds.NewWebIdentityRoleProvider(
                sts.NewFromConfig(cfg),
                role,
                &MetadataServiceTokenRetriever{Audience: audience},
        ))
        return cfg, nil
}

This functionality has been encapsulated in the gitlab.com/etryon/shared/gcf-utils.git/aws package, so a Cloud Function that needs access to AWS resources is straightforward to implement. For example, the following function reads an object from S3:

package assumerole

import (
        "context"
        "fmt"
        "io"
        "net/http"

        "github.com/aws/aws-sdk-go-v2/aws"
        "github.com/aws/aws-sdk-go-v2/service/s3"

        awsutil "gitlab.com/etryon/shared/gcf-utils.git/aws"
)

const (
        awsRegion = "eu-west-1"
        roleARN   = "arn:aws:iam::123456789012:role/etryon"
        audience  = "aws-trust-1"
        outBucket = "some-s3-bucket"
        outKey    = "test/plaiceholder.jpg"
)

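var s3cli *s3.Client

// init runs once per Cloud Function instance, so the AWS configuration and its
// credentials cache are shared by all invocations handled by that instance.
func init() {
        cfg, err := awsutil.LoadConfig(awsRegion, audience, roleARN)
        if err != nil {
                panic(err)
        }
        s3cli = s3.NewFromConfig(cfg)
}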

func Handler(w http.ResponseWriter, _ *http.Request) {
        obj, err := s3cli.GetObject(context.TODO(), &s3.GetObjectInput{
                Bucket: aws.String(outBucket),
                Key:    aws.String(outKey),
        })
        if err != nil {
                http.Error(w, fmt.Sprintf("Error retrieving S3 object: %v", err), http.StatusInternalServerError)
                return
        }
        defer obj.Body.Close()
        w.Header().Set("Content-Type", "image/jpeg")
        w.WriteHeader(http.StatusOK)
        io.Copy(w, obj.Body)
}