Our Journey Migrating to AWS IMDSv2

We are heavy users of Amazon Elastic Compute Cloud (EC2) at Slack: we run approximately 60,000 EC2 instances across 17 AWS regions while operating hundreds of AWS accounts. A multitude of teams own and manage our various instances. The Instance Metadata Service (IMDS) is an on-instance component that can be used to gain an…

Archie Gunasekara
13 min read, beginner

Overview

The article discusses Slack's migration from AWS Instance Metadata Service version 1 (IMDSv1) to version 2 (IMDSv2), emphasizing the security enhancements and challenges faced during the transition. It details the methods used to identify IMDSv1 usage, the steps taken to enforce IMDSv2, and the tools developed to ensure compliance across their extensive AWS infrastructure.

What You'll Learn

1. How to implement AWS IMDSv2 in your EC2 instances
2. Why transitioning to IMDSv2 enhances security against SSRF vulnerabilities
3. How to monitor and enforce IMDSv2 usage across multiple AWS accounts

Prerequisites & Requirements

  • Understanding of AWS EC2 and instance provisioning
  • Familiarity with Amazon CloudWatch and Prometheus for monitoring (optional)

Key Questions Answered

What are the main differences between IMDSv1 and IMDSv2?
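IMDSv1 uses a simple request-and-response pattern: any process that can send a plain GET to the metadata endpoint gets an answer, which is exactly what a Server-Side Request Forgery (SSRF) vulnerability lets an attacker do. IMDSv2 is session-oriented. The caller first obtains a session token with a PUT request and must then present that token in a header on every metadata request, which makes it significantly harder to exploit through SSRF.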
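The two-step IMDSv2 exchange can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not code from the article; the helper names (`build_token_request`, `build_metadata_request`, `fetch_metadata`) are hypothetical. The endpoint address and the two header names are the ones AWS documents for IMDSv2.

```python
import urllib.request

IMDS_BASE = "http://169.254.169.254"
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"


def build_token_request(ttl_seconds: int = 21600) -> urllib.request.Request:
    """Step 1: a PUT that asks IMDS for a session token (TTL in seconds)."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/api/token",
        method="PUT",
        headers={TOKEN_TTL_HEADER: str(ttl_seconds)},
    )


def build_metadata_request(path: str, token: str) -> urllib.request.Request:
    """Step 2: a GET for a metadata path that carries the session token."""
    return urllib.request.Request(
        f"{IMDS_BASE}/latest/meta-data/{path.lstrip('/')}",
        headers={TOKEN_HEADER: token},
    )


def fetch_metadata(path: str, timeout: float = 2.0) -> str:
    """Run both steps. This only works when executed on an EC2 instance."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        token = resp.read().decode()
    request = build_metadata_request(path, token)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode()
```

On an instance, `fetch_metadata("instance-id")` would return the instance ID. A typical SSRF bug only lets an attacker trigger a GET without custom headers, so it can perform neither step.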
How did Slack identify instances still using IMDSv1?
Slack used the EC2 CloudWatch metric MetadataNoToken, which counts metadata calls made without a session token (that is, IMDSv1 calls), to track IMDSv1 usage across their instances. They developed an application called imds-cw-metric-collector to map these metrics to instance IDs and alert the owning service teams to make the necessary updates.
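As a rough idea of what such a check might look like, the sketch below queries MetadataNoToken for a list of instances with boto3. It is not the imds-cw-metric-collector application, whose internals are not shown here; the function names, the 24-hour window, and the hourly period are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def metadata_no_token_query(instance_id, hours=24):
    """Build GetMetricStatistics parameters for one instance's IMDSv1 calls."""
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "MetadataNoToken",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 3600,  # one datapoint per hour (assumed granularity)
        "Statistics": ["Sum"],
    }


def imdsv1_call_count(datapoints):
    """Total number of token-less (IMDSv1) calls across all datapoints."""
    return int(sum(dp.get("Sum", 0) for dp in datapoints))


def instances_still_on_v1(instance_ids, region):
    """Return {instance_id: call_count} for instances that made IMDSv1 calls."""
    import boto3  # imported lazily so the helpers above work without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    offenders = {}
    for instance_id in instance_ids:
        resp = cloudwatch.get_metric_statistics(
            **metadata_no_token_query(instance_id)
        )
        count = imdsv1_call_count(resp["Datapoints"])
        if count > 0:
            offenders[instance_id] = count
    return offenders
```

Any instance that appears in the result still has at least one caller using IMDSv1 and would break if tokens were made mandatory.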
What steps did Slack take to enforce IMDSv2 on new instances?
Slack modified their Terraform provisioning tools to disable IMDSv1 for new instances. They created a custom Terraform module to manage the transition and implemented AWS Service Control Policies (SCPs) to block the launching of instances with IMDSv1 enabled.
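A Service Control Policy of the kind described above might look something like the following sketch, which builds the policy document in Python and prints it as JSON. It is modeled on the `ec2:MetadataHttpTokens` condition key that AWS documents for this purpose; the statement ID and function name are hypothetical, and Slack's actual policy may differ.

```python
import json


def build_require_imdsv2_scp():
    """Policy that denies RunInstances unless IMDSv2 tokens are required."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyRunInstancesWithoutImdsV2",  # hypothetical name
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    # Matches any launch where http_tokens is not "required".
                    "StringNotEquals": {"ec2:MetadataHttpTokens": "required"}
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_require_imdsv2_scp(), indent=2))
```

Because the effect is a Deny at the organization level, a policy like this should only be attached once monitoring shows that an account no longer launches instances with IMDSv1 enabled.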
What mechanisms did Slack implement to monitor IMDSv1 usage?
Slack set up a notification system using AWS EventBridge and Lambda to capture EC2 API events related to IMDSv1 usage. This allows them to receive alerts whenever an instance is launched with IMDSv1 enabled, ensuring timely remediation.
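The Lambda side of such a pipeline could be sketched as below. This is an illustrative handler, not Slack's implementation. It assumes the EventBridge rule forwards CloudTrail `RunInstances` events, and the field paths it reads (`requestParameters.metadataOptions.httpTokens` and `responseElements.instancesSet.items[].instanceId`) are assumptions that should be verified against a real captured event before relying on them.

```python
def imdsv1_instances_from_event(event):
    """Return IDs of launched instances whose request did not require tokens."""
    detail = event.get("detail", {})
    if detail.get("eventName") != "RunInstances":
        return []

    # Assumed field path: verify against a real CloudTrail event.
    params = detail.get("requestParameters") or {}
    options = params.get("metadataOptions") or {}
    if options.get("httpTokens") == "required":
        return []

    response = detail.get("responseElements") or {}
    items = (response.get("instancesSet") or {}).get("items", [])
    return [item["instanceId"] for item in items if "instanceId" in item]


def handler(event, context):
    """Lambda entry point: report launches that left IMDSv1 enabled."""
    offenders = imdsv1_instances_from_event(event)
    for instance_id in offenders:
        # A real deployment would notify the owning team here.
        print(f"IMDSv1 enabled on newly launched instance {instance_id}")
    return {"imdsv1_instances": offenders}
```

One limitation of inspecting only the request: a launch template or AMI default can set the token requirement without it appearing in the request parameters, so a sketch like this may flag launches that are in fact compliant.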

Key Statistics & Figures

Number of EC2 instances at Slack
60,000
Slack operates approximately 60,000 EC2 instances across 17 AWS regions.
AWS accounts managed by Slack
hundreds
Slack manages hundreds of AWS accounts for different environments and service teams.

Technologies & Tools

Cloud Computing
AWS EC2
Used for running Slack's extensive infrastructure.
Monitoring
Amazon CloudWatch
Used to track IMDSv1 usage through the MetadataNoToken metric.
Infrastructure As Code
Terraform
Used for provisioning and managing AWS resources.
Event Management
Amazon EventBridge
Used to capture EC2 API events for monitoring IMDSv1 usage.
Serverless Computing
AWS Lambda
Used to process events from EventBridge and send notifications.
Monitoring
Prometheus
Used for collecting and visualizing metrics related to IMDS status.

Key Actionable Insights

1. Transitioning to IMDSv2 is crucial for enhancing security in your AWS environment.
By implementing IMDSv2, you reduce the risk of SSRF attacks, which can lead to unauthorized access to sensitive instance metadata.
2. Utilize AWS CloudWatch metrics to monitor the use of IMDSv1 across your instances.
This proactive monitoring allows you to identify and remediate instances that have not yet transitioned to IMDSv2, ensuring compliance and security.
3. Leverage Terraform modules to manage instance metadata options effectively.
By creating standardized modules, you can enforce IMDSv2 across multiple AWS accounts without disrupting ongoing operations.

Common Pitfalls

1. Failing to monitor and enforce the transition from IMDSv1 to IMDSv2 can lead to security vulnerabilities.
Without proper monitoring, instances may continue to use IMDSv1, exposing sensitive metadata to potential attacks.
2. Overlooking legacy systems that may not support IMDSv2.
Some older instances may require additional tools or manual intervention to transition, which can complicate the migration process.
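For an instance that is already running, manual intervention can be as small as one EC2 API call. The sketch below uses boto3's `modify_instance_metadata_options` to require tokens on an existing instance. The helper names and the dry-run default are assumptions chosen for illustration, and it presumes every caller on the instance already uses IMDSv2.

```python
def require_imdsv2_kwargs(instance_id, hop_limit=None):
    """Build ModifyInstanceMetadataOptions arguments that enforce IMDSv2."""
    kwargs = {
        "InstanceId": instance_id,
        "HttpTokens": "required",   # reject token-less (IMDSv1) calls
        "HttpEndpoint": "enabled",  # keep the metadata service reachable
    }
    if hop_limit is not None:
        # Only touch the hop limit when the caller asks for it.
        kwargs["HttpPutResponseHopLimit"] = hop_limit
    return kwargs


def require_imdsv2(instance_id, region, dry_run=True):
    """Apply the change; with dry_run=True, AWS only checks permissions."""
    import boto3  # imported lazily so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.modify_instance_metadata_options(
        DryRun=dry_run, **require_imdsv2_kwargs(instance_id)
    )
```

With `dry_run=True` the call raises a `ClientError` instead of changing anything, so it is worth confirming the MetadataNoToken metric is at zero for the instance before running it for real.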

Related Concepts

AWS Instance Metadata Service
Server-Side Request Forgery (SSRF)
Infrastructure As Code With Terraform
AWS Security Best Practices