Exam Cram: Designing and Testing an AWS Incident Response Plan (SCS-C03)
This document focuses on the preparation, design, and validation of security incident response (IR) strategies within the AWS Cloud, specifically for the AWS Certified Security – Specialty exam.
## Topic Weighting
| Domain | Weighting |
|---|---|
| Content Domain 2: Incident Response | 20% of Exam |
| Focus: Task 2.1 (Design & Test IR Plan) | High (Crucial for scenario-based questions) |
> [!IMPORTANT]
> The exam heavily tests your ability to choose the correct automation tool (Lambda vs. Systems Manager) and to minimize the blast radius through proactive design.
## Key Concepts Summary
- Runbooks vs. Playbooks:
  - Playbooks: High-level strategic guides (e.g., "What to do if an S3 bucket is public").
  - Runbooks: Specific, technical step-by-step instructions or automated scripts (e.g., a Python script to revoke IAM sessions).
- Preparation Phase: Provisioning access before an incident occurs. Includes cross-account IAM roles, security tooling deployment, and log aggregation (CloudTrail/VPC Flow Logs).
- Minimizing Blast Radius: Using AWS Organizations, SCPs, and isolated VPCs to ensure a compromise in one account doesn't spread.
- AWS Systems Manager (SSM) Incident Manager: Centralizes IR by tracking incidents, automating response plans, and providing a unified dashboard for operations.
- Automated Remediation: Using EventBridge to trigger Lambda or SSM Automation documents when a security finding is detected.
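
The EventBridge-to-Lambda path in the last bullet can be sketched as an event pattern plus a small routing function. This is a minimal illustration, not a reference implementation: the severity threshold of 7 and the remediation names (`isolate-instance`, `deactivate-access-key`, `open-ops-item`) are assumptions made for the example, and the handler only reports which action it would take.

```python
import json

# Event pattern for an EventBridge rule: match GuardDuty findings whose
# numeric severity is 7 or higher (threshold chosen for this example).
GUARDDUTY_PATTERN = {
    "source": ["aws.guardduty"],
    "detail-type": ["GuardDuty Finding"],
    "detail": {"severity": [{"numeric": [">=", 7]}]},
}

# EventBridge expects the pattern as a JSON string.
PATTERN_JSON = json.dumps(GUARDDUTY_PATTERN)


def choose_remediation(event: dict) -> str:
    """Map a GuardDuty finding event to a hypothetical remediation name."""
    finding_type = event["detail"]["type"]
    if finding_type.startswith("Backdoor:EC2/"):
        return "isolate-instance"
    if ":IAMUser/" in finding_type:
        return "deactivate-access-key"
    return "open-ops-item"  # anything else falls back to human triage


def lambda_handler(event, context):
    # A real handler would invoke the remediation; this sketch only names it.
    return {
        "findingId": event["detail"]["id"],
        "action": choose_remediation(event),
    }
```

`PATTERN_JSON` is the kind of string you would pass as `EventPattern` when creating the rule (for example with boto3's `events.put_rule`), with the Lambda function registered as the rule's target.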
## Common Pitfalls
- Don't rely on standing IAM users for responders; pre-provision dedicated IR roles with cross-account access. Service-linked roles do not fill this gap, because only the linked AWS service can assume them.
- Don't wait for an incident to test your plan. Use AWS Fault Injection Service (FIS) to simulate failures and validate response times.
- Don't confuse Amazon Detective with remediation; Detective is for Root Cause Analysis (RCA), not for fixing the issue.
- Don't forget AWS Shield Advanced for DDoS incidents: it gives you access to the Shield Response Team (SRT, formerly the DDoS Response Team or DRT).
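
To make the pre-provisioned cross-account access from the first pitfall concrete, here is a sketch of the trust policy an IR role in a member account might carry. The function name, the MFA requirement, and the account ID `111122223333` (a documentation placeholder) are assumptions for illustration; your organization's policy may look different.

```python
import json


def ir_role_trust_policy(security_account_id: str) -> dict:
    """Trust policy that lets principals in the security account assume
    the IR role, but only when they signed in with MFA (an assumption)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "AWS": f"arn:aws:iam::{security_account_id}:root"
                },
                "Action": "sts:AssumeRole",
                "Condition": {
                    "Bool": {"aws:MultiFactorAuthPresent": "true"}
                },
            }
        ],
    }


# Placeholder account ID; print the document that would be attached to the role.
print(json.dumps(ir_role_trust_policy("111122223333"), indent=2))
```

Deploying a role with this trust policy to every member account ahead of time (for example through CloudFormation StackSets) is what lets responders get in when an incident starts.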
## Mnemonics / Memory Triggers
- P-I-C-E-R-L (The IR Lifecycle):
  - Prepare (Roles/Logs)
  - Identify (Detect anomalies)
  - Contain (Change SGs/Disable Keys)
  - Eradicate (Delete malicious code)
  - Recover (Back to baseline)
  - Lessons Learned (Post-mortem)
- FIS is for Fails: Remember AWS Fault Injection Service is the primary tool for "Testing" the plan by injecting real-world stress.
## Formula / Equation Sheet
| Service | Primary IR Role | Example Use Case |
|---|---|---|
| SSM Incident Manager | Orchestration | Automating the "Response Plan" lifecycle |
| SSM OpsCenter | Tracking | Managing "OpsItems" (security findings) |
| Lambda | Remediation | Running custom code to isolate an EC2 instance |
| EventBridge | Trigger | Connecting a GuardDuty finding to a response script |
| AWS Backup | Recovery | Restoring encrypted snapshots after ransomware |
| AWS Resilience Hub | Validation | Assessing if architecture meets RTO/RPO targets |
## Practice Set
- Scenario: You need to ensure that if an IAM Access Key is leaked, it is automatically disabled. What is the most efficient AWS-native workflow?
- Scenario: A company wants to simulate a network outage to test their IR runbooks. Which service should they use?
- Scenario: During an incident, the security team cannot access the affected member account. What was missing in the Preparation phase?
- Scenario: You need to perform a forensic analysis on an EC2 instance. What is the first step in the Containment phase?
- Scenario: Which service provides a graphical representation of entity relationships to help investigate the root cause of a security finding?
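Answers (in the same order as the scenarios):
- GuardDuty Finding -> EventBridge -> Lambda (for example, a function named `iam-deactivate-key` that calls the IAM `UpdateAccessKey` API to set the key to `Inactive`).
- AWS Fault Injection Service (FIS).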
- Pre-provisioned Cross-Account IAM Roles (Break-glass roles).
- Isolate the instance using a restrictive Security Group (no ingress/egress) and take a snapshot.
- Amazon Detective.
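
For the first answer, the Lambda step might look like the sketch below. It assumes the event is a GuardDuty finding delivered by EventBridge, with the key details under `detail.resource.accessKeyDetails`; the function name and the return shape are made up for this example. The IAM client is injectable so the logic can be exercised without AWS credentials.

```python
def deactivate_leaked_key(event: dict, iam_client=None) -> dict:
    """Set the access key named in a GuardDuty finding to Inactive."""
    details = event["detail"]["resource"]["accessKeyDetails"]

    # UpdateAccessKey only applies to long-term IAM user keys; temporary
    # role credentials need a different response (e.g. revoking sessions).
    if details.get("userType") != "IAMUser":
        return {"status": "skipped", "reason": "not an IAM user key"}

    if iam_client is None:
        import boto3  # imported lazily so a stub client can be used in tests
        iam_client = boto3.client("iam")

    iam_client.update_access_key(
        UserName=details["userName"],
        AccessKeyId=details["accessKeyId"],
        Status="Inactive",
    )
    return {
        "status": "Inactive",
        "userName": details["userName"],
        "accessKeyId": details["accessKeyId"],
    }
```

Setting the key to `Inactive` rather than deleting it keeps the key ID available for the later investigation.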
## Worked Examples
Example 1: Automated Remediation Workflow
Goal: Automatically isolate an EC2 instance found to be communicating with a known Command & Control (C2) server.
\begin{tikzpicture}[node distance=2cm, every node/.style={draw, rectangle, align=center, fill=blue!10}]
\node (gd) {GuardDuty \\ (Detects C2)};
\node (eb) [right of=gd, xshift=2cm] {EventBridge \\ (Rules Match)};
\node (lambda) [right of=eb, xshift=2cm] {Lambda \\ (Python Script)};
\node (ec2) [below of=lambda] {EC2 Instance \\ (Isolate)};
\draw[->, thick] (gd) -- (eb);
\draw[->, thick] (eb) -- (lambda);
\draw[->, thick] (lambda) -- (ec2);
\node[draw=none, fill=none, right of=ec2, xshift=1cm] (note) {\tiny 1. Attach Isolation SG \\ \tiny 2. Remove IAM Role \\ \tiny 3. Snapshot EBS};\end{tikzpicture}
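
The three isolation steps noted in the diagram (attach isolation SG, remove IAM role, snapshot EBS) could be sketched as follows. This is an illustrative outline under several assumptions: `ec2` is a boto3 EC2 client (or a stub with the same methods), the isolation security group already exists, the instance has a single network interface, and result pagination is ignored.

```python
def isolate_instance(ec2, instance_id: str, isolation_sg_id: str) -> list:
    """Run the three containment steps; return the snapshot IDs created."""
    # 1. Replace the instance's security groups with the isolation group.
    ec2.modify_instance_attribute(
        InstanceId=instance_id, Groups=[isolation_sg_id]
    )

    # 2. Detach the instance profile so the instance loses its IAM role.
    associations = ec2.describe_iam_instance_profile_associations(
        Filters=[{"Name": "instance-id", "Values": [instance_id]}]
    )
    for assoc in associations["IamInstanceProfileAssociations"]:
        ec2.disassociate_iam_instance_profile(
            AssociationId=assoc["AssociationId"]
        )

    # 3. Snapshot every attached EBS volume as forensic evidence.
    volumes = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )
    snapshot_ids = []
    for volume in volumes["Volumes"]:
        snapshot = ec2.create_snapshot(
            VolumeId=volume["VolumeId"],
            Description=f"IR evidence for {instance_id}",
        )
        snapshot_ids.append(snapshot["SnapshotId"])
    return snapshot_ids
```

Inside a Lambda function this would be called as `isolate_instance(boto3.client("ec2"), instance_id, sg_id)`, with the instance ID taken from the GuardDuty finding.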
Example 2: Testing with FIS
Goal: Validate that the IR team is notified when a database becomes unavailable.
- Define Experiment: Target an RDS instance.
- Action: `aws:rds:reboot-db-instances` or `aws:rds:failover-db-cluster`.
- Stop Condition: CloudWatch Alarm (if latency exceeds 5s, stop experiment).
- Outcome: Check if SSM Incident Manager triggered a response plan.
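
The experiment above might be expressed as an FIS experiment template along these lines. Treat it as a sketch: the function name, the target and action labels (`drillDatabase`, `rebootDatabase`), and the description text are assumptions, and the ARNs are supplied by the caller. It uses the FIS action ID `aws:rds:reboot-db-instances` and a CloudWatch alarm as the stop condition.

```python
def rds_reboot_experiment_template(
    db_instance_arn: str, alarm_arn: str, role_arn: str
) -> dict:
    """Build the fields of an FIS experiment template for an RDS reboot drill."""
    return {
        "description": "IR drill: reboot an RDS instance and check notification",
        "roleArn": role_arn,  # role FIS assumes to run the action
        "targets": {
            "drillDatabase": {
                "resourceType": "aws:rds:db",
                "resourceArns": [db_instance_arn],
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "rebootDatabase": {
                "actionId": "aws:rds:reboot-db-instances",
                # Bind the action's DBInstances target to the target above.
                "targets": {"DBInstances": "drillDatabase"},
            }
        },
        # FIS stops the experiment if this alarm enters the ALARM state.
        "stopConditions": [
            {"source": "aws:cloudwatch:alarm", "value": alarm_arn}
        ],
    }
```

The resulting dictionary is shaped so it can be passed as keyword arguments to boto3's `fis.create_experiment_template`; after the run, you would confirm in Incident Manager that the response plan actually fired.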
## Fact Recall Blanks
- The service used to track and resolve operational issues is SSM __________.
- To minimize the __________ __________, use AWS Organizations to isolate workloads.
- __________ __________ provides the ability to perform complex, multi-step investigation of security findings by aggregating data from CloudTrail and VPC Flow Logs.
- A __________ is a set of strategic procedures, while a __________ is a set of technical, repeatable steps.
- AWS __________ __________ is the tool of choice for simulating faults to test IR plan effectiveness.
(Answers: OpsCenter, Blast Radius, Amazon Detective, Playbook/Runbook, Fault Injection Service)