Automating AWS Operations: Incident Remediation with Systems Manager and EventBridge

Unit 8: AWS Operational Foundations

This lab provides hands-on experience with the Operational Excellence and Reliability pillars of the AWS Well-Architected Framework. You will configure an automated remediation workflow that detects when an EC2 instance is stopped and restarts it using Amazon EventBridge and AWS Systems Manager (SSM) Automation.

[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges for the EC2 resources provisioned.

Prerequisites

  • An active AWS Account with Administrator access.
  • AWS CLI installed and configured with your credentials.
  • A basic understanding of IAM roles and EC2 instances.
  • Region: Ensure you are operating in a single region (e.g., us-east-1).

Learning Objectives

  • Configure an IAM Role for Systems Manager Automation.
  • Implement Event-Driven Remediation using Amazon EventBridge.
  • Execute SSM Automation Runbooks to manage resource states.
  • Verify automated recovery actions in the AWS Management Console.

Architecture Overview

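This lab follows a closed-loop remediation architecture: the EC2 instance stops, EventBridge matches the resulting state-change event, and SSM Automation runs the AWS-StartEC2Instance runbook to start the instance again.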

Step-by-Step Instructions

Step 1: Create an IAM Service Role for SSM

SSM Automation requires permissions to perform actions (like starting an instance) on your behalf.

bash
# 1. Create the trust policy file.
#    events.amazonaws.com is included because EventBridge assumes this role
#    (through the target's RoleArn in Step 3) to start the automation.
echo '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": ["ssm.amazonaws.com", "events.amazonaws.com"] },
      "Action": "sts:AssumeRole"
    }
  ]
}' > ssm-trust-policy.json

# 2. Create the IAM role
aws iam create-role \
  --role-name "BrainyBee-SSM-Automation-Role" \
  --assume-role-policy-document file://ssm-trust-policy.json

# 3. Attach the AmazonSSMAutomationRole managed policy
aws iam attach-role-policy \
  --role-name "BrainyBee-SSM-Automation-Role" \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonSSMAutomationRole

Console alternative

Navigate to IAM > Roles > Create Role. Select AWS Service, then choose Systems Manager. Under use case, select Allow SSM to call AWS services on your behalf. Attach the AmazonSSMAutomationRole policy and name it BrainyBee-SSM-Automation-Role.

Step 2: Launch a Test EC2 Instance

We need an instance to monitor and remediate. We will use a t3.micro (or t2.micro if t3 is unavailable).

bash
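# Find the most recent standard Amazon Linux 2023 AMI.
# Sorting by CreationDate avoids picking an arbitrary (possibly old) image,
# and the name filter excludes the "minimal" variants.
AMI_ID=$(aws ec2 describe-images \
  --owners amazon \
  --filters "Name=name,Values=al2023-ami-2023.*-x86_64" "Name=state,Values=available" \
  --query 'sort_by(Images, &CreationDate)[-1].ImageId' \
  --output text)

# Launch the instance
aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --count 1 \
  --instance-type t3.micro \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=Remediation-Lab-Instance}]'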

[!NOTE] Note down the InstanceId from the output; you will need it for verification.
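
The later steps use placeholders for the instance ID, account ID, and region. As a minimal sketch (the function names here are illustrative and not part of the lab), these helpers look the values up with the AWS CLI instead of copying them by hand. The instance lookup assumes the Name tag assigned in Step 2 is unique in the current region.

```shell
# Look up the lab instance by the Name tag assigned in Step 2.
lab_instance_id() {
  aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=Remediation-Lab-Instance" \
              "Name=instance-state-name,Values=pending,running,stopping,stopped" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text
}

# Account ID of the credentials currently in use.
lab_account_id() {
  aws sts get-caller-identity --query Account --output text
}

# Region configured for the active CLI profile.
lab_region() {
  aws configure get region
}

# Usage (requires configured credentials):
#   INSTANCE_ID=$(lab_instance_id)
#   ACCOUNT_ID=$(lab_account_id)
#   REGION=$(lab_region)
#   echo "$INSTANCE_ID $ACCOUNT_ID $REGION"
```

The resulting variables can then be substituted wherever <YOUR_INSTANCE_ID>, <YOUR_ACCOUNT_ID>, and <YOUR_REGION> appear.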

Step 3: Create the EventBridge Remediation Rule

We will create a rule that triggers when our specific instance enters the "stopped" state.

bash
# 1. Create the rule
# Replace <YOUR_INSTANCE_ID> so the rule matches only the lab instance
aws events put-rule \
  --name "AutoRestartStoppedInstance" \
  --event-pattern '{"source":["aws.ec2"],"detail-type":["EC2 Instance State-change Notification"],"detail":{"state":["stopped"],"instance-id":["<YOUR_INSTANCE_ID>"]}}' \
  --state ENABLED

# 2. Add the SSM Automation target
# Replace <YOUR_REGION> and <YOUR_ACCOUNT_ID> accordingly
aws events put-targets --rule "AutoRestartStoppedInstance" --targets '[{
  "Id": "1",
  "Arn": "arn:aws:ssm:<YOUR_REGION>:<YOUR_ACCOUNT_ID>:automation-definition/AWS-StartEC2Instance:$DEFAULT",
  "RoleArn": "arn:aws:iam::<YOUR_ACCOUNT_ID>:role/BrainyBee-SSM-Automation-Role",
  "InputTransformer": {
    "InputPathsMap": {"instance":"$.detail.instance-id"},
    "InputTemplate": "{\"InstanceId\": [\"<instance>\"]}"
  }
}]'

Console alternative

Navigate to Amazon EventBridge > Rules > Create rule. Define the pattern as AWS service: EC2, Event type: EC2 Instance State-change Notification, and select Specific state(s): stopped. For the target, choose Systems Manager Automation and document AWS-StartEC2Instance.
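
Before stopping a real instance, it can help to confirm that the rule's pattern matches the event you expect. The sketch below writes a sample state-change event and checks it against the deployed rule with aws events test-event-pattern. The function names are hypothetical, and the account ID, region, event ID, and timestamp in the sample event are placeholder values.

```shell
# Write a sample "stopped" event for the given instance ID to a file.
# Account ID, region, event ID, and timestamp are placeholder values.
write_sample_event() {
  instance_id="$1"
  out_file="${2:-sample-event.json}"
  cat > "$out_file" <<EOF
{
  "id": "00000000-0000-0000-0000-000000000000",
  "detail-type": "EC2 Instance State-change Notification",
  "source": "aws.ec2",
  "account": "123456789012",
  "time": "2024-01-01T00:00:00Z",
  "region": "us-east-1",
  "resources": ["arn:aws:ec2:us-east-1:123456789012:instance/${instance_id}"],
  "detail": {"instance-id": "${instance_id}", "state": "stopped"}
}
EOF
}

# Fetch the pattern from the deployed rule and ask EventBridge whether
# it matches the sample event. Prints the match result.
check_rule_matches() {
  event_file="${1:-sample-event.json}"
  pattern=$(aws events describe-rule --name "AutoRestartStoppedInstance" \
    --query EventPattern --output text)
  aws events test-event-pattern \
    --event-pattern "$pattern" \
    --event "file://$event_file" \
    --query Result --output text
}

# Usage (requires configured credentials and the rule from this step):
#   write_sample_event <YOUR_INSTANCE_ID>
#   check_rule_matches sample-event.json
```

Reading the pattern back from the rule, rather than retyping it, checks what is actually deployed.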

Step 4: Test the Remediation

Now, manually stop the instance to see the automation in action.

bash
# Stop the instance
aws ec2 stop-instances --instance-ids <YOUR_INSTANCE_ID>

Wait approximately 60 seconds for the EventBridge rule to trigger and the SSM Automation to execute.
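
Rather than waiting a fixed time, you can poll the instance state and watch the transitions as they happen. This is a minimal sketch under a few assumptions: the function names, the default of 30 attempts, and the 5-second interval are illustrative choices, and the loop is meant to be started right after the stop command, while the instance is still stopping.

```shell
# Current state name of an instance (e.g. stopping, stopped, pending, running).
instance_state() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].State.Name' --output text
}

# Poll until the instance reports "running", printing each state change once.
# Returns 1 if the attempt limit is reached first.
watch_until_running() {
  instance_id="$1"
  max_attempts="${2:-30}"
  interval="${3:-5}"
  last=""
  attempt=0
  while [ "$attempt" -lt "$max_attempts" ]; do
    state=$(instance_state "$instance_id")
    if [ "$state" != "$last" ]; then
      echo "$state"
      last="$state"
    fi
    [ "$state" = "running" ] && return 0
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage (requires configured credentials):
#   watch_until_running <YOUR_INSTANCE_ID>
```

A non-zero return suggests the remediation did not fire within the polling window, which is a cue to check the Troubleshooting section.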

Checkpoints

| Checkpoint | Action | Expected Result |
| --- | --- | --- |
| Verification 1 | Run aws ec2 describe-instances --instance-ids <ID> | State should transition from stopping to stopped and then back to pending/running automatically. |
| Verification 2 | Check SSM Executions | Navigate to Systems Manager > Automation. You should see a successful execution of AWS-StartEC2Instance. |
| Verification 3 | EventBridge Metrics | Check CloudWatch Metrics for TriggeredRules under the AWS/Events namespace. |
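
Verifications 2 and 3 can also be run from the CLI. The following is a sketch rather than a required part of the lab: the function names are hypothetical, and the 300-second period and Sum statistic are assumed choices for reading the TriggeredRules metric.

```shell
# List recent executions of the AWS-StartEC2Instance runbook with their status.
recent_executions() {
  aws ssm describe-automation-executions \
    --filters "Key=DocumentNamePrefix,Values=AWS-StartEC2Instance" \
    --max-results "${1:-5}" \
    --query 'AutomationExecutionMetadataList[].[AutomationExecutionId,AutomationExecutionStatus,ExecutionStartTime]' \
    --output table
}

# Sum of TriggeredRules datapoints for the lab rule between two ISO 8601 times.
triggered_rule_count() {
  aws cloudwatch get-metric-statistics \
    --namespace "AWS/Events" \
    --metric-name "TriggeredRules" \
    --dimensions "Name=RuleName,Value=AutoRestartStoppedInstance" \
    --start-time "$1" \
    --end-time "$2" \
    --period 300 \
    --statistics Sum \
    --query 'Datapoints[].Sum' \
    --output text
}

# Usage (requires configured credentials; the date syntax below is GNU date):
#   recent_executions 5
#   triggered_rule_count \
#     "$(date -u -d '15 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
#     "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
```

CloudWatch metrics can lag by a few minutes, so an empty result immediately after the test does not necessarily mean the rule failed to trigger.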

Troubleshooting

| Error / Issue | Possible Cause | Fix |
| --- | --- | --- |
| Automation Fails | IAM Role missing permissions | Ensure the role has AmazonSSMAutomationRole attached and that its trust policy allows the services that assume it (ssm.amazonaws.com, and events.amazonaws.com for the EventBridge target). |
| Rule Not Triggering | Incorrect Event Pattern | Check that the EventBridge rule pattern matches the stopped state and the aws.ec2 source. |
| Permission Denied | CLI user lacks iam:PassRole | Ensure your IAM user has permission to pass the role to the EventBridge target. |
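
When working through the table above, it is often quicker to inspect the deployed configuration than to guess. A small sketch, with hypothetical function names, that prints the role's trusted services, the rule's state and pattern, and the rule's targets:

```shell
# Services allowed to assume the lab role, taken from its trust policy.
role_trusted_services() {
  aws iam get-role --role-name "BrainyBee-SSM-Automation-Role" \
    --query 'Role.AssumeRolePolicyDocument.Statement[].Principal.Service' \
    --output text
}

# State (ENABLED/DISABLED) and event pattern of the lab rule.
rule_summary() {
  aws events describe-rule --name "AutoRestartStoppedInstance" \
    --query '{State:State,Pattern:EventPattern}' \
    --output json
}

# Targets attached to the lab rule, including the role EventBridge assumes.
rule_targets() {
  aws events list-targets-by-rule --rule "AutoRestartStoppedInstance" \
    --query 'Targets[].{Id:Id,Arn:Arn,RoleArn:RoleArn}' \
    --output json
}

# Usage (requires configured credentials):
#   role_trusted_services
#   rule_summary
#   rule_targets
```

If the rule shows no targets, or the target's RoleArn names a role whose trusted services do not include EventBridge, the automation will not start even though the rule itself matches.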

The "Big Idea": Operational Foundations

This lab demonstrates the Operational Excellence pillar by treating operations as code. An event rule detects the unhealthy state and a runbook corrects it, which shows how AWS services provide a foundation for reliability.

Teardown

To avoid unexpected costs, delete all resources created during this lab.

bash
# 1. Delete the EventBridge Rule and Targets
aws events remove-targets --rule "AutoRestartStoppedInstance" --ids "1"
aws events delete-rule --name "AutoRestartStoppedInstance"

# 2. Terminate the EC2 Instance
aws ec2 terminate-instances --instance-ids <YOUR_INSTANCE_ID>

# 3. Delete the IAM Role
aws iam detach-role-policy \
  --role-name "BrainyBee-SSM-Automation-Role" \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonSSMAutomationRole
aws iam delete-role --role-name "BrainyBee-SSM-Automation-Role"