AWS Systems Manager (SSM) Operations: Comprehensive Study Guide
AWS Systems Manager (SSM) is the central hub for operational management in AWS. It allows you to gain operational insights and take action on your AWS resources at scale, reducing the need for manual logins and repetitive tasks.
Learning Objectives
By the end of this module, you should be able to:
- Explain the critical role of the SSM Agent in managing EC2 instances.
- Execute and customize SSM Automation runbooks for operational remediation.
- Design automated patching schedules using SSM Patch Manager.
- Integrate SSM with Amazon EventBridge to respond to system state changes.
- Understand how SSM supports external security services like Amazon Inspector.
Key Terms & Glossary
- Managed Node: Any EC2 instance or on-premises server that is configured for use with Systems Manager. Requires the SSM Agent and an IAM role with appropriate permissions (e.g., `AmazonSSMManagedInstanceCore`).
- SSM Agent: Software installed on managed nodes that processes requests from the Systems Manager service.
- Runbook: A document (JSON or YAML) that defines the actions that Systems Manager performs on your managed nodes.
- Maintenance Window: A defined schedule for when disruptive operations (like patching or reboots) can occur.
- Patch Baseline: A set of rules that define which patches are approved for installation on your managed nodes.
The "Big Idea"
> [!IMPORTANT]
> The core philosophy of SSM is Operations as Code. Instead of manually SSHing into 500 instances to check a configuration or install a patch, you define the desired state or action once and let SSM distribute that execution across your fleet securely and auditably.
Formula / Concept Box
| Operational Component | Primary Function | Trigger Mechanism |
|---|---|---|
| Automation | Executes workflows (Runbooks) | EventBridge, CLI, or Manual |
| Patch Manager | Automates software updates | Maintenance Windows |
| Parameter Store | Centralized configuration data | API calls, CloudFormation |
| Fleet Manager | Visual node management | AWS Management Console |
Hierarchical Outline
- The SSM Agent Layer
  - Core Dependency: Instances must have the agent installed and running to communicate with the SSM API.
  - IAM Permissions: Instances must assume an IAM role (via an instance profile) that grants the SSM Agent permission to communicate with the Systems Manager service.
- Automation & Remediation
  - Predefined Runbooks: AWS-provided runbooks for common tasks (e.g., `AWS-RestartEC2Instance`).
  - Custom Runbooks: Tailored workflows for complex logic and cross-service actions.
  - Event-Driven Actions: Using EventBridge to detect a state change (e.g., an EC2 instance stopping) and trigger an SSM Automation runbook to fix it.
- Fleet Maintenance at Scale
  - Patch Manager: Scans instances for missing patches and installs them based on a Patch Baseline.
  - Compliance: Reporting on which instances are compliant with your patching and configuration standards.
- Security Integration
  - Amazon Inspector: Uses the SSM Agent to collect software inventory from managed nodes, which it evaluates against known Common Vulnerabilities and Exposures (CVEs).
  - AWS License Manager: Uses the agent to track software license consumption across the fleet.
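The event-driven remediation idea from the outline can be sketched as an EventBridge event pattern. The pattern below follows the documented shape for EC2 state-change notifications; the `matches` helper is a hypothetical, heavily simplified stand-in for EventBridge's own matching, and the sample instance ID is made up. It only illustrates which events a rule with this pattern would hand to an Automation target.

```python
import json

# Event pattern for EC2 state-change notifications where the instance stopped.
EVENT_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher (assumption, not the real EventBridge engine).

    Every key in the pattern must be present in the event, and the event's
    value must be one of the listed values. Nested dicts are checked
    recursively. Real EventBridge supports many more operators.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict):
                return False
            if not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Hypothetical sample event; the instance ID is invented for illustration.
sample_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
}

print(json.dumps(EVENT_PATTERN, indent=2))
print(matches(EVENT_PATTERN, sample_event))
```

In a real setup, the pattern would be attached to an EventBridge rule whose target is the Automation runbook; the local matcher is only a way to reason about which events qualify.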
Visual Anchors
SSM Automation Workflow
SSM Architecture
\begin{tikzpicture}
% Cloud boundary
\draw[dashed, blue, thick] (-1,-1) rectangle (6,4);
\node[blue] at (2.5, 3.7) {AWS Cloud};
% SSM Service
\node[draw, fill=orange!20, minimum width=2cm, minimum height=1cm] (SSM) at (2.5, 2.5) {SSM Service};
% Managed Instances
\node[draw, fill=green!20, minimum width=1.5cm] (EC2) at (0, 0) {EC2 Node};
\node[draw, fill=green!20, minimum width=1.5cm] (ONPREM) at (5, 0) {On-Prem};
% Agent indicator
\node[draw, circle, scale=0.6, fill=gray!30] at (0, 0.5) {Agent};
\node[draw, circle, scale=0.6, fill=gray!30] at (5, 0.5) {Agent};
% Communication paths
\draw[<->, thick] (SSM) -- (0, 1) node[midway, left] {HTTPS (443)};
\draw[<->, thick] (SSM) -- (5, 1) node[midway, right] {Hybrid Link};
\end{tikzpicture}
Definition-Example Pairs
- SSM Document: A configuration file defining a set of steps.
  - Example: A document that checks whether the `httpd` service is running and starts it if it is stopped.
- State Manager: A tool to keep instances in a defined state (Configuration Management).
  - Example: Ensuring, every 24 hours, that a specific monitoring agent is installed on every instance tagged `Production`.
- Inventory: A collection of metadata from managed instances.
  - Example: Querying the fleet to see which versions of Python are installed across 1,000 instances.
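The `httpd` example above could look roughly like the following Command document, built here as a Python dict and serialized to JSON. This is a minimal sketch: the step name and description are hypothetical, and the shell command assumes a systemd-based Linux node where the service unit is called `httpd`.

```python
import json

# Sketch of an SSM Command document (schema version 2.2).
# Assumption: the node uses systemd and the service unit is named "httpd".
HTTPD_CHECK_DOCUMENT = {
    "schemaVersion": "2.2",
    "description": "Start httpd if it is not running (illustrative sketch).",
    "mainSteps": [
        {
            "action": "aws:runShellScript",
            "name": "ensureHttpdRunning",  # hypothetical step name
            "inputs": {
                "runCommand": [
                    # is-active exits non-zero when the unit is not running,
                    # so the start command only runs in that case.
                    "systemctl is-active --quiet httpd || systemctl start httpd"
                ]
            },
        }
    ],
}

document_json = json.dumps(HTTPD_CHECK_DOCUMENT, indent=2)
print(document_json)
```

The resulting JSON is the kind of content you would supply when creating a document of type Command; scheduling it through a State Manager association is what would turn the one-off check into continuously enforced state.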
Worked Examples
Problem: Automating Instance Recovery
Scenario: You have a mission-critical application on EC2. If the hardware underlying the instance fails, you want it to recover automatically without manual intervention.
Step-by-Step Solution:
- Configure Status Check: Create an Amazon CloudWatch Alarm based on the `StatusCheckFailed_System` metric.
- Define Action: Within the CloudWatch Alarm, select EC2 Action.
- Recover Instance: Choose the "Recover this instance" action. This moves the instance to a new physical host while maintaining its ID, IP addresses, and EBS volume attachments.
- SSM Integration: Alternatively, use EventBridge to trigger an SSM Automation runbook (like `AWS-RestartEC2Instance`) when the alarm state changes to `ALARM`.
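The alarm from the steps above can be sketched as the keyword arguments one might pass to CloudWatch's `put_metric_alarm` call in boto3. The function name, alarm name, period, and evaluation count are illustrative assumptions, not values from this guide; the recover action uses the `arn:aws:automate:<region>:ec2:recover` form. No AWS call is made here.

```python
def recovery_alarm_params(instance_id: str, region: str) -> dict:
    """Build put_metric_alarm arguments for an EC2 auto-recovery alarm.

    Assumptions: two consecutive one-minute periods of a failed system
    status check trigger recovery; the alarm name is hypothetical.
    """
    return {
        "AlarmName": f"recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 2,
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # EC2 recover action for the given region.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


# Hypothetical instance ID and region, for illustration only.
params = recovery_alarm_params("i-0123456789abcdef0", "us-east-1")
print(params["AlarmActions"][0])
# With boto3 installed and credentials configured, one could then call:
#   boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)
```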
Problem: Patching a Fleet with Different OS Types
Scenario: You have a mix of Amazon Linux 2 and Windows Server 2022 instances.
Solution:
- Create two Patch Baselines: one for Linux (approving 'Security' patches after a 7-day delay) and one for Windows (approving 'Critical' patches immediately).
- Use Patch Groups: Tag Linux instances with `PatchGroup: LinuxFleet` and Windows instances with `PatchGroup: WinFleet`.
- Register each patch group with its corresponding baseline.
- Schedule a Maintenance Window for Saturday at 2:00 AM to run the `AWS-RunPatchBaseline` document.
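The two baselines from this solution can be sketched as request payloads in the shape of the SSM `create_patch_baseline` API. Several details are assumptions: the baseline names are hypothetical, 'Critical' on Windows is interpreted as the `MSRC_SEVERITY` filter (it could instead mean the `CriticalUpdates` classification), and the cron schedule is evaluated in UTC unless the window specifies a time zone. Nothing is sent to AWS.

```python
import json


def approval_rule(filter_key: str, values: list, approve_after_days: int) -> dict:
    """One patch rule: approve patches matching the filter after a delay."""
    return {
        "PatchFilterGroup": {
            "PatchFilters": [{"Key": filter_key, "Values": values}]
        },
        "ApproveAfterDays": approve_after_days,
    }


LINUX_BASELINE = {
    "Name": "LinuxFleetBaseline",  # hypothetical name
    "OperatingSystem": "AMAZON_LINUX_2",
    "ApprovalRules": {
        "PatchRules": [approval_rule("CLASSIFICATION", ["Security"], 7)]
    },
}

WINDOWS_BASELINE = {
    "Name": "WinFleetBaseline",  # hypothetical name
    "OperatingSystem": "WINDOWS",
    "ApprovalRules": {
        # Assumption: "Critical" refers to MSRC severity.
        "PatchRules": [approval_rule("MSRC_SEVERITY", ["Critical"], 0)]
    },
}

# Patch group tag value -> baseline it would be registered with.
PATCH_GROUPS = {
    "LinuxFleet": LINUX_BASELINE["Name"],
    "WinFleet": WINDOWS_BASELINE["Name"],
}

# Saturday at 02:00 (minutes, hours, day-of-month, month, day-of-week, year).
MAINTENANCE_WINDOW_SCHEDULE = "cron(0 2 ? * SAT *)"

print(json.dumps(LINUX_BASELINE, indent=2))
print(json.dumps(WINDOWS_BASELINE, indent=2))
```

Keeping one rule-building helper for both operating systems makes the only real differences (filter key, values, approval delay) easy to compare side by side.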
Checkpoint Questions
- What is the minimum requirement for an EC2 instance to be seen in the SSM Console?
  - Answer: It must have the SSM Agent installed and running, and an IAM instance profile with permissions to communicate with the SSM service.
- How does Amazon Inspector use SSM to identify vulnerabilities?
  - Answer: Inspector uses the SSM Agent to collect software inventory from the operating system of the managed node, then checks that inventory against known CVEs.
- Which SSM feature would you use to store a database password securely?
  - Answer: SSM Parameter Store (using the `SecureString` type).
- What is the difference between a Patch Baseline and a Maintenance Window?
  - Answer: A Patch Baseline defines what patches are approved; a Maintenance Window defines when those patches (or other tasks) are allowed to run.
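For the Parameter Store question above, storing and reading a `SecureString` can be sketched as the arguments one might pass to the SSM `put_parameter` and `get_parameter` calls in boto3. The helper names, the parameter path, and the placeholder value are hypothetical; no AWS call is made in this sketch.

```python
def secure_parameter_request(name: str, value: str) -> dict:
    """Arguments for put_parameter that store a value encrypted at rest."""
    if not name.startswith("/"):
        # Hierarchical parameter names are fully qualified with a leading slash.
        raise ValueError("use a fully qualified name such as /app/db/password")
    return {
        "Name": name,
        "Value": value,
        "Type": "SecureString",
        "Overwrite": True,
    }


def secure_parameter_read(name: str) -> dict:
    """Arguments for get_parameter; WithDecryption returns the plaintext."""
    return {"Name": name, "WithDecryption": True}


# Hypothetical path and placeholder value, for illustration only.
write_args = secure_parameter_request("/prod/db/password", "example-only")
read_args = secure_parameter_read("/prod/db/password")
print(write_args["Type"], read_args["WithDecryption"])
```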