Free AWS Certified Solutions Architect - Professional (SAP-C02) Study Resources

This AWS Certified Solutions Architect - Professional (SAP-C02) hive provides study notes, a question bank with practice tests, flashcards, and hands-on labs, all supported by a personal AI tutor to help you master the certification.

  • 1,035 Practice Questions
  • 12 Mock Exams
  • 230 Study Notes
  • 824 Flashcard Decks
  • 2 Source Materials

AWS Certified Solutions Architect - Professional (SAP-C02) Study Notes & Guides

230 AI-generated study notes covering the full AWS Certified Solutions Architect - Professional (SAP-C02) curriculum. Showing 10 complete guides below.

Optimizing Operations: Adopting Managed Services & Reducing Infrastructure Overhead

Adopting managed services as needed to reduce infrastructure provisioning and patching overhead

This guide explores the transition from manual infrastructure management to leveraging AWS managed services. By shifting the burden of provisioning, patching, and scaling to AWS, organizations can focus on application logic and business value.

Learning Objectives

After studying this guide, you should be able to:

  • Differentiate between mutable and immutable infrastructure strategies.
  • Explain how Infrastructure as Code (IaC) reduces configuration drift.
  • Assess the trade-offs between refactoring effort and operational cost savings when moving to managed services.
  • Design a patching strategy that integrates with CI/CD pipelines for immutable environments.

Key Terms & Glossary

  • Managed Service: An AWS service where the provider handles the underlying infrastructure, maintenance, and patching (e.g., Amazon RDS, AWS Fargate).
  • Infrastructure as Code (IaC): The management of infrastructure in a descriptive model, using the same versioning as DevOps teams use for source code (e.g., AWS CloudFormation).
  • Configuration Drift: The phenomenon where the environment's configuration deviates from the "source of truth" or initial template due to manual ad-hoc changes.
  • Immutable Infrastructure: A strategy where servers are never modified after deployment. If a change is needed, new servers are built from a common image and replace the old ones.
  • Undifferentiated Heavy Lifting: Tasks like racking servers or patching OS kernels that are necessary but do not provide a unique competitive advantage to a business.

The "Big Idea"

In traditional on-premises environments, servers are long-lived assets to be amortized. In the cloud, infrastructure is disposable. Adopting managed services is not just about technology; it is a mindset shift from "maintaining servers" to "consuming capabilities." Every hour spent patching an OS is an hour not spent improving your product. AWS managed services allow you to delegate this "undifferentiated heavy lifting" back to the provider.

Formula / Concept Box

| Concept | Impact on Overhead | Strategic Requirement |
| --- | --- | --- |
| EC2 (Self-Managed) | High (Manual Patching/Ops) | Lowest Refactoring |
| Containers (Fargate) | Medium (Image Patching) | Moderate Refactoring |
| Serverless (Lambda) | Low (AWS Managed Runtime) | High Rearchitecting |

[!IMPORTANT] The Inverse Rule of Refactoring: The more advanced the managed service (e.g., Lambda), the higher the initial refactoring effort required, but the lower the long-term operational cost.

Hierarchical Outline

  • I. Infrastructure Provisioning via IaC
    • Automation: Use tools like CloudFormation to ensure consistent environments.
    • Disaster Recovery: Enables rapid recreation of the stack from a "clean slate" during outages.
  • II. Patching and Maintenance Strategies
    • Mutable Approach: Patching live servers using AWS Systems Manager Patch Manager.
    • Immutable Approach: Patching the AMI (Amazon Machine Image) or Container Image in the build phase of a CI/CD pipeline.
  • III. The Managed Service Spectrum
    • Compute Optimization: Transitioning from EC2 to Fargate or Lambda.
    • Storage Optimization: Moving from self-managed EBS/EC2 databases to Amazon RDS or DynamoDB.
  • IV. Modernization Opportunities
    • Architecture Shift: Decoupling monoliths into microservices.
    • Instruction Sets: Moving from x86 to AWS Graviton (ARM) for better price-performance.

Visual Anchors

[Diagram: Infrastructure Evolution Flow]

[Diagram: The Shared Responsibility Boundary]

Definition-Example Pairs

  • Immutable Environment: A setup where updates are performed by replacing the entire stack rather than updating in place.
    • Example: Instead of SSH-ing into a server to update Nginx, you trigger a CI/CD pipeline that builds a new AMI with the latest Nginx version and performs a Blue/Green deployment.
  • Infrastructure Drift: When manual changes make a server different from its original specification.
    • Example: An engineer manually installs a security patch on one server in a cluster but forgets the others, causing a version mismatch during the next scaling event.
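The drift example above can be illustrated with a small comparison between a template and what each server actually runs. This is a minimal sketch, not how any AWS service implements drift detection; the server names, settings, and version numbers are hypothetical.

```python
from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {setting: (desired_value, actual_value)} for every setting that differs."""
    drift = {}
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical fleet: the template pins one Nginx version,
# but web-2 was patched by hand and now differs from its peers.
template = {"nginx_version": "1.24.0", "instance_type": "t3.medium"}
fleet = {
    "web-1": {"nginx_version": "1.24.0", "instance_type": "t3.medium"},
    "web-2": {"nginx_version": "1.25.3", "instance_type": "t3.medium"},
}

for server, config in fleet.items():
    print(server, find_drift(template, config))
```

With immutable infrastructure this comparison should always come back empty, because every server is built from the same image and never modified afterwards.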

Worked Examples

Scenario: Migrating a Legacy Web App to Reduce Patching

1. Current State: A Java application runs on 10 EC2 instances. Every month, the sysadmin spends 8 hours manually applying Linux kernel patches and restarting services.

2. Strategy Selection:

  • Option A (Mutable): Use AWS Systems Manager (SSM) Patch Manager. Result: Automates the patching, but servers are still long-lived and susceptible to drift.
  • Option B (Immutable): Migrate to AWS Fargate. Result: AWS manages the underlying EC2 instances. The team only needs to update the Docker base image periodically.

3. Implementation Logic (Option B):

  • Step 1: Containerize the Java application.
  • Step 2: Use AWS CloudFormation to define the ECS Cluster and Fargate Service.
  • Step 3: Integrate image scanning in Amazon ECR to detect vulnerabilities.
  • Step 4: When a patch is needed, update the Dockerfile, push to ECR, and update the Fargate service task definition.

Outcome: Monthly manual patching time drops from 8 hours to near zero. Host patching moves to AWS, and base-image updates run through the CI/CD pipeline instead of being applied by hand.
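Step 4 of the scenario can be sketched as a pure function that produces a new task definition rather than editing the existing one. This is an illustrative sketch only: the dictionary mirrors the general shape of an ECS task definition (`containerDefinitions`, `name`, `image`), the image URIs are hypothetical, and registering the result as a new revision and pointing the service at it is left out.

```python
import copy


def with_patched_image(task_definition: dict, container_name: str, new_image: str) -> dict:
    """Return a copy of the task definition with one container's image replaced.

    The input is left untouched, mirroring the immutable approach:
    a patch produces a new revision instead of modifying the running one.
    """
    patched = copy.deepcopy(task_definition)
    for container in patched["containerDefinitions"]:
        if container["name"] == container_name:
            container["image"] = new_image
            return patched
    raise ValueError(f"no container named {container_name!r}")


# Hypothetical task definition and image tags.
current = {
    "family": "java-app",
    "containerDefinitions": [
        {"name": "app", "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/java-app:1.0.0"}
    ],
}
patched = with_patched_image(
    current, "app", "123456789012.dkr.ecr.us-east-1.amazonaws.com/java-app:1.0.1"
)
print(patched["containerDefinitions"][0]["image"])
```

Keeping the old definition intact is what makes rollback cheap: the previous revision still exists and can be redeployed unchanged.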

Checkpoint Questions

  1. Why is an immutable infrastructure approach easier to implement in the cloud than on-premises?
  2. If a service is "Serverless," does patching still occur? If so, who performs it?
  3. What is the main risk of performing manual "hot-fixes" on production EC2 instances?
  4. Which AWS service would you use to define your infrastructure as a version-controlled template?

Muddy Points & Cross-Refs

  • Does Serverless mean NO patching? No. Patching still happens, but most of it is performed by AWS. For Lambda, AWS patches the underlying OS and the managed runtime, while you remain responsible for your function code and its dependencies. For Fargate, AWS patches the host OS, while you remain responsible for the container image.
  • Cost vs. Effort: Managed services often have higher per-unit costs but lower Total Cost of Ownership (TCO) because they reduce human labor costs.
  • Cross-Reference: For deeper dives into reliability and SLAs when using these services, see Chapter 6: Meeting Reliability Requirements.

Comparison Tables

| Feature | Self-Managed (EC2) | Managed Containers (Fargate) | Serverless (Lambda) |
| --- | --- | --- | --- |
| OS Patching | Customer | AWS | AWS |
| Runtime Patching | Customer | Customer (in Image) | AWS |
| Scaling | Manual / Auto Scaling Groups | Automatic (Task-based) | Fully Native/Automatic |
| Refactoring Need | Minimal (Lift & Shift) | Moderate | High |
| Cost Model | Hourly / Savings Plans | Per vCPU/GB per hour | Per Request / Duration |

[!TIP] When evaluating services for the SAP-C02 exam, prioritize managed services (Fargate/Lambda/RDS) unless the requirement specifically mentions OS-level customization or legacy software that cannot be containerized.

Study Guide: Alerting and Automatic Remediation Strategies

Alerting and automatic remediation strategies

This study guide focuses on the design and implementation of automated responses to operational and security incidents within AWS, a core requirement for the AWS Certified Solutions Architect - Professional (SAP-C02) exam.

Learning Objectives

After studying this guide, you should be able to:

  • Evaluate the necessity of automation in scaling incident response for large-scale environments.
  • Design alerting workflows using Amazon CloudWatch, AWS Config, and Amazon EventBridge.
  • Implement automatic remediation strategies using AWS Systems Manager (SSM) Automation and AWS Lambda.
  • Distinguish between configuration-based remediation (AWS Config) and security-finding remediation (Security Hub/GuardDuty).
  • Leverage the Automated Security Response on AWS library for pre-built playbooks.

Key Terms & Glossary

  • AWS Config: A service that enables you to assess, audit, and evaluate the configurations of your AWS resources. It acts as a managed CMDB.
  • SSM Automation Runbook: A document that defines the actions that Systems Manager performs on your managed instances and other AWS resources.
  • EventBridge (formerly CloudWatch Events): A serverless event bus that makes it easy to connect applications using data from your own applications, integrated SaaS applications, and AWS services.
  • Security Hub: A cloud security posture management (CSPM) service that performs security best practice checks, aggregates alerts, and enables automated remediation.
  • Remediation Action: A predefined or custom task (like a Lambda function or SSM runbook) triggered automatically when a resource is found to be non-compliant.

The "Big Idea"

In modern cloud architectures, manual intervention is the enemy of scale and reliability. Automatic remediation shifts the operational burden from humans to code. Instead of waiting for an engineer to receive an email and log in to a console, the system detects a drift from the "ideal state" (compliance or health) and executes a predefined script to correct it instantly. This reduces Mean Time to Remediation (MTTR) and ensures security policies are enforced 24/7 without exception.

Formula / Concept Box

| Component | Role in Strategy | Key Service Example |
| --- | --- | --- |
| Detection | Monitors state and identifies deviations. | AWS Config, Amazon GuardDuty |
| Routing | Connects the detection event to the logic. | Amazon EventBridge |
| Logic/Action | The actual code/steps to fix the issue. | AWS Systems Manager Automation, AWS Lambda |
| Notification | Informing stakeholders of the action taken. | Amazon SNS |

Hierarchical Outline

  1. Monitoring and Detection
    • AWS Config: Tracks Configuration Items (CIs); evaluates compliance against rules (e.g., "Is encryption enabled?").
    • Amazon GuardDuty: Intelligent threat detection monitoring for malicious activity (e.g., crypto-mining, unauthorized access).
    • AWS Security Hub: Centralizes findings from GuardDuty, Macie, Inspector, and Config.
  2. Alerting Mechanisms
    • Event-Driven Architecture: Use EventBridge to route findings based on pattern matching.
    • Custom Actions: Security Hub "Custom Actions" allow manual triggering of automated workflows from the console.
  3. Remediation Execution
    • SSM Automation: Preferred for infrastructure-level changes (e.g., stopping an instance, modifying S3 bucket policies).
    • AWS Lambda: Preferred for complex, multi-step logic or calling external APIs.
  4. Scaling and Best Practices
    • Automated Security Response on AWS: A library of pre-built playbooks for the AWS Foundational Security Best Practices (FSBP) and PCI-DSS standards.
    • Risk-Based Remediation: Choosing between "Immediate Block" (high risk) vs. "Notify and Wait" (low risk).
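The risk-based choice in the outline can be sketched as a small routing function. This is a sketch under assumptions: the finding is a dictionary carrying a `Severity.Label` field in the style of Security Hub findings, and the policy of auto-remediating only `CRITICAL` and `HIGH` is a hypothetical example, not a recommendation from the guide.

```python
IMMEDIATE_BLOCK = "immediate_block"
NOTIFY_AND_WAIT = "notify_and_wait"

# Assumed policy: which severity labels are risky enough to fix without a human.
AUTO_REMEDIATE_LABELS = {"CRITICAL", "HIGH"}


def choose_response(finding: dict) -> str:
    """Pick a remediation path from the finding's severity label."""
    label = finding.get("Severity", {}).get("Label", "INFORMATIONAL")
    if label in AUTO_REMEDIATE_LABELS:
        return IMMEDIATE_BLOCK
    return NOTIFY_AND_WAIT


# Hypothetical findings.
print(choose_response({"Title": "Crypto-mining traffic", "Severity": {"Label": "HIGH"}}))
print(choose_response({"Title": "Missing cost tag", "Severity": {"Label": "LOW"}}))
```

In a real workflow the first path would typically invoke an SSM runbook or Lambda function, and the second would publish a notification and wait for approval.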

Visual Anchors

[Diagram: Incident Response Flowchart]

[Diagram: Resource Monitoring State Diagram]

Definition-Example Pairs

  • Configuration Drift: When a resource's settings change from the approved baseline.
    • Example: An engineer disables EBS encryption by default for a Region to test a performance issue and forgets to re-enable it, so new volumes are created unencrypted.
  • Remediation Playbook: A documented and automated set of steps to resolve a specific security issue.
    • Example: A playbook that identifies an S3 bucket with public "Read" access and immediately applies the PutPublicAccessBlock API call.
  • Idempotency: The property where an operation can be applied multiple times without changing the result beyond the initial application.
    • Example: An SSM Runbook that ensures a specific IAM policy is attached. If the policy is already there, it does nothing and reports success.
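The idempotency example above can be sketched with an in-memory stand-in for a role's attached policies. No AWS call is made here; the set simply plays the part of the role's current state, and the function name is a hypothetical illustration.

```python
def ensure_policy_attached(attached: set[str], policy_arn: str) -> bool:
    """Attach policy_arn if it is missing. Returns True only if a change was made."""
    if policy_arn in attached:
        return False  # already in the desired state: do nothing, report success
    attached.add(policy_arn)
    return True


role_policies: set[str] = set()
policy = "arn:aws:iam::aws:policy/ReadOnlyAccess"

first_run = ensure_policy_attached(role_policies, policy)
second_run = ensure_policy_attached(role_policies, policy)
print(first_run, second_run, role_policies)
```

Running the step a second time leaves the state unchanged, which is what makes it safe for a remediation to be retried or triggered twice by duplicate events.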

Worked Examples

Case: Automating S3 Public Access Block

Scenario: Your organization prohibits public S3 buckets. You need to ensure any bucket that becomes public is automatically remediated.

  1. Step 1: Detection: Enable the AWS Config managed rule s3-bucket-public-read-prohibited.
  2. Step 2: Association: Link this rule to a Remediation Action.
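  3. Step 3: Action Choice: Select an SSM Automation document that removes public access, such as AWS-DisableS3BucketPublicReadWrite.
  4. Step 4: Parameters: Pass the non-compliant bucket's name (the RESOURCE_ID from the Config evaluation) to the document's bucket-name parameter.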
  5. Result: When a user makes a bucket public, AWS Config detects it within minutes, triggers the SSM Runbook, and the bucket is set back to private automatically.
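The association in Steps 2 to 4 can be sketched as the payload that links a Config rule to an SSM document. This is a hedged sketch: the field names follow the shape of AWS Config's remediation configuration as I understand it and should be checked against the API reference before use. The document name, the `S3BucketName` parameter name, and the role ARN in the usage lines are assumptions supplied by the caller, and no API call is made.

```python
def build_remediation_config(rule_name: str, document_name: str,
                             bucket_parameter: str, role_arn: str) -> dict:
    """Build a remediation configuration that targets an SSM Automation document."""
    return {
        "ConfigRuleName": rule_name,
        "TargetType": "SSM_DOCUMENT",
        "TargetId": document_name,
        "Automatic": True,               # remediate without waiting for a human
        "MaximumAutomaticAttempts": 3,
        "RetryAttemptSeconds": 60,
        "Parameters": {
            # Role that the automation assumes to perform the fix.
            "AutomationAssumeRole": {"StaticValue": {"Values": [role_arn]}},
            # RESOURCE_ID is filled in with the non-compliant bucket at run time.
            bucket_parameter: {"ResourceValue": {"Value": "RESOURCE_ID"}},
        },
    }


config = build_remediation_config(
    rule_name="s3-bucket-public-read-prohibited",
    document_name="AWS-DisableS3BucketPublicReadWrite",  # assumed document name
    bucket_parameter="S3BucketName",                     # assumed parameter name
    role_arn="arn:aws:iam::123456789012:role/ExampleRemediationRole",  # hypothetical
)
print(config["TargetId"], list(config["Parameters"]))
```

The role passed here is the one that most often causes remediation failures: it needs permission to perform the specific S3 change the document makes.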

Checkpoint Questions

  1. What is the primary difference between a "managed" AWS Config rule and a "custom" AWS Config rule?
  2. How does Amazon EventBridge facilitate cross-account remediation strategies?
  3. In Security Hub, what is required to trigger a "Custom Action"?
  4. Why is AWS Systems Manager Automation often preferred over Lambda for simple resource modifications?

Muddy Points & Cross-Refs

  • Config vs. Security Hub: Students often confuse these. Remember: Config is for resource properties (is the setting right?); Security Hub is for findings (did something bad happen?).
  • EventBridge vs. SNS: Use SNS if a human needs to read an email. Use EventBridge if a system (Lambda/SSM) needs to take an action.
  • Permissions: Remediation fails most often due to the SSM Automation Role lacking the specific permissions (e.g., s3:PutBucketPolicy) to perform the fix.

Comparison Tables

| Feature | AWS Config Remediation | Security Hub Remediation |
| --- | --- | --- |
| Primary Trigger | Configuration change (Resource state) | Security finding (Alert/Event) |
| Automation Tool | SSM Automation (direct integration) | EventBridge -> Lambda/SSM |
| Best For | Compliance and Governance | Incident Response & Threat Hunting |
| Manual Option | Not typical (usually auto-triggered) | Custom actions (Manual trigger from console) |

AWS Usage Analysis & Resource Optimization Study Guide

Analyzing usage reports to identify underutilized and overutilized resources

This guide focuses on the critical skill of analyzing usage reports to identify underutilized and overutilized resources, a core competency for the AWS Certified Solutions Architect - Professional (SAP-C02) exam.

Learning Objectives

After studying this guide, you should be able to:

  • Configure and interpret AWS Cost and Usage Reports (CUR) for granular analysis.
  • Use AWS Cost Explorer to identify spending patterns and anomalies.
  • Define the process of right-sizing and explain its importance in cloud economics.
  • Identify metrics that signal underutilization versus overutilization.
  • Implement a tagging strategy to facilitate cost allocation and reporting.

Key Terms & Glossary

  • Right-sizing: The process of matching instance types and sizes to your workload performance and capacity requirements at the lowest possible cost.
  • AWS Cost and Usage Reports (CUR): The most granular AWS billing tool, delivering CSV or Parquet files to an S3 bucket with hourly or daily detail.
  • Over-provisioning: Allocating more resources (CPU, RAM, Storage) than a workload actually requires, leading to wasted spend.
  • Under-provisioning: Allocating fewer resources than required, leading to performance bottlenecks or application failure.
  • Cost Allocation Tags: Metadata assigned to AWS resources used to track costs on a detailed level (e.g., by Department or Project).

The "Big Idea"

In traditional on-premises environments, over-provisioning is a "safety net" because hardware procurement is slow and expensive. In the cloud, this habit becomes a financial liability. Effective AWS architecture requires shifting from "capacity guessing" to "data-driven rightsizing." By analyzing usage reports, an architect transforms a static infrastructure into a dynamic, cost-efficient organism that scales with actual demand rather than theoretical peaks.

Formula / Concept Box

| Concept | Metric / Rule of Thumb | Action |
| --- | --- | --- |
| Idle Resources | CPU < 5% and Max Network < 5 KBps over 7 days | Terminate or Downsize |
| Underutilized | CPU < 20% consistently | Downsize (e.g., m5.xlarge to m5.large) |
| Overutilized | CPU > 80% or Memory Paging > 0 | Upsize or Scale Out (Add instances) |
| CUR Delivery | S3 Bucket + Bucket Policy + CUR Definition | Enable for 100% Granularity |
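The rules of thumb in the box above translate directly into a classifier. This is a minimal sketch that uses the guide's thresholds as given; the function name, argument names, and the order in which the rules are checked (overutilization first) are assumptions made for illustration.

```python
def classify(avg_cpu_pct: float, max_network_kbps: float, memory_paging: float = 0.0) -> str:
    """Label an instance using the guide's rule-of-thumb thresholds."""
    if avg_cpu_pct > 80 or memory_paging > 0:
        return "overutilized"       # upsize or scale out
    if avg_cpu_pct < 5 and max_network_kbps < 5:
        return "idle"               # terminate or downsize
    if avg_cpu_pct < 20:
        return "underutilized"      # downsize
    return "right-sized"


# Hypothetical 7-day observations.
print(classify(avg_cpu_pct=3, max_network_kbps=1))
print(classify(avg_cpu_pct=12, max_network_kbps=400))
print(classify(avg_cpu_pct=91, max_network_kbps=900))
```

Checking overutilization first means an instance that is paging memory is never labeled idle, even if its CPU is quiet.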

Hierarchical Outline

  1. Usage Analysis Tools
    • AWS Cost Explorer: Best for visual trends and 12-month forecasting.
    • AWS CUR: Best for deep-dives using Amazon Athena or QuickSight.
    • AWS Compute Optimizer: Uses Machine Learning to suggest specific right-sizing moves.
  2. The Right-sizing Process
    • Monitor: Collect CloudWatch metrics (CPU, RAM, Disk, Network); memory and disk-space metrics require the CloudWatch agent.
    • Analyze: Identify patterns (Steady state vs. Bursting).
    • Optimize: Change instance families (e.g., T-series for bursty, M-series for general).
  3. Governance and Metadata
    • Tagging: Mandatory for mapping usage to business units.
    • Billing Alarms: Proactive notification of unexpected usage spikes.

Visual Anchors

[Diagram: The Optimization Lifecycle]

[Diagram: Cost-Performance Trade-off]

Definition-Example Pairs

  • Horizontal Scaling: Adding or removing similar resources (e.g., more EC2 instances) to a pool.
    • Example: A web server group that adds 2 more instances during a Black Friday sale to handle high traffic and terminates them afterward.
  • Vertical Scaling (Rightsizing): Increasing or decreasing the power (CPU/RAM) of a single resource.
    • Example: Upgrading an RDS instance from db.t3.medium to db.r5.large because the database cache hit ratio is too low.

Worked Examples

Analyzing a CUR for EC2 Instances

Scenario: You notice a spike in your monthly bill. You query the CUR in Amazon Athena to find the culprit.

  1. Step 1: Filter CUR data by line_item_usage_type. You see BoxUsage:m5.4xlarge accounts for 60% of spend.
  2. Step 2: Correlate with CloudWatch. You find the CPUUtilization for these instances averages 4% over 30 days.
  3. Step 3: Remediation. You determine the workload is memory-bound but only needs 16GB. You switch from m5.4xlarge (64GB RAM/16 vCPU) to r5.large (16GB RAM/2 vCPU).
  4. Result: Performance remains stable while costs drop by approximately 80%.
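The savings figure in the result can be checked with simple arithmetic. The hourly rates below are assumptions (illustrative Linux On-Demand prices; look up current pricing for your Region before relying on them), and 730 hours is the usual approximation for one month.

```python
HOURS_PER_MONTH = 730  # common approximation: 8,760 hours per year / 12


def monthly_cost(hourly_rate: float, instance_count: int = 1) -> float:
    """Approximate monthly On-Demand cost for a group of identical instances."""
    return hourly_rate * HOURS_PER_MONTH * instance_count


def savings_percent(old_rate: float, new_rate: float) -> float:
    """Percentage saved by moving from old_rate to new_rate."""
    return (old_rate - new_rate) / old_rate * 100


# Assumed hourly rates, for illustration only.
M5_4XLARGE = 0.768
R5_LARGE = 0.126

print(f"before: ${monthly_cost(M5_4XLARGE):.2f}/month per instance")
print(f"after:  ${monthly_cost(R5_LARGE):.2f}/month per instance")
print(f"saved:  {savings_percent(M5_4XLARGE, R5_LARGE):.1f}%")
```

Under these assumed rates the reduction comes out a little above 80%, consistent with the approximate figure quoted in the scenario.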

Checkpoint Questions

  1. What is the primary difference in data availability between Cost Explorer and CUR?
  2. Why is "Lift and Shift" often the cause of over-provisioning in the cloud?
  3. Which AWS service provides ML-based recommendations for right-sizing EC2 and Lambda?
  4. True or False: To set up CUR, you must first create an S3 bucket and apply a specific bucket policy.

Muddy Points & Cross-Refs

[!TIP] Common Confusion: Students often confuse Cost Explorer with Trusted Advisor.

  • Cost Explorer is for analysis and reporting.
  • Trusted Advisor provides specific checks (e.g., "You have 5 idle load balancers").

Cross-References:

  • For automation of these tasks, see AWS Auto Scaling and AWS Instance Scheduler.
  • For purchasing models, review Savings Plans vs. Reserved Instances.

Comparison Tables

| Feature | AWS Cost Explorer | AWS Cost & Usage Report (CUR) |
| --- | --- | --- |
| Primary Use | Visual trends, quick insights | Granular data mining, deep analytics |
| Data Format | Dashboard/Graphs | CSV / Parquet (in S3) |
| Retention | 12 months (standard) | Continuous (as long as S3 exists) |
| Granularity | Daily/Monthly (Hourly optional) | Hourly / Resource-level |
| Setup | One-time enablement in the console | Requires S3 and IAM configuration |

AWS Application Integration: Architecting for Decoupling and Resiliency

Application integration (for example, Amazon SNS, Amazon SQS, AWS Step Functions)

This guide explores the essential AWS integration services used to build modern, scalable, and resilient cloud-native applications. Understanding the nuances between messaging, event-driven architectures, and workflow orchestration is a core requirement for the AWS Certified Solutions Architect - Professional (SAP-C02) exam.

Learning Objectives

After studying this guide, you should be able to:

  • Distinguish between synchronous and asynchronous communication patterns.
  • Select the appropriate AWS integration service (SQS, SNS, EventBridge, Step Functions) based on specific architectural requirements.
  • Design decoupled architectures using the "Fan-out" and "Messaging" patterns.
  • Evaluate opportunities for modernization using serverless integration tools.

Key Terms & Glossary

  • Decoupling: The practice of ensuring that application components can operate independently. If one component fails or slows down, the others remain functional.
  • Fan-out: A pattern where a single message sent to a topic is pushed to multiple endpoints (e.g., SQS queues, Lambda functions, or HTTP endpoints) simultaneously.
  • Idempotency: The property of certain operations where they can be applied multiple times without changing the result beyond the initial application. Crucial for retry logic in distributed systems.
  • Orchestration: A centralized approach to managing complex workflows where a "coordinator" (like Step Functions) manages the state and sequence of tasks.
  • Choreography: A decentralized approach where components communicate via events (like EventBridge) without a central coordinator.
  • Dead Letter Queue (DLQ): A specialized SQS queue used to store messages that cannot be processed successfully after a certain number of retries.
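The dead letter queue behavior described in the glossary can be simulated in memory. This is a toy model, not the SQS API: the deque stands in for the queue, a failed message is simply re-appended to imitate it becoming visible again, and `max_receive_count` is a hypothetical name for the retry limit.

```python
from collections import deque


def drain(queue: deque, handler, max_receive_count: int = 3) -> list:
    """Process every message; park any that fail max_receive_count times in a DLQ."""
    dead_letters = []
    receive_counts: dict[str, int] = {}
    while queue:
        message = queue.popleft()
        receive_counts[message] = receive_counts.get(message, 0) + 1
        try:
            handler(message)
        except Exception:
            if receive_counts[message] >= max_receive_count:
                dead_letters.append(message)  # give up and keep it for inspection
            else:
                queue.append(message)         # make it available for another attempt
    return dead_letters


processed = []
attempts = []


def handler(message: str) -> None:
    attempts.append(message)
    if message.startswith("bad"):
        raise ValueError("cannot parse message")
    processed.append(message)


dlq = drain(deque(["ok-1", "bad", "ok-2"]), handler)
print(processed, dlq, attempts.count("bad"))
```

The poison message is retried a bounded number of times and then set aside, so it cannot block the healthy messages behind it.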

The "Big Idea"

In traditional monolithic architectures, components are tightly coupled; a failure in the "Order Service" might bring down the "Shipping Service." The Big Idea of application integration is to move from a synchronous "chain" to an asynchronous "web." By using AWS integration services as buffers and translators, you build systems that are highly resilient, elastically scalable, and easier to modernize because each piece can evolve independently.

Formula / Concept Box

| Feature | Amazon SQS | Amazon SNS | Amazon EventBridge | AWS Step Functions |
| --- | --- | --- | --- | --- |
| Primary Model | Pull (Polling) | Push (Pub/Sub) | Push (Event Bus) | State Machine |
| Persistence | Durable (up to 14 days) | Ephemeral (Immediate) | Ephemeral (Retry up to 24h) | Durable State |
| Ordering | FIFO available | Standard topics: No; FIFO topics available (with SQS FIFO subscribers) | No | Strict Sequencing |
| Target Count | 1 consumer per message | Many (Fan-out) | Many (Rules/Filtering) | 1 Workflow Path |

Hierarchical Outline

  1. Asynchronous Messaging Patterns
    • Point-to-Point (Queueing): Buffering requests between producers and consumers (Amazon SQS).
    • Publish/Subscribe (Broadcasting): Delivering one message to multiple interested parties (Amazon SNS).
  2. Event-Driven Architectures
    • Event Buses: Routing events based on content/rules (Amazon EventBridge).
    • Schema Registry: Managing event structures to ensure compatibility.
  3. Workflow Management
    • Standard Workflows: For long-running, auditable processes (AWS Step Functions).
    • Express Workflows: High-volume, short-duration executions (AWS Step Functions).
  4. API & Specialized Integration
    • GraphQL Integration: Unified data access (AWS AppSync).
    • Legacy Protocols: Managed message brokers (Amazon MQ for ActiveMQ/RabbitMQ).

Visual Anchors

The Fan-out Pattern

[Diagram: SNS acts as a dispatcher to multiple downstream SQS queues for parallel processing.]

SQS Queue Structure

[Diagram: the buffer mechanism of an SQS queue, where messages wait to be polled by consumers.]

Definition-Example Pairs

  • Standard SQS Queue

    • Definition: A queue offering near-unlimited throughput and at-least-once delivery, but no guarantee of strict ordering.
    • Example: A photo-sharing app where users upload high-res images; SQS holds the image metadata while a background worker resizes them at its own pace.
  • Step Functions (Standard)

    • Definition: A visual workflow service that uses state machines to coordinate multiple AWS services into serverless workflows.
    • Example: An e-commerce checkout process that must check inventory, charge a credit card, and update a shipping database in a specific sequence with error handling.
  • Amazon EventBridge

    • Definition: A serverless event bus that makes it easy to connect applications using data from your own apps, integrated SaaS apps, and AWS services.
    • Example: When an S3 bucket receives a new file, EventBridge triggers a specific Lambda function only if the file name ends in ".pdf".
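The ".pdf" example above can be expressed as an event pattern with a suffix filter. The pattern follows EventBridge's pattern syntax as I understand it; the matcher beneath it is a deliberately tiny stand-in that handles only exact values and `suffix` rules, written so the pattern can be exercised locally. The bucket name and object keys are hypothetical, and the bucket is assumed to have EventBridge notifications turned on.

```python
PDF_UPLOAD_PATTERN = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {"object": {"key": [{"suffix": ".pdf"}]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Tiny matcher: exact-value and suffix rules only, not the full EventBridge syntax."""
    for field, rule in pattern.items():
        if field not in event:
            return False
        value = event[field]
        if isinstance(rule, dict):
            # Nested pattern: descend into the matching nested event object.
            if not isinstance(value, dict) or not matches(rule, value):
                return False
        elif not any(_matches_one(candidate, value) for candidate in rule):
            return False
    return True


def _matches_one(candidate, value) -> bool:
    if isinstance(candidate, dict) and "suffix" in candidate:
        return isinstance(value, str) and value.endswith(candidate["suffix"])
    return candidate == value


def s3_event(key: str) -> dict:
    """Build a trimmed-down, hypothetical S3 'Object Created' event."""
    return {
        "source": "aws.s3",
        "detail-type": "Object Created",
        "detail": {"bucket": {"name": "example-bucket"}, "object": {"key": key}},
    }


print(matches(PDF_UPLOAD_PATTERN, s3_event("reports/q1.pdf")))
print(matches(PDF_UPLOAD_PATTERN, s3_event("images/logo.png")))
```

Filtering in the rule rather than in the function means the Lambda is never invoked for files it would ignore anyway.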

Worked Examples

Scenario: Modernizing a Monolithic Order System

The Problem: A company has a monolithic "OrderManager" that processes payments, sends emails, and updates inventory in a single synchronous function. If the payment gateway is slow, the whole application hangs.

The Solution:

  1. Step 1: Use Amazon API Gateway to receive the order request.
  2. Step 2: The API triggers a Lambda that publishes the order data to an Amazon SNS topic.
  3. Step 3 (The Fan-out): Three SQS queues subscribe to the SNS topic:
    • PaymentQueue: Processed by a Payment Worker.
    • InventoryQueue: Processed by an Inventory Worker.
    • EmailQueue: Processed by a Notification Worker.
  4. Step 4 (Resiliency): If the PaymentQueue worker fails, the message stays in the queue (or goes to a DLQ) without affecting the EmailQueue or InventoryQueue.
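The fan-out and its failure isolation can be simulated with plain Python lists standing in for the queues. This is a toy model rather than the SNS or SQS API: the `Topic` class, the worker functions, and the order data are all hypothetical.

```python
class Topic:
    """In-memory stand-in for an SNS topic with queue subscribers."""

    def __init__(self) -> None:
        self.queues: dict[str, list] = {}

    def subscribe(self, name: str) -> list:
        self.queues[name] = []
        return self.queues[name]

    def publish(self, message: dict) -> None:
        for queue in self.queues.values():
            queue.append(dict(message))  # every subscriber gets its own copy


def process(queue: list, worker) -> list:
    """Run worker over a queue; messages that raise stay queued for a later retry."""
    done, remaining = [], []
    for message in queue:
        try:
            done.append(worker(message))
        except Exception:
            remaining.append(message)
    queue[:] = remaining
    return done


def payment_worker(message: dict) -> str:
    raise TimeoutError("payment gateway is slow")


def inventory_worker(message: dict) -> str:
    return f"stock reserved for {message['order_id']}"


def email_worker(message: dict) -> str:
    return f"receipt sent for {message['order_id']}"


topic = Topic()
payment_q = topic.subscribe("PaymentQueue")
inventory_q = topic.subscribe("InventoryQueue")
email_q = topic.subscribe("EmailQueue")
topic.publish({"order_id": "A-100", "total": 42.50})

payment_results = process(payment_q, payment_worker)
inventory_results = process(inventory_q, inventory_worker)
email_results = process(email_q, email_worker)
print(payment_results, inventory_results, email_results, len(payment_q))
```

The payment message is still waiting in its own queue after the failure, while inventory and email complete normally, which is the isolation Step 4 describes.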

Checkpoint Questions

  1. Which service should you choose if you need to ensure that messages are processed exactly once and in the strict order they were received?
    • Answer: Amazon SQS FIFO (First-In-First-Out) queue.
  2. What is the primary difference between SNS and EventBridge for message routing?
    • Answer: SNS is better for high-throughput fan-out to thousands of subscribers; EventBridge is better for complex rule-based filtering (content-based routing) and integrating with 3rd-party SaaS applications.
  3. True or False: SQS consumers must poll the queue to retrieve messages.
    • Answer: True. SQS is a pull-based service, unlike SNS which is push-based.

Muddy Points & Cross-Refs

  • SNS vs. SQS: A common point of confusion. Remember: SQS is a container (holds messages until you pull them); SNS is a post office (delivers copies immediately to anyone who asked).
  • Step Functions vs. Lambda: Use Lambda for short, discrete tasks; use Step Functions to stitch those tasks together into a "stateful" journey.
  • Further Study: Check the "AWS Well-Architected Framework: Reliability Pillar" for more on loose coupling.

Comparison Tables

Orchestration (Step Functions) vs. Choreography (EventBridge)

| Feature | Orchestration (Step Functions) | Choreography (EventBridge) |
| --- | --- | --- |
| Control | Centralized (The "Brain") | Decentralized (The "Network") |
| Visibility | Visualizes flow state and history | Events flow without a single visual path |
| Coupling | Slightly tighter (The coordinator knows all) | Very loose (Services just listen for events) |
| Best For | Complex multi-step business logic | Decoupling microservices and SaaS apps |
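The orchestration column can be made concrete with a state machine definition for the checkout flow mentioned earlier in this guide. This is a sketch in the style of Amazon States Language, written as a Python dictionary; the Lambda ARNs, state names, and retry settings are hypothetical, and the helper only checks that every transition points at a state that exists.

```python
import json

# Hypothetical function ARN prefix.
LAMBDA = "arn:aws:lambda:us-east-1:123456789012:function"

CHECKOUT_STATE_MACHINE = {
    "Comment": "Checkout orchestration sketch",
    "StartAt": "CheckInventory",
    "States": {
        "CheckInventory": {
            "Type": "Task",
            "Resource": f"{LAMBDA}:check-inventory",
            "Next": "ChargeCard",
        },
        "ChargeCard": {
            "Type": "Task",
            "Resource": f"{LAMBDA}:charge-card",
            # Retry timeouts with backoff, then hand any remaining error to a failure state.
            "Retry": [{"ErrorEquals": ["States.Timeout"], "IntervalSeconds": 5,
                       "MaxAttempts": 2, "BackoffRate": 2.0}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "UpdateShipping",
        },
        "UpdateShipping": {
            "Type": "Task",
            "Resource": f"{LAMBDA}:update-shipping",
            "End": True,
        },
        "NotifyFailure": {
            "Type": "Fail",
            "Error": "CheckoutFailed",
            "Cause": "Payment could not be completed",
        },
    },
}


def dangling_transitions(definition: dict) -> list[str]:
    """Return transition targets (StartAt, Next, Catch) that name no defined state."""
    states = definition["States"]
    targets = [definition["StartAt"]]
    for state in states.values():
        if "Next" in state:
            targets.append(state["Next"])
        targets.extend(catcher["Next"] for catcher in state.get("Catch", []))
    return [target for target in targets if target not in states]


print(dangling_transitions(CHECKOUT_STATE_MACHINE))
print(len(json.dumps(CHECKOUT_STATE_MACHINE)) > 0)
```

Everything about the sequence and its error handling lives in one definition, which is the "centralized brain" the table contrasts with event-driven choreography.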

[!TIP] For the Professional exam, look for keywords like "ordering," "high throughput," or "retry logic" to decide between SQS Standard and FIFO. If the requirement mentions "third-party SaaS" or "event schemas," lean toward EventBridge.

Mastering AWS Application Migration Tools: SAP-C02 Study Guide

Application migration tools (for example, AWS Application Discovery Service, AWS Application Migration Service)

This study guide covers the essential tools and strategies for migrating application workloads to AWS, specifically focusing on the AWS Application Discovery Service (ADS) and the AWS Application Migration Service (MGN) as required for the SAP-C02 exam.


Learning Objectives

After studying this module, you should be able to:

  • Differentiate between agent-based and agentless discovery methods using AWS Application Discovery Service.
  • Evaluate workloads according to the 7Rs migration strategy (Re-host, Re-platform, Refactor, etc.).
  • Explain the architectural flow of data in AWS Application Migration Service (MGN).
  • Apply security best practices, including encryption at rest and in transit, to migration workflows.
  • Select the appropriate tool (MGN vs. VMC on AWS) based on source infrastructure and business requirements.

Key Terms & Glossary

  • AWS MGN (Application Migration Service): The primary service recommended for lift-and-shift (re-host) migrations to AWS.
  • Agent-based Discovery: A method of collecting deep performance and dependency data by installing software directly on source servers.
  • Block-level Replication: A data transfer method that copies disk blocks rather than individual files, ensuring byte-for-byte consistency.
  • Staging Area VPC: A temporary environment in AWS where replication servers receive and write data to EBS volumes before the final cutover.
  • 7Rs: A framework for categorizing migration strategies: Re-host, Re-platform, Refactor, Re-purchase, Retire, Retain, and Relocate.

The "Big Idea"

Migration is not a single event but a lifecycle. It begins with Discovery (understanding what you have), moves to Assessment (deciding the strategy via the 7Rs), and concludes with Execution (using tools like MGN to move bits). The "Professional" level architect must ensure this lifecycle is secure, cost-effective, and causes minimal downtime by selecting the right orchestration tool for the specific source environment.


Formula / Concept Box

| Strategy (The 7Rs) | Key Characteristic | Tooling Example |
| --- | --- | --- |
| Re-host | "Lift and Shift" with no changes | AWS MGN |
| Relocate | Move hypervisor-to-hypervisor | VMware Cloud on AWS |
| Re-platform | "Lift, tinker, and shift" (e.g., move to RDS) | AWS DMS / SCT |
| Refactor | Re-architect for cloud-native (Lambda/S3) | Manual Rewrite |
| Re-purchase | Switch to a SaaS model | Marketplace |
| Retire | Decommission the application | N/A |
| Retain | Keep on-premises for now | N/A |

Hierarchical Outline

  1. Phase 1: Discovery & Assessment
    • AWS Application Discovery Service (ADS)
      • Agentless: Uses the Agentless Collector, a virtual appliance deployed in VMware vCenter; identifies VM inventory.
      • Agent-based: Installed on OS; identifies processes and network dependencies.
    • AWS Migration Hub
      • Centralized dashboard to track migration progress across different tools.
  2. Phase 2: Server Migration
    • AWS Application Migration Service (MGN)
      • Replaces SMS (Server Migration Service) and CloudEndure.
      • Continuous block-level replication.
    • VMware Cloud (VMC) on AWS
      • Specific for VMware-to-VMware "Relocate" strategy.
  3. Phase 3: Security & Governance
    • Encryption in Transit: Secured via TLS 1.2.
    • Encryption at Rest: Managed via AWS KMS on Amazon EBS volumes.

Visual Anchors

[Diagram: The Migration Workflow]

[Diagram: MGN Architecture Detail]

Definition-Example Pairs

  • Continuous Replication: The process of copying changes to the cloud in real-time as they happen at the source.
    • Example: An e-commerce database server on-premises constantly writes new orders; MGN captures these sub-second changes so the cloud version is always up-to-date for cutover.
  • Test Mode: A state in MGN where a test instance of a source server is launched in AWS for validation while the original server keeps running and replicating.
    • Example: Launching a production web server in a test VPC to ensure the database connection strings work in the new network environment before the actual migration weekend.

Worked Examples

Problem: Migrating a Legacy SQL Server

Scenario: A company has a 10TB SQL Server running on an old physical machine. They need to migrate it with less than 30 minutes of downtime.

Solution Steps:

  1. Assessment: Use AWS Application Discovery Service (ADS) Agents to verify process dependencies (e.g., which apps talk to this SQL server).
  2. Setup: Install the AWS MGN Replication Agent on the physical SQL server.
  3. Replication: MGN begins a "Baseline" sync of the 10TB. This may take days, but the source remains live.
  4. Continuous Sync: Once the baseline is done, MGN keeps the EBS volumes in the AWS Staging Area synced with new writes.
  5. Cutover: During a maintenance window, stop the source SQL service, allow the final bits to sync (seconds), and trigger the "Cutover" launch in MGN. This converts the server into an EC2 instance.
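
The "this may take days" estimate in step 3 can be checked with simple arithmetic. The sketch below is a rough planning aid only: the link speeds and the 80% efficiency factor are illustrative assumptions, not values from the scenario, and `baseline_sync_hours` is a hypothetical helper name.

```python
def baseline_sync_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to copy `data_tb` terabytes over a `link_mbps` link.

    `efficiency` is an assumed fraction of the link that replication can
    actually use (protocol overhead, competing traffic).
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    total_bits = data_tb * 1e12 * 8              # decimal terabytes -> bits
    usable_bits_per_second = link_mbps * 1e6 * efficiency
    return total_bits / usable_bits_per_second / 3600


if __name__ == "__main__":
    # 10 TB baseline sync at a few assumed link speeds
    for mbps in (100, 500, 1000):
        print(f"{mbps:>5} Mbps: {baseline_sync_hours(10, mbps):.1f} hours")
```

Because the source server stays live during the baseline sync, a long estimate affects the project schedule rather than the downtime window.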

Checkpoint Questions

  1. Which service is the successor to AWS SMS and CloudEndure for re-hosting servers?
  2. What is the primary difference in data collection between the ADS Agentless and Agent-based discovery?
  3. Why is a "Staging Area VPC" used in AWS MGN instead of launching directly into production?
  4. Which migration strategy (from the 7Rs) applies specifically to moving VMs to VMware Cloud on AWS?

Muddy Points & Cross-Refs

  • MGN vs. DMS: Use MGN for full server migrations (OS + Apps + Data). Use DMS (Database Migration Service) if you are only moving the database and want to change the engine (e.g., SQL Server to Aurora).
  • Agentless vs. Agent-based: Remember that Agentless discovery is fast but only sees metadata (CPU, RAM, Disk). Agent-based discovery is deep (it sees what software is installed and which network ports are active).
  • Network Security: The MGN replication agent sends replication data to the replication servers in the staging area over TCP port 1500 and communicates with the MGN service endpoint over TCP port 443 (HTTPS). Ensure firewall rules are updated.

Comparison Tables

Discovery Options

| Feature | Agentless (Connector) | Agent-based |
|---|---|---|
| Platform Support | VMware vCenter only | Windows & Linux (Physical/Virtual) |
| Data Depth | Infrastructure Metadata | Process & Network Dependencies |
| Installation | Single OVA appliance | Every individual server |
| Use Case | Rapid initial inventory | Detailed dependency mapping |

Server Migration Tools

| Service | Strategy | Use Case |
|---|---|---|
| AWS MGN | Re-host | Default choice for most server migrations |
| VMC on AWS | Relocate | Rapid move for VMware clusters with no change in hypervisor |
| AWS App2Container | Re-platform | Converting existing ASP.NET or Java apps into containers |

Study Guide (950 words)

Performance Optimization: Caching, Buffering, and Replicas

Applying design patterns to meet performance objectives with caching, buffering, and replicas

This guide covers the essential design patterns for meeting performance objectives in high-scale AWS environments, focusing on reducing latency and managing resource contention.

Learning Objectives

  • Evaluate the differences between cache-aside and write-through caching patterns.
  • Design architectures that utilize read replicas to eliminate resource contention between read and write operations.
  • Implement buffering mechanisms to smooth out traffic spikes and prevent system overload.
  • Select appropriate AWS services (ElastiCache, DAX, RDS, SQS) based on specific performance requirements.

Key Terms & Glossary

  • TTL (Time to Live): The duration for which an item is stored in a cache before it is considered expired and deleted.
  • Cache Hit/Miss: A 'hit' occurs when the requested data is found in the cache; a 'miss' occurs when the data must be fetched from the primary data store.
  • Read Replica: A copy of a database instance that handles read-only queries, reducing the load on the primary (source) database.
  • Throttling: The process of limiting the number of requests a service can handle to maintain stability.
  • Asynchronous Replication: A data-syncing method where the primary database does not wait for the replica to acknowledge receipt of data before proceeding.

The "Big Idea"

Performance optimization is not just about raw speed; it is about resource management. In a high-traffic system, the database is often the primary bottleneck. Design patterns like caching, buffering, and replicas act as "pressure relief valves" that move data closer to the user, distribute the workload across multiple nodes, or decouple the timing of requests from the timing of processing.

Formula / Concept Box

| Concept | Metric / Rule | Significance |
|---|---|---|
| Cache Hit Ratio | $\text{Hit Ratio} = \frac{\text{Cache Hits}}{\text{Cache Hits} + \text{Cache Misses}}$ | Higher ratios indicate a more effective caching strategy. |
| Sub-millisecond Latency | $< 1\text{ ms}$ | Required for real-time applications; necessitates in-memory solutions. |
| Read Contention Rule | $\text{Writes} \uparrow \implies \text{Reads} \downarrow$ | High write volume locks tables/rows, slowing down reads. |
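
As a quick numeric illustration of the hit-ratio formula, the sketch below computes the ratio and the resulting average read latency. The 1 ms cache and 50 ms database latencies are assumed example figures, and both function names are hypothetical.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Cache hits divided by total lookups; 0.0 when there were no lookups."""
    total = hits + misses
    return hits / total if total else 0.0


def average_latency_ms(ratio: float, cache_ms: float, db_ms: float) -> float:
    """Expected read latency when `ratio` of reads are served from the cache."""
    return ratio * cache_ms + (1 - ratio) * db_ms


if __name__ == "__main__":
    ratio = hit_ratio(hits=900, misses=100)
    print(f"hit ratio: {ratio:.2f}")
    # Assumed latencies: 1 ms for a cache hit, 50 ms for a database read
    print(f"average latency: {average_latency_ms(ratio, 1.0, 50.0):.1f} ms")
```

Even a modest miss rate dominates the average when the database is much slower than the cache, which is why the hit ratio is worth tracking.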

Hierarchical Outline

  1. Caching Strategies
    • In-Memory Storage: Using RAM for sub-millisecond access (e.g., Redis, Memcached).
    • Cache-Aside (Lazy Loading): Application manages the cache. Data is only loaded on a miss.
    • Write-Through: Data is written to the cache and the database simultaneously.
  2. Database Scaling & Replicas
    • Vertical Scaling: Increasing CPU/RAM (simple but limited).
    • Read Replicas: Offloading read traffic (RDS, Aurora).
    • DAX (DynamoDB Accelerator): Integrated cache for DynamoDB.
  3. Buffering & Decoupling
    • SQS (Simple Queue Service): Buffer for spikes in write traffic.
    • Kinesis/Firehose: Buffering streaming data before ingestion.
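
The buffering idea in item 3 can be sketched as a deterministic simulation. This is a minimal model, assuming an in-memory `deque` stands in for an SQS queue and that the downstream system handles a fixed number of messages per tick; `simulate_buffer` and its parameters are hypothetical names.

```python
from collections import deque
from typing import List, Tuple


def simulate_buffer(arrivals_per_tick: List[int], capacity_per_tick: int) -> Tuple[List[int], List[int]]:
    """Return (processed per tick, queue depth after each tick).

    Arrivals are appended to the queue; the consumer drains at most
    `capacity_per_tick` messages per tick, so spikes become backlog
    instead of overload.
    """
    backlog: deque = deque()
    processed: List[int] = []
    depths: List[int] = []
    for tick, arrivals in enumerate(arrivals_per_tick):
        backlog.extend((tick, n) for n in range(arrivals))   # enqueue the burst
        done = 0
        while backlog and done < capacity_per_tick:          # drain at a steady rate
            backlog.popleft()
            done += 1
        processed.append(done)
        depths.append(len(backlog))
    return processed, depths


if __name__ == "__main__":
    processed, depths = simulate_buffer([2, 10, 1, 0, 0], capacity_per_tick=3)
    print("processed:", processed)
    print("queue depth:", depths)
```

The consumer never exceeds its capacity, and the spike is absorbed as temporary queue depth that drains once traffic subsides.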

Visual Anchors

[Diagram: Caching Logic Flow]

[Diagram: Multi-Layer Performance Architecture]

Definition-Example Pairs

  • Pattern: Cache-Aside
    • Definition: The application checks the cache first. If data is missing, it fetches it from the DB and writes it to the cache for future use.
    • Example: A news website caching the top story of the day only after the first visitor requests it.
  • Pattern: Buffering
    • Definition: Using a message queue to store incoming requests so the downstream system can process them at its own pace.
    • Example: An e-commerce system using SQS to hold order requests during a Black Friday sale to prevent the database from crashing.
  • Pattern: Read Replicas
    • Definition: Creating read-only copies of a database to serve analytics or reporting queries.
    • Example: A mobile app where users update their profiles (Primary) but millions of others view those profiles (Replicas).
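
The Cache-Aside pattern defined above can be sketched in a few lines. This is an illustration under stated assumptions: a Python dictionary stands in for ElastiCache, `load_from_db` is a hypothetical callback representing the database query, and the injectable `clock` exists only to make TTL expiry easy to demonstrate.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Lazy-loading cache: data enters the cache only after a miss."""

    def __init__(self, load_from_db: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[str, float]] = {}   # key -> (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            self.hits += 1                # cache hit: no database call
            return entry[0]
        self.misses += 1                  # miss or expired entry: go to the database
        value = self._load(key)
        self._store[key] = (value, now + self._ttl)
        return value


if __name__ == "__main__":
    cache = CacheAside(lambda key: f"story-for-{key}", ttl_seconds=60)
    cache.get("top-story")   # first visitor: miss, loads from the database
    cache.get("top-story")   # later visitors: served from the cache
    print(cache.hits, cache.misses)
```

A write-through variant would instead update `_store` inside the write path, trading extra write latency for fewer stale reads.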

Worked Examples

Scenario: The Overloaded Catalog

Problem: An e-commerce catalog page is loading slowly. CloudWatch shows 90% CPU usage on the RDS instance, specifically during read-heavy hours. Writes are steady but low.

Step-by-Step Solution:

  1. Identify the Pattern: The bottleneck is read-contention.
  2. Option A (Read Replica): Create an RDS Read Replica. Point the web application's "GET /catalog" endpoint to the replica endpoint. This offloads the high-CPU reads from the primary instance.
  3. Option B (Caching): Deploy Amazon ElastiCache (Redis). Implement the Cache-Aside pattern for the catalog items.
  4. Result: The database CPU drops to 20%, and the catalog page load time drops from 2 seconds to 50ms (for cached hits).

Checkpoint Questions

  1. What is the main disadvantage of a Write-Through cache compared to Cache-Aside?
  2. In Amazon RDS, does creating a Read Replica in a Single-AZ environment cause downtime?
  3. Which AWS service provides sub-millisecond response times for DynamoDB?
  4. When should you use a buffer (SQS) instead of a cache (ElastiCache)?

[!TIP] Answers: 1. Increased write latency (must write to two places). 2. It may cause a short I/O suspension. 3. DAX. 4. Use SQS when you need to smooth out spikes in writes or decouple processing; use ElastiCache to speed up reads.

Muddy Points & Cross-Refs

  • Caching vs. Replicas: Learners often confuse these. Remember: Caching is for speed (in-memory); Replicas are for volume (distribution of database load).
  • Asynchronous Lag: Read replicas are asynchronous. This means a user might write data to the primary and immediately try to read it from the replica, but the data hasn't arrived yet (Eventual Consistency).
  • See Also: Well-Architected Framework - Performance Efficiency Pillar.

Comparison Tables

| Feature | Read Replicas | Caching (ElastiCache) | Buffering (SQS) |
|---|---|---|---|
| Primary Goal | Offload Reads | Reduce Latency | Decouple/Smooth Spikes |
| Data Type | Structured (Relational) | Key-Value / Objects | Messages/Tasks |
| Consistency | Eventual | Depends on Pattern | N/A (Processing order) |
| Code Change | Low (New endpoint) | Medium (Logic for hits/misses) | High (Async processing) |

Study Guide (925 words)

AWS Migration Security: Best Practices & Implementation Guide

Applying the appropriate security methods to migration tools

This guide explores the critical security methods required when utilizing AWS migration tools such as AWS Application Migration Service (MGN), AWS Database Migration Service (DMS), and AWS Storage Gateway. Securing the migration path is essential to ensure data integrity and confidentiality during the transition from on-premises to the cloud.

Learning Objectives

By the end of this guide, you should be able to:

  • Implement network isolation for migration services using custom-managed VPCs.
  • Configure private connectivity via AWS PrivateLink and Direct Connect for secure data transfer.
  • Apply the principle of Least Privilege using IAM roles and attribute-based access control (ABAC).
  • Enforce multi-factor authentication (MFA) and tagging strategies to govern migration tool access.

Key Terms & Glossary

  • AWS PrivateLink: A technology that provides private connectivity between VPCs, AWS services, and on-premises applications on the Amazon network.
  • Least Privilege: The security discipline of granting only the minimum permissions necessary to perform a task.
  • ABAC (Attribute-Based Access Control): An authorization strategy that defines permissions based on attributes (tags) attached to users and AWS resources.
  • Interface VPC Endpoint: An elastic network interface with a private IP address from the IP address range of your subnet that serves as an entry point for traffic destined to a supported service.
  • AWS MGN (Application Migration Service): The primary service used to lift-and-shift applications to AWS with minimal changes.

The "Big Idea"

Security in migration is not just about the final destination; it is about protecting the transit lane. If migration tools are deployed in default VPCs or with overly permissive IAM roles, the data being moved is at risk before it even arrives. A secure migration treats the migration tool itself as a high-security workload, isolating it from the public internet and strictly controlling who (and what) can interact with it.

Formula / Concept Box

| Principle | Implementation Method | Goal |
|---|---|---|
| Network Isolation | Custom VPC + PrivateLink | Prevent exposure to the public internet. |
| Identity Governance | IAM Roles + MFA | Ensure only authenticated, authorized actors can trigger migrations. |
| Resource Control | Tagging + ABAC | Scale security by allowing access based on project/environment tags. |
| Secure Transport | Direct Connect / VPN | Provide a dedicated or encrypted path for massive data volumes (Direct Connect is private but not encrypted by default; a VPN adds IPsec encryption). |

Hierarchical Outline

  1. Network Security for Migration
    • VPC Placement: Avoid default VPCs; use customer-managed VPCs with specific NACLs.
    • Private Connectivity:
      • Use AWS PrivateLink for interface endpoints.
      • Leverage Direct Connect for consistent, private bandwidth.
  2. Identity and Access Management (IAM)
    • Least Privilege: Avoid * permissions; use service-specific actions.
    • Identity-Based Policies: Use conditions to restrict access based on tags.
    • MFA Enforcement: Required for high-privilege migration actions (e.g., deleting replication instances).
  3. Data Protection & Tool Configuration
    • AWS DMS: Launch replication instances within private subnets.
    • AWS MGN: Use system transformation coupled with block-level data duplication.

Visual Anchors

[Diagram: Secure Migration Architecture]

[Diagram: IAM Policy Evaluation Logic]

Definition-Example Pairs

  • Interface VPC Endpoint: A private entry point for AWS services without requiring an Internet Gateway.
    • Example: Creating an interface endpoint for AWS DMS so that API calls to the service from inside your VPC stay on the AWS network instead of crossing the public internet; database traffic from on-premises still reaches the replication instance over Direct Connect or VPN.
  • Attribute-Based Access Control (ABAC): Using tags to grant permissions.
    • Example: An IAM policy that allows a user to start an AWS MGN migration only if the target server has the tag Environment: Development.

Worked Examples

Scenario: Securing AWS Storage Gateway with Tag-Based Policies

You need to ensure that only authorized administrators can describe file shares for resources tagged for migration.

Step 1: Tag the Resource

Apply a tag to your Storage Gateway resource: AllowAccess: yes.

Step 2: Create the IAM Policy

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "storagegateway:ListTagsForResource",
        "storagegateway:ListFileShares",
        "storagegateway:DescribeNFSFileShares"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/AllowAccess": "yes"
        }
      }
    }
  ]
}
```

Step 3: Verification

If the user attempts to list shares on a gateway tagged AllowAccess: no, the request will be denied implicitly despite having the storagegateway:ListFileShares action allowed globally in the policy block, because the Condition is not met.
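
The verification logic can be modelled in a few lines. This is a deliberately simplified sketch of the single Allow statement in the policy, not a reimplementation of IAM: real evaluation also considers explicit denies, other attached policies, and organization-level controls. `is_allowed` is a hypothetical helper name.

```python
from typing import Dict, Set

# Actions granted by the policy's single Allow statement
ALLOWED_ACTIONS: Set[str] = {
    "storagegateway:ListTagsForResource",
    "storagegateway:ListFileShares",
    "storagegateway:DescribeNFSFileShares",
}


def is_allowed(action: str, resource_tags: Dict[str, str]) -> bool:
    """Return True only if the action is listed AND the tag condition holds.

    Mirrors StringEquals on aws:ResourceTag/AllowAccess, which is an exact,
    case-sensitive comparison. Anything not explicitly allowed falls through
    to an implicit deny (False).
    """
    action_matches = action in ALLOWED_ACTIONS
    condition_met = resource_tags.get("AllowAccess") == "yes"
    return action_matches and condition_met


if __name__ == "__main__":
    print(is_allowed("storagegateway:ListFileShares", {"AllowAccess": "yes"}))  # allowed
    print(is_allowed("storagegateway:ListFileShares", {"AllowAccess": "no"}))   # implicit deny
    print(is_allowed("storagegateway:ListFileShares", {}))                      # untagged: implicit deny
```

An untagged gateway is denied as well, so the tag acts as an opt-in rather than an opt-out.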

Checkpoint Questions

  1. Why should you avoid using the "Default VPC" for AWS DMS replication instances?
  2. What is the benefit of using an Interface VPC Endpoint for AWS MGN compared to an Internet Gateway?
  3. How does MFA enhance the security of the migration process?

Muddy Points & Cross-Refs

  • VPC Peering vs. PrivateLink: Students often confuse these. Remember: VPC Peering connects two entire networks; PrivateLink exposes a specific service (like a migration tool) privately into your VPC.
  • Least Privilege Overkill: It is tempting to use AdministratorAccess during a migration because it is a "temporary" project. Do not do this. Use Condition keys to limit the scope to specific migration regions or tags.

Comparison Tables

| Feature | Public Internet | AWS Client VPN | AWS Direct Connect |
|---|---|---|---|
| Security Level | Low (Encrypted but exposed) | Medium (Private tunnel) | High (Physical isolation) |
| Performance | Unpredictable | Variable | Consistent / Dedicated |
| Cost | Low | Moderate | High |
| Best Use Case | Small, non-sensitive data | Remote admin access | Large-scale enterprise migration |

Study Guide (1,050 words)

Architecting for Resilience: Automated Backups and Business Continuity

Architecting a backup solution that is automated, is cost-effective, and supports business continuity across multiple Availability Zones or AWS Regions

This study guide focuses on designing automated, cost-effective backup solutions that ensure business continuity (BC) across multiple Availability Zones (AZs) and AWS Regions, aligned with the AWS Certified Solutions Architect - Professional (SAP-C02) domain.

Learning Objectives

By the end of this module, you should be able to:

  • Define and apply Recovery Time Objective (RTO) and Recovery Point Objective (RPO) to architectural decisions.
  • Compare and contrast the four primary Disaster Recovery (DR) strategies: Backup & Restore, Pilot Light, Warm Standby, and Multi-site Active/Active.
  • Design automated backup workflows using AWS Backup and Amazon S3.
  • Implement Infrastructure as Code (IaC) using AWS CloudFormation to ensure consistent multi-region environment replication.
  • Evaluate when to use Multi-AZ versus Multi-Region architectures based on workload requirements.

Key Terms & Glossary

  • RTO (Recovery Time Objective): The maximum acceptable delay between the interruption of service and restoration of service. (Example: An RTO of 2 hours means the system must be back up within 2 hours of a failure.)
  • RPO (Recovery Point Objective): The maximum acceptable amount of data loss measured in time. (Example: An RPO of 15 minutes means you can afford to lose at most 15 minutes of data updates.)
  • Cross-Region Replication (CRR): An S3 feature that automatically, asynchronously copies objects across buckets in different AWS Regions.
  • Infrastructure as Code (IaC): Managing and provisioning infrastructure through machine-readable definition files (e.g., CloudFormation) rather than manual hardware configuration.
  • Zonal vs. Regional Services: Zonal services (like EC2) are tied to a specific AZ; Regional services (like DynamoDB or S3) are managed by AWS across multiple AZs automatically.

The "Big Idea"

Business Continuity is not merely about having a copy of your data; it is about orchestration and automation. In the cloud, reliability is achieved by assuming failure will happen. By using Infrastructure as Code (IaC) to recreate the environment and automated data replication to keep it current, organizations can transition from expensive "idle" hardware to cost-effective, "on-demand" recovery environments.

Formula / Concept Box

| Metric/Concept | Definition | Architectural Impact |
|---|---|---|
| RTO | "How long to fix it?" | Determines the level of automation and environment readiness (e.g., Pilot Light vs. Warm Standby). |
| RPO | "How much data loss?" | Determines the frequency and method of data replication (e.g., Snapshot frequency vs. Synchronous replication). |
| S3 Durability | 99.999999999% (11 9's) | Makes S3 the definitive target for backup storage and CRR. |

Hierarchical Outline

  1. Foundational Backup Strategy
    • Automation First: Use AWS Backup for centralized policy management across RDS, EBS, and DynamoDB.
    • S3 as the Backbone: Leverage S3 for high durability and Lifecycle Policies for cost-optimization (transitioning to Glacier).
    • Data Security: Implement KMS (Key Management Service) for server-side or client-side encryption of backups.
  2. Disaster Recovery (DR) Patterns
    • Backup & Restore: Lower cost, higher RTO/RPO. Manual or scripted restoration.
    • Pilot Light: Minimal version of environment always running (Databases/Live data), while App servers are scaled on-demand via IaC.
    • Warm Standby: Scaled-down but functional version of the full environment.
    • Multi-site Active/Active: Near-zero downtime; traffic split between regions via Route 53 or Global Accelerator.
  3. Cross-Region Continuity
    • Identity & Access: Use IAM Roles and cross-account access for isolated recovery environments.
    • Global Networking: Use Route 53 routing policies (Latency, Failover, Geoproximity) to manage traffic during regional disruptions.

Visual Anchors

[Diagram: DR Strategy Decision Flow]

[Diagram: Multi-AZ vs. Multi-Region Scope]

Definition-Example Pairs

  • Definition: Pilot Light Strategy — Keeping a minimal version of a workload functional in a second region, primarily the data layer.
    • Example: An application has its database replicated to a second region, but the EC2 instances are only provisioned via an Auto Scaling Group triggered by a Route 53 Health Check failure.
  • Definition: Drift Detection — A CloudFormation feature that identifies when resources have been modified outside of the stack template.
    • Example: A developer manually changes a security group rule in the DR region; CloudFormation Drift Detection flags this so the IaC template can re-enforce the standard.

Worked Examples

Scenario: Optimizing Cost for a 4-Hour RTO

Problem: A company currently uses a Multi-site Active/Active setup for a non-critical internal tool. The monthly cost is $5,000. The business determines that a 4-hour RTO is acceptable. How should the architect redesign this for cost-effectiveness?

Step-by-Step Solution:

  1. Analyze RTO: A 4-hour RTO does not require resources to be running in the second region (Warm/Multi-site).
  2. Select Pattern: Transition to Pilot Light or Backup & Restore.
  3. Implement Automation:
    • Store all environment definitions in AWS CloudFormation.
    • Use AWS Backup to create daily snapshots and copy them to the DR region.
  4. Cost Result: By terminating the idle EC2 and RDS instances in the DR region and relying on S3 storage + on-demand restoration, the monthly cost drops to ~$200 for storage.

Checkpoint Questions

  1. What is the primary difference between a Pilot Light and a Warm Standby strategy?
  2. Which AWS service would you use to centrally manage backup policies across multiple AWS accounts in an Organization?
  3. True or False: Using Infrastructure as Code (IaC) is only beneficial for initial deployment, not for Disaster Recovery.
  4. Why is S3 considered the "backup destination of choice" for AWS services?

Muddy Points & Cross-Refs

  • Fate Sharing: A common confusion is why Multi-AZ isn't enough. Remember: While Multi-AZ protects against hardware/data center failure, Multi-Region protects against regional service outages or natural disasters.
  • Cross-Region Data Transfer Costs: Replication is not free. Always account for data transfer out (DTO) costs when architecting multi-region replication.
  • Deep Dive Reference: For more on automated recovery, see the AWS Well-Architected Framework: Reliability Pillar.

Comparison Tables

| Strategy | RTO / RPO | Relative Cost | Complexity |
|---|---|---|---|
| Backup & Restore | Hours / 24h+ | Low | Simple |
| Pilot Light | Tens of minutes / Near real-time data | Medium-Low | Moderate |
| Warm Standby | Minutes / Near real-time data | Medium-High | High |
| Multi-site | Near Zero | Very High | Very High |

[!IMPORTANT] Automation (IaC) is the bridge that makes low-cost strategies (Backup & Restore) viable by ensuring that environment restoration is repeatable and fast.
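
The selection logic behind the comparison table can be sketched as "pick the cheapest strategy whose typical RTO fits the requirement". The RTO figures and cost ranks below are illustrative assumptions chosen for the example, not AWS-published numbers, and `cheapest_strategy` is a hypothetical helper.

```python
from typing import List, NamedTuple


class Strategy(NamedTuple):
    name: str
    typical_rto_minutes: float   # assumed, for illustration only
    relative_cost: int           # 1 = cheapest, 4 = most expensive


STRATEGIES: List[Strategy] = [
    Strategy("Backup & Restore", 240, 1),
    Strategy("Pilot Light", 30, 2),
    Strategy("Warm Standby", 5, 3),
    Strategy("Multi-site Active/Active", 0, 4),
]


def cheapest_strategy(required_rto_minutes: float) -> Strategy:
    """Return the lowest-cost strategy whose typical RTO meets the requirement."""
    if required_rto_minutes < 0:
        raise ValueError("RTO cannot be negative")
    candidates = [s for s in STRATEGIES if s.typical_rto_minutes <= required_rto_minutes]
    return min(candidates, key=lambda s: s.relative_cost)


if __name__ == "__main__":
    for rto in (240, 60, 10, 1):
        print(f"RTO {rto:>3} min -> {cheapest_strategy(rto).name}")
```

In practice the RPO requirement narrows the choice in the same way, and the stricter of the two constraints decides the pattern.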

Hands-On Lab (820 words)

Lab: Building a Scalable Hub-and-Spoke Network with AWS Transit Gateway

Architect network connectivity strategies

This hands-on lab guides you through architecting a scalable network using AWS Transit Gateway (TGW). You will connect two separate VPCs (Spoke A and Spoke B) through a central hub to enable transitive routing, a core requirement for the AWS Certified Solutions Architect - Professional exam.

[!WARNING] Remember to run the teardown commands at the end of this lab to avoid ongoing charges for Transit Gateway attachments.

Prerequisites

  • An active AWS Account.
  • AWS CLI installed and configured with Administrator access.
  • Basic knowledge of VPC CIDR blocks and Route Tables.
  • Region: We will use us-east-1, but you can substitute with your preferred region.

Learning Objectives

  1. Provision a hub-and-spoke network topology using AWS Transit Gateway.
  2. Configure VPC route tables to enable communication across the Transit Gateway.
  3. Verify transitive connectivity between isolated workloads.
  4. Understand the operational and scalability benefits of Transit Gateway over complex VPC peering meshes.

Architecture Overview

We will build a hub-and-spoke model where the Transit Gateway acts as the central router connecting two isolated VPCs.

Step-by-Step Instructions

Step 1: Create Spoke VPCs

First, we need two VPCs with non-overlapping IP ranges.

```bash
# Create Spoke VPC A
aws ec2 create-vpc --cidr-block 10.1.0.0/16 --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=brainybee-spoke-a}]'

# Create Spoke VPC B
aws ec2 create-vpc --cidr-block 10.2.0.0/16 --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=brainybee-spoke-b}]'

# Create one subnet per VPC; Step 3 needs a subnet ID for each attachment
aws ec2 create-subnet --vpc-id <VPC_A_ID> --cidr-block 10.1.1.0/24
aws ec2 create-subnet --vpc-id <VPC_B_ID> --cidr-block 10.2.1.0/24
```

Console alternative: Navigate to VPC > Your VPCs > Create VPC. Use Name: brainybee-spoke-a and CIDR: 10.1.0.0/16. Repeat for Spoke B with 10.2.0.0/16.

Step 2: Provision the Transit Gateway

The Transit Gateway will serve as our regional network hub.

```bash
aws ec2 create-transit-gateway --description "Hub for Spoke A and B" --tag-specifications 'ResourceType=transit-gateway,Tags=[{Key=Name,Value=brainybee-tgw}]'
```

[!TIP] Note the TransitGatewayId from the output; you will need it for the next steps.

Step 3: Attach VPCs to the Transit Gateway

We must "plug" our VPCs into the hub using TGW Attachments.

```bash
# Attach Spoke A
aws ec2 create-transit-gateway-vpc-attachment \
  --transit-gateway-id <TGW_ID> \
  --vpc-id <VPC_A_ID> \
  --subnet-ids <SUBNET_A_ID>

# Attach Spoke B
aws ec2 create-transit-gateway-vpc-attachment \
  --transit-gateway-id <TGW_ID> \
  --vpc-id <VPC_B_ID> \
  --subnet-ids <SUBNET_B_ID>
```

Step 4: Configure VPC Routing

Even with an attachment, instances don't know where to send traffic. We must update the VPC Route Tables to point the CIDR of the other VPC to the Transit Gateway.

```bash
# In VPC A's Route Table: Route to VPC B goes to TGW
# (a Transit Gateway target uses --transit-gateway-id, not --gateway-id)
aws ec2 create-route --route-table-id <RT_A_ID> --destination-cidr-block 10.2.0.0/16 --transit-gateway-id <TGW_ID>

# In VPC B's Route Table: Route to VPC A goes to TGW
aws ec2 create-route --route-table-id <RT_B_ID> --destination-cidr-block 10.1.0.0/16 --transit-gateway-id <TGW_ID>
```

Checkpoints

  1. Verify TGW State: Run aws ec2 describe-transit-gateways. The state should be available.
  2. Verify Attachments: Run aws ec2 describe-transit-gateway-vpc-attachments. You should see two attachments in the available state.
  3. Ping Test: If you launch EC2 instances in both VPCs (with appropriate Security Groups allowing ICMP), a ping from 10.1.x.x to 10.2.x.x should succeed.

Visualizing the Route Logic

[Diagram: packet flow decision for an instance in VPC A trying to reach VPC B]
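
The route decision for an instance in VPC A can also be sketched as a longest-prefix-match lookup. The table below mirrors the lab's CIDR ranges; the target labels `local` and `tgw` and the function `next_hop` are illustrative names, not AWS identifiers.

```python
import ipaddress
from typing import Dict, Optional

# VPC A's route table after Step 4: its own CIDR stays local,
# the other spoke's CIDR is sent to the Transit Gateway.
VPC_A_ROUTES: Dict[str, str] = {
    "10.1.0.0/16": "local",
    "10.2.0.0/16": "tgw",
}


def next_hop(routes: Dict[str, str], destination_ip: str) -> Optional[str]:
    """Return the target of the most specific matching route, or None."""
    destination = ipaddress.ip_address(destination_ip)
    best_network = None
    best_target = None
    for cidr, target in routes.items():
        network = ipaddress.ip_network(cidr)
        if destination in network and (
            best_network is None or network.prefixlen > best_network.prefixlen
        ):
            best_network, best_target = network, target
    return best_target


if __name__ == "__main__":
    print(next_hop(VPC_A_ROUTES, "10.1.0.7"))   # stays inside VPC A
    print(next_hop(VPC_A_ROUTES, "10.2.5.9"))   # forwarded to the Transit Gateway
    print(next_hop(VPC_A_ROUTES, "8.8.8.8"))    # no matching route: traffic is dropped
```

Without the route added in Step 4, the lookup for a 10.2.x.x address returns no match, which is why the attachment alone is not enough for connectivity.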

Troubleshooting

| Problem | Potential Cause | Fix |
|---|---|---|
| Ping Timeout | Security Group / NACL | Ensure SG allows inbound ICMP from the other VPC's CIDR range. |
| Attachment "Pending" | AWS internal provisioning | Wait 2-3 minutes; TGW attachments take longer than VPC creation. |
| Route "Blackhole" | Deleted TGW | Ensure the TGW ID in the route table still exists and is attached. |

Stretch Challenge

Scenario: You need to provide internet access to both spokes through a single centralized Inspection VPC.

  1. Create a third VPC called Inspection-VPC with an Internet Gateway.
  2. Modify Spoke route tables to point 0.0.0.0/0 to the TGW.
  3. Configure TGW route table to route default traffic to the Inspection-VPC attachment.

Cost Estimate

  • Transit Gateway (us-east-1): No separate hourly charge for the gateway itself; billing is per attachment and per GB processed.
  • TGW VPC Attachment: $0.05 per hour per attachment (Total $0.10/hr for this lab).
  • Data Processing: $0.02 per GB processed by the TGW.
  • Estimated Total for 1 Hour: ~$0.10 plus data processing (Free-tier does not cover Transit Gateway).

Concept Review

| Feature | VPC Peering | Transit Gateway |
|---|---|---|
| Topology | Point-to-Point (Mesh) | Hub-and-Spoke |
| Transitive Routing | No | Yes |
| Scalability | Complex at scale (N*(N-1)/2) | Highly Scalable (up to 5000 VPCs) |
| Complexity | High (many peerings) | Low (central management) |
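
The N*(N-1)/2 growth in the Scalability row is easy to see numerically. The sketch below compares the number of peering connections needed for a full mesh with the number of Transit Gateway attachments for the same VPC count; both function names are hypothetical.

```python
def peering_connections(vpc_count: int) -> int:
    """Connections needed for a full mesh: every pair of VPCs gets one peering."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    return vpc_count * (vpc_count - 1) // 2


def tgw_attachments(vpc_count: int) -> int:
    """Hub-and-spoke needs one attachment per VPC."""
    if vpc_count < 0:
        raise ValueError("vpc_count must be non-negative")
    return vpc_count


if __name__ == "__main__":
    for n in (2, 10, 50, 100):
        print(f"{n:>3} VPCs: {peering_connections(n):>5} peerings vs {tgw_attachments(n):>3} attachments")
```

For the two VPCs in this lab a single peering would suffice; the hub-and-spoke design pays off as the VPC count grows.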

Clean-Up / Teardown

To avoid ongoing costs, delete resources in this specific order:

  1. Delete EC2 Instances (if any were created for testing).
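  2. Delete VPC Attachments (wait until both reach the deleted state before continuing):
    ```bash
    aws ec2 delete-transit-gateway-vpc-attachment --transit-gateway-attachment-id <ATTACH_ID_A>
    aws ec2 delete-transit-gateway-vpc-attachment --transit-gateway-attachment-id <ATTACH_ID_B>
    ```
  3. Delete Transit Gateway:
    ```bash
    aws ec2 delete-transit-gateway --transit-gateway-id <TGW_ID>
    ```
  4. Delete VPCs (remove the subnets used for the attachments first):
    ```bash
    aws ec2 delete-subnet --subnet-id <SUBNET_A_ID>
    aws ec2 delete-subnet --subnet-id <SUBNET_B_ID>
    aws ec2 delete-vpc --vpc-id <VPC_A_ID>
    aws ec2 delete-vpc --vpc-id <VPC_B_ID>
    ```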

Study Guide (980 words)

Mastering AWS Network Connectivity Strategies (SAP-C02)

Architect network connectivity strategies

Learning Objectives

After studying this guide, you should be able to:

  • Evaluate and select appropriate connectivity options for multiple VPCs (Peering vs. Transit Gateway).
  • Design resilient hybrid architectures using AWS Direct Connect (DX) and Site-to-Site VPN.
  • Calculate IPv4 subnet requirements while accounting for AWS-reserved addresses and future growth.
  • Implement high-availability patterns for DNS resolution and service integration using PrivateLink.
  • Optimize network performance using Equal Cost Multi-Path (ECMP) and Transit Gateway.

Key Terms & Glossary

  • Transit Gateway (TGW): A network transit hub that connects VPCs and on-premises networks through a central managed gateway.
  • Direct Connect (DX): A dedicated, private network connection from a corporate data center to AWS, bypassing the public internet.
  • AWS PrivateLink: Technology that provides private connectivity between VPCs, AWS services, and on-premises applications without exposing traffic to the public internet.
  • Route 53 Resolver: A regional service that enables recursive DNS queries between VPCs and on-premises networks in a hybrid cloud environment.
  • ECMP (Equal Cost Multi-Path): A routing strategy that allows for increased bandwidth by balancing traffic across multiple paths (e.g., multiple VPN tunnels).

The "Big Idea"

In a complex organizational environment, network connectivity is the "nervous system" of the architecture. It is not just about moving bits; it is about creating a future-proof, scalable, and resilient topology that balances performance requirements with cost and operational complexity. Choosing a hub-and-spoke model (Transit Gateway) over a mesh model (VPC Peering) is a "one-way door" decision that dictates how the organization scales for years to come.

Formula / Concept Box

| Concept | Rule / Constraint |
|---|---|
| Subnet Reservations | AWS reserves 5 IP addresses per subnet: the first four and the last (e.g., x.x.x.0, .1, .2, .3, and .255 in a /24). |
| VPN Bandwidth | Each Site-to-Site VPN tunnel is limited to 1.25 Gbps. |
| Scaling VPN | $\text{Total Bandwidth} = 1.25\ \text{Gbps} \times n$ (where $n$ is the number of tunnels using ECMP). |
| Direct Connect Speed | Available in 1 Gbps, 10 Gbps, or 100 Gbps (Hosted: 50 Mbps to 10 Gbps). |

Hierarchical Outline

  • I. Inter-VPC Connectivity
    • VPC Peering: Point-to-point, non-transitive, no bottleneck, lowest cost.
    • Transit Gateway (TGW): Hub-and-spoke, supports transitive routing, simplifies management at scale.
  • II. Hybrid Connectivity
    • Site-to-Site VPN: Fast to deploy, encrypted over public internet, 1.25 Gbps limit per tunnel.
    • Direct Connect (DX): Consistent performance, high bandwidth, private (not encrypted by default).
    • Resiliency Patterns: DX as primary with VPN as cost-effective failover.
  • III. Service Integration & DNS
    • Interface Endpoints (PrivateLink): Private access to AWS services via ENIs in your subnets.
    • Route 53 Resolver: Inbound/Outbound endpoints for hybrid DNS resolution.
  • IV. IP Address Management
    • CIDR Planning: Ensure non-overlapping blocks across the organization.
    • Expansion: Leave room for Elastic Load Balancers (ELB), RDS, and container services.

Visual Anchors

[Diagram: Transit Gateway Hub-and-Spoke Topology]

[Diagram: Hybrid Connectivity Architecture]

Definition-Example Pairs

  • Transitive Routing: The ability for traffic to pass through a middle-hop to reach a destination.
    • Example: If VPC A is connected to a Transit Gateway, and VPC B is also connected, VPC A can reach VPC B through the TGW without a direct peer.
  • Interface VPC Endpoint: A private entry point to an AWS service using an ENI with a private IP address.
    • Example: Allowing an EC2 instance in a private subnet to upload files to an S3 bucket without using an Internet Gateway.
  • Anycast Routing: A network addressing and routing method in which incoming requests can be routed to a variety of different nodes.
    • Example: Route 53 uses Anycast to ensure DNS queries are answered from the closest edge location to the user.

Worked Examples

Example 1: Calculating Usable IPs

Scenario: You create a subnet with a CIDR of 10.0.1.0/28. How many EC2 instances can you launch?

  1. Step 1: Calculate total addresses: $2^{(32-28)} = 2^4 = 16$.
  2. Step 2: Subtract AWS reserved addresses: $16 - 5 = 11$. Answer: 11 usable IP addresses.
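
The same arithmetic can be reproduced with Python's standard `ipaddress` module. This is a small sketch, not an AWS API call: the helper names `usable_ips` and `reserved_addresses` are made up for illustration, and the constant encodes the five-address reservation stated above.

```python
import ipaddress

# AWS reserves the first four addresses and the last address of every subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr):
    """Number of addresses left for resources after AWS reservations."""
    total = ipaddress.ip_network(cidr).num_addresses
    return total - AWS_RESERVED_PER_SUBNET


def reserved_addresses(cidr):
    """The five reserved addresses: the first four and the last one."""
    net = ipaddress.ip_network(cidr)
    return [str(net[i]) for i in (0, 1, 2, 3, -1)]


print(usable_ips("10.0.1.0/28"))          # 11
print(reserved_addresses("10.0.1.0/28"))
# ['10.0.1.0', '10.0.1.1', '10.0.1.2', '10.0.1.3', '10.0.1.15']
```

Note that in a /28 the last reserved address is .15, not .255; the ".255" shorthand only applies to a /24.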

Example 2: High Bandwidth VPN Failover

Scenario: A company needs 4 Gbps of bandwidth for failover from their Direct Connect. A single VPN tunnel only provides 1.25 Gbps. Solution:

  1. Deploy an AWS Transit Gateway.
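  2. Establish 4 Site-to-Site VPN connections to the TGW using dynamic (BGP) routing; at 1.25 Gbps per tunnel, $\lceil 4 / 1.25 \rceil = 4$ tunnels are needed.
  3. Enable ECMP (Equal Cost Multi-Path) on the TGW.
  4. Traffic is balanced across the tunnels. Counting one 1.25 Gbps tunnel per connection, the aggregate is $4 \times 1.25 = 5$ Gbps, which covers the 4 Gbps requirement.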
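
The sizing in this example follows directly from the per-tunnel limit. The sketch below assumes the 1.25 Gbps figure from the concept box and treats the aggregate as a theoretical ceiling; ECMP distributes flows rather than packets, so a single flow is still bounded by one tunnel. The function names are illustrative.

```python
import math

TUNNEL_LIMIT_GBPS = 1.25  # per-tunnel limit quoted in this guide


def tunnels_needed(required_gbps):
    """Smallest number of ECMP tunnels whose combined limit meets the target."""
    return math.ceil(required_gbps / TUNNEL_LIMIT_GBPS)


def aggregate_gbps(tunnel_count):
    """Theoretical aggregate bandwidth across equal-cost tunnels."""
    return tunnel_count * TUNNEL_LIMIT_GBPS


n = tunnels_needed(4)
print(n, aggregate_gbps(n))  # 4 5.0
```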

Checkpoint Questions

  1. What are the 5 specific IP addresses reserved by AWS in every subnet?
  2. Why is Transit Gateway preferred over VPC Peering for large-scale organizations with hundreds of VPCs?
  3. If a workload requires consistent 10 Gbps throughput and low latency, which connectivity option should be selected?
  4. How does AWS PrivateLink improve the security posture of an application?

Muddy Points & Cross-Refs

  • Transitive Routing (VPC Peering): A common mistake is assuming VPC Peering is transitive. If VPC A peers with B, and B peers with C, A cannot talk to C. You must use Transit Gateway for this.
  • DX vs. DX Gateway: Remember that Direct Connect is the physical/logical link, while the DX Gateway is the global resource that allows a single DX to connect to VPCs in any AWS region.
  • Public vs. Private VIFs: A Private Virtual Interface (VIF) is for VPC resources; a Public VIF is for public endpoints like S3 or DynamoDB over Direct Connect.
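
The non-transitive behaviour of VPC Peering can be modelled in a few lines. This is a simplified sketch, not an AWS API: VPCs are plain strings, and the Transit Gateway model assumes every attached VPC has route table entries that permit traffic to every other attachment.

```python
def reachable_via_peering(peerings, src, dst):
    """Peering is non-transitive: only a direct peering link connects two VPCs."""
    direct_links = {frozenset(pair) for pair in peerings}
    return frozenset((src, dst)) in direct_links


def reachable_via_tgw(attachments, src, dst):
    """Assumes TGW route tables allow traffic between all attached VPCs."""
    return src in attachments and dst in attachments


peerings = [("A", "B"), ("B", "C")]
print(reachable_via_peering(peerings, "A", "B"))  # True
print(reachable_via_peering(peerings, "A", "C"))  # False: B does not relay traffic

print(reachable_via_tgw({"A", "B", "C"}, "A", "C"))  # True
```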

Comparison Tables

VPC Peering vs. Transit Gateway

| Feature | VPC Peering | Transit Gateway |
| --- | --- | --- |
| Topology | Mesh (Point-to-Point) | Hub-and-Spoke |
| Management | Difficult at scale | Centralized/Simple |
| Transitive | No | Yes |
| Cost | No hourly charge (Data only) | Hourly charge + Data processing |
| Performance | No aggregate bottleneck | 50 Gbps per VPC attachment |
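
The "difficult at scale" entry for VPC Peering comes from how a full mesh grows. As a rough sketch, connecting every VPC to every other VPC needs $n(n-1)/2$ peering connections, while a hub-and-spoke design needs one attachment per VPC. The function names below are illustrative.

```python
def full_mesh_peerings(vpc_count):
    """Peering connections needed so every VPC reaches every other VPC directly."""
    return vpc_count * (vpc_count - 1) // 2


def tgw_attachments(vpc_count):
    """Hub-and-spoke: one Transit Gateway attachment per VPC."""
    return vpc_count


for n in (10, 100):
    print(n, full_mesh_peerings(n), tgw_attachments(n))
# 10 45 10
# 100 4950 100
```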

Security Groups vs. Network ACLs

| Feature | Security Groups | Network ACLs |
| --- | --- | --- |
| Level | Instance (ENI) | Subnet |
| Statefulness | Stateful (Return traffic allowed) | Stateless (Must allow both ways) |
| Rules | Allow rules only | Allow and Deny rules |
| Processing | All rules evaluated | Rules processed in order |
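
The "Processing" row is easiest to see with a toy model. The sketch below is a deliberate simplification and not how AWS represents rules: it matches on a single port only (real rules also match protocol, port ranges, and CIDR), and `NaclRule`, `nacl_decision`, and `security_group_allows` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NaclRule:
    number: int   # lower numbers are evaluated first
    port: int
    action: str   # "allow" or "deny"


def nacl_decision(rules, port):
    """Network ACL: evaluate rules in ascending order; the first match wins."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port:
            return rule.action
    return "deny"  # nothing matched: the implicit final rule denies


def security_group_allows(allowed_ports, port):
    """Security group: allow rules only; any matching rule permits the traffic."""
    return port in allowed_ports


rules = [NaclRule(200, 22, "allow"), NaclRule(100, 22, "deny")]
print(nacl_decision(rules, 22))              # deny: rule 100 matches before rule 200
print(security_group_allows({22, 443}, 22))  # True
```

Because the ACL stops at the first match, rule numbering determines the outcome, whereas the order of security group rules is irrelevant.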

More Study Notes (190)

AWS Rightsizing Strategy & Performance Optimization Guide

Assessing solutions and applying rightsizing based on requirements

945 words

AWS Asset Planning & Workload Migration Study Guide

Asset planning

880 words

Mastering the Principle of Least Privilege: Auditing and Implementation Guide

Auditing an environment for least privilege access

948 words

Automated Monitoring and Remediation Strategies in AWS

Automated monitoring and remediation strategies (for example, AWS Config rules)

985 words

AWS Auto Scaling Policies and Events: Master Study Guide

Auto scaling policies and events

945 words

Mastering AWS Cost Management & Monitoring Tools

AWS cost and usage monitoring tools (for example, AWS Cost Explorer, AWS Trusted Advisor, AWS Pricing Calculator)

890 words

Mastering AWS Cost Management: Monitoring, Analysis, and Optimization Tools

AWS cost and usage monitoring tools (for example, AWS Trusted Advisor, AWS Pricing Calculator, AWS Cost Explorer, AWS Budgets)

985 words

AWS Global Infrastructure: A Foundation for Resilient Architectures

AWS Global Infrastructure

920 words

AWS Global Infrastructure: Architecture for High Availability and Resilience

AWS Global Infrastructure

945 words

AWS Global Infrastructure: Design for Reliability and Performance

AWS Global Infrastructure

945 words

Mastering AWS Global Infrastructure for Resilience and Performance

AWS Global Infrastructure

985 words

AWS Identity and Access Management (IAM) & Identity Center Study Guide

AWS Identity and Access Management (IAM) and AWS IAM Identity Center

1,050 words

AWS Managed Security Services: Shield, WAF, GuardDuty, and Security Hub

AWS managed security services (for example, AWS Shield, AWS WAF, Amazon GuardDuty, AWS Security Hub)

925 words

AWS Managed Service Offerings: Modernization & Efficiency

AWS managed service offerings

1,045 words

AWS Global Networking & Route 53: SAP-C02 Study Guide

AWS networking concepts (for example, Amazon Route 53, routing methods)

985 words

Advanced AWS Networking and Hybrid Connectivity: SAP-C02 Study Guide

AWS networking concepts (for example, Amazon Virtual Private Cloud [Amazon VPC], AWS Direct Connect, AWS VPN, transitive routing, AWS container services)

1,142 words

AWS Networking & DNS: Architecting for Organizational Complexity

AWS networking services and DNS (for example, AWS Direct Connect, AWS Site-to-Site VPN, Amazon Route 53)

925 words

Study Guide: AWS Organizations and AWS Control Tower

AWS Organizations and AWS Control Tower

1,150 words

AWS Purchasing Options: Cost Optimization Strategy Guide

AWS purchasing options (for example, Reserved Instances, Savings Plans, Spot Instances)

945 words

AWS Resource Sharing Across Environments: Study Guide

AWS resource sharing across environments

945 words

AWS Rightsizing Visibility: AWS Compute Optimizer and S3 Storage Lens

AWS rightsizing visibility tools (for example, AWS Compute Optimizer, Amazon Simple Storage Service [Amazon S3] Storage Lens)

1,350 words

AWS Security, Identity, and Compliance: Tools & Governance

AWS security, identity, and compliance tools (for example, AWS CloudTrail, AWS Identity and Access Management Access Analyzer, AWS Security Hub, Amazon Inspector)

1,050 words

Mastering AWS Service Endpoints: A Comprehensive Study Guide

AWS service endpoints

920 words

AWS Storage Services and Replication Strategies: SAP-C02 Study Guide

AWS storage services and replication strategies (for example Amazon S3, Amazon RDS, Amazon ElastiCache)

1,142 words

AWS Storage Services & Hybrid Integration Study Guide

AWS storage services (for example, Amazon EBS, Amazon EFS, Amazon FSx, Amazon S3, AWS Storage Gateway Volume Gateway)

1,150 words

AWS Storage Services Strategy: S3, EFS, EBS, and FSx

AWS storage services (for example, Amazon S3, Amazon EFS)

948 words

AWS Backup Practices & Methods: Comprehensive Study Guide

Backup practices and methods

945 words

Mastering AWS Change Management Processes

Change management processes

920 words

CI/CD Pipelines and Advanced Deployment Strategies

CI/CD pipelines and deployment strategies (for example, blue/green, all-at-once, rolling)

820 words

Mastering Application Migration Assessment: AWS SAP-C02 Study Guide

Completing an application migration assessment

945 words

AWS Compute Services: EC2, Elastic Beanstalk, and Beyond

Compute services (for example, Amazon EC2, AWS Elastic Beanstalk)

1,050 words

AWS Configuration Management & Systems Administration

Configuration management tools (for example, AWS Systems Manager)

1,050 words

AWS Configuration Management: Systems Manager, Config, and OpsWorks

Configuration management tools (for example, AWS Systems Manager)

1,150 words

AWS Database Replication: Mastery Guide for DMS and SCT

Configuring data and database replication

1,145 words

Comprehensive Study Guide: Configuring Disaster Recovery Solutions on AWS

Configuring disaster recovery solutions

1,055 words

AWS Container Services: Comprehensive Study Guide (SAP-C02)

Containers (for example, Amazon ECS, Amazon EKS, AWS Fargate, Amazon ECR)

925 words

AWS Container Services: ECS, EKS, and Fargate Study Guide

Containers (for example, Amazon ECS, Amazon EKS, Fargate)

895 words

CI/CD: Strategies for High-Velocity Software Delivery

Continuous integration and continuous delivery (CI/CD)

945 words

AWS Cost-Conscious Architecture Study Guide

Cost-conscious architecture choices (for example, using Spot Instances, scaling policies, and rightsizing resources)

985 words

Mastering AWS Cost Management: Alerting and Reporting

Cost management, alerting, and reporting

1,055 words

Credential Management Services: Secure Strategies & Implementation

Credential management services

945 words

Data Backup and Restoration: AWS Business Continuity Study Guide

Data backup and restoration

1,152 words

Mastering AWS Database Migration: AWS DMS and AWS SCT

Database migration tools (for example, AWS DMS, AWS SCT)

925 words

Mastering AWS Database Architectures: SAP-C02 Study Guide

Databases (for example, Amazon DynamoDB, Amazon OpenSearch Service, Amazon RDS, self-managed databases on Amazon EC2)

1,120 words

AWS Data Migration: Online and Offline Strategies

Data migration options and tools (for example, AWS DataSync, AWS Transfer Family, AWS Snow Family, Amazon S3 Transfer Acceleration)

920 words

AWS Data Replication Methods & Disaster Recovery Strategy

Data replication methods

945 words

Mastering Data Protection: Classification, Retention, and Compliance

Data retention, data sensitivity, and data regulatory requirements

875 words

Mastering AWS Data Transfer Costs: Architect's Study Guide

Data transfer costs

948 words

AWS Encryption Strategies: Protecting Data at Rest and in Transit

Deploying encryption strategies for data at rest and data in transit

1,145 words

AWS Lab: Implementing Blue/Green Deployments with CloudFormation and Route 53

Design a deployment strategy to meet business requirements

895 words

Mastering AWS Deployment Strategies: SAP-C02 Study Guide

Design a deployment strategy to meet business requirements

925 words

Lab: Architecting a Secure Multi-Account Environment with AWS Organizations

Design a multi-account AWS environment

890 words

Mastering Multi-Account AWS Architecture: SAP-C02 Study Guide

Design a multi-account AWS environment

945 words

AWS SAP-C02: Designing for Business Continuity

Design a solution to ensure business continuity

920 words

Lab: Implementing a Pilot Light Disaster Recovery Strategy on AWS

Design a solution to ensure business continuity

945 words

Lab: Designing for Performance with Auto Scaling and ElastiCache

Design a solution to meet performance objectives

820 words

Mastering Performance: Designing High-Efficiency AWS Architectures

Design a solution to meet performance objectives

1,150 words

AWS Certified Solutions Architect - Professional: Designing for Reliability

Design a strategy to meet reliability requirements

920 words

Lab: Designing and Testing a Reliable Multi-AZ Web Architecture

Design a strategy to meet reliability requirements

1,054 words

Resilience and Availability: Designing for Disruption in AWS

Designing an architecture that provides application and infrastructure availability in the event of a disruption

1,150 words

Comprehensive Guide to Designing and Implementing a Backup Process

Designing and implementing a backup process

1,150 words

Study Guide: Designing and Implementing a Patch and Update Process

Designing and implementing a patch and update process

865 words

Designing an Effective Backup and Restoration Strategy

Designing an effective backup and restoration strategy

895 words

Designing Elastic & Performance-Optimized Architectures for Business Objectives

Designing an elastic architecture based on business objectives

925 words

Designing a Rightsizing Strategy: AWS Cost Optimization Study Guide

Designing a rightsizing strategy

860 words

AWS Study Guide: Designing Billing Alarms and Usage Monitoring

Designing billing alarms based on expected usage patterns

825 words

Mastering Disaster Recovery: RTO and RPO Strategy Guide

Designing disaster recovery solutions based on RTO and RPO requirements

920 words

Designing Highly Available Application Environments

Designing highly available application environments based on business requirements

1,050 words

Mastering Large-Scale Application Architectures: Performance and Scalability (SAP-C02)

Designing large-scale application architectures for a variety of access patterns

1,240 words

Design Reliable and Resilient Architectures (SAP-C02)

Design reliable and resilient architectures

850 words

Lab: Building a Resilient Multi-AZ Architecture on AWS

Design reliable and resilient architectures

925 words

Lab: Implementing AWS Cost Optimization and Governance

Determine a cost optimization strategy to meet solution goals and objectives

950 words

Mastering Cost Optimization: AWS Solutions Architect Professional (SAP-C02)

Determine a cost optimization strategy to meet solution goals and objectives

1,142 words

Architectural Design for Existing Workloads (SAP-C02)

Determine a new architecture for existing workloads

945 words

Lab: Re-architecting Legacy Workloads to AWS Managed Services

Determine a new architecture for existing workloads

850 words

Lab: Building Self-Healing Infrastructure for Operational Excellence

Determine a strategy to improve overall operational excellence

920 words

Mastering Operational Excellence: AWS SAP-C02 Study Guide

Determine a strategy to improve overall operational excellence

1,050 words

Lab: Implementing a High-Performance Auto-Scaling Architecture on AWS

Determine a strategy to improve performance

950 words

Optimizing Performance for Existing Solutions (SAP-C02)

Determine a strategy to improve performance

1,150 words

AWS Lab: Implementing Reliable Architectures with Auto Scaling and Load Balancing

Determine a strategy to improve reliability

925 words

Continuous Improvement: Strategies for Improving Reliability

Determine a strategy to improve reliability

850 words

Continuous Security Improvement: Strategies & Automation (SAP-C02)

Determine a strategy to improve security

1,085 words

Lab: Implementing Automated Security Remediation and Secrets Management

Determine a strategy to improve security

925 words

Lab: Implementing AWS Cost Visibility and Governance

Determine cost optimization and visibility strategies

865 words

Mastering AWS Cost Optimization and Visibility (SAP-C02)

Determine cost optimization and visibility strategies

925 words

AWS Modernization and Enhancements: Decoupling and Microservices

Determine opportunities for modernization and enhancements

920 words

Lab: Modernizing Monolithic Workloads using Serverless Decoupling

Determine opportunities for modernization and enhancements

920 words

AWS Certified Solutions Architect - Professional: Determining Security Controls

Determine security controls based on requirements

1,100 words

Lab: Implementing Least Privilege and Private Connectivity on AWS

Determine security controls based on requirements

1,150 words

AWS Migration Strategy Guide: Determining the Optimal Migration Approach

Determine the optimal migration approach for existing workloads

870 words

Selecting the Optimal Migration Path: AWS Migration Hub & Assessment Lab

Determine the optimal migration approach for existing workloads

1,342 words

Modernization and Upgrade Paths for AWS Workloads

Determining an application or upgrade path for new services and features

920 words

AWS Certified Solutions Architect Professional: Logging and Monitoring Strategy

Determining the most appropriate logging and monitoring strategy

925 words

Mastering Multi-Account Governance on AWS

Developing a multi-account governance model

985 words

AWS Tagging Strategy: Mapping Costs to Business Units

Developing an effective tagging strategy that maps costs to business units

820 words

Methodology for Selecting Purpose-Built AWS Services: A Strategic Study Guide

Developing a process methodology for selecting purpose-built services for required tasks

1,085 words

AWS Expenditure & Usage Awareness Strategy

Developing a strategy and implementing controls for expenditure and usage awareness

945 words

Strategic Centralization: Security Event Notifications and Auditing in AWS

Developing a strategy for centralized security event notifications and auditing

940 words

Comprehensive Attack Mitigation Strategies for Large-Scale Web Applications

Developing attack mitigation strategies for large-scale web applications

860 words

AWS Encryption Strategies: Protecting Data at Rest and in Transit

Developing encryption strategies for data at rest and data in transit

1,342 words

AWS Patch Management & Compliance Strategies

Developing strategies for patch management to remain compliant with organizational standards

985 words

Scalability Strategies: Mastering Scale-Up vs. Scale-Out for Optimal AWS Architecture

Developing the optimal architecture by considering scale-up and scale-out options

1,085 words

Mastering Disaster Recovery on AWS: Methods, Tools, and Strategies

Disaster recovery methods and tools

1,450 words

Mastering Disaster Recovery Planning: AWS SAP-C02 Study Guide

Disaster recovery planning

912 words

AWS Disaster Recovery Strategies: A Comprehensive Study Guide

Disaster recovery scenarios (for example, backup and restore, pilot light, warm standby, multi-site)

1,054 words

AWS Disaster Recovery and Business Continuity

Disaster recovery solutions on AWS

925 words

AWS Disaster Recovery: Architecting for Business Continuity

Disaster recovery strategies (for example, using AWS Elastic Disaster Recovery, pilot light, warm standby, and multi-site)

920 words

AWS Remediation Techniques and Automated Response Strategies

Employing remediation techniques

950 words

Architectural Resilience: Data Replication, Self-Healing, and Elasticity

Enabling data replication, self-healing, and elastic features and services

980 words

Mastering AWS Encryption and Certificate Management (SAP-C02)

Encryption keys and certificate management (for example, AWS Key Management Service [AWS KMS], AWS Certificate Manager [ACM])

980 words

Mastering AWS Data Encryption: At Rest and In Transit

Encryption options for data at rest and data in transit

860 words

Engineering Failure Scenarios and Recovery Exercises

Engineering failure scenario activities to support and exercise an understanding of recovery actions

925 words

AWS Migration Strategies: The 7Rs Master Study Guide

Evaluating applications according to the seven common migration strategies (7Rs)

1,150 words

Evaluating Strategies for Secure Secrets and Credentials Management

Evaluating a strategy for the secure management of secrets and credentials

890 words

Study Guide: Evaluating Connectivity Options for Multiple VPCs

Evaluating connectivity options for multiple VPCs

865 words

Hybrid Connectivity Strategies: On-Premises to AWS Integration

Evaluating connectivity options for on-premises, co-location, and cloud integration

1,185 words

Evaluating Cross-Account Access Management: SAP-C02 Study Guide

Evaluating cross-account access management

920 words

Optimizing Deployment Processes for Operational Excellence

Evaluating current deployment processes for improvement opportunities

945 words

Architectural Reliability Evaluation and Improvement

Evaluating existing architecture to determine areas that are not sufficiently reliable

1,056 words

AWS Multi-Account Governance: Evaluating and Implementing Organizational Structures

Evaluating the most appropriate account structure for organizational requirements

1,054 words

Evaluating Total Cost of Ownership (TCO) and Cost Optimization

Evaluating total cost of ownership (TCO)

1,250 words

AWS Global Services Study Guide: CloudFront, Global Accelerator, & Edge Computing

Global service offerings (for example, AWS Global Accelerator, Amazon CloudFront, edge computing services)

945 words

Governance at Scale: AWS Organizations and Control Tower

Governance tools (for example, AWS Control Tower, AWS Organizations)

890 words

Mastering High Availability and Resiliency on AWS

High availability and resiliency

945 words

High-Performing Systems Architectures: Elasticity, Fleets, and Placement Groups

High-performing systems architectures (for example, auto scaling, instance fleets, placement groups)

925 words

Architecting Hybrid DNS: Route 53 Resolver and On-Premises Integration

Hybrid DNS concepts (for example, Amazon Route 53 Resolver, on-premises DNS integration)

1,152 words

Deep Dive: AWS Identity and Access Management (IAM)

IAM

945 words

AWS SAP-C02 Study Guide: Identifying and Examining Performance Bottlenecks

Identifying and examining performance bottlenecks

945 words

Mastering AWS Pricing Models: A Comprehensive SAP-C02 Study Guide

Identifying appropriate pricing models

1,250 words

Identifying Opportunities for Purpose-Built Databases: A Modernization Guide

Identifying opportunities for purpose-built databases

1,084 words

Identifying Opportunities for Serverless Solutions: Study Guide

Identifying opportunities for serverless solutions

875 words

Identifying Opportunities to Decouple Application Components

Identifying opportunities to decouple application components

892 words

Optimizing Infrastructure: Selection & Rightsizing for Cost-Efficiency

Identifying opportunities to select and rightsize infrastructure for cost-effective resources

1,084 words

AWS Lab: Identifying and Implementing Cost Optimization Opportunities

Identify opportunities for cost optimizations

890 words

Mastering Cost Optimization: Strategies for the AWS Solutions Architect Professional

Identify opportunities for cost optimizations

940 words

Mastering AWS Identity Services: IAM Identity Center & Directory Service

Identity services (for example, AWS IAM Identity Center, AWS Directory Service)

985 words

Architectural Resiliency: Automatically Recovering from Failure

Implementing architectures to automatically recover from failure

1,085 words

Amazon Route 53 Routing Policies: A Solutions Architect's Guide

Implementing DNS routing policies (for example, Route 53 latency-based routing, geolocation routing, simple routing)

1,150 words

Mastering Loosely Coupled Dependencies for AWS Architecting

Implementing loosely coupled dependencies

925 words

Study Guide: Infrastructure as Code (IaC) and AWS CloudFormation

Infrastructure as code (IaC) (for example, AWS CloudFormation)

890 words

Mastering AWS EC2: Instance Families, Sizing, and Optimization

Instance families and use cases

965 words

Mastering Identity Federation: Integrating Third-Party IdPs with AWS

Integrating with third-party identity providers

940 words

AWS Application Integration Services: Decoupling and Orchestration

Integration services (for example, Amazon SQS, Amazon SNS, Amazon EventBridge, AWS Step Functions)

1,150 words

Mastering AWS Cost and Usage Reports (CUR) for Granular Analysis

Investigating AWS Cost and Usage Reports at a granular level

920 words

Modernizing with AWS: Delegating Development and Deployment Tasks

Making advanced technologies accessible by delegating complex development and deployment tasks to AWS

1,050 words

AWS Migration Assessment and Tracking: Mastering AWS Migration Hub

Migration assessment and tracking tools (for example, AWS Migration Hub)

925 words

AWS Monitoring and Logging Solutions: Comprehensive Study Guide

Monitoring and logging solutions (for example, Amazon CloudWatch)

925 words

Mastering AWS Cost and Usage Monitoring

Monitoring cost and usage with AWS tools

925 words

Mastering AWS Monitoring: CloudWatch and Beyond (SAP-C02 Study Guide)

Monitoring tool sets and services (for example, CloudWatch)

945 words

AWS Multi-Account Event Notifications: Architecting Centralized Observability

Multi-account event notifications

945 words

Comprehensive Study Guide: Multi-AZ and Multi-Region Architectures

Multi-AZ and multi-Region architectures

985 words

AWS Networking & Data Transfer Cost Optimization Study Guide

Networking and data transfer costs

925 words

AWS Network Segmentation and Connectivity: Architect's Study Guide

Network segmentation (for example, subnetting, IP addressing, connectivity among VPCs)

1,084 words

Mastering Network Traffic Monitoring on AWS

Network traffic monitoring

940 words

Operating and Maintaining High-Availability Architectures

Operating and maintaining high-availability architectures (for example, application failovers, database failovers)

1,050 words

Mastering Patching Practices in AWS: Strategies for Mutable and Immutable Infrastructure

Patching practices

925 words

AWS Performance Monitoring & Objectives Study Guide

Performance monitoring technologies

890 words

AWS Data Transfer Modeling and Cost Optimization

Performing data transfer modeling and selecting services to reduce data transfer costs

1,056 words

Mastering Disaster Recovery Testing: Strategy and Execution

Performing disaster recovery testing

945 words

AWS Portfolio Assessment & Migration Strategy

Portfolio assessment

820 words

Lab: Automated Remediation of Security Controls with AWS Config

Prescribe security controls

920 words

Mastering Security Controls: AWS SAP-C02 Study Guide

Prescribe security controls

1,250 words

AWS Price Model Adoptions: Reserved Instances & Savings Plans

Price model adoptions (for example, Reserved Instances, AWS Savings Plans)

925 words

Comprehensive Study Guide: AWS Pricing Models and Cost Optimization

Pricing models (for example, Reserved Instances, AWS Savings Plans)

920 words

Study Guide: Principle of Least Privilege (PoLP) in AWS

Principle of least privilege access

1,084 words

Prioritization and Migration: Wave Planning and Portfolio Assessment

Prioritization and migration of workloads (for example, wave planning)

920 words

Mastering Automated Vulnerability Response in AWS

Prioritizing automated responses to the detection of vulnerabilities

915 words

Prioritizing Automation in the AWS Solution Stack

Prioritizing opportunities for automation within a solution stack

1,050 words

Modernizing AWS Architectures: Adopting New Technologies and Managed Services

Proposing opportunities for the adoption of new technologies and managed services

985 words

Comprehensive Study Guide: Purpose-Built AWS Databases

Purpose-built databases

875 words

AWS Purpose-Built Databases: Architectural Selection and Modernization

Purpose-built databases (for example, DynamoDB, Amazon Aurora Serverless, Amazon ElastiCache)

948 words

AWS Strategy: Central Logging and Event Notifications

Recommending a strategy for central logging and event notifications

945 words

AWS Configuration Management & Automation Study Guide

Recommending the appropriate AWS solution to enable configuration management automation

985 words

Mastering Disaster Recovery Metrics: RTO and RPO

Recovery time objectives (RTOs) and recovery point objectives (RPOs)

920 words

Remediating Single Points of Failure: Architectural Strategies

Remediating single points of failure

985 words

Comprehensive Traceability of Users and Services

Reviewing comprehensive traceability of users and services

1,285 words

Reviewing Multi-Layered Security Solutions in AWS

Reviewing implemented solutions to ensure security at every layer

940 words

Mastering AWS Network Security: Route Tables, Security Groups, and NACLs

Route tables, security groups, and network ACLs

875 words

Mastering AWS Network Security: Route Tables, Security Groups, and NACLs

Route tables, security groups, and network ACLs

1,150 words

Mastering Disaster Recovery: Understanding RTO and RPO

RTOs and RPOs

945 words

AWS Scaling Methodologies: Load Balancing & Auto Scaling

Scaling methodologies (for example, load balancing, auto scaling)

985 words

AWS Secrets Management: Systems Manager & Secrets Manager

Secrets management (for example, Systems Manager, AWS Secrets Manager)

920 words

Mastering Security-Specific AWS Solutions: A Professional Study Guide

Security-specific AWS solutions

890 words

Lab: Assessing and Prioritizing Workloads with AWS Migration Hub

Select existing workloads and processes for potential migration

845 words

SAP-C02 Study Guide: Selecting Workloads for Migration

Select existing workloads and processes for potential migration

1,050 words

AWS Infrastructure Design: Region and AZ Selection for Performance

Selecting AWS Regions and Availability Zones based on network and latency requirements

1,180 words

AWS Advanced Deployment Strategies and Rollback Mechanisms

Selecting services to develop deployment strategies and implement appropriate rollback mechanisms

890 words

Selecting the Appropriate AWS Application Integration Service

Selecting the appropriate application integration service

1,180 words

Study Guide: Selecting Appropriate Application Transfer Mechanisms

Selecting the appropriate application transfer mechanism

980 words

AWS Compute Selection: Migration & Modernization Guide

Selecting the appropriate compute platform

940 words

Showing 200 of 230 study notes.

AWS Certified Solutions Architect - Professional (SAP-C02) Practice Questions

Try 15 sample questions from a bank of 1,035. Answers and detailed explanations included.

Q1 (easy)

Which type of disaster recovery (DR) test involves creating an isolated environment to restore backups and verify that the organization can meet its Recovery Time Objective (RTO) without impacting production systems?

A. Full-interruption test
B. Simulation test
C. Tabletop exercise
D. Walkthrough test

Correct Answer: B

A simulation test (or sandbox test) involves 'spinning off' a separate environment to restore backups and assess RTO and RPO capabilities. This allows the organization to validate its recovery strategy and data integrity in a realistic scenario without the risks associated with a full-interruption test. Answer: B

Q2 (easy)

A company is defining its disaster recovery strategy and needs to establish the maximum acceptable amount of time that an application can be offline following a service disruption before it must be restored. Which business metric is being identified in this requirement?

A. RPO (Recovery Point Objective)
B. RTO (Recovery Time Objective)
C. SLA (Service Level Agreement)
D. MTBF (Mean Time Between Failures)

Correct Answer: B

Recovery Time Objective (RTO) is a business metric that defines the maximum acceptable duration of downtime following a disaster before the service must be back online. In contrast, Recovery Point Objective (RPO) deals with the maximum acceptable amount of data loss measured in time (i.e., the time since the last backup). Answer: B

Q3 (medium)

An organization establishes a Disaster Recovery (DR) plan for a mission-critical application with a Recovery Time Objective (RTO) of 4 hours and a Recovery Point Objective (RPO) of 1 hour. Which of the following statements correctly explains the implications of these targets?

A. The system must be fully restored and back online within 1 hour, and the organization can tolerate losing up to 4 hours of data.
B. The system must be fully restored and back online within 4 hours, and the organization can tolerate losing up to 1 hour of data.
C. Data must be backed up at least every 4 hours, and the recovery procedure must be completed within 1 hour of starting.
D. The organization has a 4-hour window to detect the disaster, and then 1 hour to restore data to the state it was in at midnight.

Correct Answer: B

Recovery Time Objective (RTO) refers to the maximum acceptable duration of downtime; in this case, the system must be back online within 4 hours. Recovery Point Objective (RPO) refers to the maximum acceptable amount of data loss measured in time; here, the organization can tolerate losing up to 1 hour of data (i.e., data must be restored from a point no more than 1 hour before the disaster). Answer: B

Q4 (easy)

In the context of AWS Security Hub, which metadata attribute is primarily used to categorize the urgency of a finding (ranging from low to critical) to help prioritize automated responses?

A.

Workflow status

B.

Severity level

C.

Record state

D.

Product name

Correct Answer: B

According to the AWS security documentation, each finding in Security Hub includes attributes that document its context. The severity level attribute (which ranges from low to critical) is used to categorize the urgency of the issue, allowing organizations to prioritize and automate responses for the most critical findings first. Answer: B

Q5 (medium)

A financial services company is migrating its monolithic on-premises application to AWS. The application requires a relational database with strict ACID compliance and must maintain high availability (HA) even if an entire Availability Zone (AZ) goes offline. To minimize operational overhead, the company prefers managed services over self-managed solutions. Which architecture should the company implement to meet these requirements?

A.

Deploy an Application Load Balancer (ALB) to distribute traffic across an Auto Scaling group of EC2 instances in multiple AZs, and use Amazon RDS with a Multi-AZ deployment.

B.

Deploy the application on a single EC2 instance in one AZ and use Amazon Aurora with a cross-region Read Replica for manual failover if the primary AZ fails.

C.

Use Amazon DynamoDB with Global Tables to provide multi-region active-active availability for the relational data store.

D.

Set up two EC2 instances in different AZs with a custom script for data synchronization, using a Route 53 Failover routing policy to manage the primary and secondary endpoints.

Correct Answer: A

Amazon RDS Multi-AZ deployment provides high availability and failover support for DB instances. When you provision a Multi-AZ DB instance, Amazon RDS automatically creates a primary DB instance and synchronously replicates the data to a standby instance in a different Availability Zone. In case of an AZ failure, RDS performs an automatic failover to the standby. Pairing this with an Application Load Balancer (ALB) and an Auto Scaling group across multiple AZs ensures that the application tier also remains highly available with minimal manual intervention. Option B lacks HA for the compute tier and relies on manual failover. Option C is incorrect because DynamoDB is a non-relational database. Option D involves high operational overhead and is a self-managed solution rather than a managed service. Answer: A

Q6 (medium)

A solutions architect is tasked with remediating single points of failure (SPOF) for a monolithic web application currently hosted on a single Amazon EC2 instance with an attached Amazon EBS volume. Which of the following architectural changes best explains the strategy required to eliminate SPOFs at both the compute and data layers?

A.

Increase the instance size by $2\times$ (vertical scaling) to provide a higher buffer for resource exhaustion.

B.

Migrate the application to an Auto Scaling group across multiple Availability Zones and enable Multi-AZ deployment for the backend database.

C.

Implement a cold standby instance in a secondary AWS Region and use Amazon Route 53 to manually point DNS records during a failure.

D.

Tighten the coupling between the application and the database to ensure that failures are detected faster by the internal logic.

Correct Answer: B

To eliminate single points of failure, an architecture must implement redundancy at every layer of the system. According to the source material, horizontal scaling (replacing a large resource with multiple smaller ones) is essential for reducing the impact of a single failure. Migrating to an Auto Scaling group across multiple Availability Zones (AZs) ensures compute redundancy, while a Multi-AZ database deployment provides a synchronous standby in a different AZ for automated failover. Vertical scaling (Option A) leaves the single instance in place, and a cold standby with manual DNS changes (Option C) does not provide the automated recovery needed for high reliability. Tight coupling (Option D) is discouraged as it makes the system more fragile, whereas loose coupling increases overall system resilience. Answer: B

Q7 (easy)

According to the AWS Well-Architected Framework's reliability pillar, which approach best identifies the core concept of the 'Automatically recover from failure' design principle?

A.

Relying on manual human intervention to monitor vital signals and take action during failure events.

B.

Monitoring key performance indicators (KPIs) and triggering automated processes when defined thresholds are breached.

C.

Scaling a system vertically by adding more CPU and RAM to a single resource when failures occur.

D.

Testing recovery procedures only in production environments to ensure they can handle real-world traffic.

Correct Answer: B

The 'Automatically recover from failure' principle states that human monitoring is not scalable or sustainable. Instead, systems should be designed to monitor KPIs and use automation to trigger recovery processes, notifications, or repairs when thresholds are exceeded. Answer: B

Q8 (medium)

A security engineer is designing a centralized strategy for real-time security event notifications in a multi-account environment. The requirement is to receive an email alert immediately whenever AWS Security Hub detects a 'High' or 'Critical' severity finding in any member account. Which architecture provides the most efficient and scalable workflow using Amazon EventBridge and Amazon SNS?

A.

Configure each member account to forward Security Hub findings to a central Amazon EventBridge event bus in a dedicated security account. Create an EventBridge rule in the security account to filter for findings with a severity of 'High' or 'Critical' and set an Amazon SNS topic as the target for email notifications.

B.

Set up Amazon CloudWatch Logs subscription filters in each member account to forward Security Hub findings to a central Kinesis Data Firehose, which then triggers an Amazon SNS notification based on the finding severity using a custom Lambda transformation.

C.

Deploy an AWS Lambda function in the central security account that periodically polls the Security Hub API for findings from all member accounts and publishes any results with high severity to a central Amazon SNS topic for distribution.

D.

Configure Security Hub to export all findings to a centralized Amazon S3 bucket in the security account. Use S3 Event Notifications to trigger an Amazon SNS topic whenever a new finding file is uploaded to the bucket.

Correct Answer: A

Amazon EventBridge is the native, event-driven mechanism for routing AWS Security Hub findings. By utilizing a central event bus, findings from all member accounts can be aggregated into a single security account in real-time. This allows for centralized filtering and alerting via Amazon SNS, which provides lower latency and higher scalability than log-based processing (B), scheduled polling (C), or S3-based workflows (D). Answer: A
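
The rule in option A comes down to an event pattern on the central bus. The sketch below builds that pattern as a Python dictionary and checks it against trimmed sample events. The `matches` helper is a hypothetical toy that handles only exact-value matching, not the full EventBridge pattern language, and the sample event is heavily shortened for illustration.

```python
import json

# Event pattern for the rule on the central bus: match imported Security Hub
# findings whose severity label is HIGH or CRITICAL.
HIGH_CRITICAL_PATTERN = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
    "detail": {"findings": {"Severity": {"Label": ["HIGH", "CRITICAL"]}}},
}


def matches(pattern, event):
    """Toy matcher covering only the exact-value subset of EventBridge patterns."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Arrays in the event match if any element matches the sub-pattern.
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(isinstance(c, dict) and matches(expected, c) for c in candidates):
                return False
        else:
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


def sample_event(label):
    # Hypothetical, heavily trimmed finding event for illustration only.
    return {
        "source": "aws.securityhub",
        "detail-type": "Security Hub Findings - Imported",
        "detail": {"findings": [{"Severity": {"Label": label}}]},
    }


print(json.dumps(HIGH_CRITICAL_PATTERN, indent=2))
print(matches(HIGH_CRITICAL_PATTERN, sample_event("CRITICAL")))
print(matches(HIGH_CRITICAL_PATTERN, sample_event("LOW")))
```

In a real deployment the JSON printed by the first `print` would be supplied as the rule's event pattern, with the SNS topic configured as the rule target.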

Q9 (medium)

A solutions architect is evaluating an existing cloud architecture that utilizes a combination of zonal services, such as compute instances, and regional services, such as managed NoSQL databases. When determining the reliability of this system, which statement correctly explains the impact of a single Availability Zone (AZ) outage compared to a regional outage?

A.

A zonal outage impacts only the zonal resources within that specific AZ, while regional services continue to function through other AZs in the region; a regional outage, however, disrupts both service types across all zones in the region.

B.

A zonal outage triggers an automatic failover of all regional services to a backup cloud region, while a regional outage requires the architect to manually enable multi-AZ replication for the database.

C.

A regional outage only affects the control plane of regional services, whereas a zonal outage completely disables the data plane for both zonal and regional services throughout the entire geographic area.

D.

Zonal outages have a wider blast radius than regional outages because they affect the underlying physical infrastructure that regional services rely on for inter-region communication.

Correct Answer: A

According to cloud reliability principles, zonal services are confined to a single Availability Zone (AZ), meaning they share the fate of that specific zone. If an AZ goes offline, the instances within it become unavailable. In contrast, regional services are architected to span multiple AZs automatically in an active-active configuration. This design allows them to withstand a single AZ failure without disruption to the service as a whole. A regional outage is significantly more severe, as it affects all AZs within that region, typically requiring a multi-region architecture for recovery and business continuity. Answer: A

Q10 (easy)

Which version of AWS Shield is automatically available to all AWS customers at no additional cost and provides protection against common network and transport layer DDoS attacks?

A.

AWS Shield Advanced

B.

AWS Shield Basic

C.

AWS Shield Lite

D.

AWS Shield Standard

Correct Answer: D

AWS Shield Standard is the default protection service provided to all AWS customers at no extra charge. It protects against common Layer 3 (network) and Layer 4 (transport) DDoS attacks, such as SYN/UDP floods and reflection attacks. AWS Shield Advanced is a paid subscription service that provides additional protections, including Layer 7 monitoring and access to the AWS Shield Response Team (SRT). Answer: D

Q11 (medium)

A network engineer is configuring a hybrid connection between an on-premises data center and an AWS VPC using an AWS Site-to-Site VPN. Which of the following statements correctly explains how encryption and high availability are managed for this connection?

A.

The connection uses IPsec for Layer 3 encryption and provides two tunnels for high availability, where each tunnel supports a maximum throughput of 1.25 Gbps.

B.

The connection uses MACsec for Layer 2 encryption and provides a single tunnel that automatically scales bandwidth up to 10 Gbps.

C.

The connection uses SSL/TLS for Layer 4 encryption and requires a dedicated physical Direct Connect link to maintain two active-active tunnels.

D.

The connection uses GRE for encapsulation and provides four tunnels by default, which are aggregated into a single logical link for load balancing.

Correct Answer: A

An AWS Site-to-Site VPN connection consists of two tunnels to ensure high availability. Each tunnel has its own public IP address on the AWS side, allowing traffic to fail over if one tunnel becomes unavailable. The service uses the IPsec protocol suite to provide encryption at the network layer (Layer 3). Each individual tunnel is limited to a maximum throughput of 1.25 Gbps. Answer: A

Q12 (hard)

An organization is using AWS IAM Identity Center (successor to AWS SSO) integrated with an external Identity Provider (IdP) via SAML 2.0 and SCIM. They wish to implement Attribute-Based Access Control (ABAC) to allow developers to manage only the EC2 instances tagged with a Project value matching the developer's ProjectCode attribute in the IdP.

The ProjectCode attribute is successfully synchronized to the IAM Identity Center identity store via SCIM. However, when a developer with ProjectCode: Apollo attempts to stop an instance tagged with Project: Apollo, they receive an 'Access Denied' error. Their Permission Set includes the following policy condition:

json
"Condition": { "StringEquals": { "ec2:ResourceTag/Project": "${aws:PrincipalTag/ProjectCode}" } }

Which of the following configuration steps is most likely missing?

A.

The IAM role trust policy for the Identity Center instance must be updated to explicitly allow the sts:TagSession action for the SAML provider.

B.

The 'Attributes for access control' feature in the IAM Identity Center settings has not been configured to map the ProjectCode identity store attribute to a session tag.

C.

The SCIM configuration must be modified to use the https://aws.amazon.com/SAML/Attributes/PrincipalTag:ProjectCode namespace for the attribute to be recognized by IAM policies.

D.

The IAM policy condition must be changed to use aws:PrincipalTag/Project to match the resource tag key, as IAM cannot perform cross-key comparisons between principal and resource tags.

Correct Answer: B

In AWS IAM Identity Center, synchronizing attributes via SCIM populates the identity store, but it does not automatically make those attributes available as session tags for authorization. To use these attributes in IAM policies via the ${aws:PrincipalTag/Key} variable, you must explicitly enable and configure the 'Attributes for access control' feature. This feature allows you to map attributes from your identity source (like those synced via SCIM) to specific session tags that are injected into the user's temporary security token during sign-in. Without this mapping, the aws:PrincipalTag/ProjectCode context key remains null, causing the policy condition to fail. Answer: B
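
The failure mode can be illustrated with a toy model of the condition check. This is a sketch under stated assumptions, not IAM's policy evaluation engine: the function names and the tag dictionaries are hypothetical, and only the single `StringEquals` comparison from the question is modeled.

```python
def resolve_principal_tag(session_tags, key):
    """Return the session tag value, or None when the tag was never attached."""
    return session_tags.get(key)


def string_equals_allows(resource_tags, session_tags):
    """Toy model of the StringEquals condition in the permission set above."""
    principal_value = resolve_principal_tag(session_tags, "ProjectCode")
    if principal_value is None:
        # Missing context key: the condition cannot match, so no Allow applies.
        return False
    return resource_tags.get("Project") == principal_value


instance_tags = {"Project": "Apollo"}

# SCIM synced the attribute, but no attribute mapping exists: no session tag.
print(string_equals_allows(instance_tags, session_tags={}))
# With the attribute mapped, the session carries ProjectCode=Apollo.
print(string_equals_allows(instance_tags, session_tags={"ProjectCode": "Apollo"}))
```

The first call models the situation in the question (attribute present in the identity store, absent from the session), and the second models the state after the mapping is configured.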

Q13 (easy)

Which metric describes the maximum acceptable length of downtime allowed for a workload following a disaster before the service must be restored?

A.

Recovery Point Objective (RPO)

B.

Recovery Time Objective (RTO)

C.

Service Level Objective (SLO)

D.

Mean Time Between Failures (MTBF)

Correct Answer: B

Recovery Time Objective (RTO) is defined as the maximum acceptable amount of time a system can be offline after a disaster or disruption occurs. It focuses on the speed of recovery to ensure business continuity. In contrast, Recovery Point Objective (RPO) focuses on the maximum acceptable amount of data loss measured in time. Answer: B

Q14 (hard)

A multi-national corporation is migrating its legacy web applications from an on-premises data center to AWS. The Chief Technology Officer has mandated the following requirements:

  1. The architecture must support the use of standard open-source Kubernetes operators and custom resource definitions (CRDs) to maintain configuration parity with their remaining on-premises environments.
  2. The solution must minimize operational overhead related to host operating system patching, capacity provisioning, and hardware maintenance.
  3. The application requires a service mesh (such as Istio) for advanced traffic management and observability.

The internal engineering team is proficient in Kubernetes but wants to reduce the burden of infrastructure management. Which of the following container services and launch types most effectively meets these requirements while maximizing administrative efficiency?

A.

Amazon Elastic Container Service (ECS) with the Fargate launch type

B.

Amazon Elastic Kubernetes Service (EKS) with the Fargate launch type

C.

Amazon Elastic Container Service (ECS) with the EC2 launch type

D.

Amazon Elastic Kubernetes Service (EKS) with the EC2 launch type

Correct Answer: B

The requirement to use Kubernetes operators, CRDs, and a Kubernetes-native service mesh like Istio necessitates the use of Amazon EKS. Amazon ECS is an AWS-native orchestrator and does not support the Kubernetes-specific ecosystem or CRDs. To meet the second requirement of minimizing operational overhead (such as OS patching and hardware maintenance), the Fargate launch type is the optimal choice. Unlike the EC2 launch type, which requires the customer to manage and patch the underlying instances, Fargate provides a serverless compute engine for containers where AWS manages the underlying infrastructure. Answer: B

Q15 (medium)

A solutions architect is designing an environment for two specific applications with different resource requirements:

  1. Application Alpha: A distributed NoSQL database that requires high random I/O performance and very low-latency access to local NVMe-based storage.
  2. Application Beta: A high-performance in-memory analytics engine that needs to process 500 GB of data entirely within RAM to maintain sub-millisecond query performance.

Which selection of Amazon EC2 instance families most effectively meets the performance objectives for these applications?

A.

Storage Optimized (D family) for Application Alpha; Memory Optimized (R family) for Application Beta.

B.

Storage Optimized (I family) for Application Alpha; Memory Optimized (R family) for Application Beta.

C.

Memory Optimized (X family) for Application Alpha; Storage Optimized (I family) for Application Beta.

D.

Compute Optimized (C family) for Application Alpha; General Purpose (M family) for Application Beta.

Correct Answer: B

Storage Optimized I family instances (such as I3 or I4i) are specifically designed for workloads that require high random I/O performance and low latency, utilizing local NVMe SSDs, which is a primary requirement for NoSQL databases. While the D family is also storage-optimized, it is better suited for high storage density and sequential throughput (HDD-based). Memory Optimized R family instances (such as R5 or R6g) are designed for memory-intensive workloads, providing the high RAM-to-vCPU ratio needed for in-memory databases and real-time analytics. Answer: B

These are 15 of 1,035 questions available.

AWS Certified Solutions Architect - Professional (SAP-C02) Flashcards

824 flashcards for spaced-repetition study. Showing 30 sample cards below.

Adopting Managed Services for Reduced Overhead (4 cards shown)

Question

Immutable Infrastructure

Answer

An infrastructure management strategy where servers or resources are never modified after deployment. When a change (like a patch) is needed, the old resources are destroyed and replaced by new ones built from a common image (like an AMI).

[!TIP] Think of this as "Cattle, not Pets." Instead of healing a sick server, you replace it with a healthy one.

Question

What is the primary operational trade-off when migrating from self-managed EC2 instances to high-level managed services like AWS Lambda?

Answer

The primary trade-off is reduced operational overhead (patching, scaling, provisioning) in exchange for increased initial refactoring or rearchitecting effort.

| Aspect | Self-Managed (EC2) | Managed (Lambda/Fargate) |
| --- | --- | --- |
| OS Patching | Customer Responsibility | AWS Responsibility |
| Infrastructure Code | Manage OS, scaling, networking | Focus on application logic |
| Complexity | Higher operational complexity | Higher architectural complexity |

[!NOTE] Managed services allow you to delegate "undifferentiated heavy lifting" to AWS.

Question

In the cloud, customers can achieve better long-term cost savings and reduced drift by moving applications to ___ services, though this typically requires more ___ of the application compared to a lift-and-shift approach.

Answer

  1. higher-level managed (or serverless)
  2. refactoring (or rearchitecting)

By moving to services like AWS Fargate or Lambda, you limit infrastructure configuration drift because you no longer manage the long-lived underlying virtual machines.

Question

Explain how Adopting Managed Services impacts the Shared Responsibility Model regarding patching.

Answer

As you move from IaaS (Infrastructure as a Service) to PaaS/Serverless (Platform/Function as a Service), the line of responsibility moves upward.

Key Benefit: By using managed services (e.g., Amazon RDS, AWS Fargate), the customer is no longer responsible for the underlying OS, which eliminates the need for manual patching schedules and reduces the risk of security vulnerabilities due to unpatched systems.

Alerting and Automatic Remediation Strategies (4 cards shown)

Question

AWS Config

Answer

A managed service that acts as a Configuration Management Database (CMDB) by recording and tracking AWS resource configurations.

It evaluates whether resource settings align with desired configurations through Config Rules.

[!NOTE] It is the primary tool for detecting configuration drift and compliance violations in near real time.

Question

When an AWS Config rule violation is detected, the service can trigger a remediation action using ___ ___ ___ Automation runbooks.

Answer

AWS Systems Manager (SSM)

SSM Automation runbooks define the specific steps (scripts or API calls) required to resolve a non-compliant state.

Example: Using the predefined runbook AWS-DisableS3BucketPublicReadWrite to automatically block public access when a bucket is incorrectly configured.

[!TIP] Remediation can be set to occur automatically upon detection or manually after review.

Question

How does AWS Security Hub facilitate automated remediation for security findings?

Answer

Security Hub aggregates findings and routes them to Amazon EventBridge.

The workflow involves:

  1. Finding Generation: Security Hub identifies a vulnerability (e.g., PCI-DSS violation).
  2. Event Routing: The finding is sent to EventBridge as an event.
  3. Target Trigger: EventBridge rules match the finding and trigger a target, such as an AWS Lambda function or an SSM Automation document to execute the fix.

| Component | Role |
| --- | --- |
| Security Hub | Detection & Aggregation |
| EventBridge | Routing & Filtering |
| Lambda/SSM | Execution of Remediation |

Question

Automated Security Response on AWS (Playbooks)

Answer

A library of pre-built remediations for common security standards (CIS, PCI-DSS, etc.) supported by AWS Security Hub.

Key Characteristics:

  • Contextual Awareness: Remediation can be adapted based on risk (e.g., notify first for dev, block immediately for prod).
  • Scalability: Critical for organizations with hundreds of AWS accounts where manual intervention is impossible.

[!WARNING] Always ensure your remediation logic accounts for business continuity (e.g., don't block a production bucket without proper alerting/exception handling).

Amazon Route 53 Routing Policies (4 cards shown)

Question

Latency-Based Routing (LBR)

Answer

A routing policy used when you have resources in multiple AWS Regions and want to route traffic to the region that provides the lowest network latency for the end-user.

[!NOTE] Route 53 measures latency (Round Trip Time) over time and maintains a database to determine the best region.

Question

How does Geolocation Routing differ from Geoproximity Routing?

Answer

While both use geographic data, they serve different primary purposes:

| Feature | Geolocation Routing | Geoproximity Routing |
| --- | --- | --- |
| Basis | Based on the user's physical location (continent, country, or state). | Based on the geographic distance between user and resource. |
| Control | Allows for localized content or restricted distribution. | Uses a Bias value to expand or shrink the size of a geographic region. |
| Use Case | Compliance, language-specific sites. | Complex traffic shifting across global regions. |

[!TIP] Remember: Geolocation = Where the user is. Geoproximity = Where the resource is + a 'bias' adjustment.

Question

Failover Routing Policy (Active-Passive)

Answer

This policy is used to configure active-passive failover, where one resource (Primary) handles traffic as long as it is healthy, and Route 53 switches to a backup resource (Secondary) if the primary fails.

Components required:

  1. Primary Record: The main resource (e.g., an ALB in us-east-1).
  2. Secondary Record: The disaster recovery resource (e.g., a static S3 site).
  3. Health Check: Monitored by Route 53 to trigger the switch.

[!WARNING] If you don't associate a health check with the primary record, Route 53 will continue to route traffic to it even if it is down.

Question

In Route 53, the ___ routing policy allows you to return up to eight healthy records of the same type (such as A records) in response to a DNS query, providing a basic form of load balancing and high availability.

Answer

Multivalue Answer Routing

Unlike Simple Routing, which returns all values regardless of health, Multivalue Answer only returns values for healthy resources.

Key Characteristics:

  • Returns up to 8 records.
  • Not a substitute for an ELB, but provides DNS-level availability.
  • Uses Route 53 health checks to decide which records to return; a record with no health check attached is always treated as healthy.
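
The selection behavior described above can be sketched as a small filter. This is a simplified local model, not Route 53's implementation: the function name, the `(ip, healthy)` tuple layout, and the documentation-range IP addresses are all assumptions made for illustration.

```python
import random

MAX_RECORDS = 8  # multivalue answer routing returns at most eight records


def multivalue_answer(records, rng=random):
    """records: list of (ip, healthy) tuples; returns up to 8 healthy IPs."""
    healthy = [ip for ip, is_healthy in records if is_healthy]
    # Pick a random subset when more than eight healthy records exist.
    return rng.sample(healthy, min(MAX_RECORDS, len(healthy)))


# Hypothetical record set: ten A records, two of them failing health checks.
records = [(f"192.0.2.{i}", i not in (3, 7)) for i in range(1, 11)]
answer = multivalue_answer(records)
print(len(answer), "192.0.2.3" in answer)
```

With ten records and two unhealthy ones, exactly eight addresses remain, so the response contains all healthy records and neither failing one.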

Analyzing AWS Usage Reports for Cost Optimization (4 cards shown)

Question

Right-sizing

Answer

The process of matching instance types and sizes to your workload performance and capacity requirements at the lowest possible cost.

[!TIP] Organizations often over-provision during "lift-and-shift" migrations to ensure performance; right-sizing is the corrective step to align resources with actual cloud performance benefits.

Question

What is the primary difference between AWS Cost Explorer and AWS Cost and Usage Reports (CUR) in terms of data granularity and delivery?

Answer

| Feature | AWS Cost Explorer | AWS Cost and Usage Reports (CUR) |
| --- | --- | --- |
| Granularity | Daily/Monthly (Hourly available for a fee) | Most granular (Hourly, daily, or monthly) |
| Delivery | AWS Management Console / API | CSV or Parquet files delivered to an S3 Bucket |
| Use Case | Visualizing trends and quick filtering | In-depth analysis and integration with BI tools like Amazon QuickSight |

[!NOTE] CUR provides the cost breakup of resources by tags, products, and services at the most comprehensive level available.

Question

To enable AWS Cost and Usage Reports (CUR), you must apply an S3 bucket policy that grants the ___ service principal the permissions s3:GetBucketAcl and s3:GetBucketPolicy.

Answer

billingreports.amazonaws.com

These permissions let the AWS Billing service verify bucket ownership before delivery; writing the report files to the bucket additionally requires s3:PutObject for the same service principal.

json
{ "Principal": { "Service": "billingreports.amazonaws.com" }, "Action": [ "s3:GetBucketAcl", "s3:GetBucketPolicy" ] }

Question

How can an architect identify underutilized resources using the native reporting tools in AWS?

Answer

Architects can identify inefficiencies through a systematic review of usage reports:

  1. Analyze Cost Explorer Reports: Use the "Monthly costs by service" report and filter by specific Regions or Accounts to find high-cost/low-usage anomalies.
  2. Examine CUR Data: Investigate granular hourly data in S3 to find instances with consistently low CPU/Memory utilization.
  3. Review Default Boilerplates: Utilize out-of-the-box reports like the Reserved Instance (RI) Report to see if committed capacity is being fully utilized.

Applying Design Patterns for Performance: Caching, Buffering, and Replicas (4 cards shown)

Question

Cache-Aside (Lazy Loading)

Answer

A caching strategy where the application is responsible for managing the cache.

Process:

  1. Check the cache.
  2. If Cache Hit: Return data.
  3. If Cache Miss: Query database, update cache, then return data.

[!TIP] This pattern is highly effective for read-heavy workloads where data is not frequently updated, but it can result in stale data if the database is updated without invalidating the cache.
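
The three steps above can be sketched with plain dictionaries standing in for the cache and the database. The class name, the `db_reads` counter, and the keys are hypothetical; the `update` method shows one common way (invalidation on write) to limit the stale-data risk mentioned in the tip.

```python
class CacheAside:
    """Application-managed cache in front of a slower data store."""

    def __init__(self, database):
        self.database = database   # stand-in for the system of record
        self.cache = {}            # stand-in for ElastiCache or similar
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:              # 1-2. cache hit: return cached data
            return self.cache[key]
        self.db_reads += 1                 # 3. cache miss: query the database,
        value = self.database[key]
        self.cache[key] = value            #    populate the cache,
        return value                       #    then return the data

    def update(self, key, value):
        self.database[key] = value
        self.cache.pop(key, None)          # invalidate to avoid serving stale data


store = CacheAside({"user:1": "Ada"})
store.get("user:1")          # miss, reads the database
store.get("user:1")          # hit, served from the cache
store.update("user:1", "Grace")
print(store.get("user:1"), store.db_reads)  # Grace 2
```

Only two database reads occur for three lookups: the first miss and the miss that follows invalidation.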

Question

How do Read Replicas differ from a Caching Layer (like ElastiCache) in addressing read performance bottlenecks?

Answer

While both offload reads from the primary database, they serve different performance profiles:

| Feature | Read Replicas (e.g., RDS) | Caching Layer (e.g., ElastiCache) |
| --- | --- | --- |
| Latency | Milliseconds | Sub-millisecond (In-memory) |
| Query Capability | Supports complex SQL queries | Key-Value or simple data structures |
| Data Freshness | Asynchronous replication (Lag) | Depends on pattern (Write-through vs. Lazy) |
| Use Case | Scaling analytical/complex reads | Offloading frequent simple lookups |

[!NOTE] Read replicas reduce the load on the primary DB instance, whereas caching significantly reduces the response time for individual requests.

Question

To achieve transparent, in-memory acceleration for Amazon DynamoDB without modifying complex application logic, an architect should use ___.

Answer

DynamoDB Accelerator (DAX)

DAX is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement—from milliseconds to microseconds—even at millions of requests per second.

[!TIP] Use DAX for eventually consistent read-intensive workloads where you don't want to manage cache-aside logic in your application code.

Question

Write-Through Caching Pattern

Answer

In this pattern, data is written to the cache and the backend database simultaneously.

Pros:

  • Data in the cache is always up-to-date with the database.
  • Simple read logic (reads always hit the cache first).

Cons:

  • Write Penalty: Higher latency for write operations because they must hit two systems.
  • Cache Churn: May populate the cache with data that is rarely read, wasting memory.

[!WARNING] Use this pattern when data consistency between the cache and the database is critical.
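
A minimal sketch of the write path, again using dictionaries as stand-ins for both systems. The class and attribute names are hypothetical, and a real implementation would also need to handle a failure of either write.

```python
class WriteThroughCache:
    """Every write goes to both the cache and the backing store."""

    def __init__(self):
        self.database = {}   # stand-in for the backend database
        self.cache = {}      # stand-in for the in-memory cache
        self.backend_writes = 0

    def put(self, key, value):
        # Write penalty: two systems are updated before the write completes.
        self.database[key] = value
        self.backend_writes += 1
        self.cache[key] = value

    def get(self, key):
        # Reads hit the cache first; fall back to the database on a miss.
        if key in self.cache:
            return self.cache[key]
        return self.database.get(key)


wt = WriteThroughCache()
wt.put("price:42", 10)
wt.put("price:42", 12)
print(wt.get("price:42"), wt.cache == wt.database)  # 12 True
```

Because every `put` touches both stores, the cache never lags the database, at the cost of caching keys that may never be read.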

Architecting AWS Network Connectivity Strategies (4 cards shown)

Question

AWS Transit Gateway (TGW)

Answer

A fully managed network hub used to interconnect Virtual Private Clouds (VPCs) and on-premises networks.

[!TIP] Think of it as a "Cloud Router" that simplifies network topology by replacing complex peering meshes with a hub-and-spoke model.

Key Benefits:

  • Transitive Routing: Easily connect multiple VPCs through a single hub.
  • Centralized Management: Simplify edge connectivity to on-premises via VPN or Direct Connect.
  • Scalability: Supports thousands of VPCs and massive throughput.

Question

How can you increase the aggregate bandwidth of AWS Site-to-Site VPN connections beyond the standard 1.25 Gbps limit?

Answer

By using Equal Cost Multi-Path (ECMP) routing on an AWS Transit Gateway (TGW).

How it works:

  1. Establish multiple VPN tunnels between your on-premises customer gateway and the TGW.
  2. Enable ECMP on the Transit Gateway.
  3. The TGW will load balance traffic across the multiple tunnels.

Bandwidth Calculation: $\text{Total Bandwidth} = \text{Number of Tunnels} \times 1.25\text{ Gbps}$

[!NOTE] Your on-premises router must also support ECMP to utilize the full aggregate bandwidth for outbound traffic.
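
As a worked example of the calculation above, the helper below multiplies the tunnel count by the per-tunnel limit. The function name is hypothetical, and the result is a theoretical ceiling: actual throughput depends on how flows hash across tunnels and on the on-premises router.

```python
TUNNEL_LIMIT_GBPS = 1.25  # per-tunnel ceiling quoted above


def aggregate_vpn_bandwidth(tunnels):
    """Theoretical upper bound when ECMP spreads traffic across all tunnels."""
    if tunnels < 1:
        raise ValueError("at least one tunnel is required")
    return tunnels * TUNNEL_LIMIT_GBPS


# Two VPN connections with two tunnels each give four tunnels in total.
print(aggregate_vpn_bandwidth(4))  # 5.0
```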

Question

When planning VPC subnets, AWS reserves ___ IP addresses in every CIDR block for internal use.

Answer

5

AWS reserves the first four addresses and the last address in every subnet (shown here for a /24):

  1. x.x.x.0: Network address.
  2. x.x.x.1: Reserved by AWS for the VPC router.
  3. x.x.x.2: Reserved by AWS for mapping to the Amazon Provided DNS.
  4. x.x.x.3: Reserved by AWS for future use.
  5. x.x.x.255: Network broadcast address (AWS does not support broadcast in a VPC, but the address is reserved). In a smaller subnet, this is simply the last address in the range.

[!WARNING] Always account for these 5 addresses when calculating the required size for your subnets (e.g., a /28 subnet has 16 addresses but only 11 are usable).
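
The subnet arithmetic can be checked with Python's standard `ipaddress` module. The function name and the example CIDR ranges are assumptions chosen for illustration.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5


def usable_addresses(cidr):
    """Addresses left for resources after the five AWS reservations."""
    subnet = ipaddress.ip_network(cidr)
    return subnet.num_addresses - AWS_RESERVED_PER_SUBNET


for cidr in ("10.0.0.0/28", "10.0.1.0/24"):
    print(cidr, usable_addresses(cidr))
# 10.0.0.0/28 11
# 10.0.1.0/24 251
```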

Question

Hybrid Connectivity: DX vs. VPN Failover Strategy

Answer

To balance cost and reliability, architects often use a combination of AWS Direct Connect (DX) and VPN.

| Connection Type | Primary Use Case | Performance | Cost |
| --- | --- | --- | --- |
| Direct Connect (DX) | Primary link for heavy workloads | High/Consistent | Higher |
| Site-to-Site VPN | Backup/Failover via Public Internet | Variable | Lower |

[!TIP] This is considered a "two-way door" decision because you can start with VPN and migrate to DX as traffic grows.

Assessing Solutions and Rightsizing for AWS (4 cards shown)

Question

Rightsizing

Answer

The process of matching instance types and sizes to your workload performance and capacity requirements at the lowest possible cost.

[!NOTE] Rightsizing is a continuous process. It should be performed both before migration (using on-premises metrics) and after migration (using cloud-native monitoring).

Key Goals:

  • Maximize performance efficiency
  • Minimize unnecessary expenditure
  • Eliminate idle or over-provisioned resources

Question

Name five AWS resource types that AWS Compute Optimizer supports for machine learning-based rightsizing recommendations.

Answer

AWS Compute Optimizer analyzes utilization metrics to provide recommendations for the following:

  1. Amazon EC2 instances
  2. Amazon EC2 Auto Scaling groups
  3. Amazon EBS volumes
  4. AWS Lambda functions
  5. Amazon ECS services on AWS Fargate

Question

Compare the rightsizing approach: Pre-migration vs. Post-migration.

Answer

| Phase | Data Sources | Goal |
| --- | --- | --- |
| Pre-migration | VMware vSphere, Microsoft Hyper-V metrics | Map on-premises workloads to the correct AWS instance family (e.g., Compute Optimized, Memory Optimized). |
| Post-migration | CloudWatch, Trusted Advisor, Cost Explorer, Compute Optimizer | Continuous optimization based on real-time AWS usage patterns and ML recommendations. |

[!TIP] Use Compute Optimizer to automate the analysis of over-provisioned and under-provisioned resources once the workload is in the cloud.

Question

Applications with a ___ consumption pattern are ideal for Reserved Instances or Savings Plans, while those with ___ requirements are better suited for Spot Instances.

Answer

steady-state ; flexible, fault-tolerant

Reasoning:

  • Steady State: Committing to a 1-year or 3-year term provides deep discounts (up to 72%) via RIs or Savings Plans.
  • Flexible/Fault-Tolerant: Spot Instances use spare AWS capacity for up to 90% savings but can be reclaimed by AWS with a 2-minute warning.

[!WARNING] Use Spot Instances only for workloads that can tolerate interruption, such as stateless or checkpointed architectures.
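
To make the discount figures concrete, the arithmetic can be sketched as below. The on-demand rate of $0.10 per hour is a made-up value for illustration, not a real AWS price, and the function name `discounted_rate` is hypothetical. The 72% and 90% figures are the "up to" maximums quoted above, so actual savings will usually be lower.

```python
def discounted_rate(on_demand_hourly: float, discount_pct: float) -> float:
    """Return the effective hourly rate after a percentage discount."""
    if not 0 <= discount_pct <= 100:
        raise ValueError("discount_pct must be between 0 and 100")
    return on_demand_hourly * (1 - discount_pct / 100)


if __name__ == "__main__":
    on_demand = 0.10  # hypothetical on-demand price in USD per hour
    for label, pct in (("RI / Savings Plan (max)", 72), ("Spot (max)", 90)):
        rate = discounted_rate(on_demand, pct)
        print(f"{label}: ${rate:.3f}/hour")
```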

Asset Planning and Workload Migration (AWS SAP-C02) (2 cards shown)

Question

The 7Rs of Migration

Answer

The seven common migration strategies for moving workloads to the cloud:

| Strategy | Action | Description |
| --- | --- | --- |
| Retire | Decommission | Turn off applications no longer needed. |
| Retain | Keep | Leave apps on-premises (compliance/latency). |
| Rehost | Lift and Shift | Move to cloud without changes (EC2). |
| Relocate | Transfer | Move VMware/containers without new hardware. |
| Repurchase | Drop and Shop | Switch to a SaaS version (e.g., Salesforce). |
| Replatform | Lift and Reshape | Minor optimization (e.g., move to Amazon RDS). |
| Re-architect | Refactor | Full redesign to be cloud-native (Lambda/S3). |

[!NOTE] Re-architecting can deliver the greatest long-term return, but it involves the most complexity and up-front cost.

Question

What is the primary function of AWS Migration Hub in the context of portfolio assessment and asset planning?

Answer

AWS Migration Hub provides a centralized location to discover, plan, and track migrations across multiple AWS and partner tools.

Key Capabilities:

  • Discovery: Collects inventory and utilization data from on-premises servers via Discovery Agents or Connectors.
  • Planning: Organizes servers into applications and helps determine the best migration strategy.
  • Tracking: Monitors the status of migrations regardless of which tool (e.g., Application Migration Service, Database Migration Service) is being used.

[!TIP] It is the "Single Pane of Glass" for migration visibility.

Showing 30 of 824 flashcards. Study all flashcards →
