Study Guide: Determining High-Performing Database Solutions

This guide explores how to select, design, and optimize database architectures on AWS to meet high-performance requirements, focusing on the AWS Certified Solutions Architect - Associate (SAA-C03) objectives.

Learning Objectives

By the end of this module, you will be able to:

  • Distinguish between relational and non-relational database use cases for high performance.
  • Implement scaling strategies including Read Replicas and database caching.
  • Select appropriate database engines (Aurora vs. RDS vs. DynamoDB) based on latency and throughput needs.
  • Understand the impact of database connection pooling and proxies on application performance.
  • Evaluate capacity planning metrics like Provisioned IOPS and Capacity Units.

Key Terms & Glossary

  • IOPS (Input/Output Operations Per Second): A measure of storage performance. High IOPS are critical for database workloads with frequent small reads/writes.
  • Read Replica: A read-only copy of a database instance used to offload read traffic from the primary instance.
  • ACID vs. BASE:
    • ACID (Atomicity, Consistency, Isolation, Durability): Standard for relational databases (RDS).
    • BASE (Basically Available, Soft state, Eventual consistency): Standard for many NoSQL databases (DynamoDB).
  • Vertical Scaling: Increasing the CPU/RAM of a single instance (Scale-up).
  • Horizontal Scaling: Adding more instances or sharding data (Scale-out).
  • DAX (DynamoDB Accelerator): An in-memory cache for DynamoDB that reduces response times from milliseconds to microseconds.
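IOPS and throughput are linked by the size of each I/O operation, which is why small, frequent reads and writes are limited by IOPS rather than bandwidth. The sketch below shows that arithmetic. The 16 KiB I/O size is an illustrative assumption, not a figure from this guide, and real volumes also have separate throughput caps that this ignores.

```python
def throughput_mib_per_s(iops: float, io_size_kib: float) -> float:
    """Approximate throughput (MiB/s) sustained at a given IOPS rate and I/O size."""
    return iops * io_size_kib / 1024


def iops_needed(target_mib_per_s: float, io_size_kib: float) -> float:
    """IOPS required to sustain a target throughput at a given I/O size."""
    return target_mib_per_s * 1024 / io_size_kib


# Assumed workload: 16 KiB per I/O operation (hypothetical value).
print(throughput_mib_per_s(20_000, 16))  # 312.5
print(iops_needed(250, 16))              # 16000.0
```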

The "Big Idea"

Performance in the cloud is not just about choosing a "faster" server; it is about choosing the purpose-built engine for the specific data access pattern. High-performing architectures move away from monolithic databases toward decoupled, specialized data stores where the storage volume (IOPS), the engine (Aurora), and the caching layer (ElastiCache) are all optimized to eliminate bottlenecks.

Formula / Concept Box

| Feature | Amazon RDS | Amazon Aurora | Amazon DynamoDB |
| --- | --- | --- | --- |
| Type | Relational (SQL) | Cloud-Native Relational | NoSQL (Key-Value/Document) |
| Scaling | Vertical + Read Replicas | Auto-scaling storage + Replicas | Seamless Horizontal Scaling |
| Latency | Milliseconds | Milliseconds | Single-digit Milliseconds |
| High Perf Tool | Provisioned IOPS (PIOPS) | Parallel Query / Global DB | DAX / Provisioned Capacity |

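[!NOTE] CAP Theorem: A distributed data store cannot guarantee all three of Consistency, Availability, and Partition Tolerance at once; when a network partition occurs, it must give up either consistency or availability. For example, RDS often prioritizes Consistency, while DynamoDB favors Availability and Partition Tolerance by defaulting to eventually consistent reads (strongly consistent reads are available on request).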
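The "Provisioned Capacity" entry in the comparison table is sized in capacity units. As a rough sketch of that calculation: one read capacity unit covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one write capacity unit covers one write per second of an item up to 1 KB. The function names and the sample workload are illustrative assumptions; transactional operations, which cost more, are left out.

```python
import math


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """Estimate RCUs: item size is rounded up to the next 4 KB block per read."""
    units_per_read = math.ceil(item_size_kb / 4)
    if not strongly_consistent:
        units_per_read /= 2  # eventually consistent reads cost half
    return math.ceil(units_per_read * reads_per_second)


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """Estimate WCUs: item size is rounded up to the next 1 KB block per write."""
    return math.ceil(item_size_kb) * writes_per_second


# Hypothetical workload: 6 KB items read 100 times/s, 2.5 KB items written 50 times/s.
print(read_capacity_units(6, 100))                             # 200
print(read_capacity_units(6, 100, strongly_consistent=False))  # 100
print(write_capacity_units(2.5, 50))                           # 150
```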

Hierarchical Outline

  1. Relational Database Performance (RDS)
    • Engine Selection: Choosing between MySQL, PostgreSQL, SQL Server, and Oracle.
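    • Storage Types: General Purpose SSD (gp2/gp3) vs. Provisioned IOPS SSD (io1/io2) for I/O-intensive workloads that need consistently high IOPS.
    • Read Replicas: Offloading read-heavy workloads (up to 15 replicas for RDS for MySQL, MariaDB, and PostgreSQL; up to 5 for Oracle and SQL Server).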
  2. The Aurora Advantage
    • Architecture: Distributed, fault-tolerant storage system that auto-scales.
    • Scaling: Up to 15 Aurora Replicas that share the cluster's storage volume, so replica lag is typically very low (usually well under 100 ms, often sub-10 ms).
    • Global Databases: For low-latency reads across different AWS Regions.
  3. NoSQL & High Performance (DynamoDB)
    • Performance at Scale: Consistent performance regardless of data size.
    • On-Demand vs. Provisioned: On-demand capacity absorbs unpredictable spikes without capacity planning; provisioned capacity (optionally with auto scaling) suits steady, predictable throughput.
    • In-Memory Acceleration: Using DAX for extreme read performance.
  4. Database Caching & Proxies
    • ElastiCache: Redis (complex data types) vs. Memcached (simple caching).
    • RDS Proxy: Managing large numbers of concurrent connections to improve efficiency.
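The caching item in the outline above usually means a cache-aside (lazy loading) pattern in application code: check the cache first, fall back to the database on a miss, then store the result with a time-to-live. The sketch below illustrates the pattern only. A plain dictionary stands in for ElastiCache, and `load_profile` is a hypothetical stand-in for a database query.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Minimal cache-aside wrapper with per-entry TTL (in-process stand-in for a real cache)."""

    def __init__(self, loader: Callable[[str], str], ttl_seconds: float = 60.0):
        self._loader = loader            # called only on a cache miss
        self._ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expiry, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the backing store and repopulate.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Drop a key after the underlying row changes so stale data is not served."""
        self._store.pop(key, None)


db_calls = []

def load_profile(user_id: str) -> str:
    """Hypothetical database lookup; records each call so the effect is visible."""
    db_calls.append(user_id)
    return f"profile-for-{user_id}"


cache = CacheAside(load_profile, ttl_seconds=60)
cache.get("u1")
cache.get("u1")   # served from the cache
cache.get("u2")
print(cache.hits, cache.misses, len(db_calls))  # 1 2 2
```

The TTL bounds how stale a cached value can get; explicit invalidation on write tightens that further at the cost of extra application logic.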

Visual Anchors

Database Selection Flow

[Diagram not available in this copy: database selection flow]

Read Replica Architecture

% Requires \usetikzlibrary{shapes.geometric} in the preamble for the cylinder shape.
\begin{tikzpicture}[node distance=2cm]
  % Application tier on top; the primary and two replicas in a row below it.
  \node (App) [draw, rectangle, fill=blue!10] {Application};
  \node (Primary) [draw, cylinder, shape border rotate=90, minimum height=1.5cm, fill=green!10, below of=App, xshift=-2cm] {Primary DB};
  \node (RR1) [draw, cylinder, shape border rotate=90, minimum height=1.5cm, fill=orange!10, right of=Primary, xshift=1cm] {Replica 1};
  \node (RR2) [draw, cylinder, shape border rotate=90, minimum height=1.5cm, fill=orange!10, right of=RR1] {Replica 2};

  % Solid arrows: application traffic (writes to the primary, reads to the replicas).
  \draw[->, thick] (App) -- node[left] {Writes} (Primary);
  \draw[->, thick] (App) -- node[right] {Reads} (RR1);
  \draw[->, thick] (App) -- node[right] {Reads} (RR2);

  % Dashed arrows: asynchronous replication from the primary to each replica.
  \draw[->, dashed] (Primary) -- node[above] {Async replication} (RR1);
  \draw[->, dashed] (Primary) to[bend right=50] (RR2);
\end{tikzpicture}

Definition-Example Pairs

  • Database Proxy (RDS Proxy): A fully managed proxy that sits between the application and the database, pooling and sharing established database connections.
    • Example: A serverless Lambda function triggers frequently. Instead of opening a new RDS connection on every invocation (which can exhaust the database's connection limit and memory), it connects to RDS Proxy, which reuses existing connections.
  • Global Tables: Multi-region, multi-active DynamoDB replication.
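    • Example: A global gaming app stores player scores in DynamoDB Global Tables so that a user in Tokyo and a user in London each read and write to a replica in their nearest Region with single-digit-millisecond latency. Changes replicate asynchronously to the other Regions, typically within about a second.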
  • In-Memory Database: Data stored entirely in RAM for fastest possible access.
    • Example: A leaderboard for a live auction uses Amazon ElastiCache for Redis to update and sort prices in real-time.
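The connection-pooling idea behind the database proxy example can be sketched in a few lines: a fixed set of connections is opened once and then borrowed and returned, rather than opened per request. This is only an in-process illustration of the concept, not how RDS Proxy is implemented; `FakeConnection` is a hypothetical stand-in for a real database connection.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for an expensive database connection; counts how many get opened."""
    opened = 0

    def __init__(self) -> None:
        FakeConnection.opened += 1
        self.id = FakeConnection.opened


class ConnectionPool:
    """Fixed-size pool: connections are created once and reused by callers."""

    def __init__(self, size: int) -> None:
        self._idle: "queue.Queue[FakeConnection]" = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self, timeout: float = 1.0):
        # Blocks until a connection is free; raises queue.Empty after the timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # always hand the connection back


pool = ConnectionPool(size=2)
for _ in range(100):              # 100 "invocations" share the same 2 connections
    with pool.connection() as conn:
        pass
print(FakeConnection.opened)      # 2
```

Without the pool, the same loop would open 100 connections, which is the pattern that strains a database behind bursty callers.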

Worked Examples

Scenario 1: Scaling a Bottlenecked RDS Instance

Problem: A social media application uses an RDS MySQL instance. During peak hours, CPU utilization hits 90% and query latency increases. The dashboard shows that 80% of the traffic consists of read queries.

Solution:

  1. Identify the Bottleneck: The primary instance is struggling with read requests.
  2. Deploy Read Replicas: Create 3 Read Replicas across different Availability Zones.
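  3. Update Application: Point the application's "read" connections at the Read Replica endpoints instead of the Primary endpoint. RDS gives each replica its own endpoint, so the application (or a DNS or proxy layer) must spread reads across them; writes still go to the Primary.
  4. Result: Read load moves off the Primary, so its CPU utilization drops substantially (to roughly 20% in this scenario) and query latency falls for all users. Because replication is asynchronous, reads that must see a just-committed write should still go to the Primary.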
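The application change in step 3 can be sketched as a small router that sends writes to the primary and rotates reads across the replicas. This is a minimal illustration under stated assumptions: the endpoint hostnames are hypothetical, and classifying a statement by its first keyword is a simplification (for instance, `SELECT ... FOR UPDATE` belongs on the primary).

```python
import itertools
from typing import List


class ReadWriteRouter:
    """Route SELECT statements round-robin to replicas and everything else to the primary."""

    def __init__(self, primary: str, replicas: List[str]) -> None:
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql: str) -> str:
        words = sql.split()
        verb = words[0].upper() if words else ""
        if verb == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        return self.primary  # writes, DDL, and anything unrecognized


# Hypothetical endpoint names, for illustration only.
router = ReadWriteRouter(
    primary="primary.db.example.internal",
    replicas=[
        "replica-1.db.example.internal",
        "replica-2.db.example.internal",
        "replica-3.db.example.internal",
    ],
)
print(router.endpoint_for("SELECT * FROM posts WHERE id = 1"))
print(router.endpoint_for("INSERT INTO posts (body) VALUES ('hi')"))
```

Round-robin is the simplest distribution policy; a production setup might instead weight replicas by size or route around replicas with high lag.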

Scenario 2: Reducing Latency for Key-Value Lookups

Problem: A mobile app uses DynamoDB to store user profiles. Some users are experiencing latency spikes during high-traffic events.

Solution:

  1. Enable DAX: Provision an Amazon DynamoDB Accelerator (DAX) cluster.
  2. Modify Code: Use the DAX SDK to point the app to the DAX endpoint.
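  3. Benefit: Frequent read requests (e.g., getting the same popular user profile) are served from the cache, reducing read latency from single-digit milliseconds to microseconds. DAX serves eventually consistent reads, so it fits read-heavy workloads that can tolerate slightly stale data.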
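How much a cache helps depends on the hit ratio, since only hits are served at cache speed. The sketch below computes a simple weighted average. The latency figures and hit ratios are assumed values for illustration, not measurements, and the model ignores the small extra cost of checking the cache on a miss.

```python
def expected_read_latency_ms(hit_ratio: float, cache_ms: float, table_ms: float) -> float:
    """Average read latency when a fraction `hit_ratio` of reads is served from the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * table_ms


# Assumed figures: 0.2 ms for a cache hit, 5 ms for a read that goes to the table.
for ratio in (0.0, 0.5, 0.9):
    print(ratio, round(expected_read_latency_ms(ratio, 0.2, 5.0), 2))
# 0.0 5.0
# 0.5 2.6
# 0.9 0.68
```

The average improves steadily with the hit ratio, which is why a cache pays off most for workloads that re-read a small set of hot items.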

Checkpoint Questions

  1. Which storage type should you choose for an RDS instance that requires 20,000 IOPS?
  2. What is the maximum number of Read Replicas allowed for Amazon Aurora?
  3. True or False: Multi-AZ deployments in RDS are primarily used for scaling read performance.
  4. Which service would you use to manage a surge of database connections from AWS Lambda?
  5. When would you choose ElastiCache for Redis over Memcached?

Answers

  1. Provisioned IOPS SSD (io1/io2).
  2. 15.
  3. False (Multi-AZ is for high availability/disaster recovery; Read Replicas are for scaling).
  4. RDS Proxy.
  5. When you need complex data structures (sets, sorted sets, lists) or data persistence/replication.
