AWS CloudFront: Caching Content Based on Request Headers
Learning Objectives
After studying this guide, you should be able to:
- Distinguish between a Cache Policy and an Origin Request Policy.
- Explain how including request headers in a cache key affects the Cache Hit Ratio.
- Configure CloudFront to forward only necessary headers to the origin to optimize performance.
- Identify the components that make up a CloudFront Cache Key.
- Predict whether a request will result in a cache hit or miss based on header configurations.
Key Terms & Glossary
- Cache Key: The unique identifier CloudFront uses to look up an object in its cache. It typically consists of the object URL but can be expanded to include specific headers, cookies, or query strings.
- Cache Hit: Occurs when CloudFront finds the requested object in its edge location cache and serves it directly to the user without contacting the origin.
- Cache Miss: Occurs when the requested object is not in the cache, requiring CloudFront to fetch it from the origin.
- Origin Request Policy: A set of rules that determines which headers, cookies, and query strings are sent to the origin, regardless of whether they are part of the cache key.
- Cache Policy: A configuration that defines how long items stay in the cache (TTL) and which request components are included in the cache key.
- TTL (Time to Live): The duration for which an object remains in the cache before it is considered expired.
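To make the split between the two policy types concrete, here is a minimal sketch in Python. It is a toy model, not CloudFront's implementation: the function name `split_request` and the sample header values are hypothetical. It assumes that anything in the cache key is also sent to the origin, and that the Origin Request Policy adds extra headers on top without affecting the key.

```python
def split_request(viewer_headers, cache_policy_headers, origin_policy_headers):
    """Return (cache_key_headers, origin_headers) for one viewer request."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    normalized = {name.lower(): value for name, value in viewer_headers.items()}
    key_names = {h.lower() for h in cache_policy_headers}
    # Assumption: the origin sees the cache-key headers plus the extra
    # headers listed in the Origin Request Policy.
    origin_names = key_names | {h.lower() for h in origin_policy_headers}

    cache_key_headers = {n: normalized[n] for n in sorted(key_names) if n in normalized}
    origin_headers = {n: normalized[n] for n in sorted(origin_names) if n in normalized}
    return cache_key_headers, origin_headers


viewer = {
    "Accept-Language": "fr-FR",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "Referer": "https://example.com/",
}
key_part, sent_to_origin = split_request(
    viewer,
    cache_policy_headers=["Accept-Language"],
    origin_policy_headers=["User-Agent"],
)
print(key_part)        # only Accept-Language shapes the cache key
print(sent_to_origin)  # the origin also receives User-Agent
```

In this sketch the Referer header is neither part of the key nor forwarded, because neither policy lists it.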
The "Big Idea"
The core challenge of content delivery is balancing personalization with performance. If you cache content based on every possible header (like User-Agent), you provide highly specific content but suffer from a low cache hit ratio because every unique browser version creates a new cache entry. Conversely, caching based on no headers provides the best performance (highest hit ratio) but loses the ability to vary content for different users. Mastery of this topic involves using Cache Policies to selectively include only the most critical headers in the cache key.
Formula / Concept Box
| Concept | Impact on Performance | Rule of Thumb |
|---|---|---|
| Fewer Headers in Cache Key | ⬆️ Higher Cache Hit Ratio | Include only headers that change the content returned. |
| More Headers in Cache Key | ⬇️ Lower Cache Hit Ratio | Use for highly dynamic or localized content only. |
| Forwarding All Headers | ❌ Worst Performance | Never forward all headers unless absolutely required by the origin. |
| TTL Settings | ⏱️ Higher TTL = More Hits | Balance TTL with how frequently your source content changes. |
Hierarchical Outline
- I. Understanding the Cache Key
- The URL is the primary identifier.
- Optional Components: Query strings, Cookies, and HTTP Headers.
- Logic: If Request Key == Cached Key, then Hit; else Miss.
- II. Policy Management
- Cache Policy: Controls the "Key" and "Time" (TTL).
- Origin Request Policy: Controls "Communication" (What the origin sees).
- III. Header Forwarding Best Practices
- Whitelist Approach: Forward only necessary headers (e.g., Accept-Language for translation).
- Avoid User-Agent: Including this in the cache key creates thousands of variations for the same file.
- IV. Performance Optimization
- Cache Hit Ratio: The percentage of requests served from the edge.
- Goal: Minimize variations in the cache key to maximize hits.
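The hit-or-miss logic in the outline can be sketched as a toy edge cache in Python. This is an illustration under simplifying assumptions, not how CloudFront is built: the class `EdgeCache`, the `origin` function, and the explicit `now` timestamps are all hypothetical, and an expired entry is simply refetched (a real edge location may revalidate with the origin instead).

```python
class EdgeCache:
    """Toy model of a single edge cache; not CloudFront's real implementation."""

    def __init__(self, key_headers, ttl_seconds):
        self.key_headers = sorted(h.lower() for h in key_headers)
        self.ttl_seconds = ttl_seconds
        self.store = {}  # cache key -> (body, expiry time)

    def cache_key(self, url, headers):
        # The key is the URL plus the values of the selected headers only.
        lowered = {k.lower(): v for k, v in headers.items()}
        return (url,) + tuple((h, lowered.get(h)) for h in self.key_headers)

    def get(self, url, headers, now, fetch_from_origin):
        key = self.cache_key(url, headers)
        entry = self.store.get(key)
        if entry is not None and now < entry[1]:
            return "hit", entry[0]
        # Miss (or expired entry): fetch from the origin and store the result.
        body = fetch_from_origin(url, headers)
        self.store[key] = (body, now + self.ttl_seconds)
        return "miss", body


def origin(url, headers):
    # Hypothetical origin that varies its response by language.
    return f"{url} in {headers.get('Accept-Language', 'default')}"


cache = EdgeCache(key_headers=["Accept-Language"], ttl_seconds=60)
results = [
    cache.get("/index.html", {"Accept-Language": "en-US"}, 0, origin)[0],
    cache.get("/index.html", {"Accept-Language": "en-US"}, 10, origin)[0],
    cache.get("/index.html", {"Accept-Language": "fr-FR"}, 20, origin)[0],
    cache.get("/index.html", {"Accept-Language": "en-US"}, 70, origin)[0],
]
print(results)  # ['miss', 'hit', 'miss', 'miss']
```

The last request misses even though its key was seen before, because the entry stored at time 0 expired after 60 seconds.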
Visual Anchors
Request Flow Logic
The Composition of a Cache Key
\begin{tikzpicture}[node distance=0.5cm] \node (url) [draw, fill=blue!10, minimum width=4cm, minimum height=1cm] {Object URL (e.g., /image.png)}; \node (plus) [below=of url] {+}; \node (headers) [draw, fill=green!10, below=of plus, minimum width=4cm, minimum height=1cm] {Selected Headers (Whitelist)}; \node (plus2) [below=of headers] {+}; \node (cookies) [draw, fill=orange!10, below=of plus2, minimum width=4cm, minimum height=1cm] {Selected Cookies}; \draw [dashed, ->] (url.south) -- (plus.north); \draw [dashed, ->] (plus.south) -- (headers.north); \draw [dashed, ->] (headers.south) -- (plus2.north); \draw [dashed, ->] (plus2.south) -- (cookies.north); \node (final) [draw, thick, red, fit=(url) (cookies), inner sep=0.3cm, label=right:{Complete Cache Key}] {}; \end{tikzpicture}
Definition-Example Pairs
- Forwarding Headers: Sending specific HTTP headers from the viewer's request to the origin server.
- Example: Forwarding the CloudFront-Viewer-Country header (which CloudFront adds based on the viewer's location) so your server can return a country-specific homepage.
- Whitelist: A list of specific items (headers, cookies) allowed to be part of the cache key or forwarded.
- Example: Whitelisting the Accept-Encoding header so the origin can provide Gzip or Brotli versions of a file based on browser support. (In a Cache Policy, this is handled by the Gzip and Brotli compression settings rather than by listing the header.)
- Path Pattern: A URL rule used to apply different cache behaviors to different parts of a site.
- Example: Setting a path pattern of /api/* to have 0 TTL (effectively no caching) while /images/* has a 1-year TTL.
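The path pattern example can be approximated in Python with the standard-library `fnmatch` module. This is only a sketch: the function `pick_behavior`, the `ttl_seconds` field, and the one-day default are hypothetical, and `fnmatchcase` is a stand-in whose wildcard rules are close to, but not guaranteed identical to, CloudFront's. It assumes behaviors are checked in order and the first match wins, with a default behavior as the fallback.

```python
from fnmatch import fnmatchcase


def pick_behavior(path, behaviors, default):
    """Return the settings of the first behavior whose pattern matches path."""
    for pattern, settings in behaviors:
        # fnmatchcase is case-sensitive; '*' matches any run of characters.
        if fnmatchcase(path, pattern):
            return settings
    return default


ONE_YEAR = 365 * 24 * 3600

behaviors = [
    ("/api/*", {"ttl_seconds": 0}),
    ("/images/*", {"ttl_seconds": ONE_YEAR}),
]
default_behavior = {"ttl_seconds": 86400}  # hypothetical default: one day

for path in ["/api/users", "/images/logo.png", "/index.html"]:
    print(path, pick_behavior(path, behaviors, default_behavior))
```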
Worked Examples
Example 1: The Impact of the User-Agent Header
Scenario: You have a website with a styles.css file. You configure CloudFront to include the User-Agent header in the cache key because you want to track browser statistics at the origin.
- Result: Your Cache Hit Ratio drops sharply, potentially toward 0%.
- Why?: User-Agent strings are extremely diverse (e.g., Mozilla/5.0 (Windows NT 10.0; Win64; x64)...). CloudFront treats a request from Chrome on Windows as a different key than Chrome on Mac, even though the styles.css file is identical for both.
- Fix: Use an Origin Request Policy to send the User-Agent to the origin for logging, but do not include it in the Cache Policy key.
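A small simulation in Python illustrates the effect in this example. It is a deliberately simplified sketch: the function `hit_ratio` and the 50 made-up `ExampleBrowser` User-Agent strings are hypothetical, and the model ignores TTL expiry and the fact that each edge location keeps its own cache.

```python
def hit_ratio(requests, key_headers):
    """Fraction of requests served from cache, for a given set of key headers."""
    seen = set()
    hits = 0
    for url, headers in requests:
        key = (url,) + tuple(headers.get(h) for h in key_headers)
        if key in seen:
            hits += 1
        else:
            seen.add(key)  # first request for this key is a miss
    return hits / len(requests)


# 100 requests for the same file, spread over 50 hypothetical User-Agent strings.
requests = [
    ("/styles.css", {"User-Agent": f"ExampleBrowser/{i % 50}"})
    for i in range(100)
]

with_ua = hit_ratio(requests, key_headers=["User-Agent"])
without_ua = hit_ratio(requests, key_headers=[])
print(with_ua, without_ua)  # 0.5 0.99
```

With the header in the key, every distinct User-Agent costs one extra origin fetch for an identical file; real traffic has far more than 50 distinct strings, so the gap is usually wider than in this toy run.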
Example 2: Multilingual Support
Scenario: Your origin serves different versions of index.html based on the user's preferred language.
- Step 1: Create a Cache Policy.
- Step 2: Add Accept-Language to the header whitelist in the Cache Policy.
- Outcome: CloudFront will now store separate versions of index.html for en-US, fr-FR, etc. Users sending the same Accept-Language value will get a cache hit, while a value not seen before will trigger a miss and a fetch from the origin.
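As a rough sketch of Steps 1 and 2, the Python snippet below builds a dictionary in the shape that the CloudFront `CreateCachePolicy` API expects for its `CachePolicyConfig` parameter. The helper name `build_cache_policy_config`, the policy name, and the TTL values are assumptions chosen for illustration; the snippet only builds the dictionary and does not call AWS.

```python
def build_cache_policy_config(name, headers, default_ttl=86400):
    """Build a CachePolicyConfig-shaped dict with the given header whitelist."""
    header_items = sorted(headers)
    headers_config = {"HeaderBehavior": "none"}
    if header_items:
        headers_config = {
            "HeaderBehavior": "whitelist",
            "Headers": {"Quantity": len(header_items), "Items": header_items},
        }
    return {
        "Name": name,
        "MinTTL": 0,
        "DefaultTTL": default_ttl,  # assumed value: one day
        "MaxTTL": 31536000,         # assumed value: one year
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "EnableAcceptEncodingBrotli": True,
            "HeadersConfig": headers_config,
            # Keep cookies and query strings out of the key to limit variations.
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": {"QueryStringBehavior": "none"},
        },
    }


config = build_cache_policy_config("language-aware-html", ["Accept-Language"])
print(config["ParametersInCacheKeyAndForwardedToOrigin"]["HeadersConfig"])

# With AWS credentials configured, the dict could be passed to boto3, e.g.:
#   boto3.client("cloudfront").create_cache_policy(CachePolicyConfig=config)
```

Keeping cookies and query strings set to "none" here follows the guide's rule of thumb: only the one header that changes the returned content is added to the key.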
Checkpoint Questions
- What is the primary difference between a Cache Policy and an Origin Request Policy?
- Why does including "All Headers" in a cache key lead to poor performance?
- If a request matches a path pattern but the resource is not in the cache, what is the sequence of events?
- True or False: Using the URL alone as a cache key provides the highest possible cache hit ratio.
- How can you ensure the origin receives a header without reducing the cache hit ratio for that resource?
Answers
- Cache Policy defines what is used to create the Cache Key and how long it stays in cache; Origin Request Policy defines what the origin server receives regardless of the cache key.
- It creates too many unique versions of the same object, making it unlikely that two users will share the same cache key, thus increasing cache misses.
- CloudFront performs a Cache Miss; it forwards the request to the origin (applying Origin Request Policy), receives the object, stores it based on the Cache Policy, and serves it to the user.
- True. It minimizes the variations, meaning all users requesting that URL will share the same cached object.
- Add the header to the Origin Request Policy but keep it out of the Cache Policy key whitelist.