AWS Logging, Monitoring, and Auditing for Data Engineers
Deploy logging and monitoring solutions to facilitate auditing and traceability
This guide covers the deployment of logging and monitoring solutions to facilitate auditing and traceability within AWS data pipelines, focusing on key services like CloudWatch, CloudTrail, and AWS Config.
Learning Objectives
After studying this material, you should be able to:
- Identify the primary AWS services used for logging (CloudWatch), auditing (CloudTrail), and configuration tracking (AWS Config).
- Differentiate between management events and data events in AWS CloudTrail.
- Implement logging within serverless components like AWS Lambda.
- Analyze log data using serverless query tools like Amazon Athena and CloudWatch Logs Insights.
- Design monitoring architectures that support compliance requirements (e.g., GDPR, HIPAA).
Key Terms & Glossary
- CloudWatch Logs: A centralized service for storing and monitoring application and system logs.
- CloudTrail: A service that records API calls made within an AWS account for auditing and security.
- Audit Trail: A chronological set of security-relevant records that provides documentary evidence of the sequence of activities affecting a system.
- Traceability: The ability to verify the history, location, or use of an item by means of documented, recorded identification.
- Model Drift: The phenomenon where a machine learning model's performance degrades over time due to changes in real-world data patterns.
The "Big Idea"
Think of a data pipeline like an aircraft. Logging and Monitoring are the cockpit instruments (altimeter, fuel gauge) that tell the pilot how the system is performing right now. Auditing is the "Black Box" (flight data recorder) that provides an immutable record of every action taken by the crew and the system. Together, they ensure the flight is safe, performant, and compliant with aviation (regulatory) standards.
Formula / Concept Box
| Feature | Core Purpose | Data Type |
|---|---|---|
| CloudWatch | Performance & Health | Metrics, Application Logs, Alarms |
| CloudTrail | Governance & Compliance | API Call History (Who/What/When) |
| AWS Config | Configuration Integrity | Resource State, History, Relationships |
| Athena | Log Analytics | SQL-based analysis of logs in S3 |
Hierarchical Outline
- Extraction of Logs for Audits
- AWS CloudTrail: Captures API calls (Glue, EMR, Step Functions).
- CloudWatch Logs: Centralized application logs (Lambda, MWAA).
- Application Logs: Custom logs from 3rd party or internal tools.
- Deployment & Implementation
- Infrastructure as Code (IaC): Using AWS SAM or CloudFormation for repeatable monitoring setups.
- Lambda Logging: Integrating the logging library in Python to capture event context.
- Log Analysis & Insights
- Amazon Athena: Querying logs stored in S3 using standard SQL.
- Amazon OpenSearch Service: Advanced log analytics and visual dashboards (OpenSearch Dashboards, the successor to Kibana).
- CloudTrail Lake: Centralized, immutable query store for audit logs.
- Operational Maintenance
- Alarms & Notifications: Using SNS to alert on pipeline failures.
- Security Monitoring: Using Amazon Macie to detect sensitive data in logs stored in S3.
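The "Log Analysis & Insights" branch of the outline can be made concrete with a small sketch for CloudWatch Logs Insights. The helper below only builds the parameters for the CloudWatch Logs StartQuery API; the log group name, the one-hour lookback, and the function name are illustrative assumptions, not part of the original material.

```python
import time


def build_insights_query(log_group, level="ERROR", lookback_seconds=3600,
                         limit=20, now=None):
    """Build keyword arguments for the CloudWatch Logs StartQuery API.

    All names and defaults here are illustrative assumptions.
    """
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - lookback_seconds,  # epoch seconds
        "endTime": end,
        "queryString": (
            "fields @timestamp, @message "
            f"| filter @message like /{level}/ "
            "| sort @timestamp desc "
            f"| limit {limit}"
        ),
    }


# Hypothetical log group for a Lambda function named "ingest"
query = build_insights_query("/aws/lambda/ingest", now=1_700_000_000)
print(query["queryString"])
```

With boto3 available and credentials configured, these parameters could be passed as `boto3.client("logs").start_query(**query)`, and the returned `queryId` polled with `get_query_results`.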
Visual Anchors
The Data Logging Flow
Monitoring Trinity
\begin{tikzpicture}[node distance=2cm, every node/.style={draw, rectangle, rounded corners, fill=blue!10, align=center}]
  \node (Trail) {\textbf{CloudTrail} \\ API Calls (Who?)};
  \node (Watch) [right=of Trail] {\textbf{CloudWatch} \\ Performance (How?)};
  \node (Config) [below=of Trail, xshift=2cm] {\textbf{AWS Config} \\ Configuration (What changed?)};
  \draw[<->, thick] (Trail) -- (Watch);
  \draw[<->, thick] (Watch) -- (Config);
  \draw[<->, thick] (Config) -- (Trail);
\end{tikzpicture}
Definition-Example Pairs
- Management Events: Operations performed on resources in your AWS account (Control Plane).
- Example: A user creating an Amazon S3 bucket or updating a Lambda function's configuration.
- Data Events: Resource operations performed on or within the resource itself (Data Plane).
- Example: A user uploading a file to an S3 bucket (PutObject) or invoking a Lambda function.
- CloudWatch Alarms: A mechanism to watch a single metric and perform actions based on the value of the metric relative to a threshold.
- Example: Sending an SNS notification to the engineering team if an EMR cluster's CPU exceeds 85% for 5 minutes.
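The alarm example above (CPU over 85% for 5 minutes, notify via SNS) can be sketched as parameters for CloudWatch's PutMetricAlarm API. Which namespace and metric represent "EMR CPU" depends on your setup; this sketch assumes the EC2 `CPUUtilization` metric of a single cluster node. The instance ID, topic ARN, and function names are hypothetical.

```python
def build_cpu_alarm_params(instance_id, sns_topic_arn, threshold=85.0):
    """Return keyword arguments for CloudWatch's PutMetricAlarm API.

    Assumption: CPU is tracked through the EC2 CPUUtilization metric of one
    cluster node; adapt Namespace/MetricName/Dimensions to your own setup.
    """
    return {
        "AlarmName": f"high-cpu-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": 300,            # one 5-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # SNS topic notified when the alarm fires
    }


def create_alarm(params):
    """Create the alarm; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without AWS access
    boto3.client("cloudwatch").put_metric_alarm(**params)


# Hypothetical identifiers, for illustration only
params = build_cpu_alarm_params(
    "i-0123456789abcdef0",
    "arn:aws:sns:us-east-1:123456789012:pipeline-alerts",
)
```

Keeping the parameter builder separate from the API call makes the alarm definition easy to review and unit test before anything is deployed.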
Worked Examples
Example 1: Lambda Logging in Python
To ensure traceability in a serverless pipeline, you must configure the logger to capture the incoming event data.
import logging
import boto3

# Configure the root logger; Lambda forwards its output to CloudWatch Logs
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    # Log the incoming event for auditability
    logger.info("Received event: %s", event)

    # Business logic
    try:
        processed_data = "Result data"
        logger.info("Success: %s", processed_data)
    except Exception:
        # logger.exception records the message together with the stack trace
        logger.exception("Error processing event")
        raise

Example 2: Analyzing Logs with Athena
If CloudTrail logs are saved to S3, you can use Athena to find who deleted a Glue Table.
SELECT useridentity.arn, eventtime, eventsource, eventname
FROM cloudtrail_logs
WHERE eventname = 'DeleteTable'
  AND eventsource = 'glue.amazonaws.com';

Checkpoint Questions
1. Which service would you use to see a history of how a specific S3 bucket's policy has changed over the last 6 months?
2. What is the main difference between CloudWatch Logs and CloudTrail Lake?
3. True or False: CloudTrail tracks both API and non-API actions (like console logins).
4. Which analysis tool is best suited for real-time visualization of logs with a dashboard?
[!TIP] Answers: 1. AWS Config; 2. CloudWatch Logs stores application and performance logs, whereas CloudTrail Lake is a managed, queryable store for audit (API activity) events; 3. True; 4. Amazon OpenSearch Service (formerly Amazon Elasticsearch Service).
Comparison Tables
| Feature | CloudWatch Logs | CloudTrail | AWS Config |
|---|---|---|---|
| Primary Focus | Application Performance | Operational Auditing | Compliance & Config History |
| Storage | Log Groups | S3 / CloudTrail Lake | S3 Bucket (History Files) |
| Retention | Configurable (1 day to Never) | 90 Days (Free) / Indefinite (S3) | Indefinite (S3) |
| Standard Alert | Alarms on Metrics | EventBridge (formerly CloudWatch Events) Rules on API Calls | Config Rules (Compliance) |
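The CloudTrail "Standard Alert" cell in the table can be illustrated with an event pattern for Amazon EventBridge (the successor to CloudWatch Events), which delivers CloudTrail-recorded API calls with the detail type "AWS API Call via CloudTrail". The sketch below builds such a pattern for the Glue `DeleteTable` call and checks it against a sample event with a deliberately simplified local matcher. The helper names are hypothetical, and deriving the `source` value from the event source prefix is an assumption that holds for Glue but should be verified for other services.

```python
import json


def api_call_pattern(event_source, event_names):
    """Build an EventBridge event pattern for CloudTrail-recorded API calls."""
    service = event_source.split(".")[0]  # assumption: "glue.amazonaws.com" -> "aws.glue"
    return {
        "source": [f"aws.{service}"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": [event_source],
            "eventName": list(event_names),
        },
    }


def matches(pattern, event):
    """Simplified local matcher: exact-value matching only, not the full
    EventBridge pattern language."""
    for key, expected in pattern.items():
        actual = event.get(key)
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


pattern = api_call_pattern("glue.amazonaws.com", ["DeleteTable"])
sample_event = {
    "source": "aws.glue",
    "detail-type": "AWS API Call via CloudTrail",
    "detail": {"eventSource": "glue.amazonaws.com", "eventName": "DeleteTable"},
}
print(json.dumps(pattern, indent=2))
print(matches(pattern, sample_event))
```

The JSON-encoded pattern is what would be supplied as the `EventPattern` of an EventBridge rule, with an SNS topic as one possible target.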
Muddy Points & Cross-Refs
- CloudWatch vs. CloudTrail: Many students get these confused. Remember: CloudWatch is for watching your application's health. CloudTrail is for following the trail of people (API calls).
- Management vs. Data Events: By default, CloudTrail only logs Management Events. Data Events (like S3 object-level actions) are high-volume and incur extra costs; they must be enabled explicitly.
- Cost Management: Logging every successful "200 OK" response can lead to large storage bills. A common practice is to log warnings and errors (for example, 4xx/5xx responses) in production and reserve verbose INFO/DEBUG logging for development.
- Further Study: See Unit 4 for how this integrates with Data Governance and PII identification (Amazon Macie).
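The point above that data events must be enabled explicitly can be sketched as an event selector for a CloudTrail trail. The helper below builds the `EventSelectors` structure for S3 object-level events; the bucket name, trail name, and function name are hypothetical, and the sketch uses basic (not advanced) event selectors.

```python
def s3_data_event_selectors(bucket_names, read_write_type="All"):
    """Build basic CloudTrail event selectors that add S3 object-level
    data events on top of management events."""
    if read_write_type not in {"All", "ReadOnly", "WriteOnly"}:
        raise ValueError(f"Unsupported ReadWriteType: {read_write_type}")
    return [
        {
            "ReadWriteType": read_write_type,
            "IncludeManagementEvents": True,
            "DataResources": [
                {
                    "Type": "AWS::S3::Object",
                    # A trailing slash scopes the selector to all objects in the bucket
                    "Values": [f"arn:aws:s3:::{name}/" for name in bucket_names],
                }
            ],
        }
    ]


def enable_data_events(trail_name, selectors):
    """Apply the selectors; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above runs without AWS access
    boto3.client("cloudtrail").put_event_selectors(
        TrailName=trail_name, EventSelectors=selectors
    )


# Hypothetical bucket name, for illustration only
selectors = s3_data_event_selectors(["raw-zone"], read_write_type="WriteOnly")
```

Restricting the selector to specific buckets and to `WriteOnly` is one way to keep the extra data-event volume, and therefore cost, under control.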