Study Guide: Deleting Data to Meet Business and Legal Requirements
Effective Data Lifecycle Management (DLM) ensures that data is available when needed, stored cost-effectively, and securely destroyed when it no longer serves a business or legal purpose. This guide covers the tools and strategies used within the AWS ecosystem to automate these processes.
Learning Objectives
- Define the stages of Data Lifecycle Management (DLM).
- Identify regulatory requirements (GDPR, HIPAA) influencing data retention.
- Implement automated deletion using Amazon S3 Lifecycle policies and DynamoDB TTL.
- Configure WORM (Write Once, Read Many) compliance using S3 Object Lock.
- Use Amazon Macie and AWS Config to enforce deletion and classification policies.
Key Terms & Glossary
- DLM (Data Lifecycle Management): A policy-based approach to managing the flow of an information system's data throughout its life cycle: from creation and storage to the time that it becomes obsolete and is deleted.
- S3 Lifecycle Policy: A set of rules that define how Amazon S3 manages objects during their lifetime (e.g., transitioning to cheaper storage or expiring/deleting objects).
- TTL (Time to Live): A DynamoDB mechanism in which each item carries a timestamp attribute; once that time has passed, DynamoDB deletes the item automatically.
- WORM (Write Once, Read Many): A data storage model in which information is written once and cannot be erased or modified afterward.
- S3 Object Lock: A feature that allows you to store objects using a WORM model to help meet regulatory requirements that require objects to be non-erasable and non-modifiable.
The "Big Idea"
Data is a liability as much as it is an asset. While data drives insights, keeping "dark data" (unused or obsolete data) increases storage costs and expands the attack surface for potential security breaches. Data Deletion is not just about clearing space; it is a critical compliance function that ensures an organization adheres to legal frameworks like GDPR (Right to Erasure) and HIPAA while optimizing the performance of active datasets.
Formula / Concept Box
| Feature | Primary Use Case | Key Action |
|---|---|---|
| S3 Expiration | Unstructured Data (Files) | Automatically deletes objects after X days. |
| DynamoDB TTL | NoSQL Records | Deletes items based on a designated timestamp attribute. |
| S3 Object Lock | Regulatory Compliance | Prevents deletion until a retention period expires. |
| EFS Lifecycle | File System Data | Moves files that have not been accessed recently to EFS Infrequent Access (a storage-class transition, not a deletion). |
Hierarchical Outline
- I. Drivers for Data Deletion
- Legal & Compliance: Meeting GDPR (Right to be Forgotten) and HIPAA (Healthcare) standards.
- Cost Optimization: Deleting duplicates or old logs to reduce monthly S3/EBS bills.
- Performance: Removing "cold" data from high-performance tiers to speed up queries.
- II. Amazon S3 Management
- Lifecycle Rules: Transition (move) vs. Expiration (delete).
- Versioning: Managing multiple versions; deleting "non-current" versions to save space.
- Object Lock: Legal Holds and Retention periods (Governance vs. Compliance mode).
- III. Database & File Storage Deletion
- DynamoDB: Using TTL to purge session data or expired logs without consuming WCU (Write Capacity Units).
- Amazon EFS: Policy-based lifecycle management for file-based workloads (tiers cold files to cheaper storage; removing files remains the application's job).
- IV. Governance Tools
- Amazon Macie: Discovering PII (Personally Identifiable Information) to target for deletion.
- AWS Config: Auditing resources to ensure lifecycle policies are active.
Visual Anchors
S3 Lifecycle Flow
Data Compliance Timeline
Definition-Example Pairs
- Term: S3 Versioning Expiration
- Definition: A rule specifically targeting older versions of an object once a newer version is uploaded.
- Example: A company keeps the three most recent noncurrent versions of each document; the lifecycle rule permanently deletes any older noncurrent versions so that storage does not grow without bound.
- Term: DynamoDB TTL
- Definition: An attribute-based deletion mechanism where the database background process deletes items whose timestamp has passed.
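- Example: A web app stores user login sessions with a TTL attribute set to `CurrentTime + 24 hours`, expressed as an epoch timestamp in seconds. DynamoDB removes each session automatically once that time has passed, although the background delete can lag the expiry time.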
- Term: Amazon Macie
- Definition: A managed service that uses machine learning and pattern matching to discover sensitive data in Amazon S3 and help protect it.
- Example: Macie scans an S3 bucket, finds unencrypted credit card numbers, and publishes a finding to Amazon EventBridge, where a rule invokes a Lambda function to either encrypt or delete the offending files.
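The Macie example can be sketched as an EventBridge pattern plus a small handler. This is a minimal illustration under stated assumptions: the pattern uses the `aws.macie` source and `Macie Finding` detail type, the field paths (`resourcesAffected.s3Bucket.name`, `resourcesAffected.s3Object.key`) follow the Macie finding format as commonly documented, and the sample event, bucket, and key are hypothetical. The handler only decides what to do; it makes no AWS calls.

```python
# EventBridge pattern that would route Macie findings to a Lambda function.
MACIE_FINDING_PATTERN = {
    "source": ["aws.macie"],
    "detail-type": ["Macie Finding"],
}


def extract_target(event: dict):
    """Return (bucket, key) from a Macie finding event, or None if absent."""
    affected = event.get("detail", {}).get("resourcesAffected", {})
    bucket = affected.get("s3Bucket", {}).get("name")
    key = affected.get("s3Object", {}).get("key")
    if not bucket or not key:
        return None
    return bucket, key


def handler(event: dict, context=None) -> dict:
    """Decide what to do with a finding; a real function would act on it."""
    target = extract_target(event)
    if target is None:
        return {"action": "ignored"}
    bucket, key = target
    # Assumption: a real handler would call boto3 here, for example
    # s3.delete_object(Bucket=bucket, Key=key), after any required review.
    return {"action": "flag_for_deletion", "bucket": bucket, "key": key}


# Hypothetical, heavily trimmed sample event for illustration only.
sample_event = {
    "source": "aws.macie",
    "detail-type": "Macie Finding",
    "detail": {
        "resourcesAffected": {
            "s3Bucket": {"name": "example-bucket"},
            "s3Object": {"key": "exports/cards.csv"},
        }
    },
}
result = handler(sample_event)
print(result)
```

Returning a decision instead of deleting immediately keeps the sketch safe to run and leaves room for a human approval step before irreversible deletion.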
Worked Examples
Example 1: Creating an S3 Lifecycle Rule
Scenario: A healthcare provider must keep patient records in S3 Standard for 1 year, move them to Glacier for 6 years, and then delete them.
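- Transition Rule: Set to move objects to `GLACIER` after 365 days.
- Expiration Rule: Set to `Expire` (delete) objects after 2,555 days (7 × 365 days, ignoring leap days).
- Result: The provider pays Glacier rates instead of S3 Standard rates during years 2-7 (roughly 80% less for storage, depending on Region and Glacier storage class) and objects become eligible for deletion at the 7-year mark. S3 removes expired objects asynchronously, so the physical delete can follow shortly after that date.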
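The rule in this scenario can be written down as a lifecycle configuration. The sketch below only builds the configuration as a Python dictionary in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID and the `patient-records/` prefix are hypothetical names chosen for illustration.

```python
import json

DAYS_PER_YEAR = 365  # matches the scenario's arithmetic; leap days are ignored


def build_lifecycle_configuration(prefix: str = "patient-records/") -> dict:
    """Build a rule: move to Glacier after 1 year, expire after 7 years."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire-patient-records",  # hypothetical name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": 1 * DAYS_PER_YEAR, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": 7 * DAYS_PER_YEAR},
            }
        ]
    }


config = build_lifecycle_configuration()
print(json.dumps(config, indent=2))

# Assumption: with boto3 installed and credentials configured, the dict could
# be applied with
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="<your-bucket>", LifecycleConfiguration=config)
```

Keeping the day counts as expressions (`7 * DAYS_PER_YEAR`) makes the link to the retention requirement visible when the policy is reviewed.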
Example 2: DynamoDB TTL Logic
Scenario: Deleting logs older than 30 days.
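- Step 1: In the application code, add a field `exp_date` to every log entry.
- Step 2: The value must be a Number in Unix epoch time, in seconds (e.g., `1734566400`).
- Step 3: Enable TTL in the DynamoDB console and point it to the `exp_date` attribute.
- Note: Deletion is a background process and is not immediate. Expired items are typically removed within a few days of the timestamp passing (older AWS documentation cited 48 hours), so queries should filter out items whose `exp_date` is already in the past.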
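Steps 1 and 2 can be sketched in application code as follows. This is an illustrative sketch, not a DynamoDB client: `build_log_item`, `is_expired`, and the `log_id` and `message` fields are hypothetical, and only `exp_date` comes from the example above. The item is a plain dictionary that an application would then write to its table.

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30


def ttl_epoch_seconds(created_at: datetime, retention_days: int = RETENTION_DAYS) -> int:
    """Return the expiry time as whole seconds since the Unix epoch."""
    return int((created_at + timedelta(days=retention_days)).timestamp())


def build_log_item(log_id: str, message: str, created_at: datetime) -> dict:
    """Attach the TTL attribute that DynamoDB would be configured to watch."""
    return {
        "log_id": log_id,
        "message": message,
        "exp_date": ttl_epoch_seconds(created_at),
    }


def is_expired(item: dict, now: datetime) -> bool:
    """Read-side filter, since expired items can linger until TTL removes them."""
    return item["exp_date"] <= int(now.timestamp())


# Use timezone-aware datetimes so the epoch value does not depend on local time.
created = datetime(2024, 11, 19, tzinfo=timezone.utc)
item = build_log_item("log-001", "user login", created)
print(item["exp_date"])  # 1734566400, i.e. 2024-12-19 00:00:00 UTC
```

The `is_expired` check reflects the note above: because deletion lags expiry, readers that must never see stale data should filter on the attribute themselves.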
Checkpoint Questions
- What is the difference between S3 Transition and S3 Expiration?
- How does DynamoDB TTL impact your Write Capacity Units (WCU)?
- Which AWS service would you use to find PII that needs to be deleted for GDPR compliance?
- What is the main benefit of using S3 Object Lock in "Compliance Mode"?
Comparison Tables
| Strategy | Best For | Complexity | Cost Impact |
|---|---|---|---|
| Manual Deletion | One-time cleanup | High (Human error) | Immediate reduction |
| S3 Lifecycle | Large-scale file storage | Low (Set and forget) | Progressive reduction |
| DynamoDB TTL | High-velocity records | Medium (Code needed) | No extra cost for deletes |
| AWS Backup | Centralized management | Medium | Predictable |
Muddy Points & Cross-Refs
[!WARNING] Versioning vs. Lifecycle: Users often forget that if Versioning is enabled, a simple "Delete" action creates a Delete Marker but doesn't actually remove the data. You must configure "Expiration of Non-current Versions" in your lifecycle policy to truly delete the data and stop being charged.
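A lifecycle rule addressing this warning might look like the sketch below. It builds one rule in the shape used by the S3 lifecycle API: noncurrent versions are permanently deleted after a waiting period, and delete markers with no remaining versions behind them are cleaned up. The rule ID, the 30-day window, and the choice to keep three noncurrent versions are assumptions for illustration.

```python
def build_version_cleanup_rule(noncurrent_days: int = 30, versions_to_keep: int = 3) -> dict:
    """Build a rule that permanently removes old noncurrent object versions."""
    if noncurrent_days < 1:
        raise ValueError("noncurrent_days must be at least 1")
    return {
        "ID": "purge-noncurrent-versions",  # hypothetical name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},  # empty prefix applies the rule to all objects
        "NoncurrentVersionExpiration": {
            # Days an object version must be noncurrent before it is deleted.
            "NoncurrentDays": noncurrent_days,
            # Number of most recent noncurrent versions to retain regardless.
            "NewerNoncurrentVersions": versions_to_keep,
        },
        # Remove delete markers once no versions remain behind them.
        "Expiration": {"ExpiredObjectDeleteMarker": True},
    }


rule = build_version_cleanup_rule()
print(rule["NoncurrentVersionExpiration"])
```

This dictionary would sit inside the `Rules` list of a bucket's lifecycle configuration alongside any transition or expiration rules for current versions.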
[!IMPORTANT] Object Lock Modes:
- Governance Mode: Users with special permissions can still delete objects.
- Compliance Mode: Nobody (including the Root user) can delete the data until the retention period ends. Use this for strict legal requirements.
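The difference between the two modes can be modeled in a few lines. This is a local sketch of the rules described above, not an AWS API call: `build_retention` produces a dictionary in the shape of the `Retention` parameter used by S3's object retention API, while `can_delete_version` and its `has_bypass_permission` flag are hypothetical helpers standing in for the special permission mentioned for Governance mode.

```python
from datetime import datetime, timedelta, timezone

VALID_MODES = {"GOVERNANCE", "COMPLIANCE"}


def build_retention(mode: str, retain_days: int, now: datetime) -> dict:
    """Build a retention setting that locks an object version until a date."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}")
    return {"Mode": mode, "RetainUntilDate": now + timedelta(days=retain_days)}


def can_delete_version(retention: dict, now: datetime, has_bypass_permission: bool = False) -> bool:
    """Model the deletion rules for a locked object version."""
    if now >= retention["RetainUntilDate"]:
        return True  # retention period has ended in either mode
    # Before the retain-until date, only Governance mode can be bypassed.
    return retention["Mode"] == "GOVERNANCE" and has_bypass_permission


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
governance = build_retention("GOVERNANCE", 365, start)
compliance = build_retention("COMPLIANCE", 365, start)
midway = start + timedelta(days=100)

print(can_delete_version(governance, midway, has_bypass_permission=True))   # True
print(can_delete_version(compliance, midway, has_bypass_permission=True))   # False
print(can_delete_version(compliance, start + timedelta(days=365)))          # True
```

The model makes the trade-off concrete: Governance mode leaves an escape hatch for privileged users, whereas Compliance mode removes it until the retention date passes.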