Data Governance and Permissions: Amazon Redshift Data Sharing
Grant permissions for data sharing (for example, data sharing for Amazon Redshift)
This guide covers the mechanisms for sharing data securely across AWS environments, focusing on Amazon Redshift's live data sharing capabilities, AWS Lake Formation integration, and the transition from traditional ETL to a modern data mesh architecture.
Learning Objectives
After studying this guide, you should be able to:
- Configure Amazon Redshift Data Sharing between clusters, accounts, and regions.
- Distinguish between Producer and Consumer cluster responsibilities.
- Apply Lake Formation permissions (TBAC) to manage centralized data access.
- Implement Fine-Grained Access Control using Row-Level Security (RLS) and Dynamic Data Masking (DDM).
- Utilize AWS RAM and S3 Access Points for broader resource sharing.
Key Terms & Glossary
- Producer Cluster: The Redshift cluster that owns the data and creates a datashare.
- Consumer Cluster: The cluster that receives a datashare to query data in a read-only format.
- Datashare: A Redshift object that defines the schema, tables, or views to be shared.
- LF-Tags: Metadata tags in Lake Formation used for Tag-Based Access Control (TBAC).
- Resource Link: A local pointer in a consumer's Data Catalog that maps to a shared resource in a producer's account.
- Outbound/Inbound Share: The status of a datashare from the perspective of the provider (outbound) or recipient (inbound).
The "Big Idea"
Traditionally, sharing data meant building complex ETL pipelines to copy data from one silo to another, leading to data staleness and high storage costs. The modern "Big Idea" is Live Data Sharing. By allowing consumers to query the producer's data in-place without moving it, organizations achieve a "Single Source of Truth" and workload isolation—where analytical queries in one department don't impact the performance of production workloads in another.
Formula / Concept Box
| Command / Concept | Syntax / Rule | Purpose |
|---|---|---|
| Create Datashare | CREATE DATASHARE salesshare; | Initializes a share object. |
| Add Objects | ALTER DATASHARE salesshare ADD SCHEMA public; | Adds a schema to the share; tables and views in that schema are added the same way. |
| Grant Access | GRANT USAGE ON DATASHARE salesshare TO NAMESPACE '...' | Authorizes a specific consumer cluster. |
| Least Privilege | Identity + Resource + Condition | The security baseline for all IAM/Lake Formation policies. |
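To check the result of the commands in the table, a producer can look up its own namespace and inspect the share. This is a minimal sketch that reuses the `salesshare` name from the table; the output depends on the cluster, so none is shown.

```sql
-- Return the namespace GUID of the cluster you are connected to.
SELECT current_namespace;

-- List the datashares visible to this cluster (outbound and inbound).
SHOW DATASHARES;

-- List the objects currently included in the share.
DESC DATASHARE salesshare;
```

The namespace GUID returned by the first query is the value a same-account `GRANT USAGE ... TO NAMESPACE` statement expects.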
Hierarchical Outline
- Amazon Redshift Data Sharing
- Architecture: Producer (Outbound) → Consumer (Inbound).
- Scope: Database, Schema, Table, View, and SQL UDFs.
- Cross-Account: Requires authorization of the consumer AWS Account ID and Namespace.
- Authorization Mechanisms
- RBAC (Role-Based): Assigning permissions to roles rather than individuals.
- TBAC (Tag-Based): Using Lake Formation tags (e.g., `Security=Confidential`) to control access.
- Fine-Grained Control: Row-Level Security (RLS) and Dynamic Data Masking (DDM).
- Governance Patterns
- Hub-and-Spoke: Decentralized; producers manage their own consumers.
- Centralized (Lake Formation): Single point of governance for all shares.
- Data Mesh: Domain-driven ownership with standardized sharing protocols.
Visual Anchors
Redshift Data Sharing Workflow
Cross-Account Governance Structure
\begin{tikzpicture}[node distance=2cm, every node/.style={draw, rectangle, rounded corners, fill=blue!10, align=center}]
\node (LF) [fill=green!20] {AWS Lake Formation\\(Central Governance)};
\node (Prod) [left of=LF, xshift=-2cm] {Producer Account\\(S3 + Redshift)};
\node (Cons) [right of=LF, xshift=2cm] {Consumer Account\\(Athena + Redshift)};
\draw [->, thick] (Prod) -- node[above] {\small Share Metadata} (LF);
\draw [->, thick] (LF) -- node[above] {\small Grant Permissions} (Cons);
\draw [<->, dashed] (Prod) -- node[below] {\small Direct Data Access} (Cons);
\end{tikzpicture}
Definition-Example Pairs
- Dynamic Data Masking (DDM): Obfuscating sensitive data at query time based on user role.
- Example: A customer support rep sees `XXXX-XXXX-1234` for a credit card number, while a billing manager sees the full number.
- Row-Level Security (RLS): Restricting which rows a user can see based on a policy.
- Example: A Regional Sales Manager in the "North" region can only see rows in the `sales` table where `region_id = 'North'`.
- AWS RAM (Resource Access Manager): A service that allows sharing specific AWS resources across accounts.
- Example: Sharing a VPC Subnet from a Central IT account to a Dev Team account so they can launch resources in a governed network.
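The RLS and DDM examples above could be expressed in Redshift SQL roughly as follows. This is a sketch under stated assumptions: the tables `public.sales` and `public.payments`, the column `card_number`, the role names, and the policy names are all hypothetical and not taken from a real schema.

```sql
-- Hypothetical roles for the two examples.
CREATE ROLE north_sales_manager;
CREATE ROLE support_rep;

-- RLS: members of north_sales_manager see only rows where region_id = 'North'.
CREATE RLS POLICY north_region_only
WITH (region_id VARCHAR(16))
USING (region_id = 'North');

ATTACH RLS POLICY north_region_only ON public.sales TO ROLE north_sales_manager;

-- Policies are enforced only once row-level security is switched on for the table.
ALTER TABLE public.sales ROW LEVEL SECURITY ON;

-- DDM: members of support_rep see only the last four digits of the card number.
CREATE MASKING POLICY mask_card_last4
WITH (card_number VARCHAR(32))
USING (('XXXX-XXXX-' || RIGHT(card_number, 4))::VARCHAR(32));

ATTACH MASKING POLICY mask_card_last4
ON public.payments (card_number)
TO ROLE support_rep;
```

In this sketch the billing manager sees the full card number because no masking policy is attached to that role. Once row-level security is on, regular users with no attached RLS policy see no rows in `public.sales`.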
Worked Examples
Scenario: Sharing Sales Data Cross-Account
Step 1: Producer Setup (Account 111122223333)
Execute the following in the Redshift query editor:
-- Create the share
CREATE DATASHARE sales_share;
-- Add the schema and a specific table
ALTER DATASHARE sales_share ADD SCHEMA public;
ALTER DATASHARE sales_share ADD TABLE public.daily_revenue;
-- Grant access to the consumer account (a cross-account share is granted to an account, not to a namespace)
GRANT USAGE ON DATASHARE sales_share TO ACCOUNT '444455556666';
-- Before Step 2, a producer account administrator must authorize the share,
-- and a consumer account administrator must associate it with a consumer namespace.
Step 2: Consumer Setup (Account 444455556666)
-- Create a local database reference to the share.
-- Replace <producer-namespace-guid> with the producer cluster's namespace ID.
CREATE DATABASE sales_db FROM DATASHARE sales_share OF ACCOUNT '111122223333' NAMESPACE '<producer-namespace-guid>';
-- Query the data (read-only)
SELECT * FROM sales_db.public.daily_revenue;
Checkpoint Questions
- True or False: A consumer cluster can perform an `UPDATE` statement on a table shared via Redshift Data Sharing.
- Which service should you use to share a Glue Data Catalog table across accounts using tags like `Industry=Healthcare`?
- What is the benefit of using a Resource Link in Lake Formation?
- How does Redshift Data Sharing prevent "Data Staleness" compared to traditional ETL?
[!IMPORTANT] Answers: 1. False (Sharing is read-only). 2. AWS Lake Formation (using TBAC). 3. It allows users to query shared data as if it were a local table in their own catalog. 4. It provides access to live, committed data in the producer cluster without copying it.
Comparison Tables
| Feature | RBAC (Role-Based) | TBAC (Tag-Based) |
|---|---|---|
| Mechanism | Permissions attached to IAM/DB Roles. | Permissions attached to Metadata Tags. |
| Scalability | Lower; grants must be maintained per role as users and resources are added. | High; new resources inherit tags and permissions. |
| Complexity | Low for small teams. | Better for large-scale Data Lakes. |
| Primary Tool | IAM, Redshift SQL. | AWS Lake Formation. |
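On the Redshift side, the RBAC column of the table comes down to a few SQL statements. This is a minimal sketch: the role `sales_analyst` and the user `alice` are hypothetical, and `sales_db` is the consumer database from the worked example. The TBAC column is configured in Lake Formation rather than in Redshift SQL, so it is not shown here.

```sql
-- Hypothetical role; permissions attach to the role, not to individual users.
CREATE ROLE sales_analyst;

-- Let the role use the database that was created from the datashare.
GRANT USAGE ON DATABASE sales_db TO ROLE sales_analyst;

-- Users gain access by being granted the role.
GRANT ROLE sales_analyst TO alice;
```

Adding another analyst then takes one `GRANT ROLE` statement, but each new shared database still needs its own grant to the role, which is the scalability limit the table points to.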
Muddy Points & Cross-Refs
- Cross-Region vs. Cross-Account: Redshift supports both, but Cross-Region sharing incurs data transfer costs, whereas Cross-Account in the same region does not.
- Lake Formation vs. Redshift Native: You can manage Redshift shares through Lake Formation for centralized governance, or natively via SQL for simpler hub-and-spoke setups.
- Superuser Powers: Remember that Redshift superusers or object owners have implicit `GRANT` and `REVOKE` rights that cannot be removed by others.
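The difference between native and Lake Formation-managed sharing shows up in the grant statement itself. The sketch below reuses the share name and account IDs from the worked example purely as placeholders. It assumes the Lake Formation Data Catalog lives in the producer account, which may not match a given setup.

```sql
-- Native sharing: the producer grants the share directly to a consumer account.
GRANT USAGE ON DATASHARE sales_share TO ACCOUNT '444455556666';

-- Lake Formation-managed sharing: the share is published to the AWS Glue Data Catalog
-- of the named account, and Lake Formation administrators grant permissions on it from there.
GRANT USAGE ON DATASHARE sales_share TO ACCOUNT '111122223333' VIA DATA CATALOG;
```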