Study Guide

Data Governance and Permissions: Amazon Redshift Data Sharing

Grant permissions for data sharing (for example, data sharing for Amazon Redshift)

This guide covers the mechanisms for sharing data securely across AWS environments, focusing on Amazon Redshift's live data sharing capabilities, AWS Lake Formation integration, and the transition from traditional ETL to a modern data mesh architecture.

Learning Objectives

After studying this guide, you should be able to:

  • Configure Amazon Redshift Data Sharing between clusters, accounts, and regions.
  • Distinguish between Producer and Consumer cluster responsibilities.
  • Apply Lake Formation permissions (TBAC) to manage centralized data access.
  • Implement Fine-Grained Access Control using Row-Level Security (RLS) and Dynamic Data Masking (DDM).
  • Utilize AWS RAM and S3 Access Points for broader resource sharing.

Key Terms & Glossary

  • Producer Cluster: The Redshift cluster that owns the data and creates a datashare.
  • Consumer Cluster: The cluster that receives a datashare and queries the shared data read-only, without copying it.
  • Datashare: A Redshift object that defines the schema, tables, or views to be shared.
  • LF-Tags: Metadata tags in Lake Formation used for Tag-Based Access Control (TBAC).
  • Resource Link: A local pointer in a consumer's Data Catalog that maps to a shared resource in a producer's account.
  • Outbound/Inbound Share: The direction of a datashare from the perspective of the producer (outbound) or the consumer (inbound).

The "Big Idea"

Traditionally, sharing data meant building complex ETL pipelines to copy data from one silo to another, which led to stale data and duplicated storage costs. The modern "Big Idea" is Live Data Sharing. Consumers query the producer's data in place without moving it, so the organization keeps a "Single Source of Truth". Because each consumer runs queries on its own compute, sharing also provides workload isolation: analytical queries in one department don't affect the performance of production workloads in another.

Formula / Concept Box

| Command / Concept | Syntax / Rule | Purpose |
|---|---|---|
| Create Datashare | CREATE DATASHARE salesshare; | Initializes a share object. |
| Add Objects | ALTER DATASHARE salesshare ADD SCHEMA public; | Adds a schema to the share; tables, views, and SQL UDFs are added the same way. |
| Grant Access | GRANT USAGE ON DATASHARE salesshare TO NAMESPACE '...' | Authorizes a specific consumer namespace in the same account (use TO ACCOUNT for another AWS account). |
| Least Privilege | Identity + Resource + Condition | The security baseline for all IAM/Lake Formation policies. |

Hierarchical Outline

  1. Amazon Redshift Data Sharing
    • Architecture: Producer (Outbound) → Consumer (Inbound).
    • Scope: Database, Schema, Table, View, and SQL UDFs.
    • Cross-Account: The producer grants the share to the consumer's AWS Account ID and authorizes it; the consumer administrator then associates the share with a namespace.
  2. Authorization Mechanisms
    • RBAC (Role-Based): Assigning permissions to roles rather than individuals.
    • TBAC (Tag-Based): Using Lake Formation tags (e.g., Security=Confidential) to control access.
    • Fine-Grained Control: Row-Level Security (RLS) and Dynamic Data Masking (DDM).
  3. Governance Patterns
    • Hub-and-Spoke: Decentralized; producers manage their own consumers.
    • Centralized (Lake Formation): Single point of governance for all shares.
    • Data Mesh: Domain-driven ownership with standardized sharing protocols.
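
The RBAC item in the outline can be sketched in Redshift SQL. This is a minimal illustration, not a prescribed setup: the role name `sales_analyst`, the user `alice`, and the table `public.daily_revenue` are assumptions chosen for the example.

```sql
-- Assumed names: sales_analyst (role), alice (existing database user),
-- public.daily_revenue (existing table on the producer cluster).
CREATE ROLE sales_analyst;

-- Attach the permission to the role, not to individual users.
GRANT SELECT ON TABLE public.daily_revenue TO ROLE sales_analyst;

-- Access is then controlled purely through role membership.
GRANT ROLE sales_analyst TO alice;

-- Removing the membership removes the access; the table grant stays untouched.
REVOKE ROLE sales_analyst FROM alice;
```

Keeping grants on the role means onboarding or offboarding a user is a single membership change rather than a set of per-object grants.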

Visual Anchors

[Figure: Redshift Data Sharing Workflow]

Cross-Account Governance Structure

\begin{tikzpicture}[node distance=2cm, every node/.style={draw, rectangle, rounded corners, fill=blue!10, align=center}]
  \node (LF) [fill=green!20] {AWS Lake Formation\\ (Central Governance)};
  \node (Prod) [left of=LF, xshift=-2cm] {Producer Account\\ (S3 + Redshift)};
  \node (Cons) [right of=LF, xshift=2cm] {Consumer Account\\ (Athena + Redshift)};

  \draw [->, thick] (Prod) -- node[above] {\small Share Metadata} (LF);
  \draw [->, thick] (LF) -- node[above] {\small Grant Permissions} (Cons);
  \draw [<->, dashed] (Prod) -- node[below] {\small Direct Data Access} (Cons);
\end{tikzpicture}

Definition-Example Pairs

  • Dynamic Data Masking (DDM): Obfuscating sensitive data at query time based on user role.
    • Example: A customer support rep sees XXXX-XXXX-1234 for a credit card number, while a billing manager sees the full number.
  • Row-Level Security (RLS): Restricting which rows a user can see based on a policy.
    • Example: A Regional Sales Manager in the "North" region can only see rows in the sales table where region_id = 'North'.
  • AWS RAM (Resource Access Manager): A service that allows sharing specific AWS resources across accounts.
    • Example: Sharing a VPC Subnet from a Central IT account to a Dev Team account so they can launch resources in a governed network.
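
The RLS and DDM examples above might look roughly like this in Redshift SQL. This is a sketch under stated assumptions: the tables `sales` and `payments`, the column types, and the roles `north_manager` and `support_rep` are hypothetical, and the declared column types must match the real table definitions.

```sql
-- Assumed objects: sales(region_id VARCHAR(16), ...),
-- payments(card_number VARCHAR(256), ...), roles north_manager and support_rep.

-- Row-Level Security: members of north_manager see only rows for the North region.
CREATE RLS POLICY north_only
WITH (region_id VARCHAR(16))
USING (region_id = 'North');

ATTACH RLS POLICY north_only ON sales TO ROLE north_manager;

-- RLS is enforced only after it is switched on for the table.
ALTER TABLE sales ROW LEVEL SECURITY ON;

-- Dynamic Data Masking: support reps see only the last four digits.
CREATE MASKING POLICY mask_card_last4
WITH (card_number VARCHAR(256))
USING ('XXXX-XXXX-' || RIGHT(card_number, 4));

ATTACH MASKING POLICY mask_card_last4
ON payments(card_number)
TO ROLE support_rep;
```

In this sketch the masking policy is attached only to `support_rep`, so a billing manager with SELECT on `payments` and no masking policy attached would still see the full value.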

Worked Examples

Scenario: Sharing Sales Data Cross-Account

Step 1: Producer Setup (Account 111122223333). Execute the following in the Redshift query editor:

sql
-- Create the share
CREATE DATASHARE sales_share;
-- Add the schema and a specific table
ALTER DATASHARE sales_share ADD SCHEMA public;
ALTER DATASHARE sales_share ADD TABLE public.daily_revenue;
-- Grant access to the consumer AWS account (cross-account shares use ACCOUNT, not NAMESPACE)
GRANT USAGE ON DATASHARE sales_share TO ACCOUNT '444455556666';
-- A producer administrator must then authorize the share before the consumer can use it

Step 2: Consumer Setup (Account 444455556666)

sql
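-- Create a local database reference to the share.
-- Cross-account: name the producer account and its namespace GUID (placeholder below).
CREATE DATABASE sales_db FROM DATASHARE sales_share OF ACCOUNT '111122223333' NAMESPACE '<producer-namespace-guid>';
-- Query the data (read-only)
SELECT * FROM sales_db.public.daily_revenue;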
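
On the consumer side, it can help to inspect the inbound share and then decide who may query the resulting database. The following is a hedged sketch: `<producer-namespace-guid>` is a placeholder for the producer's real namespace identifier, and `reporting_role` is a hypothetical local role.

```sql
-- List datashares visible to this cluster (both inbound and outbound).
SHOW DATASHARES;

-- Inspect the objects contained in the inbound share.
DESC DATASHARE sales_share OF ACCOUNT '111122223333' NAMESPACE '<producer-namespace-guid>';

-- Once sales_db has been created from the share, let a local role query it.
-- reporting_role is an assumed name and must already exist.
GRANT USAGE ON DATABASE sales_db TO ROLE reporting_role;
```

The producer decides what is in the share, while the consumer administrator decides which local users and roles can reach it, which keeps responsibilities separated across the two accounts.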

Checkpoint Questions

  1. True or False: A consumer cluster can perform an UPDATE statement on a table shared via Redshift Data Sharing.
  2. Which service should you use to share a Glue Data Catalog table across accounts using tags like Industry=Healthcare?
  3. What is the benefit of using a Resource Link in Lake Formation?
  4. How does Redshift Data Sharing prevent "Data Staleness" compared to traditional ETL?

Answers:

  1. False (sharing is read-only).
  2. AWS Lake Formation (using TBAC).
  3. It allows users to query shared data as if it were a local table in their own catalog.
  4. It provides access to live, committed data in the producer cluster without copying it.

Comparison Tables

| Feature | RBAC (Role-Based) | TBAC (Tag-Based) |
|---|---|---|
| Mechanism | Permissions attached to IAM/DB Roles. | Permissions attached to Metadata Tags. |
| Scalability | Lower; grants must be maintained per role for each new resource. | High; new resources inherit tags and permissions. |
| Complexity | Low for small teams. | Better for large-scale Data Lakes. |
| Primary Tool | IAM, Redshift SQL. | AWS Lake Formation. |

Muddy Points & Cross-Refs

  • Cross-Region vs. Cross-Account: Redshift supports both, but Cross-Region sharing incurs data transfer costs, whereas Cross-Account in the same region does not.
  • Lake Formation vs. Redshift Native: You can manage Redshift shares through Lake Formation for centralized governance, or natively via SQL for simpler hub-and-spoke setups.
  • Superuser Powers: Remember that Redshift superusers or object owners have implicit GRANT and REVOKE rights that cannot be removed by others.
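
The difference between native and Lake Formation-managed sharing shows up in the producer's GRANT statement. A minimal sketch, reusing the illustrative share name `sales_share` and the account IDs from the worked example as assumptions:

```sql
-- Native sharing: the producer grants the share directly to a consumer account.
GRANT USAGE ON DATASHARE sales_share TO ACCOUNT '444455556666';

-- Lake Formation-managed sharing: the share is granted to an account's
-- AWS Glue Data Catalog instead, and Lake Formation governs access from there.
GRANT USAGE ON DATASHARE sales_share TO ACCOUNT '111122223333' VIA DATA CATALOG;
```

With the second form, the share still has to be authorized and then accepted and registered by a Lake Formation administrator before any Lake Formation permissions (including LF-Tag grants) take effect. Those steps happen outside SQL and are not shown here.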
