Mastering SageMaker Unified Studio: Domains, Domain Units, and Projects

Use domain, domain units, and projects for SageMaker Unified Studio

This guide covers the structural and governance foundations of Amazon SageMaker Unified Studio, focusing on how to organize machine learning environments using domains, domain units, and projects to ensure secure, scalable, and collaborative workflows.

Learning Objectives

By the end of this guide, you will be able to:

  • Define the role of a SageMaker Domain as the primary administrative unit.
  • Configure Domain Units to partition resources and users for organizational multi-tenancy.
  • Set up project-based collaboration within SageMaker Unified Studio.
  • Apply the principle of least privilege using SageMaker Catalog and project permissions.
  • Distinguish between SageMaker Studio Classic and the modern Unified Studio architecture.

Key Terms & Glossary

  • SageMaker Domain: A top-level entity that manages user profiles, storage (Amazon EFS), and network configurations for a group of users.
  • Domain Unit: A logical subdivision within a domain used to group users and resources for governance (e.g., by department or team).
  • Project: A collaborative workspace that bundles code, data references, and ML resources (models, pipelines) for a specific task.
  • SageMaker Catalog: A business-focused data catalog used to discover and manage data access within Unified Studio.
  • User Profile: An identity within a domain representing an individual user with specific execution roles.

The "Big Idea"

In a large organization, data science shouldn't be a "Wild West." SageMaker Unified Studio provides a hierarchical governance framework. Think of a Domain as the building, Domain Units as the floors (departments), and Projects as the specific rooms where teams work together. This structure allows administrators to enforce security at the building level while giving teams the flexibility to build within their rooms.
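The building analogy can be sketched as a small data model. This is only an illustrative toy, not the service's API: the class names and the organization (`acme-ml`, `marketing`, `finance`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Project:
    """A "room": the workspace where one team collaborates."""
    name: str


@dataclass
class DomainUnit:
    """A "floor": groups projects for one department or team."""
    name: str
    projects: list[Project] = field(default_factory=list)


@dataclass
class Domain:
    """The "building": the top-level administrative boundary."""
    name: str
    units: list[DomainUnit] = field(default_factory=list)

    def all_projects(self) -> list[Project]:
        # Walk every unit so building-level rules can see every room.
        return [project for unit in self.units for project in unit.projects]


# Hypothetical organization used only for illustration.
marketing = DomainUnit("marketing", [Project("churn-model")])
finance = DomainUnit("finance", [Project("fraud-detection"), Project("forecasting")])
domain = Domain("acme-ml", [marketing, finance])

project_names = [project.name for project in domain.all_projects()]
print(project_names)
```

An administrator reasons at the `Domain` level (every project is reachable from it), while a team only ever touches its own `Project`.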

Formula / Concept Box

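| Level     | Component         | Primary Purpose               | Key Resource Created       |
|-----------|-------------------|-------------------------------|----------------------------|
| Root      | SageMaker Domain  | Identity & Storage Management | Amazon EFS Volume          |
| Logical   | Domain Unit       | Governance & Multi-tenancy    | Resource Groups / IAM Tags |
| Execution | Project           | Collaboration & Lifecycle     | SageMaker Project Metadata |
| Data      | SageMaker Catalog | Discovery & Authorization     | Data Asset Links           |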

Hierarchical Outline

  1. SageMaker Domain (The Foundation)
    • Multi-user Support: Connects to AWS IAM or IAM Identity Center.
    • Shared Storage: Automatically provisions an Amazon EFS volume for home directories.
    • Networking: Configurable in VPC-only mode for high-security environments.
  2. Domain Units & Organization
    • Partitioning: Isolating compute and storage costs between business units.
    • Policy Inheritance: Applying security guardrails at the unit level.
  3. Unified Studio Projects
    • Collaboration: Shared spaces for notebooks, code, and experiments.
    • Integration: Seamless connection to SageMaker Catalog for governed data access.
    • Automation: Templates for MLOps (CI/CD) pipelines.
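The domain settings in item 1 of the outline (authentication mode, execution role, VPC-only networking) map onto parameters of the SageMaker `CreateDomain` API. The sketch below only builds the request dictionary, assuming the domain is created through boto3's `sagemaker` client; the role ARN, VPC ID, and subnet IDs are placeholder values, not real resources.

```python
def build_create_domain_request(
    domain_name: str,
    execution_role_arn: str,
    vpc_id: str,
    subnet_ids: list[str],
    vpc_only: bool = True,
    use_identity_center: bool = False,
) -> dict:
    """Assemble keyword arguments for the SageMaker CreateDomain call."""
    return {
        "DomainName": domain_name,
        # "SSO" selects IAM Identity Center; "IAM" selects plain IAM auth.
        "AuthMode": "SSO" if use_identity_center else "IAM",
        "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        # "VpcOnly" keeps app traffic inside the VPC.
        "AppNetworkAccessType": "VpcOnly" if vpc_only else "PublicInternetOnly",
    }


# Placeholder identifiers for illustration only.
request = build_create_domain_request(
    domain_name="dev-environment",
    execution_role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerRole",
    vpc_id="vpc-0example",
    subnet_ids=["subnet-0example"],
)
print(request["AuthMode"], request["AppNetworkAccessType"])

# With credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("sagemaker").create_domain(**request)
```

Keeping the request builder separate from the API call makes the configuration easy to review or unit test before anything is provisioned.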

Visual Anchors

[Diagram: Governance Hierarchy]

[Diagram: Domain Storage Architecture]

Definition-Example Pairs

  • Term: Principle of Least Privilege (in SageMaker)
    • Definition: Ensuring a user or project has only the minimum permissions necessary to perform their task.
    • Example: Using SageMaker Catalog to grant a "Data Scientist" role access only to the customer_churn table in S3, rather than the entire S3 bucket.

  • Term: Managed vs. Unmanaged Services
    • Definition: Managed services handle the underlying infrastructure automatically; unmanaged require manual setup.
    • Example: SageMaker Notebook Instances are managed (AWS handles patching), whereas running a custom Jupyter server on a raw EC2 instance is unmanaged.
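To make the least-privilege example concrete, here is a rough sketch of the same idea expressed as a plain IAM policy rather than a SageMaker Catalog grant (a different mechanism from the one named above, shown only because it is easy to read). The bucket name `example-analytics-bucket` is hypothetical, and the sketch assumes the customer_churn table lives under a prefix of the same name.

```python
import json


def build_prefix_read_policy(bucket: str, prefix: str) -> dict:
    """Return an IAM policy allowing read access to one S3 prefix only."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is limited to keys under the prefix.
                "Sid": "ListOnlyThePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object reads are limited to the same prefix.
                "Sid": "ReadOnlyThePrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


policy = build_prefix_read_policy("example-analytics-bucket", "customer_churn")
print(json.dumps(policy, indent=2))
```

The contrast with a broad grant is the `Resource` line: a policy on `arn:aws:s3:::example-analytics-bucket/*` would expose every table in the bucket.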

Worked Examples

Scenario 1: Initial Domain Setup

Task: Set up a domain for a single developer using the "Quick Setup."

  1. Navigate: Go to the SageMaker Console -> Domains.
  2. Action: Click "Create Domain" and select "Quick Setup."
  3. Input: Enter a name (e.g., dev-environment). If the console prompts for an execution role, select an existing one or let AWS create it.
  4. Result: AWS sets up an EFS volume and configures the networking, creating an execution role if you did not supply one. Access is granted via IAM.

Scenario 2: Restricting Access via Projects

Task: Create a project that restricts data access to a specific team.

  1. Project Creation: Within Unified Studio, create a new "Project."
  2. Catalog Link: Use the SageMaker Catalog to "Share" a specific dataset with that Project ID.
  3. Verification: A user in a different project attempts to query the dataset and receives an "Access Denied" error, because the governance framework limits data sharing to the Project scope.
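The behavior in Scenario 2 can be mimicked with a toy model. This is not how the service is implemented; `CatalogGrants` and the project IDs are hypothetical names used only to show that access follows the project a dataset was shared with.

```python
class CatalogGrants:
    """Toy stand-in for catalog sharing: dataset -> projects it is shared with."""

    def __init__(self) -> None:
        self._shared_with: dict[str, set[str]] = {}

    def share(self, dataset: str, project_id: str) -> None:
        # Record that this project may read the dataset.
        self._shared_with.setdefault(dataset, set()).add(project_id)

    def query(self, dataset: str, project_id: str) -> str:
        # Deny by default: only projects with an explicit grant get through.
        if project_id not in self._shared_with.get(dataset, set()):
            raise PermissionError(
                f"Access Denied: {project_id} has no grant on {dataset}"
            )
        return f"{project_id} can read {dataset}"


catalog = CatalogGrants()
catalog.share("customer_churn", "project-marketing")

print(catalog.query("customer_churn", "project-marketing"))
try:
    catalog.query("customer_churn", "project-finance")
except PermissionError as err:
    print(err)
```

The important design point is the default: a project with no explicit grant is denied, which is the least-privilege posture the scenario verifies.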

Checkpoint Questions

  1. What AWS storage service is automatically provisioned when a SageMaker Domain is created?
  2. How do Domain Units differ from Projects in SageMaker Unified Studio?
  3. True or False: A user profile can belong to multiple SageMaker Domains simultaneously within the same region.
  4. Which tool is used within Unified Studio to manage business data catalogs and governance patterns?

Answers

  1. Amazon EFS (Elastic File System).
  2. Domain Units are administrative/logical groupings for governance; Projects are functional workspaces for collaborative development.
  3. False; User profiles are scoped to a specific Domain.
  4. SageMaker Catalog.

Comparison Tables

Studio Classic vs. Unified Studio

| Feature        | SageMaker Studio Classic | SageMaker Unified Studio     |
|----------------|--------------------------|------------------------------|
| Core Focus     | Model Development (IDE)  | Full Data & ML Lifecycle     |
| Governance     | IAM-based / Role-based   | Domain Unit & Project-based  |
| Data Discovery | Manual / Glue Catalog    | Integrated SageMaker Catalog |
| User Interface | JupyterLab centric       | Unified Dashboard with Apps  |

Muddy Points & Cross-Refs

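  • EFS Costs: Many learners forget that you are charged for the Amazon EFS storage that backs user home directories in a SageMaker Domain. Even if no compute is running, EFS costs persist until the volume itself is deleted. Note that the DeleteDomain API retains the EFS volume by default, so deleting the domain alone may not stop the charges.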
  • SageMaker Catalog vs. AWS Glue: While they overlap, think of AWS Glue as the technical metadata layer (the "how") and SageMaker Catalog as the business governance layer (the "who" and "why").
  • Deep Dive: For more on fine-grained access, see AWS Lake Formation integration with SageMaker for column-level security.
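On the EFS cost point above, one way to avoid lingering storage charges is to ask for the home-directory volume to be removed when the domain is deleted. The sketch below builds the arguments for the SageMaker `DeleteDomain` API as exposed by boto3, assuming that is how the domain is torn down; the domain ID `d-exampledomain` is a placeholder.

```python
def build_delete_domain_request(domain_id: str, delete_efs: bool = True) -> dict:
    """Assemble keyword arguments for the SageMaker DeleteDomain call."""
    return {
        "DomainId": domain_id,
        "RetentionPolicy": {
            # "Retain" keeps the EFS volume (and its charges) after deletion;
            # "Delete" removes the volume together with the domain.
            "HomeEfsFileSystem": "Delete" if delete_efs else "Retain",
        },
    }


# Placeholder domain ID for illustration only.
delete_request = build_delete_domain_request("d-exampledomain")
print(delete_request)

# With credentials configured, the request could be sent like this:
#   import boto3
#   boto3.client("sagemaker").delete_domain(**delete_request)
```

Deleting the volume is irreversible, so back up anything needed from the home directories before sending a request with `"Delete"`.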
