Mastering SageMaker Unified Studio: Domains, Domain Units, and Projects
Use domains, domain units, and projects in SageMaker Unified Studio
This guide covers the structural and governance foundations of Amazon SageMaker Unified Studio, focusing on how to organize machine learning environments using domains, domain units, and projects to ensure secure, scalable, and collaborative workflows.
Learning Objectives
By the end of this guide, you will be able to:
- Define the role of a SageMaker Domain as the primary administrative unit.
- Configure Domain Units to partition resources and users for organizational multi-tenancy.
- Use projects for collaborative work within SageMaker Unified Studio.
- Apply the principle of least privilege using SageMaker Catalog and project permissions.
- Distinguish between SageMaker Studio Classic and the modern Unified Studio architecture.
Key Terms & Glossary
- SageMaker Domain: A top-level entity that manages user profiles, storage (Amazon EFS), and network configurations for a group of users.
- Domain Unit: A logical subdivision within a domain used to group users and resources for governance (e.g., by department or team).
- Project: A collaborative workspace that bundles code, data references, and ML resources (models, pipelines) for a specific task.
- SageMaker Catalog: A business-focused data catalog used to discover and manage data access within Unified Studio.
- User Profile: An identity within a domain representing an individual user, with its own settings such as an execution role.
The "Big Idea"
In a large organization, data science shouldn't be a "Wild West." SageMaker Unified Studio provides a hierarchical governance framework. Think of a Domain as the building, Domain Units as the floors (departments), and Projects as the specific rooms where teams work together. This structure allows administrators to enforce security at the building level while giving teams the flexibility to build within their rooms.
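To make the analogy concrete, here is a minimal plain-Python model of the building/floor/room hierarchy. It is only an illustration under stated assumptions: the class names, the `locate` helper, and the sample names (`acme-ml`, `finance`, `risk`, `fraud-detection`) are hypothetical and are not part of any SageMaker API. It assumes domain units can nest and that each project sits in exactly one unit.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical illustration of the analogy; not a SageMaker API.
@dataclass
class Project:
    """A "room": the workspace where one team collaborates."""
    name: str


@dataclass
class DomainUnit:
    """A "floor": a department-level grouping that may contain sub-units."""
    name: str
    projects: list[Project] = field(default_factory=list)
    children: list[DomainUnit] = field(default_factory=list)


@dataclass
class Domain:
    """The "building": the top-level container for all units and projects."""
    name: str
    units: list[DomainUnit] = field(default_factory=list)


def locate(domain: Domain, project_name: str) -> list[str] | None:
    """Return the path domain -> unit(s) -> project, or None if not found."""

    def walk(unit: DomainUnit, path: list[str]) -> list[str] | None:
        path = path + [unit.name]
        if any(p.name == project_name for p in unit.projects):
            return path + [project_name]
        for child in unit.children:
            found = walk(child, path)
            if found is not None:
                return found
        return None

    for unit in domain.units:
        found = walk(unit, [domain.name])
        if found is not None:
            return found
    return None


acme = Domain("acme-ml", [
    DomainUnit(
        "finance",
        projects=[Project("churn-model")],
        children=[DomainUnit("risk", projects=[Project("fraud-detection")])],
    ),
])
print(locate(acme, "fraud-detection"))  # ['acme-ml', 'finance', 'risk', 'fraud-detection']
```

Every project resolves to exactly one path through the hierarchy, which is what lets governance set at a higher level apply to the rooms below it.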
Formula / Concept Box
| Level | Component | Primary Purpose | Key Resource Created |
|---|---|---|---|
| Root | SageMaker Domain | Identity & Storage Management | Amazon EFS Volume |
| Logical | Domain Unit | Governance & Multi-tenancy | Unit Hierarchy with Owners & Authorization Policies |
| Execution | Project | Collaboration & Lifecycle | SageMaker Project Metadata |
| Data | SageMaker Catalog | Discovery & Authorization | Published Assets & Subscription Grants |
Hierarchical Outline
- SageMaker Domain (The Foundation)
  - Multi-user Support: Authenticates users through AWS IAM or IAM Identity Center.
  - Shared Storage: Automatically provisions an Amazon EFS volume for user home directories.
  - Networking: Can run in VPC-only mode for high-security environments.
- Domain Units & Organization
  - Partitioning: Separating users, projects, and data ownership by business unit (e.g., Finance vs. Marketing).
  - Policy Inheritance: Applying security guardrails at the unit level, which child units can inherit.
- Unified Studio Projects
  - Collaboration: Shared spaces for notebooks, code, and experiments.
  - Integration: Direct connection to SageMaker Catalog for governed data access.
  - Automation: Templates for MLOps (CI/CD) pipelines.
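The "Policy Inheritance" idea can be sketched in plain Python. This is a hypothetical model, not the Unified Studio API: it assumes each policy attached to a domain unit is either marked to cascade to all descendant units or applies only to the unit itself, and it computes the set of policies in effect for a given unit. All policy and unit names are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Policy:
    """A hypothetical guardrail attached to a domain unit."""
    name: str
    cascade: bool = True  # assumption: True means all descendant units inherit it


@dataclass
class Unit:
    """A hypothetical domain unit with an optional parent unit."""
    name: str
    policies: list[Policy] = field(default_factory=list)
    parent: Unit | None = None


def effective_policies(unit: Unit) -> set[str]:
    """Policies attached to `unit`, plus cascading policies from every ancestor."""
    result = {p.name for p in unit.policies}
    ancestor = unit.parent
    while ancestor is not None:
        result |= {p.name for p in ancestor.policies if p.cascade}
        ancestor = ancestor.parent
    return result


root = Unit("acme-ml", [Policy("require-vpc-only")])
finance = Unit(
    "finance",
    [Policy("create-projects"), Policy("finance-admins-only", cascade=False)],
    parent=root,
)
risk = Unit("risk", parent=finance)

print(sorted(effective_policies(risk)))  # ['create-projects', 'require-vpc-only']
```

Here `finance-admins-only` applies to `finance` but not to `risk`, because it is not marked to cascade; the guardrail set on the top-level unit reaches every unit below it.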
Visual Anchors
[Figure: Governance Hierarchy]
[Figure: Domain Storage Architecture]
Definition-Example Pairs
- Term: Principle of Least Privilege (in SageMaker)
  - Definition: Ensuring a user or project has only the minimum permissions necessary to perform its task.
  - Example: Using SageMaker Catalog to grant a "Data Scientist" role access only to the customer_churn table in S3, rather than to the entire S3 bucket.
- Term: Managed vs. Unmanaged Services
  - Definition: Managed services handle the underlying infrastructure automatically; unmanaged services require manual setup and maintenance.
  - Example: SageMaker Notebook Instances are managed (AWS handles patching), whereas running a custom Jupyter server on a raw EC2 instance is unmanaged.
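SageMaker Catalog enforces grants through its own subscription workflow, so you would not normally write this policy by hand. As a sketch of the same least-privilege idea in plain IAM terms, the function below builds a policy that allows reading only one table's files. It assumes the `customer_churn` table is stored under a `customer_churn/` prefix in a hypothetical bucket named `example-analytics-bucket`.

```python
import json


def scoped_read_policy(bucket: str, prefix: str) -> dict:
    """IAM policy document allowing reads of one S3 prefix, not the whole bucket.

    `bucket` and `prefix` are placeholders for wherever the table's data lives.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object reads are limited to keys under the table's prefix.
                "Sid": "ReadTableObjectsOnly",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            },
            {
                # ListBucket is a bucket-level action, so a condition on the
                # s3:prefix key restricts listing to the same prefix.
                "Sid": "ListTablePrefixOnly",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


# Hypothetical bucket name; substitute the bucket that actually holds the table.
print(json.dumps(scoped_read_policy("example-analytics-bucket", "customer_churn"), indent=2))
```

The contrast with the over-broad alternative is the `Resource` value: `arn:aws:s3:::example-analytics-bucket/*` would expose every object in the bucket.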
Worked Examples
Scenario 1: Initial Domain Setup
Task: Set up a domain for a single developer using the "Quick Setup."
- Navigate: Go to the SageMaker Console -> Domains.
- Action: Click "Create Domain" and select "Quick Setup."
- Input: Enter a name (e.g., dev-environment) and either select an existing execution role or let Quick Setup create one.
- Result: AWS creates the domain (and a new execution role if you did not select one), sets up an EFS volume, and configures networking with default settings. Access is granted via IAM.
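Quick Setup fills in these choices for you. For comparison, the sketch below builds the keyword arguments that the SageMaker `CreateDomain` and `CreateUserProfile` APIs expect, assuming you already have a VPC, subnets, and an execution role; every ID and ARN shown is a placeholder. It uses IAM authentication, matching the scenario, and exposes the VPC-only networking option mentioned in the outline. It is not a claim about exactly which calls the console makes.

```python
from __future__ import annotations


def build_domain_request(
    name: str,
    execution_role_arn: str,
    vpc_id: str,
    subnet_ids: list[str],
    vpc_only: bool = False,
) -> dict:
    """Keyword arguments for boto3's SageMaker client create_domain()."""
    return {
        "DomainName": name,
        "AuthMode": "IAM",  # "SSO" would authenticate through IAM Identity Center
        "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
        "VpcId": vpc_id,
        "SubnetIds": subnet_ids,
        # "VpcOnly" sends all traffic through the specified VPC and subnets;
        # "PublicInternetOnly" routes non-EFS traffic through a SageMaker-managed
        # VPC that allows direct internet access.
        "AppNetworkAccessType": "VpcOnly" if vpc_only else "PublicInternetOnly",
    }


def build_user_profile_request(domain_id: str, user_name: str) -> dict:
    """Keyword arguments for create_user_profile(); a profile lives in one domain."""
    return {"DomainId": domain_id, "UserProfileName": user_name}


# Placeholder values for illustration only.
request = build_domain_request(
    name="dev-environment",
    execution_role_arn="arn:aws:iam::111122223333:role/ExampleSageMakerExecutionRole",
    vpc_id="vpc-0example",
    subnet_ids=["subnet-0example"],
)
print(request["AppNetworkAccessType"])  # PublicInternetOnly
```

With AWS credentials configured, each dictionary would be passed to the matching client method, for example `boto3.client("sagemaker").create_domain(**request)`; the `DomainId` in that response is what `build_user_profile_request` needs.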
Scenario 2: Restricting Access via Projects
Task: Create a project that restricts data access to a specific team.
- Project Creation: Within Unified Studio, create a new "Project."
- Catalog Link: Publish the dataset to SageMaker Catalog, then approve the subscription request from that project so that access is granted to the project as a whole.
- Verification: A user in a different project attempts to query the dataset and receives an "Access Denied" error, because the governance framework grants the data only to the subscribing project.
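The verification step can be modelled in a few lines of plain Python. This is a simplified, hypothetical model of the rule described above (access is granted to a project, and a user receives it only through membership in that project); it is not how SageMaker Catalog is implemented, and the class name, project IDs, and user names are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AccessModel:
    """Hypothetical model of project-scoped data grants (not a SageMaker API)."""
    # project ID -> names of users who are members of that project
    members: dict[str, set[str]] = field(default_factory=dict)
    # asset name -> project IDs whose subscription to the asset was approved
    grants: dict[str, set[str]] = field(default_factory=dict)

    def approve_subscription(self, asset: str, project_id: str) -> None:
        """Record an approved subscription: the project, not a user, gets the grant."""
        self.grants.setdefault(asset, set()).add(project_id)

    def can_query(self, user: str, project_id: str, asset: str) -> bool:
        """True only if the user is in the project AND the project holds a grant."""
        is_member = user in self.members.get(project_id, set())
        has_grant = project_id in self.grants.get(asset, set())
        return is_member and has_grant


model = AccessModel(members={"proj-churn": {"alice"}, "proj-marketing": {"bob"}})
model.approve_subscription("customer_churn", "proj-churn")

print(model.can_query("alice", "proj-churn", "customer_churn"))    # True
print(model.can_query("bob", "proj-marketing", "customer_churn"))  # False ("Access Denied")
```

Because the grant is keyed by project, adding `bob` to `proj-churn` would give him access without touching the grant itself, which is the collaboration benefit of project-scoped permissions.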
Checkpoint Questions
- What AWS storage service is automatically provisioned when a SageMaker Domain is created?
- How do Domain Units differ from Projects in SageMaker Unified Studio?
- True or False: A user profile can belong to multiple SageMaker Domains simultaneously within the same region.
- Which tool is used within Unified Studio to manage business data catalogs and governance patterns?
Answers
- Amazon EFS (Elastic File System).
- Domain Units are administrative/logical groupings for governance; Projects are functional workspaces for collaborative development.
- False; user profiles are scoped to a single Domain.
- SageMaker Catalog.
Comparison Tables
Studio Classic vs. Unified Studio
| Feature | SageMaker Studio Classic | SageMaker Unified Studio |
|---|---|---|
| Core Focus | Model Development (IDE) | Full Data & ML Lifecycle |
| Governance | IAM-based / Role-based | Domain Unit & Project-based |
| Data Discovery | Manual / Glue Catalog | Integrated SageMaker Catalog |
| User Interface | JupyterLab centric | Unified Dashboard with Apps |
Muddy Points & Cross-Refs
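- EFS Costs: Many learners forget that SageMaker Domains incur charges for the Amazon EFS storage used for user home directories. These charges accrue even when no compute is running, and they do not necessarily stop when the domain is deleted: by default, deleting a domain retains its EFS volume, so that volume must also be deleted (or its deletion requested when the domain is deleted).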
- SageMaker Catalog vs. AWS Glue: While they overlap, think of AWS Glue as the technical metadata layer (the "how") and SageMaker Catalog as the business governance layer (the "who" and "why").
- Deep Dive: For more on fine-grained access, see AWS Lake Formation integration with SageMaker for column-level security.
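Whether the home-directory EFS volume survives domain deletion is controlled by the `RetentionPolicy` parameter of the SageMaker `DeleteDomain` API, which retains the volume unless told otherwise. The sketch below only builds the request; the domain ID is a placeholder. Choosing `"Delete"` permanently removes every user's home directory, so it is shown here as an explicit opt-in.

```python
def build_delete_domain_request(domain_id: str, delete_home_efs: bool = False) -> dict:
    """Keyword arguments for boto3's SageMaker client delete_domain()."""
    return {
        "DomainId": domain_id,
        # "Retain" (also the API's behaviour when no policy is given) keeps the
        # home-directory EFS volume, and its storage charges, after the domain
        # is gone; "Delete" removes the volume along with the domain.
        "RetentionPolicy": {
            "HomeEfsFileSystem": "Delete" if delete_home_efs else "Retain"
        },
    }


# Placeholder domain ID for illustration.
print(build_delete_domain_request("d-example123", delete_home_efs=True))
```

If you keep the default, record the `HomeEfsFileSystemId` returned by `describe_domain` before deleting, so the retained volume can be found and cleaned up later.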