
AWS SageMaker AI Script Mode: Deep Dive Study Guide

Using SageMaker AI script mode with SageMaker AI supported frameworks to train models (for example, TensorFlow, PyTorch)

This guide covers SageMaker AI Script Mode, a feature that lets Machine Learning Engineers train models with custom scripts for popular frameworks such as TensorFlow and PyTorch while leveraging SageMaker's managed infrastructure.

Learning Objectives

By the end of this module, you should be able to:

  • Define the core purpose of Script Mode and when to choose it over built-in algorithms.
  • Configure a SageMaker Estimator using the Python SDK for PyTorch or TensorFlow.
  • Structure a custom Python training script including data loading and model saving logic.
  • Identify the lifecycle of a training job from script submission to S3 artifact generation.
  • Differentiate between managed framework containers and Bring Your Own Container (BYOC).

Key Terms & Glossary

  • Script Mode: A SageMaker feature where you provide a standard Python script and SageMaker executes it inside a pre-configured framework container.
  • Entry Point: The specific Python file (.py) that contains the main execution logic for your training job.
  • Estimator: A high-level object in the SageMaker Python SDK that encapsulates the configuration for a training job (e.g., PyTorch, TensorFlow).
  • Model Artifacts: The output files (usually model.tar.gz) produced by your script and automatically uploaded to Amazon S3 by SageMaker.
  • Managed Container: Docker images maintained by AWS that come pre-installed with frameworks like PyTorch, TensorFlow, and MXNet.

The "Big Idea"

[!IMPORTANT] The "Big Idea": Script Mode is the "Goldilocks" of SageMaker training. While Built-in Algorithms offer maximum ease and BYOC (Bring Your Own Container) offers maximum control, Script Mode provides the flexibility of custom code with the convenience of AWS-managed, framework-optimized environments.

Formula / Concept Box

| Component | Implementation Detail |
| --- | --- |
| The Script | Must handle environment variables like SM_MODEL_DIR and SM_CHANNEL_TRAINING. |
| The Estimator | Requires entry_point, framework_version, instance_type, and role. |
| Data Input | S3 paths are mapped to local container paths (e.g., /opt/ml/input/data/). |
| Model Output | Saving files to /opt/ml/model/ ensures they are archived and sent to S3. |
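
The first table row can be made concrete with a short sketch. This is a minimal illustration (not a complete training script) of how a script typically reads the SageMaker-injected environment variables; the fallback paths are the conventional container locations, included here only so the snippet also runs outside SageMaker:

```python
import os

# SageMaker injects these variables into the training container.
# The fallbacks are the conventional container paths, used here only
# so the snippet also runs outside SageMaker.
model_dir = os.environ.get('SM_MODEL_DIR', '/opt/ml/model')
train_dir = os.environ.get('SM_CHANNEL_TRAINING', '/opt/ml/input/data/training')

print(model_dir)
print(train_dir)
```

Reading paths this way, rather than hard-coding them, keeps the same script runnable both locally and inside the managed container.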

Hierarchical Outline

  • I. Anatomy of a Script Mode Training Job
    • The Script (train.py): Standard Python code using specific libraries (TF, PyTorch).
    • The Environment: Managed Docker containers provided by AWS.
    • The Launcher: The SageMaker Python SDK fit() method.
  • II. Developing the Training Script
    • Argument Parsing: Using argparse to receive hyperparameters from SageMaker.
    • Environment Variables: Accessing data paths via os.environ (e.g., SM_CHANNELS).
    • Saving the Model: Mandatory step of writing to the correct directory for persistence.
  • III. Configuring the Estimator
    • Selecting Frameworks: Defining versions (e.g., PyTorch 2.0).
    • Compute Resources: Choosing Managed Spot Instances or On-Demand instances.
    • Dependencies: Using a requirements.txt file in the source directory.
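
The three bullets in Section II can be combined into one skeleton. Below is a hedged sketch of a train.py entry point, assuming standard argparse handling; the model-building and training body is elided, and the simulated flags at the bottom are only for illustration:

```python
import argparse
import os

def parse_args(argv=None):
    """Parse hyperparameters that SageMaker passes as CLI flags."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--lr', type=float, default=0.001)
    # SageMaker sets these variables inside the training container;
    # the fallbacks are the conventional container paths.
    parser.add_argument('--model-dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', '/opt/ml/model'))
    parser.add_argument('--train', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAINING',
                                               '/opt/ml/input/data/training'))
    return parser.parse_args(argv)

# Simulate the flags SageMaker would inject for hyperparameters={'epochs': 5}.
# In the real container you would call parse_args() with no arguments.
args = parse_args(['--epochs', '5'])
print(args.epochs)  # 5
# ... build the model, train on data under args.train ...
# Saving anything under args.model_dir ensures SageMaker archives it to S3.
```

Note that argparse converts `--model-dir` into the attribute `args.model_dir`, so the hyphenated flag style and the Python attribute style coexist cleanly.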

Visual Anchors

The Script Mode Workflow

(Workflow: fit() call → instance provisioning → framework container pull → S3 data download → script execution → model artifact upload to S3.)

Local vs. Managed Path Mapping

```latex
\begin{tikzpicture}[node distance=2cm, every node/.style={draw, rectangle, align=center, fill=blue!10}]
  \node (s3) [fill=green!10] {Amazon S3 \\ s3://bucket/data/};
  \node (cont) [right of=s3, xshift=3cm] {SageMaker Container \\ /opt/ml/input/data/train/};
  \node (script) [below of=cont] {Your Python Script \\ (reads from local path)};
  \draw[->, thick] (s3) -- node[above] {Data Sync} (cont);
  \draw[->, thick] (cont) -- (script);
\end{tikzpicture}
```

Definition-Example Pairs

  • Hyperparameter Injection: The process where SageMaker passes parameters to your script as command-line arguments.
    • Example: Passing batch_size=64 in the SDK Estimator results in SageMaker running python train.py --batch_size 64 inside the container.
  • Source Directory: A folder containing your main script and any supporting modules or requirements.
    • Example: A folder src/ containing train.py, utils.py, and requirements.txt is compressed and uploaded to S3 automatically.
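
The hyperparameter-injection example can be simulated locally. This sketch builds the command-line flags SageMaker would pass and parses them with argparse; the dict values are taken from the example above, and the flag-building loop is an illustration of the mechanism, not SageMaker's internal code:

```python
import argparse

# Estimator-side configuration (values from the example above).
hyperparameters = {'batch_size': 64, 'lr': 0.001}

# SageMaker effectively invokes: python train.py --batch_size 64 --lr 0.001
argv = []
for key, value in hyperparameters.items():
    argv += [f'--{key}', str(value)]

# Script-side parsing, as train.py would do it.
parser = argparse.ArgumentParser()
parser.add_argument('--batch_size', type=int)
parser.add_argument('--lr', type=float)
args = parser.parse_args(argv)

print(args.batch_size)  # 64
```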

Worked Examples

Example: PyTorch Estimator Setup

This example shows how to initialize a training job using the SageMaker Python SDK.

```python
from sagemaker.pytorch import PyTorch

# Define the Estimator
pytorch_estimator = PyTorch(
    entry_point='train.py',         # Your custom script
    source_dir='src',               # Folder with supporting code
    role=role,                      # IAM role with S3 permissions
    instance_count=1,
    instance_type='ml.p3.2xlarge',  # GPU instance
    framework_version='2.0',
    py_version='py310',
    hyperparameters={'epochs': 10, 'lr': 0.001}
)

# Start the job
pytorch_estimator.fit({'training': 's3://my-bucket/training-data/'})
```

Step-by-Step Breakdown:

  1. Entry Point: SageMaker will search for train.py inside the src folder.
  2. Infrastructure: AWS provisions an ml.p3.2xlarge instance and pulls the official PyTorch 2.0 image.
  3. Data Mapping: The S3 data is downloaded to /opt/ml/input/data/training/ within the container before your script starts.
  4. Execution: SageMaker runs the script. Any model saved to /opt/ml/model/ will be compressed and uploaded back to S3 upon completion.

Comparison Tables

| Feature | Built-in Algorithms | Script Mode | Bring Your Own Container (BYOC) |
| --- | --- | --- | --- |
| Code Effort | Zero (config only) | Medium (Python script) | High (Docker + code) |
| Flexibility | Low (fixed logic) | High (custom code) | Maximum (custom OS/system) |
| Maintenance | AWS managed | AWS-managed container | User-managed container |
| Use Case | Common tasks (XGBoost) | Custom logic in TF/PyTorch | Proprietary libraries / non-Python |

Checkpoint Questions

  1. Question: Where must your training script save the final model so that SageMaker persists it to S3?
    • Answer: /opt/ml/model/ (referenced by the environment variable SM_MODEL_DIR).
  2. Question: How can you install additional Python libraries that are not in the base SageMaker framework container?
    • Answer: Include a requirements.txt file in your source_dir. SageMaker will run pip install automatically.
  3. Question: True or False: Script Mode requires you to write a Dockerfile.
    • Answer: False. Script Mode uses AWS-provided Docker images.
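
The answer to question 2 presupposes a source directory layout like the following; the file names mirror the example in the Definition-Example section, and the tree itself is illustrative:

```
src/
├── train.py          # entry_point: main training logic
├── utils.py          # supporting module imported by train.py
└── requirements.txt  # extra pip packages, installed before the script runs
```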

Muddy Points & Cross-Refs

  • Environment Variables: Many students find the /opt/ml/... path structure confusing. Remember: SageMaker downloads your S3 data to the container's local disk. Always read the SageMaker-provided environment variables (e.g., SM_CHANNEL_TRAINING) to locate your data rather than hard-coding paths.
  • Local Mode: Before running a 1-hour job on a GPU instance, use instance_type='local' to test your script mode logic on your notebook instance first.
  • Deep Dive: For full control over the runtime environment (e.g., specific Linux C++ libraries), study BYOC (Bring Your Own Container) in the next chapter.
