Curriculum Overview: Data and Compute Services in Azure Machine Learning

This curriculum provides a comprehensive guide to understanding how data and compute resources are managed within the Microsoft Azure ecosystem to support data science and machine learning (ML) workloads. It specifically focuses on the Azure Machine Learning (AML) service as the central hub for the ML lifecycle.

Prerequisites

Before engaging with this curriculum, learners should have a foundational understanding of the following:

Cloud Fundamentals: Basic knowledge of cloud computing concepts (SaaS, PaaS, IaaS) and Azure subscription management.
Machine Learning Basics: Familiarity with the differences between regression, classification, and clustering.
Data Concepts: Understanding of basic dataset structures (features and labels) and the purpose of training versus validation sets.
Azure AI Workloads: General awareness of common AI scenarios like Computer Vision and Natural Language Processing (Unit 1 concepts).

Module Breakdown

Module	Topic	Difficulty	Focus Area
1	The Azure ML Workspace	Beginner	Studio Interface, Resources, and Security
2	Data Assets & Storage	Intermediate	Data Stores, Datasets, and Data Exploration
3	Compute Resources	Intermediate	Compute Instances, Clusters, and Inference Nodes
4	No-Code Training Tools	Intermediate	AutoML and Azure Machine Learning Designer
5	Model Management	Advanced	Registration, Versioning, and Deployment

Learning Objectives per Module

Module 1: The Azure ML Workspace

Describe the role of Azure Machine Learning Studio as a unified platform.
Identify how to create and manage a workspace for team collaboration.

Module 2: Data Assets & Storage

Distinguish between Data Stores (connection to storage) and Data Assets (versioned references to data).
Explain how to import and explore data directly within the Studio environment.
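The datastore/data asset distinction can be modeled in a few lines. This is a minimal plain-Python sketch, not the Azure ML SDK. The classes `Datastore`, `DataAsset`, and `DataAssetRegistry`, the storage URL, and the file paths are all hypothetical. A datastore holds only connection details for a storage location. Each data asset version pins one path inside that location, so registering a newer version never changes what an older version points to. Automatic integer versioning is a simplification assumed for this sketch.

```python
from dataclasses import dataclass


# Hypothetical model of the concepts; this is NOT the Azure ML SDK.
@dataclass(frozen=True)
class Datastore:
    """Connection details for a storage service; holds no data itself."""
    name: str
    account_url: str  # hypothetical storage account URL
    container: str


@dataclass(frozen=True)
class DataAsset:
    """An immutable, versioned reference to data inside a datastore."""
    name: str
    version: int
    datastore: Datastore
    path: str  # path relative to the datastore's container

    def uri(self) -> str:
        return f"{self.datastore.account_url}/{self.datastore.container}/{self.path}"


class DataAssetRegistry:
    """Keeps every registered version so older versions stay resolvable."""

    def __init__(self) -> None:
        self._assets: dict[tuple[str, int], DataAsset] = {}

    def register(self, name: str, datastore: Datastore, path: str) -> DataAsset:
        # In this sketch, the next version is 1 + the highest existing version for the name.
        latest = max((v for (n, v) in self._assets if n == name), default=0)
        asset = DataAsset(name, latest + 1, datastore, path)
        self._assets[(name, asset.version)] = asset
        return asset

    def get(self, name: str, version: int) -> DataAsset:
        return self._assets[(name, version)]


# Usage with hypothetical names and paths.
store = Datastore("training_store", "https://example.blob.core.windows.net", "data")
registry = DataAssetRegistry()
v1 = registry.register("customer_churn", store, "churn/2023/train.csv")
v2 = registry.register("customer_churn", store, "churn/2025/train.csv")
print(registry.get("customer_churn", 1).uri())
# https://example.blob.core.windows.net/data/churn/2023/train.csv
```

Because version 1 is never overwritten, an audit can resolve exactly the file a model was trained on, even after newer versions exist.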

Module 3: Compute Resources

Identify the four primary compute types: Compute Instances (workstations), Compute Clusters (scalable training), Inference Clusters (deployment), and Attached Compute.
Determine the appropriate compute resource based on the workload (e.g., development vs. production).
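The selection guidance above can be written down as a small decision function. This is a hypothetical helper that encodes this curriculum's rules of thumb, not an Azure API. The `Workload` fields and the purpose labels are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ComputeType(Enum):
    COMPUTE_INSTANCE = "Compute Instance"    # single-user development workstation
    COMPUTE_CLUSTER = "Compute Cluster"      # autoscaling nodes for training and batch jobs
    INFERENCE_CLUSTER = "Inference Cluster"  # hosts deployed models for real-time scoring
    ATTACHED_COMPUTE = "Attached Compute"    # an existing resource brought into the workspace


@dataclass
class Workload:
    # Hypothetical labels: "development", "training", "batch_scoring", "real_time_serving".
    purpose: str
    uses_existing_resource: bool = False


def choose_compute(w: Workload) -> ComputeType:
    """Map a workload to a compute type using the curriculum's rules of thumb."""
    if w.uses_existing_resource:
        return ComputeType.ATTACHED_COMPUTE
    if w.purpose == "development":        # e.g. a Jupyter Notebook session
        return ComputeType.COMPUTE_INSTANCE
    if w.purpose in ("training", "batch_scoring"):  # e.g. a multi-GPU training job
        return ComputeType.COMPUTE_CLUSTER
    if w.purpose == "real_time_serving":
        return ComputeType.INFERENCE_CLUSTER
    raise ValueError(f"unknown purpose: {w.purpose!r}")


print(choose_compute(Workload("training")).value)  # Compute Cluster
```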

Module 4: No-Code Training Tools

Describe the capabilities of Automated Machine Learning (AutoML) for rapid model selection.
Explain how Azure Machine Learning Designer uses a visual drag-and-drop interface for pipeline creation.
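At its core, AutoML is a search. It trains many combinations of algorithm and hyperparameter, scores each on held-out validation data, and keeps the best. The toy sketch below shows only that loop in plain Python. It is not how Azure AutoML is implemented; the real service does considerably more, such as automatic featurization and ensembling. The candidate models, the data, and the function names here are all hypothetical.

```python
import statistics
from typing import Callable

Point = tuple[float, float]          # (feature x, label y)
Model = Callable[[float], float]     # a fitted model maps x to a prediction
Trainer = Callable[[list[Point]], Model]


def fit_mean(train: list[Point]) -> Model:
    """Baseline: always predict the mean label."""
    m = statistics.fmean(y for _, y in train)
    return lambda x: m


def fit_linear(train: list[Point]) -> Model:
    """Ordinary least squares for y = a*x + b (assumes x values are not all equal)."""
    xs = [x for x, _ in train]
    mx, my = statistics.fmean(xs), statistics.fmean(y for _, y in train)
    a = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b


def make_knn(k: int) -> Trainer:
    """k-nearest-neighbours regressor; k is the hyperparameter being searched."""
    def fit(train: list[Point]) -> Model:
        def predict(x: float) -> float:
            nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
            return statistics.fmean(y for _, y in nearest)
        return predict
    return fit


def mse(model: Model, data: list[Point]) -> float:
    return statistics.fmean((model(x) - y) ** 2 for x, y in data)


def auto_select(train: list[Point], valid: list[Point],
                candidates: dict[str, Trainer]) -> tuple[str, dict[str, float]]:
    """Train every candidate, score it on validation data, and return the best name."""
    scores = {name: mse(fit(train), valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical data following y = 2x + 1; even x for training, odd x for validation.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
train, valid = data[0::2], data[1::2]
candidates = {"mean": fit_mean, "linear": fit_linear,
              "knn_k1": make_knn(1), "knn_k3": make_knn(3)}
best, scores = auto_select(train, valid, candidates)
print(best)  # linear
```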

Module 5: Model Management

Understand how registering a model tracks its versions.
Identify deployment options for real-time or batch inferencing.
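The two deployment styles differ mainly in how work arrives. A real-time endpoint scores one small payload per request and answers immediately. A batch job reads a large dataset, scores it in mini-batches, and collects the results for later use. The sketch below contrasts the two calling patterns around the same scoring function. It is plain Python, not the Azure ML endpoint API. The names `score`, `handle_request`, `chunks`, and `run_batch`, the weights, and the mini-batch size are hypothetical.

```python
import json
from collections.abc import Iterable, Iterator


def score(features: list[float]) -> float:
    """Stand-in for a registered model: a fixed, hypothetical linear function."""
    weights = [0.5, -1.0, 2.0]
    return sum(w * f for w, f in zip(weights, features))


def handle_request(body: str) -> str:
    """Real-time pattern: one JSON request in, one JSON prediction out, right away."""
    features = json.loads(body)["features"]
    return json.dumps({"prediction": score(features)})


def chunks(rows: Iterable[list[float]], size: int) -> Iterator[list[list[float]]]:
    """Group rows into mini-batches of at most `size` rows."""
    batch: list[list[float]] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_batch(rows: Iterable[list[float]], mini_batch_size: int = 1000) -> list[float]:
    """Batch pattern: score a large dataset mini-batch by mini-batch and collect all results."""
    results: list[float] = []
    for batch in chunks(rows, mini_batch_size):
        results.extend(score(r) for r in batch)
    return results


print(handle_request('{"features": [2.0, 1.0, 0.5]}'))  # {"prediction": 1.0}
print(len(run_batch([[1.0, 0.0, 0.0]] * 2500)))          # 2500
```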

Visual Overview of Resources

The ML Lifecycle Flow

Workspace Resource Architecture

Success Metrics

To demonstrate mastery of this curriculum, the learner must be able to:

Select Compute: Correctly choose a Compute Cluster for a job requiring multiple GPUs and a Compute Instance for a Jupyter Notebook development session.
Navigate Studio: Successfully locate the "Data" and "Compute" tabs in Azure ML Studio to provision resources.
Explain AutoML: Describe how AutoML automates the selection of algorithms and hyperparameters to save time.
Differentiate Tools: Explain when to use the Designer (visual workflow) versus Notebooks (code-first with PyTorch/TensorFlow).
Identify Deployment: Define the difference between a real-time endpoint for instant predictions and batch inferencing for large datasets.

Real-World Application

[!IMPORTANT] Why this matters: In a professional setting, data scientists are often reported to spend up to 80% of their time on data preparation and infrastructure management. Mastery of Azure's compute and data services allows teams to:

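Scale Efficiently: Configure compute clusters with a minimum of zero nodes so that they scale down when idle. You then pay only while nodes are provisioned for a job (plus a short idle period before scale-down), which can significantly reduce cloud costs.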
Maintain Reproducibility: By using versioned Data Assets, teams can ensure that the exact dataset used to train a model in 2023 can be referenced again in 2025 for auditing.
Collaborate Securely: Use a centralized Workspace to share models and data without passing around CSV files or login credentials, supporting the privacy and security principle of Responsible AI.
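The cost effect of zero-node clusters can be estimated with simple arithmetic. All figures below are hypothetical and are not Azure prices: the hourly rate per node, the job durations, and the idle timeout are assumptions. The sketch also assumes that jobs never overlap and ignores node start-up time. Under those assumptions, a scale-to-zero cluster bills for each job's run time plus the idle period before scale-down. An always-on node bills for every hour in the period.

```python
def always_on_cost(rate_per_hour: float, hours_in_period: float, nodes: int = 1) -> float:
    """Nodes that never scale down are billed for the whole period."""
    return rate_per_hour * hours_in_period * nodes


def scale_to_zero_cost(rate_per_hour: float, job_hours: list[float],
                       idle_minutes: float, nodes: int = 1) -> float:
    """Each (non-overlapping) job bills its run time plus the idle timeout before scale-down."""
    billed_hours = sum(h + idle_minutes / 60 for h in job_hours)
    return rate_per_hour * billed_hours * nodes


# Hypothetical month: 720 hours, 20 two-hour jobs, 15-minute idle timeout, rate of 3.0 per node-hour.
print(always_on_cost(3.0, 720))                   # 2160.0
print(scale_to_zero_cost(3.0, [2.0] * 20, 15))    # 135.0
```

With these assumed numbers the cluster is billed for 45 node-hours instead of 720. The real saving depends on how busy the cluster is and on the idle timeout you configure.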