Professional Documents
Culture Documents
Mslearn dp100 03
Mslearn dp100 03
Create a
designer
Provision an Azure Machine Learning workspace
pipeline
An Azure Machine Learning workspace provides a central place for managing all resources and assets you need
Add to train and manage your models. You can interact with the Azure Machine Learning workspace through the
transformations Studio, Python SDK, and Azure CLI.
Add model
training Create the workspace
components
To create the Azure Machine Learning workspace and a compute instance, you’ll use the Azure CLI. All necessary
Run the training commands are grouped in a Shell script for you to execute.
pipeline
1. In a browser, open the Azure portal at portal.azure.com, signing in with your Microsoft account.
Create an
inference 2. Select the [>_] (Cloud Shell) button at the top of the page to the right of the search box. This opens a Cloud
pipeline Shell pane at the bottom of the portal.
3. The first time you open the cloud shell, you will be asked to choose the type of shell you want to use (Bash
Deploy the
inference or PowerShell). Select Bash.
pipeline as a 4. If you are asked to create storage for your cloud shell, check that the correct subscription is specified and
web service
select Create storage. Wait for the storage to be created.
6. After the repo has been cloned, enter the following commands to change to the folder for this lab and run
the setup.sh script it contains:
Code Copy
cd mslearn-dp100
./setup.sh
7. Wait for the script to complete - this typically takes around 5-10 minutes.
1. Sign into Azure Machine Learning studio with the Microsoft credentials associated with your Azure
subscription, and select your Azure Machine Learning workspace.
2. In Azure Machine Learning studio, view the Compute page; and on the Compute instances tab, start your
compute instance if it is not already running. You will use this compute instance to test your trained model.
3. While the compute instance is starting, switch to the Compute clusters tab. If you don’t already have an
existing compute cluster, add a new compute cluster with the following settings. You will use this cluster to
run the training pipeline.
1. In Azure Machine Learning studio, view the Data page. Datasets represent specific data files or tables that
you plan to work with in Azure ML.
2. If you have previously created the diabetes dataset dataset, open it. Otherwise, create a new dataset from
web files, using the following settings, and then open it:
◦ Basic Info:
1. In Azure Machine Learning studio, view the Designer page and create a new pipeline.
2. Change the default pipeline name (Pipeline-Created-on-date) to Visual Diabetes Training by clicking the
⚙ icon at the right to open the Settings pane.
3. Note that you need to specify a compute target on which to run the pipeline. In the Settings pane, click
Select compute type and select Compute cluster, click Select Azure ML compute cluster and select your
computer cluster and close Settings.
4. On the left side of the designer, select the Data tab, and drag the diabetes dataset dataset onto the
canvas.
5. Select the diabetes dataset component on the canvas. Then right-click it, and select Preview data.
6. In the DataOutput pane, select the Profile tab.
7. Review the schema of the data, noting that you can see the distributions of the various columns as
histograms. Then close the visualization.
Add transformations
Before you can train a model, you typically need to apply some preprocessing transformations to the data.
1. In the pane on the left, select the Component tab, which contains a wide range of components you can
use to transform data before model training. You can search for components at the top of the pane.
2. Select Designer built-in assets only next to the search field. Then, search for the Normalize Data
component and drag it to the canvas, below the diabetes dataset component. Then connect the output
from the diabetes dataset component to the input of the Normalize Data component.
3. Select the Normalize Data component and view its settings, noting that it requires you to specify the
transformation method and the columns to be transformed. Then, leaving the transformation as ZScore,
edit the columns to includes the following column names:
◦ PlasmaGlucose
◦ DiastolicBloodPressure
◦ TricepsThickness
◦ SerumInsulin
◦ BMI
◦ DiabetesPedigree
Note: We’re normalizing the numeric columns put them on the same scale, and avoid columns with large
values dominating model training. You’d normally apply a whole bunch of pre-processing transformations
like this to prepare your data for training, but we’ll keep things simple in this exercise.
4. Now we’re ready to split the data into separate datasets for training and validation. In the pane on the left,
search for the Split Data component and drag it onto the canvas under the Normalize Data component.
Then connect the Transformed Dataset (left) output of the Normalize Data component to the input of the
Split Data component.
5. Select the Split Data component, and configure its settings as follows:
1. Search for the Train Model component and drag it onto the canvas, under the Split Data component.
Then connect the Result dataset1 (left) output of the Split Data component to the Dataset (right) input of
the Train Model component.
2. The model we’re training will predict the Diabetic value, so select the Train Model component and modify
its settings to set the Label column to Diabetic (matching the case and spelling exactly!)
3. The Diabetic label the model will predict is a binary column (1 for patients who have diabetes, 0 for
patients who don’t), so we need to train the model using a classification algorithm. Search for the Two-
Class Logistic Regression component and drag and drop it onto the canvas, to the left of the Split Data
component and above the Train Model component. Then connect its output to the Untrained model
(left) input of the Train Model component.
4. To test the trained model, we need to use it to score the validation dataset we held back when we split the
original data. Search for the Score Model component and drag and drop it onto the canvas, below the
Train Model component. Then connect the output of the Train Model component to the Trained model