Extend on-premises; leverage Azure networking services: implement load balancing using Azure Load Balancer and Azure
Traffic Manager; define DNS, DHCP, and IP addressing configuration; define static IP reservations; apply Network Security
Groups (NSGs) and User Defined Routes (UDRs); deploy Azure Application Gateway
Describe Azure point-to-site (P2S) and site-to-site (S2S) VPN, leverage Azure VPN and ExpressRoute in network architecture
Design VM deployments leveraging availability sets, fault domains, and update domains in Azure; select appropriate VM
SKUs
Author ARM templates; deploy ARM templates via the portal, PowerShell, and CLI
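To make the template-authoring objective concrete, a minimal ARM template that deploys a virtual network might look like the following sketch. The parameter name, address values, and apiVersion here are illustrative assumptions, not values from this guide:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "vnetName": { "type": "string", "defaultValue": "myVNet" }
  },
  "resources": [
    {
      "type": "Microsoft.Network/virtualNetworks",
      "apiVersion": "2016-03-30",
      "name": "[parameters('vnetName')]",
      "location": "[resourceGroup().location]",
      "properties": {
        "addressSpace": { "addressPrefixes": [ "10.0.0.0/16" ] },
        "subnets": [
          { "name": "default", "properties": { "addressPrefix": "10.0.0.0/24" } }
        ]
      }
    }
  ]
}
```

Such a template can be deployed from the portal, with PowerShell, or with the Azure CLI, as the objective above requires.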
Describe the differences between Active Directory on-premises and Azure Active Directory (Azure AD), programmatically
access Azure AD using Graph API, secure access to resources from Azure AD applications using OAuth and OpenID Connect
Use SAML claims to authenticate to on-premises resources, describe AD Connect synchronization, implement federated
identities using Active Directory Federation Services (ADFS)
Provide access to resources using identity providers, such as Microsoft account, Facebook, Google, and Yahoo!; manage
identity and access by using Azure AD B2C; implement Azure AD B2B
Identify security requirements for data in transit and data at rest; identify security requirements using Azure services,
including Azure Storage Encryption, Azure Disk Encryption, and Azure SQL Database TDE
Secure resource scopes, such as the ability to create VMs and Azure Web Apps; implement Azure RBAC standard roles;
design Azure RBAC custom roles
Identify, assess, and mitigate security risks by using Azure Security Center, Operations Management Suite, and other services
Design storage options for data, including Table Storage, SQL Database, DocumentDB, Blob Storage, MongoDB, and MySQL;
design security options for SQL Database or Azure Storage
Select the appropriate storage for performance, identify storage options for cloud services and hybrid scenarios with
compute on-premises and storage on Azure
70-534 Architecting Microsoft Azure Solutions
Design Azure App Service Web Apps, design custom web API, offload long-running applications using WebJobs, secure Web
API using Azure AD, design Web Apps for scalability and performance, deploy Azure Web Apps to multiple regions for high
availability, deploy Web Apps, create App Service plans, design Web Apps for business continuity, configure data replication
patterns, update Azure Web Apps with minimal downtime, back up and restore data, design for disaster recovery
Design Azure Mobile Services; consume Mobile Apps from cross-platform clients; integrate offline sync capabilities into an
application; extend Mobile Apps using custom code; implement Mobile Apps using Microsoft .NET or Node.js; secure Mobile
Apps using Azure AD; implement push notification services in Mobile Apps; send push notifications to all subscribers, specific
subscribers, or a segment of subscribers
Design high-performance computing (HPC) and other compute-intensive applications using Azure Services
Implement Azure Batch for scalable processing, design stateless components to accommodate scale, use Azure Scheduler
Design Azure architecture using Azure services, such as Azure AD, Azure App Service, API Management, Azure Cache, Azure
Search, Service Bus, Event Hubs, Stream Analytics, and IoT Hub; identify the appropriate use of Azure Machine Learning, big
data, Azure Media Services, and Azure Search services
Use a queue-centric pattern for development; select appropriate technology, such as Azure Storage Queues, Azure Service
Bus queues, topics, subscriptions, and Azure Event Hubs
Implement Azure Batch for compute-intensive tasks, use Azure WebJobs to implement background tasks, use Azure
Functions to implement event-driven actions, leverage Azure Scheduler to run processes at preset/recurring timeslots
Connect to on-premises data from Azure applications using Service Bus Relay, Hybrid Connections, or the Azure Web App
virtual private network (VPN) capability; identify constraints for connectivity with VPN; identify options for joining VMs to
domains or cloud services
Identify the Microsoft products and services for monitoring Azure solutions; leverage the capabilities of Azure Operations
Management Suite and Azure Application Insights for monitoring Azure solutions; leverage built-in Azure capabilities;
identify third-party monitoring tools, including open source; describe Azure architecture constructs, such as availability sets
and update domains, and how they impact a patching strategy; analyze logs by using the Azure Operations Management
Suite
Leverage the architectural capabilities of BC/DR, describe Hyper-V Replica and Azure Site Recovery (ASR), describe use cases
for Hyper-V Replica and ASR
Design and deploy Azure Backup and other Microsoft backup solutions for Azure, identify use cases where StorSimple and
System Center Data Protection Manager would be appropriate, design and deploy Azure Site Recovery
Create a PowerShell script specific to Azure, automate tasks by using the Azure Operations Management Suite
Evaluate when to use Azure Automation, Chef, Puppet, PowerShell, or Desired State Configuration (DSC).
The Azure Virtual Network service enables you to securely connect Azure resources to each other with virtual
networks (VNets). A VNet is a representation of your own network in the cloud. A VNet is a logical isolation of the
Azure cloud dedicated to your subscription. You can also connect VNets to your on-premises network. The following
picture shows some of the capabilities of the Azure Virtual Network service:
You can implement multiple VNets within each Azure subscription and Azure region. Each VNet is isolated from other
VNets. For each VNet you can:
Specify a custom private IP address space using public and private (RFC 1918) addresses. Azure assigns
resources connected to the VNet a private IP address from the address space you assign.
Segment the VNet into one or more subnets and allocate a portion of the VNet address space to each subnet.
Use Azure-provided name resolution or specify your own DNS server for use by resources connected to a
VNet. To learn more about name resolution in VNets, read the Name resolution for VMs and Cloud
Services article.
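The address-space and subnet points above can be sketched with Python's standard ipaddress module; the 10.0.0.0/16 range below is just an example RFC 1918 private space, not a value from this guide:

```python
import ipaddress

# An example VNet address space drawn from the RFC 1918 private ranges.
vnet = ipaddress.ip_network("10.0.0.0/16")

# Segment the VNet into /24 subnets; each subnet gets a slice of the space.
subnets = list(vnet.subnets(new_prefix=24))

print(len(subnets))     # number of /24 subnets that fit in a /16 (256)
print(subnets[0])       # first subnet, e.g. for front-end VMs: 10.0.0.0/24
print(vnet.is_private)  # True: the space is an RFC 1918 private range
```

Azure then assigns each resource connected to the VNet a private IP address from whichever subnet its NIC is placed in.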
All resources connected to a VNet have outbound connectivity to the Internet by default. The private IP address of the
resource is source network address translated (SNAT) to a public IP address by the Azure infrastructure. To learn more
about outbound Internet connectivity, read the Understanding outbound connections in Azure article. You can change
the default connectivity by implementing custom routing and traffic filtering.
To communicate inbound to Azure resources from the Internet, or to communicate outbound to the Internet without
SNAT, a resource must be assigned a public IP address. To learn more about public IP addresses, read the Public IP
addresses article.
You can connect several Azure resources to a VNet, such as Virtual Machines (VM), Cloud Services, App Service
Environments, and Virtual Machine Scale Sets. VMs connect to a subnet within a VNet through a network interface
(NIC). To learn more about NICs, read the Network interfaces article.
You can connect VNets to each other, enabling resources connected to either VNet to communicate with each other
across VNets. You can use either or both of the following options to connect VNets to each other:
Peering: Enables resources connected to different Azure VNets within the same Azure location to
communicate with each other. The bandwidth and latency across the VNets is the same as if the resources
were connected to the same VNet. To learn more about peering, read the Virtual network peering article.
VNet-to-VNet connection: Enables resources connected to different Azure VNets, within the same or different
Azure locations, to communicate with each other. Unlike peering, bandwidth is limited between VNets because traffic must flow through an
Azure VPN Gateway. To learn more about connecting VNets with a VNet-to-VNet connection, read
the Configure a VNet-to-VNet connection article.
You can connect your on-premises network to a VNet using any combination of the following options:
Point-to-site virtual private network (VPN): Established between a single PC connected to your network and
the VNet. This connection type is great if you're just getting started with Azure, or for developers, because it
requires little or no change to your existing network. The connection uses the SSTP protocol to provide
encrypted communication over the Internet between the PC and the VNet. The latency for a point-to-site VPN
is unpredictable, since the traffic traverses the Internet.
Site-to-site VPN: Established between your VPN device and an Azure VPN Gateway. This connection type
enables any on-premises resource you authorize to access a VNet. The connection is an IPSec/IKE VPN that
provides encrypted communication over the Internet between your on-premises device and the Azure VPN
gateway. The latency for a site-to-site connection is unpredictable, since the traffic traverses the Internet.
Azure ExpressRoute: Established between your network and Azure, through an ExpressRoute partner. This
connection is private. Traffic does not traverse the Internet. The latency for an ExpressRoute connection is
predictable, since traffic doesn't traverse the Internet.
To learn more about all the previous connection options, read the Connection topology diagrams article.
You can filter network traffic between subnets using either or both of the following options:
Network security groups (NSG): Each NSG can contain multiple inbound and outbound security rules that
enable you to filter traffic by source and destination IP address, port, and protocol. You can apply an NSG to
each NIC in a VM. You can also apply an NSG to the subnet a NIC, or other Azure resource, is connected to. To
learn more about NSGs, read the Network security groups article.
Network virtual appliances (NVA): An NVA is a VM running software that performs a network function, such as
a firewall. View a list of available NVAs in the Azure Marketplace. NVAs are also available that provide WAN
optimization and other network traffic functions. NVAs are typically used with user-defined or BGP routes.
You can also use an NVA to filter traffic between VNets.
Azure creates route tables that enable resources connected to any subnet in any VNet to communicate with each
other, by default. You can implement either or both of the following options to override the default routes Azure
creates:
User-defined routes: You can create custom route tables with routes that control where traffic is routed to for
each subnet. To learn more about user-defined routes, read the User-defined routes article.
BGP routes: If you connect your VNet to your on-premises network using an Azure VPN Gateway or
ExpressRoute connection, you can propagate BGP routes to your VNets.
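When routes overlap, Azure selects the most specific (longest) matching prefix, which is how a user-defined route can override a system route. That selection rule can be sketched in Python; the route table below is hypothetical:

```python
import ipaddress

# Hypothetical effective routes: (prefix, next hop type).
routes = [
    ("0.0.0.0/0",   "Internet"),          # system default route
    ("10.0.0.0/16", "VNet"),              # VNet address space
    ("10.0.1.0/24", "VirtualAppliance"),  # user-defined route to an NVA
]

def next_hop(destination, routes):
    """Return the next hop for a destination IP using longest prefix match."""
    ip = ipaddress.ip_address(destination)
    matches = [(ipaddress.ip_network(p), hop) for p, hop in routes
               if ip in ipaddress.ip_network(p)]
    # The most specific route (largest prefix length) wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(next_hop("10.0.1.5", routes))  # the /24 UDR overrides the VNet route
print(next_hop("8.8.8.8", routes))   # falls through to the default route
```

In the first call the /24 user-defined route wins over both the VNet route and the default route, so traffic is steered to the virtual appliance.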
Pricing
There is no charge for virtual networks, subnets, route tables, or network security groups. Outbound Internet
bandwidth usage, public IP addresses, virtual network peering, VPN Gateways, and ExpressRoute each have their own
pricing structures. View the Virtual network, VPN Gateway, and ExpressRoute pricing pages for more information.
Creating a VNet to experiment with is easy enough, but chances are, you will deploy multiple VNets over time to
support the production needs of your organization. With some planning and design, you will be able to deploy VNets
and connect the resources you need more effectively. If you are not familiar with VNets, it's recommended that
you learn about VNets and how to deploy one before proceeding.
Plan
A thorough understanding of Azure subscriptions, regions, and network resources is critical for success. You can use
the list of considerations below as a starting point. Once you understand those considerations, you can define the
requirements for your network design.
Considerations
Everything you create in Azure is composed of one or more resources. A virtual machine (VM) is a resource,
the network interface (NIC) used by a VM is a resource, the public IP address used by a NIC is a
resource, and the VNet the NIC is connected to is a resource.
You create resources within an Azure region and subscription. Resources can only be connected to a VNet
that exists in the same region and subscription they are in.
You can connect VNets to each other by using an Azure VPN Gateway. You can also connect VNets across
regions and subscriptions this way.
You can connect VNets to your on-premises network by using one of the connectivity options available in
Azure.
Different resources can be grouped together in resource groups, making it easier to manage the resources as a
unit. A resource group can contain resources from multiple regions, as long as the resources belong to the
same subscription.
Define requirements
Use the questions below as a starting point for your Azure network design.
3. Do you need to provide communication between your Azure VNet(s) and your on-premises datacenter(s)?
4. How many Infrastructure as a Service (IaaS) VMs, cloud services roles, and web apps do you need for your
solution?
5. Do you need to isolate traffic based on groups of VMs (e.g., front-end web servers and back-end database
servers)?
VNet and subnets resources help define a security boundary for workloads running in Azure. A VNet is characterized
by a collection of address spaces, defined as CIDR blocks.
Note
Network administrators are familiar with CIDR notation. If you are not familiar with CIDR, learn more about it.
location: Azure location (also referred to as region). Must be one of the valid Azure locations.
addressSpace: Collection of address prefixes that make up the VNet, in CIDR notation. Must be an array of valid CIDR address blocks, including public IP address ranges.
A subnet is a child resource of a VNet, and helps define segments of address spaces within a CIDR block, using IP
address prefixes. NICs can be added to subnets, and connected to VMs, providing connectivity for various workloads.
location: Azure location (also referred to as region). Must be one of the valid Azure locations.
addressPrefix: Single address prefix that makes up the subnet, in CIDR notation. Must be a single CIDR block that is part of one of the VNet's address spaces.
Name resolution
By default, your VNet uses Azure-provided name resolution to resolve names inside the VNet, and on the public
Internet. However, if you connect your VNets to your on-premises data centers, you need to provide your own DNS
server to resolve names between your networks.
Limits
Review the networking limits in the Azure limits article to ensure that your design doesn't conflict with any of the
limits. Some limits can be increased by opening a support ticket.
You can use Azure RBAC to control the level of access different users may have to different resources in Azure. That
way you can segregate the work done by your team based on their needs.
As far as virtual networks are concerned, users in the Network Contributor role have full control over Azure Resource
Manager virtual network resources. Similarly, users in the Classic Network Contributor role have full control over
classic virtual network resources.
Note
You can also create your own roles to separate your administrative needs.
Designing VNets
Once you know the answers to the questions in the Plan section, review the following before defining your VNets.
VMs that need to be placed in different Azure locations. VNets in Azure are regional. They cannot span
locations. Therefore you need at least one VNet for each Azure location you want to host VMs in.
Workloads that need to be completely isolated from one another. You can create separate VNets, even
ones that use the same IP address spaces, to isolate different workloads from one another.
Keep in mind that the limits you see above are per region, per subscription. That means you can use multiple
subscriptions to increase the limit of resources you can maintain in Azure. You can use a site-to-site VPN, or an
ExpressRoute circuit, to connect VNets in different subscriptions.
The table below shows some common design patterns for using subscriptions and VNets.
Number of subnets
Not enough private IP addresses for all NICs in a subnet. If your subnet address space does not contain
enough IP addresses for the number of NICs in the subnet, you need to create multiple subnets. Keep in mind
that Azure reserves 5 private IP addresses from each subnet that cannot be used: the first and last addresses
of the address space (the network and broadcast addresses) and 3 addresses used internally (for DHCP
and DNS purposes).
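Given the five reserved addresses described above, the usable capacity of a subnet can be computed directly; a quick sketch:

```python
import ipaddress

def usable_addresses(prefix):
    """Usable private IPs in an Azure subnet: total addresses minus the 5 reserved."""
    subnet = ipaddress.ip_network(prefix)
    return subnet.num_addresses - 5

print(usable_addresses("10.0.0.0/24"))  # 256 - 5 = 251 usable addresses
print(usable_addresses("10.0.0.0/29"))  # 8 - 5 = 3 usable addresses
```

This is why very small subnets leave little room: a /29 yields only 3 assignable addresses after Azure's reservations.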
Security. You can use subnets to separate groups of VMs from one another for workloads that have a multi-
layer structure, and apply different network security groups (NSGs) for those subnets.
Hybrid connectivity. You can use VPN gateways and ExpressRoute circuits to connect your VNets to one
another, and to your on-premises data center(s). VPN gateways and ExpressRoute circuits require a subnet of
their own to be created.
Virtual appliances. You can use a virtual appliance, such as a firewall, WAN accelerator, or VPN gateway in an
Azure VNet. When you do so, you need to route traffic to those appliances and isolate them in their own
subnet.
The table below shows some common design patterns for using subnets.
Sample design
To illustrate the application of the information in this article, consider the following scenario.
You work for a company that has two data centers in North America and two data centers in Europe. You identified six
different customer-facing applications, maintained by two different business units, that you want to migrate to Azure as a
pilot. The basic architecture for the applications is as follows:
App1, App2, App3, and App4 are web applications hosted on Linux servers running Ubuntu. Each application
connects to a separate application server that hosts RESTful services on Linux servers. The RESTful services
connect to a back end MySQL database.
App5 and App6 are web applications hosted on Windows servers running Windows Server 2012 R2. Each
application connects to a back end SQL Server database.
All apps are currently hosted in one of the company's data centers in North America.
You need to design a virtual network solution that meets the following requirements:
Each business unit should not be affected by resource consumption of other business units.
You should minimize the number of VNets and subnets to make management easier.
Each business unit should have a single test/development VNet used for all applications.
Each application is hosted in 2 different Azure data centers per continent (North America and Europe).
Each application can be accessed by customers over the Internet using HTTP.
Each application can be accessed by users connected to the on-premises data centers by using an encrypted
tunnel.
The company's networking group should have full control over the VNet configuration.
Developers in each business unit should only be able to deploy VMs to existing subnets.
The databases in each location should replicate to other Azure locations once a day.
Each application should use 5 front end web servers, 2 application servers (when necessary), and 2 database
servers.
Plan
You should start your design planning by answering the questions in the Define requirements section as shown below.
2 locations in North America, and 2 locations in Europe. You should pick those based on the physical location of your
existing on-premises data centers. That way, the connection from your physical locations to Azure will have lower
latency.
3. Do you need to provide communication between your Azure VNet(s) and your on-premises data center(s)?
Yes, because users connected to the on-premises data centers must be able to access the applications through an
encrypted tunnel.
200 IaaS VMs. App1, App2, App3, and App4 require 5 web servers each, 2 application servers each, and 2 database
servers each. That's a total of 9 IaaS VMs per application, or 36 IaaS VMs. App5 and App6 require 5 web servers and 2
database servers each. That's a total of 7 IaaS VMs per application, or 14 IaaS VMs. Therefore, you need 50 IaaS VMs
for all applications in each Azure region. Since we need to use 4 regions, there will be 200 IaaS VMs.
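The arithmetic above can be double-checked in a few lines of Python (application names and tier counts come from the scenario):

```python
# Per-application VM counts from the scenario requirements.
apps = {
    # app: (web servers, application servers, database servers)
    "App1": (5, 2, 2), "App2": (5, 2, 2), "App3": (5, 2, 2), "App4": (5, 2, 2),
    "App5": (5, 0, 2), "App6": (5, 0, 2),
}

vms_per_region = sum(sum(tiers) for tiers in apps.values())
regions = 4  # two Azure regions per continent: North America and Europe
total_vms = vms_per_region * regions

print(vms_per_region)  # 4 apps * 9 VMs + 2 apps * 7 VMs = 50 VMs per region
print(total_vms)       # 50 * 4 regions = 200 IaaS VMs overall
```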
You will also need to provide DNS servers in each VNet, or in your on-premises data centers, to resolve names between
your Azure IaaS VMs and your on-premises network.
5. Do you need to isolate traffic based on groups of VMs (e.g., front-end web servers and back-end database
servers)?
Yes. Each application should be completely isolated from each other, and each application layer should also be
isolated.
No. Virtual appliances can be used to provide more control over traffic flow, including more detailed data plane
logging.
Yes. The networking team needs full control on the virtual networking settings, while developers should only be able
to deploy their VMs to pre-existing subnets.
Design
Your design should specify subscriptions, VNets, subnets, and NSGs. We will discuss NSGs here, but you
should learn more about NSGs before finishing your design.
Each business unit should not be affected by resource consumption of other business units.
Each business unit should have a single test/development VNet used for all applications.
Each application is hosted in 2 different Azure data centers per continent (North America and Europe).
Based on those requirements, you need a subscription for each business unit. That way, consumption of resources
from a business unit will not count towards limits for other business units. And since you want to minimize the
number of VNets, you should consider using the one subscription per business unit, two VNets per group of
apps pattern as seen below.
You also need to specify the address space for each VNet. Since you need connectivity between the on-premises data
centers and the Azure regions, the address space used for Azure VNets cannot clash with the on-premises network,
and the address space used by each VNet should not clash with other existing VNets. You could use the address
spaces in the table below to satisfy these requirements.
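The non-overlap requirement can be checked programmatically. The sketch below assumes a hypothetical on-premises range of 172.16.0.0/16 and example VNet address spaces; the actual values would come from your own address plan:

```python
import ipaddress

# Hypothetical address plan: on-premises range plus one space per VNet.
on_premises = ipaddress.ip_network("172.16.0.0/16")
vnet_spaces = {
    "ProdBU1US1": ipaddress.ip_network("10.0.0.0/16"),
    "ProdBU1US2": ipaddress.ip_network("10.1.0.0/16"),
    "ProdBU1EU1": ipaddress.ip_network("10.2.0.0/16"),
}

# Every VNet space must avoid the on-premises range and every other VNet.
spaces = list(vnet_spaces.values())
clashes = [
    (a, b)
    for i, a in enumerate(spaces)
    for b in spaces[i + 1:] + [on_premises]
    if a.overlaps(b)
]

print(clashes)  # an empty list means the plan satisfies the requirement
```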
Each application can be accessed by customers over the Internet using HTTP.
Each application can be accessed by users connected to the on-premises data centers by using an encrypted
tunnel.
The databases in each location should replicate to other Azure locations once a day.
Based on those requirements, you could use one subnet per application layer, and use NSGs to filter traffic per
application. That way, you only have 3 subnets in each VNet (front end, application layer, and data layer) and one NSG
per application per subnet. In this case, you should consider using the one subnet per application layer, NSGs per
app design pattern. The figure below shows the use of the design pattern representing the ProdBU1US1 VNet.
However, you also need to create an extra subnet for the VPN connectivity between the VNets, and your on-premises
data centers. And you need to specify the address space for each subnet. The figure below shows a sample solution
for ProdBU1US1 VNet. You would replicate this scenario for each VNet. Each color represents a different application.
Access Control
The company's networking group should have full control over the VNet configuration.
Developers in each business unit should only be able to deploy VMs to existing subnets.
Based on those requirements, you could add users from the networking team to the built-in Network Contributor role
in each subscription; and create a custom role for the application developers in each subscription giving them rights to
add VMs to existing subnets.
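One way to express that developer role is an Azure RBAC custom role definition like the following sketch. The role name and assignable scope are placeholders; the Microsoft.Network/virtualNetworks/subnets/join/action permission is what allows a NIC to be placed into an existing subnet:

```json
{
  "Name": "Virtual Machine Developer",
  "Description": "Can deploy VMs into existing subnets only.",
  "Actions": [
    "Microsoft.Compute/virtualMachines/*",
    "Microsoft.Network/networkInterfaces/*",
    "Microsoft.Network/virtualNetworks/subnets/join/action",
    "Microsoft.Storage/storageAccounts/read"
  ],
  "NotActions": [],
  "AssignableScopes": [ "/subscriptions/<subscription-id>" ]
}
```

Because the role omits Microsoft.Network/virtualNetworks write actions, developers assigned it cannot change the VNet configuration itself, which stays under the networking group's control.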
VPN Gateway
A VPN gateway is a type of virtual network gateway that sends encrypted traffic across a public connection to an on-
premises location. You can also use VPN gateways to send encrypted traffic between Azure virtual networks over the
Microsoft network. To send encrypted network traffic between your Azure virtual network and your on-premises site,
you must create a VPN gateway for your virtual network.
Each virtual network can have only one VPN gateway, however, you can create multiple connections to the same VPN
gateway. An example of this is a Multi-Site connection configuration. When you create multiple connections to the
same VPN gateway, all VPN tunnels, including Point-to-Site VPNs, share the bandwidth that is available for the
gateway.
A virtual network gateway is composed of two or more virtual machines that are deployed to a specific subnet called
the GatewaySubnet. The VMs that are located in the GatewaySubnet are created when you create the virtual network
gateway. Virtual network gateway VMs are configured to contain routing tables and gateway services specific to the
gateway. You can't directly configure the VMs that are part of the virtual network gateway and you should never
deploy additional resources to the GatewaySubnet.
When you create a virtual network gateway using the gateway type 'Vpn', it creates a specific type of virtual network
gateway that encrypts traffic: a VPN gateway. A VPN gateway can take up to 45 minutes to create. This is because the
VMs for the VPN gateway are being deployed to the GatewaySubnet and configured with the settings that you
specified. The Gateway SKU that you select determines how powerful the VMs are.
Gateway SKUs
When you create a virtual network gateway, you need to specify the gateway SKU that you want to use. Select the
SKUs that satisfy your requirements based on the types of workloads, throughputs, features, and SLAs.
Note
The new VPN gateway SKUs (VpnGw1, VpnGw2, and VpnGw3) are supported for the Resource Manager deployment
model only. Classic virtual networks should continue to use the old SKUs. For more information about the old gateway
SKUs, see Working with virtual network gateway SKUs (old).
Throughput is based on measurements of multiple tunnels aggregated through a single gateway. It is not a
guaranteed throughput due to Internet traffic conditions and your application behaviors.
SLA (Service Level Agreement) information can be found on the SLA page.
Due to the differences in SLAs and feature sets, we recommend the following SKUs for production vs. dev-test:
Workload SKUs
If you are using the old SKUs, the recommended production SKUs are Standard and HighPerformance. For
information on the old SKUs, see Gateway SKUs (old).
The new gateway SKUs streamline the feature sets offered on the gateways:
SKU Features
2. When working with the old gateway SKUs, you can resize between Basic, Standard, and HighPerformance
SKUs.
3. You cannot resize from Basic/Standard/HighPerformance SKUs to the new VpnGw1/VpnGw2/VpnGw3 SKUs.
You must, instead, migrate to the new SKUs.
Note
The VPN Gateway Public IP address will change when migrating from an old SKU to a new SKU.
You can't resize your Azure VPN gateways directly between the old SKUs and the new SKU families. If you have VPN
gateways in the Resource Manager deployment model that are using the older version of the SKUs, you can migrate to
the new SKUs. To migrate, you delete the existing VPN gateway for your virtual network, then create a new one.
Migration workflow:
4. Update your on-premises VPN devices with the new VPN gateway IP address (for Site-to-Site connections).
5. Update the gateway IP address value for any VNet-to-VNet local network gateways that will connect to this
gateway.
6. Download new client VPN configuration packages for P2S clients connecting to the virtual network through
this VPN gateway.
A VPN gateway connection relies on multiple resources that are configured with specific settings. Most of the
resources can be configured separately, although they must be configured in a certain order in some cases.
Settings
The settings that you chose for each resource are critical to creating a successful connection. For information about
individual resources and settings for VPN Gateway, see About VPN Gateway settings. You'll find information to help
you understand gateway types, VPN types, connection types, gateway subnets, local network gateways, and various
other resource settings that you may want to consider.
Deployment tools
You can start out creating and configuring resources using one configuration tool, such as the Azure portal. You can
then later decide to switch to another tool, such as PowerShell, to configure additional resources, or modify existing
resources when applicable. Currently, you can't configure every resource and resource setting in the Azure portal. The
instructions in the articles for each connection topology specify when a specific configuration tool is needed.
Deployment model
When you configure a VPN gateway, the steps you take depend on the deployment model that you used to create
your virtual network. For example, if you created your VNet using the classic deployment model, you use the
guidelines and instructions for the classic deployment model to create and configure your VPN gateway settings. For
more information about deployment models, see Understanding Resource Manager and classic deployment models.
It's important to know that there are different configurations available for VPN gateway connections. You need to
determine which configuration best fits your needs. In the sections below, you can view information and topology
diagrams about the available VPN gateway connections.
Use the diagrams and descriptions to help select the connection topology to match your requirements. The diagrams
show the main baseline topologies, but it's possible to build more complex configurations using the diagrams as a
guideline.
Site-to-Site
A Site-to-Site (S2S) VPN gateway connection is a connection over IPsec/IKE (IKEv1 or IKEv2) VPN tunnel. This type of
connection requires a VPN device located on-premises that has a public IP address assigned to it and is not located
behind a NAT. S2S connections can be used for cross-premises and hybrid configurations.
Multi-Site
This type of connection is a variation of the Site-to-Site connection. You create more than one VPN connection from
your virtual network gateway, typically connecting to multiple on-premises sites. When working with multiple
connections, you must use a RouteBased VPN type (known as a dynamic gateway when working with classic VNets).
Because each virtual network can only have one VPN gateway, all connections through the gateway share the
available bandwidth. This is often called a "multi-site" connection.
Point-to-Site
A Point-to-Site (P2S) VPN gateway connection allows you to create a secure connection to your virtual network from
an individual client computer. P2S is a VPN connection over SSTP (Secure Socket Tunneling Protocol). Unlike S2S
connections, P2S connections do not require an on-premises public-facing IP address or a VPN device. You establish
the VPN connection by starting it from the client computer. This solution is useful when you want to connect to your
VNet from a remote location, such as from home or a conference, or when you only have a few clients that need to
connect to a VNet. P2S connections can be used with S2S connections through the same VPN gateway, as long as all
the configuration requirements for both connections are compatible.
VNet-to-VNet
Connecting a virtual network to another virtual network (VNet-to-VNet) is similar to connecting a VNet to an
on-premises site location. Both connectivity types use a VPN gateway to provide a secure tunnel using IPsec/IKE. You can
even combine VNet-to-VNet communication with multi-site connection configurations. This lets you establish network
topologies that combine cross-premises connectivity with inter-virtual network connectivity.
Azure currently has two deployment models: classic and Resource Manager. If you have been using Azure for some
time, you probably have Azure VMs and instance roles running in a classic VNet. Your newer VMs and role instances
may be running in a VNet created in Resource Manager. You can create a connection between the VNets to allow the
resources in one VNet to communicate directly with resources in another.
VNet peering
You may be able to use VNet peering to create your connection, as long as your virtual network meets certain
requirements. VNet peering does not use a virtual network gateway. For more information, see VNet peering.
ExpressRoute
Microsoft Azure ExpressRoute lets you extend your on-premises networks into the Microsoft cloud over a dedicated
private connection facilitated by a connectivity provider. With ExpressRoute, you can establish connections to
Microsoft cloud services, such as Microsoft Azure, Office 365, and CRM Online. Connectivity can be from an any-to-
any (IP VPN) network, a point-to-point Ethernet network, or a virtual cross-connection through a connectivity provider
at a co-location facility.
ExpressRoute connections do not go over the public Internet. This allows ExpressRoute connections to offer more
reliability, faster speeds, lower latencies, and higher security than typical connections over the Internet.
An ExpressRoute connection does not use a VPN gateway, although it does use a virtual network gateway as part of its
required configuration. In an ExpressRoute connection, the virtual network gateway is configured with the gateway
type 'ExpressRoute', rather than 'Vpn'. For more information about ExpressRoute, see the ExpressRoute technical
overview.
ExpressRoute is a direct, dedicated connection from your WAN (not over the public Internet) to Microsoft Services,
including Azure. Site-to-Site VPN traffic travels encrypted over the public Internet. Being able to configure Site-to-Site
VPN and ExpressRoute connections for the same virtual network has several advantages.
You can configure a Site-to-Site VPN as a secure failover path for ExpressRoute, or use Site-to-Site VPNs to connect to
sites that are not part of your network, but that are connected through ExpressRoute. Notice that this configuration
requires two virtual network gateways for the same virtual network, one using the gateway type 'Vpn', and the other
using the gateway type 'ExpressRoute'.
Pricing
You pay for two things: the hourly compute costs for the virtual network gateway, and the egress data transfer from
the virtual network gateway. Pricing information can be found on the Pricing page.
If you are sending traffic to your on-premises VPN device, it is charged at the Internet egress data transfer rate.
If you are sending traffic between virtual networks in different regions, the pricing is based on the region.
Traffic between virtual networks in the same region incurs no data transfer cost.
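As a rough illustration of the cost model above, the monthly charge is the gateway's hourly compute cost plus egress data transfer. The sketch below uses hypothetical placeholder rates, not actual Azure prices; consult the Pricing page for real figures.

```python
# Sketch of the VPN gateway cost model described above.
# RATES ARE HYPOTHETICAL PLACEHOLDERS, not actual Azure prices.
GATEWAY_HOURLY_RATE = 0.05   # assumed $/hour for the virtual network gateway
EGRESS_RATE_PER_GB = 0.08    # assumed $/GB for egress to on-premises (Internet rate)

def estimate_monthly_cost(hours: float, egress_gb: float) -> float:
    """Hourly compute cost plus egress data transfer. VNet-to-VNet
    traffic within the same region incurs no data cost, so it does
    not appear here."""
    return hours * GATEWAY_HOURLY_RATE + egress_gb * EGRESS_RATE_PER_GB

# A gateway running a full 730-hour month pushing 100 GB to on-premises:
cost = estimate_monthly_cost(730, 100)
```

The two terms map directly to the two billed items: gateway uptime and egress transfer.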
Planning
If you want to connect your on-premises sites securely to a virtual network, you have three different ways to do so:
Site-to-Site, Point-to-Site, and ExpressRoute. Compare the different cross-premises connections that are available. The
option you choose can depend on various considerations, such as:
Do you want to communicate over the public Internet via secure VPN, or over a private connection?
Are you connecting just a few computers, or do you want a persistent connection for your site?
What type of VPN gateway is required for the solution you want to create?
Planning table
The following comparison can help you decide the best connectivity option for your solution.
Point-to-Site: typically < 100 Mbps aggregate bandwidth. Typical use case: prototyping and dev/test/lab scenarios for
cloud services and virtual machines.
Site-to-Site: typically < 100 Mbps aggregate bandwidth. Typical use case: dev/test/lab scenarios and small-scale
production workloads for cloud services and virtual machines.
ExpressRoute: 50 Mbps, 100 Mbps, 200 Mbps, 500 Mbps, 1 Gbps, 2 Gbps, 5 Gbps, or 10 Gbps. Typical use case: access to
all Azure services (validated list), enterprise-class and mission-critical workloads, backup, Big Data, and Azure as a
DR site.
Gateway SKUs
Throughput is based on measurements of multiple tunnels aggregated through a single gateway. It is not a
guaranteed throughput due to Internet traffic conditions and your application behaviors.
SLA (Service Level Agreement) information can be found on the SLA page.
Workflow
The following list outlines the common workflow for cloud connectivity:
1. Design and plan your connectivity topology and list the address spaces for all networks you want to connect.
2. Create a virtual network in Azure, if you don't already have one.
3. Create a VPN gateway for the virtual network.
4. Create and configure connections to on-premises networks or other virtual networks (as needed).
5. Create and configure a Point-to-Site connection for your Azure VPN gateway (as needed).
Design
Connection topologies
Start by looking at the diagrams in the About VPN Gateway article. The article contains basic diagrams, the
deployment models for each topology (Resource Manager or classic), and which deployment tools you can use to
deploy your configuration.
Design basics
The following sections discuss the VPN gateway basics. Also consider the limits that apply to Azure networking services.
About subnets
When you are creating connections, you must consider your subnet ranges: address ranges cannot overlap. Subnets
overlap when one virtual network or on-premises location contains address space that the other location also contains.
This means the network engineers for your local on-premises network need to carve out a range for your Azure IP
addressing space and subnets, using address space that is not already in use on the on-premises network.
Avoiding overlapping subnets is also important when you are working with VNet-to-VNet connections. If your subnets
overlap and an IP address exists in both the sending and destination VNets, VNet-to-VNet connections fail. Azure can't
route the data to the other VNet because the destination address is part of the sending VNet.
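The overlap rule above can be checked programmatically. A minimal sketch using Python's standard ipaddress module (the address ranges below are examples, not values from this guide):

```python
import ipaddress

def ranges_overlap(cidr_a: str, cidr_b: str) -> bool:
    """True if the two address ranges share any addresses, which is
    exactly the condition that breaks VNet-to-VNet and S2S routing."""
    a = ipaddress.ip_network(cidr_a)
    b = ipaddress.ip_network(cidr_b)
    return a.overlaps(b)

# An on-premises 10.0.0.0/16 overlaps a VNet carved from 10.0.1.0/24 ...
assert ranges_overlap("10.0.0.0/16", "10.0.1.0/24")
# ... but not a VNet carved out of unused address space.
assert not ranges_overlap("10.0.0.0/16", "172.16.0.0/24")
```

Running such a check before creating connections is a cheap way to catch the routing failure described above.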
VPN gateways require a specific subnet called a gateway subnet. Every gateway subnet must be named GatewaySubnet
to work properly; don't give it any other name, and don't deploy VMs or anything else to it. See Gateway Subnets.
The local network gateway typically refers to your on-premises location. In the classic deployment model, the local
network gateway is referred to as a Local Network Site. When you configure a local network gateway, you give it a
name, specify the public IP address of the on-premises VPN device, and specify the address prefixes that are in the on-
premises location. Azure looks at the destination address prefixes for network traffic, consults the configuration that
you have specified for the local network gateway, and routes packets accordingly. You can modify these address
prefixes as needed. For more information, see Local network gateways.
Selecting the correct gateway type for your topology is critical. If you select the wrong type, your gateway won't work
properly. The gateway type specifies how the gateway itself connects and is a required configuration setting for the
Resource Manager deployment model. The gateway types are:
Vpn
ExpressRoute
Each configuration requires a specific connection type. The connection types are:
IPsec
Vnet2Vnet
ExpressRoute
VPNClient
Each configuration requires a specific VPN type. If you are combining two configurations, such as creating a Site-to-
Site connection and a Point-to-Site connection to the same VNet, you must use a VPN type that satisfies both
connection requirements.
PolicyBased: PolicyBased VPNs were previously called static routing gateways in the classic deployment model.
Policy-based VPNs encrypt and direct packets through IPsec tunnels based on the IPsec policies configured
with the combinations of address prefixes between your on-premises network and the Azure VNet. The policy
(or traffic selector) is usually defined as an access list in the VPN device configuration. The value for a
PolicyBased VPN type is PolicyBased. When using a PolicyBased VPN, keep in mind the following limitations:
o PolicyBased VPNs can only be used on the Basic gateway SKU. This VPN type is not compatible with
other gateway SKUs.
o You can only use PolicyBased VPNs for S2S connections, and only for certain configurations. Most VPN
Gateway configurations require a RouteBased VPN.
RouteBased: RouteBased VPNs were previously called dynamic routing gateways in the classic deployment
model. RouteBased VPNs use "routes" in the IP forwarding or routing table to direct packets into their
corresponding tunnel interfaces. The tunnel interfaces then encrypt or decrypt the packets in and out of the
tunnels. The policy (or traffic selector) for RouteBased VPNs are configured as any-to-any (or wild cards). The
value for a RouteBased VPN type is RouteBased.
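The compatibility rules above (P2S requires RouteBased; PolicyBased works only for certain S2S configurations) can be sketched as a set intersection. This is a hypothetical helper for reasoning about the choice, not an Azure API:

```python
# VPN types each connection kind accepts, per the limitations above.
# PolicyBased is only valid for certain S2S configurations on the Basic SKU.
SUPPORTED = {
    "S2S": {"RouteBased", "PolicyBased"},
    "P2S": {"RouteBased"},
    "Multi-Site": {"RouteBased"},
    "VNet-to-VNet": {"RouteBased"},
}

def compatible_vpn_types(connections):
    """Return the VPN types that satisfy *all* requested connections,
    since one gateway must serve every connection on the VNet."""
    types = {"RouteBased", "PolicyBased"}
    for conn in connections:
        types &= SUPPORTED[conn]
    return types

# Combining S2S and P2S on the same gateway forces a RouteBased VPN type:
assert compatible_vpn_types(["S2S", "P2S"]) == {"RouteBased"}
```

The intersection mirrors the rule in the text: when you combine configurations, the VPN type must satisfy both sets of connection requirements.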
The VPN type must map to the connection configuration: make sure the VPN type for your gateway matches the
configuration that you want to create. In the classic deployment model, RouteBased is called Dynamic and PolicyBased
is called Static.
To configure a Site-to-Site connection, regardless of deployment model, you need a compatible VPN device (and
someone able to configure it) and an externally facing public IPv4 address for that device.
Forced tunneling
Forced tunneling lets you redirect, or "force", all Internet-bound traffic back to your on-premises location via a
Site-to-Site VPN tunnel for inspection and auditing. This is a critical security requirement for most enterprise IT policies.
Without forced tunneling, Internet-bound traffic from your VMs in Azure will always traverse from Azure network
infrastructure directly out to the Internet, without the option to allow you to inspect or audit the traffic. Unauthorized
Internet access can potentially lead to information disclosure or other types of security breaches.
A forced tunneling connection can be configured in both deployment models and by using different tools. For more
information, see Configure forced tunneling.
Azure Load Balancer delivers high availability and network performance to your applications. It is a Layer 4 (TCP, UDP)
load balancer that distributes incoming traffic among healthy instances of services defined in a load-balanced set.
You can use it to load balance incoming Internet traffic to virtual machines; this configuration is known as
Internet-facing load balancing.
Virtual machines deployed within a cloud service boundary can be grouped to use a load balancer. In this model, a
public IP address and a fully qualified domain name (FQDN) are assigned to a cloud service. The load balancer does
port translation and load balances the network traffic by using the public IP address for the cloud service.
Load-balanced traffic is defined by endpoints. Port translation endpoints have a one-to-one relationship between the
public-assigned port of the public IP address and the local port assigned to the service on a specific virtual machine.
Load balancing endpoints have a one-to-many relationship between the public IP address and the local ports assigned
to the services on the virtual machines in the cloud service.
The domain label for the public IP address that the load balancer uses for this deployment model is <cloud service
name>.cloudapp.net. The following graphic shows the Azure Load Balancer in this model.
In the Resource Manager deployment model there is no need to create a Cloud service. The load balancer is created
to explicitly route traffic among multiple virtual machines.
A public IP address is an individual resource that has a domain label (DNS name). The public IP address is associated
with the load balancer resource. Load balancer rules and inbound NAT rules use the public IP address as the Internet
endpoint for the resources that are receiving load-balanced network traffic.
A private or public IP address is assigned to the network interface resource attached to a virtual machine. Once a
network interface is added to a load balancer's back-end IP address pool, the load balancer is able to send load-
balanced network traffic based on the load-balanced rules that are created.
The following graphic shows the Azure Load Balancer in this model.
The load balancer can be managed through Resource Manager-based templates, APIs, and tools. To learn more about
Resource Manager, see the Resource Manager overview.
Hash-based distribution
Azure Load Balancer uses a hash-based distribution algorithm. By default, it uses a 5-tuple hash composed of source
IP, source port, destination IP, destination port, and protocol type to map traffic to available servers. It provides
stickiness only within a transport session. Packets in the same TCP or UDP session will be directed to the same
instance behind the load-balanced endpoint. When the client closes and reopens the connection or starts a new
session from the same source IP, the source port changes. This may cause the traffic to go to a different endpoint in a
different datacenter.
For more details, see Load balancer distribution mode. The following graphic shows the hash-based distribution:
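The 5-tuple stickiness described above can be sketched as follows. This uses an illustrative hash, not Azure's actual algorithm, and the backend names are hypothetical:

```python
import hashlib

def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol, backends):
    """Map a flow's 5-tuple to one backend instance. Any change in the
    tuple, e.g. a new source port after the client reconnects, may
    select a different instance."""
    key = f"{src_ip}:{src_port}:{dst_ip}:{dst_port}:{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]

backends = ["vm0", "vm1", "vm2"]
# Packets of the same TCP session always land on the same instance:
first = pick_backend("203.0.113.7", 50123, "10.0.0.4", 80, "TCP", backends)
again = pick_backend("203.0.113.7", 50123, "10.0.0.4", 80, "TCP", backends)
assert first == again
```

Because the mapping is purely a function of the 5-tuple, stickiness holds only within a transport session, exactly as the text notes.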
Port forwarding
Azure Load Balancer gives you control over how inbound communication is managed. This communication includes
traffic initiated from Internet hosts, virtual machines in other cloud services, or virtual networks. This control is
represented by an endpoint (also called an input endpoint).
An input endpoint listens on a public port and forwards traffic to an internal port. You can map the same ports for an
internal or external endpoint or use a different port for them. For example, you can have a web server configured to
listen to port 81 while the public endpoint mapping is port 80. The creation of a public endpoint triggers the creation
of a load balancer instance.
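The two endpoint kinds above, one-to-one port translation versus one-to-many load balancing, can be modeled minimally. The VM names and ports here are hypothetical examples, mirroring the web server on port 81 behind public port 80:

```python
# Minimal model of the two endpoint kinds described above.
# A port-translation endpoint maps one public port to one (vm, port) pair;
# a load-balancing endpoint maps one public port to many.
port_translation = {50001: ("vm0", 3389)}          # e.g. RDP to a single VM
load_balancing = {80: [("vm0", 81), ("vm1", 81)]}  # web traffic, many VMs

def resolve(public_port, flow_id=0):
    if public_port in port_translation:
        return port_translation[public_port]       # one-to-one
    targets = load_balancing[public_port]
    return targets[flow_id % len(targets)]          # one-to-many

# Public port 80 forwards to a web server listening on internal port 81:
assert resolve(80, flow_id=0) == ("vm0", 81)
assert resolve(50001) == ("vm0", 3389)
```

The public and internal ports need not match, which is the point of the port-81 example in the text.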
When a virtual machine is created using the Azure portal, the portal automatically creates endpoints to the virtual
machine for Remote Desktop Protocol (RDP) and remote Windows PowerShell session traffic. You can use these
endpoints to remotely administer the virtual machine over the Internet.
Automatic reconfiguration
Azure Load Balancer instantly reconfigures itself when you scale instances up or down. For example, this
reconfiguration happens when you increase the instance count for web/worker roles in a cloud service or when you
add additional virtual machines into the same load-balanced set.
Service monitoring
Azure Load Balancer can probe the health of the various server instances. When a probe fails to respond, the load
balancer stops sending new connections to the unhealthy instances. Existing connections are not impacted.
o Guest agent probe (on Platform as a Service Virtual Machines only): The load balancer utilizes the
guest agent inside the virtual machine. The guest agent listens and responds with an HTTP 200 OK
response only when the instance is in the ready state (i.e. the instance is not in a state like busy,
recycling, or stopping). If the agent fails to respond with an HTTP 200 OK, the load balancer marks the
instance as unresponsive and stops sending traffic to that instance. The load balancer continues to
ping the instance. If the guest agent responds with an HTTP 200, the load balancer will send traffic to
that instance again. When you're using a web role, your website code typically runs in w3wp.exe,
which is not monitored by the Azure fabric or guest agent. This means that failures in w3wp.exe (e.g.
HTTP 500 responses) will not be reported to the guest agent, and the load balancer will not know to
take that instance out of rotation.
o HTTP custom probe: This probe overrides the default (guest agent) probe. You can use it to create
your own custom logic to determine the health of the role instance. The load balancer will regularly
probe your endpoint (every 15 seconds, by default). The instance is considered to be in rotation if it
responds with a TCP ACK or HTTP 200 within the timeout period (default of 31 seconds). This is useful
for implementing your own logic to remove instances from the load balancer's rotation. For example,
you can configure the instance to return a non-200 status if the instance is above 90% CPU. For web
roles that use w3wp.exe, you also get automatic monitoring of your website, since failures in your
website code return a non-200 status to the probe.
o TCP custom probe: This probe relies on successful TCP session establishment to a defined probe port.
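The take-out-of-rotation behavior described above can be sketched as a consecutive-failure counter. This is a simplified model, not the actual load balancer implementation; the default of 2 consecutive failures matches the probe defaults described later in this guide:

```python
class ProbeTracker:
    """Sketch of probe-driven rotation: an instance leaves rotation
    after `max_failures` consecutive failed probes and returns as soon
    as a probe succeeds again. Existing connections are unaffected in
    the real service; this model tracks only rotation membership."""
    def __init__(self, max_failures: int = 2):
        self.max_failures = max_failures
        self.failures = 0

    def record(self, healthy_response: bool) -> None:
        # A healthy probe (e.g. HTTP 200 or TCP ACK) resets the count.
        self.failures = 0 if healthy_response else self.failures + 1

    @property
    def in_rotation(self) -> bool:
        return self.failures < self.max_failures

probe = ProbeTracker()
probe.record(False)            # one failure: still in rotation
assert probe.in_rotation
probe.record(False)            # second consecutive failure: removed
assert not probe.in_rotation
probe.record(True)             # healthy probe: back in rotation
assert probe.in_rotation
```

The reset-on-success step is what lets the load balancer "continue to ping the instance" and restore it once the guest agent or custom probe responds again.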
Source NAT
All outbound traffic to the Internet that originates from your service undergoes source NAT (SNAT) by using the same
VIP address as the incoming traffic. SNAT provides important benefits:
o It enables easy upgrade and disaster recovery of services, since the VIP can be dynamically mapped to
another instance of the service.
o It makes access control list (ACL) management easier. ACLs expressed in terms of VIPs do not change
as services scale up, down, or get redeployed.
The load balancer configuration supports full cone NAT for UDP. Full cone NAT is a type of NAT where the port allows
inbound connections from any external host (in response to an outbound request).
For each new outbound connection that a virtual machine initiates, an outbound port is also allocated by the load
balancer. The external host sees traffic with a virtual IP (VIP)-allocated port. For scenarios that require a large number
of outbound connections, it is recommended to use instance-level public IP addresses so that the VMs have a
dedicated outbound IP address for SNAT. This reduces the risk of port exhaustion.
Please see outbound connections article for more details on this topic.
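The port-exhaustion risk noted above follows from each outbound connection consuming a VIP-allocated port. A toy allocator with a deliberately tiny pool illustrates the failure mode; real pool sizes and allocation policy differ:

```python
class SnatAllocator:
    """Toy model of SNAT port allocation behind a single VIP: each new
    outbound flow consumes an ephemeral port. Illustrative only; real
    pool sizes and allocation policy differ."""
    def __init__(self, pool_size: int = 4, first_port: int = 1024):
        self.free = list(range(first_port, first_port + pool_size))
        self.active = {}   # flow -> allocated VIP port

    def connect(self, flow):
        if not self.free:
            raise RuntimeError("SNAT port exhaustion")
        self.active[flow] = self.free.pop()
        return self.active[flow]

snat = SnatAllocator(pool_size=2)
snat.connect(("198.51.100.1", 443))
snat.connect(("198.51.100.2", 443))
try:
    snat.connect(("198.51.100.3", 443))   # pool exhausted
    exhausted = False
except RuntimeError:
    exhausted = True
assert exhausted
```

Giving VMs instance-level public IP addresses, as the text recommends, amounts to giving each VM its own port pool instead of sharing one VIP's pool.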
You can assign more than one load-balanced public IP address to a set of virtual machines. With this ability, you can
host multiple SSL websites and/or multiple SQL Server AlwaysOn Availability Group listeners on the same set of virtual
machines. For more information, see Multiple VIPs per cloud service.
There are different options to distribute network traffic using Microsoft Azure. These options work differently from
each other, have different feature sets, and support different scenarios. They can each be used in isolation or
combined.
Azure Load Balancer works at the transport layer (Layer 4 in the OSI network reference stack). It provides
network-level distribution of traffic across instances of an application running in the same Azure data center.
Application Gateway works at the application layer (Layer 7 in the OSI network reference stack). It acts as a
reverse-proxy service, terminating the client connection and forwarding requests to back-end endpoints.
Traffic Manager works at the DNS level. It uses DNS responses to direct end-user traffic to globally distributed
endpoints. Clients then connect to those endpoints directly.
Endpoints: Azure Load Balancer supports Azure VMs and Cloud Services role instances; Application Gateway supports
any Azure internal IP address, public Internet IP address, Azure VM, or Azure Cloud Service; Traffic Manager supports
Azure VMs, Cloud Services, Azure Web Apps, and external endpoints.
VNet support: Azure Load Balancer and Application Gateway can be used for both Internet-facing and internal (VNet)
applications; Traffic Manager only supports Internet-facing applications.
Azure Load Balancer and Application Gateway both route network traffic to endpoints, but they have different usage
scenarios for the traffic they handle. The following comparison helps in understanding the difference between the two
load balancers:
Health probes: Azure Load Balancer uses a default probe interval of 15 seconds and takes an instance out of rotation
after 2 continuous failures; it supports user-defined probes. Application Gateway uses an idle probe interval of 30
seconds and takes an instance out after 5 consecutive live-traffic failures, or after a single probe failure in idle mode;
it also supports user-defined probes.
If you need to add, change, or remove IP addresses for a NIC, read the Add, change, or remove IP addresses article. If
you need to add NICs to, or remove NICs from VMs, read the Add or remove NICs article.
Complete the following tasks before completing any steps in any section of this article:
Review the Azure limits article to learn about limits for NICs.
Log in to the Azure portal, Azure command-line interface (CLI), or Azure PowerShell with an Azure account. If
you don't already have an Azure account, sign up for a free trial account.
If using PowerShell commands to complete tasks in this article, install and configure Azure PowerShell by
completing the steps in the How to install and configure Azure PowerShell article. Ensure you have the most
recent version of the Azure PowerShell cmdlets installed. To get help for PowerShell commands, with
examples, type get-help <command> -full.
If using Azure command-line interface (CLI) commands to complete tasks in this article, install and configure
the Azure CLI by completing the steps in the How to install and configure the Azure CLI article. Ensure you
have the most recent version of the Azure CLI installed. To get help for CLI commands, type az <command> --help.
Create a NIC
When creating a VM using the Azure portal, the portal creates a NIC with default settings for you. If you'd rather
specify all your NIC settings, you can create a NIC with custom settings and attach the NIC to a VM when creating a
VM. You can also create a NIC and add it to an existing VM. To learn how to create a VM with an existing NIC or to add
to, or remove NICs from existing VMs, read the Add or remove NICs article. Before creating a NIC, you must have an
existing virtual network (VNet) in the same location and subscription you create a NIC in. To learn how to create a
VNet, read the Create a VNet article.
1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.
2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.
3. In the Network interfaces blade that appears, click + Add.
4. In the Create network interface blade that appears, enter or select values for the following settings, then
click Create:
Name (required): The name must be unique within the resource group you select. Over time, you'll likely have several
NICs in your Azure subscription. Read the Naming conventions article for suggestions when creating a naming
convention to make managing several NICs easier. The name cannot be changed after the NIC is created.
Virtual network (required): Select a VNet to connect the NIC to. You can only connect a NIC to a VNet that exists in the
same subscription and location as the NIC. Once a NIC is created, you cannot change the VNet it is connected to. The
VM you add the NIC to must also exist in the same location and subscription as the NIC.
Subnet (required): Select a subnet within the VNet you selected. You can change the subnet the NIC is connected to
after it's created.
Private IP address assignment (required): Choose from the following assignment methods. Dynamic: Azure
automatically assigns an available address from the address space of the subnet you selected. Azure may assign a
different address to a NIC when the VM it's in is started after having been in the stopped (deallocated) state; the
address remains the same if the VM is restarted without having been in the stopped (deallocated) state. Static: you
must manually assign an available IP address from within the address space of the subnet you selected. Static
addresses do not change until you change them or the NIC is deleted. You can change the assignment method after
the NIC is created. The Azure DHCP server assigns this address to the NIC within the operating system of the VM.
Network security group (optional): Leave set to None, select an existing network security group (NSG), or create an
NSG. NSGs enable you to filter network traffic in and out of a NIC. To learn more about NSGs, read the Network
security groups article; to create an NSG, read the Create an NSG article. You can apply zero or one NSG to a NIC. Zero
or one NSG can also be applied to the subnet the NIC is connected to. When an NSG is applied to both a NIC and the
subnet the NIC is connected to, sometimes unexpected results occur. To troubleshoot NSGs applied to NICs and
subnets, read the Troubleshoot NSGs article.
Subscription (required): Select one of your Azure subscriptions. The VM you attach a NIC to and the VNet you connect
it to must exist in the same subscription.
Resource group (required): Select an existing resource group or create one. A NIC can exist in the same, or a different,
resource group than the VM you attach it to or the VNet you connect it to.
Location (required): The VM you attach a NIC to and the VNet you connect it to must exist in the same location, also
referred to as a region.
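The same-location and same-subscription constraints above can be expressed as a simple validation check. This is a hypothetical helper for reasoning about the rules, not an Azure SDK call:

```python
def validate_nic_placement(nic: dict, vnet: dict, vm: dict) -> list:
    """Check the placement rules described above: the NIC, the VNet it
    connects to, and the VM it attaches to must share a location and a
    subscription (resource groups may differ). Hypothetical helper,
    not an Azure SDK call."""
    problems = []
    if not (nic["location"] == vnet["location"] == vm["location"]):
        problems.append("NIC, VNet, and VM must be in the same location")
    if not (nic["subscription"] == vnet["subscription"] == vm["subscription"]):
        problems.append("NIC, VNet, and VM must be in the same subscription")
    return problems

nic = {"location": "westus", "subscription": "sub-1"}
vnet = {"location": "westus", "subscription": "sub-1"}
vm = {"location": "eastus", "subscription": "sub-1"}   # wrong region
assert validate_nic_placement(nic, vnet, vm) == \
    ["NIC, VNet, and VM must be in the same location"]
```

Note that resource group is deliberately not checked, since the text states a NIC may live in a different resource group than its VM or VNet.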
The portal doesn't provide the option to assign a public IP address to the NIC when you create it, though it does assign
a public IP address to a NIC when you create a VM using the portal. To learn how to add a public IP address to the NIC
after creating it, read the Add, change, or remove IP addresses article. If you want to create a NIC with a public IP
address, you must use the CLI or PowerShell to create the NIC.
Note
Azure assigns a MAC address to the NIC only after the NIC is attached to a VM and the VM is started the first time. You
cannot specify the MAC address that Azure assigns to the NIC. The MAC address remains assigned to the NIC until the
NIC is deleted or the private IP address assigned to the primary IP configuration of the primary NIC is changed. To
learn more about IP addresses and IP configurations, read the Add, change, or remove IP addresses article.
Commands
PowerShell: New-AzureRmNetworkInterface
View and change NIC settings
1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.
2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.
3. In the Network interfaces blade that appears, click the NIC you want to view or change settings for.
4. The following settings are listed in the blade that appears for the NIC you selected:
Overview: Provides information about the NIC, such as the IP addresses assigned to it, the
VNet/subnet the NIC is connected to, and the VM the NIC is attached to (if it's attached to one). The
following picture shows the overview settings for a NIC
named mywebserver256:
You can move a NIC to a different resource group or subscription by clicking (change) next to
the Resource group or Subscription name. If you move the NIC, you must move all resources related
to the NIC with it. If the NIC is attached to a VM, for example, you must also move the VM, and other
VM-related resources. To move a NIC, read the Move resource to a new resource group or
subscription article. The article lists prerequisites, and how to move resources using the Azure portal,
PowerShell, and the Azure CLI.
IP configurations: Public and private IP addresses are assigned to one or more IP configurations for a
NIC. To learn more about the maximum number of IP configurations supported for a NIC, read
the Azure limits article. Each IP configuration has one assigned private IP address, and may have one
public IP address associated to it. To add, change, or delete IP configurations from the NIC, complete
the steps in the Add a secondary IP configuration to a NIC, Change an IP configuration, or Delete an IP
configurationsections of the Add, change, or remove IP addresses article. IP forwarding and subnet
assignment are also configured in this section. To learn more about these settings, read the Enable-
disable IP forwarding and Change subnet assignment sections of this article.
DNS servers: You can specify which DNS server a NIC is assigned by the Azure DHCP servers. The NIC
can inherit the setting from the VNet the NIC is connected to, or have a custom setting that overrides
the setting for the VNet it's connected to. To modify what's displayed, complete the steps in
the Change DNS servers section of this article.
Network security group (NSG): Displays which NSG is associated to the NIC (if any). An NSG contains
inbound and outbound rules to filter network traffic for the NIC. If an NSG is associated to the NIC, the
name of the associated NSG is displayed. To modify what's displayed, complete the steps in
the Associate an NSG to or disassociate an NSG from a network interface section of this article.
Properties: Displays key settings about the NIC, including its MAC address (blank if the NIC isn't
attached to a VM), and the subscription it exists in.
Effective security rules: Security rules are listed if the NIC is attached to a running VM, and an NSG is
associated to the NIC, the subnet it's connected to, or both. To learn more about what's displayed,
read the Troubleshoot network security groups article. To learn more about NSGs, read the Network
security groups article.
Effective routes: Routes are listed if the NIC is attached to a running VM. The routes are a
combination of the Azure default routes, any user-defined routes (UDR), and any BGP routes that may
exist for the subnet the NIC is connected to. To learn more about what's displayed, read
the Troubleshoot routes article. To learn more about Azure default routes and UDRs, read the User-defined
routes article.
Common Azure Resource Manager settings: To learn more about common Azure Resource Manager
settings, read the Activity log, Access control (IAM), Tags, Locks, and Automation script articles.
Commands
Tool Command
CLI az network nic list to view NICs in the subscription; az network nic show to view settings for
a NIC
PowerShell Get-AzureRmNetworkInterface to view NICs in the subscription or view settings for a NIC
The DNS server is assigned by the Azure DHCP server to the NIC within the VM operating system. The DNS server
assigned is whatever the DNS server setting is for a NIC. To learn more about name resolution settings for a NIC, read
the Name resolution for VMs article. The NIC can inherit the settings from the VNet, or use its own unique settings
that override the setting for the VNet.
1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.
2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.
3. In the Network interfaces blade that appears, click the NIC you want to view or change settings for.
4. In the blade for the NIC you selected, click DNS servers under SETTINGS.
5. Click either:
Inherit from virtual network (default): Choose this option to inherit the DNS server setting defined for
the virtual network the NIC is connected to. At the VNet level, either a custom DNS server or the
Azure-provided DNS server is defined. The Azure-provided DNS server can resolve hostnames for
resources connected to the same VNet. A fully qualified domain name (FQDN) must be used to
resolve names for resources connected to different VNets.
Custom: You can configure your own DNS server to resolve names across multiple VNets. Enter the IP
address of the server you want to use as a DNS server. The DNS server address you specify is assigned
only to this NIC and overrides any DNS setting for the VNet the NIC is connected to.
6. Click Save.
Commands
Tool Command
PowerShell Set-AzureRmNetworkInterface
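As a sketch of the PowerShell route named in the table above, a custom DNS server can be assigned to a NIC as follows; the NIC name NICWEB1, resource group TestRG, and DNS address 192.168.0.10 are illustrative assumptions:

```powershell
# Retrieve the NIC, add a custom DNS server to its DnsSettings, and save the
# change back to Azure (this overrides the VNet-level DNS setting for the NIC).
$nic = Get-AzureRmNetworkInterface -ResourceGroupName TestRG -Name NICWEB1
$nic.DnsSettings.DnsServers.Add("192.168.0.10")
$nic | Set-AzureRmNetworkInterface
```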
Enable-disable IP forwarding
IP forwarding enables the NIC to:
Receive network traffic not destined for one of the IP addresses assigned to any of the IP configurations
assigned to the NIC.
Send network traffic with a different source IP address than the one assigned to one of a NIC's IP
configurations.
The setting must be enabled for every NIC attached to the VM that receives traffic that the VM needs to forward. A
VM can forward traffic whether it has multiple NICs or a single NIC attached to it. While IP forwarding is an Azure
setting, the VM must also run an application able to forward the traffic, such as firewall, WAN optimization, and load
balancing applications. When a VM is running network applications, the VM is often referred to as a network virtual
appliance (NVA). You can view a list of ready to deploy NVAs in the Azure Marketplace. IP forwarding is typically used
with user-defined routes. To learn more about user-defined routes, read the User-defined routes article.
1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.
2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.
3. In the Network interfaces blade that appears, click the NIC you want to enable or disable IP forwarding for.
4. In the blade for the NIC you selected, click IP configurations in the SETTINGS section.
5. Under IP forwarding, click Enabled or Disabled (the default), as appropriate.
6. Click Save.
Commands
Tool Command
PowerShell Set-AzureRmNetworkInterface
You can change the subnet, but not the VNet, that a NIC is connected to.
1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.
2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.
3. In the Network interfaces blade that appears, click the NIC you want to view or change settings for.
4. Click IP configurations under SETTINGS in the blade for the NIC you selected. If any private IP addresses for
any IP configurations listed have (Static) next to them, you must change the IP address assignment method to
dynamic by completing the steps that follow. All private IP addresses must be assigned with the dynamic
assignment method to change the subnet assignment for the NIC. If the addresses are assigned with the
dynamic method, continue to step five. If any addresses are assigned with the static assignment method,
complete the following steps to change the assignment method to dynamic:
Click the IP configuration you want to change the IP address assignment method for from the list of IP
configurations.
In the blade that appears for the IP configuration, click Dynamic for the Assignment method.
Click Save.
5. Select the subnet you want to connect the NIC to from the Subnet drop-down list.
6. Click Save. New dynamic addresses are assigned from the subnet address range for the new subnet. After
assigning the NIC to a new subnet, you can assign a static IP address from the new subnet address range if you
choose. To learn more about adding, changing, and removing IP addresses for a NIC, read the Add, change, or
remove IP addresses article.
Commands
Tool Command
PowerShell Set-AzureRmNetworkInterfaceIpConfig
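A PowerShell sketch of the same subnet move, using the cmdlet named above; the names NICWEB1, TestVNet, BackEnd, and ipconfig1 are illustrative assumptions:

```powershell
# Get the NIC and the target subnet, point the NIC's IP configuration at the
# new subnet (dynamic allocation is kept), and save the NIC to Azure.
$nic    = Get-AzureRmNetworkInterface -ResourceGroupName TestRG -Name NICWEB1
$vnet   = Get-AzureRmVirtualNetwork -ResourceGroupName TestRG -Name TestVNet
$subnet = Get-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name BackEnd
Set-AzureRmNetworkInterfaceIpConfig -NetworkInterface $nic -Name ipconfig1 -Subnet $subnet
$nic | Set-AzureRmNetworkInterface
```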
Delete a NIC
You can delete a NIC as long as it's not attached to a VM. If it is attached to a VM, you must first place the VM in the
stopped (deallocated) state, then detach the NIC from the VM, before you can delete the NIC. To detach a NIC from a
VM, complete the steps in the Detach a NIC from a virtual machine section of the Add or remove network
interfaces article. Deleting a VM detaches all NICs attached to it, but does not delete the NICs.
1. Log in to the Azure portal with an account that is assigned (at a minimum) permissions for the Network
Contributor role for your subscription. Read the Built-in roles for Azure role-based access control article to
learn more about assigning roles and permissions to accounts.
2. In the box that contains the text Search resources at the top of the Azure portal, type network interfaces.
When network interfaces appears in the search results, click it.
3. In the Network interfaces blade that appears, click the NIC you want to delete, and then click Delete.
When you delete a NIC, any MAC or IP addresses assigned to it are released.
Commands
Tool Command
PowerShell Remove-AzureRmNetworkInterface
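As a one-line sketch of the cmdlet named above (the NIC name NICWEB1 and resource group TestRG are illustrative assumptions):

```powershell
# Delete a NIC that is not attached to any VM.
Remove-AzureRmNetworkInterface -ResourceGroupName TestRG -Name NICWEB1
```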
Scenario
You may need a static IP address for web servers that require SSL connections in which the SSL certificate is linked to
an IP address.
1. From a browser, navigate to the Azure portal and, if necessary, sign in with your Azure account.
4. In the Basics blade, enter the VM information, and then click OK.
6. In the Settings blade, click Public IP address; then, in the Create public IP address blade, under Assignment,
click Static, and then click OK.
8. Review the Summary blade, and then click OK.
9. Once the VM is created, the Settings blade is displayed.
Scenario
To better illustrate how to create NSGs, this document will use the scenario below.
In this scenario you will create an NSG for each subnet in the TestVNet virtual network, as described below:
NSG-FrontEnd. The front end NSG will be applied to the FrontEnd subnet, and contain two rules:
o rdp-rule. This rule will allow RDP traffic to the FrontEnd subnet.
o web-rule. This rule will allow HTTP traffic to the FrontEnd subnet.
NSG-BackEnd. The back end NSG will be applied to the BackEnd subnet, and contain two rules:
o sql-rule. This rule allows SQL traffic only from the FrontEnd subnet.
o web-rule. This rule denies all internet bound traffic from the BackEnd subnet.
The combination of these rules creates a DMZ-like scenario, where the back end subnet can only receive incoming
traffic for SQL from the front end subnet, and has no access to the Internet, while the front end subnet can
communicate with the Internet, and receive incoming HTTP requests only.
The sample PowerShell commands below expect a simple environment already created based on the scenario above.
If you want to run the commands as they are displayed in this document, first build the test environment by
deploying this template, click Deploy to Azure, replace the default parameter values if necessary, and follow the
instructions in the portal. The steps below use RG-NSG as the name of the resource group the template was deployed
to.
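The portal steps that follow build these rules interactively; for reference, the same front-end NSG can be sketched in PowerShell (the westus location is an assumption; the rule settings come from the scenario above):

```powershell
# web-rule: allow HTTP (TCP/80) from any source, priority 200.
$webRule = New-AzureRmNetworkSecurityRuleConfig -Name web-rule -Access Allow `
    -Protocol Tcp -Direction Inbound -Priority 200 `
    -SourceAddressPrefix * -SourcePortRange * `
    -DestinationAddressPrefix * -DestinationPortRange 80
# rdp-rule: allow RDP (TCP/3389) from any source, priority 250.
$rdpRule = New-AzureRmNetworkSecurityRuleConfig -Name rdp-rule -Access Allow `
    -Protocol Tcp -Direction Inbound -Priority 250 `
    -SourceAddressPrefix * -SourcePortRange * `
    -DestinationAddressPrefix * -DestinationPortRange 3389
# Create the NSG in the RG-NSG resource group with both rules.
New-AzureRmNetworkSecurityGroup -ResourceGroupName RG-NSG -Location westus `
    -Name NSG-FrontEnd -SecurityRule $webRule,$rdpRule
```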
To create the NSG-FrontEnd NSG as shown in the scenario above, follow the steps below.
1. From a browser, navigate to http://portal.azure.com and, if necessary, sign in with your Azure account.
4. In the Create network security group blade, create an NSG named NSG-FrontEnd in the RG-NSG resource
group, and then click Create.
To create rules in an existing NSG from the Azure portal, follow the steps below.
4. In the Add inbound security rule blade, create a rule named web-rule with priority of 200 allowing access
via TCP to port 80 to any VM from any source, and then click OK. Notice that most of these settings are
default values already.
5. After a few seconds you will see the new rule in the NSG.
6. Repeat the preceding steps to create an inbound rule named rdp-rule with a priority of 250, allowing access
via TCP to port 3389 to any VM from any source.
To associate the NSG-FrontEnd NSG to the FrontEnd subnet, navigate to the TestVNet virtual network in the portal,
and then:
3. In the Settings blade, click Subnets > FrontEnd > Network security group > NSG-FrontEnd.
To create the NSG-BackEnd NSG and associate it to the BackEnd subnet, follow the steps below.
1. Repeat the steps in Create the NSG-FrontEnd NSG to create an NSG named NSG-BackEnd
2. Repeat the steps in Create rules in an existing NSG to create the inbound rules in the table below.
3. Repeat the steps in Associate the NSG to the FrontEnd subnet to associate the NSG-Backend NSG to
the BackEnd subnet.
For more information about UDR and IP forwarding, visit User Defined Routes and IP Forwarding.
Scenario
To better illustrate how to create UDRs, this document will use the scenario below.
In this scenario you will create one UDR for the front end subnet and another UDR for the back end subnet, as
described below:
UDR-FrontEnd. The front end UDR will be applied to the FrontEnd subnet, and contain one route:
o RouteToBackend. This route will send all traffic to the back end subnet to the FW1 virtual machine.
UDR-BackEnd. The back end UDR will be applied to the BackEnd subnet, and contain one route:
o RouteToFrontend. This route will send all traffic to the front end subnet to the FW1 virtual machine.
The combination of these routes will ensure that all traffic destined from one subnet to another will be routed to
the FW1 virtual machine, which is being used as a virtual appliance. You also need to turn on IP forwarding for that
VM, to ensure it can receive traffic destined to other VMs.
The sample PowerShell commands below expect a simple environment already created based on the scenario above.
If you want to run the commands as they are displayed in this document, first build the test environment by
deploying this template, click Deploy to Azure, replace the default parameter values if necessary, and follow the
instructions in the portal.
To perform the steps in this article, you'll need to install and configure Azure PowerShell and follow the instructions
all the way to the end to sign in to Azure and select your subscription.
Note
If you don't have an Azure account, you'll need one. Go sign up for a free trial here.
To create the route table and route needed for the front-end subnet based on the scenario above, complete the
following steps:
1. Create a route used to send all traffic destined to the back-end subnet (192.168.2.0/24) to be routed to
the FW1 virtual appliance (192.168.0.4).
$route = New-AzureRmRouteConfig -Name RouteToBackEnd `
-AddressPrefix 192.168.2.0/24 -NextHopType VirtualAppliance `
-NextHopIpAddress 192.168.0.4
2. Create a route table named UDR-FrontEnd in the westus region that contains the route.
$routeTable = New-AzureRmRouteTable -ResourceGroupName TestRG -Location westus `
-Name UDR-FrontEnd -Route $route
3. Create a variable that contains the VNet where the subnet is. In our scenario, the VNet is named TestVNet.
$vnet = Get-AzureRmVirtualNetwork -ResourceGroupName TestRG -Name TestVNet
Warning
The output for the command above shows the content for the virtual network configuration object, which only exists
on the computer where you are running PowerShell. You need to run the Set-AzureRmVirtualNetwork cmdlet to save
these settings to Azure.
Expected output:
Name : TestVNet
ResourceGroupName : TestRG
Location : westus
Id : /subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/virtualNetworks/TestVNet
Etag : W/"[Id]"
ProvisioningState : Succeeded
Tags :
Name Value
=========== =====
displayName VNet
AddressSpace : {
"AddressPrefixes": [
"192.168.0.0/16"
]
}
DhcpOptions : {
"DnsServers": null
}
NetworkInterfaces : null
Subnets : [
...,
{
"Name": "FrontEnd",
"Etag": "W/\"[Id]\"",
"Id":
"/subscriptions/[Id]/resourceGroups/TestRG/providers/Microsoft.Network/virtualNetworks/TestVNet/subnets/F
rontEnd",
"AddressPrefix": "192.168.1.0/24",
"IpConfigurations": [
{
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICWEB2/ipConfigurations/ipconfig
1"
},
{
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICWEB1/ipConfigurations/ipconfig
1"
}
],
"NetworkSecurityGroup": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkSecurityGroups/NSG-FrontEnd"
},
"RouteTable": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/routeTables/UDR-FrontEnd"
},
"ProvisioningState": "Succeeded"
},
...
]
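The output above shows the route table already bound to the FrontEnd subnet; in the source article that binding is performed by steps elided here. A sketch, assuming the $vnet and $routeTable variables created above:

```powershell
# Associate the UDR-FrontEnd route table with the FrontEnd subnet
# (192.168.1.0/24), then push the updated configuration to Azure.
Set-AzureRmVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name FrontEnd `
    -AddressPrefix 192.168.1.0/24 -RouteTable $routeTable
Set-AzureRmVirtualNetwork -VirtualNetwork $vnet
```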
Create the UDR for the back-end subnet
To create the route table and route needed for the back-end subnet based on the scenario above, follow the steps
below.
1. Create a route used to send all traffic destined to the front-end subnet (192.168.1.0/24) to be routed to
the FW1 virtual appliance (192.168.0.4).
$route = New-AzureRmRouteConfig -Name RouteToFrontEnd `
-AddressPrefix 192.168.1.0/24 -NextHopType VirtualAppliance `
-NextHopIpAddress 192.168.0.4
2. Create a route table named UDR-BackEnd in the westus region that contains the route created above.
$routeTable = New-AzureRmRouteTable -ResourceGroupName TestRG -Location westus `
-Name UDR-BackEnd -Route $route
Expected output:
Name : TestVNet
ResourceGroupName : TestRG
Location : westus
Id : /subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/virtualNetworks/TestVNet
Etag : W/"[Id]"
ProvisioningState : Succeeded
Tags :
Name Value
=========== =====
displayName VNet
AddressSpace : {
"AddressPrefixes": [
"192.168.0.0/16"
]
}
DhcpOptions : {
"DnsServers": null
}
NetworkInterfaces : null
Subnets : [
...,
{
"Name": "BackEnd",
"Etag": "W/\"[Id]\"",
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/virtualNetworks/TestVNet/subnets/BackEnd",
"AddressPrefix": "192.168.2.0/24",
"IpConfigurations": [
{
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICSQL2/ipConfigurations/ipconfig
1"
},
{
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICSQL1/ipConfigurations/ipconfig
1"
}
],
"NetworkSecurityGroup": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkSecurityGroups/NSG-BackEnd"
},
"RouteTable": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/routeTables/UDR-BackEnd"
},
"ProvisioningState": "Succeeded"
}
]
Enable IP forwarding on FW1
To enable IP forwarding in the NIC used by FW1, follow the steps below.
1. Create a variable that contains the settings for the NIC used by FW1. In our scenario, the NIC is
named NICFW1.
$nicfw1 = Get-AzureRmNetworkInterface -ResourceGroupName TestRG -Name NICFW1
Expected output:
Name : NICFW1
ResourceGroupName : TestRG
Location : westus
Id : /subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICFW1
Etag : W/"[Id]"
ProvisioningState : Succeeded
Tags :
Name Value
=========== =======================
displayName NetworkInterfaces - DMZ
VirtualMachine : {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Compute/virtualMachines/FW1"
}
IpConfigurations : [
{
"Name": "ipconfig1",
"Etag": "W/\"[Id]\"",
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/networkInterfaces/NICFW1/ipConfigurations/ipconfig1
",
"PrivateIpAddress": "192.168.0.4",
"PrivateIpAllocationMethod": "Static",
"Subnet": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/virtualNetworks/TestVNet/subnets/DMZ"
},
"PublicIpAddress": {
"Id": "/subscriptions/[Subscription
Id]/resourceGroups/TestRG/providers/Microsoft.Network/publicIPAddresses/PIPFW1"
},
"LoadBalancerBackendAddressPools": [],
"LoadBalancerInboundNatRules": [],
"ProvisioningState": "Succeeded"
}
]
DnsSettings : {
"DnsServers": [],
"AppliedDnsServers": [],
"InternalDnsNameLabel": null,
"InternalFqdn": null
}
EnableIPForwarding : True
NetworkSecurityGroup : null
Primary : True
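The article's remaining steps (elided above) set the IP forwarding property and save the NIC; a sketch using the $nicfw1 variable:

```powershell
# Enable IP forwarding on the NIC and persist the change to Azure.
$nicfw1.EnableIPForwarding = $true
Set-AzureRmNetworkInterface -NetworkInterface $nicfw1
```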
Azure allows you to create resources, such as VMs, in defined geographic regions like 'West US', 'North Europe', or
'Southeast Asia'. There are currently 30 Azure regions around the world. You can review the list of regions and their
locations. Within each region, multiple datacenters exist to provide for redundancy and availability. This approach
gives you flexibility when building your applications to create VMs closest to your users and to meet any legal,
compliance, or tax requirements.
There are some special Azure regions for compliance or legal purposes that you may wish to use when building out
your applications. These special regions include:
o Azure Government: A physical and logical network-isolated instance of Azure for US government
agencies and partners, operated by screened US persons. Includes additional compliance certifications
such as FedRAMP and DISA. Read more about Azure Government.
o Azure China: These regions are available through a unique partnership between Microsoft and
21Vianet, whereby Microsoft does not directly maintain the datacenters. See more about Microsoft
Azure in China.
o Azure Germany: These regions are currently available via a data trustee model whereby customer
data remains in Germany under control of T-Systems, a Deutsche Telekom company, acting as the
German data trustee.
Region pairs
Each Azure region is paired with another region within the same geography (such as US, Europe, or Asia). This
approach allows for the replication of resources, such as VM storage, across a geography that should reduce the
likelihood of natural disasters, civil unrest, power outages, or physical network outages affecting both regions at once.
Additional advantages of region pairs include:
In the event of a wider Azure outage, one region is prioritized out of every pair to help reduce the time to
restore for applications.
Planned Azure updates are rolled out to paired regions one at a time to minimize downtime and risk of
application outage.
Data continues to reside within the same geography as its pair (except for Brazil South) for tax and law
enforcement jurisdiction purposes.
Primary Secondary
West US East US
Feature availability
Some services or VM features are only available in certain regions, such as specific VM sizes or storage types. There
are also some global Azure services that do not require you to select a particular region, such as Azure Active
Directory, Traffic Manager, or Azure DNS. To assist you in designing your application environment, you can check
the availability of Azure services across each region.
Storage availability
Understanding Azure regions and geographies becomes important when you consider the available storage replication
options. Depending on the storage type, you have different replication options.
o Locally redundant storage (LRS): Replicates your data three times within the region in which you
created your storage account.
o Zone-redundant storage (ZRS): Replicates your data three times across two to three facilities, either
within a single region or across two regions.
o Geo-redundant storage (GRS): Replicates your data to a secondary region that is hundreds of miles
away from the primary region.
o Read-access geo-redundant storage (RA-GRS): Replicates your data to a secondary region, as with
GRS, but also provides read-only access to the data in the secondary location.
The following table provides a quick overview of the differences between the storage replication types:
Replication strategy                                                         LRS  ZRS  GRS  RA-GRS
Data can be read from the secondary location and from the primary location   No   No   No   Yes
You can read more about Azure Storage replication options here. For more information about managed disks,
see Azure Managed Disks overview.
Storage costs
Prices vary depending on the storage type and availability that you select.
Premium Managed Disks are backed by Solid State Drives (SSDs) and Standard Managed Disks are backed by
regular spinning disks. Both Premium and Standard Managed Disks are charged based on the provisioned
capacity for the disk.
Unmanaged disks
Premium storage is backed by Solid State Drives (SSDs) and is charged based on the capacity of the disk.
Standard storage is backed by regular spinning disks and is charged based on the in-use capacity and desired
storage availability.
o For RA-GRS, there is an additional Geo-Replication Data Transfer charge for the bandwidth of
replicating that data to another Azure region.
Azure images
In Azure, VMs are created from an image. Typically, images are from the Azure Marketplace where partners can
provide pre-configured complete OS or application images.
When you create a VM from an image in the Azure Marketplace, you are actually working with templates. Azure
Resource Manager templates are declarative JavaScript Object Notation (JSON) files that can be used to create
complex application environments comprising VMs, storage, virtual networking, etc. You can read more about
using Azure Resource Manager templates, including how to build your own templates.
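As a minimal sketch, an empty Resource Manager template has this shape (the Template format section later in this guide describes the individual elements):

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": { },
  "variables": { },
  "resources": [ ],
  "outputs": { }
}
```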
You can also create your own custom images and upload them using Azure CLI or Azure PowerShell to quickly create
custom VMs to your specific build requirements.
Availability sets
An availability set is a logical grouping of VMs that allows Azure to understand how your application is built to provide
for redundancy and availability. It is recommended that two or more VMs are created within an availability set to
provide for a highly available application and to meet the 99.95% Azure SLA. When a single VM is using Azure
Premium Storage, the Azure SLA applies for unplanned maintenance events. An availability set is composed of two
additional groupings that protect against hardware failures and allow updates to be safely applied: fault domains
(FDs) and update domains (UDs).
You can read more about how to manage the availability of Linux VMs or Windows VMs.
Fault domains
A fault domain is a logical group of underlying hardware that share a common power source and network switch,
similar to a rack within an on-premises datacenter. As you create VMs within an availability set, the Azure platform
automatically distributes your VMs across these fault domains. This approach limits the impact of potential physical
hardware failures, network outages, or power interruptions.
For VMs using Azure Managed Disks, VMs are aligned with managed disk fault domains when using a managed
availability set. This alignment ensures that all the managed disks attached to a VM are within the same managed disk
fault domain. Only VMs with managed disks can be created in a managed availability set. The number of managed disk
fault domains varies by region - either two or three managed disk fault domains per region.
Important
The number of fault domains for managed availability sets varies by region - either two or three per region
Update domains
An update domain is a logical group of underlying hardware that can undergo maintenance or be rebooted at the
same time. As you create VMs within an availability set, the Azure platform automatically distributes your VMs across
these update domains. This approach ensures that at least one instance of your application always remains running as
the Azure platform undergoes periodic maintenance. The order of update domains being rebooted may not proceed
sequentially during planned maintenance, but only one update domain is rebooted at a time.
There are two types of Microsoft Azure platform events that can affect the availability of your virtual machines:
planned maintenance and unplanned maintenance.
Planned maintenance events are periodic updates made by Microsoft to the underlying Azure platform to
improve overall reliability, performance, and security of the platform infrastructure that your virtual machines
run on. Most of these updates are performed without any impact upon your virtual machines or cloud
services. However, there are instances where these updates require a reboot of your virtual machine to apply
the required updates to the platform infrastructure.
Unplanned maintenance events occur when the hardware or physical infrastructure underlying your virtual
machine has faulted in some way. This may include local network failures, local disk failures, or other rack
level failures. When such a failure is detected, the Azure platform automatically migrates your virtual machine
from the unhealthy physical machine hosting your virtual machine to a healthy physical machine. Such events
are rare, but may also cause your virtual machine to reboot.
To reduce the impact of downtime due to one or more of these events, we recommend the following high availability
best practices for your virtual machines:
To provide redundancy to your application, we recommend that you group two or more virtual machines in an
availability set. This configuration ensures that during either a planned or unplanned maintenance event, at least one
virtual machine is available and meets the 99.95% Azure SLA. For more information, see the SLA for Virtual Machines.
Important
Avoid leaving a single instance virtual machine in an availability set by itself. VMs in this configuration do not qualify
for a SLA guarantee and face downtime during Azure planned maintenance events, except when a single VM is
using Azure Premium Storage. For single VMs using premium storage, the Azure SLA applies.
Each virtual machine in your availability set is assigned an update domain and a fault domain by the underlying Azure
platform. For a given availability set, five non-user-configurable update domains are assigned by default (for
Resource Manager deployments, this can be increased to up to 20 update domains) to indicate groups of virtual
machines and underlying physical hardware that can be rebooted at the same time. When more than five virtual
machines are configured within a single availability set, the sixth virtual machine is placed into the same update
domain as the first virtual machine, the seventh in the same update domain as the second virtual machine, and so on.
The order of update domains being rebooted may not proceed sequentially during planned maintenance, but only
one update domain is rebooted at a time.
Fault domains define the group of virtual machines that share a common power source and network switch. By
default, the virtual machines configured within your availability set are separated across up to three fault domains for
Resource Manager deployments (two fault domains for Classic). While placing your virtual machines into an
availability set does not protect your application from operating system or application-specific failures, it does limit
the impact of potential physical hardware failures, network outages, or power interruptions.
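In PowerShell these domain counts map directly onto availability-set creation parameters; a sketch (the resource group, location, and name are illustrative assumptions):

```powershell
# Create an availability set with 3 fault domains and 5 update domains
# (the Resource Manager defaults described above; up to 20 UDs are allowed).
New-AzureRmAvailabilitySet -ResourceGroupName TestRG -Location westus `
    -Name WebTierAVSet -PlatformFaultDomainCount 3 -PlatformUpdateDomainCount 5
```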
If you are currently using VMs with unmanaged disks, we highly recommend that you convert the VMs in an
availability set to use managed disks.
Managed disks provide better reliability for availability sets by ensuring that the disks of VMs in an availability set are
sufficiently isolated from each other to avoid single points of failure. They do this by automatically placing the disks in
different storage clusters. If a storage cluster fails due to a hardware or software failure, only the VM instances with
disks in those storage clusters fail.
If you plan to use VMs with unmanaged disks, follow the best practices below for storage accounts where the virtual
hard disks (VHDs) of VMs are stored as page blobs.
1. Keep all disks (OS and data) associated with a VM in the same storage account.
2. Review the limits on the number of unmanaged disks in a storage account before adding more VHDs to a
storage account.
3. Use a separate storage account for each VM in an availability set. Do not share storage accounts with
multiple VMs in the same availability set. It is acceptable for VMs across different availability sets to share
storage accounts if the best practices above are followed.
If your virtual machines are all nearly identical and serve the
same purpose for your application, we recommend that you
configure an availability set for each tier of your application. If
you place two different tiers in the same availability set, all
virtual machines in the same application tier can be rebooted
at once. By configuring at least two virtual machines in an
availability set for each tier, you guarantee that at least one
virtual machine in each tier is available.
For example, you could put all the virtual machines in the
front end of your application running IIS, Apache, or Nginx in
a single availability set. Make sure that only front-end virtual
machines are placed in the same availability set. Similarly,
make sure that only data-tier virtual machines are placed in
their own availability set, like your replicated SQL Server virtual machines or your MySQL virtual machines.
Combine the Azure Load Balancer with an availability set to get the most application resiliency. The Azure Load Balancer distributes traffic between multiple virtual machines. The load balancer is included with Standard tier virtual machines; not all virtual machine tiers include it. For more information about load balancing your virtual machines, see Load Balancing virtual machines.
If the load balancer is not configured to balance traffic across multiple virtual machines, then any planned
maintenance event affects the only traffic-serving virtual machine, causing an outage to your application tier. Placing
multiple virtual machines of the same tier under the same load balancer and availability set enables traffic to be
continuously served by at least one instance.
This article describes the available sizes and options for the Azure virtual machines you can use to run your Windows
apps and workloads. It also provides deployment considerations to be aware of when you're planning to use these
resources. This article is also available for Linux virtual machines.
General purpose (DSv2, Dv2, DS, D, Av2, A0-7): Balanced CPU-to-memory ratio. Ideal for testing and development, small to medium databases, and low to medium traffic web servers.
Compute optimized (Fs, F): High CPU-to-memory ratio. Good for medium traffic web servers, network appliances, batch processes, and application servers.
Storage optimized (Ls): High disk throughput and IO. Ideal for Big Data, SQL, and NoSQL databases.
High performance compute (H, A8-11): Our fastest and most powerful CPU virtual machines with optional high-throughput network interfaces (RDMA).
Template format
$schema (Yes): Location of the JSON schema file that describes the version of the template language. Use the URL shown in the preceding example.
contentVersion (Yes): Version of the template (such as 1.0.0.0). You can provide any value for this element. When deploying resources using the template, this value can be used to make sure that the right template is being used.
parameters (No): Values that are provided when deployment is executed to customize resource deployment.
variables (No): Values that are used as JSON fragments in the template to simplify template language expressions.
resources (Yes): Resource types that are deployed or updated in a resource group.
outputs (No): Values that are returned after deployment.
Each element contains properties you can set. The following example contains the full syntax for a template:
{
  "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "",
  "parameters": {
    "<parameter-name>": {
      "type": "<type-of-parameter-value>",
      "defaultValue": "<default-value-of-parameter>",
      "allowedValues": [ "<array-of-allowed-values>" ],
      "minValue": <minimum-value-for-int>,
      "maxValue": <maximum-value-for-int>,
      "minLength": <minimum-length-for-string-or-array>,
      "maxLength": <maximum-length-for-string-or-array-parameters>,
      "metadata": {
        "description": "<description-of-the-parameter>"
      }
    }
  },
  "variables": {
    "<variable-name>": "<variable-value>",
    "<variable-name>": {
      <variable-complex-type-value>
    }
  },
  "resources": [
    {
      "condition": "<boolean-value-whether-to-deploy>",
      "apiVersion": "<api-version-of-resource>",
      "type": "<resource-provider-namespace/resource-type-name>",
      "name": "<name-of-the-resource>",
      "location": "<location-of-resource>",
      "tags": {
        "<tag-name1>": "<tag-value1>",
        "<tag-name2>": "<tag-value2>"
      },
      "comments": "<your-reference-notes>",
      "copy": {
        "name": "<name-of-copy-loop>",
        "count": "<number-of-iterations>",
        "mode": "<serial-or-parallel>",
        "batchSize": "<number-to-deploy-serially>"
      },
      "dependsOn": [
        "<array-of-related-resource-names>"
      ],
      "properties": {
        "<settings-for-the-resource>",
        "copy": [
          {
            "name": "<name-of-loop>",
            "count": <number-of-iterations>,
            "input": {}
          }
        ]
      },
      "resources": [
        "<array-of-child-resources>"
      ]
    }
  ],
  "outputs": {
    "<outputName>": {
      "type": "<type-of-output-value>",
      "value": "<output-value-expression>"
    }
  }
}
We examine the sections of the template in greater detail later in this topic.
The basic syntax of the template is JSON. However, expressions and functions extend the JSON values available within
the template. Expressions are written within JSON string literals whose first and last characters are the
brackets: [ and ], respectively. The value of the expression is evaluated when the template is deployed. While written
as a string literal, the result of evaluating the expression can be of a different JSON type, such as an array or integer,
depending on the actual expression. To have a literal string start with a bracket [, but not have it interpreted as an
expression, add an extra bracket to start the string with [[.
Typically, you use expressions with functions to perform operations for configuring the deployment. Just like in
JavaScript, function calls are formatted as functionName(arg1,arg2,arg3). You reference properties by using the dot
and [index] operators.
The following example shows how to use several functions when constructing values:
"variables": {
"location": "[resourceGroup().location]",
"usernameAndPassword": "[concat(parameters('username'), ':', parameters('password'))]",
"authorizationHeader": "[concat('Basic ', base64(variables('usernameAndPassword')))]"
}
For the full list of template functions, see Azure Resource Manager template functions.
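For comparison, the authorizationHeader value above can be reproduced outside the template. A Python sketch of the same concat and base64 steps (the function name is illustrative, not part of any Azure SDK):

```python
import base64

def authorization_header(username: str, password: str) -> str:
    # Mirrors the template expression:
    #   concat('Basic ', base64(concat(username, ':', password)))
    user_and_pass = f"{username}:{password}"
    encoded = base64.b64encode(user_and_pass.encode("utf-8")).decode("ascii")
    return "Basic " + encoded

# authorization_header("user", "pass") -> "Basic dXNlcjpwYXNz"
```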
Parameters
In the parameters section of the template, you specify which values you can input when deploying the resources.
These parameter values enable you to customize the deployment by providing values that are tailored for a particular
environment (such as dev, test, and production). You do not have to provide parameters in your template, but
without parameters your template would always deploy the same resources with the same names, locations, and
properties.
type (Yes): Type of the parameter value. See the list of allowed types after this table.
defaultValue (No): Default value for the parameter, used if no value is provided.
allowedValues (No): Array of allowed values for the parameter, to make sure that the right value is provided.
minValue (No): The minimum value for int type parameters; this value is inclusive.
maxValue (No): The maximum value for int type parameters; this value is inclusive.
minLength (No): The minimum length for string, secureString, and array type parameters; this value is inclusive.
maxLength (No): The maximum length for string, secureString, and array type parameters; this value is inclusive.
The allowed types are: string, secureString, int, bool, object, secureObject, and array.
To specify a parameter as optional, provide a defaultValue (can be an empty string).
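The constraints above combine into a simple validation routine. A hypothetical Python validator (invented for illustration, not part of any Azure SDK) makes the rules concrete:

```python
def validate_parameter(definition: dict, value=None):
    # Returns the effective value, or raises ValueError when the value
    # violates the definition's constraints. A missing value falls back
    # to defaultValue; with no defaultValue the parameter is required.
    if value is None:
        if "defaultValue" not in definition:
            raise ValueError("parameter is required: no value and no defaultValue")
        value = definition["defaultValue"]
    allowed = definition.get("allowedValues")
    if allowed is not None and value not in allowed:
        raise ValueError(f"{value!r} is not an allowed value")
    if definition["type"] == "int":
        if "minValue" in definition and value < definition["minValue"]:
            raise ValueError("below minValue")
        if "maxValue" in definition and value > definition["maxValue"]:
            raise ValueError("above maxValue")
    if definition["type"] in ("string", "secureString", "array"):
        if "minLength" in definition and len(value) < definition["minLength"]:
            raise ValueError("shorter than minLength")
        if "maxLength" in definition and len(value) > definition["maxLength"]:
            raise ValueError("longer than maxLength")
    return value
```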
If you specify a parameter name in your template that matches a parameter in the command to deploy the template,
there is potential ambiguity about the values you provide. Resource Manager resolves this confusion by adding the
postfix FromTemplate to the template parameter. For example, if you include a parameter
named ResourceGroupName in your template, it conflicts with the ResourceGroupName parameter in the New-
AzureRmResourceGroupDeployment cmdlet. During deployment, you are prompted to provide a value
for ResourceGroupNameFromTemplate. In general, you should avoid this confusion by not naming parameters with
the same name as parameters used for deployment operations.
Note
All passwords, keys, and other secrets should use the secureString type. If you pass sensitive data in a JSON object, use
the secureObject type. Template parameters with secureString or secureObject types cannot be read after resource
deployment.
For example, the following entry in the deployment history shows the value for a string and object but not for
secureString and secureObject.
"P4"
]
},
"skuCapacity": {
"type": "int",
"defaultValue": 1,
"minValue": 1
}
}
For how to input the parameter values during deployment, see Deploy an application with Azure Resource Manager
template.
Variables
In the variables section, you construct values that can be used throughout your template. You do not need to define
variables, but they often simplify your template by reducing complex expressions.
Resources
In the resources section, you define the resources that are deployed or updated. This section can get complicated
because you must understand the types you are deploying to provide the right values. For the resource-specific values
(apiVersion, type, and properties) that you need to set, see Define resources in Azure Resource Manager templates.
"resources": [
{
"condition": "<boolean-value-whether-to-deploy>",
"apiVersion": "<api-version-of-resource>",
"type": "<resource-provider-namespace/resource-type-name>",
"name": "<name-of-the-resource>",
"location": "<location-of-resource>",
"tags": {
"<tag-name1>": "<tag-value1>",
"<tag-name2>": "<tag-value2>"
},
"comments": "<your-reference-notes>",
"copy": {
"name": "<name-of-copy-loop>",
"count": "<number-of-iterations>",
"mode": "<serial-or-parallel>",
"batchSize": "<number-to-deploy-serially>"
},
"dependsOn": [
"<array-of-related-resource-names>"
],
"properties": {
"<settings-for-the-resource>",
"copy": [
{
"name": ,
"count": ,
"input": {}
}
]
},
"resources": [
"<array-of-child-resources>"
]
}
]
apiVersion (Yes): Version of the REST API to use for creating the resource.
type (Yes): Type of the resource. This value is a combination of the namespace of the resource provider and the resource type (such as Microsoft.Storage/storageAccounts).
name (Yes): Name of the resource. The name must follow URI component restrictions defined in RFC3986. In addition, Azure services that expose the resource name to outside parties validate the name to make sure it is not an attempt to spoof another identity.
location (Varies): Supported geo-locations of the provided resource. You can select any of the available locations, but typically it makes sense to pick one that is close to your users. Usually, it also makes sense to place resources that interact with each other in the same region. Most resource types require a location, but some types (such as a role assignment) do not. See Set resource location in Azure Resource Manager templates.
tags (No): Tags that are associated with the resource. See Tag resources in Azure Resource Manager templates.
copy (No): If more than one instance is needed, the number of resources to create. The default mode is parallel. Specify serial mode when you do not want all of the resources to deploy at the same time. For more information, see Create multiple instances of resources in Azure Resource Manager.
dependsOn (No): Resources that must be deployed before this resource is deployed. Resource Manager evaluates the dependencies between resources and deploys them in the correct order. When resources are not dependent on each other, they are deployed in parallel. The value can be a comma-separated list of resource names or resource unique identifiers. Only list resources that are deployed in this template; resources that are not defined in this template must already exist. Avoid adding unnecessary dependencies, as they can slow your deployment and create circular dependencies. For guidance on setting dependencies, see Defining dependencies in Azure Resource Manager templates.
properties (No): Resource-specific configuration settings. The values for the properties are the same as the values you provide in the request body for the REST API operation (PUT method) to create the resource. You can also specify a copy array to create multiple instances of a property. For more information, see Create multiple instances of resources in Azure Resource Manager.
resources (No): Child resources that depend on the resource being defined. Only provide resource types that are permitted by the schema of the parent resource. The fully qualified type of the child resource includes the parent resource type, such as Microsoft.Web/sites/extensions. Dependency on the parent resource is not implied; you must explicitly define that dependency.
The resources section contains an array of the resources to deploy. Within each resource, you can also define an array
of child resources. Therefore, your resources section could have a structure like:
"resources": [
{
"name": "resourceA",
69 | P a g e
70-534 Architecting Microsoft Azure Solutions
},
{
"name": "resourceB",
"resources": [
{
"name": "firstChildResourceB",
},
{
"name": "secondChildResourceB",
}
]
},
{
"name": "resourceC",
}
]
For more information about defining child resources, see Set name and type for child resource in Resource Manager
template.
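The parent/child scoping above amounts to a small tree walk. The illustrative Python helper below (not an Azure API; names are hypothetical) flattens a nested resources array into parent-qualified names:

```python
def resource_names(resources, prefix=""):
    # Flattens a nested "resources" array into qualified names, where a
    # child's name is scoped under its parent as "parent/child".
    names = []
    for res in resources:
        qualified = f"{prefix}{res['name']}"
        names.append(qualified)
        names.extend(resource_names(res.get("resources", []), qualified + "/"))
    return names
```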
The condition element specifies whether the resource is deployed. The value for this element resolves to true or false.
For example, to specify whether a new storage account is deployed, use:
{
  "condition": "[equals(parameters('newOrExisting'),'new')]",
  "type": "Microsoft.Storage/storageAccounts",
  "name": "[variables('storageAccountName')]",
  "apiVersion": "2017-06-01",
  "location": "[resourceGroup().location]",
  "sku": {
    "name": "[variables('storageAccountType')]"
  },
  "kind": "Storage",
  "properties": {}
}
For an example of using a new or existing resource, see New or existing condition template.
To specify whether a virtual machine is deployed with a password or SSH key, define two versions of the virtual
machine in your template and use condition to differentiate usage. Pass a parameter that specifies which scenario to
deploy.
{
  "condition": "[equals(parameters('passwordOrSshKey'),'password')]",
  "apiVersion": "2016-03-30",
  "type": "Microsoft.Compute/virtualMachines",
  "name": "[concat(variables('vmName'),'password')]",
  "properties": {
    "osProfile": {
      "computerName": "[variables('vmName')]",
      "adminUsername": "[parameters('adminUsername')]",
      "adminPassword": "[parameters('adminPassword')]"
    },
    ...
  },
  ...
},
{
  "condition": "[equals(parameters('passwordOrSshKey'),'sshKey')]",
  "apiVersion": "2016-03-30",
  "type": "Microsoft.Compute/virtualMachines",
  "name": "[concat(variables('vmName'),'ssh')]",
  "properties": {
    "osProfile": {
      "linuxConfiguration": {
        "disablePasswordAuthentication": "true",
        "ssh": {
          "publicKeys": [
            {
              "path": "[variables('sshKeyPath')]",
              "keyData": "[parameters('adminSshKey')]"
            }
          ]
        }
      }
    },
    ...
  },
  ...
}
For an example of using a password or SSH key to deploy a virtual machine, see Username or SSH condition template.
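Conceptually, Resource Manager evaluates each resource's condition and deploys only the resources whose condition resolves to true. A minimal Python sketch of that selection, handling only the equals(parameters(...), '...') pattern used above (the helper and the resource names in the test are hypothetical):

```python
import re

def resources_to_deploy(resources, parameters):
    # Evaluates a narrow subset of condition expressions of the form
    # "[equals(parameters('<name>'),'<value>')]"; illustrative only,
    # not Resource Manager's real expression evaluator.
    deployed = []
    for res in resources:
        condition = res.get("condition")
        if condition is None:
            # A resource without a condition is always deployed.
            deployed.append(res["name"])
            continue
        match = re.fullmatch(r"\[equals\(parameters\('(\w+)'\),'(\w+)'\)\]", condition)
        if match and parameters.get(match.group(1)) == match.group(2):
            deployed.append(res["name"])
    return deployed
```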
Outputs
In the Outputs section, you specify values that are returned from deployment. For example, you could return the URI
to access a deployed resource.
outputName (Yes): Name of the output value. Must be a valid JavaScript identifier.
type (Yes): Type of the output value. Output values support the same types as template input parameters.
value (Yes): Template language expression that is evaluated and returned as the output value.
The following example shows a value that is returned in the Outputs section.
"outputs": {
"siteUri" : {
"type" : "string",
"value": "[concat('http://',reference(resourceId('Microsoft.Web/sites',
parameters('siteName'))).hostNames[0])]"
}
}
For more information about working with output, see Sharing state in Azure Resource Manager templates.
Template limits
Limit the size of your template to 1 MB, and each parameter file to 64 KB. The 1-MB limit applies to the final state of
the template after it has been expanded with iterative resource definitions, and values for variables and parameters.
You are also limited to:
256 parameters
256 variables
64 output values
You can exceed some template limits by using a nested template. For more information, see Using linked templates
when deploying Azure resources. To reduce the number of parameters, variables, or outputs, you can combine
several values into an object. For more information, see Objects as parameters.
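The documented limits can be captured in a quick pre-flight check. A Python sketch (the helper and its messages are invented for illustration; Resource Manager enforces the real limits at deployment time):

```python
TEMPLATE_LIMITS = {
    "template_bytes": 1 * 1024 * 1024,   # 1 MB, after template expansion
    "parameter_file_bytes": 64 * 1024,   # 64 KB per parameter file
    "parameters": 256,
    "variables": 256,
    "outputs": 64,
}

def check_limits(template: dict, template_bytes: int):
    # Returns a list of limit violations; an empty list means the
    # template is within the documented limits.
    problems = []
    if template_bytes > TEMPLATE_LIMITS["template_bytes"]:
        problems.append("template larger than 1 MB")
    if len(template.get("parameters", {})) > TEMPLATE_LIMITS["parameters"]:
        problems.append("more than 256 parameters")
    if len(template.get("variables", {})) > TEMPLATE_LIMITS["variables"]:
        problems.append("more than 256 variables")
    if len(template.get("outputs", {})) > TEMPLATE_LIMITS["outputs"]:
        problems.append("more than 64 output values")
    return problems
```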
The Resource Manager template you deploy can either be a local file on your machine or an external file located in a repository like GitHub. The template you deploy in this article is available in the Sample template section, or as a storage account template in GitHub.
If needed, install the Azure PowerShell module using the instructions found in the Azure PowerShell guide, and then
run Login-AzureRmAccount to create a connection with Azure. Also, you need to have an SSH public key
named id_rsa.pub in the .ssh directory of your user profile.
The following example creates a resource group, and deploys a template from your local machine:
Login-AzureRmAccount
New-AzureRmResourceGroup -Name ExampleResourceGroup -Location "South Central US"
New-AzureRmResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
  -TemplateFile c:\MyTemplates\storage.json -storageAccountType Standard_GRS
Instead of storing Resource Manager templates on your local machine, you may prefer to store them in an external
location. You can store templates in a source control repository (such as GitHub). Or, you can store them in an Azure
storage account for shared access in your organization.
To deploy an external template, use the TemplateUri parameter. Use the URI in the example to deploy the sample
template from GitHub.
New-AzureRmResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
-TemplateUri https://raw.githubusercontent.com/Azure/azure-quickstart-templates/master/101-storage-
account-create/azuredeploy.json `
-storageAccountType Standard_GRS
The preceding example requires a publicly accessible URI for the template, which works for most scenarios because
your template should not include sensitive data. If you need to specify sensitive data (like an admin password), pass
that value as a secure parameter. However, if you do not want your template to be publicly accessible, you can
protect it by storing it in a private storage container. For information about deploying a template that requires a
shared access signature (SAS) token, see Deploy private template with SAS token.
Parameter files
Rather than passing parameters as inline values in your script, you may find it easier to use a JSON file that contains
the parameter values. The parameter file must be in the following format:
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountType": {
      "value": "Standard_GRS"
    }
  }
}
Notice that the parameters section includes a parameter name that matches the parameter defined in your template
(storageAccountType). The parameter file contains a value for the parameter. This value is automatically passed to the
template during deployment. You can create multiple parameter files for different deployment scenarios, and then
pass in the appropriate parameter file.
You can use inline parameters and a local parameter file in the same deployment operation. For example, you can
specify some values in the local parameter file and add other values inline during deployment. If you provide values
for a parameter in both the local parameter file and inline, the inline value takes precedence.
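The precedence rule reduces to a dictionary merge in which inline values win. A minimal Python sketch (assuming parameters are represented as plain name-to-value dicts; the helper name is invented):

```python
def effective_parameters(file_params: dict, inline_params: dict) -> dict:
    # Local parameter file values are applied first; inline values then
    # override any overlapping names, so inline takes precedence.
    merged = dict(file_params)
    merged.update(inline_params)
    return merged
```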
However, when you use an external parameter file, you cannot pass other values either inline or from a local file.
When you specify a parameter file in the TemplateParameterUri parameter, all inline parameters are ignored. Provide
all parameter values in the external file. If your template includes a sensitive value that you cannot include in the parameter file, either add that value to a key vault, or dynamically provide all parameter values inline.
If your template includes a parameter with the same name as one of the parameters in the PowerShell command,
PowerShell presents the parameter from your template with the postfix FromTemplate. For example, a parameter
named ResourceGroupName in your template conflicts with the ResourceGroupName parameter in the New-
AzureRmResourceGroupDeployment cmdlet. You are prompted to provide a value
for ResourceGroupNameFromTemplate. In general, you should avoid this confusion by not naming parameters with
the same name as parameters used for deployment operations.
To test your template and parameter values without actually deploying any resources, use Test-AzureRmResourceGroupDeployment.
Test-AzureRmResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName ExampleResourceGroup `
-TemplateFile c:\MyTemplates\storage.json -storageAccountType Standard_GRS
If no errors are detected, the command finishes without a response. If an error is detected, the command returns an error message. For example, attempting to pass an incorrect value for the storage account SKU returns the following error:
Test-AzureRmResourceGroupDeployment -ResourceGroupName testgroup `
-TemplateFile c:\MyTemplates\storage.json -storageAccountType badSku
Code : InvalidTemplate
Message : Deployment template validation failed: 'The provided value 'badSku' for the template parameter
'storageAccountType'
at line '15' and column '24' is not valid. The parameter value is not part of the allowed
value(s):
'Standard_LRS,Standard_ZRS,Standard_GRS,Standard_RAGRS,Premium_LRS'.'.
Details :
If your template has a syntax error, the command returns an error indicating it could not parse the template. The
message indicates the line number and position of the parsing error.
Test-AzureRmResourceGroupDeployment : After parsing a value an unexpected character was encountered:
". Path 'variables', line 31, position 3.
When deploying your resources, you specify that the deployment is either an incremental update or a complete
update. The primary difference between these two modes is how Resource Manager handles existing resources in the
resource group that are not in the template:
In complete mode, Resource Manager deletes resources that exist in the resource group but are not specified
in the template.
In incremental mode, Resource Manager leaves unchanged resources that exist in the resource group but are
not specified in the template.
For both modes, Resource Manager attempts to provision all resources specified in the template. If the resource
already exists in the resource group and its settings are unchanged, the operation results in no change. If you change
the settings for a resource, the resource is provisioned with those new settings. If you attempt to update the location
or type of an existing resource, the deployment fails with an error. Instead, deploy a new resource with the location or
type that you need.
To illustrate the difference between incremental and complete modes, consider a resource group that contains Resource A, Resource B, and Resource C, and a template that defines Resource A, Resource B, and Resource D.
When deployed in incremental mode, the resource group contains:
Resource A
Resource B
Resource C
Resource D
When deployed in complete mode, Resource C is deleted. The resource group contains:
Resource A
Resource B
Resource D
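The two modes reduce to simple set arithmetic over resource names. A Python sketch (illustrative only; the helper name is invented):

```python
def resulting_resources(existing: set, template: set, mode: str) -> set:
    # Complete mode: resources in the group but not in the template are
    # deleted, so the result is exactly what the template defines.
    # Incremental mode: resources not in the template are left as-is.
    if mode == "Complete":
        return set(template)
    if mode == "Incremental":
        return existing | template
    raise ValueError("mode must be 'Complete' or 'Incremental'")
```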
To use complete mode, use the Mode parameter:
New-AzureRmResourceGroupDeployment -Mode Complete -Name ExampleDeployment `
-ResourceGroupName ExampleResourceGroup -TemplateFile c:\MyTemplates\storage.json
Sample template
The following template is used for the examples in this topic. Copy and save it as a file named storage.json. To
understand how to create this template, see Create your first Azure Resource Manager template.
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "storageAccountType": {
      "type": "string",
      "defaultValue": "Standard_LRS",
      "allowedValues": [
        "Standard_LRS",
        "Standard_GRS",
        "Standard_ZRS",
        "Premium_LRS"
      ],
      "metadata": {
        "description": "Storage Account type"
      }
    }
  },
  "variables": {
    "storageAccountName": "[concat(uniquestring(resourceGroup().id), 'standardsa')]"
  },
  "resources": [
    {
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[variables('storageAccountName')]",
      "apiVersion": "2016-01-01",
      "location": "[resourceGroup().location]",
      "sku": {
        "name": "[parameters('storageAccountType')]"
      },
      "kind": "Storage",
      "properties": {}
    }
  ],
  "outputs": {
    "storageAccountName": {
      "type": "string",
      "value": "[variables('storageAccountName')]"
    }
  }
}
Monitor application usage and protect your business from advanced threats with security reporting and
monitoring.
Secure mobile (remote) access to on-premises applications.
How does Azure AD compare to on-premises Active Directory Domain Services (AD DS)?
Both Azure Active Directory (Azure AD) and on-premises Active Directory (Active Directory Domain Services or AD DS)
are systems that store directory data and manage communication between users and resources, including user logon
processes, authentication, and directory searches.
AD DS is a server role on Windows Server, which means that it can be deployed on physical or virtual machines. It has
a hierarchical structure based on X.500. It uses DNS for locating objects, can be interacted with using LDAP, and it
primarily uses Kerberos for authentication. Active Directory enables organizational units (OUs) and Group Policy
Objects (GPOs) in addition to joining machines to the domain, and trusts are created between domains.
Azure AD is a multi-customer public directory service, which means that within Azure AD you can create a tenant for
your cloud servers and applications such as Office 365. Users and groups are created in a flat structure without OUs or
GPOs. Authentication is performed through protocols such as SAML, WS-Federation, and OAuth. It's possible to query
Azure AD, but instead of using LDAP you must use a REST API called AD Graph API. These all work over HTTP and
HTTPS.
Get started with Azure Active Directory Identity Protection & Graph API
Microsoft Graph is Microsoft's unified API endpoint and the home of Azure Active Directory Identity Protection's APIs.
Our first API, identityRiskEvents, allows you to query Microsoft Graph for a list of risk events and associated
information. This article gets you started querying this API. For an in-depth introduction, full documentation, and
access to the Graph Explorer, see the Microsoft Graph site.
There are three steps to accessing Identity Protection data through Microsoft Graph:
1. Add an application with a client secret.
2. Use this secret and a few other pieces of information to authenticate to Microsoft Graph, where you receive
an authentication token.
3. Use this token to make requests to the API endpoint and get Identity Protection data back.
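Steps 2 and 3 amount to building one POST request for a token and one authenticated GET to the API. A Python sketch of the token request's URL and body, mirroring the parameters described in this section (token_request is a hypothetical helper; it only builds the request and does not send it):

```python
from urllib.parse import urlencode

def token_request(tenant_domain: str, client_id: str, client_secret: str):
    # Step 2: the POST that exchanges the client secret for a token.
    # Endpoint and body fields follow the document's PowerShell sample.
    url = f"https://login.microsoft.com/{tenant_domain}/oauth2/token?api-version=1.0"
    body = urlencode({
        "grant_type": "client_credentials",
        "resource": "https://graph.microsoft.com",
        "client_id": client_id,
        "client_secret": client_secret,
    })
    return url, body
```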
Before you get started, you'll need:
Administrator privileges to create the application in Azure AD
The name of your tenant's domain (for example, contoso.onmicrosoft.com)
Add an application with a client secret
1. Sign in to your Azure classic portal as an administrator.
2. On the left navigation pane, click Active Directory.
3. From the Directory list, select the directory for which you want to enable directory integration.
4. In the menu on the top, click Applications.
5. In the menu on the bottom, click Add.
6. On the What do you want to do dialog, click Add an application my organization is developing.
7. On the Tell us about your application dialog, perform the following steps:
a. In the Name textbox, type a name for your application (e.g.: AADIP Risk Event API Application).
b. As Type, select Web Application And / Or Web API.
c. Click Next.
8. On the App properties dialog, perform the following steps:
3. In the Keys section, copy the value of your newly created key, and then paste it into a safe location.
Note
If you lose this key, you will have to return to this section and create a new key. Keep this key a secret: anyone who
has it can access your data.
4. In the properties section, copy the Client ID, and then paste it into a safe location.
Authenticate to Microsoft Graph and query the Identity Risk Events API
At this point, you should have:
The client ID you copied above
The key you copied above
The name of your tenant's domain
To authenticate, send a POST request to https://login.microsoft.com/<tenant domain>/oauth2/token?api-version=1.0 with the following parameters in the body:
grant_type: client_credentials
resource: https://graph.microsoft.com
client_id:
client_secret:
Note
You need to provide values for the client_id and client_secret parameters.
If successful, this returns an authentication token.
To call the API, create a header with the following parameter:
Authorization: <token_type> <access_token>
When authenticating, you can find the token type and access token in the returned token.
Send this header as a request to the following API URL: https://graph.microsoft.com/beta/identityRiskEvents
The response, if successful, is a collection of identity risk events and associated data in the OData JSON format, which
can be parsed and handled as you see fit.
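Because the response is an OData JSON collection, the risk events sit under a top-level value array. A minimal Python sketch of the parsing step (the helper name and the sample payload in the test are invented for illustration):

```python
import json

def risk_events(response_body: str):
    # An OData collection wraps results in a top-level "value" array;
    # this pulls the individual risk event objects back out.
    return json.loads(response_body)["value"]
```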
Here's sample code for authenticating and calling the API using PowerShell.
Just add your client ID, key, and tenant domain.
$ClientID     = "<your client ID here>"       # Should be a ~36 hex character string; insert your info here
$ClientSecret = "<your client secret here>"   # Should be a ~44 character string; insert your info here
$tenantdomain = "<your tenant domain here>"   # For example, contoso.onmicrosoft.com
$loginURL     = "https://login.microsoft.com"
$resource     = "https://graph.microsoft.com"
$body         = @{grant_type="client_credentials";resource=$resource;client_id=$ClientID;client_secret=$ClientSecret}
$oauth        = Invoke-RestMethod -Method Post -Uri $loginURL/$tenantdomain/oauth2/token?api-version=1.0 -Body $body
Write-Output $oauth
if ($oauth.access_token -ne $null) {
    $headerParams = @{'Authorization'="$($oauth.token_type) $($oauth.access_token)"}
    $url = "https://graph.microsoft.com/beta/identityRiskEvents"
    Write-Output $url
    $myReport = (Invoke-WebRequest -UseBasicParsing -Headers $headerParams -Uri $url)
    Write-Output $myReport.Content
} else {
    Write-Host "ERROR: No Access Token"
}
The metadata is a simple JavaScript Object Notation (JSON) document. See the following snippet for an example. The
snippet's contents are fully described in the OpenID Connect specification.
{
"authorization_endpoint": "https:\/\/login.microsoftonline.com\/common\/oauth2\/v2.0\/authorize",
"token_endpoint": "https:\/\/login.microsoftonline.com\/common\/oauth2\/v2.0\/token",
"token_endpoint_auth_methods_supported": [
"client_secret_post",
"private_key_jwt"
],
"jwks_uri": "https:\/\/login.microsoftonline.com\/common\/discovery\/v2.0\/keys",
...
}
Typically, you would use this metadata document to configure an OpenID Connect library or SDK; the library would
use the metadata to do its work. However, if you're not using a pre-built OpenID Connect library, you can follow the
steps in the remainder of this article to perform sign-in in a web app by using the v2.0 endpoint.
Send the sign-in request
When your web app needs to authenticate the user, it can direct the user to the /authorize endpoint. This request is
similar to the first leg of the OAuth 2.0 authorization code flow, with these important distinctions:
The request must include the openid scope in the scope parameter.
The response_type parameter must include id_token.
The request must include the nonce parameter.
For example:
// Line breaks are for legibility only.
GET https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize?
client_id=6731de76-14a6-49ae-97bc-6eba6914391e
&response_type=id_token
&redirect_uri=http%3A%2F%2Flocalhost%2Fmyapp%2F
&response_mode=form_post
&scope=openid
&state=12345
&nonce=678910
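A sketch of how an app might assemble this request; the Python helper below is illustrative (the helper name is invented, and the test reuses the sample values from the request above):

```python
from urllib.parse import urlencode

def authorize_url(tenant: str, client_id: str, redirect_uri: str, nonce: str, state: str) -> str:
    # Builds the OpenID Connect sign-in request shown above: the openid
    # scope, an id_token response type, and a nonce are all required.
    params = {
        "client_id": client_id,
        "response_type": "id_token",
        "redirect_uri": redirect_uri,
        "response_mode": "form_post",
        "scope": "openid",
        "state": state,
        "nonce": nonce,
    }
    return f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize?{urlencode(params)}"
```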
Tip: Click the following link to execute this request. After you sign in, your browser is redirected to https://localhost/myapp/, with an ID token in the address bar. Note that this request uses response_mode=query (for demonstration purposes only); we recommend that you use response_mode=form_post.
https://login.microsoftonline.com/common/oauth2/v2.0/authorize...
tenant (Required): You can use the {tenant} value in the path of the request to control who can sign in to the application. The allowed values are common, organizations, consumers, and tenant identifiers. For more information, see protocol basics.
response_type (Required): Must include id_token for OpenID Connect sign-in. It might also include other response_type values, such as code.
redirect_uri (Recommended): The redirect URI of your app, where authentication responses can be sent and received by your app. It must exactly match one of the redirect URIs you registered in the portal, except that it must be URL encoded.
nonce (Required): A value included in the request, generated by the app, that will be included in the resulting id_token value as a claim. The app can verify this value to mitigate token replay attacks. The value typically is a randomized, unique string that can be used to identify the origin of the request.
response_mode (Recommended): Specifies the method that should be used to send the resulting authorization code back to your app. Can be one of query, form_post, or fragment. For web applications, we recommend using response_mode=form_post to ensure the most secure transfer of tokens to your application.
state (Recommended): A value included in the request that also will be returned in the token response. It can be a string of any content you want. A randomly generated unique value typically is used to prevent cross-site request forgery attacks. The state is also used to encode information about the user's state in the app before the authentication request occurred, such as the page or view the user was on.
prompt (Optional): Indicates the type of user interaction that is required. The only valid values at this time are login, none, and consent. The prompt=login claim forces the user to enter their credentials on that request, which negates single sign-on. The prompt=none claim is the opposite: it ensures that the user is not presented with any interactive prompt whatsoever.
83 | P a g e
70-534 Architecting Microsoft Azure Solutions
login_hint Optional You can use this parameter to pre-fill the username and email
address field of the sign-in page for the user, if you know the
username ahead of time. Often, apps use this parameter during
re-authentication, after already extracting the username from an
earlier sign-in by using the preferred_username claim.
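The state and nonce values described above must be unpredictable to serve their anti-forgery and anti-replay roles. A minimal way to generate them in Python, using the standard secrets module:

```python
# Generate unpredictable state and nonce values for a sign-in request.
# state guards against cross-site request forgery; nonce is echoed back
# inside the id_token and guards against token replay.
import secrets

def new_state_and_nonce():
    state = secrets.token_urlsafe(32)  # returned unchanged in the response
    nonce = secrets.token_urlsafe(32)  # must reappear as a claim in the id_token
    return state, nonce

# Store both server-side (for example, in the user's session) before
# redirecting, so they can be checked when the response comes back.
```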
At this point, the user is prompted to enter their credentials and complete the authentication. The v2.0 endpoint
verifies that the user has consented to the permissions indicated in the scope query parameter. If the user has not
consented to any of those permissions, the v2.0 endpoint prompts the user to consent to the required permissions.
You can read more about permissions, consent, and multitenant apps.
After the user authenticates and grants consent, the v2.0 endpoint returns a response to your app at the indicated
redirect URI by using the method specified in the response_mode parameter.
Successful response
A successful response when you use response_mode=form_post looks like this:
POST /myapp/ HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded
id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik1uQ19WWmNB...&state=12345
id_token: The ID token that the app requested. You can use the id_token parameter to verify the user's identity and
begin a session with the user. For more details about ID tokens and their contents, see the v2.0 endpoint tokens
reference.
state: If a state parameter is included in the request, the same value should appear in the response. The app should
verify that the state values in the request and response are identical.
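Handling this form_post response can be sketched as follows, assuming the state generated for the request was saved in the user's session beforehand (the function name is illustrative):

```python
# Parse the application/x-www-form-urlencoded POST body and verify state.
import hmac
from urllib.parse import parse_qs

def handle_signin_response(body: str, expected_state: str) -> str:
    """Return the raw id_token if the state matches; raise otherwise."""
    fields = parse_qs(body)
    state = fields.get("state", [""])[0]
    # Constant-time comparison avoids timing side channels.
    if not hmac.compare_digest(state, expected_state):
        raise ValueError("state mismatch: possible cross-site request forgery")
    return fields["id_token"][0]

token = handle_signin_response("id_token=eyJ0eXAi...&state=12345", "12345")
```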
Error response: Error responses might also be sent to the redirect URI so that the app can handle them. An error
response looks like this:
POST /myapp/ HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded
error=access_denied&error_description=the+user+canceled+the+authentication
error: An error code string that you can use to classify types of errors that occur, and to react to errors.
error_description: A specific error message that can help you identify the root cause of an authentication error.
Common error codes, with the recommended client response for each:
invalid_request: Protocol error, such as a missing required parameter. Fix and resubmit the request. This is a
development error that typically is caught during initial testing.
unauthorized_client: The client application cannot request an authorization code. This usually occurs when the client
application is not registered in Azure AD or is not added to the user's Azure AD tenant. The application can prompt
the user with instructions to install the application and add it to Azure AD.
access_denied: The resource owner denied consent. The client application can notify the user that it cannot proceed
unless the user consents.
unsupported_response_type: The authorization server does not support the response type in the request. Fix and
resubmit the request. This is a development error that typically is caught during initial testing.
server_error: The server encountered an unexpected error. Retry the request. These errors can result from
temporary conditions. The client application might explain to the user that its response is delayed due to a
temporary error.
temporarily_unavailable: The server is temporarily too busy to handle the request. Retry the request. The client
application might explain to the user that its response is delayed due to a temporary condition.
When you send a sign-out request to the end_session_endpoint, you can include the following parameter:
post_logout_redirect_uri (Recommended): The URL that the user is redirected to after successfully signing out. If the
parameter is not included, the user is shown a generic message that's generated by the v2.0 endpoint. This URL must
match one of the redirect URIs registered for your application in the app registration portal.
Single sign-out
When you redirect the user to the end_session_endpoint, the v2.0 endpoint clears the user's session from the
browser. However, the user may still be signed in to other applications that use Microsoft accounts for authentication.
To enable those applications to sign the user out simultaneously, the v2.0 endpoint sends an HTTP GET request to the
registered LogoutUrl of all the applications that the user is currently signed in to. Applications must respond to this
request by clearing any session that identifies the user and returning a 200 response. If you wish to support single
sign-out in your application, you must implement such a LogoutUrl in your application's code. You can set
the LogoutUrl from the app registration portal.
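The LogoutUrl contract described above (clear the user's session, return HTTP 200) can be sketched as a minimal WSGI endpoint. The in-memory session store and the "sid" cookie name here are illustrative assumptions, not anything the v2.0 endpoint prescribes:

```python
# Minimal WSGI endpoint implementing the single sign-out contract:
# clear any session identifying the user and return a 200 response.
from http.cookies import SimpleCookie

SESSIONS = {}  # illustrative in-memory session store: sid -> user info

def logout_app(environ, start_response):
    cookie = SimpleCookie(environ.get("HTTP_COOKIE", ""))
    sid = cookie["sid"].value if "sid" in cookie else None
    SESSIONS.pop(sid, None)  # forget the session, if any
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b""]             # the body is irrelevant; the 200 status matters
```

In a real app this handler would be mounted at the URL you register as LogoutUrl in the app registration portal.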
Protocol diagram: Token acquisition
Many web apps need not only to sign the user in, but also to access a web service on behalf of the user by using
OAuth. This scenario combines OpenID Connect for user authentication with the OAuth authorization code flow,
which also returns an authorization code that you can exchange for access tokens.
The full OpenID Connect sign-in and token acquisition flow looks similar to the next diagram. We describe each step
in detail in the next sections of the article.
GET https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize?
client_id=6731de76-14a6-49ae-97bc-6eba6914391e // Your registered Application ID
&response_type=id_token%20code
&redirect_uri=http%3A%2F%2Flocalhost%2Fmyapp%2F // Your registered redirect URI, URL encoded
&response_mode=form_post // 'query', 'form_post', or 'fragment'
&scope=openid%20offline_access%20https%3A%2F%2Fgraph.microsoft.com%2Fmail.read // 'openid' plus the scopes your app needs
&state=12345 // Any value, provided by your app
&nonce=678910 // Any value, provided by your app
Note: A demonstration request like this can use response_mode=query so that, after you sign in, the browser is
redirected to https://localhost/myapp/ with an ID token and a code visible in the address bar. For production web
apps, we recommend response_mode=form_post.
By including permission scopes in the request and by using response_type=id_token code, the v2.0 endpoint ensures
that the user has consented to the permissions indicated in the scope query parameter. It returns an authorization
code to your app to exchange for an access token.
Successful response: A successful response from using response_mode=form_post looks like this:
POST /myapp/ HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded
id_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiIsIng1dCI6Ik1uQ19WWmNB...&code=AwABAAAAvPM1KaPlrEqdFSBzjqfTGBC
mLdgfSTLEMPGYuNHSUYBrq...&state=12345
id_token: The ID token that the app requested. You can use the ID token to verify the user's identity and begin a
session with the user. You'll find more details about ID tokens and their contents in the v2.0 endpoint tokens
reference.
code: The authorization code that the app requested. The app can use the authorization code to request an access
token for the target resource. An authorization code is very short-lived. Typically, an authorization code expires in
about 10 minutes.
state: If a state parameter is included in the request, the same value should appear in the response. The app should
verify that the state values in the request and response are identical.
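The id_token above is a JWT: three base64url segments (header.payload.signature). In production you must validate its signature against the keys published at the jwks_uri, but the payload claims, including the nonce, can be inspected as shown in this sketch (decoding only, no verification):

```python
# Decode the payload segment of a JWT without verifying the signature.
# Do NOT trust unverified claims in production; always validate first.
import base64
import json

def jwt_claims(token: str) -> dict:
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))

# The nonce claim must match the nonce sent in the original request.
```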
Error response: Error responses might also be sent to the redirect URI so that the app can handle them appropriately.
An error response looks like this:
POST /myapp/ HTTP/1.1
Host: localhost
Content-Type: application/x-www-form-urlencoded
error=access_denied&error_description=the+user+canceled+the+authentication
error: An error code string that you can use to classify types of errors that occur, and to react to errors.
error_description: A specific error message that can help you identify the root cause of an authentication error.
For a description of possible error codes and recommended client responses, see Error codes for authorization
endpoint errors.
When you have an authorization code and an ID token, you can sign the user in and get access tokens on their behalf.
To sign the user in, you must validate the ID token exactly as described. To get access tokens, follow the steps
described in our OAuth protocol documentation.
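The code-for-token exchange at the /token endpoint is a form-encoded POST. The sketch below builds that request with only the standard library; the field names follow the OAuth 2.0 authorization code grant, and the helper name build_token_request is an illustrative assumption, not an SDK function:

```python
# Build the authorization-code redemption request for the /token endpoint.
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TOKEN_ENDPOINT = "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"

def build_token_request(tenant, client_id, client_secret, code, redirect_uri, scope):
    body = urlencode({
        "grant_type": "authorization_code",
        "client_id": client_id,
        "client_secret": client_secret,  # confidential (web app) clients only
        "code": code,                    # short-lived: expires in about 10 minutes
        "redirect_uri": redirect_uri,    # must match the /authorize request
        "scope": scope,
    }).encode()
    return Request(TOKEN_ENDPOINT.format(tenant=tenant), data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

# resp = json.load(urlopen(build_token_request(...)))  # JSON with access_token
```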
The SAML protocol requires the identity provider (Azure AD) and the service provider (the application) to exchange
information about themselves.
When an application is registered with Azure AD, the app developer registers federation-related information with
Azure AD. This includes the Redirect URI and Metadata URI of the application.
Azure AD uses the Metadata URI of the cloud service to retrieve the signing key and the logout URI of the cloud
service. If the application does not support a metadata URI, the developer must contact Microsoft support to provide
the logout URI and signing key.
Azure Active Directory exposes tenant-specific and common (tenant-independent) single sign-on and single sign-out
endpoints. These URLs represent addressable locations; they are not just identifiers, so you can go to the endpoint to
read the metadata.
Design principles
The diagram above shows the recommended basic topology to start deploying your AD FS infrastructure in Azure. The
principles behind the various components of the topology are listed below:
DC / ADFS Servers: If you have fewer than 1,000 users, you can simply install the AD FS role on your domain
controllers. If you do not want any performance impact on the domain controllers, or if you have more than
1,000 users, deploy AD FS on separate servers.
WAP Servers: Web Application Proxy servers must be deployed so that users can reach AD FS even when they
are not on the company network.
DMZ: The Web Application Proxy servers are placed in the DMZ, and ONLY TCP/443 access is allowed between
the DMZ and the internal subnet.
Load Balancers: To ensure high availability of AD FS and Web Application Proxy servers, we recommend using
an internal load balancer for the AD FS servers and Azure Load Balancer for the Web Application Proxy servers.
Availability Sets: To provide redundancy to your AD FS deployment, it is recommended that you group two or
more virtual machines in an Availability Set for similar workloads. This configuration ensures that during either
a planned or unplanned maintenance event, at least one virtual machine will be available.
Storage Accounts: It is recommended to have two storage accounts. A single storage account is a single point of
failure and could cause the deployment to become unavailable in the unlikely scenario that the storage account
goes down. With two storage accounts, you can associate one storage account with each fault line.
Network segregation: Web Application Proxy servers should be deployed in a separate DMZ network. You can
divide one virtual network into two subnets and then deploy the Web Application Proxy server(s) in an isolated
subnet. You can simply configure the network security group settings for each subnet and allow only the
required communication between the two subnets. More details are given per deployment scenario below.
Steps to deploy AD FS in Azure
The steps in this section outline how to deploy the AD FS infrastructure depicted above in Azure.
1. Deploying the network
As outlined above, you can either create two subnets in a single virtual network, or create two completely different
virtual networks (VNets). This article focuses on deploying a single virtual network and dividing it into two subnets.
This is currently an easier approach, as two separate VNets would require a VNet-to-VNet gateway for
communication.
1.1 Create virtual network
After the NSG is created, it has no custom inbound or outbound rules. Once the roles on the respective servers are
installed and functional, inbound and outbound rules can be added according to the desired level of security.
After the NSGs are created, associate NSG_INT with subnet INT and NSG_DMZ with subnet DMZ. An example
screenshot is given below:
Availability set   Role      Fault domains   Update domains
contosodcset       DC/ADFS   3               5
contosowapset      WAP       3               5
As you might have noticed, no NSG has been specified. This is because Azure lets you apply an NSG at the subnet
level. You can then control machine network traffic by using the individual NSG associated with either the subnet or
the NIC object. Read more on What is a Network Security Group (NSG). A static IP address is recommended if you are
managing the DNS. Alternatively, you can use Azure DNS and, in the DNS records for your domain, refer to the new
machines by their Azure FQDNs. Your virtual machine pane should look like the following after the deployment is
completed:
After you click Create and the ILB is deployed, you should see it in the list of load balancers.
The next step is to configure the backend pool and the backend probe.
6.2. Configure ILB backend pool
Select the newly created ILB in the Load Balancers panel. It will open the settings panel.
1. Select backend pools from the settings panel
After deployment, the load balancer will appear in the Load balancers list.
8.3. Configure backend pool for Internet Facing (Public) Load Balancer
Follow the same steps as for the internal load balancer to configure the backend pool of the Internet-facing (public)
load balancer with the availability set for the WAP servers (for example, contosowapset).
Note
If client user certificate authentication (client TLS authentication using X.509 user certificates) is required, then AD FS
requires TCP port 49443 to be enabled for inbound access.
10. Test the AD FS sign-in
The easiest way to test AD FS is by using the IdpInitiatedSignon.aspx page. To do that, you must enable
IdpInitiatedSignOn in the AD FS properties. Follow the steps below to verify your AD FS setup:
1. On the AD FS server, run the following PowerShell cmdlet to enable it:
Set-AdfsProperties -EnableIdPInitiatedSignonPage $true
2. From any external machine, access https://adfs.thecloudadvocate.com/adfs/ls/IdpInitiatedSignon.aspx
3. You should see the AD FS page like below:
On successful sign-in, it will provide you with a success message as shown below:
Azure AD B2C
Azure AD B2C is a cloud identity management solution for your web and mobile applications. It is a highly available
global service that scales to hundreds of millions of identities. Built on an enterprise-grade secure platform, Azure AD
B2C keeps your applications, your business, and your customers protected.
With minimal configuration, Azure AD B2C enables your application to authenticate:
Social Accounts (such as Facebook, Google, LinkedIn, and more)
Enterprise Accounts (using open standard protocols, OpenID Connect or SAML)
Local Accounts (email address and password, or username and password)
Azure Active Directory B2C: Provide sign-up and sign-in to consumers with Microsoft accounts
Create a Microsoft account application
To use Microsoft account as an identity provider in Azure Active Directory (Azure AD) B2C, you need to create a
Microsoft account application and supply it with the right parameters. You need a Microsoft account to do this. If you
don't have one, you can get it at https://www.live.com/.
1. Go to the Microsoft Application Registration Portal and sign in with your Microsoft account credentials.
2. Click Add an app.
4. Copy the value of Application Id. You will need it to configure Microsoft account as an identity provider in your
tenant.
7. Click on Generate New Password under the Application Secrets section. Copy the new password displayed on
screen. You will need it to configure Microsoft account as an identity provider in your tenant. This password is
an important security credential.
8. Check the box that says Live SDK support under the Advanced Options section. Click Save.
You don't require Multi-Factor Authentication to access an application in general, but you do require it to
access the sensitive portions within it. For example, the consumer can sign in to a banking application with a
social or local account and check account balance, but must verify the phone number before attempting a
wire transfer.
Modify your sign-up policy to enable Multi-Factor Authentication
1. Follow these steps to navigate to the B2C features blade on the Azure portal.
2. Click Sign-up policies.
3. Click your sign-up policy (for example, "B2C_1_SiUp") to open it.
4. Click Multi-factor authentication and turn the State to ON. Click OK.
5. Click Save at the top of the blade.
You can use the "Run now" feature on the policy to verify the consumer experience. Confirm the following:
A consumer account gets created in your directory before the Multi-Factor Authentication step occurs. During the
step, the consumer is asked to provide his or her phone number and verify it. If verification is successful, the phone
number is attached to the consumer account for later use. Even if the consumer cancels or drops out, he or she can
be asked to verify a phone number again during the next sign-in (with Multi-Factor Authentication enabled).
Modify your sign-in policy to enable Multi-Factor Authentication
1. Follow these steps to navigate to the B2C features blade on the Azure portal.
2. Click Sign-in policies.
3. Click your sign-in policy (for example, "B2C_1_SiIn") to open it. Click Edit at the top of the blade.
4. Click Multi-factor authentication and turn the State to ON. Click OK.
5. Click Save at the top of the blade.
You can use the "Run now" feature on the policy to verify the consumer experience. Confirm the following:
When the consumer signs in (using a social or local account), if a verified phone number is attached to the consumer
account, he or she is asked to verify it. If no phone number is attached, the consumer is asked to provide one and
verify it. On successful verification, the phone number is attached to the consumer account for later use.
Multi-Factor Authentication on other policies
Multi-factor authentication can be enabled not only on the sign-up and sign-in policies described above, but also on
combined sign-up or sign-in policies and on password reset policies. It will be available soon on profile editing policies.
AAD B2B doesn't remove the concept of federation, but it takes the work away from you. If you use AAD, and an
organization you work with also has AAD, you both have "stuff" in Azure, so wouldn't it make sense for Microsoft to
handle this for you? Yes, it would, and that is what AAD B2B leverages.
If you have resources in your AAD tenant, you can invite users from other AAD tenants to be linked to your resources.
They log in with their existing credentials, but gain access to your data through a federation not visible to you.
It's not an AAD-only thing either: you can also invite users who are not in an AAD tenant, with the option for them to
step up to getting one should they like. The only restriction for now is that the external users cannot have a consumer
email provider like Gmail/Outlook.com.
Let's go through the necessary steps for setting this up between two organizations. Both of these organizations have
an Office 365 subscription and an associated Azure AD tenant.
Go to the Active Directory section in the legacy Azure portal https://manage.windowsazure.com, navigate to the Users
tab, and click "Add User".
You should select "Users in partner organizations" as the user type. As you can see, this currently requires uploading
a CSV file to proceed.
The CSV file has a format like this:
Email,DisplayName,InviteAppID,InviteReplyUrl,InviteAppResources,InviteGroupResources,InviteContactUsUrl
andreas@contoso.com,Andreas,cd3ed3de-93ee-400b-8b19-b61ef44a0f29,,,,http://contoso.com
The invite is sent via mail to the external users. You can verify the status of the invite in the management portal. (If
your CSV file is incorrect, you will also be notified.)
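To avoid quoting mistakes in the upload, the CSV above can be generated with Python's csv module. The header row and example values are the ones shown in this article; the helper name build_invite_csv is illustrative:

```python
# Write a B2B invitation CSV in the format shown above for the portal upload.
import csv
import io

HEADER = ["Email", "DisplayName", "InviteAppID", "InviteReplyUrl",
          "InviteAppResources", "InviteGroupResources", "InviteContactUsUrl"]

def build_invite_csv(rows):
    """rows: list of dicts keyed by the HEADER column names; blanks allowed."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(HEADER)
    for row in rows:
        writer.writerow([row.get(col, "") for col in HEADER])
    return buf.getvalue()

csv_text = build_invite_csv([{
    "Email": "andreas@contoso.com",
    "DisplayName": "Andreas",
    "InviteAppID": "cd3ed3de-93ee-400b-8b19-b61ef44a0f29",
    "InviteContactUsUrl": "http://contoso.com",
}])
```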
The same link can be reused later, but it's probably easier to save a favorite for the following URL:
https://myapps.microsoft.com/
In addition to the apps of your own organization, you should have an extra icon for apps belonging to other
organizations:
At the moment you can't tell the difference between the apps based on which organization they belong to.
If you browse to the Users tab of the AAD section in the portal, you will be able to see the external user as an AAD
account (Other directory).
As you can see this is a fairly friction-free process with a minimum of configuration needed. Even without deep
knowledge about Identity Management this is achievable.
The management plane consists of the resources used to manage your storage account. In this section, we'll talk
about the Azure Resource Manager deployment model and how to use Role-Based Access Control (RBAC) to control
access to your storage accounts. We will also talk about managing your storage account keys and how to regenerate
them.
Data Plane Security: Securing Access to Your Data
In this section, we'll look at allowing access to the actual data objects in your Storage account, such as blobs, files,
queues, and tables, using Shared Access Signatures and Stored Access Policies. We will cover both service-level SAS
and account-level SAS. We'll also see how to limit access to a specific IP address (or range of IP addresses), how to
limit the protocol used to HTTPS, and how to revoke a Shared Access Signature without waiting for it to expire.
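A Shared Access Signature is an HMAC-SHA256 signature computed with a storage account key over a canonical "string-to-sign". The real Azure string-to-sign has a precise, version-dependent field order; the sketch below uses a deliberately simplified string-to-sign only to illustrate the signing mechanics, not the exact Azure format:

```python
# Illustrative only: HMAC-SHA256 signing of a SIMPLIFIED string-to-sign.
# Real SAS tokens must follow the exact field order in the Azure Storage docs.
import base64
import hashlib
import hmac

def sign(account_key_b64: str, string_to_sign: str) -> str:
    """Base64-encoded HMAC-SHA256 of string_to_sign under the account key."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode()

# Simplified fields: permissions, start, expiry, resource path (NOT the real format).
sts = "\n".join(["r", "2017-01-01T00:00Z", "2017-01-02T00:00Z",
                 "/blob/myaccount/mycontainer/myblob"])
sig = sign(base64.b64encode(b"fake-512-bit-key").decode(), sts)
```

Because the signature is derived from the account key, regenerating that key immediately invalidates every SAS signed with it.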
Encryption in Transit
This section discusses how to secure data when you transfer it into or out of Azure Storage. We'll talk about the
recommended use of HTTPS and the encryption used by SMB 3.0 for Azure File Shares. We will also take a look at
Client-side Encryption, which enables you to encrypt the data before it is transferred into Storage in a client
application, and to decrypt the data after it is transferred out of Storage.
Encryption at Rest
We will talk about Storage Service Encryption (SSE), and how you can enable it for a storage account, resulting in your
block blobs, page blobs, and append blobs being automatically encrypted when written to Azure Storage. We will also
look at how you can use Azure Disk Encryption and explore the basic differences and cases of Disk Encryption versus
SSE versus Client-Side Encryption. We will briefly look at FIPS compliance for U.S. Government computers.
Using Storage Analytics to audit access of Azure Storage
This section discusses how to find information in the storage analytics logs for a request. We'll take a look at real
storage analytics log data and see how to discern whether a request is made with the Storage account key, with a
Shared Access signature, or anonymously, and whether it succeeded or failed.
Enabling Browser-Based Clients using CORS
This section talks about how to allow cross-origin resource sharing (CORS). We'll talk about cross-domain access, and
how to handle it with the CORS capabilities built into Azure Storage.
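A CORS rule is set through the storage service properties. As a hedged illustration of the rule shape (element names per the Storage REST API's Set Service Properties operation; the origin shown is a placeholder):

```xml
<Cors>
  <CorsRule>
    <AllowedOrigins>https://www.contoso.com</AllowedOrigins>
    <AllowedMethods>GET,PUT</AllowedMethods>
    <AllowedHeaders>x-ms-*</AllowedHeaders>
    <ExposedHeaders>x-ms-*</ExposedHeaders>
    <MaxAgeInSeconds>300</MaxAgeInSeconds>
  </CorsRule>
</Cors>
```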
Management Plane Security
The management plane consists of operations that affect the storage account itself. For example, you can create or
delete a storage account, get a list of storage accounts in a subscription, retrieve the storage account keys, or
regenerate the storage account keys.
When you create a new storage account, you select a deployment model of Classic or Resource Manager. The Classic
model of creating resources in Azure only allows all-or-nothing access to the subscription, and in turn, the storage
account.
This guide focuses on the Resource Manager model which is the recommended means for creating storage accounts.
With the Resource Manager storage accounts, rather than giving access to the entire subscription, you can control
access on a more finite level to the management plane using Role-Based Access Control (RBAC).
How to secure your storage account with Role-Based Access Control (RBAC)
Let's talk about what RBAC is, and how you can use it. Each Azure subscription has an Azure Active Directory. Users,
groups, and applications from that directory can be granted access to manage resources in the Azure subscription that
use the Resource Manager deployment model. This is referred to as Role-Based Access Control (RBAC). To manage
this access, you can use the Azure portal, the Azure CLI tools, PowerShell, or the Azure Storage Resource Provider
REST APIs.
With the Resource Manager model, you put the storage account in a resource group and control access to the
management plane of that specific storage account using Azure Active Directory. For example, you can give specific
users the ability to access the storage account keys, while other users can view information about the storage
account, but cannot access the storage account keys.
Granting Access
Access is granted by assigning the appropriate RBAC role to users, groups, and applications, at the right scope. To
grant access to the entire subscription, you assign a role at the subscription level. You can grant access to all of the
resources in a resource group by granting permissions to the resource group itself. You can also assign specific roles to
specific resources, such as storage accounts.
Here are the main points that you need to know about using RBAC to access the management operations of an Azure
Storage account:
When you assign access, you basically assign a role to the account that you want to have access. You can
control access to the operations used to manage that storage account, but not to the data objects in the
account. For example, you can grant permission to retrieve the properties of the storage account (such as
redundancy), but not to a container or data within a container inside Blob Storage.
For someone to have permission to access the data objects in the storage account, you can give them
permission to read the storage account keys, and that user can then use those keys to access the blobs,
queues, tables, and files.
Roles can be assigned to a specific user account, a group of users, or to a specific application.
Each role has a list of Actions and Not Actions. For example, the Virtual Machine Contributor role has an
Action of "listKeys" that allows the storage account keys to be read. The Contributor has "Not Actions" such as
updating the access for users in the Active Directory.
Roles for storage include (but are not limited to) the following:
o Owner: They can manage everything, including access.
o Contributor: They can do anything the owner can do except assign access. Someone with this role can view
and regenerate the storage account keys. With the storage account keys, they can access the data objects.
o Reader: They can view information about the storage account, except secrets. For example, if you assign a
role with reader permissions on the storage account to someone, they can view the properties of the storage
account, but they can't make any changes to the properties or view the storage account keys.
o Storage Account Contributor: They can manage the storage account; they can read the subscription's
resource groups and resources, and create and manage subscription resource group deployments. They can
also access the storage account keys, which in turn means they can access the data plane.
o User Access Administrator: They can manage user access to the storage account. For example, they can
grant Reader access to a specific user.
o Virtual Machine Contributor: They can manage virtual machines but not the storage account to which they
are connected. This role can list the storage account keys, which means that the user to whom you assign
this role can update the data plane.
In order for a user to create a virtual machine, they have to be able to create the corresponding VHD file in a storage
account. To do that, they need to be able to retrieve the storage account key and pass it to the API creating the VM.
Therefore, they must have this permission so they can list the storage account keys.
The ability to define custom roles is a feature that allows you to compose a set of actions from a list of
available actions that can be performed on Azure resources.
The user has to be set up in your Azure Active Directory before you can assign a role to them.
You can create a report of who granted/revoked what kind of access to/from whom and on what scope using
PowerShell or the Azure CLI.
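A custom role is composed as a JSON document of actions and scopes. A hedged sketch of a role that can read storage accounts and list their keys (the role name and subscription ID are placeholders; the two action strings are real Microsoft.Storage operations):

```json
{
  "Name": "Storage Key Operator (example)",
  "IsCustom": true,
  "Description": "Can read storage accounts and list their keys.",
  "Actions": [
    "Microsoft.Storage/storageAccounts/read",
    "Microsoft.Storage/storageAccounts/listKeys/action"
  ],
  "NotActions": [],
  "AssignableScopes": [
    "/subscriptions/00000000-0000-0000-0000-000000000000"
  ]
}
```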
Resources
Azure Active Directory Role-based Access Control
This article explains the Azure Active Directory Role-based Access Control and how it works.
RBAC: Built-in Roles
This article details all of the built-in roles available in RBAC.
Understanding Resource Manager deployment and classic deployment
This article explains the Resource Manager deployment and classic deployment models, and explains the benefits of
using the Resource Manager and resource groups. It explains how the Azure Compute, Network, and Storage
Providers work under the Resource Manager model.
Managing Role-Based Access Control with the REST API
This article shows how to use the REST API to manage RBAC.
Azure Storage Resource Provider REST API Reference
This is the reference for the APIs you can use to manage your storage account programmatically.
Developer's guide to auth with Azure Resource Manager API
This article shows how to authenticate using the Resource Manager APIs.
Role-Based Access Control for Microsoft Azure from Ignite
114 | P a g e
70-534 Architecting Microsoft Azure Solutions
This is a link to a video on Channel 9 from the 2015 MS Ignite conference. In this session, they talk about access
management and reporting capabilities in Azure, and explore best practices around securing access to Azure
subscriptions using Azure Active Directory.
Managing Your Storage Account Keys
Storage account keys are 512-bit strings created by Azure that, along with the storage account name, can be used to
access the data objects stored in the storage account, e.g. blobs, entities within a table, queue messages, and files on
an Azure Files share. Controlling access to the storage account keys controls access to the data plane for that storage
account.
Each storage account has two keys referred to as "Key 1" and "Key 2" in the Azure portal and in the PowerShell
cmdlets. These can be regenerated manually using one of several methods, including, but not limited to using
the Azure portal, PowerShell, the Azure CLI, or programmatically using the .NET Storage Client Library or the Azure
Storage Services REST API.
There are a number of reasons to regenerate your storage account keys.
o You might regenerate them on a regular basis for security reasons.
o You would regenerate your storage account keys if someone managed to hack into an application and retrieve
the key that was hardcoded or saved in a configuration file, giving them full access to your storage account.
o Another case for key regeneration is if your team is using a Storage Explorer application that retains the
storage account key, and one of the team members leaves. The application would continue to work, giving
them access to your storage account after they're gone. This is actually the primary reason account-level
Shared Access Signatures were created: you can use an account-level SAS instead of storing the access keys
in a configuration file.
Key regeneration plan
You don't want to just regenerate the key you are using without some planning. If you do that, you could cut off all
access to that storage account, which can cause major disruption. This is why there are two keys. You should
regenerate one key at a time.
Before you regenerate your keys, be sure you have a list of all of your applications that are dependent on the storage
account, as well as any other services you are using in Azure. For example, if you are using Azure Media Services that
are dependent on your storage account, you must re-sync the access keys with your media service after you
regenerate the key. If you are using any applications such as a storage explorer, you will need to provide the new keys
to those applications as well. Note that if you have VMs whose VHD files are stored in the storage account, they will
not be affected by regenerating the storage account keys.
You can regenerate your keys in the Azure portal. Once keys are regenerated they can take up to 10 minutes to be
synchronized across Storage Services.
When you're ready, here's the general process detailing how you should change your key. In this case, the assumption
is that you are currently using Key 1 and you are going to change everything to use Key 2 instead.
1. Regenerate Key 2 to ensure that it is secure. You can do this in the Azure portal.
2. In all of the applications where the storage key is stored, change the storage key to use Key 2's new value.
Test and publish the application.
3. After all of the applications and services are up and running successfully, regenerate Key 1. This ensures that
anybody to whom you have not expressly given the new key will no longer have access to the storage
account.
If you are currently using Key 2, you can use the same process, but reverse the key names.
You can migrate over a couple of days, changing each application to use the new key and publishing it. After all of
them are done, you should then go back and regenerate the old key so it no longer works.
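The rotation sequence above (regenerate the unused key, repoint every application, then regenerate the old key) can be sketched as a small simulation. The `StorageAccount` class below is a toy stand-in for the portal/PowerShell operations, not the Azure API; the point is that no application ever holds an invalid key during the rotation.

```python
import secrets

class StorageAccount:
    """Toy stand-in for an Azure storage account with its two access keys."""
    def __init__(self):
        self.keys = {"key1": secrets.token_hex(16), "key2": secrets.token_hex(16)}

    def regenerate(self, name):
        # In Azure you would do this via the portal, PowerShell, the CLI,
        # or the Storage Resource Provider REST API.
        self.keys[name] = secrets.token_hex(16)

    def authorize(self, key):
        return key in self.keys.values()

def rotate_from_key1_to_key2(account, apps):
    """Move all apps off Key 1 without any of them losing access."""
    # Step 1: regenerate Key 2 so the value we hand out is fresh and secure.
    account.regenerate("key2")
    # Step 2: point every application at the new Key 2 value, then test/publish.
    for app in apps:
        app["storage_key"] = account.keys["key2"]
    # Step 3: only now invalidate Key 1; nothing still depends on it.
    account.regenerate("key1")
```

If you are currently on Key 2, the same function applies with the key names swapped.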
Another option is to put the storage account key in an Azure Key Vault as a secret and have your applications retrieve
the key from there. Then when you regenerate the key and update the Azure Key Vault, the applications will not need
to be redeployed because they will pick up the new key from the Azure Key Vault automatically. Note that you can
have the application read the key each time you need it, or you can cache it in memory and if it fails when using it,
retrieve the key again from the Azure Key Vault.
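The cache-and-retry pattern just described can be sketched as follows. The `vault_get` callable stands in for a Key Vault secret lookup (the actual Key Vault client API is not shown here); the idea is simply to cache the key in memory and re-fetch it once if a storage call fails with an authorization error after rotation.

```python
class CachedKey:
    """Cache the storage key in memory; on an auth failure, assume the key
    was rotated, re-fetch it from the vault, and retry the operation once."""
    def __init__(self, vault_get):
        self._vault_get = vault_get   # e.g. a Key Vault secret lookup
        self._key = None

    def call_storage(self, operation):
        if self._key is None:
            self._key = self._vault_get()
        try:
            return operation(self._key)
        except PermissionError:
            # Key was likely regenerated: refresh from the vault, retry once.
            self._key = self._vault_get()
            return operation(self._key)
```

With this in place, regenerating the key and updating the vault requires no redeployment of the application.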
Using Azure Key Vault also adds another level of security for your storage keys. If you use this method, you will never
have the storage key hardcoded in a configuration file, which removes that avenue of somebody getting access to the
keys without specific permission.
Another advantage of using Azure Key Vault is you can also control access to your keys using Azure Active Directory.
This means you can grant access to the handful of applications that need to retrieve the keys from Azure Key Vault,
and know that other applications will not be able to access the keys without granting them permission specifically.
Note: it is recommended to use only one of the keys in all of your applications at the same time. If you use Key 1 in
some places and Key 2 in others, you will not be able to rotate your keys without some application losing access.
Resources
About Azure Storage Accounts
This article gives an overview of storage accounts and discusses viewing, copying, and regenerating storage access
keys.
Azure Storage Resource Provider REST API Reference
This article contains links to specific articles about retrieving the storage account keys and regenerating the storage
account keys for an Azure Account using the REST API. Note: This is for Resource Manager storage accounts.
Operations on storage accounts
This article in the Storage Service Manager REST API Reference contains links to specific articles on retrieving and
regenerating the storage account keys using the REST API. Note: This is for the Classic storage accounts.
Say goodbye to key management: manage access to Azure Storage data using Azure AD
This article shows how to use Active Directory to control access to your Azure Storage keys in Azure Key Vault. It also
shows how to use an Azure Automation job to regenerate the keys on an hourly basis.
Data Plane Security
Data Plane Security refers to the methods used to secure the data objects stored in Azure Storage: the blobs,
queues, tables, and files. We've seen methods to encrypt the data and to secure it during transit, but how do
you go about allowing access to the objects?
There are basically two methods for controlling access to the data objects themselves. The first is by controlling access
to the storage account keys, and the second is using Shared Access Signatures to grant access to specific data objects
for a specific amount of time.
One exception to note is that you can allow public access to your blobs by setting the access level for the container
that holds the blobs accordingly. If you set access for a container to Blob or Container, it will allow public read access
for the blobs in that container. This means anyone with a URL pointing to a blob in that container can open it in a
browser without using a Shared Access Signature or having the storage account keys.
Storage Account Keys
Storage account keys are 512-bit strings created by Azure that, along with the storage account name, can be used to
access the data objects stored in the storage account.
For example, you can read blobs, write to queues, create tables, and modify files. Many of these actions can be
performed through the Azure portal, or using one of many Storage Explorer applications. You can also write code to
use the REST API or one of the Storage Client Libraries to perform these operations.
As discussed in the section on the Management Plane Security, access to the storage keys for a Classic storage
account can be granted by giving full access to the Azure subscription. Access to the storage keys for a storage
account using the Azure Resource Manager model can be controlled through Role-Based Access Control (RBAC).
How to delegate access to objects in your account using Shared Access Signatures and Stored Access Policies
A Shared Access Signature is a string containing a security token that can be attached to a URI that allows you to
delegate access to storage objects and specify constraints such as the permissions and the date/time range of access.
You can grant access to blobs, containers, queue messages, files, and tables. With tables, you can actually grant
permission to access a range of entities in the table by specifying the partition and row key ranges to which you want
the user to have access. For example, if you have data stored with a partition key of geographical state, you could give
someone access to just the data for California.
In another example, you might give a web application a SAS token that enables it to write entries to a queue, and give
a worker role application a SAS token to get messages from the queue and process them. Or you could give one
customer a SAS token they can use to upload pictures to a container in Blob Storage, and give a web application
permission to read those pictures. In both cases, there is a separation of concerns each application can be given just
the access that they require in order to perform their task. This is possible through the use of Shared Access
Signatures.
Why you want to use Shared Access Signatures
Why would you want to use a SAS instead of just giving out your storage account key, which is so much easier?
Giving out your storage account key is like sharing the keys of your storage kingdom. It grants complete access.
Someone could use your keys and upload their entire music library to your storage account. They could also replace
your files with virus-infected versions, or steal your data. Giving away unlimited access to your storage account is
something that should not be taken lightly.
With Shared Access Signatures, you can give a client just the permissions required for a limited amount of time. For
example, if someone is uploading a blob to your account, you can grant them write access for just enough time to
upload the blob (depending on the size of the blob, of course). And if you change your mind, you can revoke that
access.
Additionally, you can specify that requests made using a SAS are restricted to a certain IP address or IP address range
external to Azure. You can also require that requests are made using a specific protocol (HTTPS or HTTP/HTTPS). This
means if you only want to allow HTTPS traffic, you can set the required protocol to HTTPS only, and HTTP traffic will be
blocked.
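The sip restriction accepts either a single address or an inclusive range. A check equivalent to what the service performs can be sketched in a few lines; this is an illustration of the rule, not the storage service's actual implementation.

```python
import ipaddress

def ip_allowed(client_ip, sip):
    """Check a client IP against a SAS sip value, which is either a single
    address ("168.1.5.60") or an inclusive range ("168.1.5.60-168.1.5.70")."""
    if "-" in sip:
        low, high = (ipaddress.ip_address(p) for p in sip.split("-"))
        return low <= ipaddress.ip_address(client_ip) <= high
    return ipaddress.ip_address(client_ip) == ipaddress.ip_address(sip)
```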
Definition of a Shared Access Signature
A Shared Access Signature is a set of query parameters appended to the URL pointing at the resource
that provides information about the access allowed and the length of time for which the access is permitted. Here is
an example; this URI provides read access to a blob for five minutes. Note that SAS query parameters must be URL
Encoded, such as %3A for colon (:) or %20 for a space.
http://mystorage.blob.core.windows.net/mycontainer/myblob.txt (URL to the blob)
?sv=2015-04-05 (storage service version)
&st=2015-12-10T22%3A18%3A26Z (start time, in UTC time and URL encoded)
&se=2015-12-10T22%3A23%3A26Z (end time, in UTC time and URL encoded)
&sr=b (resource is a blob)
&sp=r (read access)
&sip=168.1.5.60-168.1.5.70 (requests can only come from this range of IP addresses)
&spr=https (only allow HTTPS requests)
&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D (signature used for the authentication of the SAS)
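The sig parameter at the end is an HMAC-SHA256 of a "string-to-sign" built from the other parameters, keyed with the base64-decoded account key, then base64- and URL-encoded. The exact composition of the string-to-sign varies by service version and is specified in the "Constructing a service SAS" reference, so the sketch below shows only the signing step itself.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def sign_sas(string_to_sign, account_key_b64):
    """Compute the SAS sig value: HMAC-SHA256 over the UTF-8 string-to-sign,
    keyed with the base64-decoded account key, then base64-encoded and
    URL-encoded so it can be appended as a query parameter."""
    key = base64.b64decode(account_key_b64)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return quote(base64.b64encode(digest).decode(), safe="")
```

Because the signature covers the permissions, times, and other constraints, tampering with any query parameter invalidates the SAS.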
How the Shared Access Signature is authenticated by the Azure Storage Service
When the storage service receives the request, it takes the input query parameters and creates a signature using the
same method as the calling program. It then compares the two signatures. If they agree, then the storage service can
check the storage service version to make sure it's valid, verify that the current date and time are within the specified
window, make sure the access requested corresponds to the request made, etc.
For example, with our URL above, if the URL was pointing to a file instead of a blob, this request would fail because it
specifies that the Shared Access Signature is for a blob. If the REST command being called was to update a blob, it
would fail because the Shared Access Signature specifies that only read access is permitted.
Types of Shared Access Signatures
A service-level SAS can be used to access specific resources in a storage account. Some examples of this are
retrieving a list of blobs in a container, downloading a blob, updating an entity in a table, adding messages to
a queue or uploading a file to a file share.
An account-level SAS can be used to access anything that a service-level SAS can be used for. Additionally, it
can grant access to operations that are not permitted with a service-level SAS, such as the ability to create
containers, tables, queues, and file shares. You can also specify access to multiple services at once. For
example, you might give someone access to both blobs and files in your storage account.
Creating an SAS URI
1. You can create an ad hoc URI on demand, defining all of the query parameters each time.
This is really flexible, but if you have a logical set of parameters that are similar each time, using a Stored Access Policy
is a better idea.
2. You can create a Stored Access Policy for an entire container, file share, table, or queue. Then you can use this
as the basis for the SAS URIs you create. Permissions based on Stored Access Policies can be easily revoked.
You can have up to 5 policies defined on each container, queue, table, or file share.
For example, if you were going to have many people read the blobs in a specific container, you could create a Stored
Access Policy that says "give read access" and any other settings that will be the same each time. Then you can create
an SAS URI using the settings of the Stored Access Policy and specifying the expiration date/time. The advantage of
this is that you don't have to specify all of the query parameters every time.
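The relationship between a Stored Access Policy and the SAS tokens derived from it can be modeled in a few lines. The class below is a toy model (not the storage service's API) showing why revoking the policy immediately invalidates every SAS that references it.

```python
from datetime import datetime, timedelta, timezone

class Container:
    """Toy model of a container's stored access policies. SAS tokens that
    reference a policy by name inherit its settings, so deleting or expiring
    the policy invalidates all of those tokens at once."""
    def __init__(self):
        self.policies = {}   # the real service allows at most 5 per container

    def set_policy(self, name, permissions, expires):
        self.policies[name] = {"perm": permissions, "exp": expires}

    def revoke_policy(self, name):
        self.policies.pop(name, None)

    def sas_valid(self, policy_name, now):
        p = self.policies.get(policy_name)
        return p is not None and now < p["exp"]
```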
Revocation
Suppose your SAS has been compromised, or you want to change it because of corporate security or regulatory
compliance requirements. How do you revoke access to a resource using that SAS? It depends on how you created the
SAS URI.
If you are using ad hoc URIs, you have three options. You can issue SAS tokens with short expiration policies and
simply wait for the SAS to expire. You can rename or delete the resource (assuming the token was scoped to a single
object). You can change the storage account keys. This last option can have a big impact, depending on how many
services are using that storage account, and probably isn't something you want to do without some planning.
If you are using a SAS derived from a Stored Access Policy, you can remove access by revoking the Stored Access
Policy: you can change it so it has already expired, or you can remove it altogether. This takes effect immediately, and
invalidates every SAS created using that Stored Access Policy. Updating or removing the Stored Access Policy may
impact people accessing that specific container, file share, table, or queue via SAS, but if the clients are written so they
request a new SAS when the old one becomes invalid, this will work fine.
Because using a SAS derived from a Stored Access Policy gives you the ability to revoke that SAS immediately, it is the
recommended best practice to always use Stored Access Policies when possible.
Resources
For more detailed information on using Shared Access Signatures and Stored Access Policies, complete with examples,
please refer to the following articles:
These are the reference articles.
o Service SAS
This article provides examples of using a service-level SAS with blobs, queue messages, table ranges, and files.
o Constructing a service SAS
o Constructing an account SAS
These are tutorials for using the .NET client library to create Shared Access Signatures and Stored Access
Policies.
o Using Shared Access Signatures (SAS)
o Shared Access Signatures, Part 2: Create and Use a SAS with the Blob Service
This article includes an explanation of the SAS model, examples of Shared Access Signatures, and recommendations
for the best practice use of SAS. Also discussed is the revocation of the permission granted.
Limiting access by IP Address (IP ACLs)
o What is an endpoint Access Control List (ACLs)?
o Constructing a Service SAS
This is the reference article for service-level SAS; it includes an example of IP ACLing.
o Constructing an Account SAS
This is the reference article for account-level SAS; it includes an example of IP ACLing.
Authentication
o Authentication for the Azure Storage Services
Shared Access Signatures Getting Started Tutorial
o SAS Getting Started Tutorial
Encryption in Transit
Transport-Level Encryption Using HTTPS
Another step you should take to ensure the security of your Azure Storage data is to encrypt the data between the
client and Azure Storage. The first recommendation is to always use the HTTPS protocol, which ensures secure
communication over the public Internet.
To have a secure communication channel, you should always use HTTPS when calling the REST APIs or accessing
objects in storage. Also, Shared Access Signatures, which can be used to delegate access to Azure Storage objects,
include an option to specify that only the HTTPS protocol can be used when using Shared Access Signatures, ensuring
that anybody sending out links with SAS tokens will use the proper protocol.
You can enforce the use of HTTPS when calling the REST APIs to access objects in storage accounts by enabling Secure
transfer required for the storage account. Connections using HTTP will be refused once this is enabled.
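Conceptually, enabling Secure transfer required amounts to a scheme check on every incoming request; the real enforcement happens server-side in the storage service, but the rule can be sketched like this.

```python
from urllib.parse import urlparse

def allowed_with_secure_transfer(request_url):
    """With 'Secure transfer required' enabled on the storage account,
    plain-HTTP requests to the REST APIs are refused."""
    return urlparse(request_url).scheme == "https"
```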
Using encryption during transit with Azure File Shares
Azure File Storage supports HTTPS when using the REST API, but is more commonly used as an SMB file share attached
to a VM. SMB 2.1 does not support encryption, so connections are only allowed within the same region in Azure.
However, SMB 3.0 supports encryption, and it's available in Windows Server 2012 R2, Windows 8, Windows 8.1, and
Windows 10, allowing cross-region access and even access on the desktop.
Note that while Azure File Shares can be used with Unix, the Linux SMB client does not yet support encryption, so
access is only allowed within an Azure region. Encryption support for Linux is on the roadmap of Linux developers
responsible for SMB functionality. When they add encryption, you will have the same ability for accessing an Azure
File Share on Linux as you do for Windows.
You can enforce the use of encryption with the Azure Files service by enabling Secure transfer required for the storage
account. If using the REST APIs, HTTPS is required. For SMB, only SMB connections that support encryption will
connect successfully.
Resources
How to use Azure File Storage with Linux
This article shows how to mount an Azure File Share on a Linux system and upload/download files.
Get started with Azure File storage on Windows
This article gives an overview of Azure File shares and how to mount and use them using PowerShell and .NET.
Inside Azure File Storage
This article announces the general availability of Azure File Storage and provides technical details about the SMB 3.0
encryption.
Using Client-side encryption to secure data that you send to storage
Another option that helps you ensure that your data is secure while being transferred between a client application
and Storage is Client-side Encryption. The data is encrypted before being transferred into Azure Storage. When
retrieving the data from Azure Storage, the data is decrypted after it is received on the client side. Even though the
data is encrypted going across the wire, we recommend that you also use HTTPS, as it has data integrity checks built in
which help mitigate network errors affecting the integrity of the data.
Client-side encryption is also a method for encrypting your data at rest, as the data is stored in its encrypted form.
We'll talk about this in more detail in the section on Encryption at Rest.
Encryption at Rest
There are three Azure features that provide encryption at rest. Azure Disk Encryption is used to encrypt the OS and
data disks in IaaS Virtual Machines. The other two, Client-side Encryption and SSE, are both used to encrypt data in
Azure Storage. Let's look at each of these, and then do a comparison and see when each one can be used.
While you can use Client-side Encryption to encrypt the data in transit (which is also stored in its encrypted form in
Storage), you may prefer to simply use HTTPS during the transfer, and have some way for the data to be automatically
encrypted when it is stored. There are two ways to do this -- Azure Disk Encryption and SSE. One is used to directly
encrypt the data on OS and data disks used by VMs, and the other is used to encrypt data written to Azure Blob
Storage.
Storage Service Encryption (SSE)
SSE allows you to request that the storage service automatically encrypt the data when writing it to Azure Storage.
When you read the data from Azure Storage, it will be decrypted by the storage service before being returned. This
enables you to secure your data without having to modify code or add code to any applications.
This is a setting that applies to the whole storage account. You can enable and disable this feature by changing the
value of the setting. To do this, you can use the Azure portal, PowerShell, the Azure CLI, the Storage Resource Provider
REST API, or the .NET Storage Client Library. By default, SSE is turned off.
At this time, the keys used for the encryption are managed by Microsoft. We generate the keys originally, and manage
the secure storage of the keys as well as the regular rotation as defined by internal Microsoft policy. In the future, you
will get the ability to manage your own encryption keys, and provide a migration path from Microsoft-managed keys
to customer-managed keys.
This feature is available for Standard and Premium Storage accounts created using the Resource Manager deployment
model. SSE applies only to block blobs, page blobs, and append blobs. The other types of data, including tables,
queues, and files, will not be encrypted.
Data is only encrypted when SSE is enabled and the data is written to Blob Storage. Enabling or disabling SSE does not
impact existing data. In other words, when you enable this encryption, it will not go back and encrypt data that
already exists; nor will it decrypt the data that already exists when you disable SSE.
If you want to use this feature with a Classic storage account, you can create a new Resource Manager storage
account and use AzCopy to copy the data to the new account.
Client-side Encryption
We mentioned client-side encryption when discussing the encryption of the data in transit. This feature allows you to
programmatically encrypt your data in a client application before sending it across the wire to be written to Azure
Storage, and to programmatically decrypt your data after retrieving it from Azure Storage.
This does provide encryption in transit, but it also provides the feature of Encryption at Rest. Note that although the
data is encrypted in transit, we still recommend using HTTPS to take advantage of the built-in data integrity checks
which help mitigate network errors affecting the integrity of the data.
An example of where you might use this is if you have a web application that stores blobs and retrieves blobs, and you
want the application and data to be as secure as possible. In that case, you would use client-side encryption. The
traffic between the client and the Azure Blob Service contains the encrypted resource, and nobody can interpret the
data in transit and reconstitute it into your private blobs.
Client-side encryption is built into the Java and the .NET storage client libraries, which in turn use the Azure Key Vault
APIs, making it pretty easy for you to implement. The process of encrypting and decrypting the data uses the envelope
technique, and stores metadata used by the encryption in each storage object. For example, for blobs, it stores it in
the blob metadata, while for queues, it adds it to each queue message.
For the encryption itself, you can generate and manage your own encryption keys. You can also use keys generated by
the Azure Storage Client Library, or you can have the Azure Key Vault generate the keys. You can store your
encryption keys in your on-premises key storage, or you can store them in an Azure Key Vault. Azure Key Vault allows
you to grant access to the secrets in Azure Key Vault to specific users using Azure Active Directory. This means that
not just anybody can read the Azure Key Vault and retrieve the keys you're using for client-side encryption.
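The envelope technique described above can be sketched as follows. The XOR "cipher" here is a deliberately toy stand-in for the AES encryption the storage client libraries actually use, and the metadata key name is illustrative only; the point is the structure: a fresh content key (CEK) encrypts the data, the key-encrypting key (KEK) wraps the CEK, and the wrapped CEK travels in the object's metadata.

```python
import base64
import os

def xor(data, key):
    # Toy stand-in for a real cipher; never use XOR for actual encryption.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt_blob(plaintext, kek):
    """Envelope encryption: generate a one-time CEK, encrypt the data with
    it, wrap the CEK with the KEK, and store the wrapped CEK in metadata."""
    cek = os.urandom(32)
    ciphertext = xor(plaintext, cek)
    metadata = {"wrappedcontentkey": base64.b64encode(xor(cek, kek)).decode()}
    return ciphertext, metadata

def decrypt_blob(ciphertext, metadata, kek):
    """Unwrap the CEK with the KEK, then decrypt the data with the CEK."""
    cek = xor(base64.b64decode(metadata["wrappedcontentkey"]), kek)
    return xor(ciphertext, cek)
```

Because only the small wrapped key needs protecting, the KEK can live in Azure Key Vault while the bulk data stays in storage.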
Resources
Encrypt and decrypt blobs in Microsoft Azure Storage using Azure Key Vault
This article shows how to use client-side encryption with Azure Key Vault, including how to create the KEK and store it
in the vault using PowerShell.
Client-Side Encryption and Azure Key Vault for Microsoft Azure Storage
This article gives an explanation of client-side encryption, and provides examples of using the storage client library to
encrypt and decrypt resources from the four storage services. It also talks about Azure Key Vault.
Using Azure Disk Encryption to encrypt disks used by your virtual machines
Azure Disk Encryption is a new feature. This feature allows you to encrypt the OS disks and Data disks used by an IaaS
Virtual Machine. For Windows, the drives are encrypted using industry-standard BitLocker encryption technology. For
Linux, the disks are encrypted using the DM-Crypt technology. This is integrated with Azure Key Vault to allow you to
control and manage the disk encryption keys.
The solution supports the following scenarios for IaaS VMs when they are enabled in Microsoft Azure:
o Integration with Azure Key Vault
o Standard tier VMs: A, D, DS, G, GS, and so forth series IaaS VMs
o Enabling encryption on Windows and Linux IaaS VMs
o Disabling encryption on OS and data drives for Windows IaaS VMs
o Disabling encryption on data drives for Linux IaaS VMs
o Enabling encryption on IaaS VMs that are running Windows client OS
o Enabling encryption on volumes with mount paths
o Enabling encryption on Linux VMs that are configured with disk striping (RAID) by using mdadm
o Enabling encryption on Linux VMs by using LVM for data disks
o Enabling encryption on Windows VMs that are configured by using Storage Spaces
o All Azure public regions are supported
The solution does not support the following scenarios, features, and technology in the release:
o Basic tier IaaS VMs
o Disabling encryption on an OS drive for Linux IaaS VMs
o IaaS VMs that are created by using the classic VM creation method
o Integration with your on-premises Key Management Service
o Azure Files (shared file system), Network File System (NFS), dynamic volumes, and Windows VMs that are
configured with software-based RAID systems
Note
Linux OS disk encryption is currently supported on the following Linux distributions: RHEL 7.2, CentOS 7.2n, and
Ubuntu 16.04.
This feature ensures that all data on your virtual machine disks is encrypted at rest in Azure Storage.
Resources
Azure Disk Encryption for Windows and Linux IaaS VMs
Comparison of Azure Disk Encryption, SSE, and Client-Side Encryption
IaaS VMs and their VHD files
For disks used by IaaS VMs, we recommend using Azure Disk Encryption. You can turn on SSE to encrypt the VHD files
that are used to back those disks in Azure Storage, but it only encrypts newly written data. This means if you create a
VM and then enable SSE on the storage account that holds the VHD file, only the changes will be encrypted, not the
original VHD file.
If you create a VM using an image from the Azure Marketplace, Azure performs a shallow copy of the image to your
storage account in Azure Storage, and it is not encrypted even if you have SSE enabled. After it creates the VM and
starts updating the image, SSE will start encrypting the data. For this reason, it's best to use Azure Disk Encryption on
VMs created from images in the Azure Marketplace if you want them fully encrypted.
If you bring a pre-encrypted VM into Azure from on-premises, you will be able to upload the encryption keys to Azure
Key Vault, and continue using the encryption for that VM that you were using on-premises. Azure Disk Encryption is
enabled to handle this scenario.
If you have a non-encrypted VHD from on-premises, you can upload it into the gallery as a custom image and provision
a VM from it. If you do this using the Resource Manager templates, you can ask it to turn on Azure Disk Encryption
when it boots up the VM.
When you add a data disk and mount it on the VM, you can turn on Azure Disk Encryption on that data disk. It will
encrypt that data disk locally first, and then the service management layer will do a lazy write against storage so the
storage content is encrypted.
Client-side encryption
Client-side encryption is the most secure method of encrypting your data, because it encrypts it before transit, and
encrypts the data at rest. However, it does require that you add code to your applications using storage, which you
may not want to do. In those cases, you can use HTTPS for your data in transit, and SSE to encrypt the data at rest.
With client-side encryption, you can encrypt table entities, queue messages, and blobs. With SSE, you can only
encrypt blobs. If you need table and queue data to be encrypted, you should use client-side encryption.
Client-side encryption is managed entirely by the application. This is the most secure approach, but does require you
to make programmatic changes to your application and put key management processes in place. You would use this
when you want the extra security during transit, and you want your stored data to be encrypted.
Client-side encryption puts more load on the client, and you have to account for this in your scalability plans, especially
if you are encrypting and transferring a lot of data.
Storage Service Encryption (SSE)
SSE is managed by Azure Storage. Using SSE does not provide for the security of the data in transit, but it does encrypt
the data as it is written to Azure Storage. There is no impact on the performance when using this feature.
You can only encrypt block blobs, append blobs, and page blobs using SSE. If you need to encrypt table data or queue
data, you should consider using client-side encryption.
If you have an archive or library of VHD files that you use as a basis for creating new virtual machines, you can create a
new storage account, enable SSE, and then upload the VHD files to that account. Those VHD files will be encrypted by
Azure Storage.
If you have Azure Disk Encryption enabled for the disks in a VM and SSE enabled on the storage account holding the
VHD files, it will work fine; it will result in any newly-written data being encrypted twice.
Storage Analytics
Using Storage Analytics to monitor authorization type
For each storage account, you can enable Azure Storage Analytics to perform logging and store metrics data. This is a
great tool to use when you want to check the performance metrics of a storage account, or need to troubleshoot a
storage account because you are having performance problems.
Another piece of data you can see in the storage analytics logs is the authentication method used by someone when
they access storage. For example, with Blob Storage, you can see if they used a Shared Access Signature or the storage
account keys, or if the blob accessed was public.
This can be really helpful if you are tightly guarding access to storage. For example, in Blob Storage you can set all of
the containers to private and implement the use of an SAS service throughout your applications. Then you can check
121 | P a g e
70-534 Architecting Microsoft Azure Solutions
the logs regularly to see if your blobs are accessed using the storage account keys, which may indicate a breach of
security, or if the blobs are public but they shouldn't be.
What do the logs look like?
After you enable the storage account metrics and logging through the Azure portal, analytics data will start to
accumulate quickly. The logging and metrics for each service are separate; logs are only written when there is
activity in that storage account, while the metrics will be logged every minute, every hour, or every day, depending on
how you configure it.
The logs are stored in block blobs in a container named $logs in the storage account. This container is automatically
created when Storage Analytics is enabled. Once this container is created, you can't delete it, although you can delete
its contents.
Under the $logs container, there is a folder for each service, and then there are subfolders for the
year/month/day/hour. Under hour, the logs are simply numbered. This is what the directory structure will look like:
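As a hedged illustration (the dates and counters below are invented, not taken from a real account), the layout under the $logs container follows a service/year/month/day/hour pattern, with numbered log files under each hour:

```
$logs/blob/2017/07/31/1800/000000.log
$logs/blob/2017/07/31/1800/000001.log
$logs/table/2017/07/31/1900/000000.log
$logs/queue/2017/08/01/0000/000000.log
```

In the documented naming, the hour folder uses a four-digit hhmm form with the minutes always 00.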
Every request to Azure Storage is logged, so you can use the logs to track any kind of call to a storage account.
What are all of those fields for?
An article listed in the resources below provides the list of the many fields in the logs and what they are used for.
We're interested in the entries for GetBlob, and how they are authenticated, so we need to look for entries with
operation-type "Get-Blob", and check the request-status (4th column) and the authorization-type (8th column).
For example, in the first few rows in the listing above, the request-status is "Success" and the authorization-type is
"authenticated". This means the request was validated using the storage account key.
Cross-Origin Resource Sharing (CORS)
One thing to note is that CORS allows access, but it does not provide authentication, which is required for all non-public access of storage resources. This means you can only access blobs if they are public or you include a Shared Access Signature granting you the appropriate permission. Tables, queues, and files have no public access and require a SAS.
By default, CORS is disabled on all services. You can enable CORS by using the REST API or the storage client library to
call one of the methods to set the service policies. When you do that, you include a CORS rule, which is in XML. Here's
an example of a CORS rule that has been set using the Set Service Properties operation for the Blob Service for a
storage account. You can perform that operation using the storage client library or the REST APIs for Azure Storage.
<Cors>
    <CorsRule>
        <AllowedOrigins>http://www.contoso.com,http://www.fabrikam.com</AllowedOrigins>
        <AllowedMethods>PUT,GET</AllowedMethods>
        <AllowedHeaders>x-ms-meta-data*,x-ms-meta-target*,x-ms-meta-abc</AllowedHeaders>
        <ExposedHeaders>x-ms-meta-*</ExposedHeaders>
        <MaxAgeInSeconds>200</MaxAgeInSeconds>
    </CorsRule>
</Cors>
Here's what each row means:
AllowedOrigins This tells which non-matching domains can request and receive data from the storage service.
This says that both contoso.com and fabrikam.com can request data from Blob Storage for a specific storage
account. You can also set this to a wildcard (*) to allow all domains to make requests.
AllowedMethods This is the list of methods (HTTP request verbs) that can be used when making the request.
In this example, only PUT and GET are allowed. You can set this to a wildcard (*) to allow all methods to be
used.
AllowedHeaders These are the request headers that the origin domain can specify when making the request. In
this example, headers starting with the prefixes x-ms-meta-data and x-ms-meta-target, plus the exact header
x-ms-meta-abc, are permitted. The wildcard character (*) indicates that any header beginning with the specified prefix is allowed.
ExposedHeaders This tells which response headers should be exposed by the browser to the request issuer. In
this example, any header starting with "x-ms-meta-" will be exposed.
MaxAgeInSeconds This is the maximum amount of time that a browser will cache the preflight OPTIONS
request. (For more information about the preflight request, check the first article below.)
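To make the evaluation concrete, here is a minimal Python sketch of how a service might apply the AllowedOrigins and AllowedMethods values from the example rule to a preflight request. The matching logic is deliberately simplified (real CORS evaluation also considers request headers and returns MaxAgeInSeconds for caching); the values are copied from the XML above.

```python
# Simplified sketch of evaluating a CORS rule against a preflight request.
# Values mirror the example CORS rule; this is not how Azure Storage implements it.
ALLOWED_ORIGINS = {"http://www.contoso.com", "http://www.fabrikam.com"}
ALLOWED_METHODS = {"PUT", "GET"}

def preflight_allowed(origin, method):
    """Return True if a preflight request from `origin` using `method` would pass."""
    origin_ok = "*" in ALLOWED_ORIGINS or origin in ALLOWED_ORIGINS
    method_ok = "*" in ALLOWED_METHODS or method in ALLOWED_METHODS
    return origin_ok and method_ok

print(preflight_allowed("http://www.contoso.com", "GET"))  # True
print(preflight_allowed("http://www.adatum.com", "GET"))   # False
```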
Resources
For more information about CORS and how to enable it, please check out these resources.
Cross-Origin Resource Sharing (CORS) Support for the Azure Storage Services on Azure.com
This article provides an overview of CORS and how to set the rules for the different storage services.
Cross-Origin Resource Sharing (CORS) Support for the Azure Storage Services on MSDN
This is the reference documentation for CORS support for the Azure Storage Services. This has links to articles applying
to each storage service, and shows an example and explains each element in the CORS file.
Microsoft Azure Storage: Introducing CORS
This is a link to the initial blog article announcing CORS and showing how to use it.
Frequently asked questions about Azure Storage security
1. How can I verify the integrity of the blobs I'm transferring into or out of Azure Storage if I can't use the HTTPS
protocol?
If for any reason you need to use HTTP instead of HTTPS and you are working with block blobs, you can use MD5
checking to help verify the integrity of the blobs being transferred. This will help with protection from
network/transport layer errors, but not necessarily with intermediary attacks.
If you can use HTTPS, which provides transport level security, then using MD5 checking is redundant and unnecessary.
For more information, please check out the Azure Blob MD5 Overview.
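As an illustration of the MD5 check described above, the following Python sketch computes a base64-encoded MD5 digest (the form used in the Content-MD5 header) for a hypothetical payload and verifies it after transfer. The payload is invented for the example.

```python
import base64
import hashlib

# Hypothetical blob payload; in practice this would be the block blob content.
content = b"example blob content"

# Compute the MD5 digest and base64-encode it, as used in the Content-MD5 header.
content_md5 = base64.b64encode(hashlib.md5(content).digest()).decode("ascii")

def verify(data, expected_md5):
    """Recompute the digest on the received data and compare to the stored value."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii") == expected_md5

print(verify(content, content_md5))           # True
print(verify(b"tampered data", content_md5))  # False
```

As the FAQ answer notes, this guards against transport-layer corruption but not against an intermediary that can recompute the digest.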
2. What about FIPS-Compliance for the U.S. Government?
The United States Federal Information Processing Standard (FIPS) defines cryptographic algorithms approved for use
by U.S. Federal government computer systems for the protection of sensitive data. Enabling FIPS mode on a Windows
server or desktop tells the OS that only FIPS-validated cryptographic algorithms should be used. If an application uses
non-compliant algorithms, the application will break. With .NET Framework versions 4.5.2 or higher, the application
automatically switches the cryptography algorithms to use FIPS-compliant algorithms when the computer is in FIPS
mode.
Microsoft leaves it up to each customer to decide whether to enable FIPS mode. We believe there is no compelling
reason for customers who are not subject to government regulations to enable FIPS mode by default.
Resources
Why We're Not Recommending "FIPS Mode" Anymore
This blog article gives an overview of FIPS and explains why they don't enable FIPS mode by default.
FIPS 140 Validation
This article provides information on how Microsoft products and cryptographic modules comply with the FIPS
standard for the U.S. Federal government.
"System cryptography: Use FIPS compliant algorithms for encryption, hashing, and signing" security settings
effects in Windows XP and in later versions of Windows
This article talks about the use of FIPS mode in older Windows computers.
Updating the encryption settings of an existing encrypted premium storage VM is not supported; this support is coming soon.
Encryption features
When you enable and deploy Azure Disk Encryption for Azure IaaS VMs, the following capabilities are enabled,
depending on the configuration provided:
Encryption of the OS volume to protect the boot volume at rest in your storage
Encryption of data volumes to protect the data volumes at rest in your storage
Disabling encryption on the OS and data drives for Windows IaaS VMs
Disabling encryption on the data drives for Linux IaaS VMs
Safeguarding the encryption keys and secrets in your key vault subscription
Reporting the encryption status of the encrypted IaaS VM
Removal of disk-encryption configuration settings from the IaaS virtual machine
Backup and restore of encrypted VMs by using the Azure Backup service
Note
Backup and restore of encrypted VMs is supported only for VMs that are encrypted with the KEK configuration. It is
not supported on VMs that are encrypted without KEK. KEK is an optional parameter that enables VM encryption.
The Azure Disk Encryption for Windows and Linux IaaS VMs solution includes:
The disk-encryption extension for Windows.
The disk-encryption extension for Linux.
The disk-encryption PowerShell cmdlets.
The disk-encryption Azure command-line interface (CLI) cmdlets.
The disk-encryption Azure Resource Manager templates.
The Azure Disk Encryption solution is supported on IaaS VMs that are running Windows or Linux OS. For more
information about the supported operating systems, see the "Prerequisites" section.
Note
There is no additional charge for encrypting VM disks with Azure Disk Encryption.
Value proposition
When you apply the Azure Disk Encryption-management solution, you can satisfy the following business needs:
IaaS VMs are secured at rest, because you can use industry-standard encryption technology to address
organizational security and compliance requirements.
IaaS VMs boot under customer-controlled keys and policies, and you can audit their usage in your key vault.
Encryption workflow
To enable disk encryption for Windows and Linux VMs, do the following:
1. Choose an encryption scenario from among the preceding encryption scenarios.
2. Opt in to enabling disk encryption via the Azure Disk Encryption Resource Manager template, PowerShell
cmdlets, or CLI command, and specify the encryption configuration.
For the customer-encrypted VHD scenario, upload the encrypted VHD to your storage account and
the encryption key material to your key vault. Then, provide the encryption configuration to enable
encryption on a new IaaS VM.
For new VMs that are created from the Marketplace and existing VMs that are already running in
Azure, provide the encryption configuration to enable encryption on the IaaS VM.
3. Grant access to the Azure platform to read the encryption-key material (BitLocker encryption keys for
Windows systems and Passphrase for Linux) from your key vault to enable encryption on the IaaS VM.
4. Provide the Azure Active Directory (Azure AD) application identity to write the encryption key material to your
key vault. Doing so enables encryption on the IaaS VM for the scenarios mentioned in step 2.
5. Azure updates the VM service model with encryption and the key vault configuration, and sets up your
encrypted VM.
Decryption workflow
To disable disk encryption for IaaS VMs, complete the following high-level steps:
1. Choose to disable encryption (decryption) on a running IaaS VM in Azure via the Azure Disk Encryption
Resource Manager template or PowerShell cmdlets, and specify the decryption configuration.
This step disables encryption of the OS or the data volume or both on the running Windows IaaS VM. However, as
mentioned in the previous section, disabling OS disk encryption for Linux is not supported. The decryption step is
allowed only for data drives on Linux VMs.
2. Azure updates the VM service model, and the IaaS VM is marked decrypted. The contents of the VM are no
longer encrypted at rest.
Note
The disable-encryption operation does not delete your key vault and the encryption key material (BitLocker
encryption keys for Windows systems or Passphrase for Linux). Disabling OS disk encryption for Linux is not supported.
The decryption step is allowed only for data drives on Linux VMs.
Azure SQL Database Transparent Data Encryption (TDE)
Azure SQL Database transparent data encryption helps protect against the threat of malicious activity by performing
real-time encryption and decryption of the database, associated backups, and transaction log files at rest, without
requiring changes to the application.
TDE encrypts the storage of an entire database by using a symmetric key called the database encryption key. In SQL
Database the database encryption key is protected by a built-in server certificate. The built-in server certificate is
unique for each SQL Database server. If a database is in a GeoDR relationship, it is protected by a different key on each
server. If 2 databases are connected to the same server, they share the same built-in certificate. Microsoft
automatically rotates these certificates at least every 90 days. For a general description of TDE, see Transparent Data
Encryption (TDE).
Azure SQL Database does not support Azure Key Vault integration with TDE. SQL Server running on an Azure virtual
machine can use an asymmetric key from the Key Vault. For more information, see Extensible Key Management Using
Azure Key Vault (SQL Server).
Permissions
To configure TDE through the Azure portal, by using the REST API, or by using PowerShell, you must be connected as
the Azure Owner, Contributor, or SQL Security Manager.
To configure TDE by using Transact-SQL requires the following:
To execute the ALTER DATABASE statement with the SET option requires membership in the dbmanager role.
Enable TDE on a Database Using the Portal
1. Visit the Azure portal at https://portal.azure.com and sign in with your Azure Administrator or Contributor account.
2. On the left banner, click BROWSE, and then click SQL databases.
3. With SQL databases selected in the left pane, click your user database.
4. In the database blade, click All settings.
5. In the Settings blade, click Transparent data encryption part to open the Transparent data encryption blade.
6. In the Data encryption blade, move the Data encryption button to On, and then click Save (at the top of the
page) to apply the setting. The Encryption status will approximate the progress of the transparent data
encryption.
You can also monitor the progress of encryption by connecting to SQL Database using a query tool such as SQL Server
Management Studio as a database user with the VIEW DATABASE STATE permission. Query
the encryption_state column of the sys.dm_database_encryption_keys view.
Enabling TDE on SQL Database by Using Transact-SQL
The following steps enable TDE.
1. Connect to the database using a login that is an administrator or a member of the dbmanager role in the
master database.
2. Execute the following statements to encrypt the database.
-- Enable encryption
ALTER DATABASE [AdventureWorks] SET ENCRYPTION ON;
GO
3. To monitor the progress of encryption on SQL Database, database users with the VIEW DATABASE
STATE permission can query the encryption_state column of the sys.dm_database_encryption_keys view.
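For quick interpretation of query results, the following sketch maps the documented encryption_state values of sys.dm_database_encryption_keys to their meanings (values as documented for SQL Server; shown as a Python lookup purely for reference, since the view itself is queried with T-SQL):

```python
# Reference map of sys.dm_database_encryption_keys.encryption_state values,
# per the SQL Server documentation for this dynamic management view.
ENCRYPTION_STATE = {
    0: "No database encryption key present, no encryption",
    1: "Unencrypted",
    2: "Encryption in progress",
    3: "Encrypted",
    4: "Key change in progress",
    5: "Decryption in progress",
}

print(ENCRYPTION_STATE[3])  # Encrypted
```

A database enabling TDE typically moves from state 2 to state 3 as the background encryption scan completes.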
Enabling and Disabling TDE on SQL Database by Using PowerShell
Using Azure PowerShell, you can run the following command to turn TDE on or off. You must connect your account
to the PS window before running the command. Customize the example to use your values for
the ServerName, ResourceGroupName, and DatabaseName parameters. For additional information about PowerShell,
see How to install and configure Azure PowerShell.
Note
To continue, you should install and configure version 1.0 of Azure PowerShell. Version 0.9.8 can be used but it is
deprecated, and it requires switching to the AzureResourceManager cmdlets by using the Switch-AzureMode -Name AzureResourceManager command.
1. To enable TDE, return the TDE status, and view the encryption activity:
PS C:\> Set-AzureRMSqlDatabaseTransparentDataEncryption -ServerName "myserver" -ResourceGroupName "Default-SQL-WestUS" -DatabaseName "database1" -State "Enabled"
This section describes how to manage access by using RBAC in the Azure portal. If you want more details about how RBAC helps you manage access, see What is Role-Based Access Control.
Within each subscription, you can grant up to 2000 role assignments.
View access
You can see who has access to a resource, resource group, or subscription from its main blade in the Azure portal. For
example, we want to see who has access to one of our resource groups:
1. Select Resource groups in the navigation bar on the left.
2. Select the name of the resource group from the Resource groups blade.
3. Select Access control (IAM) from the left menu.
4. The Access control blade lists all users, groups, and applications that have been granted access to the
resource group.
Notice that some users were Assigned access while others Inherited it. Access is either assigned specifically to the
resource group or inherited from an assignment to the parent subscription.
Note
Classic subscription admins and co-admins are considered owners of the subscription in the new RBAC model.
Add Access
You grant access from within the resource, resource group, or subscription that is the scope of the role assignment.
1. Select Add on the Access control blade.
2. Select the role that you wish to assign from the Select a role blade.
3. Select the user, group, or application in your directory that you wish to grant access to. You can search the directory with display names, email addresses, and object identifiers.
If access is set to Inherited, there is a link that takes you to the resource where this role was assigned. Go to the resource listed there to remove the role assignment.
Azure RBAC includes built-in roles such as the following:
API Management Service Contributor: Can manage API Management service and the APIs
API Management Service Operator Role: Can manage API Management service, but not the APIs themselves
API Management Service Reader Role: Read-only access to API Management service and APIs
Backup Operator: Can manage backup, except removing backup, in a Recovery Services vault
Data Factory Contributor: Can create and manage data factories, and child resources within them
DevTest Labs User: Can view everything and connect, start, restart, and shut down virtual machines
Logic App Contributor: Can manage all aspects of a Logic App, but not create a new one
Logic App Operator: Can start and stop workflows defined within a Logic App
Monitoring Contributor: Can read monitoring data and edit monitoring settings
New Relic APM Account Contributor: Can manage New Relic Application Performance Management accounts and applications
Security Manager: Can manage security components, security policies, and virtual machines
SQL DB Contributor: Can manage SQL databases, but not their security-related policies
SQL Security Manager: Can manage the security-related policies of SQL servers and databases
SQL Server Contributor: Can manage SQL servers and databases, but not their security-related policies
Classic Virtual Machine Contributor: Can manage classic virtual machines, but not the virtual network or storage account to which they are connected
Virtual Machine Contributor: Can manage virtual machines, but not the virtual network or storage account to which they are connected
Classic Network Contributor: Can manage classic virtual networks and reserved IPs
Website Contributor: Can manage websites, but not the web plans to which they are connected
The following is an example of a custom role for monitoring and restarting virtual machines:
{
  "Name": "Virtual Machine Operator",
  "Id": "cadb4a5a-4e7a-47be-84db-05cad13b6769",
  "IsCustom": true,
  "Description": "Can monitor and restart virtual machines.",
  "Actions": [
    "Microsoft.Storage/*/read",
    "Microsoft.Network/*/read",
    "Microsoft.Compute/*/read",
    "Microsoft.Compute/virtualMachines/start/action",
    "Microsoft.Compute/virtualMachines/restart/action",
    "Microsoft.Authorization/*/read",
    "Microsoft.Resources/subscriptions/resourceGroups/read",
    "Microsoft.Insights/alertRules/*",
    "Microsoft.Insights/diagnosticSettings/*",
    "Microsoft.Support/*"
  ],
  "NotActions": [],
  "AssignableScopes": [
    "/subscriptions/c276fc76-9cd4-44c9-99a7-4fd71546436e",
    "/subscriptions/e91d47c4-76f3-4271-a796-21b4ecfe3624",
    "/subscriptions/34370e90-ac4a-4bf9-821f-85eeedeae1a2"
  ]
}
Actions
The Actions property of a custom role specifies the Azure operations to which the role grants access. It is a collection
of operation strings that identify securable operations of Azure resource providers. Operation strings follow the
format of Microsoft.<ProviderName>/<ChildResourceType>/<action>. Operation strings that contain wildcards (*)
grant access to all operations that match the operation string. For instance:
*/read grants access to read operations for all resource types of all Azure resource providers.
Microsoft.Compute/* grants access to all operations for all resource types in the Microsoft.Compute resource
provider.
Microsoft.Network/*/read grants access to read operations for all resource types in the Microsoft.Network
resource provider of Azure.
Microsoft.Compute/virtualMachines/* grants access to all operations of virtual machines and its child
resource types.
Microsoft.Web/sites/restart/Action grants access to restart websites.
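The wildcard semantics above can be sketched in a few lines of Python: in an operation string, * matches any sequence of characters, including the / separator, so fnmatch-style matching approximates the behavior. This is a simplification of the service's actual matching rules, intended only to illustrate the examples in the list.

```python
from fnmatch import fnmatchcase

# Simplified sketch of wildcard matching for RBAC operation strings.
# In these patterns, "*" matches any characters, including "/".
def grants(pattern, operation):
    return fnmatchcase(operation, pattern)

print(grants("Microsoft.Compute/*", "Microsoft.Compute/virtualMachines/start/action"))  # True
print(grants("*/read", "Microsoft.Network/virtualNetworks/read"))                       # True
print(grants("Microsoft.Network/*/read", "Microsoft.Network/virtualNetworks/write"))    # False
```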
Use Get-AzureRmProviderOperation (in PowerShell) or azure provider operations show (in Azure CLI) to list operations
of Azure resource providers. You may also use these commands to verify that an operation string is valid, and to
expand wildcard operation strings.
Get-AzureRMProviderOperation Microsoft.Compute/virtualMachines/*/action | FT Operation, OperationName
Get-AzureRMProviderOperation Microsoft.Network/*
NotActions
Use the NotActions property if the set of operations that you wish to allow is more easily defined by excluding
restricted operations. The access granted by a custom role is computed by subtracting the NotActions operations from
the Actions operations.
Note
If a user is assigned a role that excludes an operation in NotActions, and is assigned a second role that grants access to
the same operation, the user will be allowed to perform that operation. NotActions is not a deny rule; it is simply a convenient way to create a set of allowed operations when specific operations need to be excluded.
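The subtraction just described can be sketched in Python: a role's effective permissions are the operations matched by Actions minus those matched by NotActions, and grants from multiple roles are unioned, which is why a second role can re-grant an operation excluded by the first. This is an illustrative model, not the service's implementation.

```python
from fnmatch import fnmatchcase

def is_allowed(operation, actions, not_actions):
    """Effective access for one role: Actions minus NotActions."""
    allowed = any(fnmatchcase(operation, p) for p in actions)
    excluded = any(fnmatchcase(operation, p) for p in not_actions)
    return allowed and not excluded

# A hypothetical role allowing all compute operations except VM deletion:
actions = ["Microsoft.Compute/*"]
not_actions = ["Microsoft.Compute/virtualMachines/delete"]

print(is_allowed("Microsoft.Compute/virtualMachines/start/action", actions, not_actions))  # True
print(is_allowed("Microsoft.Compute/virtualMachines/delete", actions, not_actions))        # False

# Across multiple roles, grants are unioned: a second role granting the operation wins.
role2_actions = ["Microsoft.Compute/virtualMachines/delete"]
granted = (is_allowed("Microsoft.Compute/virtualMachines/delete", actions, not_actions)
           or is_allowed("Microsoft.Compute/virtualMachines/delete", role2_actions, []))
print(granted)  # True
```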
AssignableScopes
The AssignableScopes property of the custom role specifies the scopes (subscriptions, resource groups, or resources)
within which the custom role is available for assignment. You can make the custom role available for assignment in
only the subscriptions or resource groups that require it, and not clutter user experience for the rest of the
subscriptions or resource groups.
Examples of valid assignable scopes include:
/subscriptions/c276fc76-9cd4-44c9-99a7-4fd71546436e, /subscriptions/e91d47c4-76f3-4271-a796-21b4ecfe3624 - makes the role available for assignment in two subscriptions.
/subscriptions/c276fc76-9cd4-44c9-99a7-4fd71546436e - makes the role available for assignment in a single subscription.
/subscriptions/c276fc76-9cd4-44c9-99a7-4fd71546436e/resourceGroups/Network - makes the role available for assignment only in the Network resource group.
Note
You must use at least one subscription, resource group, or resource ID.
Custom roles access control
The AssignableScopes property of the custom role also controls who can view, modify, and delete the role.
Who can create a custom role? Owners (and User Access Administrators) of subscriptions, resource groups,
and resources can create custom roles for use in those scopes. The user creating the role needs to be able to
perform the Microsoft.Authorization/roleDefinitions/write operation on all the AssignableScopes of the role.
Who can modify a custom role? Owners (and User Access Administrators) of subscriptions, resource groups,
and resources can modify custom roles in those scopes. Users need to be able to perform
the Microsoft.Authorization/roleDefinitions/write operation on all the AssignableScopes of a custom role.
Who can view custom roles? All built-in roles in Azure RBAC allow viewing of roles that are available for
assignment. Users who can perform the Microsoft.Authorization/roleDefinitions/read operation at a scope can
view the RBAC roles that are available for assignment at that scope.
Security Center enables these individuals to meet these various responsibilities. For example:
Jeff (Cloud Workload Owner)
Manage a cloud workload and its related resources
Responsible for implementing and maintaining protections in accordance with company security policy
Ellen (CISO/CIO)
Responsible for all aspects of security for the company
Wants to understand the company's security posture across cloud workloads
Needs to be informed of major attacks and risks
David (IT Security)
Sets company security policies to ensure the appropriate protections are in place
Monitors compliance with policies
Generates reports for leadership or auditors
Judy (Security Operations)
Monitors and responds to security alerts 24/7
Escalates to Cloud Workload Owner or IT Security Analyst
Sam (Security Analyst)
Investigate attacks
Work with Cloud Workload Owner to apply remediation
Security Center uses Role-Based Access Control (RBAC), which provides built-in roles that can be assigned to users,
groups, and services in Azure. When a user opens Security Center, they only see information related to resources they
have access to. This means the user must be assigned the role of Owner, Contributor, or Reader for the subscription or
resource group that a resource belongs to. In addition to these roles, there are two specific Security Center roles:
Security reader: a user who belongs to this role has view rights to Security Center, which includes
recommendations, alerts, policy, and health, but cannot make changes.
Security admin: has the same rights as security reader, but can also update the security policy and dismiss
recommendations and alerts.
The Security Center roles described above do not have access to other service areas of Azure such as Storage, Web &
Mobile, or Internet of Things.
Note
A user needs to be at least an owner or contributor of a subscription or resource group to be able to see Security Center in Azure.
Using the personas explained earlier, the following RBAC would be needed:
Jeff (Cloud Workload Owner)
Resource Group Owner/Contributor
Note
If you need to review which policies were changed, you can use Azure Audit Logs. Policy changes are always logged in
Azure Audit Logs.
Security recommendations
Before configuring security policies, review each of the security recommendations, and determine whether these
policies are appropriate for your various subscriptions and resource groups. It is also important to understand what
action should be taken to address Security Recommendations, and who in your organization will be responsible for
monitoring for new recommendations and taking the needed steps.
Security Center will recommend that you provide security contact details for your Azure subscription. This information
will be used by Microsoft to contact you if the Microsoft Security Response Center (MSRC) discovers that your
customer data has been accessed by an unlawful or unauthorized party. Read Provide security contact details in Azure
Security Center for more information on how to enable this recommendation.
Data collection and storage
Azure Security Center uses the Microsoft Monitoring Agent (the same agent used by the Operations Management
Suite and the Log Analytics service) to collect security data from your virtual machines. Data collected from
this agent will be stored in your Log Analytics workspace(s).
Agent
After data collection is enabled in the security policy, the Microsoft Monitoring Agent (for Windows or Linux) is
installed on all supported Azure VMs and any new ones that are created. If the VM already has the Microsoft
Monitoring Agent installed, Azure Security Center will leverage the currently installed agent. The agent process is
designed to be non-invasive and have very minimal impact on VM performance.
The Microsoft Monitoring Agent for Windows requires the use of TCP port 443. See the Troubleshooting article for
additional details.
If at some point you want to disable Data Collection, you can turn it off in the security policy. However, because the
Microsoft Monitoring Agent may be used by other Azure management and monitoring services, the agent will not be
uninstalled automatically when you turn off data collection in Security Center. You can manually uninstall the agent if
needed.
Note
To find a list of supported VMs, read the Azure Security Center frequently asked questions (FAQ).
Workspace
Data collected from the Microsoft Monitoring Agent (on behalf of Azure Security Center) will be stored in either an
existing Log Analytics workspace(s) associated with your Azure subscription or a new workspace(s), taking into
account the Geo of the VM.
In the Azure portal, you can browse to see a list of your Log Analytics workspaces, including any created by Azure
Security Center. A related resource group will be created for new workspaces. Both will follow this naming
convention:
Workspace: DefaultWorkspace-[subscription-ID]-[geo]
Resource Group: DefaultResouceGroup-[geo]
For workspaces created by Azure Security Center, data is retained for 30 days. For existing workspaces, retention is
based on the workspace pricing tier.
Note
Microsoft makes strong commitments to protect the privacy and security of this data. Microsoft adheres to strict
compliance and security guidelines, from coding to operating a service. For more information about data handling
and privacy, read Azure Security Center Data Security.
Ongoing security monitoring
After initial configuration and application of Security Center recommendations, the next step is considering Security
Center operational processes.
To access Security Center from the Azure portal, click Browse and type Security Center in the Filter field. The views a
user sees reflect the applied filters; for example, an environment might show many issues to be addressed.
Note
Security Center will not interfere with your normal operational procedures; it will passively monitor your deployments
and provide recommendations based on the security policies you enabled.
When you first opt in to use Security Center for your current Azure environment, make sure that you review all
recommendations, which can be done in the Recommendations tile or per resource (Compute, Networking, Storage &
data, Application).
Once you address all recommendations, the Prevention section should be green for all resources that were addressed.
Ongoing monitoring at this point becomes easier since you will only take actions based on changes in the resource
security health and recommendations tiles.
The Detection section is more reactive; it surfaces alerts about issues that are either taking place now, or occurred
in the past and were detected by Security Center controls and third-party systems. The Security Alerts tile shows bar
graphs representing the number of threat detection alerts found each day, and their distribution among the different
severity categories (low, medium, high). For more information about Security Alerts, read Managing and
responding to security alerts in Azure Security Center.
Note
You can also leverage Microsoft Power BI to visualize your Security Center data. Read Get insights from Azure Security
Center data with Power BI.
Monitoring for new or changed resources
Most Azure environments are dynamic, with new resources being spun up and down on a regular basis, configuration
changes, and so on. Security Center helps ensure that you have visibility into the security state of these new resources.
When you add new resources (VMs, SQL DBs) to your Azure Environment, Security Center will automatically discover
these resources and begin to monitor their security. This also includes PaaS web roles and worker roles. If Data
Collection is enabled in the Security Policy, additional monitoring capabilities will be enabled automatically for your
virtual machines.
1. For virtual machines, click Compute under the Prevention section. Any issues with enabling data collection or
related recommendations will be surfaced in the Overview tab and the Monitoring Recommendations section.
2. View the Recommendations to see what, if any, security risks were identified for the new resource.
3. It is very common that when new VMs are added to your environment, only the operating system is initially
installed. The resource owner might need some time to deploy other apps that will be used by these VMs.
Ideally, you should know the final intent of this workload. Is it going to be an Application Server? Based on
what this new workload is going to be, you can enable the appropriate Security Policy, which is the third step
in this workflow.
4. As new resources are added to your Azure environment, it is possible that new alerts appear in the Security
Alerts tile. Always verify if there are new alerts in this tile and take actions according to Security Center
recommendations.
You will also want to regularly monitor the state of existing resources to identify configuration changes that have
created security risks, drift from recommended baselines, and security alerts. Start at the Security Center dashboard.
From there you have three major areas to review on a consistent basis.
1. The Prevention section panel provides you quick access to your key resources. Use this option to monitor
Compute, Networking, Storage & data and Applications.
2. The Recommendations panel enables you to review Security Center recommendations. During your ongoing
monitoring you may find that you don't have new recommendations on a daily basis, which is normal if you
addressed all recommendations during the initial Security Center setup. For this reason, you may not have new
information in this section every day and only need to access it as needed.
3. The Detection section might change on either a very frequent or very infrequent basis. Always review your
security alerts and take actions based on Security Center recommendations.
Incident response
Security Center detects and alerts you to threats as they occur. Organizations should monitor for new security alerts
and take action as needed to investigate further or remediate the attack. For more information on how Security
Center threat detection works, read Azure Security Center detection capabilities.
While this article is not intended to help you create your own incident response plan, we are going to use the
Microsoft Azure Security Response in the Cloud lifecycle as the foundation for incident response stages. The stages
are shown in the following diagram:
Note
You can use the National Institute of Standards and Technology (NIST) Computer Security Incident Handling Guide as a
reference to assist you in building your own plan.
You can use Security Center Alerts during the following stages:
Detect: identify a suspicious activity in one or more resources.
Assess: perform the initial assessment to obtain more information about the suspicious activity.
Diagnose: use the remediation steps to conduct the technical procedure to address the issue.
Each Security Alert provides information that can be used to better understand the nature of the attack and suggest
possible mitigations. Some alerts also provide links to either more information or to other sources of information
within Azure. You can use the information provided for further research and to begin mitigation, and you can also
search security-related data that is stored in your workspace.
The following example shows a suspicious RDP activity taking place:
As you can see, this blade shows details regarding the time that the attack took place, the source hostname, the
target VM and also gives recommendation steps. In some circumstances the source information of the attack may be
empty. Read Missing Source Information in Azure Security Center Alerts for more information about this type of
behavior.
In the How to Leverage the Azure Security Center & Microsoft Operations Management Suite for an Incident
Response video you can see some demonstrations that can help you to understand how Security Center can be used
in each one of those stages.
File Storage offers shared storage for legacy applications using the standard SMB protocol. Azure virtual
machines and cloud services can share file data across application components via mounted shares, and on-
premises applications can access file data in a share via the File service REST API.
An Azure storage account is a secure account that gives you access to services in Azure Storage. Your storage account
provides the unique namespace for your storage resources. The image below shows the relationships between the
Azure storage resources in a storage account:
There are two types of storage accounts:
General-purpose Storage Accounts
A general-purpose storage account gives you access to Azure Storage services such as Tables, Queues, Files, Blobs and
Azure virtual machine disks under a single account. This type of storage account has two performance tiers:
A standard storage performance tier which allows you to store Tables, Queues, Files, Blobs and Azure virtual
machine disks.
A premium storage performance tier which currently only supports Azure virtual machine disks. See Premium
Storage: High-Performance Storage for Azure Virtual Machine Workloads for an in-depth overview of
Premium storage.
To learn how to create a storage account, see Create a storage account for more details. You can create up to 200
uniquely named storage accounts with a single subscription. See Azure Storage Scalability and Performance
Targets for details about storage account limits.
Storage Service Versions
The Azure Storage services are regularly updated with support for new features. The Azure Storage services REST API
reference describes each supported version and its features. We recommend that you use the latest version
whenever possible. For information on the latest version of the Azure Storage services, as well as information on
previous versions, see Versioning for the Azure Storage Services.
Blob storage
For users with large amounts of unstructured object data to store in the cloud, Blob storage offers a cost-effective and
scalable solution. You can use Blob storage to store content such as:
Documents
Social data such as photos, videos, music, and blogs
Backups of files, computers, databases, and devices
Images and text for web applications
Configuration data for cloud applications
Big data, such as logs and other large datasets
Every blob is organized into a container. Containers also provide a useful way to assign security policies to groups of
objects. A storage account can contain any number of containers, and a container can contain any number of blobs,
up to the 500 TB capacity limit of the storage account.
Blob storage offers three types of blobs: block blobs, append blobs, and page blobs (disks).
Block blobs are optimized for streaming and storing cloud objects, and are a good choice for storing
documents, media files, backups etc.
Append blobs are similar to block blobs, but are optimized for append operations. An append blob can be
updated only by adding a new block to the end. Append blobs are a good choice for scenarios such as logging,
where new data needs to be written only to the end of the blob.
Page blobs are optimized for representing IaaS disks and supporting random writes, and may be up to 1 TB in
size. An Azure virtual machine network attached IaaS disk is a VHD stored as a page blob.
For very large datasets where network constraints make uploading or downloading data to Blob storage over the wire
unrealistic, you can ship a hard drive to Microsoft to import or export data directly from the data center. See Use the
Microsoft Azure Import/Export Service to Transfer Data to Blob Storage.
Table storage
Tip: The content in this article applies to the original basic Azure Table storage. However, there is now a premium
offering for Azure Table storage in public preview that offers throughput-optimized tables, global distribution, and
automatic secondary indexes. To learn more and try out the new premium experience, please check out Azure
Cosmos DB: Table API.
Modern applications often demand data stores with greater scalability and flexibility than previous generations of
software required. Table storage offers highly available, massively scalable storage, so that your application can
automatically scale to meet user demand. Table storage is Microsoft's NoSQL key/attribute store: it has a schemaless
design, making it different from traditional relational databases. With a schemaless data store, it's easy to adapt your
data as the needs of your application evolve. Table storage is easy to use, so developers can create applications
quickly. Access to data is fast and cost-effective for all kinds of applications. Table storage is typically significantly
lower in cost than traditional SQL for similar volumes of data.
Table storage is a key-attribute store, meaning that every value in a table is stored with a typed property name. The
property name can be used for filtering and specifying selection criteria. A collection of properties and their values
comprise an entity. Since Table storage is schemaless, two entities in the same table can contain different collections
of properties, and those properties can be of different types.
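The key-attribute model above can be illustrated with plain dictionaries: two entities in the same hypothetical table carry different property sets, and a typed property name is used for filtering (the entity names and values below are invented for illustration; real Table storage is accessed over REST or an SDK):

```python
# Two schemaless entities in one table: PartitionKey + RowKey identify an
# entity, and the remaining properties are free-form and may differ per entity.
customer = {
    "PartitionKey": "customers",
    "RowKey": "0001",
    "Name": "Contoso",
    "Email": "hello@contoso.example",
}
device = {
    "PartitionKey": "devices",
    "RowKey": "sensor-42",
    "FirmwareVersion": "1.4.2",
    "LastSeen": "2017-06-01T12:00:00Z",
}

def entity_key(entity: dict) -> tuple:
    # The (PartitionKey, RowKey) pair uniquely identifies an entity.
    return (entity["PartitionKey"], entity["RowKey"])

table = {entity_key(e): e for e in (customer, device)}
# Filtering on a property name, as the text describes:
contoso = [e for e in table.values() if e.get("Name") == "Contoso"]
```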
You can use Table storage to store flexible datasets, such as user data for web applications, address books, device
information, and any other type of metadata that your service requires. You can store any number of entities in a
table, and a storage account may contain any number of tables, up to the capacity limit of the storage account.
As with Blobs and Queues, developers can manage and access Table storage using standard REST protocols; however,
Table storage also supports a subset of the OData protocol, simplifying advanced querying capabilities and enabling
both JSON and AtomPub (XML-based) formats.
For today's Internet-based applications, NoSQL databases like Table storage offer a popular alternative to traditional
relational databases.
Queue storage
In designing applications for scale, application components are often decoupled, so that they can scale independently.
Queue storage provides a reliable messaging solution for asynchronous communication between application
components, whether they are running in the cloud, on the desktop, on an on-premises server, or on a mobile device.
Queue storage also supports managing asynchronous tasks and building process workflows.
A storage account can contain any number of queues. A queue can contain any number of messages, up to the
capacity limit of the storage account. Individual messages may be up to 64 KB in size.
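The decoupling pattern above can be sketched with an in-process queue, enforcing the 64 KB per-message limit mentioned in the text (Python's `queue.Queue` stands in for Azure Queue storage purely for illustration; the real service is accessed over REST or an SDK):

```python
# A producer component enqueues work; a separate worker component dequeues
# it independently, so the two can scale on their own.
import queue

MAX_MESSAGE_BYTES = 64 * 1024  # Queue storage's per-message size limit

def enqueue(q: queue.Queue, payload: bytes) -> None:
    if len(payload) > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds the 64 KB limit")
    q.put(payload)

work = queue.Queue()
enqueue(work, b"resize-image:container/photo.jpg")
# Elsewhere, a worker picks up the task asynchronously:
task = work.get()
```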
File storage
Azure File storage offers cloud-based SMB file shares, so that you can migrate legacy applications that rely on file
shares to Azure quickly and without costly rewrites. With Azure File storage, applications running in Azure virtual
machines or cloud services can mount a file share in the cloud, just as a desktop application mounts a typical SMB
share. Any number of application components can then mount and access the File storage share simultaneously.
Since a File storage share is a standard SMB file share, applications running in Azure can access data in the share via
file system I/O APIs. Developers can therefore leverage their existing code and skills to migrate existing applications. IT
Pros can use PowerShell cmdlets to create, mount, and manage File storage shares as part of the administration of
Azure applications.
Like the other Azure storage services, File storage exposes a REST API for accessing data in a share. On-premises
applications can call the File storage REST API to access data in a file share. This way, an enterprise can choose to
migrate some legacy applications to Azure and continue running others from within their own organization. Note that
mounting a file share is only possible for applications running in Azure; an on-premises application may only access
the file share via the REST API.
Distributed applications can also use File storage to store and share useful application data and development and
testing tools. For example, an application may store configuration files and diagnostic data such as logs, metrics, and
crash dumps in a File storage share so that they are available to multiple virtual machines or roles. Developers and
administrators can store utilities that they need to build or manage an application in a File storage share that is
available to all components, rather than installing them on every virtual machine or role instance.
Access to Blob, Table, Queue, and File resources
By default, only the storage account owner can access resources in the storage account. For the security of your data,
every request made against resources in your account must be authenticated. Authentication relies on a Shared Key
model. Blobs can also be configured to support anonymous authentication.
Your storage account is assigned two private access keys on creation that are used for authentication. Having two keys
ensures that your application remains available when you regularly regenerate the keys as a common security key
management practice.
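Why two keys keep the application available during rotation can be sketched as follows (a toy model with invented names; real key regeneration happens through the Azure portal, PowerShell, or CLI, not this class):

```python
import secrets

class StorageAccount:
    """Toy model of a storage account holding two access keys."""
    def __init__(self):
        self.keys = {"key1": secrets.token_hex(16), "key2": secrets.token_hex(16)}

    def regenerate(self, which: str) -> str:
        self.keys[which] = secrets.token_hex(16)
        return self.keys[which]

def rotate_keys(account: StorageAccount, app_config: dict) -> None:
    # At every step the application holds a key that is still valid.
    app_config["key"] = account.keys["key2"]        # 1. switch app to secondary
    app_config["key"] = account.regenerate("key1")  # 2. regenerate primary, switch back
    account.regenerate("key2")                      # 3. regenerate the secondary

account = StorageAccount()
app = {"key": account.keys["key1"]}
rotate_keys(account, app)
```

Because the application is always pointed at the key that is not being regenerated, both keys can be rotated without downtime.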
If you do need to allow users controlled access to your storage resources, then you can create a shared access
signature. A shared access signature (SAS) is a token that can be appended to a URL that enables delegated access to a
storage resource. Anyone who possesses the token can access the resource it points to with the permissions it
specifies, for the period of time that it is valid. Beginning with version 2015-04-05, Azure Storage supports two kinds
of shared access signatures: service SAS and account SAS.
The service SAS delegates access to a resource in just one of the storage services: the Blob, Queue, Table, or File
service.
An account SAS delegates access to resources in one or more of the storage services. You can delegate access to
service-level operations that are not available with a service SAS. You can also delegate access to read, write, and
delete operations on blob containers, tables, queues, and file shares that are not permitted with a service SAS.
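The delegation mechanism behind a SAS can be sketched as an HMAC-signed token appended to a resource URL. This is only an illustration of the idea: the real Azure SAS format uses specific query parameters and a canonicalized string-to-sign, which are not reproduced here.

```python
# Sign (resource, permissions, expiry) with the account key and append the
# resulting token to the URL; anyone holding the URL gets the stated access
# until expiry, without ever seeing the account key itself.
import base64
import hashlib
import hmac
import urllib.parse

def make_sas_like_token(account_key: bytes, resource: str,
                        permissions: str, expiry: str) -> str:
    string_to_sign = "\n".join([resource, permissions, expiry])
    sig = base64.b64encode(
        hmac.new(account_key, string_to_sign.encode(), hashlib.sha256).digest()
    ).decode()
    return urllib.parse.urlencode({"sp": permissions, "se": expiry, "sig": sig})

url = "https://myaccount.blob.core.windows.net/container/blob.txt"
token = make_sas_like_token(b"secret", "/container/blob.txt", "r", "2017-12-31")
delegated_url = f"{url}?{token}"
```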
Finally, you can specify that a container and its blobs, or a specific blob, are available for public access. When you
indicate that a container or blob is public, anyone can read it anonymously; no authentication is required. Public
containers and blobs are useful for exposing resources such as media and documents that are hosted on websites. To
decrease network latency for a global audience, you can cache blob data used by websites with the Azure CDN.
See Using Shared Access Signatures (SAS) for more information on shared access signatures. See Manage anonymous
read access to containers and blobs and Authentication for the Azure Storage Services for more information on secure
access to your storage account.
Replication for durability and high availability
The data in your Microsoft Azure storage account is always replicated to ensure durability and high availability.
Replication copies your data, either within the same data center, or to a second data center, depending on which
replication option you choose. Replication protects your data and preserves your application up-time in the event of
transient hardware failures. If your data is replicated to a second data center, that also protects your data against a
catastrophic failure in the primary location.
Replication ensures that your storage account meets the Service-Level Agreement (SLA) for Storage even in the face of
failures. See the SLA for information about Azure Storage guarantees for durability and availability.
When you create a storage account, you can select one of the following replication options:
Locally redundant storage (LRS). Locally redundant storage maintains three copies of your data. LRS is
replicated three times within a single data center in a single region. LRS protects your data from normal
hardware failures, but not from the failure of a single data center.
LRS is offered at a discount. For maximum durability, we recommend that you use geo-redundant storage, described
below.
Zone-redundant storage (ZRS). Zone-redundant storage maintains three copies of your data. ZRS is replicated
three times across two to three facilities, either within a single region or across two regions, providing higher
durability than LRS. ZRS ensures that your data is durable within a single region.
ZRS provides a higher level of durability than LRS; however, for maximum durability, we recommend that you use geo-
redundant storage, described below.
Note
ZRS is currently available only for block blobs, and is only supported for versions 2014-02-14 and later.
Once you have created your storage account and selected ZRS, you cannot convert it to any other type of
replication, or vice versa.
Geo-redundant storage (GRS). GRS maintains six copies of your data. With GRS, your data is replicated three
times within the primary region, and is also replicated three times in a secondary region hundreds of miles
away from the primary region, providing the highest level of durability. In the event of a failure at the primary
region, Azure Storage will failover to the secondary region. GRS ensures that your data is durable in two
separate regions.
For information about primary and secondary pairings by region, see Azure Regions.
Read-access geo-redundant storage (RA-GRS). Read-access geo-redundant storage replicates your data to a
secondary geographic location, and also provides read access to your data in the secondary location. Read-
access geo-redundant storage allows you to access your data from either the primary or the secondary
location, in the event that one location becomes unavailable. Read-access geo-redundant storage is the
default option for your storage account when you create it.
Important: You can change how your data is replicated after your storage account has been created, unless you
specified ZRS when you created the account. However, note that you may incur an additional one-time data transfer
cost if you switch from LRS to GRS or RA-GRS.
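The replication options above can be summarized as a small lookup, pairing each option with its copy count and scope (an illustrative summary of the text, not an Azure API; ZRS's facility placement is simplified here):

```python
# Copy counts and scope for each Azure Storage replication option.
REPLICATION = {
    "LRS":    {"copies": 3, "regions": 1, "secondary_read": False},
    "ZRS":    {"copies": 3, "regions": 1, "secondary_read": False},  # spread across facilities
    "GRS":    {"copies": 6, "regions": 2, "secondary_read": False},
    "RA-GRS": {"copies": 6, "regions": 2, "secondary_read": True},
}

def survives_datacenter_loss(option: str) -> bool:
    # LRS keeps all three copies in one data center; the other options do not.
    return option != "LRS"
```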
See Azure Storage replication for additional details about storage replication options.
For pricing information for storage account replication, see Azure Storage Pricing. See Azure Regions for more
information about what services are available in each region.
For architectural details about durability with Azure Storage, see SOSP Paper - Azure Storage: A Highly Available Cloud
Storage Service with Strong Consistency.
Transferring data to and from Azure Storage
You can use the AzCopy command-line utility to copy blob, file, and table data within your storage account or across
storage accounts. See Transfer data with the AzCopy Command-Line Utility for more information.
AzCopy is built on top of the Azure Data Movement Library, which is currently available in preview.
The Azure Import/Export service provides a way to import blob data into or export blob data from your storage
account via a hard drive disk mailed to the Azure data center. For more information about the Import/Export service,
see Use the Microsoft Azure Import/Export Service to Transfer Data to Blob Storage.
Azure Cosmos DB
Azure Cosmos DB is Microsoft's globally distributed, multi-model database. With the click of a button, Azure Cosmos
DB enables you to elastically and independently scale throughput and storage across any number of Azure's
geographic regions. It offers throughput, latency, availability, and consistency guarantees with comprehensive service
level agreements (SLAs), something no other database service can offer.
Azure Cosmos DB contains a write optimized, resource governed, schema-agnostic database engine that natively
supports multiple data models: key-value, documents, graphs, and columnar. It also supports many APIs for accessing
data, including MongoDB, DocumentDB SQL, Gremlin (preview), and Azure Tables (preview), in an extensible manner.
Azure Cosmos DB started in late 2010 to address developer pain points faced by large-scale applications
inside Microsoft. Since building globally distributed applications is not a problem unique to Microsoft, we made
the service available externally to all Azure Developers in the form of Azure DocumentDB. Azure Cosmos DB is the
next big leap in the evolution of DocumentDB and we are now making it available for you to use. As a part of this
release of Azure Cosmos DB, DocumentDB customers (with their data) are automatically Azure Cosmos DB customers.
The transition is seamless and they now have access to a broader range of new capabilities offered by Azure Cosmos
DB.
Capability comparison
Azure Cosmos DB provides the best capabilities of relational and non-relational databases.
Capability: Data model + API
Relational DBs: relational + SQL API
Non-relational (NoSQL) DBs: multi-model + OSS API
Azure Cosmos DB: multi-model + SQL + OSS API (more coming soon)
Key capabilities
As a globally distributed database service, Azure Cosmos DB provides the following capabilities to help you build
scalable, globally distributed, highly responsive applications:
Turnkey global distribution
o Your application is instantly available to your users, everywhere. Now your data can be too.
o Don't worry about hardware, adding nodes, VMs or cores. Just point and click, and your data is there.
Multiple data models and popular APIs for accessing and querying data
o Support for multiple data models including key-value, document, graph, and columnar.
o Extensible APIs for Node.js, Java, .NET, .NET Core, Python, and MongoDB.
o SQL and Gremlin for queries.
Elastically scale throughput and storage on demand, worldwide
o Easily scale throughput at second and minute granularities, and change it anytime you want.
o Scale storage transparently and automatically to cover your size requirements now and forever.
Build highly responsive and mission-critical applications
o Get access to your data with single digit millisecond latencies at the 99th percentile, anywhere in the
world.
Ensure "always on" availability
o 99.99% availability within a single region.
o Deploy to any number of Azure regions for higher availability.
o Simulate a failure of one or more regions with zero-data loss guarantees.
Write globally distributed applications, the right way
o Five consistency models offer everything from strong SQL-like consistency to NoSQL-like eventual
consistency, and everything in between.
Money back guarantees
o Your data gets there fast, or your money back.
o Service level agreements for availability, latency, throughput, and consistency.
No database schema/index management
o Stop worrying about keeping your database schema and indexes in sync with your application's
schema. We're schema-free.
Low cost of ownership
o Five to ten times more cost effective than a non-managed solution.
o Three times cheaper than DynamoDB.
Global distribution
Azure Cosmos DB containers are distributed along two dimensions:
1. Within a given region, all resources are horizontally partitioned using resource partitions (local distribution).
2. Each resource partition is also replicated across geographical regions (global distribution).
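The two distribution dimensions above multiply: every resource partition is replicated into every region associated with the account. A minimal sketch (partition and region names are invented for illustration):

```python
# Each (partition, region) pair is one partition replica the service manages
# transparently: local distribution x global distribution.
def replica_map(partitions, regions):
    return {(p, r) for p in partitions for r in regions}

replicas = replica_map(["p0", "p1", "p2"], ["westus", "northeurope"])
# 3 partitions x 2 regions = 6 partition replicas
```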
When your storage and throughput need to be scaled, Cosmos DB transparently performs partition management
operations across all the regions. Independent of the scale, distribution, or failures, Cosmos DB continues to provide a
single system image of the globally distributed resources.
Global distribution of resources in Cosmos DB is turn-key. At any time with a few button clicks (or programmatically
with a single API call), you can associate any number of geographical regions with your database account.
Regardless of the amount of data or the number of regions, Cosmos DB guarantees that each newly associated region
will start processing client requests within an hour, at the 99th percentile. This is done by parallelizing the seeding and
copying of data from all the source resource partitions to the newly associated region. Customers can also remove an
existing region or take a region that was previously associated with their database account offline.
Multi-model, multi-API support
Azure Cosmos DB natively supports multiple data models including documents, key-value, graph, and column-family.
The core content model of Cosmos DB's database engine is based on atom-record-sequence (ARS). Atoms consist of a
small set of primitive types like string, bool, and number. Records are structs composed of these types. Sequences are
arrays consisting of atoms, records, or sequences.
The database engine can efficiently translate and project different data models onto the ARS-based data model. The
core data model of Cosmos DB is natively accessible from dynamically typed programming languages and can be
exposed as-is as JSON.
The service also supports popular database APIs for data access and querying. Cosmos DB's database engine currently
supports DocumentDB SQL, MongoDB, Azure Tables (preview), and Gremlin (preview). You can continue to build
applications using popular OSS APIs and get all the benefits of a battle-tested and fully managed, globally distributed
database service.
Horizontal scaling of storage and throughput
All the data within a Cosmos DB container (for example, a document collection, table, or graph) is horizontally
partitioned and transparently managed by resource partitions. A resource partition is a consistent and highly available
container of data partitioned by a customer specified partition-key. It provides a single system image for a set of
resources it manages and is a fundamental unit of scalability and distribution. Cosmos DB is designed to let you
elastically scale throughput based on the application traffic patterns across different geographical regions to support
fluctuating workloads varying both by geography and time. The service manages the partitions transparently without
compromising the availability, consistency, latency, or throughput of a Cosmos DB container.
You can elastically scale throughput of an Azure Cosmos DB container by programmatically provisioning throughput
using request units per second (RU/s). Internally, the service transparently manages resource partitions to deliver the
throughput on a given container. Cosmos DB ensures that the throughput is available for use across all the regions
associated with the container. The new throughput is effective within five seconds of the change in the configured
throughput value.
You can provision throughput on a Cosmos DB container at both per-second (RU/s) and per-minute (RU/m) granularities.
The provisioned throughput at per-minute granularity is used to manage unexpected spikes in the workload occurring
at a per-second granularity.
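A back-of-the-envelope sketch of provisioning: given an expected request mix, estimate the steady RU/s to provision. The per-operation RU costs below are illustrative assumptions, not actual Cosmos DB charges, which depend on item size, indexing, and consistency level.

```python
# Estimate steady-state RU/s from an assumed request mix; RU/m provisioning
# would then absorb per-second spikes above this steady rate.
def required_ru_per_second(reads_per_s: float, writes_per_s: float,
                           ru_per_read: float = 1.0,
                           ru_per_write: float = 5.0) -> float:
    return reads_per_s * ru_per_read + writes_per_s * ru_per_write

steady = required_ru_per_second(reads_per_s=800, writes_per_s=40)
# 800 * 1 + 40 * 5 = 1000 RU/s steady-state
```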
Low latency guarantees at the 99th percentile
As part of its SLAs, Cosmos DB guarantees end-to-end low latency at the 99th percentile to its customers. For a typical
1-KB item, Cosmos DB guarantees end-to-end latency of reads under 10 ms and indexed writes under 15 ms at the
99th percentile, within the same Azure region. The median latencies are significantly lower (under 5 ms). With an
upper bound of request processing on every database transaction, Cosmos DB allows clients to clearly distinguish
between transactions with high latency vs. a database being unavailable.
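The SLA above is stated against the 99th percentile, which can be computed from observed request latencies with the nearest-rank method (a sketch; the sample values are invented):

```python
# Nearest-rank percentile: sort the samples and pick the element at
# rank ceil(p/100 * n).
import math

def percentile(samples, p):
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

latencies_ms = [4, 5, 3, 6, 4, 5, 7, 4, 9, 5] * 10  # 100 observed requests
p99 = percentile(latencies_ms, 99)  # the value to compare against the SLA bound
```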
Transparent multi-homing and 99.99% high availability
You can dynamically associate "priorities" to the regions associated with your Azure Cosmos DB database account.
Priorities are used to direct requests to specific regions in the event of regional failures. In the unlikely event of a
regional disaster, Cosmos DB automatically fails over in the order of priority.
To test the end-to-end availability of the application, you can manually trigger failover (rate limited to two operations
within an hour). Cosmos DB guarantees zero data loss during manual regional failovers. In case a regional disaster
occurs, Cosmos DB guarantees an upper-bound on data loss during the system-initiated automatic failover. You do not
have to redeploy your application after a regional failover, and availability SLAs are maintained by Azure Cosmos DB.
For this scenario, Cosmos DB allows you to interact with resources using either logical (region-agnostic) or physical
(region-specific) endpoints. The former ensures that the application can transparently be multi-homed in case of
failover. The latter provides fine-grained control to the application to redirect reads and writes to specific regions.
Cosmos DB guarantees 99.99% availability SLA for every database account. The availability guarantees are agnostic of
the scale (provisioned throughput and storage), number of regions, or geographical distance between regions
associated with a given database.
Multiple, well-defined consistency models
Commercial distributed databases fall into two categories: databases that do not offer well-defined, provable
consistency choices at all, and databases that offer only two extreme choices (strong vs. eventual
consistency). The former burdens application developers with the minutiae of their replication protocols and expects them
to make difficult tradeoffs between consistency, availability, latency, and throughput. The latter pressures them to
choose one of the two extremes. Despite the abundance of research and proposals for more than 50 consistency
models, the distributed database community has not been able to commercialize consistency levels beyond strong
and eventual consistency.
Cosmos DB allows you to choose between five well-defined consistency models along the consistency spectrum:
strong, bounded staleness, session, consistent prefix, and eventual.
The following table illustrates the specific guarantees each consistency level provides.
Consistency levels and guarantees
Strong: Linearizability
Bounded staleness: Consistent prefix; reads lag behind writes by at most k prefixes or a time interval t
Session: Consistent prefix; monotonic reads, monotonic writes, read-your-writes, write-follows-reads
Consistent prefix: Updates returned are some prefix of all the updates, with no gaps
Eventual: Out-of-order reads
You can configure the default consistency level on your Cosmos DB account (and later override the consistency on a
specific read request). Internally, the default consistency level applies to data within the partition sets, which may
span regions.
Guaranteed service level agreements
Cosmos DB is the first managed database service to offer 99.99% SLA guarantees for availability, throughput, low
latency, and consistency.
Availability: 99.99% uptime availability SLA for both data and control plane operations.
Throughput: 99.99% of requests complete successfully.
Latency: 99.99% of read requests served in under 10 ms at the 99th percentile.
Consistency: 100% of read requests will meet the consistency guarantee for the consistency level requested
by you.
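As a quick sanity check on what a 99.99% availability SLA means in practice, the permitted downtime can be computed directly. This sketch assumes a 30-day (43,200-minute) month for illustration:

```python
# Sketch: monthly downtime budget implied by an uptime SLA percentage.

def downtime_budget_minutes(sla_percent, minutes_in_month=43_200):
    """Minutes of downtime per month permitted by the given SLA."""
    return minutes_in_month * (1 - sla_percent / 100)

print(round(downtime_budget_minutes(99.99), 2))  # 4.32 minutes per month
```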
Schema-free
Both relational and NoSQL databases force you to deal with schema and index management, versioning, and migration;
all of this is extremely challenging in a globally distributed setup. But don't worry -- Cosmos DB makes this problem go
away! With Cosmos DB, you do not have to manage schemas and indexes, deal with schema versioning, or worry
about application downtime while migrating schemas. Cosmos DB's database engine is fully schema-agnostic: it
automatically indexes all the data it ingests without requiring any schema or indexes and serves blazing-fast queries.
Low cost of ownership
When all total cost of ownership (TCO) considerations are taken into account, managed cloud services like Azure Cosmos
DB can be five to ten times more cost effective than their OSS counterparts running on-premises or in virtual machines.
And Azure Cosmos DB is up to two to three times cheaper than DynamoDB for high-volume workloads. Learn more in
the TCO whitepaper.
Max number of blob containers, blobs, file shares, tables, queues, entities, or messages per storage account: Only limit is the 500 TB storage account capacity
Max number of files in a file share: Only limit is the 5 TB total capacity of the file share
Maximum request rate per storage account:
  Blobs: 20,000 requests per second for blobs of any valid size (capped only by the account's ingress/egress limits)
  Files: 1,000 IOPS (8 KB in size) per file share
  Queues: 20,000 messages per second (assuming 1 KB message size)
  Tables: 20,000 transactions per second (assuming 1 KB entity size)
Target throughput for a single blob: Up to 60 MB per second, or up to 500 requests per second
Target throughput for a single table partition (1 KB entities): Up to 2,000 entities per second
Max ingress² per storage account (US regions): 10 Gbps if GRS/ZRS³ enabled, 20 Gbps for LRS
Max egress² per storage account (US regions): 20 Gbps if RA-GRS/GRS/ZRS³ enabled, 30 Gbps for LRS
Max ingress² per storage account (non-US regions): 5 Gbps if GRS/ZRS³ enabled, 10 Gbps for LRS
Max egress² per storage account (non-US regions): 10 Gbps if RA-GRS/GRS/ZRS³ enabled, 15 Gbps for LRS
¹ This includes both Standard and Premium storage accounts. If you require more than 200 storage accounts, make a
request through Azure Support. The Azure Storage team will review your business case and may approve up to 250
storage accounts.
² Ingress refers to all data (requests) being sent to a storage account. Egress refers to all data (responses) being
received from a storage account.
³ Azure Storage replication options include:
  RA-GRS: Read-access geo-redundant storage. If RA-GRS is enabled, egress targets for the secondary location
  are identical to those for the primary location.
  GRS: Geo-redundant storage.
  ZRS: Zone-redundant storage. Available only for block blobs.
  LRS: Locally redundant storage.
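The per-account bandwidth targets above can be captured as a small lookup table, which makes the US/non-US and geo-redundant/LRS distinctions explicit. This is a sketch; the figures are the ones quoted in the table above:

```python
# Sketch: per-storage-account bandwidth targets, keyed by
# (direction, US region?, geo-redundant replication?). Values in Gbps.

BANDWIDTH_GBPS = {
    ("ingress", True,  True):  10, ("ingress", True,  False): 20,
    ("egress",  True,  True):  20, ("egress",  True,  False): 30,
    ("ingress", False, True):   5, ("ingress", False, False): 10,
    ("egress",  False, True):  10, ("egress",  False, False): 15,
}

def max_gbps(direction, us_region, geo_redundant):
    return BANDWIDTH_GBPS[(direction, us_region, geo_redundant)]

print(max_gbps("egress", True, False))  # 30 (LRS account in a US region)
```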
Scalability targets for virtual machine disks
An Azure virtual machine supports attaching a number of data disks. For optimal performance, you will want to limit
the number of highly utilized disks attached to the virtual machine to avoid possible throttling. If not all disks are
highly utilized at the same time, the storage account can support a larger number of disks.
For Azure Managed Disks: the Managed Disks count limit is regional and also depends on the storage type. The
default, and also the maximum, limit is 10,000 per subscription, per region, and per storage type. For example,
you can create up to 10,000 standard managed disks and also 10,000 premium managed disks in a
subscription and in a region.
Managed Snapshots and Images are counted against the Managed Disks limit.
For standard storage accounts: A standard storage account has a maximum total request rate of 20,000 IOPS.
The total IOPS across all of your virtual machine disks in a standard storage account should not exceed this
limit.
You can roughly calculate the number of highly utilized disks supported by a single standard storage account based on
the request rate limit. For example, for a Basic Tier VM, the maximum number of highly utilized disks is about 66
(20,000/300 IOPS per disk), and for a Standard Tier VM, it is about 40 (20,000/500 IOPS per disk), as shown in the
table below.
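The disk-count arithmetic above is just integer division of the account's request-rate limit by the per-disk IOPS cap:

```python
# Sketch: how many highly utilized disks a standard storage account
# (20,000 IOPS total) can support, given the per-disk IOPS cap.

ACCOUNT_IOPS_LIMIT = 20_000

def max_highly_utilized_disks(iops_per_disk):
    return ACCOUNT_IOPS_LIMIT // iops_per_disk

print(max_highly_utilized_disks(300))  # 66 (Basic Tier VM disk, 300 IOPS)
print(max_highly_utilized_disks(500))  # 40 (Standard Tier VM disk, 500 IOPS)
```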
For premium storage accounts: A premium storage account has a maximum total throughput rate of 50 Gbps.
The total throughput across all of your VM disks should not exceed this limit.
See Windows VM sizes or Linux VM sizes for additional details.
Managed virtual machine disks
Standard managed virtual machine disks
Standard
Disk Type:            S4, S6, S10, S20, S30, S40, S50
Throughput per disk:  60 MB/sec for every standard disk size
Premium
Disk Type                P10        P20        P30              P40              P50
Disk size                128 GiB    512 GiB    1024 GiB (1 TB)  2048 GiB (2 TB)  4095 GiB (4 TB)
Max throughput per disk  100 MB/s   150 MB/s   200 MB/s         250 MB/s         250 MB/s
In the New Windows Azure Cloud Service dialog, double-click Worker Role. Leave the default name ("WorkerRole1").
This step adds a worker role to the solution. Click OK.
In general, an Azure application can contain multiple roles, although this tutorial uses a single role.
Replace all of the boilerplate code in this file with the following:
using Owin;
using System.Web.Http;

namespace WorkerRole1
{
    class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            HttpConfiguration config = new HttpConfiguration();
            config.Routes.MapHttpRoute(
                "Default",
                "{controller}/{id}",
                new { id = RouteParameter.Optional });
            app.UseWebApi(config);
        }
    }
}
using System;
using System.Net.Http;
using System.Web.Http;

namespace WorkerRole1
{
    public class TestController : ApiController
    {
        public HttpResponseMessage Get()
        {
            return new HttpResponseMessage()
            {
                Content = new StringContent("Hello from OWIN!")
            };
        }
        public HttpResponseMessage Get(int id)
        {
            return new HttpResponseMessage()
            {
                Content = new StringContent(String.Format("Hello from OWIN (id = {0})", id))
            };
        }
    }
}
For simplicity, this controller just defines two GET methods that return plain text.
Start the OWIN Host
Open the WorkerRole.cs file. This class defines the code that runs when the worker role is started and stopped.
Add the following using statement:
using Microsoft.Owin.Hosting;
In the OnStart method, add the following code to start the host:
public override bool OnStart()
{
    ServicePointManager.DefaultConnectionLimit = 12;

    // New code:
    var endpoint = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["Endpoint1"];
    string baseUri = String.Format("{0}://{1}",
        endpoint.Protocol, endpoint.IPEndpoint);

    _app = WebApp.Start<Startup>(new StartOptions(url: baseUri));
    return base.OnStart();
}
The WebApp.Start method starts the OWIN host. The name of the Startup class is a type parameter to the method. By
convention, the host will call the Configuration method of this class.
Override the OnStop to dispose of the _app instance:
public override void OnStop()
{
    if (_app != null)
    {
        _app.Dispose();
    }
    base.OnStop();
}
namespace WorkerRole1
{
    public class WorkerRole : RoleEntryPoint
    {
        private IDisposable _app = null;

        public override void Run()
        {
            while (true)
            {
                Thread.Sleep(10000);
                Trace.TraceInformation("Working", "Information");
            }
        }
    }
}
Find the IP address under Service Deployments, deployment [id], Service Details. Open a web browser and navigate to
http://address/test/1, where address is the IP address assigned by the compute emulator; for
example, http://127.0.0.1:80/test/1. You should see the response from the Web API controller:
Deploy to Azure
For this step, you must have an Azure account. If you don't already have one, you can create a free trial account in just
a couple of minutes. For details, see Microsoft Azure Free Trial.
In Solution Explorer, right-click the AzureApp project. Select Publish.
If you are not signed in to your Azure account, click Sign In.
After you are signed in, choose a subscription and click Next.
Enter a name for the cloud service and choose a region. Click Create.
Click Publish.
The Azure Activity Log window shows the progress of the deployment. When the app is deployed, browse
to http://appname.cloudapp.net/test/1.
3. Under Name, provide a name for the WebJob. The name must start with a letter or a number and cannot
contain any special characters other than "-" and "_".
4. In the How to Run box, choose Run on Demand.
5. In the File Upload box, click the folder icon and browse to the zip file that contains your script. The zip file
should contain your executable (.exe .cmd .bat .sh .php .py .js) as well as any supporting files needed to run
the program or script.
6. Check Create to upload the script to your web app.
The name you specified for the WebJob appears in the list on the WebJobs blade.
7. To run the WebJob, right-click its name in the list and click Run.
Note: when deploying a WebJob from Visual Studio, make sure to mark your settings.job file properties as 'Copy if
newer'.
Create a scheduled WebJob using the Azure Scheduler
The following alternate technique makes use of the Azure Scheduler. In this case, your WebJob does not have any
direct knowledge of the schedule. Instead, the Azure Scheduler gets configured to trigger your WebJob on a schedule.
The Azure Portal doesn't yet have the ability to create a scheduled WebJob, but until that feature is added you can do
it by using the classic portal.
1. In the classic portal go to the WebJob page and click Add.
2. In the How to Run box, choose Run on a schedule.
3. Choose the Scheduler Region for your job, and then click the arrow on the bottom right of the dialog to
proceed to the next screen.
4. In the Create Job dialog, choose the type of Recurrence you want: One-time job or Recurring job.
6. If you want to start at a specific time, choose your starting time values under Starting On.
2. In the Summary part of the Resource group blade, click a resource that you want to scale. The following
screenshot shows a SQL Database resource and an Azure Storage resource.
3. For a SQL Database resource, click Settings > Pricing tier to scale the pricing tier.
You can also turn on geo-replication for your SQL Database instance.
For an Azure Storage resource, click Settings > Configuration to scale up your storage options.
If you previously had autoscale on, you'll see a view of the exact rules that you had.
To scale based on another metric, click the Add Rule row. You can also click one of the existing rows to change
from the metric you previously had to the metric you want to scale by.
Now you need to select which metric you want to scale by. When choosing a metric, there are a couple of things to
consider:
The resource the metric comes from. Typically, this will be the same as the resource you are scaling. However, if
you want to scale by the depth of a Storage queue, the resource is the queue that you want to scale by.
With this additional rule, if your load exceeds 85% before a scale action, you will get two additional instances
instead of one.
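Overlapping scale-out rules like these can be sketched as follows. The thresholds are illustrative, and the sketch assumes that when several scale-out rules fire at once, the larger action wins, which is consistent with getting "two additional instances instead of one" above:

```python
# Sketch: evaluate scale-out rules against the current CPU load.
# Each rule is (threshold_percent, instances_to_add); when several rules
# fire, the largest scale-out action is taken (assumption noted above).

def scale_out_delta(cpu_percent, rules):
    fired = [add for threshold, add in rules if cpu_percent > threshold]
    return max(fired, default=0)

rules = [(70, 1), (85, 2)]

print(scale_out_delta(75, rules))  # 1
print(scale_out_delta(90, rules))  # 2 (the larger action wins)
```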
Scale based on a schedule
By default, when you create a scale rule it will always apply. You can see that when you click on the profile header:
However, you may want to have more aggressive scaling during the day, or during the week, than on the weekend. You
could even shut down your service entirely outside working hours.
To do this, on the profile you have, select recurrence instead of always, and choose the times that you want the
profile to apply.
For example, to have a profile that applies during the week, in the Days dropdown uncheck Saturday and Sunday.
To have a profile that applies during the daytime, set the Start time to the time of day that you want to start at.
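The recurrence logic above can be sketched as picking the first profile whose days and hours match, falling back to the default profile. The profile names, days, and hours here are hypothetical:

```python
# Sketch: select the autoscale profile that applies at a given day/hour.
# Profiles are checked in order; the first match wins.

WEEKDAYS = {"Mon", "Tue", "Wed", "Thu", "Fri"}

profiles = [
    # (name, days it applies, start hour, end hour)
    ("Work hours", WEEKDAYS, 8, 18),
    ("Off work",   WEEKDAYS | {"Sat", "Sun"}, 0, 24),
]

def select_profile(day, hour):
    for name, days, start, end in profiles:
        if day in days and start <= hour < end:
            return name
    return "Default"

print(select_profile("Tue", 10))  # Work hours
print(select_profile("Sat", 10))  # Off work
```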
Click OK.
Next, you will need to add the profile that you want to apply at other times. Click the Add Profile row.
Name your new, second profile; for example, you could call it "Off work".
Then select recurrence again, and choose the instance count range you want during this time.
As with the Default profile, choose the Days you want this profile to apply to, and the Start time during the day.
Note
Autoscale will use the Daylight savings rules for whichever Time zone you select. However, during Daylight savings
time the UTC offset will show the base Time zone offset, not the Daylight savings UTC offset.
Click OK.
Now, you will need to add whatever rules you want to apply during your second profile. Click Add Rule, and then
you could construct the same rule you have during the Default profile.
Be sure to create both a rule for scale out and a rule for scale in; otherwise, during the profile the instance count will
only grow (or only shrink).
Finally, click Save.
You can use Azure Traffic Manager to control how requests from web clients are distributed to web apps in Azure
App Service. When web app endpoints are added to an Azure Traffic Manager profile, Azure Traffic Manager keeps
track of the status of your web apps (running, stopped, or deleted) so that it can decide which of those endpoints
should receive traffic.
Load Balancing Methods
Azure Traffic Manager uses three different load balancing methods. These are described in the following list as
they pertain to Azure web apps.
Failover: If you have web app clones in different regions, you can use this method to configure one web app to
service all web client traffic, and configure another web app in a different region to service that traffic in case the
first web app becomes unavailable.
Round Robin: If you have web app clones in different regions, you can use this method to distribute traffic equally
across the web apps in different regions.
Performance: The Performance method distributes traffic based on the shortest round trip time to clients. The
Performance method can be used for web apps within the same region or in different regions.
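The three load-balancing methods can be sketched as endpoint-selection functions. This is a simplification: real Traffic Manager resolves endpoints at the DNS level, and the endpoint names and round-trip times below are illustrative:

```python
# Sketch: Failover, Round Robin, and Performance selection over healthy
# web app endpoints.

import itertools

def failover(endpoints, healthy):
    """First healthy endpoint in priority order."""
    return next((e for e in endpoints if e in healthy), None)

def round_robin(endpoints, healthy):
    """Cycle through healthy endpoints, distributing traffic equally."""
    return itertools.cycle([e for e in endpoints if e in healthy])

def performance(endpoints, healthy, rtt_ms):
    """Healthy endpoint with the shortest round-trip time to the client."""
    return min((e for e in endpoints if e in healthy), key=rtt_ms.get)

apps = ["app-westus", "app-eastus"]
print(failover(apps, {"app-eastus"}))  # app-eastus (first app unavailable)
print(performance(apps, set(apps), {"app-westus": 40, "app-eastus": 90}))  # app-westus
```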
Web Apps and Traffic Manager Profiles
To configure the control of web app traffic, you create a profile in Azure Traffic Manager that uses one of the
three load balancing methods described previously, and then add the endpoints (in this case, web apps) for which
you want to control traffic to the profile. Your web app status (running, stopped or deleted) is regularly
communicated to the profile so that Azure Traffic Manager can direct traffic accordingly.
When using Azure Traffic Manager with Azure, keep in mind the following points:
For web-app-only deployments within the same region, Web Apps already provides failover and round-robin
functionality without regard to web app mode.
For deployments in the same region that use Web Apps in conjunction with another Azure cloud service, you can
combine both types of endpoints to enable hybrid scenarios.
You can only specify one web app endpoint per region in a profile. When you select a web app as an endpoint for
one region, the remaining web apps in that region become unavailable for selection for that profile.
The web app endpoints that you specify in an Azure Traffic Manager profile will appear under the Domain
Names section on the Configure page for the web app in the profile, but will not be configurable there.
After you add a web app to a profile, the Site URL on the Dashboard of the web app's portal page will display the
custom domain URL of the web app if you have set one up. Otherwise, it will display the Traffic Manager profile
URL (for example, contoso.trafficmgr.com). Both the direct domain name of the web app and the Traffic Manager
URL will be visible on the web app's Configure page under the Domain Names section.
Your custom domain names will work as expected, but in addition to adding them to your web apps, you must
also configure your DNS map to point to the Traffic Manager URL. For information on how to set up a custom
domain for an Azure web app, see Configuring a custom domain name for an Azure web site.
You can only add web apps that are in standard mode to an Azure Traffic Manager profile.
For example, if your plan is configured to use two "small" instances in the standard service tier, all apps that are
associated with that plan run on both instances. Apps also have access to the standard service tier features. Plan
instances on which apps are running are fully managed and highly available.
Important
The App Service plan's SKU and scale determine the cost, not the number of apps hosted in it.
This article explores the key characteristics, such as tier and scale, of an App Service plan and how they come into play
while managing your apps.
Apps and App Service plans
An app in App Service can be associated with only one App Service plan at any given time.
Both apps and plans are contained in a resource group. A resource group serves as the lifecycle boundary for every
resource that's within it. You can use resource groups to manage all the pieces of an application together.
Because a single resource group can have multiple App Service plans, you can allocate different apps to different
physical resources.
For example, you can separate resources among dev, test, and production environments. Having separate
environments for production and dev/test lets you isolate resources. In this way, load testing against a new version of
your apps does not compete for the same resources as your production apps, which are serving real customers.
When you have multiple plans in a single resource group, you can also define an application that spans geographical
regions.
For example, a highly available app running in two regions includes at least two plans, one for each region, and one
app associated with each plan. In such a situation, all the copies of the app are then contained in a single resource
group. Having a resource group with multiple plans and multiple apps makes it easy to manage, control, and view the
health of the application.
Create an App Service plan or use existing one
When you create an app, you should consider creating a resource group. On the other hand, if this app is a
component for a larger application, create it within the resource group that's allocated for that larger application.
Whether the app is an altogether new application or part of a larger one, you can choose to use an existing plan to
host it or create a new one. This decision is more a question of capacity and expected load.
We recommend isolating your app into a new App Service plan when:
The app is resource-intensive.
The app has different scaling factors from the other apps hosted in an existing plan.
The app needs resources in a different geographical region.
This way you can allocate a new set of resources for your app and gain greater control of your apps.
Create an App Service plan
Tip
If you have an App Service Environment, you can review the documentation specific to App Service Environments
here: Create an App Service plan in an App Service Environment
You can create an empty App Service plan from the App Service plan browse experience or as part of app creation.
In the Azure portal, click New > Web + mobile, and then select Web App or another kind of App Service app.
You can then select or create the App Service plan for the new app.
To create an App Service plan, click [+] Create New, type the App Service plan name, and then select an
appropriate Location. Click Pricing tier, and then select an appropriate pricing tier for the service. Select View all to
view more pricing options, such as Free and Shared. After you have selected the pricing tier, click the Select button.
Move an app to a different App Service plan
You can move an app to a different App Service plan in the Azure portal. You can move apps between plans as long as
the plans are in the same resource group and geographical region.
To move an app to another plan:
Navigate to the app that you want to move.
In the Menu, look for the App Service Plan section.
Each plan has its own pricing tier. For example, moving a site from the Free tier to the Standard tier enables all apps
assigned to it to use the features and resources of the Standard tier.
Clone an app to a different App Service plan
If you want to move the app to a different region, one alternative is app cloning. Cloning makes a copy of your app in a
new or existing App Service plan in any region.
You can find Clone App in the Development Tools section of the menu.
Important
Cloning has some limitations that you can read about at Azure App Service App cloning using Azure portal.
Scale an App Service plan
There are three ways to scale a plan:
Change the plan's pricing tier. A plan in the Basic tier can be converted to Standard, and all apps assigned to it
can then use the features of the Standard tier.
Change the plan's instance size. As an example, a plan in the Basic tier that uses small instances can be
changed to use large instances. All apps that are associated with that plan can now use the additional memory
and CPU resources that the larger instance size offers.
Change the plan's instance count. For example, a Standard plan that's scaled out to three instances can be
scaled to 10 instances. A Premium plan can be scaled out to 20 instances (subject to availability). All apps that
are associated with that plan can now use the additional memory and CPU resources that the larger instance
count offers.
You can change the pricing tier and instance size by clicking the Scale Up item under settings for either the app or the
App Service plan. Changes apply to the App Service plan and affect all apps that it hosts.
Linked resource management is not supported for non-production slots. In the Azure Portal only, you can
avoid this potential impact on a production slot by temporarily moving the non-production slot to a different
App Service plan mode. Note that the non-production slot must once again share the same mode with the
production slot before you can swap the two slots.
Add a deployment slot
The app must be running in the Standard or Premium mode in order for you to enable multiple deployment slots.
1. In the Azure Portal, open your app's resource blade.
2. Choose the Deployment slots option, then click Add Slot.
Note: If the app is not already in the Standard or Premium mode, you will receive a message indicating the supported
modes for enabling staged publishing. At this point, you have the option to select Upgrade and navigate to
the Scale tab of your app before continuing.
3. In the Add a slot blade, give the slot a name, and select whether to clone app configuration from another
existing deployment slot. Click the check mark to continue.
The first time you add a slot, you will only have two choices: clone configuration from the default slot in production or
not at all. After you have created several slots, you will be able to clone configuration from a slot other than the one in
production:
4. In your app's resource blade, click Deployment slots, then click a deployment slot to open that slot's resource
blade, with a set of metrics and configuration just like any other app. The name of the slot is shown at the top
of the blade to remind you that you are viewing the deployment slot.
5. Click the app URL in the slot's blade. Notice the deployment slot has its own hostname and is also a live app.
To limit public access to the deployment slot, see App Service Web App block web access to non-production
deployment slots.
There is no content after deployment slot creation. You can deploy to the slot from a different repository branch, or
an altogether different repository. You can also change the slot's configuration. Use the publish profile or deployment
credentials associated with the deployment slot for content updates. For example, you can publish to this slot with git.
Configuration for deployment slots
When you clone configuration from another deployment slot, the cloned configuration is editable. Furthermore, some
configuration elements will follow the content across a swap (not slot specific) while other configuration elements will
stay in the same slot after a swap (slot specific). The following lists show the configuration that will change when you
swap slots.
Settings that are swapped:
General settings - such as framework version, 32/64-bit, Web sockets
App settings (can be configured to stick to a slot)
Connection strings (can be configured to stick to a slot)
Handler mappings
Monitoring and diagnostic settings
WebJobs content
Settings that are not swapped:
Publishing endpoints
Custom Domain Names
SSL certificates and bindings
Scale settings
WebJobs schedulers
To configure an app setting or connection string to stick to a slot (not swapped), access the Application Settings blade
for a specific slot, then select the Slot Setting box for the configuration elements that should stick to the slot. Note that
marking a configuration element as slot specific has the effect of establishing that element as not swappable across all
the deployment slots associated with the app.
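The swapped-versus-sticky behavior can be sketched as a merge of two slots' settings in which slot-specific keys stay put. The setting names below are illustrative, and the sketch assumes both slots define the same setting names:

```python
# Sketch: swap app settings between two slots; keys in `sticky` are
# slot-specific and remain with their slot, all others travel with a swap.

def swap_settings(source, target, sticky):
    """Return the post-swap settings of (source, target)."""
    def merge(stays, moves):
        # Sticky keys keep the slot's own value; others take the swapped value.
        return {k: (stays[k] if k in sticky else moves[k]) for k in stays}
    return merge(source, target), merge(target, source)

staging = {"DbConnection": "staging-db", "ApiVersion": "v2"}
prod    = {"DbConnection": "prod-db",    "ApiVersion": "v1"}

new_staging, new_prod = swap_settings(staging, prod, sticky={"DbConnection"})
print(new_prod)  # {'DbConnection': 'prod-db', 'ApiVersion': 'v2'}
```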
2. Make sure that the swap source and swap target are set properly. Usually, the swap target is the production
slot. Click OK to complete the operation. When the operation finishes, the deployment slots have been
swapped.
195 | P a g e
70-534 Architecting Microsoft Azure Solutions
For the Swap with preview swap type, see Swap with preview (multi-phase swap).
Swap with preview (multi-phase swap)
Swap with preview, or multi-phase swap, simplifies validation of slot-specific configuration elements, such as
connection strings. For mission-critical workloads, you want to validate that the app behaves as expected when the
production slot's configuration is applied, and you must perform such validation before the app is swapped into
production. Swap with preview is what you need.
Note
Swap with preview is not supported in web apps on Linux.
When you use the Swap with preview option (see Swap deployment slots), App Service does the following:
Keeps the destination slot unchanged so existing workload on that slot (e.g. production) is not impacted.
Applies the configuration elements of the destination slot to the source slot, including the slot-specific
connection strings and app settings.
Restarts the worker processes on the source slot using these aforementioned configuration elements.
When you complete the swap: Moves the pre-warmed-up source slot into the destination slot. The
destination slot is moved into the source slot as in a manual swap.
When you cancel the swap: Reapplies the configuration elements of the source slot to the source slot.
You can preview exactly how the app will behave with the destination slot's configuration. Once you complete
validation, you complete the swap in a separate step. This step has the added advantage that the source slot is already
warmed up with the desired configuration, and clients will not experience any downtime.
Samples for the Azure PowerShell cmdlets available for multi-phase swap are included in the Azure PowerShell
cmdlets for deployment slots section.
Configure Auto Swap
Auto Swap streamlines DevOps scenarios where you want to continuously deploy your app with zero cold start and
zero downtime for end customers of the app. When a deployment slot is configured for Auto Swap into production,
every time you push your code update to that slot, App Service will automatically swap the app into production after it
has already warmed up in the slot.
Important
When you enable Auto Swap for a slot, make sure the slot configuration is exactly the configuration intended for the
target slot (usually the production slot).
Note: Auto Swap is not supported in web apps on Linux.
Configuring Auto Swap for a slot is easy. Follow the steps below:
1. In Deployment Slots, select a non-production slot, and choose Application Settings in that slot's resource
blade.
2. Select On for Auto Swap, select the desired target slot in Auto Swap Slot, and click Save in the command bar.
Make sure configuration for the slot is exactly the configuration intended for the target slot.
The Notifications tab will flash a green SUCCESS once the operation is complete.
Note: To test Auto Swap for your app, you can first select a non-production target slot in Auto Swap Slot to become
familiar with the feature.
3. Execute a code push to that deployment slot. Auto Swap will happen after a short time and the update will be
reflected at your target slot's URL.
To rollback a production app after swap
If any errors are identified in production after a slot swap, roll the slots back to their pre-swap states by swapping the
same two slots immediately.
Custom warm-up before swap
Some apps may require custom warm-up actions. The applicationInitialization configuration element in web.config
allows you to specify custom initialization actions to be performed before a request is received. The swap operation
will wait for this custom warm-up to complete. Here is a sample web.config fragment.
<applicationInitialization>
  <add initializationPage="/" hostName="[app hostname]" />
  <add initializationPage="/Home/About" hostName="[app hostname]" />
</applicationInitialization>
Initiate a swap with review (multi-phase swap) and apply destination slot configuration to source slot
$ParametersObject = @{targetSlot = "[slot name e.g. production]"}
Invoke-AzureRmResourceAction -ResourceGroupName [resource group name] -ResourceType Microsoft.Web/sites/slots -ResourceName [app name]/[slot name] -Action applySlotConfig -Parameters $ParametersObject -ApiVersion 2015-07-01
Cancel a pending swap (swap with review) and restore source slot configuration
Invoke-AzureRmResourceAction -ResourceGroupName [resource group name] -ResourceType Microsoft.Web/sites/slots -ResourceName [app name]/[slot name] -Action resetSlotConfig -ApiVersion 2015-07-01
1. On the Settings blade of your app in the Azure Portal, click Backups to display the Backups blade. Then
click Restore.
The App backup option shows you all the existing backups of the current app, and you can easily select one.
The Storage option lets you select any backup ZIP file from any existing Azure Storage account and container in your
subscription. If you're trying to restore a backup of another app, use the Storage option.
3. Then, specify the destination for the app restore in Restore destination.
Warning: If you choose Overwrite, all existing data in your current app is erased and overwritten. Before you click OK,
make sure that this is exactly what you want to do. You can select Existing App to restore the app backup to another app
in the same resource group. Before you use this option, you should have already created another app in your resource
group with a database configuration that mirrors the one defined in the app backup. You can also create a new app to
restore your content to.
4. Click OK.
Download or delete a backup from a storage account
1. From the main Browse blade of the Azure portal, select Storage accounts. A list of your existing storage
accounts is displayed.
2. Select the storage account that contains the backup that you want to download or delete. The blade for the
storage account is displayed.
3. In the storage account blade, select the container you want.
Isolated / Dedicated Environments - App Service can be run in a fully isolated and dedicated environment for
securely running Azure App Service apps at high scale. This is ideal for application workloads requiring very
high scale, isolation, or secure network access.
Discover more about App Service Environments.
Getting Started
To get started with Mobile Apps, follow the Get Started tutorial. This will cover the basics of producing a mobile
backend and client of your choice, then integrating authentication, offline sync and push notifications. You can follow
the Get Started tutorial several times - once for each client application.
A local store is associated with the sync context using an initialize method such
as IMobileServicesSyncContext.InitializeAsync(localstore) in the .NET client SDK.
How offline synchronization works
When using sync tables, your client code controls when local changes are synchronized with an Azure Mobile App
backend. Nothing is sent to the backend until there is a call to push local changes. Similarly, the local store is
populated with new data only when there is a call to pull data.
Push: Push is an operation on the sync context and sends all CUD changes since the last push. Note that it is
not possible to send only an individual table's changes, because otherwise operations could be sent out of
order. Push executes a series of REST calls to your Azure Mobile App backend, which in turn modifies your
server database.
Pull: Pull is performed on a per-table basis and can be customized with a query to retrieve only a subset of the
server data. The Azure Mobile client SDKs then insert the resulting data into the local store.
Implicit Pushes: If a pull is executed against a table that has pending local updates, the pull first executes
a push() on the sync context. This push helps minimize conflicts between changes that are already queued
and new data from the server.
Incremental Sync: the first parameter to the pull operation is a query name that is used only on the client. If
you use a non-null query name, the Azure Mobile SDK performs an incremental sync. Each time a pull
operation returns a set of results, the latest updatedAt timestamp from that result set is stored in the SDK
local system tables. Subsequent pull operations retrieve only records after that timestamp.
To use incremental sync, your server must return meaningful updatedAt values and must also support sorting by this
field. However, since the SDK adds its own sort on the updatedAt field, you cannot use a pull query that has its
own orderBy clause.
The query name can be any string you choose, but it must be unique for each logical query in your app. Otherwise,
different pull operations could overwrite the same incremental sync timestamp and your queries can return incorrect
results.
If the query has a parameter, one way to create a unique query name is to incorporate the parameter value. For
instance, if you are filtering on userid, your query name could be as follows (in C#):
await todoTable.PullAsync("todoItems" + userid,
syncTable.Where(u => u.UserId == userid));
If you want to opt out of incremental sync, pass null as the query ID. In this case, all records are retrieved on every call
to PullAsync, which is potentially inefficient.
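The incremental sync bookkeeping described above can be sketched in a few lines. This is a simplified in-memory model, not the actual Azure Mobile SDK: pull records the latest updatedAt per query name so that later pulls with the same name fetch only newer records, and a null query name opts out of the optimization.

```javascript
// Simplified model of incremental sync bookkeeping (illustrative only,
// not the real Azure Mobile client SDK).
function createSyncTable(serverRecords) {
  const deltas = {};       // query name -> latest updatedAt seen (SDK system tables)
  const localStore = [];   // records pulled into the local store
  return {
    localStore,
    pull(queryName) {
      let fetched;
      if (queryName === null) {
        // Opting out of incremental sync: every call retrieves all records.
        fetched = serverRecords.slice();
      } else {
        const since = deltas[queryName];
        fetched = serverRecords.filter(r => since === undefined || r.updatedAt > since);
      }
      if (queryName !== null && fetched.length > 0) {
        // Remember the newest timestamp for the next incremental pull.
        deltas[queryName] = Math.max(...fetched.map(r => r.updatedAt));
      }
      localStore.push(...fetched);
      return fetched;
    }
  };
}

const table = createSyncTable([
  { id: 1, updatedAt: 100 },
  { id: 2, updatedAt: 200 }
]);
const first = table.pull('todoItems');   // fetches both records
const second = table.pull('todoItems');  // incremental: nothing newer, fetches none
```

This also shows why the query name must be unique per logical query: two different queries sharing a name would overwrite each other's stored timestamp.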
Purging: You can clear the contents of the local store using IMobileServiceSyncTable.PurgeAsync. Purging may
be necessary if you have stale data in the client database, or if you wish to discard all pending changes.
A purge clears a table from the local store. If there are operations awaiting synchronization with the server database,
the purge throws an exception unless the force purge parameter is set.
As an example of stale data on the client, suppose in the "todo list" example, Device1 only pulls items that are not
completed. A todoitem "Buy milk" is marked completed on the server by another device. However, Device1 still has
the "Buy milk" todoitem in local store because it is only pulling items that are not marked complete. A purge clears
this stale item.
Now you have connected a notification hub to your Mobile Apps back-end project. Later you will configure this
notification hub to connect to a platform notification system (PNS) to push to devices.
Register your app for push notifications
You need to submit your app to the Windows Store, then configure your server project to integrate with Windows
Push Notification Services (WNS) to send push notifications.
1. In Visual Studio Solution Explorer, right-click the UWP app project, click Store > Associate App with the Store....
2. In the wizard, click Next, sign in with your Microsoft account, type a name for your app in Reserve a new app
name, then click Reserve.
3. After the app registration is successfully created, select the new app name, click Next, and then
click Associate. This adds the required Windows Store registration information to the application manifest.
4. Navigate to the Windows Dev Center, sign-in with your Microsoft account, click the new app registration
in My apps, then expand Services > Push notifications.
5. In the Push notifications page, click Live Services site under Microsoft Azure Mobile Services.
6. In the registration page, make a note of the value under Application secrets and the Package SID, which you
will next use to configure your mobile app backend.
Important
The client secret and package SID are important security credentials. Do not share these values with anyone or
distribute them with your app. The Application Id is used with the secret to configure Microsoft Account
authentication.
Configure the backend to send push notifications
1. In the Azure portal, click Browse All > App Services, and click your Mobile Apps back end. Under Settings,
click App Service Push, and then click your notification hub name.
2. Go to Windows (WNS), enter the Security key (client secret) and Package SID that you obtained from the Live
Services site, and then click Save.
Your back end is now configured to use WNS to send push notifications.
Update the server to send push notifications
Use the procedure below that matches your backend project type: either .NET backend or Node.js backend.
.NET backend project
1. In Visual Studio, right-click the server project and click Manage NuGet Packages, search for
Microsoft.Azure.NotificationHubs, then click Install. This installs the Notification Hubs client library.
2. Expand Controllers, open TodoItemController.cs, and add the following using statements:
using System.Collections.Generic;
using Microsoft.Azure.NotificationHubs;
using Microsoft.Azure.Mobile.Server.Config;
3. In the PostTodoItem method, add the following code after the call to InsertAsync:
// Get the settings for the server project.
HttpConfiguration config = this.Configuration;
MobileAppSettingsDictionary settings =
this.Configuration.GetMobileAppSettingsProvider().GetMobileAppSettings();
This code reads the app settings that hold the Notification Hubs credentials; the full sample then creates a
NotificationHubClient and calls SendWindowsNativeNotificationAsync so that a push notification is sent after a new
item is inserted.
4. Republish the server project.
Node.js backend project
1. If you haven't already done so, download the quickstart project or else use the online editor in the Azure
portal.
2. Replace the existing code in the todoitem.js file with the following:
var azureMobileApps = require('azure-mobile-apps'),
    promises = require('azure-mobile-apps/src/utilities/promises'),
    logger = require('azure-mobile-apps/src/logger');
var table = azureMobileApps.table();
table.insert(function (context) {
    // For more information about the Notification Hubs JavaScript SDK,
    // see http://aka.ms/nodejshubs
    logger.info('Running TodoItem.insert');
    // Define the WNS payload that contains the new item Text.
    var payload = "<toast><visual><binding template=\"ToastText01\"><text id=\"1\">"
        + context.item.text + "</text></binding></visual></toast>";
    // Execute the insert, then send a WNS toast notification after it succeeds.
    return context.execute().then(function (results) {
        if (context.push) {
            context.push.wns.sendToast(null, payload);
        }
        return results;
    });
});
module.exports = table;
This sends a WNS toast notification that contains the item.text when a new todo item is inserted.
3. When editing the file on your local computer, republish the server project.
Add push notifications to your app
Next, your app must register for push notifications on start-up. If you have already enabled authentication, make
sure that the user signs in before trying to register for push notifications.
1. Open the App.xaml.cs project file and add the following using statements:
using System.Threading.Tasks;
using Windows.Networking.PushNotifications;
2. In the same file, add the following InitNotificationsAsync method definition to the App class:
private async Task InitNotificationsAsync()
{
    // Get a channel URI from WNS.
    var channel = await PushNotificationChannelManager
        .CreatePushNotificationChannelForApplicationAsync();
    // Register the channel URI with the mobile app backend's notification hub.
    await App.MobileService.GetPush().RegisterAsync(channel.Uri);
}
3. At the top of the OnLaunched event handler in App.xaml.cs, add the async modifier to the method definition
and add the following call to the new InitNotificationsAsync method, as in the following example:
protected async override void OnLaunched(LaunchActivatedEventArgs e)
{
await InitNotificationsAsync();
// ...
}
This guarantees that the short-lived ChannelURI is registered each time the application is launched.
4. Rebuild your UWP app project. Your app is now ready to receive toast notifications.
Test push notifications in your app
1. Right-click the Windows Store project, click Set as StartUp Project, then press the F5 key to run the Windows
Store app.
After the app starts, the device is registered for push notifications.
2. Stop the Windows Store app and repeat the previous step for the Windows Phone Store app.
At this point, both devices are registered to receive push notifications.
3. Run the Windows Store app again, and type text in Insert a TodoItem, and then click Save.
Note that after the insert completes, both the Windows Store and the Windows Phone apps receive a push
notification from WNS. The notification is displayed on Windows Phone even when the app isn't running.
Configure your App Service application to use Azure Active Directory login
This topic shows you how to configure Azure App Services to use Azure Active Directory as an authentication provider.
Configure Azure Active Directory using express settings
1. In the Azure portal, navigate to your application. Click Settings, and then Authentication/Authorization.
2. If the Authentication / Authorization feature is not enabled, turn the switch to On.
3. Click Azure Active Directory, and then click Express under Management Mode.
4. Click OK to register the application in Azure Active Directory. This will create a new registration. If you want to
choose an existing registration instead, click Select an existing app and then search for the name of a
previously created registration within your tenant. Click the registration to select it and click OK. Then
click OK on the Azure Active Directory settings blade.
By default, App Service provides authentication but does not restrict authorized access to your site content and APIs.
You must authorize users in your app code.
5. (Optional) To restrict access to your site to only users authenticated by Azure Active Directory, set Action to
take when request is not authenticated to Log in with Azure Active Directory. This requires that all requests be
authenticated, and all unauthenticated requests are redirected to Azure Active Directory for authentication.
6. Click Save.
You are now ready to use Azure Active Directory for authentication in your app.
(Alternative method) Manually configure Azure Active Directory with advanced settings
You can also choose to provide configuration settings manually. This is the preferred solution if the AAD tenant you
wish to use is different from the tenant with which you sign into Azure. To complete the configuration, you must first
create a registration in Azure Active Directory, and then you must provide some of the registration details to App
Service.
Register your application with Azure Active Directory
1. Log on to the Azure portal, and navigate to your application. Copy your URL. You will use this to configure your
Azure Active Directory app.
2. Sign in to the Azure classic portal and navigate to Active Directory.
3. Select your directory, and then select the Applications tab at the top. Click ADD at the bottom to create a new
app registration.
4. Click Add an application my organization is developing.
5. In the Add Application Wizard, enter a Name for your application and click the Web Application And/Or Web
API type. Then click to continue.
6. In the SIGN-ON URL box, paste the application URL you copied earlier. Enter that same URL in the App ID
URI box. Then click to continue.
7. Once the application has been added, click the Configure tab. Edit the Reply URL under Single Sign-on to be
the URL of your application appended with the path /.auth/login/aad/callback.
8. Click Save. Then copy the Client ID for the app. You will configure your application to use this later.
9. In the bottom command bar, click View Endpoints, and then copy the Federation Metadata Document URL and
download that document or navigate to it in a browser.
10. Within the root EntityDescriptor element, there should be an entityID attribute of the
form https://sts.windows.net/ followed by a GUID specific to your tenant (called a "tenant ID"). Copy this
value - it will serve as your Issuer URL. You will configure your application to use this later.
Add Azure Active Directory information to your application
1. Back in the Azure portal, navigate to your application. Click Settings, and then Authentication/Authorization.
2. If the Authentication/Authorization feature is not enabled, turn the switch to On.
3. Click Azure Active Directory, and then click Advanced under Management Mode. Paste in the Client ID and
Issuer URL value which you obtained previously. Then click OK.
By default, App Service provides authentication but does not restrict authorized access to your site content and APIs.
You must authorize users in your app code.
4. (Optional) To restrict access to your site to only users authenticated by Azure Active Directory, set Action to
take when request is not authenticated to Log in with Azure Active Directory. This requires that all requests be
authenticated, and all unauthenticated requests are redirected to Azure Active Directory for authentication.
5. Click Save.
You are now ready to use Azure Active Directory for authentication in your app.
(Optional) Configure a native client application
Azure Active Directory also allows you to register native clients, which provides greater control over permissions
mapping. You need this if you wish to perform logins using a library such as the Active Directory Authentication Library.
1. Navigate to Active Directory in the Azure classic portal.
2. Select your directory, and then select the Applications tab at the top. Click ADD at the bottom to create a new
app registration.
3. Click Add an application my organization is developing.
4. In the Add Application Wizard, enter a Name for your application and click the Native Client Application type.
Then click to continue.
5. In the Redirect URI box, enter your site's /.auth/login/done endpoint, using the HTTPS scheme. This value
should be similar to https://contoso.azurewebsites.net/.auth/login/done. If creating a Windows application,
instead use the package SID as the URI.
6. Once the native application has been added, click the Configure tab. Find the Client ID and make a note of this
value.
7. Scroll the page down to the Permissions to other applications section and click Add application.
8. Search for the web application that you registered earlier and click the plus icon. Then click the check to close
the dialog. If the web application cannot be found, navigate to its registration and add a new reply URL (e.g.,
the HTTP version of your current URL), click save, and then repeat these steps - the application should show
up in the list.
9. On the new entry you just added, open the Delegated Permissions dropdown and select Access (appName).
Then click Save.
You have now configured a native client application which can access your App Service application.
While PNSes are powerful, they leave much work to the app developer in order to implement even common push
notification scenarios, such as broadcasting or sending push notifications to segmented users.
Push is one of the most requested features in mobile cloud services, because implementing it requires complex
infrastructure that is unrelated to the app's main business logic. Some of the infrastructural challenges are:
Platform dependency:
o The backend needs to have complex and hard-to-maintain platform-dependent logic to send
notifications to devices on various platforms as PNSes are not unified.
Scale:
o Per PNS guidelines, device tokens must be refreshed upon every app launch. This means the backend
is dealing with a large amount of traffic and database access just to keep the tokens up-to-date. When
the number of devices grows to hundreds of thousands or millions, the cost of creating and
maintaining this infrastructure is massive.
o Most PNSes do not support broadcast to multiple devices. This means a simple broadcast to a million
devices results in a million calls to the PNSes. Scaling this amount of traffic with minimal latency is
nontrivial.
Routing:
o Though PNSes provide a way to send messages to devices, most app notifications are targeted at
users or interest groups. This means the backend must maintain a registry to associate devices with
interest groups, users, properties, etc. This overhead adds to the time to market and maintenance
costs of an app.
Why Use Notification Hubs?
Notification Hubs eliminates all complexities associated with enabling push on your own. Its multi-platform, scaled-out
push notification infrastructure reduces push-related code and simplifies your backend. With Notification Hubs,
devices are merely responsible for registering their PNS handles with a hub, while the backend sends messages to
users or interest groups, as shown in the following figure:
Notification Hubs is a ready-to-use push engine with the following advantages:
Cross platforms
o Support for all major push platforms, including iOS, Android, Windows, Kindle, and Baidu.
o A common interface to push to all platforms in platform-specific or platform-independent formats
with no platform-specific work.
o Device handle management in one place.
Cross backends
o Cloud or on-premises
o .NET, Node.js, Java, etc.
Rich set of delivery patterns:
o Broadcast to one or multiple platforms: You can instantly broadcast to millions of devices across
platforms with a single API call.
o Push to device: You can target notifications to individual devices.
o Push to user: Tags and templates features help you reach all cross-platform devices of a user.
o Push to segment with dynamic tags: Tags feature helps you segment devices and push to them
according to your needs, whether you are sending to one segment or an expression of segments (e.g.
active AND lives in Seattle NOT new user). Instead of being restricted to pub-sub, you can update
device tags anywhere and anytime.
o Localized push: Templates feature helps achieve localization without affecting backend code.
o Silent push: You can enable the push-to-pull pattern by sending silent notifications to devices and
triggering them to complete certain pulls or actions.
o Scheduled push: You can schedule notifications to be sent at any time.
o Direct push: You can skip registering devices with our service and directly batch push to a list of device
handles.
o Personalized push: Device push variables help you send device-specific, personalized push
notifications with customized key-value pairs.
Rich telemetry
o General push, device, error, and operation telemetry is available in the Azure portal and
programmatically.
o Per Message Telemetry tracks each push from your initial request call through our service successfully
batching the pushes out.
o Platform Notification System Feedback communicates all feedback from Platform Notification Systems
to assist in debugging.
Scalability
o Quickly send messages to millions of devices without re-architecting or device sharding.
Security
o Shared Access Secret (SAS) or federated authentication.
Integration with App Service Mobile Apps
To facilitate a seamless and unifying experience across Azure services, App Service Mobile Apps has built-in support
for push notifications using Notification Hubs. App Service Mobile Apps offers a highly scalable, globally available
mobile application development platform for Enterprise Developers and System Integrators that brings a rich set of
capabilities to mobile developers.
Mobile Apps developers can utilize Notification Hubs with the following workflow:
1. Retrieve device PNS handle
2. Register device with Notification Hubs through convenient Mobile Apps Client SDK register API
Note that Mobile Apps strips away all tags on registrations for security purposes. Work with
Notification Hubs from your backend directly to associate tags with devices.
3. Send notifications from your app backend with Notification Hubs
Here are some conveniences brought to developers with this integration:
Mobile Apps Client SDKs: These multi-platform SDKs provide simple APIs for registration and automatically
talk to the notification hub linked to the mobile app. Developers do not need to dig through
Notification Hubs credentials or work with an additional service.
o Push to user: The SDKs automatically tag the given device with Mobile Apps authenticated User ID to
enable push to user scenario.
o Push to device: The SDKs automatically use the Mobile Apps Installation ID as GUID to register with
Notification Hubs, saving developers the trouble of maintaining multiple service GUIDs.
Installation model: Mobile Apps works with Notification Hubs' latest push model to represent all push
properties associated with a device in a JSON Installation that aligns with Push Notification Services and is
easy to use.
Flexibility: Developers can always choose to work with Notification Hubs directly even with the integration in
place.
Integrated experience in Azure portal: Push as a capability is represented visually in Mobile Apps and
developers can easily work with the associated notification hub through Mobile Apps.
Take advantage of a wide range of Linux and Windows applications, libraries, and tools from independent software
vendors with solutions across industries such as financial services, engineering, oil and gas, life sciences, and digital
content creation. Your existing cluster manager and job scheduler can work with Azure Virtual Machines. Microsoft
partners including Excelian, Cycle Computing, Techila, Rescale, Fixstars, and Nimbo can help make the cloud work for
you.
Some examples of workloads that are commonly processed using this technique are:
Financial risk modeling
Climate and hydrology data analysis
Image rendering, analysis, and processing
Media encoding and transcoding
Genetic sequence analysis
Engineering stress analysis
Software testing
Batch can also perform parallel calculations with a reduce step at the end, and execute more complex HPC workloads
such as Message Passing Interface (MPI) applications.
For a comparison between Batch and other HPC solution options in Azure, see Batch and HPC solutions.
Pricing
Azure Batch is a free service; you aren't charged for the Batch account itself. You are charged for the underlying Azure
compute resources that your Batch solutions consume, and for the resources consumed by other services when your
workloads run. For example, you are charged for the compute nodes (VMs) in your pools and for the data you store in
Azure Storage as input or output for your tasks. Similarly, if you use the application packages feature of Batch, you are
charged for the Azure Storage resources used for storing your application packages. See Batch pricing for more
information.
Low-priority VMs can significantly reduce the cost of Batch workloads. For information about pricing for low-priority
VMs, see Batch Pricing.
Scenario: Scale out a parallel workload
A common solution that uses the Batch APIs to interact with the Batch service involves scaling out intrinsically parallel
work--such as the rendering of images for 3D scenes--on a pool of compute nodes. This pool of compute nodes can be
your "render farm" that provides tens, hundreds, or even thousands of cores to your rendering job, for example.
The following diagram shows a common Batch workflow, with a client application or hosted service using Batch to run
a parallel workload.
In this common scenario, your application or service processes a computational workload in Azure Batch by
performing the following steps:
1. Upload the input files and the application that will process those files to your Azure Storage account. The input
files can be any data that your application will process, such as financial modeling data, or video files to be
transcoded. The application files can be any application that is used for processing the data, such as a 3D
rendering application or media transcoder.
2. Create a Batch pool of compute nodes in your Batch account--these nodes are the virtual machines that will
execute your tasks. You specify properties such as the node size, their operating system, and the location in
Azure Storage of the application to install when the nodes join the pool (the application that you uploaded in
step #1). You can also configure the pool to automatically scale in response to the workload that your tasks
generate. Auto-scaling dynamically adjusts the number of compute nodes in the pool.
3. Create a Batch job to run the workload on the pool of compute nodes. When you create a job, you associate it
with a Batch pool.
4. Add tasks to the job. When you add tasks to a job, the Batch service automatically schedules the tasks for
execution on the compute nodes in the pool. Each task uses the application that you uploaded to process the
input files.
4a. Before a task executes, it can download the data (the input files) that it is to process to the
compute node it is assigned to. If the application has not already been installed on the node (see step
#2), it can be downloaded here instead. When the downloads are complete, the tasks execute on
their assigned nodes.
5. As the tasks run, you can query Batch to monitor the progress of the job and its tasks. Your client application
or service communicates with the Batch service over HTTPS. Because you may be monitoring thousands of
tasks running on thousands of compute nodes, be sure to query the Batch service efficiently.
6. As the tasks complete, they can upload their result data to Azure Storage. You can also retrieve files directly
from the file system on a compute node.
7. When your monitoring detects that the tasks in your job have completed, your client application or service
can download the output data for further processing or evaluation.
Keep in mind this is just one way to use Batch, and this scenario describes only a few of its available features. For
example, you can execute multiple tasks in parallel on each compute node, and you can use job preparation and
completion tasks to prepare the nodes for your jobs, then clean up afterward.
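Stripped of the Azure specifics, steps 1-7 describe an embarrassingly parallel map over input files. A minimal local sketch of that pattern follows; Promise.all stands in for the Batch pool of compute nodes and transcode for the uploaded task application, and none of these names come from the Batch API itself.

```javascript
// Stand-in for the task application (e.g., a media transcoder) that each
// Batch task would run against one input file.
async function transcode(inputFile) {
  return inputFile.replace('.avi', '.mp4');
}

// Stand-in for the Batch service scheduling one task per input file across
// the pool, then collecting the output files (step 7).
async function runJob(inputFiles) {
  return Promise.all(inputFiles.map(transcode));
}

const inputFiles = ['clip0.avi', 'clip1.avi', 'clip2.avi'];
runJob(inputFiles).then(outputs => console.log(outputs));
// logs ['clip0.mp4', 'clip1.mp4', 'clip2.mp4']
```

Because the tasks share no state, the same workload scales from three files to millions simply by adding compute nodes, which is exactly what Batch pools and auto-scaling provide.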
Figure 1: Service Bus provides a multi-tenant service for connecting applications through the cloud.
Within a namespace, you can use one or more instances of three different communication mechanisms, each of
which connects applications in a different way. The choices are:
Queues, which allow one-directional communication. Each queue acts as an intermediary (sometimes called
a broker) that stores sent messages until they are received. Each message is received by a single recipient.
Topics, which provide one-directional communication using subscriptions: a single topic can have multiple
subscriptions. Like a queue, a topic acts as a broker, but each subscription can optionally use a filter to receive
only messages that match specific criteria.
Relays, which provide bi-directional communication. Unlike queues and topics, a relay doesn't store in-flight
messages; it's not a broker. Instead, it just passes them on to the destination application.
When you create a queue, topic, or relay, you give it a name. Combined with whatever you called your namespace,
this name creates a unique identifier for the object. Applications can provide this name to Service Bus, then use that
queue, topic, or relay to communicate with one another.
To use any of these objects in the relay scenario, Windows applications can use Windows Communication Foundation
(WCF). This service is known as WCF Relay. For queues and topics, Windows applications can use Service Bus-defined
messaging APIs. To make these objects easier to use from non-Windows applications, Microsoft provides SDKs for
Java, Node.js, and other languages. You can also access queues and topics using REST APIs over HTTP(S).
It's important to understand that even though Service Bus itself runs in the cloud (that is, in Microsoft's Azure
datacenters), applications that use it can run anywhere. You can use Service Bus to connect applications running on
Azure, for example, or applications running inside your own datacenter. You can also use it to connect an application
running on Azure or another cloud platform with an on-premises application or with tablets and phones. It's even
possible to connect household appliances, sensors, and other devices to a central application or to one another. Service
Bus is a communication mechanism in the cloud that's accessible from pretty much anywhere. How you use it
depends on what your applications need to do.
Queues
Suppose you decide to connect two applications using a Service Bus queue. Figure 2 illustrates this situation.
Useful as they are, queues aren't always the right solution. Sometimes, Service Bus topics are better. Figure 3
illustrates this idea.
Figure 3: Based on the filter a subscribing application specifies, it can receive some or all the messages sent to a Service
Bus topic.
A topic is similar in many ways to a queue. Senders submit messages to a topic in the same way that they submit
messages to a queue, and those messages look the same as with queues. The difference is that topics enable each
receiving application to create its own subscription by defining a filter. A subscriber then sees only the messages that
match that filter. For example, Figure 3 shows a sender and a topic with three subscribers, each with its own filter:
Subscriber 1 receives only messages that contain the property Seller="Ava".
Subscriber 2 receives messages that contain the property Seller="Ruby" and/or contain an Amount property
whose value is greater than 100,000. Perhaps Ruby is the sales manager, so she wants to see both her own
sales and all large sales regardless of who makes them.
Subscriber 3 has set its filter to True, which means that it receives all messages. For example, this application
might be responsible for maintaining an audit trail and therefore it needs to see all the messages.
As with queues, subscribers to a topic can read messages using either ReceiveAndDelete or PeekLock. Unlike queues,
however, a single message sent to a topic can be received by multiple subscriptions. This approach, commonly
called publish and subscribe (or pub/sub), is useful whenever multiple applications are interested in the same
messages. By defining the right filter, each subscriber can tap into just the part of the message stream that it needs to
see.
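To make the filtering behavior concrete, here is a minimal Python sketch (an illustration only, not the Service Bus API) of a topic whose subscriptions each apply their own filter, mirroring the three subscribers in Figure 3:

```python
# Simulated topic: each subscription holds a filter and its own "virtual queue".
class Topic:
    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name, filter_fn):
        # filter_fn decides which messages this subscription sees.
        self.subscriptions[name] = (filter_fn, [])

    def send(self, message):
        # Unlike a queue, every matching subscription gets its own copy.
        for filter_fn, inbox in self.subscriptions.values():
            if filter_fn(message):
                inbox.append(dict(message))

topic = Topic()
topic.subscribe("Subscriber1", lambda m: m.get("Seller") == "Ava")
topic.subscribe("Subscriber2",
                lambda m: m.get("Seller") == "Ruby" or m.get("Amount", 0) > 100_000)
topic.subscribe("Subscriber3", lambda m: True)  # audit trail: sees everything

topic.send({"Seller": "Ava", "Amount": 500})
topic.send({"Seller": "Ruby", "Amount": 200})
topic.send({"Seller": "Mia", "Amount": 250_000})

for name, (_, inbox) in topic.subscriptions.items():
    print(name, len(inbox))
```

Each subscription receives its own copy of every matching message, which is what distinguishes pub/sub from the single-recipient behavior of a queue.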
Relays
Both queues and topics provide one-way asynchronous communication through a broker. Traffic flows in just one
direction, and there's no direct connection between senders and receivers. But what if this isn't what you want?
Suppose your applications need to both send and receive messages, or perhaps you want a direct link
between them and you don't need a broker to store messages. To address scenarios such as this, Service Bus
provides relays, as Figure 4 shows.
Figure 4: Service Bus relay provides synchronous, two-way communication between applications.
The obvious question to ask about relays is this: why would I use one? Even if I don't need queues, why make
applications communicate via a cloud service rather than just interact directly? The answer is that talking directly can
be harder than you might think.
Suppose you want to connect two on-premises applications, both running inside corporate datacenters. Each of these
applications sits behind a firewall, and each datacenter probably uses network address translation (NAT). The firewall
blocks incoming data on all but a few ports, and NAT implies that the machine each application is running on doesn't
have a fixed IP address that you can reach directly from outside the datacenter. Without some extra help, connecting
these applications over the public internet is problematic.
A Service Bus relay can help. To communicate bi-directionally through a relay, each application establishes an
outbound TCP connection with Service Bus, then keeps it open. All communication between the two applications
travels over these connections. Because each connection was established from inside the datacenter, the firewall
allows incoming traffic to each application without opening new ports. This approach also gets around the NAT
problem, because each application has a consistent endpoint in the cloud throughout the communication. By
exchanging data through the relay, the applications can avoid the problems that would otherwise make
communication difficult.
To use Service Bus relays, applications rely on the Windows Communication Foundation (WCF). Service Bus provides
WCF bindings that make it straightforward for Windows applications to interact via relays. Applications that already
use WCF can typically specify one of these bindings, then talk to each other through a relay. Unlike queues and topics,
however, using relays from non-Windows applications, while possible, requires some programming effort; no
standard libraries are provided.
Unlike queues and topics, applications don't explicitly create relays. Instead, when an application that wishes to
receive messages establishes a TCP connection with Service Bus, a relay is created automatically. When the
connection is dropped, the relay is deleted. To let an application find the relay created by a specific listener,
Service Bus provides a registry that enables applications to locate a specific relay by name.
Relays are the right solution when you need direct communication between applications. For example, consider an
airline reservation system running in an on-premises datacenter that must be accessed from check-in kiosks, mobile
devices, and other computers. Applications running on all these systems could rely on Service Bus relays in the cloud
to communicate, wherever they might be running.
Summary
Connecting applications has always been part of building complete solutions, and the range of scenarios that require
applications and services to communicate with each other is set to increase as more applications and devices are
connected to the internet. By providing cloud-based technologies for achieving communication through queues,
topics, and relays, Service Bus aims to make this essential function easier to implement and more broadly available.
Both Storage queues and Service Bus queues are implementations of the message queuing service currently offered
on Microsoft Azure. Each has a slightly different feature set, which means you can choose one or the other, or use
both, depending on the needs of your particular solution or the business/technical problem you are solving.
When determining which queuing technology fits the purpose for a given solution, solution architects and developers
should consider the recommendations below. For more details, see the next section.
As a solution architect/developer, you should consider using Storage queues when:
Your application must store over 80 GB of messages in a queue, where the messages have a lifetime shorter
than 7 days.
Your application wants to track progress for processing a message inside of the queue. This is useful if the
worker processing a message crashes. A subsequent worker can then use that information to continue from
where the prior worker left off.
You require server side logs of all of the transactions executed against your queues.
As a solution architect/developer, you should consider using Service Bus queues when:
Your solution must be able to receive messages without having to poll the queue. With Service Bus, this can
be achieved through the use of the long-polling receive operation using the TCP-based protocols that Service
Bus supports.
Your solution requires the queue to provide a guaranteed first-in-first-out (FIFO) ordered delivery.
You want a symmetric experience in Azure and on Windows Server (private cloud). For more information,
see Service Bus for Windows Server.
Your solution must be able to support automatic duplicate detection.
You want your application to process messages as parallel long-running streams (messages are associated
with a stream using the SessionId property on the message). In this model, each node in the consuming
application competes for streams, as opposed to messages. When a stream is given to a consuming node, the
node can examine the state of the application stream using transactions.
Your solution requires transactional behavior and atomicity when sending or receiving multiple messages
from a queue.
The time-to-live (TTL) characteristic of the application-specific workload can exceed the 7-day period.
Your application handles messages that can exceed 64 KB but will not likely approach the 256 KB limit.
You deal with a requirement to provide a role-based access model to the queues, and different
rights/permissions for senders and receivers.
Your queue size will not grow larger than 80 GB.
You want to use the AMQP 1.0 standards-based messaging protocol. For more information about AMQP,
see Service Bus AMQP Overview.
You can envision an eventual migration from queue-based point-to-point communication to a message
exchange pattern that enables seamless integration of additional receivers (subscribers), each of which
receives independent copies of either some or all messages sent to the queue. The latter refers to the
publish/subscribe capability natively provided by Service Bus.
Your messaging solution must be able to support the "At-Most-Once" delivery guarantee without the need for
you to build the additional infrastructure components.
You would like to be able to publish and consume batches of messages.
Comparing Storage queues and Service Bus queues
The tables in the following sections provide a logical grouping of queue features and let you compare, at a glance, the
capabilities available in both Storage queues and Service Bus queues.
Foundational capabilities
This section compares some of the fundamental queuing capabilities provided by Storage queues and Service Bus
queues.
Comparison of foundational capabilities (Storage queues vs. Service Bus queues):
Atomic operation support: Storage queues: No. Service Bus queues: Yes.
Receive behavior: Storage queues: Non-blocking. Service Bus queues: Blocking with or without a timeout.
Push-style API: Storage queues: No. Service Bus queues: Yes (the OnMessage and OnMessage sessions .NET API).
Batched send: Storage queues: No. Service Bus queues: Yes (through the use of transactions or client-side batching).
Additional information
Messages in Storage queues are typically first-in-first-out, but sometimes they can be out of order; for
example, when a message's visibility timeout duration expires (such as when a client application
crashes during processing). When the visibility timeout expires, the message becomes visible again on the
queue for another worker to dequeue it. At that point, the newly visible message might be placed in the
queue (to be dequeued again) after a message that was originally enqueued after it.
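The reordering described above can be sketched in a few lines of Python (a simulation, not the Storage SDK); times are simulated ticks rather than wall-clock seconds:

```python
# Simulated Storage queue: messages are ordered by the time they become visible.
import heapq

class StorageQueueSim:
    def __init__(self):
        self._heap = []   # (visible_at, seq, message)
        self._seq = 0

    def enqueue(self, msg, now=0):
        heapq.heappush(self._heap, (now, self._seq, msg))
        self._seq += 1

    def dequeue(self, now, visibility_timeout):
        # Take the earliest visible message, then hide it again until
        # its visibility timeout expires (as Storage queues do).
        visible_at, _, msg = heapq.heappop(self._heap)
        assert visible_at <= now
        heapq.heappush(self._heap, (now + visibility_timeout, self._seq, msg))
        self._seq += 1
        return msg

q = StorageQueueSim()
q.enqueue("A")
q.enqueue("B")
first = q.dequeue(now=0, visibility_timeout=30)   # a worker takes "A"... then crashes
second = q.dequeue(now=1, visibility_timeout=30)  # another worker takes "B"
# Once A's timeout expires, A becomes visible again and is redelivered after B.
third = q.dequeue(now=31, visibility_timeout=30)
print(first, second, third)
```

The effective processing order becomes B then A, even though A was enqueued first.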
The guaranteed FIFO pattern in Service Bus queues requires the use of messaging sessions. In the event that
the application crashes while processing a message received in the Peek & Lock mode, the next time a queue
receiver accepts a messaging session, it will start with the failed message after its time-to-live (TTL) period
expires.
Storage queues are designed to support standard queuing scenarios, such as decoupling application
components to increase scalability and tolerance for failures, load leveling, and building process workflows.
Service Bus queues support the At-Least-Once delivery guarantee. In addition, the At-Most-Once semantic can
be supported by using session state to store the application state and by using transactions to atomically
receive messages and update the session state.
Storage queues provide a uniform and consistent programming model across queues, tables, and BLOBs
both for developers and for operations teams.
Service Bus queues provide support for local transactions in the context of a single queue.
The Receive and Delete mode supported by Service Bus provides the ability to reduce the messaging
operation count (and associated cost) in exchange for lowered delivery assurance.
Storage queues provide leases with the ability to extend the leases for messages. This allows the workers to
maintain short leases on messages. Thus, if a worker crashes, the message can be quickly processed again by
another worker. In addition, a worker can extend the lease on a message if it needs to process it longer than
the current lease time.
Storage queues offer a visibility timeout that you can set upon the enqueueing or dequeuing of a message. In
addition, you can update a message with different lease values at run-time, and update different values across
messages in the same queue. Service Bus lock timeouts are defined in the queue metadata; however, you can
renew the lock by calling the RenewLock method.
The maximum timeout for a blocking receive operation in Service Bus queues is 24 days. However, REST-
based timeouts have a maximum value of 55 seconds.
Client-side batching provided by Service Bus enables a queue client to batch multiple messages into a single
send operation. Batching is only available for asynchronous send operations.
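As a sketch of the idea (a hypothetical client, not the actual Service Bus SDK), a batching sender buffers messages and pushes them out as one wire operation:

```python
# Hypothetical batching sender: groups several messages into a single
# send call to cut per-operation overhead.
class BatchingSender:
    def __init__(self, transport_send, max_batch=10):
        self._send = transport_send   # callable taking a list of messages
        self._max_batch = max_batch
        self._buffer = []

    def send_async(self, message):
        self._buffer.append(message)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self):
        if self._buffer:
            self._send(list(self._buffer))  # one wire operation, many messages
            self._buffer.clear()

calls = []
sender = BatchingSender(calls.append, max_batch=3)
for i in range(7):
    sender.send_async(f"msg-{i}")
sender.flush()  # push out the final partial batch
print(len(calls), [len(b) for b in calls])
```

Seven messages go out in three send operations instead of seven.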
Features such as the 200 TB ceiling of Storage queues (more when you virtualize accounts) and unlimited
queues make them an ideal platform for SaaS providers.
Storage queues provide a flexible and performant delegated access control mechanism.
Advanced capabilities
This section compares advanced capabilities provided by Storage queues and Service Bus queues.
Comparison of advanced capabilities (Storage queues vs. Service Bus queues):
Automatic dead lettering: Storage queues: No. Service Bus queues: Yes.
Server-side transaction log: Storage queues: Yes. Service Bus queues: No.
State management: Storage queues: No. Service Bus queues: Yes
(Microsoft.ServiceBus.Messaging.EntityStatus.Active,
Microsoft.ServiceBus.Messaging.EntityStatus.Disabled,
Microsoft.ServiceBus.Messaging.EntityStatus.SendDisabled,
Microsoft.ServiceBus.Messaging.EntityStatus.ReceiveDisabled).
Message auto-forwarding: Storage queues: No. Service Bus queues: Yes.
Message groups: Storage queues: No. Service Bus queues: Yes (through the use of messaging sessions).
Application state per message group: Storage queues: No. Service Bus queues: Yes.
Duplicate detection: Storage queues: No. Service Bus queues: Yes (configurable on the sender side).
Browsing message groups: Storage queues: No. Service Bus queues: Yes.
Fetching message sessions by ID: Storage queues: No. Service Bus queues: Yes.
Additional information
Both queuing technologies enable a message to be scheduled for delivery at a later time.
Queue auto-forwarding enables thousands of queues to auto-forward their messages to a single queue, from
which the receiving application consumes the message. You can use this mechanism to achieve security, to
control flow, and to isolate storage between each message publisher.
Storage queues provide support for updating message content. You can use this functionality for persisting
state information and incremental progress updates into the message so that it can be processed from the
last known checkpoint, instead of starting from scratch. With Service Bus queues, you can enable the same
scenario through the use of message sessions. Sessions enable you to save and retrieve the application
processing state (by using SetState and GetState).
Dead lettering, which is only supported by Service Bus queues, can be useful for isolating messages that
cannot be processed successfully by the receiving application or when messages cannot reach their
destination due to an expired time-to-live (TTL) property. The TTL value specifies how long a message remains
in the queue. With Service Bus, the message will be moved to a special queue called $DeadLetterQueue when
the TTL period expires.
To find "poison" messages in Storage queues, when dequeuing a message the application examines
the DequeueCount property of the message. If DequeueCount is greater than a given threshold, the
application moves the message to an application-defined "dead letter" queue.
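The DequeueCount pattern described above can be sketched as follows (names such as handle and dequeue_count are illustrative, not the Storage SDK):

```python
# Poison-message handling: once a message has been dequeued more than a
# threshold number of times, park it in an application-defined dead-letter queue.
MAX_DEQUEUE_COUNT = 3

def handle(message, main_queue, dead_letter_queue, process):
    if message["dequeue_count"] > MAX_DEQUEUE_COUNT:
        dead_letter_queue.append(message)   # park it for manual inspection
        return "dead-lettered"
    try:
        process(message)
        return "processed"
    except Exception:
        # Let the visibility timeout expire so the message is redelivered;
        # its dequeue count will be higher next time.
        return "retry"

dead_letters = []
poison = {"body": "bad payload", "dequeue_count": 4}
result = handle(poison, [], dead_letters, process=lambda m: None)
print(result, len(dead_letters))
```

Unlike Service Bus, where dead lettering is automatic, this logic lives entirely in the application.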
Storage queues enable you to obtain a detailed log of all of the transactions executed against the queue, as
well as aggregated metrics. Both of these options are useful for debugging and understanding how your
application uses Storage queues. They are also useful for performance-tuning your application and reducing
the costs of using queues.
The concept of "message sessions" supported by Service Bus enables messages that belong to a certain logical
group to be associated with a given receiver, which in turn creates a session-like affinity between messages
and their respective receivers. You can enable this advanced functionality in Service Bus by setting
the SessionID property on a message. Receivers can then listen on a specific session ID and receive messages
that share the specified session identifier.
The duplication detection functionality supported by Service Bus queues automatically removes duplicate
messages sent to a queue or topic, based on the value of the MessageId property.
Capacity and quotas
This section compares Storage queues and Service Bus queues from the perspective of capacity and quotas that may
apply.
Comparison of capacity and quotas (Storage queues vs. Service Bus queues):
Maximum message size: Storage queues: 64 KB (48 KB when using Base64 encoding); Azure supports large
messages by combining queues and blobs, at which point you can enqueue up to 200 GB for a single item.
Service Bus queues: 256 KB or 1 MB, depending on the service tier (including both header and body;
maximum header size: 64 KB).
Additional information
Service Bus enforces queue size limits. The maximum queue size is specified upon creation of the queue and
can have a value between 1 and 80 GB. If the queue size value set on creation of the queue is reached,
additional incoming messages will be rejected and an exception will be received by the calling code. For more
information about quotas in Service Bus, see Service Bus Quotas.
In the Standard tier, you can create Service Bus queues in 1, 2, 3, 4, or 5 GB sizes (the default is 1 GB). In the
Premium tier, you can create queues up to 80 GB in size. In Standard tier, with partitioning enabled (which is
the default), Service Bus creates 16 partitions for each GB you specify. As such, if you create a queue that is 5
GB in size, with 16 partitions the maximum queue size becomes (5 * 16) = 80 GB. You can see the maximum
size of your partitioned queue or topic by looking at its entry on the Azure portal. In the Premium tier, only 2
partitions are created per queue.
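A quick check of the partition arithmetic above:

```python
# Standard tier with partitioning enabled: 16 partitions multiply the
# specified queue size, so a 5 GB queue can hold up to 80 GB.
PARTITIONS = 16
for specified_gb in (1, 2, 3, 4, 5):
    print(specified_gb, "GB specified ->", specified_gb * PARTITIONS, "GB maximum")
```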
With Storage queues, if the content of the message is not XML-safe, then it must be Base64 encoded. If
you Base64-encode the message, the user payload can be up to 48 KB, instead of 64 KB.
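The 48 KB figure follows from Base64's 4/3 expansion (every 3 input bytes become 4 output characters), which can be verified directly:

```python
# A 48 KB binary payload encodes to exactly the 64 KB message limit.
import base64

payload = b"\x00" * (48 * 1024)            # 48 KB of non-XML-safe bytes
encoded = base64.b64encode(payload)
print(len(payload), "->", len(encoded))
```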
With Service Bus queues, each message stored in a queue is composed of two parts: a header and a body. The
total size of the message cannot exceed the maximum message size supported by the service tier.
When clients communicate with Service Bus queues over the TCP protocol, the maximum number of
concurrent connections to a single Service Bus queue is limited to 100. This number is shared between
senders and receivers. If this quota is reached, subsequent requests for additional connections will be
rejected and an exception will be received by the calling code. This limit is not imposed on clients connecting
to the queues using REST-based API.
If you require more than 10,000 queues in a single Service Bus namespace, you can contact the Azure support
team and request an increase. To scale beyond 10,000 queues with Service Bus, you can also create additional
namespaces using the Azure portal.
Management and operations
This section compares the management features provided by Storage queues and Service Bus queues.
Queue naming: Storage queues: letters in a queue name must be lowercase. Service Bus queues: queue paths
and names are case-insensitive.
Additional information
Storage queues provide support for arbitrary attributes that can be applied to the queue description, in the
form of name/value pairs.
Both queue technologies offer the ability to peek a message without having to lock it, which can be useful
when implementing a queue explorer/browser tool.
The Service Bus .NET brokered messaging APIs leverage full-duplex TCP connections for improved
performance when compared to REST over HTTP, and they support the AMQP 1.0 standard protocol.
Names of Storage queues can be 3-63 characters long, can contain lowercase letters, numbers, and hyphens.
For more information, see Naming Queues and Metadata.
Service Bus queue names can be up to 260 characters long and have less restrictive naming rules. Service Bus
queue names can contain letters, numbers, periods, hyphens, and underscores.
Additional information
Every request to either of the queuing technologies must be authenticated. Public queues with anonymous
access are not supported. Using SAS, you can address this scenario by publishing a write-only SAS, read-only
SAS, or even a full-access SAS.
The authentication scheme provided by Storage queues involves the use of a symmetric key, which is a hash-
based Message Authentication Code (HMAC), computed with the SHA-256 algorithm and encoded as
a Base64 string. For more information about the respective protocol, see Authentication for the Azure Storage
Services. Service Bus queues support a similar model using symmetric keys. For more information, see Shared
Access Signature Authentication with Service Bus.
Conclusion
By gaining a deeper understanding of the two technologies, you will be able to make a more informed decision on
which queue technology to use, and when. The decision on when to use Storage queues or Service Bus queues clearly
depends on a number of factors. These factors may depend heavily on the individual needs of your application and its
architecture. If your application already uses the core capabilities of Microsoft Azure, you may prefer to choose
Storage queues, especially if you require basic communication and messaging between services or need queues that
can be larger than 80 GB in size.
Because Service Bus queues provide a number of advanced features, such as sessions, transactions, duplicate
detection, automatic dead-lettering, and durable publish/subscribe capabilities, they may be a preferred choice if you
are building a hybrid application or if your application otherwise requires these features.
The Service Bus messaging fabric supports brokered, decoupled communication scenarios. Decoupled communication has many advantages; for example,
clients and servers can connect as needed and perform their operations in an asynchronous fashion.
The messaging entities that form the core of the messaging capabilities in Service Bus are queues, topics and
subscriptions, and rules/actions.
Queues
Queues offer First In, First Out (FIFO) message delivery to one or more competing consumers. That is, messages are
typically expected to be received and processed by the receivers in the order in which they were added to the queue,
and each message is received and processed by only one message consumer. A key benefit of using queues is to
achieve "temporal decoupling" of application components. In other words, the producers (senders) and consumers
(receivers) do not have to be sending and receiving messages at the same time, because messages are stored durably
in the queue. Furthermore, the producer does not have to wait for a reply from the consumer in order to continue to
process and send messages.
A related benefit is "load leveling," which enables producers and consumers to send and receive messages at different
rates. In many applications, the system load varies over time; however, the processing time required for each unit of
work is typically constant. Intermediating message producers and consumers with a queue means that the consuming
application only has to be provisioned to handle average load instead of peak load. The depth of the queue
grows and contracts as the incoming load varies. This directly saves money with regard to the amount of
infrastructure required to service the application load. As the load increases, more worker processes can be added to
read from the queue. Each message is processed by only one of the worker processes. Furthermore, this pull-based
load balancing allows for optimum use of the worker computers even if the worker computers differ with regard to
processing power, as they will pull messages at their own maximum rate. This pattern is often termed the "competing
consumer" pattern.
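A deterministic sketch of the competing-consumer pattern (a simulation, not an Azure API): workers pull from one shared queue at their own rate, so a faster worker naturally takes a larger share of the load:

```python
# Each message is taken by exactly one worker; per-round "speed" models
# how quickly a worker can pull messages.
from collections import deque

work = deque(range(100))
workers = {"fast": 3, "slow": 1}          # messages pulled per round
tally = {name: 0 for name in workers}

while work:
    for name, speed in workers.items():
        for _ in range(speed):
            if not work:
                break
            work.popleft()                # pull-based: worker takes the next message
            tally[name] += 1

print(tally)
```

No coordinator assigns work; the faster worker simply returns to the queue more often, which is why pull-based load balancing tolerates heterogeneous workers.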
Using queues to intermediate between message producers and consumers provides an inherent loose coupling
between the components. Because producers and consumers are not aware of each other, a consumer can be
upgraded without having any effect on the producer.
Creating a queue is a multi-step process. You perform management operations for Service Bus messaging entities
(both queues and topics) via the Microsoft.ServiceBus.NamespaceManager class, which is constructed by supplying
the base address of the Service Bus namespace and the user credentials. NamespaceManager provides methods to
create, enumerate and delete messaging entities. After creating a Microsoft.ServiceBus.TokenProvider object from
the SAS name and key, and a service namespace management object, you can use
the Microsoft.ServiceBus.NamespaceManager.CreateQueue method to create the queue. For example:
// Create management credentials
TokenProvider credentials =
    TokenProvider.CreateSharedAccessSignatureTokenProvider(sasKeyName, sasKeyValue);
// Create namespace client
NamespaceManager namespaceClient = new NamespaceManager(ServiceBusEnvironment.CreateServiceUri("sb",
ServiceNamespace, string.Empty), credentials);
You can then create a queue object and a messaging factory with the Service Bus URI as an argument. For example:
QueueDescription myQueue;
myQueue = namespaceClient.CreateQueue("TestQueue");
MessagingFactory factory = MessagingFactory.Create(ServiceBusEnvironment.CreateServiceUri("sb",
ServiceNamespace, string.Empty), credentials);
QueueClient myQueueClient = factory.CreateQueueClient("TestQueue");
You can then send messages to the queue. For example, if you have a list of brokered messages called MessageList,
the code appears similar to the following:
for (int count = 0; count < 6; count++)
{
    var issue = MessageList[count];
    issue.Label = issue.Properties["IssueTitle"].ToString();
    myQueueClient.Send(issue);
}
You then receive messages from the queue as follows:
while ((message = myQueueClient.Receive(new TimeSpan(hours: 0, minutes: 0, seconds: 5))) != null)
{
    Console.WriteLine(string.Format("Message received: {0}, {1}, {2}", message.SequenceNumber,
        message.Label, message.MessageId));
    message.Complete();
    Thread.Sleep(1000);
}
In the ReceiveAndDelete mode, the receive operation is single-shot; that is, when Service Bus receives the request, it
marks the message as being consumed and returns it to the application. ReceiveAndDelete mode is the simplest
model and works best for scenarios in which the application can tolerate not processing a message in the event of a
failure. To understand this, consider a scenario in which the consumer issues the receive request and then crashes
before processing it. Because Service Bus marks the message as being consumed, when the application restarts and
begins consuming messages again, it will have missed the message that was consumed prior to the crash.
In PeekLock mode, the receive operation becomes two-stage, which makes it possible to support applications that
cannot tolerate missing messages. When Service Bus receives the request, it finds the next message to be consumed,
locks it to prevent other consumers from receiving it, and then returns it to the application. After the application
finishes processing the message (or stores it reliably for future processing), it completes the second stage of the
receive process by calling Complete on the received message. When Service Bus sees the Complete call, it marks the
message as being consumed.
If the application is unable to process the message for some reason, it can call the Abandon method on the received
message (instead of Complete). This enables Service Bus to unlock the message and make it available to be received
again, either by the same consumer or by another competing consumer. In addition, there is a timeout associated with
the lock: if the application fails to process the message before the lock timeout expires (for example, if the
application crashes), then Service Bus unlocks the message and makes it available to be received again (essentially
performing an Abandon operation by default).
Note that in the event that the application crashes after processing the message, but before the Complete request is
issued, the message is redelivered to the application when it restarts. This is often called At Least Once processing;
that is, each message is processed at least once. However, in certain situations the same message may be redelivered.
If the scenario cannot tolerate duplicate processing, then additional logic is required in the application to detect
duplicates. This can be achieved based upon the MessageId property of the message, which remains constant across
delivery attempts, and is known as Exactly Once processing.
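The duplicate-detection logic described above can be sketched as follows (illustrative code, not the Service Bus SDK); the receiver keys off MessageId, which remains constant across redeliveries:

```python
# At-least-once delivery made effectively exactly-once: the receiver records
# processed MessageIds and skips redeliveries of the same message.
processed_ids = set()
results = []

def receive(message):
    if message["MessageId"] in processed_ids:
        return "duplicate-skipped"           # redelivery after a crash or lock timeout
    results.append(message["Body"].upper())  # the actual processing work
    processed_ids.add(message["MessageId"])  # record the id after processing
    return "processed"

first = receive({"MessageId": "m-1", "Body": "hello"})
again = receive({"MessageId": "m-1", "Body": "hello"})  # same message, redelivered
print(first, again, results)
```

The work runs once even though the message was delivered twice.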
Topics and subscriptions
In contrast to queues, in which each message is processed by a single consumer, topics and subscriptions provide a
one-to-many form of communication, in a publish/subscribe pattern. Useful for scaling to very large numbers of
recipients, each published message is made available to each subscription registered with the topic. Messages are
sent to a topic and delivered to one or more associated subscriptions, depending on filter rules that can be set on a
per-subscription basis. The subscriptions can use additional filters to restrict the messages that they want to receive.
Messages are sent to a topic in the same way they are sent to a queue, but messages are not received from the topic
directly. Instead, they are received from subscriptions. A topic subscription resembles a virtual queue that receives
copies of the messages that are sent to the topic. Messages are received from a subscription identically to the way
they are received from a queue.
By way of comparison, the message-sending functionality of a queue maps directly to a topic and its message-
receiving functionality maps to a subscription. Among other things, this means that subscriptions support the same
patterns described earlier in this section with regard to queues: competing consumer, temporal decoupling, load
leveling, and load balancing.
Creating a topic is similar to creating a queue, as shown in the example in the previous section. Create the service URI,
and then use the NamespaceManager class to create the namespace client. You can then create a topic using
the CreateTopic method. For example:
TopicDescription dataCollectionTopic = namespaceClient.CreateTopic("DataCollectionTopic");
Next, add subscriptions as desired:
SubscriptionDescription myAgentSubscription = namespaceClient.CreateSubscription(myTopic.Path, "Inventory");
SubscriptionDescription myAuditSubscription = namespaceClient.CreateSubscription(myTopic.Path, "Dashboard");
You can then create a topic client. For example:
MessagingFactory factory = MessagingFactory.Create(serviceUri, tokenProvider);
TopicClient myTopicClient = factory.CreateTopicClient(myTopic.Path);
Using the topic client, you can then send messages to the topic, as shown in the previous section. For example:
foreach (BrokeredMessage message in messageList)
{
    myTopicClient.Send(message);
    Console.WriteLine(
        string.Format("Message sent: Id = {0}, Body = {1}", message.MessageId, message.GetBody<string>()));
}
Similar to queues, messages are received from a subscription using a SubscriptionClient object instead of
a QueueClient object. Create the subscription client, passing the name of the topic, the name of the subscription, and
(optionally) the receive mode as parameters. For example, with the Inventory subscription:
// Create the subscription client
MessagingFactory factory = MessagingFactory.Create(serviceUri, tokenProvider);
SubscriptionClient agentSubscriptionClient = factory.CreateSubscriptionClient("DataCollectionTopic", "Inventory");
For example, Event Hubs enables behavior tracking in mobile apps, traffic information from web farms, in-game event
capture in console games, or telemetry collected from industrial machines, connected vehicles, or other devices.
Azure Event Hubs overview
The common role that Event Hubs plays in solution architectures is the "front door" for an event pipeline, often called
an event ingestor. An event ingestor is a component or service that sits between event publishers and event
consumers to decouple the production of an event stream from the consumption of those events. The following figure
depicts this architecture:
Event Hubs provides message stream handling capability but has characteristics that are different from traditional
enterprise messaging. Event Hubs capabilities are built around high throughput and event processing scenarios. As
such, Event Hubs is different from Azure Service Bus messaging, and does not implement some of the capabilities that
are available for Service Bus messaging entities, such as topics.
Event Hubs features
Event Hubs contains the following key elements:
Event producers/publishers: An entity that sends data to an event hub. An event is published via AMQP 1.0 or
HTTPS.
Partitions: Enable each consumer to read only a specific subset, or partition, of the event stream.
SAS tokens: Used to identify and authenticate the event publisher.
Event consumers: An entity that reads event data from an event hub. Event consumers connect via AMQP 1.0.
Consumer groups: Provide each of multiple consuming applications with a separate view of the event stream,
enabling those consumers to act independently.
Throughput units: Pre-purchased units of capacity. A single partition has a maximum scale of one throughput
unit.
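Events that share a partition key (a sender-supplied value) are guaranteed to land in the same partition, which preserves per-sender ordering within that partition. The hash Event Hubs uses is internal to the service, so the following is only an illustration of the property, using an arbitrary stable hash:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustrative only: Event Hubs' real partition-key hash is internal
// to the service. Any stable hash modulo the partition count shows
// the property that matters: the same key always maps to the same
// partition.
int AssignPartition(string partitionKey, int partitionCount)
{
    using var sha = SHA256.Create();
    byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(partitionKey));
    int hash = BitConverter.ToInt32(digest, 0);
    return Math.Abs(hash % partitionCount);
}

int p1 = AssignPartition("device-42", 16);
int p2 = AssignPartition("device-42", 16);
Console.WriteLine(p1 == p2); // prints True: same key, same partition
```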
236 | P a g e
70-534 Architecting Microsoft Azure Solutions
For a higher-level introduction to the Batch service, see Basics of Azure Batch.
Batch service workflow
The following high-level workflow is typical of nearly all applications and services that use the Batch service for
processing parallel workloads:
1. Upload the data files that you want to process to an Azure Storage account. Batch includes built-in support for
accessing Azure Blob storage, and your tasks can download these files to compute nodes when the tasks are
run.
2. Upload the application files that your tasks will run. These files can be binaries or scripts and their
dependencies, and are executed by the tasks in your jobs. Your tasks can download these files from your
Storage account, or you can use the application packages feature of Batch for application management and
deployment.
3. Create a pool of compute nodes. When you create a pool, you specify the number of compute nodes for the
pool, their size, and the operating system. When each task in your job runs, it's assigned to execute on one of
the nodes in your pool.
4. Create a job. A job manages a collection of tasks. You associate each job to a specific pool where that job's
tasks will run.
5. Add tasks to the job. Each task runs the application or script that you uploaded to process the data files it
downloads from your Storage account. As each task completes, it can upload its output to Azure Storage.
6. Monitor job progress and retrieve the task output from Azure Storage.
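The six steps above can be sketched in pseudocode using names from the Batch .NET client library (BatchClient, CloudPool, CloudJob, CloudTask). Treat the exact signatures as indicative rather than definitive, and assume that credentials, vmConfiguration, and the uploaded inputFiles (steps 1 and 2) already exist:

```
// Pseudocode sketch of the Batch workflow using Batch .NET names.
using (BatchClient batchClient = BatchClient.Open(credentials))
{
    // 3. Create a pool of compute nodes.
    CloudPool pool = batchClient.PoolOperations.CreatePool(
        poolId: "mypool",
        virtualMachineSize: "standard_d1_v2",
        virtualMachineConfiguration: vmConfiguration,
        targetDedicatedComputeNodes: 4);
    pool.Commit();

    // 4. Create a job bound to that pool.
    CloudJob job = batchClient.JobOperations.CreateJob(
        "myjob", new PoolInformation { PoolId = "mypool" });
    job.Commit();

    // 5. Add one task per input file; each task downloads its file
    //    from storage, processes it, and uploads its output.
    var tasks = inputFiles.Select((file, i) =>
        new CloudTask($"task{i}", $"myapp.exe {file}"));
    batchClient.JobOperations.AddTask("myjob", tasks);

    // 6. Monitor the tasks, then retrieve their output from the
    //    Storage account.
}
```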
The following sections discuss these and the other resources of Batch that enable your distributed computational
scenario.
Note: You need a Batch account to use the Batch service. Also, nearly all solutions use an Azure Storage account for
file storage and retrieval. Batch currently supports only the General purpose storage account type, as described in step
5 of Create a storage account in About Azure storage accounts.
Batch service resources
Some of the following resources--accounts, compute nodes, pools, jobs, and tasks--are required by all solutions that
use the Batch service. Others, like job schedules and application packages, are helpful, but optional, features.
Account
Compute node
Pool
Job
o Job schedules
Task
o Start task
o Job manager task
o Job preparation and release tasks
o Multi-instance task (MPI)
o Task dependencies
Application packages
Account
A Batch account is a uniquely identified entity within the Batch service. All processing is associated with a Batch
account.
You can create an Azure Batch account using the Azure portal or programmatically, such as with the Batch
Management .NET library. When creating the account, you can associate an Azure storage account.
Batch supports two account configurations, and you'll need to select the appropriate configuration when you create
your Batch account. The difference between the two account configurations lies in how Batch pools are allocated for
the account. You can either allocate pools of compute nodes in a subscription managed by Azure Batch, or you can
allocate them in your own subscription. The pool allocation mode property for the account determines which
configuration it uses.
To decide which account configuration to use, consider which best fits your scenario:
Batch Service: Batch Service is the default account configuration. For an account created with this
configuration, Batch pools are allocated behind the scenes in Azure-managed subscriptions. Keep in mind
these key points about the Batch Service account configuration:
o The Batch Service account configuration supports both Cloud Service and Virtual Machine pools.
o The Batch Service account configuration supports access to the Batch APIs using either shared key
authentication or Azure Active Directory authentication.
o You can use either dedicated or low-priority compute nodes in pools in the Batch Service account
configuration.
o Do not use the Batch Service account configuration if you plan to create Azure virtual machine pools
from custom VM images, or if you plan to use a virtual network. Create your account with the User
Subscription account configuration instead.
o Virtual Machine pools provisioned in an account with the Batch Service account configuration must be
created from Azure Virtual Machines Marketplace images.
User subscription: With the User Subscription account configuration, Batch pools are allocated in the Azure
subscription where the account is created. Keep in mind these key points about the User Subscription account
configuration:
o The User Subscription account configuration supports only Virtual Machine pools. It does not support
Cloud Services pools.
o To create Virtual Machine pools from custom VM images or to use a virtual network with Virtual
Machine pools, you must use the User Subscription configuration.
o You must authenticate requests to the Batch service using Azure Active Directory authentication.
o The User Subscription account configuration requires you to set up an Azure key vault for your Batch
account.
o You can use only dedicated compute nodes in pools in an account created with the User Subscription
account configuration. Low-priority nodes are not supported.
o Virtual Machine pools provisioned in an account with the User Subscription account configuration can
be created either from Azure Virtual Machines Marketplace images, or from custom images that you
provide.
Compute node
A compute node is an Azure virtual machine (VM) or cloud service VM that is dedicated to processing a portion of
your application's workload. The size of a node determines the number of CPU cores, memory capacity, and local file
system size that is allocated to the node. You can create pools of Windows or Linux nodes by using either Azure Cloud
Services or Virtual Machines Marketplace images. See the following Pool section for more information on these
options.
Nodes can run any executable or script that is supported by the operating system environment of the node. This
includes *.exe, *.cmd, *.bat and PowerShell scripts for Windows--and binaries, shell, and Python scripts for Linux.
All compute nodes in Batch also include:
A standard folder structure and associated environment variables that are available for reference by tasks.
Firewall settings that are configured to control access.
Remote access to both Windows (Remote Desktop Protocol (RDP)) and Linux (Secure Shell (SSH)) nodes.
Pool
A pool is a collection of nodes that your application runs on. The pool can be created manually by you, or
automatically by the Batch service when you specify the work to be done. You can create and manage a pool that
meets the resource requirements of your application. A pool can be used only by the Batch account in which it was
created. A Batch account can have more than one pool.
Azure Batch pools build on top of the core Azure compute platform. They provide large-scale allocation, application
installation, data distribution, health monitoring, and flexible adjustment of the number of compute nodes within a
pool (scaling).
Every node that is added to a pool is assigned a unique name and IP address. When a node is removed from a pool,
any changes that are made to the operating system or files are lost, and its name and IP address are released for
future use. When a node leaves a pool, its lifetime is over.
When you create a pool, you can specify the following attributes. Some settings differ, depending on the pool
allocation mode of the Batch account:
Compute node operating system and version
Compute node type and target number of nodes
Size of the compute nodes
Scaling policy
Task scheduling policy
One unique custom image VHD blob can support up to 40 Linux VM instances or 20 Windows VM instances.
You will need to create copies of the VHD blob to create pools with more VMs. For example, a pool with 200
Windows VMs needs 10 unique VHD blobs specified for the osDisk property.
When you create a pool, you need to select the appropriate nodeAgentSkuId, depending on the OS of the base image
of your VHD. You can get a mapping of available node agent SKU IDs to their OS image references by calling the List
Supported Node Agent SKUs operation.
To create a pool from a custom image using the Azure portal:
1. Navigate to your Batch account in the Azure portal.
2. On the Settings blade, select the Pools menu item.
3. On the Pools blade, select the Add command; the Add pool blade will be displayed.
4. Select Custom Image (Linux/Windows) from the Image Type dropdown. The portal displays the Custom
Image picker. Choose one or more VHDs from the same container and click the Select button. Support for
multiple VHDs from different storage accounts and different containers will be added in the future.
5. Select the correct Publisher/Offer/Sku for your custom VHDs, select the desired Caching mode, then fill in all
the other parameters for the pool.
6. To check if a pool is based on a custom image, see the Operating System property in the resource summary
section of the Pool blade. The value of this property should be Custom VM image.
7. All custom VHDs associated with a pool are displayed on the pool's Properties blade.
Compute node type and target number of nodes
When you create a pool, you can specify which types of compute nodes you want and the target number for each.
The two types of compute nodes are:
Dedicated compute nodes. Dedicated compute nodes are reserved for your workloads. They are more
expensive than low-priority nodes, but they are guaranteed to never be preempted.
Low-priority compute nodes. Low-priority nodes take advantage of surplus capacity in Azure to run your Batch
workloads. Low-priority nodes are less expensive per hour than dedicated nodes, and enable workloads
requiring a lot of compute power. For more information, see Use low-priority VMs with Batch.
Low-priority compute nodes may be preempted when Azure has insufficient surplus capacity. If a node is preempted
while running tasks, the tasks are requeued and run again once a compute node becomes available again. Low-priority
nodes are a good option for workloads where the job completion time is flexible and the work is distributed across
many nodes. Before you decide to use low-priority nodes for your scenario, make sure that any work lost due to
preemption will be minimal and easy to recreate.
Low-priority compute nodes are available only for Batch accounts created with the pool allocation mode set to Batch
Service.
You can have both low-priority and dedicated compute nodes in the same pool. Each type of node (low-priority and
dedicated) has its own target setting, for which you can specify the desired number of nodes.
The number of compute nodes is referred to as a target because, in some situations, your pool might not reach the
desired number of nodes. For example, a pool might not achieve the target if it reaches the core quota for your Batch
account first. Or, the pool might not achieve the target if you have applied an auto-scaling formula to the pool that
limits the maximum number of nodes.
For pricing information for both low-priority and dedicated compute nodes, see Batch Pricing.
Size of the compute nodes
Cloud Services Configuration compute node sizes are listed in Sizes for Cloud Services. Batch supports all Cloud
Services sizes except ExtraSmall, STANDARD_A1_V2, and STANDARD_A2_V2.
Virtual Machine Configuration compute node sizes are listed in Sizes for virtual machines in Azure (Linux) and Sizes for
virtual machines in Azure(Windows). Batch supports all Azure VM sizes except STANDARD_A0 and those with
premium storage (STANDARD_GS, STANDARD_DS, and STANDARD_DSV2 series).
When selecting a compute node size, consider the characteristics and requirements of the applications you'll run on
the nodes. Aspects like whether the application is multithreaded and how much memory it consumes can help
determine the most suitable and cost-effective node size. It's typical to select a node size assuming one task will run
on a node at a time. However, it is possible to have multiple tasks (and therefore multiple application instances) run in
parallel on compute nodes during job execution. In this case, it is common to choose a larger node size to
accommodate the increased demand of parallel task execution. See Task scheduling policy for more information.
All of the nodes in a pool are the same size. If you intend to run applications with differing system requirements
and/or load levels, we recommend that you use separate pools.
Scaling policy
For dynamic workloads, you can write and apply an auto-scaling formula to a pool. The Batch service periodically
evaluates your formula and adjusts the number of nodes within the pool based on various pool, job, and task
parameters that you can specify.
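An auto-scaling formula is a short script that the service evaluates on a schedule. The following sketch uses variable names from the Batch autoscale formula language; treat the exact sampling calls as indicative:

```
// Sample the pending-task count over the last 15 minutes and size
// the pool to match, capped at 25 dedicated nodes.
$samples = $PendingTasks.GetSamplePercent(TimeInterval_Minute * 15);
$tasks = $samples < 70 ? max(0, $PendingTasks.GetSample(1)) :
         avg($PendingTasks.GetSample(TimeInterval_Minute * 15));
$TargetDedicatedNodes = max(0, min($tasks, 25));
```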
Task scheduling policy
The max tasks per node configuration option determines the maximum number of tasks that can be run in parallel on
each compute node within the pool.
The default configuration specifies that one task at a time runs on a node, but there are scenarios where it is
beneficial to have two or more tasks executed on a node simultaneously. See the example scenario in the concurrent
node tasks article to see how you can benefit from multiple tasks per node.
You can also specify a fill type which determines whether Batch spreads the tasks evenly across all nodes in a pool, or
packs each node with the maximum number of tasks before assigning tasks to another node.
Communication status for compute nodes
In most scenarios, tasks operate independently and do not need to communicate with one another. However, there
are some applications in which tasks must communicate, like MPI scenarios.
You can configure a pool to allow internode communication, so that nodes within a pool can communicate at runtime.
When internode communication is enabled, nodes in Cloud Services Configuration pools can communicate with each
other on ports greater than 1100, and Virtual Machine Configuration pools do not restrict traffic on any port.
Note that enabling internode communication also impacts the placement of the nodes within clusters and might limit
the maximum number of nodes in a pool because of deployment restrictions. If your application does not require
communication between nodes, the Batch service can allocate a potentially large number of nodes to the pool from
many different clusters and datacenters to enable increased parallel processing power.
Start tasks for compute nodes
The optional start task executes on each node as that node joins the pool, and each time a node is restarted or
reimaged. The start task is especially useful for preparing compute nodes for the execution of tasks, like installing the
applications that your tasks run on the compute nodes.
Application packages
You can specify application packages to deploy to the compute nodes in the pool. Application packages provide
simplified deployment and versioning of the applications that your tasks run. Application packages that you specify for
a pool are installed on every node that joins that pool, and every time a node is rebooted or reimaged. Application
packages are currently unsupported on Linux compute nodes.
Network configuration
You can specify the subnet of an Azure virtual network (VNet) in which the pool's compute nodes should be created.
See the Pool network configuration section for more information.
Job
A job is a collection of tasks. It manages how computation is performed by its tasks on the compute nodes in a pool.
The job specifies the pool in which the work is to be run. You can create a new pool for each job, or use one
pool for many jobs. You can create a pool for each job that is associated with a job schedule, or for all jobs
that are associated with a job schedule.
You can specify an optional job priority. When a job is submitted with a higher priority than jobs that are
currently in progress, the tasks for the higher-priority job are inserted into the queue ahead of tasks for the
lower-priority jobs. Tasks in lower-priority jobs that are already running are not preempted.
You can use job constraints to specify certain limits for your jobs:
You can set a maximum wallclock time, so that if a job runs for longer than the maximum wallclock time that is
specified, the job and all of its tasks are terminated.
Batch can detect and then retry failed tasks. You can specify the maximum number of task retries as a constraint,
including whether a task is always or never retried. Retrying a task means that the task is requeued to be run again.
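In the Batch .NET library, these limits map to a JobConstraints object on the job; the following is a hedged sketch (exact signature indicative):

```
// Terminate the job (and all of its tasks) if it runs longer than
// two hours, and retry each failed task at most three times.
job.Constraints = new JobConstraints(
    maxWallClockTime: TimeSpan.FromHours(2),
    maxTaskRetryCount: 3);
job.Commit();
```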
Your client application can add tasks to a job, or you can specify a job manager task. A job manager task
contains the information that is necessary to create the required tasks for a job, with the job manager task
being run on one of the compute nodes in the pool. The job manager task is handled specifically by Batch--it is
queued as soon as the job is created, and is restarted if it fails. A job manager task is required for jobs that are
created by a job schedule because it is the only way to define the tasks before the job is instantiated.
By default, jobs remain in the active state when all tasks within the job are complete. You can change this
behavior so that the job is automatically terminated when all tasks in the job are complete. Set the job's
onAllTasksComplete property to terminate the job automatically once its tasks are complete.
Job release task: When a job has completed, a job release task runs on each node in the pool that executed at
least one task. You can use a job release task to delete data that is copied by the job preparation task, or to
compress and upload diagnostic log data, for example.
Both job preparation and release tasks allow you to specify a command line to run when the task is invoked. They
offer features like file download, elevated execution, custom environment variables, maximum execution duration,
retry count, and file retention time.
For more information on job preparation and release tasks, see Run job preparation and completion tasks on Azure
Batch compute nodes.
Multi-instance task
A multi-instance task is a task that is configured to run on more than one compute node simultaneously. With multi-
instance tasks, you can enable high-performance computing scenarios that require a group of compute nodes that are
allocated together to process a single workload (like Message Passing Interface (MPI)).
For a detailed discussion on running MPI jobs in Batch by using the Batch .NET library, check out Use multi-instance
tasks to run Message Passing Interface (MPI) applications in Azure Batch.
Task dependencies
Task dependencies, as the name implies, allow you to specify that a task depends on the completion of other tasks
before its execution. This feature provides support for situations in which a "downstream" task consumes the output
of an "upstream" task--or when an upstream task performs some initialization that is required by a downstream task.
To use this feature, you must first enable task dependencies on your Batch job. Then, for each task that depends on
another (or many others), you specify the tasks which that task depends on.
With task dependencies, you can configure scenarios like the following:
taskB depends on taskA (taskB will not begin execution until taskA has completed).
taskC depends on both taskA and taskB.
taskD depends on a range of tasks, such as tasks 1 through 10, before it executes.
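In Batch .NET, dependencies are expressed with the DependsOn property and the TaskDependencies helper. The three scenarios above can be sketched as follows (pseudocode using Batch .NET names; the command lines are placeholders):

```
// The job must opt in to dependency tracking first.
job.UsesTaskDependencies = true;

// taskB runs only after taskA completes.
var taskB = new CloudTask("taskB", "cmd /c echo B")
{
    DependsOn = TaskDependencies.OnId("taskA")
};

// taskC runs only after both taskA and taskB complete.
var taskC = new CloudTask("taskC", "cmd /c echo C")
{
    DependsOn = TaskDependencies.OnIds("taskA", "taskB")
};

// taskD runs only after tasks with ids "1" through "10" complete.
var taskD = new CloudTask("taskD", "cmd /c echo D")
{
    DependsOn = TaskDependencies.OnIdRange(1, 10)
};
```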
Check out Task dependencies in Azure Batch and the TaskDependencies code sample in the azure-batch-
samples GitHub repository for more in-depth details on this feature.
Environment settings for tasks
Each task executed by the Batch service has access to environment variables that are set on the compute node. This
includes environment variables defined by the Batch service (service-defined) and custom environment variables that
you can define for your tasks. The applications and scripts your tasks execute have access to these environment
variables during execution.
You can set custom environment variables at the task or job level by populating the environment settings property for
these entities. For example, see the Add a task to a job operation (Batch REST API), or
the CloudTask.EnvironmentSettings and CloudJob.CommonEnvironmentSettings properties in Batch .NET.
Your client application or service can obtain a task's environment variables, both service-defined and custom, by using
the Get information about a task operation (Batch REST) or by accessing the CloudTask.EnvironmentSettings property
(Batch .NET). Processes executing on a compute node can access these and other environment variables on the node,
for example, by using the familiar %VARIABLE_NAME% (Windows) or $VARIABLE_NAME (Linux) syntax.
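From inside a task, these variables are read with the ordinary environment-variable API. A minimal stdlib-only sketch follows; the variable is set in-process here only to simulate running on a compute node, where the Batch service sets it for you:

```csharp
using System;

// Simulate a Batch compute node by setting the service-defined
// variable in-process; on a real node the Batch service sets it.
Environment.SetEnvironmentVariable("AZ_BATCH_TASK_WORKING_DIR", @"C:\batch\wd");

// A task script should handle the null case, since the variable is
// absent when the process runs outside a Batch compute node.
string workingDir = Environment.GetEnvironmentVariable("AZ_BATCH_TASK_WORKING_DIR")
                    ?? Environment.CurrentDirectory; // fallback off-node

Console.WriteLine(workingDir); // prints C:\batch\wd
```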
You can find a full list of all service-defined environment variables in Compute node environment variables.
Files and directories
Each task has a working directory under which it creates zero or more files and directories. This working directory can
be used for storing the program that is run by the task, the data that it processes, and the output of the processing it
performs. All files and directories of a task are owned by the task user.
The Batch service exposes a portion of the file system on a node as the root directory. Tasks can access the root
directory by referencing the AZ_BATCH_NODE_ROOT_DIR environment variable. For more information about using
environment variables, see Environment settings for tasks.
The root directory contains the following directory structure:
shared: This directory provides read/write access to all tasks that run on a node. Any task that runs on the
node can create, read, update, and delete files in this directory. Tasks can access this directory by referencing
the AZ_BATCH_NODE_SHARED_DIR environment variable.
startup: This directory is used by a start task as its working directory. All of the files that are downloaded to
the node by the start task are stored here. The start task can create, read, update, and delete files under this
directory. Tasks can access this directory by referencing the AZ_BATCH_NODE_STARTUP_DIR environment
variable.
Tasks: A directory is created for each task that runs on the node. It is accessed by referencing
the AZ_BATCH_TASK_DIR environment variable.
Within each task directory, the Batch service creates a working directory (wd) whose unique path is specified by
the AZ_BATCH_TASK_WORKING_DIR environment variable. This directory provides read/write access to the task. The
task can create, read, update, and delete files under this directory. This directory is retained based on
the RetentionTime constraint that is specified for the task.
stdout.txt and stderr.txt: These files are written to the task folder during the execution of the task.
Important
When a node is removed from the pool, all of the files that are stored on the node are removed.
Application packages
The application packages feature provides easy management and deployment of applications to the compute nodes in
your pools. You can upload and manage multiple versions of the applications run by your tasks, including their binaries
and support files. Then you can automatically deploy one or more of these applications to the compute nodes in your
pool.
You can specify application packages at the pool and task level. When you specify pool application packages, the
application is deployed to every node in the pool. When you specify task application packages, the application is
deployed only to nodes that are scheduled to run at least one of the job's tasks, just before the task's command line is
run.
Batch handles the details of working with Azure Storage to store your application packages and deploy them to
compute nodes, so both your code and management overhead can be simplified.
To find out more about the application package feature, check out Application deployment with Azure Batch
application packages.
Note
If you add pool application packages to an existing pool, you must reboot its compute nodes for the application
packages to be deployed to the nodes.
Pool and compute node lifetime
When you design your Azure Batch solution, you have to make a design decision about how and when pools are
created, and how long compute nodes within those pools are kept available.
On one end of the spectrum, you can create a pool for each job that you submit, and delete the pool as soon as its
tasks finish execution. This maximizes utilization because the nodes are only allocated when needed, and shut down
as soon as they're idle. While this means that the job must wait for the nodes to be allocated, it's important to note
that tasks are scheduled for execution as soon as nodes are individually available, allocated, and the start task has
completed. Batch does not wait until all nodes within a pool are available before assigning tasks to the nodes. This
ensures maximum utilization of all available nodes.
At the other end of the spectrum, if having jobs start immediately is the highest priority, you can create a pool ahead
of time and make its nodes available before jobs are submitted. In this scenario, tasks can start immediately, but
nodes might sit idle while waiting for them to be assigned.
A combined approach is typically used for handling a variable, but ongoing, load. You can have a pool that multiple
jobs are submitted to, but can scale the number of nodes up or down according to the job load (see Scaling compute
resources in the following section). You can do this reactively, based on current load, or proactively, if load can be
predicted.
Pool network configuration
When you create a pool of compute nodes in Azure Batch, you can specify a subnet ID of an Azure virtual network
(VNet) in which the pool's compute nodes should be created.
The VNet must be:
o In the same Azure region as the Azure Batch account.
o In the same subscription as the Azure Batch account.
The type of VNet supported depends on how pools are being allocated for the Batch account:
o If the Batch account was created with its poolAllocationMode property set to 'BatchService', then the
specified VNet must be a classic VNet.
o If the Batch account was created with its poolAllocationMode property set to 'UserSubscription', then
the specified VNet may be a classic VNet or an Azure Resource Manager VNet. Pools must be created
with a virtual machine configuration in order to use a VNet. Pools created with a cloud service
configuration are not supported.
If the Batch account was created with its poolAllocationMode property set to 'BatchService', then you must
provide permissions for the Batch service principal to access the VNet. The Batch service principal, named
'Microsoft Azure Batch' or 'MicrosoftAzureBatch', must have the Classic Virtual Machine Contributor Role-
Based Access Control (RBAC) role for the specified VNet. If the specified RBAC role is not provided, the Batch
service returns 400 (Bad Request).
The specified subnet should have enough free IP addresses to accommodate the total number of target
nodes; that is, the sum of the targetDedicatedNodes and targetLowPriorityNodes properties of the pool. If the
subnet doesn't have enough free IP addresses, the Batch service partially allocates the compute nodes in the
pool and returns a resize error.
The specified subnet must allow communication from the Batch service to be able to schedule tasks on the
compute nodes. If communication to the compute nodes is denied by a Network Security Group
(NSG) associated with the VNet, then the Batch service sets the state of the compute nodes to unusable.
If the specified VNet has any associated Network Security Groups (NSG), then a few reserved system ports
must be enabled for inbound communication. For pools created with a virtual machine configuration, enable
ports 29876 and 29877, as well as port 22 for Linux and port 3389 for Windows. For pools created with a
cloud service configuration, enable ports 10100, 20100, and 30100. Additionally, enable outbound
connections to Azure Storage on port 443.
The following table describes the inbound ports that you need to enable for pools that you created with the virtual
machine configuration:
For pools created with the virtual machine configuration, inbound traffic on destination ports 29876 and 29877
must be allowed from the Batch service role IP addresses, and these ports are required for the VM to be usable.
No action is needed from you: Batch adds NSGs at the level of the network interfaces (NICs) attached to the VMs,
and these NSGs allow traffic only from Batch service role IP addresses. Even if you specify * as the source IP in
your own NSG, Batch still adds its NSGs at the NIC level.
The following outbound port must be enabled to permit access to Azure Storage:
Destination port(s): 443
Destination: Azure Storage
Action from user: Enable outbound traffic to Azure Storage on port 443.
Additional settings for the VNet depend on the pool allocation mode of the Batch account.
VNets for pools provisioned in the Batch service
In Batch service allocation mode, only Cloud Services Configuration pools can be assigned a VNet. Additionally, the
specified VNet must be a classic VNet. VNets created with the Azure Resource Manager deployment model are not
supported.
The MicrosoftAzureBatch service principal must have the Classic Virtual Machine Contributor Role-Based
Access Control (RBAC) role for the specified VNet. In the Azure portal:
o Select the VNet, then Access control (IAM) > Roles > Classic Virtual Machine Contributor > Add
o Enter "MicrosoftAzureBatch" in the Search box
o Check the MicrosoftAzureBatch check box
o Select the Select button
File upload errors can occur if the SAS supplied for accessing Azure Storage is invalid or does not provide write
permissions, if the storage account is no longer available, or if another issue was encountered that prevented the
successful copying of files from the node.
Application failures
The process that is specified by the task's command line can also fail. The process is deemed to have failed when a
nonzero exit code is returned by the process that is executed by the task (see Task exit codes in the next section).
For application failures, you can configure Batch to automatically retry the task up to a specified number of times.
Constraint failures
You can set a constraint that specifies the maximum execution duration for a job or task, the maxWallClockTime. This
can be useful for terminating tasks that fail to progress.
When the maximum amount of time has been exceeded, the task is marked as completed, but the exit code is set
to 0xC000013A and the schedulingError field is marked as { "category": "ServerError", "code": "TaskEnded" }.
Debugging application failures
stderr and stdout
During execution, an application might produce diagnostic output that you can use to troubleshoot issues. As
mentioned in the earlier section Files and directories, the Batch service writes standard output and standard error
output to stdout.txt and stderr.txt files in the task directory on the compute node. You can use the Azure portal or one
of the Batch SDKs to download these files. For example, you can retrieve these and other files for troubleshooting
purposes by using ComputeNode.GetNodeFile and CloudTask.GetNodeFile in the Batch .NET library.
Task exit codes
As mentioned earlier, a task is marked as failed by the Batch service if the process that is executed by the task returns
a nonzero exit code. When a task executes a process, Batch populates the task's exit code property with the return
code of the process. It is important to note that a task's exit code is not determined by the Batch service. A task's exit
code is determined by the process itself or the operating system on which the process executed.
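The retry behavior described above can be sketched locally: a task is considered failed when its process returns a nonzero exit code, and a retry policy reruns it up to a maximum count. This is an illustrative stand-in using plain subprocesses, not the Batch service's own implementation:

```python
# Sketch of a Batch-style retry policy: rerun a command while it exits
# nonzero, up to a maximum number of retries, and report the final code.
import subprocess
import sys

def run_with_retries(cmd, max_retries=3):
    """Run cmd, retrying while it exits nonzero; return the final exit code."""
    exit_code = 1
    for _ in range(max_retries + 1):  # initial run plus retries
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            return 0  # success: stop retrying
    return exit_code  # still nonzero after exhausting all retries

# A command that always exits with code 2, standing in for a failing task:
failing = [sys.executable, "-c", "raise SystemExit(2)"]
print(run_with_retries(failing, max_retries=2))  # 2
```

As in Batch, the exit code itself comes from the process or operating system; the retry logic only reacts to it.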
Accounting for task failures or interruptions
Tasks might occasionally fail or be interrupted. The task application itself might fail, the node on which the task is
running might be rebooted, or the node might be removed from the pool during a resize operation if the pool's
deallocation policy is set to remove nodes immediately without waiting for tasks to finish. In all cases, the task can be
automatically requeued by Batch for execution on another node.
It is also possible for an intermittent issue to cause a task to hang or take too long to execute. You can set the
maximum execution interval for a task. If the maximum execution interval is exceeded, the Batch service interrupts
the task application.
Connecting to compute nodes
You can perform additional debugging and troubleshooting by signing in to a compute node remotely. You can use the
Azure portal to download a Remote Desktop Protocol (RDP) file for Windows nodes and obtain Secure Shell (SSH)
connection information for Linux nodes. You can also do this by using the Batch APIs--for example, with Batch
.NET or Batch Python.
Important
To connect to a node via RDP or SSH, you must first create a user on the node. To do this, you can use the Azure
portal, add a user account to a node by using the Batch REST API, call
the ComputeNode.CreateComputeNodeUser method in Batch .NET, or call the add_user method in the Batch Python
module.
Troubleshooting problematic compute nodes
In situations where some of your tasks are failing, your Batch client application or service can examine the metadata
of the failed tasks to identify a misbehaving node. Each node in a pool is given a unique ID, and the node on which a
task runs is included in the task metadata. After you've identified a problem node, you can take several actions with it:
Reboot the node (REST | .NET)
Restarting the node can sometimes clear up latent issues like stuck or crashed processes. Note that if your pool uses a
start task or your job uses a job preparation task, they are executed when the node restarts.
Reimage the node (REST | .NET)
This reinstalls the operating system on the node. As with rebooting a node, start tasks and job preparation tasks are
rerun after the node has been reimaged.
Remove the node from the pool (REST | .NET)
Sometimes it is necessary to completely remove the node from the pool.
3. Under Name, provide a name for the WebJob. The name must start with a letter or a number and cannot
contain any special characters other than "-" and "_".
4. In the How to Run box, choose Run on Demand.
5. In the File Upload box, click the folder icon and browse to the zip file that contains your script. The zip file
should contain your executable (.exe .cmd .bat .sh .php .py .js) as well as any supporting files needed to run
the program or script.
6. Click Create to upload the script to your web app.
The name you specified for the WebJob appears in the list on the WebJobs blade.
7. To run the WebJob, right-click its name in the list and click Run.
Note: when deploying a WebJob from Visual Studio, make sure to mark your settings.job file properties as 'Copy if
newer'.
Create a scheduled WebJob using the Azure Scheduler
The following alternate technique makes use of the Azure Scheduler. In this case, your WebJob does not have any
direct knowledge of the schedule. Instead, the Azure Scheduler gets configured to trigger your WebJob on a schedule.
The Azure Portal doesn't yet have the ability to create a scheduled WebJob, but until that feature is added you can do
it by using the classic portal.
1. In the classic portal go to the WebJob page and click Add.
2. In the How to Run box, choose Run on a schedule.
3. Choose the Scheduler Region for your job, and then click the arrow on the bottom right of the dialog to
proceed to the next screen.
4. In the Create Job dialog, choose the type of Recurrence you want: One-time job or Recurring job.
6. If you want to start at a specific time, choose your starting time values under Starting On.
7. If you chose a recurring job, you have the Recur Every option to specify the frequency of occurrence and
the Ending On option to specify an ending time.
8. If you choose Weeks, you can select the On a Particular Schedule box and specify the days of the week that
you want the job to run.
9. If you choose Months and select the On a Particular Schedule box, you can set the job to run on particular
numbered Days in the month.
10. If you choose Week Days, you can select which day or days of the week in the month you want the job to run
on.
11. Finally, you can also use the Occurrences option to choose which week in the month (first, second, third etc.)
you want the job to run on the week days you specified.
12. After you have created one or more jobs, their names will appear on the WebJobs tab with their status,
schedule type, and other information. Historical information for the last 30 WebJobs is maintained.
3. The Job Action page opens, where you can further configure the job.
2. Clicking the link opens the details page for the WebJob. This page shows you the name of the command run,
the last times it ran, and its success or failure. Under Recent job runs, click a time to see further details.
3. The WebJob Run Details page appears. Click Toggle Output to see the text of the log contents. The output log
is in text format.
4. To see the output text in a separate browser window, click the download link. To download the text itself,
right-click the link and use your browser options to save the file contents.
5. The WebJobs link at the top of the page provides a convenient way to get to a list of WebJobs on the history
dashboard.
Clicking one of these links takes you to the WebJob Details page for the job you selected.
Notes
Web apps in Free mode can time out after 20 minutes if there are no requests to the scm (deployment) site
and the web app's portal is not open in Azure. Requests to the actual site will not reset this.
Code for a continuous job needs to be written to run in an endless loop.
Continuous jobs run continuously only when the web app is up.
Basic and Standard modes offer the Always On feature which, when enabled, prevents web apps from
becoming idle.
You can only debug continuously running WebJobs. Debugging scheduled or on-demand WebJobs is not
supported.
Prerequisites
A GitHub account with at least one project.
An Azure subscription. If you don't have one, create a free account before you begin.
Add Function Apps to your portal favorites
If you haven't already done so, add Function Apps to your favorites in the Azure portal. This makes it easier to find
your function apps. If you have already done this, skip to the next section.
1. Log in to the Azure portal.
2. Click the arrow at the bottom left to expand all services, type Functions in the Filter field, and then click the
star next to Function Apps.
This adds the Functions icon to the menu on the left of the portal.
3. Close the menu, then scroll down to the bottom to see the Functions icon. Click this icon to see a list of all
your function apps. Click your function app to work with functions in this app.
App name: A globally unique name that identifies your new function app.
Resource Group: Name for the new resource group in which to create your function app (for example, myResourceGroup).
Hosting plan: Hosting plan that defines how resources are allocated to your function app. In the default Consumption Plan, resources are added dynamically as required by your functions, and you only pay for the time your functions run.
Location: Choose a location near you or near other services your functions will access (for example, West Europe).
Storage account: A globally unique name for the new storage account used by your function app. You can also use an existing account.
2. Select the GitHubWebHook template for your desired language. Name your function, then select Create.
3. In your new function, click </> Get function URL, then copy and save the values. Do the same thing for </> Get
GitHub secret. You use these values to configure the webhook in GitHub.
1. In GitHub, navigate to a repository that you own. You can also use any repository that you have forked. If you
need to fork a repository, use https://github.com/Azure-Samples/functions-quickstart.
2. Click Settings, then click Webhooks, and Add webhook.
Payload URL: Use the value returned by </> Get function URL.
Secret: Use the value returned by </> Get GitHub secret.
Event triggers: Choose Let me select individual events, then select only Issue comment. We only want to trigger on issue comment events.
Now, the webhook is configured to trigger your function when a new issue comment is added.
Test the function
1. In your GitHub repository, open the Issues tab in a new browser window.
2. In the new window, click New Issue, type a title, and then click Submit new issue.
3. In the issue, type a comment and click Comment.
4. Go back to the portal and view the logs. You should see a trace entry with the new comment text.
Clean up resources
Other quick starts in this collection build upon this quick start. If you plan to continue on to work with subsequent
quick starts or with the tutorials, do not clean up the resources created in this quick start.
If you do not plan to continue, click the Resource group for the function app in the portal, and then click Delete.
3. Let's create a job that simply hits http://www.microsoft.com/ with a GET request. In the Scheduler Job screen,
enter the following information:
a. Name: getmicrosoft
b. Subscription: Your Azure subscription
c. Job Collection: Select an existing job collection, or click Create New > enter a name.
4. Next, in Action Settings, define the following values:
a. Action Type: HTTP
b. Method: GET
c. URL: http://www.microsoft.com
5. Finally, let's define a schedule. The job could be defined as a one-time job, but let's pick a recurrence
schedule:
a. Recurrence: Recurring
b. Start: Today's date
c. Recur every: 12 Hours
d. End by: Two days from today's date
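The recurrence above can be sanity-checked with a short sketch of the run times it produces; the concrete start date here is illustrative, not Scheduler code:

```python
# Sketch: enumerate the run times of a "recur every 12 hours" schedule
# that ends two days after it starts.
from datetime import datetime, timedelta

start = datetime(2017, 1, 1, 0, 0)       # "Start: today's date" (illustrative)
end = start + timedelta(days=2)          # "End by: two days from today"
interval = timedelta(hours=12)           # "Recur every: 12 Hours"

runs = []
t = start
while t <= end:
    runs.append(t)
    t += interval

print(len(runs))  # 5 executions: at 0h, 12h, 24h, 36h, and 48h after start
```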
6. Click Create
Manage and monitor jobs
Once a job is created, it appears in the main Azure dashboard. Click the job and a new window opens with the
following tabs:
1. Properties
2. Action Settings
3. Schedule
4. History
5. Users
Properties
These read-only properties describe the management metadata for the Scheduler job.
Action settings
Clicking on a job in the Jobs screen allows you to configure that job. This lets you configure advanced settings, if you
didn't configure them in the quick-create wizard.
For all action types, you may change the retry policy and the error action.
For HTTP and HTTPS job action types, you may change the method to any allowed HTTP verb. You may also add,
delete, or change the headers and basic authentication information.
For storage queue action types, you may change the storage account, queue name, SAS token, and body.
For service bus action types, you may change the namespace, topic/queue path, authentication settings, transport
type, message properties, and message body.
Schedule
This lets you reconfigure the schedule, if you'd like to change the schedule you created in the quick-create wizard.
This is an opportunity to build complex schedules and advanced recurrence in your job
You may change the start date and time, recurrence schedule, and the end date and time (if the job is recurring.)
History
The History tab displays selected metrics for every job execution in the system for the selected job. These metrics
provide real-time values regarding the health of your Scheduler:
1. Status
2. Details
3. Retry attempts
4. Occurrence: 1st, 2nd, 3rd, etc.
5. Start time of execution
6. End time of execution
You can click on a run to view its History Details, including the whole response for every execution. This dialog box also
allows you to copy the response to the clipboard.
Users
Azure Role-Based Access Control (RBAC) enables fine-grained access management for Azure Scheduler. To learn how
to use the Users tab, refer to Azure Role-Based Access Control
                   Hybrid Connections   WCF Relays
WCF                                     x
.NET Core          x
.NET Framework     x                    x
JavaScript/NodeJS  x
Hybrid Connections
The Azure Relay Hybrid Connections capability is a secure, open-protocol evolution of the existing Relay features that
can be implemented on any platform and in any language that has a basic WebSocket capability, which explicitly
includes the WebSocket API in common web browsers. Hybrid Connections is based on HTTP and WebSockets.
WCF Relays
The WCF Relay works for the full .NET Framework (NETFX) and for WCF. You initiate the connection between your on-
premises service and the relay service using a suite of WCF "relay" bindings. Behind the scenes, the relay bindings map
to new transport binding elements designed to create WCF channel components that integrate with Service Bus in the
cloud.
Service history
Hybrid Connections supplants the former, similarly named "BizTalk Services" feature that was built on the Azure
Service Bus WCF Relay. The new Hybrid Connections capability complements the existing WCF Relay feature and these
two service capabilities exist side-by-side in the Azure Relay service for the foreseeable future. They share a common
gateway, but are otherwise different implementations.
The security token that must be used to register the listener and maintain the control channel may expire while the
listener is active. The token expiry will not affect ongoing connections, but it will cause the control channel to be
dropped by the service at or soon after the instant of expiry. The "renew" operation is a JSON message that the
listener can send to replace the token associated with the control channel, so that the control channel can be
maintained for extended periods.
Ping
If the control channel stays idle for a long time, intermediaries on the way, such as load balancers or NATs may drop
the TCP connection. The "ping" operation avoids that by sending a small amount of data on the channel that reminds
everyone on the network route that the connection is meant to be alive, and it also serves as a "liveness" test for the
listener. If the ping fails, the control channel should be considered unusable and the listener should reconnect.
Sender interaction
The sender only has a single interaction with the service; it connects.
Connect
The "connect" operation opens a web socket on the service, providing the name of the Hybrid Connection and
(optionally, but required by default) a security token conferring "Send" permission in the query string. The service will
then interact with the listener in the way described previously, and have the listener create a rendezvous connection
that will be joined with this web socket. After the web socket has been accepted, all further interactions on the web
socket will therefore be with a connected listener.
Interaction summary
The result of this interaction model is that the sender client comes out of the handshake with a "clean" web socket
which is connected to a listener and that needs no further preambles or preparation. This allows practically any
existing web socket client implementation to readily take advantage of the Hybrid Connections service by simply
supplying a correctly-constructed URL into their web socket client layer.
The rendezvous connection web Socket that the listener obtains through the accept interaction is also clean and can
be handed to any existing web socket server implementation with some minimal extra abstraction that distinguishes
between "accept" operations on their framework's local network listeners and Hybrid Connections remote "accept"
operations.
Protocol reference
This section describes the details of the protocol interactions described above.
All web socket connections are made on port 443 as an upgrade from HTTPS 1.1, which is commonly abstracted by
some web socket framework or API. The description here is kept implementation neutral, without suggesting a
specific framework.
Listener protocol
The listener protocol consists of two connection gestures and three message operations.
Listener control channel connection
The control channel is opened by creating a web socket connection to:
wss://{namespace-address}/$hc/{path}?sb-hc-action=...[&sb-hc-id=...]&sb-hc-token=...
The namespace-address is the fully qualified domain name of the Azure Relay namespace that hosts the Hybrid
Connection, typically of the form {myname}.servicebus.windows.net.
The query string parameter options are as follows.
sb-hc-action (required): For the listener role, the parameter must be sb-hc-action=listen.
{path} (required): The URL-encoded namespace path of the preconfigured Hybrid Connection
to register this listener on. This expression is appended to the fixed $hc/ path portion.
sb-hc-token (required*): The listener must provide a valid, URL-encoded Service Bus Shared Access
Token for the namespace or Hybrid Connection that confers the Listen right.
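Assembling the listener URL from these parameters can be sketched as follows; the namespace, path, and token values are placeholders, and the real token would come from a Shared Access policy:

```python
# Sketch: build the listener control-channel URL for Hybrid Connections,
# per the pattern wss://{namespace}/$hc/{path}?sb-hc-action=listen&sb-hc-token=...
from urllib.parse import quote, urlencode

def listener_url(namespace: str, path: str, token: str) -> str:
    """Compose the wss:// URL for opening the listener control channel."""
    query = urlencode({"sb-hc-action": "listen", "sb-hc-token": token})
    return f"wss://{namespace}/$hc/{quote(path)}?{query}"

# Placeholder namespace, Hybrid Connection path, and token:
print(listener_url("myname.servicebus.windows.net", "myhyco", "SharedAccessSignature sr=..."))
```

urlencode takes care of URL-encoding the token, as the table requires.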
If the web socket connection fails due to the Hybrid Connection path not being registered, or an invalid or missing
token, or some other error, the error feedback will be provided using the regular HTTP 1.1 status feedback model. The
status description will contain an error tracking-id that can be communicated to Azure support:
404 Not Found The Hybrid Connection path is invalid or the base URL is malformed.
403 Forbidden The security token is not valid for this path for this action.
If the web socket connection is intentionally shut down by the service after it was initially set up, the reason for doing
so will be communicated using an appropriate web socket protocol error code along with a descriptive error message
that will also include a tracking ID. The service will not shut down the control channel without encountering an error
condition. Any clean shutdown is client controlled.
WS Status Description
1008 The security token has expired and the authorization policy is therefore violated.
Accept handshake
The accept notification is sent by the service to the listener over the previously established control channel as a JSON
message in a web socket text frame. There is no reply to this message.
The message contains a JSON object named "accept," which defines the following properties at this time:
o address: the URL string to be used for establishing the web socket to the service to accept an incoming
connection.
o id: the unique identifier for this connection. If the ID was supplied by the sender client, it is the sender-supplied
value; otherwise, it is a system-generated value.
o connectHeaders: all HTTP headers that have been supplied to the Relay endpoint by the sender, which also
include the Sec-WebSocket-Protocol and Sec-WebSocket-Extensions headers.
Accept Message
{
    "accept" : {
        "address" : "wss://168.61.148.205:443/$hc/{path}?...",
        "id" : "4cb542c3-047a-4d40-a19f-bdc66441e736",
        "connectHeaders" : {
            "Host" : "...",
            "Sec-WebSocket-Protocol" : "...",
            "Sec-WebSocket-Extensions" : "..."
        }
    }
}
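A listener implementation needs to pull the rendezvous address, connection id, and any subprotocol out of this message. A minimal parsing sketch, using a made-up header value in place of the elided "..." fields:

```python
# Sketch: parse an "accept" control message like the one above and extract
# the fields a listener needs. The message content here is illustrative.
import json

message = '''{
  "accept": {
    "address": "wss://168.61.148.205:443/$hc/mypath?...",
    "id": "4cb542c3-047a-4d40-a19f-bdc66441e736",
    "connectHeaders": {
      "Host": "myname.servicebus.windows.net",
      "Sec-WebSocket-Protocol": "wssubprotocol"
    }
  }
}'''

accept = json.loads(message)["accept"]
address = accept["address"]          # where to open the rendezvous web socket
conn_id = accept["id"]               # unique identifier for this connection
subprotocol = accept["connectHeaders"].get("Sec-WebSocket-Protocol")
print(conn_id, subprotocol)
```

If a subprotocol is present, the listener should only accept the socket when it supports that protocol, as described below.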
The address URL provided in the JSON message is used by the listener to establish the web socket for accepting or
rejecting the sender socket.
Accepting the Socket
To accept, the listener establishes a web socket connection to the provided address.
If the "accept" message carries a "Sec-WebSocket-Protocol" header, it is expected that the listener will only accept the
web socket if it supports that protocol and that it sets the header as the web socket is established.
The same applies to the "Sec-WebSocket-Extensions" header. If the framework supports an extension, it should set
the header to the server side reply of the required "Sec-WebSocket-Extensions" handshake for the extension.
The URL must be used as-is for establishing the accept socket, but contains the following parameters:
The {path} is the URL-encoded namespace path of the preconfigured Hybrid Connection on which to register this
listener. This expression is appended to the fixed $hc/ path portion.
The path expression MAY be extended with a suffix and a query string expression that follows the registered name
after a separating forward slash. This allows the sender client to pass dispatch arguments to the accepting listener
when it is not possible to include HTTP headers. The expectation is that the listener framework will parse out the fixed
path portion and the registered name from the path and make the remainder, possibly without any query string
arguments prefixed by "sb-", available to the application for deciding whether to accept the connection.
For more details see the following "Sender Protocol" section.
If there's an error, the service may reply with a regular HTTP status code, as described earlier.
After the connection has been established, the server will shut down the web socket when the sender web socket
shuts down, or with the following status.
WS Status Description
1008 The security token has expired and therefore the authorization policy is violated.
The protocol design choice here is to use a web socket handshake (that is designed to end in a defined error state) so
that listener client implementations can continue to rely on a web socket client and don't need to employ an extra,
bare HTTP client.
To reject the socket, the client takes the address URI from the "accept" message and appends two query string
parameters to it that carry a numeric status code and a textual reason for the rejection.
Sender protocol
The sender protocol is effectively identical to how a listener is established. The goal is maximum transparency for the
end-to-end web socket. The address to connect to is the same as for the listener, but the "action" differs and the
token needs a different permission:
wss://{namespace-address}/$hc/{path}?sb-hc-action=...&sb-hc-id=...&sb-hc-token=...
The namespace-address is the fully qualified domain name of the Azure Relay namespace that hosts the Hybrid
Connection, typically of the form {myname}.servicebus.windows.net.
The request may contain arbitrary extra HTTP headers, including application-defined ones. All supplied headers flow
to the listener and can be found on the "connectHeaders" object of the "accept" control message.
The query string parameter options are as follows.
sb-hc-action (required): For the sender role, the parameter must be sb-hc-action=connect.
sb-hc-token (required*): The sender must provide a valid, URL-encoded Service Bus Shared Access
Token for the namespace or Hybrid Connection that confers the Send right.
The {path} is the URL-encoded namespace path of the preconfigured Hybrid Connection to connect to.
The path expression MAY be extended with a suffix and a query string expression to communicate further. If the
Hybrid Connection is registered under the path "hyco," the path expression can
be hyco/suffix?param=value&... followed by the query string parameters defined here. A complete expression may
then be:
wss://{namespace-address}/$hc/hyco/suffix?param=value&sb-hc-action=...[&sb-hc-id=...&]sb-hc-token=...
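Composing such a sender URL, with an optional path suffix and application-defined query parameters, can be sketched like this; the names and values are placeholders:

```python
# Sketch: build a sender connect URL for Hybrid Connections, optionally
# extending the path with a suffix and application-defined parameters.
from urllib.parse import urlencode

def sender_url(namespace, path, token, suffix="", app_params=None):
    """Compose the wss:// URL a sender uses to connect."""
    params = dict(app_params or {})                 # application-defined parameters first
    params.update({"sb-hc-action": "connect",       # sender role
                   "sb-hc-token": token})           # token conferring the Send right
    path_expr = f"{path}/{suffix}" if suffix else path
    return f"wss://{namespace}/$hc/{path_expr}?{urlencode(params)}"

# Placeholder namespace, path "hyco", suffix, and parameter, as in the example above:
print(sender_url("myname.servicebus.windows.net", "hyco", "SAS...",
                 suffix="suffix", app_params={"param": "value"}))
```

The suffix and application parameters are passed through to the listener in the "accept" message's address URI, as noted below.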
The path expression is passed through to the listener in the address URI contained in the "accept" control message.
If the web socket connection fails due to the Hybrid Connection path not being registered, or an invalid or missing
token, or some other error, the error feedback will be provided using the regular HTTP 1.1 status feedback model. The
status description will contain an error tracking-id that can be communicated to Azure support:
404 Not Found The Hybrid Connection path is invalid or the base URL is malformed.
403 Forbidden The security token is not valid for this path for this action.
If the web socket connection is intentionally shut down by the service after it has been initially set up, the reason for
doing so will be communicated using an appropriate web socket protocol error code along with a descriptive error
message that will also include a tracking ID.
WS Status Description
1008 The security token has expired and therefore the authorization policy is violated.
To enable VNET Integration, open your app Settings and then select Networking. The UI that opens offers three
networking choices. This guide covers only VNET Integration; Hybrid Connections and App Service Environments are
discussed later in this document.
If your app is not in the correct pricing plan, the UI will helpfully enable you to scale your plan to a higher pricing plan
of your choice.
To enable integration, click the VNET you wish to integrate with. After you select the VNET, your app is
automatically restarted for the changes to take effect.
Enable Point to Site in a Classic VNET
If your VNET does not have a gateway or Point to Site configured, you have to set that up first. To do this for a Classic
VNET, go to the Azure Portal and bring up the list of Virtual Networks (classic). From here, click the network you
want to integrate with, and click the big box under Essentials called VPN Connections. From here you can create
your point-to-site VPN and even have it create a gateway. After you go through the point-to-site-with-gateway
creation experience, it will be about 30 minutes before it is ready.
Azure VNETs are normally created within private network address ranges. By default, the VNET Integration feature
routes any traffic destined for those IP address ranges into your VNET. The private IP address ranges are:
10.0.0.0/8 - this is the same as 10.0.0.0 - 10.255.255.255
172.16.0.0/12 - this is the same as 172.16.0.0 - 172.31.255.255
192.168.0.0/16 - this is the same as 192.168.0.0 - 192.168.255.255
The VNET address space needs to be specified in CIDR notation. If you are unfamiliar with CIDR notation, it is a
method for specifying address blocks using an IP address and an integer that represents the network mask. As a quick
reference, consider that 10.1.0.0/24 would be 256 addresses and 10.1.0.0/25 would be 128 addresses. An IPv4
address with a /32 would be just 1 address.
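The address counts quoted above can be verified with the standard library's ipaddress module:

```python
# Verify the CIDR block sizes mentioned above using the stdlib ipaddress module.
import ipaddress

for cidr in ("10.1.0.0/24", "10.1.0.0/25", "10.1.0.5/32"):
    # strict=False allows a host address (like 10.1.0.5/32) as input.
    net = ipaddress.ip_network(cidr, strict=False)
    print(cidr, net.num_addresses)
# /24 -> 256 addresses, /25 -> 128 addresses, /32 -> 1 address
```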
If you set the DNS server information here then that will be set for your VNET. After VNET creation you can edit this
information from the VNET user experiences.
When you create a Classic VNET using the VNET Integration UI, it will create a VNET in the same resource group as
your app.
How the system works
Under the covers this feature builds on top of Point-to-Site VPN technology to connect your app to your VNET. Apps in
Azure App Service have a multi-tenant system architecture, which precludes provisioning an app directly in a VNET as
is done with virtual machines. By building on point-to-site technology, we limit network access to just the virtual
machines hosting your apps.
From the Network Feature Status page you can see if your app is connected to your VNET. If your VNET gateway is
down for whatever reason then this would show as not-connected.
The information you now have available to you in the app level VNET Integration UI is the same as the detail
information you get from the ASP. Here are those items:
VNET Name - This link opens the network UI
Location - This reflects the location of your VNET. It is possible to integrate with a VNET in another location.
Certificate Status - There are certificates used to secure the VPN connection between the app and the VNET.
This reflects a test to ensure they are in sync.
Gateway Status - Should your gateways be down for whatever reason then your app cannot access resources
in the VNET.
VNET address space - This is the IP address space for your VNET.
Point to Site address space - This is the point to site IP address space for your VNET. Your app will show
communication as coming from one of the IPs in this address space.
Site to site address space - You can use Site to Site VPNs to connect your VNET to your on premise resources
or to other VNETs. Should you have that configured then the IP ranges defined with that VPN connection will
show here.
DNS Servers - If you have DNS Servers configured with your VNET then they are listed here.
IPs routed to the VNET - This is the list of IP addresses for which your VNET has routing defined. Those
addresses show here.
The only operation you can take in the app view of your VNET Integration is to disconnect your app from the VNET it is
currently connected to. To do this, simply click Disconnect at the top. This action does not change your VNET. The
VNET and its configuration, including the gateways, remain unchanged. If you then want to delete your VNET, you
need to first delete the resources in it, including the gateways.
The App Service Plan view has a number of additional operations, and it is accessed differently than from the app. To
reach the ASP networking UI, open your ASP UI and scroll down to the UI element called Network Feature
Status. It gives some minor details about your VNET Integration. Clicking on it opens the Network Feature
Status UI. If you then click "Click here to manage", you open the UI that lists the VNET Integrations in this ASP.
The location of the ASP is good to remember when looking at the locations of the VNETs you are integrating with.
When the VNET is in another location you are far more likely to see latency issues.
The VNETs integrated with count is a reminder of how many VNETs your apps in this ASP are integrated with, and how
many you can have.
To see added details on each VNET, just click on the VNET you are interested in. In addition to the details that were
noted earlier you will also see a list of the apps in this ASP that are using that VNET.
There are two primary actions. The first is the ability to add routes that drive traffic leaving
your app into your VNET. The second is the ability to sync certificates and network information.
Routing
As noted earlier, the routes defined in your VNET are used to direct traffic into your VNET from your app. Some
customers, though, want to send additional outbound traffic from an app into the VNET, and this capability is provided
for them. What happens to the traffic after that is up to how the customer configures their VNET.
Certificates
The Certificate Status reflects a check performed by App Service to validate that the certificates used for the VPN
connection are still good. When VNET Integration is enabled, if this is the first integration to that VNET from any app
in this ASP, there is a required exchange of certificates to ensure the security of the connection. Along with the
certificates, we get the DNS configuration, routes, and other similar things that describe the network. If those
certificates or network information is changed, then you will need to click "Sync Network". NOTE: When you click
"Sync Network", you will cause a brief outage in connectivity between your app and your VNET. While your app will
not be restarted, the loss of connectivity could cause your site to not function properly.
DNS is not accessible The DNS timeout is 3 seconds per DNS server. If you have 2 DNS servers that is 6
seconds. Use nameresolver to see if DNS is working. Remember you can't use nslookup as that does not use
the DNS your VNET is configured with.
Invalid P2S IP range The point to site IP range needs to be in the RFC 1918 private IP ranges (10.0.0.0-
10.255.255.255 / 172.16.0.0-172.31.255.255 / 192.168.0.0-192.168.255.255). If the range uses IPs outside of
that, then things won't work.
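That RFC 1918 check can be sketched with Python's standard-library ipaddress module (the sample CIDR blocks below are illustrative):

```python
import ipaddress

# The three RFC 1918 private IPv4 blocks a P2S range must fall within.
RFC1918 = [ipaddress.ip_network(n) for n in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def is_valid_p2s_range(cidr: str) -> bool:
    """Return True if the point-to-site block sits inside an RFC 1918 range."""
    net = ipaddress.ip_network(cidr)
    return any(net.subnet_of(private) for private in RFC1918)

print(is_valid_p2s_range("172.20.0.0/28"))  # True: inside 172.16.0.0/12
print(is_valid_p2s_range("11.0.0.0/24"))    # False: public address space
```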
If those items don't answer your problem, look first for the simple things like:
Does the Gateway show as being up in the Portal?
Do certificates show as being in sync?
Did anybody change the network configuration without doing a "Sync Network" in the affected ASPs?
If your gateway is down, then bring it back up. If your certificates are out of sync, then go to the ASP view of your VNET
Integration and hit "Sync Network". If you suspect that there has been a change made to your VNET configuration and
it wasn't synced with your ASPs, then go to the ASP view of your VNET Integration and hit "Sync Network". Just as a
reminder, this will cause a brief outage with your VNET connection and your apps.
If all of that is fine then you need to dig in a bit deeper:
Are there any other apps using VNET Integration to reach resources in the same VNET?
Can you go to the app console and use tcpping to reach any other resources in your VNET?
If either of the above are true then your VNET Integration is fine and the problem is somewhere else. This is where it
gets to be more of a challenge because there is no simple way to see why you can't reach a host:port. Some of the
causes include:
you have a firewall up on your host preventing access to the application port from your point to site IP range.
Crossing subnets often requires Public access.
your target host is down
your application is down
you had the wrong IP or hostname
your application is listening on a different port than what you expected. You can check this by going onto that
host and using "netstat -aon" from the cmd prompt. This will show you what process ID is listening on what
port.
your network security groups are configured in such a manner that they prevent access to your application
host and port from your point to site IP range
Remember that you don't know which IP in your Point to Site IP range your app will use, so you need to allow
access from the entire range.
Additional debug steps include:
log onto another VM in your VNET and attempt to reach your resource host:port from there. There are some
TCP ping utilities that you can use for this purpose, or you can even use telnet if need be. The purpose here is
just to determine if connectivity is there from this other VM.
On-premises resources
If you cannot reach resources on-premises, then the first thing you should check is if you can reach a resource in
your VNET. If that is working, then the next steps are pretty easy. From a VM in your VNET, try to reach the
on-premises application. You can use telnet or a TCP ping utility. If your VM can't reach your on-premises
resource, then first make sure your Site to Site VPN connection is working. If it is working, then check the same
things noted earlier, as well as the on-premises gateway configuration and status.
Now if your VNET-hosted VM can reach your on-premises system but your app can't, then the reason is likely one of the
following:
your routes are not configured with your point to site IP ranges in your on-premises gateway
your network security groups are blocking access for your Point to Site IP range
your on-premises firewalls are blocking traffic from your Point to Site IP range
you have a User Defined Route (UDR) in your VNET that prevents your Point to Site based traffic from reaching
your on-premises network
Hybrid Connections and App Service Environments
There are 3 features that enable access to VNET hosted resources. They are:
VNET Integration
Hybrid Connections
App Service Environments
In addition, you can pull in telemetry from the host environments such as performance counters, Azure diagnostics, or
Docker logs. You can also set up web tests that periodically send synthetic requests to your web service.
All these telemetry streams are integrated in the Azure portal, where you can apply powerful analytic and search tools
to the raw data.
What's the overhead?
The impact on your app's performance is very small. Tracking calls are non-blocking, and are batched and sent in a
separate thread.
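The non-blocking, batched pattern described above can be illustrated with a toy standard-library sketch; this is not the Application Insights SDK itself, just an illustration of the idea (batch size and class name are invented):

```python
import queue
import threading

class BatchingSender:
    """Toy illustration: tracking calls enqueue an event and return
    immediately; a background thread drains the queue in batches."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.q = queue.Queue()
        self.sent_batches = []          # stands in for "sent over the wire"
        self.worker = threading.Thread(target=self._drain, daemon=True)
        self.worker.start()

    def track(self, event):
        # Non-blocking from the caller's point of view.
        self.q.put(event)

    def _drain(self):
        batch = []
        while True:
            item = self.q.get()
            if item is None:            # shutdown sentinel: flush and exit
                if batch:
                    self.sent_batches.append(batch)
                return
            batch.append(item)
            if len(batch) >= self.batch_size:
                self.sent_batches.append(batch)
                batch = []

    def close(self):
        self.q.put(None)
        self.worker.join()

sender = BatchingSender()
for i in range(7):
    sender.track(f"event-{i}")
sender.close()
print([len(b) for b in sender.sent_batches])  # [3, 3, 1]
```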
What does Application Insights monitor?
Application Insights is aimed at the development team, to help you understand how your app is performing and how
it's being used. It monitors:
Request rates, response times, and failure rates - Find out which pages are most popular, at what times of day,
and where your users are. See which pages perform best. If your response times and failure rates go high
when there are more requests, then perhaps you have a resourcing problem.
Dependency rates, response times, and failure rates - Find out whether external services are slowing you
down.
Exceptions - Analyse the aggregated statistics, or pick specific instances and drill into the stack trace and
related requests. Both server and browser exceptions are reported.
Page views and load performance - reported by your users' browsers.
AJAX calls from web pages - rates, response times, and failure rates.
User and session counts.
Performance counters from your Windows or Linux server machines, such as CPU, memory, and network
usage.
Host diagnostics from Docker or Azure.
Diagnostic trace logs from your app - so that you can correlate trace events with requests.
Custom events and metrics that you write yourself in the client or server code, to track business events such
as items sold or games won.
Where do I see my telemetry?
There are plenty of ways to explore your data. Check out these articles:
Application map
The components of your app, with key metrics and
alerts.
Profiler
Inspect the execution profiles of sampled requests.
Usage analysis
Analyze user segmentation and retention.
Dashboards
Mash up data from multiple resources and share
with others. Great for multi-component
applications, and for continuous display in the team
room.
Analytics
Answer tough questions about your app's
performance and usage by using this powerful
query language.
Visual Studio
See performance data in the code. Go to code from
stack traces.
Snapshot debugger
Debug snapshots sampled from live operations,
with parameter values.
Power BI
Integrate usage metrics with other business
intelligence.
REST API
Write code to run queries over your metrics and
raw data.
Continuous export
Bulk export of raw data to storage as soon as it
arrives.
This section discusses the various products and services available and how they work together. It can help you
determine which tools are most appropriate in which cases.
Why use Monitoring and Diagnostics?
Performance issues in your cloud app can impact your business. With multiple interconnected components and
frequent releases, degradations can happen at any time. And if you're developing an app, your users usually discover
issues that you didn't find in testing. You should know about these issues immediately, and have tools for diagnosing
and fixing the problems. Microsoft Azure has a range of tools for identifying these problems.
How do I monitor my Azure cloud apps?
There is a range of tools for monitoring Azure applications and services. Some of their features overlap. This is partly
for historical reasons and partly due to the blurring between development and operation of an application.
Here are the principal tools:
Azure Monitor is the basic tool for monitoring services running on Azure. It gives you infrastructure-level data
about the throughput of a service and the surrounding environment. If you are managing your apps entirely in
Azure and deciding whether to scale resources up or down, Azure Monitor gives you what you need to get started.
Application Insights can be used for development and as a production monitoring solution. It works by
installing a package into your app, and so gives you a more internal view of what's going on. Its data includes
response times of dependencies, exception traces, debugging snapshots, and execution profiles. It provides
powerful smart tools for analyzing all this telemetry, both to help you debug an app and to help you
understand what users are doing with it. You can tell whether a spike in response times is due to something in
an app, or some external resourcing issue. If you use Visual Studio and the app is at fault, you can be taken
right to the problem line(s) of code so you can fix it.
Log Analytics is for those who need to tune performance and plan maintenance on applications running in
production. It is based in Azure. It collects and aggregates data from many sources, though with a delay of 10
to 15 minutes. It provides a holistic IT management solution for Azure, on-premises, and third-party cloud-
based infrastructure (such as Amazon Web Services). It provides richer tools to analyze data across more
sources, allows complex queries across all logs, and can proactively alert on specified conditions. You can even
collect custom data into its central repository so you can query and visualize it.
System Center Operations Manager (SCOM) is for managing and monitoring large cloud installations. You
might already be familiar with it as a management tool for on-premises Windows Server and Hyper-V based
clouds, but it can also integrate with and manage Azure apps. Among other things, it can install Application
Insights on existing live apps. If an app goes down, it tells you in seconds. Note that Log Analytics does not
replace SCOM; it works well in conjunction with it.
Accessing monitoring in the Azure portal
All Azure monitoring services are now available in a single UI pane. For more information on how to access this area,
see Get started with Azure Monitor.
You can also access monitoring functions for specific resources by highlighting those resources and drilling down into
their monitoring options.
Examples of when to use which tool
The following sections show some basic scenarios and which tools should be used together.
Scenario 1 - Fix errors in an Azure application under development
The best option is to use Application Insights, Azure Monitor, and Visual Studio together.
Azure now provides the full power of the Visual Studio debugger in the cloud. Configure Azure Monitor to send
telemetry to Application Insights. Enable Visual Studio to include the Application Insights SDK in your application. Once
in Application Insights, you can use the Application Map to visually discover which parts of your running application
are unhealthy. For those parts that are not healthy, errors and exceptions are already available for exploration.
You can use the various analytics in Application Insights to go deeper. If you are not sure about the error, you can use
the Visual Studio debugger to trace into code and pinpoint a problem further.
For more information, see Monitoring Web Apps and refer to the table of contents on the left for instructions on
various types of apps and languages.
Scenario 2 - Debug an Azure .NET web application for errors that only show in production
Note
These features are in preview.
The best option is to use Application Insights and if possible Visual Studio for the full debugging experience.
Use the Application Insights Snapshot Debugger to debug your app. When a certain error threshold occurs with
production components, the system automatically captures telemetry in windows of time called "snapshots". The
amount captured is safe for a production cloud because it's small enough not to affect performance but significant
enough to allow tracing. The system can capture multiple snapshots. You can look at a point in time in the Azure
portal or use Visual Studio for the full experience. With Visual Studio, developers can walk through that snapshot as if
they were debugging in real time. Local variables, parameters, memory, and frames are all available. Developers must
be granted access to this production data via an RBAC role.
For more information, see Snapshot debugging.
Scenario 3 - Debug an Azure application that uses containers or microservices
Same as scenario 1: use Application Insights, Azure Monitor, and Visual Studio together. Application Insights also
supports gathering telemetry from processes running inside containers and from microservices (Kubernetes, Docker,
Azure Service Fabric). For more information, see this video on debugging containers and microservices.
Scenario 4 - Fix performance issues in your Azure application
The Application Insights profiler is designed to help troubleshoot these types of issues. You can identify and
troubleshoot performance issues for applications running in App Services (Web Apps, Logic Apps, Mobile Apps, API
Apps) and other compute resources such as Virtual Machines, Virtual machine scale sets (VMSS), Cloud Services, and
Service Fabric.
Note
The ability to profile Virtual Machines, Virtual machine scale sets (VMSS), Cloud Services, and Service Fabric is in preview.
In addition, the Smart Detection tool proactively notifies you by email about certain types of issues, such as slow page
load times. You don't need to do any configuration for this tool. For more information, see Smart Detection -
Performance Anomalies.
Log Analytics includes a query language to quickly retrieve and consolidate data in the repository. You can create and
save Log Searches to directly analyze data in the portal or have log searches run automatically to create an alert if the
results of the query indicate an important condition.
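As an illustration, a saved log search might look like the following hypothetical query (the Event table and its columns are assumed to be collected in your workspace):

```kusto
// Hypothetical example: count warning-or-worse Windows events per
// computer over the last day, most affected machines first.
Event
| where TimeGenerated > ago(1d)
| where EventLevelName in ("Error", "Warning")
| summarize EventCount = count() by Computer
| order by EventCount desc
```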
To get a quick graphical view of the health of your overall environment, you can add visualizations for saved log
searches to your dashboard.
In order to analyze data outside of Log Analytics, you can export the data from the OMS repository into tools such
as Power BI or Excel. You can also leverage the Log Search API to build custom solutions that leverage Log Analytics
data or to integrate with other systems.
Add functionality with management solutions
Management solutions add functionality to OMS, providing additional data and analysis tools to Log Analytics. They
may also define new record types to be collected that can be analyzed with Log Searches or by an additional user
interface provided by the solution in the dashboard. The example image below shows the Change Tracking solution.
Solutions are available for a variety of functions, and additional solutions are consistently being added. You can easily
browse available solutions and add them to your OMS workspace from the Solutions Gallery or Azure Marketplace.
Many will be automatically deployed and start working immediately while others may require moderate configuration.
Connected sources are the computers and other resources that generate data collected by Log Analytics. This can
include agents installed on Windows and Linux computers that connect directly or agents in a connected System
Center Operations Manager management group. For Azure resources, Log Analytics collects data from Azure Monitor
and Azure Diagnostics.
Data sources are the different kinds of data collected from each connected source. This
includes events and performance data from Windows and Linux agents in addition to sources such as IIS logs,
and custom text logs. You configure each data source that you want to collect, and the configuration is automatically
delivered to each connected source.
If you have custom requirements, then you can use the HTTP Data Collector API to write data to the repository from a
REST API client.
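As a sketch of how a REST client might authenticate, the Data Collector API's documented SharedKey scheme builds an HMAC-SHA256 signature over the request; the workspace ID and key below are placeholders, and you should verify the details against the current API documentation:

```python
import base64
import hashlib
import hmac

def build_signature(workspace_id, shared_key, content_length, date_rfc1123,
                    method="POST", content_type="application/json",
                    resource="/api/logs"):
    """Build the SharedKey authorization header used by the Log Analytics
    HTTP Data Collector API (scheme as described in the public docs)."""
    string_to_sign = (f"{method}\n{content_length}\n{content_type}\n"
                      f"x-ms-date:{date_rfc1123}\n{resource}")
    key = base64.b64decode(shared_key)  # the workspace key is base64-encoded
    digest = hmac.new(key, string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return f"SharedKey {workspace_id}:{base64.b64encode(digest).decode()}"

# Placeholder credentials, for illustration only.
header = build_signature("00000000-0000-0000-0000-000000000000",
                         base64.b64encode(b"dummy-key").decode(),
                         100, "Mon, 01 Jan 2024 00:00:00 GMT")
print(header.startswith("SharedKey "))  # True
```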
Log Analytics architecture
The deployment requirements of Log Analytics are minimal since the central components are hosted in the Azure
cloud. This includes the repository in addition to the services that allow you to correlate and analyze collected data.
The portal can be accessed from any browser so there is no requirement for client software.
You must install agents on Windows and Linux computers, but there is no additional agent required for computers
that are already members of a connected SCOM management group. SCOM agents will continue to communicate with
management servers which will forward their data to Log Analytics. Some solutions though will require agents to
communicate directly with Log Analytics. The documentation for each solution will specify its communication
requirements.
When you sign up for Log Analytics, you will create an OMS workspace. You can think of the workspace as a unique
Log Analytics environment with its own data repository, data sources, and solutions. You may create multiple
workspaces in your subscription to support multiple environments such as production and test.
Replication performance - Site Recovery provides continuous replication for Azure VMs and VMware VMs, and
replication frequency as low as 30 seconds for Hyper-V. You can reduce recovery time objective (RTO) with
Site Recovery's automated recovery process, and integration with Azure Traffic Manager.
Application consistency - You can configure application-consistent snapshots for the recovery points. In
addition to capturing disk data, application-consistent snapshots capture all data in memory, and all
transactions in process.
Testing without disruption - You can easily run test failovers to support disaster recovery drills, without
affecting production environments and the ongoing replication.
Flexible failover and recovery - You can run planned failovers for expected outages with zero data loss, or
unplanned failovers with minimal data loss (depending on replication frequency) for unexpected disasters.
You can easily fail back to your primary site when it's available again.
Custom recovery plans - Recovery plans allow you to model and customize failover and recovery of multi-tier
applications that are spread over multiple VMs. You order groups within plans, and add scripts and manual
actions. Recovery plans can be integrated with Azure Automation runbooks.
Multi-tier apps - You can create recovery plans for sequenced failover and recovery of multi-tiered apps. You
can group machines in different tiers (for example database, web, app) within a recovery plan, and customize
how each group fails over and starts up.
Integration with existing BCDR technologies - Site Recovery integrates with other BCDR technologies. For
example, you can use Site Recovery to protect the SQL Server backend of corporate workloads, including
native support for SQL Server AlwaysOn, to manage the failover of availability groups.
Integration with the automation library - A rich Azure Automation library provides production-ready,
application-specific scripts that can be downloaded and integrated with Site Recovery.
Simple network management - Advanced network management in Site Recovery and Azure simplifies
application network requirements, including reserving IP addresses, configuring load balancers, and
integrating Azure Traffic Manager for efficient network switchovers.
What's supported?
Supported Details
What can I replicate? Azure VMs (in preview), On-premises VMware VMs, Hyper-V VMs,
Windows and Linux physical servers.
Where can I replicate to? For Azure VMs, you can replicate to another Azure region.
For on-premises machines, you can replicate to Azure storage, or
to a secondary datacenter.
Note
For Hyper-V, only VMs on Hyper-V hosts managed in System Center VMM clouds can replicate to a secondary
datacenter.
What VMware servers/hosts do I need? VMware VMs you want to replicate can be managed by supported vSphere
hosts/vCenter servers.
What workloads can I replicate? You can replicate any workload running on a supported replication machine. In
addition, the Site Recovery team has performed app-specific testing for a number of apps.
Which Azure portal?
Site Recovery can be deployed in the Azure portal.
In the Azure classic portal, you can manage Site Recovery with the classic services management model.
The classic portal should only be used to maintain existing Site Recovery deployments. You can't create new
vaults in the classic portal.
Azure - In Azure you need a Microsoft Azure account, an Azure storage account, and an Azure network.
Replicated data is stored in the storage account, and Azure VMs are created with the replicated data when
failover from your on-premises site occurs. The Azure VMs connect to the Azure virtual network when
they're created.
VMM server - Hyper-V hosts are located in VMM clouds. If Hyper-V hosts are managed in VMM clouds, you
register the VMM server in the Recovery Services vault.
Hyper-V host - Hyper-V hosts and clusters can be deployed with or without a VMM server. If there's no VMM
server, the Site Recovery Provider is installed on the host to orchestrate replication with Site Recovery over
the internet. If there's a VMM server, the Provider is installed on it, and not on the host.
Hyper-V VMs - You need one or more VMs running on a Hyper-V host server. Nothing needs to be explicitly
installed on the VMs.
Learn about the deployment prerequisites and requirements for each of these components in the support matrix.
Figure 1: Hyper-V site to Azure replication
Replication process
Enable protection
1. After you enable protection for a Hyper-V VM, in the Azure portal or on-premises, the Enable
protection job starts.
2. The job checks that the machine complies with prerequisites, before invoking
the CreateReplicationRelationship method to set up replication with the settings you've configured.
3. The job starts initial replication by invoking the StartReplication method, to initialize a full VM replication, and
send the VM's virtual disks to Azure.
1. After the initial replication finishes, the Finalize protection on the virtual machine job configures network and
other post-replication settings, so that the virtual machine is protected.
2. If you're replicating to Azure, you might need to tweak the settings for the virtual machine so that it's ready
for failover. At this point you can run a test failover to check everything is working as expected.
Replicate the delta
1. After the initial replication, delta synchronization begins, in accordance with replication settings.
2. The Hyper-V Replica Replication Tracker tracks the changes to a virtual hard disk as .hrl files. Each disk that's
configured for replication has an associated .hrl file. This log is sent to the customer's storage account after
initial replication is complete. When a log is in transit to Azure, the changes in the primary disk are tracked in
another log file, in the same directory.
3. During initial and delta replication, you can monitor the VM in the VM view. Learn more.
Synchronize replication
1. If delta replication fails, and a full replication would be costly in terms of bandwidth or time, then a VM is
marked for resynchronization. For example, if the .hrl files reach 50% of the disk size, then the VM will be
marked for resynchronization.
2. Resynchronization minimizes the amount of data sent by computing checksums of the source and target
virtual machines, and sending only the delta data. Resynchronization uses a fixed-block chunking algorithm
where source and target files are divided into fixed chunks. Checksums for each chunk are generated and
then compared to determine which blocks from the source need to be applied to the target.
3. After resynchronization finishes, normal delta replication should resume. By default resynchronization is
scheduled to run automatically outside office hours, but you can resynchronize a virtual machine manually.
For example, you can resume resynchronization if a network outage or another outage occurs. To do this,
select the VM in the portal > Resynchronize.
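The fixed-block chunking comparison in step 2 can be illustrated with a small standard-library sketch (an illustration of the technique, not Site Recovery's actual implementation; the chunk size and hash choice are assumptions):

```python
import hashlib

def changed_chunks(source: bytes, target: bytes, chunk_size: int = 4):
    """Split both sides into fixed-size chunks, compare per-chunk
    checksums, and return the indices of chunks that differ and
    therefore must be re-sent from source to target."""
    diffs = []
    n = max(len(source), len(target))
    for i in range(0, n, chunk_size):
        s = hashlib.sha256(source[i:i + chunk_size]).digest()
        t = hashlib.sha256(target[i:i + chunk_size]).digest()
        if s != t:
            diffs.append(i // chunk_size)
    return diffs

src = b"AAAABBBBCCCCDDDD"
tgt = b"AAAAXXXXCCCCDDDD"   # chunk 1 has diverged on the target
print(changed_chunks(src, tgt))  # [1]
```

Only the delta (chunk 1 here) is applied to the target, which is why resynchronization is far cheaper than a full replication.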
Retry logic
If a replication error occurs, there's a built-in retry. This logic can be classified into two categories:
Category Details
Recoverable errors - Retries occur every replication interval, using an exponential back-off that increases the
retry interval from the start of the first attempt by 1, 2, 4, 8, and 10 minutes. If an error persists, retry every
30 minutes. Examples include: network errors; low disk errors; low memory conditions.
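The retry schedule for recoverable errors can be sketched as a small function (an illustration of the described schedule, not Site Recovery's actual code):

```python
def retry_delay_minutes(attempt: int) -> int:
    """Delay before the given retry attempt (1-based), following the
    schedule described above: 1, 2, 4, 8, 10 minutes, then 30 minutes
    for every further attempt while the error persists."""
    schedule = [1, 2, 4, 8, 10]
    if attempt <= len(schedule):
        return schedule[attempt - 1]
    return 30

print([retry_delay_minutes(n) for n in range(1, 8)])  # [1, 2, 4, 8, 10, 30, 30]
```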
Azure Site Recovery support matrix for replicating from on-premises to Azure
This article summarizes supported configurations and components for Azure Site Recovery when replicating and
recovering to Azure. For more about Azure Site Recovery requirements, see the prerequisites.
Support for deployment options
Deployment - Support
VMware VM/physical server - vSphere 6.0, 5.5, or 5.1 with latest update
Hyper-V (with Virtual Machine Manager) - System Center Virtual Machine Manager 2016 and System
Center Virtual Machine Manager 2012 R2
Note
A System Center Virtual Machine Manager 2016 cloud with a mixture of Windows Server 2016 and 2012 R2 hosts
isn't currently supported.
Host servers
Deployment - Support
VMware VM/physical server - vCenter 5.5 or 6.0 (support for 5.5 features only)
Hyper-V (with/without Virtual Machine Manager) - Windows Server 2016, Windows Server 2012 R2 with
latest updates. If SCVMM is used, Windows Server 2016 hosts should be managed by SCVMM 2016.
Note
A Hyper-V site that mixes hosts running Windows Server 2016 and 2012 R2 isn't currently supported. Recovery to
an alternate location for VMs on a Windows Server 2016 host isn't currently supported.
Support for replicated machine OS versions
Virtual machines that are protected must meet Azure requirements when replicating to Azure. The following table
summarizes replicated operating system support in various deployment scenarios while using Azure Site Recovery.
This support is applicable for any workload running on the mentioned OS.
VMware/physical server: 64-bit Windows Server 2012 R2, Windows Server 2012, Windows Server 2008 R2 with at least SP1; Red Hat Enterprise Linux 6.7, 6.8, 7.1, 7.2; Oracle Enterprise Linux 6.4, 6.5 running either the Red Hat compatible kernel or Unbreakable Enterprise Kernel Release 3 (UEK3)
Hyper-V (with/without VMM): Any guest OS supported by Azure
Important
(Applicable to VMware/Physical servers replicating to Azure)
On Red Hat Enterprise Linux Server 7+ and CentOS 7+ servers, kernel version 3.10.0-514 is supported starting
from version 9.8 of the Azure Site Recovery Mobility service.
Customers on the 3.10.0-514 kernel with a version of the Mobility service lower than version 9.8 are required to disable replication, update the Mobility service to version 9.8, and then enable replication again.
Supported Ubuntu kernel versions for VMware/physical servers
Supported file systems and guest storage configurations on Linux (VMware/Physical servers)
The following file systems and storage configuration software are supported on Linux servers running on VMware or physical servers:
File systems: ext3, ext4, ReiserFS (SUSE Linux Enterprise Server only), XFS (up to v4 only)
Volume manager: LVM2
Multipath software: Device Mapper
Physical servers with the HP CCISS storage controller aren't supported.
Note
On Linux servers, the following directories (if set up as separate partitions/file systems) must all be on the same disk (the OS disk) on the source server: / (root), /boot, /usr, /usr/local, /var, /etc.
XFS v5 features, such as metadata checksums, are currently not supported by ASR on XFS file systems. Ensure that your XFS file systems aren't using any v5 features. You can use the xfs_info utility to check the XFS superblock for the partition. If ftype is set to 1, then XFSv5 features are being used.
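That ftype check can be automated by scanning the output of xfs_info for the flag. The helper below is a hypothetical sketch (the function name and sample output are assumptions, not part of any tool):

```python
def uses_xfs_v5_features(xfs_info_output):
    """Return True if xfs_info superblock output reports ftype=1, which
    (per the guidance above) indicates XFSv5 features are in use."""
    # Normalize whitespace so "ftype = 1" and "ftype=1" both match.
    normalized = xfs_info_output.replace(" ", "")
    return "ftype=1" in normalized

sample = "naming   =version 2   bsize=4096   ascii-ci=0 ftype=1"
print(uses_xfs_v5_features(sample))  # True
```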
Support for network configuration
The following tables summarize network configuration support in various deployment scenarios that use Azure
Site Recovery to replicate to Azure.
Host network configuration (VMware/physical server | Hyper-V):
IPv6: No | No
NIC teaming: No | No
Guest VM network configuration (VMware/physical server | Hyper-V):
IPv6: No | No
Static IP (Linux): No | No
Storage configuration (VMware/physical server | Hyper-V):
No for physical servers
EFI/UEFI: No | Yes
Encrypted disk: No | No
NFS: No | N/A
SMB 3.0: No | No
Disk > 1 TB: No | No
LVM (Logical Volume Management)
Cool storage: No | No
Hot storage: No | No
Import/Export service: No | No
You can deploy Site Recovery to replicate virtual machines and physical servers running any operating system supported by Azure. This includes most versions of Windows and Linux. On-premises VMs that you want to replicate must conform to Azure requirements while replicating to Azure.
Moving storage, networks, or Azure VMs across resource groups: No (not supported in any scenario)
Mobility service: installed on the VMware VMs or physical servers that you want to replicate.
Traditional backup tools often fall short: they don't offer you the type or amount of storage you need, or administrative tasks require too much time. In contrast, Azure Backup delivers these key benefits:
Automatic storage management - Hybrid environments often require heterogeneous storage - some on-premises and
some in the cloud. With Azure Backup, there is no cost for using on-premises storage devices. Azure Backup
automatically allocates and manages backup storage, and it uses a pay-as-you-use model. Pay-as-you-use means that
you only pay for the storage that you consume. For more information, see the Azure pricing article.
Unlimited scaling - Azure Backup uses the underlying power and unlimited scale of the Azure cloud to deliver high-
availability - with no maintenance or monitoring overhead. You can set up alerts to provide information about events,
but you don't need to worry about high-availability for your data in the cloud.
Multiple storage options - An aspect of high-availability is storage replication. Azure Backup offers two types of
replication: locally redundant storage and geo-redundant storage. Choose the backup storage option based on need:
Locally redundant storage (LRS) replicates your data three times (it creates three copies of your data) within a single datacenter in the same region. LRS is a low-cost option for protecting your data from local hardware failures.
Geo-redundant storage (GRS) replicates your data to a secondary region (hundreds of miles away from the
primary location of the source data). GRS costs more than LRS, but GRS provides a higher level of durability for
your data, even if there is a regional outage.
Unlimited data transfer - Azure Backup does not limit the amount of inbound or outbound data you transfer. Azure
Backup also does not charge for the data that is transferred. However, if you use the Azure Import/Export service to
import large amounts of data, there is a cost associated with inbound data. For more information about this cost,
see Offline-backup workflow in Azure Backup. Outbound data refers to data transferred from a Recovery Services
vault during a restore operation.
Data encryption - Data encryption allows for secure transmission and storage of your data in the public cloud. You
store the encryption passphrase locally, and it is never transmitted or stored in Azure. If it is necessary to restore any of the data, only you have the encryption passphrase, or key.
Application-consistent backup - Whether backing up a file server, virtual machine, or SQL database, you need to know that a recovery point has all the required data to restore the backup copy. Azure Backup provides application-consistent backups, which ensure that additional fixes are not needed to restore the data. Restoring application-consistent data reduces the restoration time, allowing you to quickly return to a running state.
Long-term retention - Instead of switching backup copies from disk to tape and moving the tape to an off-site location,
you can use Azure for short-term and long-term retention. Azure doesn't limit the length of time data remains in a
Backup or Recovery Services vault. You can keep data in a vault for as long as you like. Azure Backup has a limit of
9999 recovery points per protected instance. See the Backup and retention section in this article for an explanation of
how this limit may impact your backup needs.
Which Azure Backup components should I use?
If you aren't sure which Azure Backup component works for your needs, see the following table for information about
what you can protect with each component. The Azure portal provides a wizard, which is built into the portal, to guide
you through choosing the component to download and deploy. The wizard, which is part of the Recovery Services
vault creation, leads you through the steps for selecting a backup goal, and choosing the data or application to
protect.
Component: Azure Backup (MARS) agent
Benefits: Back up files and folders on physical or virtual Windows OS (VMs can be on-premises or in Azure); no separate backup server required
Limits: Backup 3x per day; not application-aware; file, folder, and volume-level restore only; no support for Linux
What is protected: Files, folders
Where are backups stored: Recovery Services vault
Data or Workload | Source environment | Azure Backup solution
Hyper-V virtual machine (Windows) | Windows Server | System Center DPM (+ the Azure Backup agent), Azure Backup Server (includes the Azure Backup agent)
Hyper-V virtual machine (Linux) | Windows Server | System Center DPM (+ the Azure Backup agent), Azure Backup Server (includes the Azure Backup agent)
Microsoft SQL Server | Windows Server | System Center DPM (+ the Azure Backup agent), Azure Backup Server (includes the Azure Backup agent)
Microsoft SharePoint | Windows Server | System Center DPM (+ the Azure Backup agent), Azure Backup Server (includes the Azure Backup agent)
Microsoft Exchange | Windows Server | System Center DPM (+ the Azure Backup agent), Azure Backup Server (includes the Azure Backup agent)
Azure IaaS VMs (Windows) | Running in Azure | Azure Backup (VM extension)
Azure IaaS VMs (Linux) | Running in Azure | Azure Backup (VM extension)
Linux support
The following table shows the Azure Backup components that have support for Linux.
System Center DPM: File-consistent backup of Linux guest VMs on Hyper-V and VMware; VM restore of Hyper-V and VMware Linux guest VMs
Azure Backup Server: File-consistent backup of Linux guest VMs on Hyper-V and VMware; VM restore of Hyper-V and VMware Linux guest VMs; file-consistent backup not available for Azure VMs
Azure IaaS VM Backup: Application-consistent backup using the pre-script and post-script framework; granular file recovery; restore all VM disks; VM restore
Azure Backup protects Premium Storage VMs. Azure Premium Storage is solid-state drive (SSD)-based storage
designed to support I/O-intensive workloads. Premium Storage is attractive for virtual machine (VM) workloads. For
more information about Premium Storage, see the article, Premium Storage: High-Performance Storage for Azure
Virtual Machine Workloads.
Back up Premium Storage VMs
While backing up Premium Storage VMs, the Backup service creates a temporary staging location, named
"AzureBackup-", in the Premium Storage account. The size of the staging location is equal to the size of the recovery
point snapshot. Be sure the Premium Storage account has adequate free space to accommodate the temporary
staging location. For more information, see the article, premium storage limitations. Once the backup job finishes, the
staging location is deleted. The price of storage used for the staging location is consistent with all Premium storage
pricing.
Note: Do not modify or edit the staging location.
Restore Premium Storage VMs
Premium Storage VMs can be restored to either Premium Storage or to standard storage. Restoring a Premium Storage VM recovery point back to Premium Storage is the typical restoration process. However, it can be cost-effective to restore a Premium Storage VM recovery point to standard storage instead. This type of restoration can be used if you need a subset of files from the VM.
Using managed disk VMs with Azure Backup
Azure Backup protects managed disk VMs. Managed disks free you from managing storage accounts of virtual
machines and greatly simplify VM provisioning.
Back up managed disk VMs
Backing up VMs on managed disks is no different than backing up Resource Manager VMs. In the Azure portal, you
can configure the backup job directly from the Virtual Machine view or from the Recovery Services vault view. You can
back up VMs on managed disks through RestorePoint collections built on top of managed disks. Azure Backup also supports backing up managed disk VMs encrypted using Azure Disk Encryption (ADE).
Restore managed disk VMs
Azure Backup allows you to restore a complete VM with managed disks, or restore managed disks to a storage
account. Azure manages the managed disks during the restore process. You (the customer) manage the storage
account created as part of the restore process. When restoring managed encrypted VMs, the VM's keys and secrets
should exist in the key vault prior to starting the restore operation.
What are the features of each Backup component?
The following sections provide tables that summarize the availability or support of various features in each Azure
Backup component. See the information following each table for additional support or details.
Storage
The storage options compared across the Backup components are: Recovery Services vault, disk storage, tape storage, compression (in the Recovery Services vault), incremental backup, and disk deduplication. The following sections describe support for each.
The Recovery Services vault is the preferred storage target across all components. System Center DPM and Azure
Backup Server also provide the option to have a local disk copy. However, only System Center DPM provides the
option to write data to a tape storage device.
Compression
Backups are compressed to reduce the required storage space. The only component that does not use compression is
the VM extension. The VM extension copies all backup data from your storage account to the Recovery Services vault
in the same region. No compression is used when transferring the data. Transferring the data without compression
slightly inflates the storage used. However, storing the data without compression allows for faster restoration, should
you need that recovery point.
Disk Deduplication
You can take advantage of deduplication when you deploy System Center DPM or Azure Backup Server on a Hyper-V
virtual machine. Windows Server performs data deduplication (at the host level) on virtual hard disks (VHDs) that are
attached to the virtual machine as backup storage.
Note
Deduplication is not available in Azure for any Backup component. When System Center DPM and Backup Server are
deployed in Azure, the storage disks attached to the VM cannot be deduplicated.
Incremental backup explained
Every Azure Backup component supports incremental backup regardless of the target storage (disk, tape, Recovery
Services vault). Incremental backup ensures that backups are storage and time efficient, by transferring only those
changes made since the last backup.
Comparing Full, Differential and Incremental backup
Storage consumption, recovery time objective (RTO), and network consumption vary for each type of backup
method. To keep the backup total cost of ownership (TCO) down, you need to understand how to choose the best
backup solution. The following image compares Full Backup, Differential Backup, and Incremental Backup. In the
image, data source A is composed of 10 storage blocks A1-A10, which are backed up monthly. Blocks A2, A3, A4, and
A9 change in the first month, and block A5 changes in the next month.
With Full Backup, each backup copy contains the entire data source. Full backup consumes a large amount of network
bandwidth and storage, each time a backup copy is transferred.
Differential backup stores only the blocks that changed since the initial full backup, which results in a smaller amount
of network and storage consumption. Differential backups don't retain redundant copies of unchanged data.
However, because the data blocks that remain unchanged between subsequent backups are transferred and stored,
differential backups are inefficient. In the second month, changed blocks A2, A3, A4, and A9 are backed up. In the
third month, these same blocks are backed up again, along with changed block A5. The changed blocks continue to be
backed up until the next full backup happens.
Incremental Backup achieves high storage and network efficiency by storing only the blocks of data that changed since
the previous backup. With incremental backup, there is no need to take regular full backups. In the example, after the
full backup is taken for the first month, changed blocks A2, A3, A4, and A9 are marked as changed and transferred for
the second month. In the third month, only changed block A5 is marked and transferred. Moving less data saves
storage and network resources, which decreases TCO.
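The worked example above can be reproduced with a few set operations. The block names follow the A1-A10 example from the text, and the transfer totals are illustrative:

```python
# Month 1: full backup of all 10 blocks of data source A.
all_blocks = {f"A{i}" for i in range(1, 11)}
changes = [{"A2", "A3", "A4", "A9"},  # blocks changed before month 2
           {"A5"}]                    # blocks changed before month 3

full = [all_blocks, all_blocks, all_blocks]                       # every copy is complete
differential = [all_blocks, changes[0], changes[0] | changes[1]]  # since the full backup
incremental = [all_blocks, changes[0], changes[1]]                # since the previous backup

# Total blocks transferred over the three months for each method.
totals = {name: sum(len(copy) for copy in copies)
          for name, copies in [("full", full),
                               ("differential", differential),
                               ("incremental", incremental)]}
print(totals)  # {'full': 30, 'differential': 19, 'incremental': 15}
```

Note how the differential backup re-transfers A2, A3, A4, and A9 in month three, while the incremental backup sends only A5.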
Security
The security features compared across the Backup components are network security (to Azure) and data security (in Azure).
Network security
All backup traffic from your servers to the Recovery Services vault is encrypted using Advanced Encryption Standard
256. The backup data is sent over a secure HTTPS link. The backup data is also stored in the Recovery Services vault in
encrypted form. Only you, the Azure customer, have the passphrase to unlock this data. Microsoft cannot decrypt the
backup data at any point.
Warning
Once you establish the Recovery Services vault, only you have access to the encryption key. Microsoft never maintains
a copy of your encryption key, and does not have access to the key. If the key is misplaced, Microsoft cannot recover
the backup data.
Data security
Backing up Azure VMs requires setting up encryption within the virtual machine. Use BitLocker on Windows virtual
machines and dm-crypt on Linux virtual machines. Azure Backup does not automatically encrypt backup data that
comes through this path.
Network
The network features compared across the Backup components are network compression to the backup server and network compression to the Recovery Services vault.
The VM extension (on the IaaS VM) reads the data directly from the Azure storage account over the storage network,
so it is not necessary to compress this traffic.
If you use a System Center DPM server or Azure Backup Server as a secondary backup server, compress the data going
from the primary server to the backup server. Compressing data before backing it up to DPM or Azure Backup Server,
saves bandwidth.
Network Throttling
The Azure Backup agent offers network throttling, which allows you to control how network bandwidth is used during
data transfer. Throttling can be helpful if you need to back up data during work hours but do not want the backup
process to interfere with other internet traffic. Throttling for data transfer applies to backup and restore activities.
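Client-side throttling of this kind can be illustrated with a naive rate limiter that sleeps as needed to keep the average transfer rate under a cap. This is a conceptual sketch only; the Azure Backup agent configures throttling through its UI, not through code like this:

```python
import time

def throttled_chunks(chunks, max_bytes_per_sec):
    """Yield chunks, sleeping as needed so the average send rate
    stays at or below max_bytes_per_sec."""
    sent = 0
    start = time.monotonic()
    for chunk in chunks:
        sent += len(chunk)
        earliest_allowed = sent / max_bytes_per_sec  # seconds since start
        elapsed = time.monotonic() - start
        if earliest_allowed > elapsed:
            time.sleep(earliest_allowed - elapsed)
        yield chunk
```

With a high cap the generator introduces no delay; with a low cap it spaces the chunks out over time, which is the essence of bandwidth throttling.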
Backup and retention
Azure Backup has a limit of 9999 recovery points, also known as backup copies or snapshots, per protected instance. A
protected instance is a computer, server (physical or virtual), or workload configured to back up data to Azure. For
more information, see the section, What is a protected instance. An instance is protected once a backup copy of data
has been saved. The backup copy of data is the protection. If the source data was lost or became corrupt, the backup
copy could restore the source data. The following table shows the maximum backup frequency for each component.
Your backup policy configuration determines how quickly you consume the recovery points. For example, if you create
a recovery point each day, then you can retain recovery points for 27 years before you run out. If you take a monthly
recovery point, you can retain recovery points for 833 years before you run out. The Backup service does not set an
expiration time limit on a recovery point.
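The arithmetic behind those retention figures is simple division against the 9999-point limit:

```python
LIMIT = 9999  # recovery points per protected instance

# One recovery point per day: a bit over 27 years before the limit is reached.
print(LIMIT // 365)  # 27

# One recovery point per month: roughly 833 years.
print(LIMIT // 12)   # 833
```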
Backup frequency (to Recovery Services vault): Azure Backup (MARS) agent: three backups per day; System Center DPM: two backups per day; Azure Backup Server: two backups per day; Azure IaaS VM Backup: one backup per day
Retention options: daily, weekly, monthly, and yearly for all four components
Recovery points on local disk: not applicable for the MARS agent or Azure IaaS VM Backup; for System Center DPM and Azure Backup Server, 64 for file servers and 448 for application servers
A Recovery Services vault stores the backup data from your protected servers and workstations. Recovery Services vaults make it easy to organize your backup data, while minimizing management overhead. You can create as many Recovery Services vaults as you like within a subscription.
Backup vaults, which are based on Azure Service Manager, were the first version of the vault. Recovery Services
vaults, which add the Azure Resource Manager model features, are the second version of the vault. See the Recovery
Services vault overview article for a full description of the feature differences. You can no longer create Backup vaults
in the Azure portal, but Backup vaults are still supported.
Important
You can now upgrade your Backup vaults to Recovery Services vaults. For details, see the article Upgrade a Backup
vault to a Recovery Services vault. Microsoft encourages you to upgrade your Backup vaults to Recovery Services
vaults.
Starting November 1, 2017:
Any remaining Backup vaults will be automatically upgraded to Recovery Services vaults.
You won't be able to access your backup data in the classic portal. Instead, use the Azure portal to access your
backup data in Recovery Services vaults.
How does Azure Backup differ from Azure Site Recovery?
Azure Backup and Azure Site Recovery are related in that both services back up data and can restore that data.
However, these services serve different purposes in providing business continuity and disaster recovery in your
business. Use Azure Backup to protect and restore data at a more granular level. For example, if a presentation on a
laptop became corrupted, you would use Azure Backup to restore the presentation. If you wanted to replicate the
configuration and data on a VM across another datacenter, use Azure Site Recovery.
Azure Backup protects data on-premises and in the cloud. Azure Site Recovery coordinates virtual-machine and
physical-server replication, failover, and failback. Both services are important because your disaster recovery solution
needs to keep your data safe and recoverable (Backup) and keep your workloads available (Site Recovery) when
outages occur.
The following concepts can help you make important decisions around backup and disaster recovery.
Recovery point objective (RPO): the amount of acceptable data loss if a recovery needs to be done. Backup solutions have wide variability in their acceptable RPO; virtual machine backups usually have an RPO of one day, while database backups have RPOs as low as 15 minutes. Disaster recovery solutions have low RPOs; the DR copy can be behind by a few seconds or a few minutes.
Recovery time objective (RTO): the amount of time that it takes to complete a recovery or restore. Because of the larger RPO, the amount of data that a backup solution needs to process is typically much higher, which leads to longer RTOs. For example, it can take days to restore data from tapes, depending on the time it takes to transport the tape from an off-site location. Disaster recovery solutions have smaller RTOs because they are more in sync with the source, so fewer changes need to be processed.
If there are Recovery Services vaults in the subscription, the vaults are listed.
3. On the Recovery Services vaults menu, click Add.
The Recovery Services vault blade opens, prompting you to provide a Name, Subscription, Resource group,
and Location.
4. For Name, enter a friendly name to identify the vault. The name needs to be unique for the Azure
subscription. Type a name that contains between 2 and 50 characters. It must start with a letter, and can
contain only letters, numbers, and hyphens.
5. In the Subscription section, use the drop-down menu to choose the Azure subscription. If you use only one
subscription, that subscription appears and you can skip to the next step. If you are not sure which
subscription to use, use the default (or suggested) subscription. There are multiple choices only if your
organizational account is associated with multiple Azure subscriptions.
6. In the Resource group section:
select Create new if you want to create a new Resource group. Or
select Use existing and click the drop-down menu to see the available list of Resource groups.
For complete information on Resource groups, see the Azure Resource Manager overview.
7. Click Location to select the geographic region for the vault. This choice determines the geographic region
where your backup data is sent.
8. At the bottom of the Recovery Services vault blade, click Create.
It can take several minutes for the Recovery Services vault to be created. Monitor the status notifications in the upper
right-hand area of the portal. Once your vault is created, it appears in the list of Recovery Services vaults. If after
several minutes you don't see your vault, click Refresh.
Once you see your vault in the list of Recovery Services vaults, you are ready to set the storage redundancy.
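The vault naming rules from step 4 (2 to 50 characters, starting with a letter, containing only letters, numbers, and hyphens) can be expressed as a single regular expression. The validator below is a hypothetical sketch for illustration, not part of any Azure SDK:

```python
import re

# 2-50 characters total: one leading letter plus 1-49 letters, digits, or hyphens.
VAULT_NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]{1,49}")

def is_valid_vault_name(name):
    """Check a candidate Recovery Services vault name against the portal's rules."""
    return VAULT_NAME.fullmatch(name) is not None

print(is_valid_vault_name("backup-vault-01"))  # True
print(is_valid_vault_name("1vault"))           # False (must start with a letter)
```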
When you select the vault, the Recovery Services vault blade narrows, and the Settings blade (which has the name of
the vault at the top) and the vault details blade open.
2. In the new vault's Settings blade, use the vertical slide to scroll down to the Manage section, and click Backup
Infrastructure. The Backup Infrastructure blade opens.
3. In the Backup Infrastructure blade, click Backup Configuration to open the Backup Configuration blade.
By default, your vault has geo-redundant storage. If you use Azure as a primary backup storage endpoint, continue to
use Geo-redundant. If you don't use Azure as a primary backup storage endpoint, then choose Locally-redundant,
which reduces the Azure storage costs. Read more about geo-redundant and locally redundant storage options in
this Storage redundancy overview.
Now that you've created a vault, configure it for backing up files and folders.
Configure the vault
1. On the Recovery Services vault blade (for the vault you just created), in the Getting Started section,
click Backup, then on the Getting Started with Backup blade, select Backup goal.
2. From the Where is your workload running? drop-down menu, select On-premises.
You choose On-premises because your Windows Server or Windows computer is a physical machine that is not in
Azure.
3. From the What do you want to backup? menu, select Files and folders, and click OK.
After clicking OK, a checkmark appears next to Backup goal, and the Prepare infrastructure blade opens.
4. On the Prepare infrastructure blade, click Download Agent for Windows Server or Windows Client.
If you are using Windows Server Essentials, then choose to download the agent for Windows Server Essentials. A pop-up menu prompts you to run or save MARSAgentInstaller.exe.
You don't need to install the agent yet. You can install the agent after you have downloaded the vault credentials.
6. On the Prepare infrastructure blade, click Download.
The vault credentials download to your Downloads folder. After the vault credentials finish downloading, you see a pop-up asking if you want to open or save the credentials. Click Save. If you accidentally click Open, let the dialog that attempts to open the vault credentials fail; the vault credentials file cannot be opened directly. Proceed to the next step; the vault credentials are in the Downloads folder.
2. Complete the Microsoft Azure Recovery Services Agent Setup Wizard. To complete the wizard, you need to:
Choose a location for the installation and cache folder.
Provide your proxy server info if you use a proxy server to connect to the internet.
Provide your user name and password details if you use an authenticated proxy.
Provide the downloaded vault credentials.
Save the encryption passphrase in a secure location.
Note
If you lose or forget the passphrase, Microsoft cannot help recover the backup data. Save the file in a secure location.
It is required to restore a backup.
The agent is now installed and your machine is registered to the vault. You're ready to configure and schedule your
backup.
Back up your files and folders
The initial backup includes two key tasks:
Schedule the backup
Back up files and folders for the first time
To complete the initial backup, use the Microsoft Azure Recovery Services agent.
To schedule the backup job
1. Open the Microsoft Azure Recovery Services agent. You can find it by searching your machine for Microsoft
Azure Backup.
3. On the Getting started page of the Schedule Backup Wizard, click Next.
4. On the Select Items to Backup page, click Add Items.
5. Select the files and folders that you want to back up, and then click Okay.
6. Click Next.
7. On the Specify Backup Schedule page, specify the backup schedule and click Next.
You can schedule daily (at a maximum rate of three times per day) or weekly backups.
Note
For more information about how to specify the backup schedule, see the article Use Azure Backup to replace your
tape infrastructure.
8. On the Select Retention Policy page, select the Retention Policy for the backup copy.
The retention policy specifies how long the backup data is stored. Rather than specifying a flat policy for all backup
points, you can specify different retention policies based on when the backup occurs. You can modify the daily,
weekly, monthly, and yearly retention policies to meet your needs.
9. On the Choose Initial Backup Type page, choose the initial backup type. Leave the option Automatically over
the network selected, and then click Next.
You can back up automatically over the network, or you can back up offline. The remainder of this article describes
the process for backing up automatically. If you prefer to do an offline backup, review the article Offline backup
workflow in Azure Backup for additional information.
10. On the Confirmation page, review the information, and then click Finish.
11. After the wizard finishes creating the backup schedule, click Close.
To back up files and folders for the first time
1. In the Recovery Services agent, click Back Up Now to complete the initial seeding over the network.
2. On the Confirmation page, review the settings that the Back Up Now Wizard will use to back up the machine.
Then click Back Up.
3. Click Close to close the wizard. If you close the wizard before the backup process finishes, the wizard
continues to run in the background.
After the initial backup is completed, the Job completed status appears in the Backup console.
In Azure Automation, you can author a PowerShell runbook or PowerShell Workflow runbook that you edit offline or with the textual editor in the Azure portal. If you prefer to edit a runbook without being exposed to the underlying code, you can create a Graphical runbook using the graphical editor in the Azure portal.
Prefer watching to reading? Have a look at the video below from a Microsoft Ignite session in May 2015. Note: While the concepts and features discussed in this video are correct, Azure Automation has progressed a lot since the video was recorded; it now has a more extensive UI in the Azure portal and supports additional capabilities.
Automating configuration management with Desired State Configuration
PowerShell DSC is a management platform that allows you to manage, deploy and enforce configuration for physical
hosts and virtual machines using a declarative PowerShell syntax. You can define configurations on a central DSC Pull
Server that target machines can automatically retrieve and apply. DSC provides a set of PowerShell cmdlets that you
can use to manage configurations and resources.
Azure Automation DSC is a cloud-based solution for PowerShell DSC that provides services required for enterprise
environments. You can manage your DSC resources in Azure Automation and apply configurations to virtual or
physical machines that retrieve them from a DSC Pull Server in the Azure cloud. It also provides reporting services that
inform you of important events such as when nodes have deviated from their assigned configuration and when a new
configuration has been applied.
Creating your own DSC configurations with Azure Automation
DSC configurations specify the desired state of a node. Multiple nodes can apply the same configuration to ensure that they all maintain an identical state. You can create a configuration using any text editor on your local machine and then import it into Azure Automation, where you can compile it and apply it to nodes.
Getting modules and configurations
You can get PowerShell modules containing cmdlets that you can use in your runbooks and DSC configurations from
the PowerShell Gallery. You can launch this gallery from the Azure portal and import modules directly into Azure
Automation, or you can download them and import them manually. To use a module on your local machine, download
it and install it as you would any other PowerShell module.
Example practical applications of Azure Automation
The following are just a few examples of the kinds of automation scenarios you can implement with Azure Automation.
Create and copy virtual machines in different Azure subscriptions.
Schedule file copies from a local machine to an Azure Blob Storage container.
Automate security functions, such as denying requests from a client when a denial-of-service attack is detected.
Ensure machines continually align with configured security policy.
Manage continuous deployment of application code across cloud and on premises infrastructure.
Build an Active Directory forest in Azure for your lab environment.
Truncate a table in a SQL database if the database is approaching its maximum size.
Remotely update environment settings for an Azure website.
How does Azure Automation relate to other automation tools?
Service Management Automation (SMA) is intended to automate management tasks in the private cloud. It is installed
locally in your data center as a component of Microsoft Azure Pack. SMA and Azure Automation use the same
runbook format based on Windows PowerShell and Windows PowerShell Workflow, but SMA does not
support graphical runbooks.
System Center 2012 Orchestrator is intended for automation of on-premises resources. It uses a different runbook
format than Azure Automation and Service Management Automation and has a graphical interface to create runbooks
without requiring any scripting. Its runbooks are composed of activities from Integration Packs that are written
specifically for Orchestrator.
Where can I get more information?
A variety of resources are available for you to learn more about Azure Automation and creating your own runbooks.
The Azure Automation documentation library provides complete documentation on the configuration and
administration of Azure Automation and for authoring your own runbooks.
The Azure PowerShell cmdlets reference provides information for automating Azure operations using Windows
PowerShell. Runbooks use these cmdlets to work with Azure resources.
Management Blog provides the latest information on Azure Automation and other management technologies
from Microsoft. You should subscribe to this blog to stay up to date with the latest from the Azure
Automation team.
345 | P a g e
70-534 Architecting Microsoft Azure Solutions
Automation Forum allows you to post questions about Azure Automation to be addressed by Microsoft and
the Automation community.
Azure Automation Cmdlets provides information for automating administration tasks. It contains cmdlets to
manage Automation accounts, assets, runbooks, and DSC configurations.
3. Create a new runbook by clicking the Add a runbook button and then Create a new runbook.
4. Give the runbook the name MyFirstRunbook-PowerShell.
5. In this case, we're going to create a PowerShell runbook, so select PowerShell for the Runbook type.
6. Click Create to create the runbook and open the textual editor.
Step 2 - Add code to the runbook
You can either type code directly into the runbook, or you can select cmdlets, runbooks, and assets from the Library
control and have them added to the runbook with any related parameters. For this walkthrough, we type directly in
the runbook.
1. Our runbook is currently empty. Type Write-Output "Hello World.".
2. Click Start to start the test. This should be the only enabled option.
3. A runbook job is created and its status displayed.
The job status starts as Queued indicating that it is waiting for a runbook worker in the cloud to come
available. It will then move to Starting when a worker claims the job, and then Running when the runbook
actually starts running.
4. When the runbook job completes, its output is displayed. In our case, we should see Hello World.
2. If you scroll left to view the runbook in the Runbooks pane now, it will show an Authoring Status of Published.
3. Scroll back to the right to view the pane for MyFirstRunbook-PowerShell.
The options across the top allow us to start the runbook, view the runbook, schedule it to start at some time
in the future, or create a webhook so it can be started through an HTTP call.
4. We want to start the runbook, so click Start and then click Ok when the Start Runbook blade opens.
5. A job pane is opened for the runbook job that we created. We can close this pane, but in this case we leave it
open so we can watch the job's progress.
6. The job status is shown in Job Summary and matches the statuses that we saw when we tested the runbook.
7. Once the runbook status shows Completed, click Output. The Output pane is opened, and we can see
our Hello World.
10. Close the Streams pane and the Job pane to return to the MyFirstRunbook-PowerShell pane.
11. Click Jobs to open the Jobs pane for this runbook. This lists all of the jobs created by this runbook. We should
only see one job listed since we only ran the job once.
12. You can click this job to open the same Job pane that we viewed when we started the runbook. This allows
you to go back in time and view the details of any job that was created for a particular runbook.
Step 5 - Add authentication to manage Azure resources
We've tested and published our runbook, but so far it doesn't do anything useful. We want to have it manage Azure
resources. It won't be able to do that though unless we have it authenticate using the credentials that are referred to
in the prerequisites. We do that with the Add-AzureRmAccount cmdlet.
1. Open the textual editor by clicking Edit on the MyFirstRunbook-PowerShell pane.
2. We don't need the Write-Output line anymore, so go ahead and delete it.
3. Type or copy and paste the following code that handles the authentication with your Automation Run As
account:
# Authenticate to Azure with the Automation Run As service principal
$Conn = Get-AutomationConnection -Name AzureRunAsConnection
Add-AzureRMAccount -ServicePrincipal -Tenant $Conn.TenantID `
    -ApplicationId $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint
2. Save the runbook and then click Test pane so that we can test it.
3. Click Start to start the test. Once it completes, check that the virtual machine was started.
Step 7 - Add an input parameter to the runbook
Our runbook currently starts the virtual machine that we hardcoded in the runbook, but it would be more useful if we
could specify the virtual machine when the runbook is started. We will now add input parameters to the runbook to
provide that functionality.
1. Add parameters for VMName and ResourceGroupName to the runbook and use these variables with the Start-
AzureRmVM cmdlet as in the example below.
Param(
    [string]$VMName,
    [string]$ResourceGroupName
)

# Authenticate to Azure with the Automation Run As service principal
$Conn = Get-AutomationConnection -Name AzureRunAsConnection
Add-AzureRMAccount -ServicePrincipal -Tenant $Conn.TenantID `
    -ApplicationId $Conn.ApplicationID -CertificateThumbprint $Conn.CertificateThumbprint

# Start the virtual machine specified by the input parameters
Start-AzureRmVM -Name $VMName -ResourceGroupName $ResourceGroupName
2. Save the runbook and open the Test pane. You can now provide values for the two input parameters that are
used in the test.
3. Close the Test pane.
4. Click Publish to publish the new version of the runbook.
5. Stop the virtual machine that you started in the previous step.
6. Click Start to start the runbook. Type in the VMName and ResourceGroupName for the virtual machine that
you're going to start.
7. When the runbook completes, check that the virtual machine was started.
Differences from PowerShell Workflow
PowerShell runbooks have the same lifecycle, capabilities, and management as PowerShell Workflow runbooks but
there are some differences and limitations:
1. PowerShell runbooks run faster than PowerShell Workflow runbooks because they don't have a compilation
step.
2. PowerShell Workflow runbooks support checkpoints. Using checkpoints, a PowerShell Workflow runbook can
resume from any point in the runbook, whereas a PowerShell runbook can only resume from the beginning.
3. PowerShell Workflow runbooks support parallel and serial execution, whereas PowerShell runbooks can only
execute commands serially.
4. In a PowerShell Workflow runbook, an activity, a command, or a script block can have its own runspace,
whereas in a PowerShell runbook everything in a script runs in a single runspace. There are also
some syntactic differences between a native PowerShell runbook and a PowerShell Workflow runbook.
Azure Automation DSC brings the same management layer to PowerShell Desired State Configuration as Azure
Automation offers for PowerShell scripting.
From the Azure portal, or from PowerShell, you can manage all your DSC configurations, resources, and target nodes.
Application design
Avoid any single point of failure. All components, services, resources, and compute instances should be
deployed as multiple instances to prevent a single point of failure from affecting availability. This includes
authentication mechanisms. Design the application to be configurable to use multiple instances, and to
automatically detect failures and redirect requests to non-failed instances where the platform does not do this
automatically.
Decompose workloads by service-level agreement. If a service is composed of critical and less-critical
workloads, manage them differently and specify the service features and number of instances needed to meet
their availability requirements.
Minimize and understand service dependencies. Minimize the number of different services used where
possible, and ensure you understand all of the feature and service dependencies that exist in the system. This
includes the nature of these dependencies, and the impact of failure or reduced performance in each one on the
overall application. Microsoft guarantees at least 99.9 percent availability for most services, but this means that
every additional service an application relies on potentially reduces the overall availability SLA of your system
by 0.1 percent.
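To see why each dependency matters, note that availability figures multiply: an application is only up when all of its serially required services are up. The sketch below (illustrative SLA numbers, hypothetical `composite_sla` helper) shows the arithmetic:

```python
# Sketch: composite availability of an application whose request path
# depends on several services, each with its own SLA. Availabilities
# multiply, so every additional dependency lowers the overall figure.

def composite_sla(*service_slas):
    """Return the combined availability of serially dependent services."""
    total = 1.0
    for sla in service_slas:
        total *= sla
    return total

# Three dependencies at 99.9% each:
overall = composite_sla(0.999, 0.999, 0.999)
print(round(overall * 100, 3))  # → 99.7
```

Three 99.9 percent services already drop the composite figure to roughly 99.7 percent, which is why minimizing dependencies is an availability decision, not just a design preference.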
Design tasks and messages to be idempotent (safely repeatable) where possible, so that duplicated
requests will not cause problems. For example, a service can act as a consumer that handles messages sent as
requests by other parts of the system that act as producers. If the consumer fails after processing the message,
but before acknowledging that it has been processed, a producer might submit a repeat request which could be
handled by another instance of the consumer. For this reason, consumers and the operations they carry out
should be idempotent so that repeating a previously executed operation does not render the results invalid. This
may mean detecting duplicated messages, or ensuring consistency by using an optimistic approach to handling
conflicts.
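The consumer behavior described above can be sketched as follows. The in-memory set and the `handle_deposit` helper are illustrative stand-ins for a durable message-ID store and a real operation:

```python
# Sketch: an idempotent message consumer that records processed message
# IDs so that a redelivered (duplicate) message has no additional effect.

processed_ids = set()
account_balance = {"total": 0}

def handle_deposit(message_id: str, amount: int) -> bool:
    """Apply a deposit exactly once; return True if it was applied."""
    if message_id in processed_ids:
        return False          # duplicate delivery: safely ignored
    account_balance["total"] += amount
    processed_ids.add(message_id)
    return True

handle_deposit("msg-1", 50)   # applied
handle_deposit("msg-1", 50)   # duplicate redelivery: no effect
print(account_balance["total"])  # → 50
```

In a real system the ID check and the state change would need to be committed atomically (or the operation made naturally idempotent), otherwise a crash between the two steps reintroduces the problem.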
Use a message broker that implements high availability for critical transactions. Many scenarios for
initiating tasks or accessing remote services use messaging to pass instructions between the application and the
target service. For best performance, the application should be able to send the message and then return to
process more requests, without needing to wait for a reply. To guarantee delivery of messages, the messaging
system should provide high availability. Azure Service Bus message queues implement at-least-once semantics.
This means that each message posted to a queue will not be lost, although duplicate copies may be delivered
under certain circumstances. If message processing is idempotent (see the previous item), repeated delivery
should not be a problem.
Design applications to gracefully degrade when reaching resource limits, and take appropriate action to
minimize the impact for the user. In some cases, the load on the application may exceed the capacity of one or
more parts, causing reduced availability and failed connections. Scaling can help to alleviate this, but it may
reach a limit imposed by other factors, such as resource availability or cost. Design the application so that, in this
situation, it can automatically degrade gracefully. For example, in an ecommerce system, if the order-processing
subsystem is under strain (or has even failed completely), it can be temporarily disabled while allowing other
functionality (such as browsing the product catalog) to continue. It might be appropriate to postpone requests to
a failing subsystem, for example still enabling customers to submit orders but saving them for later processing,
when the orders subsystem is available again.
Gracefully handle rapid burst events. Most applications need to handle varying workloads over time, such as
peaks first thing in the morning in a business application or when a new product is released in an ecommerce
site. Auto-scaling can help to handle the load, but it may take some time for additional instances to come online
and handle requests. Prevent sudden and unexpected bursts of activity from overwhelming the application:
design it to queue requests to the services it uses and degrade gracefully when queues are near to full capacity.
Ensure that there is sufficient performance and capacity available under non-burst conditions to drain the
queues and handle outstanding requests. For more information, see the Queue-Based Load Leveling Pattern.
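The queue-based load leveling idea above can be sketched as follows. The tiny queue bound and the `submit` helper are illustrative only; a real system would use a durable queue service rather than an in-process one:

```python
import queue

# Sketch: queue-based load leveling. Producers enqueue requests; when the
# bounded queue is full, the caller degrades gracefully instead of
# overwhelming the downstream service.

requests = queue.Queue(maxsize=3)   # deliberately small bound for illustration

def submit(request) -> str:
    try:
        requests.put_nowait(request)
        return "queued"
    except queue.Full:
        return "rejected"           # degrade gracefully, e.g. ask the client to retry

results = [submit(f"req-{i}") for i in range(5)]
print(results)  # → ['queued', 'queued', 'queued', 'rejected', 'rejected']
```

A background worker would drain `requests` at the downstream service's sustainable rate, which is what makes burst absorption work.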
NOTE
Roles are also distributed across fault domains, each of which is reasonably independent from other fault domains in
terms of server rack, power, and cooling provision, in order to minimize the chance of a failure affecting all role
instances. This distribution occurs automatically, and you cannot control it.
Configure availability sets for Azure virtual machines. Placing two or more virtual machines in the same
availability set guarantees that these virtual machines will not be deployed to the same fault domain. To
maximize availability, you should create multiple instances of each critical virtual machine used by your system
and place these instances in the same availability set. If you are running multiple virtual machines that serve
different purposes, create an availability set for each virtual machine. Add instances of each virtual machine to
each availability set. For example, if you have created separate virtual machines to act as a web server and a
reporting server, create an availability set for the web server and another availability set for the reporting server.
Add instances of the web server virtual machine to the web server availability set, and add instances of the
reporting server virtual machine to the reporting server availability set.
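As a rough sketch of how one such availability set might be declared in an ARM template (the resource name, domain counts, and API version here are illustrative assumptions, not values from this guide):

```json
{
  "type": "Microsoft.Compute/availabilitySets",
  "apiVersion": "2017-03-30",
  "name": "webAvailabilitySet",
  "location": "[resourceGroup().location]",
  "properties": {
    "platformFaultDomainCount": 2,
    "platformUpdateDomainCount": 5
  },
  "sku": { "name": "Aligned" }
}
```

Each web server VM would then reference this set in its own resource definition so that the platform spreads the instances across fault and update domains; the reporting server VMs would reference a second, separate availability set.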
Data management
Geo-replicate data in Azure Storage. Data in Azure Storage is automatically replicated within a datacenter.
For even higher availability, use read-access geo-redundant storage (RA-GRS), which replicates your data to a
secondary region and provides read-only access to the data in the secondary location. The data is durable even
in the case of a complete regional outage or a disaster. For more information, see Azure Storage replication.
Geo-replicate databases. Azure SQL Database and Cosmos DB both support geo-replication, which enables
you to configure secondary database replicas in other regions. Secondary databases are available for querying
and for failover in the case of a data center outage or the inability to connect to the primary database. For more
information, see Failover groups and active geo-replication (SQL Database) and How to distribute data globally
with Azure Cosmos DB?.
Use optimistic concurrency and eventual consistency where possible. Transactions that block access to
resources through locking (pessimistic concurrency) can cause poor performance and considerably reduce
availability. These problems can become especially acute in distributed systems. In many cases, careful design
and techniques such as partitioning can minimize the chances of conflicting updates occurring. Where data is
replicated, or is read from a separately updated store, the data will only be eventually consistent. However, the
advantages of eventual consistency usually far outweigh the impact on availability that comes from using
transactions to ensure immediate consistency.
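A minimal sketch of the optimistic approach, using a plain version number in place of a database ETag (the store layout and helper names are hypothetical):

```python
# Sketch: optimistic concurrency with a version check instead of locking.
# An update succeeds only if the record has not changed since it was read;
# on a conflict the caller re-reads and retries rather than holding a lock.

store = {"value": 10, "version": 1}

def read():
    return store["value"], store["version"]

def try_update(new_value, expected_version) -> bool:
    if store["version"] != expected_version:
        return False                # another writer got there first: conflict
    store["value"] = new_value
    store["version"] += 1
    return True

value, version = read()
try_update(value + 5, version)      # succeeds: version still matches
ok = try_update(99, version)        # stale version: rejected, caller retries
print(store["value"], ok)  # → 15 False
```

Because no reader or writer ever blocks waiting on a lock, a slow or failed client cannot reduce the availability of the store for everyone else.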
Use periodic backup and point-in-time restore, and ensure it meets the Recovery Point Objective (RPO).
Regularly and automatically back up data that is not preserved elsewhere, and verify you can reliably restore
both the data and the application itself should a failure occur. Data replication is not a backup feature because
errors and inconsistencies introduced through failure, error, or malicious operations will be replicated across all
stores. The backup process must be secure to protect the data in transit and in storage. Databases or parts of a
data store can usually be recovered to a previous point in time by using transaction logs. Microsoft Azure
provides a backup facility for data stored in Azure SQL Database. The data is exported to a backup package on
Azure blob storage, and can be downloaded to a secure on-premises location for storage.
Enable the high availability option to maintain a secondary copy of an Azure Redis cache. When using
Azure Redis Cache, choose the standard option to maintain a secondary copy of the contents. For more
information, see Create a cache in Azure Redis Cache.
DevOps is the integration of development, quality assurance, and IT operations into a unified culture and set of
processes for delivering software.
Use this checklist as a starting point to assess your DevOps culture and process.
Culture
Ensure business alignment across organizations and teams. Conflicts over resources, purpose, goals, and
priorities within an organization can be a risk to successful operations. Ensure that the business, development, and
operations teams are all aligned.
Ensure the entire team understands the software lifecycle. Your team needs to understand the overall
lifecycle of the application, and which part of the lifecycle the application is currently in. This helps all team
members know what they should be doing now, and what they should be planning and preparing for in the future.
Reduce cycle time. Aim to minimize the time it takes to move from ideas to usable developed software. Limit the
size and scope of individual releases to keep the test burden low. Automate the build, test, configuration, and
deployment processes whenever possible. Clear any obstacles to communication among developers, and between
developers and operations.
Review and improve processes. Your processes and procedures, both automated and manual, are never final. Set
up regular reviews of current workflows, procedures, and documentation, with a goal of continual improvement.
Do proactive planning. Proactively plan for failure. Have processes in place to quickly identify issues when they
occur, escalate to the correct team members to fix, and confirm resolution.
Learn from failures. Failures are inevitable, but it's important to learn from failures to avoid repeating them. If an
operational failure occurs, triage the issue, document the cause and solution, and share any lessons that were
learned. Whenever possible, update your build processes to automatically detect that kind of failure in the future.
Optimize for speed and collect data. Every planned improvement is a hypothesis. Work in the smallest
increments possible. Treat new ideas as experiments. Instrument the experiments so that you can collect production
data to assess their effectiveness. Be prepared to fail fast if the hypothesis is wrong.
Allow time for learning. Both failures and successes provide good opportunities for learning. Before moving on
to new projects, allow enough time to gather the important lessons, and make sure those lessons are absorbed by
your team. Also give the team the time to build skills, experiment, and learn about new tools and techniques.
Document operations. Document all tools, processes, and automated tasks with the same level of quality as your
product code. Document the current design and architecture of any systems you support, along with recovery
processes and other maintenance procedures. Focus on the steps you actually perform, not theoretically optimal
processes. Regularly review and update the documentation. For code, make sure that meaningful comments are
included, especially in public APIs, and use tools to automatically generate code documentation whenever possible.
Share knowledge. Documentation is only useful if people know that it exists and can find it. Ensure the
documentation is organized and easily discoverable. Be creative: Use brown bags (informal presentations), videos,
or newsletters to share knowledge.
Development
Provide developers with production-like environments. If development and test environments don't match
the production environment, it is hard to test and diagnose problems. Therefore, keep development and test
environments as close to the production environment as possible. Make sure that test data is consistent with the
data used in production, even if it's sample data and not real production data (for privacy or compliance reasons).
Plan to generate and anonymize sample test data.
Ensure that all authorized team members can provision infrastructure and deploy the application. Setting
up production-like resources and deploying the application should not involve complicated manual tasks or
detailed technical knowledge of the system. Anyone with the right permissions should be able to create or deploy
production-like resources without going to the operations team.
This recommendation doesn't imply that anyone can push live updates to the production deployment. It's about
reducing friction for the development and QA teams to create production-like environments.
Instrument the application for insight. To understand the health of your application, you need to know how it's
performing and whether it's experiencing any errors or problems. Always include instrumentation as a design
requirement, and build the instrumentation into the application from the start. Instrumentation must include event
logging for root cause analysis, but also telemetry and metrics to monitor the overall health and usage of the
application.
Track your technical debt. In many projects, release schedules can get prioritized over code quality to one degree
or another. Always keep track when this occurs. Document any shortcuts or other nonoptimal implementations,
and schedule time in the future to revisit these issues.
Consider pushing updates directly to production. To reduce the overall release cycle time, consider pushing
properly tested code commits directly to production. Use feature toggles to control which features are enabled. This
allows you to move from development to release quickly, using the toggles to enable or disable features. Toggles
are also useful when performing tests such as canary releases, where a particular feature is deployed to a subset of
the production environment.
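A feature toggle with deterministic canary bucketing might be sketched like this; the toggle names and percentages are made up for illustration:

```python
import hashlib

# Sketch: a minimal feature-toggle check. Toggles are read from
# configuration, so a feature shipped to production can stay dark until
# switched on, or be enabled for only a percentage of users (a canary).

toggles = {"new_checkout": {"enabled": True, "percent": 20}}

def is_enabled(feature: str, user_id: str) -> bool:
    toggle = toggles.get(feature)
    if not toggle or not toggle["enabled"]:
        return False
    # Deterministic bucketing: the same user always gets the same answer.
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < toggle["percent"]

print(is_enabled("new_checkout", "user-42"))
print(is_enabled("missing_feature", "user-42"))  # → False
```

Hash-based bucketing matters here: a user who sees the new checkout keeps seeing it on every request, which avoids the inconsistent experience that random sampling would produce.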
Testing
Automate testing. Manually testing software is tedious and susceptible to error. Automate common testing tasks
and integrate the tests into your build processes. Automated testing ensures consistent test coverage and
reproducibility. Integrated UI tests should also be performed by an automated tool. Azure offers development and
test resources that can help you configure and execute testing. For more information, see Development and test.
Test for failures. If a system can't connect to a service, how does it respond? Can it recover once the service is
available again? Make fault injection testing a standard part of review on test and staging environments. When
your test process and practices are mature, consider running these tests in production.
Test in production. The release process doesn't end with deployment to production. Have tests in place to ensure
that deployed code works as expected. For deployments that are infrequently updated, schedule production testing
as a regular part of maintenance.
Automate performance testing to identify performance issues early. The impact of a serious performance
issue can be just as severe as a bug in the code. While automated functional tests can prevent application bugs,
they might not detect performance problems. Define acceptable performance goals for metrics like latency, load
times, and resource usage. Include automated performance tests in your release pipeline, to make sure the
application meets those goals.
Perform capacity testing. An application might work fine under test conditions, and then have problems in
production due to scale or resource limitations. Always define the maximum expected capacity and usage limits.
Test to make sure the application can handle those limits, but also test what happens when those limits are
exceeded. Capacity testing should be performed at regular intervals.
After the initial release, you should run performance and capacity tests whenever updates are made to production
code. Use historical data to fine tune tests and to determine what types of tests need to be performed.
Perform automated security penetration testing. Ensuring your application is secure is as important as testing
any other functionality. Make automated penetration testing a standard part of the build and deployment process.
Schedule regular security tests and vulnerability scanning on deployed applications, monitoring for open ports,
endpoints, and attacks. Automated testing does not remove the need for in-depth security reviews at regular
intervals.
Perform automated business continuity testing. Develop tests for large scale business continuity, including
backup recovery and failover. Set up automated processes to perform these tests regularly.
Release
Automate deployments. Automate deploying the application to test, staging, and production environments.
Automation enables faster and more reliable deployments, and ensures consistent deployments to any supported
environment. It removes the risk of human error caused by manual deployments. It also makes it easy to schedule
releases for convenient times, to minimize any effects of potential downtime.
Use continuous integration. Continuous integration (CI) is the practice of merging all developer code into a
central codebase on a regular schedule, and then automatically performing standard build and test processes. CI
ensures that an entire team can work on a codebase at the same time without having conflicts. It also ensures that
code defects are found as early as possible. Preferably, the CI process should run every time that code is committed
or checked in. At the very least, it should run once per day.
Consider adopting a trunk based development model. In this model, developers commit to a single branch (the
trunk). There is a requirement that commits never break the build. This model facilitates CI, because all feature
work is done in the trunk, and any merge conflicts are resolved when the commit happens.
Consider using continuous delivery. Continuous delivery (CD) is the practice of ensuring that code is always
ready to deploy, by automatically building, testing, and deploying code to production-like environments. Adding
continuous delivery to create a full CI/CD pipeline will help you detect code defects as soon as possible, and
ensures that properly tested updates can be released in a very short time.
Continuous deployment is an additional process that automatically takes any updates that have passed through
the CI/CD pipeline and deploys them into production. Continuous deployment requires robust automatic
testing and advanced process planning, and may not be appropriate for all teams.
Make small incremental changes. Large code changes have a greater potential to introduce bugs. Whenever
possible, keep changes small. This limits the potential effects of each change, and makes it easier to understand and
debug any issues.
Control exposure to changes. Make sure you're in control of when updates are visible to your end users.
Consider using feature toggles to control when features are enabled for end users.
Implement release management strategies to reduce deployment risk. Deploying an application update to
production always entails some risk. To minimize this risk, use strategies such as canary releases or blue-green
deployments to deploy updates to a subset of users. Confirm the update works as expected, and then roll the
update out to the rest of the system.
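At its simplest, a blue-green cutover is a single pointer swap at the routing layer. The environment contents below are placeholders for two identically provisioned deployments:

```python
# Sketch: blue-green deployment as a router-level switch. Two identical
# environments exist; traffic points at one ("live") while the other
# receives the new version. Cutover is one pointer swap, and rollback
# is swapping back.

environments = {"blue": "app v1", "green": "app v2 (new)"}
live = "blue"

def handle_request():
    return environments[live]

assert handle_request() == "app v1"
live = "green"                      # cut over once green passes its checks
print(handle_request())  # → app v2 (new)
```

In Azure this pointer is typically a traffic-routing construct (for example, a load balancer backend pool or deployment slot swap) rather than a variable, but the risk profile is the same: the old environment stays intact until you are confident in the new one.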
Document all changes. Minor updates and configuration changes can be a source of confusion and versioning
conflict. Always keep a clear record of any changes, no matter how small. Log everything that changes, including
patches applied, policy changes, and configuration changes. (Don't include sensitive data in these logs. For example,
log that a credential was updated, and who made the change, but don't record the updated credentials.) The record
of the changes should be visible to the entire team.
Automate Deployments. Automate all deployments, and have systems in place to detect any problems during
rollout. Have a mitigation process for preserving the existing code and data in production, before the update
replaces them in all production instances. Have an automated way to roll forward fixes or roll back changes.
Consider making infrastructure immutable. Immutable infrastructure is the principle that you shouldn't modify
infrastructure after it's deployed to production. Otherwise, you can get into a state where ad hoc changes have
been applied, making it hard to know exactly what changed. Immutable infrastructure works by replacing entire
servers as part of any new deployment. This allows the code and the hosting environment to be tested and
deployed as a block. Once deployed, infrastructure components aren't modified until the next build and deploy
cycle.
Monitoring
Make systems observable. The operations team should always have clear visibility into the health and status of a
system or service. Set up external health endpoints to monitor status, and ensure that applications are coded to
instrument the operations metrics. Use a common and consistent schema that lets you correlate events across
systems. Azure Diagnostics and Application Insights are the standard methods of tracking the health and status of
Azure resources. Microsoft Operations Management Suite also provides centralized monitoring and management
for cloud or hybrid solutions.
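An external health endpoint of the kind described above might look like the following sketch, using only the Python standard library. A production check would also probe downstream dependencies (database, queue, cache) rather than always reporting ok:

```python
import http.server
import json
import threading

# Sketch: a minimal health endpoint that an external monitor can probe.

class HealthHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # keep the example quiet

# Port 0 asks the OS for any free port; a real service uses a fixed one.
server = http.server.HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
```

A monitoring tool then polls `GET /health` on a schedule and alerts when the status code or body deviates from the expected healthy response.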
Aggregate and correlate logs and metrics. A properly instrumented telemetry system will provide a large
amount of raw performance data and event logs. Make sure that telemetry and log data is processed and correlated
in a short period of time, so that operations staff always have an up-to-date picture of system health. Organize and
display data in ways that give a cohesive view of any issues, so that whenever possible it's clear when events are
related to one another.
Consult your corporate retention policy for requirements on how data is processed and how long it should be
stored.
Implement automated alerts and notifications. Set up monitoring tools like Azure Monitor to detect patterns
or conditions that indicate potential or current issues, and send alerts to the team members who can address the
issues. Tune the alerts to avoid false positives.
Monitor assets and resources for expirations. Some resources and assets, such as certificates, expire after a
given amount of time. Make sure to track which assets expire, when they expire, and what services or features
depend on them. Use automated processes to monitor these assets. Notify the operations team before an asset
expires, and escalate if expiration threatens to disrupt the application.
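As a sketch of such an automated check, the standard-library `ssl` module can parse a certificate's `notAfter` timestamp; the asset inventory and names below are hypothetical, not part of any Azure API:

```python
import ssl
from datetime import datetime, timezone

def days_until_expiry(not_after, now):
    """Days remaining before a certificate's notAfter timestamp (interpreted as UTC)."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after),
                                     tz=timezone.utc)
    return (expires - now).days

def expiring_soon(tracked_certs, warn_days, now):
    """Return the names of tracked certificates that expire within warn_days."""
    return [name for name, not_after in tracked_certs.items()
            if days_until_expiry(not_after, now) <= warn_days]

# Hypothetical asset inventory: certificate name -> its notAfter field.
tracked = {"web-tls": "Jun 15 00:00:00 2017 GMT",
           "api-tls": "Dec 31 00:00:00 2017 GMT"}
now = datetime(2017, 6, 1, tzinfo=timezone.utc)
alerts = expiring_soon(tracked, warn_days=30, now=now)  # ["web-tls"]
```

A real monitor would pull `notAfter` from live endpoints (for example via `ssl.SSLSocket.getpeercert`) and feed `alerts` into the notification system.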
Management
Automate operations tasks. Manually handling repetitive operations processes is error-prone. Automate these
tasks whenever possible to ensure consistent execution and quality. Code that implements the automation should
be versioned in source control. As with any other code, automation tools must be tested.
Take an infrastructure-as-code approach to provisioning. Minimize the amount of manual configuration
needed to provision resources. Instead, use scripts and Azure Resource Manager templates. Keep the scripts and
templates in source control, like any other code you maintain.
Consider using containers. Containers provide a standard package-based interface for deploying applications.
Using containers, an application is deployed using self-contained packages that include any software,
dependencies, and files needed to run the application, which greatly simplifies the deployment process.
Containers also create an abstraction layer between the application and the underlying operating system, which
provides consistency across environments. This abstraction can also isolate a container from other processes or
applications running on a host.
Implement resiliency and self-healing. Resiliency is the ability of an application to recover from failures.
Strategies for resiliency include retrying transient failures, and failing over to a secondary instance or even another
region. For more information, see Designing resilient applications for Azure. Instrument your applications so that
issues are reported immediately and you can manage outages or other system failures.
Have an operations manual. An operations manual or runbook documents the procedures and management
information needed for operations staff to maintain a system. Also document any operations scenarios and
mitigation plans that might come into play during a failure or other disruption to your service. Create this
documentation during the development process, and keep it up to date afterwards. This is a living document, and
should be reviewed, tested, and improved regularly.
Shared documentation is critical. Encourage team members to contribute and share knowledge. The entire team
should have access to documents. Make it easy for anyone on the team to help keep documents updated.
Document on-call procedures. Make sure on-call duties, schedules, and procedures are documented and shared
to all team members. Keep this information up-to-date at all times.
Document escalation procedures for third-party dependencies. If your application depends on external third-
party services that you don't directly control, you must have a plan to deal with outages. Create documentation for
your planned mitigation processes. Include support contacts and escalation paths.
Use configuration management. Configuration changes should be planned, visible to operations, and recorded.
This could take the form of a configuration management database, or a configuration-as-code approach.
Configuration should be audited regularly to ensure that what's expected is actually in place.
Get an Azure support plan and understand the process. Azure offers a number of support plans. Determine
the right plan for your needs, and make sure the entire team knows how to use it. Team members should
understand the details of the plan, how the support process works, and how to open a support ticket with Azure. If
you are anticipating a high-scale event, Azure support can assist you with increasing your service limits. For more
information, see the Azure Support FAQs.
Follow least-privilege principles when granting access to resources. Carefully manage access to resources.
Access should be denied by default, unless a user is explicitly given access to a resource. Only grant a user access to
what they need to complete their tasks. Track user permissions and perform regular security audits.
Use role-based access control. Assigning user accounts and access to resources should not be a manual process.
Use Role-Based Access Control (RBAC) to grant access based on Azure Active Directory identities and groups.
Use a bug tracking system to track issues. Without a good way to track issues, it's easy to miss items, duplicate
work, or introduce additional problems. Don't rely on informal person-to-person communication to track the status
of bugs. Use a bug tracking tool to record details about problems, assign resources to address them, and provide
an audit trail of progress and status.
Manage all resources in a change management system. All aspects of your DevOps process should be
included in a management and versioning system, so that changes can be easily tracked and audited. This includes
code, infrastructure, configuration, documentation, and scripts. Treat all these types of resources as code
throughout the test/build/review process.
Use checklists. Create operations checklists to ensure processes are followed. It's common to miss something in a
large manual, and following a checklist can force attention to details that might otherwise be overlooked. Maintain
the checklists, and continually look for ways to automate tasks and streamline processes.
For more about DevOps, see What is DevOps? on the Visual Studio site.
Resiliency checklist
6/23/2017
Designing your application for resiliency requires planning for and mitigating a variety of failure modes that could
occur. Review the items in this checklist against your application design to improve its resiliency.
Requirements
Define your customer's availability requirements. Your customer will have availability requirements for the
components in your application, and this will affect your application's design. Get agreement from your
customer for the availability targets of each piece of your application, otherwise your design may not meet the
customer's expectations. For more information, see Defining your resiliency requirements.
Application Design
Perform a failure mode analysis (FMA) for your application. FMA is a process for building resiliency
into an application early in the design stage. For more information, see Failure mode analysis. The goals of
an FMA include:
Identify what types of failures an application might experience.
Capture the potential effects and impact of each type of failure on the application.
Identify recovery strategies.
Deploy multiple instances of services. If your application depends on a single instance of a service, it
creates a single point of failure. Provisioning multiple instances improves both resiliency and scalability. For
Azure App Service, select an App Service Plan that offers multiple instances. For Azure Cloud Services,
configure each of your roles to use multiple instances. For Azure Virtual Machines (VMs), ensure that your
VM architecture includes more than one VM and that each VM is included in an availability set.
Use autoscaling to respond to increases in load. If your application is not configured to scale out
automatically as load increases, it's possible that your application's services will fail if they become saturated
with user requests. For more details, see the following:
General: Scalability checklist
Azure App Service: Scale instance count manually or automatically
Cloud Services: How to auto scale a cloud service
Virtual Machines: Automatic scaling and virtual machine scale sets
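As a toy sketch of the scale-out decision that these autoscaling services automate, the thresholds and instance bounds below are purely illustrative:

```python
def desired_instance_count(current, cpu_percent, min_n=2, max_n=10):
    """Scale out when saturated, scale in when idle; clamp to the allowed range.

    Keeping min_n at 2 or more also preserves the multiple-instance
    resiliency guidance above; the 75/25 thresholds are illustrative.
    """
    if cpu_percent > 75:
        current += 1
    elif cpu_percent < 25:
        current -= 1
    return max(min_n, min(max_n, current))
```

In practice the Azure platform evaluates rules like this against aggregated metrics over a time window, rather than a single instantaneous reading, to avoid flapping.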
Use load balancing to distribute requests. Load balancing distributes your application's requests to healthy
service instances by removing unhealthy instances from rotation. If your service uses Azure App Service or
Azure Cloud Services, it is already load balanced for you. However, if your application uses Azure VMs, you will
need to provision a load balancer. See the Azure Load Balancer overview for more details.
Configure Azure Application Gateways to use multiple instances. Depending on your application's
requirements, an Azure Application Gateway may be better suited to distributing requests to your application's
services. However, single instances of the Application Gateway service are not guaranteed by an SLA, so it's
possible that your application could fail if the Application Gateway instance fails. Provision more than one
medium or larger Application Gateway instance to guarantee availability of the service under the terms of the
SLA.
Use Availability Sets for each application tier. Placing your instances in an availability set provides a higher
SLA.
Consider deploying your application across multiple regions. If your application is deployed to a single
region, in the rare event the entire region becomes unavailable, your application will also be unavailable. This
may be unacceptable under the terms of your application's SLA. If so, consider deploying your application and
its services across multiple regions. A multi-region deployment can use an active-active pattern (distributing
requests across multiple active instances) or an active-passive pattern (keeping a "warm" instance in reserve, in
case the primary instance fails). We recommend that you deploy multiple instances of your application's
services across regional pairs. For more information, see Business continuity and disaster recovery (BCDR):
Azure Paired Regions.
Use Azure Traffic Manager to route your application's traffic to different regions. Azure Traffic Manager
performs load balancing at the DNS level and will route traffic to different regions based on the traffic routing
method you specify and the health of your application's endpoints. Without Traffic Manager, you are limited to a
single region for your deployment, which limits scale, increases latency for some users, and causes application
downtime in the case of a region-wide service disruption.
Configure and test health probes for your load balancers and traffic managers. Ensure that your health
logic checks the critical parts of the system and responds appropriately to health probes.
The health probes for Azure Traffic Manager and Azure Load Balancer serve a specific function. For Traffic
Manager, the health probe determines whether to fail over to another region. For a load balancer, it
determines whether to remove a VM from rotation.
For a Traffic Manager probe, your health endpoint should check any critical dependencies that are
deployed within the same region, and whose failure should trigger a failover to another region.
For a load balancer, the health endpoint should report the health of the VM. Don't include other tiers or
external services. Otherwise, a failure that occurs outside the VM will cause the load balancer to remove
the VM from rotation.
For guidance on implementing health monitoring in your application, see Health Endpoint Monitoring
Pattern.
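A minimal sketch of the two probe scopes described above, using Python's standard-library HTTP server; the individual check functions are hypothetical placeholders for your real dependency checks:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

def local_checks():
    """Checks scoped to this VM only (suitable for a load balancer probe)."""
    return {"disk_writable": True, "worker_process": True}  # placeholder results

def regional_checks():
    """Adds same-region dependencies (suitable for a Traffic Manager probe)."""
    checks = dict(local_checks())
    checks["regional_database"] = True  # placeholder result
    return checks

def health_status(checks):
    """200 if every check passed, 503 otherwise."""
    return 200 if all(checks.values()) else 503

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # /health/region for Traffic Manager; any other path for the load balancer
        checks = regional_checks() if self.path == "/health/region" else local_checks()
        self.send_response(health_status(checks))
        self.end_headers()

# To serve: HTTPServer(("", 8080), HealthHandler).serve_forever()
```

The key point is the separation: a failing regional database takes the region out of Traffic Manager rotation, but does not cause the load balancer to remove healthy VMs.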
Monitor third-party services. If your application has dependencies on third-party services, identify where and
how these third-party services can fail and what effect those failures will have on your application. A third-party
service may not include monitoring and diagnostics, so it's important to log your invocations of them and
correlate them with your application's health and diagnostic logging using a unique identifier. For more
information on proven practices for monitoring and diagnostics, see Monitoring and Diagnostics guidance.
Ensure that any third-party service you consume provides an SLA. If your application depends on a third-
party service, but the third party provides no guarantee of availability in the form of an SLA, your application's
availability also cannot be guaranteed. Your SLA is only as good as the least available component of your
application.
Implement resiliency patterns for remote operations where appropriate. If your application depends on
communication between remote services, follow design patterns for dealing with transient failures, such as
Retry Pattern, and Circuit Breaker Pattern. For more information, see Resiliency strategies.
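As a sketch of the Retry pattern (the exception type and delays here are illustrative, not a real SDK API), retries should back off exponentially and add jitter so that many clients don't retry in lockstep:

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a transient fault, such as a throttled or timed-out request."""

def with_retry(operation, max_attempts=4, base_delay=0.5):
    """Retry an operation on transient failures, with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # attempts exhausted; let the caller (or a circuit breaker) decide
            # exponential backoff plus jitter, so concurrent retries don't synchronize
            time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
```

A Circuit Breaker would wrap `with_retry`, tracking consecutive failures and failing fast once a threshold is crossed, so a struggling service isn't hammered with retries.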
Implement asynchronous operations whenever possible. Synchronous operations can monopolize
resources and block other operations while the caller waits for the process to complete. Design each part of your
application to allow for asynchronous operations whenever possible. For more information on how to
implement asynchronous programming in C#, see Asynchronous Programming with async and await.
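The text points to C#'s async and await; the same idea in Python, with a hypothetical `call_service` standing in for a remote call:

```python
import asyncio

async def call_service(name, delay):
    """Hypothetical remote call; the await yields control instead of blocking a thread."""
    await asyncio.sleep(delay)  # simulates I/O latency
    return f"{name}: ok"

async def handle_request():
    # Both calls run concurrently; total wait is roughly max(delays), not their sum.
    return await asyncio.gather(call_service("orders", 0.05),
                                call_service("inventory", 0.05))

results = asyncio.run(handle_request())  # ["orders: ok", "inventory: ok"]
```

While one call awaits I/O, the event loop is free to service other requests, which is exactly the resource-monopolization problem the synchronous version has.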
Data management
Understand the replication methods for your application's data sources. Your application data will be
stored in different data sources and have different availability requirements. Evaluate the replication methods
for each type of data storage in Azure, including Azure Storage Replication and SQL Database Active Geo-
Replication to ensure that your application's data requirements are satisfied.
Ensure that no single user account has access to both production and backup data. Your data backups
are compromised if one single user account has permission to write to both production and backup sources. A
malicious user could purposely delete all your data, while a regular user could accidentally delete it. Design your
application to limit the permissions of each user account so that only the users that require write access have
write access and it's only to either production or backup, but not both.
Document your data source failover and failback process and test it. In the case where your data source
fails catastrophically, a human operator will have to follow a set of documented instructions to fail over to a new
data source. If the documented steps have errors, an operator will not be able to successfully follow them and
fail over the resource. Regularly test the instruction steps to verify that an operator following them is able to
successfully fail over and fail back the data source.
Validate your data backups. Regularly verify that your backup data is what you expect by running a script to
validate data integrity, schema, and queries. There's no point having a backup if it's not useful to restore your
data sources. Log and report any inconsistencies so the backup service can be repaired.
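One way such a validation script might compare production and backup data, using a row count plus an order-insensitive content fingerprint (the row tuples are illustrative; real checks would also validate schema and run known queries):

```python
import hashlib

def table_fingerprint(rows):
    """Order-insensitive digest of a table's rows."""
    digest = hashlib.sha256()
    for row in sorted(repr(r) for r in rows):
        digest.update(row.encode())
    return digest.hexdigest()

def validate_backup(production_rows, backup_rows):
    """Return a list of inconsistencies to log and report."""
    issues = []
    if len(production_rows) != len(backup_rows):
        issues.append("row count mismatch")
    elif table_fingerprint(production_rows) != table_fingerprint(backup_rows):
        issues.append("content mismatch")
    return issues
```

An empty result means the backup matches; any non-empty result should be logged and reported so the backup service can be repaired.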
Consider using a storage account type that is geo-redundant. Data stored in an Azure Storage account
is always replicated locally. However, there are multiple replication strategies to choose from when a Storage
Account is provisioned. Select Azure Read-Access Geo Redundant Storage (RA-GRS) to protect your
application data against the rare case when an entire region becomes unavailable.
NOTE
For VMs, do not rely on RA-GRS replication to restore the VM disks (VHD files). Instead, use Azure Backup.
Security
Implement application-level protection against distributed denial of service (DDoS) attacks. Azure
services are protected against DDoS attacks at the network layer. However, Azure cannot protect against
application-layer attacks, because it is difficult to distinguish true user requests from malicious ones. For more
information on how to protect against application-layer DDoS attacks, see the "Protecting
against DDoS" section of Microsoft Azure Network Security (PDF download).
Implement the principle of least privilege for access to the application's resources. The default for
access to the application's resources should be as restrictive as possible. Grant higher level permissions on an
approval basis. Granting overly permissive access to your application's resources by default can result in
someone purposely or accidentally deleting resources. Azure provides role-based access control to manage user
privileges, but it's important to verify least privilege permissions for other resources that have their own
permissions systems such as SQL Server.
Testing
Perform failover and failback testing for your application. If you haven't fully tested failover and failback,
you can't be certain that the dependent services in your application come back up in a synchronized manner
during disaster recovery. Ensure that your application's dependent services fail over and fail back in the correct
order.
Perform fault-injection testing for your application. Your application can fail for many different reasons,
such as certificate expiration, exhaustion of system resources in a VM, or storage failures. Test your application
in an environment as close as possible to production, by simulating or triggering real failures. For example,
delete certificates, artificially consume system resources, or delete a storage source. Verify your application's
ability to recover from all types of faults, alone and in combination. Check that failures are not propagating or
cascading through your system.
Run tests in production using both synthetic and real user data. Test and production are rarely identical,
so it's important to use blue/green or a canary deployment and test your application in production. This allows
you to test your application in production under real load and ensure it will function as expected when fully
deployed.
Deployment
Document the release process for your application. Without detailed release process documentation, an
operator might deploy a bad update or improperly configure settings for your application. Clearly define and
document your release process, and ensure that it's available to the entire operations team.
Automate your application's deployment process. If your operations staff is required to manually deploy
your application, human error can cause the deployment to fail.
Design your release process to maximize application availability. If your release process requires services
to go offline during deployment, your application will be unavailable until they come back online. Use the
blue/green or canary release deployment technique to deploy your application to production. Both of these
techniques involve deploying your release code alongside production code so users of release code can be
redirected to production code in the event of a failure.
Log and audit your application's deployments. If you use staged deployment techniques such as
blue/green or canary releases there will be more than one version of your application running in production. If a
problem should occur, it's critical to determine which version of your application is causing a problem.
Implement a robust logging strategy to capture as much version-specific information as possible.
Have a rollback plan for deployment. It's possible that your application deployment could fail and cause
your application to become unavailable. Design a rollback process to go back to a last known good version and
minimize downtime.
Operations
Implement best practices for monitoring and alerting in your application. Without proper monitoring,
diagnostics, and alerting, there is no way to detect failures in your application and alert an operator to fix them.
For more information, see Monitoring and Diagnostics guidance.
Measure remote call statistics and make the information available to the application team. If you don't
track and report remote call statistics in real time and provide an easy way to review this information, the
operations team will not have an instantaneous view into the health of your application. And if you only
measure average remote call time, you will not have enough information to reveal issues in the services.
Summarize remote call metrics such as latency, throughput, and errors at the 99th and 95th percentiles. Perform
statistical analysis on the metrics to uncover errors that occur within each percentile.
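As a sketch of why averages hide problems, a simple nearest-rank percentile over an illustrative latency sample shows the tail that the mean conceals:

```python
def percentile(values, pct):
    """Nearest-rank percentile of a sample (enough to summarize latencies)."""
    ordered = sorted(values)
    rank = max(0, round(pct / 100 * len(ordered)) - 1)
    return ordered[rank]

# Illustrative remote-call latencies in milliseconds.
latencies_ms = [12, 15, 14, 13, 250, 16, 14, 13, 15, 900]
mean = sum(latencies_ms) / len(latencies_ms)  # 126.2, which looks tolerable
p95 = percentile(latencies_ms, 95)            # 900, which reveals the tail
```

Production telemetry systems compute percentiles over large rolling windows rather than ten samples, but the principle is the same: report p95 and p99 alongside the mean.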
Track the number of transient exceptions and retries over an appropriate timeframe. If you don't track
and monitor transient exceptions and retry attempts over time, it's possible that an issue or failure could be
hidden by your application's retry logic. That is, if your monitoring and logging only shows success or failure of
an operation, the fact that the operation had to be retried multiple times due to exceptions will be hidden. A
trend of increasing exceptions over time indicates that the service is having an issue and may fail. For more
information, see Retry service specific guidance.
Implement an early warning system that alerts an operator. Identify the key performance indicators of
your application's health, such as transient exceptions and remote call latency, and set appropriate threshold
values for each of them. Send an alert to operations when the threshold value is reached. Set these thresholds at
levels that identify issues before they become critical and require a recovery response.
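A minimal sketch of such a threshold check; the KPI names and limits are hypothetical, and a real system would evaluate these against windowed metrics from your monitoring pipeline:

```python
def breached_kpis(metrics, thresholds):
    """Return the KPIs whose current value has crossed its alert threshold."""
    return [name for name, limit in thresholds.items()
            if metrics.get(name, 0) >= limit]

# Hypothetical KPIs, with thresholds set below critical levels.
thresholds = {"transient_exceptions_per_min": 20, "p99_latency_ms": 800}
metrics = {"transient_exceptions_per_min": 35, "p99_latency_ms": 420}
alerts = breached_kpis(metrics, thresholds)  # ["transient_exceptions_per_min"]
```

Each name in `alerts` would then be routed to the on-call operator through your notification channel.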
Ensure that more than one person on the team is trained to monitor the application and perform any
manual recovery steps. If you only have a single operator on the team who can monitor the application and
kick off recovery steps, that person becomes a single point of failure. Train multiple individuals on detection and
recovery and make sure there is always at least one active at any time.
Ensure that your application does not run up against Azure subscription limits. Azure subscriptions have
limits on certain resource types, such as number of resource groups, number of cores, and number of storage
accounts. If your application requirements exceed Azure subscription limits, create another Azure subscription
and provision sufficient resources there.
Ensure that your application does not run up against per-service limits. Individual Azure services have
consumption limits, for example limits on storage, throughput, number of connections, requests per second,
and other metrics. If your application attempts to use resources beyond these limits, requests will be throttled,
causing degraded performance and possible downtime for affected users. Depending on the specific service and your
application requirements, you can often avoid these limits by scaling up (for example, choosing another pricing
tier) or scaling out (adding new instances).
Design your application's storage requirements to fall within Azure storage scalability and
performance targets. Azure storage is designed to function within predefined scalability and performance
targets, so design your application to utilize storage within those targets. If you exceed these targets your
application will experience storage throttling. To fix this, provision additional Storage Accounts. If you run up
against the Storage Account limit, provision additional Azure Subscriptions and then provision additional
Storage Accounts there. For more information, see Azure Storage Scalability and Performance Targets.
Select the right VM size for your application. Measure the actual CPU, memory, disk, and I/O of your VMs in
production and verify that the VM size you've selected is sufficient. If not, your application may experience
capacity issues as the VMs approach their limits. VM sizes are described in detail in Sizes for virtual machines in
Azure.
Determine if your application's workload is stable or fluctuating over time. If your workload fluctuates
over time, use Azure VM scale sets to automatically scale the number of VM instances. Otherwise, you will have
to manually increase or decrease the number of VMs. For more information, see the Virtual Machine Scale Sets
Overview.
Select the right service tier for Azure SQL Database. If your application uses Azure SQL Database, ensure
that you have selected the appropriate service tier. If you select a tier that is not able to handle your application's
database transaction unit (DTU) requirements, your data use will be throttled. For more information on selecting
the correct service plan, see SQL Database options and performance: Understand what's available in each
service tier.
Create a process for interacting with Azure support. If the process for contacting Azure support is not set
before the need to contact support arises, downtime will be prolonged as the support process is navigated for
the first time. Include the process for contacting support and escalating issues as part of your application's
resiliency from the outset.
Ensure that your application doesn't use more than the maximum number of storage accounts per
subscription. Azure allows a maximum of 200 storage accounts per subscription. If your application requires
more storage accounts than are currently available in your subscription, you will have to create a new
subscription and create additional storage accounts there. For more information, see Azure subscription and
service limits, quotas, and constraints.
Ensure that your application doesn't exceed the scalability targets for virtual machine disks. An Azure
IaaS VM supports attaching a number of data disks depending on several factors, including the VM size and type
of storage account. If your application exceeds the scalability targets for virtual machine disks, provision
additional storage accounts and create the virtual machine disks there. For more information, see Azure Storage
Scalability and Performance Targets
Telemetry
Log telemetry data while the application is running in the production environment. Capture robust
telemetry information while the application is running in the production environment or you will not have
sufficient information to diagnose the cause of issues while it's actively serving users. For more information, see
Monitoring and Diagnostics.
Implement logging using an asynchronous pattern. If logging operations are synchronous, they might
block your application code. Ensure that your logging operations are implemented as asynchronous operations.
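Python's standard library supports this pattern directly: application threads hand records to an in-memory queue, and a listener thread performs the slow I/O. This is a minimal sketch, not a full logging configuration:

```python
import logging
import logging.handlers
import queue

log_queue = queue.Queue(-1)  # unbounded queue between app and listener

# Application threads only enqueue records, which is a cheap, non-blocking call.
logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(logging.handlers.QueueHandler(log_queue))

# A background thread drains the queue and performs the slow I/O.
listener = logging.handlers.QueueListener(log_queue, logging.StreamHandler())
listener.start()

logger.info("request handled")  # returns immediately
listener.stop()                 # drains remaining records at shutdown
```

The same shape applies in any language: decouple the emit path from the write path, and make sure shutdown flushes whatever is still queued.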
Correlate log data across service boundaries. In a typical n-tier application, a user request may traverse
several service boundaries. For example, a user request typically originates in the web tier and is passed to the
business tier and finally persisted in the data tier. In more complex scenarios, a user request may be distributed
to many different services and data stores. Ensure that your logging system correlates calls across service
boundaries so you can track the request throughout your application.
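A common way to achieve this is to mint a correlation ID at the edge and include it in every log line from every tier. The function names and tiers below are illustrative:

```python
import json
import uuid

def new_context():
    """Create a correlation ID at the web tier when the request first arrives."""
    return {"correlation_id": str(uuid.uuid4())}

def log_event(tier, message, context):
    """Each tier emits the same correlation ID, so one request can be traced end to end."""
    return json.dumps({"tier": tier, "message": message, **context})

ctx = new_context()  # generated once, then passed downstream (e.g. as an HTTP header)
lines = [log_event("web", "request received", ctx),
         log_event("business", "order validated", ctx),
         log_event("data", "order persisted", ctx)]
```

Querying the aggregated logs for one correlation ID then reconstructs the request's full path across the web, business, and data tiers.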
Azure Resources
Use Azure Resource Manager templates to provision resources. Resource Manager templates make it
easier to automate deployments via PowerShell or the Azure CLI, which leads to a more reliable deployment
process. For more information, see Azure Resource Manager overview.
Give resources meaningful names. Giving resources meaningful names makes it easier to locate a specific
resource and understand its role. For more information, see Naming conventions for Azure resources
Use role-based access control (RBAC). Use RBAC to control access to the Azure resources that you deploy.
RBAC lets you assign authorization roles to members of your DevOps team, to prevent accidental deletion or
changes to deployed resources. For more information, see Get started with access management in the Azure
portal
Use resource locks for critical resources, such as VMs. Resource locks prevent an operator from accidentally
deleting a resource. For more information, see Lock resources with Azure Resource Manager
Choose regional pairs. When deploying to two regions, choose regions from the same regional pair. In the
event of a broad outage, recovery of one region is prioritized out of every pair. Some services such as Geo-
Redundant Storage provide automatic replication to the paired region. For more information, see Business
continuity and disaster recovery (BCDR): Azure Paired Regions
Organize resource groups by function and lifecycle. In general, a resource group should contain resources
that share the same lifecycle. This makes it easier to manage deployments, delete test deployments, and assign
access rights, reducing the chance that a production deployment is accidentally deleted or modified. Create
separate resource groups for production, development, and test environments. In a multi-region deployment,
put resources for each region into separate resource groups. This makes it easier to redeploy one region without
affecting the other region(s).
Azure Services
The following checklist items apply to specific services in Azure.
App Service
Use Standard or Premium tier. These tiers support staging slots and automated backups. For more
information, see Azure App Service plans in-depth overview
Avoid scaling up or down. Instead, select a tier and instance size that meet your performance requirements
under typical load, and then scale out the instances to handle changes in traffic volume. Scaling up and down
may trigger an application restart.
Store configuration as app settings. Use app settings to hold configuration settings. Define
the settings in your Resource Manager templates, or using PowerShell, so that you can apply them as part of an
automated deployment / update process, which is more reliable. For more information, see Configure web apps
in Azure App Service.
Create separate App Service plans for production and test. Don't use slots on your production deployment
for testing. All apps within the same App Service plan share the same VM instances. If you put production and
test deployments in the same plan, it can negatively affect the production deployment. For example, load tests
might degrade the live production site. By putting test deployments into a separate plan, you isolate them from
the production version.
Separate web apps from web APIs. If your solution has both a web front-end and a web API, consider
decomposing them into separate App Service apps. This design makes it easier to decompose the solution by
workload. You can run the web app and the API in separate App Service plans, so they can be scaled
independently. If you don't need that level of scalability at first, you can deploy the apps into the same plan, and
move them into separate plans later, if needed.
Avoid using the App Service backup feature to back up Azure SQL databases. Instead, use SQL Database
automated backups. App Service backup exports the database to a SQL .bacpac file, which consumes DTUs.
Deploy to a staging slot. Create a deployment slot for staging. Deploy application updates to the staging slot,
and verify the deployment before swapping it into production. This reduces the chance of a bad update in
production. It also ensures that all instances are warmed up before being swapped into production. Many
applications have a significant warmup and cold-start time. For more information, see Set up staging
environments for web apps in Azure App Service.
Create a deployment slot to hold the last-known-good (LKG) deployment. When you deploy an update
to production, move the previous production deployment into the LKG slot. This makes it easier to roll back a
bad deployment. If you discover a problem later, you can quickly revert to the LKG version. For more
information, see Basic web application.
Enable diagnostics logging, including application logging and web server logging. Logging is important for
monitoring and diagnostics. See Enable diagnostics logging for web apps in Azure App Service
Log to blob storage. This makes it easier to collect and analyze the data.
Create a separate storage account for logs. Don't use the same storage account for logs and application
data. This helps to prevent logging from reducing application performance.
Monitor performance. Use a performance monitoring service such as New Relic or Application Insights to
monitor application performance and behavior under load. Performance monitoring gives you real-time insight
into the application. It enables you to diagnose issues and perform root-cause analysis of failures.
Application Gateway
Provision at least two instances. Deploy Application Gateway with at least two instances. A single instance is
a single point of failure. Use two or more instances for redundancy and scalability. In order to qualify for the
SLA, you must provision two or more medium or larger instances.
Azure Search
Provision more than one replica. Use at least two replicas for read high-availability, or three for read-write
high-availability.
Configure indexers for multi-region deployments. If you have a multi-region deployment, consider
your options for continuity in indexing.
If the data source is geo-replicated, you should generally point each indexer of each regional Azure
Search service to its local data source replica. However, that approach is not recommended for large
datasets stored in Azure SQL Database. The reason is that Azure Search cannot perform incremental
indexing from secondary SQL Database replicas, only from primary replicas. Instead, point all indexers to
the primary replica. After a failover, point the Azure Search indexers at the new primary replica.
If the data source is not geo-replicated, point multiple indexers at the same data source, so that Azure
Search services in multiple regions continuously and independently index from the data source. For more
information, see Azure Search performance and optimization considerations.
Azure Storage
For application data, use read-access geo-redundant storage (RA-GRS). RA-GRS storage replicates the
data to a secondary region, and provides read-only access from the secondary region. If there is a storage
outage in the primary region, the application can read the data from the secondary region. For more
information, see Azure Storage replication.
For VM disks, use Managed Disks. Managed Disks provide better reliability for VMs in an availability set,
because the disks are sufficiently isolated from each other to avoid single points of failure. Also, Managed Disks
aren't subject to the IOPS limits of VHDs created in a storage account. For more information, see Manage the
availability of Windows virtual machines in Azure.
For Queue storage, create a backup queue in another region. For Queue storage, a read-only replica has
limited use, because you can't queue or dequeue items. Instead, create a backup queue in a storage account in
another region. If there is a storage outage, the application can use the backup queue, until the primary region
becomes available again. That way, the application can still process new requests.
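The backup-queue pattern described above can be sketched as follows. This is a hypothetical, in-memory illustration — the `FailoverQueue` and `BrokenQueue` classes are invented for the example; real code would wrap the Azure Storage queue clients for the two storage accounts.

```python
import queue

class FailoverQueue:
    """Writes to a primary queue, falling back to a backup queue in
    another region when the primary is unavailable (illustrative sketch)."""

    def __init__(self, primary, backup):
        self.primary = primary
        self.backup = backup

    def enqueue(self, message):
        try:
            self.primary.put(message)
            return "primary"
        except Exception:
            # Primary region outage: keep accepting new work via the backup queue.
            self.backup.put(message)
            return "backup"

class BrokenQueue:
    """Simulates a storage outage in the primary region."""
    def put(self, message):
        raise ConnectionError("primary region unavailable")

# Normal operation: messages land in the primary queue.
fq = FailoverQueue(queue.Queue(), queue.Queue())
assert fq.enqueue("order-1") == "primary"

# During an outage, the application can still process new requests.
fq_outage = FailoverQueue(BrokenQueue(), queue.Queue())
assert fq_outage.enqueue("order-2") == "backup"
assert fq_outage.backup.get() == "order-2"
```

When the primary region recovers, a drain step would move any messages from the backup queue back to the primary before resuming normal operation.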
Cosmos DB
Replicate the database across regions. Cosmos DB allows you to associate any number of Azure regions
with a Cosmos DB database account. A Cosmos DB database account can have one write region and multiple read regions. If
there is a failure in the write region, you can read from another replica. The Client SDK handles this
automatically. You can also fail over the write region to another region. For more information, see How to
distribute data globally with Azure Cosmos DB?
SQL Database
Use Standard or Premium tier. These tiers provide a longer point-in-time restore period (35 days). For more
information, see SQL Database options and performance.
Enable SQL Database auditing. Auditing can be used to diagnose malicious attacks or human error. For more
information, see Get started with SQL database auditing.
Use Active Geo-Replication. Use Active Geo-Replication to create a readable secondary in a different region. If
your primary database fails, or simply needs to be taken offline, perform a manual failover to the secondary
database. Until you fail over, the secondary database remains read-only. For more information, see SQL
Database Active Geo-Replication.
Use sharding. Consider using sharding to partition the database horizontally. Sharding can provide fault
isolation. For more information, see Scaling out with Azure SQL Database.
Use point-in-time restore to recover from human error. Point-in-time restore returns your database to an
earlier point in time. For more information, see Recover an Azure SQL database using automated database
backups.
Use geo-restore to recover from a service outage. Geo-restore restores a database from a geo-redundant
backup. For more information, see Recover an Azure SQL database using automated database backups.
SQL Server (running in a VM)
Replicate the database. Use SQL Server Always On Availability Groups to replicate the database. This provides
high availability if one SQL Server instance fails. For more information, see Run Windows VMs for an N-tier
application.
Back up the database. If you are already using Azure Backup to back up your VMs, consider using Azure
Backup for SQL Server workloads using DPM. With this approach, there is one backup administrator role for the
organization and a unified recovery procedure for VMs and SQL Server. Otherwise, use SQL Server Managed
Backup to Microsoft Azure.
Traffic Manager
Perform manual failback. After a Traffic Manager failover, perform manual failback, rather than automatically
failing back. Before failing back, verify that all application subsystems are healthy. Otherwise, you can create a
situation where the application flips back and forth between data centers. For more information, see Run VMs in
multiple regions for high availability.
Create a health probe endpoint. Create a custom endpoint that reports on the overall health of the
application. This enables Traffic Manager to fail over if any critical path fails, not just the front end. The endpoint
should return an HTTP error code if any critical dependency is unhealthy or unreachable. Don't report errors for
non-critical services, however. Otherwise, the health probe might trigger failover when it's not needed, creating
false positives. For more information, see Traffic Manager endpoint monitoring and failover.
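The aggregation logic for such a health endpoint can be sketched as a plain function — only critical dependencies may fail the probe, so non-critical failures never trigger an unnecessary failover. The check names and the `(is_critical, is_healthy)` shape are illustrative, not part of any Traffic Manager API:

```python
def health_status(checks):
    """Aggregate dependency checks into an HTTP status for a health probe.
    `checks` maps dependency name -> (is_critical, is_healthy).
    Only critical dependencies can fail the probe; non-critical failures
    are reported but do not cause failover (avoiding false positives)."""
    failed_critical = [name for name, (critical, healthy) in checks.items()
                       if critical and not healthy]
    return (503, failed_critical) if failed_critical else (200, [])

checks = {
    "sql-primary": (True, True),     # critical and healthy
    "redis-cache": (True, False),    # critical and unhealthy -> fail the probe
    "email-relay": (False, False),   # non-critical failure, ignored by the probe
}
status, failed = health_status(checks)
assert status == 503 and failed == ["redis-cache"]
assert health_status({"sql-primary": (True, True)}) == (200, [])
```

A web handler would call this function and return the status code, so Traffic Manager sees an HTTP error only when a critical path is actually broken.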
Virtual Machines
Avoid running a production workload on a single VM. A single VM deployment is not resilient to planned
or unplanned maintenance. Instead, put multiple VMs in an availability set or VM scale set, with a load balancer
in front.
Specify an availability set when you provision the VM. Currently, there is no way to add a VM to an
availability set after the VM is provisioned. When you add a new VM to an existing availability set, make sure to
create a NIC for the VM, and add the NIC to the back-end address pool on the load balancer. Otherwise, the load
balancer won't route network traffic to that VM.
Put each application tier into a separate Availability Set. In an N-tier application, don't put VMs from
different tiers into the same availability set. VMs in an availability set are placed across fault domains (FDs) and
update domains (UDs). However, to get the redundancy benefit of FDs and UDs, every VM in the availability set
must be able to handle the same client requests.
Choose the right VM size based on performance requirements. When moving an existing workload to
Azure, start with the VM size that's the closest match to your on-premises servers. Then measure the
performance of your actual workload with respect to CPU, memory, and disk IOPS, and adjust the size if needed.
This helps to ensure the application behaves as expected in a cloud environment. Also, if you need multiple NICs,
be aware of the NIC limit for each size.
Use Managed Disks for VHDs. Managed Disks provide better reliability for VMs in an availability set, because
the disks are sufficiently isolated from each other to avoid single points of failure. Also, Managed Disks aren't
subject to the IOPS limits of VHDs created in a storage account. For more information, see Manage the
availability of Windows virtual machines in Azure.
Install applications on a data disk, not the OS disk. Otherwise, you may reach the disk size limit.
Use Azure Backup to back up VMs. Backups protect against accidental data loss. For more information, see
Protect Azure VMs with a recovery services vault.
Enable diagnostic logs, including basic health metrics, infrastructure logs, and boot diagnostics. Boot
diagnostics can help you diagnose a boot failure if your VM gets into a non-bootable state. For more
information, see Overview of Azure Diagnostic Logs.
Use the AzureLogCollector extension. (Windows VMs only.) This extension aggregates Azure platform logs
and uploads them to Azure storage, without the operator remotely logging into the VM. For more information,
see AzureLogCollector Extension.
Virtual Network
To whitelist or block public IP addresses, add an NSG to the subnet. Block access from malicious users, or
allow access only from users who have privilege to access the application.
Create a custom health probe. Load Balancer Health Probes can test either HTTP or TCP. If a VM runs an HTTP
server, the HTTP probe is a better indicator of health status than a TCP probe. For an HTTP probe, use a custom
endpoint that reports the overall health of the application, including all critical dependencies. For more
information, see Azure Load Balancer overview.
Don't block the health probe. The Load Balancer Health probe is sent from a known IP address,
168.63.129.16. Don't block traffic to or from this IP in any firewall policies or network security group (NSG)
rules. Blocking the health probe would cause the load balancer to remove the VM from rotation.
Enable Load Balancer logging. The logs show how many VMs on the back-end are not receiving network
traffic due to failed probe responses. For more information, see Log analytics for Azure Load Balancer.
Scalability checklist
6/23/2017
Service design
Partition the workload. Design parts of the process to be discrete and decomposable. Minimize the size of
each part, while following the usual rules for separation of concerns and the single responsibility principle. This
allows the component parts to be distributed in a way that maximizes use of each compute unit (such as a role
or database server). It also makes it easier to scale the application by adding instances of specific resources. For
more information, see Compute Partitioning Guidance.
Design for scaling. Scaling allows applications to react to variable load by increasing and decreasing the
number of instances of roles, queues, and other services they use. However, the application must be designed
with this in mind. For example, the application and the services it uses must be stateless, to allow requests to be
routed to any instance. This also prevents the addition or removal of specific instances from adversely
impacting current users. You should also implement configuration or auto-detection of instances as they are
added and removed, so that code in the application can perform the necessary routing. For example, a web
application might use a set of queues in a round-robin approach to route requests to background services
running in worker roles. The web application must be able to detect changes in the number of queues, to
successfully route requests and balance the load on the application.
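The round-robin routing with instance auto-detection described above can be sketched like this. The `QueueRouter` class and its methods are hypothetical; real code would enumerate the queues from configuration or a naming convention, with plain lists standing in for queue clients here:

```python
class QueueRouter:
    """Round-robins requests across a variable set of queues. Queues can be
    added while the router is running, so scale-out is picked up without
    restarting the web application (illustrative sketch)."""

    def __init__(self, queues):
        self.queues = list(queues)
        self._next = 0

    def add_queue(self, q):
        # Called when autoscaling adds a background worker and its queue.
        self.queues.append(q)

    def dispatch(self, msg):
        q = self.queues[self._next % len(self.queues)]
        self._next += 1
        q.append(msg)

q1, q2 = [], []
router = QueueRouter([q1, q2])
for i in range(4):
    router.dispatch(i)
assert q1 == [0, 2] and q2 == [1, 3]   # load balanced across two queues

q3 = []
router.add_queue(q3)                    # scale-out: a third worker appears
for i in range(4, 7):
    router.dispatch(i)
assert q3 == [5]                        # new queue receives traffic immediately
```

The key property is that dispatch consults the current queue list on every call, so added or removed instances change the routing without redeployment.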
Scale as a unit. Plan for additional resources to accommodate growth. For each resource, know the upper
scaling limits, and use sharding or decomposition to go beyond these limits. Determine the scale units for the
system in terms of well-defined sets of resources. This makes applying scale-out operations easier, and less
prone to negative impact on the application through limitations imposed by lack of resources in some part of
the overall system. For example, adding x number of web and worker roles might require y number of
additional queues and z number of storage accounts to handle the additional workload generated by the roles.
So a scale unit could consist of x web and worker roles, y queues, and z storage accounts. Design the application
so that it's easily scaled by adding one or more scale units.
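The scale-unit arithmetic above can be made concrete. All the numbers below (requests per second per unit, the per-unit resource ratios) are illustrative assumptions, not Azure limits:

```python
import math

def scale_units_needed(expected_rps, rps_per_unit):
    """Number of scale units to provision for an expected load."""
    return math.ceil(expected_rps / rps_per_unit)

def resources_for(units, per_unit):
    """Expand a number of scale units into concrete resource counts."""
    return {name: count * units for name, count in per_unit.items()}

# One scale unit = x web/worker roles, y queues, z storage accounts.
unit = {"web_roles": 4, "worker_roles": 2, "queues": 3, "storage_accounts": 1}

units = scale_units_needed(expected_rps=2500, rps_per_unit=1000)
assert units == 3
assert resources_for(units, unit) == {
    "web_roles": 12, "worker_roles": 6, "queues": 9, "storage_accounts": 3,
}
```

Scaling by whole units keeps the resource ratios balanced, so adding capacity never starves one part of the system (for example, roles without enough queues).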
Avoid client affinity. Where possible, ensure that the application does not require affinity. Requests can thus
be routed to any instance, and the number of instances is irrelevant. This also avoids the overhead of storing,
retrieving, and maintaining state information for each user.
Take advantage of platform autoscaling features. Where the hosting platform supports an autoscaling
capability, such as Azure Autoscale, prefer it to custom or third-party mechanisms unless the built-in
mechanism can't fulfill your requirements. Use scheduled scaling rules where possible to ensure resources are
available without a start-up delay, but add reactive autoscaling to the rules where appropriate to cope with
unexpected changes in demand. You can use the autoscaling operations in the Service Management API to
adjust autoscaling, and to add custom counters to rules. For more information, see Auto-scaling guidance.
Offload intensive CPU/IO tasks as background tasks. If a request to a service is expected to take a long time
to run or absorb considerable resources, offload the processing for this request to a separate task. Use worker
roles or background jobs (depending on the hosting platform) to execute these tasks. This strategy enables the
service to continue receiving further requests and remain responsive. For more information, see Background
jobs guidance.
Distribute the workload for background tasks. Where there are many background tasks, or the tasks
require considerable time or resources, spread the work across multiple compute units (such as worker roles or
background jobs). For one possible solution, see the Competing Consumers Pattern.
Consider moving towards a shared-nothing architecture. A shared-nothing architecture uses independent,
self-sufficient nodes that have no single point of contention (such as shared services or storage). In theory, such
a system can scale almost indefinitely. While a fully shared-nothing approach is generally not practical for most
applications, it may provide opportunities to design for better scalability. For example, avoiding server-side
session state, avoiding client affinity, and partitioning data are all steps towards a shared-nothing
architecture.
Data management
Use data partitioning. Divide the data across multiple databases and database servers, or design the
application to use data storage services that can provide this partitioning transparently (examples include Azure
SQL Database Elastic Database, and Azure Table storage). This approach can help to maximize performance and
allow easier scaling. There are different partitioning techniques, such as horizontal, vertical, and functional. You
can use a combination of these to achieve maximum benefit from increased query performance, simpler
scalability, more flexible management, better availability, and to match the type of store to the data it will hold.
Also, consider using different types of data store for different types of data, choosing the types based on how
well they are optimized for the specific type of data. This may include using table storage, a document database,
or a column-family data store, instead of, or as well as, a relational database. For more information, see Data
partitioning guidance.
Design for eventual consistency. Eventual consistency improves scalability by reducing or removing the time
needed to synchronize related data partitioned across multiple stores. The cost is that data is not always
consistent when it is read, and some write operations may cause conflicts. Eventual consistency is ideal for
situations where the same data is read frequently but written infrequently. For more information, see the Data
Consistency Primer.
Reduce chatty interactions between components and services. Avoid designing interactions in which an
application is required to make multiple calls to a service (each of which returns a small amount of data), rather
than a single call that can return all of the data. Where possible, combine several related operations into a single
request when the call is to a service or component that has noticeable latency. This makes it easier to monitor
performance and optimize complex operations. For example, use stored procedures in databases to encapsulate
complex logic, and reduce the number of round trips and resource locking.
Use queues to level the load for high velocity data writes. Surges in demand for a service can overwhelm
that service and cause escalating failures. To prevent this, consider implementing the Queue-Based Load
Leveling Pattern. Use a queue that acts as a buffer between a task and a service that it invokes. This can smooth
intermittent heavy loads that may otherwise cause the service to fail or the task to time out.
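A minimal version of queue-based load leveling can be shown with the standard library: a burst of 100 writes lands in an in-process buffer, and a single worker drains it at its own pace, so the "service" is never hit with the raw burst. A real implementation would use a durable queue (for example, Azure Queue storage) rather than an in-memory one:

```python
import queue
import threading

task_buffer = queue.Queue()   # buffer between the task and the service
processed = []

def service_worker():
    # The service drains the buffer at its own sustainable rate;
    # bursts fill the queue instead of overwhelming the service.
    while True:
        item = task_buffer.get()
        if item is None:          # sentinel: shut down
            break
        processed.append(item)
        task_buffer.task_done()

t = threading.Thread(target=service_worker)
t.start()

for i in range(100):              # a sudden burst of requests
    task_buffer.put(i)

task_buffer.join()                # wait until the backlog is drained
task_buffer.put(None)
t.join()
assert processed == list(range(100))
```

The producer returns as soon as the item is enqueued, so request handling stays fast even while the backlog is worked off.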
Minimize the load on the data store. The data store is commonly a processing bottleneck, a costly resource,
and often not easy to scale out. Where possible, remove logic (such as processing XML documents or JSON
objects) from the data store, and perform processing within the application. For example, instead of passing
XML to the database (other than as an opaque string for storage), serialize or deserialize the XML within the
application layer and pass it in a form that is native to the data store. It's typically much easier to scale out the
application than the data store, so you should attempt to do as much of the compute-intensive processing as
possible within the application.
Minimize the volume of data retrieved. Retrieve only the data you require by specifying columns and using
criteria to select rows. Make use of table value parameters and the appropriate isolation level. Use mechanisms
like entity tags to avoid retrieving data unnecessarily.
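The entity-tag technique can be sketched with an in-memory stand-in for an ETag-aware store — the `fetch_if_changed` helper and the store layout are invented for illustration:

```python
def fetch_if_changed(store, key, cached_etag):
    """Return (etag, data) only when the entity has changed since
    `cached_etag`; otherwise (etag, None) -- the equivalent of an
    HTTP 304 Not Modified, avoiding transfer of unchanged data."""
    etag, data = store[key]
    if etag == cached_etag:
        return etag, None
    return etag, data

store = {"profile:42": ("v7", {"name": "Ada"})}

# First read: no cached version, so the data is transferred.
etag, data = fetch_if_changed(store, "profile:42", cached_etag=None)
assert data == {"name": "Ada"}

# Subsequent read with a matching ETag: nothing is transferred.
etag, data = fetch_if_changed(store, "profile:42", cached_etag=etag)
assert data is None
```

With HTTP and Azure Storage the same effect comes from conditional requests (`If-None-Match` / `If-Match` headers carrying the ETag).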
Aggressively use caching. Use caching wherever possible to reduce the load on resources and services that
generate or deliver data. Caching is typically suited to data that is relatively static, or that requires considerable
processing to obtain. Caching should occur at all levels where appropriate in each layer of the application,
including data access and user interface generation. For more information, see the Caching Guidance.
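A minimal cache-aside helper with a time-to-live illustrates the idea; in production the backing store would be a distributed cache (such as Azure Cache for Redis) shared by all instances, not a per-process dictionary:

```python
import time

class TTLCache:
    """Minimal time-based cache for data that is expensive to generate
    or relatively static (illustrative sketch)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}

    def get_or_compute(self, key, compute):
        entry = self._store.get(key)
        now = time.monotonic()
        if entry and now - entry[0] < self.ttl:
            return entry[1]                 # fresh cached value, no recompute
        value = compute()                   # expensive call happens here
        self._store[key] = (now, value)
        return value

calls = []
def expensive_report():
    calls.append(1)
    return "report"

cache = TTLCache(ttl_seconds=60)
assert cache.get_or_compute("daily", expensive_report) == "report"
assert cache.get_or_compute("daily", expensive_report) == "report"
assert len(calls) == 1    # second read was served from the cache
```

The TTL bounds staleness, which is the usual trade-off for cacheable-but-changing data.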
Handle data growth and retention. The amount of data stored by an application grows over time. This
growth increases storage costs, and increases latency when accessing the data which affects application
throughput and performance. It may be possible to periodically archive some of the old data that is no longer
accessed, or move data that is rarely accessed into long-term storage that is more cost efficient, even if the
access latency is higher.
Optimize Data Transfer Objects (DTOs) using an efficient binary format. DTOs are passed between the
layers of an application many times. Minimizing the size reduces the load on resources and the network.
However, balance the savings with the overhead of converting the data to the required format in each location
where it is used. Adopt a format that has the maximum interoperability to enable easy reuse of a component.
Set cache control. Design and configure the application to use output caching or fragment caching where
possible, to minimize processing load.
Enable client side caching. Web applications should enable cache settings on the content that can be cached.
This is commonly disabled by default. Configure the server to deliver the appropriate cache control headers to
enable caching of content on proxy servers and clients.
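A simple server-side policy for choosing those cache control headers might look like the following; the extension list and max-age value are illustrative choices, not recommendations for every application:

```python
def cache_headers(path):
    """Choose Cache-Control headers by content type: long-lived for
    static assets, no-store for per-user dynamic pages (illustrative policy)."""
    static_extensions = (".css", ".js", ".png", ".jpg", ".woff2")
    if path.endswith(static_extensions):
        # Cacheable on proxies and clients for one day.
        return {"Cache-Control": "public, max-age=86400"}
    # Dynamic, per-user content should not be cached.
    return {"Cache-Control": "no-store"}

assert cache_headers("/static/site.css") == {"Cache-Control": "public, max-age=86400"}
assert cache_headers("/account/orders") == {"Cache-Control": "no-store"}
```

Versioned asset URLs (for example, a content hash in the filename) allow much longer max-age values safely, since a changed asset gets a new URL.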
Use Azure blob storage and the Azure Content Delivery Network to reduce the load on the
application. Consider storing static or relatively static public content, such as images, resources, scripts, and
style sheets, in blob storage. This approach relieves the application of the load caused by dynamically
generating this content for each request. Additionally, consider using the Content Delivery Network to cache
this content and deliver it to clients. Using the Content Delivery Network can improve performance at the client
because the content is delivered from the geographically closest datacenter that contains a Content Delivery
Network cache. For more information, see the Content Delivery Network Guidance.
Optimize and tune SQL queries and indexes. Some T-SQL statements or constructs may have an impact on
performance that can be reduced by optimizing the code in a stored procedure. For example, avoid converting
datetime types to a varchar before comparing with a datetime literal value. Use date/time comparison
functions instead. Lack of appropriate indexes can also slow query execution. If you use an object/relational
mapping framework, understand how it works and how it may affect performance of the data access layer. For
more information, see Query Tuning.
Consider de-normalizing data. Data normalization helps to avoid duplication and inconsistency. However,
maintaining multiple indexes, checking for referential integrity, performing multiple accesses to small chunks of
data, and joining tables to reassemble the data imposes an overhead that can affect performance. Consider if
some additional storage volume and duplication is acceptable in order to reduce the load on the data store.
Also, consider if the application itself (which is typically easier to scale) can be relied upon to take over tasks
such as managing referential integrity in order to reduce the load on the data store. For more information, see
Data partitioning guidance.
Service implementation
Use asynchronous calls. Use asynchronous code wherever possible when accessing resources or services that
may be limited by I/O or network bandwidth, or that have a noticeable latency, in order to avoid locking the
calling thread. To implement asynchronous operations, use the Task-based Asynchronous Pattern (TAP).
Avoid locking resources, and use an optimistic approach instead. Never lock access to resources such as
storage or other services that have noticeable latency, because this is a primary cause of poor performance.
Always use optimistic approaches to managing concurrent operations, such as writing to storage. Use features
of the storage layer to manage conflicts. In distributed applications, data may be only eventually consistent.
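The optimistic approach can be sketched with a versioned store: a write succeeds only when the caller's version still matches the stored one, so no record is ever locked. The `VersionedStore` class is invented for the example; with Azure Storage the version would be the blob or entity ETag checked via a conditional write:

```python
class VersionedStore:
    """Optimistic concurrency: writes succeed only when the caller's
    version matches the stored one, instead of locking the record."""

    def __init__(self):
        self._data = {}   # key -> (version, value)

    def read(self, key):
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        version, _ = self._data.get(key, (0, None))
        if version != expected_version:
            return False               # conflict: caller must re-read and retry
        self._data[key] = (version + 1, value)
        return True

store = VersionedStore()
v, _ = store.read("k")
assert store.write("k", "a", expected_version=v)       # succeeds, version -> 1
# A stale writer still holding version 0 is rejected, never blocked:
assert not store.write("k", "b", expected_version=0)
```

The losing writer re-reads the current value and retries, which costs far less than holding a lock across a high-latency storage call.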
Compress highly compressible data over high latency, low bandwidth networks. In the majority of cases
in a web application, the largest volume of data generated by the application and passed over the network is
HTTP responses to client requests. HTTP compression can reduce this considerably, especially for static content.
This can reduce cost as well as reducing the load on the network, though compressing dynamic content does
apply a fractionally higher load on the server. In other, more generalized environments, data compression can
reduce the volume of data transmitted and minimize transfer time and costs, but the compression and
decompression processes incur overhead. As such, compression should only be used when there is a
demonstrable gain in performance. Other serialization methods, such as JSON or binary encodings, may reduce
the payload size while having less impact on performance, whereas XML is likely to increase it.
Minimize the time that connections and resources are in use. Maintain connections and resources only for
as long as you need to use them. For example, open connections as late as possible, and allow them to be
returned to the connection pool as soon as possible. Acquire resources as late as possible, and dispose of them
as soon as possible.
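An acquire-late, release-early discipline is easy to enforce with a context manager. The `Pool` class below is a stand-in with an invented interface; the pattern applies to any pooled connection or scarce resource:

```python
from contextlib import contextmanager

class Pool:
    """Trivial stand-in for a connection pool (illustrative)."""
    def __init__(self):
        self.in_use = 0
    def acquire(self):
        self.in_use += 1
        return object()
    def release(self, conn):
        self.in_use -= 1

@contextmanager
def connection(pool):
    """Acquire a connection as late as possible and return it to the
    pool as soon as the block exits -- even if the work raises."""
    conn = pool.acquire()
    try:
        yield conn
    finally:
        pool.release(conn)

pool = Pool()
with connection(pool) as conn:
    assert pool.in_use == 1    # held only while actually in use
assert pool.in_use == 0        # returned immediately afterwards
```

Doing slow work (parsing, rendering, waiting on other services) outside the `with` block keeps connection hold times short and the pool available to other requests.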
Minimize the number of connections required. Service connections absorb resources. Limit the number
that are required and ensure that existing connections are reused whenever possible. For example, after
performing authentication, use impersonation where appropriate to run code as a specific identity. This can
help to make best use of the connection pool by reusing connections.
NOTE: APIs for some services automatically reuse connections, provided service-specific guidelines are
followed. It's important that you understand the conditions that enable connection reuse for each service that
your application uses.
Send requests in batches to optimize network use. For example, send and read messages in batches when
accessing a queue, and perform multiple reads or writes as a batch when accessing storage or a cache. This can
help to maximize efficiency of the services and data stores by reducing the number of calls across the network.
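The batching idea reduces a stream of single-item round trips to a handful of calls. A generic helper might look like this (the batch size of 4 is arbitrary; real services impose their own batch limits, such as 32 messages per Azure Queue storage retrieval):

```python
def batched(items, batch_size):
    """Group work items so each network call carries a batch
    instead of a single item."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

calls = list(batched(list(range(10)), batch_size=4))
assert calls == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
# 10 single-item round trips collapse into 3 network calls.
```

Each batch would then be sent with the service's native batch operation (for example, a table batch transaction or a multi-message queue read) rather than one call per item.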
Avoid a requirement to store server-side session state where possible. Server-side session state
management typically requires client affinity (that is, routing each request to the same server instance), which
affects the ability of the system to scale. Ideally, you should design clients to be stateless with respect to the
servers that they use. However, if the application must maintain session state, store sensitive data or large
volumes of per-client data in a distributed server-side cache that all instances of the application can access.
Optimize table storage schemas. When using table stores that require the table and column names to be
passed and processed with every query, such as Azure table storage, consider using shorter names to reduce
this overhead. However, do not sacrifice readability or manageability by using overly compact names.
Use the Task Parallel Library (TPL) to perform asynchronous operations. The TPL makes it easy to write
asynchronous code that performs I/O-bound operations. Use ConfigureAwait(false) wherever possible to
eliminate the dependency of a continuation on a specific synchronization context. This reduces the chances of
thread-deadlock occurring.
Create resource dependencies during deployment or at application startup. Avoid repeated calls to
methods that test the existence of a resource and then create the resource if it does not exist. (Methods such as
CloudTable.CreateIfNotExists and CloudQueue.CreateIfNotExists in the Azure Storage Client Library follow this
pattern). These methods can impose considerable overhead if they are invoked before each access to a storage
table or storage queue. Instead:
Create the required resources when the application is deployed, or when it first starts (a single call to
CreateIfNotExists for each resource in the startup code for a web or worker role is acceptable). However,
be sure to handle exceptions that may arise if your code attempts to access a resource that doesn't exist.
In these situations, you should log the exception, and possibly alert an operator that a resource is
missing.
Under some circumstances, it may be appropriate to create the missing resource as part of the exception
handling code. But you should adopt this approach with caution as the non-existence of the resource
might be indicative of a programming error (a misspelled resource name for example), or some other
infrastructure-level issue.
Use lightweight frameworks. Carefully choose the APIs and frameworks you use to minimize resource usage,
execution time, and overall load on the application. For example, using Web API to handle service requests can
reduce the application footprint and increase execution speed, but it may not be suitable for advanced scenarios
where the additional capabilities of Windows Communication Foundation are required.
Consider minimizing the number of service accounts. For example, use a specific account to access
resources or services that impose a limit on connections, or perform better where fewer connections are
maintained. This approach is common for services such as databases, but it can affect the ability to accurately
audit operations due to the impersonation of the original user.
Carry out performance profiling and load testing during development, as part of test routines, and before
final release to ensure the application performs and scales as required. This testing should occur on the same
type of hardware as the production platform, and with the same types and quantities of data and user load as it
will encounter in production. For more information, see Testing the performance of a cloud service.