Implementing Microservices on AWS

AWS Whitepaper

Copyright © 2023 Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not
Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or
discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may
or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

Abstract and introduction
    Introduction
    Are you Well-Architected?
    Modernizing to microservices
Simple microservices architecture on AWS
    User interface
    Microservices
        Microservices implementations
    Continuous integration and continuous deployment (CI/CD)
    Private networking
    Data store
    Simplifying operations
        Deploying Lambda-based applications
        Abstracting multi-tenancy complexities
        API management
Microservices on serverless technologies
Resilient, efficient, and cost-optimized systems
    Disaster recovery (DR)
    High availability (HA)
Distributed systems components
Distributed data management
Configuration management
    Secrets management
Cost optimization and sustainability
Communication mechanisms
    REST-based communication
    GraphQL-based communication
    gRPC-based communication
    Asynchronous messaging and event passing
    Orchestration and state management
Observability
    Monitoring
    Centralizing logs
    Distributed tracing
    Log analysis on AWS
    Other options for analysis
Managing chattiness in microservices communication
    Using protocols and caching
Auditing
    Resource inventory and change management
Conclusion
Contributors
Document history
Notices
AWS Glossary


Implementing Microservices on AWS


Publication date: July 31, 2023 (see Document history)

Microservices offer a streamlined approach to software development that accelerates deployment, encourages innovation, enhances maintainability, and boosts scalability. This method relies on small, loosely coupled services, owned by autonomous teams, that communicate through well-defined APIs. Adopting microservices offers benefits such as improved scalability, resilience, flexibility, and faster development cycles.

This whitepaper explores three popular microservices patterns: API-driven, event-driven, and data streaming. We provide an overview of each approach, outline microservices' key features, address the challenges in their development, and illustrate how Amazon Web Services (AWS) can help application teams tackle these obstacles.

Considering the complex nature of topics like data store, asynchronous communication, and service
discovery, you are encouraged to weigh your application's specific needs and use cases alongside the
guidance provided when making architectural decisions.

Introduction
Microservices architectures combine successful and proven concepts from various fields, such as:

• Agile software development
• Service-oriented architectures
• API-first design
• Continuous Integration/Continuous Delivery (CI/CD)

Often, microservices incorporate design patterns from the Twelve-Factor App.

While microservices offer many benefits, it's vital to assess your use case's unique requirements and associated costs. A monolithic architecture or an alternative approach may be more appropriate in some cases. The decision between microservices and a monolith should be made on a case-by-case basis, considering factors like scale, complexity, and specific use cases.

We first explore a highly scalable, fault-tolerant microservices architecture (user interface, microservices
implementation, data store) and demonstrate how to build it on AWS using container technologies.
We then suggest AWS services to implement a typical serverless microservices architecture, reducing
operational complexity.

Serverless is characterized by the following principles:

• No infrastructure to provision or manage
• Automatic scaling by unit of consumption
• "Pay for value" billing model
• Built-in availability and fault tolerance
• Event-driven architecture (EDA)

Lastly, we examine the overall system and discuss cross-service aspects of a microservices architecture,
such as distributed monitoring, logging, tracing, auditing, data consistency, and asynchronous
communication.


This document focuses on workloads running in the AWS Cloud, excluding hybrid scenarios and
migration strategies. For information on migration strategies, refer to the Container Migration
Methodology whitepaper.

Are you Well-Architected?


The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make
when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best
practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems.
Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can
review your workloads against these best practices by answering a set of questions for each pillar.

In the Serverless Application Lens, we focus on best practices for architecting your serverless applications
on AWS.

For more expert guidance and best practices for your cloud architecture—reference architecture
deployments, diagrams, and whitepapers—refer to the AWS Architecture Center.

Modernizing to microservices
Microservices are essentially small, independent units that make up an application. Transitioning from
traditional monolithic structures to microservices can follow various strategies.

This transition also impacts the way your organization operates:

• It encourages agile development, where teams work in quick cycles.
• Teams are typically small, sometimes described as two pizza teams: small enough that two pizzas could feed the entire team.
• Teams take full responsibility for their services, from creation to deployment and maintenance.


Simple microservices architecture on AWS
Typical monolithic applications consist of different layers: a presentation layer, an application layer,
and a data layer. Microservices architectures, on the other hand, separate functionalities into cohesive
verticals according to specific domains, rather than technological layers. Figure 1 illustrates a reference
architecture for a typical microservices application on AWS.

Figure 1: Typical microservices application on AWS

User interface
Modern web applications often use JavaScript frameworks to develop single-page applications that
communicate with backend APIs. These APIs are typically built using Representational State Transfer
(REST) or RESTful APIs, or GraphQL APIs. Static web content can be served using Amazon Simple Storage
Service (Amazon S3) and Amazon CloudFront.


Microservices
APIs are considered the front door of microservices, as they are the entry point for application logic.
Typically, RESTful web services API or GraphQL APIs are used. These APIs manage and process client
calls, handling functions such as traffic management, request filtering, routing, caching, authentication,
and authorization.

Microservices implementations
AWS offers building blocks for developing microservices, including Amazon ECS and Amazon EKS as container orchestration engines, with AWS Fargate and Amazon EC2 as hosting options. AWS Lambda offers a serverless way to build microservices on AWS. The choice between these hosting options depends on how much of the underlying infrastructure you want to manage.

AWS Lambda allows you to upload your code, automatically scaling and managing its execution with
high availability. This eliminates the need for infrastructure management, so you can move quickly and
focus on your business logic. Lambda supports multiple programming languages and can be triggered by
other AWS services or called directly from web or mobile applications.
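As an illustration, the following minimal Python handler shows the shape of a Lambda function invoked through an API Gateway proxy integration. The payload fields and the `name` query parameter are illustrative, and the function can be exercised locally with a fake event, no AWS account needed:

```python
import json

def handler(event, context):
    """Minimal Lambda handler for an API Gateway proxy integration.

    'event' carries the HTTP request; the returned dict becomes the
    HTTP response. Field names follow the proxy-integration convention.
    """
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}"}),
    }

# Local invocation with a fake event:
response = handler({"queryStringParameters": {"name": "microservices"}}, None)
print(response["statusCode"])  # 200
```

Because the handler is a plain function, it is straightforward to unit test before packaging and deploying it.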

Container-based applications have gained popularity due to portability, productivity, and efficiency. AWS
offers several services to build, deploy and manage containers.

• AWS App2Container (A2C) is a command line tool for migrating and modernizing Java and .NET web applications into container format. A2C analyzes and builds an inventory of applications running on bare metal servers, virtual machines, Amazon Elastic Compute Cloud (Amazon EC2) instances, or in the cloud.

• Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon
EKS) manage your container infrastructure, making it easier to launch and maintain containerized
applications.
• Amazon EKS is a managed Kubernetes service for running Kubernetes in the AWS Cloud and in on-premises data centers (Amazon EKS Anywhere). This extends cloud services into on-premises environments to address low-latency local data processing, high data transfer costs, and data residency requirements (see the whitepaper Running Hybrid Container Workloads With Amazon EKS Anywhere). You can use all the existing plug-ins and tooling from the Kubernetes community with Amazon EKS.
• Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service
that simplifies your deployment, management, and scaling of containerized applications. Customers
choose ECS for simplicity and deep integration with AWS services.

For further reading, see the blog Amazon ECS vs Amazon EKS: making sense of AWS container services.

• AWS App Runner is a fully managed container application service that lets you build, deploy, and run
containerized web applications and API services without prior infrastructure or container experience.
• AWS Fargate, a serverless compute engine, works with both Amazon ECS and Amazon EKS to
automatically manage compute resources for container applications.
• Amazon ECR is a fully managed container registry offering high-performance hosting, so you can
reliably deploy application images and artifacts anywhere.

Continuous integration and continuous deployment (CI/CD)
Continuous integration and continuous delivery (CI/CD) is a crucial part of a DevOps initiative for rapid software changes. AWS offers services to implement CI/CD for microservices, but a detailed discussion is beyond the scope of this document. For more information, see the Practicing Continuous Integration and Continuous Delivery on AWS whitepaper.

Private networking
AWS PrivateLink is a technology that enhances the security of microservices by allowing private
connections between your Virtual Private Cloud (VPC) and supported AWS services. It helps isolate and
secure microservices traffic, ensuring it never crosses the public internet. This is particularly useful for
complying with regulations like PCI or HIPAA.

Data store
The data store is used to persist data needed by the microservices. Popular stores for session data
are in-memory caches such as Memcached or Redis. AWS offers both technologies as part of the
managed Amazon ElastiCache service.

Putting a cache between application servers and a database is a common mechanism for reducing the
read load on the database, which, in turn, may allow resources to be used to support more writes. Caches
can also improve latency.
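The cache-aside (lazy loading) pattern described above can be sketched as follows. The in-memory dict stands in for a cache such as ElastiCache, and `fake_db` stands in for a real database query; both names are invented for the example:

```python
import time

class CacheAside:
    """Cache-aside (lazy loading) between an application and its database."""

    def __init__(self, db_read, ttl_seconds=60):
        self.db_read = db_read
        self.ttl = ttl_seconds
        self.cache = {}            # key -> (value, expiry timestamp)
        self.hits = self.misses = 0

    def get(self, key):
        entry = self.cache.get(key)
        if entry and entry[1] > time.monotonic():
            self.hits += 1
            return entry[0]        # served from cache: no database read
        self.misses += 1
        value = self.db_read(key)  # cache miss: fall through to the database
        self.cache[key] = (value, time.monotonic() + self.ttl)
        return value

db_calls = []
def fake_db(key):
    db_calls.append(key)
    return f"row-{key}"

store = CacheAside(fake_db)
store.get("user#1")
store.get("user#1")
print(len(db_calls))  # 1 -- the second read never touched the database
```

The TTL bounds staleness: readers may see data up to `ttl_seconds` old, which is the usual consistency trade-off of this pattern.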

Relational databases are still very popular to store structured data and business objects. AWS offers six
database engines (Microsoft SQL Server, Oracle, MySQL, MariaDB, PostgreSQL, and Amazon Aurora) as
managed services through Amazon Relational Database Service (Amazon RDS).

Relational databases, however, are not designed for endless scale; scaling them to serve a very high query volume (for example, through sharding or read replicas) can be difficult and time-intensive.

NoSQL databases have been designed to favor scalability, performance, and availability over the
consistency of relational databases. One important element of NoSQL databases is that they typically
don’t enforce a strict schema. Data is distributed over partitions that can be scaled horizontally and is
retrieved using partition keys.

Because individual microservices are designed to do one thing well, they typically have a simplified
data model that might be well suited to NoSQL persistence. It is important to understand that NoSQL
databases have different access patterns than relational databases. For example, it's not possible to
join tables. If this is necessary, the logic has to be implemented in the application. You can use Amazon DynamoDB to create a database table that can store and retrieve any amount of data and serve any level of request traffic. DynamoDB delivers single-digit millisecond performance; for use cases that require microsecond response times, DynamoDB Accelerator (DAX) provides an in-memory cache in front of your tables.
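Because NoSQL stores such as DynamoDB retrieve items by partition key and cannot join tables, any join-like logic moves into application code. A minimal sketch, with in-memory dicts standing in for tables and all item contents invented:

```python
# Key-value access in the style of a NoSQL store: items are fetched by
# partition key, and anything resembling a join is done by the application.
orders = {
    "order#1": {"customer_id": "cust#7", "total": 40},
    "order#2": {"customer_id": "cust#7", "total": 15},
}
customers = {"cust#7": {"name": "Ada"}}

def get_order_with_customer(order_id):
    order = orders[order_id]                    # lookup by partition key
    customer = customers[order["customer_id"]]  # second lookup: the "join"
    return {**order, "customer_name": customer["name"]}

print(get_order_with_customer("order#1")["customer_name"])  # Ada
```

In practice, microservice data models are often denormalized so that a single key lookup returns everything a request needs, avoiding even this second round trip.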

DynamoDB also offers an automatic scaling feature to dynamically adjust throughput capacity in
response to actual traffic. However, there are cases where capacity planning is difficult or not possible
because of large activity spikes of short duration in your application. For such situations, DynamoDB
provides an on-demand option, which offers simple pay-per-request pricing. DynamoDB on-demand is
capable of serving thousands of requests per second instantly without capacity planning.

For more information, see Distributed data management and How to Choose a Database.

Simplifying operations
To further simplify the operational efforts needed to run, maintain, and monitor microservices, we can
use a fully serverless architecture.


Topics
• Deploying Lambda-based applications
• Abstracting multi-tenancy complexities
• API management

Deploying Lambda-based applications


You can deploy your Lambda code by uploading a .zip file archive or by creating and uploading a container image through the console UI using a valid Amazon ECR image URI. However, once a Lambda function grows complex, with layers, dependencies, and permissions, uploading code changes through the UI can become unwieldy.

Using AWS CloudFormation with the AWS Serverless Application Model (AWS SAM), the AWS Cloud Development Kit (AWS CDK), or Terraform streamlines the process of defining serverless applications.
AWS SAM, natively supported by CloudFormation, offers a simplified syntax for specifying serverless
resources. AWS Lambda Layers help manage shared libraries across multiple Lambda functions,
minimizing function footprint, centralizing tenant-aware libraries, and improving the developer
experience. Lambda SnapStart for Java enhances startup performance for latency-sensitive applications.

To deploy, specify resources and permissions policies in a CloudFormation template, package deployment artifacts, and deploy the template. SAM Local, an AWS CLI tool, allows local development, testing, and analysis of serverless applications before uploading to Lambda.

Integration with tools like AWS Cloud9 IDE, AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline
streamlines authoring, testing, debugging, and deploying SAM-based applications.

The following diagram shows deploying AWS Serverless Application Model resources using
CloudFormation and AWS CI/CD tools.

Figure 2: AWS Serverless Application Model (AWS SAM)


Abstracting multi-tenancy complexities


In a multi-tenant environment like SaaS platforms, it's crucial to streamline the intricacies related to
multi-tenancy, freeing up developers to concentrate on feature and functionality development. This
can be achieved using tools such as AWS Lambda Layers, which offer shared libraries for addressing
cross-cutting concerns. The rationale behind this approach is that shared libraries and tools, when used
correctly, efficiently manage tenant context.

However, they should not extend to encapsulating business logic due to the complexity and risk they
may introduce. A fundamental issue with shared libraries is the increased complexity surrounding
updates, making them more challenging to manage compared to standard code duplication. Thus, it's
essential to strike a balance between the use of shared libraries and duplication in the quest for the most
effective abstraction.
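As a sketch of the kind of cross-cutting helper such a shared library might provide, the following function resolves a tenant ID from a request's bearer token so that individual services never re-implement this concern. The header name, token format, and claim name are purely illustrative:

```python
import base64
import json

def tenant_from_event(event):
    """Extract a tenant ID from a JWT-style bearer token (illustrative format).

    A shared helper like this (for example, packaged as a Lambda layer)
    keeps tenant-context handling out of each service's business logic.
    """
    token = event["headers"]["Authorization"].split(" ", 1)[1]  # drop "Bearer"
    payload = token.split(".")[1]                               # middle segment
    padded = payload + "=" * (-len(payload) % 4)                # restore padding
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims["tenant_id"]

# Build a fake token and resolve it:
claims = base64.urlsafe_b64encode(
    json.dumps({"tenant_id": "acme"}).encode()
).decode().rstrip("=")
event = {"headers": {"Authorization": f"Bearer hdr.{claims}.sig"}}
print(tenant_from_event(event))  # acme
```

Note the helper only resolves context; it deliberately contains no business logic, in line with the guidance above.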

API management
Managing APIs can be time-consuming, especially when considering multiple versions, stages of the development cycle, authorization, and other features like throttling and caching. Apart from API Gateway, some customers also use an Application Load Balancer (ALB) or Network Load Balancer (NLB) for API management. Amazon API Gateway helps reduce the operational complexity of creating and maintaining RESTful APIs. It allows you to create APIs programmatically; serves as a "front door" for accessing data, business logic, or functionality from your backend services; handles authorization and access control, rate limiting, caching, monitoring, and traffic management; and runs APIs without your managing servers.

Figure 3 illustrates how API Gateway handles API calls and interacts with other components. Requests
from mobile devices, websites, or other backend services are routed to the closest CloudFront Point of
Presence (PoP) to reduce latency and provide an optimal user experience.

Figure 3: API Gateway call flow


Microservices on serverless
technologies
Using microservices with serverless technologies can greatly decrease operational complexity. AWS Lambda and AWS Fargate, integrated with API Gateway, allow for the creation of fully serverless applications. As of April 7, 2023, Lambda functions can progressively stream response payloads back to the client, enhancing performance for web and mobile applications. Previously, Lambda-based applications using the traditional request-response invocation model had to fully generate and buffer the response before returning it to the client, which could delay the time to first byte. With response streaming, functions can send partial responses back to the client as they become ready, significantly improving the time to first byte, to which web and mobile applications are especially sensitive.
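The difference between buffering and streaming a response can be sketched as follows. This is a conceptual illustration only, not the Lambda streaming API itself; `send` stands in for the transport that flushes each chunk to the client as soon as it exists:

```python
def render_chunks():
    """Stand-in for backend work that produces a response piece by piece."""
    for i in range(3):
        yield f"part-{i};"

def buffered_handler():
    """Request-response model: build the full body, then return it.

    Nothing reaches the client until every part has been generated.
    """
    return "".join(render_chunks())

def streaming_handler(send):
    """Streaming model: hand each part to the transport as it is ready.

    The first byte reaches the client before later parts are computed.
    """
    for part in render_chunks():
        send(part)

received = []
streaming_handler(received.append)
print(received[0])  # part-0;  -- available before parts 1 and 2 exist
```

Either way the client ends up with the same payload; streaming only changes when the first bytes arrive.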

Figure 4 demonstrates a serverless microservice architecture using AWS Lambda and managed services. This serverless architecture reduces the need to design for scale and high availability, and lessens the effort needed to run and monitor the underlying infrastructure.

Figure 4: Serverless microservice using AWS Lambda

Figure 5 displays a similar serverless implementation using containers with AWS Fargate, removing
concerns about underlying infrastructure. It also features Amazon Aurora Serverless, an on-demand,
auto-scaling database that automatically adjusts capacity based on your application's requirements.


Figure 5: Serverless microservice using AWS Fargate


Resilient, efficient, and cost-optimized systems
Disaster recovery (DR)
Microservices applications often follow the Twelve-Factor Application patterns, where processes are
stateless, and persistent data is stored in stateful backing services like databases. This simplifies disaster
recovery (DR) because if a service fails, it's easy to launch new instances to restore functionality.

Disaster recovery strategies for microservices should focus on the downstream services that maintain the application's state, such as file systems, databases, or queues. Organizations should plan for recovery time objective (RTO) and recovery point objective (RPO). RTO is the maximum acceptable delay between service interruption and restoration, while RPO is the maximum acceptable time since the last data recovery point, which bounds how much data can be lost.
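A quick worked check of an RPO target against a backup schedule (the numbers are illustrative):

```python
# With backups every 4 hours, the worst-case data loss is the full backup
# interval (a failure just before the next backup runs), so a 1-hour RPO
# target is not met.
backup_interval_hours = 4
rpo_target_hours = 1

worst_case_data_loss = backup_interval_hours
meets_rpo = worst_case_data_loss <= rpo_target_hours
print(meets_rpo)  # False -- back up at least hourly to meet this RPO
```

The same reasoning applies to any state-bearing backing service: the recovery point can never be better than the interval between captures of its state.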

For more on disaster recovery strategies, refer to the Disaster Recovery of Workloads on AWS: Recovery
in the Cloud whitepaper.

High availability (HA)


We'll examine high availability (HA) for various components of a microservices architecture.

Amazon EKS ensures high availability by running Kubernetes control and data plane instances across
multiple Availability Zones. It automatically detects and replaces unhealthy control plane instances and
provides automated version upgrades and patching.

Amazon ECR uses Amazon Simple Storage Service (Amazon S3) for storage to make your container images highly available and accessible. It works with Amazon EKS, Amazon ECS, and AWS Lambda, simplifying your development-to-production workflow.

Amazon ECS is a regional service that simplifies running containers in a highly available manner across multiple Availability Zones within a Region, offering multiple scheduling strategies that place containers based on your resource needs and availability requirements.

AWS Lambda operates in multiple Availability Zones, ensuring availability during service interruptions in a single zone. When connecting your function to a VPC, specify subnets in multiple Availability Zones to ensure high availability.


Distributed systems components


In a microservices architecture, service discovery refers to the process of dynamically locating and
identifying the network locations (IP addresses and ports) of individual microservices within a distributed
system.

When choosing an approach on AWS, consider factors such as:

• Code modification: Can you get the benefits without modifying code?
• Cross-VPC or cross-account traffic: If required, does your system need efficient management of
communication across different VPCs or AWS accounts?
• Deployment strategies: Does your system use or plan to use advanced deployment strategies such as
blue-green or canary deployments?
• Performance considerations: If your architecture frequently communicates with external services,
what will be the impact on overall performance?

AWS offers several methods for implementing service discovery in your microservices architecture:

• Amazon ECS Service Discovery: Amazon ECS supports service discovery using its DNS-based method
or by integrating with AWS Cloud Map (see ECS Service discovery). ECS Service Connect further
improves connection management, which can be especially beneficial for larger applications with
multiple interacting services.
• Amazon Route 53: Route 53 integrates with ECS and other AWS services, such as EKS, to facilitate
service discovery. In an ECS context, Route 53 can use the ECS Service Discovery feature, which
leverages the Auto Naming API to automatically register and deregister services.
• AWS Cloud Map: This option offers a dynamic API-based service discovery, which propagates changes
across your services.
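The register/discover cycle behind these options can be sketched with an in-memory registry. This is a conceptual stand-in for an API-based service such as AWS Cloud Map, not its actual API; the service name and endpoints are invented:

```python
import random

class ServiceRegistry:
    """In-memory stand-in for an API-based service registry.

    Services register their instance endpoints; consumers discover one
    by service name. A real registry also tracks instance health.
    """

    def __init__(self):
        self.services = {}  # service name -> list of (host, port)

    def register(self, name, host, port):
        self.services.setdefault(name, []).append((host, port))

    def deregister(self, name, host, port):
        self.services[name].remove((host, port))

    def discover(self, name):
        """Return one endpoint, chosen at random for simple load spreading."""
        instances = self.services.get(name, [])
        if not instances:
            raise LookupError(f"no instances registered for {name!r}")
        return random.choice(instances)

registry = ServiceRegistry()
registry.register("orders", "10.0.1.12", 8080)
registry.register("orders", "10.0.2.34", 8080)
host, port = registry.discover("orders")
print(port)  # 8080
```

DNS-based discovery externalizes exactly this lookup, while a service mesh moves it into a proxy alongside each service instance.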

For more advanced communication needs, AWS provides two service mesh options:

• Amazon VPC Lattice is an application networking service that consistently connects, monitors,
and secures communications between your services, helping to improve productivity so that your
developers can focus on building features that matter to your business. You can define policies for
network traffic management, access, and monitoring to connect compute services in a simplified and
consistent way across instances, containers, and serverless applications.
• AWS App Mesh: Based on the open-source Envoy proxy, App Mesh caters to advanced needs with sophisticated routing, load balancing, and comprehensive reporting. Unlike Amazon VPC Lattice, App Mesh supports the TCP protocol.

If you're already using third-party software such as HashiCorp Consul or Netflix Eureka for service discovery, you might prefer to continue using it as you migrate to AWS, enabling a smoother transition.

The choice between these options should align with your specific needs. For simpler requirements, DNS-based solutions like Amazon ECS Service Discovery or AWS Cloud Map might be sufficient. For more complex or larger systems, service meshes like Amazon VPC Lattice or AWS App Mesh might be more suitable.

In conclusion, designing a microservices architecture on AWS is all about selecting the right tools to
meet your specific needs. By keeping in mind the considerations discussed, you can ensure you're making
informed decisions to optimize your system's service discovery and inter-service communication.


Distributed data management


In traditional applications, all components often share a single database. In contrast, each component of
a microservices-based application maintains its own data, promoting independence and decentralization.
This approach, known as distributed data management, brings new challenges.

One such challenge arises from the trade-off between consistency and performance in distributed
systems. It's often more practical to accept slight delays in data updates (eventual consistency) than to
insist on instant updates (immediate consistency).

Sometimes, business operations require multiple microservices to work together. If one part fails, you
might have to undo some completed tasks. The Saga pattern helps manage this by coordinating a series
of compensating actions.
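The Saga pattern can be sketched as a coordinator that runs each step's action and, on failure, runs the compensations for the completed steps in reverse order. A real coordinator (for example, one built on AWS Step Functions) would also persist its progress so it can resume after a crash; the step names below are invented:

```python
class SagaError(Exception):
    pass

def run_saga(steps):
    """Run (action, compensation) pairs in order.

    On any failure, run the compensations for completed steps in
    reverse, then surface the failure as a SagaError.
    """
    done = []
    try:
        for action, compensate in steps:
            action()
            done.append(compensate)
    except Exception as exc:
        for compensate in reversed(done):
            compensate()  # undo completed work, most recent first
        raise SagaError("saga rolled back") from exc

log = []
def ok(name):
    return lambda: log.append(name)
def fail(name):
    def _f():
        raise RuntimeError(name)
    return _f

try:
    run_saga([
        (ok("reserve-inventory"), ok("release-inventory")),
        (ok("charge-card"),       ok("refund-card")),
        (fail("ship-order"),      ok("cancel-shipment")),
    ])
except SagaError:
    pass
print(log)
# ['reserve-inventory', 'charge-card', 'refund-card', 'release-inventory']
```

Note that the failed step's own compensation never runs; only completed work is undone.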

To help microservices stay in sync, a centralized data store can be used. This store, managed with
tools like AWS Lambda, AWS Step Functions, and Amazon EventBridge, can assist in cleaning up and
deduplicating data.

Figure 6: Saga execution coordinator

A common approach in managing changes across microservices is event sourcing. Every change in the
application is recorded as an event, creating a timeline of the system's state. This approach not only
helps debug and audit but also allows different parts of an application to react to the same events.

Event sourcing often works hand-in-hand with the Command Query Responsibility Segregation
(CQRS) pattern, which separates data modification and data querying into different modules for better
performance and security.
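A minimal sketch of event sourcing with a query-side projection in the spirit of CQRS, using an in-memory list as the event store; the event types and account IDs are invented:

```python
# Every change is an immutable event in an append-only log; current
# state is derived by replaying events, and the full history remains
# available for audit and debugging.
events = []  # the append-only event store

def record(event_type, **data):
    events.append({"type": event_type, **data})

def balance(account):
    """Replay the log to derive the current balance (the read model)."""
    total = 0
    for e in events:
        if e.get("account") != account:
            continue
        if e["type"] == "Deposited":
            total += e["amount"]
        elif e["type"] == "Withdrew":
            total -= e["amount"]
    return total

record("Deposited", account="a1", amount=100)
record("Withdrew",  account="a1", amount=30)
record("Deposited", account="a2", amount=5)
print(balance("a1"))  # 70 -- and the full history is still in `events`
```

In the AWS mapping described below, the list plays the role of the event stream, and `balance` plays the role of a materialized read model kept in a separate store.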


On AWS, you can implement these patterns using a combination of services. As you can see in Figure
7, Amazon Kinesis Data Streams can serve as your central event store, while Amazon S3 provides a
durable storage for all event records. AWS Lambda, Amazon DynamoDB, and Amazon API Gateway work
together to handle and process these events.

Figure 7: Event sourcing pattern on AWS

Remember, in distributed systems, events might be delivered multiple times due to retries, so it's
important to design your applications to handle this.
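
One common way to handle duplicate delivery is an idempotent consumer that remembers which message IDs it has already processed. A minimal sketch; in production the seen-ID set would live in a durable store such as DynamoDB:

```python
# At-least-once delivery sketch: deduplicate by message ID so that a
# redelivered event is processed exactly once.

processed_ids = set()                     # would be a durable store in production
total = 0

def handle(message):
    global total
    if message["id"] in processed_ids:    # duplicate delivery: ignore
        return
    processed_ids.add(message["id"])
    total += message["amount"]

deliveries = [
    {"id": "m1", "amount": 10},
    {"id": "m2", "amount": 20},
    {"id": "m1", "amount": 10},           # a retry redelivers m1
]
for msg in deliveries:
    handle(msg)
```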


Configuration management
In a microservices architecture, each service interacts with various resources like databases, queues, and
other services. A consistent way to configure each service's connections and operating environment is
vital. Ideally, an application should adapt to new configurations without needing a restart. This approach
is part of the Twelve-Factor App principles, which recommend storing configurations in environment
variables.
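
Following that principle, a service can read its settings from environment variables at startup. A small sketch; the variable names are illustrative, not a fixed convention:

```python
# Twelve-Factor style configuration sketch: settings come from environment
# variables, with explicit defaults for optional values.
import os

os.environ.setdefault("ORDERS_DB_URL", "postgres://localhost/orders")  # demo value

def load_config():
    return {
        "db_url": os.environ["ORDERS_DB_URL"],                    # required setting
        "timeout_s": int(os.environ.get("HTTP_TIMEOUT_S", "5")),  # optional, default 5
    }

config = load_config()
```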

A different approach is to use AWS AppConfig, a capability of AWS Systems Manager that makes it easy
for customers to quickly and safely configure, validate, and deploy feature flags and application
configuration. Your feature flag and configuration data can be validated syntactically or semantically
in the pre-deployment phase, and can be monitored and automatically rolled back if an alarm that you
have configured is triggered. AWS AppConfig integrates with Amazon ECS and Amazon EKS through the
AWS AppConfig agent, which runs as a sidecar container alongside your Amazon ECS and Amazon EKS
container applications. If you use AWS AppConfig feature flags or other dynamic configuration data in
a Lambda function, we recommend adding the AWS AppConfig Lambda extension as a layer to your
Lambda function.

GitOps is an innovative approach to configuration management that uses Git as the source of truth
for all configuration changes. This means that any changes made to your configuration files are
automatically tracked, versioned, and audited through Git.

Secrets management
Security is paramount, so credentials should not be passed in plain text. AWS offers secure services
for this, like AWS Systems Manager Parameter Store and AWS Secrets Manager. These tools can send
secrets to containers in Amazon EKS as volumes, or to Amazon ECS as environment variables. In
AWS Lambda, environment variables are made available to your code automatically. For Kubernetes
workflows, the External Secrets Operator fetches secrets directly from services like AWS Secrets Manager,
creating corresponding Kubernetes Secrets. This enables a seamless integration with Kubernetes-native
configurations.
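
As an illustration, the sketch below fetches a secret through an injected client object so it can run without AWS; with boto3 the real client would come from boto3.client("secretsmanager"), and get_secret_value is its actual operation. The secret name and payload here are hypothetical:

```python
# Secrets fetch sketch with an injected client, so the code is testable offline.
import json

def load_db_credentials(secrets_client, secret_id):
    response = secrets_client.get_secret_value(SecretId=secret_id)
    return json.loads(response["SecretString"])   # never log this value

class FakeSecretsManager:                         # stand-in for the AWS client
    def get_secret_value(self, SecretId):
        return {"SecretString": json.dumps({"user": "app", "password": "s3cr3t"})}

creds = load_db_credentials(FakeSecretsManager(), "prod/orders/db")
```

Injecting the client also keeps the secret-handling code free of any hard-coded credentials, which is the point of the pattern.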


Cost optimization and sustainability


Microservices architecture can enhance cost optimization and sustainability. By breaking an application
into smaller parts, you can scale up only the services that need more resources, reducing cost and waste.
This is particularly useful when dealing with variable traffic. Because microservices are developed
independently, teams can ship smaller updates and test only the affected subset of features, rather than
running end-to-end tests across an entire monolith.

Stateless components (services that store state in an external data store instead of a local data store)
in your architecture can make use of Amazon EC2 Spot Instances, which offer unused EC2 capacity in
the AWS cloud. These instances are more cost efficient than on-demand instances and are perfect for
workloads that can handle interruptions. This can further cut costs while maintaining high availability.

With isolated services, you can use cost-optimized compute options for each auto-scaling group. For
example, AWS Graviton offers cost-effective, high-performance compute options for workloads that suit
ARM-based instances.

Optimizing costs and resource usage also helps minimize environmental impact, aligning with the
Sustainability pillar of the Well-Architected Framework. You can monitor your progress in reducing
carbon emissions using the AWS Customer Carbon Footprint Tool. This tool provides insights into the
environmental impact of your AWS usage.


Communication mechanisms
In the microservices paradigm, various components of an application need to communicate over
a network. Common approaches for this include REST-based, GraphQL-based, gRPC-based and
asynchronous messaging.

Topics
• REST-based communication (p. 16)
• GraphQL-based communication (p. 16)
• gRPC-based communication (p. 16)
• Asynchronous messaging and event passing (p. 16)
• Orchestration and state management (p. 18)

REST-based communication
The HTTP/S protocol, used broadly for synchronous communication between microservices, often
operates through RESTful APIs. AWS's API Gateway offers a streamlined way to build an API that serves
as a centralized access point to backend services, handling tasks like traffic management, authorization,
monitoring, and version control.

GraphQL-based communication
Similarly, GraphQL is a widespread method for synchronous communication, using the same protocols
as REST but limiting exposure to a single endpoint. With AWS AppSync, you can create and publish
GraphQL applications that interact with AWS services and datastores directly, or incorporate Lambda
functions for business logic.

gRPC-based communication
gRPC is a synchronous, lightweight, high performance, open-source RPC communication protocol.
gRPC improves upon its underlying protocols by using HTTP/2 and enabling more features such as
compression and stream prioritization. It uses Protobuf Interface Definition Language (IDL) which is
binary-encoded and thus takes advantage of HTTP/2 binary framing.

Asynchronous messaging and event passing


Asynchronous messaging allows services to communicate by sending and receiving messages through an
intermediary such as a queue. This keeps services loosely coupled, with no need for hard-coded
connections between them.

Messaging generally falls into the following three types:

• Message Queues: A message queue acts as a buffer that decouples senders (producers) and receivers
(consumers) of messages. Producers enqueue messages into the queue, and consumers dequeue and
process them. This pattern is useful for asynchronous communication, load leveling, and handling
bursts of traffic.


• Publish-Subscribe: In the publish-subscribe pattern, a message is published to a topic, and multiple
interested subscribers receive the message. This pattern enables broadcasting events or messages to
multiple consumers asynchronously.
• Event-Driven Messaging: Event-driven messaging involves capturing and reacting to events that occur
in the system. Events are published to a message broker, and interested services subscribe to specific
event types. This pattern enables loose coupling and allows services to react to events without direct
dependencies.
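
The publish-subscribe fan-out can be illustrated with an in-memory topic registry. This is a sketch of the pattern itself, not of the Amazon SNS API; the topic and event names are invented:

```python
# Publish-subscribe sketch: one published message is delivered to every
# subscriber of a topic, the way an SNS topic fans out to its endpoints.
from collections import defaultdict

subscribers = defaultdict(list)           # topic -> list of handler callables

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, message):
    for handler in subscribers[topic]:    # each subscriber receives the message
        handler(message)

billing, shipping = [], []
subscribe("order-placed", billing.append)
subscribe("order-placed", shipping.append)
publish("order-placed", {"order_id": 42})
```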

To implement each of these message types, AWS offers various managed services such as Amazon SQS,
Amazon SNS, Amazon EventBridge, Amazon MQ, and Amazon MSK. These services have unique features
tailored to specific needs:

• Amazon Simple Queue Service (Amazon SQS) and Amazon Simple Notification Service (Amazon
SNS): As you can see in Figure 8, these two services complement each other, with Amazon SQS
providing a space for storing messages and Amazon SNS enabling delivery of messages to multiple
subscribers. They are effective when the same message needs to be delivered to multiple destinations.

Figure 8: Message bus pattern on AWS


• Amazon EventBridge: a serverless service that uses events to connect application components
together, making it easier for you to build scalable event-driven applications. Use it to route events
from sources such as home-grown applications, AWS services, and third-party software to consumer
applications across your organization. EventBridge provides a simple and consistent way to ingest,
filter, transform, and deliver events so you can build new applications quickly. EventBridge event buses
are well suited for many-to-many routing of events between event-driven services.
• Amazon MQ: a good choice if you have a pre-existing messaging system that uses standard protocols
like JMS, AMQP, etc. This managed service provides a replacement for your system without disrupting
operations.
• Amazon Managed Streaming for Apache Kafka (Amazon MSK): a managed Apache Kafka service for
storing and reading messages, useful for cases where messages need to be processed multiple times. It
also supports real-time message streaming.
• Amazon Kinesis: real-time processing and analyzing of streaming data. This allows for the
development of real-time applications and provides seamless integration with the AWS ecosystem.

Remember, the best service for you depends on your specific needs, so it's important to understand what
each one offers and how they align with your requirements.


Orchestration and state management


Microservices orchestration refers to a centralized approach, where a central component, known as the
orchestrator, is responsible for managing and coordinating the interactions between microservices.
Orchestrating workflows across multiple microservices can be challenging. Embedding orchestration
code directly into services is discouraged, as it introduces tighter coupling and hinders replacing
individual services.

Step Functions provides a workflow engine to manage service orchestration complexities, such as
error handling and serialization. This allows you to scale and change applications quickly without
adding coordination code. Step Functions is part of the AWS serverless platform and supports Lambda
functions, Amazon EC2, Amazon EKS, Amazon ECS, SageMaker, AWS Glue, and more.
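
To make the idea concrete, here is a toy orchestrator that runs steps in order and retries failures — the kind of error handling Step Functions expresses declaratively in its state machine definitions. The step names and retry policy are invented for illustration:

```python
# Orchestration sketch: a central coordinator runs steps and retries failures,
# so no coordination code lives inside the individual services.

def run_workflow(steps, max_attempts=3):
    """Run named steps in order; retry each up to max_attempts before failing."""
    results = {}
    for name, step in steps:
        for attempt in range(1, max_attempts + 1):
            try:
                results[name] = step()
                break
            except Exception:
                if attempt == max_attempts:
                    raise                 # exhausted retries: fail the workflow
    return results

calls = {"charge": 0}

def flaky_charge():                       # hypothetical step: fails once, then works
    calls["charge"] += 1
    if calls["charge"] < 2:
        raise RuntimeError("payment gateway timeout")
    return "charged"

results = run_workflow([
    ("validate", lambda: "ok"),
    ("charge", flaky_charge),
])
```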


Figure 9: An example of a microservices workflow with parallel and sequential steps invoked by AWS Step
Functions

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is an alternative to Step Functions.
Choose Amazon MWAA if you prioritize open source and portability; Airflow has a large, active
open-source community that regularly contributes new functionality and integrations.


Observability
Since microservices architectures are inherently made up of many distributed components, observability
across all those components becomes critical. Amazon CloudWatch enables this, collecting and tracking
metrics, monitoring log files, and reacting to changes in your AWS environment. It can monitor AWS
resources and custom metrics generated by your applications and services.

Topics
• Monitoring (p. 20)
• Centralizing logs (p. 21)
• Distributed tracing (p. 22)
• Log analysis on AWS (p. 23)
• Other options for analysis (p. 24)

Monitoring
CloudWatch offers system-wide visibility into resource utilization, application performance, and
operational health. In a microservices architecture, custom metrics monitoring via CloudWatch is
beneficial, as developers can choose which metrics to collect. Dynamic scaling can also be based on these
custom metrics.
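
As a sketch of publishing such a custom metric, the function below takes the client as a parameter so it can be exercised without AWS; with boto3 the client would be boto3.client("cloudwatch"), and put_metric_data is its real operation. The namespace and metric name are examples:

```python
# Custom-metric sketch with an injected CloudWatch-style client for testability.

def emit_queue_depth(cloudwatch, depth):
    cloudwatch.put_metric_data(
        Namespace="OrderService",         # example namespace
        MetricData=[{"MetricName": "QueueDepth", "Value": depth, "Unit": "Count"}],
    )

class FakeCloudWatch:                     # records calls instead of hitting AWS
    def __init__(self):
        self.calls = []

    def put_metric_data(self, **kwargs):
        self.calls.append(kwargs)

cw = FakeCloudWatch()
emit_queue_depth(cw, 17)
```

A CloudWatch alarm or scaling policy could then act on QueueDepth, which is what enables dynamic scaling on custom metrics.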

CloudWatch Container Insights extends this functionality, automatically collecting metrics for many
resources like CPU, memory, disk, and network. It helps in diagnosing container-related issues,
streamlining resolution.

For Amazon EKS, an often-preferred choice is Prometheus, an open-source platform providing
comprehensive monitoring and alerting capabilities. It's typically coupled with Grafana for intuitive
metrics visualization. Amazon Managed Service for Prometheus (AMP) offers a monitoring service fully
compatible with Prometheus, letting you oversee containerized applications effortlessly. Additionally,
Amazon Managed Grafana (AMG) simplifies the analysis and visualization of your metrics, eliminating the
need for managing underlying infrastructure.


Figure 10: A serverless architecture with monitoring components

Figure 11: A container-based architecture with monitoring components

Centralizing logs
Logging is key to pinpointing and resolving issues. With microservices, you can release more frequently and
experiment with new features. AWS provides services like Amazon S3, CloudWatch Logs, and Amazon
OpenSearch Service to centralize log files. Amazon EC2 uses a daemon for sending logs to CloudWatch,
while Lambda and Amazon ECS natively send their log output there. For Amazon EKS, either Fluent
Bit or Fluentd can be used to forward logs to CloudWatch for reporting using OpenSearch and Kibana.
However, due to the smaller footprint and performance advantages, Fluent Bit is recommended over
Fluentd.

Figure 12 illustrates how logs from various AWS services are directed to Amazon S3 and CloudWatch.
These centralized logs can be further analyzed using Amazon OpenSearch Service, inclusive of Kibana for
data visualization. Also, Amazon Athena can be employed for ad hoc queries against the logs stored in
Amazon S3.


Figure 12: Logging capabilities of AWS services

Distributed tracing
Microservices often work together to handle requests. AWS X-Ray uses correlation IDs to track requests
across these services. X-Ray works with Amazon EC2, Amazon ECS, Lambda, and Elastic Beanstalk.
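
The underlying idea of a correlation ID can be sketched without the X-Ray SDK: generate an ID at the edge and forward it on every downstream call. The header name below is a generic placeholder (X-Ray itself uses its own X-Amzn-Trace-Id header):

```python
# Correlation-ID sketch: the first service mints an ID, every downstream call
# forwards it, so logs and traces across services can be stitched together.
import uuid

TRACE_HEADER = "X-Correlation-Id"         # placeholder header name

def ensure_correlation_id(headers):
    """Reuse an incoming correlation ID, or mint one at the edge."""
    if TRACE_HEADER not in headers:
        headers[TRACE_HEADER] = str(uuid.uuid4())
    return headers[TRACE_HEADER]

incoming = {}                             # request entering the first service
edge_id = ensure_correlation_id(incoming)

downstream = dict(incoming)               # headers forwarded to the next service
forwarded_id = ensure_correlation_id(downstream)
```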


Figure 13: AWS X-Ray service map

AWS Distro for OpenTelemetry is part of the OpenTelemetry project and provides open-source APIs
and agents to gather distributed traces and metrics, improving your application monitoring. It sends
metrics and traces to multiple AWS and partner monitoring solutions. By collecting metadata from your
AWS resources, it aligns application performance with the underlying infrastructure data, accelerating
problem-solving. Plus, it's compatible with a variety of AWS services and can be used on-premises.

Log analysis on AWS


Amazon CloudWatch Logs Insights allows for real-time log exploration, analysis, and visualization.
For further log file analysis, Amazon OpenSearch Service, which includes Kibana, is a powerful tool.
CloudWatch Logs can stream log entries to OpenSearch Service in real time. Kibana, seamlessly
integrated with OpenSearch, visualizes this data and offers an intuitive search interface.


Figure 14: Log analysis with Amazon OpenSearch Service

Other options for analysis


For further log analysis, Amazon Redshift, a fully-managed data warehouse service, and Amazon
QuickSight, a scalable business intelligence service, offer effective solutions. QuickSight provides easy
connectivity to various AWS data services such as Redshift, RDS, Aurora, EMR, DynamoDB, Amazon S3,
and Kinesis, simplifying data access.

CloudWatch Logs has the capability to stream log entries to Amazon Kinesis Data Firehose, a service
for delivering real-time streaming data. QuickSight then utilizes the data stored in Redshift for
comprehensive analysis, reporting, and visualization.


Figure 15: Log analysis with Amazon Redshift and Amazon QuickSight

Moreover, when logs are stored in S3 buckets, an object storage service, the data can be loaded into
services like Redshift or EMR, a cloud-based big data platform, allowing for thorough analysis of the
stored log data.

Figure 16: Streamlining Log Analysis: From AWS services to QuickSight


Managing chattiness in microservices communication

Chattiness refers to excessive communication between microservices, which can cause inefficiency due to
increased network latency. It's essential to manage chattiness effectively for a well-functioning system.

Some key tools for managing chattiness are REST APIs, HTTP APIs and gRPC APIs. REST APIs offer
a range of advanced features such as API keys, per-client throttling, request validation, AWS WAF
integration, or private API endpoints. HTTP APIs are designed with minimal features and hence come at
a lower price. For more details on this topic and a list of core features that are available in REST APIs and
HTTP APIs, see Choosing between REST APIs and HTTP APIs.

Often, microservices use REST over HTTP for communication due to its widespread use. But in high-
volume situations, REST's overhead can cause performance issues, because with HTTP/1.1 each new
request typically requires its own TCP connection and handshake. In such cases, a gRPC API is a better
choice: gRPC reduces latency by multiplexing multiple requests over a single TCP connection, and it
supports bi-directional streaming, allowing clients and servers to send and receive messages at the same
time. This leads to more efficient communication, especially for large or real-time data transfers.

If chattiness persists despite choosing the right API type, it may be necessary to reevaluate your
microservices architecture. Consolidating services or revising your domain model could reduce chattiness
and improve efficiency.

Using protocols and caching


Microservices often use protocols like gRPC and REST for communication (see the earlier discussion of
Communication mechanisms (p. 16)). gRPC uses HTTP/2 for transport, while REST typically uses
HTTP/1.1. gRPC employs protocol buffers for serialization, while REST usually uses JSON or XML. To
reduce latency and communication overhead, caching can be applied: services like Amazon ElastiCache
or the caching layer in API Gateway can help reduce the number of calls between microservices.
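
The effect of such a cache can be sketched with a small TTL wrapper around a slow lookup. This illustrates the pattern only, not the ElastiCache or API Gateway APIs; the clock is injected to keep the example deterministic:

```python
# TTL cache sketch: fresh entries short-circuit the downstream call,
# expired entries trigger a refetch.

def make_cached(fetch, ttl_seconds, clock):
    cache = {}                            # key -> (expires_at, value)

    def lookup(key):
        now = clock()
        if key in cache and cache[key][0] > now:
            return cache[key][1]          # fresh: no downstream call
        value = fetch(key)                # miss or expired: call the service
        cache[key] = (now + ttl_seconds, value)
        return value

    return lookup

calls = []
fake_time = [0]
lookup = make_cached(lambda k: calls.append(k) or f"profile:{k}", 60,
                     lambda: fake_time[0])

a = lookup("user-1")                      # miss: calls the service
b = lookup("user-1")                      # hit: served from cache
fake_time[0] = 120                        # advance past the TTL
c = lookup("user-1")                      # expired: calls the service again
```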


Auditing
In a microservices architecture, it's crucial to have visibility into user actions across all services. AWS
provides tools like AWS CloudTrail, which logs all API calls made in AWS, and Amazon CloudWatch,
which captures application logs. Together, they allow you to track changes and analyze behavior across
your microservices. Amazon EventBridge can react to system changes quickly, notifying the right people
or even automatically starting workflows to resolve issues.

Figure 17: Auditing and remediation across your microservices

Resource inventory and change management


In an agile development environment with rapidly evolving infrastructure configurations, automated
auditing and control are vital. AWS Config Rules provide a managed approach to monitoring these
changes across microservices. They enable the definition of specific security policies that automatically
detect, track, and send alerts on policy violations.

For instance, if an API Gateway configuration in a microservice is altered to accept inbound HTTP traffic
instead of only HTTPS requests, a predefined AWS Config rule can detect this security violation, log the
change for auditing, and trigger an SNS notification so that the compliant state can be restored.


Figure 18: Detecting security violations with AWS Config


Conclusion
Microservices architecture, a versatile design approach that provides an alternative to traditional
monolithic systems, assists in scaling applications, boosting development speed, and fostering
organizational growth. With its adaptability, it can be implemented using containers, serverless
approaches, or a blend of the two, tailoring to specific needs.

However, it's not a one-size-fits-all solution. Each use case requires meticulous evaluation given
the potential increase in architectural complexity and operational demands. But when approached
strategically, the benefits of microservices can significantly outweigh these challenges. The key is in
proactive planning, especially in areas of observability, security, and change management.

It's also worth noting that beyond microservices, entirely different architectural approaches exist, such
as generative AI architectures like Retrieval Augmented Generation (RAG), providing a range of options to
best fit your needs.

AWS, with its robust suite of managed services, empowers teams to build efficient microservices
architectures and effectively minimize complexity. This whitepaper has aimed to guide you through
the relevant AWS services and the implementation of key patterns. The goal is to equip you with the
knowledge to harness the power of microservices on AWS, enabling you to capitalize on their benefits
and transform your application development journey.


Contributors
The following individuals and organizations contributed to this document:

• Sascha Möllering, Solutions Architecture, Amazon Web Services
• Christian Müller, Solutions Architecture, Amazon Web Services
• Matthias Jung, Solutions Architecture, Amazon Web Services
• Peter Dalbhanjan, Solutions Architecture, Amazon Web Services
• Peter Chapman, Solutions Architecture, Amazon Web Services
• Christoph Kassen, Solutions Architecture, Amazon Web Services
• Umair Ishaq, Solutions Architecture, Amazon Web Services
• Rajiv Kumar, Solutions Architecture, Amazon Web Services
• Ramesh Dwarakanath, Solutions Architecture, Amazon Web Services
• Andrew Watkins, Solutions Architecture, Amazon Web Services
• Yann Stoneman, Solutions Architecture, Amazon Web Services
• Mainak Chaudhuri, Solutions Architecture, Amazon Web Services


Document history
To be notified about updates to this whitepaper, subscribe to the RSS feed.

• Major update (July 31, 2023): Added information about AWS Customer Carbon Footprint Tool, Amazon
EventBridge, AWS AppSync (GraphQL), AWS Lambda layers, Lambda SnapStart, Large Language Models
(LLMs), Amazon Managed Streaming for Apache Kafka (MSK), Amazon Managed Workflows for Apache
Airflow (MWAA), Amazon VPC Lattice, and AWS AppConfig. Added a separate section on cost optimization
and sustainability.
• Minor updates (April 13, 2022): Added Well-Architected to abstract.
• Whitepaper updated (November 9, 2021): Integration of Amazon EventBridge, AWS OpenTelemetry,
AMP, AMG, Container Insights; minor text changes.
• Minor updates (April 30, 2021): Adjusted page layout.
• Minor updates (August 1, 2019): Minor text changes.
• Whitepaper updated (June 1, 2019): Integration of Amazon EKS, AWS Fargate, Amazon MQ, AWS
PrivateLink, AWS App Mesh, AWS Cloud Map.
• Whitepaper updated (September 1, 2017): Integration of AWS Step Functions, AWS X-Ray, and ECS
event streams.
• Initial publication (December 1, 2016): Implementing Microservices on AWS published.

Note
To subscribe to RSS updates, you must have an RSS plug-in enabled for the browser you are
using.


Notices
Customers are responsible for making their own independent assessment of the information in this
document. This document: (a) is for informational purposes only, (b) represents current AWS product
offerings and practices, which are subject to change without notice, and (c) does not create any
commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services
are provided “as is” without warranties, representations, or conditions of any kind, whether express or
implied. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements,
and this document is not part of, nor does it modify, any agreement between AWS and its customers.

Copyright © 2023 Amazon Web Services, Inc. or its affiliates.


AWS Glossary
For the latest AWS terminology, see the AWS glossary in the AWS Glossary Reference.

Cloud Practitioner (CLF-C02) Preparation Plan
Approach the exam day with confidence

v.1.0 Amazon Confidential 1|Page


Table of Contents
Cloud Practitioner (CLF-C02) Preparation Plan ............................................................................................................ 1

Summary .................................................................................................................................................................... 3

Introduction................................................................................................................................................................ 3

Prepare a schedule ..................................................................................................................................................... 3

Testing format ............................................................................................................................................................ 3

In the exam ................................................................................................................................................................. 4

Preparation method ................................................................................................................................................... 4

Preparation plan ......................................................................................................................................................... 5



Summary
You can benefit from a wide range of expert insights, resources, and programs as you progress toward
achieving AWS Certification. Whether you’re just starting out or adding another certification, AWS can help
you effectively validate your cloud expertise.

The purpose of this article is to recommend the steps you can follow to get ready for the AWS Certified
Cloud Practitioner exam day.

Introduction
The AWS Certified Cloud Practitioner (CLF-C02) exam is intended for individuals who can effectively
demonstrate overall knowledge of the AWS Cloud, independent of a specific job role. The exam validates a
candidate’s ability to:
• Explain the value of the AWS Cloud.
• Understand and explain the AWS shared responsibility model.
• Understand security best practices.
• Understand AWS Cloud costs, economics, and billing practices.
• Describe and position the core AWS services, including compute, network, database, and storage
services.
• Identify AWS services for common use cases.

Prepare a schedule
To be efficient and effective with studying, you need to optimize the conditions under which you focus
best; this can include both the time of day that you study, as well as the environment in which you study.
Pay attention when studying at different times and in different environments to figure out when and
where you are the most productive, and study under the conditions that work best for you.

Come up with a game plan:


• Develop a realistic study schedule based upon clear and specific goals for each session. Write down
where you’ll study, at what time, and exactly what you intend to do with that time. Note: don’t
plan to cram—studies show that cramming leads to higher stress and lower scores. Think in terms
of a five-day plan for each course.
• Determine available study time, blocks of time for specific tasks, and study with a sense of urgency.
Schedule sleep, meals, and (some) down time. Sleep deprivation reduces efficiency.
• Organize your study area and materials, and make necessary plans (e.g., with study group
members).

Testing format
AWS certification dumps aren't an effective preparation tool. Each time a student takes the exam, the
questions they answer are picked at random from a pool of over 500 questions. Even if you could find a
dump that covered all of them, you'd still have to memorize over 500 possible answers.

There is no true substitute for experience. AWS recommends that you have specific hands-on experience
that covers the competencies, domains, and objectives in the content outline for each exam. In addition,



AWS offers multiple training courses, and resources such as whitepapers and blogs, to assist you with
acquiring the knowledge and skills necessary for competent practice.
Unanswered questions are scored as incorrect; there is no penalty for guessing. Partial credit is not
awarded for multiple-response questions.

In the exam
A 30-minute exam extension is available upon request to non-native English speakers when taking an exam
in English. The accommodation, “ESL +30,” only needs to be requested once, prior to registering for an
exam. It will apply to all future exam registrations with all test delivery providers.

Preparation method
The advice about preparation methods for the certification exam given here is sound and can be quite
effective for many individuals.

Maintaining motivation throughout your exam preparation is crucial. Certification exams can be
challenging, and having the drive to keep learning and studying will help you stay on track and achieve
better results.

Engaging in discussions with study partners can indeed help you understand exam objectives better.
Through conversations, you can gain new insights, uncover knowledge gaps, and reinforce your
understanding of exam topics.

Studying with others can be beneficial as it allows for collaborative learning. You can discuss concepts,
share different perspectives, and clarify doubts together. Additionally, group study can help keep you
accountable to a study schedule. Studying with a small group of like-minded individuals who are also
motivated to excel in the exam can ensure focused and productive discussions. This way, you can avoid
distractions and maintain a study environment that fosters learning.

Practicing with sample questions and exams is an excellent way to assess your knowledge and readiness for
the actual certification exam. Working on questions selected by someone else can expose you to different
question styles and topics, preparing you for a broader range of challenges.

While these methods can be highly effective for some learners, it's essential to recognize that different
individuals have varying learning styles. Some people prefer self-study and independent learning, while
others thrive in group settings. It's essential to experiment with different methods and find what works
best for you personally.

Remember that exam preparation should also include a balanced approach that covers understanding
concepts, hands-on practice with AWS services (if applicable), reviewing official documentation, and using
reputable study materials. Additionally, setting clear goals and a study schedule can help you stay
organized and focused during your exam preparation journey.



Preparation plan
Use this 4-step method to prepare for your exam with confidence:

Step 0: Get your Skill Builder account ready

In the ever-evolving landscape of cloud technology, staying up-to-date with the latest tools and techniques
is crucial. Whether you’re an experienced developer, a cloud newbie, or someone looking to pivot into the
tech industry, acquiring the right skills can set you apart. That’s where the AWS Skill Builder comes into
play—a comprehensive learning platform designed to help you master Amazon Web Services (AWS).

AWS Skill Builder is a repository of over 700 training lessons that help you learn AWS, refine your
knowledge of AWS services, and improve your skills so you can put them into practice or apply them
across the many AWS certifications.

Free digital training on AWS Skill Builder offers 700+ on-demand courses and learning plans so you can
build the skills you need, your way. Want to build problem-solving cloud skills in an interactive, engaging
experience? A Skill Builder subscription offers access to self-paced labs, practice exams, role-based games,
and real-world challenges to accelerate your learning.

Log into AWS Skill Builder. Some of the materials referenced below require digital subscriptions.

Step 1: Get to know the exam and exam-style questions

The first step is getting to know the exam and exam-style questions.

1. Review the exam details page and exam guide (linked from exam details page) to understand who
should take the exam and what is tested on the exam.



2. Take the AWS Certified Cloud Practitioner Official Practice Question Set, which features 20
questions developed by AWS to demonstrate the style and depth of our certification exams.
These questions are created following the same process as the questions you will see on the
actual certification exams, and each comes with detailed feedback and recommended resources
to help fill gaps. AWS Certification Official Practice Question Sets are available for free on
AWS Skill Builder.

Step 2: Learn about exam topics in AWS Skill Builder

The next step is brushing up on exam topics.

1. Take AWS Cloud Practitioner Essentials.
2. Take AWS Technical Essentials, modules 1, 2, 3, 5, and 6.
3. Take Getting Started with AWS Cloud Essentials, focusing on these lessons:
   a. Understanding the AWS Global Infrastructure
   b. Core Services Overview: Storage
   c. Core Services Overview: Pricing
4. Gain hands-on experience using AWS services by playing AWS Cloud Quest: Cloud Practitioner.

Step 3: Take exam prep training in AWS Skill Builder

1. Take Exam Prep Enhanced Course: AWS Certified Cloud Practitioner (CLF-C02) to understand what
is tested on the exam and to review exam-style questions.

Step 4: Assess your exam readiness with a practice exam

1. Take AWS Certified Cloud Practitioner Official Practice Exam (CLF-C02 - English).

Each practice exam includes the same number of questions as the actual exam. It provides practice with
the same question style, depth, and rigor as the certification exam, includes exam-style scoring and
a pass/fail result, and gives feedback on the answer choices for each question with recommended
resources to deepen your understanding of key topics.

You can simulate the exam experience by taking a timed exam with answers shown only at the end, or
choose other options, such as an untimed exam with answers shown after each question is submitted.



Migrate and Optimize: Your Cloud Adoption Roadmap

Contents

Introduction
The migration journey roadmap
Develop a cloud migration strategy
Planning your migration
Implement your migration
Continuously optimize and improve your workloads
Conclusion

Introduction

Your Azure migration journey is fundamental to optimizing your workloads and costs. To help you build your specific migration and optimization roadmap, Microsoft offers proven guidance, tools, and frameworks to guide you from initial strategy, planning, readiness, and migration to ongoing innovation, management, and organizational alignment.

This roadmap guide provides a catalog of resources to assist both customers who have signed a Microsoft Unified Agreement (Unified Contract customers) and those using unmanaged disks (Unmanaged customers) in their cloud migration and workload optimization efforts at any stage in their journey. It includes all relevant resources and channels of support available for each step and phase of the journey. Some are self-serve, while Microsoft and its partners assist with others. This end-to-end migration roadmap has four phases. Each phase involves multiple action steps.

Once you’ve successfully migrated to the cloud, you can begin turning ideas into game-changing business impacts by taking advantage of Azure tools and resources to help you:

1 Accelerate innovation and go to market faster with differentiating services and experiences.

2 Streamline productivity and control with a unified platform that simplifies complex IT management.

3 Build trust through enhanced security and compliance.

Unmanaged customers can access Azure customer enablement resources, a free library of online materials, tools, and resources designed to help customers get started, build, deploy applications, and optimize workloads using Azure services. These resources are available to all Azure customers, including documentation, training videos, webinars, forums, and other self-paced learning materials.

In addition to Azure customer enablement resources, Unified Contract customers have access to Microsoft Services Hub, a portal providing all-day access to on-demand learning and personalized recommendations. It also provides tools for assessing and managing IT health, customized reports and insights, and unlimited end-to-end managed Microsoft support.

Learn more about Microsoft Unified

The migration journey roadmap

Plan and implement an adoption journey that’s tailored to your needs. The roadmap has four phases:

1 Develop your cloud migration strategy. Define a clear path forward with proven guidance, best practices, programs, and learning resources.

2 Plan your migration. Inventory your digital estate, align technical and business strategies, and establish an initial organizational structure.

3 Implement your migration. Create pre-provisioned landing zones to host your workloads using the Azure Well-Architected Framework.

4 Optimize and improve your workloads. Keep your workloads optimized using best practices from the operational excellence pillar of the Well-Architected Framework.

01 Develop a cloud migration strategy

Cloud adoption is a means to an end. It begins when business and IT decision-makers realize the cloud can accelerate specific business transformation goals. A solid cloud strategy sets your initiative on the right footing to build your business and technical migration plan against well-defined expectations and business outcomes, taking into consideration the trade-offs and your initial migration approach.

Microsoft helps you define a clear path forward with proven guidance. With a collection of best practices, programs, and learning resources, you can build expertise, create your cloud migration strategy, and achieve ongoing cloud value through workload optimization.

Understand your business motivations

Understanding and evaluating your motivation for moving to the cloud contributes to a successful business outcome. Many organizations want to save cost, scale growth, and onboard new technical capabilities—several motivations can apply simultaneously. They help you pinpoint your strategic migration goals and shape decisions your cloud adoption and workload optimization team may make in the future. Your team should meet with stakeholders, executives, and business leaders to discuss which motivations are driving your business’s cloud adoption.

Document your business outcomes

After motivations are aligned, document your desired business outcomes. This information provides clear metrics to measure the overall transformation of your business with respect to the cloud migration journey. Cloud adoption requires investment in people and resources. Understanding your desired business outcomes will help you foster the right level of executive sponsorship and support needed for your organization.

Understand your financial considerations

Migrating to Azure requires careful financial planning to help you achieve a cost-effective transition. Identify and plan a migration path tailored to your workloads to optimize your cost projections. Consider factors like upfront and ongoing costs, resource usage, hybrid environments, and software licenses.

Consider your technical requirements

Migrating to Azure requires careful technical planning to help ensure application compatibility, network configuration, data management, security and compliance, monitoring and management, scalability, and resiliency. By considering these technical variables, organizations can set a course for a successful migration to Azure.

Build your business case

The business case helps you to understand how migrating a workload to the cloud can bring the most value to your business. Creating a business case for cloud migration fosters support from your finance and business teams. It can also help accelerate migration and promote agility. With the business case, you should be able to show:

1 On-premises resource utilization versus Azure total cost of ownership.

2 Year-on-year cash flow analysis.

3 On-premises servers and workloads that are ideal to shift to the cloud.

4 Quick wins for migration and modernization, including end-of-support for Windows OS and SQL versions.

5 Long-term cost savings by moving from a capital expenditure model to an operating expenditure model, paying only for the cloud resources used.

Making the business case for migration is likely to be an iterative conversation among stakeholders. The goal is to align all stakeholders around the value of cloud adoption.

Identify any trade-offs

Cloud migration has trade-offs between the cloud operational principles of security, reliability, cost efficiency, and sustainability. For instance, one workload may require advanced security features that come at a higher cost. In contrast, a different workload needs more redundancy and backup to meet compliance requirements, impacting cost efficiency.
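The year-on-year cash flow analysis and the capital-to-operating expenditure shift described in the business case can be sketched numerically. All figures below are invented placeholders for illustration, not real Azure or hardware prices:

```python
# Illustrative only: every figure here is an invented placeholder.
# Sketch of a year-on-year cash flow comparison between a capital
# expenditure model (buy hardware up front, pay upkeep afterwards)
# and an operating expenditure model (pay monthly for resources used).

def capex_cash_flow(hardware_cost, annual_upkeep, years):
    """Year-by-year spend: a large outlay in year 1, upkeep afterwards."""
    return [hardware_cost + annual_upkeep] + [annual_upkeep] * (years - 1)

def opex_cash_flow(monthly_cost, years):
    """Year-by-year spend: a flat subscription-style cost."""
    return [monthly_cost * 12] * years

capex = capex_cash_flow(hardware_cost=50_000, annual_upkeep=8_000, years=5)
opex = opex_cash_flow(monthly_cost=1_400, years=5)

for year, (c, o) in enumerate(zip(capex, opex), start=1):
    print(f"Year {year}: CapEx {c:>7,}  OpEx {o:>7,}")
print("5-year totals:", sum(capex), "vs", sum(opex))
```

A real business case would feed in measured on-premises utilization and Azure pricing; the point of the sketch is only that CapEx front-loads spending while OpEx spreads it evenly.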


Migrate or modernize?

Each workload will require the decision of whether to migrate (rehost) or modernize (re-platform) your existing application. The answer will likely depend on the type of application or workload and your business goals for moving it to the cloud.

When you migrate your application, you move it to the cloud as-is (lift-and-shift) to take advantage of infrastructure as a service (IaaS), reduce your data center footprint, and achieve immediate cost savings.

With a modernization approach, the application is rebuilt and enhanced for the cloud, delivering better performance and cost-efficiency. This platform-as-a-service (PaaS) approach enables faster deployment, enhanced development productivity, and increased potential for innovation. You’re not just moving the application but modernizing your databases and processes. A DevOps methodology accelerates your workload modernization efforts, and PaaS solutions help you scale and reduce your management overhead.

It’s generally faster and less expensive to migrate an existing application, but that adoption approach doesn’t take advantage of opportunities to innovate in the cloud. Consider a migration approach if the source code is likely to remain stable and the workload currently supports business processes and will continue to do so. Consider modernizing a workload that creates market differentiation and potentially creates new experiences or service offerings.

Resources for Unified Contract customers

Onboard: SQL Server Migration Readiness
Windows Migration Factory
Onboarding Accelerator—Migrate Single Sign-on Applications to Microsoft Entra ID
Ramp Up on Technical Concepts, Skills, and Tools With Azure Migrate

Resources for Unmanaged customers

Cloud journey tracker assessment (15 minutes)
Cloud adoption strategy evaluator (10 minutes)
Strategic migration assessment and readiness tool
Governance benchmark assessment (30 minutes)
App and data modernization readiness tool (30 minutes)
Moving to the cloud: Your guide on when to migrate and when to modernize

Resources for all customers

Get started with the Cloud Adoption Framework for Azure
Understand cloud operating models
Cloud migration in the Cloud Adoption Framework
Migration scenarios

02 Planning your migration

With your migration strategy in place, it’s time to build an actionable plan to align your technical efforts with your business strategy. This involves a four-step exercise to inventory and define your digital estate, establish the initial organizational alignment, address skills gaps, and develop a cloud adoption plan.

1 Rationalize your digital estate


The first step in planning your migration is to evaluate your current assets to determine the best approach
to hosting them in the cloud. There are three basic approaches to analyzing your digital estate:

Workload-driven approach. A top-down assessment that measures the operational requirements of an application to determine the relative difficulty of migrating it to an IaaS, PaaS, or SaaS cloud platform. It also evaluates financial benefits such as operational efficiencies, total cost of ownership, and return on investment.

Asset-driven approach. A plan based on the assets that support an application for migration. In this approach, you pull statistical usage data from a configuration management database (CMDB) or other infrastructure assessment tools. This approach usually assumes an IaaS model of deployment as a baseline.

Incremental approach. A multiphase process, strongly recommended because it enables a more streamlined plan and accelerated results. It breaks digital estate planning into initial cost analysis, workload assessment, release planning, and implementation analysis.
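The workload-driven approach can be pictured as a weighted scoring rubric over each application's operational traits. The criteria, weights, and application names below are invented purely for illustration; a real assessment would rely on tooling such as Azure Migrate:

```python
# Toy sketch of a "workload-driven" (top-down) assessment: score each
# application on a few difficulty criteria and rank by relative migration
# difficulty. Criteria and weights are invented for illustration.

WEIGHTS = {"custom_code": 3, "compliance": 2, "dependencies": 2, "stateful": 1}

def difficulty(app):
    """Weighted sum of boolean difficulty criteria (higher = harder to move)."""
    return sum(w for crit, w in WEIGHTS.items() if app.get(crit))

apps = [
    {"name": "intranet-wiki", "stateful": True},
    {"name": "billing", "custom_code": True, "compliance": True, "dependencies": True},
    {"name": "static-site"},
]

# Easiest candidates first - a naive ordering for a migration backlog.
for app in sorted(apps, key=difficulty):
    print(app["name"], difficulty(app))
```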


2 Organizational alignment
Cloud adoption impacts all aspects of your business, IT, and corporate culture. Use this
guidance to get your people ready for cloud adoption with a cloud adoption and cloud
governance team.

3 Skills readiness plan


Many roles are likely to change as you shift to cloud computing. As you prepare for your
migration, each team should discuss and document the skills required for the new roles and
identify gaps and retraining opportunities.

4 Cloud adoption plan


This is where you put it all together and start prioritizing your workloads, aligning assets
to those workloads, and establishing iterations and release plans with rough timelines.

Migration tools and services


Simplify your migration and modernization with
a single portal for end-to-end visibility into your
on-premises and cloud estate. Azure Migrate helps
you discover and assess your on-premises resources
and manage your progress with a single dashboard.
You can also save time and money using
Migration Factory for Unified customers.

Planning resources for Unified customers


Onboard: SQL Server Migration Readiness
Windows Migration Factory
Onboarding Accelerator—Migrate Single Sign-on Applications to Microsoft Entra ID

Planning resources for Unmanaged customers


Strategic migration assessment and readiness tool
Build high confidence migration plans using Azure Migrate

Resources for all customers


Azure migration guide
Azure Migrate—Cloud migration services


03
Implement your migration
When it’s time to deploy your cloud workload,
the Azure guidance continues with ready-made,
infrastructure-as-code environments for hosting
your workloads, called Azure landing zones.
These conceptual architectures are hosting
environments pre-provisioned with foundational
capabilities that account for scale, security
governance, networking, and identity.

Azure landing zones support cloud adoption at


scale by providing repeatable environments with
consistent configuration and controls, regardless
of the workloads or Azure resources deployed
to each landing zone instance. The extensibility
of an Azure landing zone enables an organization
to easily scale specific elements of the environment,
as requirements evolve.

Two landing zones are available for


deploying your Azure subscriptions:

1 Platform landing zones are for Azure


subscriptions that provide centralized
services that various workloads and
applications will use.

2 Application landing zones are tailored


versions of the platform landing zone
for one or more subscriptions deployed
as an environment for an application
or workload.


Landing zone accelerators

For organizations where this conceptual architecture fits the operating model and resource structure, Azure offers ready-made deployment experiences called Azure landing zone accelerators. These services provide a rich initial implementation of landing zones with fully integrated governance, security, and operations included.

There are different approaches to deploying and operating a landing zone architecture, depending on customizations and the technologies used. For instance, the Azure landing zone portal accelerator, a portal-based deployment, provides a full implementation of the architecture, along with configurations for key components such as management groups and policies.

The preferred deployment option of most customers with large environments uses the Azure CLI or Azure PowerShell. The Azure landing zone Terraform accelerator provides an orchestrator module but allows you to deploy each capability individually or in part. The Azure landing zone Bicep accelerator takes a modular approach, where each module represents a core capability of the conceptual architecture.

As new Azure features and services are released, landing zones might be modified to include them. Likewise, as older Azure features are deprecated or newer ones are introduced, changes might also be made to landing zones.

Subscription vending

A design principle of Azure landing zones is to use subscriptions, not resource groups, as units of management and scale. This subscription vending approach provides a platform mechanism for issuing subscriptions to application teams for deploying workloads. It standardizes the process for requesting, deploying, and governing subscriptions, enabling application teams to deploy their workloads faster.

Landing zone review

One of the many valuable Azure assessments is geared toward evaluating your Azure landing zone for your specific scenarios. The Azure landing zone review provides curated and personalized guidance to fit your workload needs and helps identify investment areas to support your cloud adoption strategy. The assessment asks a series of multiple-choice questions to determine your cloud operating model and IaaS capabilities and responds with actions to consider for improving the landing zone across its design areas.

All technical and business requirements are considered complete when your environment configuration aligns with the Azure landing zone conceptual architecture. You can then focus on enhancing your landing zone for various workload types.

Azure Well-Architected Framework


The Azure Well-Architected Framework offers best practices, tools, assessments, and programs to help
ensure that your workloads are well-architected from the beginning. The framework is based on five pillars
that you can prioritize based on your specific workloads and needs:

Reliability. When you build for reliability in the cloud, you help ensure a highly available architecture as well as recovery from failures such as data loss, major downtime, or ransomware incidents.

Cost optimization. Reduce unnecessary costs and improve operational efficiencies with tools, offers, and guidance to help optimize workload costs, save money, and understand and forecast your bill.

Operational excellence. Keep your workloads running as expected with guidance for performing monitoring and diagnostics.

Performance efficiency. Take performance requirements and budget into consideration as you scale capacity up or out with Azure solutions.

Security. Protect workloads with the multilayered security of Azure—across physical datacenters, infrastructure, and operations—and stay ahead of evolving threats using AI.

[Diagram: the Well-Architected recommendations process. Inputs such as the Azure Well-Architected Review, Azure Advisor, the Azure Well-Architected Framework, the Architecture Documentation Center, and partner support and service offers feed recommendations across the five pillars: reliability, cost optimization, operational excellence, performance efficiency, and security.]

Understanding the Well-Architected Framework pillars can help you produce a high-quality, stable, and efficient cloud architecture.

Resources for Unmanaged customers

Azure landing zone accelerator
Azure landing zone design areas and conceptual architecture

Resources for all customers

Landing zone review
Well-Architected review: Go-Live assessment
Subscription vending


04 Continuously optimize and improve your workloads

To keep your workloads performing and optimized for your purposes after deployment, the Well-Architected Framework operational excellence pillar offers best practices for monitoring and diagnostics.

Know the potential impact of your workload design decisions by using the Azure Well-Architected Framework Review assessment to get recommendations for optimizing your workload types, such as IoT, SAP, data services, or machine learning. The tool generates recommendations through a guided assessment and can pull in Azure Advisor recommendations based on your Azure subscription or resource group. The review helps establish a baseline across the Well-Architected pillars to monitor improvements.

The operational excellence pillar of the Well-Architected Framework covers the processes for reliable deployment and keeping your workloads running predictably in production.

[Diagram: a continuous cycle of Access, Integrate, Monitor, Implement, and Triage.]

Implementation

For ongoing assessment of Azure workloads, Azure Advisor offers a personalized cloud consultant service to keep your workloads optimized across the Well-Architected pillars. Azure Advisor continually analyzes your resource configuration and usage telemetry to recommend solutions to help improve the cost-effectiveness, performance, reliability, and security of your Azure workloads.

Accessed through the Azure portal or API, Azure Advisor displays personalized recommendations for all your Azure subscriptions. Advisor score is a core feature of Advisor. It aggregates the findings into a single score that measures and prioritizes recommendations based on the Well-Architected Framework. Advisor score will also provide historical trends and tell you which recommendation will improve your score the most.

Resources for Unmanaged customers


Azure Well-Architected Review

Resources for all customers


Armchair Architects: Architectural Erosion and Technical Debt
Azure Well-Architected Framework
Introduction to Azure Advisor


Conclusion
Microsoft guidance and learning resources help you maximize
your cloud investments to achieve continuous innovation
and workload optimization using advanced technologies like
AI, data analytics, machine learning, and more. Following a
successful migration, you can rapidly develop new ways
of driving business growth in your unified and secure
cloud environments.

Take the next steps


Connect with Azure Sales
Find a Microsoft partner for expertise and solutions
Get started with an Azure free account

©2023 Microsoft Corporation. All rights reserved.

Disclaimer
This document is provided “as-is.” Information and views expressed in this document, including URL and other Internet Web site references, may change without
notice. You bear the risk of using it. Examples herein may be for illustration only and if so are fictitious. No real association is intended or inferred.

This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your
internal, reference purposes.

This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

Microsoft assumes no responsibility for any errors, omissions, or inaccuracies in this document or for any actions taken by the reader based on the information
provided. The information provided in this document should not be considered as legal, financial, or professional advice, and should not be relied upon as such.
Cloud Computing
Practical Topics – Platforms, Applications
and Best Practices

Stuttgart University
WS 2023/24
07-11-2023

Dr. Kristof Kloeckner, GM and CTO, IBM GTS (retired), kristof.kloeckner@iaas.uni-stuttgart.de
Gerd Breiter, DE, IBM (retired), gbreiter58@gmail.com

1
Recap

2
A Definition of Cloud Computing
National Institute of Standards 2011

Cloud computing is a model for enabling convenient, on-demand


network access to a shared pool of configurable computing
resources (e.g., networks, servers, storage, applications, and
services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction. This
cloud model promotes availability and is composed of five
essential characteristics, three service models, and four
deployment models.

http://csrc.nist.gov/groups/SNS/cloud-computing/

This definition is old, but still useful.


3

Ask who is familiar with this definition


Let’s look at some of the underpinnings.

New digital services require new infrastructure, architectures,
processes and tools

[Diagram: hybrid clouds, containers, and next-generation architectures (cloud native, serverless, APIs) connect:
• Systems of Record (CRM, ERP): data and transactions, app infrastructure, virtualized resources
• Systems of Engagement: mobile, social networking, big data and analytics
• Systems of Discovery (Insight): analytics and machine learning, extracting signal from noise
• Internet of Things and edge topologies: sensors, embedded intelligence, connected devices
New processes, new tools.]

Point out the different characteristics and requirements.


• Systems of Record: Consistency most important. Transactional
• Systems of Engagement: Availability most important (and also convenience, ease
of use). Eventually consistent
• IoT: Huge numbers of devices, lots of data.
Speed of change
Speed of response
Reliability and recovery.
Note: Systems of discovery require integration (workflows) across multiple sources
and application of new technology (e.g. potentially quantum computing, generative
AI etc.)

4
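The eventual consistency noted above for systems of engagement can be illustrated with a toy pair of last-writer-wins replicas. This sketch is invented for these notes and does not mirror any particular store's implementation:

```python
# Toy illustration of eventual consistency: a write lands on one replica
# first, and other replicas only converge after an (asynchronous)
# anti-entropy sync. Last-writer-wins by timestamp.

class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)

    def read(self, key):
        return self.data.get(key, (0, None))[1]

def sync(a, b):
    """Merge two replicas so both end with the newest value per key."""
    for key in set(a.data) | set(b.data):
        newest = max(a.data.get(key, (0, None)), b.data.get(key, (0, None)))
        a.data[key] = b.data[key] = newest

r1, r2 = Replica(), Replica()
r1.write("cart", "book", ts=1)
print(r2.read("cart"))  # None: r2 has not yet seen the write
sync(r1, r2)
print(r2.read("cart"))  # book: the replicas have converged
```

Between the write and the sync, a reader can observe stale data; that window is exactly the trade-off systems of engagement accept in exchange for availability.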
A Short Tour of a Commercial
Cloud (AWS)

5
Source: Awsome Day 2019 Detroit Slideshare

Source: Awsome Day Detroit Slideshare, early 2019
Amazon’s focus on ‘builders’ and business
• “The broadest and deepest platform for today’s builders” (W. Vogels, CTO, AWS
Summit NYC July 2019)
– ”We provide a toolbox, you pick”
– Using most functions themselves to run and grow their own digital business
• “Enabling digital transformation” (A. Jassy, then AWS CEO, 2020)
• These messages were reinforced in the following years
– Builder Community Hub 2023
• Agility, DevOps pipeline, operational principles
– ‘Well-architected Framework’
• Toolkits, IDEs, Integration with e.g. Visual Studio
• Microservices, containers, serverless across the stack
– “You write the business logic, we do the heavy (infrastructure) lifting”
• Automation
• Security
• AI/Machine Learning (first separate keynote in 2020)
• Significant and growing investment in IoT
8

Some Basic Amazon (IaaS) Services
• EC2 – Elastic Compute Cloud,
– Consists of virtual machines (called EC2 instances) launched from
Amazon Machine Images (AMIs)
• AMI – Amazon Machine Image
– Templates for building EC2 instances
• VPC - Virtual Private Cloud
– Private network that isolates your resources
• EBS – Elastic Block Storage
– Persistent Storage Volumes that can be attached to an EC2 instance
• S3 – Simple Storage Service
– Object Store Service
• These services plus Relational Database Services are
sometimes called Foundational Services by AWS.
9

AWS Learner Lab

10

Cloud Services in the Learner Lab
• Most AWS services are available, sometimes with capacity
restrictions (see documentation)
• IAM use is limited; use the preconfigured role (LabRole)
rather than creating new roles
• You can create your own key pairs, but a digital key (vockey) is
also already available in the preconfigured terminal.
• The Learner Lab comes with $100 credits, so stop or
terminate your resources if you are not using them anymore.
Some resources (like EC2 instances) are automatically
restarted when starting a lab.

11

A Short Tour of AWS Services
• The AWS Console
• ‘The biggest toolbox for builders’ - AWS Services
• Some important examples of AWS services
– Virtual Machines (EC2)
– Object Storage (S3)
– NoSQL Database (DynamoDB)
– Simple Notifications (SNS)
– Serverless Functions (Lambda)
• The Market Place
Other Clouds are similar, we will introduce them later!

12

EC2 – Your server in the cloud


S3 – Your photos in the cloud
DynamoDB – Customer Database, Shopping Cart
SNS – Subscription Service (News Feeds, Blog subscription)
Serverless Function – Event gets triggered

Hosting a Simple Web Application

[Diagram: a VPC containing a virtual machine, with a security group, an AMI, and an S3 bucket. Source: David Clinton, Learn Amazon Web Services in a Month of Lunches (Kindle Locations 502-506), Manning Publications.]

13

• The security group controls the movement of data between your AWS resources
and the big, bad internet beyond
• The EC2 Amazon Machine Image (AMI) acts as a template for replicating precise
OS environments
• The Simple Storage Service (S3) bucket stores and delivers data for both backup
and delivery to users

David Clinton. Learn Amazon Web Services in a Month of Lunches (Kindle Locations
502-506). Manning Publications. Kindle Edition.

Storage and Databases
• Object Storage – S3
– Objects have urls, are stored in ‘buckets’
– Objects can be versioned
– Objects can be arranged in ‘folders’
– Buckets can host static websites
• Block Storage – EBS
– Virtual Disks for Virtual Machines
• Relational Database Services (RDS)
– Managed service with many options
• NoSQL DB (DynamoDB)
– Managed Service
– Key Value Store (Tables)
• Streams - Kinesis
14

Exercise: Include object from S3 bucket into a WordPress Blog


Exercise: Simple static website
Glacier: 11 Nines durability, cost about $1 for 1TB. Retrieval time from a few minutes
to hours
Also Deep Glacier with retrieval times between 12 and 48 hours
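The DynamoDB model on this slide, a managed key-value store of tables, can be mimicked with a toy in-memory table using the shopping-cart example from the notes. Real DynamoDB is accessed through an AWS SDK such as boto3; this sketch only mirrors the put/get idea, and all names are invented:

```python
# Toy in-memory stand-in for a DynamoDB-style key-value table:
# items are addressed by a partition key; this only illustrates the model.

class KeyValueTable:
    def __init__(self, name):
        self.name = name
        self.items = {}

    def put_item(self, key, item):
        self.items[key] = dict(item)  # store a copy of the item

    def get_item(self, key):
        return self.items.get(key)    # None if the key is absent

carts = KeyValueTable("ShoppingCarts")
carts.put_item("customer-42", {"items": ["book", "lamp"], "total": 31.5})
print(carts.get_item("customer-42")["total"])  # 31.5
```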

Cloud Networking
• A VPC can span availability zones
• A subnet is restricted to a single one
• Some services require 2 subnets
in different zones for enhanced
availability
• A route table determines where
traffic is routed
• Internet gateways and virtual private
gateways
• Security groups control traffic
(firewall rules)

Source: AWS Documentation


See also:
https://aws.amazon.com/vpc/?vpc-blogs.sort-by=item.additionalFields.createdDate&vpc-blogs.sort-order=desc

A VPC lets you provision a logically isolated section of the Amazon Web Services
(AWS) cloud where you can launch AWS resources in a virtual network that you
define

Note: CIDR = Classless Internet Domain Routing


Choose netmask 16 for VPC (First 2 octets for network, last two for devices), i.e.
10.0.0.0/16. Choose netmask 24 for subnets.
For moving between VPCs, you have to create an AMI, and create a new instance
from that.
Associate an Internet Gateway, route 0.0.0.0/0 to this gateway.
Make sure you enable access from and to ports 22 (SSH) and 80 (HTTP)

15
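The netmask advice in the notes above (a /16 for the VPC, /24 for each subnet) can be checked with the Python standard library's ipaddress module:

```python
# Carving a /16 VPC address range into /24 subnets, as the notes suggest.
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc.subnets(new_prefix=24))

print(len(subnets))                  # 256 possible /24 subnets in a /16
print(subnets[0])                    # 10.0.0.0/24
print(subnets[0].num_addresses)      # 256 addresses per /24 subnet
print(ipaddress.ip_address("10.0.1.7") in subnets[1])  # True: 10.0.1.0/24
```

(AWS reserves a few addresses in every subnet, so the usable count is slightly lower than 256.)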
Availability, Scaling, Security

Source: David Clinton, Learn AWS…

16
Integration Services
• Simple Notification Service – SNS
– Topic based publish and subscribe service
– Many services are able to publish or subscribe to a topic
• Simple Queuing Service – SQS
– At-least-once delivery
• Step Functions
– Basic Workflow
• Lambda Functions
– Trigger-based event processing
– Serverless computing (we will talk more about this)
– More than 140 integrations 17
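To illustrate the topic-based publish/subscribe model behind SNS, here is a minimal in-process sketch (the function names are hypothetical; with real SNS the publish call would go through an AWS SDK):

```python
# In-process stand-in for a topic-based pub/sub broker such as SNS:
# publishers and subscribers only share the topic name, not each other.
subscribers = {}

def subscribe(topic, handler):
    """Register a handler (callback) for all messages on a topic."""
    subscribers.setdefault(topic, []).append(handler)

def publish(topic, message):
    """Fan the message out to every subscriber of the topic."""
    for handler in subscribers.get(topic, []):
        handler(message)

received = []
subscribe("orders", received.append)
publish("orders", {"order_id": 1})
print(received)  # [{'order_id': 1}]
```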

A Modern Cloud Application (AWS)

[Architecture diagram: an API-driven application combining events, dynamic content
(DynamoDB), longer-running processes, and static content (S3)]

Source: Implementing Microservices in AWS


18

API-driven
Monitoring from the start (CloudWatch)
Different styles of Compute (but always ‘small’ services)

Exercise
• Activate the AWS Academy Learner Lab
• Familiarize yourself with the console
• Launch an EC2 instance and connect to it
• Reboot, stop and terminate the instance
• Create an S3 bucket and upload an object

19

Cloud Design Principles and Best
Practices Frameworks

20

Designing Successful Cloud
Services
• Cloud Architecture Styles
• Cloud Native Applications
• 12 Factor App
• Azure Cloud Design Guidelines
• Amazon Well Architected Framework
• Google Tips for Building Reliable Services

21

Choices for Building a Cloud
Application

• Building new or moving to cloud


• Architecture Style, based on application type
• Core services and interfaces
– Standards, e.g. Cloud Native Foundation
– Basic behavior, e.g. eventual consistency
• Desired quality of service
• Applicable best practices
22

The Spectrum of Cloud Services
Cloud Enabled Workloads                      | Cloud Centric/Cloud Native Workloads
---------------------------------------------|----------------------------------------
Scalable                                     | Elastic
Virtualized/Containerized                    | Microservices/Containerized/Functions
Automated Lifecycle                          | Integrated DevOps Lifecycle
Standardized Infrastructure                  | Standardized Infrastructure/Serverless
Migration of Existing Middleware Workloads   | New Cloud Platform Workloads
Compatibility with existing systems          | Exploitation of cloud environments
23

Cloud enabled workloads – hosting in virtual machines (LAMP, WordPress)


Cloud Native – making use of cloud unique services (serverless etc.)

Cloud Service Level Agreements (SLAs)
Example: AWS
https://aws.amazon.com/legal/service-level-agreements/

• For EC2, “AWS will use commercially reasonable efforts to make Amazon
EC2 available for each AWS region with a Monthly Uptime Percentage of at
least 99.99%, in each case during any monthly billing cycle (the “Region-
Level SLA”). In the event Amazon EC2 does not meet the Region-Level SLA,
you will be eligible to receive a Service Credit as described below.”
– Less than 4.5 minutes downtime (of service) per month
• For an individual EC2 instance, the service commitment is 99.5% hourly
uptime (Last Update May 2022).
– Less than 3.65 hours downtime per month
• For S3, the service commitment is 99.9%
• For Container Services, a credit of 10% is given if availability is less than
99.99% and more than 99.0%
• For higher level services, availability is often 99.9%
– Less than 45 minutes downtime per month
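The downtime figures above follow directly from the uptime percentages; a quick sketch of the arithmetic (assuming a 30-day month, so exact values differ slightly from the slide's rounded numbers):

```python
def max_downtime_minutes(availability_pct, days=30):
    """Maximum downtime per period implied by an availability percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

print(round(max_downtime_minutes(99.99), 2))      # ~4.3 minutes/month (EC2 region-level SLA)
print(round(max_downtime_minutes(99.5) / 60, 1))  # ~3.6 hours/month (single EC2 instance)
print(round(max_downtime_minutes(99.9), 1))       # ~43.2 minutes/month (higher-level services)
```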
24

Note – They have updated their commitments.


Note: Control plane vs. data plane

99% 7.2 hours downtime/month

A Modern Cloud Application (AWS)

[Architecture diagram: events, dynamic content (DynamoDB), longer-running processes
(VMs, containers), and static content (S3)]

Source: Implementing Microservices in AWS


25

API-driven
Monitoring from the start (CloudWatch)
Different styles of Compute (but always ‘small’ services)

Cloud Architecture Styles
(from: Microsoft Cloud Application Architecture Guide)

• N-Tier
• Web-Queue-Worker
• Microservices
• CQRS – Command and Query Responsibility Separation
• Event driven
• Big data
• Big compute

26

Cloud Architecture Styles
N-Tier:
• Traditional Enterprise Architecture
• Separation of concerns into layers (e.g. presentation, logic, data)
• Often rigid and monolithic, first step in migration to cloud native
• Usually based on virtual machines (like the web server example)
• Containerize for movement across environments

Web-Queue-Worker:
• Frontend handling client requests
• Message queue to backend (asynchronous)
• Backend handles heavy load (can be coordinator-worker(s)
configuration)
27

Elastic Beanstalk is a pre-configured, parametrized environment

Web-Queue-Worker Environment
Example: Amazon Elastic Beanstalk

28

AWS Elastic Beanstalk

A simple way to deploy and manage web applications

29

Cloud Architecture Styles
Microservices:
• Evolution of service-oriented architectures
• Can be next step from n-tier ‘monoliths’
• Break up large applications and containerize, enable parallel
development, scaling and separation of concerns
• Use API gateway to decouple clients from the services
themselves (gateway routes API requests to the appropriate
service). Avoid tight coupling of services. Store data with service.
Keep domain knowledge out of gateway
• My view: Microservices need to publish contracts or SLAs
• https://xebialabs.com/assets/files/whitepapers/exploring-microservices-questions.pdf
30

API Gateway is a broker pattern

• The microservices perform distinct business functions
• ‘Enabling services’ support communication between microservices

31

Blue-green deployment – two identical production environments. Ability to switch
over if a change goes wrong.
1. Deploy change to green environment (staging).
2. Switch to green production once it is stable
3. Blue is now idle, can either be fallback or later new staging environment

Can be partitions of the same environment

Note: The left side shows something similar to the WordPress service we built. The
right-hand side separates out the various backends and allows for distinctive services
from third parties (like Shopify for e-commerce). Also separate scaling properties etc.

Cloud Architecture Styles
CQRS – Command and Query Responsibility Separation:

• Read and write workloads are often asymmetrical, with very different
performance and scale requirements.
• CQRS separates reads and writes into separate models, using commands to
update data, and queries to read data.
• Commands should be task based, rather than data centric.
– (“Book hotel room,” not “set ReservationStatus to Reserved.”)
• Commands may be placed on a queue for asynchronous processing, rather than
being processed synchronously.
• Queries never modify the database. A query returns a Data Transfer Object
(DTO) that does not encapsulate any domain
• Allows independent scaling. Really requires messaging. Can be useful in
microservices, where no direct access to another service’s data store is allowed.
• Example: e-Commerce (Catalog vs. Transaction). Read Replicas enable this
Style. 32
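The separation of commands and queries can be sketched in a few lines; all names here are hypothetical, and a real system would update the read model asynchronously (e.g. via a queue or read replica):

```python
write_store = {}  # authoritative state, touched only by commands
read_store = {}   # denormalized view, touched only by the projection

def handle_book_room(room_id, guest):
    """Task-based command: 'book hotel room', not 'set ReservationStatus'."""
    if room_id in write_store:
        raise ValueError("room already booked")
    write_store[room_id] = guest
    project(room_id)  # synchronous here for simplicity; normally async

def project(room_id):
    # Build the DTO the query side will return; no domain logic inside.
    read_store[room_id] = {"room": room_id, "status": "Reserved"}

def query_room(room_id):
    """Query: returns a plain DTO, never modifies anything."""
    return read_store.get(room_id)

handle_book_room(101, "Alice")
print(query_room(101))  # {'room': 101, 'status': 'Reserved'}
```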

Read replicas for read-heavy databases

• Replication to read
replicas is done
asynchronously
• Read replicas can be
promoted to master, for
instance in case the
source instance fails

Example: AWS
33

Cloud Architecture Styles
Event driven:
• Pub/sub or streaming.
• Decoupling of producers and consumers. Real-time processing,
high volume/high velocity, pattern matching and complex events
• Challenges: Guaranteed delivery, preservation of order
• Examples: Function as a Service (e.g. Lambda), Kinesis Streams,
Kafka

34

Event-Driven Architectures

Source: AWS 35

Cloud Architecture Styles
Big data:
• Too big to be handled by traditional databases
• Performance through parallelism (sharding of data,
coordinator/worker pattern)
• Distributed data store
• Partitioned data
• Use cases: Batch, Machine Learning (Model Training), Complex
Analytics

36

Cloud Architecture Styles

Big compute (HPC):

• Job queue with scheduler/coordinator and ‘worker nodes’
• Simulation, number crunching, ‘embarrassingly
parallel’ tasks
• Fluid dynamics, risk analysis, image rendering etc.

37

Example: Map-Reduce

Cloud Native Architectures
• Cloud Native Foundation
• Characteristics of Cloud Native Services
• Eventual Consistency

38

Cloud Native Definition
(Cloud Native Foundation)
• Cloud native technologies empower organizations to build and run
scalable applications in modern, dynamic environments such as public,
private, and hybrid clouds. Containers, service meshes, microservices,
immutable infrastructure, and declarative APIs exemplify this approach.
• These techniques enable loosely coupled systems that are resilient,
manageable, and observable. Combined with robust automation, they
allow engineers to make high-impact changes frequently and predictably
with minimal toil.
• The Cloud Native Computing Foundation seeks to drive adoption of this
paradigm by fostering and sustaining an ecosystem of open source,
vendor-neutral projects. We democratize state-of-the-art patterns to make
these innovations accessible for everyone.

https://github.com/cncf/toc/blob/master/DEFINITION.md

39

Definition starts with a goal, and gives examples of technologies to achieve the goal

Cloud Native Foundation
https://www.cncf.io/
• Vendor neutral open source software foundation
• Fosters collaboration between developers, end-users and vendors
• Runs software projects, certification and education programs
• 18 platinum members, including Amazon, IBM, Google, Red Hat,
Microsoft, Alibaba, Huawei, Oracle, Cisco, Intel, SAP…
• Hundreds of other members
• 6 graduated projects, including:
– Kubernetes (Container Management)
– Prometheus (Monitoring)
– Envoy (Service Proxy)
– fluentd (logging)
• About 20 incubating projects, including Helm (Packaging), Jaeger
(Distributed Tracing), NATS (Messaging)
40

https://github.com/cncf/landscape/blob/master/README.md#trail-map

41

Cloud native services are loosely coupled and
eventually consistent
• The concept of loose coupling goes back to principles for building robust
distributed systems
– Services are assumed to be autonomous
– No tight dependencies (if one service fails, others can still operate)
– Services communicate through clearly defined interfaces, preferably asynchronously
– Everything is a service
§ A service will invoke many other (distributed) services
§ Resources are cheap, plentiful, and not very reliable (commodity parts, where
possible)
§ Don’t try to prevent failure – fail fast and recover
§ Don’t debug and repair – kill and restart
§ Single points of failure are to be avoided through replication of resources
(services)
– Data is always stored in multiple copies

42

Multiple copies that are not updated synchronously imply the risk of accessing a copy
that is not up to date.

Cloud native services are loosely coupled and
eventually consistent
§ Eventual Consistency for data handling & replication: sometimes the data
storage service or database service will return the wrong answer, but eventually
the answer is going to be correct
§ This is due to replication for high availability
§ Message queues will deliver messages at least once (but can deliver multiple
times)
§ Scale is achieved through parallel asynchronous execution
§ Avoid synchronization overhead, stateless execution servers (store state/session
information outside the application)
§ Applications need to tolerate redundant execution (idempotency)
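Tolerating redundant execution can be as simple as deduplicating by message ID; a minimal sketch (hypothetical message format), in the spirit of a consumer that may see the same SQS message twice:

```python
# At-least-once delivery means the same message can arrive more than once.
# Tracking processed message IDs makes the handler idempotent.
processed_ids = set()
balance = 0

def handle(message):
    global balance
    if message["id"] in processed_ids:
        return  # duplicate delivery: the effect was already applied
    processed_ids.add(message["id"])
    balance += message["amount"]

msg = {"id": "m-1", "amount": 50}
handle(msg)
handle(msg)  # redundant delivery of the same message
print(balance)  # 50, not 100
```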

43

Eventual Consistency
• Eric Brewer’s CAP Theorem
– Of 3 properties of a shared data system (consistency, availability, tolerance to
network partitioning/failure), only 2 can be achieved simultaneously
• Strategies for availability all depend on data replication to multiple copies
– Quorum approaches with N= Number of Replicas, R = Read Quorum, W= Write
Quorum guarantees consistency if R + W > N (overlap of read and write sets)
– Systems focusing on fault tolerance often use N=3, W=R=2
• Other requirements (e.g. high load) require large N. If few writes, often R=1 to ensure a
read is available if at least one node operates
• To minimize likelihood of lost writes, choose W>1
• Very large distributed systems have to live with network partitioning
• If read and write set don’t overlap, we cannot achieve strong consistency, but this is
often combined with a ‘lazy’ update approach to eventually update all nodes
– Good example: Shopping cart
– Amazon shopping cart prioritizes availability for write
• Other considerations: Failure detection
44

From Vogels:
He presented the CAP theorem, which states
that of three properties of shared-data systems—data
consistency, system availability, and tolerance to network
partition—only two can be achieved at any given time. A
more formal confirmation can be found in a 2002 paper
by Seth Gilbert and Nancy Lynch.

Thoughts on levels of consistency. Basically, a quorum system. N=number of


copies, W=write quorum, R=Read Quorum.

W+R > N ensures (strong) consistency.


W+R <= N does not guarantee consistency, since read and write quorums need not
overlap. R=1 ensures a read is always available if at least one node operates,
but consistency is not ensured.
W=1 ensures a write is always possible, but writes can easily be lost if this
node fails, and no consistency is ensured.
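The quorum overlap rule described in these notes can be encoded directly; a small sketch:

```python
def strongly_consistent(n, r, w):
    """R + W > N guarantees every read quorum intersects the latest write quorum."""
    return r + w > n

# The common fault-tolerant configuration: N=3, W=R=2
print(strongly_consistent(3, 2, 2))  # True
# R=1 favors read availability, but gives up the overlap guarantee
print(strongly_consistent(3, 1, 2))  # False -> only eventual consistency
```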

Suggested Reading
• Werner Vogels, Eventually consistent, Comm.
ACM, Jan 2009

45
Examples of eventual consistency

• Amazon Simple Storage Service (S3) keeps three
replicas of a bucket, therefore at any given time a
replica might not represent the latest state
• Amazon Simple Queuing Service (SQS) delivers at
least once, but might deliver a message multiple
times due to multiple replicas of a queue being kept
• Amazon DynamoDB NoSQL DB achieves availability
due to replication and is therefore eventually
consistent

46

Cloud native services move from ACID
to BASE
• ACID – predictable and accurate
– Atomic
– Consistent
– Isolated
– Durable

• BASE – trade-offs for highly distributed systems


– Basically available
– Soft state (state is not necessarily always consistent)
– Eventually consistent (…but will be eventually)
47

Characteristics of modern applications –
it’s not just the technology that counts
https://aws.amazon.com/modern-apps/

• Culture of ownership (for lifecycle of a service)


• Architectural patterns: microservices
• Computing in modern applications: containers and
AWS Lambda (Cloud Functions, event-driven)
• Data management: purpose-built databases
• Release pipelines: standardized and automated
• Operational model: as serverless as possible
• Security: everyone's responsibility
48

Built for change, in small (co-operating) parts. Think not just about the app, but also
its lifecycle. Offload pain as much as possible.

A Modern Application in AWS

[Architecture diagram: events, dynamic content (DynamoDB), longer-running processes,
and static content (S3)]

Source: Implementing Microservices in AWS


49

API-driven
Monitoring from the start (CloudWatch)
Different styles of Compute (but always ‘small’ services)

Designing Successful Cloud
Services – Part 2
• Cloud Architecture Styles
• Cloud Native Applications
• 12 Factor App
• Azure Cloud Design Guidelines
• Amazon Well Architected Framework
• Google Tips for Building Reliable Services

51

The Twelve Factors for aaS Applications
(12factor.net, Adam Wiggins)
“The twelve-factor app is a methodology for building software-as-a-service apps that:
• Use declarative formats for setup automation, to minimize time and cost for new
developers joining the project;
• Have a clean contract with the underlying operating system, offering maximum portability
between execution environments;
• Are suitable for deployment on modern cloud platforms, obviating the need for servers
and systems administration;
• Minimize divergence between development and production, enabling continuous
deployment for maximum agility;
• And can scale up without significant changes to tooling, architecture, or development
practices.
The twelve-factor methodology can be applied to apps written in any programming
language, and which use any combination of backing services (database, queue, memory
cache, etc).”
(Quote from 12factor.net web site)
54

Adam Wiggins – formerly Heroku (Cloud Platform acquired by Salesforce)

The Twelve Factors for aaS Applications

I. Codebase
One codebase tracked in revision control, many deploys
II. Dependencies
Explicitly declare and isolate dependencies
III. Config
Store config in the environment
IV. Backing Services
Treat backing services as attached resources
V. Build, release, run
Strictly separate build and run stages
VI. Processes
Execute the app as one or more stateless processes
VII. Port binding
Export services via port binding (no runtime injection, listening to requests on a port)
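Factor III (config in the environment) and factor VII (port binding) can be sketched together; the variable names here are hypothetical:

```python
import os

# Factor III: configuration comes from the environment, not from code.
# setdefault only supplies a development fallback when the variable is unset.
os.environ.setdefault("DEMO_DATABASE_URL", "postgres://localhost/dev")
os.environ.setdefault("DEMO_PORT", "8080")

def get_config():
    return {
        "database_url": os.environ["DEMO_DATABASE_URL"],
        "port": int(os.environ["DEMO_PORT"]),  # Factor VII: the port to bind to
    }

print(get_config())
```

Deploying the same code with a different environment (e.g. a production `DEMO_DATABASE_URL`) changes behavior without a rebuild, which is the point of the factor.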
55

Backing Services:
attached resources, accessed via a URL or other locator/credentials stored in
the config.

Port Binding:
The twelve-factor app is completely self-contained and does not rely on runtime
injection of a webserver into the execution environment to create a web-facing
service. The web app exports HTTP as a service by binding to a port, and listening to
requests coming in on that port.

Processes:
Share-nothing, horizontally scalable

The Twelve Factors for aaS Applications…..

VIII. Concurrency
Scale out via the process model (share-nothing, horizontally partitionable)
IX. Disposability
Maximize robustness with fast startup and graceful shutdown
X. Dev/prod parity
Keep development, staging, and production as similar as possible
XI. Logs
Treat logs as event streams
XII. Admin processes
Run admin/management tasks as one-off processes

56

Concurrency:
Processes in the twelve-factor app take strong cues from the unix process model for
running service daemons. Using this model, the developer can architect their app to
handle diverse workloads by assigning each type of work to a process type.
The share-nothing, horizontally partitionable nature of twelve-factor app processes
means that adding more concurrency is a simple and reliable operation.
Azure Design Principles for Cloud Services

• Design for self healing


• Make all things redundant
• Minimize coordination
• Design to scale out
• Partition (to work) around limits
• Design for operations
• Use managed services
• Use the best data store for the job
• Design for evolution
• Build for the needs of business
• Design resilient applications for Azure (with details)

Source: (Microsoft) Cloud Application Architecture Guide

Use this for discussion purposes

57
Azure Design Principles for Cloud Services

• Design for self healing (instrumentation, automation)


• Make all things redundant (e.g. use multiple availability zones)
• Minimize coordination (loose coupling, messaging)
• Design to scale out (horizontal scaling)
• Partition (to work) around limits
• Design for operations (automation, infrastructure as code)
• Use managed services (rather than managing instances oneself)
• Use the best data store for the job (not everything needs SQL)
• Design for evolution (APIs, containers for hybrid mobility)
• Build for the needs of business (don’t overprovision, use elasticity)
• Design resilient applications for Azure (with details)

Source: (Microsoft) Cloud Application Architecture Guide

Use this for discussion purposes

58
AWS Guiding Principles for ‘Well-Architected’ Cloud
Services
Stop guessing your capacity needs: Eliminate guessing about your infrastructure capacity needs. When
you make a capacity decision before you deploy a system, you might end up sitting on expensive idle
resources or dealing with the performance implications of limited capacity. With cloud computing, these
problems can go away. You can use as much or as little capacity as you need, and scale up and down
automatically.
Test systems at production scale: In the cloud, you can create a production-scale test environment on
demand, complete your testing, and then decommission the resources. Because you only pay for the test
environment when it's running, you can simulate your live environment for a fraction of the cost of testing on
premises.
Automate to make architectural experimentation easier: Automation allows you to create and replicate
your systems at low cost and avoid the expense of manual effort. You can track changes to your automation,
audit the impact, and revert to previous parameters when necessary.
Allow for evolutionary architectures: In a traditional environment,
architectural decisions are often implemented as static, one-time events, with a few major versions of a
system during its lifetime. As a business and its context continue to change, these initial decisions might
hinder the system's ability to deliver changing business requirements. In the cloud, the capability to automate
and test on demand lowers the risk of impact from design changes. This allows systems to evolve over time
so that businesses can take advantage of innovations as a standard practice.
Drive architectures using data: In the cloud you can collect data on how your architectural choices affect
the behavior of your workload. This lets you make fact-based decisions on how to improve your workload.
Your cloud infrastructure is code, so you can use that data to inform your architecture choices and
improvements over time.
Improve through game days: Test how your architecture and processes perform by regularly scheduling
game days to simulate events in production. This will help you understand where improvements can be made
and can help develop organizational experience in dealing with events.
Source: AWS ‘Well-Architected Framework’ 59

The first principle has two elements: elasticity and being serverless

The Five Pillars of the Well-Architected
Framework

• Operational Excellence
• Security
• Reliability
• Performance Efficiency
• Cost Optimization

60

Design Principles for Operational Excellence
Continuous Improvement

• Perform operations as code


• Annotate documentation
• Make frequent, small, reversible changes (w/o impacting users, if possible)
• Refine operations procedures frequently
• Anticipate failure (‘pre-mortems’, testing failure scenarios)
• Learn from all operational failures

Best Practices:
Design, Operate, Evolve

Source: AWS ‘Well-Architected Framework’


61

Design Principles for Security
Risk Assessment and Mitigation
• Implement a strong identity foundation (least privilege, separation of
duties, centralized privilege management, reduce long-term credentials)
• Enable traceability
• Apply security to all layers (defense in depth)
• Automate security best practices
• Protect data in transit and at rest (encryption)
• Keep people away from data (eliminate manual processing)
• Prepare for security events

Best Practices:
Identity and Access Management, Detective Controls, Infrastructure
Protection (Separation/Segmentation), Data Protection, Incident Response

Source: AWS ‘Well-Architected Framework’ 62

Detective Controls: Detection violations


Infrastructure protection: Separation/Segmentation
Data protection: Encryption

Have AWS Principles of Shared Responsibility in mind

Design Principles for Reliability
Failure Prevention and Recovery

• Test your recovery procedures


• Automatically recover from failure (monitoring, event management,
automation)
• Scale horizontally to increase aggregate system availability
• Stop guessing capacity (elasticity, serverless)
• Manage change in automation (automate change management, and
manage this automation)

Best Practices
Foundations, Change Management, Failure Management

Source: AWS ‘Well-Architected Framework’ 63

Design Principles for Performance
Efficiency
• Democratize advanced technologies (by enabling them to be consumed as
a service)
• Go global in minutes (by deploying to multiple regions ‘at a click’)
• Use serverless architectures
• Experiment more often (enabled through virtualization and automation)
• Mechanical sympathy (use the technology approach that aligns best with
your goals)

Best Practices:
Selection, Review, Monitoring, Trade-offs

Source: AWS ‘Well-Architected Framework’


64

Design Principles for Cost Optimization
Continuous Refinement and Improvement

• Adopt a consumption model (enforce deactivation of unneeded resources)
• Measure overall efficiency (business output)
• Stop spending money on data center operations (self-serving advice:
‘Amazon does the heavy lifting’)
• Analyze and attribute expenditure
• Use managed services and application level services

Best Practices:
Expenditure Awareness, Cost-Effective Resources, Matching supply and
demand, Optimizing over time

Source: AWS ‘Well-Architected Framework’ 65

This is heavily interlaced with Amazon self-promotion. It also talks about some basic
elements of the cloud business model.

Further Resources for the Well-Architected
Framework
• https://aws.amazon.com/architecture/well-architected/
• 14 well-architected ‘lenses’ provide best practices for building
specific types of applications, e.g.
– Serverless applications
– Containers
– High Performance Computing (HPC)
– Internet of Things
• AWS Architecture Center
• Well-Architected labs https://www.wellarchitectedlabs.com/

66

Demo!

Well-Architected Tool

67

This seems to be mostly a questionnaire with an assessment showing gaps in best
practices. It’s really more process than architecture.

Building Reliable Services on the Cloud
Phillip Tischler with Steve McGhee and Shylaja Nukala:
Building Reliable Services on the Cloud.
Systematic Resilience for Sustained Reliability,
O’Reilly, 2022

68

Core Architecture Choices

• Use Service-Oriented Architectures for product-level
and microservices to implement individual major services
• Use well-defined interfaces and loose-coupling (APIs)
• Determine needs for horizontal and vertical scaling
– Use sharding and replication to horizontally scale stateful
services
• Use frameworks and platforms where available

69

Choice of Services
• Compute
– Use containers! Keep them small, start with serverless.
– Optimize for startup time, implement ready/live checks, terminate gracefully.
• Network
– Use provider's CDN, Load Balancers, private WANs, service meshes
• Storage
– Consider object stores, NoSQL databases, multi-regional database services
– Use Publish/Subscribe service to decouple readers/writers, improve retries
– Consider Batch (MapReduce/Flume) for high volumes of data

70

Failure Domains and Redundancy

• A failure domain is a group of resources that can fail as a unit
• Create redundancy by using multiple servers, zones, regions
• Eliminate shared dependencies (across units)
• Redundancy level is the sum of instances needed to
run a service plus the number of failures to be tolerated
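The redundancy rule above reduces to simple addition; a sketch:

```python
def redundancy_level(instances_needed, failures_tolerated):
    """Instances to provision so the service still runs after the tolerated failures."""
    return instances_needed + failures_tolerated

# A service needing 2 instances that should survive 1 zone failure needs 3.
print(redundancy_level(2, 1))  # 3
```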

71

Example: Services spread across 2 zones use single database in 1 zone

Avoid Common Failure Modes
• Bad Changes
– Supervision/monitoring
– Progressive roll-out
– Automatic (tested) roll-back
– Infrastructure-as-Code
• Error Handling
– Soft dependencies, fail-safe alternatives
– Standardized error codes
– Distinguish between server overload, service overload and
quota exhaustion, prevent retry amplification

72

Multiple retries through server overload should lead to service overload designation

Avoid Common Failure Modes..
• Resource Exhaustion
– Avoid cascading failures
– Use cost modeling, quotas, load shedding, criticality
criteria
– Auto-scaling, caching
• Hotspots and Thundering Herds
– Gating to batch up equivalent requests
– Add jitter or random delays to cache expiration
• Don’t miss regular backups!
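Adding jitter to cache expiration, as suggested above, can be sketched like this (the base TTL and spread are hypothetical values):

```python
import random

def ttl_with_jitter(base_seconds=300, spread=0.2):
    """Randomize cache TTLs so entries don't all expire (and refill) at once."""
    return base_seconds * (1 + random.uniform(-spread, spread))

# Each entry gets a TTL within +/-20% of the 300-second base,
# spreading out refills instead of stampeding the backend together.
print(240 <= ttl_with_jitter() <= 360)  # True
```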

73

Hotspot: subset of servers receives disproportionate load


Thundering Herd; Sudden increase of requests due to an event (like upgrade to new
version)

74

Transformation to Cloud using multiple concurrent approaches
… to minimize risk & cost while leveraging new & existing investments to innovate & differentiate

[Diagram: an application portfolio (customer information, payment systems, business
process, new applications) evolves to cloud-based applications along concurrent paths:
• Lift-Standardize-Consolidate-Automate-Shift: bare metal, VMs, containers, automation (SDDC)
• Contain-Expose-Extend: API creation & management, connectivity & integration
• Refactor/Create as Cloud-Native/Microservices: event-driven, aPaaS, containers, microservices
• Data classification, movement & governance: cognitive data classification,
high-volume data transfer, event-driven, metadata management
Evolution targets: base virtualization with standardization & automation; cloud native;
loosely coupled; 12-factor; horizontal scaling; eventually consistent; microservices;
auto-scaling; DevOps & CI; self-recovering
Platforms: VMs | Containers | aPaaS | iPaaS, on-premises | off-premises]

75
76

Cloud Security (AWS)

77

Security and Compliance as Shared
Responsibility between AWS and Customer
https://aws.amazon.com/compliance/shared-responsibility-model/

78

In other words – YOU are responsible for YOUR content…


Amazon has 3rd party audits about their part of the stack.

https://aws.amazon.com/compliance/
79

AWS Security Capabilities
• Network Security
– Built-in Firewalls
– Encryption in transit
– Private/dedicated connections
– Distributed Denial of Service (DDoS) mitigation (Amazon Shield)
• Inventory and Configuration Management
– Control and track changes
• Data Encryption
– Encryption capabilities
– Key management services
– Hardware-based cryptographic key storage
• Access and Control Management
– Identity and Access Management (IAM)
– Multi-factor authentication
– Directory integration and Federation
– Amazon Cognito User Management and Secure Sign-On (SSO)
• Monitoring and Logging 80

Cognito: Authorization, Authentication & User Management for Mobile and Web
Amazon Inspector: Service performing security risk assessment and establishing best
practices. Hundreds of rules mapping to compliance standards
DDoS typically uses botnets to flood a website with traffic to deny service to
legitimate requests

AWS Shield Standard is free of charge and protects against most common attacks;
Shield Advanced gives access to a response team.

Using Virtual Private Clouds (VPCs) to
secure resources

• A VPC can span availability zones
• A subnet is restricted to a single one
• Some services require 2 subnets
in different zones
• A route table determines where
traffic is routed
• Internet gateways and virtual private
gateways provide access
• Security groups control traffic
(firewall rules)

Source: AWS Documentation


See also:
https://aws.amazon.com/vpc/?vpc-blogs.sort-by=item.additionalFields.createdDate&vpc-blogs.sort-order=desc 81

IAM - Configuring Access Rights

• IAM = Identity and Access Management
– https://aws.amazon.com/iam/
• The IAM console promotes best practices
– Select IAM from services panel
• Create and manage users, roles and groups
– Users are named operators with permanent credentials
– Roles have temporary credentials (for specific execution)
• Policies define permissions and are attached to users or roles
• AWS recommends to lock down root
– Eliminate root access keys
– Note: Only root user can control budget

82

Show some IAM roles for services


Role Based Access Control (RBAC)

IAM Management Console

83

IAM User Groups

Source: David Clinton, Learn AWS…


84

Policy Types

• Identity-based policies
• Resource-based policies
• Permissions boundaries
• Organization SCPs (Service Control Policies)
• Access control lists (ACLs)
• Session Policies

https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html

85

Containers and
Container Orchestration
Kubernetes, OpenShift, Operators, Service Mesh,
Security, Use cases
2023-11-16
Wolfram Richter
Manager Solution Architecture
Red Hat

1
Kubernetes Stabilizing since 2020

https://k8s.devstats.cncf.io/d/8/company-statistics-by-repository-group?orgId=1&var-period=y&var-metric=contributors&var-repogroup_name=All&var-repo_name=kubernetes&var-companies=All&from=1438380000000&to=1659391199000
Innovation Focus on the Surrounding Areas

https://all.devstats.cncf.io/d/1/activity-repository-groups?orgId=1&var-period=y&var-repogroups=All&from=1438380000000&to=1659391199000
CNCF Ecosystem Slide

4
Red Hat OpenShift | Functional Overview

Monitoring, Logging, Service Mesh, Serverless, Dev Tools, CI/CD,


Registry, Router, Telemetry Middleware/Runtimes, ISVs Automated Builds, IDE

Cluster Services Application Services Developer Services

Operations Services

Kubernetes

Linux OS

CaaS PaaS FaaS


Edge Physical Virtual Private cloud Multi-Arch Public cloud Managed cloud
5 (Azure, AWS, IBM, Google)
Red Hat OpenShift | Functional Overview

Unique Value of Red Hat OpenShift

Monitoring, Logging, Service Mesh, Serverless, Dev Tools, CI/CD,


Registry, Router, Telemetry Middleware/Runtimes, ISVs Automated Builds, IDE

Cluster Services Application Services Developer Services

Automated Operations

Kubernetes

Red Hat Enterprise Linux | RHEL CoreOS

Edge Physical Virtual Private cloud Multi-Arch Public cloud Managed cloud
6 (Azure, AWS, IBM, Google)
Let’s take a first peek...

https://developers.redhat.com/learn/openshift

7
Containers
Red Hat OpenShift | Functional Overview

Monitoring, Logging, Service Mesh, Serverless, Dev Tools, CI/CD,


Registry, Router, Telemetry Middleware/Runtimes, ISVs Automated Builds, IDE

Cluster Services Application Services Developer Services

Operations Services

Kubernetes

Linux OS

CaaS PaaS FaaS


Edge Physical Virtual Private cloud Multi-Arch Public cloud Managed cloud
9 (Azure, AWS, IBM, Google)
Container Analogy: cargo transport pre-1960

● Example slide: shipping containers
Solution: the inter-modal shipping container
Container-Basics - Namespaces

A kernel feature, not a container feature

● It’s a process!

● Mount - isolate filesystem mount points


● UTS - isolate hostname and domainname
● IPC - isolate interprocess communication (IPC) resources
CONTAINER
● PID - isolate the PID number space
● Network - isolate network interfaces
● User - isolate UID/GID number spaces
● (Time - allow each process to see different system time)
● Cgroup - isolate cgroup root directory
Container-Basics - Control Groups (cgroups)

CONTAINER
WHAT ARE CONTAINERS?
It Depends Who You Ask

INFRASTRUCTURE
● Application processes on a shared kernel
● Simpler, lighter, and denser than VMs
● Portable across different environments

APPLICATIONS
● Package apps with all dependencies
● Deploy to any environment in seconds
● Easily accessed and shared

14
Red Hat OpenShift Concepts

A container is the smallest compute unit

CONTAINER

15
Red Hat OpenShift Concepts

Containers are created from container images

CONTAINER CONTAINER
IMAGE

BINARY RUNTIME

16
Red Hat OpenShift Concepts

Container images are stored in an image registry

CONTAINER CONTAINER
REGISTRY

17
Red Hat OpenShift Concepts

An image repository contains all versions of an image in


the image registry

IMAGE REGISTRY

myregistry/frontend myregistry/mongo

frontend:latest mongo:latest
frontend:2.0 mongo:3.7
frontend:1.1 mongo:3.6
frontend:1.0 mongo:3.4

18
From Hosts to VMs to Containers

[Diagram: three stacks compared - Traditional (apps + bins/libs on one host OS),
Virtual Machines (each app in its own guest OS on a hypervisor, under VM
management), Containers (apps + bins/libs sharing the host OS, orchestrated by
Kubernetes)]


From Hosts to VMs to Containers

[Diagram: the same comparison at higher density - the container host runs apps
1-9 with their bins/libs on a single host OS, while the VM stack needs a guest
OS per VM]


From Hosts to VMs to Containers

[Diagram: Virtual Machines vs. Containers on Virtual Machines (Kubernetes
scheduling containers onto CoreOS guest VMs over a hypervisor) vs. Native
Containers (Kubernetes on CoreOS directly on the infrastructure)]


How to handle all these Containers?

22
Red Hat OpenShift | Functional Overview

Monitoring, Logging, Service Mesh, Serverless, Dev Tools, CI/CD,


Registry, Router, Telemetry Middleware/Runtimes, ISVs Automated Builds, IDE

Cluster Services Application Services Developer Services

Operations Services

Kubernetes

Linux OS

CaaS PaaS FaaS


Edge Physical Virtual Private cloud Multi-Arch Public cloud Managed cloud
23 (Azure, AWS, IBM, Google)
Red Hat OpenShift Concepts

Everything runs in pods

CONTAINER

CONTAINER CONTAINER
IMAGE
POD

10.140.4.44

24
Red Hat OpenShift Concepts

Containers are wrapped in pods which are units of


deployment and management

CONTAINER CONTAINER CONTAINER

POD POD

10.140.4.44 10.15.6.55

25
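The concepts above map directly onto YAML manifests. As a minimal sketch (names and the image are illustrative, reusing the myregistry/frontend example from the registry slide), a Pod wrapping one container looks like this:

```yaml
# Minimal Pod manifest: one container created from a registry image.
apiVersion: v1
kind: Pod
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  containers:
  - name: frontend
    image: myregistry/frontend:2.0   # hypothetical image from the registry slide
    ports:
    - containerPort: 8080
```

Applied with `oc apply -f pod.yaml` (or `kubectl apply -f pod.yaml`); the pod then gets a cluster IP like the 10.140.4.44 shown on the slide.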
Red Hat OpenShift Concepts

ReplicationControllers & ReplicaSets ensure a specified


number of pods are running at any given time

[Diagram: a ReplicaSet/ReplicationController spec (image name, replicas,
labels, cpu, memory, storage) keeps pods 1..N running]
26
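The spec fields named on the slide (image name, replicas, labels, cpu/memory) map onto a ReplicaSet manifest roughly like this sketch (all names and values illustrative):

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: frontend
spec:
  replicas: 3                # desired number of pods at any given time
  selector:
    matchLabels:
      app: frontend          # pods are matched by label
  template:                  # pod template the ReplicaSet stamps out
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: myregistry/frontend:2.0
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
```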
Red Hat OpenShift Concepts

Deployments and DeploymentConfigurations define


how to roll out new versions of Pods

[Diagram: a Deployment/DeploymentConfig spec (image name, replicas, labels,
version, strategy) rolls pods from v1 to v2]

27
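A Deployment adds the rollout strategy on top of the ReplicaSet fields. A sketch of a rolling update from v1 to v2 (illustrative names):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  strategy:
    type: RollingUpdate      # replace pods gradually, not all at once
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
        version: v2          # bumping the template triggers the rollout
    spec:
      containers:
      - name: frontend
        image: myregistry/frontend:2.0
```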
Red Hat OpenShift Concepts

A daemonset ensures that all (or some) nodes run a


copy of a pod

[Diagram: a DaemonSet spec (image name, labels, cpu, memory, storage) places
one pod copy on each matching node - the nodes labeled foo = bar each run a
pod, the node labeled foo = baz does not]
28
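The "all (or some) nodes" selection from the slide is expressed with a nodeSelector. A DaemonSet sketch that only targets nodes labeled foo=bar (names illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-agent
spec:
  selector:
    matchLabels:
      app: node-agent
  template:
    metadata:
      labels:
        app: node-agent
    spec:
      nodeSelector:
        foo: bar             # only nodes labeled foo=bar run a copy
      containers:
      - name: agent
        image: myregistry/agent:1.0   # hypothetical agent image
```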
Red Hat OpenShift Concepts

Configmaps allow you to decouple configuration


artifacts from image content

[Diagram: the same container image consumes appconfig.conf from a Dev
ConfigMap (MYCONFIG=true) and from a Prod ConfigMap (MYCONFIG=false)]

29
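The slide's appconfig.conf example can be sketched as a ConfigMap plus a pod that mounts it, so the same image runs in Dev and Prod with different config (names illustrative):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: appconfig
data:
  appconfig.conf: |
    MYCONFIG=true            # the Prod ConfigMap would carry MYCONFIG=false
---
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: frontend
    image: myregistry/frontend:2.0
    volumeMounts:
    - name: config
      mountPath: /etc/appconfig   # appconfig.conf appears here
  volumes:
  - name: config
    configMap:
      name: appconfig
```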
Red Hat OpenShift Concepts

Secrets provide a mechanism to hold sensitive


information such as passwords

[Diagram: the same container image reads hash.pw from a Dev Secret
(ZGV2Cg==) and from a Prod Secret (cHJvZAo=)]

30
The etcd datastore can be encrypted for additional security
https://docs.openshift.com/container-platform/4.6/security/encrypting-etcd.html
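A Secret looks much like a ConfigMap, except that values are base64-encoded (encoding, not encryption - hence the etcd encryption note above). A sketch using the slide's Dev value:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: app-secret
type: Opaque
data:
  hash.pw: ZGV2Cg==          # base64 of "dev\n", as on the slide; not encrypted
```

Pods consume it like a ConfigMap, via a volume mount or environment variables.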
Red Hat OpenShift & Kubernetes Concepts

Services provide internal load-balancing and service


discovery across pods
SERVICE “backend”

[Diagram: the Service selects pods labeled role: backend (10.110.1.11,
10.120.2.22, 10.130.3.33) and load-balances across them; the pod labeled
role: frontend (10.140.4.44) is not selected]

31
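The label-based selection on the slide is expressed in the Service spec. A sketch of the "backend" service (port numbers illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:
    role: backend            # traffic goes only to pods with this label
  ports:
  - port: 8080               # stable service port
    targetPort: 8080         # container port on the selected pods
```

Other pods can then reach it via cluster DNS as http://backend:8080, regardless of which pod IPs are currently behind it.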
Red Hat OpenShift & Kubernetes Concepts

Apps can talk to each other via services

SERVICE “backend”

[Diagram: frontend pods reach the backend pods by calling the “backend”
Service by name; the service discovers and load-balances across the backend
pods for them]

32
Red Hat OpenShift Concepts

Routes make services accessible to clients outside the


environment via real-world urls
app-prod.mycompany.com

> curl http://app-prod.mycompany.com

[Diagram: the Route maps app-prod.mycompany.com to the “frontend” Service,
which load-balances across the frontend pods]
33
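The slide's real-world URL maps onto an OpenShift Route resource. A sketch (hostname taken from the slide, the rest illustrative):

```yaml
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: frontend
spec:
  host: app-prod.mycompany.com   # external URL from the slide
  to:
    kind: Service
    name: frontend               # internal Service the route exposes
  tls:
    termination: edge            # optional: terminate TLS at the router
```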
Red Hat OpenShift Concepts

Persistent Volume and Claims

[Diagram: a stateful pod mounts a 2Gi PersistentVolumeClaim, which is bound
to a 2Gi PersistentVolume - “My app is stateful.”]
34
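The 2Gi claim from the slide can be sketched as a PersistentVolumeClaim that the cluster binds to a matching PersistentVolume (names illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi             # bound to a 2Gi PersistentVolume
# In the pod spec, the claim is mounted like any other volume:
#   volumes:
#   - name: data
#     persistentVolumeClaim:
#       claimName: app-data
```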
Red Hat OpenShift Concepts

Liveness and Readiness

alive?

ready?
35
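The "alive?" and "ready?" checks are declared per container. A sketch of HTTP probes (paths, port and timings are illustrative):

```yaml
# Pod spec fragment: the kubelet polls both endpoints.
spec:
  containers:
  - name: frontend
    image: myregistry/frontend:2.0
    livenessProbe:             # "alive?" - restart the container if this fails
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:            # "ready?" - withhold traffic until this passes
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```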
Red Hat OpenShift Concepts

Projects isolate apps across environments, teams,


groups and departments

PAYMENT DEV CATALOG

C C C ❌ C C C

POD POD POD POD POD POD

PAYMENT PROD INVENTORY

C C C
❌ ❌ C C C

POD POD POD POD POD POD

36
Kubernetes and what it can do for you

Service discovery and load balancing - Kubernetes can expose a container
using the DNS name or using their own IP address. If traffic to a container
is high, Kubernetes is able to load balance and distribute the network
traffic so that the deployment is stable.

Automatic bin packing - You provide Kubernetes with a cluster of nodes that
it can use to run containerized tasks. You tell Kubernetes how much CPU and
memory (RAM) each container needs. Kubernetes can fit containers onto your
nodes to make the best use of your resources.

Self-healing - Kubernetes restarts containers that fail, replaces containers,
kills containers that don't respond to your user-defined health check, and
doesn't advertise them to clients until they are ready to serve.

37
Kubernetes and what it can do for you

Storage orchestration - Kubernetes allows you to automatically mount a
storage system of your choice, such as local storage, public cloud
providers, and more.

Secret and configuration management - Kubernetes lets you store and manage
sensitive information, such as passwords, OAuth tokens, and SSH keys. You
can deploy and update secrets and application configuration without
rebuilding your container images, and without exposing secrets in your
stack configuration.

Automated rollouts and rollbacks - You can describe the desired state for
your deployed containers using Kubernetes, and it can change the actual
state to the desired state at a controlled rate. For example, you can
automate Kubernetes to create new containers for your deployment, remove
existing containers and adopt all their resources to the new container.

38
Kubernetes and what it can’t do for you

Does not deploy source code and does not build your application. Continuous
Integration, Delivery, and Deployment (CI/CD) workflows are determined by
organization cultures and preferences as well as technical requirements.

Does not provide application-level services, such as middleware (for
example, message buses), data-processing frameworks (for example, Spark),
databases (for example, PostgreSQL), caches, nor cluster storage systems
(for example, Ceph) as built-in services.

Does not dictate logging, monitoring, or alerting solutions. It provides
some integrations as proof of concept, and mechanisms to collect and export
metrics.

39
Red Hat OpenShift | Architectural Overview

Your choice of infrastructure

COMPUTE NETWORK STORAGE

40
Red Hat OpenShift | Architectural Overview

Workers run workloads

COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

41
Red Hat OpenShift | Architectural Overview

Control plane nodes control the cluster

CONTROL PLANE

COMPUTE NETWORK STORAGE

42
Red Hat OpenShift | Architectural Overview

State of everything

etcd

CONTROL PLANE

COMPUTE NETWORK STORAGE

43
Red Hat OpenShift | Architectural Overview

The core Kubernetes components

Kubernetes
API server

Kubernetes Services Scheduler

etcd

Cluster
Management
CONTROL PLANE

COMPUTE NETWORK STORAGE

44
Red Hat OpenShift | Architectural Overview

Internal and support infrastructure services

Infrastructure Services Monitoring | Logging |Tuned |SDN | DNS | Kubelet

Kubernetes Services

etcd

CONTROL PLANE

COMPUTE NETWORK STORAGE

45
Red Hat OpenShift | Architectural Overview

The core Red Hat OpenShift components

Red Hat OpenShift OpenShift


Services API server

Infrastructure Services

Kubernetes Services Operator Lifecycle


Management

etcd

Web Console
CONTROL PLANE

COMPUTE NETWORK STORAGE

46
Red Hat OpenShift | Architectural Overview

Run on all hosts

Red Hat OpenShift


Services

Infrastructure Services

Kubernetes Services

etcd

Monitoring | Logging | SDN Monitoring | Logging | SDN


Tuned | DNS | Kubelet SDN | DNS | Kubelet

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

47
Red Hat OpenShift | Architectural Overview

Cluster monitoring

Red Hat OpenShift


Services

Infrastructure Services

Kubernetes Services

Prometheus | Grafana Prometheus | Grafana


etcd
Alertmanager Alertmanager

Monitoring | Logging | Monitoring | Logging |


Tuned SDN | DNS | Kubelet Tuned SDN | DNS | Kubelet

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

48
Red Hat OpenShift | Architectural Overview

Integrated routing

Red Hat OpenShift


Services

Infrastructure Services

Kubernetes Services Router Router

Prometheus | Grafana Prometheus | Grafana


etcd
Alertmanager Alertmanager

Monitoring | Logging | Monitoring | Logging |


Tuned SDN | DNS | Kubelet Tuned SDN | DNS | Kubelet

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

49
Red Hat OpenShift | Architectural Overview

Integrated image registry

Red Hat OpenShift


Services

Infrastructure Services Registry Registry

Kubernetes Services Router Router

Prometheus | Grafana Prometheus | Grafana


etcd
Alertmanager Alertmanager

Monitoring | Logging | Monitoring | Logging |


Tuned SDN | DNS | Kubelet Tuned SDN | DNS | Kubelet

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

50
Red Hat OpenShift | Architectural Overview

Log aggregation

Red Hat OpenShift Kibana | Elasticsearch Kibana | Elasticsearch


Services

Infrastructure Services Registry Registry

Kubernetes Services Router Router

Prometheus | Grafana Prometheus | Grafana


etcd
Alertmanager Alertmanager

Monitoring | Logging | Monitoring | Logging |


Tuned SDN | DNS | Kubelet Tuned SDN | DNS | Kubelet

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

51
Red Hat OpenShift | Architectural Overview

Normal cluster operations

Red Hat OpenShift


Services

Infrastructure Services

Kubernetes Services

etcd

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

52
Red Hat OpenShift | Architectural Overview

Auto-healing failed pods

Red Hat OpenShift


Services

Infrastructure Services

Kubernetes Services

etcd

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

53
Red Hat OpenShift | Architectural Overview

Auto-healing failed pods

Red Hat OpenShift


Services

Infrastructure Services

Kubernetes Services

etcd

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

54
Red Hat OpenShift | Architectural Overview

Auto-healing failed nodes

Red Hat OpenShift


Services

Infrastructure Services

Kubernetes Services

etcd

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

55
Red Hat OpenShift | Architectural Overview

Auto-healing failed nodes

Red Hat OpenShift


Services

Infrastructure Services

Kubernetes Services

etcd

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

56
Red Hat OpenShift | Architectural Overview

Dev and Ops via web, cli, API, and IDE

Red Hat OpenShift


Kibana | Elasticsearch Kibana | Elasticsearch
Services
SCM
(GIT)

Infrastructure Services Registry Registry

Developers CI/CD
Kubernetes Services Router Router

Prometheus | Grafana Prometheus | Grafana


etcd
EXISTING Alertmanager Alertmanager
AUTOMATION
TOOLSETS
Monitoring | Logging | Monitoring | Logging |
Admins Tuned SDN | DNS | Kubelet Tuned SDN | DNS | Kubelet

CONTROL PLANE COMPUTE COMPUTE

COMPUTE NETWORK STORAGE

57
Red Hat OpenShift | Functional Overview

Monitoring, Logging, Service Mesh, Serverless, Dev Tools, CI/CD,


Registry, Router, Telemetry Middleware/Runtimes, ISVs Automated Builds, IDE

Cluster Services Application Services Developer Services

Operations Services

Kubernetes

Linux OS

CaaS PaaS FaaS


Edge Physical Virtual Private cloud Multi-Arch Public cloud Managed cloud
58 (Azure, AWS, IBM, Google)
Operators give OpenShift Users access to managed
applications in a cloud-like experience, wherever their
cluster runs.
HOW DOES AN OPERATOR OFFER MANAGED SERVICES?

Embed ops knowledge from the experts

[Diagram: Operator v1.1.2 manages Deployments, StatefulSets, Autoscalers,
Secrets, Config maps]

60
HOW DOES AN OPERATOR WORK?

Developer / K8s API


Kubernetes Operator Native Kubernetes
OpenShift User Resources

Custom Resource → Custom Kubernetes Controller

The controller watches events on the custom resource and reconciles native
Kubernetes resources (Deployments, StatefulSets, Autoscalers, Secrets,
Config maps, + PersistentVolume) to match it:

kind: ProductionReadyDatabase
apiVersion: database.example.com/v1alpha1
metadata:
  name: my-important-database
spec:
  connectionPoolSize: 300
  readReplicas: 2
  version: v4.0.1
61
Custom Resource Definition

● Operator SDK - Allows developers to build, package and test an Operator
based on their expertise without requiring all the knowledge of Kubernetes
API complexities

● Operator Lifecycle Manager - Helps you deploy, update, and generally
manage the lifecycle of all of the Operators (and their associated
services) running across your clusters

● OperatorHub.io - Publishing platform for Kubernetes Operators; allows for
easy discovery and install of Operators using a graphical user interface
62
Operator Lifecycle Manager
The missing control panel for Operators

Operator Developer Cluster Admin Cluster User

Operator Catalog Operator Installation Operator Updates Operator Discovery Rich UI Controls

Dependency Resolution CRD Lifecycle Collision Detection Controller Transitions

63
Operator SDK
Enabling everybody to write Operators

Install & Update

Packaging

Install & Update + Day 2


Operations

Testing & Validation


Install & Update + Day 2 Operations +
Metrics/Alert analysis & workload tuning

64
OPERATOR FRAMEWORK

Operator Framework Themes

Operator Maturity - Provide better tooling and abstractions to let
developers focus on Operator features and maturity

Global Operators - New management model for global Operators that have
multi-tenancy built-in

Managed Services - Enable use of Operators to power hosted services with
declarative APIs and reduced resource consumption

Ecosystem Subsumption - Support more programming languages and other
packaging formats in Operator Framework

65
HOW OPERATORS POWER OPENSHIFT

Platform Operators
○ Manage the OpenShift control plane and cluster infrastructure
○ Cluster functionality exposed via Operator APIs
○ 40+ Operators, managed by Cluster Version Operator
Examples: ImageRegistry, MachineConfig, CloudCredentials, WebConsole, Ingress

Workload Operators / Add-Ons
○ Provide workloads and automation on top of the platform
○ Managed Workloads exposed via Operator APIs
○ 69 Red Hat Operators, 155 ISV Operators, 152 Community Operators,
managed by OLM
Examples: Red Hat Quay, OpenShift Pipelines, OpenShift GitOps,
CrunchyDB Postgres, GitLabs Runner
66
KubeVirt - Virtualization in Containers
Red Hat OpenShift
Security
Features, mechanisms and processes for container and platform
isolation

68
Back to Index
OPENSHIFT SECURITY | Comprehensive features

Fine-Grained RBAC

● Project scope & cluster scope available
● Matches request attributes (verb, object, etc)
● If no roles match, request is denied (deny by default)
● Operator- and user-level roles are defined by default
● Custom roles are supported
69
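Project-scoped RBAC is declared with Role/RoleBinding pairs (cluster scope uses ClusterRole/ClusterRoleBinding). A sketch granting a user read access to pods in one project (all names illustrative):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: payment-dev       # project scope
rules:
- apiGroups: [""]
  resources: ["pods"]          # matched request attributes: object ...
  verbs: ["get", "list", "watch"]   # ... and verbs; anything else is denied
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: payment-dev
subjects:
- kind: User
  name: userXX                 # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```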
Red Hat OpenShift Security | Comprehensive features

Container Content CI/CD Pipeline


CONTROL
Application Security
Container Registry Deployment Policies

Container Platform Container Host Multi-tenancy

DEFEND
Network Isolation Storage
Infrastructure

Audit & Logging API Management

EXTEND Security Ecosystem

70
Red Hat OpenShift Security | Comprehensive features

SELinux - Extending Isolation Controls

● Everything in the operating system has a label


● Policy defines the interaction between a labeled process and labeled resources
● Policy comes with the distribution, but you can add your own
● Policy is enforced by the Kernel
● Enforcement is turned on by default
● Systems with SELinux enabled are less susceptible to exploits (e.g. container breakouts)

● Linux capabilities - break root privileges into smaller groups and control them
● Libseccomp - syscall filtering mechanism
● Namespaces - isolation primitives

71
SELinux

SELinux mitigates container runtime vulnerabilities

72

https://www.redhat.com/en/blog/selinux-mitigates-container-vulnerability
https://www.redhat.com/en/blog/latest-container-exploit-runc-can-be-blocked-selinux
Security Context Constraints

Red Hat OpenShift Security Context Constraints


Use them to manage these controls

• Allow administrators to
control permissions for
pods

• Restricted SCC is
granted to all users

• By default, no containers
can run as root

• Admin can grant access


to privileged SCC

• Custom SCCs can be


created
73
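As a rough sketch of what a custom SCC looks like (field values are illustrative; SCCs are cluster-scoped objects with top-level fields rather than a spec):

```yaml
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: my-restricted
allowPrivilegedContainer: false   # no privileged pods
runAsUser:
  type: MustRunAsRange            # no root; UID comes from the project's range
seLinuxContext:
  type: MustRunAs                 # SELinux labels assigned by the platform
users: []                         # grant later, e.g. oc adm policy add-scc-to-user
groups: []
```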
Red Hat OpenShift Security | Comprehensive features

Red Hat OpenShift and FIPS 140-2


FIPS ready
● When built with the RHEL 7 base image

OpenShift calls FIPS validated crypto
● When running on RHEL 7.6 in FIPS mode, Red Hat OpenShift components
bypass Go cryptographic routines and call into a RHEL FIPS 140-2 validated
cryptographic library
● This feature is specific to binaries built with the RHEL Go compiler and
running on RHEL crypto libraries

RHEL CoreOS FIPS mode
● Configure at install to enforce use of FIPS Validated / Implementation
Under Test modules

[Diagram: Operators & Services* and Red Hat OpenShift platform components
make crypto calls into Red Hat Enterprise Linux in FIPS mode (FIPS Validated
/ Implementation Under Test). *When built with RHEL base images]
74
Red Hat OpenShift Security | Comprehensive features

Certificates and Certificate Management

● Red Hat OpenShift provides its own internal CA
● Certificates are used to provide secure connections to
  ○ Control plane (APIs) and nodes
  ○ Ingress controller and registry
  ○ etcd
● Certificate rotation is automated
● Optionally configure external endpoints to use custom certificates

✓ CONTROL PLANE  ✓ ETCD  ✓ NODES  ✓ INGRESS CONTROLLER  ✓ CONSOLE  ✓ REGISTRY

75
Red Hat OpenShift Security | Comprehensive features

Service Certificates
[Diagram: the Serving Cert Signer creates a Secret (tls.crt, tls.key) for a
Service annotated to request a serving certificate; the ConfigMap CABundle
Injector injects service-ca.crt into ConfigMaps annotated
service.beta.openshift.io/inject-cabundle="true"; the pod mounts both]
76
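A sketch of requesting a serving certificate for a Service (service name and port are illustrative; the annotation shown is the beta form documented for OpenShift 4, where the slide still shows an older alpha variant):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    # Ask the serving cert signer to generate tls.crt/tls.key into this Secret
    service.beta.openshift.io/serving-cert-secret-name: serving-cert-my
spec:
  selector:
    app: my-app
  ports:
  - port: 8443
    targetPort: 8443
```

The pod then mounts the serving-cert-my Secret, and clients verify it against the injected service-ca.crt bundle.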
Red Hat OpenShift Security | Comprehensive features

Identity and Access Management

[Diagram: authentication flow - (1) the control plane API refers the user to
the OAuth server; (2) the user sends credentials; (3) the OAuth server
validates the credentials against the configured identity provider (LDAP,
Keystone, GitHub, GitLab, Google, OpenID, Request Header, Basic); (4) an
identity is created and mapped to a user (userXX); (5) a token is returned
to the user; (6) the token is used for API requests]
77
Red Hat OpenShift Security | Comprehensive features

Fine-Grained RBAC

• Project scope & cluster scope available
• Matches request attributes (verb, object, etc)
• If no roles match, request is denied (deny by default)
• Operator- and user-level roles are defined by default
• Custom roles are supported

78
Red Hat OpenShift | Functional Overview

Monitoring, Logging, Service Mesh, Serverless, Dev Tools, CI/CD,


Registry, Router, Telemetry Middleware/Runtimes, ISVs Automated Builds, IDE

Cluster Services Application Services Developer Services

Operations Services

Kubernetes

Linux OS

CaaS PaaS FaaS


Edge Physical Virtual Private cloud Multi-Arch Public cloud Managed cloud
79 (Azure, AWS, IBM, Google)
Red Hat OpenShift
Monitoring
An integrated cluster monitoring and alerting stack

80
Back to Index
Red Hat OpenShift Monitoring | Solution Overview

Red Hat OpenShift Cluster Monitoring

Metrics collection and storage via Prometheus, an open-source monitoring
system and time series database.

Alerting/notification via Prometheus' Alertmanager, an open-source tool
that handles alerts sent by Prometheus.

Metrics visualization via Grafana, the leading metrics visualization
technology.

81
Red Hat OpenShift Monitoring | Operator & Operand Relationships

[Diagram: the cluster-monitoring-operator manages Grafana, node-exporter,
kube-state-metrics, openshift-state-metrics (4.2), prometheus-adapter,
telemeter-client and the prometheus-operator, which in turn manages
Prometheus and Alertmanager]
82
Red Hat OpenShift Monitoring | Prometheus, Grafana and Alertmanager Wiring

Red Hat OpenShift API


Grafana Prometheus Alertmanager

openshift-state-metrics

kube-state-metrics
Control Plane (API)

node-exporter node-exporter

Node (kubelet) Node (kubelet)


Infra/Worker (“hardware”) Worker (“hardware”)

83
Red Hat OpenShift
Logging
An integrated solution for exploring and corroborating application logs

84
Back to Index
Red Hat OpenShift Logging | Solution Overview

Observability via
log exploration and corroboration with EFK
• Components

• Elasticsearch: a search and analytics engine to store logs


• Fluentd: gathers logs and sends to Elasticsearch.
• Kibana: A web UI for Elasticsearch.

• Access control

• Cluster administrators can view all logs


• Users can only view logs for their projects

• Ability to forward logs elsewhere

• External elasticsearch, Splunk, etc


85
Red Hat OpenShift Logging | Operator & Operand Relationships

ElasticSearch
Operator

ElasticSearch
Cluster

Cluster Logging
Operator

Kibana

Curator CronJob ...

Fluentd
(per node)
Curator
86
Red Hat OpenShift Logging | Architecture

Log data flow in Red Hat OpenShift

Fluentd
TLS TLS
Fluentd
Node Elasticsearch Kibana
Fluentd
Node Application Logs

Node

87
Red Hat OpenShift Logging | Architecture

Log data flow in Red Hat OpenShift

stdout
stderr

Fluentd
TLS

Elasticsearch

CRI-O
OS DISK journald

kubelet

Node (OS)

88
Red Hat OpenShift Logging | Architecture

New log forwarding API (since 4.6)


Abstract Fluentd configuration by introducing a new log forwarding API to
improve support and experience for customers.

• Introduces a new, cluster-wide ClusterLogForwarder CRD (API) that replaces
the need to configure log forwarding via the Fluentd ConfigMap.

• The API helps reduce the probability of misconfiguring Fluentd pipelines
and helps bring more stability into the Logging stack.

• Features include: audit log collection and forwarding, Kafka support,
namespace- and source-based routing, tagging, as well as improvements to the
existing log forwarding features (e.g. syslog RFC5424 support).

Logs are forwarded to different systems based on their “inputSource”
(app, infra, audit):

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogForwarder"
spec:
  outputs:
  - name: MyLogs
    type: Syslog
    syslog:
      facility: Local0
    url: localstore.example.com:9200
  pipelines:
  - inputs: [Infrastructure, Application, Audit]
    outputs: [MyLogs]
89

Product Manager: Christian Heidenreich


Red Hat OpenShift Logging | Architecture

Secure Log Forwarding to 3rd party


The Cluster Logging Operator watches the “ClusterLogForwarder” Custom
Resource and creates the Fluentd daemonset and a Fluentd forwarder per node,
which send logs to the external logging system:

apiVersion: "logging.openshift.io/v1"
kind: "ClusterLogForwarder"
spec:
  outputs:
  - name: MyLogs
    type: Syslog
    syslog:
      facility: Local0
    url: localstore.example.com:9200
  pipelines:
  - inputs: [Infrastructure, Application, Audit]
    outputs: [MyLogs]

90
Customer
Examples

91
https://www.redhat.com/en/resources/audi-case-study?sc_cid=701f2000000txokAAA&utm_source=bambu&utm_medium=social
https://www.cio.de/a/volkswagen-senkt-testkosten-um-die-haelfte,3670004
Container and Kubernetes extending to Edge deployments

https://news.lockheedmartin.com/2022-10-25-Lockheed-Marti-Red-Hat-Collaborate-Advance-Artificial-Intelligence-Military-Missions
https://new.abb.com/news/detail/93075/abb-and-red-hat-partner-to-deliver-further-scalable-digital-solutions-across-industrial-edge-and-hybrid-cloud
INDUSTRY 4.0

ONCITE is a cloud-native, highly scalable technology and infrastructure
platform delivered as an Industrial Edge Appliance.

The quick and easy implementation of data-driven shop floor applications
with simultaneous data sovereignty is currently one of the greatest
challenges facing the manufacturing industry. The Industrial Edge Appliance
ONCITE from German Edge Cloud (GEC) has been expanded to include components
from the IBM Cloud Paks, which are based on Red Hat OpenShift, the
enterprise Kubernetes platform. With the package made up of hardware,
software and application management services, production companies, OEM
manufacturers and the supplier industry can quickly benefit from
digitalization in production through the use of hybrid cloud, even with
little of their own resources and know-how.
96
Red Hat OpenShift | Functional Overview

Monitoring, Logging, Service Mesh, Serverless, Dev Tools, CI/CD,


Registry, Router, Telemetry Middleware/Runtimes, ISVs Automated Builds, IDE

Cluster Services Application Services Developer Services

Operations Services

Kubernetes

Linux OS

CaaS PaaS FaaS


Edge Physical Virtual Private cloud Multi-Arch Public cloud Managed cloud
97 (Azure, AWS, IBM, Google)
Build and Deploy
Container Images
Tools and automation that makes developers productive quickly

98
Back to Index
Build and Deploy | Three Ways to Serve Developers

DEPLOY YOUR SOURCE CODE | DEPLOY YOUR APP BINARY | DEPLOY YOUR CONTAINER IMAGE

99
Build and Deploy | Source-to-Image (S2I) for building/deploying from code

[Diagram: the developer pushes CODE to a Git repository; the S2I build
combines it with a BUILD IMAGE (builder image) from the registry to produce
an APPLICATION image, which is pushed to the registry and then DEPLOYed]
OpenShift Pipelines

Continuous Integration (CI) and Continuous Delivery (CD)

GIT COMMIT RELEASE

Code Run Debug Build Test Package Deploy Stage Prod

DEVELOPMENT CONTINUOUS INTEGRATION CONTINUOUS DELIVERY

101
OpenShift Pipelines

One Continuous Delivery


Multiple Clouds Multiple Geographies
Multiple Platforms Isolated Environments

Pre-Production Production

LOCAL DEV CI CD CD

Workstation AWS VMware


Cloud GCP OpenStack
Azure baremetal
102
OpenShift Pipelines

One Continuous Delivery


Multiple Clouds
Multiple Platforms

DEVELOPMENT CONTINUOUS INTEGRATION CONTINUOUS DELIVERY

Workstation Kubernetes Kubernetes


Azure
AWS
GCP
VMware
Kubernetes OpenStack
baremetal
103
OpenShift Pipelines

What is Cloud-Native
Continuous Integration and Continuous Delivery (CI/CD)?

Containers - Built for container apps and runs on Kubernetes

Serverless - Runs serverless with no CI/CD engine to manage and maintain

DevOps - Designed with microservices and distributed teams in mind
104
OpenShift Pipelines

OpenShift Pipelines
a Cloud-Native CI/CD Experience on OpenShift

Standard Kubernetes-style pipelines - Declarative pipelines with standard
Kubernetes custom resources (CRDs) based on Tekton*

Build images with Kubernetes tools - Use tools of your choice
(source-to-image, buildah, kaniko, jib, etc) for building container images

Run pipelines in containers - Scale pipeline executions on-demand with
containers on Kubernetes

Deploy to multiple platforms - Deploy applications to multiple platforms
like serverless, virtual machines and Kubernetes

Powerful command-line tool - Run and manage pipelines with an interactive
command-line tool

Integration with OpenShift and tooling - A CI/CD experience integrated with
OpenShift, developer tools and IDE extensions

105
OpenShift Pipelines

An open-source project for providing a set of shared and standard


components for building Kubernetes-style CI/CD systems

Governed by the Continuous Delivery Foundation


Contributions from Google, Red Hat, Cloudbees, IBM, Pivotal and many more
106
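Since Tekton pipelines are plain Kubernetes CRDs, a pipeline is declared as YAML. A sketch of a two-task build-and-deploy pipeline (task names assume commonly shipped catalog tasks like buildah and openshift-client; all names and parameters are illustrative):

```yaml
apiVersion: tekton.dev/v1beta1
kind: Pipeline
metadata:
  name: build-and-deploy
spec:
  params:
  - name: git-url
    type: string
  tasks:
  - name: build
    taskRef:
      name: buildah            # assumes an installed buildah Task
    params:
    - name: IMAGE
      value: myregistry/frontend:2.0
  - name: deploy
    runAfter: ["build"]        # tasks form a DAG; deploy waits for build
    taskRef:
      name: openshift-client   # assumes the openshift-client Task
```

Each TaskRun executes in its own pod, which is how pipeline executions scale on demand.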
OpenShift Pipelines

OpenShift Pipelines Architecture

Developer Dev Console Tekton CLI


CodeReady Workspaces
Visual Studio Code
(Eclipse Che)
Tools

API

OpenShift Pipelines

CI/CD Operator Extensions Integrations Tasks


Core
Tekton Core

Kubernetes OpenShift

107
Red Hat OpenShift | Functional Overview

Monitoring, Logging, Service Mesh, Serverless, Dev Tools, CI/CD,


Registry, Router, Telemetry Middleware/Runtimes, ISVs Automated Builds, IDE

Cluster Services Application Services Developer Services

Operations Services

Kubernetes

Linux OS

CaaS PaaS FaaS


Edge Physical Virtual Private cloud Multi-Arch Public cloud Managed cloud
108 (Azure, AWS, IBM, Google)
OpenShift ServiceMesh

What are Microservices?


an architectural style that structures an application as a collection of services

▸ Single purpose
▸ Independently deployable
▸ Have their context bound to a biz
domain
▸ Owned by a small team
▸ Often stateless

109
OpenShift ServiceMesh

Benefits of Microservices
Agility
Deliver updates faster and react faster to new business demands

Highly scalable
Scale independently to meet temporary traffic increases, complete batch
processing, or other business needs

Can be purpose-built
Use the languages and frameworks best suited for the service’s domain

Resilience
Improved fault isolation restricts service issues, such as memory leaks or
open database connections, to only affect that specific service

Many orgs have had success with Microservices - Netflix, Amazon, eBay, The Guardian

110
OpenShift ServiceMesh

There is inherent complexity in adopting microservices


Some common areas where organizations stumble when adopting microservices

Tolerance to Faults: cascading failure, partial outages, traffic spikes

Services Communication Needs: latency, concurrency, distributed transactions

Securing Services: malicious requests, DoS, identity & access control

DevOps and Deployments: more failure surface, version incompatibility, untracked services

Inability to Monitor & Understand Performance: more to monitor & different types of monitoring required

Highly Distributed Logs: scattered logs, lots more logs to manage, access control
111
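Tolerance to faults is typically handled with patterns such as timeouts, retries with exponential backoff and circuit breakers; a service mesh can apply these without changing application code. A minimal Python sketch of retry with backoff (`flaky_call` is a made-up stand-in for a remote service call, not anything from the slides):

```python
import time

def retry_with_backoff(operation, max_attempts=4, base_delay=0.1):
    """Retry a callable with exponential backoff; re-raise after the last attempt."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...

# Hypothetical flaky operation: fails twice, then succeeds.
calls = {"n": 0}
def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(retry_with_backoff(flaky_call))  # prints "ok" after two retried failures
```

Backoff avoids the retry storms that turn a partial outage into a cascading failure; a mesh like Istio expresses the same idea declaratively per route.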
OpenShift ServiceMesh

Istio Service Mesh


A modern way to manage the complexity of microservice applications

112
OpenShift ServiceMesh

Connect, Secure, Control and Observe Services on OpenShift

➤ Connect services securely with zero-trust network policies.
➤ Automatically secure your services with managed authentication, authorization and encryption.
➤ Control traffic to safely manage deployments, A/B testing, chaos engineering and more.
➤ See what’s happening with out-of-the-box distributed tracing, metrics and logging.
➤ Manage OpenShift Service Mesh with the Kiali web console.

[Diagram: services with Envoy sidecar proxies running on OpenShift Service Mesh (Istio, Kiali, Jaeger), on OpenShift and Red Hat Enterprise Linux CoreOS, across physical, virtual, private cloud and public cloud environments]

113
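Traffic control for canary or A/B rollouts ultimately means weighted routing between service versions; in Istio this is declared as VirtualService weights and enforced by the Envoy sidecars. A minimal Python sketch of just the routing decision (version names and weights are illustrative):

```python
import random

def pick_version(weights, rng=random):
    """Pick a service version according to percentage weights, e.g. {"v1": 90, "v2": 10}."""
    versions = list(weights)
    return rng.choices(versions, weights=[weights[v] for v in versions])[0]

rng = random.Random(0)  # seeded so the demo is reproducible
sample = [pick_version({"v1": 90, "v2": 10}, rng) for _ in range(1000)]
print(sample.count("v2"))  # roughly 100 of 1000 requests reach the canary
```

Shifting the weights (10 → 25 → 50 → 100) is exactly the progressive-delivery knob a mesh exposes, with no application redeploy.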
OpenShift ServiceMesh

Istio service mesh visualized with Kiali

114
Let’s try it...

Demo Scenario:
https://istio.io/latest/docs/examples/bookinfo/

$ git clone
https://github.com/istio/istio.git
$ cd istio

115
What's new in OpenShift 4.6

OpenShift Pipelines

OpenShift GitOps (new add-on)

● Enable teams to adopt a declarative GitOps approach to multi-cluster configuration and continuous delivery
● OpenShift GitOps is complementary to OpenShift Pipelines and includes
  ○ Argo CD
  ○ GitOps Application Manager CLI
  ○ Integrated into Dev Console (App Stages)
● Included in OpenShift SKUs

[Diagram: GitOps loop in which OpenShift GitOps observes the cluster state, compares it with the desired state, and takes action]
116
GitOps with Argo CD

● Cluster and application configuration versioned in Git
● Automatically syncs configuration from Git to clusters
● Drift detection, visualization and correction
● Granular control over sync order for complex rollouts
● Rollback and rollforward to any Git commit
● Manifest templating support (Helm, Kustomize, etc.)
● Visual insight into sync status and history

[Diagram: monitor → detect drift → take action → sync loop]
117
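Drift detection and sync, at their core, compare the desired state in Git with the live cluster state. A minimal Python sketch of that comparison (not Argo CD's actual implementation; the resource names are illustrative):

```python
def detect_drift(desired, live):
    """Compare desired manifests (from Git) with live objects; return sync actions."""
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))
        elif live[name] != spec:
            actions.append(("update", name))  # drift: live object differs from Git
    for name in live:
        if name not in desired:
            actions.append(("delete", name))  # pruning objects not tracked in Git
    return sorted(actions)

desired = {"deploy/web": {"replicas": 3}, "svc/web": {"port": 80}}
live = {"deploy/web": {"replicas": 2}, "cm/stale": {"data": "x"}}
print(detect_drift(desired, live))
# [('create', 'svc/web'), ('delete', 'cm/stale'), ('update', 'deploy/web')]
```

Running this loop continuously is what makes the Git repository, rather than whoever last ran kubectl, the source of truth.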
OpenShift Platform Services

Wrap up

118
Red Hat OpenShift | Functional Overview

Unique Value of Red Hat OpenShift

[Stack diagram]
Cluster Services: Monitoring, Logging, Registry, Router, Telemetry
Application Services: Service Mesh, Serverless, Middleware/Runtimes, ISVs
Developer Services: Dev Tools, CI/CD, Automated Builds, IDE
Automated Operations on Kubernetes and Red Hat Enterprise Linux | RHEL CoreOS
Deployable on: Edge, Physical, Virtual, Private cloud, Multi-Arch, Public cloud, Managed cloud (Azure, AWS, IBM, Google)

119
Thank you

Red Hat is the world’s leading provider of enterprise open source software solutions. Award-winning support, training, and consulting services make Red Hat a trusted adviser to the Fortune 500.

linkedin.com/company/red-hat | youtube.com/user/RedHatVideos | facebook.com/redhatinc | twitter.com/RedHat
120
HYBRID CLOUD SECURITY

Rhonda Childress
Kyndryl Fellow
Chief Innovation Officer – Kyndryl Security Practice
Cloud Security: Current State

Cloud Security
Current State
• More investments in Zero Trust and securing/encrypting data. Cloud
isn’t delivering the ROI as quickly as many expected, and CISOs are
facing complex cloud and legacy environments that are full of gaps and
seams. So, security is getting more involved in the assembly process.

• Companies are investing money in infrastructure services providers to


stitch their cloud and on-prem environments together and run the
ecosystem, while the enterprise sits on top of it all and
monitors/operates it.

• As the security threat of quantum computing and Generative AI hovers


on the horizon, companies are revisiting their encryption standards,
increasing the complexity of encryption algorithms and working to see
how GenAI can assist.
Capital One
Data Breach
• July 2019 - A former Amazon engineer downloaded data from
more than 100 million Capital One users, including 120,000
Social Security numbers and about 77,000 bank account
numbers.
• U.S. Attorney Nick Brown said Thompson “did more than
$250 million in damage to companies and individuals.”
• Convicted in June 2022 on seven hacking-related charges, Seattle resident
Paige Thompson was sentenced to time served and five years of probation for
violating an anti-hacking law known as the Computer Fraud and Abuse Act.
Office 365
Data Breach

On July 11, Microsoft disclosed that Chinese hackers had


leveraged an exploit in their cloud systems to spy on the
emails of U.S. government officials. They have since disclosed
more details, and in September, a Senate staffer stated that
over 60,000 emails had been exposed in the breach.
Cloud Breaches
Continue to Increase

• 39% of businesses experienced a data breach in their


cloud environment last year, an increase of 4 points
from the previous year (35%)

• More sensitive data moving to the cloud with 75%


of businesses saying more than 40% of data stored
in the cloud is sensitive, up 26% from last year

• Despite dramatic increase in sensitive data stored in


the cloud, on average only 45% of this sensitive data
is encrypted

Thales Group – 2023 Cloud Security Study


How quickly can a compromise happen….
Cloud Security Concerns
Among the biggest concerns raised by firms about their cloud environments are:

• Risk of data loss/leakage (64%)


• Privacy/confidentiality issues (62%)

• Accidental exposure of credentials (46%)


• Legal/regulatory compliance (44%)
Cloud Security Challenges
• Account Hijacking

• Highly dynamic environments


• Lack of cloud security architecture and strategy

• System vulnerabilities
• Unsecured APIs

• Misconfiguration and inadequate change control

• Cloud Forensics (dynamic environments, volatile data and limitation of available tools)
• Lack of Encryption and Key Control Causes Cloud Data Concerns

• Multi-cloud Causing Operational Complexity


Hybrid
Cloud
Security Definition:
• Running applications in datacenters and using public cloud Services
• Amazon Web Services, Microsoft Azure, Google Cloud Services, IBM Cloud
Industry Activities:
• All cloud providers are pushing their platform into the datacenter, so understanding
your cloud security posture is critical to protecting the data center
• AWS Edge (Outposts), Azure Arc
Issues/Challenges:
• As businesses move to cloud services, they are following a strategy of integrating their
user directories into the Cloud directories – this can cause elevated privileges in the
cloud and if the cloud service is compromised, bad actors can gain access to critical
datacenter systems.
Cloud 2023 Findings

• Adversaries are sharpening their use of cloud TTPs.
• A number of adversary groups (many of them state-sponsored) are more sophisticated and
determined in targeting the cloud.
• Nation-state and criminal adversaries are using cloud infrastructure to host phishing lure
documents and malware. Adept threat actors implement command-and-control (C2)
channels on top of existing cloud services.
• In 28% of incidents found during the observation window, adversaries manually deleted a
cloud instance to remove evidence and evade detection.

• Identity is the key cloud access point.


• Adversaries are ramping up their use of valid accounts, which were used to gain initial
access in 43% of cloud intrusions over the last year.
• Nearly half (47%) of critical misconfigurations in the cloud are related to poor identity and
entitlement practices.
• In 67% of cloud security incidents, identity and access management (IAM) roles were
found with elevated privileges beyond what was required, indicating an adversary may
have subverted the role to compromise the environment and move laterally.*

• Human error drives cloud risk.


• 60% of containers observed lacked properly configured security protections.
• 36% of cloud environments had insecure cloud service provider default settings.
Cloud Security Myths
1. The Cloud is Unsafe
• The cloud isn't inherently unsafe, when used properly it is no less safe than a traditional datacenter

2. My Cloud Provider Will Keep Me Secure


• The tenant is responsible for security of their cloud services and applications
• A simple misconfiguration could lead to data or credential access or breach

3. The Cloud Is Just Someone Else’s Computer


• Traditional data forensics is a challenge for applications running on the cloud
• Make sure you have configured all appropriate logging for forensic analysis

4. Advanced Adversaries Aren’t Attacking The Cloud


• The more data that is on the cloud, the more attackers will be trying to break in and access the data
Software Supply Chain Attacks
What is an example of a software supply chain attack?

• SolarWinds, a major U.S. IT firm, fell victim to a supply chain attack recently. Weak information security practices by a former intern exposed a critical internal password (solarwinds123).

• Once the password was compromised, suspected Russian hackers were able to access a system that SolarWinds used to assemble updates to Orion, one of its flagship products.

• From here, the attackers inserted malicious code into an otherwise legitimate software update, allowing them to monitor and identify running processes that were involved in the
compilation of Orion, and replace source files to include SUNBURST malware.

• Orion updates were deployed to an estimated 18,000 customers, and SUNBURST sent information back to the attackers that was used to identify targets of additional malware,
broadened access, and spying.

• The fact that the intended targets and victims of the attack were several degrees of separation away from the entry point, makes this a popular example of a modern software
supply chain attack.

Repercussions:

• The SEC recently announced fraud charges against SolarWinds Corporation and its CISO. The SEC’s complaint alleges that SolarWinds misled investors by overstating its cybersecurity
practices and understating known cybersecurity risks. The SEC is charging SolarWinds with violating various provisions of the Securities Act of 1933 and the Securities Exchange Act of
1934, and the CISO with aiding and abetting the company’s violations. The SEC’s complaint seeks civil penalties, permanent injunctive relief, disgorgement, and a ban on the CISO
serving as a director or officer of a publicly traded company.
Software Supply Chain Attack Protection
How can you reduce supply chain security risks?

• Assess the security and trustworthiness of the code that you consume
• Ensure developers are writing secure proprietary code

• Securely build and deploy code


• Harden data transfer methods used by applications

• Continuously test and monitor deployed applications for threats

• Provide consumers with a Software Bill of Materials (SBOM)


• Especially open-source components

• Also – companies are deploying “Due Diligence Questionnaires” (DDQ) asking for specific information on suppliers.
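At its simplest, an SBOM is a machine-readable inventory of the components a piece of software ships with, which is what makes downstream consumers able to check for a compromised dependency quickly. A minimal Python sketch of building one (real SBOMs use standard formats such as CycloneDX or SPDX; the component list and field names here are illustrative):

```python
import json

def make_sbom(app_name, version, components):
    """Build a minimal SBOM-like inventory; real SBOMs follow CycloneDX or SPDX."""
    return {
        "name": app_name,
        "version": version,
        "components": [
            {"name": n, "version": v, "type": "library"} for n, v in components
        ],
    }

# Hypothetical service with two open-source components.
sbom = make_sbom("orders-service", "1.4.2",
                 [("openssl", "3.0.13"), ("log4j-core", "2.17.1")])
print(json.dumps(sbom, indent=2))
```

Given such an inventory, answering "are we shipping the vulnerable log4j version?" becomes a lookup rather than an investigation.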
Hypervisors are the new attack vector
• Hypervisor attacks have increased in the last 12 months.
• Belief that hypervisors could never be breached
• Lack of availability of security tooling similar to what is used on the VMs
• Use of environments by bad actors led to an understanding of the vulnerabilities
• Living off the land
Cloud Consoles (are not secure)
• Protection recommendations
• Require Strong passwords
• Require 2 Factor Authentication
• Use IAM (vs. Admin/root users) and 'least privileges' strategy
• Grant specific access to specific resources and services
• Don't share passwords
• Use Local IDs and passwords
Common Cloud Attack Vectors
Bad Practices

Exploitable internet facing workloads


• Misconfigured firewall and/or security groups and policies
• Lack of vulnerability and patch management
• Unused or over-provisioned permissions
• Zero Days

Identity and Access Management (IAM) with bad hygiene


• Not using MFA
• Weak password policies
• Over-permissive IAM policies
• Unused access keys
• Hypervisor federation
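Several of these hygiene problems can be detected mechanically, for example flagging Allow statements that grant a bare wildcard action or resource in an AWS-style policy document. A minimal Python sketch (the policy shown is illustrative, not from a real account):

```python
def find_over_permissive(policy):
    """Flag Allow statements whose Action or Resource contains a bare wildcard."""
    findings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        if isinstance(actions, str):      # IAM allows a string or a list here
            actions = [actions]
        if isinstance(resources, str):
            resources = [resources]
        if "*" in actions or "*" in resources:
            findings.append(stmt.get("Sid", "<unnamed>"))
    return findings

policy = {"Statement": [
    {"Sid": "AdminAll", "Effect": "Allow", "Action": "*", "Resource": "*"},
    {"Sid": "ReadBucket", "Effect": "Allow",
     "Action": "s3:GetObject", "Resource": "arn:aws:s3:::logs/*"},
]}
print(find_over_permissive(policy))  # ['AdminAll']
```

Cloud-native tools (IAM Access Analyzer, CSPM products) apply far richer versions of this check, but the principle is the same: least privilege is auditable.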
Common Cloud Attack Vectors
Bad Practices

Unauthenticated public access to data store


• Public vs Private access to a datastore
• Lack of encryption both at rest and in transit
• Unauthenticated user access

Cleartext cloud credentials or SSH keys stored on instances


• Untracked or unmanaged SSH and access keys
• Password enabled SSH instances
Common Cloud Attack Vectors
Bad Practices

3rd party / cross account access


• Overly permissive roles and security policies
• Lack of periodic review of third-party accounts and keys

Distributed Denial of Service (DDoS)


• Lack of IP based filtering and/or rate limiting
• Not leveraging technologies such as Content Delivery Network (CDN), Auto
Scaling etc.
• Incorrect instance and network sizing
The Promise of Gen-AI
Positive:
• Ability to augment the lack of security workforce
• Early threat detection
• Automated incident response
• A potential level playing field with the bad actors

Negative:
• Ability for people with no security skills to write security exploits – especially insiders
• Creation of never-before-seen zero days
• Lack of understanding in the use of Gen-AI to prevent data breaches
Careers in Cybersecurity
Careers:
• Cyber Security Consultant
• Security Analyst
• Security Architect
• Security Engineer
• Risk Analyst

Skills:
• System configuration
• Cloud architecture
• Virtualization
• Identity and access management
• Encryption
• Networking
Cloud Security Certifications
Entry

• CompTIA Cloud+

• Certificate of Cloud Security Knowledge (CCSK)

Experienced

• AWS/Azure/GCP Security Engineer certifications

• Certified Cloud Security Professional (CCSP)

• Certified Cloud Security Engineer (C|CSE)

Expert

• GIAC Cloud Security Automation (GCSA)

• GIAC Cloud Penetration Tester (GCPN)


Cloud Specific Security Certifications
• AWS Certified Security

• Google Professional Cloud Security Engineer


• Microsoft Certified Azure Security Engineer Associate

Top cloud services to get certified in

AWS ranks as the largest cloud infrastructure provider, followed by Microsoft Azure, and Google Cloud.
Summary
• The cloud is becoming a more popular attack
target for bad actors
• It is not inherently secure, but can be made
secure with diligence and tools
• Understanding attack vectors and patterns is
critical to writing secure software products and
services
• There are many lucrative career opportunities
in the security domain including certifications
Questions?
Hybrid Cloud Management

Dave Lindquist
Red Hat VP Engineering, Hybrid Cloud Management
IBM Fellow
1
Red Hat
Enterprise open source solutions,
● using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes
technologies.

Our development model begins in the open source community


● operating transparently and responsibly, we continue to be a catalyst in open source communities

We deliver ‘hardened’, open source solutions


● making it easier for enterprises to work across platforms and environments, from the core datacenter to clouds
and the network edge.
Open source

Product development model

More than 1,000,000 projects

Participate: We create and participate in community-powered upstream projects.
Integrate: We integrate upstream projects, fostering open, community platforms.
Stabilize: We commercialize these platforms together with a rich ecosystem of services and certifications.
3
F24576-200910
Open source

From community to enterprise

[Diagram: more than 1,000,000 community projects, with upstream projects such as Fedora, OKD, Ceph, Open Cluster Management, Foreman and Ansible feeding enterprise products such as Advanced Cluster Management]
4
Red Hat Open Hybrid Cloud

[Stack diagram] Workloads (modernize applications, cloud-native applications, AI applications, edge applications) are delivered through Platform Engineering Services on a cloud-native application platform, running on managed, automated and secure infrastructure: physical, virtual, private cloud, public cloud, edge

Red Hat open hybrid cloud platform with ISV ecosystem

[Stack diagram] Platform services (manage workloads), Application services (build cloud-native apps), Developer services (developer productivity) and Data services (data-driven insights), together with multicluster management, cluster security, a global registry and cluster data management, layered on Kubernetes cluster services, Kubernetes (orchestration), the container host operating system and the infrastructure

6
Hybrid Cloud Platform Management Requirements

Management Capabilities: Lifecycle, Inventory, Configuration, Security (governance, risk, compliance), Automation, Continuity, Observability, Cost, Integration / Ecosystem

Core set of platform management capabilities per SRE, Operations, IT teams:


● Lifecycle - deploy, update, upgrade
● Resource Inventory - across hybrid environment
● Configuration management - policy based
● Platform Security - focus on compliance and risk
● Automation - automating tasks within and orchestrating each of the domains of management
● Continuity - backup/recovery, DR
● Observability - monitoring, logging, events, analytics
● Cost - understanding investments, usage and costs
Customers Implementing Unified Management Control Plane

Unified management solutions across on-prem and public cloud resources are already
implemented or planned by more than half (64%) of organizations worldwide in order to solve
this complexity and provide consistency

To what extent is your organization implementing or planning to implement a unified management solution to
standardize infrastructure operations across on-premises resources and public cloud services?

N=876 Weighted by Country IT Spend


Source: Future of Digital Infrastructure Worldwide Sentiment Survey, IDC June 2023
Thinking of Hybrid Cloud Management as a System

Shared Workflows Across Services

Management Services (horizontal capabilities): Platform Lifecycle Service, Automation Service, Security Service (governance, risk & compliance), Application Lifecycle Service, Observability Service, Cost Service, Edge Device Mgmt., Host Mgmt., Cluster Mgmt.

Fabric / Control Plane:

IAM
Common identity

Inventory / Data
Common inventory of resources

AI / Analytics
AIOps/Insights/analytics/ML

Policy
Hybrid Cloud Platform Management - future state

Management Capabilities: Lifecycle, Inventory, Configuration, Security (governance, risk, compliance), Automation, Continuity, Observability, Cost, Integration / Ecosystem

Orchestrated and Consistent Management Topology with a Unified Management Fabric

Results, Outcomes:
● An ability to manage Hybrid Cloud & Fleet vs list(s) of platforms
● Reduce costs (SRE toil) through automated lifecycle operations, reduce security exposure and risk
● Policy based management, abstracting operational complexity
● Improving productivity, proactive remediation by providing automation and analytics (EDA, AIOps)
● Enhanced risk and compliance management [via causality analysis with change event controls]
Advanced Cluster
Management
Red Hat
Red Hat Advanced Cluster Management for Kubernetes

11
Red Hat Advanced Cluster Management
for Kubernetes

Multicluster lifecycle management

Policy driven governance, risk, and compliance

Advanced application lifecycle management

Multicluster Observability and Search for health and optimization

Multicluster networking for interconnecting apps

12
Unified Multi Cluster Management
Single Management for all your Kubernetes Clusters

• Centrally create, update and delete Kubernetes clusters across multiple private and public clouds

• Configure Cluster Pools for simplified OCP cluster management

• Search, find and modify any Kubernetes resource across the entire domain

• Quickly troubleshoot and resolve issues across your federated domain

13
Policy based Governance, Risk, and Compliance
Don’t wait for your security team to tap you on the shoulder

• Centrally set & enforce policies for security, applications, & infrastructure

• Quickly visualize detailed auditing on configuration of apps and clusters

• Perform remediation actions by leveraging Ansible Automation Platform integration

• Built-in compliance policies and audit checks, including GitOps integration

• Immediate visibility into your compliance posture based on your defined standards

14
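Policy-based governance boils down to evaluating declarative rules against resource configuration and reporting violations (in RHACM the policies are themselves Kubernetes resources; the simplified rule format below is purely illustrative):

```python
def check_compliance(resource, policies):
    """Evaluate each policy (a required value at a key path) against a resource dict."""
    violations = []
    for policy in policies:
        value = resource
        for key in policy["path"]:
            value = value.get(key) if isinstance(value, dict) else None
        if value != policy["expected"]:
            violations.append(policy["name"])
    return violations

# Hypothetical pod spec and a single security policy.
pod = {"spec": {"securityContext": {"runAsNonRoot": False}}}
policies = [
    {"name": "require-non-root",
     "path": ["spec", "securityContext", "runAsNonRoot"],
     "expected": True},
]
print(check_compliance(pod, policies))  # ['require-non-root']
```

Running such checks continuously across every managed cluster, and optionally auto-remediating, is what turns a written standard into an enforced one.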
Advanced Application Lifecycle Management
Simplify your Application Lifecycle

• Deploy applications from multiple sources (Git/Helm/Object Storage)

• Integrate with OpenShift GitOps (Argo CD)

• Automatically detect and visualize Argo CD Applications in RHACM

• Quickly visualize application relationships across clusters and those that span clusters

15
Multicluster Observability
Overview

● Global Query view with Grafana for OCP Clusters
  ○ Out-of-the-box multi-cluster health monitoring dashboards

● Centralize alerts and notifications on the RHACM Hub; forward to 3rd-party systems (PagerDuty / Slack)

● Long Term Data Retention
  ○ Observe metric trends
  ○ Set alert patterns
  ○ Supported object storage
    ■ AWS S3 (and compatible)
    ■ Ceph for on-premise
    ■ Google Cloud Storage
    ■ Azure Storage

16
Ansible

Automation across the hybrid cloud ecosystem

[Diagram: Red Hat Ansible Automation Platform at the center of automation domains including infrastructure, edge, cloud native, network (routers, switches, IPAM, load balancers & SDN), security (PAM, SIEM/SOAR, ITSM, IDS/IPS, firewalls), private cloud and public cloud]

Ecosystem: 140+ certified Content Collections, 55+ certified technology partners, 100+ systems integrators and resellers, 1000+ active open source contributors
Consolidation and Integration
Red Hat Ansible Automation Platform

Private automation hub: collaborate, sign, and publish.

[Diagram: the Private Automation Hub sits between content you build (Developer IDE, custom enterprise content) and content you trust (Automation Hub on console.redhat.com, Ansible Galaxy), where teams collaborate, sign, and publish]
19
Execution layers of Event-Driven Automation
Event-Driven Ansible

[Diagram: an event source feeds an optional message service layer and a routing layer, which drive the automation layer; the flow is Sources → Rules → Actions, where actions are Ansible Playbooks or Ansible Modules (e.g. a SNOW ticket update or an API call); sources include Datadog and others; the platform adds metrics-driven value and a management interface]
20
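The Sources → Rules → Actions flow can be sketched as a small rulebook loop: each incoming event is checked against every rule's condition, and matches fire the rule's action (in Event-Driven Ansible the action would launch a playbook or module; the event fields, rule names and actions here are all illustrative):

```python
def run_rulebook(events, rules):
    """Apply each rule's condition to each event; collect the actions fired."""
    fired = []
    for event in events:
        for rule in rules:
            if rule["condition"](event):
                fired.append((rule["action"], event["host"]))
    return fired

# Hypothetical rules reacting to monitoring alerts.
rules = [
    {"condition": lambda e: e.get("alert") == "disk_full",
     "action": "run_cleanup_playbook"},
    {"condition": lambda e: e.get("alert") == "service_down",
     "action": "restart_service_playbook"},
]
events = [{"host": "web01", "alert": "disk_full"},
          {"host": "db01", "alert": "service_down"}]
print(run_rulebook(events, rules))
# [('run_cleanup_playbook', 'web01'), ('restart_service_playbook', 'db01')]
```

Keeping conditions declarative and actions idempotent is what makes this safe to run unattended, which is the whole point of event-driven remediation.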
The Ansible Lightspeed experience
Enhancing Playbook creation

1. Ansible Lightspeed with IBM Watson Code Assistant is accessible via a VSCode extension
2. Type a task directly into the VSCode editor; Ansible Lightspeed takes over
3. Ansible Lightspeed makes a code recommendation for the developer to consider
4. User has the option to Accept, Ignore, or Modify the recommended code snippet
5. If accepted, the playbook is automatically populated and the user can move on to the next task
6. User is prompted to provide feedback; this is important for helping to train the model
Thank You

22
Cloud Computing
Practical Topics – Platforms, Applications
and Best Practices

Stuttgart University
WS 2023/24
07-11-2023

Dr. Kristof Kloeckner, GM and CTO, IBM GTS (retired), kristof.kloeckner@iaas.uni-stuttgart.de
Gerd Breiter, DE, IBM (retired), gbreiter58@gmail.com
1
Introduction

2
About Me – Kristof Kloeckner
• https://www.linkedin.com/in/kristofkloeckner/
• With IBM from 1984 to 2017
• Advising clients on technology and strategy in hybrid
cloud contexts
• First CTO of IBM WebSphere Application Server
Platform
• First CTO of IBM Cloud
• Responsible for DevOps Tools (Rational Software)
• Responsible for AI and Automation in Technology
Services
• Honorarprofessor, University of Stuttgart, since 1997
3
About Me – Gerd Breiter
• https://www.linkedin.com/in/gerd-breiter-82467b2/
• With IBM from 1982 to 2018
• Working on Intelligent Infrastructure / On Demand
Computing / Cloud from 2000 - 2014
• Started and led work on TOSCA in close cooperation
with Frank Leymann and Stuttgart University
• Co-Lead for Definition of IBM Cloud Computing
Reference Architecture
• Chief Architect IBM Operations Management from
2014 – 2018
4
Other Contributors
• Rhonda Childress, Kyndryl Fellow
• Dave Lindquist, IBM Fellow
• Wolfram Richter, Manager of Chief Architects Germany, Red
Hat
• Simon Moser, DE, Lead Architect Serverless & Cloud, IBM
• Kristian Stewart, former DE, IBM
• Isabell Sippli, DE, IBM
• Amardeep Kalsi, STSM, IBM
• Albrecht Stäbler, CEO dibuco & Team

5
Logistics
• Weekly Webex session on Thursday at 15:45 Central European
Time
– https://unistuttgart.webex.com/meet/kristof.kloeckner
• Charts and Recordings will be posted on ILIAS
• External participants can access the course material through the
Technology Partnership Lab (TPL)
– https://tpl.informatik.uni-stuttgart.de/2023/10/18/practical-cloud-topics-
platforms-services-and-best-practices-ws2023-24/
• Office hour on request
• Please fill out the survey of prior knowledge on ILIAS!
• Ungraded exercises supported by an AWS Academy Learner Lab
• Exercise discussion every Tuesday at 17:30 (same Webex)
• Credits through completion of a case study and a short test/quiz
6
Basic Tenets of this Course
• Digital disruption makes cloud adoption a necessity for enterprises
• It is no longer a question of whether cloud should be adopted, but
– how,
– for what applications and data,
– and with which controls
• Cloud is as much about business models as it is about technology
• There is not ONE cloud - hybrid (multi-)clouds are central to enterprise requirements
– Containers are increasingly used as key enablers of hybrid clouds
• For cloud providers, winning developers is key. This drives the growth of platforms.
– Most platforms support multiple run times, programming and service models
– Cloud native models are growing in importance (microservices, serverless)
– Platforms are moving up the value stack, specialized platforms are emerging, e.g. for
Internet of Things, Machine Learning, industries
• Cloud enables and requires changes in development and operational models
• To understand cloud, it is important to understand how large distributed systems work,
and to understand service oriented architectures
7
Agenda
• Overview, Market and Technology Trends
• Cloud Native Architectures and Best Practices Frameworks
• Cloud Security
• Containers and Container Orchestration
• Serverless Computing and Serverless Stacks
• Commercial Clouds: AWS, Google, Azure
• Multi-Cloud and Hybrid Cloud Management
• Transforming Applications for the Cloud
• DevOps and Site Reliability Engineering
• AI in the Cloud
• Big Data in the Cloud
• IoT and Edge Computing
8
Agenda Details 1
• 10/19 Intro and Cloud Basics (Kristof Kloeckner)
– Core Concepts (Recap)
– A walk through a commercial cloud (AWS)
• 10/26 Overview of Cloud Native Architectures and Best Practices
Frameworks (Kristof Kloeckner)
– Some basic principles: Loose coupling, eventual consistency
– Architecture Styles
– AWS, MS & Google Architecture Guidance
– Well-architected Framework
• 11/2 Best Practices Frameworks Part 2 (Kristof Kloeckner)
• 11/9 Hybrid Cloud Security (Rhonda Childress, Kyndryl)
• 11/16 Containers & Container Orchestration (Wolfram Richter, Red Hat)
– Kubernetes, OpenShift, Operator Concept, Service Mesh, Security, Use cases)
• 11/23 Serverless 1 – IBM Code Engine (Simon Moser, IBM)
– Potentially OpenShift Serverless
• 11/30 Serverless 2 – AWS Lambda and Serverless Stack (Kristof Kloeckner)
– Examples of modern serverless applications
9
Agenda Details 2
• 12/7 Multi-Cloud and Commercial Clouds (KK)
• 12/14 Hybrid Cloud Management (Dave Lindquist)
• 12/21 Event Management (Kristian Stewart)
• No lecture week of 12/26
• No lecture week of 1/2
• 1/11 Transforming Applications for the Cloud (Isabell Sippli, IBM)
– Migration, Replatforming, Refactoring
– (KK) Amazon 6R and other portfolio transformation approaches
• 1/18 DevOps and Site Reliability Engineering (Amardeep Kalsi, IBM)
• 1/25 Big Data in the Cloud (Albrecht Staebler and team, dibuco) tbd
• 2/1 AI in the Cloud (KK)
• 2/8 IoT & Edge (Gerd Breiter) & Wrap-Up (KK)
– Home Automation Use Case

10
Agenda Details 3

Cloud Ecosystem Day


One day in late February/early March with senior technologists about cloud-related topics from the automotive and mobility industry. Speakers to be announced in early 2024

11
References
• The Developer’s Guide to Microsoft Azure, 2nd Edition 2022
• Brendan Burns, Designing Distributed Systems, O’Reilly 2018
• David Clinton. Learn Amazon Web Services in a Month of Lunches. Manning
Publications, August 2017
• Ian Foulds, Learn Azure in a Month of Lunches, Manning Publications 2018
• Scott Galloway, The Four. The Hidden DNA of Amazon, Apple, Facebook and
Google, Penguin 2017
• Cloud Application Architecture Guide, Microsoft 2022
• Cornelia Davis, Cloud Native Patterns, Manning 2019
• Implementing Microservices on AWS, Amazon, November 2021
• Fehling, Leymann et al., Cloud Computing Patterns, Springer 2014
• Andrew Tanenbaum, Maarten van Steen, Distributed Systems, Prentice Hall
2002
• Mary and Tom Poppendieck, Lean Software Development, An Agile Toolkit,
Addison Wesley, 2003
• Betsy Beyer et al., Site Reliability Engineering. How Google runs production
systems, O’Reilly 2016
12
References (AWS)
• David Clinton. Learn Amazon Web Services in a Month of Lunches.
Manning Publications, August 2017
https://www.manning.com/books/learn-amazon-web-services-in-a-month-of-
lunches
• AWS Tutorials
https://aws.amazon.com/getting-started/tutorials/
• AWS Developer Guides
• AWS Well Architected Framework
• AWS Getting Started Guided Projects, e.g.
https://aws.amazon.com/getting-started/use-cases/websites/?csl_l2b_ws
• https://www.slideshare.net/AmazonWebServices/awsome-day-2019-
detroit
• AWS Samples on GitHub https://github.com/aws-samples
• AWS SDKs https://aws.amazon.com/tools/#sdk
• Blogs like https://aws.amazon.com/blogs/aws/
• Builder Hub https://devops.com/builder-community-hub
13
References (AWS Certification)
• Amazon Web Services Overview, July 2019
– https://d0.awsstatic.com/whitepapers/aws-overview.pdf
• Architecting for the Cloud. AWS Best Practices, October 2018
– https://d1.awsstatic.com/whitepapers/AWS_Cloud_Best_Practices.p
df
• How AWS pricing works, June 2018
– https://d0.awsstatic.com/whitepapers/aws_pricing_overview.pdf
• Total Cost of (Non) Ownership of Web Applications in the
Cloud, August 2012
– https://media.amazonwebservices.com/AWS_TCO_Web_Application
s.pdf
• Compare AWS Support Plans
– https://aws.amazon.com/premiumsupport/plans/
14
Overview and Cloud Principles

15
We live in times of digital disruption

• Increasingly, services are delivered digitally


• Clients are expecting instant, personalized fulfillment (self-service)
• New business models have emerged, delivering services
without owning physical assets and often establishing digital
platforms and achieving flywheel effects
• Technology is critical for business success, but becoming more
complex, harder to manage and more vulnerable
• Business cycles are accelerating
• Data is becoming the ‘new oil’ and AI the ‘new electricity’
• Interconnection is reshaping how people, organizations and
things interact and do business
16
Digital disruption is transforming enterprises and industries,
changing business and service consumption models

Business Model Innovation: New businesses are composed by leveraging digital services from a broad ecosystem

Engagement Applications: New apps are consolidating decision-making capabilities at the fingertips of people who need to act

Highly Efficient Infrastructure: Business imperatives increase demand on IT resources and drive a focus on maximum efficiency and agility

17
Each of these business needs is distinguished by a set of attributes

Business Model Innovation
§ Enable new business models (at scale)
§ Leverage network effects
§ Achieve Internet scale (globalization)
§ Increase speed of innovation
§ Shorten time to market
Client example: 64% of companies believe they need to establish digital business models (IDC 2020)

Engagement
§ Data personalization
§ Time sensitivity / context sensitivity
§ Consumability of services
§ Improve customer service
§ Actionable insights
Client example: Sensor and social data integrate with backend systems to personalize services

Efficiency
§ Increase operational efficiency
§ Shift Capital Expense to Operational Expense
§ Consume services rather than own assets
§ Improve IT agility
Client example: 82% of organizations see an as-a-service model as critical for business success (Deloitte 2021)

18
Specific technology adoption patterns align with each of the three classes of business needs

Business Model Innovation
§ Integrate data from devices to improve business processes
§ Infuse analytics and data services into business processes
§ Adopt commercial SaaS systems for non-differentiated workloads
§ Connect to partners
§ Compose business services using APIs

Engagement
§ Employ a “mobile first” approach to new development
§ Integrate Social Media feeds
§ Leverage analytics in new systems
§ Integrate SaaS marketing analytics with transactional systems

Efficiency
§ CIO becomes “service broker” to maintain visibility and control over IT use and costs
§ Replace existing enterprise assets with SaaS applications
§ Move to DevOps model for seamless delivery
§ Shift to hybrid cloud
§ Ensure workload portability
§ Employ a Common Security Model
19
Systems of Engagement and Internet of Things combine with Systems of Record to enable new types of services

• Systems of Record (consistency): Data & Transactions, App Infrastructure, Virtualized Resources; back-office CRM and ERP systems
• Systems of Engagement (speed, convenience): Mobile, Social Networking, Big Data and Analytics
• Systems of Discovery (insight, signal from noise): Analytics and Machine Learning
• Internet of Things (scale, volume): Sensors, Embedded intelligence, Connected devices
• Next Generation Architectures provide workflows across these systems
20
These new services require new infrastructure, architectures, processes and tools

• Hybrid clouds and containers link Systems of Record (Data & Transactions, App Infrastructure, Virtualized Resources; CRM, ERP) with Systems of Engagement (Mobile, Social Networking, Big Data and Analytics)
• Cloud native, serverless and APIs form the next generation architectures connecting them
• Systems of Discovery (Analytics and Machine Learning) extract insight and signal from noise
• The Internet of Things (Sensors, Embedded intelligence, Connected devices) brings edge topologies
• New processes and new tools are required throughout
21
The challenges are not new:
In the past, industries have responded to similar challenges through virtualization, automation and standardization

• Telcos automate traffic through switches to assure service and lower cost.
• Manufacturers use robotics to improve quality and lower cost.
• Banks use automated teller machines to improve service and lower cost.

… breakthroughs like these are enabled by service management systems.
22
The Essentials of Cloud Computing
(IBM Cloud Team, ca. 2010)

“Cloud” is a service consumption and delivery model inspired by consumer Internet services.

Enabled by Virtualization, (Service) Automation, Standardization

“Cloud” enables:
§ Self-service
§ Sourcing options
§ Economies-of-scale

“Cloud” represents the Industrialization of Delivery for IT supported Services

Multiple Types of Clouds will continue to co-exist:
§ Private, Public and Hybrid
§ Cloud Native Programming Models
§ Industry or Domain Specific Platforms
A Definition of Cloud Computing
National Institute of Standards and Technology, 2011

Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model promotes availability and is composed of five essential characteristics, three service models, and four deployment models.

http://csrc.nist.gov/groups/SNS/cloud-computing/

This definition is old, but still useful.
24
A Definition of Cloud Computing
NIST 2011
Essential Characteristics:

• On-demand self-service
– A consumer can unilaterally provision computing capabilities, such as server time and network storage, as
needed automatically without requiring human interaction with each service’s provider.
• Broad network access.
– Capabilities are available over the network and accessed through standard mechanisms that promote use by
heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
• Resource pooling.
– The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model,
with different physical and virtual resources dynamically assigned and reassigned according to consumer
demand. There is a sense of location independence in that the customer generally has no control or
knowledge over the exact location of the provided resources but may be able to specify location at a higher
level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing,
memory, network bandwidth, and virtual machines.
• Rapid elasticity.
– Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and
rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear
to be unlimited and can be purchased in any quantity at any time.
• Measured Service.
– Cloud systems automatically control and optimize resource use by leveraging a metering capability at some
level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user
accounts). Resource usage can be monitored, controlled, and reported providing transparency for both the
provider and consumer of the utilized service.

25
The Concept of Virtualization

Virtual Resources – substitutes for real resources
• Same interfaces and functions as their real counterparts
• Less constrained by physical limitations; may differ in attributes or numbers
• Often part of the underlying resource, but may span multiple resources

Virtualization – a substitution process
• Creates virtual resources from real resources
• Primarily accomplished with software and/or firmware

Real Resources
• Things with architected interfaces/functions
• Often physical. May be centralized or distributed.
• Examples: memory, disk drives, networks, servers

• Virtualization separates the presentation of resources to users from the actual resources
• Virtual resources have the same interfaces and functions as the real resources they replace, but can have different attributes/numbers that make this substitution worthwhile
26
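The substitution idea above can be made concrete with a small sketch. This is illustrative Python only, not any vendor's implementation: two virtual disks share one physical pool, advertise more capacity than physically exists, and consume real blocks only when written (thin provisioning).

```python
# Illustrative sketch of virtualization: virtual disks present the same
# read/write interface as a real disk, but are backed by a shared pool.

class PhysicalPool:
    """A real resource: a fixed amount of backing storage."""
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.used = 0

    def allocate(self, n=1):
        if self.used + n > self.capacity:
            raise RuntimeError("pool exhausted")
        self.used += n

class VirtualDisk:
    """A virtual resource: advertises its own size, allocates lazily."""
    def __init__(self, pool, size_blocks):
        self.pool = pool
        self.size = size_blocks
        self.blocks = {}              # block number -> data

    def write(self, block, data):
        if not 0 <= block < self.size:
            raise IndexError("block out of range")
        if block not in self.blocks:
            self.pool.allocate()      # real capacity is consumed only now
        self.blocks[block] = data

    def read(self, block):
        return self.blocks.get(block, b"\x00")  # unwritten blocks read as zeros

pool = PhysicalPool(capacity_blocks=100)
# Two 80-block virtual disks are overcommitted against 100 real blocks
d1, d2 = VirtualDisk(pool, 80), VirtualDisk(pool, 80)
d1.write(0, b"a")
d2.write(5, b"b")
print(pool.used)   # only the written blocks use real capacity
```

The same substitution pattern underlies virtual machines, virtual networks and virtual storage: identical interface, different (often more favorable) attributes.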
Virtual Machine Monitors
(Hypervisors)

Source: Intel
27
Types of Hypervisors
• Type 1 Hypervisor (bare metal or native)
– Runs directly on hardware
– Examples: z/VM, Hyper-V, Xen, VMware ESXi
– KVM (Kernel Virtual Machine) runs inside Linux and converts the OS into a type 1 hypervisor
– Amazon long used Xen, but has developed a KVM-based hypervisor for next-generation EC2 (Nitro)
• Type 2 Hypervisor (hosted)
– Runs on top of an operating system
– Examples: QEMU, VirtualBox
– Lower performance
28
Server Utilization Patterns Benefiting from Cloud
Scale and Resource Sharing through Virtualization

29
Cloud Pricing Principles
• Modeled after utilities
• Pay as you go – only pay for what you consume
• Reserved capacity
– Significant discounts based on upfront commitment
– Good for baseline capacity
• Spot instances
– Bid on available excess capacity
– Good for workloads like HPC or Machine Learning Training
• Some consumption-based pricing is tiered – you pay less
when you consume more
• Free tier for new customers, plus free tiers in some services to
encourage trials
• Free data transfer into the cloud
30
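A toy calculation shows how pay-as-you-go and reserved capacity interact. All rates below are invented for illustration, not actual AWS prices:

```python
# Toy cost model for the pricing principles above (made-up rates):
# pay-as-you-go vs. reserved capacity with an upfront commitment.

ON_DEMAND_PER_HOUR = 0.10   # pay only for what you consume
RESERVED_PER_HOUR  = 0.06   # discounted hourly rate...
RESERVED_UPFRONT   = 175.0  # ...in exchange for an upfront commitment

def on_demand_cost(hours):
    return ON_DEMAND_PER_HOUR * hours

def reserved_cost(hours):
    return RESERVED_UPFRONT + RESERVED_PER_HOUR * hours

# Reserved capacity wins for baseline load that runs most of the time:
hours = 8760  # one year, 24x7
print(round(on_demand_cost(hours), 2))
print(round(reserved_cost(hours), 2))

# Break-even utilization: upfront / (on-demand rate - reserved rate)
break_even = RESERVED_UPFRONT / (ON_DEMAND_PER_HOUR - RESERVED_PER_HOUR)
print(round(break_even))  # hours per year above which reserving pays off
```

This is why the slide recommends reserved capacity for baseline load and on-demand (or spot) for variable or interruptible load.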
A Definition of Cloud Computing
NIST 2011
Service Models:
• Cloud Software as a Service (SaaS)
– The capability provided to the consumer is to use the provider’s applications running on a cloud
infrastructure. The applications are accessible from various client devices through a thin client interface such
as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud
infrastructure including network, servers, operating systems, storage, or even individual application
capabilities, with the possible exception of limited user-specific application configuration settings.

• Cloud Platform as a Service (PaaS)


– The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or
acquired applications created using programming languages and tools supported by the provider. The
consumer does not manage or control the underlying cloud infrastructure including network, servers,
operating systems, or storage, but has control over the deployed applications and possibly application
hosting environment configurations.

• Cloud Infrastructure as a Service (IaaS)


– The capability provided to the consumer is to provision processing, storage, networks, and other
fundamental computing resources where the consumer is able to deploy and run arbitrary software, which
can include operating systems and applications. The consumer does not manage or control the underlying
cloud infrastructure but has control over operating systems, storage, deployed applications, and possibly
limited control of select networking components (e.g., host firewalls).
32
33
XaaS

34
A Definition of Cloud Computing
NIST 2011
Deployment Models:

• Private cloud
– The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a
third party and may exist on premise or off premise.
• Community cloud
– The cloud infrastructure is shared by several organizations and supports a specific community that has
shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be
managed by the organizations or a third party and may exist on premise or off premise.
• Public cloud
– The cloud infrastructure is made available to the general public or a large industry group and is owned by
an organization selling cloud services.
• Hybrid cloud.
– The cloud infrastructure is a composition of two or more clouds (private, community, or public) that
remain unique entities but are bound together by standardized or proprietary technology that enables
data and application portability (e.g., cloud bursting for load-balancing between clouds).

Note: Cloud software takes full advantage of the cloud paradigm by being service oriented
with a focus on statelessness, low coupling, modularity, and semantic interoperability.
35
Expected benefits of cloud are increasingly shifting
from cost savings to increased business agility
• For public cloud, no capital expense (capex), no upfront commitment
• More effective sharing of resources
– No need to own resources for peak consumption – elastic pool of resources
– Variety of payment options, including ’pay by the drink’
• Reduced operational expense (opex) and risk through standardization and
automation
– Standardize first, then automate
• Faster and more convenient access to resources
– Instant access through self-service and automated provisioning
– No need to build up physical capacity
– Scale globally through cloud provider infrastructure
• Access to an ecosystem of services (platform effect)
– Technology providers are adopting a ‘cloud first’ approach
• Cloud is serving as a ‘crucible’ to try out and integrate new technologies
36
Prevalent Cloud Use Cases
• Application Development Lifecycle
– Build or train on cloud, deploy everywhere
• New applications (cost, agility) – Cloud First
• Hybrid pattern – supplementing existing workloads
– Analytics, Disaster Recovery, Data Warehousing
• Hybrid pattern - connecting systems of engagement
and systems of records
• Migration/consolidation of existing applications
• Data Center migration
• All-in: IT in the Cloud (born on the cloud, move to
cloud) 37
In the Cloud, it’s all about services
• Everything is a service, everything is defined by software
• Software-defined everything, infrastructure as code
• Microservices and Service Meshes
• Bringing service builders and service consumers together is key
• Marketplaces and catalogs
• Consumption models (billing, licensing)
• Brokers
• Exposing and monetizing service interfaces (API ‘economy’)
• Orchestrating services
• What’s the right granularity
• Separation of concerns
• What’s the lifecycle of a service
• Sharing services
• Service abstractions and virtualization
• Encapsulation (containers)
38
Rapid cloud growth continues, driven by digital transformation
IDC July 2023
• Spending on public cloud services and infrastructure grew to $545.8B in 2022, a growth of 22%. Foundational services for ‘digital first’ grew by 28.8%.
• In 1Q23, infrastructure spending on non-cloud dropped 0.9%, shared cloud was up 22.5%, dedicated cloud down 1.5% (IDC)
• In 2Q 2020, spending on public cloud infrastructure exceeded that on traditional infrastructure
• The pandemic has accelerated digital transformation and the shift to cloud
• (Application) Software as a Service is the largest cloud market ($246B, 18% growth), followed by Infrastructure as a Service, Platform as a Service (32% growth), and SaaS Infrastructure as a Service
• ‘This highlights the increasing reliance of enterprises on a cloud innovation platform built around widely deployed compute services, data/AI services, and app framework services to drive innovation.’

Public Cloud Market Shares (IaaS) (2021, Gartner)
• Amazon (39%) leads before Microsoft (21%), then Google, Alibaba, Huawei
• Microsoft growing faster than Amazon
39
• Requirements are shifting to enterprise quality of service (Gartner)
40
Enterprises are moving to the cloud, but the
transition is messy
• Only 20% of enterprise workloads have moved to the cloud (IBM 2019)
• 83% of the respondents in the 2022 ‘Nutanix Enterprise Cloud Index’
rank hybrid cloud as the ideal IT operating environment
– 91% moved one or more apps to a different IT environment within the last 12 months
– 64% expect to operate in a multi-cloud environment in the next 1 – 3 years (36% do this today)
– Complexity considered biggest problem (87% say simpler management needed)
– Application modernization is the top reason for multiple clouds (53%)
• An average enterprise has 6 different clouds and 1000+ cloud apps (IBM 2019)
• Top multicloud challenges (Nutanix)
– Security 49%
– Data Integration 49%
– Managing cost 43%
• 71% had a recent security breach
• An average enterprise analytics application has 33 data sources

42
A Short Tour of a Commercial Cloud (AWS)
43
Source: Awsome Day 2019 Detroit Slideshare
44
Source: Awsome Day Detroit Slideshare, early 2019
45
Amazon’s focus on ‘builders’ and business
• “The broadest and deepest platform for today’s builders” (W. Vogels, CTO, AWS
Summit NYC July 2019)
– ”We provide a toolbox, you pick”
– Using most functions themselves to run and grow their own digital business
• “Enabling digital transformation” (A. Jassy, then AWS CEO, 2020)
• These messages were reinforced in the following years
– Builder Community Hub 2023
• Agility, DevOps pipeline, operational principles
– ‘Well-architected Framework’
• Toolkits, IDEs, Integration with e.g. Visual Studio
• Microservices, containers, serverless across the stack
– “You write the business logic, we do the heavy (infrastructure) lifting”
• Automation
• Security
• AI/Machine Learning (first separate keynote in 2020)
• Significant and growing investment in IoT
46
Some Basic Amazon (IaaS) Services
• EC2 – Elastic Compute Cloud,
– Consists of virtual machines (called EC2 instances) launched from
Amazon Machine Images (AMIs)
• AMI – Amazon Machine Image
– Templates for building EC2 instances
• VPC - Virtual Private Cloud
– Private network that isolates your resources
• EBS – Elastic Block Storage
– Persistent Storage Volumes that can be attached to an EC2 instance
• S3 – Simple Storage Service
– Object Store Service
• These services plus Relational Database Services are
sometimes called Foundational Services by AWS.
49
AWS Learner Lab

50
Cloud Services in the Learner Lab
• Most AWS services are available, sometimes with capacity
restrictions (see documentation)
• Limitation in using IAM, use the preconfigured role (LabRole)
rather than creating new roles
• You can create your own key pairs, but a digital key (vockey) is
also already available in the preconfigured terminal.
• The Learner Lab comes with $100 credits, so stop or
terminate your resources if you are not using them anymore.
Some resources (like EC2 instances) are automatically
restarted when starting a lab.

51
A Short Tour of AWS Services
• The AWS Console
• ‘The biggest toolbox for builders’ - AWS Services
• Some important examples of AWS services
– Virtual Machines (EC2)
– Object Storage (S3)
– NoSQL Database (DynamoDB)
– Simple Notifications (SNS)
– Serverless Functions (Lambda)
• The Market Place
Other Clouds are similar, we will introduce them later!

52
Hosting a Simple Web Application

[Diagram: a virtual machine running inside a VPC]

Source: David Clinton, Learn Amazon Web Services in a Month of Lunches (Kindle Locations 502-506), Manning Publications, Kindle Edition.
53
Storage and Databases
• Object Storage – S3
– Objects have URLs and are stored in ‘buckets’
– Objects can be versioned
– Objects can be arranged in ‘folders’
– Buckets can host static websites
• Block Storage – EBS
– Virtual disks for virtual machines
• Relational Database Service (RDS)
– Managed service with many options
• NoSQL DB (DynamoDB)
– Managed service
– Key-value store (tables)
• Streams – Kinesis
54
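The S3 bullets above (buckets, keys, ‘folders’, versioning) describe a data model that fits in a few lines. This is an in-memory illustration only, not the real S3 API, which you would reach through an SDK or the CLI:

```python
# Minimal in-memory sketch of object-store semantics: buckets hold
# versioned objects under keys; 'folders' are just a key-naming convention.

class Bucket:
    def __init__(self, name):
        self.name = name
        self.versions = {}                    # key -> list of versions

    def put(self, key, data):
        self.versions.setdefault(key, []).append(data)
        return len(self.versions[key]) - 1    # version id of this write

    def get(self, key, version=None):
        vs = self.versions[key]
        return vs[-1] if version is None else vs[version]

    def list(self, prefix=""):
        # listing by prefix is what makes keys look like folders
        return sorted(k for k in self.versions if k.startswith(prefix))

b = Bucket("my-bucket")
b.put("logs/2023/app.log", b"v1")
b.put("logs/2023/app.log", b"v2")       # new version; the old one is kept
b.put("index.html", b"<html></html>")   # e.g. static website content
print(b.get("logs/2023/app.log"))             # latest version
print(b.get("logs/2023/app.log", version=0))  # older version still readable
print(b.list(prefix="logs/"))
```

Versioning here is append-only, which mirrors why enabling versioning on a real bucket protects against accidental overwrites.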
Cloud Networking
• A VPC can span availability zones
• A subnet is restricted to a single availability zone
• Some services require 2 subnets in different zones for enhanced availability
• A route table determines where traffic is routed
• Internet gateways and virtual private gateways connect to other networks
• Security groups control traffic (firewall rules)

Source: AWS Documentation

See also:
https://aws.amazon.com/vpc/?vpc-blogs.sort-by=item.additionalFields.createdDate&vpc-blogs.sort-order=desc
55
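The VPC/subnet relationship can be illustrated with Python's standard `ipaddress` module. The CIDR blocks and zone assignments below are hypothetical:

```python
# One VPC-sized CIDR block carved into per-availability-zone subnets.
# Services needing high availability would use subnets in two zones.

import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")     # the VPC's address range
subnets = list(vpc.subnets(new_prefix=24))    # carve it into /24 subnets

subnet_az_a = subnets[0]   # e.g. placed in availability zone a
subnet_az_b = subnets[1]   # e.g. placed in availability zone b

print(subnet_az_a)                  # 10.0.0.0/24
print(subnet_az_b)                  # 10.0.1.0/24
print(subnet_az_a.num_addresses)    # 256 addresses per /24

# Route-table decisions boil down to membership tests like this one:
print(ipaddress.ip_address("10.0.1.17") in subnet_az_b)  # True
```

The same longest-prefix-match logic drives real route tables: a packet's destination is matched against these network objects to pick the next hop.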
Availability, Scaling, Security

Source: David Clinton, Learn AWS… 56
Integration Services
• Simple Notification Service – SNS
– Topic-based publish and subscribe service
– Many services are able to publish or subscribe to a topic
• Simple Queuing Service – SQS
– At-least-once delivery
• Step Functions
– Basic workflow
• Lambda Functions
– Trigger-based event processing
– Serverless computing (we will talk more about this)
– More than 140 integrations
57
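The difference between SNS-style topics (fan-out to every subscriber) and SQS-style queues (each message consumed once) can be simulated in plain Python. This is a conceptual sketch, not the AWS API:

```python
# Tiny simulation of the two messaging patterns named above.

from collections import deque

class Topic:
    """SNS-style publish/subscribe: every subscriber sees every message."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self, callback):
        self.subscribers.append(callback)
    def publish(self, message):
        for cb in self.subscribers:
            cb(message)

class Queue:
    """SQS-style point-to-point: each message is delivered to one consumer."""
    def __init__(self):
        self.messages = deque()
    def send(self, message):
        self.messages.append(message)
    def receive(self):
        return self.messages.popleft() if self.messages else None

orders = Topic()
audit_log, work_queue = [], Queue()
orders.subscribe(audit_log.append)   # one subscriber records everything
orders.subscribe(work_queue.send)    # another feeds a worker queue

orders.publish({"order": 1})
orders.publish({"order": 2})
print(audit_log)                     # both messages, fanned out
print(work_queue.receive())          # first message, consumed exactly once
```

Chaining a topic into a queue like this mirrors the common SNS-to-SQS fan-out pattern, where each downstream consumer group gets its own queue.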
A Modern Application in AWS

[Diagram: events trigger longer-running processes; DynamoDB serves dynamic content, S3 serves static content]

Source: Implementing Microservices on AWS
58
Exercise
• Activate the AWS Academy Learner Lab
• Familiarize yourself with the console
• Launch an EC2 image, start, stop and terminate it
• Build a simple web application
• Steps are explained in the ‘Practice’ charts

59
Other Material

60
Cloud Computing
Multi-Cloud, Hybrid Cloud, Commercial Clouds

Stuttgart University
WS 2023/24
11-30-2023

Dr. Kristof Kloeckner – GM and CTO, IBM GTS (retired) – kristof.kloeckner@iaas.uni-stuttgart.de
Gerd Breiter – DE, IBM (retired) – gbreiter58@gmail.com
1
Multi-Cloud

2
A Modern Cloud Stack

• Platform Services: Frameworks & App Services, Management, Integration, Blueprints, Security
– Supporting Container Apps, Fabric-Based Apps, Event-Driven Apps
• Infrastructure Services: Virtual Servers, Storage, Network
• BSS & Client Support (BSS – Business Support Systems)
• Physical Infrastructure: Public & Dedicated Data Centers, Local Appliance, Customer’s HW
4
Modern Cloud Stack: Domain Services

• Domain Services: Mobile, Data/Analytics, AI/ML, Internet of Things, Video
• Platform Services: Frameworks & App Services, Management, Integration, Blueprints, Security
– Supporting Container Apps, Fabric-Based Apps, Event-Driven Apps
• Infrastructure Services: Virtual Servers, Storage, Network
• BSS & Support
• Physical Infrastructure: Public & Dedicated Data Centers, Local Appliance, Customer’s HW
5
Runtimes & Platform Services (Representative Choices)

• DevOps Tools (GitHub, Open Toolchain)
• Logging (Logmet – ELK Stack); Analytics/Monitoring (OpenTelemetry)
• Image Management (Docker Registry); Language Runtimes (Python/Node/Java/.Net); Service Mesh (Istio); Control Plane
• Messaging (Kafka); 12-Factor, Well-Architected Apps; Event-Driven Apps (Knative)
• Container Orchestration (Kubernetes); Container Security; Networking
• Automation (Terraform, Ansible)
• Infrastructure
6
Hybrid Clouds and Cloud Integration

7
A Hybrid Cloud Landscape

• Hybrid clouds and containers link Systems of Record (Data & Transactions, Enterprise Legacy, Virtualized Resources; CRM, ERP) with Systems of Engagement (Mobile, Social Networking, Big Data and Analytics)
• Cloud native and serverless form the hybrid architectures connecting them
• Systems of Discovery (Analytics and Machine Learning) extract insight and signal from noise
• The Internet of Things (Sensors, Embedded intelligence, Connected devices) brings edge topologies
• New processes and new tools are required throughout
8
Enterprise Cloud Requirements
from Nutanix Enterprise Cloud Index 2023

● 99% moved one or more applications to a different IT infrastructure in the past 12 months
○ 86% say it’s costly
○ 94% say they could benefit from a single unified management
environment
● 85% view cloud cost management as a top challenge
○ 46% are ‘repatriating’ some workloads to data centers
● 97% have begun to use Kubernetes
● 38% see hybrid multicloud as the dominant environment in 1 –
3 years, 19% hybrid cloud
● 93% expect edge computing to become more important
● Significant need for new talent in the next 24 months
9
Private and Hybrid Cloud Support
• AWS Outposts
– Managed subset of AWS on premise
– https://aws.amazon.com/outposts/features/
• Azure Stack
– Azure Stack Edge (Cloud Managed Appliance)
– Azure Stack HCI (Hyperconverged Infrastructure)
– Azure Stack Hub (Cloud Native Integrated System)
• Azure Arc Control Plane for Hybrid Clouds
• Google Anthos
• IBM Cloud Private, Cloud Satellite & IBM Cloud Paks
10
Major Cloud Platforms
• Microsoft Azure
• Google
• IBM Cloud

11
A very short history of Microsoft Azure
• Initial virtualization technology in 1997, completely rebuilt as Hyper-V in 2008
• Very large web properties since the 90s – Hotmail, Bing
• Azure was launched in 2008 (LA) and initially focused exclusively on PaaS
– Branded as ‘Windows’ to give the connotation of ‘OS for the Cloud’
– “Scared” their developer base (radical change), too far ahead of its time?
– The market was much more comfortable with Amazon’s IaaS focus
• MS added a stateless “VM role” to Azure as a stop-gap
– It is now deprecated
• Major shift in 2012:
– Added full IaaS role support to Azure
– Shifted definition of “Azure” to mean “Microsoft’s Public Cloud”
– PaaS platform naming shifted to “Azure Cloud Services”
• Renamed from Windows Azure to Microsoft Azure; All In on Cloud!
• Now strong support for open source, Linux, Cloud Native standards
– #2 by market share
Azure Physical Infrastructure

Source: Learn Azure in a Month of Lunches..
13
Options for Azure Application Hosting
• Microservices deployment options for code or containers
• Comparable to Elastic Beanstalk
• Strong support for Kubernetes

https://learn.microsoft.com/en-us/azure/guides/developer/azure-developer-guide
Microsoft Azure Home Page

15
Microsoft Azure Quickstart Center

16
Visual Studio and Azure
https://learn.microsoft.com/en-us/azure/azure-functions/create-first-function-vs-code-
python?pivots=python-mode-configuration

17
Azure Platform

• Azure App Service – single runtime supporting
– Web Apps
– Mobile Apps
– API Apps

• Azure Virtual Machines

• Azure Functions (serverless)
– Triggered by HTTP requests, webhooks, web service events, or schedules
– Choice of programming languages

• Azure Service Fabric
– ‘Distributed systems platform that makes it easy to build, package, deploy, and manage scalable and reliable microservices’
Azure App Service

Source: Microsoft. Comparable to AWS Elastic Beanstalk.
19
Function Apps (code or container)
Source: Learn Azure in a Month of Lunches..

Logic apps implement serverless workflows.
20
Docker and Azure

Source: Learn Azure in a Month of Lunches..
22
Kubernetes and Azure

Source: Learn Azure in a Month of Lunches..
23
AKS Resources

Source: Microsoft
24
Developing and Deploying Workloads on AKS

Source: Microsoft
25
Service Bus Queues

Source: Microsoft
26
Azure Events
• Event Hub = streaming service
• Event Grid = discrete event distribution

Source: Microsoft
27
Azure Services
• Microsoft Azure compute & simple/scalable storage
• Azure SQL Database (fka SQL Azure)
– SQL Server as a Service
• Azure Cosmos – multi-model, distributed database service
• Azure DocumentDB – NoSQL database
• AppFabric (Cloud-based services)
– Access Control Service (Azure Active Directory)
– Enterprise Service Bus
– Distributed Object Caching
• Traffic Manager
– Global traffic management/routing (performance)
• Azure Connect (“VPN” between cloud and on-premise services)
• Azure Portals
– Web-based Service Lifecycle Management tools
– SQL Database management
– ReSTful APIs also available (non-Browser-based tools)
• Azure Media Services
• Azure Content Delivery Network (CDN)
• Various Migration Tools
Azure Services….
• Docker Services
– Azure Docker VM Extension turns VMs into Docker hosts
– Azure Container Service & Azure Kubernetes Service
– Docker Machine
• Authentication
– Azure Active Directory
– App Service Authentication (Azure ID and social identity providers)
• Monitoring
• DevOps Integration
– Popular open source tools
– Visual Studio
• Tools often use .Net under the covers
• For a full comparison with AWS, see
https://docs.microsoft.com/en-us/azure/architecture/aws-professional/services
Azure Free Services Tier

• Azure Cosmos DB – 400 RU/s
• App Service – 10 apps
• Functions – 1M invocations/month
• Event Grid – 100K Ops/month
• Azure Kubernetes Service (Control Plane, Node Pools)
• DevTest Labs
• Azure DevOps – 5 users
• Load Balancer
• Service Fabric (Containers)
• Active Directory
• And many others…
30
https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/
31
Azure Arc: Control Plane for Hybrid Clouds

32
Azure Arc Structure

Source: Microsoft

33
Azure Arc Starting Page

34
Azure Arc Capabilities
• Inventory, management, governance, and security across a multicloud
environment
• Azure VM extensions to use Azure management services to monitor, secure, and
update servers
• Manage and govern Kubernetes clusters at scale
• Use GitOps to deploy configuration across one or more clusters from Git
repositories.
• Zero-touch compliance and configuration for Kubernetes clusters using Azure
Policy.
• Run Azure data services on any Kubernetes environment as if they run in Azure (specifically Azure SQL Managed Instance and Azure Database for PostgreSQL)
• Create custom locations on top of your Azure Arc-enabled Kubernetes clusters,
using them as target locations for deploying Azure services instances.
• Azure service cluster extensions for Azure Arc-enabled Data Services, App Services
on Azure Arc (including web, function, and logic apps) and Event Grid on
Kubernetes.
• Single pane of glass
35
Source: Microsoft
36
Source: Microsoft
37
Source: Microsoft
38
Hybrid Applications with Azure Arc

39
Azure and Azure Stack

Source: Microsoft
40
Azure Stack Hub
https://learn.microsoft.com/en-us/azure-stack/user/user-overview?view=azs-
2206

• A subset of Azure services that can be deployed on-premise and managed


by the enterprise (or a service provider)
• Integrated hardware developed by MS partners
• Meant to support edge or disconnected solutions and satisfy data
residency requirements
• Enterprise decides which services to offer
• Supports both IaaS and PaaS
– VMs
– App Service supporting web apps and Functions apps
– Kubernetes
– App Fabric
• Supports hybrid deployments connected to Azure Cloud

41
Azure Stack Edge Pro 2
https://learn.microsoft.com/en-us/azure/databox-online/azure-stack-edge-
pro-2-overview

• ‘AI-enabled’ edge computing device ‘offered as a service’


• Targeted towards IoT use cases
– Inference with Azure Machine Learning
– Data preprocessing
– Data transfer to Azure Cloud
• Supports VMs and containerized workloads with Kubernetes
• Can be managed from Azure Portal (pre-configured resource)

42
Major Cloud Platforms
• Microsoft Azure
• Google
• IBM Cloud

44
Google Cloud Platform
• Google’s entry into Cloud is based on their experience with very large data and
initially focused on platform services in this area
• In 2008, they previewed Google App Engine targeted towards web applications,
quickly followed by data-focused services (like NoSQL databases)
• Here is a current description of a web application built with the Google Cloud
Platform

45
Google Cloud Platform Services Highlights
Google advertises

• Transformative Know-How
• World Class Security
• Choice with Hybrid and Multicloud
– Kubernetes originated with Google
• Serverless for Simplicity (Cloud Functions, Cloud Run, Knative open source)
• Innovation with AI, ML and Big Data/Analytics
– Tensorflow and Keras originated with Google
– BigQuery Cloud Data Warehouse
• Managed Open Source Software
• World-wide Network
• Google G-Suite productivity applications

Clearly targeting enterprise clients
46
Anthos Application Management Platform
for Hybrid Environments

47
Major Cloud Platforms
• Microsoft Azure
• Google
• IBM Cloud

48
IBM Cloud
• IBM Public Cloud offers very similar functions to
other commercial clouds
• IBM has a strong focus on private and hybrid clouds
and infrastructure and management spanning clouds
– Support for Cloud Native Foundation
– Acquisition of Red Hat
• Some cloud services from IBM (especially AI/ML) will
be available on other clouds as well
• For introductory projects, see
https://cloud.ibm.com/developer/appservice/starter-kits

49
Linux, Containers & Kubernetes are Foundational for the IBM Platform

• Application Development
– DevSecOps tools & toolchains
– Development & deployment artifacts: functions, code & buildpacks, container images, VM images & patterns
– Consumable services: SaaS APIs, public cloud PaaS services, serverless (OpenWhisk, Knative, Code Engine), private cloud software, traditional application platforms, Cloud Foundry
– Open Service Broker, Operator Framework; automation with Terraform and Ansible
• Management, Operations & Security
– Self-service catalog; hybrid multi-cloud deployment & lifecycle management
– Policy-based placement, monitoring, logging, configuration automation, cost metering
– Integration & API management; best practices and integration content, AI-enabled
– Security & compliance: identity & access, keys & certificates, network, container images
• Infrastructure-independent common operating environment: Kubernetes, Containers, Linux
• Common Services: IAM | Monitoring | Logging | Deployment, Self-Service Catalog | Metering | Key & Certificate Management | Service Mesh
• Infrastructure-as-a-Service: Bare Metal | Compute, Network & Storage Virtualization – VMware, OpenStack | Private Cloud | Public Cloud
50
OPENSHIFT CONTAINER PLATFORM

• Any container
• Application lifecycle management
• Container orchestration and management (Kubernetes)
• Enterprise container host
• Any infrastructure: laptop, datacenter, OpenStack, Amazon Web Services, Microsoft Azure, Google Cloud
51
IBM & Red Hat: Strategic Architecture

• Services: Consulting – Strategy, Migration, Development, Management
• ISV Applications/Solutions
• Advanced Technologies: AI, Analytics, Blockchain, Security, IoT, Quantum
• Cloud Paks: Cloud Pak for Applications, Data, Integration, Automation, Multicloud Management, Security
• Foundation: Open Hybrid Multicloud Platform
• Infrastructure: IBM public cloud, AWS, Microsoft Azure, Google Cloud, Edge, Private; IBM Z, IBM LinuxONE, IBM Power, IBM Storage

© 2019 IBM Corporation 53


IBM Software Delivered on the Red Hat Platform

[Slide diagram] IBM Software for the Hybrid Cloud — containerized & integrated:
• Cloud Pak for Applications, for Data, for Integration, for Automation, for Multicloud Management, for Security — runs anywhere
• Red Hat Hybrid Cloud Platform: open source + IBM software; built-in self-service, automated deployment + day-2 operations; certified & optimized on OpenShift
• Runs on: private cloud, IBM public cloud, AWS, Microsoft Azure, Google Cloud, Edge, on-premises, IBM Power & Z
54


Cloud Pak Added Value

Containers Alone vs. IBM Cloud Paks:
• What you get: client creates containers or receives software as standalone container(s) vs. complete solutions certified for enterprise use cases
• Runs anywhere: Yes vs. Yes
• Vulnerability scanned: Yes vs. Yes
• Red Hat container certification: depends on product vs. Yes
• Complete solution w/ container platform: No vs. Yes
• Flexible & modular (pay for what you use): No vs. Yes
• IBM certified/orchestrated for production (built for Kubernetes by experts; certified against 250+ criteria): No vs. Yes
• Multicloud validation: No vs. Yes
• Integrated deployment experience: No vs. Yes
• Full-stack support by IBM (base OS, software, and container platform): No vs. Yes
• License metering integration: No vs. Yes
• Scalable and resilient: No vs. Yes
• Encrypted secrets / limited privileges: do it yourself vs. Yes
• Management and operations: build your own vs. Yes
• Lifecycle management: manage it yourself vs. Yes
(Containerized software alone brings speed to market; Cloud Paks add enterprise security and lifecycle management on top.)


55
Cloud Transformation
Considerations

56
Transforming an existing mission-critical workload to the cloud is challenging

Example: a financial services application portfolio, graded from "more ready for cloud" through "may be ready for cloud" to "not ready for cloud". NOTE: the above is a representative example only.

CHALLENGES
• Refactoring complex, interconnected applications & data
• Maintaining performance & SLA requirements for applications, data and integrations
• Multi-provider shared responsibility models for security & compliance
• Integration, data management, service assurance & governance across multiple cloud providers
• Rapidly evolving technology choices and concerns of vendor lock-in
• Organizational & cultural changes to adopt DevOps transformations

What are the "best" cloud technology choices and process changes needed?
Transformation to cloud using multiple concurrent approaches
… to minimize risk & cost while leveraging new & existing investments to innovate & differentiate

Applied across the application portfolio (e.g. customer information, payment systems, business process, new applications):
• Lift-Standardize-Consolidate-Automate-Shift: bare metal, VMs, containers, automation (SDDC); base virtualization with standardization & automation
• Contain-Expose-Extend: API creation & management, connectivity & integration
• Refactor/Create as cloud-native/microservices (VMs | containers | aPaaS | iPaaS): loosely coupled, 12-factor, horizontal scaling, eventually consistent, auto-scaling, DevOps & CI, self-recovering
• Evolution to cloud-based application: data classification, movement & governance — cognitive data classification, metadata management, high-volume data transfer, event-driven

On-premises | Off-premises
A pragmatic approach to modernizing applications

• 20% — Rationalize and decommission
• 10% — Replace with an "as a service" (SaaS) solution
• 50% — Modernize and migrate to cloud:
  – 10% Re-host (lift & shift)
  – 10% Re-platform
  – 15% Refactor (automate, containerize)
  – 15% Re-architect (microservices / APIs)
• 20% — Retain on premise (traditional)

… on any hybrid multicloud IT: public cloud (incl. SaaS), dedicated off-premise cloud, private cloud platform on premise, and traditional on-premise for selected apps.
59
A hybrid strategy unleashes the full potential of the cloud — 2.5x more value than a public-only strategy

Sources of value: business acceleration, ADM productivity, infrastructure cost efficiency, regulatory and risk, strategic optionality.

Incremental benefits from the hybrid approach are driven by:
• Up to 50% more apps migrated
• Duplicative tools and processes removed

Additional value beyond recurring benefits includes:
• Up to 25% lower cost on certification for migration with a pre-certified stack
• Reduced cyber security and regulatory risks with a single pane of control
• Avoidance of vendor lock-in
• Jump-start innovation with architectural flexibility

Source: McKinsey study 2019; IBM Cloud Innovation Forum 2019
60
AWS 6Rs – Options for Cloud Migration
• Rehost (Lift and Shift)
– Applications can be moved as is
– E.g. virtualized or containerized, horizontally scalable
– Automate!
– Can be a first step to refactoring
• Replatform (Lift, Tinker and Shift)
– Apply minor modifications (low hanging fruit)
– Replace functions (like a DB) with a managed service (e.g. RDS)
– Requires careful consideration of application lifecycle management
• Repurchase (Drop and Shop)
– Legacy applications no longer satisfying business needs or
incompatible with the cloud
– E.g. replace legacy CRM or ERP with SaaS offering

See also: https://aws.amazon.com/blogs/enterprise-strategy/minimal-viable-refactoring-the-5-best-ways-to-improve-your-app-on-the-way-to-the-cloud/
61
AWS 6Rs – Options for Cloud Migration
(Adapted from Gartner 5Rs, 2010)
https://aws.amazon.com/blogs/enterprise-strategy/6-strategies-for-migrating-applications-to-the-cloud/

• Refactor/Re-Architect
– Most expensive option
– Necessary when application has reached a tipping point
– E.g. unwieldy monolith needs breaking up
• Retire
– Application is redundant
• Retain
– Core application with significant IP
– Performing well in specific enterprise setting (e.g. mainframe high
volume transactional application)
– Can be integrated into hybrid cloud settings

62
AWS 6Rs – Options for Cloud Migration

Source: https://aws.amazon.com/blogs/enterprise-strategy/6-strategies-for-migrating-applications-to-the-cloud/
63
Further Considerations for Migration
(Adapted from Gartner 5Rs, ca. 2010)
• Business considerations for the type of migration may change
over time
• Consider the entire spectrum of hybrid cloud when making
choices
• Strive for workload ‘portability’
– Containerize!
– Use loosely coupled architectures (message and event based)
– Use standard interfaces and open source where possible, avoid lock-in
– Don’t underestimate management needs – observability is key
• Consider the entire lifecycle
• Consider an incremental approach (strangler pattern)

64
Serverless 1: IBM Cloud Code Engine
“Practical Cloud Topics – Platforms, Applications, and Best Practices”

University of Stuttgart, November 23, 2023

Simon Moser | Distinguished Engineer


smoser@de.ibm.com

© 2021 IBM Corporation


Who Am I?

Simon Moser
CTO Cloud Container Services
IBM Research & Development

Email: smoser@de.ibm.com
Linkedin: https://www.linkedin.com/in/simonmoser/
X / Twitter: @mosersd
Agenda

• What is Serverless?

• Introduction to FaaS

• Serverless 2.0 ? Why ?

• IBM Cloud Code Engine

• Serverless vs. other programming models: Guide


What is Serverless ?

• Serverless computing is a cloud computing execution model in which the cloud provider allocates
machine resources on demand, taking care of the servers on behalf of their customers1.

• Around 2016, the term "serverless functions" started to take off in the tech industry and was
presented as the undeniable future of infrastructure2

• "Serverless" is a misnomer in the sense that servers are still used by cloud service providers to
execute code for developers1.

• Developers of serverless applications are not concerned with capacity planning, configuration,
management, maintenance, fault tolerance, or scaling of containers, VMs, or physical servers1.

1 - https://en.wikipedia.org/wiki/Serverless_computing
2 - https://matduggan.com/serverless-functions-post-mortem/
Serverless 101: Traditional model

Worry about scaling:
• When to scale? (memory, CPU, response time etc. for your application)
• How fast can you scale?
Worry about resiliency & cost:
• At least 2 processes for HA
• Keep them running & healthy
• Deployment in multiple regions
Charged even while idling / not 100% utilized
Continuous polling due to missing event programming model
Serverless 101: FaaS model

Scales inherently:
• One process per request
No cost overhead for resiliency:
• No long-running process to be made HA / multi-region
Introduces an event programming model
Charges only for what is used
Only worry about code → higher dev velocity, lower operational costs
Writing Applications (what we thought in 2018)

Monolithic:
• Application deployed as one unit
• Introducing new code is a complete new deployment
Microservices:
• Each service is a functional component of your application
• Decoupling
FaaS:
• Further decoupling — each function does one thing & does it well
• No infrastructure management

• The idea of a serverless function replacing the traditional web framework or API has almost disappeared. Even cloud providers pivoted, positioning functions as more "glue between services" than the services themselves.
What do Functions promise to developers?

• Event-driven workloads
• Scalability
IBM Cloud Functions

Demo time
IBM Cloud Functions

Cold container: docker run → /init → /run (start container, initialize, run)
Pre-warmed container: the container is already started, so only /init and /run remain
39
Performance is king

cold container → pre-warmed container → warm container
(faster →)
Now that you
understood FaaS -
What’s Serverless ?
Serverless Value Proposition

• Targeted scaling: workloads transparently scale with the number of requests being served
• Better resource utilization: only pay for resources being used, instead of resources idling around
• Faster time to market: no management and operation of infrastructure
• Focus on developing value-adding code and on driving innovations

Sounds good!
But… what's wrong?
Perception & Limitations

• Serverless is often equated to Functions-as-a-Service
  – Stateless only
  – Limited runtime options
• Traditional serverless offerings have (intentional?) limitations
  – Fairly low CPU/memory boundaries
  – Execution time limits
  – Enforced workload separations (apps vs. functions vs. batch as separate offerings)
  – No/limited hardware acceleration


What's the problem?
Computation requirements derived from deep learning *

Rethink for Serverless 2.0
• Resource demand of modern apps is outgrowing Moore's law, and Moore's law is coming to an end
• It's hard to implement distributed apps with traditional approaches
• It's hard to implement applications using frameworks & libs from various communities (e.g. AI, microservices, Big Data, HPC)

[Chart] Compute demand of deep-learning workloads grows roughly 35x every 18 months, versus 2x every 18 months for CPUs (Moore's law) and about 1.05x every 18 months for single-thread performance.

* Data from https://openai.com/blog/ai-and-compute/


"For its entire history, distributed computing research modeled capacity as fixed but time as unlimited. With serverless, time is limited, but capacity is effectively infinite. This only changes everything ☺"

Tim Wagner (inventor of AWS Lambda and former AWS GM for Lambda)
Example: World's fastest video compression with serverless

• Remarkable: video encoding is not embarrassingly parallel at all
• Each frame is expressed as the delta compared to the previous frame
• Still, it was possible to rethink video encoding using serverless
• Result: the world's fastest video encoding mechanism
• Other examples: gg (highly distributed code compiling), rendering videos, …

http://pages.cs.wisc.edu/~shivaram/cs744-readings/excamera.pdf
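The fan-out/merge idea behind ExCamera can be sketched locally: split the work into many small chunks, hand each chunk to a short-lived worker, then merge the partial results. A minimal Python sketch — threads stand in for serverless workers, and the per-chunk "encoding" is a hypothetical placeholder, not the ExCamera codec:

```python
from concurrent.futures import ThreadPoolExecutor

def encode_chunk(frames):
    # Placeholder for per-chunk encoding work; ExCamera runs each
    # chunk in its own short-lived serverless worker.
    return sum(frames)  # hypothetical per-chunk result

def parallel_encode(frames, chunk_size=4):
    # Fan out: split the frame list into fixed-size chunks.
    chunks = [frames[i:i + chunk_size]
              for i in range(0, len(frames), chunk_size)]
    with ThreadPoolExecutor() as pool:
        partial = list(pool.map(encode_chunk, chunks))
    # Merge: stitch the independently processed chunks back together.
    return sum(partial)

print(parallel_encode(list(range(16))))  # 120, same as a serial pass
```

The real system additionally pipelines a small amount of inter-chunk state (the frame deltas) between workers, which is what makes the result remarkable.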
Next Generation Serverless

Serverless + elimination of limits = next-gen serverless
• Serverless: transparent scaling (including scale-to-zero); no infrastructure mgmt.; no capacity mgmt.; pay-by-consumption
• Elimination of limits: (almost) no CPU, memory or disk limits; no duration limit

Bring your source (we build the container for you) or your own container with your custom code. Run interactive workloads (web apps, microservices, with or without a route) or run-to-completion workloads (batch): machine learning, data analytics & processing (e.g. Spark), ETL, …

IBM Cloud Code Engine


IBM Cloud
Code Engine

Run any code. Easily. At scale

IBM Cloud Code Engine is a fully managed runtime where developers can go live in seconds, easily run any code, pay only for what they use, and scale up and down – even to zero.

24 IBM Cloud Code Engine | © 2021


IBM Cloud Functions

Demo time
How did we build
Code Engine
IBM Cloud Code Engine conceptual architecture

[Slide diagram] Developers (Code Engine users) deploy functions, apps </>, batch jobs and containers through a speed-and-ease-of-use layer. Knative and Istio form the control plane on a multi-tenant Kubernetes cluster (IBM Cloud), which runs on virtual and physical machines; end users consume the deployed workloads.

* Note that developers don't "see" the cluster and are not responsible for it. They just deploy their workloads.
27
Multiple Layers of Scale

• On the Kubernetes pod level: the Knative Autoscaler scales upon incoming requests
• Within the Kubernetes cluster: the Node Autoscaler scales worker nodes up & down within predefined limits
• On a regional level: a custom autoscaler based on capacity demands
Open Source components:
• Knative — Serving is the runtime component that hosts and scales your application as K8s pods; Eventing contains tools for managing events between loosely coupled services
• Istio — components to connect, secure, control, and observe services: service mesh, auto-mTLS, etc.
• Tekton — define a series of reusable tasks to build a CI/CD workflow; each runs as a container
29
More open source components:
• Shipwright — an extensible framework for building container images on Kubernetes; declare and reuse build strategies to build your container images; supports popular tools such as Kaniko, Cloud Native Buildpacks, Buildah, and more
• Paketo — a collection of buildpacks leveraging the Cloud Native Buildpacks framework to make image builds easy, performant and secure; ensures that upstream languages, runtimes and frameworks are continuously patched in response to vulnerabilities and updates
• Kaniko — a tool to build container images from Dockerfiles; executes each command within the Dockerfile completely in userspace; enables building container images in environments that can't easily or securely run a Docker daemon (such as a standard Kubernetes cluster)
30
Whom did we build Code Engine for?

Serverless is suitable for different personas now:
• Container-savvy developer: "I can run my containerized application without having to worry about sizing, creating or managing a cluster. 'Run my container' vs. 'Give me a cluster that I can then run my container on'."
• Functions developer: "I love Functions-as-a-Service and can now run functions with almost no limits. I now have a single platform to securely combine functions with apps and other workloads."
• Batch job creator: "I can create powerful batch jobs and easily combine them with events and other services. The underlying platform scales out and allows me to run massively parallel jobs … and I only pay for what I use."
• PaaS developer: "I can start utilizing a new powerful platform and keep using a 'push source code' experience, do not have to worry about containers, and can easily connect my code to backing services."


Our architecture is a K8s-based container platform that:
• allows me to deploy my container-based workloads easily, quickly and securely
• dynamically scales my containers according to load
• has monitoring, logging and a service mesh "built in"
• gives me the ability to run serverless workloads
• does not "lock me in" and is built on open-source projects

… and …

• unifies the deployment of containers, applications, batch jobs and functions
• starts as a production-grade, multi-tenant shared container service, but will be extendable to other locations (via IBM Cloud Satellite) for higher isolation
So you built a Swiss army knife for containers. Cool! But didn't you say "containers, VMs, or physical servers" on slide 1?
A glimpse into the (not so distant) future

• In the past, distributed/serverless workloads were isolated — HPC had nothing to do with microservices etc.
• Today we see customers combining web apps and batch processing in one project
• More and more apps are starting to integrate elements of AI — AI training is a form of HPC
• A service is needed that does justice to these developments
• Code Engine is already addressing a lot of these requirements
A glimpse into the (not so distant) future (2)

• But not all workload is (or will be) containerized — an HPC/AI developer doesn't care how to package, e.g., Python code; they just want to run it
• New, specialized hardware flavors come to market with lightning speed (A100, H100, GX200, IBM AIU, Apple Silicon, etc.) — integration of new flavors into a K8s cluster pool is not instantaneous, and therefore slows adoption
• Maintaining an infrastructure pool is not desirable for a service — the provider can never achieve 100% utilization, which eats into the profitability of the business; "pool-less" architectures are a goal
• Can a VM be serverless?
Yes, it can. Please welcome: Serverless VMs

• The Uber of compute, just better:
  – Customer doesn't have to care about patching, compliance and the like
  – Pays only for as long as their process is running
  – Can use all hardware flavors instantaneously
  – Scales transparently — the resource allocation happens behind the scenes

Rental car vs. Uber (analogy):
• Unit & degree of lock-in: a rental car vs. a trip from A to B
• Customer responsibility: fill the tank, lock it, avoid damage, drive, get it fixed if something breaks, stick to the traffic laws, wait with the car for the service vs. get in and get out of the car; if it breaks, just get out and take another Uber
• Charging model: a fixed price from date x to date y, as long as the car is in the customer's hands (incl. while sleeping at night) vs. a fixed price for the route from A to B
• Size: broad spectrum of cars vs. broad spectrum of cars
• Elasticity: bad — if I need to take care of more people, I need to order more cars, and that takes a long time vs. very good — whether it's 1 person or 1000, the ordering process is the same and takes minutes; I order them, and they come quickly in basically any number
• Choice: very broad — from compact cars up to trucks vs. more limited than Hertz w.r.t. car selection, but the higher-level service allows growing into domains Hertz can't (e.g. food delivery)
• Charged duration: from date a to b, regardless of how much the car is used (incl. while I'm sleeping) vs. only the time spent sitting in the car
• Price: in general cheaper than Uber vs. attractive for short trips; the longer the distance, the more attractive renting becomes
Code Engine vs. AWS Lambda

Typical customer complaints, with AWS Lambda vs. IBM Code Engine:
• "Why can't I get more memory per execution?" — Lambda: 3 GB; Code Engine: 30 GB and higher
• "Why can't I get more CPUs?" — Lambda: 2 cores; Code Engine: >= 4 cores
• "Why do you constrain the execution duration so strictly?" — Lambda: 15 mins; Code Engine: hours
• "Why do you constrain my codebase to not having a bigger footprint?" — Lambda: 250 MB; Code Engine: GBs
• "Why do I have to spin up a new invocation instance, even for lightweight functions?" — Lambda: 1 invocation -> 1 process; Code Engine: multi-threading supported
• "Why can't my function invocations talk to each other via a local network?" — Lambda: not possible; Code Engine: possible
• "Why can't I just bring my existing container that hosts an HTTP-based component?" — Lambda: not possible; Code Engine: possible
• "Why can't I get GPUs?" — Lambda: not possible; Code Engine: will be supported
39


Code Engine vs. AWS

Feature comparison (Code Engine | AWS Lambda | AWS Fargate | AWS Batch):
• Duration: no defined limit | 15 mins | no defined limit | no defined limit
• CPU cores: >> 4 / VSI limits | max 2 | max 4 vCPU | EC2 limits
• Memory: >= 30 GB / VSI limits | max 3 GB | max 30 GB | EC2 limits
• Payload size: no specific one (streaming supported) | 6 MB | no specific one | no specific one
• Processor arch: x86, GPU (future) | x86 | x86 | EC2 capabilities
• Run general docker image: yes | no | yes | yes (manual)
• Run any unmodified web app: yes | no | yes | no
• Push code: yes | yes | no | no
• Scale-to-zero: yes | yes | no | no
• Concurrency within a container: yes | no | yes | yes
• Cold start time: low-digit seconds → few 100s of millisecs | millisecs | seconds–mins | minutes
• Private node-to-node network: yes | no | yes | yes
• No lifecycle mgmt. required: yes | yes | no | no
• Event-driven invocation: yes | yes | no | no

Note that Code Engine unifies the best-of-breed capabilities, allowing their use in combination, which is not possible on AWS. Examples:
- more than 15 mins duration & scale-to-zero
- payload size > 6 MB & low cold start time
40
Code Engine: the next step towards the serverless nirvana — serverless distributed applications

[Slide diagram] From most abstract to least: Functions → Apps → Containers → Virtual machines → Bare metal, with:
• stateful or stateless workloads
• private node-to-node communication
• very high CPU, memory, disk, etc. limits
• a more generic programming model
• various processor architectures

Grand vision — the serverless supercomputer: continue to use existing commands and APIs, using 1 core or 10,000 cores transparently, using the cloud as if it was a single computer.
[Slide diagram] The Code Engine stack, accessed via the IBM Cloud Web UI or IBM Cloud CLI:
• Customer workloads: distributed Python/Java apps, HTTP-serving apps, batch jobs, any container-packaged software (e.g. CRISPR pipelines, video processing)
• Scale-out drop-in libraries for analytics, AI/ML, microservices, stream processing, simulations, map/reduce, Spark, HPC, … — including Lithops (map/reduce against batch or isolated VM infrastructure) and a stateful serverless runtime (Ray) with shared in-memory object storage
• Serverless HTTP (web apps, REST APIs, mobile backends, …) and serverless batch & background workers; source-to-image builds via Tekton & Shipwright
• Knative and Istio on serverless containers / Kubernetes — IBM Cloud Code Engine
Legend: provided as a service | deployed by the customer | running on client side / used by client


Got it. So when do I use which tech?
Comparison: Workload Sweet Spots

• On-premise datacenter — characteristics: special HW required; compliance-regulated. Examples: data which must stay in the on-prem DC; mainframe apps.
• VMs — characteristics: OS customizations; full OS control; stronger isolation requirements. Examples: apps with special OS requirements; apps packaged into existing VM images; live video streams (resource-heavy).
• Containers — characteristics: longer-living; any protocol; custom OS binaries required. Examples: continuously running processes (e.g. game engines); distributed technologies (e.g. mongodb, zookeeper).
• PaaS — characteristics: stateless; http(s)/websockets. Examples: high-volume web apps / APIs.
• FaaS / Serverless — characteristics: stateless / short-living; written in a well-defined set of languages or a compiled binary. Examples: API / microservice / web app implementations; mobile backends; reaction to streaming / IoT / cognitive etc. events.
44
Comparison: Developer Experience

(columns: on-premise datacenter | VMs | containers | PaaS | FaaS/serverless)
• Time to provision: weeks/months | minutes | seconds/minutes | seconds/minutes | milliseconds
• Utilization: low | high | higher | higher | highest
• Ability to reuse apps: highest | high | medium | lower | low
• Developer view: none | VM | container | just the app code | just the app code
• Autoscaling: none | mgmt function | mgmt function | mgmt function | inherent, no delay
45
Comparison: Artifact & Developer Usage

• Artifact: physical machine (on-prem DC) | VM | container | app code (PaaS) | action code, trigger, rule (FaaS)
• Developer usage:
  – On-premise datacenter: the developer manually installs OS, middleware and services on dedicated hardware
  – VMs: installs or clones an existing OS, packages the entire OS in a VM image and deploys it to the server; must start/stop the entire VM
  – Containers: creates the application or microservices and packages them in a container; deploys the container to the server; must manage loading of Docker components and any orchestration/communication among containers
  – PaaS: uploads the complete application using a supported runtime (e.g. via CF); explicitly binds services to the application; explicitly starts/stops the cloud application; the entire application is atomically packaged and executed; any change requires deployment of the entire application
  – FaaS: uploads only artifacts; no explicit management of computing resources required; no starting and stopping of the application required
46
47
Cloud Computing
Serverless in AWS

Stuttgart University
WS 2023/24
11-21-2023

Dr. Kristof Kloeckner — GM and CTO, IBM GTS (retired) — kristof.kloeckner@iaas.uni-stuttgart.de
Gerd Breiter — DE, IBM (retired) — gbreiter58@gmail.com
1
Cloud Platforms: Serverless 2

2
Shades of Serverless Computing
• Event-based computing (with Function as a Service)
– Apache OpenWhisk, Knative
– AWS Lambda
– IBM Functions, Azure Functions, Google Cloud Functions…

• Any kind of managed service where you ‘just deliver code’ (or
a container)
– AWS Serverless Platform (incl. AWS Lambda and Fargate)
– IBM Code Engine
– Google Cloud Run

• Over half of large organizations using AWS, Azure or Google use serverless (Datadog telemetry data)
  – Function as a Service most popular on all platforms, but serverless containers picking up fast
3
Serverless Computing in AWS
• Amazon’s definition describes a managed service, while others in the
industry have a narrower definition relating to ‘Function as a Service’ and
event-based applications
• AWS: ‘Serverless is the native architecture of the cloud that enables you to
shift more of your operational responsibilities to AWS, increasing your
agility and innovation’
– Application Repository: https://aws.amazon.com/serverless/serverlessrepo/
• ‘AWS Serverless Platform’
– Lambda
– Lambda@Edge (running at CloudFront locations)
– AWS Fargate (Serverless Container Engine)
– Storage Services (S3 and Elastic File System)
– Database Services (DynamoDB and Aurora Serverless)
– API Gateway
– Integration Services (SNS, SQS, AppSync, Event Bridge)
– Multiple other services
• Resources: serverlessland.com 4
AWS Lambda
• https://aws.amazon.com/lambda/
• ‘Run code without thinking about servers. Pay only for the compute time you
consume’
• You do not rent and manage virtual servers, your function is triggered by an
event (including by other services)
– Triggers from S3, DynamoDB, Kinesis (Streaming), API Gateway …
– About 40 different event generators
• AWS Lambda takes care of scaling, availability etc.
– Up to thousands of events per second
– AWS handles trillions of events/month
– Half of new apps on AWS use Lambda (as of 2020)
• Lambda is based on Firecracker, an open source microVM developed by AWS
• Ideal for unpredictable workloads of short duration
– Currently limit of 15 minutes for execution
• https://console.aws.amazon.com/lambda/home?region=us-east-1#/discover
5
Lambda Execution Environment

6
Events, Functions, Services

https://aws.amazon.com/getting-started/deep-dive-serverless/?e=gs2020&p=gsrc
7
Lambda Execution

[Slide diagram] Trigger patterns: an object uploaded to S3, a message published to SNS, or a message put in an SQS queue invokes a function (for SQS, Lambda removes the message from the queue after successful processing); for a Kinesis stream, Lambda polls the stream for new records.

8
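For the queue-based patterns above, Lambda delivers a batch of messages in the event's `Records` list. A minimal sketch of an SQS-style handler, simulated locally — the event shape is abbreviated to the fields used (real SQS events carry more metadata), and `orderId` is a hypothetical payload field:

```python
import json

def handler(event, context):
    # Each SQS record carries the message payload as a JSON string in
    # "body"; Lambda deletes the messages from the queue after a
    # successful invocation.
    order_ids = []
    for record in event.get("Records", []):
        payload = json.loads(record["body"])
        order_ids.append(payload["orderId"])  # hypothetical field
    return {"processed": len(order_ids), "orderIds": order_ids}

# Simulated invocation with a two-message batch; context is unused here.
event = {"Records": [
    {"body": json.dumps({"orderId": 1})},
    {"body": json.dumps({"orderId": 2})},
]}
print(handler(event, None))  # {'processed': 2, 'orderIds': [1, 2]}
```

If the handler raises instead of returning, Lambda leaves the batch on the queue for retry, which is why such handlers should be idempotent.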
Usage of Serverless in Lambda
https://www.datadoghq.com/state-of-serverless/

• Over half of organizations operating in AWS, Google Cloud or


Azure have adopted serverless
• Python and Node.js are dominant among Lambda users
• Over 60% of large organizations have deployed Lambda
functions in at least 3 languages
• API Gateway and SQS are the services that invoke Lambda the most on AWS
• 80% of Lambda invocations through API Gateway are to single
purpose functions
• Only one in 5 Lambda users is deploying container images in
Lambda
• More than 20% of Lambda customers are also deploying ECS
Fargate 10
Typical Lambda Workloads
https://docs.aws.amazon.com/lambda/latest/operatorguide

• Web applications: serve the front-end code via Amazon S3 and Amazon
CloudFront, or automating the entire deployment and hosting with AWS
Amplify Console.
• Web and mobile backends: the front-ends interact with the backend via
API Gateway. Integrated authorization and authentication are provided by
Amazon Cognito or APN Partners like Auth0.
• Data processing: event-based processing tasks triggered by data changes
in data stores, or streaming data ETL tasks with Amazon Kinesis and
Lambda.
• Parallelized computing tasks: splitting highly complex, long-lived
computations to individual tasks across many Lambda function instances
to process data more quickly in parallel.
• Internet of Things (IoT) workloads: processing data generated by physical
IoT devices.
11
Example: IoT Backend

This is a typical example of preventive maintenance that holds for many industries.

12
Example: Real-Time File Processing

CreateThumbnail

A realistic example of format conversion in media processing

13
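A function like CreateThumbnail is triggered by an S3 put event, and its first step is always extracting the bucket and object key from the event record. A sketch of just that parsing step — the actual image resizing and boto3 download/upload are omitted; the event shape follows the documented S3 notification format, and the bucket/key values are hypothetical:

```python
import urllib.parse

def lambda_handler(event, context):
    # S3 notification events carry one record per object operation;
    # object keys are URL-encoded (spaces arrive as '+').
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
    # A real CreateThumbnail function would now download the object
    # with boto3, resize it, and upload the result to a target bucket.
    return {"bucket": bucket, "key": key}

event = {"Records": [{"s3": {
    "bucket": {"name": "media-in"},          # hypothetical bucket
    "object": {"key": "videos/clip+01.mp4"}  # '+' decodes to a space
}}]}
print(lambda_handler(event, None))  # {'bucket': 'media-in', 'key': 'videos/clip 01.mp4'}
```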
AWS Lambda Concepts
• Lambda runs functions to process events.
• An event is a JSON-formatted document that contains data for a function
to process. The Lambda runtime converts the event to an object and
passes it to your function code. The service or resource that invokes a
function determines the structure and contents of the event.
• All runtimes share a common programming model. You tell the runtime
which method to run by defining a handler in the function configuration,
and the runtime runs that method. The runtime passes objects to the
handler that contain the invocation event and the context, such as the
function name and request ID.
• Functions can be synchronous (e.g. for API Gateways), asynchronous (e.g.
SNS, S3), or poll-based (DynamoDB, Kinesis)
• Concurrency is the number of requests that your function is serving at any
given time.
• A trigger is a resource or service that invokes a function, it can be a
program, an AWS service or an event source mapping (like reading from a
stream).
14
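Concurrency, as defined above, can be estimated with the standard rule of thumb: concurrent executions ≈ request rate × average duration (Little's law). A quick sanity check in Python, with illustrative numbers:

```python
def estimated_concurrency(requests_per_second, avg_duration_seconds):
    # Little's law: in steady state, the number of in-flight requests
    # equals arrival rate times average time in the system.
    return requests_per_second * avg_duration_seconds

# 100 req/s at 500 ms each keeps roughly 50 execution environments busy.
print(estimated_concurrency(100, 0.5))  # 50.0
```

This is the figure to compare against the account-level concurrency limit when sizing a Lambda-based service.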
Creating a Serverless Function
• From the dashboard, select ‘create function’
• Four options:
1. Create from Scratch
2. Use a blueprint
3. Deploy a container image
4. Browse serverless app repository
• If creating a function from scratch, provide
– Name, runtime info, permissions
– Use LabRole for AWS Academy setup
– .Net, Go, Java, Python, Node.js, Ruby
• Create Function
• This brings up the console for your application
• Select the test tab to test it out
– You need to configure a test event
15
Sample Function (lambda_function.py)

import json

def lambda_handler(event, context):
    # TODO implement
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }

16
Sample Function (index.js)

exports.handler = async (event) => {
    // TODO implement
    const response = {
        statusCode: 200,
        body: JSON.stringify('Hello from Lambda!'),
    };
    return response;
};

17
Showing event content (Python)
A greeting by name

# import the JSON utility package since we will be working with a JSON object
import json

# define the handler function that the Lambda service will use as an entry point
def lambda_handler(event, context):
    # extract values from the event object we got from the Lambda service
    name = event['firstName'] + ' ' + event['lastName']
    # return a properly formatted JSON object
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda, ' + name)
    }

The event has the structure:

{
    "firstName": firstname,
    "lastName": lastname
}
18
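Since a handler is just a Python function, it can also be exercised locally by passing a dict shaped like the console's test event. A sketch, with the greeting handler reproduced so the snippet is self-contained (names are illustrative):

```python
import json

def lambda_handler(event, context):
    # Same greeting handler as above, reproduced for a self-contained run.
    name = event['firstName'] + ' ' + event['lastName']
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda, ' + name)
    }

# Local invocation: the dict stands in for the Lambda test event;
# context is unused here, so None suffices.
response = lambda_handler({'firstName': 'Ada', 'lastName': 'Lovelace'}, None)
print(response['statusCode'])        # 200
print(json.loads(response['body']))  # Hello from Lambda, Ada Lovelace
```

This mirrors what the "Test" tab in the console does, minus the runtime's JSON-to-object conversion and the real context object.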
Blueprint Examples
• SNS-Message (Node.js or Python)
– Name the function, choose or create role
– Configure (and enable) an SNS Trigger (topic)
– Create function
– Configure a test event and run test
– Publish a message to your topic, look at log
• S3 Metadata (Python)
– Name the function, choose or create role
– Configure and enable an S3 bucket as trigger
– Configure a test event (insert bucket name and name of a
public object)
– Upload an object, look at log
20
Deploying Python Lambda Functions with
Container Images
• If you use an AWS base image, you only need to copy your
function to the container and install dependencies
• Sample app.py with simple handler

import sys

def handler(event, context):
    return 'Hello from AWS Lambda using Python ' + sys.version + '!'

• Sample Dockerfile

FROM public.ecr.aws/lambda/python:3.8

COPY app.py ./
CMD ["app.handler"]
21
Serverless Microservices with Lambda

Source: Implementing Microservices with AWS
27
Serverless Applications
• Serverless Application Model
– Open Source Framework to build serverless applications
for AWS
– https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/what-is-sam.html
• Serverless Application Repository
– Includes both AWS and community contributions
– Many useful patterns, including integrations with most
AWS services
– For use with Learner Lab, many templates need to be
modified due to implicit role creation
29
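As a flavor of what SAM looks like, a minimal template for a Python function might be (resource and handler names here are illustrative, not from the slides):

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  HelloFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.lambda_handler
      Runtime: python3.9
      CodeUri: .
      # in the Learner Lab, override SAM's implicit role creation with LabRole:
      Role: !Sub arn:aws:iam::${AWS::AccountId}:role/LabRole
```

Supplying an explicit Role, as above, is one way to adapt repository templates that would otherwise try to create roles implicitly.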
Serverless Best Practices
https://www.digitalocean.com/blog/best-practices-for-serverless-computing

• One event, one task
– Faster startup, easier testing, small attack surface
• Keep functions short-lived
• Remember that functions are stateless
– Use databases for persistent data
• Test and configure
– Optimize duration, memory needs
• Secure your function
• Future-proof your setup

30
Well-Architected Best Practices for Lambda
https://www.datadoghq.com/blog/well-architected-serverless-applications-best-practices/

• Limit Lambda privileges (attach policies to roles)
• Limit access to application through VPC (with at least 2 subnets)
• Manage failures (retry behavior)
• Reduce cold starts, use caching, keep packages small (reduced initialization)
• Optimize memory size (test)
• Don't overspend on provisioned concurrency
– Consider auto-scaling
• Monitor resource usage (a plug for Datadog, but good practice)
• Standardize logs (ditto)

31
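One common way to follow the "reduce cold starts" and statelessness advice at the same time is to do expensive setup once, outside the handler, so warm invocations reuse it (a generic sketch; the config dict stands in for a real client or connection):

```python
import json

# runs once per execution environment (at cold start), then is reused
# across warm invocations -- keep clients, configs, connections here
CONFIG = {'greeting': 'Hello'}

def lambda_handler(event, context):
    # the handler itself stays stateless: everything per-request comes
    # from the event, and nothing is written back to module-level state
    name = event.get('name', 'world')
    return {
        'statusCode': 200,
        'body': json.dumps(CONFIG['greeting'] + ', ' + name + '!')
    }
```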
Lambda Resources
• https://aws.amazon.com/lambda/getting-started/
• Tutorials:
– CRUD API with Lambda and DynamoDB
https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-dynamo-db.html
– Basic Web Application
https://aws.amazon.com/getting-started/hands-on/build-web-app-s3-lambda-api-gateway-dynamodb/?trk=gs_card
• serverless.com (good overview)
• Matthew Fuller: AWS Lambda, A Guide to Serverless Microservices, 2016,
available on Kindle (a bit dated, but decent introduction)
• Serverless resources from AWS: serverlessland.com
– Pattern collection: serverlessland.com/patterns
• github.com/serverless/serverless: Github repo with a serverless framework
• Best practices:
https://www.datadoghq.com/blog/well-architected-serverless-applications-best-practices/
32
API Gateway
• API Gateways enable developers to create, publish and
manage APIs
• Gateways serve as a control point for applications to access
backend services
• They provide authorization, traffic management, version
control etc.
• Amazon API Gateway options:
– HTTP API for RESTful APIs that only require proxy functions
– REST API for full API management
– Websocket API for persistent two-way connection between clients and
backend, e.g. for chat apps
• https://docs.aws.amazon.com/apigateway/latest/developerguide/welcome.html
33
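Behind an HTTP API, the gateway hands the Lambda function a proxy event; a sketch of a handler reading a query parameter from that event (the field names follow the API Gateway proxy format, while the greeting logic is purely illustrative):

```python
import json

def lambda_handler(event, context):
    # API Gateway proxy events carry query parameters in this field;
    # it is absent or None when the request has no query string
    params = event.get('queryStringParameters') or {}
    name = params.get('name', 'world')
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({'message': 'Hello, ' + name + '!'})
    }
```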
Amazon API Gateway

34
Serverless Microservices with API Gateway and Lambda
(Diagram: HelloWorldAPI → HelloWorldFunction → HelloWorldDatabase, with static assets in MyBucket)

Source: Implementing Microservices with AWS
35
Some AWS Integration Services
• Amazon Simple Queue Service
– Fully managed queuing service
– Priced per request, first 1M requests free
• Amazon MQ
– Fully managed message broker service for Apache ActiveMQ
• Amazon Simple Notification Service
– Fully managed publish/subscribe (pub/sub) and mobile notification
service
• Amazon Managed Streaming for Apache Kafka (MSK)
• Related: Amazon EventBridge
– Serverless event bus service

36
Amazon Simple Notification Service (SNS)

• (Managed) System-to-system messaging, decoupling producers and consumers of messages through a publish/subscribe mechanism

https://console.aws.amazon.com/sns/v3/home?region=us-east-1#/dashboard 37
Event-Driven Architecture with Topics, Filters and Queues
(Diagram: an SNS topic fans out through filter policies to SQS queues)
38
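The filters in this pattern are SNS subscription filter policies: JSON documents matched against message attributes, so each SQS queue only receives the messages it cares about. An illustrative policy (the attribute name and values are made up for the example):

```json
{
  "eventType": ["order_placed", "order_cancelled"]
}
```

A subscription carrying this policy would silently drop any message whose eventType attribute is not one of the listed values.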
Serverless Containers with AWS Fargate

Fargate is the default launch type for Amazon Elastic Container Service (ECS)

39
ECS Fargate Structure

• Fully managed
• Default launch type for Amazon ECS
• Kubernetes Pods can use Fargate (via EKS)
• Longer-running workloads

Source: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html 40
Serverless Microservices with Fargate

Source: Implementing Microservices with AWS


41
Some Considerations for Fargate vs. Lambda
Lambda
+ Good for event-driven, short-term execution
+ Many integrated event sources
+ Good fit for response to an HTTP request behind an API Gateway
+ Good if duration shorter than 15 minutes
+ Stateful apps with EFS
(-) Need to refactor existing applications
(-) Cold start delays, may require some regular warming up (scheduled startup)
+ FedRAMP High certified

Fargate
+ Good for long-running processes
(-) Initial startup time longer than Lambda, negligible afterwards
+ Good if you want to control your own auto-scaling
(-) No inherent event integration (needs to come via ingress)
(-) Stateful apps not recommended
+ More memory available, cheaper than Lambda for high memory
+ Containerizing often easier than refactoring
(-) EKS Fargate only FedRAMP Moderate
42
A Modern Application in AWS with the Serverless Stack
(Diagram: events trigger Lambda functions; dynamic content in DynamoDB; longer-running processes on Fargate; static content in S3)

Source: Implementing Microservices in AWS


43
Backup

44
