Implementing Microservices on AWS
AWS Whitepaper
Amazon's trademarks and trade dress may not be used in connection with any product or service that is not
Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or
discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may
or may not be affiliated with, connected to, or sponsored by Amazon.
Table of Contents
Abstract and introduction
    Introduction
    Are you Well-Architected?
    Modernizing to microservices
Simple microservices architecture on AWS
    User interface
    Microservices
        Microservices implementations
        Continuous integration and continuous deployment (CI/CD)
    Private networking
    Data store
    Simplifying operations
        Deploying Lambda-based applications
        Abstracting multi-tenancy complexities
        API management
Microservices on serverless technologies
Resilient, efficient, and cost-optimized systems
    Disaster recovery (DR)
    High availability (HA)
Distributed systems components
Distributed data management
Configuration management
    Secrets management
Cost optimization and sustainability
Communication mechanisms
    REST-based communication
    GraphQL-based communication
    gRPC-based communication
    Asynchronous messaging and event passing
    Orchestration and state management
Observability
    Monitoring
    Centralizing logs
    Distributed tracing
    Log analysis on AWS
    Other options for analysis
Managing chattiness in microservices communication
    Using protocols and caching
Auditing
    Resource inventory and change management
Conclusion
Contributors
Document history
Notices
AWS Glossary
Abstract and introduction
This whitepaper explores three popular microservices patterns: API driven, event driven, and data
streaming. We provide an overview of each approach, outline microservices' key features, address the
challenges in their development, and illustrate how Amazon Web Services (AWS) can help application
teams tackle these obstacles.
Considering the complex nature of topics like data store, asynchronous communication, and service
discovery, you are encouraged to weigh your application's specific needs and use cases alongside the
guidance provided when making architectural decisions.
Introduction
Microservices architectures combine successful and proven concepts from various fields, such as agile software development, service-oriented architectures, API-first design, and continuous integration and continuous delivery (CI/CD).
While microservices offer many benefits, it's vital to assess your use case's unique requirements and
associated costs. Monolithic architecture or alternative approaches may be more appropriate in some
cases. Deciding between microservices or monoliths should be made on a case-by-case basis, considering
factors like scale, complexity, and specific use cases.
We first explore a highly scalable, fault-tolerant microservices architecture (user interface, microservices
implementation, data store) and demonstrate how to build it on AWS using container technologies.
We then suggest AWS services to implement a typical serverless microservices architecture, reducing
operational complexity.
Lastly, we examine the overall system and discuss cross-service aspects of a microservices architecture,
such as distributed monitoring, logging, tracing, auditing, data consistency, and asynchronous
communication.
Are you Well-Architected?
This document focuses on workloads running in the AWS Cloud, excluding hybrid scenarios and
migration strategies. For information on migration strategies, refer to the Container Migration
Methodology whitepaper.
The Serverless Application Lens focuses on best practices for architecting your serverless applications
on AWS.
For more expert guidance and best practices for your cloud architecture—reference architecture
deployments, diagrams, and whitepapers—refer to the AWS Architecture Center.
Modernizing to microservices
Microservices are essentially small, independent units that make up an application. Transitioning from
traditional monolithic structures to microservices can follow various strategies.
User interface
Modern web applications often use JavaScript frameworks to develop single-page applications that
communicate with backend APIs. These APIs are typically built using Representational State Transfer
(REST) or RESTful APIs, or GraphQL APIs. Static web content can be served using Amazon Simple Storage
Service (Amazon S3) and Amazon CloudFront.
Microservices
APIs are considered the front door of microservices, as they are the entry point for application logic.
Typically, RESTful web services API or GraphQL APIs are used. These APIs manage and process client
calls, handling functions such as traffic management, request filtering, routing, caching, authentication,
and authorization.
Microservices implementations
AWS offers building blocks to develop microservices, including Amazon ECS and Amazon EKS as choices for container orchestration engines, and AWS Fargate and Amazon EC2 as hosting options. AWS Lambda offers another, serverless way to build microservices on AWS. The choice between these hosting options depends on the customer's requirements for managing the underlying infrastructure.
AWS Lambda allows you to upload your code, automatically scaling and managing its execution with
high availability. This eliminates the need for infrastructure management, so you can move quickly and
focus on your business logic. Lambda supports multiple programming languages and can be triggered by
other AWS services or called directly from web or mobile applications.
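To illustrate how little scaffolding the "upload your code" model requires, the sketch below shows a hypothetical Lambda handler for an API Gateway proxy integration. The event shape follows the proxy-integration convention; the function and field names are illustrative, not taken from this paper.

```python
import json

def handler(event, context):
    """Hypothetical Lambda handler behind an API Gateway proxy integration.

    API Gateway passes the HTTP request as `event`; the function returns a
    proxy-integration response object (statusCode, headers, body).
    """
    # Query-string parameters may be absent entirely, hence the `or {}`.
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```

Because there is no server to manage, this one function is the entire deployable unit of the microservice.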
Container-based applications have gained popularity due to portability, productivity, and efficiency. AWS
offers several services to build, deploy, and manage containers.
• AWS App2Container (A2C), a command line tool for migrating and modernizing Java and .NET web applications into container format. A2C analyzes and builds an inventory of applications running on bare metal, virtual machines, Amazon Elastic Compute Cloud (Amazon EC2) instances, or in the cloud.
• Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon
EKS) manage your container infrastructure, making it easier to launch and maintain containerized
applications.
• Amazon EKS is a managed Kubernetes service to run Kubernetes in the AWS cloud and in on-premises data centers (Amazon EKS Anywhere). This extends cloud services into on-premises environments for use cases with low-latency requirements, local data processing needs, high data transfer costs, or data residency requirements (see the whitepaper "Running Hybrid Container Workloads With Amazon EKS Anywhere"). You can use all the existing plug-ins and tooling from the Kubernetes community with EKS.
• Amazon Elastic Container Service (Amazon ECS) is a fully managed container orchestration service
that simplifies your deployment, management, and scaling of containerized applications. Customers
choose ECS for simplicity and deep integration with AWS services.
For further reading, see the blog Amazon ECS vs Amazon EKS: making sense of AWS container services.
• AWS App Runner is a fully managed container application service that lets you build, deploy, and run
containerized web applications and API services without prior infrastructure or container experience.
• AWS Fargate, a serverless compute engine, works with both Amazon ECS and Amazon EKS to
automatically manage compute resources for container applications.
• Amazon ECR is a fully managed container registry offering high-performance hosting, so you can
reliably deploy application images and artifacts anywhere.
Continuous integration and continuous deployment (CI/CD)
A detailed discussion of CI/CD pipelines is beyond the scope of this document. For more information, see the Practicing Continuous Integration and Continuous Delivery on AWS whitepaper.
Private networking
AWS PrivateLink is a technology that enhances the security of microservices by allowing private
connections between your Virtual Private Cloud (VPC) and supported AWS services. It helps isolate and
secure microservices traffic, ensuring it never crosses the public internet. This is particularly useful for
complying with regulations like PCI or HIPAA.
Data store
The data store is used to persist data needed by the microservices. Popular stores for session data
are in-memory caches such as Memcached or Redis. AWS offers both technologies as part of the
managed Amazon ElastiCache service.
Putting a cache between application servers and a database is a common mechanism for reducing the
read load on the database, which, in turn, may allow resources to be used to support more writes. Caches
can also improve latency.
Relational databases are still very popular to store structured data and business objects. AWS offers six
database engines (Microsoft SQL Server, Oracle, MySQL, MariaDB, PostgreSQL, and Amazon Aurora) as
managed services through Amazon Relational Database Service (Amazon RDS).
Relational databases, however, are not designed for endless scale, which can make it difficult and time
intensive to apply techniques to support a high number of queries.
NoSQL databases have been designed to favor scalability, performance, and availability over the
consistency of relational databases. One important element of NoSQL databases is that they typically
don’t enforce a strict schema. Data is distributed over partitions that can be scaled horizontally and is
retrieved using partition keys.
Because individual microservices are designed to do one thing well, they typically have a simplified
data model that might be well suited to NoSQL persistence. It is important to understand that NoSQL
databases have different access patterns than relational databases. For example, it's not possible to
join tables. If this is necessary, the logic has to be implemented in the application. You can use Amazon
DynamoDB to create a database table that can store and retrieve any amount of data and serve any level
of request traffic. DynamoDB delivers single-digit millisecond performance, however, there are certain
use cases that require response times in microseconds. DynamoDB Accelerator (DAX) provides caching
capabilities for accessing data.
DynamoDB also offers an automatic scaling feature to dynamically adjust throughput capacity in
response to actual traffic. However, there are cases where capacity planning is difficult or not possible
because of large activity spikes of short duration in your application. For such situations, DynamoDB
provides an on-demand option, which offers simple pay-per-request pricing. DynamoDB on-demand is
capable of serving thousands of requests per second instantly without capacity planning.
For more information, see Distributed data management (p. 12) and How to Choose a Database.
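Because joins move into the application and items are retrieved by partition key, NoSQL table design starts from the access patterns. The sketch below illustrates one common composite-key layout using plain Python dicts; it makes no SDK calls, and the key and attribute names (`PK`, `SK`, an `Orders` access pattern) are hypothetical.

```python
def order_item(customer_id: str, order_id: str, total: int) -> dict:
    """Build a hypothetical DynamoDB item keyed for the access pattern
    'fetch all orders for a customer': partition key identifies the
    customer, sort key identifies the order within that customer."""
    return {
        "PK": f"CUSTOMER#{customer_id}",   # partition key: distributes items across partitions
        "SK": f"ORDER#{order_id}",         # sort key: enables range queries within one customer
        "total": total,
    }

def orders_for(items: list[dict], customer_id: str) -> list[dict]:
    """Stand-in for a DynamoDB Query on the partition key; in the real
    service this is a single Query call, not a scan-and-filter."""
    pk = f"CUSTOMER#{customer_id}"
    return [i for i in items if i["PK"] == pk]
```

The point of the pattern: every query the service needs maps to one partition-key lookup, so there is nothing left that would require a join.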
Simplifying operations
To further simplify the operational efforts needed to run, maintain, and monitor microservices, we can
use a fully serverless architecture.
Topics
• Deploying Lambda-based applications (p. 6)
• Abstracting multi-tenancy complexities (p. 7)
• API management (p. 7)
Deploying Lambda-based applications
Using AWS CloudFormation and the AWS Serverless Application Model (AWS SAM), AWS Cloud
Development Kit (AWS CDK), or Terraform streamlines the process of defining serverless applications.
AWS SAM, natively supported by CloudFormation, offers a simplified syntax for specifying serverless
resources. AWS Lambda Layers help manage shared libraries across multiple Lambda functions,
minimizing function footprint, centralizing tenant-aware libraries, and improving the developer
experience. Lambda SnapStart for Java enhances startup performance for latency-sensitive applications.
Integration with tools like AWS Cloud9 IDE, AWS CodeBuild, AWS CodeDeploy, and AWS CodePipeline
streamlines authoring, testing, debugging, and deploying SAM-based applications.
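As a sketch of the simplified syntax AWS SAM offers, the template fragment below declares a single hypothetical function behind an API Gateway route; the resource names, runtime, and path are illustrative.

```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  # Hypothetical microservice: one function exposed via an API Gateway route
  HelloFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.12
      MemorySize: 128
      Events:
        HelloApi:
          Type: Api
          Properties:
            Path: /hello
            Method: get
```

On deployment, CloudFormation expands this `AWS::Serverless::Function` resource into the underlying Lambda function, API Gateway resources, and IAM role.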
The following diagram shows deploying AWS Serverless Application Model resources using
CloudFormation and AWS CI/CD tools.
Abstracting multi-tenancy complexities
Shared libraries such as Lambda Layers can help abstract common concerns, including multi-tenancy complexities; however, they should not extend to encapsulating business logic due to the complexity and risk they may introduce. A fundamental issue with shared libraries is the increased complexity surrounding updates, making them more challenging to manage compared to standard code duplication. Thus, it's essential to strike a balance between the use of shared libraries and duplication in the quest for the most effective abstraction.
API management
Managing APIs can be time-consuming, especially when considering multiple versions, stages of the development cycle, authorization, and other features like throttling and caching. Apart from API Gateway, some customers also use an Application Load Balancer (ALB) or Network Load Balancer (NLB) for API management. Amazon API Gateway helps reduce the operational complexity of creating and maintaining RESTful APIs. It allows you to create APIs programmatically; serves as a "front door" to access data, business logic, or functionality from your backend services; handles authorization and access control, rate limiting, caching, monitoring, and traffic management; and runs APIs without managing servers.
Figure 3 illustrates how API Gateway handles API calls and interacts with other components. Requests
from mobile devices, websites, or other backend services are routed to the closest CloudFront Point of
Presence (PoP) to reduce latency and provide an optimal user experience.
Microservices on serverless technologies
Using microservices with serverless technologies can greatly decrease operational complexity. AWS
Lambda and AWS Fargate, integrated with API Gateway, allow for the creation of fully serverless
applications. As of April 7, 2023, Lambda functions can progressively stream response payloads back
to the client, enhancing performance for web and mobile applications. Prior to this, Lambda-based
applications using the traditional request-response invocation model had to fully generate and buffer
the response before returning it to the client, which could delay the time to first byte. With response
streaming, functions can send partial responses back to the client as they become ready, significantly
improving the time to first byte, which web and mobile applications are especially sensitive to.
Figure 4 demonstrates a serverless microservice architecture using AWS Lambda and managed services.
This serverless architecture mitigates the need to design for scale and high availability, and reduces the
effort needed for running and monitoring the underlying infrastructure.
Figure 5 displays a similar serverless implementation using containers with AWS Fargate, removing
concerns about underlying infrastructure. It also features Amazon Aurora Serverless, an on-demand,
auto-scaling database that automatically adjusts capacity based on your application's requirements.
Resilient, efficient, and cost-optimized systems
Disaster recovery (DR)
Disaster recovery strategies for microservices should focus on downstream services that maintain the
application's state, such as file systems, databases, or queues. Organizations should plan for recovery
time objective (RTO) and recovery point objective (RPO). RTO is the maximum acceptable delay between service interruption and restoration, while RPO is the maximum acceptable amount of time since the last data recovery point.
For more on disaster recovery strategies, refer to the Disaster Recovery of Workloads on AWS: Recovery
in the Cloud whitepaper.
High availability (HA)
Amazon EKS ensures high availability by running Kubernetes control and data plane instances across
multiple Availability Zones. It automatically detects and replaces unhealthy control plane instances and
provides automated version upgrades and patching.
Amazon ECR uses Amazon Simple Storage Service (Amazon S3) for storage to make your container
images highly available and accessible. It works with Amazon EKS, Amazon ECS, and AWS Lambda,
simplifying development to production workflow.
Amazon ECS is a regional service that simplifies running containers in a highly available manner across
multiple Availability Zones within a Region, offering multiple scheduling strategies that place containers
for resource needs and availability requirements.
AWS Lambda operates in multiple Availability Zones, ensuring availability during service interruptions in
a single zone. If connecting your function to a VPC, specify subnets in multiple Availability Zones for high
availability.
Distributed systems components
When evaluating service discovery and service mesh options for your microservices, consider the following:
• Code modification: Can you get the benefits without modifying code?
• Cross-VPC or cross-account traffic: If required, does your system need efficient management of communication across different VPCs or AWS accounts?
• Deployment strategies: Does your system use or plan to use advanced deployment strategies such as blue-green or canary deployments?
• Performance considerations: If your architecture frequently communicates with external services, what will be the impact on overall performance?
AWS offers several methods for implementing service discovery in your microservices architecture:
• Amazon ECS Service Discovery: Amazon ECS supports service discovery using its DNS-based method
or by integrating with AWS Cloud Map (see ECS Service discovery). ECS Service Connect further
improves connection management, which can be especially beneficial for larger applications with
multiple interacting services.
• Amazon Route 53: Route 53 integrates with ECS and other AWS services, such as EKS, to facilitate
service discovery. In an ECS context, Route 53 can use the ECS Service Discovery feature, which
leverages the Auto Naming API to automatically register and deregister services.
• AWS Cloud Map: This option offers a dynamic API-based service discovery, which propagates changes
across your services.
For more advanced communication needs, AWS provides two service mesh options:
• Amazon VPC Lattice is an application networking service that consistently connects, monitors,
and secures communications between your services, helping to improve productivity so that your
developers can focus on building features that matter to your business. You can define policies for
network traffic management, access, and monitoring to connect compute services in a simplified and
consistent way across instances, containers, and serverless applications.
• AWS App Mesh: Based on the open-source Envoy proxy, App Mesh caters to advanced needs with
sophisticated routing, load balancing, and comprehensive reporting. Unlike Amazon VPC Lattice, App
Mesh does support the TCP protocol.
If you're already using third-party software such as HashiCorp Consul or Netflix Eureka for service discovery, you might prefer to continue using these as you migrate to AWS, enabling a smoother transition.
The choice between these options should align with your specific needs. For simpler requirements, DNS-based solutions like Amazon ECS Service Discovery or AWS Cloud Map might be sufficient. For more complex or larger systems, service meshes like Amazon VPC Lattice or AWS App Mesh might be more suitable.
In conclusion, designing a microservices architecture on AWS is all about selecting the right tools to
meet your specific needs. By keeping in mind the considerations discussed, you can ensure you're making
informed decisions to optimize your system's service discovery and inter-service communication.
Distributed data management
A central challenge in distributed systems is the trade-off between consistency and performance. It's often more practical to accept slight delays in data updates (eventual consistency) than to insist on instant updates (immediate consistency).
Sometimes, business operations require multiple microservices to work together. If one part fails, you
might have to undo some completed tasks. The Saga pattern helps manage this by coordinating a series
of compensating actions.
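The compensation idea behind the Saga pattern can be sketched in a few lines. This is an illustrative in-process coordinator only, not the Step Functions-based implementation an AWS deployment would typically use; the step names in the usage are hypothetical.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order. If any action fails,
    run the compensations of the already-completed steps in reverse
    order, then report failure -- a minimal saga coordinator sketch."""
    done = []  # compensations for steps that completed successfully
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for undo in reversed(done):
                undo()          # compensating action, e.g. release a reservation
            return False
    return True
```

In a real system each action and compensation would be a call to a separate microservice, and the coordinator itself would need durable state so a crash mid-saga can be recovered.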
To help microservices stay in sync, a centralized data store can be used. This store, managed with
tools like AWS Lambda, AWS Step Functions, and Amazon EventBridge, can assist in cleaning up and
deduplicating data.
A common approach in managing changes across microservices is event sourcing. Every change in the
application is recorded as an event, creating a timeline of the system's state. This approach not only
helps debug and audit but also allows different parts of an application to react to the same events.
Event sourcing often works hand-in-hand with the Command Query Responsibility Segregation
(CQRS) pattern, which separates data modification and data querying into different modules for better
performance and security.
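A minimal sketch of the event-sourcing side of this pairing: state is never updated in place, only derived by replaying the recorded events. The event types and fields (a toy account with deposits and withdrawals) are illustrative.

```python
def apply(state: dict, event: dict) -> dict:
    """Fold one event into the current state. The write model only ever
    appends events; current state is always derived, never mutated."""
    balance = state.get("balance", 0)
    amount = event.get("amount", 0)
    if event["type"] == "Deposited":
        balance += amount
    elif event["type"] == "Withdrawn":
        balance -= amount
    return {"balance": balance}

def replay(events: list[dict]) -> dict:
    """Rebuild current state from the full event history -- this replay
    is also what lets separate read models (CQRS) consume the same log."""
    state: dict = {}
    for e in events:
        state = apply(state, e)
    return state
```

Because the log is the source of truth, any number of independent read models can replay it into whatever query-optimized shape they need.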
On AWS, you can implement these patterns using a combination of services. As you can see in Figure
7, Amazon Kinesis Data Streams can serve as your central event store, while Amazon S3 provides a
durable storage for all event records. AWS Lambda, Amazon DynamoDB, and Amazon API Gateway work
together to handle and process these events.
Remember, in distributed systems, events might be delivered multiple times due to retries, so it's
important to design your applications to handle this.
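That duplicate-delivery caveat is usually addressed with an idempotent consumer. The sketch below deduplicates on an event ID held in an in-memory set; in a real system the set would be a durable store such as a DynamoDB table. The wrapper and field names are illustrative.

```python
def make_idempotent(process, seen: set):
    """Wrap an event handler so redelivered events (same ID) are skipped.
    `seen` stands in for a durable store keyed by event ID."""
    def handle(event: dict) -> bool:
        if event["id"] in seen:
            return False            # duplicate delivery: skip side effects
        process(event)              # side effects happen exactly once per ID
        seen.add(event["id"])
        return True
    return handle
```

Recording the ID only after `process` succeeds means a crash between the two steps causes a retry, not a lost event, which is the safe failure mode here.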
Configuration management
In a microservices architecture, each service interacts with various resources like databases, queues, and
other services. A consistent way to configure each service's connections and operating environment is
vital. Ideally, an application should adapt to new configurations without needing a restart. This approach
is part of the Twelve-Factor App principles, which recommend storing configurations in environment
variables.
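Following that Twelve-Factor recommendation, a service can read its configuration from environment variables, with defaults suitable for local development. The variable names here are hypothetical.

```python
import os

def load_config(env=os.environ) -> dict:
    """Read service configuration from environment variables.
    Accepting `env` as a parameter keeps the function testable;
    the defaults are for local development only."""
    return {
        "db_url": env.get("ORDERS_DB_URL", "sqlite:///local.db"),
        "queue_url": env.get("ORDERS_QUEUE_URL", ""),
        "log_level": env.get("LOG_LEVEL", "INFO"),
    }
```

The same container image can then run unchanged in every environment; only the injected variables differ.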
A different approach is to use AWS AppConfig, a capability of AWS Systems Manager that makes it easy for customers to quickly and safely configure, validate, and deploy feature flags and application configuration. Your feature flag and configuration data can be validated syntactically or semantically in the pre-deployment phase, and can be monitored and automatically rolled back if an alarm that you have configured is triggered. AWS AppConfig can be integrated with Amazon ECS and Amazon EKS by using the AWS AppConfig agent, which runs as a sidecar container alongside your Amazon ECS and Amazon EKS container applications. If you use AWS AppConfig feature flags or other dynamic configuration data in a Lambda function, we recommend adding the AWS AppConfig Lambda extension as a layer to your Lambda function.
GitOps is an innovative approach to configuration management that uses Git as the source of truth
for all configuration changes. This means that any changes made to your configuration files are
automatically tracked, versioned, and audited through Git.
Secrets management
Security is paramount, so credentials should not be passed in plain text. AWS offers secure services
for this, like AWS Systems Manager Parameter Store and AWS Secrets Manager. These tools can send
secrets to containers in Amazon EKS as volumes, or to Amazon ECS as environment variables. In
AWS Lambda, environment variables are made available to your code automatically. For Kubernetes
workflows, the External Secrets Operator fetches secrets directly from services like AWS Secrets Manager,
creating corresponding Kubernetes Secrets. This enables a seamless integration with Kubernetes-native
configurations.
Cost optimization and sustainability
Stateless components (services that store state in an external data store instead of a local data store)
in your architecture can make use of Amazon EC2 Spot Instances, which offer unused EC2 capacity in
the AWS cloud. These instances are more cost efficient than on-demand instances and are perfect for
workloads that can handle interruptions. This can further cut costs while maintaining high availability.
With isolated services, you can use cost-optimized compute options for each auto-scaling group. For
example, AWS Graviton offers cost-effective, high-performance compute options for workloads that suit
ARM-based instances.
Optimizing costs and resource usage also helps minimize environmental impact, aligning with the
Sustainability pillar of the Well-Architected Framework. You can monitor your progress in reducing
carbon emissions using the AWS Customer Carbon Footprint Tool. This tool provides insights into the
environmental impact of your AWS usage.
Communication mechanisms
In the microservices paradigm, various components of an application need to communicate over
a network. Common approaches for this include REST-based, GraphQL-based, gRPC-based and
asynchronous messaging.
Topics
• REST-based communication (p. 16)
• GraphQL-based communication (p. 16)
• gRPC-based communication (p. 16)
• Asynchronous messaging and event passing (p. 16)
• Orchestration and state management (p. 18)
REST-based communication
The HTTP/S protocol, used broadly for synchronous communication between microservices, often
operates through RESTful APIs. Amazon API Gateway offers a streamlined way to build an API that serves
as a centralized access point to backend services, handling tasks like traffic management, authorization,
monitoring, and version control.
GraphQL-based communication
Similarly, GraphQL is a widespread method for synchronous communication, using the same protocols
as REST but limiting exposure to a single endpoint. With AWS AppSync, you can create and publish
GraphQL applications that interact with AWS services and datastores directly, or incorporate Lambda
functions for business logic.
gRPC-based communication
gRPC is a synchronous, lightweight, high performance, open-source RPC communication protocol.
gRPC improves upon its underlying protocols by using HTTP/2 and enabling more features such as
compression and stream prioritization. It uses the Protobuf Interface Definition Language (IDL); Protobuf messages are binary-encoded and thus take advantage of HTTP/2 binary framing.
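As a sketch of what a Protobuf IDL looks like, the proto3 fragment below defines a hypothetical order-lookup service; the service and message names are illustrative, and `protoc` would generate the typed client and server stubs from it.

```protobuf
syntax = "proto3";

package orders.v1;

// Hypothetical order-lookup service: protoc generates client and
// server stubs in each supported language from this definition.
service OrderService {
  rpc GetOrder (GetOrderRequest) returns (Order);
}

message GetOrderRequest {
  string order_id = 1;
}

message Order {
  string order_id = 1;
  int64 total_cents = 2;   // binary-encoded on the wire via Protobuf
}
```

Because both sides share this contract, a field rename or type change is caught at compile time rather than at runtime.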
Asynchronous messaging and event passing
• Message Queues: A message queue acts as a buffer that decouples senders (producers) and receivers (consumers) of messages. Producers enqueue messages into the queue, and consumers dequeue and process them. This pattern is useful for asynchronous communication, load leveling, and handling bursts of traffic.
To implement each of these message types, AWS offers various managed services such as Amazon SQS,
Amazon SNS, Amazon EventBridge, Amazon MQ, and Amazon MSK. These services have unique features
tailored to specific needs:
• Amazon Simple Queue Service (Amazon SQS) and Amazon Simple Notification Service (Amazon
SNS): As you can see in Figure 8, these two services complement each other, with Amazon SQS
providing a space for storing messages and Amazon SNS enabling delivery of messages to multiple
subscribers. They are effective when the same message needs to be delivered to multiple destinations.
Remember, the best service for you depends on your specific needs, so it's important to understand what
each one offers and how they align with your requirements.
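The decoupling that a message queue provides can be sketched in plain Python. Here an in-process `queue.Queue` stands in for a managed queue such as Amazon SQS; this is an illustrative analogy of the pattern, not the SQS API.

```python
import queue
import threading

# An in-process queue standing in for a managed message queue such as Amazon SQS.
message_queue = queue.Queue()

def producer(order_ids):
    # Producers enqueue messages without knowing who will process them.
    for order_id in order_ids:
        message_queue.put({"order_id": order_id})

def consumer(results):
    # Consumers dequeue and process messages at their own pace (load leveling).
    while True:
        message = message_queue.get()
        if message is None:  # sentinel: no more work
            break
        results.append(f"processed {message['order_id']}")
        message_queue.task_done()

results = []
worker = threading.Thread(target=consumer, args=(results,))
worker.start()
producer([101, 102, 103])
message_queue.put(None)  # signal shutdown
worker.join()
print(results)  # → ['processed 101', 'processed 102', 'processed 103']
```

Because the producer never calls the consumer directly, either side can be scaled, paused, or replaced independently; with SQS, the queue additionally survives process and host failures.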
Orchestration and state management
Step Functions provides a workflow engine to manage service orchestration complexities, such as
error handling and serialization. This allows you to scale and change applications quickly without
adding coordination code. Step Functions is part of the AWS serverless platform and supports Lambda
functions, Amazon EC2, Amazon EKS, Amazon ECS, SageMaker, AWS Glue, and more.
Figure 9: An example of a microservices workflow with parallel and sequential steps invoked by AWS Step
Functions
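A workflow like the one in Figure 9 is declared in the Amazon States Language. The sketch below builds a minimal two-state definition as a Python dict; the state names and Lambda ARN are illustrative placeholders, not from the whitepaper.

```python
import json

# Minimal Amazon States Language definition: one Lambda task, then success.
# The function ARN below is a placeholder for illustration only.
definition = {
    "Comment": "Illustrative order-processing workflow",
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
            # Error handling lives in the workflow, not in coordination code.
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},
    },
}

# Step Functions accepts this JSON (for example via boto3's stepfunctions
# CreateStateMachine call); here we only serialize and inspect it.
print(json.dumps(definition, indent=2))
```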
Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is an alternative to Step Functions. You should use
Amazon MWAA if you prioritize open source and portability. Airflow has a large and active open-source
community that contributes new functionality and integrations regularly.
Observability
Since microservices architectures are inherently made up of many distributed components, observability
across all those components becomes critical. Amazon CloudWatch enables this, collecting and tracking
metrics, monitoring log files, and reacting to changes in your AWS environment. It can monitor AWS
resources and custom metrics generated by your applications and services.
Topics
• Monitoring
• Centralizing logs
• Distributed tracing
• Log analysis on AWS
• Other options for analysis
Monitoring
CloudWatch offers system-wide visibility into resource utilization, application performance, and
operational health. In a microservices architecture, custom metrics monitoring via CloudWatch is
beneficial, as developers can choose which metrics to collect. Dynamic scaling can also be based on these
custom metrics.
CloudWatch Container Insights extends this functionality, automatically collecting metrics for many
resources like CPU, memory, disk, and network. It helps in diagnosing container-related issues,
streamlining resolution.
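A custom metric like those described above is published as a namespace plus metric data points. The sketch below only builds the payload that would be passed to CloudWatch's PutMetricData API (for example via boto3's `cloudwatch.put_metric_data`); the service and metric names are illustrative.

```python
# Payload shape for CloudWatch PutMetricData; the namespace, metric name, and
# "orders-service" dimension are illustrative. With boto3 this would be sent as:
#   boto3.client("cloudwatch").put_metric_data(Namespace=namespace, MetricData=metric_data)
namespace = "Microservices/OrdersService"
metric_data = [
    {
        "MetricName": "CheckoutLatency",
        "Dimensions": [{"Name": "Service", "Value": "orders-service"}],
        "Unit": "Milliseconds",
        "Value": 187.0,
    }
]

def validate(metric_data):
    # Minimal sanity check mirroring what the API requires of each data point.
    for point in metric_data:
        assert "MetricName" in point and "Value" in point
    return True

print(validate(metric_data))  # → True
```

A metric published this way can then back a CloudWatch alarm or a dynamic scaling policy, as the section notes.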
Centralizing logs
Logging is key to pinpointing and resolving issues. With microservices, you can release more frequently and
experiment with new features. AWS provides services like Amazon S3, CloudWatch Logs, and Amazon
OpenSearch Service to centralize log files. Amazon EC2 uses a daemon to send logs to CloudWatch,
while Lambda and Amazon ECS natively send their log output there. For Amazon EKS, either Fluent
Bit or Fluentd can be used to forward logs to CloudWatch for reporting using OpenSearch and Kibana.
However, due to its smaller footprint and performance advantages, Fluent Bit is recommended over
Fluentd.
Figure 12 illustrates how logs from various AWS services are directed to Amazon S3 and CloudWatch.
These centralized logs can be further analyzed using Amazon OpenSearch Service, inclusive of Kibana for
data visualization. Also, Amazon Athena can be employed for ad hoc queries against the logs stored in
Amazon S3.
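An ad hoc Athena query against logs in S3 is plain SQL. The sketch below only composes the query string; the table and column names are hypothetical and assume the logs have already been registered in a data catalog.

```python
# Hypothetical table and columns; real names depend on how your logs are cataloged.
table = "app_logs"

query = f"""
SELECT status_code, COUNT(*) AS requests
FROM {table}
WHERE status_code >= 500
GROUP BY status_code
ORDER BY requests DESC
""".strip()

# With boto3 this string would be passed to athena.start_query_execution;
# here we only build and inspect it.
print(query.splitlines()[0])  # → SELECT status_code, COUNT(*) AS requests
```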
Distributed tracing
Microservices often work together to handle requests. AWS X-Ray uses correlation IDs to track requests
across these services. X-Ray works with Amazon EC2, Amazon ECS, Lambda, and Elastic Beanstalk.
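The correlation-ID idea can be sketched without the X-Ray SDK: each inbound request either carries a trace header or gets a fresh one, and the same header is forwarded on downstream calls. The header name matches the one X-Ray propagates; the ID construction below follows X-Ray's documented `Root=1-<epoch hex>-<24 hex chars>` shape but is a simplified illustration, not the SDK's implementation.

```python
import os
import time

TRACE_HEADER = "X-Amzn-Trace-Id"  # the header X-Ray propagates between services

def new_trace_id():
    # Simplified trace ID: version "1", epoch seconds in hex, 96 random bits.
    return f"Root=1-{int(time.time()):08x}-{os.urandom(12).hex()}"

def handle_request(inbound_headers):
    # Reuse the caller's trace ID if present, otherwise start a new trace.
    trace_id = inbound_headers.get(TRACE_HEADER) or new_trace_id()
    outbound_headers = {TRACE_HEADER: trace_id}  # forwarded to downstream services
    return outbound_headers

first = handle_request({})        # edge service starts the trace
second = handle_request(first)    # downstream service reuses the same ID
print(first[TRACE_HEADER] == second[TRACE_HEADER])  # → True
```

Because every hop carries the same ID, a tracing backend can stitch the per-service segments into one end-to-end view of the request.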
Log analysis on AWS
AWS Distro for OpenTelemetry is part of the OpenTelemetry project and provides open-source APIs
and agents to gather distributed traces and metrics, improving your application monitoring. It sends
metrics and traces to multiple AWS and partner monitoring solutions. By collecting metadata from your
AWS resources, it aligns application performance with the underlying infrastructure data, accelerating
problem-solving. Plus, it's compatible with a variety of AWS services and can be used on-premises.
Other options for analysis
CloudWatch Logs can stream log entries to Amazon Kinesis Data Firehose, a service for delivering
real-time streaming data to destinations such as Amazon Redshift. Amazon QuickSight can then use
the data stored in Redshift for comprehensive analysis, reporting, and visualization.
Figure 15: Log analysis with Amazon Redshift and Amazon QuickSight
Moreover, when logs are stored in Amazon S3 buckets, the data can be loaded into services like
Amazon Redshift or Amazon EMR, a cloud-based big data platform, for thorough analysis of the
stored log data.
Using protocols and caching
Some key tools for managing chattiness are REST APIs, HTTP APIs and gRPC APIs. REST APIs offer
a range of advanced features such as API keys, per-client throttling, request validation, AWS WAF
integration, or private API endpoints. HTTP APIs are designed with minimal features and hence come at
a lower price. For more details on this topic and a list of core features that are available in REST APIs and
HTTP APIs, see Choosing between REST APIs and HTTP APIs.
Often, microservices use REST over HTTP for communication due to its widespread use. But in high-
volume situations, REST's overhead can cause performance issues, in part because a TCP handshake
is required for every new connection. In such cases, a gRPC API is a better choice. gRPC reduces
latency by allowing multiple requests over a single TCP connection. gRPC also supports bi-
directional streaming, allowing clients and servers to send and receive messages at the same time. This
leads to more efficient communication, especially for large or real-time data transfers.
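The connection-setup cost can be made concrete with back-of-the-envelope arithmetic. The model below is deliberately simplistic and the numbers are purely illustrative: it charges one TCP handshake round trip per new connection, so N requests over N fresh connections pay N extra round trips, while N requests multiplexed over one connection pay that cost once.

```python
def extra_handshake_time_ms(requests, rtt_ms, reuse_connection):
    # Illustrative model: one TCP handshake round trip per new connection.
    handshakes = 1 if reuse_connection else requests
    return handshakes * rtt_ms

# 100 requests at a 20 ms round-trip time:
per_request = extra_handshake_time_ms(100, 20, reuse_connection=False)  # fresh connection each time
multiplexed = extra_handshake_time_ms(100, 20, reuse_connection=True)   # one shared connection
print(per_request, multiplexed)  # → 2000 20
```

Real systems also benefit from HTTP keep-alive, TLS session reuse, and HTTP/2 stream multiplexing, so treat this only as intuition for why connection reuse matters at high request volumes.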
If chattiness persists despite choosing the right API type, it may be necessary to reevaluate your
microservices architecture. Consolidating services or revising your domain model could reduce chattiness
and improve efficiency.
Auditing
In a microservices architecture, it's crucial to have visibility into user actions across all services. AWS
provides tools like AWS CloudTrail, which logs all API calls made in AWS, and Amazon CloudWatch, which
is used to capture application logs. This allows you to track changes and analyze behavior across your
microservices. Amazon EventBridge can react to system changes quickly, notifying the right people or
even automatically starting workflows to resolve issues.
For instance, if an API Gateway configuration in a microservice is altered to accept inbound HTTP traffic
instead of only HTTPS requests, a predefined AWS Config rule can detect this security violation, log
the change for auditing, and trigger an SNS notification; an automated remediation can then restore the
compliant state.
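This reaction path is typically wired up with an event pattern. The sketch below builds an EventBridge pattern intended to match AWS Config compliance-change events; the `source` and `detail-type` strings follow AWS Config's published event format, but treat the exact shape as an assumption to verify against the EventBridge documentation for your rule.

```python
import json

# Event pattern matching AWS Config rule compliance changes. With boto3 this
# could be passed to events.put_rule(EventPattern=json.dumps(pattern)).
pattern = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {"newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]}},
}

print(json.dumps(pattern, indent=2))
```

A rule with this pattern can target an SNS topic for notification or an automation workflow that restores the compliant configuration.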
Conclusion
Microservices architecture, a versatile design approach that provides an alternative to traditional
monolithic systems, assists in scaling applications, boosting development speed, and fostering
organizational growth. With its adaptability, it can be implemented using containers, serverless
approaches, or a blend of the two, tailoring to specific needs.
However, it's not a one-size-fits-all solution. Each use case requires meticulous evaluation given
the potential increase in architectural complexity and operational demands. But when approached
strategically, the benefits of microservices can significantly outweigh these challenges. The key is in
proactive planning, especially in areas of observability, security, and change management.
It's also important to note that beyond microservices, there are entirely different architectural
approaches, such as Generative AI architectures like Retrieval Augmented Generation (RAG), providing a
range of options to best fit your needs.
AWS, with its robust suite of managed services, empowers teams to build efficient microservices
architectures and effectively minimize complexity. This whitepaper has aimed to guide you through
the relevant AWS services and the implementation of key patterns. The goal is to equip you with the
knowledge to harness the power of microservices on AWS, enabling you to capitalize on their benefits
and transform your application development journey.
Contributors
The following individuals and organizations contributed to this document:
Document history
To be notified about updates to this whitepaper, subscribe to the RSS feed.
Major update (July 31, 2023): Added information about the AWS Customer Carbon Footprint Tool,
Amazon EventBridge, AWS AppSync (GraphQL), AWS Lambda Layers, Lambda SnapStart, Large
Language Models (LLMs), Amazon Managed Streaming for Apache Kafka (MSK), Amazon Managed
Workflows for Apache Airflow (MWAA), Amazon VPC Lattice, and AWS AppConfig. Added a separate
section on cost optimization and sustainability.
Minor updates (April 30, 2021): Adjusted page layout.
Note
To subscribe to RSS updates, you must have an RSS plug-in enabled for the browser you are
using.
Notices
Customers are responsible for making their own independent assessment of the information in this
document. This document: (a) is for informational purposes only, (b) represents current AWS product
offerings and practices, which are subject to change without notice, and (c) does not create any
commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services
are provided “as is” without warranties, representations, or conditions of any kind, whether express or
implied. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements,
and this document is not part of, nor does it modify, any agreement between AWS and its customers.
AWS Glossary
For the latest AWS terminology, see the AWS glossary in the AWS Glossary Reference.
Cloud Practitioner (CLF-C02) Preparation Plan
Approach the exam day with confidence
The purpose of this article is to recommend the steps you can follow to get ready for the AWS Certified
Cloud Practitioner exam day.
Introduction
The AWS Certified Cloud Practitioner (CLF-C02) exam is intended for individuals who can effectively
demonstrate overall knowledge of the AWS Cloud, independent of a specific job role. The exam validates a
candidate’s ability to:
• Explain the value of the AWS Cloud.
• Understand and explain the AWS shared responsibility model.
• Understand security best practices.
• Understand AWS Cloud costs, economics, and billing practices.
• Describe and position the core AWS services, including compute, network, database, and storage
services.
• Identify AWS services for common use cases.
Prepare a schedule
To be efficient and effective with studying, you need to optimize the conditions under which you focus
best; this can include both the time of day that you study, as well the environment in which you study. Pay
attention when studying at different times and in different environments to figure out when and where
you are the most productive, and study under the conditions that work best for you.
Testing format
AWS certification dumps aren't an effective tool when taking your exam. This is because each time a student
takes the exam, the questions they answer are picked at random from a pool of over 500 questions.
Even if you could find a dump that covers all of them, you'd still have to memorize over 500 possible
answers.
There is no true substitute for experience. AWS recommends that you have specific hands-on experience
that covers the competencies, domains, and objectives in the content outline for each exam.
In the exam
A 30-minute exam extension is available upon request to non-native English speakers when taking an exam
in English. The accommodation, “ESL +30,” only needs to be requested once, prior to registering for an
exam. It will apply to all future exam registrations with all test delivery providers.
Preparation method
The advices about the preparation methods for the certification exam here is true and can be quite
effective for many individuals.
Maintaining motivation throughout your exam preparation is crucial. Certification exams can be
challenging, and having the drive to keep learning and studying will help you stay on track and achieve
better results.
Engaging in discussions with study partners can indeed help you understand exam objectives better.
Through conversations, you can gain new insights, uncover knowledge gaps, and reinforce your
understanding of exam topics.
Studying with others can be beneficial as it allows for collaborative learning. You can discuss concepts,
share different perspectives, and clarify doubts together. Additionally, group study can help keep you
accountable to a study schedule. Studying with a small group of like-minded individuals who are also
motivated to excel in the exam can ensure focused and productive discussions. This way, you can avoid
distractions and maintain a study environment that fosters learning.
Practicing with sample questions and exams is an excellent way to assess your knowledge and readiness for
the actual certification exam. Working on questions selected by someone else can expose you to different
question styles and topics, preparing you for a broader range of challenges.
While these methods can be highly effective for some learners, it's essential to recognize that different
individuals have varying learning styles. Some people prefer self-study and independent learning, while
others thrive in group settings. It's essential to experiment with different methods and find what works
best for you personally.
Remember that exam preparation should also include a balanced approach that covers understanding
concepts, hands-on practice with AWS services (if applicable), reviewing official documentation, and using
reputable study materials. Additionally, setting clear goals and a study schedule can help you stay
organized and focused during your exam preparation journey.
In the ever-evolving landscape of cloud technology, staying up-to-date with the latest tools and techniques
is crucial. Whether you’re an experienced developer, a cloud newbie, or someone looking to pivot into the
tech industry, acquiring the right skills can set you apart. That’s where the AWS Skill Builder comes into
play—a comprehensive learning platform designed to help you master Amazon Web Services (AWS).
AWS Skill Builder is a repository of over 700 training lessons that help you learn AWS, refine your
knowledge of AWS services, and improve your skills so you can put them into practice or apply them
during the many AWS certifications.
Free digital training on AWS Skill Builder offers 700+ on-demand courses and learning plans so you can
build the skills you need, your way. Want to build problem-solving cloud skills in an interactive, engaging
experience? A Skill Builder subscription offers access to self-paced labs, practice exams, role-based games,
and real-world challenges to accelerate your learning.
Log into AWS Skill Builder. Some of the materials referenced below require digital subscriptions.
The first step is getting to know the exam and exam-style questions.
1. Review the exam details page and exam guide (linked from exam details page) to understand who
should take the exam and what is tested on the exam.
The questions are created following the same process as questions you will see on the actual Certification
exams. They include detailed feedback and recommended resources to help you prepare for your exam.
2. Take the Exam Prep Enhanced Course: AWS Certified Cloud Practitioner (CLF-C02) to understand what
is tested on the exam and to review exam-style questions.
3. Take the AWS Certified Cloud Practitioner Official Practice Exam (CLF-C02 - English).
Each practice exam includes the same number of questions as the actual exam. They provide practice with
the same question style, depth, and rigor as the certification exam. They include exam-style scoring and a
pass/fail. You’ll also receive feedback on the answer choices for each question with recommended
resources to deepen your understanding of key topics.
You can determine if you want to simulate the exam experience by taking a timed exam with answers only
shown at the end. Or, you can choose other options, like un-timed, and with answers shown after
submitting each question.
Migrate and Optimize: Your Cloud Adoption Roadmap
Introduction
Your Azure migration journey is fundamental to optimizing your workloads and costs. To help you build
your specific migration and optimization roadmap, Microsoft offers proven guidance, tools, and
frameworks to guide you from initial strategy, planning, readiness, and migration to ongoing innovation,
management, and organizational alignment.
This roadmap guide provides a catalog of resources to assist both customers who have signed a
Microsoft Unified Agreement (Unified Contract customers) and those using unmanaged disks
(Unmanaged customers) in their cloud migration and workload optimization efforts at any stage in
their journey. It includes all relevant resources and channels of support available for each step and
phase of the journey. Some are self-serve, while Microsoft and its partners assist with others.
Once you've successfully migrated to the cloud, you can begin turning ideas into game-changing
business impacts by taking advantage of Azure tools and resources to help you:
1 Accelerate innovation and go to market faster with differentiating services and experiences.
2 Streamline productivity and control with a unified platform that simplifies complex IT management.
This end-to-end migration roadmap has four phases. Each phase involves multiple action steps.
Unmanaged customers can access Azure customer enablement resources, a free library of online
materials, tools, and resources designed to help customers get started, build, deploy applications, and
optimize workloads using Azure services. These resources are available to all Azure customers,
including documentation, training videos, webinars, forums, and other self-paced learning materials.
The migration journey roadmap
Plan and implement an adoption journey that's tailored to your needs.
Develop your cloud migration strategy: Define a clear path forward with proven guidance, best
practices, programs, and learning resources.
Optimize and improve your workloads: Keep your workloads optimized using best practices from the
operational excellence pillar of the Well-Architected Framework.
01 Develop a cloud migration strategy
Cloud adoption is a means to an end. It begins when business and IT decision-makers realize the cloud
can accelerate specific business transformation goals. A solid cloud strategy sets your initiative on the
right footing to build your business and technical migration plan against well-defined expectations and
business outcomes, taking into consideration the trade-offs and your initial migration approach.
Microsoft helps you define a clear path forward with proven guidance. With a collection of best
practices, programs, and learning resources, you can build expertise, create your cloud migration
strategy, and achieve ongoing cloud value through workload optimization.
Understand your business motivations
Understanding and evaluating your motivation for moving to the cloud contributes to a successful
business outcome. Many organizations want to save cost, scale growth, and onboard new technical
capabilities; several motivations can apply simultaneously. They help you pinpoint your strategic
migration goals and shape decisions your cloud adoption and workload optimization team may make
in the future. Your team should meet with stakeholders, executives, and business leaders to discuss
which motivations are driving your business's cloud adoption.
Migrate or modernize?
Each workload will require the decision of whether to migrate (rehost) or modernize (re-platform)
your existing application. The answer will likely depend on the type of application or workload and
your business goals for moving it to the cloud.
It's generally faster and less expensive to migrate an existing application, but that adoption approach
doesn't take advantage of opportunities to innovate in the cloud. Consider a migration approach if the
source code is likely to remain stable and the workload currently supports business processes and will
continue to do so.
Resources for Unified Contract customers
Onboard: SQL Server Migration Readiness
Resources for all customers
Get started with the Cloud Adoption Framework for Azure
Understand cloud operating models
Cloud migration in the Cloud Adoption Framework
Migration scenarios
Asset-driven approach: A plan based on the assets that support an application for migration. In this
approach, you pull statistical usage data from a configuration management database (CMDB) or other
infrastructure assessment tools. This approach usually assumes an IaaS model of deployment as a
baseline.
2 Organizational alignment
Cloud adoption impacts all aspects of your business, IT, and corporate culture. Use this
guidance to get your people ready for cloud adoption with a cloud adoption and cloud
governance team.
03
Implement your migration
When it’s time to deploy your cloud workload,
the Azure guidance continues with ready-made,
infrastructure-as-code environments for hosting
your workloads, called Azure landing zones.
These conceptual architectures are hosting
environments pre-provisioned with foundational
capabilities that account for scale, security
governance, networking, and identity.
Reliability: When you build for reliability in the cloud, you help ensure a highly available architecture
as well as recovery from failures such as data loss, major downtime, or ransomware incidents.
[Diagram: the Azure Well-Architected Review process generates Well-Architected recommendations
across the framework's pillars (reliability, cost optimization, operational excellence, performance
efficiency, and security), drawing on Azure Advisor, the Azure Well-Architected Framework,
Architecture Center documentation, and partner support and service offers.]
04 Continuously optimize and improve your workloads
To keep your workloads performing and optimized for your purposes after deployment, the
Well-Architected Framework operational excellence pillar offers best practices for monitoring and
diagnostics.
Know the potential impact of your workload design decisions by using the Azure Well-Architected
Framework Review assessment to get recommendations for optimizing your workload types, such as
IoT, SAP, data services, or machine learning.
The tool generates recommendations through a guided assessment and can pull in Azure Advisor
recommendations based on your Azure subscription or resource group. The review helps establish a
baseline across the Well-Architected pillars to monitor improvements.
The operational excellence pillar of the Well-Architected Framework covers the processes for reliable
deployment and keeping your workloads running predictably in production.
Conclusion
Microsoft guidance and learning resources help you maximize
your cloud investments to achieve continuous innovation
and workload optimization using advanced technologies like
AI, data analytics, machine learning, and more. Following a
successful migration, you can rapidly develop new ways
of driving business growth in your unified and secure
cloud environments.
Disclaimer
This document is provided “as-is.” Information and views expressed in this document, including URL and other Internet Web site references, may change without
notice. You bear the risk of using it. Examples herein may be for illustration only and if so are fictitious. No real association is intended or inferred.
This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your
internal, reference purposes.
This document is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.
Microsoft assumes no responsibility for any errors, omissions, or inaccuracies in this document or for any actions taken by the reader based on the information
provided. The information provided in this document should not be considered as legal, financial, or professional advice, and should not be relied upon as such.
Cloud Computing
Practical Topics – Platforms, Applications
and Best Practices
Stuttgart University
WS 2023/24
07-11-2023
Recap
A Definition of Cloud Computing
National Institute of Standards 2011
http://csrc.nist.gov/groups/SNS/cloud-computing/
New digital services require new infrastructure, architectures,
processes and tools
[Diagram: hybrid clouds connect systems of record and systems of engagement, with containers in between.]
A Short Tour of a Commercial
Cloud (AWS)
Source: Awsome Day 2019 Detroit Slideshare
Source: Awsome Day Detroit Slideshare, early 2019
Amazon’s focus on ‘builders’ and business
• “The broadest and deepest platform for today’s builders” (W. Vogels, CTO, AWS
Summit NYC July 2019)
– ”We provide a toolbox, you pick”
– Using most functions themselves to run and grow their own digital business
• “Enabling digital transformation” (A. Jassy, then AWS CEO, 2020)
• These messages were reinforced in the following years
– Builder Community Hub 2023
• Agility, DevOps pipeline, operational principles
– ‘Well-architected Framework’
• Toolkits, IDEs, Integration with e.g. Visual Studio
• Microservices, containers, serverless across the stack
– “You write the business logic, we do the heavy (infrastructure) lifting”
• Automation
• Security
• AI/Machine Learning (first separate keynote in 2020)
• Significant and growing investment in IoT
Some Basic Amazon (IaaS) Services
• EC2 – Elastic Compute Cloud,
– Consists of virtual machines (called EC2 instances) launched from
Amazon Machine Images (AMIs)
• AMI – Amazon Machine Image
– Templates for building EC2 instances
• VPC - Virtual Private Cloud
– Private network that isolates your resources
• EBS – Elastic Block Store
– Persistent Storage Volumes that can be attached to an EC2 instance
• S3 – Simple Storage Service
– Object Store Service
• These services plus Relational Database Services are
sometimes called Foundational Services by AWS.
AWS Learner Lab
Cloud Services in the Learner Lab
• Most AWS services are available, sometimes with capacity
restrictions (see documentation)
• IAM use is limited: use the preconfigured role (LabRole) rather than creating new roles
• You can create your own key pairs, but a digital key (vockey) is
also already available in the preconfigured terminal.
• The Learner Lab comes with $100 in credits, so stop or terminate your resources when you are no
longer using them. Some resources (like EC2 instances) are automatically restarted when starting a lab.
A Short Tour of AWS Services
• The AWS Console
• ‘The biggest toolbox for builders’ - AWS Services
• Some important examples of AWS services
– Virtual Machines (EC2)
– Object Storage (S3)
– NoSQL Database (DynamoDB)
– Simple Notifications (SNS)
– Serverless Functions (Lambda)
• The Market Place
Other clouds are similar; we will introduce them later!
Hosting a Simple Web Application
• The security group controls the movement of data between your AWS resources
and the big, bad internet beyond
• The EC2 Amazon Machine Image (AMI) acts as a template for replicating precise
OS environments
• The Simple Storage Service (S3) bucket stores and delivers data for both backup
and delivery to users
David Clinton. Learn Amazon Web Services in a Month of Lunches (Kindle Locations
502-506). Manning Publications. Kindle Edition.
Storage and Databases
• Object Storage – S3
– Objects have urls, are stored in ‘buckets’
– Objects can be versioned
– Objects can be arranged in ‘folders’
– Buckets can host static websites
• Block Storage – EBS
– Virtual Disks for Virtual Machines
• Relational Database Services (RDS)
– Managed service with many options
• NoSQL DB (DynamoDB)
– Managed Service
– Key Value Store (Tables)
• Streams - Kinesis
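The "objects have URLs" point in the list above can be made concrete: a virtual-hosted-style S3 object URL is derived from the bucket name, Region, and object key. The bucket and key below are illustrative.

```python
from urllib.parse import quote

def s3_object_url(bucket, region, key):
    # Virtual-hosted-style URL format documented for Amazon S3.
    # quote() percent-encodes special characters in the key but keeps "/".
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"

url = s3_object_url("my-demo-bucket", "eu-central-1", "photos/cat.jpg")
print(url)  # → https://my-demo-bucket.s3.eu-central-1.amazonaws.com/photos/cat.jpg
```

Note that the URL only resolves to the object's content if the bucket policy or a presigned request permits access.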
Cloud Networking
• A VPC can span availability zones
• A subnet is restricted to a single one
• Some services require 2 subnets
in different zones for enhanced
availability
• A route table determines where
traffic is routed
• Internet gateways and virtual private
gateways
• Security groups control traffic
(firewall rules)
A VPC lets you provision a logically isolated section of the Amazon Web Services
(AWS) cloud where you can launch AWS resources in a virtual network that you
define
Availability, Scaling, Security
Integration Services
• Simple Notification Service – SNS
– Topic based publish and subscribe service
– Many services are able to publish or subscribe to a topic
• Simple Queuing Service – SQS
– At-least-once delivery
• Step Functions
– Basic Workflow
• Lambda Functions
– Trigger-based event processing
– Serverless computing (we will talk more about this)
– More than 140 integrations
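The topic-based publish/subscribe model that SNS provides (listed above) can be sketched in plain Python. This is an in-process analogy of the pattern, not the SNS API.

```python
# In-process stand-in for topic-based publish/subscribe as provided by Amazon SNS.
class Topic:
    def __init__(self, name):
        self.name = name
        self.subscribers = []

    def subscribe(self, callback):
        # In SNS, subscribers can be SQS queues, Lambda functions, HTTP endpoints, etc.
        self.subscribers.append(callback)

    def publish(self, message):
        # Every subscriber receives a copy of the message (fan-out).
        for callback in self.subscribers:
            callback(message)

received = []
orders = Topic("orders")
orders.subscribe(lambda m: received.append(("email-service", m)))
orders.subscribe(lambda m: received.append(("billing-service", m)))
orders.publish("order-42-created")
print(received)
```

The publisher knows only the topic, not the subscribers, which is what lets many services react to the same event independently.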
A Modern Cloud Application
(AWS)
[Diagram: an API-driven application combining events, dynamic content (DynamoDB), static content
(S3), and longer-running processes, with monitoring from the start (CloudWatch) and different styles
of compute (but always 'small' services).]
Exercise
• Activate the AWS Academy Learner Lab
• Familiarize yourself with the console
• Launch an EC2 instance and connect to it
• Reboot, stop and terminate the instance
• Create an S3 bucket and upload an object
Cloud Design Principles and Best
Practices Frameworks
Designing Successful Cloud
Services
• Cloud Architecture Styles
• Cloud Native Applications
• 12 Factor App
• Azure Cloud Design Guidelines
• Amazon Well Architected Framework
• Google Tips for Building Reliable Services
Choices for Building a Cloud
Application
The Spectrum of Cloud Services
Cloud Enabled Cloud Centric/Cloud Native
Workloads Workloads
Scalable Elastic
Virtualized/Containerized Microservices/Containerized/Functions
23
Cloud Service Level Agreements (SLAs)
Example: AWS
https://aws.amazon.com/legal/service-level-agreements/
• For EC2, “AWS will use commercially reasonable efforts to make Amazon
EC2 available for each AWS region with a Monthly Uptime Percentage of at
least 99.99%, in each case during any monthly billing cycle (the “Region-
Level SLA”). In the event Amazon EC2 does not meet the Region-Level SLA,
you will be eligible to receive a Service Credit as described below.”
– Less than 4.5 minutes downtime (of service) per month
• For an individual EC2 instance, the service commitment is 99.5% hourly
uptime (Last Update May 2022).
– Less than 3.65 hours downtime per month
• For S3, the service commitment is 99.9%
• For Container Services, a credit of 10% is given if availability is less than
99.99% but at least 99.0%
• For higher level services, availability is often 99.9%
– Less than 45 minutes downtime per month
24
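The downtime figures above follow directly from the uptime percentages; a quick sketch of the arithmetic, using an average month of 365/12 days:

```python
def max_downtime_minutes(uptime_pct, minutes_per_month=365 * 24 * 60 / 12):
    """Maximum allowed downtime per month for a given uptime percentage."""
    return minutes_per_month * (1 - uptime_pct / 100)

# 99.99% region-level SLA: about 4.4 minutes per month
print(f"{max_downtime_minutes(99.99):.2f} min")
# 99.5% instance-level SLA: about 3.65 hours per month
print(f"{max_downtime_minutes(99.5) / 60:.2f} h")
# 99.9% for higher-level services: under 45 minutes per month
print(f"{max_downtime_minutes(99.9):.1f} min")
```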
A Modern Cloud Application
(AWS)
Events
Dynamic Content
DynamoDB
Longer-Running
Processes
VMs
Containers
S3
Static Content
API-driven
Monitoring from the start (CloudWatch)
Different styles of Compute (but always ‘small’ services)
25
Cloud Architecture Styles
(from: Microsoft Cloud Application Architecture Guide)
• N-Tier
• Web-Queue-Worker
• Microservices
• CQRS – Command and Query Responsibility Segregation
• Event driven
• Big data
• Big compute
26
Cloud Architecture Styles
N-Tier:
• Traditional Enterprise Architecture
• Separation of concerns into layers (e.g. presentation, logic, data)
• Often rigid and monolithic, first step in migration to cloud native
• Usually based on virtual machines (like the web server example)
• Containerize for movement across environments
Web-Queue-Worker:
• Frontend handling client requests
• Message queue to backend (asynchronous)
• Backend handles heavy load (can be a coordinator-worker configuration)
27
Web-Queue-Worker Environment
Example: Amazon Elastic Beanstalk
28
AWS Elastic Beanstalk
29
Cloud Architecture Styles
Microservices:
• Evolution of service-oriented architectures
• Can be next step from n-tier ‘monoliths’
• Break up large applications and containerize, enable parallel
development, scaling and separation of concerns
• Use an API gateway to decouple clients from the services themselves (the
gateway routes API requests to the appropriate service). Avoid tight coupling
of services. Store data with the service. Keep domain knowledge out of the
gateway
• My view: Microservices need to publish contracts or SLAs
• https://xebialabs.com/assets/files/whitepapers/exploring-microservices-questions.pdf
30
• The microservices perform distinct business functions
• ‘Enabling services’ support communication between microservices
31
Note: The left side shows something similar to the WordPress service we built. The
right-hand side separates out the various backends and allows for distinctive services
from third parties (like Shopify for e-commerce). Also separate scaling properties etc.
Cloud Architecture Styles
CQRS – Command and Query Responsibility Segregation:
• Read and write workloads are often asymmetrical, with very different
performance and scale requirements.
• CQRS separates reads and writes into separate models, using commands to
update data, and queries to read data.
• Commands should be task based, rather than data centric.
– (“Book hotel room,” not “set ReservationStatus to Reserved.”)
• Commands may be placed on a queue for asynchronous processing, rather than
being processed synchronously.
• Queries never modify the database. A query returns a Data Transfer Object
(DTO) that does not encapsulate any domain knowledge
• Allows independent scaling. Really requires messaging. Can be useful in
microservices, where no direct access to another service’s data store is allowed.
• Example: e-Commerce (Catalog vs. Transaction). Read Replicas enable this style.
32
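A minimal sketch of the CQRS split described above, with hypothetical names (BookHotelRoom, write_store, read_model): commands are queued and update the write model, while queries only read a denormalized DTO.

```python
from dataclasses import dataclass
from collections import deque

@dataclass
class BookHotelRoom:          # task-based command, not "set ReservationStatus"
    reservation_id: str
    guest: str

command_queue = deque()       # commands processed asynchronously
write_store = {}              # authoritative write model
read_model = {}               # denormalized DTOs served to queries

def handle(cmd: BookHotelRoom):
    write_store[cmd.reservation_id] = {"guest": cmd.guest, "status": "Reserved"}
    # Propagate to the read model (in a real system via events/replication).
    read_model[cmd.reservation_id] = {"guest": cmd.guest, "status": "Reserved"}

def query(reservation_id):
    """Queries never modify data; they return a plain DTO copy."""
    return dict(read_model.get(reservation_id, {}))

command_queue.append(BookHotelRoom("r1", "Alice"))
while command_queue:
    handle(command_queue.popleft())
print(query("r1"))  # {'guest': 'Alice', 'status': 'Reserved'}
```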
Read replicas for read-heavy databases
• Replication to read replicas is done asynchronously
• Read replicas can be promoted to master, for instance in case the source instance fails
Example: AWS
33
Cloud Architecture Styles
Event driven:
• Pub/sub or streaming.
• Decoupling of producers and consumers. Real-time processing,
high volume/high velocity, pattern matching and complex events
• Challenges: Guaranteed delivery, preservation of order
• Examples: Function as a Service (e.g. Lambda), Kinesis Streams,
Kafka
34
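The decoupling of producers and consumers can be sketched as an in-memory topic-based publish/subscribe, the role SNS, Kinesis, or Kafka plays in practice (topic and handler names are illustrative):

```python
from collections import defaultdict

# Producers and consumers never reference each other, only the topic name.
subscribers = defaultdict(list)

def subscribe(topic, handler):
    subscribers[topic].append(handler)

def publish(topic, event):
    # Deliver the event to every consumer subscribed to the topic.
    for handler in subscribers[topic]:
        handler(event)

received = []
subscribe("orders", lambda e: received.append(("billing", e)))
subscribe("orders", lambda e: received.append(("shipping", e)))
publish("orders", {"order_id": 42})
print(received)  # both consumers saw the event; the producer knows neither
```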
Event-Driven Architectures
Source: AWS
35
Cloud Architecture Styles
Big data:
• Too big to be handled by traditional databases
• Performance through parallelism (sharding of data,
coordinator/worker pattern)
• Distributed data store
• Partitioned data
• Use cases: Batch, Machine Learning (Model Training), Complex
Analytics
36
Cloud Architecture Styles
37
Example: Map-Reduce
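The map-reduce pattern named above can be sketched in a few lines: map workers emit key/value pairs per input shard, a shuffle groups them by key, and reducers aggregate. A toy word count, not a distributed implementation:

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    # Each worker emits (word, 1) pairs for its shard of the input.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Group all values by key, as the framework does between the phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

shards = ["big data big", "compute big"]
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, shards))))
print(counts)  # {'big': 3, 'data': 1, 'compute': 1}
```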
Cloud Native Architectures
• Cloud Native Foundation
• Characteristics of Cloud Native Services
• Eventual Consistency
38
Cloud Native Definition
(Cloud Native Foundation)
• Cloud native technologies empower organizations to build and run
scalable applications in modern, dynamic environments such as public,
private, and hybrid clouds. Containers, service meshes, microservices,
immutable infrastructure, and declarative APIs exemplify this approach.
• These techniques enable loosely coupled systems that are resilient,
manageable, and observable. Combined with robust automation, they
allow engineers to make high-impact changes frequently and predictably
with minimal toil.
• The Cloud Native Computing Foundation seeks to drive adoption of this
paradigm by fostering and sustaining an ecosystem of open source,
vendor-neutral projects. We democratize state-of-the-art patterns to make
these innovations accessible for everyone.
https://github.com/cncf/toc/blob/master/DEFINITION.md
39
Definition starts with a goal, and gives examples of technologies to achieve the goal
Cloud Native Foundation
https://www.cncf.io/
• Vendor neutral open source software foundation
• Fosters collaboration between developers, end-users and vendors
• Runs software projects, certification and education programs
• 18 platinum members, including Amazon, IBM, Google, Red Hat,
Microsoft, Alibaba, Huawei, Oracle, Cisco, Intel, SAP…
• Hundreds of other members
• 6 graduated projects, including:
– Kubernetes (Container Management)
– Prometheus (Monitoring)
– Envoy (Service Proxy)
– fluentd (logging)
• About 20 incubating projects, including Helm (Packaging), Jaeger
(Distributed Tracing), NATS (Messaging)
40
41
https://github.com/cncf/landscape/blob/master/README.md#trail-map
Cloud native services are loosely coupled and
eventually consistent
• The concept of loose coupling goes back to principles for building robust
distributed systems
– Services are assumed to be autonomous
– No tight dependencies (if one service fails, others can still operate)
– Services communicate through clearly defined interfaces, preferably asynchronously
– Everything is a service
§ A service will invoke many other (distributed) services
§ Resources are cheap, plentiful, and not very reliable (commodity parts, where
possible)
§ Don’t try to prevent failure – fail fast and recover
§ Don’t debug and repair – kill and restart
§ Single points of failure are to be avoided through replication of resources
(services)
– Data is always stored in multiple copies
42
Multiple copies that are not updated synchronously imply the risk of accessing a copy
that is not up to date.
Cloud native services are loosely coupled and
eventually consistent
§ Eventual Consistency for data handling & replication: sometimes the data
storage service or database service will return the wrong answer, but eventually
the answer is going to be correct
§ This is due to replication for high availability
§ Message queues will deliver messages at least once (but can deliver multiple
times)
§ Scale is achieved through parallel asynchronous execution
§ Avoid synchronization overhead, stateless execution servers (store state/session
information outside the application)
§ Applications need to tolerate redundant execution (idempotency)
43
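Idempotency under at-least-once delivery can be sketched as a consumer that deduplicates on a unique message ID (names are illustrative; in practice the ID set lives in a durable store):

```python
processed = set()       # seen message IDs; durable storage in a real system
balance = {"acct": 0}

def credit(message):
    """At-least-once delivery can re-deliver a message; deduplicating on a
    unique message ID makes redundant execution a no-op."""
    if message["id"] in processed:
        return
    processed.add(message["id"])
    balance["acct"] += message["amount"]

msg = {"id": "m-1", "amount": 100}
credit(msg)
credit(msg)                  # duplicate delivery has no extra effect
print(balance["acct"])       # 100
```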
Eventual Consistency
• Eric Brewer’s CAP Theorem
– Of 3 properties of a shared data system (consistency, availability, tolerance to
network partitioning/failure), only 2 can be achieved simultaneously
• Strategies for availability all depend on data replication to multiple copies
– Quorum approaches with N= Number of Replicas, R = Read Quorum, W= Write
Quorum guarantees consistency if R + W > N (overlap of read and write sets)
– Systems focusing on fault tolerance often use N=3, W=R=2
• Other requirements (e.g. high load) require large N. If few writes, often R=1 to ensure a
read is available if at least one node operates
• To minimize likelihood of lost writes, choose W>1
• Very large distributed systems have to live with network partitioning
• If read and write set don’t overlap, we cannot achieve strong consistency, but this is
often combined with a ‘lazy’ update approach to eventually update all nodes
– Good example: Shopping cart
– Amazon shopping cart prioritizes availability for write
• Other considerations: Failure detection
44
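The quorum condition above reduces to a one-line check; the assertions contrast the common fault-tolerant configuration with a read-optimized one that gives up strong consistency:

```python
def is_strongly_consistent(n, r, w):
    """R + W > N guarantees the read and write sets overlap on at least
    one replica, so a read always sees the latest write."""
    return r + w > n

# Common fault-tolerant configuration: N=3, W=R=2
assert is_strongly_consistent(n=3, r=2, w=2)

# Read-optimized R=1: a read needs only one live node, but with W=2 and N=3
# the sets need not overlap, so only eventual consistency is possible.
assert not is_strongly_consistent(n=3, r=1, w=2)
print("N=3, R=2, W=2 is consistent; N=3, R=1, W=2 is only eventually consistent")
```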
From Vogels: He presented the CAP theorem, which states that of three properties
of shared-data systems (data consistency, system availability, and tolerance to
network partition) only two can be achieved at any given time. A more formal
confirmation can be found in a 2002 paper by Seth Gilbert and Nancy Lynch.
Suggested Reading
• Werner Vogels, Eventually consistent, Comm.
ACM, Jan 2009
45
Examples of eventual consistency
46
Cloud native services move from ACID
to BASE
• ACID – predictable and accurate
– Atomic
– Consistent
– Isolated
– Durable
47
Characteristics of modern applications –
it’s not just the technology that counts
https://aws.amazon.com/modern-apps/
Built for change, in small (co-operating) parts. Think not just about the app, but also
its lifecycle. Offload pain as much as possible.
48
A Modern Application in AWS
Events
Dynamic Content
DynamoDB
Longer-Running
Processes
S3
Static Content
API-driven
Monitoring from the start (CloudWatch)
Different styles of Compute (but always ‘small’ services)
49
Designing Successful Cloud
Services – Part 2
• Cloud Architecture Styles
• Cloud Native Applications
• 12 Factor App
• Azure Cloud Design Guidelines
• Amazon Well Architected Framework
• Google Tips for Building Reliable Services
51
The Twelve Factors for aaS Applications
(12factor.net, Adam Wiggins)
“The twelve-factor app is a methodology for building software-as-a-service apps that:
•Use declarative formats for setup automation, to minimize time and cost for new
developers joining the project;
•Have a clean contract with the underlying operating system, offering maximum portability
between execution environments;
•Are suitable for deployment on modern cloud platforms, obviating the need for servers
and systems administration;
•Minimize divergence between development and production, enabling continuous
deployment for maximum agility;
•And can scale up without significant changes to tooling, architecture, or development
practices.
The twelve-factor methodology can be applied to apps written in any programming
language, and which use any combination of backing services (database, queue, memory
cache, etc).”
(Quote from 12factor.net web site)
54
The Twelve Factors for aaS Applications
I. Codebase
One codebase tracked in revision control, many deploys
II. Dependencies
Explicitly declare and isolate dependencies
III. Config
Store config in the environment
IV. Backing Services
Treat backing services as attached resources
V. Build, release, run
Strictly separate build and run stages
VI. Processes
Execute the app as one or more stateless processes
VII. Port binding
Export services via port binding (no runtime injection, listening to requests on a port)
55
Backing Services:
attached resources, accessed via a URL or other locator/credentials stored in
the config.
Port Binding:
The twelve-factor app is completely self-contained and does not rely on runtime
injection of a webserver into the execution environment to create a web-facing
service. The web app exports HTTP as a service by binding to a port, and listening to
requests coming in on that port.
Processes:
Share-nothing, horizontally scalable
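Factor III (store config in the environment) can be sketched in a few lines; the variable names DATABASE_URL and LOG_LEVEL are illustrative, and the defaults stand in for a local development setup:

```python
import os

# Configuration lives in the environment, not in the code base, so the same
# build runs unchanged in development, staging, and production.
DATABASE_URL = os.environ.get("DATABASE_URL", "sqlite:///dev.db")
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")

print(DATABASE_URL, LOG_LEVEL)
```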
The Twelve Factors for aaS Applications…..
VIII. Concurrency
Scale out via the process model (share-nothing, horizontally partitionable)
IX. Disposability
Maximize robustness with fast startup and graceful shutdown
X. Dev/prod parity
Keep development, staging, and production as similar as possible
XI. Logs
Treat logs as event streams
XII. Admin processes
Run admin/management tasks as one-off processes
56
Concurrency:
Processes in the twelve-factor app take strong cues from the unix process model for
running service daemons. Using this model, the developer can architect their app to
handle diverse workloads by assigning each type of work to a process type.
The share-nothing, horizontally partitionable nature of twelve-factor app processes
means that adding more concurrency is a simple and reliable operation.
Azure Design Principles for Cloud Services
57
Azure Design Principles for Cloud Services
58
AWS Guiding Principles for ‘Well-Architected’ Cloud
Services
Stop guessing your capacity needs: Eliminate guessing about your infrastructure capacity needs. When
you make a capacity decision before you deploy a system, you might end up sitting on expensive idle
resources or dealing with the performance implications of limited capacity. With cloud computing, these
problems can go away. You can use as much or as little capacity as you need, and scale up and down
automatically.
Test systems at production scale: In the cloud, you can create a production-scale test environment on
demand, complete your testing, and then decommission the resources. Because you only pay for the test
environment when it's running, you can simulate your live environment for a fraction of the cost of testing on
premises.
Automate to make architectural experimentation easier: Automation allows you to create and replicate
your systems at low cost and avoid the expense of manual effort. You can track changes to your automation,
audit the impact, and revert to previous parameters when necessary.
Allow for evolutionary architectures: In a traditional environment,
architectural decisions are often implemented as static, one-time events, with a few major versions of a
system during its lifetime. As a business and its context continue to change, these initial decisions might
hinder the system's ability to deliver changing business requirements. In the cloud, the capability to automate
and test on demand lowers the risk of impact from design changes. This allows systems to evolve over time
so that businesses can take advantage of innovations as a standard practice.
Drive architectures using data: In the cloud you can collect data on how your architectural choices affect
the behavior of your workload. This lets you make fact-based decisions on how to improve your workload.
Your cloud infrastructure is code, so you can use that data to inform your architecture choices and
improvements over time.
Improve through game days: Test how your architecture and processes perform by regularly scheduling
game days to simulate events in production. This will help you understand where improvements can be made
and can help develop organizational experience in dealing with events.
Source: AWS ‘Well-Architected Framework’
59
The first principle has two elements: elasticity and being serverless
The Five Pillars of the Well-Architected
Framework
• Operational Excellence
• Security
• Reliability
• Performance Efficiency
• Cost Optimization
60
Design Principles for Operational Excellence
Continuous Improvement
Best Practices:
Design, Operate, Evolve
61
Design Principles for Security
Risk Assessment and Mitigation
• Implement a strong identity foundation (least privilege, separation of
duties, centralized privilege management, reduce long-term credentials)
• Enable traceability
• Apply security to all layers (defense in depth)
• Automate security best practices
• Protect data in transit and at rest (encryption)
• Keep people away from data (eliminate manual processing)
• Prepare for security events
Best Practices:
Identity and Access Management, Detective Controls, Infrastructure
Protection (Separation/Segmentation), Data Protection, Incident Response
62
Design Principles for Reliability
Failure Prevention and Recovery
Best Practices
Foundations, Change Management, Failure Management
63
Design Principles for Performance
Efficiency
• Democratize advanced technologies (by enabling them to be consumed as
a service)
• Go global in minutes (by deploying to multiple regions ‘at a click’)
• Use serverless architectures
• Experiment more often (enabled through virtualization and automation)
• Mechanical sympathy (use the technology approach that aligns best with
your goals)
Best Practices:
Selection, Review, Monitoring, Trade-offs
64
Design Principles for Cost Optimization
Continuous Refinement and Improvement
Best Practices:
Expenditure Awareness, Cost-Effective Resources, Matching supply and
demand, Optimizing over time
This is heavily interlaced with Amazon self-promotion. It also talks about some basic
elements of the cloud business model.
65
Further Resources for the Well-Architected
Framework
• https://aws.amazon.com/architecture/well-architected/
• 14 well-architected ‘lenses’ provide best practices for building
specific types of applications, e.g.
– Serverless applications
– Containers
– High Performance Computing (HPC)
– Internet of Things
• AWS Architecture Center
• Well-Architected labs https://www.wellarchitectedlabs.com/
66
Demo!
Well-Architected Tool
67
Building Reliable Services on the Cloud
Phillip Tischler with Steve McGhee and Shylaja Nukala:
Building Reliable Services on the Cloud.
Systematic Resilience for Sustained Reliability,
O’Reilly, 2022
68
Core Architecture Choices
69
Choice of Services
• Compute
– Use containers! Keep them small, start with serverless.
– Optimize for startup time, implement ready/live checks, terminate gracefully.
• Network
– Use provider's CDN, Load Balancers, private WANs, service meshes
• Storage
– Consider object stores, NoSQL databases, multi-regional database services
– Use Publish/Subscribe service to decouple readers/writers, improve retries
– Consider Batch (MapReduce/Flume) for high volumes of data
70
Failure Domains and Redundancy
71
Avoid Common Failure Modes
• Bad Changes
– Supervision/monitoring
– Progressive roll-out
– Automatic (tested) roll-back
– Infrastructure-as-Code
• Error Handling
– Soft dependencies, fail-safe alternatives
– Standardized error codes
– Distinguish between server overload, service overload and
quota exhaustion, prevent retry amplification
72
Multiple retries through server overload should lead to service overload designation
Avoid Common Failure Modes..
• Resource Exhaustion
– Avoid cascading failures
– Use cost modeling, quotas, load shedding, criticality
criteria
– Auto-scaling, caching
• Hotspots and Thundering Herds
– Gating to batch up equivalent requests
– Add jitter or random delays to cache expiration
• Don’t miss regular backups!
73
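The cache-expiration jitter mentioned above can be sketched in a few lines: spreading TTLs over a small random window keeps entries that were populated together from all expiring together (BASE_TTL and the 10% spread are illustrative):

```python
import random

BASE_TTL = 300  # seconds; illustrative base cache lifetime

def ttl_with_jitter(base=BASE_TTL, spread=0.1):
    """Spread cache expirations over +/-10% so entries filled at the same
    time do not expire at the same time (avoiding a thundering herd)."""
    return base * random.uniform(1 - spread, 1 + spread)

ttls = [ttl_with_jitter() for _ in range(5)]
print(ttls)  # five TTLs scattered around 300 s
```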
74
Transformation to Cloud using multiple concurrent approaches
… to minimize risk & cost while leveraging new & existing investments to innovate & differentiate
75
76
Cloud Security (AWS)
77
Security and Compliance as Shared
Responsibility between AWS and Customer
https://aws.amazon.com/compliance/shared-responsibility-model/
78
https://aws.amazon.com/compliance/
79
AWS Security Capabilities
• Network Security
– Built-in Firewalls
– Encryption in transit
– Private/dedicated connections
– Distributed Denial of Service (DDoS) mitigation (Amazon Shield)
• Inventory and Configuration Management
– Control and track changes
• Data Encryption
– Encryption capabilities
– Key management services
– Hardware-based cryptographic key storage
• Access and Control Management
– Identity and Access Management (IAM)
– Multi-factor authentication
– Directory integration and Federation
– Amazon Cognito User Management and Single Sign-On (SSO)
• Monitoring and Logging
80
Cognito: Authorization, Authentication & User Management for Mobile and Web
Amazon Inspector: Service performing security risk assessment and establishing best
practices. Hundreds of rules mapping to compliance standards
DDoS typically uses botnets to flood a website with traffic to deny service to
legitimate requests
AWS Shield Standard is free of charge and protects against the most common attacks;
Shield Advanced adds access to a response team.
Using Virtual Private Clouds (VPCs) to
secure resources
A VPC lets you provision a logically isolated section of the Amazon Web Services
(AWS) cloud where you can launch AWS resources in a virtual network that you
define
81
IAM - Configuring Access Rights
82
IAM Management Console
83
IAM User Groups
84
Policy Types
• Identity-based policies
• Resource-based policies
• Permissions boundaries
• Organization SCPs (Service Control Policies)
• Access control lists (ACLs)
• Session Policies
https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html
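As a sketch of the first policy type, an identity-based policy is a JSON document attached to a user, group, or role; the bucket name below is illustrative, and the statement grants read-only access to that one bucket:

```python
import json

# Illustrative identity-based policy: read-only access to a single S3 bucket.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-bucket",     # the bucket itself (ListBucket)
            "arn:aws:s3:::example-bucket/*",   # the objects in it (GetObject)
        ],
    }],
}
print(json.dumps(policy, indent=2))
```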
85
Containers and
Container Orchestration
Kubernetes, OpenShift, Operators, Service Mesh,
Security, Use cases
2023-11-16
Wolfram Richter
Manager Solution Architecture
Red Hat
1
Kubernetes Stabilizing since 2020
https://k8s.devstats.cncf.io/d/8/company-statistics-by-repository-group?orgId=1&var-period=y&var-metric=contributors&var-repogroup_name=All&var-repo_name=kubernetes&var-companies=All&from=1438380000000&to=1659391199000
Innovation Focus on the Surrounding Areas
https://all.devstats.cncf.io/d/1/activity-repository-groups?orgId=1&var-period=y&var-repogroups=All&from=1438380000000&to=1659391199000
CNCF Ecosystem Slide
4
Red Hat OpenShift | Functional Overview
Operations Services
Kubernetes
Linux OS
Automated Operations
Kubernetes
Edge Physical Virtual Private cloud Multi-Arch Public cloud Managed cloud (Azure, AWS, IBM, Google)
6
Let’s take a first peek...
https://developers.redhat.com/learn/openshift
7
Containers
Red Hat OpenShift | Functional Overview
Operations Services
Kubernetes
Linux OS
● Shipping-container example slide
Solution: the inter-modal shipping container
Container-Basics - Namespaces
● It’s a process!
CONTAINER
WHAT ARE CONTAINERS?
It Depends Who You Ask
INFRASTRUCTURE APPLICATIONS
14
Red Hat OpenShift Concepts
CONTAINER
15
Red Hat OpenShift Concepts
[Diagram: a CONTAINER is a running instance of a container IMAGE (binary + runtime)]
16
Red Hat OpenShift Concepts
[Diagram: container images are pushed to and pulled from a REGISTRY]
17
Red Hat OpenShift Concepts
IMAGE REGISTRY
myregistry/frontend myregistry/mongo
frontend:latest mongo:latest
frontend:2.0 mongo:3.7
frontend:1.1 mongo:3.6
frontend:1.0 mongo:3.4
18
From Hosts to VMs to Containers
[Diagram: three columns compare bare hosts (App1-App3 sharing one host operating system), virtual machines (apps on guest OSes above a hypervisor), and containers (apps scheduled by Kubernetes directly on the host operating system)]
22
Red Hat OpenShift | Functional Overview
Operations Services
Kubernetes
Linux OS
[Diagram: a POD (e.g. IP 10.140.4.44) groups one or more containers created from an IMAGE]
24
Red Hat OpenShift Concepts
[Diagram: each POD gets its own IP address, e.g. 10.140.4.44 and 10.15.6.55]
25
Red Hat OpenShift Concepts
[Diagram: a ReplicaSet/ReplicationController spec (image name, replicas, labels, cpu, memory, storage) keeps N identical pods (1, 2, ... N) running]
26
Red Hat OpenShift Concepts
[Diagram: a Deployment/DeploymentConfig spec (image name, replicas, labels, version, strategy) rolls pods from version v1 to v2]
27
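In plain Kubernetes terms, the image name / replicas / labels / version / strategy fields on this slide map onto a Deployment manifest; a minimal sketch (the name, labels, and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 2
  strategy:
    type: RollingUpdate        # replace v1 pods with v2 pods gradually
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
        version: v2
    spec:
      containers:
      - name: frontend
        image: myregistry/frontend:2.0
```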
Red Hat OpenShift Concepts
[Diagram: the declared spec (image name, replicas, labels, cpu, memory, storage) is continuously checked (✓) against the running pods]
28
Red Hat OpenShift Concepts
[Diagram: the same container reads appconfig.conf with MYCONFIG=true in Dev and MYCONFIG=false in Prod, injected via a ConfigMap]
29
Red Hat OpenShift Concepts
[Diagram: the same container reads hash.pw with base64-encoded values ZGV2Cg== in Dev and cHJvZAo= in Prod, injected via a Secret]
30
The etcd datastore can be encrypted for additional security
https://docs.openshift.com/container-platform/4.6/security/encrypting-etcd.html
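A related point about the Secret slide above: the stored values are only base64-encoded, not encrypted, which is one reason encrypting etcd matters:

```python
import base64

# Decoding the Secret values from the slide shows they are merely encoded.
dev = base64.b64decode("ZGV2Cg==").decode()
prod = base64.b64decode("cHJvZAo=").decode()
print(dev.strip(), prod.strip())  # dev prod
```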
Red Hat OpenShift & Kubernetes Concepts
31
Red Hat OpenShift & Kubernetes Concepts
[Diagram: a SERVICE named “backend” selects pods by label (role: backend)]
32
Red Hat OpenShift Concepts
[Diagram: a Route exposes the “frontend” SERVICE at http://app-prod.mycompany.com (reachable via curl); the Service selects pods labeled role: frontend and load-balances across them]
33
Red Hat OpenShift Concepts
[Diagram: a stateful pod (“My app is stateful.”) claims storage through a PersistentVolumeClaim bound to a PersistentVolume]
34
Red Hat OpenShift Concepts
[Diagram: the kubelet probes each container: alive? (liveness) and ready? (readiness)]
35
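The alive?/ready? checks correspond to liveness and readiness probes declared per container; a minimal sketch (the paths, port, and image are illustrative):

```yaml
containers:
- name: frontend
  image: myregistry/frontend:2.0
  readinessProbe:            # hold traffic until the pod can serve
    httpGet:
      path: /ready
      port: 8080
  livenessProbe:             # restart the container if this check fails
    httpGet:
      path: /healthz
      port: 8080
```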
Red Hat OpenShift Concepts
[Diagram: failed containers (❌) are automatically replaced to restore the desired number of replicas]
36
Kubernetes and what it can do for you
• Service discovery and load balancing - Kubernetes can expose a container using the DNS name or using their own IP address. If traffic to a container is high, Kubernetes is able to load balance and distribute the network traffic so that the deployment is stable.
• Automatic bin packing - You provide Kubernetes with a cluster of nodes that it can use to run containerized tasks. You tell Kubernetes how much CPU and memory (RAM) each container needs. Kubernetes can fit containers onto your nodes to make the best use of your resources.
• Self-healing - Kubernetes restarts containers that fail, replaces containers, kills containers that don't respond to your user-defined health check, and doesn't advertise them to clients until they are ready to serve.
37
Kubernetes and what it can do for you
• Does not deploy source code and does not build your application. Continuous Integration, Delivery, and Deployment (CI/CD) workflows are determined by organization cultures and preferences as well as technical requirements.
• Does not provide application-level services, such as middleware (for example, message buses), data-processing frameworks (for example, Spark), databases (for example, PostgreSQL), caches, nor cluster storage systems (for example, Ceph) as built-in services.
• Does not dictate logging, monitoring, or alerting solutions. It provides some integrations as proof of concept, and mechanisms to collect and export metrics.
39
Red Hat OpenShift | Architectural Overview
40
Red Hat OpenShift | Architectural Overview
COMPUTE COMPUTE
41
Red Hat OpenShift | Architectural Overview
CONTROL PLANE
42
Red Hat OpenShift | Architectural Overview
State of everything
etcd
CONTROL PLANE
43
Red Hat OpenShift | Architectural Overview
Kubernetes
API server
etcd
Cluster
Management
CONTROL PLANE
44
Red Hat OpenShift | Architectural Overview
Kubernetes Services
etcd
CONTROL PLANE
45
Red Hat OpenShift | Architectural Overview
Infrastructure Services
etcd
Web Console
CONTROL PLANE
46
Red Hat OpenShift | Architectural Overview
Infrastructure Services
Kubernetes Services
etcd
47
Red Hat OpenShift | Architectural Overview
Cluster monitoring
Infrastructure Services
Kubernetes Services
48
Red Hat OpenShift | Architectural Overview
Integrated routing
Infrastructure Services
49
Red Hat OpenShift | Architectural Overview
50
Red Hat OpenShift | Architectural Overview
Log aggregation
51
Red Hat OpenShift | Architectural Overview
Infrastructure Services
Kubernetes Services
etcd
52
Red Hat OpenShift | Architectural Overview
Developers CI/CD
Kubernetes Services Router Router
57
Red Hat OpenShift | Functional Overview
Operations Services
Kubernetes
Linux OS
60
HOW DOES AN OPERATOR WORK?
Operator Catalog Operator Installation Operator Updates Operator Discovery Rich UI Controls
63
Operator SDK
Enabling everybody to write Operators
Packaging
64
OPERATOR FRAMEWORK
65
HOW OPERATORS POWER OPENSHIFT
Platform (cluster) Operators:
○ Manage the OpenShift control plane and cluster infrastructure
○ Cluster functionality exposed via Operator APIs
○ 40+ Operators, managed by Cluster Version Operator
○ Examples: ImageRegistry, MachineConfig, CloudCredentials, WebConsole, Ingress
Workload Operators:
○ Provide workloads and automation on top of the platform
○ Managed Workloads exposed via Operator APIs
○ 69 Red Hat Operators, 155 ISV Operators, 152 Community Operators, managed by OLM
○ Examples: Red Hat Quay, OpenShift Pipelines, OpenShift GitOps, CrunchyDB Postgres, GitLab Runner
66
KubeVirt - Virtualization in Containers
Red Hat OpenShift
Security
Features, mechanisms and processes for container and platform
isolation
68
OPENSHIFT SECURITY | Comprehensive features
Fine-Grained RBAC
DEFEND
Network Isolation Storage
Infrastructure
70
Red Hat OpenShift Security | Comprehensive features
● Linux capabilities - break root privileges into smaller groups and control them
● Libseccomp - syscall filtering mechanism
● Namespaces - isolation primitives
71
SELinux
72
https://www.redhat.com/en/blog/selinux-mitigates-container-vulnerability
https://www.redhat.com/en/blog/latest-container-exploit-runc-can-be-blocked-selinux
Security Context Constraints
• Allow administrators to
control permissions for
pods
• Restricted SCC is
granted to all users
• By default, no containers
can run as root
75
Red Hat OpenShift Security | Comprehensive features
Service Certificates
[Diagram: a ConfigMap annotated service.beta.openshift.io/inject-cabundle="true" receives service-ca.crt from the ConfigMap CABundle Injector; a Service annotated service.alpha.openshift.io/serving-cert-my gets a Secret serving-cert-my (tls.crt, tls.key) from the Service Serving Cert Signer, which the pod's container mounts]
76
Red Hat OpenShift Security | Comprehensive features
[Diagram: the authentication flow issues a token (6) after validating the user against an identity provider such as LDAP, Google, Keystone, or OpenID]
77
Red Hat OpenShift Security | Comprehensive features
Fine-Grained RBAC
78
Red Hat OpenShift | Functional Overview
Operations Services
Kubernetes
Linux OS
80
Red Hat OpenShift Monitoring | Solution Overview
81
Red Hat OpenShift Monitoring | Operator & Operand Relationships
kube-state-metrics, openshift-state-metrics (4.2)
prometheus-adapter telemeter-client
Prometheus Alertmanager
prometheus-operator
82
Red Hat OpenShift Monitoring | Prometheus, Grafana and Alertmanager Wiring
openshift-state-metrics
kube-state-metrics
Control Plane (API)
node-exporter node-exporter
83
Red Hat OpenShift
Logging
An integrated solution for exploring and corroborating application logs
84
Red Hat OpenShift Logging | Solution Overview
Observability via
log exploration and corroboration with EFK
• Components
• Access control
ElasticSearch
Operator
ElasticSearch
Cluster
Cluster Logging
Operator
Kibana
Fluentd
(per node)
Curator
86
Red Hat OpenShift Logging | Architecture
Fluentd
TLS TLS
Fluentd
Node Elasticsearch Kibana
Fluentd
Node Application Logs
Node
87
Red Hat OpenShift Logging | Architecture
stdout
stderr
Fluentd
TLS
Elasticsearch
CRI-O
OS DISK journald
kubelet
Node (OS)
88
Red Hat OpenShift Logging | Architecture
• An API that replaces the need to configure log forwarding via the Fluentd ConfigMap.
• The API helps to reduce the probability of misconfiguring Fluentd and helps bring more stability into the Logging stack.
• Log sources: Infra, App, Audit (inputSource=audit)
Example spec:
spec:
  outputs:
  - name: MyLogs
    type: Syslog
    syslog:
      facility: Local0
    url: localstore.example.com:9200
  pipelines:
  - inputs: [Infrastructure, Application, Audit]
    outputs: [MyLogs]
89
[Diagram: the “ClusterLogForwarder” Custom Resource creates the Fluentd daemonset and the Fluentd forwarder on each Node]
90
Customer
Examples
91
https://www.redhat.com/en/resources/audi-case-study?sc_cid=701f2000000txokAAA&utm_source=bambu&utm_medium=social
https://www.cio.de/a/volkswagen-senkt-testkosten-um-die-haelfte,3670004
Container and Kubernetes extending to Edge deployments
https://news.lockheedmartin.com/2022-10-25-Lockheed-Marti-Red-Hat-Collaborate-Advance-Artificial-Intelligence-Military-Missions
https://new.abb.com/news/detail/93075/abb-and-red-hat-partner-to-deliver-further-scalable-digital-solutions-across-industrial-edge-and-hybrid-cloud
INDUSTRY 4.0
Operations Services
Kubernetes
Linux OS
98
Back to Index
Build and Deploy | Three Ways to Serve Developers
99
Build and Deploy | Source-to-Image (S2I) for building/deploying from code
(Flow: CODE → BUILD IMAGE → DEPLOY. The developer pushes code to a Git repository; S2I combines it with a builder image to build an application image; the application image is pushed to the registry and then deployed.)
OpenShift Pipelines
101
OpenShift Pipelines
Stages: LOCAL DEV → CI → CD (Pre-Production) → CD (Production)
What is Cloud-Native
Continuous Integration and Continuous Delivery (CI/CD)?
104
OpenShift Pipelines
OpenShift Pipelines
a Cloud-Native CI/CD Experience on OpenShift
• Standard Kubernetes-style pipelines: declarative pipelines with standard Kubernetes custom resources (CRDs) based on Tekton*
• Build images with Kubernetes tools: use tools of your choice (source-to-image, buildah, kaniko, jib, etc.) for building container images
105
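Concretely, “declarative pipelines with standard Kubernetes custom resources” means a pipeline step is itself a YAML object applied to the cluster. The Task below is a minimal, hypothetical sketch — the task name, parameter, and builder image are placeholders, not taken from the slides:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: build-image            # hypothetical task name
spec:
  params:
    - name: IMAGE              # target image reference, supplied by the pipeline
      type: string
  steps:
    - name: build-and-push
      image: quay.io/buildah/stable   # placeholder builder image
      script: |
        buildah bud -t $(params.IMAGE) .
        buildah push $(params.IMAGE)
```

A Pipeline resource then sequences Tasks like this one, and a PipelineRun executes them on the cluster.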
OpenShift Pipelines
API
OpenShift Pipelines
Kubernetes OpenShift
107
Red Hat OpenShift | Functional Overview
Operations Services
Kubernetes
Linux OS
▸ Single purpose
▸ Independently deployable
▸ Have their context bound to a business domain
▸ Owned by a small team
▸ Often stateless
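“Often stateless” means a service keeps no session data between calls: every request carries everything needed, so any replica can serve it and replicas can be scaled freely. A toy sketch in Python (the quote endpoint and payload shape are invented for illustration):

```python
import json

def handle_request(body: str) -> str:
    """Stateless order-total handler: the response depends only on the
    request body, never on prior requests, so replicas are interchangeable."""
    order = json.loads(body)
    total = sum(item["qty"] * item["unit_price"] for item in order["items"])
    return json.dumps({"order_id": order["order_id"], "total": total})

resp = handle_request(json.dumps({
    "order_id": "A-17",
    "items": [{"qty": 2, "unit_price": 9.5}, {"qty": 1, "unit_price": 3.0}],
}))
print(resp)
```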
109
OpenShift ServiceMesh
Benefits of Microservices
Agility
Deliver updates faster and react faster to new business demands
Highly scalable
Scale independently to meet temporary traffic increases, complete batch
processing, or other business needs
Can be purpose-built
Use the languages and frameworks best suited for the service’s domain
Resilience
Improved fault isolation restricts service issues, such as memory leaks or
open database connections, to only affect that specific service
Many organizations have had success with microservices – Netflix, Amazon, eBay, The Guardian
110
OpenShift ServiceMesh
111
OpenShift ServiceMesh
112
OpenShift Service Mesh
OpenShift ServiceMesh
➤ Connect services securely with zero-trust network policies.
➤ Automatically secure your services with managed authentication, authorization and encryption.
➤ See what’s happening with out-of-the-box observability (Istio, Kiali, Jaeger).
(Diagram: an Envoy sidecar proxy runs alongside each service.)
114
Let’s try it...
Demo Scenario:
https://istio.io/latest/docs/examples/bookinfo/
$ git clone https://github.com/istio/istio.git
$ cd istio
115
What's new in OpenShift 4.6
OpenShift Pipelines
OpenShift GitOps
(new add-on)
116
GitOps with Argo CD
117
OpenShift Platform Services
Wrap up
118
Red Hat OpenShift | Functional Overview
Automated Operations
Kubernetes
Edge | Physical | Virtual | Private cloud | Multi-Arch | Public cloud | Managed cloud (Azure, AWS, IBM, Google)
119
Thank you linkedin.com/company/red-hat
youtube.com/user/RedHatVideos
120
HYBRID CLOUD
SECURITY
Rhonda Childress
Kyndryl Fellow
Chief Innovation Officer – Kyndryl Security Practice
Cloud Security: Current State
Cloud Security
Current State
• More investments in Zero Trust and securing/encrypting data. Cloud
isn’t delivering the ROI as quickly as many expected, and CISOs are
facing complex cloud and legacy environments that are full of gaps and
seams. So, security is getting more involved in the assembly process.
• System vulnerabilities
• Unsecured APIs
• Cloud Forensics (dynamic environments, volatile data and limitation of available tools)
• Lack of Encryption and Key Control Causes Cloud Data Concerns
• SolarWinds, a major U.S. IT firm, recently fell victim to a supply chain attack. Weak information security practices by a former intern exposed a critical internal password (solarwinds123).
• Once the password was compromised, suspected Russian hackers were able to access a system that SolarWinds used to assemble updates to Orion, one of its flagship products.
• From here, the attackers inserted malicious code into an otherwise legitimate software update, allowing them to monitor and identify running processes that were involved in the
compilation of Orion, and replace source files to include SUNBURST malware.
• Orion updates were deployed to an estimated 18,000 customers, and SUNBURST sent information back to the attackers that was used to identify targets of additional malware,
broadened access, and spying.
• The fact that the intended targets and victims of the attack were several degrees of separation away from the entry point, makes this a popular example of a modern software
supply chain attack.
Repercussions:
• The SEC recently announced fraud charges against SolarWinds Corporation and its CISO. The SEC’s complaint alleges that SolarWinds misled investors by overstating its cybersecurity
practices and understating known cybersecurity risks. The SEC is charging SolarWinds with violating various provisions of the Securities Act of 1933 and the Securities Exchange Act of
1934, and the CISO with aiding and abetting the company’s violations. The SEC’s complaint seeks civil penalties, permanent injunctive relief, disgorgement, and a ban on the CISO
serving as a director or officer of a publicly traded company.
Software Supply Chain Attack Protection
How can you reduce supply chain security risks?
• Assess the security and trustworthiness of the code that you consume
• Ensure developers keep writing secure proprietary code
• Also – companies are deploying “Due Diligence Questionnaires” (DDQ) asking for specific information on suppliers.
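One basic, automatable part of “assessing the code that you consume” is verifying that a downloaded artifact matches the checksum the supplier published out of band, before installing it. A minimal sketch (the artifact bytes and checksum below are constructed for the example, not from any real release):

```python
import hashlib

def verify_artifact(data: bytes, expected_sha256: str) -> bool:
    """Return True only if the artifact's SHA-256 digest matches the
    checksum published by the supplier through a separate channel."""
    return hashlib.sha256(data).hexdigest() == expected_sha256.lower()

artifact = b"example release bytes"
published = hashlib.sha256(artifact).hexdigest()  # stand-in for the published value

print(verify_artifact(artifact, published))              # True: checksums match
print(verify_artifact(artifact + b"tampered", published))  # False: artifact was altered
```

A tampered update — as in the SUNBURST case — would fail this check only if the checksum itself is published through a channel the attacker cannot also modify, which is why signing and out-of-band distribution matter.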
Hypervisors are the new attack vector
• Hypervisor attacks have increased in the last 12 months.
• Belief that hypervisors could never be breached
• Lack of availability of similar security tooling as used
on the VMs
• Use of environments by bad actors led to an
understanding of the vulnerabilities
• Living off the land
Cloud Consoles (are not secure)
• Protection recommendations
• Require Strong passwords
• Require 2 Factor Authentication
• Use IAM (vs. Admin/root users) and a 'least privilege' strategy
• Grant specific access to specific resources and services
• Don't share passwords
• Use Local IDs and passwords
Common Cloud Attack Vectors
Bad Practices
• Networking
Cloud Security Certifications
Entry
• Comptia Cloud+
Experienced
Expert
AWS ranks as the largest cloud infrastructure provider, followed by Microsoft Azure, and Google Cloud.
Summary
• The cloud is becoming a more popular attack
target for bad actors
• It is not inherently secure, but can be made
secure with diligence and tools
• Understanding attack vectors and patterns are
critical to writing secure software products and
services
• There are many lucrative career opportunities
in the security domain including certifications
Questions?
Hybrid Cloud Management
Dave Lindquist
Red Hat VP Engineering, Hybrid Cloud Management
IBM Fellow
Red Hat
Enterprise open source solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies.
More than
1,000,000
projects
3
F24576-200910
Open source
Fedora OKD
Foreman Ansible
4
Red Hat Open Hybrid Cloud
MODERNIZE APPLICATIONS | CLOUD-NATIVE APPLICATIONS | AI APPLICATIONS | EDGE APPLICATIONS
Kubernetes (orchestration)
Infrastructure
6
Hybrid Cloud Platform Management Requirements
Unified management solutions across on-prem and public cloud resources are already
implemented or planned by more than half (64%) of organizations worldwide in order to solve
this complexity and provide consistency
To what extent is your organization implementing or planning to implement a unified management solution to
standardize infrastructure operations across on-premises resources and public cloud services?
IAM
Common identity
Inventory / Data
Common inventory of resources
AI /Analytics
AIOps/Insights/analytics/ML
Policy
Policy-based governance, risk, and compliance
Hybrid Cloud Platform Management - future state
Results, Outcomes:
● An ability to manage Hybrid Cloud & Fleet vs list(s) of platforms
● Reduce costs (SRE toil) through automated lifecycle operations, reduce security exposure and risk
● Policy based management, abstracting operational complexity
● Improving productivity, proactive remediation by providing automation and analytics (EDA, AIOps)
● Enhanced risk and compliance management [via causality analysis with change event controls]
Advanced Cluster
Management
Red Hat
Red Hat Advanced Cluster Management for Kubernetes
11
Red Hat Advanced Cluster Management
for Kubernetes
Multicluster lifecycle
management
Advanced application
lifecycle management
Multicluster Observability and Search for health and optimization
12
13
Policy based Governance, Risk, and Compliance
Don’t wait for your security team to tap you on the shoulder
14
Advanced Application Lifecycle Management
Simplify your Application Lifecycle
15
Multicluster Observability
Overview
dashboards
16
Ansible
Automation across the hybrid cloud ecosystem
(Diagram: 55+ certified technology partners across the ecosystem — network devices such as routers and switches, private cloud, developer IDEs; content sources include Automation Hub on console.redhat.com, Private Automation Hub, Ansible Galaxy, and custom enterprise content.)
19
Execution layers of Event Driven Automation
Event-Driven Ansible
Optional
20
The Ansible Lightspeed experience
Enhancing Playbook creation
22
Cloud Computing
Practical Topics – Platforms, Applications
and Best Practices
Stuttgart University
WS 2023/24
07-11-2023
2
About Me – Kristof Kloeckner
• https://www.linkedin.com/in/kristofkloeckner/
• With IBM from 1984 to 2017
• Advising clients on technology and strategy in hybrid
cloud contexts
• First CTO of IBM WebSphere Application Server
Platform
• First CTO of IBM Cloud
• Responsible for DevOps Tools (Rational Software)
• Responsible for AI and Automation in Technology
Services
• Honorarprofessor, University of Stuttgart, since 1997 3
About Me – Gerd Breiter
• https://www.linkedin.com/in/gerd-breiter-82467b2/
• With IBM from 1982 to 2018
• Working on Intelligent Infrastructure / On Demand
Computing / Cloud from 2000 - 2014
• Started and led work on TOSCA in close cooperation
with Frank Leymann and Stuttgart University
• Co-Lead for Definition of IBM Cloud Computing
Reference Architecture
• Chief Architect IBM Operations Management from
2014 – 2018
4
Other Contributors
• Rhonda Childress, Kyndryl Fellow
• Dave Lindquist, IBM Fellow
• Wolfram Richter, Manager of Chief Architects Germany, Red
Hat
• Simon Moser, DE, Lead Architect Serverless & Cloud, IBM
• Kristian Stewart, former DE, IBM
• Isabell Sippli, DE, IBM
• Amardeep Kalsi, STSM, IBM
• Albrecht Stäbler, CEO dibuco & Team
5
Logistics
• Weekly Webex session on Thursday at 15:45 Central European
Time
– https://unistuttgart.webex.com/meet/kristof.kloeckner
• Charts and Recordings will be posted on ILIAS
• External participants can access the course material through the
Technology Partnership Lab (TPL)
– https://tpl.informatik.uni-stuttgart.de/2023/10/18/practical-cloud-topics-platforms-services-and-best-practices-ws2023-24/
• Office hour on request
• Please fill out the survey of prior knowledge on ILIAS!
• Ungraded exercises supported by an AWS Academy Learner Lab
• Exercise discussion every Tuesday at 17:30 (same Webex)
• Credits through completion of a case study and a short test/quiz
6
Basic Tenets of this Course
• Digital disruption makes cloud adoption a necessity for enterprises
• It is no longer a question of whether cloud should be adopted, but
– how,
– for what applications and data,
– and with which controls
• Cloud is as much about business models as it is about technology
• There is not ONE cloud - hybrid (multi-)clouds are central to enterprise requirements
– Containers are increasingly used as key enablers of hybrid clouds
• For cloud providers, winning developers is key. This drives the growth of platforms.
– Most platforms support multiple run times, programming and service models
– Cloud native models are growing in importance (microservices, serverless)
– Platforms are moving up the value stack, specialized platforms are emerging, e.g. for
Internet of Things, Machine Learning, industries
• Cloud enables and requires changes in development and operational models
• To understand cloud, it is important to understand how large distributed systems work,
and to understand service oriented architectures 7
Agenda
• Overview, Market and Technology Trends
• Cloud Native Architectures and Best Practices Frameworks
• Cloud Security
• Containers and Container Orchestration
• Serverless Computing and Serverless Stacks
• Commercial Clouds: AWS, Google, Azure
• Multi-Cloud and Hybrid Cloud Management
• Transforming Applications for the Cloud
• DevOps and Site Reliability Engineering
• AI in the Cloud
• Big Data in the Cloud
• IoT and Edge Computing
8
Agenda Details 1
• 10/19 Intro and Cloud Basics (Kristof Kloeckner)
– Core Concepts (Recap)
– A walk through a commercial cloud (AWS)
• 10/26 Overview of Cloud Native Architectures and Best Practices
Frameworks (Kristof Kloeckner)
– Some basic principles: Loose coupling, eventual consistency
– Architecture Styles
– AWS, MS & Google Architecture Guidance
– Well-architected Framework
• 11/2 Best Practices Frameworks Part 2 (Kristof Kloeckner)
• 11/9 Hybrid Cloud Security (Rhonda Childress, Kyndryl)
• 11/16 Containers & Container Orchestration (Wolfram Richter, Red Hat)
– Kubernetes, OpenShift, Operator Concept, Service Mesh, Security, Use cases
• 11/23 Serverless 1 – IBM Code Engine (Simon Moser, IBM)
– Potentially OpenShift Serverless
• 11/30 Serverless 2 – AWS Lambda and Serverless Stack (Kristof Kloeckner)
– Examples of modern serverless applications 9
Agenda Details 2
• 12/7 Multi-Cloud and Commercial Clouds (KK)
• 12/14 Hybrid Cloud Management (Dave Lindquist)
• 12/21 Event Management (Kristian Stewart)
• No lecture week of 12/26
• No lecture week of 1/2
• 1/11 Transforming Applications for the Cloud (Isabell Sippli, IBM)
– Migration, Replatforming, Refactoring
– (KK) Amazon 6R and other portfolio transformation approaches
• 1/18 DevOps and Site Reliability Engineering (Amardeep Kalsi, IBM)
• 1/25 Big Data in the Cloud (Albrecht Staebler and team, dibuco) tbd
• 2/1 AI in the Cloud (KK)
• 2/8 IoT & Edge (Gerd Breiter) & Wrap-Up (KK)
– Home Automation Use Case
10
Agenda Details 3
11
References
• The Developer’s Guide to Microsoft Azure, 2nd Edition 2022
• Brendan Burns, Designing Distributed Systems, O’Reilly 2018
• David Clinton. Learn Amazon Web Services in a Month of Lunches. Manning
Publications, August 2017
• Ian Foulds, Learn Azure in a Month of Lunches, Manning Publications 2018
• Scott Galloway, The Four. The Hidden DNA of Amazon, Apple, Facebook and
Google, Penguin 2017
• Cloud Application Architecture Guide, Microsoft 2022
• Cornelia Davis, Cloud Native Patterns, Manning 2019
• Implementing Microservices on AWS, Amazon, November 2021
• Fehling, Leymann et. al., Cloud Computing Patterns, Springer 2014
• Andrew Tanenbaum, Maarten van Steen, Distributed Systems, Prentice Hall
2002
• Mary and Tom Poppendieck, Lean Software Development, An Agile Toolkit,
Addison Wesley, 2003
• Betsy Beyer et al., Site Reliability Engineering. How Google runs production
systems, O’Reilly 2016 12
References (AWS)
• David Clinton. Learn Amazon Web Services in a Month of Lunches.
Manning Publications, August 2017
https://www.manning.com/books/learn-amazon-web-services-in-a-month-of-lunches
• AWS Tutorials
https://aws.amazon.com/getting-started/tutorials/
• AWS Developer Guides
• AWS Well Architected Framework
• AWS Getting Started Guided Projects, e.g.
https://aws.amazon.com/getting-started/use-cases/websites/?csl_l2b_ws
• https://www.slideshare.net/AmazonWebServices/awsome-day-2019-detroit
• AWS Samples on GitHub https://github.com/aws-samples
• AWS SDKs https://aws.amazon.com/tools/#sdk
• Blogs like https://aws.amazon.com/blogs/aws/
• Builder Hub https://devops.com/builder-community-hub
13
References (AWS Certification)
• Amazon Web Services Overview, July 2019
– https://d0.awsstatic.com/whitepapers/aws-overview.pdf
• Architecting for the Cloud. AWS Best Practices, October 2018
– https://d1.awsstatic.com/whitepapers/AWS_Cloud_Best_Practices.pdf
• How AWS pricing works, June 2018
– https://d0.awsstatic.com/whitepapers/aws_pricing_overview.pdf
• Total Cost of (Non) Ownership of Web Applications in the
Cloud, August 2012
– https://media.amazonwebservices.com/AWS_TCO_Web_Applications.pdf
• Compare AWS Support Plans
– https://aws.amazon.com/premiumsupport/plans/
14
Overview and Cloud Principles
15
We live in times of digital disruption
• New businesses are composed by leveraging digital services from a broad ecosystem
• New apps are consolidating decision making capabilities at the fingertips of people who need to act
• Business imperatives increase demand on IT resources and drive a focus on maximum efficiency and agility
17
Each of these business needs is distinguished by a set of attributes
Business Model Innovation
§ Enable new business models (at scale)
§ Leverage network effects
§ Achieve Internet scale (globalization)
§ Increase speed of innovation
Engagement
§ Data personalization
§ Time sensitivity / Context sensitivity
§ Consumability of services
§ Improve customer service
Efficiency
§ Increase operational efficiency
§ Shift Capital Expense to Operational Expense
§ Consume services rather than own assets
§ Improve IT agility
18
Specific technology adoption patterns align with each of the
three classes of business needs
§ Employ a Common
Security Model
19
Systems of Engagement and Internet of Things combine with Systems
of Record to enable new types of services
Speed, Convenience
Consistency
Systems of Record Systems of Engagement
20
These new services require new infrastructure, architectures, processes and tools
Hybrid Clouds
Systems of Record Containers Systems of Engagement
Systems of Discovery
• Sensors Analytics and
• Embedded Machine Learning
intelligence Signal from noise
• Connected devices
New Processes
New Tools
Internet of Things
Edge Topologies 21
The challenges are not new:
In the past, industries have responded to similar challenges
through virtualization, automation and standardization
Telcos automate traffic through
switches to assure service and
lower cost.
http://csrc.nist.gov/groups/SNS/cloud-computing/
• On-demand self-service
– A consumer can unilaterally provision computing capabilities, such as server time and network storage, as
needed automatically without requiring human interaction with each service’s provider.
• Broad network access.
– Capabilities are available over the network and accessed through standard mechanisms that promote use by
heterogeneous thin or thick client platforms (e.g., mobile phones, laptops, and PDAs).
• Resource pooling.
– The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model,
with different physical and virtual resources dynamically assigned and reassigned according to consumer
demand. There is a sense of location independence in that the customer generally has no control or
knowledge over the exact location of the provided resources but may be able to specify location at a higher
level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing,
memory, network bandwidth, and virtual machines.
• Rapid elasticity.
– Capabilities can be rapidly and elastically provisioned, in some cases automatically, to quickly scale out and
rapidly released to quickly scale in. To the consumer, the capabilities available for provisioning often appear
to be unlimited and can be purchased in any quantity at any time.
• Measured Service.
– Cloud systems automatically control and optimize resource use by leveraging a metering capability at some
level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user
accounts). Resource usage can be monitored, controlled, and reported providing transparency for both the
provider and consumer of the utilized service.
25
The Concept of Virtualization
Users of Resources
• Virtualization separates the presentation of resources to users from the actual resources
• Virtual resources have the same interfaces and functions as the real resources they
replace, but can have different attributes/numbers that make this substitution worthwhile
26
Virtual Machine Monitors
(Hypervisors)
Source: Intel
27
Types of Hypervisors
• Type 1 Hypervisor (bare metal or native)
– Runs directly on Hardware
– Examples: z/VM, Hyper-V, XEN, VMware ESXI
– KVM (Kernel Virtual Machine) runs inside Linux and
converts the OS into a type 1 hypervisor
– Amazon is using XEN, but has developed a hypervisor
based on KVM for next-generation EC2 (Nitro)
• Type 2 Hypervisor (hosted)
– Runs on top of an operating system
– Examples: QEMU and WINE, VirtualBox
– Lower performance than type 1
28
Server Utilization Patterns Benefiting from Cloud
Scale and Resource Sharing through Virtualization
29
Cloud Pricing Principles
• Modeled after utilities
• Pay as you go – only pay for what you consume
• Reserved capacity
– Significant discounts based on upfront commitment
– Good for baseline capacity
• Spot instances
– Bid on available excess capacity
– Good for workloads like HPC or Machine Learning Training
• Some consumption-based pricing is tiered – you pay less
when you consume more
• Free tier for new customers, plus free tiers in some services to
encourage trials
• Free data transfer into the cloud
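The trade-off among on-demand, reserved, and spot pricing can be made concrete with a small cost model. The hourly rate, discount, and spot ratio below are invented round numbers for illustration, not actual AWS prices:

```python
def monthly_cost(hours, hourly_rate, reserved_discount=0.0):
    """Cost of running one instance for `hours` in a month.
    `reserved_discount` is the fractional discount for an upfront commitment."""
    return hours * hourly_rate * (1.0 - reserved_discount)

HOURS = 730   # approximate hours in a month
RATE = 0.10   # hypothetical on-demand $/hour

on_demand = monthly_cost(HOURS, RATE)
reserved = monthly_cost(HOURS, RATE, reserved_discount=0.40)  # e.g. 40% off for a 1-year commitment
spot = monthly_cost(HOURS, RATE * 0.30)  # spot is often far cheaper, but interruptible

print(f"on-demand ${on_demand:.2f}, reserved ${reserved:.2f}, spot ${spot:.2f}")
```

The model shows why reserved capacity suits steady baseline load (the discount always applies) while spot suits interruptible work like HPC or ML training.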
30
A Definition of Cloud Computing
NIST 2011
Service Models:
• Cloud Software as a Service (SaaS)
– The capability provided to the consumer is to use the provider’s applications running on a cloud
infrastructure. The applications are accessible from various client devices through a thin client interface such
as a web browser (e.g., web-based email). The consumer does not manage or control the underlying cloud
infrastructure including network, servers, operating systems, storage, or even individual application
capabilities, with the possible exception of limited user-specific application configuration settings.
34
A Definition of Cloud Computing
NIST 2011
Deployment Models:
• Private cloud
– The cloud infrastructure is operated solely for an organization. It may be managed by the organization or a
third party and may exist on premise or off premise.
• Community cloud
– The cloud infrastructure is shared by several organizations and supports a specific community that has
shared concerns (e.g., mission, security requirements, policy, and compliance considerations). It may be
managed by the organizations or a third party and may exist on premise or off premise.
• Public cloud
– The cloud infrastructure is made available to the general public or a large industry group and is owned by
an organization selling cloud services.
• Hybrid cloud.
– The cloud infrastructure is a composition of two or more clouds (private, community, or public) that
remain unique entities but are bound together by standardized or proprietary technology that enables
data and application portability (e.g., cloud bursting for load-balancing between clouds).
Note: Cloud software takes full advantage of the cloud paradigm by being service oriented
with a focus on statelessness, low coupling, modularity, and semantic interoperability.
35
Expected benefits of cloud are increasingly shifting
from cost savings to increased business agility
• For public cloud, no capital expense (capex), no upfront commitment
• More effective sharing of resources
– No need to own resources for peak consumption – elastic pool of resources
– Variety of payment options, including ’pay by the drink’
• Reduced operational expense (opex) and risk through standardization and
automation
– Standardize first, then automate
• Faster and more convenient access to resources
– Instant access through self-service and automated provisioning
– No need to build up physical capacity
– Scale globally through cloud provider infrastructure
• Access to an ecosystem of services (platform effect)
– Technology providers are adopting a ‘cloud first’ approach
• Cloud is serving as a ‘crucible’ to try out and integrate new technologies
36
Prevalent Cloud Use Cases
• Application Development Lifecycle
– Build or train on cloud, deploy everywhere
• New applications (cost, agility) – Cloud First
• Hybrid pattern – supplementing existing workloads
– Analytics, Disaster Recovery, Data Warehousing
• Hybrid pattern - connecting systems of engagement
and systems of records
• Migration/consolidation of existing applications
• Data Center migration
• All-in: IT in the Cloud (born on the cloud, move to
cloud) 37
In the Cloud, it’s all about services
• Everything is a service, everything is defined by software
• Software-defined everything, infrastructure as code
• Microservices and Service Meshes
• Bringing service builders and service consumers together is key
• Marketplaces and catalogs
• Consumption models (billing, licensing)
• Brokers
• Exposing and monetizing service interfaces (API ‘economy’)
• Orchestrating services
• What’s the right granularity
• Separation of concerns
• What’s the lifecycle of a service
• Sharing services
• Service abstractions and virtualization
• Encapsulation (containers) 38
Rapid cloud growth continues, driven by digital
transformation
IDC July 2023
• Spending on public cloud services and infrastructure grew to $545.8B in 2022, a growth of 22%. Foundational services for ‘Digital First’ grew by 28.8%.
• In 1Q23, infrastructure spending on non-cloud dropped 0.9%, shared cloud was up 22.5%,
dedicated cloud down 1.5% (IDC)
• In 2Q 2020, spending on public cloud infrastructure exceeded that on traditional infra
• The pandemic has accelerated digital transformation and the shift to cloud
• (Application) Software as a Service is the largest cloud market ($246B, 18% growth),
followed by Infrastructure as a service, Platform as a Service (32% growth), and SaaS
Infrastructure as a Service
• ‘This highlights the increasing reliance of enterprises on a cloud innovation platform built
around widely deployed compute services, data/AI services, and app framework services
to drive innovation.’
42
A Short Tour of a Commercial
Cloud (AWS)
43
Source: Awsome Day 2019 Detroit Slideshare
44
Source: Awsome Day Detroit Slideshare, early 2019
45
Amazon’s focus on ‘builders’ and business
• “The broadest and deepest platform for today’s builders” (W. Vogels, CTO, AWS
Summit NYC July 2019)
– ”We provide a toolbox, you pick”
– Using most functions themselves to run and grow their own digital business
• “Enabling digital transformation” (A. Jassy, then AWS CEO, 2020)
• These messages were reinforced in the following years
– Builder Community Hub 2023
• Agility, DevOps pipeline, operational principles
– ‘Well-architected Framework’
• Toolkits, IDEs, Integration with e.g. Visual Studio
• Microservices, containers, serverless across the stack
– “You write the business logic, we do the heavy (infrastructure) lifting”
• Automation
• Security
• AI/Machine Learning (first separate keynote in 2020)
• Significant and growing investment in IoT
46
Some Basic Amazon (IaaS) Services
• EC2 – Elastic Compute Cloud
– Consists of virtual machines (called EC2 instances) launched from
Amazon Machine Images (AMIs)
• AMI – Amazon Machine Image
– Templates for building EC2 instances
• VPC - Virtual Private Cloud
– Private network that isolates your resources
• EBS – Elastic Block Store
– Persistent Storage Volumes that can be attached to an EC2 instance
• S3 – Simple Storage Service
– Object Store Service
• These services plus Relational Database Services are
sometimes called Foundational Services by AWS.
49
AWS Learner Lab
50
Cloud Services in the Learner Lab
• Most AWS services are available, sometimes with capacity
restrictions (see documentation)
• Limitation in using IAM, use the preconfigured role (LabRole)
rather than creating new roles
• You can create your own key pairs, but a digital key (vockey) is
also already available in the preconfigured terminal.
• The Learner Lab comes with $100 credits, so stop or
terminate your resources if you are not using them anymore.
Some resources (like EC2 instances) are automatically
restarted when starting a lab.
51
A Short Tour of AWS Services
• The AWS Console
• ‘The biggest toolbox for builders’ - AWS Services
• Some important examples of AWS services
– Virtual Machines (EC2)
– Object Storage (S3)
– NoSQL Database (DynamoDB)
– Simple Notifications (SNS)
– Serverless Functions (Lambda)
• The Market Place
Other Clouds are similar, we will introduce them later!
52
Hosting a Simple Web Application
53
Storage and Databases
• Object Storage – S3
– Objects have URLs and are stored in ‘buckets’
– Objects can be versioned
– Objects can be arranged in ‘folders’
– Buckets can host static websites
• Block Storage – EBS
– Virtual Disks for Virtual Machines
• Relational Database Services (RDS)
– Managed service with many options
• NoSQL DB (DynamoDB)
– Managed Service
– Key Value Store (Tables)
• Streams - Kinesis
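S3 has no real directory tree: an object is addressed by bucket plus key, and ‘folders’ are just shared key prefixes. A small sketch of both ideas (the bucket name and region are made up for illustration):

```python
def object_url(bucket: str, key: str, region: str = "eu-central-1") -> str:
    """Virtual-hosted-style URL for an S3 object (public-access form)."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"

def list_folder(keys, prefix):
    """'Folders' in S3 are only key prefixes; listing one is a prefix filter."""
    return [k for k in keys if k.startswith(prefix)]

keys = ["site/index.html", "site/img/logo.png", "backups/db.dump"]
print(object_url("demo-bucket", "site/index.html"))
print(list_folder(keys, "site/"))
```

This is also why a bucket can host a static website: each object already has a stable URL.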
54
Cloud Networking
• A VPC can span availability zones
• A subnet is restricted to a single availability zone
• Some services require 2 subnets
in different zones for enhanced
availability
• A route table determines where
traffic is routed
• Internet gateways and virtual private
gateways
• Security groups control traffic
(firewall rules)
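The VPC/subnet relationship above can be checked mechanically: each subnet's CIDR must fall inside the VPC's CIDR, and subnets must not overlap. A sketch using Python's standard library (the CIDR values are illustrative):

```python
import ipaddress

vpc = ipaddress.ip_network("10.0.0.0/16")       # hypothetical VPC address range
subnet_a = ipaddress.ip_network("10.0.1.0/24")  # e.g. placed in availability zone a
subnet_b = ipaddress.ip_network("10.0.2.0/24")  # e.g. placed in availability zone b

# Both subnets must be carved out of the VPC's address space...
print(subnet_a.subnet_of(vpc), subnet_b.subnet_of(vpc))
# ...and must not overlap each other.
print(subnet_a.overlaps(subnet_b))
```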
(Diagram: a simple serverless web app — static content served from S3, dynamic content from DynamoDB, with events triggering longer-running processes.)
59
Other Material
60
Cloud Computing
Multi-Cloud, Hybrid Cloud, Commercial
Clouds
Stuttgart University
WS 2023/24
11-30-2023
2
A Modern Cloud Stack
Frameworks &
Platform Services Management Integration Blueprints Security
App Services
Infrastructure
Services Virtual Servers Storage Network
4
Modern Cloud Stack: Domain Services
Frameworks &
Platform Services Management Integration Blueprints Security
App Services
Infrastructure
Services Virtual Servers Storage Network
5
Runtimes & Platform Services (Representative Choices)
Infrastructure
6
Hybrid Clouds and Cloud Integration
7
A Hybrid Cloud Landscape
Hybrid Clouds
Systems of Record Containers Systems of Engagement
Systems of Discovery
• Sensors Analytics and
• Embedded Machine Learning
intelligence Signal from noise
• Connected devices
New Processes
New Tools
Internet of Things
Edge Topologies 8
Enterprise Cloud Requirements
from Nutanix Enterprise Cloud Index 2023
11
A very short history of Microsoft Azure
• Initial virtualization technology in 1997, completely rebuilt as Hyper-V in 2008
• Very large web properties since the 90s – Hotmail, Bing
• Azure was launched in 2008 (LA) and initially focused exclusively on PaaS
– Branded as ‘Windows’ to give the connotation of ‘OS for the Cloud’
– “Scared” their developer base (radical change), too far ahead of its time?
– The market was much more comfortable with Amazon’s IaaS focus
• MS added a stateless “VM role” to Azure as a stop-gap
– It is now deprecated
• Major shift in 2012:
– Added full IaaS role support to Azure
– Shifted definition of “Azure” to mean “Microsoft’s Public Cloud”
– PaaS platform naming shifted to “Azure Cloud Services”
• Renamed from Windows Azure to Microsoft Azure; All In on Cloud!
• Now strong support for open source, Linux, Cloud Native standards
– #2 by market share
Azure Physical Infrastructure
https://learn.microsoft.com/en-us/azure/guides/developer/azure-developer-guide
Microsoft Azure Home Page
15
Microsoft Azure Quickstart Center
16
Visual Studio and Azure
https://learn.microsoft.com/en-us/azure/azure-functions/create-first-function-vs-code-python?pivots=python-mode-configuration
17
Azure Platform
Or Container
Source: Microsoft
24
Developing and Deploying Workloads on
AKS
Local
Source: Microsoft
25
Service Bus Queues
Source: Microsoft
26
Azure Events
Source: Microsoft
27
Azure Services
• Microsoft Azure compute & simple/scalable storage
• Azure SQL Database (fka SQL Azure)
– SQL Server as a Service
• Azure Cosmos – multi-model, distributed database service
• Azure DocumentDB – NoSQL database
• AppFabric (Cloud-based services)
– Access Control Service (Azure Active Directory)
– Enterprise Service Bus
– Distributed Object Caching
• Traffic Manager
– Global traffic management/routing (performance)
• Azure Connect (“VPN” between cloud and on-premise services)
• Azure Portals
– Web-based Service Lifecycle Management tools
– SQL Database management
– RESTful APIs also available (non-browser-based tools)
• Azure Media Services
• Azure Content Delivery Network (CDN)
• Various Migration Tools
Azure Services (continued)
• Docker Services
– Azure Docker VM Extension turns VMs into Docker hosts
– Azure Container Service & Azure Kubernetes Service
– Docker Machine
• Authentication
– Azure Active Directory
– App Service Authentication (Azure ID and social identity providers)
• Monitoring
• DevOps Integration
– Popular open source tools
– Visual Studio
• Tools often use .Net under the covers
• For a full comparison with AWS, see
https://docs.microsoft.com/en-us/azure/architecture/aws-professional/services
Azure Free Services Tier
32
Azure Arc Structure
Source: Microsoft
33
Azure Arc Starting Page
34
Azure Arc Capabilities
• Inventory, management, governance, and security across a multicloud
environment
• Azure VM extensions to use Azure management services to monitor, secure, and
update servers
• Manage and govern Kubernetes clusters at scale
• Use GitOps to deploy configuration across one or more clusters from Git
repositories.
• Zero-touch compliance and configuration for Kubernetes clusters using Azure
Policy.
• Run Azure data services on any Kubernetes environment as if it runs in Azure
(specifically Azure SQL Managed Instance and Azure Database for PostgreSQL)
• Create custom locations on top of your Azure Arc-enabled Kubernetes clusters,
using them as target locations for deploying Azure services instances.
• Azure service cluster extensions for Azure Arc-enabled Data Services, App Services
on Azure Arc (including web, function, and logic apps) and Event Grid on
Kubernetes.
• Single pane of glass
35
Source: Microsoft
36
Source: Microsoft
37
Source: Microsoft
38
Hybrid Applications with Azure Arc
39
Azure and Azure Stack
Source: Microsoft
40
Azure Stack Hub
https://learn.microsoft.com/en-us/azure-stack/user/user-overview?view=azs-2206
41
Azure Stack Edge Pro 2
https://learn.microsoft.com/en-us/azure/databox-online/azure-stack-edge-pro-2-overview
42
Major Cloud Platforms
• Microsoft Azure
• Google
• IBM Cloud
44
Google Cloud Platform
• Google’s entry into Cloud is based on their experience with very large data and
initially focused on platform services in this area
• In 2008, they previewed Google App Engine targeted towards web applications,
quickly followed by data-focused services (like NoSQL databases)
• Here is a current description of a web application built with the Google Cloud
Platform
45
Google Cloud Platform Services Highlights
Google advertises
• Transformative Know-How
• World Class Security
• Choice with Hybrid and Multicloud
– Kubernetes originated with Google
• Serverless for Simplicity (Cloud Functions, Cloud Run, Knative open source)
• Innovation with AI, ML and Big Data/Analytics
– Tensorflow and Keras originated with Google
– BigQuery Cloud Data Warehouse
• Managed Open Source Software
• World-wide Network
• Google G-Suite productivity applications
47
Major Cloud Platforms
• Microsoft Azure
• Google
• IBM Cloud
48
IBM Cloud
• IBM Public Cloud offers very similar functions to
other commercial clouds
• IBM has a strong focus on private and hybrid clouds
and infrastructure and management spanning clouds
– Support for Cloud Native Foundation
– Acquisition of Red Hat
• Some cloud services from IBM (especially AI/ML) will
be available on other clouds as well
• For introductory projects, see
https://cloud.ibm.com/developer/appservice/starter-kits
49
Linux, Containers & Kubernetes are Foundational for the IBM
Platform
APPLICATION DEVELOPMENT MANAGEMENT,
OPERATIONS & SECURITY
Infrastructure-as-a-Service
Bare Metal | Compute, Network & Storage Virtualization – VMware, OpenStack | Private Cloud | Public Cloud
50
OPENSHIFT CONTAINER PLATFORM
Any container, any infrastructure: Laptop | Datacenter | OpenStack | Amazon Web Services | Microsoft Azure | Google Cloud
51
IBM & Red Hat: Strategic Architecture
IBM Software for the Hybrid Cloud
RED HAT HYBRID CLOUD PLATFORM – built-in self-service, automated deployment + Day 2 operations
Private Cloud | IBM Public Cloud | AWS | Microsoft Azure | Google Cloud | Edge | On-premises IBM Power & Z
56
Transforming an existing mission-critical workload to
the cloud is challenging
• Refactoring complex, interconnected applications & data
• On-premises | Off-premises
A pragmatic approach to modernizing applications
[Figure: migration mix – roughly 50% modernize and migrate to cloud / public SaaS, 10% re-host (lift & shift); up to 2.5x productivity when legacy processes are removed]
• Refactor/Re-Architect
– Most expensive option
– Necessary when application has reached a tipping point
– E.g. unwieldy monolith needs breaking up
• Retire
– Application is redundant
• Retain
– Core application with significant IP
– Performing well in specific enterprise setting (e.g. mainframe high
volume transactional application)
– Can be integrated into hybrid cloud settings
62
AWS 6Rs – Options for Cloud Migration
Source: https://aws.amazon.com/blogs/enterprise-strategy/6-strategies-for-migrating-applications-to-the-cloud/
63
Further Considerations for Migration
(Adapted from Gartner 5Rs, ca. 2010)
• Business considerations for the type of migration may change
over time
• Consider the entire spectrum of hybrid cloud when making
choices
• Strive for workload ‘portability’
– Containerize!
– Use loosely coupled architectures (message and event based)
– Use standard interfaces and open source where possible, avoid lock-in
– Don’t underestimate management needs – observability is key
• Consider the entire lifecycle
• Consider an incremental approach (strangler pattern)
64
Serverless 1: IBM Cloud Code Engine
“Practical Cloud Topics – Platforms, Applications, and Best Practices”
Simon Moser
CTO Cloud Container Services
IBM Research & Development
Email: smoser@de.ibm.com
Linkedin: https://www.linkedin.com/in/simonmoser/
X / Twitter: @mosersd
Agenda
• What is Serverless?
• Introduction to FaaS
• Serverless computing is a cloud computing execution model in which the cloud provider allocates
machine resources on demand, taking care of the servers on behalf of their customers1.
• Around 2016, the term "serverless functions" started to take off in the tech industry and was
presented as the undeniable future of infrastructure2
• "Serverless" is a misnomer in the sense that servers are still used by cloud service providers to
execute code for developers1.
• Developers of serverless applications are not concerned with capacity planning, configuration,
management, maintenance, fault tolerance, or scaling of containers, VMs, or physical servers1.
1 - https://en.wikipedia.org/wiki/Serverless_computing
2 - https://matduggan.com/serverless-functions-post-mortem/
Serverless 101: Traditional model vs. FaaS
Scales inherently:
• One process per request
No cost overhead for resiliency:
• No long-running process to be made HA / multi-region
Introduces event programming model
Decoupling path: Microservice → Function
• Microservices: decoupling
• FaaS functions: further decoupling – each function does one thing
• The idea of a serverless function replacing the traditional web framework or API has almost disappeared. Even cloud providers pivoted, positioning functions as more "glue between services" than the services themselves.
What do Functions promise to developers?
• Event-driven workloads
• Scalability
IBM Cloud Functions
Demo time
IBM Cloud Functions
cold container
IBM Cloud Functions
pre-warmed container
39
Performance is king
faster
Now that you understand FaaS - what's Serverless?
Serverless Value Proposition
• Better resource utilization: Only pay for resources being used, instead of
resources idling around
Tim Wagner (Inventor of AWS Lambda and former AWS GM for Lambda)
Example: World’s fastest video compression with serverless
http://pages.cs.wisc.edu/~shivaram/cs744-readings/excamera.pdf
Next Generation Serverless
Serverless + Elimination of limits = Next Gen Serverless
- Transparent scaling (including scale-to-zero)
- No infrastructure mgmt.
- No capacity mgmt.
- Pay-by-consumption
- (Almost) no CPU, memory or disk limits
- No duration limit
Demo time
How did we build
Code Engine
IBM Cloud Code Engine Conceptual architecture
[Diagram: End users reach workloads running in IBM Cloud Code Engine; a developer (Code Engine user) deploys Functions, Apps, Batch jobs and Containers, valuing speed & ease of use; the platform builds on Knative and Istio]
Control:
• On a regional level
• Custom autoscaler based on capacity demands
Open Source components:
• Knative – Serving is the runtime component that hosts and scales your application as K8s pods; Eventing contains tools for managing events between loosely coupled services
• Istio – Components to connect, secure, control, and observe services; ServiceMesh, auto-mTLS etc.
• Tekton – Define a series of re-usable tasks to build a CI/CD workflow; each runs as a container
29
Open Source components (continued):
• Shipwright – an extensible framework for building container images on Kubernetes. Declare and reuse build strategies to build your container images. Shipwright supports popular tools such as Kaniko, Cloud Native Buildpacks, Buildah, and more!
• Paketo – a collection of buildpacks leveraging the Cloud Native Buildpacks framework to make image builds easy, performant and secure, ensuring that upstream languages, runtimes and frameworks are continuously patched in response to vulnerabilities and updates
• Kaniko – a tool to build container images from Dockerfiles. Executes each command within the Dockerfile completely in userspace. Enables building container images in environments that can't easily or securely run a Docker daemon (such as a standard Kubernetes cluster)
30
Whom did we build
Code Engine for?
Serverless is suitable for different personas now
Container developer:
• I can run my containerized application without having to worry about sizing, creating or managing a cluster.
• "Run my container" vs. "Give me a cluster, that I can then run my container on".
• I can create powerful batch jobs and easily combine them with events and other services.
• The underlying platform scales out and allows me to run massively parallel jobs … and I only pay for what I use.
FaaS developer:
• I love Functions-as-a-Service and can now run them with almost no limits.
• I now have a single platform to securely combine Functions with Apps and other workloads.
Source-code developer – I can start utilizing a new powerful platform and:
• keep using a "push source code" experience
• do not have to worry about containers
… and …
• Today we see customers combining Web-Apps and Batch processing in one Project
• Integration of new flavours into a K8s cluster pool is not instantaneous (and therefore slows adoption)
• Eating into the profitability of the business, "pool-less" architectures are a goal
• Can a VM be Serverless?
Yes, it can. Please welcome: Serverless VMs
Analogy: Rental Car vs. Uber
• Unit & degree of lock-in – Rental car: the car itself | Uber: a trip from A to B
• Customer responsibility – Rental car: fill tank, locking, no damage, driving, stick to the traffic laws; if something breaks, call service, get it fixed and wait with the car | Uber: get in and get out of the car; if it breaks, just get out and take another Uber
• Charging model – Rental car: as long as the car is in the hands of the customer, from date x to date y, incl. when sleeping at night | Uber: customer pays a fixed price for the route from A to B
• Size – Rental car: broad spectrum of cars | Uber: broad spectrum of cars
• Elasticity – Rental car: bad – if I need to take care of more people, I need to order more cars, and that takes a long time | Uber: very good – nothing special to do; whether it's 1 person or 1000, the ordering process is the same, it's within minutes, and they come in basically any number
• Choice – Rental car: very broad – from compact cars up to trucks | Uber: more limited than Hertz wrt car selection, but its higher-level service allows growing into domains Hertz can't (e.g. food delivery)
• Charged duration – Rental car: from date a to b, regardless of how much used (incl. when I'm sleeping) | Uber: only charged for the period of time when sitting in the car
• Price – Rental car: in general cheaper than Uber | Uber: attractive for short trips; the longer the distance, the more renting becomes attractive
Serverless VMs – the Uber of compute, just better:
• Customer doesn't have to care about patching, compliance and alike
• Pays only for as long as his process is running
• Can use all hardware flavors instantaneously
• Scales transparently – the resource allocation happens transparently behind the scenes
Code Engine vs. AWS Lambda
Typical customer complaint | AWS Lambda | IBM Code Engine
• Why can't I get more memory per execution? | ✗ 3GB | ✅ 30GB and higher
• Why can't I get more CPU cores (>> 4)? | Max 2 (EC2 limits) | Max 4 vCPU (VSI limits)
• Cold start time | low-digit seconds → few 100's of millisecs | seconds–minutes → millisecs
Note that Code Engine unifies the best-of-breed capabilities, allowing to use them in combination, which is not possible on AWS. Examples:
- more than 15 mins duration & scale-to-zero
- Payload size > 6MB & low cold start time
© 2021 IBM Corporation 40
Code Engine: Serverless distributed applications
Apps:
• Stateful or stateless
• Private node-to-node communication
Workload examples (sweet spots) across the spectrum:
• On-premises DC – data which must be in the on-prem DC; mainframe apps
• VMs – apps having special OS requirements; apps packaged into existing VM images; live video streams (resource-heavy)
• Containers – continuously running processes (e.g. game engines); distributed technologies (e.g. mongodb, zookeeper)
• Apps – high-volume web apps / APIs
• Functions – API / microservice / web app implementations; mobile backends; reaction to streaming / IoT, Cognitive, etc. events
44
Comparison: Developer Experience
| On-premise Datacenter | VMs | Containers | PaaS | FaaS / Serverless
Developer view: None | VM | container | Just the app code | Just the app code
Autoscaling: None | mgmt function | mgmt function | mgmt function | inherent, no delay
45
Comparison: Workload Sweet Spots
Artifact: physical machine (on-premise datacenter) | VM | container | app code (PaaS) | action code, trigger, rule (FaaS / Serverless)
Developer usage:
• On-premise Datacenter – Developer manually installs middleware and services on dedicated hardware.
• VMs – Installs or clones an existing OS, packages the entire OS in a VM image and deploys it to the server. Developer must start/stop the entire VM.
• Containers – Creates application or microservices and packages them in a container. Deploys the container to the server. Must manage loading of Docker components and any orchestration/communication among containers.
• PaaS – Uploads the complete application using a supported runtime (e.g. CF). Explicitly binds services to the application. Explicitly starts/stops the cloud application. The entire application is atomically packaged and executed; any change requires deployment of the entire application.
• FaaS / Serverless – Uploads only artifacts (action code, triggers, rules). No explicit management of computing resources required. No starting and stopping of the application required.
46
47
Cloud Computing
Serverless in AWS
Stuttgart University
WS 2023/24
11-21-2023
2
Shades of Serverless Computing
• Event-based computing (with Function as a Service)
– Apache OpenWhisk, Knative
– AWS Lambda
– IBM Functions, Azure Functions, Google Cloud Functions…
• Any kind of managed service where you ‘just deliver code’ (or
a container)
– AWS Serverless Platform (incl. AWS Lambda and Fargate)
– IBM Code Engine
– Google Cloud Run
6
Events, Functions, Services
https://aws.amazon.com/getting-started/deep-dive-serverless/?e=gs2020&p=gsrc
7
Lambda Execution
[Diagram: Lambda polls messages from the queue and removes them from the queue after successful processing]
8
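The queue-polling flow corresponds to an SQS-triggered function: Lambda reads a batch of messages from the queue, hands them to the handler in `event['Records']`, and removes them from the queue when the handler returns without error. A minimal sketch, assuming each message body is JSON with a hypothetical `orderId` field:

```python
import json

def lambda_handler(event, context):
    # Each record is one SQS message; 'body' holds the raw message text.
    results = []
    for record in event.get('Records', []):
        payload = json.loads(record['body'])
        results.append(payload['orderId'])  # hypothetical field
    # Returning normally signals success, so Lambda removes the
    # batch of messages from the queue.
    return {'processed': results}
```

Raising an exception instead would leave the messages in the queue for redelivery after the visibility timeout.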
Usage of Serverless in Lambda
https://www.datadoghq.com/state-of-serverless/
• Web applications: serve the front-end code via Amazon S3 and Amazon
CloudFront, or automate the entire deployment and hosting with AWS
Amplify Console.
• Web and mobile backends: the front-ends interact with the backend via
API Gateway. Integrated authorization and authentication are provided by
Amazon Cognito or APN Partners like Auth0.
• Data processing: event-based processing tasks triggered by data changes
in data stores, or streaming data ETL tasks with Amazon Kinesis and
Lambda.
• Parallelized computing tasks: splitting highly complex, long-lived
computations to individual tasks across many Lambda function instances
to process data more quickly in parallel.
• Internet of Things (IoT) workloads: processing data generated by physical
IoT devices.
11
Example: IoT Backend
This is a typical example of preventive maintenance that applies to many
industries.
12
Example: Real-Time File Processing
CreateThumbnail
13
AWS Lambda Concepts
• Lambda runs functions to process events.
• An event is a JSON formatted document that contains data for a function
to process. The Lambda runtime converts the event to an object and
passes it to your function code. The service or resource that invokes the
function determines the structure and contents of the event.
• All runtimes share a common programming model. You tell the runtime
which method to run by defining a handler in the function configuration,
and the runtime runs that method. The runtime passes objects to the
handler that contain the invocation event and the context, such as the
function name and request ID.
• Functions can be synchronous (e.g. for API Gateways), asynchronous (e.g.
SNS, S3), or poll-based (DynamoDB, Kinesis)
• Concurrency is the number of requests that your function is serving at any
given time.
• A trigger is a resource or service that invokes a function, it can be a
program, an AWS service or an event source mapping (like reading from a
stream).
14
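The handler/context contract described above can be sketched as follows. `function_name` and `aws_request_id` are real attributes of the Lambda context object; the `SimpleNamespace` stand-in is only for experimenting locally, outside the Lambda runtime:

```python
from types import SimpleNamespace

def lambda_handler(event, context):
    # The runtime passes the invocation event plus a context object
    # carrying metadata such as the function name and request ID.
    return {
        'function': context.function_name,
        'request_id': context.aws_request_id,
        'input': event,
    }

# Outside Lambda, a stand-in context makes the handler testable locally.
fake_ctx = SimpleNamespace(function_name='demo', aws_request_id='req-123')
result = lambda_handler({'ping': True}, fake_ctx)
```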
Creating a Serverless Function
• From the dashboard, select ‘create function’
• Four options:
1. Create from Scratch
2. Use a blueprint
3. Deploy a container image
4. Browse serverless app repository
• If creating a function from scratch, provide
– Name, runtime info, permissions
– Use LabRole for AWS Academy setup
– .Net, Go, Java, Python, Node.js, Ruby
• Create Function
• This brings up the console for your application
• Select the test tab to test it out
– You need to configure a test event
15
Sample Function (lambda_function.py)
import json

def lambda_handler(event, context):
    # default handler skeleton generated by the Lambda console
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
16
Sample Function (index.js)
exports.handler = async (event) => {
    // default Node.js handler skeleton generated by the Lambda console
    const response = {
        statusCode: 200,
        body: JSON.stringify('Hello from Lambda!'),
    };
    return response;
};
17
Showing event content (Python)
A greeting by name
# import the JSON utility package since we will be working with a JSON object
import json
# define the handler function that the Lambda service will use as an entry point
def lambda_handler(event, context):
# extract values from the event object we got from the Lambda service
name = event['firstName'] +' '+ event['lastName']
# return a properly formatted JSON object
return {
'statusCode': 200,
'body': json.dumps('Hello from Lambda, ' + name)
}
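Because the handler is plain Python, it can be exercised locally by passing a dict shaped like the configured test event (the handler is reproduced from the slide above so the sketch is self-contained):

```python
import json

# Handler from the slide, repeated here so the example runs standalone.
def lambda_handler(event, context):
    name = event['firstName'] + ' ' + event['lastName']
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda, ' + name)
    }

# Locally, the event is just a dict with the same keys as the test event.
resp = lambda_handler({'firstName': 'Ada', 'lastName': 'Lovelace'}, None)
```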
import sys
def handler(event, context):
    return 'Hello from AWS Lambda using Python' + sys.version + '!'
• Sample Dockerfile
FROM public.ecr.aws/lambda/python:3.8
COPY app.py ./
CMD ["app.handler"]
21
Serverless Microservices with Lambda
30
Well-Architected Best Practices for Lambda
https://www.datadoghq.com/blog/well-architected-serverless-applications-best-practices/
31
Lambda Resources
• https://aws.amazon.com/lambda/getting-started/
• Tutorials:
– CRUD API with Lambda and DynamoDB
https://docs.aws.amazon.com/apigateway/latest/developerguide/http-api-dynamo-db.html
– Basic Web Application
https://aws.amazon.com/getting-started/hands-on/build-web-app-s3-lambda-api-gateway-dynamodb/?trk=gs_card
• serverless.com (good overview)
• Matthew Fuller: AWS Lambda, A Guide to Serverless Microservices, 2016,
available on Kindle (a bit dated, but decent introduction)
• Serverless resources from AWS: serverlessland.com
– Pattern collection: serverlessland.com/patterns
• github.com/serverless/serverless: Github repo with a serverless framework
• Best practices:
https://www.datadoghq.com/blog/well-architected-serverless-applications-best-practices/
32
API Gateway
• API Gateways enable developers to create, publish and
manage APIs
• Gateways serve as a control point for applications to access
backend services
• They provide authorization, traffic management, version
control etc.
• Amazon API Gateway options:
– HTTP API for RESTful APIs that only require proxy functions
– REST API for full API management
– Websocket API for persistent two-way connection between clients and
backend, e.g. for chat apps
• https://docs.aws.amazon.com/apigateway/latest/developerguide/welcome.html
33
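Behind an HTTP API with proxy integration, the gateway passes request details (path, headers, `queryStringParameters`, …) in the Lambda event and maps the returned `statusCode`/`body` back to an HTTP response. A minimal sketch of such a backend function; the `name` parameter is illustrative:

```python
import json

def lambda_handler(event, context):
    # The gateway delivers query parameters as a dict (or None if absent).
    params = event.get('queryStringParameters') or {}
    name = params.get('name', 'world')
    # statusCode/headers/body are mapped back onto the HTTP response.
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps({'message': f'Hello, {name}'}),
    }
```

A request like `GET /hello?name=Ada` would reach this handler with `{'queryStringParameters': {'name': 'Ada'}}` in the event.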
Amazon API Gateway
34
Serverless Microservices with
API Gateway and Lambda
HelloWorldAPI
HelloWorldFunction
HelloWorldDatabase
MyBucket
36
Amazon Simple Notification Service (SNS)
https://console.aws.amazon.com/sns/v3/home?region=us-east-1#/dashboard 37
Event-Driven Architecture with Topics, Filters
and Queues
SQS
SNS Topic
Filters
38
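Conceptually, an SNS subscription filter policy matches message attributes against allowed values, so each SQS queue only receives the subset of the topic's messages it cares about. A simplified pure-Python sketch of exact-value matching (real filter policies also support prefix, numeric-range, and anything-but clauses):

```python
def matches_filter(message_attributes, filter_policy):
    # Simplified SNS-style matching: every policy key must be present
    # on the message and its value must be one of the allowed values.
    for key, allowed in filter_policy.items():
        if message_attributes.get(key) not in allowed:
            return False
    return True

# Fan-out: each subscription only receives messages its policy matches.
policy_eu = {'region': ['eu-west-1', 'eu-central-1']}
print(matches_filter({'region': 'eu-central-1'}, policy_eu))  # True
```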
Serverless Containers
with AWS Fargate
39
ECS Fargate
Structure
• Fully managed
• Default for AWS
• Kubernetes Pods can
use Fargate
• Longer running
workloads
Source:
https://docs.aws.amazon.com/AmazonECS/latest/developerguide/Welcome.html 40
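A Fargate task is described by an ECS task definition. Below is a minimal sketch as a Python dict: the keys follow the ECS task-definition schema, while the family, image, and sizing values are illustrative:

```python
import json

# Minimal Fargate task definition sketch; family/image/sizes are examples.
task_def = {
    "family": "web-service",
    "requiresCompatibilities": ["FARGATE"],
    "networkMode": "awsvpc",   # required for Fargate tasks
    "cpu": "256",              # 0.25 vCPU
    "memory": "512",           # MiB
    "containerDefinitions": [{
        "name": "web",
        "image": "public.ecr.aws/docker/library/nginx:latest",
        "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
        "essential": True,
    }],
}
print(json.dumps(task_def, indent=2))
```

Such a document would typically be registered with `aws ecs register-task-definition --cli-input-json file://taskdef.json` before being run as a Fargate service or task.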
Serverless Microservices with Fargate
Fargate
+ Good for long-running processes
(-) Initial startup time longer than Lambda, negligible afterwards
+ Good if you want to control your own auto-scaling
(-) No inherent event integration (needs to come via ingress)
(-) Stateful apps not recommended
+ More memory available, cheaper than Lambda for high memory
+ Containerizing often easier than refactoring
(-) EKS Fargate only FedRAMP Moderate
42
A Modern Application in AWS with the
Serverless Stack
Events
Dynamic Content
DynamoDB
Longer-Running
Processes
(Fargate)
S3
Static Content
44