AWS DevOps Professional (DOP-C01) - Notes
Encryption
Encryption Approaches
Encryption Context
Symmetric Encryption
Asymmetric Encryption
Signing
Steganography
DNS
DNS - Basic
DNS Zone
DNS - Recall
DNS - Root
DNS - Hierarchy
DNS - Resolution
DNS - Remember
Route53 Fundamentals
Route53 - Product Basics
Route53 - Register Domains
Route53 - Hosted Zones
EB and Docker
EB and Docker - Single Container
EB and Docker - Multi-Container
Lambda Versions
Lambda Aliases
Lambda Layers
API Gateway
API Gateway - Refresher
API Gateway - Authentication
API Gateway - Endpoint Types
API Gateway - Stages
API Gateway - Errors
API Gateway - Caching
Step Functions
Some problems with Lambda
State Machines
Introduction to Containers
Virtualisation Problems
Containerization
Image Anatomy
Container Anatomy
Container Registry
Container Key Concepts
ECS - Concepts
ECS
ECS Concepts
OpsWorks Stacks
OpsWorks Stacks
OpsWorks Stacks - Server Instances
R53 Interoperability
R53 - Both Roles
R53 - Registrar Only
R53 - Hosting Only
CICD in AWS
CodePipeline
CodePipeline - Basic
CodeBuild
CodeBuild - Basic
CodeBuild - Architecture
CodeDeploy
CodeDeploy - Basic
CodeDeploy - Configuration
Jenkins
Jenkins Architecture
Jenkins on AWS
Jenkins with CodePipeline
CloudWatch
CloudWatch - Architecture Concepts
CloudWatch - Data
CloudWatch - Alarms
CloudWatch - Data Architecture
CloudWatch Logs
CloudWatch Logs - Ingestion
CloudWatch Logs - Subscriptions
CloudWatch Logs - Aggregation
CloudWatch Logs - Summary
Athena
Athena - Basic
Kinesis Data Streams
Kinesis Data Streams - Concepts
Kinesis Data Streams - Architecture
SQS vs Kinesis
MapReduce
MapReduce
EMR Architecture
EMR Architecture
Amazon Redshift
Redshift - Basic
Redshift Architecture
S3 Events
S3 Event Notifications
Lambda@Edge
Lambda@Edge
Lambda@Edge - Use Cases
Technical Fundamentals
Encryption
Encryption Approaches
● Encryption At Rest
● Encryption In Transit
Encryption Context
● Plaintext
○ Is unencrypted data
○ It can be text, but it doesn’t have to be
○ Can be documents, image, or even applications
○ Is data that you can load into an application and use, or can load and immediately read
● Algorithm
○ Is a piece of code, or more specifically a piece of maths, which takes plaintext and an encryption key, and generates encrypted data
○ Examples: Blowfish, AES, RC4, DES, RC5 and RC6
○ When an algorithm is being used, it needs the plaintext and it needs a key
● Key
○ A key at its simplest is a password, but it can be much more complex
○ When an algorithm takes plaintext and the key, the output that it generates is
ciphertext
● Ciphertext
○ Just like plaintext, ciphertext isn’t always text data
○ Is just encrypted data
○ The relationship between all of these is that encryption takes plaintext, uses an algorithm and a key, and creates ciphertext
Decryption is just the reverse: it takes ciphertext and a key, and generates plaintext
Symmetric Encryption
● A symmetric encryption algorithm is used and this accepts the key and the plaintext
● Once both of those are accepted, it performs encryption and outputs ciphertext, the encrypted info
● The encrypted info is now secure because it is ciphertext, and nobody can decipher it without the key
● They can be sent over any transmission method, even an insecure way to the end
receiver
● The encryption removes the risk of transmitting this data in the open
● So even if we handed the ciphertext over to an untrusted party and asked him to deliver it to the end receiver, that would still be safe, because the ciphertext is undecipherable without the key
● But the end receiver doesn't have the key which was used to encrypt the data
● With symmetric key encryption, the same key is used for both the encryption and decryption processes
● It gets tricky because transferring this key electronically would not be secure, and it could be intercepted by third parties
● This is why symmetric encryption is great for things like laptops, but not useful for situations where the data needs to be transferred between two remote parties, because arranging the transit of the key is the problem
● Generally this is done in advance so there is no delay in decrypting the data
● If the data that we’re transferring is time sensitive, the transit of the encryption key
needs to happen in advance
● And that is the most complex part of this method of encryption
● If we did have a way to transfer the key securely, then the same algorithm would
decrypt the data using the key and the ciphertext, and then would return the original
message that was sent
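A minimal Python sketch of the symmetric flow described above, using the third-party cryptography library's Fernet recipe; the message content is invented for illustration:

```python
# pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # the single shared secret - must reach the
                              # receiver via a secure channel, in advance

ciphertext = Fernet(key).encrypt(b"attack at dawn")   # safe to send in the open
plaintext = Fernet(key).decrypt(ciphertext)           # receiver needs the SAME key
assert plaintext == b"attack at dawn"
```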
Asymmetric Encryption
It makes it much easier to exchange keys, because the keys used in asymmetric encryption are themselves asymmetric
● To use asymmetric encryption, the first stage is for the sender and receiver to agree on an asymmetric algorithm to use, and then create encryption keys for the algorithm, which logically enough will be asymmetric encryption keys
● Asymmetric encryption keys are formed of two parts, a public key and a private key
● For both sides to be able to send and receive to each other, both sides would need to
make both public and private keys
● In this scenario though, where only one side is sending, only the receiver needs to generate keys
● A public key can be used to generate ciphertext, which can only be decrypted by the
corresponding private key
● The public key cannot decrypt data that it was used to encrypt
● Only the private key can decrypt that data
● This means that the private key needs to be guarded really carefully because it’s
what’s used to decrypt data
● If it leaks, the receiver could be compromised
● The public key is only used to encrypt
● So the receiver can upload his public key to his cloud storage so that anyone can access it
● The worst thing that could happen to anyone who obtains the receiver public key is
that he or she could use it to encrypt plaintext into ciphertext that only the receiver
could decrypt
● So there is no downside to anyone getting hold of the receiver's public key
● So with asymmetric encryption, there is no requirement to exchange keys in advance
● As long as the receiver uploaded his public key to somewhere that was accessible to
the world, then the first step would be for the sender to download the receiver’s
public key
● Using the receiver public key and the plaintext, the asymmetric algorithm would
generate some ciphertext
● The ciphertext can then be transmitted to the receiver and once received only then
the data could be decrypted
● The receiver already has his private key and so he provides that private key and the
ciphertext to the algorithm, which decrypts the ciphertext back into plaintext, and then
the receiver has a copy of the plaintext of the original message
● Asymmetric encryption is generally used where two or more parties are involved
● And generally when those parties have never physically met before
● Asymmetric encryption is computationally much more expensive than symmetric
● And so many processes use asymmetric encryption to initially agree and
communicate a symmetric key and then the symmetric key is used for
communication between those two parties from that point onward
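The same one-way flow, sketched with RSA via the cryptography library; the key size and OAEP padding are just common defaults, not anything these notes mandate:

```python
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# Receiver generates the key pair and publishes only the public half
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

ciphertext = public_key.encrypt(b"hello", oaep)    # anyone can do this
plaintext = private_key.decrypt(ciphertext, oaep)  # only the receiver can do this
```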
Signing
Generally used for ID verification and certain log on systems
● This requires both sides to operate as one, and so the sender needs to know that the receiver has received the message and agrees with it
● The receiver might want to respond with a simple okay message
● So an okay message will be sent back to the sender
● The issue is that anyone can encrypt a message to another party using asymmetric encryption
● Anyone could get hold of the sender's public key, encrypt a message saying okay and send it to the sender, and the sender wouldn't necessarily be aware whether that was from the receiver or not
● Encryption does not prove identity
● But for this, we can use the signing process
● With signing, the receiver could write this okay message, take the message and, using his private key, sign that message
● Then that message can be sent across to the sender, and when the sender receives it, he can use the receiver's public key to prove whether the message was signed using the receiver's private key
● So key signing is generally used for ID verification and certain log on systems
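A sign/verify sketch in the same library; PSS padding is a common choice here, and the "OK" message is illustrative:

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

signature = private_key.sign(b"OK", pss, hashes.SHA256())   # receiver signs

try:   # sender verifies using the receiver's PUBLIC key
    private_key.public_key().verify(signature, b"OK", pss, hashes.SHA256())
    print("signed by the holder of the private key")
except InvalidSignature:
    print("not signed by the receiver")
```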
Steganography
Method of hiding something in something else
● With steganography, the sender could generate some ciphertext and hide it in a
puppy image
● The image could be delivered to the receiver who knows to expect the image with
some data inside and then extract the data
● To anyone else, it would just look like a puppy image and everybody knows there’s
no way that the sender would send the receiver a puppy image, so there is plausible
deniability
● The effect of steganography might be a slightly larger file but it would look almost
identical
● Effective steganography algorithms make it almost impossible to find the hidden
data unless you know a certain key, a number, or a pattern
● Steganography is just another layer of protection
● The steganography algorithm would take the original picture, select the required
number of pixels, adjust those pixels by a certain range of values and what it would
generate as an output would be an almost identical puppy image, but hidden there
would be slight changes
● It allows you to embed data in another piece of data
● To be really secure, the sender would encrypt some data using the receiver’s public
key, take that ciphertext, use steganography to embed it in an image that wouldn’t
be tied back to the sender, send this image to the receiver and then the receiver
could also use steganography to extract the piece of ciphertext and then decrypt it
using his private key, and the same process could be followed in reverse to signal an okay, but the receiver, in addition to encrypting that okay, would also sign it
● So the sender would know that it came from the receiver
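A toy least-significant-bit sketch with Pillow, showing the "adjust selected pixels by a tiny amount" idea above; this is not a production algorithm (a real scheme would be keyed, as the notes say), and the file names are hypothetical:

```python
# pip install Pillow - toy LSB steganography, for illustration only
from PIL import Image

def hide(carrier: str, payload: bytes, out: str) -> None:
    img = Image.open(carrier).convert("RGB")
    # 32-bit length prefix so the extractor knows where the payload ends
    bits = "".join(f"{b:08b}" for b in len(payload).to_bytes(4, "big") + payload)
    flat = [c for px in img.getdata() for c in px]
    for i, bit in enumerate(bits):               # tweak one LSB per channel value
        flat[i] = (flat[i] & ~1) | int(bit)
    img.putdata([tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)])
    img.save(out, "PNG")                         # must be lossless, not JPEG

def extract(stego: str) -> bytes:
    flat = [c for px in Image.open(stego).convert("RGB").getdata() for c in px]
    bits = "".join(str(v & 1) for v in flat)
    n = int(bits[:32], 2)
    return bytes(int(bits[32 + i:40 + i], 2) for i in range(0, n * 8, 8))
```

The output image is visually identical to the carrier; only the lowest bit of the first few thousand channel values changes.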
Distributed Denial of Service (DDoS)
● Spoof a source IP address and initiate the connection attempt with a server
● The server tries to perform step two of the handshake, but it can’t contact the source
address because it’s spoofed
● In general, it hangs in this step waiting for a specified duration, and this consumes
network resources
1 - Cipher Suites
● TLS begins with an established TCP connection.
● Agree on a method of communication, the "Cipher Suite"
● At this point, the client and server have agreed on how to communicate and the client
has the server certificate.
● The certificate contains the server ‘public key’
2 - Authentication
● Ensure the server certificate is authentic, verifying the server as legitimate
3 - Key Exchange
● Move from asymmetric to symmetric keys in a secure way and begin the encryption process
● Both sides confirm the handshake and from then on, communications between client ⇔ server are encrypted
DNS
● DNS is probably one of the most important distributed databases which exist today.
● It is used for service discovery, configuration and the operation of most consumer
web browsing and other internet activities.
● While not strictly required in detail for the exam - understanding DNS will help you
answer DNS related questions and help make sense of other AWS lessons
throughout the course.
● IANA : https://www.iana.org
● Root hints : https://www.internic.net/domain/named.root
● Root Servers : https://www.iana.org/domains/root/servers
● Root Zone Database : https://www.iana.org/domains/root/db
● Root Zone File : https://www.internic.net/domain/root.zone
● Delegation Record for .com : https://www.iana.org/domains/root/db/com.html
DNS - Basic
● DNS is a discovery service
● Translates human-readable names into machine-readable IP addresses, and vice-versa
● www.amazon.com => 104.98.34.131
● It's huge and has to be distributed
● 4,294,967,296 IPv4 addresses
DNS Zone
DNS - Recall
● DNS Client => your laptop, phone, tablet, PC
● Resolver => Software on your device, or a server which queries DNS on your behalf
● Zone => A part of the DNS database (amazon.com)
● Zonefile => physical database for a zone
DNS - Root
● www.amazon.com
● DNS root & Root Zone
○ DNS Root Servers (13)
DNS - Hierarchy
DNS - Resolution
DNS - Remember
● Root Hints => config that points at the root servers' IP addresses
● Root Server => Hosts the DNS root zone
● Root Zone => Points at the TLD authoritative servers
● gTLD => generic top level domain (.com .org)
● ccTLD => country-code top level domain (.uk .eu etc)
Route53 Fundamentals
CNAME Records
MX Records
TXT Records
IAM Users
● 5000 IAM users per account
● IAM User can be a member of 10 groups
○ This has systems design impacts
○ May impact Internet-scale applications
○ May impact Large orgs & org mergers
CloudFormation
Elastic Beanstalk
EB Deployment - All at Once
● A new application version is deployed into all instances within the EB environment at the same time
● But it causes outages during the deployment and doesn't have a great method of handling failures
● This is a good one to use for a development or testing environment, but not for anything of great importance
EB Deployment - Rolling
● With this method you deploy the new application version in batches
● This type of method is great when you want to step through all of the instances within your environment, taking things batch by batch
● Each batch is taken out of service
● The new application is deployed into that batch and then when they pass health
checks they’re put back into service
● You can identify any problems before moving on to the next
● This means that you have additional control and it’s safer because the process
continues only when the current batch passes its health checks
● But it does mean a loss in capacity as instances are removed from service while the
deployment is happening
● There's also no increase in cost, because the environment never runs more than its normal number of instances
EB Deployment - Rolling with Additional Batch
● This method is similar to rolling deployment, but we don't drop any capacity
● We start with an environment of four instances, all running the currently deployed application version, which is version 1
● The deployment starts when we have a version 2 of the application and decide to deploy it into this environment
● Immediately, whatever batch size we pick - let's say two instances - a new batch is deployed running the new version of the application
● This now means that we have 150% capacity for our application, because we chose a batch size which is half the number of instances our environment is running
● So now we have two application versions running within our environment
● Four instances on version 1 and two on version 2
● The deployment starts on two of the original instances, and these are removed from
service, taking the capacity back down to 100% levels
● Once this batch is finished, they’re added back into service and then deployment
happens to the next batch
● Again, they are taken out of service
● The new version is deployed and, when finished, they're brought back into service, and this means that we now have a total of six instances running version 2
● This is two more, so one batch more than we originally had
● So once the whole deployment is finished the extra batch is removed and the
environment again has four instances
● This process takes longer
● There will be additional costs for the extra instances running during the deployment
but it’s safer and great for production usage because we don’t drop any capacity
EB Deployment - Immutable
● With this method, we start with a similar environment: three instances running the same application version, version 1
● When we start the deployment, the original instances aren't touched at all
● They're treated as immutable; instead, a temporary auto-scaling group is created and within it another full set of instances is immediately created, and the new application version is deployed onto these instances
● Once finished, the result is two sets of instances: the original set running version 1 in the original auto-scaling group, and the new set running in the temporary auto-scaling group deployed with version 2
● Once completed and the health checks passed, the new instances are moved into
the original auto-scaling group
● The original instances are terminated and the temporary auto-scaling group is
deleted
● This method has the highest cost because it uses double the instances but it offers
the quickest and lowest risk rollback because if anything goes wrong, the original
instances are available until right at the end of the process
● So until the deployment finishes in its entirety, none of the original infrastructure is touched in any way
EB Deployment - Traffic Splitting
● The Traffic Splitting deployment method is just like immutable, but when the new set of instances is ready, traffic can be split between the new and the old version
● In this case, we're sending 50% of the traffic towards the new version on the right and leaving the remaining 50% on the original version
● It's just a form of A/B testing, and it allows an extra round of verification before you finish the deployment
● This means your rollback path is relatively quick, because you can just return the traffic flow to the original instances and remove the new ones; once you're entirely happy with the new version, the deployment can be completed
● Traffic splitting is a fairly new method of deployment introduced in the middle of 2020
EB Deployment - Blue Green
● You have two different environments, a blue environment on the left and a green on
the right
● You have the DNS record pointing at the blue environment running version 1 of the application, and you deploy version 2 to a brand new environment
● You run both of them at the same time, get people to test the green environment on the right, and then when ready you can change the DNS to point at the green environment
● This is really good because it means you have complete control over these two
different environments and you’re not relying on Elastic Beanstalk to orchestrate the
deployment
● You can control when the switch over occurs and leave the original environment in
place for as long as you want to
EB Cloning
● Create a NEW environment, by cloning an EXISTING one
● Copy PROD-ENV to a new TEST-ENV (for testing and Q/A)
● .. new version of platform branch
● Copies options, env variables, resources and other settings
● Includes RDS in ENV, but no data is copied
● “unmanaged changes” are not included
● Console UI, API or “eb clone EXISTING-ENVNAME”
EB and Docker
Lambda Versions
You can use versions to manage the deployment of your functions. For example, you can
publish a new version of a function for beta testing without affecting users of the stable
production version. Lambda creates a new version of your function each time that you
publish the function. The new version is a copy of the unpublished version of the function.
● You can create one or more aliases for your Lambda function. A Lambda alias is like
a pointer to a specific function version.
● Users can access the function version using the alias Amazon Resource Name
(ARN).
● Aliases can point at a single version, or be configured to perform weighted routing
between 2 versions.
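A hedged boto3 sketch of publishing a version and weighting an alias between two versions; the function name, alias name and 10% weight are hypothetical:

```python
import boto3

lam = boto3.client("lambda")

# Freeze the current $LATEST code/config as a new immutable version
new_version = lam.publish_version(FunctionName="my-function")["Version"]

# Point a 'prod' alias mostly at version 1, canarying 10% onto the new version
lam.create_alias(
    FunctionName="my-function",
    Name="prod",
    FunctionVersion="1",
    RoutingConfig={"AdditionalVersionWeights": {new_version: 0.1}},
)
```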
Lambda Environment Variables
● KEY & VALUE pairs (0 or more)
● Associated with $LATEST (can be edited)
● Associated with a version (immutable // fixed)
● Can be accessed within the execution environment
● Can be encrypted with KMS
● Allow code execution to be adjusted based on variables
https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html#configuration-env
vars-samples
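A minimal boto3 sketch of setting variables on $LATEST and reading them in the function; names and values are hypothetical:

```python
import boto3

# Set variables on $LATEST (versions freeze whatever is set at publish time)
boto3.client("lambda").update_function_configuration(
    FunctionName="my-function",
    Environment={"Variables": {"STAGE": "beta", "TABLE": "orders-beta"}},
)

# ...and inside the function's own code, read them from the execution environment:
# import os
# table = os.environ["TABLE"]
```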
Lambda Monitoring
● All Lambda metrics are available within CloudWatch
● .. directly or via monitoring tab on a specific function
● Dimensions - FunctionName, Resource (Alias/Version), ExecutedVersion (combination of alias and version, weighted alias) and ALL FUNCTIONS
● Invocations, Errors, Duration, ConcurrentExecutions
● DeadLetterErrors, DestinationDeliveryFailures
Lambda Logging
● Lambda Execution Logs => CloudWatch Logs
● stdout or stderr
● Log Group = /aws/lambda/functionname
● Log Stream = YYYY/MM/DD/[$LATEST || version]..random
● Permissions via Execution Role
● …default role gives logging permissions
Lambda Tracing
● X-Ray shows the flow of requests through your application
● Enable 'Active Tracing' on a function
● aws lambda update-function-configuration --function-name my-function --tracing-config Mode=Active
● AWSXRayDaemonWriteAccess managed policy
● Use X-Ray SDK within your function
Lambda Layers
● You can configure your Lambda function to pull in additional code and content in the
form of layers. A layer is a .zip file archive that contains libraries, a custom runtime,
or other dependencies.
● With layers, you can use libraries in your function without needing to include them in
your deployment package.
https://aws.amazon.com/blogs/aws/new-for-aws-lambda-use-any-programming-language-an
d-share-common-components/
Lambda Container Images
● Lambda is a Function as a Service (FaaS) product
● Create a function, upload code, it executes
● This is great …but has 2 problems
● ORGS…use containers & CI/CD processes built for containers…
● …would like a way of locally testing lambda functions before deployment
● Lambda Runtime API - IN CONTAINER IMAGE
● AWS Lambda Runtime Interface Emulator (RIE) - Local Test
Elastic Load Balancing invokes your Lambda function synchronously with an event that
contains the request body and metadata.
Lambda and ALB - Multi-Value Headers
API Gateway
Amazon API Gateway is a fully managed service that makes it easy for developers to create,
publish, maintain, monitor, and secure APIs at any scale. APIs act as the "front door" for
applications to access data, business logic, or functionality from your backend services.
https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-s3-
messages.htm
State Machines
● Serverless workflow.. START -> STATES -> END
● States are THINGS which occur
● The maximum Duration is 1 year…
● Standard Workflow (Default)
● Express Workflow
○ High volume
○ Event processing workloads, e.g. IoT
○ Streaming data processing and transformation
○ Mobile application backends
● Started via API Gateway, IoT Rules, EventBridge, Lambda
https://github.com/acantril/learn-cantrill-io-labs/tree/master/aws-serverless-pet-cuddle-o-tron
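A minimal boto3 sketch of starting a state machine execution (one of the "started via Lambda/API" paths above); the ARN and input payload are hypothetical:

```python
import boto3
import json

sfn = boto3.client("stepfunctions")
resp = sfn.start_execution(
    stateMachineArn="arn:aws:states:us-east-1:111122223333:stateMachine:demo",
    input=json.dumps({"waitSeconds": 60, "email": "user@example.com"}),
)
print(resp["executionArn"])   # a Standard execution can then run for up to 1 year
```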
Introduction to Containers
Virtualisation Problems
Containerization
Image Anatomy
Container Anatomy
Container Registry
ECS - Concepts
● ContainerDefinition - Amazon Elastic Container Service
● TaskDefinition - Amazon Elastic Container Service
ECS
ECS Concepts
● Container Definition - Image & Ports
● Task Definition - Security (Task Role), Container(s), Resources
● Task Role - IAM Role which the TASK assumes
● Service - How many copies, HA, Restarts
OPSWORKS
OpsWorks Stacks
● Handles Infrastructure, Configuration and Application within one service
OpsWorks Stacks
● A Stack is a set of layers, instances and related AWS resources whose configuration you want to manage together
● Layers
○ Elastic Load Balancing Layer
○ Application Server Layer
■ Use Cookbooks for the application deployment (Cookbooks = Chef recipes)
● Users have to indicate where they are stored: Git, HTTP archive, S3 archive
○ Amazon RDS Layer
● An instance can be a member of multiple layers. If any of those layers has auto
healing disabled, AWS OpsWorks Stacks does not heal the instance if it fails.
Parameters
● Standard Parameters and SecureString (used for credentials)
● Hierarchical information
● Configuration for the CloudWatch agent
● Anything of this nature
Secrets Manager
● It does share functionality with Parameter Store
● Designed for secrets (.. password, API Keys..)
● Usable via Console, CLI, API or SDK (integration)
● Supports automatic rotation, this uses lambda
● Directly integrates with some AWS products (...RDS)
ECS Cluster with Fargate
Delegation Problems
● Julie has AdministratorAccess
● Julie wants Bob to be an IAM administrator
● Julie gives Bob iam:* to manage identities
● Nothing stops Bob changing his own permissions
● Nothing stops Bob creating a FullAdministrator
Route53
R53 CNAME
● “A” maps a NAME to an IP Address
● leticia.io => 1.3.3.7
● CNAME maps a NAME to another NAME
● … www.leticia.io => leticia.io
● CNAME is invalid for naked/apex (leticia.io)
● Many AWS services use a DNS Name (ELBs)
● With just CNAME - leticia.io => ELB would be invalid
R53 ALIAS
● ALIAS records map a NAME to an AWS resource
● Can be used both for naked/apex and normal records
● For non apex/naked - functions like CNAME
● There is no charge for ALIAS requests pointing at AWS resources
● For AWS Services - default to picking ALIAS
● Should be the same “Type” as what the record is pointing at
● API Gateway, CloudFront, Elastic Beanstalk, ELB, Global Accelerator & S3
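A hedged boto3 sketch of creating an apex ALIAS record pointing at an ELB; the hosted zone IDs and ALB name are hypothetical placeholders:

```python
import boto3

boto3.client("route53").change_resource_record_sets(
    HostedZoneId="ZHOSTEDZONEID",            # your hosted zone (hypothetical)
    ChangeBatch={"Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": "leticia.io",            # ALIAS works at the naked/apex name
            "Type": "A",                     # same type as what it points at
            "AliasTarget": {
                "HostedZoneId": "ZELBZONEID",  # the ELB's own zone ID
                "DNSName": "my-alb-123.us-east-1.elb.amazonaws.com",
                "EvaluateTargetHealth": False,
            },
        },
    }]},
)
```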
R53 Simple Routing
R53 Health Checks
Amazon Route 53 health checks monitor the health and performance of your web
applications, web servers, and other resources. Each health check that you create can
monitor one of the following:
● The health of a specified resource, such as a web server
● The status of other health checks
● The status of an Amazon CloudWatch alarm
R53 Failover Routing
● Failover routing lets you route traffic to a resource when the resource is healthy or to
a different resource when the first resource is unhealthy
R53 Multi Value Routing
● Multivalue answer routing lets you configure Amazon Route 53 to return multiple
values, such as IP addresses for your web servers, in response to DNS queries.
● You can specify multiple values for almost any record, but multivalue answer routing
also lets you check the health of each resource, so Route 53 returns only values for
healthy resources
R53 Weighted Routing
● Weighted routing lets you associate multiple resources with a single domain name (catagram.io) and choose how much traffic is routed to each resource.
● This can be useful for a variety of purposes, including load balancing and testing new
versions of software.
R53 Latency Routing
● If your application is hosted in multiple AWS Regions, you can improve performance
for your users by serving their requests from the AWS Region that provides the
lowest latency.
R53 Geolocation Routing
● Geolocation routing lets you choose the resources that serve your traffic based on the geographic location of your users, meaning the location that DNS queries originate from.
R53 Geoproximity Routing
● Geoproximity routing lets Amazon Route 53 route traffic to your resources based on
the geographic location of your users and your resources.
● You can also optionally choose to route more traffic or less to a given resource by
specifying a value, known as a bias.
● A bias expands or shrinks the size of the geographic region from which traffic is
routed to a resource.
R53 Interoperability
● R53 normally has 2 jobs - Domain Registrar and Domain Hosting
● R53 can do BOTH, or either Domain Registrar or Domain Hosting
● R53 accepts your money (domain registration fee)
● R53 allocates 4 Name Servers (NS) (Domain Hosting)
● R53 creates a zone file (Domain Hosting) on the above NS
● R53 communicates with the registry of the TLD (Domain Registrar)
● .. sets the NS records for the domain to point at the 4 NS above
● This details how Route53 provides Registrar and DNS Hosting features and steps
through architectures where it is used for BOTH, or only one of those functions - and
how it integrates with other registrars or DNS hosting.
R53 - Both Roles
CICD in AWS
● CI/CD is handled within AWS by CodeCommit, CodeBuild, CodeDeploy and
CodePipeline.
● For the SA Pro, you don't need to have a detailed understanding operationally, but
you will need a high-level, component level understanding.
● appspec.yml or appspec.json reference
https://docs.aws.amazon.com/codedeploy/latest/userguide/reference-appspec-file.ht
ml
● buildspec.yml reference
https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html
● buildspec.yml, appspec.[yml | json]
● CodeDeploy
● AWS Elastic Beanstalk or AWS OpsWorks
● AWS CloudFormation
● AWS ECS or ECS (Blue/Green)
● AWS Service Catalog or Alexa Skills Kit
● Amazon S3
CodePipeline
● AWS CodePipeline is a continuous delivery service you can use to model, visualize,
and automate the steps required to release your software.
● You can quickly model and configure the different stages of a software release
process.
● CodePipeline automates the steps required to release your software changes
continuously
CodePipeline - Basic
● Pipeline is a Continuous Delivery tool
● Controls the flow from source, through build, towards deployment
● Pipelines are built from STAGES
● STAGES can have sequential or parallel ACTIONS
● Movement between stages can require manual approval
● Artifacts can be loaded into an action, and generated from an action
● State changes => EventBridge (Success, Failed, Cancelled)
● CloudTrail or Console UI can be used to view/interact
CodeBuild
● AWS CodeBuild is a fully managed continuous integration service that compiles
source code, runs tests, and produces software packages that are ready to deploy.
● With CodeBuild, you don’t need to provision, manage, and scale your own build
servers.
● CodeBuild scales continuously and processes multiple builds concurrently, so your
builds are not left waiting in a queue
CodeBuild - Basic
● Code Build as a service - fully managed
● Pay only for the resources consumed during builds
● Alternative to part of Jenkins functionality
● Used for builds and tests
● Uses docker with AWS services ..KMS, IAM, VPC, CloudTrail, S3
CodeBuild - Architecture
● Architecture - Gets sources from GitHub, CodeCommit, CodePipeline, S3
● Build … (and tests)
● Customised via buildspec.yml file (in root source)
● Logs => S3 and CloudWatch Logs
● Metrics => CloudWatch
● Event => EventBridge (event-driven response)
● Java, Ruby, Python, Node.JS, PHP, .NET, Go..and more…
● buildspec.yml - customize the build process
● Four main PHASES in the file
○ install - install packages in the build environment (frameworks etc)
○ pre_build - sign in to things or install dependencies
○ build - commands run during the build process
○ post_build - package things up, push docker image, explicit notifications
● Environment variables - shell, variables, parameter-store, secrets-manager
● Artifacts - what stuff to put where
CodeDeploy
● CodeDeploy is a deployment service that automates application deployments to
Amazon EC2 instances, on-premises instances, serverless Lambda functions, or
Amazon ECS services.
CodeDeploy - Basic
● Code Deployment as a Service
● There are alternatives - Jenkins, Ansible, Chef, Puppet, CloudFormation and more…
● Deploys code ..not resources
● EC2, On-premises, Lambda Functions and ECS
● Code, Web, Configuration, EXE files, Packages, Scripts, media and more
● CodeDeploy integrates with AWS services & AWS Code* tools
● CodeDeploy agent (On-premises or EC2)
CodeDeploy - Configuration
● Appspec.yml (YAML or JSON formatted)
● Manage Deployments - config + lifecycle event hooks
● Files (EC2/On-prem)
○ Provides information to CodeDeploy, about which files from your application
should be installed on the instance during the deployment install
○ This is how you configure which things are installed
● Resources (ECS, Lambda)
○ Lambda
■ For lambda, it contains the name, alias, current version of a lambda
function
■ So it can be used to control all of the surrounding details about the
lambda function that is being used for the deployment
○ ECS
■ Contains things like the task definition or container and port details
used for routing traffic to your container
■ Think of it as the configuration for the thing running your application
● Permissions (EC2, On-prem)
○ Details any special permissions and how they should be applied to the files,
directories and folders, which are defined in the file sections
○ So if you use the files section to copy any files from your application onto these deployment targets, then it's the permissions section that's going to be used to set any special permissions on those files and folders
● Lifecycle Event Hooks - depend on what and where is being deployed
○ ApplicationStop
■ Generally used when you want to prepare for the actual deployment
itself
○ DownloadBundle
■ This is when the CodeDeploy agent copies the application down to a
temporary location
○ BeforeInstall
■ This is an event that you can use for any pre-installation tasks
● Maybe you want to decrypt some files, or create a backup of
the current application or configuration, anything that you want
to do before the install itself
○ Install
■ During this part of the deployment lifecycle, the CodeDeploy copies
the application files from the temporary location to the final destination
folder
■ This is performed by the CodeDeploy agent and you can’t run any
scripts during this step
■ This is something that’s handled on your behalf by the CodeDeploy
product and CodeDeploy agent itself
○ AfterInstall
■ Allows you to perform install steps
■ So performing any application-specific configuration, maybe changing
file permissions or applying licensing, anything that you want to do
after the install
○ ApplicationStart
■ Typically used when you want to restart or start any services that were stopped during the ApplicationStop part of the deployment
■ This is the part when you fully installed, you’ve performed all the
configuration, and now you’re wanting to start up the application
service or services
○ ValidateService
■ Where you are going to verify that the deployment completed successfully
■ This is the part that is going to allow CodeDeploy to determine whether the deployment was successful or not
■ It's time to look at the application specifically, check any application logs, or perform any tests to verify that the application has been deployed as expected
● Remember the order and the names for the exam
Elastic Container Registry (ECR)
ECR - Benefits
● Integrated with IAM - Permissions
● Image scanning, basic and enhanced (inspector)
● Near RealTime Metrics => CW (auth, push, pull)
● API Actions = CloudTrail
● Events => EventBridge
● Replication …Cross-Region AND Cross-Account
Jenkins
● Open source CICD tool
● Can replace CodeBuild, CodePipeline & CodeDeploy
● Must be deployed in a Master / Slave configuration
● Must manage multi-AZ, deploy on EC2, etc…
● All projects must have a “Jenkinsfile” (similar to buildspec.yml) to tell Jenkins what to
do
● Jenkins can be extended on AWS thanks to many plugins
Jenkins Architecture
Jenkins on AWS
Jenkins with CodePipeline
CloudWatch
CloudWatch - Alarms
● Alarm - Watches a metric over a time period
● ..ALARM or OK
● ..Value of metric vs threshold ..over time
● ..one or more actions
● Alarm Resolution
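A hedged boto3 sketch of the "metric vs threshold, over time, with actions" idea above; the instance ID and SNS topic ARN are hypothetical:

```python
import boto3

boto3.client("cloudwatch").put_metric_alarm(
    AlarmName="high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,                   # each evaluated datapoint covers 5 minutes
    EvaluationPeriods=2,          # value vs threshold ..over time
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:ops-alerts"],  # one or more actions
)
```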
CloudWatch - Data Architecture
CloudWatch Logs
● CloudWatch logs is a product which can store, manage and provide access to
logging data for on-premises and AWS environments including systems and
applications
● It can also, via subscription filters, stream the data to Lambda, Elasticsearch, Kinesis Streams and Firehose for further delivery
● Metric filters can be used to generate metrics within CloudWatch, and from those, alarms and eventually events within EventBridge.
Athena
Athena - Basic
● Amazon Athena is an interactive query service that makes it easy to analyze data in
Amazon S3 using standard SQL.
○ Athena is serverless, so there is no infrastructure to manage, and you pay
only for the queries that you run.
● Athena is easy to use.
○ Simply point to your data in Amazon S3, define the schema, and start
querying using standard SQL.
○ Most results are delivered within seconds.
○ With Athena, there’s no need for complex ETL jobs to prepare your data for
analysis.
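A minimal boto3 sketch of running an Athena query; the database, table and results bucket are hypothetical:

```python
import boto3

athena = boto3.client("athena")
resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
# Poll get_query_execution() until the state is SUCCEEDED,
# then page through get_query_results() - you pay per query, per data scanned
```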
SQS vs Kinesis
● Kinesis
○ Designed for huge scale ingestion of data
○ Multiple consumers ..rolling window
○ Ingestion of data at scale
○ Large throughput
○ Large numbers of devices
○ Persistence
○ High data rates
○ Consuming data at different rates
○ The consumer can consume data either in real-time or periodically
○ Move backward and forwards through time
○ Data ingestion, Analytics, Monitoring, App Clicks
● SQS
○ SQS 1 production group, 1 consumption group
○ Decoupling and Asynchronous communication
○ Persistence of messages, no window
○ Worker pools
MapReduce
MapReduce
● Data Analysis Architecture - huge scale, parallel processing
● Two Main Phases - MAP and REDUCE
● Optional - Combine & Partition
● Data is separated into ‘splits’ .. each assigned to a mapper
● Perform Operations at scale - Customisable
● Recombine Data into Results
● HDFS - Hadoop Distributed File System
● Highly Fault-tolerant - replicated between nodes
● Name Node - provides the ‘namespace’ for file system & controls access to HDFS
● Block ..segment of data on HDFS .. generally 64MB
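A toy in-process sketch of the MAP -> shuffle -> REDUCE flow above (a word count, the classic example); real MapReduce distributes the splits across nodes:

```python
from collections import defaultdict

def mapper(split: str):
    for word in split.split():            # MAP: emit (key, value) pairs
        yield word.lower(), 1

splits = ["the cat sat", "the dog sat"]   # data separated into 'splits'

shuffled = defaultdict(list)              # shuffle/partition: group values by key
for split in splits:                      # each split is assigned to a mapper
    for key, value in mapper(split):
        shuffled[key].append(value)

results = {key: sum(values) for key, values in shuffled.items()}  # REDUCE
print(results)   # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```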
EMR Architecture
● Elastic Map Reduce (EMR) is the AWS managed implementation of Apache Hadoop.
EMR Architecture
● AWS Managed Implementation of Apache Hadoop
● .. and Spark, HBase, Presto, Flink, Hive, Pig…
● Can be operated long term .. or use ad-hoc (transient) clusters
● Runs in ONE AZ in a VPC using EC2 for compute
● Auto scales - Spot, Instance Fleet, Reserved, On-Demand
● Big data processing, manipulation, analytics, indexing, transformation and more … (data pipeline *)
Amazon Redshift
● Redshift is a column based, petabyte scale, data warehousing product within AWS
● It's designed for OLTP products within AWS/on-premises to add data to, for long-term processing, aggregation and trending.
Redshift - Basic
● Petabyte-scale Data warehouse
● OLAP (column based), not OLTP (row/transaction)
● Pay as you use … similar structure to RDS
● Direct Query S3 using Redshift Spectrum
● Direct Query other DBs using federated query
● Integrates with AWS tooling such as Quicksight
● SQL-like interface JDBC/ODBC connections
Redshift Architecture
● Server based (not serverless)
● One AZ in a VPC - network cost/performance
● Leader Node - Query input, planning and aggregation
● Compute Node - performing queries of data
● VPC Security, IAM Permissions, KMS at rest Encryption, CW Monitoring
● Redshift Enhanced VPC Routing - VPC Networking
Amazon QuickSight
● QuickSight is a BA/BI visualisation and dashboard tool which is capable of integrating with AWS and external data sources
Amazon QuickSight
● Business Analytics & Intelligence (BA/BI) service
● Visualizations, Ad-hoc Analysis
● Discovery and Integration with AWS Data Sources
● .. and works with external data sources
● In the exam … dashboards or visualisation
Systems Manager
SSM Architecture and Agent Activation
● Systems Manager uses an agent architecture to allow communications between the
systems manager service and managed instances.
SSM Documents
● JSON or YAML documents
● Stored in the SSM Document Store
● Ask for Parameters and include Steps
● Command Document - Run Command, State Manager & Maintenance Windows
● Automation Document - Automation, State Manager & Maintenance Windows
● Package Document - Distributor
○ Contains a .ZIP payload
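A minimal boto3 sketch of executing a Command document against a managed instance; the instance ID and shell command are hypothetical:

```python
import boto3

resp = boto3.client("ssm").send_command(
    InstanceIds=["i-0123456789abcdef0"],     # must be SSM managed instances
    DocumentName="AWS-RunShellScript",       # an AWS-provided Command document
    Parameters={"commands": ["yum -y update"]},
)
print(resp["Command"]["CommandId"])          # use this to poll the command's status
```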
SSM Inventory & SSM Patching
● Patch Manager, a capability of AWS Systems Manager, automates the process of
patching managed instances with both security related and other types of updates.
AWS Config
● AWS Config is a service which records the configuration of resources over time
(configuration items) into configuration histories.
● All the information is stored regionally in an S3 config bucket.
● AWS Config is capable of checking for compliance .. and generating notifications and
events based on compliance.
AWS Config
● Record configuration changes over time on resources
● Auditing of changes, compliance with standards
● Does not prevent changes happening … no protection
● Regional service … supports cross-region and account aggregation
● Changes can generate SNS notifications and near-realtime events via EventBridge &
Lambda
AWS Inspector
● Amazon Inspector is an automated security assessment service that helps improve
the security and compliance of applications deployed on AWS.
● Amazon Inspector automatically assesses applications for exposure, vulnerabilities,
and deviations from best practices
Inspector - Basic
● Scans EC2 instances & the instance OS (and any other networking components
involved)
○ It's not checking AMIs or the applications themselves
● ..Vulnerabilities and deviations against best practice
● Length…15 min, 1 hour, 8/12 hours or 1 day
● Provides a report of findings ordered by priority
● Network Assessment (Agentless)
○ But you can add an agent to provide additional richer information
Inspector - HostPackages
● Packages (..Host Assessments, Agent required)
● Common vulnerabilities and exposures (CVE)
● Center for Internet Security Benchmarks (CIS)
● Security best practices for Amazon Inspector
AWS GuardDuty
● GuardDuty is an automatic threat detection service which reviews data from supported services and attempts to identify any events outside of the 'norm' for a given AWS account or accounts.
Amazon Macie
● Amazon Macie is a data visibility security service that helps classify and protect your sensitive and business-critical content
Macie
● Available in the N. Virginia and Oregon Regions
Trusted Advisor
● AWS Trusted Advisor is an online tool that provides you with real-time guidance to help you provision your resources following AWS best practices.
● Trusted Advisor checks help optimize your AWS infrastructure, increase security and
performance, reduce your overall costs, and monitor service limits.
HA, FT and DR
High-Availability (HA)
● Aims to ensure an agreed level of operational performance, usually uptime, for a
higher than normal period
● Maximise system online time
○ 99.9% (Three 9's) = 8.77 hours p/year downtime
○ 99.999% (Five 9's) = 5.26 minutes p/year downtime
● Fast or automatic recovery of issues
Fault-Tolerance (FT)
● Is the property that enables a system to continue operating properly in the event of
the failure of some (one or more faults within) of its components
● Minimize outages, levels of redundancy and system components which can route
traffic and sessions around any failed components
● Operate through failure
● Expensive - because it’s much more complex
Summary
● High-Availability - Minimise any outages
● Fault-Tolerance - Operate Through Faults
● Disaster Recovery - Used when these don’t work
DR Tips
● Backup
○ EBS Snapshots, RDS automated backups/snapshots
○ Regular pushes to S3, S3 IA, Glacier, lifecycle policies, Cross-Region Replication
○ From on-premises: Snowball or Storage Gateway
● High availability
○ Use Route53 to migrate DNS over from Region to Region
○ RDS Multi-AZ, ElastiCache Multi-AZ, EFS, S3
○ Site to Site VPN as a recovery from Direct Connect
● Replication
○ RDS Replication (cross region), AWS Aurora + global tables
○ Database replication from on-premises to RDS
○ Storage Gateway
● Automation
○ CloudFormation / Elastic Beanstalk to re-create a whole new environment
○ Recover / reboot EC2 instances with CloudWatch alarm actions
○ AWS Lambda functions for customized automations
● Chaos Testing
○ Netflix has a "simian-army" randomly terminating EC2
DR Checklist
● EFS Backup
○ AWS backup with EFS (frequency, when, retain time, lifecycle policy) -
managed
○ EFS to EFS backup
○ Multi-region: EFS -> S3 -> S3 CRR -> EFS
● Route 53 Backup
○ Use ListResourceRecordSets API for exports
○ Write your own script for imports into R53 or other DNS provider
● Elastic Beanstalk Backup
○ Saved configuration using the eb CLI or AWS console
Storage
EBS - HDD-based
● st1
○ Throughput Optimized
○ Fast hard drive
○ Not very agile
○ Cheap
○ Ideal for larger volumes of data
○ Designed for data sequentially accessed
■ Data which needs to be written or read in a fairly sequential way
○ From 125 GB to 16 TB in size
○ Max of 500 IOPS
■ 500 IOPS = 500MB per second
○ Performance of 40 MB/s/TB Base
○ Performance of 250 MB/s/TB Burst
○ Use Cases
■ Big data
■ Data warehouses
■ Log processing
● sc1
○ Cold HDD
○ Even cheaper
■ But comes with significant trade-offs
○ Designed for infrequently accessed data workloads
○ Geared towards maximum economy
○ Just for storing lots of data where performance doesn't matter
○ From 125 GB to 16 TB in size
○ Max of 250 IOPS = 250 MB/s
○ 12 MB/s/TB Base
○ 80 MB/s/TB Burst
○ Use Cases
■ Anything requiring less than a few loads or scans per day
S3 Bucket Policies
● A form of resource policy
● Like identity policies, but attached to a bucket
● Resource perspective permissions
● ALLOW/DENY same or different accounts
● ALLOW/DENY Anonymous principals
● READ - on a bucket: allows grantee to list the objects; on an object: allows grantee to read the object data and its metadata
● READ_ACP - on a bucket: allows grantee to read the bucket ACL; on an object: allows grantee to read the object ACL
● WRITE_ACP - on a bucket: allows grantee to write the ACL for the applicable bucket; on an object: allows grantee to write the ACL for the applicable object
● FULL_CONTROL - on a bucket: allows grantee the READ, WRITE, READ_ACP, and WRITE_ACP permissions on the bucket; on an object: allows grantee the READ, READ_ACP, and WRITE_ACP permissions on the object
Block Public Access
S3 Security - Exam
● Identity: Controlling different resources
● Identity: You have a preference for IAM
● Identity: Same Account
● Bucket: Just controlling S3
● Bucket: Anonymous or Cross-Account
● ACLs: NEVER - unless you must
S3 Static Hosting
● Accessing S3 is generally done via APIs
● Static Website Hosting is a feature of the product which lets you define a HTTP
endpoint, set index and error documents and use S3 like a website.
● This lesson explores the functionality and some common usages.
● S3 Pricing : https://aws.amazon.com/s3/pricing/
S3 Static Hosting
● Normal access is via AWS APIs
● This feature allows access via HTTP
● Index and Error documents are set
● Website Endpoint is created
● Custom Domain via R53 - BucketName Matters
Static Website Hosting
Object Versioning
MFA Delete
● Enabled in versioning configuration
● MFA is required to change bucket versioning state
● MFA is required to delete versions
● Serial number (MFA) + code passed with API CALLS
S3 Object Encryption
● This lesson steps through the various encryption options available within S3 and
finishes by looking at default bucket encryption settings
○ Client-Side Encryption
○ SSE-C
○ SSE-S3
○ SSE-KMS
● As part of the lesson we review how SSE-KMS impacts permissions and how it can
achieve role separation.
S3 Encryption
● Buckets aren’t encrypted.. objects are..
● Encryption AT REST
○ Client-Side Encryption
○ Server-Side Encryption
● Server-Side Encryption with Customer-Provided Keys (SSE-C)
● Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3)
● Server-Side Encryption with Customer Master Keys (CMKs) Stored in AWS Key
Management Service (SSE-KMS)
SSE-C
SSE-S3(AES256)
SSE-KMS
Summary
S3 Standard
● Objects are replicated across at least 3 AZs in a Region
● 99.999999999% durability (11 9's)
● Replication over 3 AZs & Content-MD5 checksums and Cyclic Redundancy Checks (CRCs) are used to detect and fix any data corruption
● HTTP/1.1 200 OK response is provided by S3 API endpoints when objects are stored
● Billed a GB/month fee for data stored
○ A $ per GB charge for transfer OUT (IN is free) and a price per 1,000 requests
○ No specific retrieval fee, no minimum duration, no minimum size
● Milliseconds first byte latency and objects can be made publicly available
● Use Cases
○ Frequently Accessed Data which is important
○ Non Replaceable
S3 Standard-IA
● Objects are replicated across at least 3 AZs in a Region
● 99.999999999% durability (11 9's)
● Replication over 3 AZs & Content-MD5 checksums and Cyclic Redundancy Checks (CRCs) are used to detect and fix any data corruption
● Billed a GB/month storage fee plus a retrieval fee
○ Overall cost increases with frequent data access
○ A $ per GB charge for transfer OUT (IN is free) and a price per 1,000 requests
● Minimum duration charge of 30 days
○ Objects can be stored for less, but the minimum billing always applies
● Minimum capacity of 128KB per object
● Use Case
○ Long-lived data which is important
○ But where access is infrequent
S3 One Zone-IA
● 99.999999999% durability (11 9's)
● Data is replicated in only 1 AZ
○ Assuming that the AZ where the data is stored doesn't fail during the time period
● Billed a GB/month retrieval fee
○ Overall cost increases with frequent data access
● Minimum duration charge of 30 days
○ Objects can be stored for less, but the minimum billing always applies
● Minimum capacity of 128KB per object
● Use Case
○ Long-lived data
○ Non-critical & replaceable
○ Infrequent access
S3 Glacier
● Objects are replicated across at least 3 AZs in a Region
● 99.999999999% durability (11 9's)
● Replication over 3 AZs & Content-MD5 checksums and Cyclic Redundancy Checks (CRCs) are used to detect and fix any data corruption
● Data in Glacier is retrieved to S3 Standard-IA temporarily
○ Expedited - 1-5 minutes
○ Standard - 3-5 hours
○ Bulk - 5-12 hours
○ Faster = More Expensive
○ First byte latency = minutes or hours
● Object cannot be made publicly accessible…any access of data (beyond object
metadata) requires a retrieval process
● 40 KB min size
● 90 day min Duration
● Use Case
○ Archival data where frequent or realtime access isn’t needed
■ Minutes - hours retrieval
S3 Intelligent-Tiering
● Tiers
○ Frequent Access Tier
○ Infrequent Access
○ Archive
○ Deep Archive
● Intelligent-Tiering monitors and automatically moves any objects not accessed for 30 days to a low cost infrequent access tier, and eventually to archive or deep archive tiers
● As objects are accessed, they are moved back to the frequent access tier
○ There are no retrieval fees for accessing objects, only a 30 day minimum duration
● Has a monitoring and automation cost per 1,000 objects
○ The frequent access tier costs the same as S3 Standard, the infrequent tier the same as Standard-IA; Archive and Deep Archive are comparable to their Glacier equivalents
● Use Cases
○ Long-lived data
○ With changing and unknown patterns
S3 Lifecycle Configuration
Presigned URL
● You can create a URL for an object you have no access to
● When using the URL, the permissions match the identity which generated it
● Access denied could mean the generating identity never had access .. or doesn't now
● Don't generate with a role .. the URL stops working when the temporary credentials expire
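A minimal boto3 sketch of generating a presigned GET URL; the bucket and key are hypothetical:

```python
import boto3

url = boto3.client("s3").generate_presigned_url(
    "get_object",
    Params={"Bucket": "catpics", "Key": "secretcats.jpg"},
    ExpiresIn=3600,  # seconds; still capped by the generating credentials' lifetime
)
# Anyone with this URL acts as the identity that generated it, until expiry
```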
S3 CORS
● Simple Requests
● Preflight & Preflighted requests
● Access-Control-Allow-Origin
● Access-Control-Max-Age
● Access-Control-Allow-Methods
● Access-Control-Allow-Headers
● Define Origin (original URL used)..other domains need CORS
S3 Events
● The Amazon S3 notification feature enables you to receive notifications when certain
events happen in your bucket.
● To enable notifications, you must first add a notification configuration that identifies
the events you want Amazon S3 to publish and the destinations where you want
Amazon S3 to send the notifications.
● You store this configuration in the notification subresource that is associated with a
bucket
S3 Event Notifications
● Notification generated when events occur in a bucket
● .. can be delivered to SNS, SQS and Lambda Functions
● Object Created (Put, Post, Copy, CompleteMultiPartUpload)
● Object Deleted (*, Delete, DeleteMarkerCreated)
● Object Restore (Post(Initiated), Completed)
● Replication (OperationMissedThreshold, OperationReplicatedAfterThreshold,
OperationNotTracked, OperationFailedReplication)
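A hedged boto3 sketch of wiring object-created events to a Lambda destination; the bucket name and function ARN are hypothetical:

```python
import boto3

boto3.client("s3").put_bucket_notification_configuration(
    Bucket="catpics",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [{
            "LambdaFunctionArn": "arn:aws:lambda:us-east-1:111122223333:function:resize",
            "Events": ["s3:ObjectCreated:*"],  # Put, Post, Copy, multipart completion
        }]
    },
)
# The function's resource policy must also allow s3.amazonaws.com to invoke it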
S3 Access logs
● Server access logging provides detailed records for the requests that are made to a
bucket. Server access logs are useful for many applications.
○ For example, access log information can be useful in security and access
audits.
● It can also help you learn about your customer base and understand your Amazon
S3 bill.
S3 Access Logs
S3 Object Lock
● You can use S3 Object Lock to store objects using a write-once-read-many (WORM)
model. It can help you prevent objects from being deleted or overwritten for a fixed
amount of time or indefinitely.
● You can use S3 Object Lock to meet regulatory requirements that require WORM
storage, or add an extra layer of protection against object changes and deletion.
S3 Object Lock
● Object Lock is enabled on 'new' buckets (support request needed for existing buckets)
● Write-Once-Read-Many (WORM) - No Delete, No Overwrite
● Requires versioning - individual versions are locked
● 1 - Retention Period
● 2 - Legal Hold
● Both, one or the other, or none
● A bucket can have default Object Lock settings
S3 Object Lock - Retention
● Specify DAYS & YEARS - A Retention Period
● COMPLIANCE - Can’t be adjusted, deleted, overwritten
● .. even by the account root user
● .. Until retention expires
● GOVERNANCE - special permissions can be granted allowing lock settings to be
adjusted
● s3:BypassGovernanceRetention..
● …. x-amz-bypass-governance-retention:true (console default)
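A hedged boto3 sketch of applying a GOVERNANCE retention period to one object version; the bucket, key and date are hypothetical:

```python
import boto3
from datetime import datetime, timezone

boto3.client("s3").put_object_retention(
    Bucket="audit-logs",     # hypothetical; versioning + Object Lock enabled
    Key="2021/trail.json",
    Retention={
        "Mode": "GOVERNANCE",  # COMPLIANCE could not be adjusted at all
        "RetainUntilDate": datetime(2025, 1, 1, tzinfo=timezone.utc),
    },
)
# Overriding GOVERNANCE later needs s3:BypassGovernanceRetention plus
# BypassGovernanceRetention=True on the request (the console sends the header)
```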
S3 Access Points
● Amazon S3 Access Points, a feature of S3, simplifies managing data access at scale
for applications using shared data sets on S3.
● Access points are unique hostnames that customers create to enforce distinct
permissions and network controls for any request made through the access point.
● Creating access points - Amazon Simple Storage Service
S3 Access Points
● Simplify managing access to S3 Buckets/Objects
● Rather than 1 bucket w/ 1 Bucket Policy…
● .. create many access points
● .. each with different policies
● .. each with different network controls
● Each access points has its own endpoint address
● Created via Console or aws s3control create-access-point --name secretcats
--account-id 1234565226 --bucket catpics
EFS - Architecture
● EFS is an implementation of NFSv4
● EFS filesystems can be mounted in Linux
● Shared between many EC2 instances
● Private service, via mount targets inside a VPC
● Can be accessed from on-premises - VPN or DX
● Linux only
● General Purpose and Max I/O performance modes
● General Purpose = default for 99.9% of uses
● Bursting and Provisioned throughput modes
● Standard and Infrequent Access (IA) classes
CloudFront Architecture
CloudFront - Basics
CloudFront Terms
● Origin - The source location of your content
● S3 Origin or Custom Origin
● Distribution - The ‘configuration’ unit of CloudFront
● Edge Location - Local cache of your data
● Regional Edge Cache - Larger version of an edge location
○ Provides another layer of caching
CloudFront Architecture
Lambda@Edge
● Lambda@Edge allows CloudFront to run lambda functions at CloudFront edge locations, to modify traffic between the viewer and the edge location, and between edge locations and origins.
● Lambda@Edge example functions - Amazon CloudFront
Lambda@Edge
● You can run lightweight Lambda functions at edge locations
● Adjust data between the Viewer & Origin
● Currently supports Node.js and Python
● Run in the AWS public Space (NOT VPC)
● Layers are not supported
● Different Limits vs Normal Lambda Functions
Lambda@Edge - Use Cases
● A/B Testing - Viewer Request
● Migration Between S3 Origins - Origin Request
● Different Objects Based on Device - Origin Request
● Content By Country - Origin Request
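A toy Python sketch of the A/B testing use case as a viewer-request handler; the 10% split and /experiment path are hypothetical:

```python
import random

# Hypothetical Lambda@Edge viewer-request handler
def handler(event, context):
    request = event["Records"][0]["cf"]["request"]   # CloudFront event shape
    if random.random() < 0.1:                        # ~10% of viewers
        request["uri"] = "/experiment" + request["uri"]
    return request                                   # forwarded towards the cache/origin
```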
CloudFront Geo-Restriction
● There are two common architectures for restricting access to content via CloudFront.
● The built-in feature set - CloudFront Geo Restriction - allows for whitelist or blacklist restrictions based ONLY on Country Code
● 3rd Party Geolocation requires a compute instance, a private distribution and the
generation of signed URLs or Cookies - but can restrict based on almost anything
(licensing, user login status, user profile fields and much more)
Database
DynamoDB
● NoSQL Public Database-as-a-Service(DBaaS) - Key/Value & Document
● No self-managed servers or infrastructure
● Manual / Automatic provisioned performance IN/OUT or On-Demand
● Highly Resilient …across AZs and optionally global
● Really fast..single-digit milliseconds (SSD based)
● Backups, point-in-time recovery, encryption at rest
● Event-Driven integration … do things when data changes
DynamoDB Tables
DynamoDB Considerations
● NoSQL…preference DynamoDB in the exam
● Relational Data … generally NOT DynamoDB
● Key/Value .. preference DynamoDB in the exam
● Access via console, CLI, API .. ‘NO SQL’
● Billed based on RCU, WCU, Storage and features
Scan
RCU Calculation
● If you need to retrieve 10 ITEMS per second … 2.5K average size
● Calculate RCU per item … ROUND UP (ITEM SIZE / 4KB) (1)
● Multiply by average read ops per second (10)
● Strongly Consistent RCU Required (10)
● 50% of Strongly Consistent = Eventually Consistent RCU Required (5)
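The same calculation in Python - a sketch of the arithmetic only, using the numbers from the list above:

    import math

    item_size_kb = 2.5
    reads_per_second = 10

    rcu_per_item = math.ceil(item_size_kb / 4)    # round up 2.5KB / 4KB -> 1
    strong_rcu = rcu_per_item * reads_per_second  # 10 strongly consistent RCU
    eventual_rcu = strong_rcu / 2                 # 5 eventually consistent RCU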
DynamoDB Indexes
● Query is the most efficient operation in DDB
● Query can only work on 1 PK value at a time..
● .. and optionally a single, or range of SK values
● Indexes are alternative views on table data
● Different SK (LSI) or Different PK and SK (GSI)
● Some or all attributes (projection)
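A minimal boto3 Query sketch - one PK value plus an optional SK condition, and the same call against a GSI via IndexName (table, attribute and index names are hypothetical):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource('dynamodb').Table('orders')   # hypothetical table

    # Base table: exactly 1 PK value, optionally an SK range
    resp = table.query(
        KeyConditionExpression=Key('customer_id').eq('c-42')
                               & Key('order_date').begins_with('2023-'))

    # Same operation against a GSI - only the index name changes
    resp = table.query(
        IndexName='status-index',                        # hypothetical GSI
        KeyConditionExpression=Key('status').eq('SHIPPED'))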
Streams Concepts
● Time ordered list of ITEM CHANGES in a table
● 24-Hour rolling window
● Enabled on a per table basis
● Records INSERTS, UPDATES and DELETES
● Different view types influence what is in the stream
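Enabling a stream is a table-level update. A sketch with a hypothetical table name, picking the richest view type:

    import boto3

    ddb = boto3.client('dynamodb')
    ddb.update_table(
        TableName='orders',   # hypothetical
        StreamSpecification={
            'StreamEnabled': True,
            # KEYS_ONLY, NEW_IMAGE, OLD_IMAGE or NEW_AND_OLD_IMAGES
            'StreamViewType': 'NEW_AND_OLD_IMAGES'})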
DynamoDB Streams
Trigger Concepts
● ITEM changes generate an event
● That event contains the data which changed
● An action is taken using that data
● AWS = Stream + Lambda
● Reporting & Analytics
● Aggregation, Messaging or Notifications
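A skeleton of the Stream + Lambda pattern. The event shape (Records, eventName, dynamodb.NewImage) is the standard DynamoDB Streams event; what you do with the data is left as a placeholder:

    def handler(event, context):
        for record in event['Records']:
            # eventName is INSERT, MODIFY or REMOVE
            if record['eventName'] in ('INSERT', 'MODIFY'):
                # NewImage is present for NEW_IMAGE / NEW_AND_OLD_IMAGES views
                new_image = record['dynamodb'].get('NewImage', {})
                # ... aggregate, report, or send a notification with new_image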
DynamoDB Triggers
DynamoDB Accelerator
● DynamoDB Accelerator (DAX) is an in-memory cache designed specifically for
DynamoDB.
● It should be your default choice for any DynamoDB caching related questions.
DAX Considerations
● Primary NODE (Write) and Replicas (Read)
● Nodes are HA .. Primary failure = election
● In-Memory cache - Scaling … Much faster reads, reduced costs
● Scale UP and Scale OUT (Bigger or More)
● Supports write-through
● DAX Deployed WITHIN a VPC
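A sketch of provisioning a cluster matching the notes above (1 primary plus 2 read replicas, deployed into VPC subnets); the cluster name, node type, role ARN, subnet group and security group are all placeholders:

    import boto3

    dax = boto3.client('dax')
    dax.create_cluster(
        ClusterName='demo-dax',
        NodeType='dax.t3.small',
        ReplicationFactor=3,   # 1 primary (write) + 2 replicas (read)
        IamRoleArn='arn:aws:iam::123456789012:role/DAXServiceRole',
        SubnetGroupName='dax-subnets',              # subnets WITHIN a VPC
        SecurityGroupIds=['sg-0123456789abcdef0'])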
DynamoDB TTL
HA vs FT vs DR
DR / BC - Pilot Light
● A secondary environment is provisioned in advance, running only the absolute
minimum of infrastructure - like a pilot light in a heater
○ … it can be powered on much quicker than backup and restore
● Critical components such as Databases are always syncing ready to be used
DR / BC - Warm Standby
● A smaller sized, but fully functional version of your primary infrastructure is
running 24/7/365…
○ Ready to be increased in size when failover is required
○ .. faster than pilot light ..cheaper than full active
DR Architecture - Storage
● This lesson steps through how the failure of various different parts of the AWS
infrastructure platform will affect Instance Store Volumes, EBS, EFS, S3 and S3
Snapshots
DR Architecture - Storage
DR Architecture - Compute
DR Architecture - Compute
DR Architecture - Database
DR Architecture - Database
DR Architecture - Networking
DR Architecture - Global Networking
LC and LT - Architecture
Auto-Scaling Groups
● An Auto Scaling group contains a collection of Amazon EC2 instances that are
treated as a logical grouping for the purposes of automatic scaling and management.
● An Auto Scaling group also enables you to use Amazon EC2 Auto Scaling features
such as health check replacements and scaling policies.
● Both maintaining the number of instances in an Auto Scaling group and automatic
scaling are the core functionality of the Amazon EC2 Auto Scaling service.
ASG - Basic
● Automatic Scaling and Self-Healing for EC2
● Uses Launch Templates or Launch Configurations
● Has a Minimum, Desired and Maximum Size
● Keeps running instances at the Desired capacity by provisioning or terminating
instances
● Scaling Policies automate based on metrics
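A minimal boto3 sketch wiring those pieces together - a hypothetical launch template plus Min/Desired/Max sizes and ELB health checks:

    import boto3

    asg = boto3.client('autoscaling')
    asg.create_auto_scaling_group(
        AutoScalingGroupName='web-asg',                  # hypothetical
        LaunchTemplate={'LaunchTemplateName': 'web-lt',  # hypothetical
                        'Version': '$Latest'},
        MinSize=2, DesiredCapacity=2, MaxSize=10,
        VPCZoneIdentifier='subnet-aaa,subnet-bbb',       # placeholder subnets
        HealthCheckType='ELB', HealthCheckGracePeriod=300)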
ASG - Architecture
ASG - Policies
● Manual Scaling - Manually Adjust the desired capacity
● Scheduled Scaling - Time based adjustment
● Dynamic Scaling
○ Simple - “CPU above 50% +1”, “CPU Below 50% -1”
○ Stepped Scaling - Bigger +/- based on difference
○ Target Tracking - Desired Aggregate CPU = 40% .. ASG handles it (sketch below)
● Cooldown Periods …
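The Target Tracking example from the list ("Desired Aggregate CPU = 40%") as a boto3 sketch, reusing the hypothetical group name from above:

    import boto3

    asg = boto3.client('autoscaling')
    asg.put_scaling_policy(
        AutoScalingGroupName='web-asg',    # hypothetical
        PolicyName='keep-cpu-at-40',
        PolicyType='TargetTrackingScaling',
        TargetTrackingConfiguration={
            'PredefinedMetricSpecification': {
                'PredefinedMetricType': 'ASGAverageCPUUtilization'},
            # ASG adds/removes instances to hold aggregate CPU near 40%
            'TargetValue': 40.0})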
ASG - Health
ASG - ASG + Load Balancers
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-add-elb-healthcheck.html
https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html
https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-enter-exit-standby.html
ELB - Evolution
● 3 types of Load Balancers (ELB) available within AWS
● Split between v1 (avoid/migrate) and v2 (prefer)
● Classic Load Balancer (CLB) - v1 - Introduced in 2009
○ Not really layer 7, lacking features, 1 SSL per CLB
● Application Load Balancer (ALB) - v2 - HTTP/HTTPS/WebSocket
● Network Load Balancer (NLB) - v2 - TCP, TLS & UDP
● V2 = faster, cheaper, support target groups and rules
ELB Architecture
ELB - Architecture
ELB - Cross-Zone Load Balancer
ALB vs NLB
ALB vs NLB
● NLB
○ Unbroken encryption
○ Static IP for whitelisting
○ The fastest performance (millions of rps)
○ Protocols not HTTP or HTTPS
○ PrivateLink
● Otherwise = ALB
Session State
Session Stickiness
● Session stickiness is a feature of AWS ELBs which allows applications which store
session state internally on EC2 instances to function with load balancers
● Sessions are locked to specific back end instances using a cookie generated by the
load balancer.
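On ALB, stickiness is switched on via target group attributes rather than on the load balancer itself. A minimal boto3 sketch with a placeholder target group ARN, using the LB-generated cookie:

    import boto3

    elbv2 = boto3.client('elbv2')
    elbv2.modify_target_group_attributes(
        TargetGroupArn='arn:aws:elasticloadbalancing:...:targetgroup/web/abc',
        Attributes=[
            {'Key': 'stickiness.enabled', 'Value': 'true'},
            {'Key': 'stickiness.type', 'Value': 'lb_cookie'},  # LB-generated cookie
            {'Key': 'stickiness.lb_cookie.duration_seconds', 'Value': '86400'}])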
Connection Stickiness
GWLB - Basic
Connection Draining
● In order to provide a first-class user experience, you’d like to avoid
breaking open network connections while taking an instance out of
service, updating its software, or replacing it with a fresh instance that
contains updated software. Imagine each broken connection as a
half-drawn web page, an aborted file download, or a failed web service
call, each of which results in an unhappy user or customer.
● You can now avoid this situation by enabling the new Connection
Draining feature for your Classic Load Balancers or Deregistration delay
on ALB, NLB or GWLB
Connection Draining
● What happens when instances are unhealthy … or deregistered?
● Normally all connections are closed & no new connections ..
● Connection draining allows in-flight requests to complete
● CLASSIC LOAD BALANCER ONLY - defined on the CLB
● Timeout: Between 1 and 3600 seconds (default 300)
● InService: Instance deregistration currently in progress
● Auto Scaling waits for all connections to complete or Timeout
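On a Classic Load Balancer this is an LB-level attribute. A boto3 sketch with a placeholder CLB name and the default 300-second timeout:

    import boto3

    elb = boto3.client('elb')   # the Classic ELB API
    elb.modify_load_balancer_attributes(
        LoadBalancerName='legacy-clb',   # placeholder
        LoadBalancerAttributes={
            'ConnectionDraining': {'Enabled': True, 'Timeout': 300}})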
Deregistration Delay
● Supported on ALB, NLB and GWLB (subtle differences)
● Defined on the Target Group - NOT the LB
● Stops sending requests to deregistering targets
● Existing connections can continue
● .. until they complete naturally
● … or the deregistration delay is reached
● Default 300 seconds (0-3600 seconds)
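Same attribute mechanism as stickiness, but set on the Target Group rather than the LB. A sketch shortening the delay from the 300-second default (ARN is a placeholder):

    import boto3

    elbv2 = boto3.client('elbv2')
    elbv2.modify_target_group_attributes(
        TargetGroupArn='arn:aws:elasticloadbalancing:...:targetgroup/web/abc',
        Attributes=[{'Key': 'deregistration_delay.timeout_seconds',
                     'Value': '120'}])   # 0-3600 seconds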
X-Forwarded
● A set of HTTP headers (it only works with HTTP/S, no other protocols) (Layer 7)
● e.g. X-Forwarded-For: client
● The header is added or appended by proxies/LBs
● The client is left most in the list
● X-Forwarded-For: 1.3.3.7, proxy1, proxy2 …
● LB adds the ^^ header, containing Julie’s IP
● Backend web server needs to be aware of this header
● Connections come from the LB, but X-Forwarded-For contains the original client
● Supported … CLB & ALB, NOT SUPPORTED on NLB (because it’s layer 4)
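A tiny sketch of what "the client is left most" means for a backend parsing the header:

    def client_ip(headers):
        # 'X-Forwarded-For: 1.3.3.7, proxy1, proxy2' -> '1.3.3.7'
        xff = headers.get('X-Forwarded-For', '')
        return xff.split(',')[0].strip() or None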
Proxy Protocol
● Proxy Protocol works at Layer 4 ..
● .. additional layer 4 (tcp) header .. Works with a range of protocols (including HTTP
and HTTPS)
● ..Works with CLB (v1) and NLB (v2 - binary encoded)
● End to end encryption - e.g. unbroken HTTPS (TCP listener)
● … with end-to-end encryption use PROXY Protocol; you can’t add an HTTP header
because the traffic isn’t decrypted
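A sketch of parsing the human-readable v1 header (as used by CLB; NLB uses the binary v2 encoding instead), with a made-up example line:

    def parse_proxy_v1(line: bytes):
        # e.g. b'PROXY TCP4 1.3.3.7 10.0.0.5 51234 443\r\n'
        parts = line.decode('ascii').rstrip('\r\n').split(' ')
        if parts[0] != 'PROXY':
            raise ValueError('not a PROXY protocol v1 header')
        _, proto, src_ip, dst_ip, src_port, dst_port = parts
        return {'proto': proto,
                'src': (src_ip, int(src_port)),
                'dst': (dst_ip, int(dst_port))}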
References
https://learn.cantrill.io/
https://portal.tutorialsdojo.com/courses/aws-certified-devops-engineer-professional-practice-exams/
https://www.udemy.com/course/aws-certified-devops-engineer-professional-hands-on/
https://www.whizlabs.com/aws-devops-certification-training