Ammett Williams Google Cloud DevOps


Google Cloud Professional DevOps Engineer Exam

Prep Notes by Ammett
Class SRE implements DevOps

BY AMMETT
Exam prep sheet by Ammett v.0

SRE

SRE
What it is: In general, an SRE team is responsible for the availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their service(s).
What you should know:
1- What it is and how it aligns with DevOps
Key Points:
1- Services that need HTTPS Load balancing

SLO
What it is: This is a target value or range of values for a service level that is measured by an SLI.
What you should know:
1- Actions to take when SLOs are being met or not
Key Points:
1- Options: adjust SLO & SLI, stop deployment until stable

SLI
What it is: This is a carefully defined quantitative measure of some aspect of the level of service that is provided.
What you should know:
1- How to set metrics
2- Freshness
3- Formulas
Key Points:
1- Understand the "math": what is being measured

SLA
What it is: This is an explicit or implicit contract with your users that includes consequences of meeting (or missing) the SLOs they contain.
What you should know:
1- These have penalties
2- Should be less strict than SLOs
Key Points:
1- Compare SLA to SLO targets

Error budget
What it is: Provides a clear, objective metric that determines how unreliable the service is allowed to be within a single quarter.
What you should know:
1- How is this determined
2- What happens when this is exceeded or in danger
Key Points:
1- How these are established and who is responsible

Toil
What it is: Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows.
What you should know:
1- What is toil
2- How to handle toil over time
3- What type of tasks are worth automating
Key Points:
1- What should be the aim of engineering tasks vs toil. Automate this year's toil away

Review documents: SRE Book
Video: SRE playlist
My experience: Various elements of the SRE topics combine to make some interesting questions. Spend some time on each area and learn to appreciate your SLI metrics. Generally, a good area to pick up some points and not too hard if you understand them well.
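The "math" behind SLIs and error budgets can be sketched as follows. This is a minimal illustration, assuming the common request-based form SLI = good events / valid events and error budget = 1 - SLO; the names `SloReport` and `evaluate_slo` and the sample numbers are hypothetical, not taken from the notes.

```python
from dataclasses import dataclass


@dataclass
class SloReport:
    sli: float              # measured fraction of good events
    error_budget: float     # allowed fraction of bad events (1 - SLO)
    budget_consumed: float  # share of the error budget already spent
    slo_met: bool


def evaluate_slo(good_events: int, valid_events: int, slo_target: float) -> SloReport:
    """Compare a request-based SLI against an SLO target (assumed formulas)."""
    if valid_events <= 0:
        raise ValueError("valid_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")

    sli = good_events / valid_events
    error_budget = 1.0 - slo_target
    bad_events = valid_events - good_events
    allowed_bad_events = error_budget * valid_events
    return SloReport(
        sli=sli,
        error_budget=error_budget,
        budget_consumed=bad_events / allowed_bad_events,
        slo_met=sli >= slo_target,
    )


# Hypothetical quarter: 1,000,000 valid requests, 500 of them failed.
report = evaluate_slo(good_events=999_500, valid_events=1_000_000, slo_target=0.999)
print(f"SLI={report.sli:.4%} budget used={report.budget_consumed:.0%} met={report.slo_met}")
```

With a 99.9% target, 500 failures out of a million requests spend about half of the budget, which is the kind of reading that drives the "stop deployment until stable" decision.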

Toil Budgets
What it is: Google aims to ensure that at least 50% of each SRE's time is spent doing engineering projects.
What you should know:
1- Understand the general point of toil budgets

DevOps
What it is: Organizational and cultural movement that aims to increase software delivery velocity, service reliability, and shared ownership among stakeholders.
What you should know:
1- Map SRE principles to DevOps
Key Points:
1- No silos, accidents are normal, gradual change, tooling, measurement is crucial

Alerting
What it is: While there may be many alerts, ultimately your goal is to be notified for a significant event: an event that consumes a large fraction of the error budget.
What you should know:
1- Precision, recall, detection time, reset time
Key Points:
1- Target error rate, increased alert window, incrementing alert duration, burn rate, multiple burn rates, multiwindow multi-burn-rate alerts

Monitoring
What it is: Collecting, processing, aggregating, and displaying real-time quantitative data about a system, such as query counts and types, error counts etc.
What you should know:
1- Analyze long-term trends
2- Comparing over time

Managing Risk
What it is: Item or risk that may cause you to not meet the SLO.
What you should know:
1- Target the risks that will bring you within the error budget
2- Quantify data
Key Points:
1- Controlling and identifying risk helps you manage your SLO

Post-mortems
What it is: A written record of an incident, its impact, the actions taken to mitigate or resolve it, the root cause(s), and the follow-up actions to prevent recurrence.
What you should know:
1- Writing post-mortems based on SRE principles
Key Points:
1- No blame, root causes, action items

Review documents: SRE Workbook
Video: Improving reliability
My experience: These topics make up the core of the SRE practice. Combined they will be featured and you can pick up a few points if you are prepared enough.
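The burn-rate key points can be illustrated with a short sketch. It assumes the usual definition burn rate = observed error rate / (1 - SLO) and a multiwindow rule that pages only when both a long and a short window exceed the threshold. The 14.4 threshold and the 30-day period are illustrative defaults in the spirit of the SRE Workbook, and all function names are hypothetical.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    return error_rate / (1.0 - slo_target)


def budget_consumed(rate: float, window_hours: float, period_hours: float = 30 * 24) -> float:
    """Fraction of the period's error budget spent by burning at `rate` for `window_hours`."""
    return rate * window_hours / period_hours


def should_page(long_window_error_rate: float,
                short_window_error_rate: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Multiwindow check: the long window shows real budget loss, and the
    short window confirms the problem is still happening right now."""
    return (burn_rate(long_window_error_rate, slo_target) >= threshold
            and burn_rate(short_window_error_rate, slo_target) >= threshold)


# 2% errors against a 99.9% SLO burns the budget about 20x too fast.
print(round(burn_rate(0.02, 0.999), 1))
# A burn rate of 14.4 sustained for one hour spends about 2% of a 30-day budget.
print(round(budget_consumed(14.4, window_hours=1.0), 3))
print(should_page(0.02, 0.02, 0.999))    # still burning: page
print(should_page(0.02, 0.0005, 0.999))  # short window recovered: no page
```

Requiring the short window as well improves reset time, because the alert stops firing soon after the error rate drops.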

Handling Incidents
What it is: Things break, so it is important to understand how and what to do when that happens.
What you should know:
1- What options do you employ
Key Points:
1- Roll back, connection draining, stop testing, A/B, scaling

API lifecycle
What it is: What is the process for your new deployment and the life cycle of your API.
What you should know:
1- Stages to replace an API
Key Points:
1- The order of the process
2- Is it chicken or egg

IRM Dashboard
What it is: Incident Response and Management (IRM) is a product within Stackdriver for managing and responding to incidents.
What you should know:
1- How to set metrics
2- Freshness
3- Formulas

Response Structure
What it is: Communication and structure are a key part of handling incidents.
What you should know:
1- Who handles what role
2- Delegation
3- Communication
Key Points:
1- Operations lead, communication lead, incident commander

Tracing
What it is: Going deeper toward the source of the problem in the system.
What you should know:
1- Have a digital representation of where time is spent on your queries
Key Points:
1- This can be done in Stackdriver

Communication
What it is: Keeping stakeholders in the loop. Communication is "KEY": the better it is, the better for your incident management.
What you should know:
1- Who handles communication in what circumstances
Key Points:
1- Communication loops

Review documents: Incident response; API lifecycle; IRM concept
Video: SRE playlist
My experience: Handling incidents is important. There are steps, roles, and activities involved. Do not forget communication also.

Figures: Incident response team; Incident response workflow

Stackdriver

Monitoring
What it is: Stackdriver Monitoring discovers and monitors your cloud resources automatically, whether you are running on Google Cloud Platform or AWS.
Key points:
1- Metrics
2- Custom metrics
3- Alerting policies
4- Monitoring
What you should know:
1- Everything in depth about Stackdriver
Review documents: Monitoring docs
Video: Intro to stackdriver; Stackdriver monitoring
My experience: OK, if you don't know Stackdriver deeply, don't do the exam. This means you should spend a lot of time testing and experimenting with all the features.

Sharing charts
What it is: If you want, you can share a chart with others by sending them a parameterized URL.
Key points:
1- Sharing various charts is possible
2- Understand how to customise the parameters
3- Know the tag used
What you should know:
1- iframe
2- Query parameters
3- Keeping the view updated
4- Static screenshot
Review documents: Sharing charts
My experience: This is something you may bypass but it can pick you up a point.

Workspaces
What it is: A Workspace is a tool for monitoring resources contained in one or more Google Cloud projects.
Key points:
1- What it is
2- How to design monitoring
3- Every Workspace has a host project
4- Add existing account to workspace
What you should know:
1- Required roles: project owner, editor, Monitoring Admin, Stackdriver account editor
Review documents: Stackdriver workspaces; Managing workspaces
My experience: This was a shocker but not anymore, right?

Python
What it is: You can write logs to Logging from Python applications by using the Python logging handler included with the Logging client library.
Key points:
1- How to use with App Engine, GKE, Compute Engine, locally
2- IAM permission required
What you should know:
1- Logging library for Python
Review documents: Stackdriver logging for Python; Google Cloud Client Libraries for Python
My experience: This was a shocker but it's DevOps, so how about that. What about the other languages?
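To get a feel for how a logging handler plugs into Python's standard `logging` module, here is a minimal sketch that uses only the standard library, so it runs without credentials. It is not the Logging client library's handler: it swaps in a hypothetical `JsonFormatter` that writes one JSON object per line, and the `severity` and `message` field names are an assumed structured-logging convention that should be checked against the Logging docs.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line (hypothetical helper)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "severity": record.levelname,    # assumed field name
            "message": record.getMessage(),  # assumed field name
            "logger": record.name,
        }
        return json.dumps(payload)


def build_logger(name: str, stream) -> logging.Logger:
    """Attach a stream handler with the JSON formatter to a named logger."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep output out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # stand-in for sys.stdout so the output can be inspected
log = build_logger("checkout", buffer)
log.warning("payment retry %d of %d", 2, 3)

entry = json.loads(buffer.getvalue())
print(entry["severity"], "-", entry["message"])
```

In a real project the handler shipped with the Logging client library would be attached in place of the `StreamHandler`, and the caller needs an IAM role that allows writing log entries.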
Stackdriver agent / FluentD
What it is: The Logging agent is an application based on fluentd that runs on your virtual machine (VM) instances.
Key points:
1- Stream logs from VMs and third-party software packages to Stackdriver Logging
2- Install the agent
What you should know:
1- Based on fluentd
2- Get syslog files
3- Get third-party logs
Review documents / Video: About the agent; Configuring the agent; Syslog
My experience: OK, if you don't know Stackdriver, don't do the exam. Please be warned in case you missed it earlier.

Protect sensitive data
What it is: The Fluentd filter plugin mutates/transforms incoming event streams in a versatile manner.
Key points:
1- Remove sensitive or unwanted data
2- Add new fields
3- Update fields in log entries
4- Delete fields in log entries
What you should know:
1- filter_record_transformer
Review documents: Logging agent: modifying records; Fluentd
My experience: Protecting data is important. This can pick you up a point or two.
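The four record operations listed above (remove, add, update, delete) can be pictured with a small Python sketch. This is not fluentd configuration and does not use the plugin itself; `transform_record`, `CARD_PATTERN`, and the sample record are hypothetical stand-ins for what a record-transforming filter does to each log entry before it leaves the VM.

```python
import re
from typing import Any, Dict, Iterable, Optional

# Hypothetical pattern: any run of 13 to 16 digits is treated as a card number.
CARD_PATTERN = re.compile(r"\b\d{13,16}\b")


def transform_record(record: Dict[str, Any],
                     remove_keys: Iterable[str] = ("password",),
                     extra_fields: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a cleaned copy of a log record; the input is left untouched."""
    blocked = set(remove_keys)
    # Delete fields that must never be shipped.
    cleaned = {key: value for key, value in record.items() if key not in blocked}
    # Update a field: mask sensitive values embedded in the message text.
    if isinstance(cleaned.get("message"), str):
        cleaned["message"] = CARD_PATTERN.sub("[REDACTED]", cleaned["message"])
    # Add new fields, for example an environment label.
    cleaned.update(extra_fields or {})
    return cleaned


record = {"message": "charge card 4111111111111111 ok", "password": "hunter2", "user": "ana"}
result = transform_record(record, extra_fields={"env": "prod"})
print(result)
```

Filtering at the agent keeps the sensitive values from ever reaching the logging backend, which is the main reason to do it there and not after export.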

Figures: How the logging agent works; Routing of log entries

Stackdriver – Trace – Debugger – Profiler

Stackdriver logging
What it is: Stackdriver Logging allows you to store, search, analyze, monitor, and alert on log data and events from Google Cloud Platform and AWS.
Key points:
1- Routing of logged entries
2- Log sinks
3- Storing logs
4- Third-party SIEM
What you should know:
1- As much as possible
Review documents: Log router; Centralized logging
Video: Stackdriver doctor
My experience: OK, if you don't know Stackdriver deeply, don't do the exam. Repeating in case you missed it earlier.
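Routing through log sinks can be pictured with a toy model: each sink pairs a filter with a destination, and an entry is copied to every sink whose filter matches. The sketch below is a simplified illustration under that assumption, not the Logging API; the `Sink` class, the sink names, and the destinations are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

LogEntry = Dict[str, str]


@dataclass
class Sink:
    name: str
    destination: str                     # e.g. a dataset, bucket, or topic
    matches: Callable[[LogEntry], bool]  # stands in for the sink's filter


def route(entry: LogEntry, sinks: List[Sink]) -> List[str]:
    """Return every destination whose sink filter matches the entry."""
    return [sink.destination for sink in sinks if sink.matches(entry)]


sinks = [
    Sink("errors-to-bq", "bigquery", lambda e: e["severity"] == "ERROR"),
    Sink("audit-to-gcs", "cloud-storage", lambda e: e["log_name"].endswith("audit")),
    Sink("all-to-siem", "pubsub", lambda e: True),  # feed for a third-party SIEM
]

app_error = {"severity": "ERROR", "log_name": "app"}
audit_info = {"severity": "INFO", "log_name": "cloudaudit/audit"}
print(route(app_error, sinks))
print(route(audit_info, sinks))
```

One entry can land in several destinations at once, which is why the choice of sink filters affects both storage cost and what the SIEM sees.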

Trace
What it is: Trace is a distributed tracing system that collects latency data from your applications and displays it in the Google Cloud Platform Console.
Key points:
1- What type of problems you would use Trace for
What you should know:
1- Latency
2- Permission errors
3- How & when to create custom roles
4- Service account permissions
Review documents: Trace
Video: Stackdriver Trace
My experience: Think latency and finding its cause.
Debugger
What it is: Stackdriver Debugger is a feature of Google Cloud Platform that lets you inspect the state of a running application in real time, without stopping or slowing it down.
Key points:
1- View app state without adding logging
2- Use with test, development and production
What you should know:
1- Less than 10ms of latency added
Review documents: Debugger
My experience: Get info without affecting the app.
Profiler
What it is: Profiler continuously analyzes the performance of CPU or memory-intensive functions executed across an application.
Key points:
1- Capture characteristics of the code as it runs
2- Finds bugs
3- It does not require pervasive changes
What you should know:
1- Shows what is happening within each service
2- Takes random sample profiles
Review documents: Profiler
Video: Stackdriver profiles
My experience: Know what your code is doing in real time, get analytics with Profiler.

Alerting
What it is: You must configure most notification channels before you use them in alerting policies.
Key points:
1- Different channels and how to use them for alerts
What you should know:
1- Email, mobile apps, PagerDuty, SMS, Slack, Webhooks
Review documents: Notification Options
My experience: Alerts can be sent using multiple channels. Understand the integrations.

Cloud IAM
What it is: To work with the logging data in a Google Cloud project, you must be a member and have an IAM role that grants you permission to use Logging.
Key points:
1- What are the various roles and the permissions they have to do various Cloud functions
What you should know:
1- Permission level necessary to export logs
2- (logging.configWriter, logging.admin, owner)
Review documents: Role etc; Export logs
My experience: Nice easy point, right? Well, IAM permissions are necessary to run most services. You may as well get familiar with them.

Data

BigQuery
What it is: BigQuery is a serverless, highly scalable, and cost-effective cloud enterprise data warehouse that enables super-fast SQL queries using the processing power of Google's infrastructure.
What you should know:
1- How it works with Stackdriver etc.

Data Studio
What it is: Google Data Studio allows you to create branded reports with data visualizations to share with your clients.
What you should know:
1- Integrating Google services with Data Studio

Cloud Storage
What it is: Used for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.
What you should know:
1- What it does, classes
2- Integrations
3- Uses for DevOps

Pub/Sub
What it is: Cloud Pub/Sub is a publish/subscribe (Pub/Sub) service: a messaging service where the senders of messages are decoupled from the receivers of messages.
What you should know:
1- Multiple uses and integration of Pub/Sub

Grafana
What it is: Grafana is an open source metric analytics and visualization suite for visualizing time series data that supports various types of data sources.
What you should know:
1- How does this work with Stackdriver

Datadog
What it is: Datadog pulls metrics from Google Stackdriver Logging to: 1- Visualize the performance of your Stackdriver logs; 2- Correlate the performance of your logs with your applications.
What you should know:
1- What it is used for
2- How it integrates with Stackdriver

Key Points (in the order of the topics above):
- Sinks, viewing logs, exporting logs, ingesting logs
- What Google services it integrates with
- Be aware of the services that can use it as a trigger
- Integration
- Integration

Review documents: Pub/Sub; Grafana; BigQuery; Cloud storage
Video: BigQuery
My experience: Viewing data using different tools (integrations). Storage of logs, exporting logs, pub/sub, triggers and more.

Networking / Compute

Compute Engine
What it is: Compute Engine delivers configurable virtual machines running in Google's data centres with access to high-performance networking infrastructure and block storage.
What you should know:
1- Monitor these with Stackdriver
2- Monitor application
3- Health checks

Managed Instance groups
What it is: A managed instance group (MIG) contains identical instances that are based on an instance template.
What you should know:
1- What it does (autohealing, load balancing, autoscaling and auto-updating)

Flow logs
What it is: VPC Flow Logs record a sample of network flows sent from and received by VM instances, including instances used as GKE nodes.
What you should know:
1- Log entry sampling (default 0.50 (50%))
2- TCP/UDP traffic

Network Service Tier
What it is: Allows customers to optimize their cloud network for performance or price optimisation.
What you should know:
1- When to use
2- What is the difference and trade-off

Preemptible VMs
What it is: Best for short-lived compute instances suitable for batch jobs and fault-tolerant workloads.
What you should know:
1- How and when to use these to save cost or help processing

Committed use
What it is: Committed use discounts are ideal for workloads with predictable resource needs.
What you should know:
1- Predictable work needs
2- Term of 1 or 3 years
3- Billed monthly whether used or not

Key Points (in the order of the topics above):
- Keep scenarios in mind where you would use these
- What you can monitor with it; used for seeing what's happening in the network
- Managing cost (know the trade-offs also)
- Requirements and recommendation for use

Review documents: Committed use; Managed instances; Preemptible VM; Network Service Tier; Flow logs
Video: Highly available deployments
My experience: Networking once again is a key point in any cloud infrastructure. Get familiar with these and pick up a point or 3.

Security

IAM
What it is: IAM lets you manage who has access to what in your GCP environment.
Key points:
1- All the services need some level of permissions to run. You might as well get familiar with the IAM roles required.
What you should know:
1- Permissions for logging, exporting and other aspects. I would recommend you get familiar with the IAM roles generally for the various services.
Review documents / Video: IAM roles; Best practices for identity
My experience: IAM is now like a staple on GCP exams, just like Kubernetes. OK, that's all you need to know.

KMS
What it is: Cloud KMS is a cloud-hosted key management service that lets you manage encryption for your cloud services the same way you do on-premises.
Key points:
1- You can generate, use, rotate, and destroy cryptographic keys
What you should know:
1- Using Cloud KMS with other GCP services (especially developer based)
Review documents: Using Cloud KMS with other products; CMEK
Video: Securing Kubernetes secrets
My experience: KMS helps you in many ways. Figure out which ways you need to be helped.

Secret Manager
What it is: Secret Manager provides a secure and convenient tool for storing API keys, passwords, certificates, and other sensitive data.
Key points:
1- Encrypt, store and audit (infrastructure and app secrets)
2- You can address individual versions of a secret
3- Rotation
What you should know:
1- Applications often require access to small pieces of sensitive data at build or run time. These pieces of data are often referred to as secrets.
Review documents: Secret Manager
My experience: Secrets will pop up somewhere, so now it's no longer a secret.
Cloud SCC
What it is: Security Command Center gives enterprises consolidated visibility into their Google Cloud assets across their organization.
Key points:
1- What it does
What you should know:
1- What may be relevant for your pipeline
Review documents: Security Command Center
Video: Cloud security cc
My experience: Get familiar with this.

Binary Authorisation
What it is: Binary Authorization is a service on Google Cloud Platform (GCP) that provides software supply-chain security for applications that run in the Cloud.
Key points:
1- Allows or blocks deployment of images to GKE based on policy
2- Attestation (very important)
3- Enforcement functionality
4- Authorization
What you should know:
1- Know the flow of Binary Authorization
Review documents: Secure software chains; Codelab: Binary Authorization
Video: Binary Authorisation Demo
My experience: This is a bit confusing, so study the flow and the stages (important to figure out the answers for this type of question).
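The policy, attestation, and enforcement flow can be modelled in a few lines. This is a toy sketch of the idea only, assuming a policy that lists required attestors and an admission check that passes when every one of them has attested the image digest. It does not call the Binary Authorization API, and `Policy`, `AttestationStore`, `admit`, and the attestor names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Policy:
    required_attestors: Set[str]  # e.g. {"build", "qa"}


@dataclass
class AttestationStore:
    _by_digest: Dict[str, Set[str]] = field(default_factory=dict)

    def attest(self, image_digest: str, attestor: str) -> None:
        """Record that an attestor has signed off on this exact image digest."""
        self._by_digest.setdefault(image_digest, set()).add(attestor)

    def attestors_for(self, image_digest: str) -> Set[str]:
        return self._by_digest.get(image_digest, set())


def admit(image_digest: str, policy: Policy, store: AttestationStore) -> bool:
    """Enforcement step: allow the deploy only if all required attestors signed."""
    return policy.required_attestors <= store.attestors_for(image_digest)


policy = Policy(required_attestors={"build", "qa"})
store = AttestationStore()

store.attest("sha256:aaa", "build")  # stage 1: the build pipeline attests
store.attest("sha256:aaa", "qa")     # stage 2: QA attests the same digest
store.attest("sha256:bbb", "build")  # this image never passed QA

print(admit("sha256:aaa", policy, store))  # fully attested
print(admit("sha256:bbb", policy, store))  # missing the qa attestation
```

Attestations are tied to a digest, not a tag, so rebuilding an image produces a new digest that has to go through the stages again.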

Images
What it is: Container Analysis provides vulnerability information and other types of metadata for the container images in Container Registry.
Key points:
1- Allows vulnerability scanning and metadata storage for software artifacts
What you should know:
1- Performs scans on images in Container Registry and monitors vulnerability info to keep it up to date.
- Incremental scans
- Continuous analysis
Review documents: Get image vulnerabilities
My experience: Think secured images and secure deployments.
Security scanner
What it is: Automatically scan your App Engine, Compute Engine, and GKE apps for common vulnerabilities.
Key points:
1- Detect 4 common OWASP top 10 vulnerabilities (XSS, Flash injection, mixed content, outdated/insecure libraries)
My experience: This may pop up but then again who knows??

Figures: Binary Authorization/vulnerability; Binary Authorization/Cloud Build

To be added by end of January 2020 (yes, the exam is that heavy). Last update: 13-01-2020.
- DevOps and Tools
- Google DevOps Tools
- Performance
