
Choosing between Cloud Tasks and Pub/Sub

Both Cloud Tasks (/tasks/docs/dual-overview) and Pub/Sub (/pubsub/docs/overview) may be used to implement message passing
and asynchronous integration, but while they function in similar ways, they are not identical. This page helps you choose the right
product for your use case.

Key differences

The core difference between Pub/Sub and Cloud Tasks is the notion of implicit vs explicit invocation.

Pub/Sub aims to decouple publishers of events and subscribers to those events. Publishers do not need to know anything about
their subscribers. As a result, Pub/Sub gives publishers no control over the delivery of the messages save for the guarantee of
delivery. In this way, Pub/Sub supports implicit invocation: a publisher implicitly causes the subscribers to execute by publishing
an event.

By contrast, Cloud Tasks is aimed at explicit invocation where the publisher retains full control of execution. In particular, a
publisher specifies an endpoint where each message is to be delivered.

In addition, Cloud Tasks provides tools for queue and task management unavailable to Pub/Sub publishers, including:

Scheduling specific delivery times

Delivery rate controls

Configurable retries

Access and management of individual tasks in a queue

Task/message creation deduplication
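To make explicit invocation concrete, here is a sketch of the JSON body a publisher might send in a Cloud Tasks `tasks.create` REST call. The handler URL, base64-encoded payload, and timestamp are illustrative placeholders; `scheduleTime` shows the scheduled-delivery control from the list above:

```json
{
  "task": {
    "httpRequest": {
      "url": "https://example.com/task-handler",
      "httpMethod": "POST",
      "body": "eyJvcmRlcklkIjogMTIzNH0="
    },
    "scheduleTime": "2021-06-01T12:00:00Z"
  }
}
```

Note that the publisher names the exact delivery endpoint and time. In Pub/Sub, by contrast, the subscription, not the publisher, determines where and when messages are delivered.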


Detailed feature comparison

| Feature | Cloud Tasks | Cloud Pub/Sub |
| --- | --- | --- |
| Push via webhooks | Yes | Yes |
| At-least-once delivery guarantee | Yes | Yes |
| Task creation deduplication | Yes | No |
| Configurable retries | Yes | No |
| Scheduled delivery | Yes | No |
| Explicit rate controls | Yes | No (subscriber clients can implement flow control (/pubsub/docs/pull#config)) |
| Pull via API | No | Yes |
| Batch insert | No | Yes |
| Multiple handlers/subscribers per message | No | Yes |
| Task/message retention | 30 days | Up to 7 days |
| Max size of task/message | 1 MB | 10 MB |
| Max delivery rate | 500 qps per queue | No upper limit |
| Geographic availability | Regional | Global |
| Maximum push handler/subscriber processing duration | 30 minutes (HTTP); 10 minutes (App Engine Standard automatic scaling); 24 hours (App Engine Standard manual or basic scaling); 60 minutes (App Engine Flexible) | 10 minutes for push operations |
| Number of queues/subscriptions per project | 1,000 per project; more available via quota increase request | 10,000 per project |

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License
 (https://creativecommons.org/licenses/by/4.0/), and code samples are licensed under the Apache 2.0 License
 (https://www.apache.org/licenses/LICENSE-2.0). For details, see the Google Developers Site Policies (https://developers.google.com/site-policies).
Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2021-06-01 UTC.


Cloud Tasks versus Cloud Scheduler
Both Cloud Tasks (/tasks/docs/dual-overview) and Cloud Scheduler (/scheduler/docs) can be used to initiate actions outside of the
immediate context. But they have significant differences in functionality and usage. This page helps you understand the
differences between them.

Key differences

In general, there are four main differences between Cloud Scheduler and Cloud Tasks.

| Feature | Cloud Scheduler | Cloud Tasks |
| --- | --- | --- |
| Triggering | Triggers actions at regular fixed intervals. You set up the interval when you create the cron job (/scheduler/docs/creating), and the rate does not change for the life of the job. | Triggers actions based on how the individual task object is configured. If the `scheduleTime` field is set, the action is triggered at that time. If the field is not set, the queue processes its tasks in a non-fixed order. |
| Setting rates | Initiates actions on a fixed periodic schedule. Once a minute is the most fine-grained interval supported. | Initiates actions based on the amount of traffic coming through the queue. You can set a maximum rate when you create the queue, for throttling or traffic-smoothing purposes, up to 500 dispatches per second. |
| Naming | Except for the time of execution, each run of a cron job is exactly the same as every other run of that cron job. | Each task has a unique name, and can be identified and managed individually in the queue. |
| Handling failure | If the execution of a cron job fails, the failure is logged. If retry behavior is not specifically configured, the job is not rerun until the next scheduled interval. | If the execution of a task fails, the task is retried until it succeeds. You can limit retries based on the number of attempts and/or the age of the task, and you can control the interval between attempts in the configuration of the queue. |
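As an illustrative sketch of configuring retry behavior on a queue, the limits and backoff intervals can be set with the `gcloud tasks` CLI. The queue name and values here are placeholders:

```shell
# Retry at most 5 times over 1 hour, backing off from 10s up to 300s.
gcloud tasks queues update my-queue \
    --max-attempts=5 \
    --max-retry-duration=1h \
    --min-backoff=10s \
    --max-backoff=300s
```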


Common pitfalls and limitations
You might encounter the following issues and limitations when using Cloud Tasks:

Execution order

With the exception of tasks scheduled to run in the future, task queues are completely agnostic about execution order. There are
no guarantees or best-effort attempts made to execute tasks in any particular order. Specifically: there are no guarantees that old
tasks will execute unless a queue is completely emptied. A number of common cases exist where newer tasks are executed
sooner than older tasks, and the patterns surrounding this can change without notice.

Duplicate execution

Cloud Tasks aims for a strict "execute exactly once" semantic. However, in situations where a design trade-off must be made
between guaranteed execution and duplicate execution, the service errs on the side of guaranteed execution. As such, a non-zero
number of duplicate executions do occur. Developers should take steps to ensure that duplicate execution is not a catastrophic
event. In production, more than 99.999% of tasks are executed only once.
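One common way to make duplicate execution non-catastrophic is an idempotent handler. The sketch below uses an in-memory set for brevity; a real handler would record processed IDs in a transactional store keyed by the task name or a caller-supplied deduplication key:

```python
# Sketch of a duplicate-tolerant (idempotent) task handler.
# The storage here is an in-memory set; production code would use a
# transactional store so the check-and-mark step is atomic.
processed_ids = set()

def handle_task(task_id: str, apply_effect) -> str:
    """Apply the task's side effect at most once per task_id."""
    if task_id in processed_ids:
        return "duplicate-ignored"  # safe no-op on redelivery
    apply_effect()
    processed_ids.add(task_id)
    return "processed"
```

With this shape, the rare duplicate delivery becomes a cheap no-op rather than, say, a double-charged order.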

Resource limitations

The most common source of backlogs in immediate processing queues is exhausting resources on the target instances. If a
user is attempting to execute 100 tasks per second on frontend instances that can only process 10 requests per second, a
backlog will build. This typically manifests in one of two ways, either of which can generally be resolved by increasing the
number of instances processing requests.

Backoff errors and enforced rates

Servers that are being overloaded can start to return backoff errors in the form of HTTP response code 503. Cloud Tasks will
react to these errors by slowing down execution until errors stop. This can be observed by looking at the "enforced rate" field in
the Cloud Console.

Go to the Cloud Tasks page (https://console.cloud.google.com/cloudtasks)

Latency spikes and max concurrent

Overloaded servers can also respond with large increases in latency. In this situation, requests remain open for longer. Because
queues run with a maximum concurrent number of tasks, this can result in queues being unable to execute tasks at the expected
rate. Increasing the max_concurrent_tasks (/tasks/docs/reference/rpc/google.cloud.tasks.v2#ratelimits) for the affected queues can
help in situations where the value has been set too low, introducing an arti cial rate limit. But increasing max_concurrent_tasks is
unlikely to relieve any underlying resource pressure.
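If the limit does turn out to be artificially low, raising it is a single queue update via the `gcloud tasks` CLI. The queue name and value below are placeholders:

```shell
# Allow up to 200 tasks in flight at once for this queue.
gcloud tasks queues update my-queue --max-concurrent-dispatches=200
```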



Managing scaling risks
Google's infrastructure is designed to operate elastically at high scale: most layers can adapt to increased traffic demands up to
massive scale. A core design pattern that makes this possible is adaptive layers -- infrastructure components that dynamically
re-allocate load based on traffic patterns. This adaptation, however, takes time. Because Cloud Tasks enables very high volumes
of traffic to be dispatched, it exposes production risks in situations where traffic can climb faster than the infrastructure can
adapt.

Overview

This document provides guidelines on best practices for maintaining high Cloud Tasks performance in high-traffic queues. A
high-TPS queue is a queue that has 500 or more tasks created or dispatched per second (TPS). A high-TPS queue group is a
contiguous set of queues, for example [queue0001, queue0002, …, queue0099], that have at least 2,000 tasks created or
dispatched per second in total. The historical TPS of a queue or group of queues is viewable using the Stackdriver metrics
 (/monitoring/api/metrics_gcp#gcp-cloudtasks): api/request_count for "CreateTask" operations and queue/task_attempt_count for
task attempts. High-traffic queues and queue groups are prone to two different broad classes of failures:

Queue overload (#queue)

Target overload (#target)

Queue overload occurs when task creation and dispatch to an individual queue or queue group increases faster than the queue
infrastructure is able to adapt. Similarly, target overload occurs when the rate at which tasks are being dispatched causes traffic
spikes in the downstream target infrastructure. In both cases, we recommend following a 500/50/5 pattern: when scaling
beyond 500 TPS, increase traffic by no more than 50% every 5 minutes. This document reviews different scenarios that can
introduce scaling risks and gives examples of how to apply this pattern.

Note: We do not recommend a queue with more than 1,000 TPS (creates plus dispatches), as it will produce higher delivery latency than normal.

Queue overload

Queues or queue groups can become overloaded any time traffic increases suddenly. As a result, these queues can experience:

Increased task creation latency

Increased task creation error rate

Reduced dispatch rate

To defend against this, we recommend establishing controls in any situation where the create or dispatch rate of a queue or
queue group can spike suddenly. We recommend a maximum of 500 operations per second to a cold queue or queue group, then
increasing traffic by 50% every 5 minutes. In theory, you can grow to 740K operations per second after 90 minutes using this
ramp-up schedule. There are a number of circumstances in which this can occur.

For example:

Launching new features that make heavy use of Cloud Tasks

Moving tra c between queues

Rebalancing tra c across more or fewer queues

Running batch jobs that inject large numbers of tasks

In these cases and others, follow the 500/50/5 pattern.
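As arithmetic, the 500/50/5 pattern compounds quickly. The sketch below computes the permitted rate after a given warm-up time; the ~740K figure quoted above falls out at the 90-minute mark:

```python
def max_ramped_tps(minutes: int, start_tps: float = 500.0,
                   growth: float = 1.5, step_minutes: int = 5) -> float:
    """Permitted TPS under the 500/50/5 pattern: start at 500 TPS,
    then grow 50% every 5 minutes."""
    steps = minutes // step_minutes
    return start_tps * growth ** steps
```

For example, `max_ramped_tps(90)` is roughly 739,000 operations per second, which is where the "740K after 90 minutes" figure comes from.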


Using App Engine traffic splitting

If the tasks are created by an App Engine app, you can leverage App Engine traffic splitting (Standard
 (/appengine/docs/standard/java/splitting-traffic)/Flex (/appengine/docs/flexible/java/splitting-traffic)) to smooth traffic increases. By
splitting traffic between versions (Standard (/appengine/docs/admin-api/deploying-apps)/Flex
 (/appengine/docs/flexible/java/testing-and-deploying-your-app)), requests that need to be rate-managed can be spun up over time to
protect queue health. As an example, consider the case of spinning up traffic to a newly expanded queue group: Let [queue0000,
queue0199] be a sequence of high-TPS queues that receive 100,000 TPS creations in total at peak.

Let [queue0200, queue0399] be a sequence of new queues. After all traffic has been shifted, the number of queues in the
sequence has doubled and the new queue range receives 50% of the sequence's total traffic.

When deploying the version that increases the number of queues, gradually ramp up traffic to the new version, and thus the new
queues, using traffic splitting:

Start shifting 1% of the traffic to the new release. For example, 50% of 1% of 100,000 TPS yields 500 TPS to the set of new
queues.

Every 5 minutes, increase by 50% the traffic that is sent to the new release, as detailed in the following table:

| Minutes since start of the deployment | % of total traffic shifted to the new version | % of total traffic to the new queues | % of total traffic to the old queues |
| --- | --- | --- | --- |
| 0 | 1.0 | 0.5 | 99.5 |
| 5 | 1.5 | 0.75 | 99.25 |
| 10 | 2.3 | 1.15 | 98.85 |
| 15 | 3.4 | 1.7 | 98.3 |
| 20 | 5.1 | 2.55 | 97.45 |
| 25 | 7.6 | 3.8 | 96.2 |
| 30 | 11.4 | 5.7 | 94.3 |
| 35 | 17.1 | 8.55 | 91.45 |
| 40 | 25.6 | 12.8 | 87.2 |
| 45 | 38.4 | 19.2 | 80.8 |
| 50 | 57.7 | 28.85 | 71.15 |
| 55 | 86.5 | 43.25 | 56.75 |
| 60 | 100 | 50 | 50 |
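The percentages in the table are just the 1% starting split compounded by 50% per 5-minute step and capped at 100%, with half of the shifted traffic landing on the new queues. A short sketch reproduces the rows (the table rounds its values):

```python
def split_schedule(steps: int = 13, start_pct: float = 1.0,
                   growth: float = 1.5):
    """Rows of (minutes, % shifted to new version, % to new queues,
    % to old queues): the shifted share grows 50% every 5 minutes,
    capped at 100%."""
    rows, pct = [], start_pct
    for i in range(steps):
        shifted = min(100.0, pct)
        rows.append((i * 5, shifted, shifted / 2, 100 - shifted / 2))
        pct *= growth
    return rows
```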

Release-driven traffic spikes

When launching a release that significantly increases traffic to a queue or queue group, gradual rollout is, again, an important
mechanism for smoothing the increases. Gradually roll out your instances such that the initial launch does not exceed 500 total
operations per second to the new queues, increasing by no more than 50% every 5 minutes.

New High-TPS queues or queue groups

Newly created queues are especially vulnerable. Groups of queues, for example [queue0000, queue0001, …, queue0199], are just
as sensitive as single queues during the initial rollout stages. For these queues, gradual rollout is an important strategy. Launch
new or updated services, which create high-TPS queues or queue groups, in stages such that initial load is below 500 TPS and
increases of 50% or less are staged 5 minutes or more apart.

Newly expanded queue groups

When increasing the total capacity of a queue group, for example expanding [queue0000-queue0199] to [queue0000-queue0399],
follow the 500/50/5 pattern. It is important to note that, for rollout procedures, new queue groups behave no differently than
individual queues. Apply the 500/50/5 pattern to the new group as a whole, not just to individual queues within the group. For
these queue group expansions, gradual rollout is again an important strategy. If the source of your traffic is App Engine, you can
use traffic splitting (see Release-driven traffic spikes (#spikes)). When migrating your service to add tasks to the increased
number of queues, gradually roll out your instances such that the initial launch does not exceed 500 total operations per second
to the new queues, increasing by no more than 50% every 5 minutes.

Emergency queue group expansion

On occasion, you might want to expand an existing queue group, for example because tasks are expected to be added to the
queue group faster than the group can dispatch them. If the names of the new queues are spread out evenly among your existing
queue names when sorted lexicographically, then traffic can be sent immediately to those queues as long as there are no more
than 50% new interleaved queues and the traffic to each queue is less than 500 TPS. This method is an alternative to using traffic
splitting (#split) and gradual rollout (#spikes) as described in the sections above.

This type of interleaved naming can be achieved by appending a suffix to queues ending in even numbers. For example, if you
have 200 existing queues [queue0000-queue0199] and want to create 100 new queues, choose [queue0000a, queue0002a,
queue0004a, …, queue0198a] as the new queue names, instead of [queue0200-queue0299].

If you need a further increase, you can still interleave up to 50% more queues every 5 minutes.
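The interleaved names above can be generated mechanically; the sketch below suffixes the even-numbered names so that a lexicographic sort spreads the new queues evenly through the existing range:

```python
def interleaved_names(existing: int = 200, suffix: str = "a"):
    """New queue names interleaved among queue0000..queueNNNN by
    appending a suffix to the even-numbered names."""
    return [f"queue{i:04d}{suffix}" for i in range(0, existing, 2)]
```

Because `"queue0000" < "queue0000a" < "queue0001"` lexicographically, each new queue lands between two existing ones when sorted.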

Large-scale/batch task enqueues

When a large number of tasks, for example millions or billions, need to be added, a double-injection pattern can be useful.
Instead of creating tasks from a single job, use an injector queue. Each task added to the injector queue fans out and adds 100
tasks to the desired queue or queue group. The injector queue can be sped up over time, for example start at 5 TPS, then
increase by 50% every 5 minutes.
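A sketch of the double-injection arithmetic: with a fan-out of 100, one injector task is enqueued per hundred real tasks, and the injector queue itself ramps from a small starting rate by 50% every 5 minutes:

```python
import math

def injector_task_count(total_tasks: int, fan_out: int = 100) -> int:
    """Number of injector tasks needed, each creating fan_out real tasks."""
    return math.ceil(total_tasks / fan_out)

def injector_rate(minutes: int, start_tps: float = 5.0) -> float:
    """Injector-queue dispatch rate: start at 5 TPS, grow 50% per 5 min."""
    return start_tps * 1.5 ** (minutes // 5)
```

For example, injecting one million tasks needs 10,000 injector tasks, and each injector dispatch at 5 TPS already yields 500 real task creations per second downstream.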

Named Tasks

When you create a new task, Cloud Tasks assigns the task a unique name by default. You can assign your own name to a task by
using the name parameter. However, this introduces significant performance overhead, resulting in increased latencies and
potentially increased error rates associated with named tasks. These costs can be magnified significantly if tasks are named
sequentially, such as with timestamps. So, if you assign your own names, we recommend using a well-distributed prefix for task
names, such as a hash of the contents. See documentation
 (/tasks/docs/reference/rpc/google.cloud.tasks.v2#google.cloud.tasks.v2.Task.FIELDS.string.google.cloud.tasks.v2.Task.name) for more
details on naming a task.
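One simple way to get a well-distributed prefix, sketched here with a SHA-256 hash of the ID itself (the exact hash and prefix length are a choice, not a Cloud Tasks requirement):

```python
import hashlib

def distributed_task_id(sequential_id: str) -> str:
    """Prefix a sequential ID (e.g. a timestamp) with a short hash of
    itself so task names spread across the keyspace instead of
    clustering lexicographically."""
    prefix = hashlib.sha256(sequential_id.encode()).hexdigest()[:8]
    return f"{prefix}-{sequential_id}"
```

The mapping is deterministic, so the same logical task always gets the same name, preserving name-based deduplication.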

Target overload

Cloud Tasks can overload other services that you are using, such as App Engine, Datastore, and your network, if
dispatches from a queue increase dramatically in a short period of time. If a backlog of tasks has accumulated, then unpausing
those queues can potentially overload these services. The recommended defense is the same 500/50/5 pattern suggested for
queue overload: if a queue dispatches more than 500 TPS, increase traffic triggered by a queue by no more than 50% every 5
minutes. Use Stackdriver metrics (/monitoring/api/metrics_gcp#gcp-cloudtasks) to proactively monitor your traffic increases.
Stackdriver alerts can be used to detect potentially dangerous situations.

Unpausing or resuming high-TPS queues

When a queue or series of queues is unpaused or re-enabled, queues resume dispatches. If the queue has many tasks, the newly
enabled queue's dispatch rate could increase dramatically from 0 TPS to the full capacity of the queue. To ramp up, stagger
queue resumes or control the queue dispatch rates using Cloud Tasks' maxDispatchesPerSecond
 (/tasks/docs/reference/rpc/google.cloud.tasks.v2#ratelimits).

Bulk scheduled tasks

Large numbers of tasks, which are scheduled to dispatch at the same time, can also introduce a risk of target overloading. If you
need to start a large number of tasks at once, consider using queue rate controls to increase the dispatch rate gradually or
explicitly spinning up target capacity in advance.

Increased fan-out

When updating services that are executed through Cloud Tasks, increasing the number of remote calls can create production
risks. For example, say the tasks in a high-TPS queue call the handler /task-foo. A new release could significantly increase the
cost of calling /task-foo if, for example, that new release adds several expensive Datastore calls to the handler. The net result of
such a release would be a massive increase in Datastore traffic that is immediately related to changes in user traffic. Use gradual
rollout or traffic splitting to manage ramp-up.

Retries

Your code can retry on failure when making Cloud Tasks API calls. However, when a significant proportion of requests are failing
with server-side errors, a high rate of retries can overload your queues even more and cause them to recover more slowly. Thus,
we recommend capping the amount of outgoing traffic if your client detects that a significant proportion of requests are failing
with server-side errors, for example by using the adaptive throttling algorithm described in the Handling Overload chapter of the Site
Reliability Engineering book (https://landing.google.com/sre/book.html). Google's gRPC client libraries implement a variation of this
algorithm (https://github.com/grpc/proposal/blob/master/A6-client-retries.md#throttling-retry-attempts-and-hedged-rpcs).
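As a sketch of that adaptive throttling idea (the multiplier K = 2 follows the SRE book's presentation), a client can compute a local rejection probability from its recent request and accept counts:

```python
def reject_probability(requests: int, accepts: int, k: float = 2.0) -> float:
    """Adaptive-throttling rejection probability:
    max(0, (requests - k * accepts) / (requests + 1)).
    While the backend accepts most requests this stays at 0; as the
    accept rate falls, the client starts dropping work locally
    instead of retrying into an overloaded service."""
    return max(0.0, (requests - k * accepts) / (requests + 1))
```

For example, with 100 recent requests and only 20 accepts, the client would locally reject roughly 59% of new attempts, giving the servers room to recover.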


Patterns for scalable and resilient apps
This document introduces some patterns and practices for creating apps that are resilient and scalable, two essential goals of
many modern architecture exercises. A well-designed app scales up and down as demand increases and decreases, and is
resilient enough to withstand service disruptions. Building and operating apps that meet these requirements requires careful
planning and design.

Scalability: Adjusting capacity to meet demand

Scalability  (https://wikipedia.org/wiki/Scalability) is the measure of a system's ability to handle varying amounts of work by adding or
removing resources from the system. For example, a scalable web app is one that works well with one user or many users, and
that gracefully handles peaks and dips in traffic.

The flexibility to adjust the resources consumed by an app is a key business driver for moving to the cloud. With proper design,
you can reduce costs by removing under-utilized resources without compromising performance or user experience. You can
similarly maintain a good user experience during periods of high traffic by adding more resources. In this way, your app can
consume only the resources necessary to meet demand.

Google Cloud provides products and features to help you build scalable, efficient apps:

Compute Engine (/compute) virtual machines and Google Kubernetes Engine (GKE) (/kubernetes-engine) clusters integrate
with autoscalers that let you grow or shrink resource consumption based on metrics that you define.

Google Cloud's serverless platform (/serverless) provides managed compute, database, and other services that scale quickly
from zero to high request volumes, and you pay only for what you use.

Database products like BigQuery (/bigquery), Cloud Spanner (/spanner), and Cloud Bigtable (/bigtable) can deliver consistent
performance across massive data sizes.

Cloud Monitoring (/monitoring) provides metrics across your apps and infrastructure, helping you make data-driven scaling
decisions.

Resilience: Designing to withstand failures

A resilient app is one that continues to function despite failures of system components. Resilience requires planning at all levels
of your architecture. It influences how you lay out your infrastructure and network and how you design your app and data
storage. Resilience also extends to people and culture.

Building and operating resilient apps is hard. This is especially true for distributed apps, which might contain multiple layers of
infrastructure, networks, and services. Mistakes and outages happen, and improving the resilience of your app is an ongoing
journey. With careful planning, you can improve the ability of your app to withstand failures. With proper processes and
organizational culture, you can also learn from failures to further increase your app's resilience.

Google Cloud provides tools and services to help you build highly available and resilient apps:

Google Cloud services are available in regions and zones (/docs/geography-and-regions#regions_and_zones) across the globe,
enabling you to deploy your app to best meet your availability goals.

Compute Engine instance groups and GKE clusters can be distributed and managed across the available zones in a region.

Compute Engine regional persistent disks (/compute/docs/disks#repds) are synchronously replicated across zones in a
region.

Google Cloud provides a range of load-balancing options (/load-balancing) to manage your app traffic, including global load
balancing that can direct traffic to a healthy region closest to your users.

Google Cloud's serverless platform (/serverless) includes managed compute and database products that offer built-in
redundancy and load balancing.

Google Cloud supports CI/CD (/docs/ci-cd) through native tools and integrations with popular open source technologies, to
help automate building and deploying your apps.

Cloud Monitoring provides metrics across your apps and infrastructure, helping you make data-driven decisions about the
performance and health of your apps.

Drivers and constraints

There are varying requirements and motivations for improving the scalability and resilience of your app. There might also be
constraints that limit your ability to meet your scalability and resilience goals. The relative importance of these requirements and
constraints varies depending on the type of app, the profile of your users, and the scale and maturity of your organization.

Drivers

To help prioritize your requirements, consider the drivers from the different parts of your organization.

Business drivers

Common drivers from the business side include the following:

Optimize costs and resource consumption.

Minimize app downtime.

Ensure that user demand can be met during periods of high usage.

Improve quality and availability of service.

Ensure that user experience and trust are maintained during any outages.

Increase flexibility and agility to handle changing market demands.

Development drivers

Common drivers from the development side include the following:

Minimize time spent investigating failures.

Increase time spent on developing new features.

Minimize repetitive toil through automation.

Build apps using the latest industry patterns and practices.

Operations drivers

Requirements to consider from the operations side include the following:

Reduce the frequency of failures requiring human intervention.

Increase the ability to automatically recover from failures.

Minimize repetitive toil through automation.

Minimize the impact from the failure of any particular component.

Constraints

Constraints might limit your ability to increase the scalability and resilience of your app. Ensure that your design decisions do not
introduce or contribute to these constraints:
Dependencies on hardware or software that is difficult to scale.

Dependencies on hardware or software that is difficult to operate in a high-availability configuration.

Dependencies between apps.

Licensing restrictions.

Lack of skills or experience in your development and operations teams.

Organizational resistance to automation.

Patterns and practices

The remainder of this document defines patterns and practices to help you build resilient and scalable apps. These patterns
touch all parts of your app lifecycle, including your infrastructure design, app architecture, storage choices, deployment
processes, and organizational culture.

Three themes are evident in the patterns:

Automation. Building scalable and resilient apps requires automation. Automating your infrastructure provisioning, testing,
and app deployments increases consistency and speed, and minimizes human error.

Loose coupling. Treating your system as a collection of loosely coupled, independent components allows flexibility and
resilience. Independence covers how you physically distribute your resources and how you architect your app and design
your storage.

Data-driven design. Collecting metrics to understand the behavior of your app is critical. Decisions about when to scale
your app, or whether a particular service is unhealthy, need to be based on data. Metrics and logs should be core features.

Automate your infrastructure provisioning

Create immutable infrastructure through automation to improve the consistency of your environments and increase the success
of your deployments.

Treat your infrastructure as code

Infrastructure as code (IaC) is a technique that encourages you to treat your infrastructure provisioning and configuration in the
same way you handle application code. Your provisioning and configuration logic is stored in source control so that it's
discoverable and can be versioned and audited. Because it's in a code repository, you can take advantage of continuous
integration and continuous deployment (CI/CD) pipelines, so that any changes to your configuration can be automatically tested
and deployed.

By removing manual steps from your infrastructure provisioning, IaC minimizes human error and improves the consistency and
reproducibility of your apps and environments. In this way, adopting IaC increases the resilience of your apps.

Cloud Deployment Manager (/deployment-manager) lets you automate the creation and management of Google Cloud resources
with flexible templates. Alternatively, Config Connector (/config-connector) lets you manage your resources using Kubernetes
techniques and workflows. Google Cloud also has built-in support for popular third-party IaC tools, including Terraform
 (/docs/terraform), Chef  (https://www.chef.io/partners/google-cloud-platform), and Puppet
 (https://puppet.com/products/managed-technology/google).

Create immutable infrastructure

Immutable infrastructure is a philosophy that builds on the benefits of infrastructure as code. Immutable infrastructure
mandates that resources never be modified after they're deployed. If a virtual machine, Kubernetes cluster, or firewall rule needs
to be updated, you can update the configuration for the resource in the source repository. After you've tested and validated the
changes, you fully redeploy the resource using the new configuration. In other words, rather than tweaking resources, you
re-create them.

Creating immutable infrastructure leads to more predictable deployments and rollbacks. It also mitigates issues that are
common in mutable infrastructures, like configuration drift and snowflake servers
 (https://martinfowler.com/bliki/SnowflakeServer.html). In this way, adopting immutable infrastructure further improves the
consistency and reliability of your environments.

Design for high availability

Availability is a measure of the fraction of time that a service is usable. Availability is often used as a key indicator
 (https://landing.google.com/sre/sre-book/chapters/service-level-objectives/#indicators-o8seIAcZ) of overall service health. Highly available
architectures aim to maximize service availability, typically through redundantly deploying components. In simplest terms,
achieving high availability typically involves distributing compute resources, load balancing, and replicating data.

Physically distribute resources

Google Cloud services are available in locations across the globe. These locations are divided into regions and zones
 (/docs/geography-and-regions#regions_and_zones). How you deploy your app across these regions and zones affects the availability,
latency, and other properties of your app. For more information, see best practices for Compute Engine region selection
 (/solutions/best-practices-compute-engine-region-selection).

Redundancy is the duplication of components of a system in order to increase the overall availability of that system. In Google
Cloud, redundancy is typically achieved by deploying your app or service to multiple zones, or even in multiple regions. If a
service exists in multiple zones or regions, it can better withstand service disruptions in a particular zone or region. Although
Google Cloud makes every effort to prevent such disruptions, certain events are unpredictable and it's best to be prepared.

With Compute Engine managed instance groups (/compute/docs/instance-groups), you can distribute virtual machine instances
across multiple zones in a region, and you can manage the instances as a logical unit. Google Cloud also offers regional
persistent disks (/compute/docs/disks#repds) to automatically replicate your data to two zones in a region.

You can similarly improve the availability and resilience of your apps deployed on GKE by creating regional clusters
 (/kubernetes-engine/docs/concepts/regional-clusters). A regional cluster distributes GKE control plane components, nodes, and pods
across multiple zones within a region. Because your control plane components are distributed, you can continue to access the
cluster's control plane even during an outage involving one or more (but not all) zones.

Favor managed services

Rather than independently installing, supporting, and operating all parts of your application stack, you can use managed services
to consume parts of your application stack as services. For example, rather than installing and managing a MySQL database on
virtual machines (VMs), you can instead use a MySQL database provided by Cloud SQL (/sql). You then get an availability Service
Level Agreement (SLA) (/sql/sla) and can rely on Google Cloud to manage data replication, backups, and the underlying
infrastructure. By using managed services, you can spend less time managing infrastructure, and more time on improving the
reliability of your app.

Many of Google Cloud's managed compute, database, and storage services offer built-in redundancy, which can help you meet
your availability goals. Many of these services offer a regional model, which means the infrastructure that runs your app is
located in a specific region and is managed by Google to be redundantly available across all the zones within that region. If a
zone becomes unavailable, your app or data automatically serves from another zone in the region.

Certain database and storage services also offer multi-regional availability, which means that the infrastructure that runs your
app is located in several regions. Multi-regional services can withstand the loss of an entire region, but typically at the cost of
higher latency.

Load-balance at each tier


Load balancing lets you distribute traffic among groups of resources. When you distribute traffic, you help ensure that individual
resources don't become overloaded while others sit idle. Most load balancers also provide health-checking features to help
ensure that traffic isn't routed to unhealthy or unavailable resources.

Google Cloud offers several load-balancing choices. If your app runs on Compute Engine or GKE, you can choose the most
appropriate type of load balancer depending on the type, source, and other aspects of the traffic. For more information, see the
load-balancing overview (/load-balancing/docs/load-balancing-overview) and GKE networking overview
 (/kubernetes-engine/docs/concepts/network-overview).

Alternatively, some Google Cloud-managed services, such as App Engine and Cloud Run, automatically load-balance traffic.

It's common practice to load-balance requests received from external sources, such as from web or mobile clients. However,
using load balancers between different services or tiers within your app can also increase resilience and flexibility. Google Cloud
provides internal layer 4 (/load-balancing/docs/internal) and layer 7 (/load-balancing/docs/l7-internal) load balancing for this purpose.

The following diagram shows an external load balancer distributing global traffic across two regions, us-central1 and asia-
east1. It also shows internal load balancing distributing traffic from the web tier to the internal tier within each region.

Monitor your infrastructure and apps

Before you can decide how to improve the resilience and scalability of your app, you need to understand its behavior. Having
access to a comprehensive set of relevant metrics and time series about the performance and health of your app can help you
discover potential issues before they cause an outage. They can also help you diagnose and resolve an outage if it does occur.
The monitoring distributed systems (https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems) chapter in the
Google SRE book (https://landing.google.com/sre/books/) provides a good overview of some approaches to monitoring.

In addition to providing insight into the health of your app, metrics can also be used to control autoscaling behavior for your
services.

Cloud Monitoring (/monitoring) is Google Cloud's integrated monitoring tool. Cloud Monitoring ingests events, metrics, and
metadata, and provides insights through dashboards and alerts. Most Google Cloud services automatically send metrics
 (/monitoring/api/metrics) to Cloud Monitoring, and Google Cloud also supports many third-party sources. Cloud Monitoring can
also be used as a backend for popular open source monitoring tools, providing a "single pane of glass" with which to observe
your app.

Monitor at all levels

Gathering metrics at various levels or tiers within your architecture provides a holistic picture of your app's health and behavior.

Infrastructure monitoring

Infrastructure-level monitoring provides the baseline health and performance for your app. This approach to monitoring captures
information like CPU load, memory usage, and the number of bytes written to disk. These metrics can indicate that a machine is
overloaded or is not functioning as expected.

In addition to the metrics collected automatically, Cloud Monitoring provides an agent (/monitoring/agent) that can be installed to
collect more detailed information from Compute Engine VMs, including from third-party apps running on those machines.

App monitoring

We recommend that you capture app-level metrics. For example, you might want to measure how long it takes to execute a
particular query, or how long it takes to perform a related sequence of service calls. You define these app-level metrics yourself.
They capture information that the built-in Cloud Monitoring metrics cannot. App-level metrics can capture aggregated conditions
that more closely reflect key workflows, and they can reveal problems that low-level infrastructure metrics do not.

We also recommend using OpenCensus (/monitoring/custom-metrics/open-census) to capture your app-level metrics. OpenCensus is
open source, provides a flexible API, and can be configured to export metrics to the Cloud Monitoring backend.

Service monitoring

For distributed and microservices-driven apps, it's important to monitor the interactions between the different services and
components in your apps. These metrics can help you diagnose problems like increased numbers of errors or latency between
services.

Istio  (https://istio.io/docs/concepts/what-is-istio/) is an open source tool that provides insights and operational control over your
network of microservices. Istio generates detailed telemetry for all service communications, and it can be configured to send the
metrics to Cloud Monitoring.

End-to-end monitoring

End-to-end monitoring, also called black-box monitoring, tests externally visible behavior the way a user sees it. This type of
monitoring checks whether a user is able to complete critical actions within your defined thresholds. This coarse-grained
monitoring can uncover errors or latency that finer-grained monitoring might not, and it reveals availability as perceived by the
user.

Expose the health of your apps


A highly available system must have some way of determining which parts of the system are healthy and functioning correctly. If
certain resources appear unhealthy, the system can send requests elsewhere. Typically, health checks involve pulling data from
an endpoint to determine the status or health of a service.

Health checking is a key responsibility of load balancers. When you create a load balancer that is associated with a group of
virtual machine instances, you also define a health check (/load-balancing/docs/health-check-concepts). The health check defines
how the load balancer communicates with the virtual machines to evaluate whether particular instances should continue to
receive traffic. Load-balancer health checks can also be used to autoheal
 (/compute/docs/instance-groups/autohealing-instances-in-migs) groups of instances such that unhealthy machines are re-created. If
you are running on GKE and load-balancing external traffic through an ingress resource, GKE automatically creates appropriate
health checks for the load balancer.

Kubernetes has built-in support for liveness and readiness probes. These probes help the Kubernetes orchestrator decide how to
manage pods and requests within your cluster. If your app is deployed on Kubernetes, it's a good idea to expose the health
 (/solutions/best-practices-for-operating-containers#expose_the_health_of_your_application) of your app to these probes through
appropriate endpoints.
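As a sketch of what such endpoints can look like, the following uses only the Python standard library. The /healthz and /readyz paths, and the two check functions, are illustrative conventions (not names required by Kubernetes or any load balancer); in a real app you would wire equivalent routes into your web framework.

```python
# Minimal liveness/readiness probe endpoints (illustrative sketch).
from http.server import BaseHTTPRequestHandler, HTTPServer

def app_is_alive():
    # Liveness: the process can still make progress (e.g. not deadlocked).
    return True

def app_is_ready(dependencies_ok=True):
    # Readiness: the app can serve traffic (e.g. DB connection established).
    return dependencies_ok

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":
            ok = app_is_alive()
        elif self.path == "/readyz":
            ok = app_is_ready()
        else:
            self.send_response(404)
            self.end_headers()
            return
        # 200 tells the prober "healthy"; 503 tells it to stop sending traffic.
        self.send_response(200 if ok else 503)
        self.end_headers()

# To serve the probes:
# HTTPServer(("", 8080), HealthHandler).serve_forever()
```

Keeping the liveness check cheap and dependency-free matters: if it queries a database, a database outage can cause the orchestrator to restart otherwise healthy instances.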

Establish key metrics

Monitoring and health checking provide you with metrics on the behavior and status of your app. The next step is to analyze
those metrics to determine which are the most descriptive or impactful. The key metrics vary, depending on the platform that the
app is deployed on, and on the work that the app is doing.

You're not likely to find just one metric that indicates whether to scale your app, or that a particular service is unhealthy. Often it's
a combination of factors that together indicate a certain set of conditions. With Cloud Monitoring, you can create custom
metrics (/monitoring/custom-metrics) to help capture these conditions. The Google SRE book advocates four golden signals
 (https://landing.google.com/sre/sre-book/chapters/monitoring-distributed-systems/#xref_monitoring_golden-signals) for monitoring a user-
facing system: latency, traffic, errors, and saturation.

Also consider your tolerance for outliers. Using an average or median value to measure health or performance might not be the
best choice, because these measures can hide wide imbalances. It's therefore important to consider the metric distribution; the
99th percentile might be a more informative measure than the average.
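A small numeric example shows why: a couple of slow outliers barely move the mean but dominate the 99th percentile. The nearest-rank percentile function below is a simplified illustration, not a production metrics implementation.

```python
# Why p99 can be more informative than the average.
def percentile(samples, pct):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(pct / 100 * len(ordered)) - 1))
    return ordered[k]

# 98 fast requests at 20 ms, plus 2 outliers at 2000 ms.
latencies_ms = [20] * 98 + [2000] * 2

mean = sum(latencies_ms) / len(latencies_ms)  # 59.6 ms -- looks healthy
p99 = percentile(latencies_ms, 99)            # 2000 ms -- reveals the tail
```

The mean suggests a responsive service, while p99 shows that 1 in 100 users waits two full seconds.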

Define service level objectives (SLOs)

You can use the metrics that are collected by your monitoring system to define service level objectives (SLOs). SLOs specify a
target level of performance or reliability for your service. SLOs are a key pillar of SRE practices and are described in detail in the
service level objectives (https://landing.google.com/sre/sre-book/chapters/service-level-objectives/) chapter in the SRE book, and also in
the implementing SLOs (https://landing.google.com/sre/workbook/chapters/implementing-slos/) chapter in the SRE workbook.

You can use service monitoring (/monitoring/service-monitoring) to define SLOs based on the metrics in Cloud Monitoring. You can
create alerting policies (/monitoring/service-monitoring/alerting-on-budget-burn-rate) on SLOs to let you know whether you are in danger
of violating an SLO.

Store the metrics

Metrics from your monitoring system are useful in the short term to help with real-time health checks or to investigate recent
problems. Cloud Monitoring retains your metrics for several weeks (/monitoring/quotas#data_retention_policy) to best meet those
use cases.

However, there is also value in storing your monitoring metrics for longer-term analysis. Having access to a historical record can
help you adopt a data-driven approach to re ning your app architecture. You can use data collected during and after an outage to
identify bottlenecks and interdependencies in your apps. You can also use the data to help create and validate meaningful tests.

Historical data can also help validate that your app is supporting business goals during key periods. For example, the data can
help you analyze how your app scaled during high-tra c promotional events over the course of the last few quarters or even
years.

For details on how to export and store your metrics, see the Cloud Monitoring metric export
 (/solutions/stackdriver-monitoring-metric-export) solution.

Determine scaling profile

You want your app to meet its user experience and performance goals without over-provisioning resources.

The following diagram shows a simplified representation of an app's scaling profile. The app maintains a baseline level of
resources, and uses autoscaling to respond to changes in demand.


Balance cost and user experience

Deciding whether to scale your app is fundamentally about balancing cost against user experience. Decide what your minimum
acceptable level of performance is, and potentially also where to set a ceiling. These thresholds vary from app to app, and also
potentially across different components or services within a single app.

For example, a consumer-facing web or mobile app might have strict latency goals. Research shows
 (https://developers.google.com/web/fundamentals/performance/why-performance-matters/) that even small delays can negatively impact
how users perceive your app, resulting in lower conversions and fewer signups. Therefore, it's important to ensure that your app
has enough serving capacity to quickly respond to user requests. In this instance, the higher costs of running more web servers
might be justi ed.

The cost-to-performance ratio might be different for a non-business-critical internal app where users are probably more tolerant
of small delays. Hence, your scaling profile can be less aggressive. In this instance, keeping costs low might be of greater
importance than optimizing the user experience.

Set baseline resources

Another key component of your scaling pro le is deciding on an appropriate minimum set of resources.

Compute Engine virtual machines or GKE clusters typically take time to scale up, because new nodes need to be created and
initialized. Therefore, it might be necessary to maintain a minimum set of resources, even if there is no traffic. Again, the extent
of baseline resources is influenced by the type of app and traffic profile.

Conversely, serverless technologies like App Engine, Cloud Functions, and Cloud Run are designed to scale to zero, and to start
up and scale quickly, even after a cold start. Depending on the type of app and traffic profile, these technologies can
deliver efficiencies for parts of your app.

Configure autoscaling

Autoscaling  (https://wikipedia.org/wiki/Autoscaling) helps you to automatically scale the computing resources consumed by your
app. Typically, autoscaling occurs when certain metrics are exceeded or conditions are met. For example, if request latencies to
your web tier start exceeding a certain value, you might want to automatically add more machines to increase serving capacity.

Many Google Cloud compute products have autoscaling features. Serverless managed services like Cloud Run, Cloud Functions,
and App Engine are designed to scale quickly. These services typically offer configuration options to limit or influence
autoscaling behavior, but in general, much of the autoscaler behavior is hidden from the operator.

Compute Engine and GKE provide more options to control scaling behavior. With Compute Engine, you can scale based on
various inputs (/compute/docs/load-balancing-and-autoscaling), including Cloud Monitoring custom metrics and load-balancer serving
capacity. You can set minimum and maximum limits on the scaling behavior, and you can define an autoscaling policy with
multiple signals (/compute/docs/autoscaler/multiple-signals) to handle different scenarios. With GKE, you can configure the cluster
autoscaler (/kubernetes-engine/docs/concepts/cluster-autoscaler) to add or remove nodes based on workload or pod metrics
 (/kubernetes-engine/docs/tutorials/custom-metrics-autoscaling), or on metrics external
 (/kubernetes-engine/docs/tutorials/external-metrics-autoscaling) to the cluster.

We recommend that you configure autoscaling behavior based on key app metrics, on your cost profile, and on your defined
minimum required level of resources.
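As a concrete illustration of target-tracking autoscaling, the sketch below applies the scaling rule that the Kubernetes Horizontal Pod Autoscaler documents (desired = ceil(current × currentMetric / targetMetric)), clamped to operator-defined bounds. The metric values and limits are illustrative, not recommendations.

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Target-tracking scaling rule, clamped to configured min/max bounds."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# 4 instances averaging 80% CPU against a 50% target -> scale out to 7.
desired_replicas(4, 80, 50)  # -> 7
```

The same formula scales in when the observed metric drops below the target, and the min/max bounds correspond to the baseline and ceiling discussed above.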

Minimize startup time

For scaling to be effective, it must happen quickly enough to handle the increasing load. This is especially true when adding
compute or serving capacity.

Use pre-baked images

If your app runs on Compute Engine VMs, you likely need to install software and configure the instances to run your app.
Although you can use startup scripts (/compute/docs/startupscript) to configure new instances, a more efficient way is to create a
custom image (/compute/docs/images#custom_images). A custom image is a boot disk that you set up with your app-specific
software and configuration.

For more information on managing images, see the image-management best practices
 (/solutions/image-management-best-practices) article.

When you've created your image, you can define an instance template (/compute/docs/instance-templates). Instance templates
combine the boot disk image, machine type, and other instance properties. You can then use an instance template to create
individual VM instances or a managed instance group (/compute/docs/instance-groups/creating-groups-of-managed-instances). Instance
templates are a convenient way to save a VM instance's configuration so you can use it later to create identical new VM
instances.

Although creating custom images and instance templates can increase your deployment speed, it can also increase
maintenance costs because the images might need to be updated more frequently. For more information, see the balancing
image configuration and deployment speed
 (/solutions/dr-scenarios-building-blocks#balancing_image_configuration_and_deployment_speed) docs.

Containerize your app

An alternative to building customized VM instances is to containerize your app. A container (/containers) is a lightweight,
standalone, executable package of software that includes everything needed to run an app: code, runtime, system tools, system
libraries, and settings. These characteristics make containerized apps more portable, easier to deploy, and easier to maintain at
scale than virtual machines. Containers are also typically fast to start, which makes them suitable for scalable and resilient apps.

Google Cloud offers several services to run your app containers. Cloud Run (/run) provides a serverless, managed compute
a managed platform as a service (PaaS). GKE (/kubernetes-engine) provides a managed Kubernetes environment to host and
a managed platform as a service (PaaS). GKE (/kubernetes-engine) provides a managed Kubernetes environment to host and
orchestrate your containerized apps. You can also run your app containers on Compute Engine (/compute/docs/containers) when
you need complete control over your container environment.

Optimize your app for fast startup

In addition to ensuring your infrastructure and app can be deployed as efficiently as possible, it's also important to ensure your
app comes online quickly.

The optimizations that are appropriate for your app vary depending on the app's characteristics and execution platform. It's
important to do the following:

Find and eliminate bottlenecks by profiling the critical sections of your app that are invoked at startup.

Reduce initial startup time by implementing techniques like lazy initialization, particularly of expensive resources.

Minimize app dependencies that might need to be loaded at startup time.
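Lazy initialization can be as simple as deferring an expensive attribute until its first use, keeping it off the startup critical path. The sketch below uses Python's functools.cached_property; the dictionary it builds is a stand-in for any costly startup task (loading a model, warming a connection pool, parsing a large config).

```python
from functools import cached_property

class Service:
    @cached_property
    def lookup_table(self):
        # Runs once, on first access; the result is cached on the instance.
        # Stand-in for an expensive initialization step.
        return {i: str(i) for i in range(100_000)}

svc = Service()      # constructor returns immediately; nothing is loaded yet
svc.lookup_table     # first access pays the cost; later accesses are free
```

The trade-off is that the first request to touch the resource absorbs the initialization latency, so reserve this technique for resources that are expensive but not needed on every request.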

Favor modular architectures

You can increase the flexibility of your app by choosing architectures that enable components to be independently deployed,
managed, and scaled. This pattern can also improve resiliency by eliminating single points of failure.

Break your app into independent services

If you design your app as a set of loosely coupled, independent services, you can increase your app's flexibility. A loosely
coupled design lets your services be released and deployed independently. In addition to many other benefits, this
approach enables those services to use different tech stacks and to be managed by different teams. This loosely coupled
approach is the key theme of architecture patterns like microservices and SOA.

As you consider how to draw boundaries around your services, availability and scalability requirements are key dimensions. For
example, if a given component has a different availability requirement or scaling profile from your other components, it might be
a good candidate for a standalone service.

For more information, see Migrating a monolithic app to microservices (/solutions/migrating-a-monolithic-app-to-microservices-gke).

Aim for statelessness

A stateless app or service does not retain any local persistent data or state. A stateless model ensures that you can handle each
request or interaction with the service independent of previous requests. This model facilitates scalability and recoverability,
because it means that the service can grow, shrink, or be restarted without losing data that's required in order to handle any
in-flight processes or requests. Statelessness is especially important when you are using an autoscaler, because the instances,
nodes, or pods hosting the service can be created and destroyed unexpectedly.

It might not be possible for all your services to be stateless. In such a case, be explicit about services that require state. By
ensuring clean separation of stateless and stateful services, you can ensure easy scalability for stateless services while
adopting a more considered approach for stateful services.
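In miniature, the stateless pattern looks like this: the handler keeps nothing between requests, so any instance (including one that was just created by the autoscaler) can serve any request. The dict here is a stand-in for an external store such as Memorystore or a database; the shopping-cart example is hypothetical.

```python
# All state lives outside the process; the handler itself is stateless.
external_store = {}  # stand-in for Redis/Memorystore, a database, etc.

def add_to_cart(session_id, item):
    """Any instance can serve this request, because the cart is
    read from and written back to the external store each time."""
    cart = external_store.get(session_id, [])
    cart.append(item)
    external_store[session_id] = cart
    return cart
```

If this handler instead kept carts in a process-local variable, scaling in (or an instance restart) would silently discard user data.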

Manage communication between services

One challenge with distributed microservices architectures is managing communication between services. As your network of
services grows, it's likely that service interdependencies will also grow. You don't want the failure of one service to result in the
failure of other services, sometimes called a cascading failure.

You can help reduce traffic to an overloaded or failing service by adopting techniques like the circuit breaker
 (https://martinfowler.com/bliki/CircuitBreaker.html) pattern, exponential backoffs  (https://wikipedia.org/wiki/Exponential_backoff), and
graceful degradation
 (https://landing.google.com/sre/sre-book/chapters/addressing-cascading-failures/#xref_cascading-failure_load-shed-graceful-degredation).
These patterns increase the resiliency of your app either by giving overloaded services a chance to recover, or by gracefully
handling error states. For more information, see the addressing cascading failures
 (https://landing.google.com/sre/sre-book/chapters/addressing-cascading-failures/) chapter in the Google SRE book.
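As an illustration of one of these techniques, here is a minimal retry helper with capped exponential backoff and full jitter. The attempt count, base delay, and cap are placeholder values, not recommendations; production code would also retry only on errors known to be transient.

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=8.0):
    """Retry fn with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # Full jitter: sleep a random amount up to the capped backoff,
            # so many synchronized clients don't retry in lockstep.
            backoff = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, backoff))
```

The jitter is what prevents a "retry storm": without it, every client that failed at the same moment retries at the same moment, re-overloading the service it is trying to protect.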

Using a service mesh (/blog/products/networking/welcome-to-the-service-mesh-era-introducing-a-new-istio-blog-post-series) can help you
manage traffic across your distributed services. A service mesh is software that links services together, and helps decouple
business logic from networking. A service mesh typically provides resiliency features like request retries, failovers, and circuit
breakers.

Use appropriate database and storage technology

Certain databases and types of storage are difficult to scale and make resilient. Make sure that your database choices don't
constrain your app's availability and scalability.

Evaluate your database needs

The pattern of designing your app as a set of independent services also extends to your databases and storage. It might be
appropriate to choose different types of storage for different parts of your app, which results in heterogeneous storage.

Traditional apps often operate exclusively with relational databases. Relational databases offer useful functionality such as
transactions, strong consistency, referential integrity, and sophisticated querying across tables. These features make relational
databases a good choice for many common app features. However, relational databases also have some constraints. They are
typically hard to scale, and they require careful management in a high-availability configuration. A relational database might not
be the best choice for all your database needs.

Non-relational databases, often referred to as NoSQL databases, take a different approach. Although details vary across
products, NoSQL databases typically sacrifice some features of relational databases in favor of increased availability and easier
scalability. In terms of the CAP theorem  (https://wikipedia.org/wiki/CAP_theorem), NoSQL databases often choose availability over
consistency.

Whether a NoSQL database is appropriate often comes down to the required degree of consistency. If your data model for a
particular service does not require all the features of an RDBMS, and can be designed to be eventually consistent, choosing a
NoSQL database might offer increased availability and scalability.

In addition to a range of relational and NoSQL databases, Google Cloud also offers Cloud Spanner (/spanner), a strongly
consistent, highly available, and globally distributed database with support for SQL. For information about choosing an
appropriate database on Google Cloud, see Google Cloud databases (/products/databases).

Implement caching

A cache's primary purpose is to increase data retrieval performance by reducing the need to access the underlying slower
storage layer.

Caching supports improved scalability by reducing reliance on disk-based storage. Because requests can be served from
memory, request latencies to the storage layer are reduced, typically allowing your service to handle more requests. In addition,
caching can reduce the load on services that are downstream of your app, especially databases, allowing other components that
interact with that downstream service to scale more easily.

Caching can also increase resiliency by supporting techniques like graceful degradation
 (https://landing.google.com/sre/sre-book/chapters/addressing-cascading-failures/#xref_cascading-failure_load-shed-graceful-degredation). If
the underlying storage layer is overloaded or unavailable, the cache can continue to handle requests. And even though the data
returned from the cache might be incomplete or not up to date, that might be acceptable for certain scenarios.
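A cache-aside read with a stale-data fallback can be sketched as follows. The dict stands in for a real cache such as Memorystore for Redis, and the TTL and fetch function are illustrative.

```python
import time

cache = {}        # key -> (value, fetched_at); stand-in for Redis/Memorystore
TTL_SECONDS = 60  # illustrative freshness window

def get_record(key, fetch_from_db):
    entry = cache.get(key)
    if entry and time.time() - entry[1] < TTL_SECONDS:
        return entry[0]                 # fresh cache hit
    try:
        value = fetch_from_db(key)      # miss or expired: go to the backend
        cache[key] = (value, time.time())
        return value
    except Exception:
        if entry:
            return entry[0]             # degrade gracefully: serve stale data
        raise                           # nothing cached; surface the error
```

The stale-fallback branch is the graceful-degradation piece: when the backend is down, users see slightly old data instead of errors, which is often the better trade.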

Memorystore for Redis (/memorystore) provides a fully managed service that is powered by the Redis in-memory datastore.
Memorystore for Redis provides low-latency access and high throughput for heavily accessed data. It can be deployed in a high-
availability configuration that provides cross-zone replication and automatic failover.

Modernize your development processes and culture

DevOps can be considered a broad collection of processes, culture, and tooling that promote agility and reduced time-to-market
for apps and features by breaking down silos between development, operations, and related teams. DevOps techniques aim to
improve the quality and reliability of software.

A detailed discussion of DevOps is beyond the scope of this article, but some key aspects that relate to improving the reliability
and resilience of your app are discussed in the following sections. For more details, see the Google Cloud DevOps page
 (/solutions/devops).

Design for testability

Automated testing (/solutions/devops/devops-tech-test-automation) is a key component of modern software delivery practices. The
ability to execute a comprehensive set of unit, integration, and system tests is essential to verify that your app behaves as
expected, and that it can progress to the next stage of the deployment cycle. Testability is a key design criterion for your app.

We recommend that you use unit tests for the bulk of your testing because they are quick to execute and typically easy to
maintain. We also recommend that you automate higher-level integration and system tests. These tests are greatly simplified if
you adopt infrastructure-as-code techniques, because dedicated test environments and resources can be created on demand,
and then torn down once tests are complete.
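In miniature, testable design means factoring logic into small, pure units that the standard test framework can exercise directly. The discount rule below is a made-up example; the point is the shape: a side-effect-free function plus fast unit tests.

```python
import unittest

def apply_discount(total_cents, percent):
    """Pure business logic: no I/O, no globals, trivially testable."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_cents - total_cents * percent // 100

class ApplyDiscountTest(unittest.TestCase):
    def test_typical_discount(self):
        self.assertEqual(apply_discount(10_000, 25), 7_500)

    def test_rejects_invalid_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(10_000, 150)

if __name__ == "__main__":
    unittest.main(exit=False)
```

Because the function touches no network or disk, these tests run in milliseconds on every commit, which is exactly what continuous integration needs.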

As the percentage of your codebase covered by tests increases, you reduce uncertainty and the potential decrease in reliability
from each code change. Adequate testing coverage means that you can make more changes before reliability falls below an
acceptable level.

Automated testing is an integral component of continuous integration (/solutions/devops/devops-tech-test-automation). Executing a
robust set of automated tests on each code commit provides fast feedback on changes, improving the quality and reliability of
your software. Google Cloud–native tools like Cloud Build (/build) and third-party tools like Jenkins (/jenkins) can help you
implement continuous integration.

Automate your deployments

Continuous integration and comprehensive test automation give you confidence in the stability of your software. And when they
are in place, your next step is automating deployment (/solutions/devops/devops-tech-deployment-automation) of your app. The level
of deployment automation varies depending on the maturity of your organization.

Choosing an appropriate deployment strategy is essential in order to minimize the risks associated with deploying new software.
With the right strategy, you can gradually increase the exposure of new versions to larger audiences, verifying behavior along the
way. You can also set clear provisions for rollback if problems occur.

For examples of automating deployments, see Continuous Delivery Pipelines with Spinnaker and GKE
 (/solutions/continuous-delivery-spinnaker-kubernetes-engine) and Automating Canary Analysis on GKE with Spinnaker
 (/solutions/automated-canary-analysis-kubernetes-engine-spinnaker).

Adopt SRE practices for dealing with failure

For distributed apps that operate at scale, some degree of failure in one or more components is common. If you adopt the
patterns covered in this document, your app can better handle disruptions caused by a defective software release, unexpected
termination of virtual machines, or even an infrastructure outage that affects an entire zone.

However, even with careful app design, you inevitably encounter unexpected events that require human intervention. If you put
structured processes in place to manage these events, you can greatly reduce their impact and resolve them more quickly.
Furthermore, if you examine the causes and responses to the event, you can help protect your app against similar events in the
future.

Strong processes for managing incidents (https://landing.google.com/sre/sre-book/chapters/managing-incidents/) and performing
blameless postmortems (https://landing.google.com/sre/sre-book/chapters/postmortem-culture/) are key tenets of SRE. Although
implementing the full practices of Google SRE might not be practical for your organization, if you adopt even a minimum set of
guidelines, you can improve the resilience of your app. The appendices in the SRE book (https://landing.google.com/sre/sre-book/toc/)
contain some templates that can help shape your processes.

Validate and review your architecture

As your app evolves, user behavior, traffic profiles, and even business priorities can change. Similarly, other services or
infrastructure that your app depends on can evolve. Therefore, it's important to periodically test and validate the resilience and
scalability of your app.

Test your resilience

It's critical to test that your app responds to failures in the way you expect. The overarching theme is that the best way to avoid
failure is to introduce failure and learn from it.

Simulating and introducing failures is complex. In addition to verifying the behavior of your app or service, you must also ensure
that the expected alerts fire and that the appropriate metrics are recorded. We recommend a structured approach, where you
introduce simple failures and then escalate.

For example, you might proceed as follows, validating and documenting behavior at each stage:

Introduce intermittent failures.

Block access to dependencies of the service.

Block all network communication.

Terminate hosts.

For details, see the Breaking your systems to make them unbreakable (https://www.youtube.com/watch?v=pVYwagnFXJI) video from
Google Cloud Next 2019.

If you're using a service mesh like Istio to manage your app services, you can inject faults
 (https://istio.io/docs/concepts/traffic-management/#fault-injection) at the application layer instead of killing pods or machines, or you
can inject corrupt packets at the TCP layer. You can introduce delays to simulate network latency or an overloaded upstream
system. You can also introduce aborts, which mimic failures in upstream systems.

Test your scaling behavior

We recommend that you use automated nonfunctional testing to verify that your app scales as expected. Often this verification
is coupled with performance or load testing. You can use simple tools like hey  (https://github.com/rakyll/hey) to send load to a web
app. For a more detailed example that shows how to do load testing against a REST endpoint, see Distributed load testing using
Google Kubernetes Engine (/solutions/distributed-load-testing-using-gke).

One common approach is to ensure that key metrics stay within expected levels for varying loads. For example, if you're testing
the scalability of your web tier, you might measure the average request latencies for spiky volumes of user requests. Similarly, for
a backend processing feature, you might measure the average task-processing time when the volume of tasks suddenly
increases.

You also want your tests to verify that the number of resources created to handle the test load is within the
expected range. For example, your tests might verify that the number of VMs that were created to handle some backend tasks
does not exceed a certain value.
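As a sketch, both checks can be automated against data collected during a test run. The helper names and budget values below are illustrative, not part of any load-testing tool:

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a list of numeric samples."""
    ordered = sorted(samples)
    rank = max(1, min(len(ordered), round(pct / 100.0 * len(ordered))))
    return ordered[rank - 1]

def within_budget(latencies_ms, vm_counts, p95_budget_ms, max_vms):
    """True if p95 latency and peak VM count both stayed within budget."""
    return percentile(latencies_ms, 95) <= p95_budget_ms and max(vm_counts) <= max_vms

# Example: p95 of latencies 1..100 ms is 95 ms; peak of 3 VMs observed.
print(within_budget(list(range(1, 101)), [1, 3, 2], p95_budget_ms=100, max_vms=4))
```

A CI job can run such checks after each load test and fail the build when a budget is exceeded.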

It's also important to test edge cases. What is the behavior of your app or service when maximum scaling limits are reached?
What is the behavior if your service is scaling down and then load suddenly increases again? For a discussion of these topics,
see the load testing section of Peak-season production readiness
(/solutions/black-friday-production-readiness#helping_to_ensure_reliability).

Always be architecting

The technology world moves fast, and this is especially true of the cloud. New products and features are released frequently,
new patterns emerge, and the demands from your users and internal stakeholders continue to grow.

As the principles for cloud-native architecture
 (/blog/products/application-development/5-principles-for-cloud-native-architecture-what-it-is-and-how-to-master-it)
blog post describes, always be looking for ways to refine, simplify, and improve the architecture of your apps. Software systems are
living things and need to adapt to reflect your changing priorities.

What's next

Read the principles for cloud-native architecture
 (/blog/products/application-development/5-principles-for-cloud-native-architecture-what-it-is-and-how-to-master-it)
blog post.

Read the SRE books (https://landing.google.com/sre/books/) for details on how the Google production environment is
managed.

Learn more about how DevOps (/solutions/devops) on Google Cloud can improve your software quality and reliability.

Explore reference architectures, diagrams, tutorials, and best practices about Google Cloud. Take a look at our Cloud
Architecture Center (/architecture).

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License
 (https://creativecommons.org/licenses/by/4.0/), and code samples are licensed under the Apache 2.0 License
 (https://www.apache.org/licenses/LICENSE-2.0). For details, see the Google Developers Site Policies (https://developers.google.com/site-policies).
Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2021-05-28 UTC.


Securing queue configuration
This page provides suggestions for implementing best practices for securing queue creation and configuration, including
minimizing the pitfalls described in Using Queue Management versus queue.yaml (/tasks/docs/queue-yaml). The key is to restrict
queue management methods to a small set of people or entities. For large organizations, it might be necessary to use a service
account to run software that enforces proper queue configuration.

The general idea is to separate users and other entities into three categories:

1. Queue Admins - Users in this group have permission to call Cloud Tasks queue management methods, or to upload
queue.yaml files. This group is restricted to a very small set of users so as to reduce the risk of clobbering queue
configuration, particularly by inadvertently mixing queue.yaml and Cloud Tasks queue management methods.

2. Cloud Tasks Workers - Users in this group have permission to perform common interactions with Cloud Tasks such as
enqueuing and dequeuing tasks. They are not allowed to call Cloud Tasks queue management methods.

3. App Engine Deployers - For projects that have App Engine apps, users in this group have permission to deploy the app.
They are not permitted to upload queue.yaml files or make any Cloud Tasks API calls, thus allowing the queue admins to
enforce the proper policies.

In this scheme, users who are queue admins should not also be Cloud Tasks workers, since that would defeat the purpose of the
separation.

If your project uses Cloud Tasks queue management methods exclusively, it might also make sense for queue admins not to
be App Engine deployers as well, since combining those roles would make it possible for an errant queue.yaml file to be uploaded.

Small projects and organizations


Small projects and organizations can assign Identity and Access Management (IAM) roles (/iam/docs/understanding-roles) directly
to users to place them into the groups above. This makes sense for teams who prefer configuration simplicity or who make
queue configuration changes or App Engine app deployments by hand.

Add users to these groups as follows:

Queue Admin

1. As a project admin, grant the cloudtasks.queueAdmin role to users who are allowed to make Cloud Tasks queue
management API calls or upload queue.yaml files.

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member user:[EMAIL] \
  --role roles/cloudtasks.queueAdmin

2. As a user with the cloudtasks.queueAdmin role, following the best practices above, choose one of the following methods
for changing the queue configuration.

a. Use the Cloud Tasks API to change queue configuration.

b. Upload queue.yaml with gcloud:

gcloud app deploy queue.yaml

Cloud Tasks Worker


As there are often many users allowed to interact with Cloud Tasks, you can assign roles to Service Accounts
 (/iam/docs/service-accounts) instead of individual users. This type of usage is common in production. For more information, see
Large projects and organizations (#large_projects_and_organizations).

1. As a project admin, grant roles to users who are allowed to interact with Cloud Tasks but not change queue configuration:

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.viewer

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.enqueuer

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.dequeuer

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.taskRunner

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.taskDeleter

As a user with one or more of the roles granted above, you can interact with the Cloud Tasks API.

App Engine Deployer


1. As a project admin, grant roles to users who are allowed to deploy App Engine apps but who are not allowed to modify
queue configuration or interact with tasks:

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member user:[EMAIL] \
  --role roles/appengine.deployer

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member user:[EMAIL] \
  --role roles/appengine.serviceAdmin

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member user:[EMAIL] \
  --role roles/storage.admin

2. As a user with the roles granted above, deploy an App Engine app.

gcloud app deploy app.yaml

Large projects and organizations

Large projects and organizations can use Service Accounts (/iam/docs/service-accounts) to separate duties and responsibilities.
This makes sense for teams with complex infrastructure for changing queue configuration and perhaps also deploying App
Engine apps.

The instructions below are generally appropriate when queue configuration and interaction with Cloud Tasks, and potentially App
Engine app deployment, happen through software rather than through a human user directly. This also makes it possible to
protect queue configuration without requiring all members of your team to understand the content on this page.

For example, you can create a web app or command line tool that all users must use to create, update, and delete queues.
Whether that tool uses Cloud Tasks queue management methods or queue.yaml is an implementation detail of the tool that
users do not need to worry about. If the tool is the only entity in the queue admins group, then you can guarantee that there is no
inadvertent mixing of Cloud Tasks queue management methods and queue.yaml use.
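As a sketch, such a tool might hide queue creation behind a small helper so that queue.yaml never enters the picture. The build_queue_resource helper and its parameters are illustrative, not part of the Cloud Tasks client library:

```python
def build_queue_resource(project, location, name, max_dispatches_per_second=None):
    """Build a queue resource dict for a CreateQueue/UpdateQueue call."""
    queue = {"name": f"projects/{project}/locations/{location}/queues/{name}"}
    if max_dispatches_per_second is not None:
        queue["rate_limits"] = {"max_dispatches_per_second": max_dispatches_per_second}
    return queue

# Applying the resource for real would use the google.cloud.tasks_v2 client,
# for example (not run here):
#   client = tasks_v2.CloudTasksClient()
#   client.create_queue(parent=f"projects/{project}/locations/{location}",
#                       queue=build_queue_resource(project, location, "reports"))
print(build_queue_resource("my-project", "us-central1", "reports", 5.0)["name"])
```

Because only the tool's service account holds the queue admin role, every queue change flows through this single, auditable code path.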

Caution: If queues are removed from queue.yaml, they enter the DISABLED state and only queue admins can delete them. Since there is no way to delete
queues programmatically in the App Engine SDK, your tool would have to delete them via a Cloud Tasks DeleteQueue call.

Instructions for setting up these service accounts follow.

Queue Admin

1. As a project admin, create the service account.

gcloud iam service-accounts create queue-admin \
  --display-name "Queue Admin"

2. Grant the cloudtasks.queueAdmin role to the service account so that it can upload queue.yaml files and make Cloud Tasks
queue management API calls.

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:queue-admin@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.queueAdmin

3. Grant the iam.serviceAccountActor role to human users, groups, or other entities
 (/iam/docs/overview#concepts_related_identity) that are allowed to change queue configuration. This should be a very small
number of people, for example administrators who are allowed to respond in emergencies.

gcloud iam service-accounts add-iam-policy-binding \
  queue-admin@[PROJECT_ID].iam.gserviceaccount.com \
  --member user:[EMAIL] \
  --role roles/iam.serviceAccountActor

4. Create a service account key so that a human user or another entity can assume the identity of the service account.

gcloud iam service-accounts keys create \
  --iam-account queue-admin@[PROJECT_ID].iam.gserviceaccount.com \
  ~/queue-admin-service-account-key.json

5. As a user or other entity given the iam.serviceAccountActor role, assume the identity of the service account.

gcloud auth activate-service-account queue-admin@[PROJECT_ID].iam.gserviceaccount.com \
  --key-file ~/queue-admin-service-account-key.json

6. Following the best practices above, choose only one of the following methods for changing the queue configuration:

a. Use the Cloud Tasks API to change queue configuration.

b. Upload queue.yaml with gcloud:

gcloud app deploy queue.yaml

Cloud Tasks Worker

1. As a project admin, create the service account.

gcloud iam service-accounts create cloud-tasks-worker \
  --display-name "Cloud Tasks Worker"

2. Grant roles to the service account so that it can interact with Cloud Tasks but not change queue configuration.

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.viewer

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.enqueuer

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.dequeuer

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.taskRunner

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/cloudtasks.taskDeleter

3. Grant the iam.serviceAccountActor role to human users, groups, or other entities
 (/iam/docs/overview#concepts_related_identity) that are allowed to use the Cloud Tasks API in your project.

gcloud iam service-accounts add-iam-policy-binding \
  cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --member user:[EMAIL] \
  --role roles/iam.serviceAccountActor

4. Create a service account key so that a human user or another entity can assume the identity of the service account.

gcloud iam service-accounts keys create \
  --iam-account cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  ~/cloud-tasks-worker-service-account-key.json

5. As a user or other entity given the iam.serviceAccountActor role, assume the identity of the service account.

gcloud auth activate-service-account cloud-tasks-worker@[PROJECT_ID].iam.gserviceaccount.com \
  --key-file ~/cloud-tasks-worker-service-account-key.json

6. Use the Cloud Tasks API.

App Engine Deployer

1. As a project admin, create the service account.

gcloud iam service-accounts create app-engine-deployer \
  --display-name "App Engine Deployer"

2. Grant roles to the service account so that it can deploy App Engine apps but cannot upload queue.yaml files.

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:app-engine-deployer@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/appengine.deployer

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:app-engine-deployer@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/appengine.serviceAdmin

gcloud projects add-iam-policy-binding [PROJECT_ID] \
  --member serviceAccount:app-engine-deployer@[PROJECT_ID].iam.gserviceaccount.com \
  --role roles/storage.admin

3. Grant the iam.serviceAccountActor role to human users, groups, or other entities
 (/iam/docs/overview#concepts_related_to_identity) that are allowed to deploy the App Engine app.

gcloud iam service-accounts add-iam-policy-binding \
  app-engine-deployer@[PROJECT_ID].iam.gserviceaccount.com \
  --member user:[EMAIL] \
  --role roles/iam.serviceAccountActor

4. Create a service account key so that a human user or another entity can assume the identity of the service account.

gcloud iam service-accounts keys create \
  --iam-account app-engine-deployer@[PROJECT_ID].iam.gserviceaccount.com \
  ~/app-engine-deployer-service-account-key.json

5. As a user or other entity given the iam.serviceAccountActor role, assume the identity of the service account.

gcloud auth activate-service-account app-engine-deployer@[PROJECT_ID].iam.gserviceaccount.com \
  --key-file ~/app-engine-deployer-service-account-key.json

6. Deploy the App Engine app.

gcloud app deploy app.yaml

More information about service accounts

For a comprehensive explanation of service accounts, see the following pages:


Service Accounts (/iam/docs/service-accounts)

Understanding Service Accounts (/iam/docs/understanding-service-accounts)

Creating and Managing Service Accounts (/iam/docs/creating-managing-service-accounts)

Granting Roles to Service Accounts (/iam/docs/granting-roles-to-service-accounts)


Last updated 2021-06-01 UTC.


Using Queue Management versus queue.yaml
This page explains the differences between using the Cloud Tasks API (/tasks/docs/creating-queues) to manage queues and using
the upload of a Cloud Tasks queue.yaml (/appengine/docs/standard/python/config/queueref) or queue.xml
 (/appengine/docs/standard/java/config/queueref) file to accomplish the same ends. It also discusses some of the pitfalls of mixing
mechanisms and how to deal with common problems.

Most standard App Engine apps use queue.yaml (/appengine/docs/standard/python/config/queueref) to configure queues in the App Engine Task
service. For Java apps, the queue.xml (/appengine/docs/standard/java/config/queueref) file is used instead. Everything that is written on this page about
queue.yaml also applies to queue.xml files, but for brevity, this page frequently uses queue.yaml to represent both files.

Introduction

The Cloud Tasks API provides an App Engine-independent interface to the App Engine Task Queue
 (/appengine/docs/standard/python/taskqueue) service. As part of that interface, it provides the ability to manage queues, including
doing so via the console or the gcloud command. Queues that are created by the Cloud Tasks API are accessible from the App
Engine SDK and vice versa. To maintain compatibility, it is possible to use the configuration file used by the App Engine SDK,
queue.yaml, to also create and configure queues to be used via the Cloud Tasks API. However, mixing configuration via file with
configuration via the Cloud Tasks API can produce unexpected consequences.

Pitfalls of mixing queue.yaml with Cloud Tasks queue management methods


For the underlying service, queue.yaml files are definitive. Uploading a queue.yaml file that omits existing queues in your project, no
matter how they were created, causes those queues to be disabled
 (/appengine/docs/standard/python/taskqueue/push/deleting-tasks-and-queues#disabling_queues), or paused. Thus if you use the Cloud
Tasks API to call CreateQueue or UpdateQueue and then upload a queue.yaml file that omits those queues, the queues that were created in
the Cloud Tasks calls are disabled.

Consider the following scenario:

1. Call CreateQueue to create a queue named "cloud-tasks-queue".

2. Upload a queue.yaml file with the following contents:

queue:
- name: queue-yaml-queue

What is the current state of queues in this project? The queue named "cloud-tasks-queue" and any other queues that existed prior
are in DISABLED state, and the queue named "queue-yaml-queue" is in RUNNING state.

The default queue is an exception. It is never disabled by App Engine. See Cloud Tasks and the default App Engine queue
(#cloud_tasks_and_the_default_app_engine_queue) for more details.

This behavior might be surprising if you create queues through the Cloud Tasks API. The instructions below explain how to
resume a disabled queue (#resume-queue).

Similarly, if a queue is disabled in the Cloud Tasks API but later appears in an uploaded queue.yaml file, that queue is resumed.

If a queue is deleted with the DeleteQueue method and later appears in a queue.yaml file, the queue.yaml upload can fail because
queue names are not allowed to be reused for several days after deletion.

Best practices

Caution: It is strongly recommended that you use either the configuration file method or the Cloud Tasks API to configure your queues, but not both.

If you are new to Cloud Tasks or App Engine, use the Cloud Tasks API exclusively to manage your queues and avoid the use of
queue.yaml and queue.xml altogether. Cloud Tasks queue management methods give users more choice in creating, updating,
and deleting queues.

If, however, you are an existing queue.yaml or queue.xml user, you should only consider switching to queue management
methods if you understand the pitfalls of mixing queue.yaml with Cloud Tasks queue management methods (#pitfalls).

To help enforce the use of only one configuration method, consider using groups and permissions to control access to queue
management activities. See Securing Queue Configuration (/tasks/docs/secure-queue-configuration) for instructions.

Debugging

You can inspect your project's Admin Activity audit logs (/logging/docs/audit#viewing_audit_logs) to retrieve the history of queue
configuration changes, including queue creations, updates, and deletions:

    gcloud logging read \
      'protoPayload.methodName=
       (com.google.appengine.legacy.queue_created OR
        com.google.appengine.legacy.queue_updated OR
        google.cloud.tasks.v2.CloudTasks.CreateQueue OR
        google.cloud.tasks.v2.CloudTasks.UpdateQueue OR
        google.cloud.tasks.v2.CloudTasks.DeleteQueue)'

For example, if an existing queue is disabled by a queue.yaml upload, a "Disabled queue '[QUEUE_NAME]'" message would appear
in the audit log via the com.google.appengine.legacy.queue_updated method.

How to resume a queue disabled by a queue.yaml upload

If you mix queue.yaml with Cloud Tasks queue management methods (#pitfalls), uploading a queue.yaml file might accidentally
disable a queue created through the Cloud Tasks API.

To resume the queue, you can either call ResumeQueue on the queue or add it to queue.yaml and upload. Be aware that if you had
previously set a custom processing rate (/appengine/docs/standard/python/config/queueref#rate) in the queue.yaml configuration for
the queue, ResumeQueue resets the queue to the default rate. This is reflected in the maxDispatchesPerSecond
 (/tasks/docs/reference/rest/v2/projects.locations.queues#RateLimits.FIELDS.max_dispatches_per_second) field of the response to
ResumeQueue.

Warning: Resuming many high-QPS queues at the same time can lead to target overloading. If you are resuming high-QPS queues, follow the 500/50/5 pattern
described in Managing Cloud Tasks Scaling Risks (/tasks/docs/manage-cloud-task-scaling).
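For reference, a custom processing rate in a queue.yaml file looks like the following sketch (the queue name and values are illustrative); after ResumeQueue has reset a queue to the default rate, re-uploading such a file, or calling UpdateQueue with the equivalent maxDispatchesPerSecond, restores the intended rate:

```yaml
queue:
- name: reports-queue     # illustrative queue name
  rate: 20/s              # the custom rate that ResumeQueue resets
  retry_parameters:
    task_retry_limit: 5
```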

Quotas

If you use queue.yaml to create your queues, by default you can create a maximum of 100 queues. Queues created using the
Cloud Tasks API have a default maximum of 1,000 queues. As in other cases, mixing queue.yaml and Cloud Tasks API methods
can produce unexpected results. For example, suppose you create some queues using queue.yaml, and then get a quota
increase to, for example, 2,000. If you subsequently use the Cloud Tasks API to create further queues, you will get
out-of-quota errors. To remedy this, file a request via Edit Quotas from the Quotas page of the Cloud Console
 (https://console.cloud.google.com/iam-admin/quotas).

Additional information about Cloud Tasks queue management methods

Queue configuration and queue startup delay

Changes to queue configuration can require several minutes to take effect. For example, after calling CreateQueue or
UpdateQueue, several minutes might pass before you can successfully call CreateTask on that queue.
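One way to absorb this propagation delay is to retry the first call with exponential backoff. The helper below is a generic sketch (the retry policy values are illustrative); in practice, fn would wrap the client call that fails while the queue is not yet ready:

```python
import random
import time

def call_with_backoff(fn, max_attempts=5, base_delay=1.0):
    """Call fn(), retrying with exponential backoff and jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            # Sleep 1x, 2x, 4x... the base delay, jittered to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.0))
```

For example, call_with_backoff(lambda: client.create_task(parent=queue_path, task=task)) would keep retrying until the newly created queue is ready or the attempts are exhausted.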

Cloud Tasks and the default App Engine queue

The App Engine queue named "default" is given special treatment in the App Engine SDK and in the Cloud Tasks API.

If the default queue does not already exist, it is created in the following situations:

1. When a task is first added to the default queue using the App Engine SDK.

2. When a queue.yaml file that specifies a default queue is uploaded.

3. When CreateQueue or UpdateQueue is called to create the default queue.

To preserve compatibility with App Engine, Cloud Tasks enforces the following restrictions:

1. If a queue named "default" is created, it must be a queue using App Engine tasks.

2. Once created, users cannot delete the default queue.

In the Cloud Tasks API, the following also applies to the default queue:

1. The Cloud Tasks API does not automatically create the default queue or any other queues.

2. Just like any other queue, calling GetQueue on the default queue results in a not found error if the call is made before the
queue is created.

3. Similarly, the default queue does not appear in the output of ListQueues before it is created.

4. The configuration of the default queue can be changed with the UpdateQueue call.

What's next

See the methods available in the RPC Cloud Tasks API (/tasks/docs/reference/rpc) in the reference documents.

See the methods available in the REST Cloud Tasks API (/tasks/docs/reference/rest) in the reference documents.

Read about queue.yaml (/appengine/docs/standard/python/config/queueref) and queue.xml
 (/appengine/docs/standard/java/config/queueref).


Last updated 2021-06-01 UTC.
