
Application Team Infra Guide

Contents
Service Status
IAM (Identity Access Management)
GCP Logging
    Logging Agents
Cloud Monitoring
    Overview of Cloud Monitoring
        Alerting policies and uptime checks
        Charts and dashboards
    Create a dashboard and chart
    Alerts
SSH CONNECTIONS
    Accessing VM Instance
    Snapshots
System Monitoring (CPU/Memory/Disk)
FIREWALL
    Specifications
    Implied rules
Using Firewall Rules Logging
    All firewall logs
    Specific subnets
    Specific VMs
    Connections from a specific country
GCP Load Balancer
    Load balanced traffic does not have the source address of the original client
    Requests are rejected by the load balancer
    Load balancer doesn't connect to backends
    Health check probes can't reach the backends
    Clients cannot connect to the load balancer
    Organizational policy restriction for Shared VPC
    Load balancer doesn't distribute traffic evenly across zones
    Limitations
    Cannot view logs?
    Log entries missing
    Missing metadata for some log entries
CI/CD
CLOUD BUILD
    How builds work
    Starting Builds
    Viewing build results
        Step status and build status
Troubleshooting build errors
    Does your build pass locally?
    Did you look at the build logs?
    Manual builds fail due to user not having access to build logs
    Builds fail due to missing service account permissions
        Error when deploying on Cloud Functions
        Error when deploying on App Engine
        Error when deploying on GKE
        Error when deploying on Cloud Run
    Error when storing images in Container Registry
    Builds fail due to invalid ssh authorization
    Builds fail due to No route to host error
    Build trigger fails due to missing cloudbuild.builds.create permission
    I/O timeout error
    4xx client errors
Deploying containers on Compute Engine works
SSL certificates
    Self-managed and Google-managed SSL certificates
    Troubleshooting SSL certificates
    Troubleshooting Google-managed certificates
        Managed status
        Domain status
        Managed certificate renewal
    Troubleshooting self-managed SSL certificates
        Error: Certificate cannot be parsed
        Error: Missing common name or subject alternative name
        Error: Private key cannot be parsed
        Error: Private keys with passphrases
        Error: Expiring intermediate certificate(s)
        Error: RSA public exponent is too large
Containers
Pods
Deployments
    How your GKE logs get to Cloud Logging
    Find your GKE logs in Cloud Logging
    Make sure you're collecting GKE logs

Service Status
The Cloud Status Dashboard is the first place to check when there is any issue with the GCP
service. The dashboard shows the list of incidents that affect many customers, so one can find if
it’s the same issue they are facing.
The status dashboard marks each incident as either a service disruption or a service outage to
indicate its severity. Smaller but still widespread issues are posted as temporary notices. All times
are US/Pacific, and the status is shown beside each service as follows:
Available (green check mark)
Service information (-)
Service disruption (!)
Service outage (red X mark)

Here is the link: https://status.cloud.google.com


When a relevant Google Cloud product or service reports an issue in the Cloud Status
Dashboard, an outage notice appears in the Cloud Console.
If an outage notice appears in the Cloud Console, click the notice to learn more about the status
of the issue.
If the issue you are experiencing is not listed on the service status dashboard, contact GCP
support for more details.

IAM (Identity Access Management)
With IAM, you manage access control by defining who (identity) has what access (role) for
which resource. In IAM, permission to access a resource isn't granted directly to the end user.
Instead, permissions are grouped into roles, and roles are granted to authenticated principals.
An IAM policy defines and enforces what roles are granted to which principals, and this policy is
attached to a resource.
When an authenticated principal attempts to access a resource, IAM checks the resource's
policy to determine whether the action is permitted.
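
You can inspect the IAM policy that is attached to a project from the command line as a quick check. A minimal sketch, where PROJECT_ID is a placeholder for your project ID:

gcloud projects get-iam-policy PROJECT_ID

The output lists each role binding and the principals (users, groups, and service accounts) that hold it.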

Navigating to IAM in GCP:


Navigate to the IAM & Admin console, select Navigation menu > IAM & Admin > IAM.
Search through the table to find the accounts in the GCP project and examine the roles they are
granted.

Types of Roles
There are three types of roles in Cloud IAM:
Primitive roles, which include the Owner, Editor, and Viewer roles that existed prior to the
introduction of Cloud IAM.

Predefined roles, which provide granular access for a specific service and are managed by
Google Cloud.
Custom roles, which provide granular access according to a user-specified list of permissions.

Role Name: roles/viewer
Permissions: Permissions for read-only actions that do not affect state, such as viewing (but not
modifying) existing resources or data.

Role Name: roles/editor
Permissions: All viewer permissions, plus permissions for actions that modify state, such as
changing existing resources.

Role Name: roles/owner
Permissions: All editor permissions and permissions for the following actions:
 Manage roles and permissions for a project and all resources within the project.
 Set up billing for a project.

Role Name: roles/browser (beta)
Permissions: Read access to browse the hierarchy for a project, including the folder, organization,
and Cloud IAM policy. This role doesn't include permission to view resources in the project.
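
To grant one of these roles from the command line, you can add an IAM policy binding to the project. A minimal sketch, where PROJECT_ID and the user email are placeholders:

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:someone@example.com" --role="roles/viewer"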

What are service accounts?


A service account is a special kind of account used by an application or compute workload, such
as a Compute Engine virtual machine (VM) instance, rather than a person.
Applications use service accounts to make authorized API calls, authorized as either the service
account itself, or as Google Workspace or Cloud Identity users through domain-wide
delegation.
A service account is identified by its email address, which is unique to the account.

Create and enable service accounts for instances
You can create and set up a new service account using IAM. After creating an account, grant the
account one or more IAM roles, and then authorize a virtual machine instance to run as that
service account.

To create a new service account:


 Go to Navigation menu > IAM & Admin, select Service accounts and click on + Create
Service Account.
 Fill in the necessary details:
 Service account name: vm-example
 Click Create and Continue, and then add the following role:
 Role: Storage Object Viewer
 Click Continue and then click Done.
Your console should resemble the following:

Usually, the service account's email is derived from the service account ID, in the format:
[SERVICE-ACCOUNT-NAME]@[PROJECT_ID].iam.gserviceaccount.com

 Storage Object Viewer (roles/storage.objectViewer): Grants access to view objects and
their metadata, excluding ACLs. Can also list the objects in a bucket.
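
The same service account can also be created and granted its role from the command line. A minimal sketch, assuming the service account ID is vm-example and PROJECT_ID is a placeholder:

gcloud iam service-accounts create vm-example --display-name="vm-example"
gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:vm-example@PROJECT_ID.iam.gserviceaccount.com" --role="roles/storage.objectViewer"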

Setting up a new instance to run as a service account:


 Go to the Create an instance page.
 Go to Create an instance
 Specify the VM details.
 In the Identity and API access section, choose the service account (vm-example) you
want to use from the drop-down list.
 Continue with the VM creation process.
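
Alternatively, the VM can be created with the service account already attached using a single gcloud command. A minimal sketch; the instance name, zone, and PROJECT_ID are placeholders:

gcloud compute instances create example-vm --zone=us-central1-a --service-account=vm-example@PROJECT_ID.iam.gserviceaccount.com --scopes=cloud-platform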

GCP Logging

OVERVIEW OF GCP LOGGING


Cloud Logging is a fully managed service that allows you to store, search, analyze, monitor, and alert on
logging data and events from Google Cloud.

Navigation to Logging through console :


Click the hamburger button at the top left of the GCP console to see all GCP services, and select Logging.

Alternatively, search for Logging in the console search bar and click on it.

The Logs Explorer page under Logging is then displayed. It lists all the logs under the project,
sorted by timestamp.

Viewing Logs:
Click on any log entry to see the details of the event that occurred. Expand the dropdown arrow of a
log entry to find more details.

Filtering Logs:
 Filter the logs you need more easily by going to the Legacy Logs Viewer.

 The page will look like the screenshot below.

 Filter the logs by choosing options from the dropdown menus in the filter bar.

 Select a resource from the listed options to filter the logs for that particular resource.

 Alternatively, filter by simply typing the details into the filter search bar.

Accessing Logs:
Access logs that are stored in the logs bucket.

 Click on Logs Storage from the listed options.

 It displays all the log buckets that are available.
 Click View logs in this bucket to view the logs from a bucket, provided you have permission
to view it.

Viewing Logs through Command Line Interface:-


 Activate cloud shell present on the top right of the console.

 You can run the gcloud commands in the cloud shell or Google SDK.

Permissions:-
gcloud logging commands are controlled by Identity and Access Management (IAM) permissions. To use
any of the gcloud logging commands, you must have the serviceusage.services.use permission. You must
also have the IAM role that corresponds to the log's location.

Reading Log entry:-


Run the following command to read the log entry:

gcloud logging read "resource.type=global" --folder=[FOLDER_ID] --limit=1

Here is an example result of the above command:

insertId: 1f22es3frcguaj
logName: folders/[FOLDER_ID]/logs/my-folder-log
receiveTimestamp: '2018-03-19T18:20:19.306598482Z'
resource:
  type: global
textPayload: A folder log entry
timestamp: '2018-03-19T18:20:19.306598482Z'
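
Reading logs for a specific resource works the same way at the project level. For example, the following sketch pulls recent error-level entries from Compute Engine instances (PROJECT_ID is a placeholder):

gcloud logging read 'resource.type="gce_instance" AND severity>=ERROR' --project=PROJECT_ID --limit=5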

Logging Agents:
The Logging agent streams logs from your VM instances and from selected third-party software
packages to Cloud Logging. It is a best practice to run the Logging agent on all your VM
instances.
The VM images for Compute Engine don't include the Logging agent, so you must complete
these steps to install it on those instances. The agent runs under both Linux and Windows.
The agent is already included in Google Kubernetes Engine or App Engine.
For container and system logs, GKE deploys a per-node logging agent that reads container logs,
adds helpful metadata, and then stores them.

15
The logging agent checks for container logs in the following sources: Standard output and
standard error logs from containerized processes kubelet and container runtime logs.

Prerequisites before installing the agent:


 A supported VM instance in a Google Cloud project.
 When installing the Logging agent, a minimum of 250 MiB memory is required, but 1 GiB
is recommended.
 Ensure your VM is running a supported operating system.
 Credentials on the VM instance that authorize communication with Cloud Logging or
Cloud Monitoring.
 Compute Engine VM instances generally have the correct credentials by default.

Installing the agent using the command line:


Linux:
 Open a terminal connection to your VM instance using SSH and ensure you have sudo
access.
 Change to a directory you have write access to, for example your home directory.
Run the command in SSH:
curl -sSO https://dl.google.com/cloudagents/add-logging-agent-repo.sh
sudo bash add-logging-agent-repo.sh --also-install

Windows:
 Connect to your instance using RDP or a similar tool and login to Windows.
 Open a PowerShell terminal with administrator privileges by right-clicking the
PowerShell icon and selecting Run as Administrator.
Run the following PowerShell commands:
(New-Object Net.WebClient).DownloadFile("https://dl.google.com/cloudagents/windows/StackdriverLogging-v1-17.exe", "${env:UserProfile}\StackdriverLogging-v1-17.exe")
& "${env:UserProfile}\StackdriverLogging-v1-17.exe"
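
After installing on Linux, a quick way to confirm the agent is working is to check its service and write a test entry. This sketch assumes the agent's service name is google-fluentd, which is what the standard Logging agent package installs:

sudo service google-fluentd status
logger "Test message from the Logging agent"

The test message should then appear in the Logs Explorer under the VM's syslog logs.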

Cloud Monitoring

By using Cloud Monitoring, you can answer important questions like the following:

 What is the load on the service?
 Is the website accessible and responding correctly?
 Is the service performing well?

Overview of Cloud Monitoring


Cloud Monitoring collects measurements of your service and of the Google Cloud resources that you
use. This section provides an overview of the Cloud Monitoring tools you can use to visualize
and monitor these measurements.

Alerting policies and uptime checks


To be notified when the performance of a service doesn't meet the defined criteria, create an
alerting policy. For example, create an alerting policy that notifies the on-call team when the 90th
percentile of the latency of HTTP 200 responses from the service exceeds 100 ms.
To be notified when a deployed service isn't accessible or when it isn't responding correctly,
configure an uptime check and attach an alerting policy:

 The uptime check periodically probes the service and stores the success and latency of
that probe as metric data.
 The alerting policy monitors the success status of the uptime check and notifies when a
probe fails.

Charts and dashboards


To understand the current load on a service, or to view the performance data of a service for the
past month, use the charts and dashboards tools. Cloud Monitoring populates dashboards
based on the services and resources that your project uses; however, you can also create custom
dashboards to chart data, display indicators, or display text.

Chart and monitor any (numeric) metric data that your Google Cloud project collects, including the
following:

 System metrics generated by Google Cloud services. These metrics provide information
about how the service is operating. For example, Compute Engine reports more than 25
unique metrics for each virtual machine (VM) instance.
 System and application metrics that the Cloud Monitoring agent gathers. These
metrics provide additional information about system resources and applications running
on Compute Engine instances. Optionally, configure the agent to collect metrics from
third-party plugins such as Apache or Nginx web servers, or MongoDB or PostgreSQL
databases.
 Custom metrics, which your service writes by using the Cloud Monitoring API or by using a
library like OpenCensus.
 Logs-based metrics, which collect numeric information about the logs written to Cloud
Logging. Google-defined logs-based metrics include counts of errors that your service detects
and the total number of log entries received by your Google Cloud project. You can also define
your own logs-based metrics; for example, create a metric that counts the number of 404 Not Found
errors for an application deployed to App Engine (see the sketch below).
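
As a hedged example, such a counter metric can also be created from the command line; the metric name and filter below are illustrative and assume an App Engine application:

gcloud logging metrics create app_404_count --description="Count of 404 responses" --log-filter='resource.type="gae_app" AND httpRequest.status=404'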

GCP Stackdriver monitoring


Google Stackdriver is a monitoring service that provides IT teams with performance data about
applications and virtual machines running on the Google Cloud Platform. It is natively
integrated with Google Cloud Platform and hosted on Google infrastructure.
Google Stackdriver performs monitoring, logging and diagnostics to help businesses ensure
optimal performance and availability. The service gathers performance metrics and metadata
from multiple cloud accounts and allows IT teams to view that data through custom dashboards,
charts, and reports.

Stackdriver Monitoring measures the health of cloud resources and applications by
providing visibility into metrics such as CPU usage, disk I/O, memory, network traffic and
uptime. It is based on collectd, an open source daemon that collects system and application
performance metrics. Users can receive customizable alerts when Stackdriver Monitoring
discovers performance issues. It is used to monitor Google Compute Engine.
 Default
o CPU usage,
o Disk I/O,
o Network traffic and
o Uptime
 Custom
o Memory utilization and others

Using Cloud console


Navigation:
Make sure you are under the right project. You can also check and change the project from the
nav bar of GCP home dashboard.

Go to navigation menu -> OPERATIONS section -> Monitoring

Create a dashboard and chart
Cloud Monitoring can display the metrics collected in the charts and dashboards. In the Cloud
Console, select Monitoring or click the following button:

 In the left menu select Dashboards, and then Create Dashboard.


 Name the dashboard, for example: Example Dashboard.
 Click the widget in the Chart library that you want to add to the dashboard.
Add the chart

 Choose a chart type from the Chart library as shown in the image below, for example Line or Stacked area.
 To quickly configure a widget, use Basic mode.
 Name the chart title CPU Load.
 Select the Resource type. The Resource type menu lists every monitored resource for
which there is metric data. Example: VM instance.
 The Metric menu determines the selections available for the Resource type. Example: set the Metric to
CPU load (1m).
 Refresh the tab to view the graph.
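
Dashboards can also be created programmatically. A minimal sketch, assuming you have prepared a file dashboard.json containing a dashboard definition (layout and widgets) in the format accepted by the Cloud Monitoring dashboards API:

gcloud monitoring dashboards create --config-from-file=dashboard.json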

Alerts
Alerting gives timely awareness to problems in your cloud applications so you can resolve the
problems quickly.
In Cloud Monitoring, an alerting policy describes the circumstances under which you want to be
alerted and how you want to be notified.
Alerting policies that are used to track metric data collected by Cloud Monitoring are called
metric-based alerting policies.
IAM roles required:
To create an alerting policy, your IAM role name for the Google Cloud project must be one of
the following:

 Monitoring Editor
 Monitoring Admin
 Project Owner

Creating Alerts:

 Click on hamburger button on the top left of the console.


 Select Monitoring -> Alerting under the Operations section from the listed options.

 Click Create Policy to see the Create alerting policy page

 Click Add condition and complete the dialog.

A condition describes a monitored resource, a metric for that resource, and when the
condition is met. An alerting policy must have at least 1 condition; however, alerting policies
can contain up to 6 conditions. If an alerting policy has exactly 1 condition and that
condition is met, then an incident is created. If an alerting policy has multiple conditions,
then you specify how these conditions are combined.

 Click Next to advance to the notifications section.


 To be informed when an incident is created, add a notification channel to your alerting
policy. You can add multiple notification channels.

To add a notification channel, click Notification channels. In the dialog, select one or more
notification channels from the menu and then click OK.

 (Optional) If you want to be notified when an incident is opened and closed, then select
Notify on incident closure. By default, notifications are sent only when an incident is

opened.
 For example, assume that you have an alert with a metric threshold condition that monitors
a virtual machine (VM). If you turn down the VM while an incident is open, then by default
Monitoring waits for seven days before it closes the incident.
 Click Next to advance to the documentation section.
 Click Name and enter a policy name. This name is included in notifications and it is displayed
in the Policies page.

 Click Save.
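
For automation, alerting policies can also be created from the command line using the alpha gcloud surface. A hedged sketch, assuming you have written the policy definition (display name, conditions, combiner, and notification channels) into a file named policy.json in the Cloud Monitoring AlertPolicy format:

gcloud alpha monitoring policies create --policy-from-file=policy.json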

SSH CONNECTIONS
OVERVIEW OF SSH Connections:
Compute Engine uses key-based SSH authentication to establish connections to Linux virtual machine
(VM) instances. By default, local users with passwords aren't configured on Linux VMs.

Before you can connect to a VM, several configurations must be performed. If you use the Google Cloud
Console or the gcloud command-line tool to connect to your VMs, Compute Engine performs these
configurations on your behalf.

Connecting to a VM instance:-
You can connect to a VM instance through the GCP console as well as through the CLI.

Steps:

 Click on the hamburger button on the top left of the console.


 Find Compute Engine and select VM instances under it.

 All the instances available under the project will be displayed.

 In the list of virtual machine instances, click SSH in the row of the instance that you want to
connect to.

 After you connect, use the terminal to run commands on your Linux instance. When you have
finished, disconnect from the instance by using the exit command.
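
The same connection can be made from Cloud Shell or the Google Cloud SDK. A minimal sketch, where VM_NAME and ZONE are placeholders:

gcloud compute ssh VM_NAME --zone=ZONE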

Troubleshooting SSH:
This section describes how to investigate errors encountered while connecting to a VM instance
through SSH.

Through Logging (GCP Service):-


Check directly which type of event occurred from the logs of the particular instance in the GCP
Logging service. For detailed information on the error that occurred while connecting, refer to the
GCP Logging section of this guide.

Through SSH Troubleshooting tool:-


Another way is to use the SSH Troubleshooting tool.

SSH Troubleshooting tool:-


It performs the following tests to check why the SSH connection failed and what the cause behind
it is.

 User permissions tests: Checks whether you have the required IAM permissions to connect to
the VM using SSH

 Network connectivity tests: Checks if the VM is connected to the network

 VM instance status tests: Checks the VM's CPU status to see if the VM is running

 VPC settings tests: Checks SSH default port and IAP port forwarding

Permissions:-
Permissions required for this task:

 To perform this task, you must have the following permissions:

o networkmanagement.connectivitytests.create on the VM

o networkmanagement.connectivitytests.delete on the VM

o networkmanagement.connectivitytests.get on the VM

 If you are missing any of the preceding permissions, the troubleshooting tool skips network
connectivity tests.

Running Troubleshooting tool:-


 Run the troubleshooting tool by using the gcloud beta compute ssh command:
 Run the gcloud commands in the cloud shell or in Google SDK.
 Activate cloud shell present on the top right of the console.

gcloud beta compute ssh VM_NAME --troubleshoot

 Replace VM_NAME with the name of the VM that you can't connect to.


 The tool prompts you to provide permission to perform the troubleshooting tests.
 After running the troubleshooting tool, do the following:

1. Review the test results from the gcloud console output to understand why the VM's SSH
connection isn't working.

2. Resolve the SSH connection issue by performing the remediation steps provided by the tool.

Accessing VM Instance:
 Click on hamburger button on the top left of the console.
 Select VM instances under Compute Engine -> VM instances from the listed options.

 It displays all the list of VM instances present under the project.

Snapshots:
Snapshots incrementally back up data from your persistent disks. After you create a snapshot to
capture the current state of the disk, you can use it to restore that data to a new disk. Compute
Engine stores multiple copies of each snapshot across multiple locations with automatic
checksums to ensure the integrity of your data.
You can create snapshots from disks even while they are attached to running virtual machine
(VM) instances. The lifecycle of a snapshot created from a disk attached to a running VM
instance is independent of the lifecycle of the VM instance.
Checking Snapshots:

 Click on hamburger button on the top left of the console.


 Select Snapshots under Compute Engine -> Snapshots from the listed options.

 It displays all the snapshots that have been created under the project.

System Monitoring (CPU/Memory/Disk)
Disk, memory, and CPU utilization of VMs are captured on the Stackdriver dashboards for each
instance, using the specific metric defined while creating the dashboard.

Stackdriver Dashboard with metric (cpu/memory/disk):


The metric field identifies the measurements to be collected from a monitored resource. It
includes a description of what is being measured and how the measurements are interpreted.
Metric is a short form of metric type.
The resource type field specifies from which resource the metric data is captured. The resource
type is sometimes called the monitored resource type or the resource.
Click on Create Dashboard and choose the following:
Chart type: line, stacked area chart, bar chart, etc.
Chart title: a descriptive name
Resource type: VM instance, Cloud Function, etc.
Metric: CPU usage, disk, memory, etc.
Filter: for example state=used or state=free; choose the data view: mean, min, or max.
Condition: above or below a numeric threshold value.

Identify disk utilization in VM instance:
For Linux instance:
Prerequisites: a Linux-based system or terminal, and the instance or host name.
1. ssh <instance name>
2. Enter the command df -h (df stands for disk free, and -h displays sizes in human-readable KB/MB/GB)
The df command shows you the amount of space taken up by different drives, with the columns
Filesystem, Size, Used, Avail, Use%, and Mounted on.
Mounted on / mount point: the directory where the file system is mounted.
Example:
student-04-64e0e90044eb@gcelab:~$ df -h
Filesystem Size Used Avail Use% Mounted on
udev 1.8G 0 1.8G 0% /dev
tmpfs 370M 5.0M 365M 2% /run
/dev/sda1 9.7G 1.4G 7.8G 15% /
tmpfs 1.9G 0 1.9G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 1.9G 0 1.9G 0% /sys/fs/cgroup
/dev/sda15 124M 5.7M 119M 5% /boot/efi

For Windows instance:

Use a remote desktop protocol (RDP) client to connect to the Windows instance. From a Windows local
machine, use Remote Desktop Connection; other operating systems might require third-party RDP
software.

Enter the external IP address and click the Connect button, then provide the username and password.
This will bring you to the Windows Desktop.
To identify disk usage, open Settings (Windows Start > Settings) and then:
1. Select System
2. Select Storage
3. Select the drive you wish to see details for
4. The storage usage, broken down by data type, will be displayed.

Memory Utilization in VM instance:
To see the amount of free and used memory on your system, run the command free. The free
command displays:
1. Total amount of free and used physical memory
2. Total amount of swap memory in the system
3. Buffers and caches used by the kernel
From your terminal window, issue the command free.
Example:
student-04-64e0e90044eb@gcelab:~$ free
total used free shared buff/cache available
Mem: 3782928 118812 3539888 5032 124228 3482908
Swap: 0 0 0

In this example, total memory is 3782928 KB, 118812 KB is used, and 3539888 KB is free. This
does not mean that applications can only request 3539888 KB of memory; if you look at the
usage figures, you can see that 124228 KB is used for buffers and cache.
So, if applications request memory, the Linux OS will free up the buffers and cache to yield
memory for the new application requests.
Example: free -m (-m displays memory in megabytes)

For Windows Instance:


1. Click on the Windows Start menu and type in System Information.
2. A list of search results pops up, among which is the System Information utility, click on
it.
3. Scroll down to Installed Physical Memory (RAM) and see how much memory is installed
on your computer.

CPU Utilization in VM instance:
To view the CPU% open a terminal window and run the command: top
The system should respond by displaying a list of all the processes that are currently running. It
will show the result as users, tasks, CPU load, and memory usage.
This list can change frequently as background tasks start and complete. One helpful option is to
launch top with the -i switch:
Example: top -i (-i hides all the idle processes)
To quit the top function, press the letter q on your keyboard. Some other useful commands
while top is running include:
M – sort task list by memory usage
P – sort task list by processor usage
N – sort task list by process ID
T – sort task list by run time

Here,
PID: Shows task’s unique process id.
PR: Stands for priority of the task.
SHR: Represents the amount of shared memory used by a task.
VIRT: Total virtual memory used by the task.
USER: User name of owner of task.
%CPU: Represents the CPU usage.
TIME+: CPU Time, the same as ‘TIME’, but reflecting more granularity through hundredths of a
second.
NI: Represents a Nice Value of task. A Negative nice value implies higher priority, and positive
Nice value means lower priority.
%MEM: Shows the Memory usage of task.
After identifying the process ID with the highest CPU utilization, run the command below to get
further details of the process:
ps aux | grep "<process id>"
a: prints the running processes from all users.
u: shows the user or owner column in the output.
x: prints the processes that have not been executed from the terminal.

For Windows Instance


Windows task manager gives the details of CPU Usage. Here are the steps to follow:
1. Press the Windows button, type task manager, and press Enter.
2. In the window that appears, click the Performance tab.
3. On the Performance tab, a list of hardware devices is displayed on the left side.

FIREWALL

In computing, a firewall is a network security system that monitors and controls incoming and


outgoing network traffic based on predetermined security rules. A firewall typically establishes
a barrier between a trusted network and an untrusted network, such as the Internet.

VPC firewall rules let you allow or deny connections to or from your virtual machine (VM)
instances based on a configuration that you specify. Enabled VPC firewall rules are always
enforced, protecting your instances regardless of their configuration and operating system,
even if they have not started up.

Every VPC network functions as a distributed firewall. While firewall rules are defined at the
network level, connections are allowed or denied on a per-instance basis. You can think of the
VPC firewall rules as existing not only between your instances and other networks, but also
between individual instances within the same network.

Firewall rules in Google Cloud
When you create a VPC firewall rule, you specify a VPC network and a set of components that
define what the rule does. The components enable you to target certain types of traffic, based
on the traffic's protocol, destination ports, sources, and destinations. 

In addition to firewall rules that you create, Google Cloud has other rules that can affect
incoming (ingress) or outgoing (egress) connections:

 Google Cloud doesn't allow certain IP protocols, such as egress traffic on TCP port 25
within a VPC network. For more information, see always blocked traffic.
 Google Cloud always allows communication between a VM instance and its
corresponding metadata server at 169.254.169.254. For more information, see always
allowed traffic.
 Every network has two implied firewall rules that permit outgoing connections and
block incoming connections. Firewall rules that you create can override these implied
rules.

Specifications
VPC firewall rules have the following characteristics:

 Each firewall rule applies to incoming (ingress) or outgoing (egress) connection, not
both. For more information, see direction of connection.
 Firewall rules support IPv4 connections. IPv6 connections are also supported in VPC
networks that have IPv6 enabled. When specifying a source for an ingress rule or a
destination for an egress rule by address, you can specify IPv4 or IPv6 addresses or
blocks in CIDR notation.
 Each firewall rule can contain either IPv4 or IPv6 ranges, but not both.
 Each firewall rule's action is either allow or deny. The rule applies to connections as long
as it is enforced. For example, you can disable a rule for troubleshooting purposes.
 When you create a firewall rule, you must select a VPC network. While the rule is
enforced at the instance level, its configuration is associated with a VPC network. This
means that you cannot share firewall rules among VPC networks, including networks
connected by VPC Network Peering or by using Cloud VPN tunnels.

Implied rules

Every VPC network has two implied IPv4 firewall rules. If IPv6 is enabled in a VPC network, the
network also has two implied IPv6 firewall rules. These rules are not shown in the Cloud
Console.

Implied IPv4 firewall rules are present in all VPC networks, regardless of how the networks are
created, and whether they are auto mode or custom mode VPC networks. The default network
has the same implied rules.

 Implied IPv4 allow egress rule. An egress rule whose action is allow, destination
is 0.0.0.0/0, and priority is the lowest possible (65535) lets any instance send traffic to
any destination, except for traffic blocked by Google Cloud. A higher priority firewall rule
may restrict outbound access. Internet access is allowed if no other firewall rules deny
outbound traffic and if the instance has an external IP address or uses a Cloud NAT
instance. For more information, see Internet access requirements.
 Implied IPv4 deny ingress rule. An ingress rule whose action is deny, source is 0.0.0.0/0,
and priority is the lowest possible (65535) protects all instances by blocking incoming
connections to them. A higher priority rule might allow incoming access. The default
network includes some additional rules that override this one, allowing certain types of
incoming connections.

If IPv6 is enabled, the VPC network also has these two implied rules:

 Implied IPv6 allow egress rule. An egress rule whose action is allow, destination is ::/0,
and priority is the lowest possible (65535) lets any instance send traffic to any
destination, except for traffic blocked by Google Cloud. A higher priority firewall rule
may restrict outbound access. Internet access is allowed if no other firewall rules deny
outbound traffic and if the instance has an external IP address.
 Implied IPv6 deny ingress rule. An ingress rule whose action is deny, source is ::/0, and
priority is the lowest possible (65535) protects all instances by blocking incoming
connections to them. A higher priority rule might allow incoming access.

The implied rules cannot be removed, but they have the lowest possible priorities. You can
create rules that override them as long as your rules have higher priorities (priority
numbers less than 65535). Because deny rules take precedence over allow rules of the same
priority, an ingress allow rule with a priority of 65535 never takes effect.
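
For example, an implied rule can be overridden by creating a higher-priority rule from the command line. A minimal sketch that allows SSH from a specific range; the rule name, network name, and source range are placeholders:

gcloud compute firewall-rules create allow-ssh-example --network=NETWORK_NAME --direction=INGRESS --action=ALLOW --rules=tcp:22 --source-ranges=203.0.113.0/24 --priority=1000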

Using Firewall Rules Logging
Firewall Rules Logging allows you to audit, verify, and analyze the effects of your firewall rules.
For example, you can determine if a firewall rule designed to deny traffic is functioning as
intended. Logging is also useful if you need to determine how many connections are affected by
a given firewall rule.
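
Note that Firewall Rules Logging must be enabled on a rule before any log entries are produced. A minimal sketch that enables it on an existing rule, where RULE_NAME is a placeholder:

gcloud compute firewall-rules update RULE_NAME --enable-logging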

Using console

Use the Logs section of the Cloud Console to view firewall rule logs (refer to the GCP Logging section).

The following filters demonstrate how you can search for specific firewall events.

All firewall logs

1. Go to the Logs page in the Google Cloud Console. 


2. Select Subnetwork in the first pull-down menu.
3. Select compute.googleapis.com/firewall in the second pull-down menu.
4. Click OK.

Alternatively:

1. Go to the Logs page in the Google Cloud Console. 


2. On the right side of the Filter by label or text search field, click the down arrow and
select Convert to advanced filter.
3. Paste the following into the field. Replace PROJECT_ID with your project ID.

resource.type="gce_subnetwork"

logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"

Specific subnets

1. Go to the Logs page in the Google Cloud Console. 


2. In the first pull-down menu, move the cursor to Subnetwork, then move it to the right
to open up the individual subnet selection menu.
3. In the second pull-down menu, select compute.googleapis.com/firewall.
4. Click OK.

Alternatively:

1. Go to the Logs page in the Google Cloud Console. 


2. On the right side of the Filter by label or text search field, click the down arrow and
select Convert to advanced filter.
3. Paste the following into the field. Replace PROJECT_ID with your project ID
and SUBNET_NAME with your subnetwork.

resource.type="gce_subnetwork"

logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"

resource.labels.subnetwork_name="SUBNET_NAME"

Specific VMs

1. Go to the Logs page in the Google Cloud Console. 


2. On the right side of the Filter by label or text search field, click the down arrow and
select Convert to advanced filter.
3. Paste the following into the field. Replace PROJECT_ID with your project ID
and INSTANCE_NAME with your VM.

resource.type="gce_subnetwork"

logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"

jsonPayload.instance.vm_name="INSTANCE_NAME"

Connections from a specific country

1. Go to the Logs page in the Google Cloud Console. 


2. On the right side of the Filter by label or text search field, click the down arrow and
select Convert to advanced filter.
3. Paste the following into the field. Replace PROJECT_ID with your project ID
and COUNTRY with the ISO 3166-1 alpha-3 code.

resource.type="gce_subnetwork"

logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"

jsonPayload.remote_location.country=COUNTRY
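
The same firewall log queries can also be run from the command line instead of the Console. A minimal sketch that reads recent firewall log entries for a project (PROJECT_ID is a placeholder):

gcloud logging read 'resource.type="gce_subnetwork" AND logName="projects/PROJECT_ID/logs/compute.googleapis.com%2Ffirewall"' --limit=10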

GCP Load Balancer
Cloud Load Balancing overview:
A load balancer distributes user traffic across multiple instances of your applications. By spreading the
load, load balancing reduces the risk that your applications experience performance issues.

Fig :- Simple overview of load balancing 

Cloud Load Balancing is a fully distributed, software-defined managed service. It isn't hardware-based,
so you don't need to manage a physical load balancing infrastructure.

Google Cloud offers the following load balancing features:

 Single anycast IP address. With Cloud Load Balancing, a single anycast IP address is the frontend
for all of your backend instances in regions around the world.
 Software-defined load balancing. Cloud Load Balancing is a fully distributed, software-defined,
managed service for all your traffic.
 Seamless autoscaling. Cloud Load Balancing can scale as your users and traffic grow, including
easily handling huge, unexpected, and instantaneous spikes by diverting traffic to other regions
in the world that can take traffic.

 Layer 4 and Layer 7 load balancing. Use Layer 4-based load balancing to direct traffic based on
data from network and transport layer protocols such as TCP, UDP, ESP, or ICMP. Use Layer 7-
based load balancing to add request routing decisions based on attributes, such as the HTTP
header and the uniform resource identifier.

 External and internal load balancing. You can use external load balancing when your users
reach your applications from the internet and internal load balancing when your clients are
inside of Google Cloud.
 Global and regional load balancing. Distribute your load-balanced resources in single or
multiple regions, to terminate connections close to your users, and to meet your high availability
requirements.
 Advanced feature support. Cloud Load Balancing supports features such as IPv6 global load
balancing, WebSockets, user-defined request headers, and protocol forwarding for private VIPs.

Cloud Load Balancing products:


The following diagram summarizes the available Cloud Load Balancing products.

Load Balancer Types:


The following summarizes the load balancing products available for each combination of features:

 Internal TCP/UDP load balancer: internal IP address, regional, Premium Tier only, pass-through, TCP or UDP traffic.
 Internal HTTP(S) load balancer: internal IP address, regional, Premium Tier only, proxy, HTTP or HTTPS traffic.
 Global external HTTP(S) load balancer (Preview): external IP address, global, Premium Tier only, proxy, HTTP or HTTPS traffic.
 Global external HTTP(S) load balancer (classic): external IP address, global in Premium Tier (effectively regional in Standard Tier), Premium or Standard Tier, proxy, HTTP or HTTPS traffic.
 SSL proxy load balancer: external IP address, global in Premium Tier (effectively regional in Standard Tier), Premium or Standard Tier, proxy, SSL traffic.
 TCP proxy load balancer: external IP address, global in Premium Tier (effectively regional in Standard Tier), Premium or Standard Tier, proxy, TCP traffic.
 External TCP/UDP Network load balancer: external IP address, regional, Premium or Standard Tier, pass-through, TCP, UDP, ESP, or ICMP (Preview) traffic.
 Regional external HTTP(S) load balancer (Preview): external IP address, regional, Standard Tier only, proxy, HTTP or HTTPS traffic.

The following provides more specific information about each load balancer:

 Global external HTTP(S) load balancer (Preview): HTTP or HTTPS traffic; global; Premium Tier only; load balancing scheme EXTERNAL_MANAGED; frontend ports HTTP on 80 or 8080, HTTPS on 443; proxy.
 Global external HTTP(S) load balancer (classic): HTTP or HTTPS traffic; global in Premium Tier, regional in Standard Tier; Premium or Standard Tier; load balancing scheme EXTERNAL; frontend ports HTTP on 80 or 8080, HTTPS on 443; proxy.
 Regional external HTTP(S) load balancer (Preview): HTTP or HTTPS traffic; regional; Standard Tier only; load balancing scheme EXTERNAL_MANAGED; frontend ports HTTP on 80 or 8080, HTTPS on 443; proxy.
 Internal HTTP(S) load balancer: HTTP or HTTPS traffic; regional; Premium Tier only; load balancing scheme INTERNAL_MANAGED; frontend ports HTTP on 80 or 8080, HTTPS on 443; proxy.
 SSL proxy load balancer: TCP traffic with SSL offload; global in Premium Tier, regional in Standard Tier; Premium or Standard Tier; load balancing scheme EXTERNAL; frontend ports 25, 43, 110, 143, 195, 443, 465, 587, 700, 993, 995, 1883, 3389, 5222, 5432, 5671, 5672, 5900, 5901, 6379, 8085, 8099, 9092, 9200, and 9300; proxy.
 TCP proxy load balancer: TCP traffic without SSL offload; global in Premium Tier, regional in Standard Tier; Premium or Standard Tier; load balancing scheme EXTERNAL; same frontend port list as the SSL proxy load balancer; proxy.
 External TCP/UDP Network load balancer: TCP, UDP, ESP, or ICMP (Preview) traffic; regional; Premium or Standard Tier; load balancing scheme EXTERNAL; any frontend port; pass-through.
 Internal TCP/UDP load balancer: TCP or UDP traffic; regional backends, regional frontends (global access supported); Premium Tier only; load balancing scheme INTERNAL; any frontend port; pass-through.

Choosing Load Balancer:


To determine which Cloud Load Balancing product to use, you must first determine what traffic type
your load balancers must handle and whether you need global or regional load balancing, external or
internal load balancing, and proxy or pass-through load balancing.

Troubleshooting Load Balancer:
External HTTP(S) Load Balancer:
External HTTP(S) Load Balancing is a proxy-based Layer 7 load balancer that enables you to run and scale
your services behind a single external IP address. External HTTP(S) Load Balancing distributes HTTP and
HTTPS traffic to backends hosted on a variety of Google Cloud platforms (such as Compute Engine,
Google Kubernetes Engine (GKE), Cloud Storage, and so on), as well as external backends connected
over the internet or via hybrid connectivity.

The types of issues that may arise are as follows:

 Setup issues when backends have incompatible balancing modes

 General connectivity issues

 Issues with HTTP/2 connections to the backends

 External backend and internet NEG issues

 Serverless NEG issues

Refer to the link below for detailed troubleshooting:

https://cloud.google.com/load-balancing/docs/https/troubleshooting-ext-https-lbs

Internal HTTP(S) Load Balancer:

Google Cloud Internal HTTP(S) Load Balancing is a proxy-based, regional Layer 7 load balancer that
enables you to run and scale your services behind an internal IP address.

Internal HTTP(S) Load Balancing distributes HTTP and HTTPS traffic to backends hosted on Compute
Engine and Google Kubernetes Engine (GKE). The load balancer is accessible only in the chosen region of
your Virtual Private Cloud (VPC) network on an internal IP address.

Internal HTTP(S) Load Balancing is a managed service based on the open source Envoy proxy. This
enables rich traffic control capabilities based on HTTP(S) parameters. After the load balancer has been
configured, it automatically allocates Envoy proxies to meet your traffic needs.

The types of issues that may arise are as follows:

 Backends have incompatible balancing modes


 Load balanced traffic does not have the source address of the original client
 Requests are rejected by the load balancer
 Load balancer doesn't connect to backends
 Health check probes can't reach the backends
 Clients cannot connect to the load balancer
 Organizational policy restriction for Shared VPC
 Load balancer doesn't distribute traffic evenly across zones
 Limitations

Refer to the link below for detailed troubleshooting:

https://cloud.google.com/load-balancing/docs/l7-internal/troubleshooting-l7-ilb

Internal TCP/UDP Load Balancer:

Google Cloud Internal TCP/UDP Load Balancing is a regional load balancer that is built on
the Andromeda network virtualization stack.

Internal TCP/UDP Load Balancing distributes traffic among internal virtual machine (VM) instances in the
same region in a Virtual Private Cloud (VPC) network. It enables you to run and scale your services
behind an internal IP address that is accessible only to systems in the same VPC network or systems
connected to your VPC network.

An Internal TCP/UDP Load Balancing service has a frontend (the forwarding rule) and a backend (the
backend service). You can use either instance groups or GCE_VM_IP zonal NEGs as backends on the
backend service.

The types of issues that may arise are as follows:

 Load balancer setup issues

 General connectivity issues

 Backend failover issues

 Load balancer as next-hop issues

Refer to the link below for detailed troubleshooting:

https://cloud.google.com/load-balancing/docs/internal/troubleshooting-ilb

Cannot view logs?

If you cannot view firewall rules logs in the Logs section of the Cloud Console, check the
following:

Possible cause: Insufficient permissions

Ask the project owner to make sure your IAM principal at least has the Logs Viewer role for the
project. Refer to permissions for more information.
Possible cause: Subnetwork logs might be excluded from Logging
In the Cloud Console, navigate to Logging > Logs ingestion, and verify that either GCE
Subnetwork is not excluded or, if it is partially excluded, that the exclusion filter does not apply
to firewall logs.
Possible cause: Legacy networks not supported
You cannot use firewall rules logging in a legacy network. Only VPC networks are supported.
Possible cause: Make sure you're looking in the correct project
Because firewall rule logs are stored with the project that contains the network, it's important
to make sure you're looking for logs in the correct project. With Shared VPC, VM instances are
created in service projects, but they use a Shared VPC Network located in the host project. For
Shared VPC scenarios, firewall rules logs are stored in that host project.

If Shared VPC is involved, you'll need appropriate permissions to the host project in order to
view firewall rules logs. Even though the VM instances themselves are located in service
projects, firewall rules logs for them are located in the host project.

Log entries missing

Possible cause: Connections might not match the firewall rule you expect

Verify that the firewall rule you expect is in the list of applicable firewall rules for an instance.
Use the Cloud Console to view details for the relevant instance, then click the View details
button in the Network interfaces section on its VM instance details page. Inspect applicable
firewall rules in the Firewall rules and routes details section of its Network interface details
page.

Review the firewall rules overview to make sure you have created your firewall rules correctly.

You can use tcpdump on the VM to determine whether the connections it sends or receives have
addresses, ports, and protocols that would match the firewall rule you expect; see the example below.
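
As a sketch, the following captures TCP traffic on port 22 so you can compare the observed addresses and ports against your SSH firewall rule; the interface name eth0 is an assumption and may differ on your VM:

sudo tcpdump -n -i eth0 tcp port 22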

Possible cause: A higher priority rule with firewall rules logging disabled might apply

Firewall rules are evaluated according to their priorities. From the perspective of a VM instance,
only one firewall rule applies to the traffic.

A rule that you think would be the highest priority applicable rule might not actually be the
highest priority applicable rule. A higher priority rule that does not have logging enabled might
apply instead.

To troubleshoot, you can temporarily enable logging for all possible firewall rules applicable to
a VM. Use the Cloud Console to view details for the relevant VM, then click the View details
button in the Network interfaces section on its VM instance details page. Inspect
applicable firewall rules in the Firewall rules and routes details section of its Network interface
details page, and identify your custom rules in that list. Temporarily enable logging for all of
those custom firewall rules.

With logging enabled, you can identify the applicable rule. Once identified, be sure to disable
logging for all rules that do not actually need it.

Missing metadata for some log entries

Possible cause: Configuration propagation delay

If you update a firewall rule that has firewall logging enabled, it might take a few minutes
before Google Cloud finishes propagating the changes necessary to log traffic that matches the
rule's updated components.

CI/CD
Google Cloud Platform is one of the leading cloud providers in the public cloud market. It
provides a host of managed services, and if your workload runs exclusively on Google Cloud, it makes
sense to use the managed CI/CD tools that Google Cloud provides.

A typical Continuous Integration & Deployment setup on Google Cloud Platform looks like the following.

Google Cloud CI/CD

1. Developer checks in the source code to a Version Control system such as GitHub
2. GitHub triggers a post-commit hook to Cloud Build.
3. Cloud Build builds the container image and pushes to Container Registry.
4. Cloud Build then notifies Cloud Run to redeploy
5. Cloud Run pulls the latest image from the Container Registry and runs it.

CLOUD BUILD
Cloud Build is a service that executes builds on Google Cloud Platform's infrastructure. It can import source code from a variety of repositories or cloud storage spaces, execute a build to desired specifications, and produce artifacts such as Docker containers or Java archives.

How builds work

The following steps describe, in general, the lifecycle of a Cloud Build build:
1. Prepare your application code and any needed assets.
2. Create a build config file in YAML or JSON format, which contains instructions for Cloud
Build.
3. Submit the build to Cloud Build.
4. Cloud Build executes your build based on the build config you provided.
5. If applicable, any built artifacts are pushed to Artifact Registry.
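A minimal build config sketch, assuming a Dockerfile at the repository root and an image named my-app (both are placeholders, not names from this guide):

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', 'gcr.io/$PROJECT_ID/my-app:$SHORT_SHA', '.']
images:
- 'gcr.io/$PROJECT_ID/my-app:$SHORT_SHA'

Note that $SHORT_SHA is populated automatically for triggered builds; for manual submissions you can substitute a fixed tag instead.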

Starting Builds
Manually start builds in Cloud Build using the gcloud command-line tool or the Cloud Build API,
or use Cloud Build's build triggers feature to create an automated continuous
integration/continuous delivery (CI/CD) workflow that starts new builds in response to code
changes.

You can also integrate build triggers with many code repositories, including Cloud Source Repositories, GitHub, and Bitbucket.
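For example, to start a build manually from a directory that contains a build config (the file name cloudbuild.yaml below is an assumption):

gcloud builds submit --config=cloudbuild.yaml .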

Viewing build results
View your build results using the gcloud tool, the Cloud Build API, or the Build History page in the Cloud Build section of the Cloud Console, which displays details and logs for every build that Cloud Build executes.

To view build logs, principals require one of the following IAM roles in addition to the Cloud
Build IAM permissions:

 If build logs are in the default Cloud Storage bucket, grant the Project > Viewer role.
 If build logs are in a user-specified Cloud Storage bucket, grant the Storage Object Viewer role.

Using Cloud console


Navigation:

Make sure you are in the right project. You can also check and change the project from the navigation bar of the GCP home dashboard.

Go to navigation menu -> CI/CD section -> Cloud Build

In the Cloud Console, the Build History menu can show you information about a build's status
(such as success or failure), source, results, create time, images, and more.
 To view the Build History menu, open the Build History page in the Google Cloud
Console:

 The Build history page is displayed, which shows a list of your recent builds.
 To filter builds by region, use the Region drop-down menu at the top of the page to
choose the region you would like to filter by. You can only filter regionalized builds
associated with Cloud Functions deployments.

 To filter builds, use the Filter builds text box at the top of the page or enter a query manually.

 To view additional columns such as Trigger description and Artifacts, use the column selector view_column.
 To view details about a specific build, go to the Build History page and click a specific build. The Build details page is displayed, with the Build Summary for your build. The Build Summary includes:

o Build Log, the log of your build.
o Execution Details, the details of your build including your environment variables and substitutions.
o Build Artifacts, the artifacts of your build such as container images, build logs, or binaries.

 You can view the build log or execution details for a specific build step by selecting that step in the Steps table on the left.

Step status and build status


After a build completes, Cloud Build provides an overall status for the build and a status for
each individual build step.

The following summarizes the build status and step statuses when a build or a step succeeds, times out, or fails:

Build succeeds (build status: SUCCESS)
 All steps are marked SUCCESS.

Build fails (build status: FAILURE)
 The failed step is marked FAILED.
 Steps that succeeded before the termination of the build are marked SUCCESS.
 Steps in the middle of execution are marked CANCELLED.
 Steps that did not start to execute are marked QUEUED.

Build is cancelled by the user (build status: CANCELLED)
 Steps that succeeded before the cancellation of the build are marked SUCCESS.
 Steps in the middle of execution are marked CANCELLED.
 Steps that did not start to execute are marked QUEUED.

Build times out (build status: TIMEOUT)
 Steps that succeeded before the build timed out are marked SUCCESS.
 Steps in the middle of execution are marked CANCELLED.
 Steps that did not start to execute are marked QUEUED.

Step times out (build status: FAILED)
 The timed-out step is marked TIMEOUT.
 Steps that succeeded prior to the timed-out step are marked SUCCESS.
 Steps in the middle of execution are marked CANCELLED.
 Steps that did not start to execute are marked QUEUED.

To view per-step and build status, run the gcloud builds describe command:
gcloud builds describe [BUILD_ID] where [BUILD_ID] is the ID of the build
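For example, the following sketch lists recent failed builds and prints only the overall status of one build (BUILD_ID is a placeholder):

gcloud builds list --filter="status=FAILURE" --limit=5

gcloud builds describe BUILD_ID --format="value(status)"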

Troubleshooting build errors


This section provides troubleshooting strategies as well as solutions for some common error messages that you might see when running a build.
Does your build pass locally?
When troubleshooting Cloud Build errors, your first step should always be to confirm that you
can build locally. If your build doesn't work locally, the root cause of the problem is not coming
from Cloud Build. You need to diagnose and fix the issue locally first.
Did you look at the build logs?
Use Logging or Cloud Storage build logs to get more information about the build error. Logs
written to stdout or stderr appear automatically in the Cloud Console.
Manual builds fail due to user not having access to build logs
You see the following error when trying to run a build manually:
Access denied. [EMAIL_ADDRESS] does not have storage.objects.get access to the Google Cloud Storage object.

You see this error because Cloud Build requires that users running manual builds and using the
default Cloud Storage logs bucket have the Project Viewer IAM role in addition to the Cloud
Build Editor role. To address this error, you can do one of the following:

 Use the default logs bucket, and grant the Project Viewer role and the Cloud Build Editor
role to the user running the build. For instructions on granting this permission, see
Configure access to Cloud Build resources.
 Create your own Cloud Storage bucket to store logs. For instructions see Storing build
logs in a user-created bucket.
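For the first option, granting both roles can be done with commands like the following (PROJECT_ID and USER_EMAIL are placeholders):

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_EMAIL" --role="roles/viewer"

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_EMAIL" --role="roles/cloudbuild.builds.editor"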
Builds fail due to missing service account permissions
Cloud Build uses a special service account to execute builds on your behalf. If the Cloud Build
service account does not have the necessary permission to perform a task, you'll see the
following error:
Missing necessary permission iam.serviceAccounts.actAs for [USER] on the service account
[CLOUD_BUILD_SERVICE_ACCOUNT]@PROJECT.iam.gserviceaccount.com

To address this error, grant the required permission to the service account. Use the information
in the following pages to determine the permission to grant to the Cloud Build service account:

 Cloud Build service account
 Understanding IAM roles
 Granting permissions to Cloud Build service account
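The general pattern is to grant the missing role to the Cloud Build service account at the project level; the specific role depends on what the build deploys, as described in the subsections that follow. A sketch (PROJECT_ID, PROJECT_NUMBER, and ROLE_NAME are placeholders):

gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" --role="ROLE_NAME"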
Build failures due to missing permissions for service account commonly occur when trying to
deploy using Cloud Build.
Error when deploying on Cloud Functions
You see the following error when trying to deploy on Cloud Functions:
Missing necessary permission iam.serviceAccounts.actAs for [USER] on the service account
[CLOUD_BUILD_SERVICE_ACCOUNT]@PROJECT.iam.gserviceaccount.com

To address this error, grant the Cloud Functions Developer role to the Cloud Build service
account.
Error when deploying on App Engine
You see the following error when trying to deploy on App Engine:
Missing necessary permission iam.serviceAccounts.actAs for [USER] on the service account
[CLOUD_BUILD_SERVICE_ACCOUNT]@PROJECT.iam.gserviceaccount.com

To address this error, grant the App Engine Admin role to the Cloud Build service account.
Error when deploying on GKE
You see the following error when trying to deploy on GKE:
Missing necessary permission iam.serviceAccounts.actAs for [USER] on the service account
[CLOUD_BUILD_SERVICE_ACCOUNT]@PROJECT.iam.gserviceaccount.com

To address this error, grant the GKE Developer role to the Cloud Build service account.
Error when deploying on Cloud Run
You see the following error when trying to deploy on Cloud Run:
Missing necessary permission iam.serviceAccounts.actAs for [USER] on the service account
[CLOUD_BUILD_SERVICE_ACCOUNT]@PROJECT.iam.gserviceaccount.com

You see this error because the Cloud Build service account does not have the IAM permissions
required to deploy on Cloud Run. For information on granting the necessary permissions, see
Deploying on Cloud Run.

Error when storing images in Container Registry


You see the following error when your build is trying to store built images to Container Registry:
[EMAIL_ADDRESS] does not have storage.buckets.create access to project [PROJECT_NAME]

You see this error because the Cloud Build service account does not have the Storage Admin
role that is needed to store container images in Container Registry.
Builds fail due to invalid ssh authorization
You see the following error when running a build:
Could not parse ssh: [default]: invalid empty ssh-agent socket, make sure SSH_AUTH_SOCK is set

This error indicates a problem with SSH authorization. A common example is an SSH authorization error that occurs when accessing private GitHub repositories with Cloud Build. For instructions on setting up SSH for GitHub, see Accessing private GitHub repositories.
Builds fail due to No route to host error
You see the following or similar error when running a build in a private pool:
Unable to connect to the server: dial tcp 192.168.10.XX:<port>: connect: no route to host

Cloud Build runs its builders on virtual machines in a Google-managed project, using Docker containers. The Docker bridge interface (and consequently the containers connected to this interface) is assigned the IP range 192.168.10.0/24, which makes communication with external hosts in that same subnet impossible. When allocating IP ranges for resources in your project(s) during private pool configuration, we recommend selecting a range outside of 192.168.10.0/24.
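For example, when reserving the peering range used by a private pool, you might choose a block that does not overlap 192.168.10.0/24 (the range name, network, and CIDR below are illustrative assumptions):

gcloud compute addresses create cloudbuild-peering-range --global --purpose=VPC_PEERING --addresses=192.168.0.0 --prefix-length=24 --network=default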
Build trigger fails due to missing cloudbuild.builds.create permission
You see the following error when running a build trigger:
Failed to trigger build: Permission 'cloudbuild.builds.create' denied on resource 'projects/xxxxxxxx' (or it may
not exist)

Build triggers use the Cloud Build service account to create a build. The error above indicates
that the Cloud Build service account is missing the cloudbuild.builds.create IAM permission,
which is required for the service account to run a build trigger. You can resolve this error by
granting the Cloud Build Service Account IAM role to
[PROJECT_NUMBER]@cloudbuild.gserviceaccount.com. For instructions on granting this role,
see Configuring access for Cloud Build service account.
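A sketch of granting that role (PROJECT_ID and PROJECT_NUMBER are placeholders):

gcloud projects add-iam-policy-binding PROJECT_ID --member="serviceAccount:PROJECT_NUMBER@cloudbuild.gserviceaccount.com" --role="roles/cloudbuild.builds.builder"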

I/O timeout error


You see the following error when running a build:
Timeout - last error: dial tcp IP_ADDRESS: i/o timeout

This error commonly occurs when your build attempts to access resources in a private network. Builds run by Cloud Build can access resources on the public internet, such as a hosted repository or registry; however, they cannot reach resources in a private network.

4xx client errors
This group of errors indicates that the build request was not successful, typically because of an issue with the request itself. Some examples of 4xx client errors are:

 Error: 404: Requested entity was not found
 Error: 404: Trigger not found
 Error: 400: Failed Precondition
When you see a 4xx client error, look at your build logs to see if it contains more information
about the reason for the error. Some common causes for client errors include:

 The source location you specified does not have anything new to commit and the
working tree is clean. In this case, check your source code location and try building
again.
 Your repository does not contain a build config file. If this is the case, upload a build
config file to your repository and run the build again.
 You've specified an incorrect trigger ID.
 You have recently added a new repository after installing the GitHub app, and Cloud Build does not have permission to access the new repo. If this is the case, connect your new repository to Cloud Build.

How deploying containers on Compute Engine works

The common methods of deploying software onto a Compute Engine VM instance include:

 Deploying software on VM boot using a startup script or cloud-init.


 Creating a custom boot disk image with software pre-installed.
Both of the above methods combine the tasks of configuring the app and setting up the
operating system environment.

A VM instance with apps deployed directly to the operating system

Alternatively, you can deploy software in a container onto a VM instance or to a managed instance group (MIG). A container carries both the application software and the required libraries, and is isolated from the apps and libraries installed on the host OS. A container can be easily moved between deployment environments without dealing with conflicting library versions between the container and its host OS.

A VM instance with apps deployed in a container

The following process describes how you deploy a container on Compute Engine:

1. You bundle your app and required libraries into a Docker image and publish the image to Artifact Registry, Container Registry, or a third-party registry such as Docker Hub.
2. You specify a Docker image name and the docker run configuration when creating a VM instance.
3. Compute Engine executes the following tasks after you make a request to create a VM instance:
o Compute Engine creates a VM instance that uses a Google-provided Container-Optimized OS image. This image includes a Docker runtime and additional software that is responsible for starting your container.
o Compute Engine stores your container settings in instance metadata under the gce-container-declaration metadata key.
o When the VM starts, the Container-Optimized OS image uses the docker run command configuration that is stored in the instance's metadata, pulls the container image from the repository, and starts the container.

Steps to create a VM instance running a container
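As a minimal gcloud sketch of this flow (the VM name, zone, and image path below are placeholders, not values from this guide):

gcloud compute instances create-with-container my-container-vm --zone=us-central1-a --container-image=gcr.io/PROJECT_ID/my-app:latest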

SSL certificates

Transport Layer Security (TLS) is an encryption protocol used in SSL certificates to protect
network communications.
Google Cloud uses SSL certificates to provide privacy and security from a client to a load
balancer. To achieve this, the load balancer must have an SSL certificate and the certificate's
corresponding private key. Communication between the client and the load balancer remains
private—illegible to any third party that doesn't have this private key.

There are two types of SSL certificate:

 Self-managed SSL certificates
 Google-managed SSL certificates

Troubleshooting SSL certificates

Troubleshooting Google-managed certificates


For Google-managed certificates, there are two types of status:

 Managed status
 Domain status

Managed status
To check the certificate status, run the following command:
gcloud compute ssl-certificates describe CERTIFICATE_NAME --global --format="get(name,managed.status)"

Values for managed status are as follows:


Managed status: PROVISIONING
The Google-managed certificate has been created, and Google Cloud is working with the Certificate Authority to sign it. Provisioning a Google-managed certificate might take up to 60 minutes.
If the certificate remains in the PROVISIONING state, make sure that the correct certificate is associated with the target proxy. You can check this by running the gcloud compute target-https-proxies describe or gcloud compute target-ssl-proxies describe command.

Managed status: ACTIVE
The Google-managed SSL certificate is obtained from the Certificate Authority. It might take an additional 30 minutes to be available for use by a load balancer.

Managed status: PROVISIONING_FAILED
You might briefly see PROVISIONING_FAILED even when your certificate is actually ACTIVE. Recheck the status.
If the status remains PROVISIONING_FAILED, the Google-managed certificate has been created, but the Certificate Authority can't sign it. Ensure that you completed all steps in Using Google-managed SSL certificates.
Google Cloud retries provisioning until it succeeds or the status changes to PROVISIONING_FAILED_PERMANENTLY.

Managed status: PROVISIONING_FAILED_PERMANENTLY
The Google-managed certificate is created, but the Certificate Authority can't sign it because of a DNS or load balancer configuration issue. In this state, Google Cloud doesn't retry provisioning.
Create a replacement Google-managed SSL certificate, and make sure that the replacement is associated with your load balancer's target proxy. Verify or complete all steps in Using Google-managed SSL certificates. Afterwards, you can delete the certificate that permanently failed provisioning.

Managed status: RENEWAL_FAILED
The Google-managed certificate renewal failed because of an issue with the load balancer or DNS configuration. If any of the domains or subdomains in a managed certificate aren't pointing to the load balancer's IP address by using an A/AAAA record, the renewal process fails. The existing certificate continues to serve, but expires shortly. Check your configuration.
If the status remains RENEWAL_FAILED, provision a new certificate, switch to using the new certificate, and delete the old certificate.

For more information about certificate renewal, see Google-managed SSL certificate renewal.

Domain status
To check the domain status, run the following command:
gcloud compute ssl-certificates describe CERTIFICATE_NAME --global --format="get(managed.domainStatus)"

Values for domain status are described in this table.


Domain status: PROVISIONING
The Google-managed certificate is created for the domain. Google Cloud is working with the Certificate Authority to sign the certificate. Provisioning a Google-managed certificate might take up to 60 minutes.

Domain status: ACTIVE
The Google-managed SSL certificate is obtained from the Certificate Authority. Provisioning for this domain is complete. It might take an additional 30 minutes for the certificate to be available for use by a load balancer.

Domain status: FAILED_NOT_VISIBLE
Certificate provisioning hasn't completed for the domain. Any of the following might be the issue:
 The domain's DNS record doesn't resolve to the IP address of the Google Cloud load balancer. To resolve this issue, update the DNS A and AAAA records to point to your load balancer's IP address.
 DNS must not resolve to any IP address other than the load balancer's. For example, if an A record resolves to the correct load balancer but the AAAA record resolves to something else, the domain status is FAILED_NOT_VISIBLE.
 Newly updated DNS A and AAAA records can take a significant amount of time to be fully propagated. Sometimes propagation across the internet takes up to 72 hours worldwide, although it typically takes a few hours. The domain status continues to be FAILED_NOT_VISIBLE until propagation is complete.
 The SSL certificate isn't attached to the load balancer's target proxy. To resolve this issue, update your load balancer configuration.
 The frontend ports for the global forwarding rule do not include port 443 for an SSL proxy load balancer. This can be resolved by adding a new forwarding rule with port 443.
 If the managed status is PROVISIONING, Google Cloud continues to retry provisioning, even if the domain status is FAILED_NOT_VISIBLE.

Domain status: FAILED_CAA_CHECKING
Certificate provisioning failed because of a configuration issue with your domain's CAA record. Ensure that you have followed the correct procedure.

Domain status: FAILED_CAA_FORBIDDEN
Certificate provisioning failed because your domain's CAA record doesn't specify a CA that Google Cloud needs to use. Ensure that you have followed the correct procedure.

Domain status: FAILED_RATE_LIMITED
Certificate provisioning failed because a Certificate Authority has rate-limited certificate signing requests. You can provision a new certificate, switch to using the new certificate, and delete the old certificate, or you can contact Google Cloud Support.

Managed certificate renewal
If any of the domains or subdomains in a managed certificate aren't pointing to the load
balancer's IP address, the renewal process fails. To avoid renewal failure, make sure that all
your domains and subdomains are pointing to the load balancer's IP address.

Troubleshooting self-managed SSL certificates

Error: Certificate cannot be parsed

Google Cloud requires certificates in PEM format. If the certificate is PEM formatted, check that it can be parsed. You can validate your certificate using the following OpenSSL command, replacing CERTIFICATE_FILE with the path to your certificate file:
openssl x509 -in CERTIFICATE_FILE -text -noout
If OpenSSL is unable to parse your certificate:

 Contact your CA for help.
 Create a new private key and certificate.

Error: Missing common name or subject alternative name

Google Cloud requires that your certificate have either a common name (CN) or subject
alternative name (SAN) attribute. See Create a CSR for additional information.
When both attributes are absent, Google Cloud displays an error message like the following
when you try to create a self-managed certificate:
ERROR: (gcloud.compute.ssl-certificates.create) Could not fetch resource:
- The SSL certificate is missing a Common Name(CN) or Subject Alternative
Name(SAN).

Error: Private key cannot be parsed

Google Cloud requires PEM-formatted private keys that meet the private key criteria.
You can validate your private key using the following OpenSSL command, replacing
PRIVATE_KEY_FILE with the path to your private key:
openssl rsa -in PRIVATE_KEY_FILE -check
The following responses indicate a problem with your private key:

 unable to load Private Key
 Expecting: ANY PRIVATE KEY
 RSA key error: n does not equal p q
 RSA key error: d e not congruent to 1
 RSA key error: dmp1 not congruent to d
 RSA key error: dmq1 not congruent to d
 RSA key error: iqmp not inverse of q
To fix the problem, you must create a new private key and certificate.

Error: Private keys with passphrases

If OpenSSL prompts for a passphrase, you'll need to remove the passphrase from your private
key before you can use it with Google Cloud. You can use the following OpenSSL command:
openssl rsa -in PRIVATE_KEY_FILE -out REPLACEMENT_PRIVATE_KEY_FILE
Replace the placeholders with valid values:

 PRIVATE_KEY_FILE: The path to your private key that's protected with a passphrase
 REPLACEMENT_PRIVATE_KEY_FILE: A file path where you'd like to save a copy of your
plaintext private key

Error: Expiring intermediate certificate(s)

If an intermediate certificate expires before the server (leaf) certificate, this might indicate that
your CA isn't following best practices.
When an intermediate certificate expires, your leaf certificate used in Google Cloud might
become invalid. This depends on the SSL client, as follows:

 Some SSL clients only look at the expire time of the leaf certificate and ignore expired
intermediate certificates.
 Some SSL clients treat a chain with any expired intermediate certificate(s) as invalid and
display a warning.
To resolve this issue:

 Wait for the CA to switch to a new intermediate certificate.
 Request a new certificate from the CA.
 Re-upload the new certificate with the new keys.
Your CA might also allow cross-signing for intermediate certificates. Check with your CA to
confirm.

Error: RSA public exponent is too large

The following error message appears when the RSA public exponent is larger than 65537. Make
sure to use 65537, as specified in RFC 4871.
ERROR: (gcloud.compute.ssl-certificates.create) Could not fetch resource:
- The RSA public exponent is too large.

GKE Overview

Google Kubernetes Engine (GKE) provides a managed environment for deploying, managing,
and scaling your containerized applications using Google infrastructure. The GKE environment
consists of multiple machines (specifically, Compute Engine instances) grouped together to
form a cluster.

Kubernetes concepts:

Node:

A node is the smallest unit of computing hardware in Kubernetes. It is a representation of a single machine in your cluster. In most production systems, a node will likely be either a physical machine in a datacenter or a virtual machine hosted on Google Cloud Platform.

The Cluster:

Although working with individual nodes can be useful, it's not the Kubernetes way. In general, you should think about the cluster as a whole, instead of worrying about the state of individual nodes.
In Kubernetes, nodes pool together their resources to form a more powerful machine. When you deploy programs onto the cluster, it intelligently handles distributing work to the individual nodes for you. If any nodes are added or removed, the cluster will shift work around as necessary. It shouldn't matter to the program, or the programmer, which individual machines are actually running the code.

Containers

Programs running on Kubernetes are packaged as Linux containers. Containers are a widely
accepted standard, so there are already many pre-built images that can be deployed on
Kubernetes.
Containerization allows you to create self-contained Linux execution environments. Any
program and all its dependencies can be bundled up into a single file and then shared on the
internet. Anyone can download the container and deploy it on their infrastructure with very
little setup required. Creating a container can be done programmatically, allowing powerful CI
and CD pipelines to be formed.
Multiple programs can be added into a single container, but you should limit yourself to one
process per container if at all possible. It’s better to have many small containers than one large
one. If each container has a tight focus, updates are easier to deploy and issues are easier to
diagnose.

Pods

Unlike other systems you may have used in the past, Kubernetes doesn’t run containers
directly; instead it wraps one or more containers into a higher-level structure called a
pod. Any containers in the same pod will share the same resources and local network.
Containers can easily communicate with other containers in the same pod as though
they were on the same machine while maintaining a degree of isolation from others.
Pods are used as the unit of replication in Kubernetes. If your application becomes too
popular and a single pod instance can’t carry the load, Kubernetes can be configured to
deploy new replicas of your pod to the cluster as necessary. Even when not under heavy
load, it is standard to have multiple copies of a pod running at any time in a production
system to allow load balancing and failure resistance.
Pods can hold multiple containers, but you should limit yourself when possible. Because
pods are scaled up and down as a unit, all containers in a pod must scale together,
regardless of their individual needs. This leads to wasted resources and an expensive
bill. To resolve this, pods should remain as small as possible, typically holding only a
main process and its tightly-coupled helper containers (these helper containers are
typically referred to as “side-cars”).

Deployments

Although pods are the basic unit of computation in Kubernetes, they are not typically directly
launched on a cluster. Instead, pods are usually managed by one more layer of abstraction: the
deployment.
A deployment’s primary purpose is to declare how many replicas of a pod should be running at
a time. When a deployment is added to the cluster, it will automatically spin up the requested
number of pods, and then monitor them. If a pod dies, the deployment will automatically re-
create it.
Using a deployment, you don’t have to deal with pods manually. You can just declare the
desired state of the system, and it will be managed for you automatically.
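As a minimal sketch, a Deployment manifest for a hypothetical app (the names, image path, and replica count below are illustrative assumptions, not values from this guide) might look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: gcr.io/PROJECT_ID/my-app:latest
        ports:
        - containerPort: 8080

You could apply a manifest like this to a GKE cluster with kubectl apply -f deployment.yaml.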

Cluster orchestration with GKE

GKE clusters are powered by the Kubernetes open source cluster management system.
Kubernetes provides the mechanisms through which you interact with your cluster. You use
Kubernetes commands and resources to deploy and manage your applications, perform
administration tasks, set policies, and monitor the health of your deployed workloads.

Kubernetes draws on the same design principles that run popular Google services and provides
the same benefits: automatic management, monitoring and liveness probes for application
containers, automatic scaling, rolling updates, and more. When you run your applications on a
cluster, you're using technology based on Google's 10+ years of experience running production
workloads in containers.

How your GKE logs get to Cloud Logging

Any containerized code that is running in a GKE cluster, either your code or pre-packaged software, typically generates a variety of logs. These logs are usually written to standard output (stdout) and standard error (stderr), and include error, informational, and debugging messages.
When you set up a new GKE cluster in Google Cloud, system and app logs are enabled by
default. A dedicated agent is automatically deployed and managed on the GKE node to collect
logs, add helpful metadata about the container, pod and cluster and then send the logs to
Cloud Logging. Both system logs and your app logs are then ingested and stored in Cloud
Logging, with no additional configuration needed.
(Refer cloud logging for navigation)

Find your GKE logs in Cloud Logging

In order to find these logs in the Cloud Logging service, all you need to do is filter your logs by
GKE-related Kubernetes resources, by clicking this link, or by running the following query in the
log viewer:
resource.type=("k8s_container" OR "container" OR "k8s_cluster" OR "gke_cluster" OR
"gke_nodepool" OR "k8s_node")

This query surfaces logs that are related to Kubernetes resources in GKE: clusters, nodes, pods
and containers. Alternatively, you can access any of your workloads in your GKE cluster and
click on the container logs links in your deployment, pod or container details; this also brings
you directly to your logs in the Cloud Logging console.
If no log entries are returned by your query, it's time to look for reasons your logs aren't being generated or collected into Cloud Logging (refer to the GCP Logging topic: unable to view logs issues).
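The same kind of filter can also be run from the command line; for example (CLUSTER_NAME and PROJECT_ID are placeholders):

gcloud logging read 'resource.type="k8s_container" AND resource.labels.cluster_name="CLUSTER_NAME"' --project=PROJECT_ID --limit=10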

Make sure you’re collecting GKE logs
As mentioned above, when you create a GKE cluster, system and app logs are set to be
collected by default. You can update how you configure log collection either when you create
the cluster or by updating the cluster configuration.
If you don’t see any of your logs in Cloud Logging, check whether the GKE integration with
Cloud Logging is properly enabled. Follow these instructions to check the status of your
cluster’s configuration.

If the GKE integration is not enabled, you can enable log collection for the cluster by editing the
cluster in the Google Cloud Console, or by using the gcloud container clusters update command
line.
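A sketch of enabling both system and workload log collection from the command line (CLUSTER_NAME and ZONE are placeholders; the --logging flag assumes a reasonably recent gcloud release):

gcloud container clusters update CLUSTER_NAME --zone=ZONE --logging=SYSTEM,WORKLOAD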
If you have already enabled the GKE integration with Cloud Logging and Cloud Monitoring and still don't see any of your GKE logs, check whether your logs have been excluded. Logging exclusions may have been added to exclude logs from ingestion into Cloud Logging, either for all GKE logs or for specific ones. Adjusting these exclusions allows you to ingest the GKE logs that you need into Cloud Logging.
You can configure a GKE cluster to only capture system logs. If you have already enabled the
GKE integration with Cloud Logging and Cloud Monitoring and only see system logs in Cloud
Logging, check whether you have selected this option. To check whether application log
collection is enabled or disabled, and to then enable app log collection, follow these
instructions.
