Best Practises On AKS Engineer Daily Work Session


Best Practices for AKS engineer’s daily work


Azure Containers Team - EMEA

Speaker: Mutaz Nassar, Support Escalation Engineer - ACT EMEA
Contents organizer: Randa Al-Qudah, Support Escalation Engineer - ACT EMEA

What’s inside

The purpose of this session is to share some best practices and tools that can be used in our day-to-day job to increase efficiency and productivity.

AKS Engineer Best Practices - ACT Training


Agenda

Organize
• Organize your references

Troubleshoot
• Troubleshooting process
• Internal tools

Search
• How to search in different tools

Swarm/Escalate
• How to engage resources

Documentation
• Case documentation

Customer Communications
• FQR and LQR
• Do’s and Don’ts
• Email communications tips
• Remote session setup
• Out of Office auto reply


How to organize your stuff

• Outlook folders
• Bookmarks
• OneNote
• Notepad++
• Follow up Excel
• Custom search engine



Outlook

• Create Outlook folders and rules; main folders and rules:


o AKS Talk (Azure Kubernetes Service Discussion: aks-talk@service.microsoft.com)
o ACI Talk (aci-talk@service.microsoft.com)
o ACR Talk (acr-talk@service.microsoft.com)
o ARO Talk (OpenShift on Azure Service Discussion: aro-talk@service.microsoft.com)
o Azure Containers Team (ACTGlobal@microsoft.com & ACTEMEA@microsoft.com)
o Emerging issues (Azure Containers Support Notify: ACTNotify@microsoft.com)
o Azure Outages (Azure Incident Notification (AzureInternal): wainc@microsoft.com & Windows Azure Partner Notifications: aznot@microsoft.com)
• Use OLHelper for your cases.
• Use OOF email.
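
The folder-per-list idea above can be sketched as a simple lookup table. This is only an illustration of the rule logic, not Outlook's rules engine or API: the function name `folder_for` and the fallback folder "Inbox" are assumptions, while the addresses and folder names come from the list above.

```python
# Map each distribution list address (lower-cased) to its Outlook folder.
FOLDER_BY_LIST = {
    "aks-talk@service.microsoft.com": "AKS Talk",
    "aci-talk@service.microsoft.com": "ACI Talk",
    "acr-talk@service.microsoft.com": "ACR Talk",
    "aro-talk@service.microsoft.com": "ARO Talk",
    "actglobal@microsoft.com": "Azure Containers Team",
    "actemea@microsoft.com": "Azure Containers Team",
    "actnotify@microsoft.com": "Emerging issues",
    "wainc@microsoft.com": "Azure Outages",
    "aznot@microsoft.com": "Azure Outages",
}


def folder_for(recipients):
    """Return the folder of the first recipient that is a known list.

    Hypothetical helper: falls back to "Inbox" when no list matches.
    """
    for address in recipients:
        folder = FOLDER_BY_LIST.get(address.strip().lower())
        if folder is not None:
            return folder
    return "Inbox"


print(folder_for(["someone@contoso.com", "AKS-Talk@service.microsoft.com"]))
```

Keeping one rule per list means a new list only needs one new entry.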

ACT EMEA Training - Mentoring of ACT Mentors

Bookmarks

Bookmark your links with keywords that you can remember.

Create folders based on categories (example: AKS, ACI, ACR, ARO, Brownbags, etc.).

Use the "Bookmark Manager" to search for links (Ctrl + Shift + O).



OneNote

• Create your cases FQR, LQR, attachments, summary and tips
• Create your own knowledge base of cases

Notepad++

• Save commonly used commands
• Kusto queries for different scenarios [optional]

Excel

Create a follow-up Excel sheet of your cases to organize your work and labor [optional].

Custom Search Engine

Set up custom browser search engines for a faster search (DfM/IcM/ASC/Wiki/AppLens).
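
As a rough sketch of how a custom search engine entry works: Chromium-based browsers store a URL template in which `%s` is replaced by the typed query. The shortcuts and the `example.invalid` URLs below are hypothetical placeholders, not the real addresses of any internal tool.

```python
from urllib.parse import quote_plus

# Hypothetical shortcut -> URL template pairs; "%s" marks where the query goes.
SEARCH_ENGINES = {
    "wiki": "https://wiki.example.invalid/search?q=%s",
    "icm": "https://icm.example.invalid/incidents/%s",
}


def expand(shortcut, query):
    """Build the URL the browser would open for `shortcut query`."""
    template = SEARCH_ENGINES[shortcut]
    return template.replace("%s", quote_plus(query))


print(expand("wiki", "node not ready"))
```

Typing the shortcut followed by the query in the address bar then opens the tool's search page directly.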

Troubleshooting

• Troubleshooting process
• How and where to search
• Information about outages
• How to swarm and escalate
• Outlook Distribution Lists and Teams channels



Troubleshooting Process

ASC
• Troubleshooters
• Instant Answers
• AppLens
• Azure Service Insights

Kusto
• Script for AKS
• ACI Kusto Queries
• ACI on Atlas Kusto Queries
• ACI using Kusto helper
• ACR Kusto Queries
• ARO Kusto Queries

Jarvis
• Geneva Actions
• API SLA Dashboard
• VM Perf Dashboard
• Control Plane Resource Usage Dashboard
• AKS Resource Health
• DGrep

Internal Wiki
• ACT Wiki
• AKS PG Wiki
• PG TSG
• Other teams Wiki

Public Internet
• MS documents
• GitHub
• Stack Overflow
• kubernetes.io

Engage resources
• AVA
• Triage calls
• Outlook DLs
• Technical Advisor
• EEE/PG escalation

Note: You can use DfM quick launch toolbar to generate Kusto queries and open the troubleshooting tools easily.

How and Where to search

• AKS Wiki, AKS PG Wiki, PG TSG, Other teams Wiki
• ASC instant answers
• Search in Teams
• Search in DfM
• Search in Outlook
• Search in IcM portal
• Emerging and Known Issues, PG work items, and AKS Issues on GitHub
• Public Internet

*Use error codes to tune search results



How to swarm and escalate

• AVA
• Triage calls
• Collaboration with other teams
• Technical Advisors
• EEE/PG Escalation

Outlook DLs, Teams channels and groups

Outlook DLs
• aks-talk@service.microsoft.com
• aci-talk@service.microsoft.com
• acr-talk@service.microsoft.com
• aro-talk@service.microsoft.com
• aznot@microsoft.com
• wainc@microsoft.com

Teams Channels
• AVA ACI – General
• AVA AKS – General
• AVA ACR – General
• AVA AKS - Windows
• AVA ARO – General
• Emerging Issues – AKS
• Emerging Issues – ACI
• AVA ICM Approval
• acr-sup
• ACI Talk
• AKS Gang
• Azure Red Hat OpenShift

Teams Groups
• EMEA ACT Daily Triage
• EMEA ARO Daily Triage
• Daily ARO Case Triage
• Containers EMEA Jarvis JIT Access Requests

Outages resources

Windows Azure Partner Notifications – DL: aznot@microsoft.com

Azure Incident Notification (AzureInternal) – DL: wainc@microsoft.com

Azure Status public website: https://status.azure.com

Iridias - internal: https://iridias.microsoft.com

IcM portal – Outages tab

Outages tab and Resource Health tab in ASC

Documentation

• Initial case notes
• Ongoing case notes



Case Summary – Initial Notes

When you’re assigned a case, you should summarize it and add the summary to the case notes.

[Summary]
=================
Issue Definition:
• Do not copy/paste the customer verbatim here.
• This should be a detailed definition of the problem or ask from the customer, in your own words. If there is a specific error, include it here verbatim using copy/paste, to avoid typos.
• This should also include details relevant to the issue. What type of application is it? What does it do? When did the issue first appear? Add a snippet of the call stack leading to the error if relevant, etc.

Environment:
Impacted resource:
Region:
Resource ID:

Assessments / Troubleshooting / Cause:
• Summary of steps taken to investigate/troubleshoot the issue to date.
• When the root cause is determined/verified, add that here as well.
• Be sure to include any relevant Bug #’s or ICM IDs here.

Resolution/Workaround/Fix:
• Summarize what the final solution or resolution was for the customer’s issue.
• If the case was closed before providing a resolution, workaround or fix, provide details on the last known status and instructions for when the customer can reengage.
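
One way to keep the initial notes consistent is to fill the template from a small script. The sketch below is an assumption about how that could look, not an existing team tool: the `InitialNotes` class, its field names, the "(pending)" filler and the sample values are all hypothetical, while the section headings follow the template above.

```python
from dataclasses import dataclass


@dataclass
class InitialNotes:
    """Hypothetical container for the fields of the initial case notes."""
    issue_definition: str
    impacted_resource: str
    region: str
    resource_id: str
    troubleshooting: str = ""
    resolution: str = ""

    def render(self) -> str:
        # Sections that are still empty are marked as pending.
        return "\n".join([
            "[Summary]",
            "=================",
            "Issue Definition:",
            self.issue_definition,
            "",
            "Environment:",
            f"Impacted resource: {self.impacted_resource}",
            f"Region: {self.region}",
            f"Resource ID: {self.resource_id}",
            "",
            "Assessments / Troubleshooting / Cause:",
            self.troubleshooting or "(pending)",
            "",
            "Resolution/Workaround/Fix:",
            self.resolution or "(pending)",
        ])


# Sample values are placeholders, not real customer data.
notes = InitialNotes(
    issue_definition="Pods stay in Pending after a node pool scale-up.",
    impacted_resource="AKS cluster",
    region="westeurope",
    resource_id="<resource-id>",
)
text = notes.render()
print(text)
```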



Case Summary – Ongoing Notes

Summarize the active case and add the summary to the case notes.

[Case status update]
=================
Current status:
• Which update triggered the action plan change
• Include any information about changes in the environment

Action taken and output:
• Include any additional logs, findings, etc. gathered since the previous notes.

Next action plan:
• Which actions are to be taken
• Clarify whether it’s pending on our side or the customer’s



Customer Communications

• FQR and LQR
• Do’s and Don’ts
• Email communications tips
• Remote session setup
• Out of Office auto reply



FQR and LQR

FQR (First Quality Response)
• It is not just "meeting IR" without giving the customer a quality engagement.
• It is not robotic, scripted or using templated responses.
• It is focused on being prepared, personalizing the response, and progressing the case.
• It makes the customer feel that they had a quality and meaningful initial response.
• It reflects the customer's sense of urgency.

LQR (Last Quality Response)
• You have built a relationship with the customer as you have resolved their technical issue. Now it’s time to finish the case with a strong summary; this gives you an opportunity to leave a little something for the customer.
• Set the expectation that a closing email will be sent that captures the issue and resolution. This is something they can use if the issue happens again.
• Every case needs to have a Last Quality Response; this is our chance to make a great last impression.

Do’s & Don’ts in customers’ communications – Part 1

FQR DO’s
• Read the notes provided by the customer.
• Check the case attachments, if any.
• Paraphrase your understanding of the issue.
• Research a bit before writing your first e-mail.
• Ask meaningful questions if the data you reviewed is not enough to determine next steps.
• Address the customer directly and in a respectful manner.
• Say what is going to happen next: when and how you will follow up.
• Provide your schedule availability and politely ask for the customer’s availability when needed.
• Use a spell checker.

FQR DON’Ts
• Do not copy and paste the customer’s verbatim to describe the issue.
• Do not refer to the customer in the third person when writing to them ("The customer informed that…").
• Do not scope the case to limit the help Microsoft can provide.
• Do not ask questions that can be answered by looking at the customer’s site information using our tools.
• Do not add a bunch of links and articles without any purpose or explanation.
• Do not offer to move the case to another engineer due to a time zone difference.
• Do not state right away that you are not the best resource to work on it because of a time zone difference.
• Don’t miss the SLA to achieve FQR.

Do’s & Don’ts in customers’ communications – Part 2

FQR should include
• Issue verification.
• Business Impact.
• Preliminary analysis result, based on initial comprehension of the issue.
• Depending on the time/information availability:
o Suggested resolution to the issue (FCR - First Contact Resolution).
o Request for additional information needed in order to proceed working the case towards a resolution.
• Next Action.
• Signature.

LQR should include
• Customer name.
• Thank the customer for using Microsoft Products.
• Case summary, including:
o Description of the problem (Symptom).
o Root cause (if available).
o Resolution.
• Your and your manager’s contact information.
• Reminder to fill out the survey.
• Invitation to reopen if the issue reoccurs or is not resolved.
• Signature.

Do’s & Don’ts in customers’ communications – Part 3

Our Emails should NOT include


• Internal Links (Wikis, work items, Jarvis, AppLens, ASC link, any internal link).
• Logs from Kusto.
• Screenshots from internal tools (ASC, VM performance dashboard, AppLens, ASI, etc).
• Screenshots with personal information (Subscription ID, Public IP, etc).
• Internal terminology (Product Group, Embedded Escalation Engineers, Blackbox Monitoring, etc).
• Internal acronyms (IcM, PG, EEE, etc.).
• References to internal process used for case handling.
• Non-official public links.

Email Communications Tips

Reply to all from DfM rather than starting a new email thread, so that all communications stay in one place.

Always send a meeting summary after your meeting with the customer.

Add the online update to the main email thread when replying to the customer, to avoid confusion with multiple threads and to keep all communications in one place.

Set up a Teams meeting with customers

Send the meeting invitation in the customer’s time zone.

Use a time zone converter:
https://dateful.com/time-zone-converter
https://www.worldtimebuddy.com/
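
If you prefer to check the conversion locally, a minimal sketch with Python's standard `zoneinfo` module (Python 3.9+) could look like the following. The helper name `to_customer_time` and the example time zones and date are assumptions chosen for illustration.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def to_customer_time(local_dt, engineer_tz, customer_tz):
    """Convert a naive local meeting time into the customer's time zone."""
    aware = local_dt.replace(tzinfo=ZoneInfo(engineer_tz))
    return aware.astimezone(ZoneInfo(customer_tz))


# Example: a 15:00 meeting planned by an engineer in Berlin
# for a customer in New York (both zones are illustrative).
meeting = to_customer_time(
    datetime(2024, 1, 15, 15, 0), "Europe/Berlin", "America/New_York"
)
print(meeting.strftime("%Y-%m-%d %H:%M"))  # 2024-01-15 09:00
```

Using named zones instead of fixed offsets lets the library account for daylight saving changes on the meeting date.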

Enable Out of Office auto reply on your email

o Create an Out of Office message for planned absences and for time outside your working hours. It should include:
• When you plan to be back.
• Who is covering for you.
• Where they can go for help (azurebu@microsoft.com, arrbackup@microsoft.com, your TA, your Manager).
o If you have an unplanned absence, turn on your Out of Office message if possible.
o Use OOFSponder to manage your auto reply.

How to find a team’s SAP on DfM

• DfM SAP search does not match a keyword anywhere in the string; it only matches from the beginning of the string (starts with).

• You can put an * or % in front of or after the keyword to run a wildcard search.
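
The difference between the two search modes can be illustrated with a small simulation. This is not DfM's implementation: the function `sap_search`, the sample SAP paths and the choice to always treat the end of the keyword as open are assumptions made for the sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical SAP paths, for illustration only.
SAPS = [
    "Azure/Kubernetes Service (AKS)/Cluster creation",
    "Azure/Container Registry/Login issues",
    "Azure/Container Instances/Networking",
]


def sap_search(query, candidates):
    """Simulate a starts-with search that accepts * or % as wildcards."""
    pattern = query.replace("%", "*").lower()
    if not pattern.endswith("*"):
        pattern += "*"  # assumption: "starts with" leaves the end open
    return [c for c in candidates if fnmatchcase(c.lower(), pattern)]


print(sap_search("Kubernetes", SAPS))   # [] because no path starts with it
print(sap_search("*Kubernetes", SAPS))  # the AKS path
print(sap_search("%Container", SAPS))   # both Container paths
```

A plain keyword finds nothing unless the path begins with it, which is why the leading wildcard matters.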

Any Questions?

© Copyright Microsoft Corporation. All rights reserved. AKS Engineer Best Practices - ACT Training
