Google Data Analytics Professional Certificate COURSE 2
The six data analysis phases
There are six data analysis phases that will help you make seamless decisions: ask, prepare, process,
analyze, share, and act. Keep in mind, these are different from the data life cycle, which describes the
changes data goes through over its lifetime. Let’s walk through the steps to see how they can help you solve
problems you might face on the job.
Step 1: Ask
It’s impossible to solve a problem if you don’t know what it is. These are some things to consider:
- Define the problem you’re trying to solve
- Make sure you fully understand the stakeholder’s expectations
- Focus on the actual problem and avoid any distractions
- Collaborate with stakeholders and keep an open line of communication
- Take a step back and see the whole situation in context
Questions to ask yourself in this step:
1. What are my stakeholders saying their problems are?
2. Now that I’ve identified the issues, how can I help the stakeholders resolve their questions?
Step 2: Prepare
You will decide what data you need to collect in order to answer your questions and how to organize it so
that it is useful. You might use your business task to decide:
- What metrics to measure
- Locate data in your database
- Create security measures to protect that data
Questions to ask yourself in this step:
1. What do I need to figure out how to solve this problem?
2. What research do I need to do?
Step 3: Process
Clean data is the best data and you will need to clean up your data to get rid of any possible errors,
inaccuracies, or inconsistencies. This might mean:
- Using spreadsheet functions to find incorrectly entered data
- Using SQL functions to check for extra spaces
- Removing repeated entries
- Checking as much as possible for bias in the data
Questions to ask yourself in this step:
1. What data errors or inaccuracies might get in my way of getting the best possible answer to the problem
I am trying to solve?
2. How can I clean my data so the information I have is more consistent?
Step 4: Analyze
You will want to think analytically about your data. At this stage, you might sort and format your data to
make it easier to:
- Perform calculations
- Combine data from multiple sources
- Create tables with your results
Questions to ask yourself in this step:
1. What story is my data telling me?
2. How will my data help me solve this problem?
3. Who needs my company’s product or service? What type of person is most likely to use it?
Step 5: Share
Everyone shares their results differently so be sure to summarize your results with clear and enticing
visuals of your analysis using data via tools like graphs or dashboards. This is your chance to show the
stakeholders you have solved their problem and how you got there. Sharing will certainly help your team:
- Make better decisions
- Make more informed decisions
- Lead to stronger outcomes
- Successfully communicate your findings
Questions to ask yourself in this step:
1. How can I make what I present to the stakeholders engaging and easy to understand?
2. What would help me understand this if I were the listener?
Step 6: Act
Now it’s time to act on your data. You will take everything you have learned from your data analysis and put
it to use. This could mean providing your stakeholders with recommendations based on your findings so they
can make data-driven decisions.
Questions to ask yourself in this step:
1. How can I use the feedback I received during the share phase (step 5) to actually meet the stakeholder’s
needs and expectations?
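Two of the cleaning tasks listed under Step 3 (checking for extra spaces and removing repeated entries) can be
sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method from the
course: the function name clean_entries and the sample rows are hypothetical.

```python
def clean_entries(rows):
    """Trim extra spaces and drop repeated entries, keeping first occurrences."""
    seen = set()
    cleaned = []
    for row in rows:
        # Strip leading/trailing spaces and collapse runs of spaces inside a cell.
        normalized = tuple(" ".join(str(cell).split()) for cell in row)
        # Keep a row only the first time its cleaned form appears.
        if normalized not in seen:
            seen.add(normalized)
            cleaned.append(normalized)
    return cleaned


# Hypothetical survey rows: (customer name, answer)
raw_rows = [
    ("Ana ", "vanilla"),
    ("Ana", "vanilla"),
    ("  Ben", "mint  chip"),
]
cleaned_rows = clean_entries(raw_rows)
print(cleaned_rows)  # [('Ana', 'vanilla'), ('Ben', 'mint chip')]
```

Note that the first two rows only count as repeats after the extra space is removed, which is why trimming comes
before the duplicate check.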
These six steps can help you to break the data analysis process into smaller, manageable parts, which is called
structured thinking. This process involves four basic activities:
1. Recognizing the current problem or situation
2. Organizing available information
3. Revealing gaps and opportunities
4. Identifying your options
When you are starting out in your career as a data analyst, it is normal to feel pulled in a few different
directions with your role and expectations. Following processes like the ones outlined here and using
structured thinking skills can help get you back on track, fill in any gaps and let you know exactly what you
need.
PRACTICE QUIZ:
1. A data analytics team works to recognize the current problem. Then, they organize available information
to reveal gaps and opportunities. Finally, they identify the available options. These steps are part of what
process?
This describes structured thinking. Structured thinking begins with recognizing the current problem
or situation. Next, information is organized to reveal gaps and opportunities. Finally, the available
options are identified.
2. In which step of the data analysis process would an analyst ask questions such as, “What data errors might
get in the way of my analysis?” or “How can I clean my data so the information I have is consistent?”
Process
An analyst asks questions such as, “What data errors might get in the way of my analysis?” or “How can I
clean my data so the information I have is consistent?” during the process step. This is when data is
cleaned in order to eliminate any possible errors, inaccuracies, or inconsistencies.
3. A data analyst has entered the analyze step of the data analysis process. Identify the questions they might
ask during this phase. Select all that apply.
How will my data help me solve this problem?
What story is my data telling me?
The analyze step involves thinking analytically about data. Data analysts might ask how the data can help
them solve the problem and what story the data is trying to tell.
4. A data analyst is trying to understand what data to use to help solve a business problem. They’re asking
questions such as, “What internal data is available in the database?” and “What outside facts do I need to
research?” The data analyst is in which phase of the data analysis process?
The data analyst is in the prepare step. This is when analysts consider what information to gather and
what research they can do to help problem-solve.
Questions should be open-ended. This is the best way to get responses that will help you accurately qualify or
disqualify potential solutions to your specific problem. So, based on the thought process, possible SMART
questions might be:
On a scale of 1-10 (with 10 being the most important), how important is your car having four-wheel
drive?
What are the top five features you would like to see in a car package?
What features, if included with four-wheel drive, would make you more inclined to buy the car?
How much more would you pay for a car with four-wheel drive?
Has four-wheel drive become more or less popular in the last three years?
First, decide who you will speak with and how they might use data. Your goal is to plan for a successful
conversation. Think about how much time you need and how you will use it. For this step, review the
following advice:
Prioritize your questions: Prepare to ask the most important and interesting questions first.
Make your time count: Stay on subject during the conversation.
Clarify your understanding: To avoid confusion, build in some time to summarize answers to make
sure you understood them correctly. This will go a long way in helping you avoid mistakes. For
example, in a conversation with a teacher, you might check your understanding with a statement like,
“Just to double check that I understand what you’re saying correctly, you currently use test scores in
the following ways…”
Depending on the field they are in, the person you chat with may not be comfortable sharing detailed data with
you. That's okay! Be sure to respect what they are willing to share during your conversation.
Create questions
Now, come up with questions to help you understand their business goals, the type of data they interact with,
and any limitations of the data.
Use the SMART question framework to make sure each question you ask makes sense based on their field.
Each question should meet as many of the SMART criteria as possible.
As a reminder, SMART questions are:
Specific: Questions are simple, significant, and focused on a single topic or a few closely related
ideas.
Measurable: Questions can be quantified and assessed.
Action-oriented: Questions encourage change.
Relevant: Questions matter, are important, and have significance to the problem you’re trying to
solve.
Time-bound: Questions specify the time to be studied.
For instance, if you have a conversation with someone who works in retail, you might lead with questions
like:
Specific: Do you currently use data to drive decisions in your business? If so, what kind(s) of data do
you collect, and how do you use it?
Measurable: Do you know what percentage of sales is from your top-selling products?
Action-oriented: Are there business decisions or changes that you would make if you had the right
information? For example, if you had information about how umbrella sales change with the weather,
how would you use it?
Relevant: How often do you review data from your business?
Time-bound: Can you describe how data helped you make good decisions for your store(s) this past
year?
If you are having a conversation with a teacher, you might ask different questions, such as:
Specific: What kind of data do you use to build your lessons?
Measurable: How well do student benchmark test scores correlate with their grades?
Action-oriented: Do you share your data with other teachers to improve lessons?
Relevant: Have you shared grading data with an entire class? If so, do students seem to be more or
less motivated, or about the same?
Time-bound: In the last five years, how many times did you review data from previous academic
years?
If you are having a conversation with a small business owner of an ice cream shop, you could ask:
Specific: What data do you use to help with purchasing and inventory?
Measurable: Can you order (rank) these factors from most to least influential on sales: price, flavor,
and time of year (season)?
Action-oriented: Is there a single factor you need more data on so you can potentially increase sales?
Relevant: How do you advertise to or communicate with customers?
Time-bound: What does your year-over-year sales growth look like for the last three years?
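If the shop owner answers the year-over-year question with annual sales totals, the growth rate for each year is
(this year's sales - last year's sales) / last year's sales. The sketch below works through that arithmetic in
Python; the function name and the sales figures are hypothetical and only there to show the calculation.

```python
def year_over_year_growth(sales_by_year):
    """Return {year: growth rate compared with the previous year}."""
    years = sorted(sales_by_year)
    growth = {}
    # Pair each year with the one before it: (2021, 2022), (2022, 2023), ...
    for prev, curr in zip(years, years[1:]):
        change = sales_by_year[curr] - sales_by_year[prev]
        growth[curr] = change / sales_by_year[prev]
    return growth


# Hypothetical annual sales totals for the ice cream shop
sales = {2021: 100_000, 2022: 110_000, 2023: 121_000}
growth = year_over_year_growth(sales)
for year, rate in growth.items():
    print(f"{year}: {rate:.1%}")
```

The first year in the data has no growth rate because there is no earlier year to compare it with.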
Take good notes
It is important to take good notes during your conversation. Your notes should be comprehensive and useful.
To help you capture meaningful notes, you should stick to a process of asking a question, clarifying your
understanding of their response, and then briefly recording it in your notes.
Remember: If a question is worth asking, then the answer is worth recording. Commit yourself to taking great
notes during your conversation.
Helpful aspects of your conversation to note include:
Facts: Write down any concrete piece of information, such as dates, times, names, and other
specifics.
Context: Facts without context are useless. Note any relevant details that are needed in order to
understand the information you gather.
Unknowns: Sometimes you may miss an important question during a conversation. Make a note
when this happens so you can figure out the answer later.
For example, if the previous SMART questions led the ice cream shop owner to propose a project to analyze
customer flavor preferences, your notes might appear something like this:
Project: Collect customer flavor preference data.
Overall business goal: Use data to offer or create more popular flavors.
Two data sources: Cash register receipts and completed customer surveys (email).
Target completion date: Q2
To do: Call back later and speak with the manager about the location of survey data.
The notes you will take will differ greatly based on the data conversation you have. The important thing is that
your notes are clear, organized, and concise.
Now you are ready to have a great conversation about data in real life.
Tools to help you visualize and share your data analysis with stakeholders:
Reports - a static collection of data given to stakeholders periodically
Example: a finance firm’s monthly sales
- great for giving snapshots of high level historical data for an organization.
- can be designed and sent out periodically, often on a weekly or monthly basis, as organized
and easy to reference information
- quick to design and easy to use as long as you continually maintain them
- because reports use static data or data that doesn't change once it's been recorded, they
reflect data that's already been cleaned and sorted
- Reports need regular maintenance and aren't very visually appealing.
- Because they aren't automatic or dynamic, reports don't show live, evolving data.
Dashboards - monitor live, incoming data
- they give your team more access to information being recorded
- you can interact with data by playing with filters
- because they're dynamic, they have long-term value
- if stakeholders need to continually access information, a dashboard can be more efficient
than having to pull reports over and over, which is a big time saver for you
- they're just nice to look at
- they take a lot of time to design and can actually be less efficient than reports, if they're not
used very often
- If the base table breaks at any point, they need a lot of maintenance to get back up and
running again.
- can sometimes overwhelm people with information too. If you aren't used to looking
through data on a dashboard, you might get lost in it.
Dashboards are powerful visual tools that help you tell your data story.
A dashboard organizes information from multiple datasets into one central location, offering huge time-
savings. Data analysts use dashboards to track, analyze, and visualize data in order to answer questions and
solve problems. For a basic idea of what dashboards look like, refer to this article: 6 real-world examples of
business intelligence dashboards. Tableau is one tool that is used to create dashboards and is covered later in
the program.
The following table summarizes the benefits of using a dashboard for both data analysts and their
stakeholders.
* It is important to remember that changed data is pulled into dashboards automatically only if the data
structure is the same. If the data structure changes, you have to update the dashboard design before the data
can update live.
Creating a dashboard
1. Identify the stakeholders who need to see the data and how they will use it
To get started with this, you need to ask effective questions. Check out this Requirements Gathering
Worksheet to explore a wide range of good questions you can use to identify relevant stakeholders and their
data needs. This is a great resource to help guide you through this process again and again.
To learn more about choosing the right visualizations, check out Tableau’s galleries:
For more samples of area charts, column charts, and other visualizations, visit Tableau’s Viz Gallery. This
gallery is full of great examples that were created using real data; explore this resource on your own to get
some inspiration.
Explore Tableau’s Viz of the Day to see visualizations curated by the community. These are
visualizations created by Tableau users and are a great way to learn more about how other data analysts
are using data visualization tools.
Data is a collection of facts. Metrics are quantifiable data types used for measurement.
Now for the good news! Here are some benefits that come with big data:
When large amounts of data can be stored and analyzed, it can help companies identify more efficient
ways of doing business and save a lot of time and money.
Big data helps organizations spot the trends of customer buying patterns and satisfaction levels,
which can help them create new products and solutions that will make customers happy.
By analyzing big data, businesses get a much better understanding of current market conditions,
which can help them stay ahead of the competition.
As in our earlier social media example, big data helps companies keep track of their online presence,
especially feedback, both good and bad, from customers. This gives them the information they
need to improve and protect their brand.
The three (or four) V words for big data
When thinking about the benefits and challenges of big data, it helps to think about the three Vs: volume,
variety, and velocity. Volume describes the amount of data. Variety describes the different kinds of data.
Velocity describes how fast the data can be processed. Some data analysts also consider a fourth V: veracity.
Veracity refers to the quality and reliability of the data. These are all important considerations related to
processing huge, complex data sets.
Volume: The amount of data
Variety: The different kinds of data
Velocity: How fast the data can be processed
Veracity: The quality and reliability of the data
Formulas in spreadsheets
A cell reference is a single cell or range of cells in a worksheet that can be used in a formula. Cell references
contain the letter of the column and the number of the row where the data is. The great thing about using cell
references is that they also automatically update when a formula is copied to a new cell. A range of cells is a
collection of two or more cells. A range can include cells from the same row or column, or from different
columns and rows collected together. We'll show you an example in an upcoming video.
Quick reference: Formulas in spreadsheets
As a quick reminder, a formula is a set of instructions that does a specific calculation using the data in a
spreadsheet. Formulas make it easy for data analysts to do powerful calculations automatically, which helps
them analyze data more effectively. Below is a quick-reference guide to help you get the most out of
formulas.
Formulas
The Basics
When you write a formula in math, it generally ends with an equal sign (2 + 3 = ?). Spreadsheet
formulas always start with one instead (=A2+A3). The equal sign tells the spreadsheet that what
follows is part of a formula, not just a word or number in a cell.
After you type the equal sign, most spreadsheet applications will display an autocomplete menu that
lists valid formulas, names, and text strings. This is a great way to create and edit formulas while
avoiding typing and syntax errors.
A fun way to learn new formulas is just by typing an equal sign and a single letter of the alphabet.
Choose one of the options that pops up and you will learn what that formula does.
Mathematical operators
The mathematical operators used in spreadsheet formulas include:
Subtraction – minus sign ( - )
Addition – plus sign ( + )
Division – forward-slash ( / )
Multiplication – asterisk ( * )
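To see why the leading equal sign matters, here is a small Python sketch of how an entry such as =A2+A3 could be
handled. It is a toy model for illustration only, not how any real spreadsheet application is implemented: the
evaluate function, the cells dictionary, and the cell values are all hypothetical, and it only understands one
operator between two cell references.

```python
import operator

# Map each spreadsheet operator symbol to the matching Python function.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def evaluate(entry, cells):
    """Evaluate a cell entry of the form =<cell><operator><cell>."""
    if not entry.startswith("="):
        # No leading equal sign: the entry is plain text, not a formula.
        return entry
    body = entry[1:]
    for symbol, func in OPERATORS.items():
        if symbol in body:
            left, right = body.split(symbol, 1)
            return func(cells[left.strip()], cells[right.strip()])
    raise ValueError(f"no operator found in {entry!r}")


cells = {"A2": 2, "A3": 3}  # hypothetical cell values
print(evaluate("=A2+A3", cells))  # 5
print(evaluate("=A2*A3", cells))  # 6
print(evaluate("A2+A3", cells))   # A2+A3 (treated as text)
```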
Auto-filling
The lower-right corner of each cell has a fill handle. It is a small green square in Microsoft Excel and a small blue
square in Google Sheets.
Click the fill handle for a cell and drag it down a column to auto-fill other cells in the column with the
same value or formula in that cell.
Click the fill handle for a cell and drag it across a row to auto-fill other cells in the row with the same
value or formula in that cell.
If you want to create a numbered sequence in a column or row, do the following: 1) Fill in the first two
numbers of the sequence in two adjacent cells, 2) Select to highlight the cells, and 3) Drag the fill handle
to the last cell to complete the sequence of numbers. For example, to insert 1 through 100 in each row of
column A, enter 1 in cell A1 and 2 in cell A2. Then, select to highlight both cells, click the fill handle in
cell A2, and drag it down to cell A100. This auto-fills the numbers sequentially so you don't have to type
them in each cell.
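The numbered-sequence steps above work because the first two cells set the pattern: the difference between them
is the step that gets repeated. A minimal Python sketch of that idea follows; the function name fill_series is
hypothetical, and it only models a simple constant-step number series.

```python
def fill_series(first, second, count):
    """Extend the pattern set by the first two values to `count` values."""
    step = second - first  # the difference between the two starting cells
    return [first + step * i for i in range(count)]


# Equivalent of entering 1 in A1 and 2 in A2, then dragging down to A100.
column_a = fill_series(1, 2, 100)
print(column_a[:5], column_a[-1])  # [1, 2, 3, 4, 5] 100
```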
Absolute referencing
Absolute referencing is marked by a dollar sign ($). For example, =$A$10 has absolute referencing for
both the column and the row value.
Relative references (the default, e.g. “=A10”) change whenever the formula is copied and pasted, because
they are defined relative to the cell that contains the formula. For example, if you copied
“=A10” to the cell to the right, it would become “=B10”. With absolute referencing, “=$A$10” copied to
the cell to the right would remain “=$A$10”. But if you copied $A10 to the cell below, it would change to
$A11 because the row value isn't an absolute reference.
Absolute references will not change when you copy and paste the formula in a different cell. The cell
being referenced is always the same.
To easily switch between absolute and relative referencing in the formula bar, highlight the reference you
want to change and press the F4 key. Each press cycles to the next reference type ($A$10, A$10, $A10,
A10); for example, to change the absolute reference $A$10 in your formula to the relative reference A10,
highlight $A$10 in the formula bar and press F4 until it reads A10.
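The copy-and-paste behavior described in this section can be modeled in Python: any part of a reference without
a dollar sign shifts by the distance the formula is moved, and any part with one stays fixed. This is a sketch
for illustration, not spreadsheet internals; copy_reference and its helper functions are hypothetical names, and
the sketch assumes a single reference such as A10 or $A$10 and does not guard against moving past column A or
row 1.

```python
import re

# Optional $, column letters, optional $, row number (e.g. A10, $A10, $A$10).
REF_PATTERN = re.compile(r"^(\$?)([A-Z]+)(\$?)(\d+)$")


def column_to_number(letters):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def number_to_column(number):
    """Convert a column number back to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def copy_reference(ref, rows_down=0, cols_right=0):
    """Return the reference as it would read after copying the formula."""
    match = REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    col_lock, col, row_lock, row = match.groups()
    if not col_lock:  # relative column: shifts with the copy
        col = number_to_column(column_to_number(col) + cols_right)
    if not row_lock:  # relative row: shifts with the copy
        row = str(int(row) + rows_down)
    return f"{col_lock}{col}{row_lock}{row}"


print(copy_reference("A10", cols_right=1))    # B10
print(copy_reference("$A$10", cols_right=1))  # $A$10
print(copy_reference("$A10", rows_down=1))    # $A11
```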
Data range
When you click into your formula, the colored ranges let you see which cells are being used in your
spreadsheet. There are different colors for each unique range in your formula.
In a lot of spreadsheet applications, you can press the F2 (or Enter) key to highlight the range of data in
the spreadsheet that is referenced in a formula. Click the cell with the formula, and then press the F2 (or
Enter) key to highlight the data in your spreadsheet.
Combining with functions
COUNTIF() is a formula and a function. This means the function runs based on criteria set by the
formula. In this case, COUNT is the formula; it will be executed IF the conditions you create are true. For
example, you could use =COUNTIF(A1:A16, "7") to count only the cells that contain the number 7.
Combining formulas and functions allows you to do more work with a single command.
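The counting logic behind the COUNTIF example can be sketched in Python as a count of the values that match a
condition. The countif function below is a hypothetical stand-in that only handles an exact match, which is a
simplification of the spreadsheet function, and the sixteen values standing in for A1:A16 are made up.

```python
def countif(values, criterion):
    """Count the values that are equal to the criterion."""
    return sum(1 for value in values if value == criterion)


# Hypothetical contents of cells A1 through A16
column_a = [7, 3, 7, 1, 9, 7, 2, 5, 7, 8, 4, 6, 7, 0, 3, 2]
print(countif(column_a, 7))  # 5
```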
#N/A error: A formula can't find the data. The cell being referenced can't be found.
To remove conditional formatting, click Format and select Conditional Formatting, and then click the Trash
icon for the format rule.
Functions in spreadsheets
As a data analyst, it’s hard to overstate the importance of an SOW document. A well-defined SOW keeps you,
your team, and everyone involved with a project on the same page. It ensures that all contributors, sponsors,
and stakeholders share the same understanding of the relevant details.
Clarifying requirements and setting expectations are two of the most important parts of a project. Recall the
first phase of the Data Analysis Process—asking questions.
As you ask more and more questions to clarify requirements, goals, data sources, stakeholders, and any other
relevant info, an SOW helps you formalize it all by recording all the answers and details. In this context, the
word “ask” means two things. Preparing to write an SOW is about asking questions to learn the necessary
information about the project, but it’s also about clarifying and defining what you’re being asked to
accomplish, and what the limits or boundaries of the “ask” are. After all, if you can’t make a distinction
between the business questions you are and aren’t responsible for answering, then it’s hard to know what
success means!
Deliverables: What work is being done, and what things are being created as a result of this project? When the
project is complete, what are you expected to deliver to the stakeholders? Be specific here. Will you collect
data for this project? How much, or for how long?
Avoid vague statements. For example, “fixing traffic problems” doesn’t specify the scope. This could mean
anything from filling in a few potholes to building a new overpass. Be specific! Use numbers and aim for
hard, measurable goals and objectives. For example: “Identify top 10 issues with traffic patterns within the
city limits, and identify the top 3 solutions that are most cost-effective for reducing traffic congestion.”
Milestones: This is closely related to your timeline. What are the major milestones for progress in your
project? How do you know when a given part of the project is considered complete?
Milestones can be identified by you, by stakeholders, or by other team members such as the Project Manager.
Smaller examples might include incremental steps in a larger project like “Collect and process 50% of
required data (100 survey responses)”, but may also be larger examples like “complete initial data analysis
report” or “deliver completed dashboard visualizations and analysis reports to stakeholders”.
Timeline: Your timeline will be closely tied to the milestones you create for your project. The timeline is a
way of mapping expectations for how long each step of the process should take. The timeline should be
specific enough to help all involved decide if a project is on schedule. When will the deliverables be
completed? How long do you expect the project will take to complete? If all goes as planned, how long do you
expect each component of the project will take? When can we expect to reach each milestone?
Reports: Good SOWs also set boundaries for how and when you’ll give status updates to stakeholders. How
will you communicate progress with stakeholders and sponsors, and how often? Will progress be reported
weekly? Monthly? When milestones are completed? What information will status reports contain?
At a minimum, any SOW should answer all the relevant questions in the above areas. Note that these areas
may differ depending on the project. But at their core, the SOW document should always serve the same
purpose by containing information that is specific, relevant, and accurate. If something changes in the project,
your SOW should reflect those changes.
SOWs should also contain information specific to what is and isn’t considered part of the project. The scope
of your project is everything that you are expected to complete or accomplish, defined to a level of detail that
doesn’t leave any ambiguity or confusion about whether a given task or item is part of the project or not.
Notice how the previous example about studying traffic congestion defined its scope as the area within the city
limits. This doesn’t leave any room for confusion — stakeholders need only to refer to a map to tell if a stretch
of road or intersection is part of the project or not. Defining requirements can be trickier than it sounds, so it’s
important to be as specific as possible in these documents, and to use quantitative statements whenever
possible.
For example, assume that you’re assigned to a project that involves studying the environmental effects of
climate change on the coastline of a city: How do you define what parts of the coastline you are responsible
for studying, and which parts you are not?
In this case, it would be important to define the area you’re expected to study using GPS locations, or
landmarks. Using specific, quantifiable statements will help ensure that everyone has a clear understanding of
what’s expected.
Importance of Context
As a data analyst, part of your job is to communicate the data analysis process and your insights to
stakeholders. This often involves defining the problem and summarizing key questions and available data
early on. You might include this information in a formal document for stakeholders like a scope of work
(SOW) at the beginning of a project. As a reminder, an SOW is an agreed-upon outline of the tasks to be
performed during a project; it is important to ensure your stakeholders understand this key information at that
stage.
Before you start your learning log entry, take a moment to review your notes and your reflection for Learning
Log: Ask SMART questions about real life data. Imagine that you are going to design a data analysis project
based on this data conversation.
In the learning log template linked below, you will create a summary of key information you think a
stakeholder would need to know about this project. In this case, your stakeholder could be a member of the
executive team, like a project manager. Here are some questions to help you get started:
What is the problem?
Can it be solved with data? If so, what data?
Where is this data? Does it exist, or do you need to collect it?
Are you using private data that someone will need to give you access to, or publicly available data?
Who are the relevant sponsors and stakeholders for this project? Who is involved, and how?
What are the boundaries for your project? What do you consider “in-scope?” What do you consider “out-
of-scope?”
Is there any other information you think is relevant to the project?
Is there any information you need or questions you need answered before you can begin?
As you think about these questions, it’s likely you’ll discover that you don’t have all the information you
need. This is part of the process!
When kicking off data analysis projects, expect to have a lot of conversations. Identifying what you
know and what you don’t know makes it much easier to plan your next data conversation, so that you
can get the answers you need.