
Part 1: Foundations: Data, Data, Everywhere

Week 1

Google Data Analytics

Origins of the data analysis process


When you decided to join this program, you proved that you are a curious person. So let’s tap into your
curiosity and talk about the origins of data analysis. We don’t fully know when or why the first person
decided to record data about people and things. But we do know it was useful because the idea is still
around today!  

We also know that data analysis is rooted in statistics, which has a pretty long history itself.
Archaeologists mark the start of statistics in ancient Egypt with the building of the pyramids. The
ancient Egyptians were masters of organizing data. They documented their calculations and theories
on papyri (paper-like materials), which are now viewed as the earliest examples of spreadsheets and
checklists. Today’s data analysts owe a lot to those brilliant scribes, who helped create a more
technical and efficient process.

It is time to enter the data analysis life cycle—the process of going from data to decision. Data goes
through several phases as it gets created, consumed, tested, processed, and reused. With a life cycle
model, all key team members can drive success by planning work both up front and at the end of the
data analysis process. While the data analysis life cycle is well known among experts, there isn't a
single defined structure of those phases. There might not be one single architecture that’s uniformly
followed by every data analysis expert, but there are some shared fundamentals in every data analysis
process. This reading provides an overview of several, starting with the process that forms the
foundation of the Google Data Analytics Certificate.

The process presented as part of the Google Data Analytics Certificate is one that will be valuable to
you as you keep moving forward in your career:

1. Ask: Business challenge/objective/question
2. Prepare: Data generation, collection, storage, and data management
3. Process: Data cleaning/data integrity
4. Analyze: Data exploration, visualization, and analysis
5. Share: Communicating and interpreting results 
6. Act: Putting your insights to work to solve the problem

Understanding this process—and all of the iterations that helped make it popular—will be a big part of
guiding your own analysis and your work in this program. Let’s go over a few other variations of the
data analysis life cycle.

EMC's data analysis life cycle


EMC Corporation's data analytics life cycle is cyclical with six steps:

1. Discovery
2. Pre-processing data
3. Model planning
4. Model building
5. Communicate results
6. Operationalize

EMC Corporation is now Dell EMC. This model, created by David Dietrich, reflects the cyclical nature of
real-world projects. The phases aren’t static milestones; each step connects and leads to the next, and
eventually repeats. Key questions help analysts test whether they have accomplished enough to move
forward and ensure that teams have spent enough time on each of the phases and don’t start modeling
before the data is ready. It is a little different from the data analysis life cycle this program is based on,
but it has some core ideas in common: the first phase is interested in discovering and asking questions;
data has to be prepared before it can be analyzed and used; and then findings should be shared and
acted on.

For more information, refer to this e-book, Data Science & Big Data Analytics.

SAS's iterative life cycle


An iterative life cycle was created by a company called SAS, a leading data analytics solutions provider.
It can be used to produce repeatable, reliable, and predictive results:

1. Ask
2. Prepare
3. Explore
4. Model
5. Implement
6. Act
7. Evaluate

SAS emphasizes the cyclical nature of its model by visualizing it as an infinity symbol.
Their life cycle has seven steps, many of which we have seen in the other models, like Ask, Prepare,
Model, and Act. But this life cycle is also a little different; it includes a step after the act phase designed
to help analysts evaluate their solutions and potentially return to the ask phase again. 

For more information, refer to Managing the Analytics Life Cycle for Decisions at Scale.

Project-based data analytics life cycle


A project-based data analytics life cycle has five simple steps:

1. Identifying the problem
2. Designing data requirements
3. Pre-processing data
4. Performing data analysis
5. Visualizing data

This data analytics project life cycle was developed by Vignesh Prajapati. It doesn’t include the sixth
phase, or what we have been referring to as the Act phase. However, it still covers a lot of the same
steps as the life cycles we have already described. It begins with identifying the problem, moves through preparing and processing data before analysis, and ends with data visualization.

For more information, refer to Understanding the data analytics project life cycle.

Big data analytics life cycle


Authors Thomas Erl, Wajid Khattak, and Paul Buhler proposed a big data analytics life cycle in their
book, Big Data Fundamentals: Concepts, Drivers & Techniques. Their life cycle suggests phases
divided into nine steps:

1. Business case evaluation
2. Data identification
3. Data acquisition and filtering
4. Data extraction
5. Data validation and cleaning
6. Data aggregation and representation
7. Data analysis
8. Data visualization
9. Utilization of analysis results

This life cycle appears to have three or four more steps than the previous life cycle models. But in
reality, they have just broken down what we have been referring to as Prepare and Process into smaller
steps. It emphasizes the individual tasks required for gathering, preparing, and cleaning data before
the analysis phase.

For more information, refer to Big Data Adoption and Planning Considerations.

Key takeaway
From our journey to the pyramids and data in ancient Egypt to now, the way we analyze data has
evolved (and continues to do so). The data analysis process is like real-life architecture: there are different ways to do things, but the same core ideas still appear in each model of the process. Whether
you use the structure of this Google Data Analytics Certificate or one of the many other iterations you
have learned about, we are here to help guide you as you continue on your data journey.

- Data: A collection of facts
- Data analysis: The collection, transformation, and organization of data in order to draw conclusions, make predictions, and drive informed decision-making
- Data analyst: Someone who collects, transforms, and organizes data in order to drive informed decision-making
- Data analytics: The science of data
- Data-driven decision-making: Using facts to guide business strategy
- Data ecosystem: The various elements that interact with one another in order to produce, manage, store, organize, analyze, and share data
- Data science: A field of study that uses raw data to create new ways of modeling and understanding the unknown
- Dataset: A collection of data that can be manipulated or analyzed as one unit

Week 2
Analytical skills: Qualities and characteristics associated with using facts to solve problems
Analytical thinking: The process of identifying and defining a problem, then solving it by
using data in an organized, step-by-step manner

Analytical skills are qualities and characteristics associated with solving problems using facts.

Five analytical skills:

- Curiosity: Curious people usually seek out new challenges and experiences. This leads to knowledge.
- Context: The condition in which something exists or happens
- Technical mindset: The ability to break things down into smaller steps or pieces and work with them in an orderly and logical way
- Data design: How you organize information
- Data strategy: The management of the people, processes, and tools used in data analysis

- The qualities and characteristics associated with solving problems using facts: analytical skills
- The analytical skill that involves breaking processes down into smaller steps and working with them in an orderly, logical way: a technical mindset
- The analytical skill that involves how you organize information: data design
- The analytical skill that has to do with how you group things into categories: understanding context
- The analytical skill that involves managing the processes and tools used in data analysis: data strategy

Analytical thinking involves identifying and defining a problem and then solving it by using data in an
organized, step-by-step manner.

Five key aspects of analytical thinking:

- Visualization: The graphical representation of information
- Strategy: Strategizing helps improve the quality and usefulness of the data we collect. With a strategy, we know all our data is valuable and can help us accomplish our goals.
- Problem-orientation: Data analysts use a problem-oriented approach in order to identify, describe, and solve problems. It's all about keeping the problem top of mind throughout the entire project.
- Correlation: Being able to identify a relationship between two or more pieces of data
- Big-picture and detail-oriented thinking: Being able to consider the problem as a whole (the big picture) as well as all of the individual parts and details that make it up

Gap analysis lets you examine and evaluate how a process works currently in order to get where you want to be in the future. Businesses conduct gap analysis to do all kinds of things, such as improve a product or become more efficient. The general approach to gap analysis is understanding where you are now compared to where you want to be.

Root cause: The reason why a problem occurs

Real-world data magic

Nonprofits are organizations dedicated to advancing a social cause or advocating for a particular effort, such as food security, education, or the arts.

That's an example of the journalist's work bringing about awareness, understanding, and engagement.

Week 3

Stages of the data life cycle

Plan - This actually happens well before starting an analysis project. During planning, a business
decides what kind of data it needs, how it will be managed throughout its life cycle, who will be
responsible for it, and the optimal outcomes.

Capture - This is where data is collected from a variety of different sources and brought into the organization. With so much data being created every day, the ways to collect it are truly endless. One common method is getting data from outside resources. When you maintain a database of customer information, ensuring data integrity, credibility, and privacy are all important concerns.

Manage - Here we're talking about how we care for our data: how and where it's stored, the tools used to keep it safe and secure, and the actions taken to make sure that it's maintained properly.

Analyze - The data is used to solve problems, make great decisions, and support business goals.

Archive - Archiving means storing data in a place where it's still available, but may not be used again.

Destroy - To destroy it, the company would use secure data erasure software. If there were any paper
files, they would be shredded too. This is important for protecting a company's private information, as
well as private data about its customers.

OR

You learned that there are six stages to the data life cycle. Here is a recap:

1. Plan: Decide what kind of data is needed, how it will be managed, and who will be
responsible for it.
2. Capture: Collect or bring in data from a variety of different sources.
3. Manage: Care for and maintain the data. This includes determining how and where
it is stored and the tools used to do so.
4. Analyze: Use the data to solve problems, make decisions, and support business
goals.
5. Archive: Keep relevant data stored for long-term and future reference.
6. Destroy: Remove data from storage and delete any shared copies of the data.

Six phases of data analysis: Ask, Prepare, Process, Analyze, Share, and Act


Exploring data analyst tools
A formula is a set of instructions that performs a specific calculation using the data in a spreadsheet. Formulas can do basic things like add, subtract, multiply, and divide, but they don't stop there. You can also use formulas to find the average of a number set, look up a particular value, return the sum of a set of values that meets a particular rule, and much more.

A function is a preset command that automatically performs a specific process or task using the data in a spreadsheet.

A query language is a computer programming language that allows you to retrieve and manipulate data from a database. You'll learn something called structured query language, more commonly known as SQL.

Query language: A computer programming language used to communicate with a database


Formula: A set of instructions used to perform a calculation using the data in a spreadsheet
Function: A preset command that automatically performs a specified process or task using
the data in a spreadsheet
Database: A collection of data stored in a computer system
Stakeholders: People who invest time and resources into a project and are interested in its
outcome

Spreadsheets vs. databases:

- Spreadsheets are software applications; databases are data stores accessed using a query language (e.g., SQL).
- Spreadsheets structure data in a row and column format; databases structure data using rules and relationships.
- Spreadsheets organize information in cells; databases organize information in complex collections.
- Spreadsheets provide access to a limited amount of data; databases provide access to huge amounts of data.
- Spreadsheets rely on manual data entry; databases use strict and consistent data entry.
- Spreadsheets generally support one user at a time; databases support multiple users.
- Spreadsheets are controlled by the user; databases are controlled by a database management system.

Week 4
Attribute: A characteristic or quality of data used to label a column in a table
Observation: The attributes that describe a piece of data contained in a row of a table
Query: A request for data or information from a database
Remember, SQL can do lots of the same things with data that spreadsheets can do. You can use it to store, organize, and analyze your data, among other things.
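
For instance, here is a minimal sketch of a query that organizes and summarizes data much like a spreadsheet pivot table or SUMIF would. The table and column names (customer_orders, region, order_amount) are hypothetical, used only for illustration:

-- Total up order amounts per region from a hypothetical customer_orders table
SELECT
    region,
    SUM(order_amount) AS total_sales
FROM
    customer_orders
GROUP BY
    region;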

SQL Guide: Getting started

You can use SQL to track down text (referred to as strings).

Every programming language, including SQL, follows a unique set of guidelines known as
syntax. Syntax is the predetermined structure of a language that includes all required
words, symbols, and punctuation, as well as their proper placement.
The syntax of every SQL query is the same: 

- Use SELECT to choose the columns you want to return.
- Use FROM to choose the tables where the columns you want are located.
- Use WHERE to filter for certain information (condition).
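
Putting those three parts together, here is a minimal sketch of a complete query. The table and column names (customer_data, first_name, city) are hypothetical and only show where each clause fits:

SELECT
    first_name            -- the column you want returned
FROM
    customer_data         -- the table where that column lives
WHERE
    city = 'Chicago';     -- the condition used to filter the results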

Other topics covered in the guide include:

- Capitalization, indentation, and semicolons
- WHERE conditions
- SELECT all columns
- Comments
- Aliases
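
To illustrate a few of these topics together (a comment, an alias, and a WHERE condition), here is a small sketch; the table and column names (station_data, station_name, annual_riders) are made up for this example:

-- Comments start with two dashes and are ignored when the query runs
SELECT
    station_name AS station,    -- AS gives the column a shorter alias
    annual_riders
FROM
    station_data
WHERE
    annual_riders > 10000;      -- keeps only the busier stations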

Week 5
- An issue is a topic or subject to investigate.
- A question is designed to discover information.
- A problem is an obstacle or complication that needs to be worked out.
- A business task is the question or problem data analysis answers for a business.
- Data-driven decision-making is when facts that have been discovered through data analysis are used to guide business strategy.
- Fairness means ensuring that your analysis doesn't create or reinforce bias.

The park's decision to distribute the survey near the roller coasters could be
considered an unfair practice because it may not accurately represent the
preferences of all park visitors. A data analyst could correct this unfair practice by
collecting data from a representative sample of park visitors, rather than just those
who happen to be standing in line for a roller coaster at the time. This could be
achieved through methods such as distributing the survey online or at various
locations throughout the park.

Part 2: Ask Questions to Make Data-Driven Decisions

Week 1

Data analysts typically work with six problem types

1. Making predictions
2. Categorizing things
3. Spotting something unusual
4. Identifying themes
5. Discovering connections
6. Finding patterns

A video, Common problem types, introduced the six problem types with an example for
each. The examples are summarized below for review.

Making predictions
A company that wants to know the best advertising method to bring in new customers is
an example of a problem requiring analysts to make predictions. Analysts with data on
location, type of media, and number of new customers acquired as a result of past ads
can't guarantee future results, but they can help predict the best placement of advertising
to reach the target audience.

Categorizing things
An example of a problem requiring analysts to categorize things is a company's goal to
improve customer satisfaction. Analysts might classify customer service calls based on
certain keywords or scores. This could help identify top-performing customer service
representatives or help correlate certain actions taken with higher customer satisfaction
scores.

Spotting something unusual
A company that sells smart watches that help people monitor their health would be
interested in designing their software to spot something unusual. Analysts who have
analyzed aggregated health data can help product developers determine the right
algorithms to spot and set off alarms when certain data doesn't trend normally.

Identifying themes
User experience (UX) designers might rely on analysts to analyze user interaction data.
Similar to problems that require analysts to categorize things, usability improvement
projects might require analysts to identify themes to help prioritize the right product
features for improvement. Themes are most often used to help researchers explore
certain aspects of data. In a user study, user beliefs, practices, and needs are examples of
themes.

By now you might be wondering if there is a difference between categorizing things and
identifying themes. The best way to think about it is: categorizing things involves
assigning items to categories; identifying themes takes those categories a step further by
grouping them into broader themes.

Discovering connections
A third-party logistics company working with another company to get shipments delivered
to customers on time is a problem requiring analysts to discover connections. By
analyzing the wait times at shipping hubs, analysts can determine the appropriate
schedule changes to increase the number of on-time deliveries.

Finding patterns
Minimizing downtime caused by machine failure is an example of a problem requiring
analysts to find patterns in data. For example, by analyzing maintenance data, they might
discover that most failures happen if regular maintenance is delayed by more than a 15-day window.

SMART questions are specific, measurable, action-oriented, relevant, and time-bound:

- Specific: Questions are simple, significant, and focused on a single topic or a few closely related ideas.
- Measurable: Questions can be quantified and assessed.
- Action-oriented: Questions encourage change.
- Relevant: Questions matter, are important, and have significance to the problem you're trying to solve.
- Time-bound: Questions specify the time to be studied.

You can apply the SMART framework to all types of questions. The type of questions you
ask can help you explore deeper with your data. Consider the ways your questions help
you examine objectives, audience, time, security, and resources.

Some common topics for questions include: 

- Objectives
- Audience
- Time
- Resources
- Security

Think about how you can ask SMART questions about each of these topics.

You might also like