
CHAPTER 1: FOUNDATION: DATA, DATA, EVERYWHERE, GOOGLE

A, Introduction to Data Analytics


 Data analysis: The collection, transformation and organization of data in order to
draw conclusions, make predictions and drive informed decision making
1, Data analytics in everyday life
2, Dimensions of data analytics
 A data analyst is an explorer, a detective and an artist all rolled into one
3, What is the Data ecosystem?
 Data ecosystem: The various elements that interact with one another in order to
produce, manage, store, organize, analyze and share data
 Cloud: A place to keep data online, rather than on a computer hard drive
- Eg: A shop’s retail database is an ecosystem filled with customers’
information, previous purchases and reviews. As a DA, you will use this information to
predict what these customers will buy in the future and make sure the shop has the products in stock
when they’re needed.
 Difference between Data Science and Data Analysis
- Data Science: Creating new ways of modeling and understanding the unknown by using raw data
- Data Analysis: Finding answers to existing questions by creating insights from data sources

4, How data informs better decisions


a, Data-driven decision making
- First: Figure out the business need (the problem that needs to be solved)
 Eg: A company needs to solve the problem of unhappy employees and low levels of
engagement, satisfaction and retention
- Then: The DA finds and analyzes data and uses it to uncover trends, patterns, and relationships
* Subject matter experts
 People who have the ability to look at the results of data analysis and identify any
inconsistencies, make sense of grey areas and eventually validate the choices being made
 Data alone will never be as powerful as data combined with human experience,
observation and intuition
 Subject matter experts can offer insights into the business problem, identify
inconsistencies in the analysis and validate the choices being made

B, All about analytics thinking
1, Key data analyst skills
- Analytical skills: Qualities and characteristics associated with solving problems using facts
 Curiosity
 Understanding context
 Having a technical mindset: The ability to break things down into smaller steps or
pieces and work with them in an orderly and logical way
 Data design: How to organize information
 Data strategy: The management of the people, processes and tools used in data
analysis
2, Thinking analytically
- Analytical thinking: Identifying and defining a problem and then solving it by using data
in an organized, step-by-step manner.
- Key aspects of analytical thinking
 Visualization: The graphical representation of information (graphs, maps, designs…)
=> Visuals can help data analysts understand and explain information more effectively
 Strategy: Helps DAs see what they want to achieve with the data and how to get
there
 Problem orientation
 Correlation
 Big-picture and detail-oriented thinking
3, Core skills: Think in different ways
- What is the root of the problem?
 Ask “Why?” five times to reveal the root of the problem (the Five Whys technique)
- Where are the gaps in our process?
 Gap analysis: A method for examining and evaluating how a process works currently
in order to get where you want to be in the future
 Compare where you are now and where you want to be
 Identify the gaps
 Find solutions to bridge them
- What did we not consider before?
4, Using data to drive successful outcomes

C, The wonderful world of data
1, Data life cycle

Plan -> Capture -> Manage -> Analyze -> Archive -> Destroy

a, Plan
- Business decides:
 What kind of data is needed
 How to manage those data
 Who will take responsibility
 The optimal outcome
b, Capture
c, Manage
d, Analyze
e, Archive
f, Destroy: To protect the company’s private information as well as customers’ privacy

2, 6 phases of data analysis

Ask -> Prepare -> Process -> Analyze -> Share -> Act

a, Ask
- Define the problem to be solved and make sure that the DA fully understands stakeholders’
expectations
- Stakeholders: People who have invested time and resources into a project and are interested
in the outcome
- Define the problem: Look at the current state and identify how it’s different from the ideal
state
b, Prepare
c, Process
d, Analyze
e, Share – visualization
f, Act

3, Exploring DA’s tools
a, Spreadsheet
b, Query language
 A computer programming language that allows you to retrieve and manipulate data
from a database
 SQL
c, Data visualization
 Tableau, Looker
D, Setting up a data toolbox
1, Spreadsheet
First name   Last name   Age
Tony         Hill        36
Will         Stein       53
- Attribute: A characteristic or quality of data used to label a column in a table
- Observation/Record: All of the attributes for something contained in a row of a data table

2, SQL
- Query: The way we use SQL to communicate with the database
- Basic structure
SELECT
[choose the columns you want] ~ #2
FROM
[from the appropriate table] ~ #1
WHERE
[a certain condition is met] ~ #3
[genre = ‘Action’] – only choose rows where the genre is “Action”
=> The numbers give a suggested order for thinking through your SQL query. Start big (the table in the database) and
go small (specific condition) – primary key
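The SELECT / FROM / WHERE structure above can be tried out with a minimal sketch using Python's built-in sqlite3 module. The table name "movies", its columns and its rows are hypothetical and invented only for illustration.

```python
import sqlite3

# Hypothetical in-memory database; the "movies" table and its rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, genre TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        ("Movie A", "Action", 2001),
        ("Movie B", "Comedy", 2005),
        ("Movie C", "Action", 2010),
    ],
)

query = """
SELECT title, year      -- #2: choose the columns you want
FROM movies             -- #1: from the appropriate table
WHERE genre = 'Action'  -- #3: keep only rows that meet the condition
"""
action_movies = conn.execute(query).fetchall()
print(action_movies)
conn.close()
```

Only the two rows whose genre is 'Action' are returned; the Comedy row is filtered out by the WHERE clause.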

3, Data visualization

E, Discovering data career possibilities
1, The power of data in business
 Issue: A topic or subject to investigate
 Question: Designed to discover information
 Problem: An obstacle or complication that needs to be worked out
2, Fairness
 A company has a dominant number of male employees and few female employees. The company
wants to see which employees are doing well, so it starts gathering data on
employee performance and its own company culture
 The data shows that women aren’t succeeding as well as men in the company
=> Conclusion: The company should hire fewer female employees because women are doing poorly
=> Unfair because it ignores the other systemic factors that contributed to this problem
=> Fairness: The company culture is preventing women from succeeding, and the company
needs to address this problem to boost performance

CHAPTER 2: ASK QUESTIONS TO MAKE DATA-DRIVEN DECISIONS
A, Effective questions
1, Data in action (6 phases)
* Case: “Anywhere Gaming Repair” is a small business service provider that comes to you to
fix your broken video game systems or accessories. The owner wanted to expand his
business. He knew advertising was a proven way to get more customers, but he wasn’t sure
where to start
=> Key aspect to choose what kind of ad – Target audience
* Data analyst Actions
- Ask: Define the problem
 Zoom out to get an objective understanding of the real problem in
context
 Understand the owner’s needs: Attract new customers
- Prepare: Collect data
 Need a better understanding of the target audience
 Collect data from different advertising methods to determine which was the most
popular one with the company’s target audience
- Process: Clean the collected data to eliminate errors, inaccuracies, inconsistencies
 Questions
o What data errors might get in?
o How to clean the data to make sure it is consistent?
- Analyze: To find out 2 things
 Who is most likely to own a video gaming system? (18~30-year-olds)
 Where are these people most likely to see an advertisement?
- Share:
 The DA summarized the results of the analysis using clear and compelling visuals
 This helps stakeholders understand the solution to the original problem
- Act:
 The company worked with a local podcast production agency to create a 30-second ad
about its service
 The ad ran on podcasts for a month
2, Common problem types
- Problems: How to determine the best advertising method for a target audience

 Making prediction
 Categorizing things
 Spotting something unusual
 Identifying themes
 Discovering connections
 Finding patterns
3, SMART questions
- SPECIFIC questions
 Simple, significant, and focused on a single topic or a few closely related ideas
 “What percentage of kids achieve the recommended 60 minutes of physical activity at
least 5 days a week?”
- MEASURABLE questions
 Can be quantified and assessed
 “How many times was our video shared on social channels the first week it was
posted?”
- ACTION-ORIENTED questions
 Encourage change – questions you can act on
 “What design features will make our packaging easier to recycle?”
- RELEVANT questions
 Matter, are important, and have significance to the problem you’re trying to solve
- TIME-BOUND questions
 Specify the time to be studied
=> Questions should be open-ended. This is the best way to get responses that will help you
accurately qualify or disqualify potential solutions to your specific problem
=> Questions should be fair. Fair questions don’t create or reinforce bias – “Fairness also means
crafting questions that make sense to everyone. Questions are clear and have straightforward
wording that anyone can easily understand”
=> Unfair questions include leading questions (this product is great, isn’t it?) and questions
that make assumptions (what do you love most about our exhibition?)

B, Data-driven decisions
1, Use data to make better decisions
2, Qualitative data and Quantitative data
- Quantitative data: Specific and objective measures of numerical facts
 What?
 How many?
 How often ?
- Qualitative data: Subjective or explanatory measures of qualities and characteristics
 Why?
 Why are the numbers the way they are?
 Helps us add context to a problem
=> Patterns => Make change
3, Data representations
a, Report: A static collection of data given to stakeholders periodically
- Pros:
 High-level historical data
 Easy to design
 Pre-cleaned and sorted data
- Cons:
 Continual maintenance
 Less visually appealing
 Static

b, Dashboard
- Pros:
 Dynamic, automatic and interactive
 More stakeholder access
 Low maintenance
- Cons:
 Labor-intensive design
 Can be confusing
 Potentially uncleaned data

c, Pivot table
A data summarization tool that is used in data processing. Pivot tables are used to
count, summarize, sort, reorganize, group, total or average data stored in a database
=> Difference between Dashboards and Reports
 Dashboards monitor live, incoming data from multiple datasets and organize the
information into one central location. Reports are static collections of data
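The grouping, counting, totaling and averaging that a pivot table performs can be sketched in plain Python. This is only a rough illustration of the idea, not how a spreadsheet implements it; the sales records and field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records (store, product, amount); values are made up.
sales = [
    ("North", "Console", 300),
    ("North", "Controller", 50),
    ("South", "Console", 280),
    ("North", "Console", 320),
    ("South", "Controller", 45),
]

# Group by (store, product), then count and total each group,
# similar to a pivot table with stores as rows and products as columns.
totals = defaultdict(int)
counts = defaultdict(int)
for store, product, amount in sales:
    totals[(store, product)] += amount
    counts[(store, product)] += 1

for key in sorted(totals):
    average = totals[key] / counts[key]
    print(key, "count:", counts[key], "total:", totals[key], "average:", average)
```

In a spreadsheet the same summary comes from choosing the row field, the column field and the aggregation (count, sum or average) in the pivot table editor.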
4, Data and Metrics
- Metrics: Single, quantifiable types of data that can be used for measurement
 Metrics are data obtained through measurement, by setting up measurements, tracking and
evaluation in context.
 Eg: ROI (return on investment), CPO (cost per order)
- Metrics in Marketing
Metrics can be used to help calculate customer retention rate, or a company’s ability to keep its
customers over time

- Metric goal: A measurable goal, set by a company and evaluated using metrics
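As a small worked example of metrics such as ROI and CPO, the sketch below assumes the commonly used definitions ROI = (revenue - cost) / cost * 100 and CPO = total cost / number of orders. The notes do not give these formulas, and the campaign figures are hypothetical.

```python
def roi_percent(revenue: float, cost: float) -> float:
    """Return on investment as a percentage (assumed definition: net profit / cost)."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (revenue - cost) / cost * 100


def cost_per_order(total_cost: float, orders: int) -> float:
    """Cost per order (assumed definition: total cost / number of orders)."""
    if orders <= 0:
        raise ValueError("orders must be positive")
    return total_cost / orders


# Hypothetical advertising campaign figures, invented for illustration.
campaign_cost = 2000.0
campaign_revenue = 5000.0
campaign_orders = 100

roi = roi_percent(campaign_revenue, campaign_cost)
cpo = cost_per_order(campaign_cost, campaign_orders)
print(f"ROI: {roi:.1f}%  CPO: {cpo:.2f}")
```

A metric goal could then be stated against these numbers, for example a target ROI that the company checks after each campaign.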
5, Mathematical thinking
a, Size of data:
- Small data: Involves a small number of specific metrics over short periods of time
 Suitable for making day-to-day decisions
- Big data: Involves larger and less specific datasets and focuses on change over a long
period of time
 Effective for analyzing substantial decisions
=> Consider the size of the dataset in order to choose the right tools
* Bed Occupancy Rate (BOR): [(Total number of inpatient days for a period) * 100] /
(available beds * number of days in the period)
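The Bed Occupancy Rate formula above can be checked with a short sketch. The function and parameter names are my own, and the hospital figures are hypothetical.

```python
def bed_occupancy_rate(inpatient_days: float, available_beds: int, days_in_period: int) -> float:
    """BOR (%) = (total inpatient days * 100) / (available beds * days in the period)."""
    if available_beds <= 0 or days_in_period <= 0:
        raise ValueError("available_beds and days_in_period must be positive")
    return inpatient_days * 100 / (available_beds * days_in_period)


# Hypothetical hospital: 2,250 inpatient days, 100 beds, a 30-day period.
rate = bed_occupancy_rate(2250, 100, 30)
print(f"Bed occupancy rate: {rate:.1f}%")
```

With these figures the denominator is 100 * 30 = 3,000 bed-days, so 2,250 inpatient days gives an occupancy of 75%.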

6, Spreadsheet tasks
- Organize your data
 Pivot table – Sort and filter
- Calculate your data
 Formulas
 Functions

CHAPTER 3: PREPARE DATA FOR EXPLORATION
A, Data types and Structures
1, How data is collected
 Interviews
 Observations
 Forms
 Questionnaires
 Surveys
 Cookies
o Small files stored on computers that contain information about users
o Help inform advertisers about users’ personal interests and habits based on
online surfing, without personally identifying users
2, Decide what data to collect
a, Data collection considerations for the question “What causes increasing rush hour traffic?”
- How will the data be collected?
Use observations of traffic patterns to count the number of cars on city streets during particular times
- Choose data sources
 Observation => First-party data (data collected by an individual or group using their
own resources)
o DAs prefer this method as it shows exactly where the data came from
 Second-party data: Data collected by a group directly from its audience and then
sold
o Eg: An organization that led traffic pattern studies in the city
o Reliable as it comes from a source that has experience with traffic analysis
 Third-party data: Data collected from outside sources who did not collect it directly
- Decide what data to use:
- How much data to collect
 Use sample: A part of a population that is representative of the population
- Select the right data type
- Determine the time frame
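The idea of using a sample instead of the whole population can be illustrated with a simple random sample. This is only one sampling method, chosen here as an assumption; the population of 1,000 street-segment IDs and the sample size of 50 are hypothetical.

```python
import random

# Hypothetical population: IDs for 1,000 street segments in the city.
population = list(range(1, 1001))

# Fixed seed so the sketch is repeatable; a real study would not need this.
rng = random.Random(42)

# Simple random sample without replacement: every segment is equally likely to be picked.
sample = rng.sample(population, k=50)
print(len(sample), "segments sampled out of", len(population))
```

Random selection helps the sample stay representative, because no group of segments is favored by the person collecting the data.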
3, Discover Data format
Qualitative
- Subjective and explanatory measures of qualities and characteristics
- Can’t be counted
- Listed as a name, category or description
Quantitative
- Specific and objective measures of numerical facts
- Can be counted
- Expressed by numbers

Continuous
- Data that is measured and can have any numeric value
 Height of kids in third grade classes (53.2 inches,…)
 Temperature
 Runtime markers in a video
=> Cannot be counted
Discrete
- Data that is counted and has a limited number of values
 Number of people who visit a hospital daily (10, 34, 12, 45…)
 Room’s maximum capacity allowed
 Tickets sold in the current month
=> Can be counted

Nominal
- A type of qualitative data that isn’t categorized with a set order
- Used for characteristics, attributes and classification of subjects
- No fixed order
 First-time customer, returning customer, regular customer
 Gender
Ordinal
- A type of qualitative data with a set order or scale
- Ranked by level
 Movie rating: 1, 2, 3, 4 (stars)
 Ranked-choice voting selection (1st, 2nd, 3rd,…)
 Income level (Low, Middle, High)

Internal
- Data that lives inside a company’s own systems
=> More reliable and easier to collect
 Wages of employees
 Sales data by store location
External
- Data that lives outside of the company or organization
 National average wages for various positions throughout your organization

Structured
- Data organized in a certain format, like rows and columns
=> Store data in a structured way by: Spreadsheet, Relational database
=> Makes data easily searchable and more analysis-ready
Unstructured
- Data that isn’t organized in any easily identifiable manner
 Audio and video files
 Social media posts

Primary Secondary

4, Understanding structured data
- Data model: A model that is used for organizing data elements and how they relate to one
another
- Data elements: Pieces of information, such as people’s names, account numbers, addresses
=> Data models help to keep data consistent and provide a map of how data is organized

5, Data type
- Data type: A specific kind of data attribute that tells what kind of value the data is
- Data type in Spreadsheet
 Number
 Text or string
 Boolean: A data type that takes only two possible values, such as true/false
- Wide data: Data in which every data subject has a single row with multiple columns to
hold the values of various attributes of the subject
Country name 2010 2011 2012
Brazil -- -- --
Mexico -- -- --

- Long data: Data in which each row is one time point per subject, so each subject will have
data in multiple rows
Country name   Year   Value
Brazil         2010   --
Brazil         2011   --
VN             2010   --
VN             2011   --
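Converting between the two layouts is a common preparation step. The sketch below reshapes wide rows into long rows in plain Python; the helper name wide_to_long, the column names and the numeric values are hypothetical placeholders, since the tables above show no actual values.

```python
# Hypothetical wide data: one row per country, one column per year.
wide = [
    {"country": "Brazil", "2010": 1.0, "2011": 2.0},
    {"country": "Mexico", "2010": 3.0, "2011": 4.0},
]


def wide_to_long(rows, id_column):
    """Turn each (subject, year column) pair into its own row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the subject identifier is repeated on every long row
            long_rows.append({id_column: row[id_column], "year": int(column), "value": value})
    return long_rows


long_data = wide_to_long(wide, "country")
for record in long_data:
    print(record)
```

Each country now appears once per year, which matches the long layout: two wide rows with two year columns become four long rows.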

5, Bias
- Data bias: A type of error that systematically skews results in a certain direction
 Eg: In an interview, respondents are given too little time => Because they are rushed, they do not give
objective answers and instead give a subjective opinion in the moment
 Eg: Surveying more women than men
- Sampling bias: A sample that isn’t representative of the population as a whole
- Unbiased sampling: A sample that is representative of the population being measured
=> To check thoroughly, visualize the sample to see whether the selection is unbalanced
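One rough way to look for sampling bias is to compare each group's share of the sample with its share of the population. The sketch below does this for the "more women than men" example; the population shares, the survey counts and the 10-point threshold are all hypothetical assumptions.

```python
from collections import Counter

# Hypothetical population shares and survey respondents, invented for illustration.
population_share = {"female": 0.5, "male": 0.5}
respondents = ["female"] * 70 + ["male"] * 30

counts = Counter(respondents)
total = sum(counts.values())

# Gap between each group's share of the sample and its share of the population.
gaps = {}
for group, expected in population_share.items():
    observed = counts[group] / total
    gaps[group] = observed - expected
    print(f"{group}: sample {observed:.0%}, population {expected:.0%}")

# Flag groups whose share is off by more than an arbitrary 10-point threshold.
skewed = [group for group, gap in gaps.items() if abs(gap) > 0.10]
print("Possibly over- or under-represented:", skewed)
```

A bar chart of the same shares would show the imbalance visually, which is the kind of check the note above describes.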
6, Understand bias in data
- Observer bias: The tendency for different people to observe things differently
 Two doctors look at the exact same image of a brain scan. The image is
inconclusive, yet one doctor sees evidence of an abnormality in the brain. The other
doctor sees a healthy brain
- Interpretation bias: The tendency to always interpret ambiguous situations in a positive or
negative way
 Bias that arises because readers have different viewpoints and ways of thinking. For example, given the same
message from the boss, one employee reads it as an ordinary remark while another thinks
the boss is angry, depending on each person’s circumstances and mindset
- Confirmation bias: The tendency to search for or interpret information in a way that
confirms pre-existing beliefs
=> Paying attention only to a gut feeling (one’s own thinking), and so focusing entirely on signs that support it
while ignoring other factors
7, Identifying good data sources
ROCCC
Reliable – Original – Comprehensive – Current – Cited
8, Data ethics
- Aspects of data ethics
 Ownership
 Transaction transparency
 Consent
 Currency
 Privacy
 Openness – free access
