3331 Data Driven
The most reliable information that can be utilized in decision making includes
a. instinct
c. experience
d. data
e. None of these
f. All of these
Question 5/5
2 points
Data analytics programs must
f. All of these
e. None of these
True
False
Question 5/5
4 points
In order to have a high quality impact, we must focus data analysis on quantitative data only.
True
False
Question 5/5
5 points
Critics fault drug tests for destroying careers (false positives) and polygraphs for missing
potential criminals (false negatives), therefore such technology is of little value due to the errors.
True
False
Drawing from the above definitions, a practical approach to defining data is this: data is
numbers, characters, images, or other recorded material, in a form which can be
assessed to make a determination or decision about a specific action. Many believe that
data on its own has no meaning; only when interpreted does it take on meaning and
become information. By closely examining data we can find patterns and perceive
information, and that information can then be used to enhance knowledge (Denis Howe,
2005).
What has been evident in disciplines such as education, public health, nutrition,
nursing, and management, is now becoming evident in early care and education.
Organizations are recognizing that both the quality and the quantity of data are needed to:
establish baselines;
identify effective actions;
create goals and targets;
monitor progress;
and evaluate impacts.
Before you can present and interpret information, there must be a process for gathering
and sorting data. Once again, 1,099 is a number, and this number is, in fact, data. The
number 1,099 is a raw number; on its own it has no meaning.
Why the Soliloquy? Types of Data
In research circles there has been a long-term debate over the merits of quantitative
versus qualitative data. Key influences in this debate are how researchers
were taught, compounded by differences among individuals and their preference for
relating to numbers or to words. “Qualitative and quantitative methods are not simply
different ways of doing the same thing. Instead, they have different strengths and
logics and are often best used to address different questions and purposes” (Maxwell,
2005). That being said, there are other times when it makes sense to “have the best
of both worlds,” and to use a combination of quantitative and qualitative
data in order to credibly address a particular question or make well-informed decisions.
Types of Data
In order to have a high-quality impact, we must collect both types of data. There are
times when a quantitative approach will be better suited to the situation, and vice versa.
Qualitative data
Data represented in a verbal or narrative format is qualitative data. These
types of data are collected through focus groups, interviews, open-ended
questionnaire items, and other less structured situations. A simple way to look at
qualitative data is to think of it in the form of words.
Quantitative data
Quantitative data is data expressed in numerical terms; the numeric values may be
large or small, and a numerical value may correspond to a specific category or label.
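As a concrete sketch of the distinction (the survey responses below are hypothetical), one common bridge between the two types is "coding" qualitative text into quantitative counts:

```python
# Qualitative data: verbal responses to an open-ended questionnaire item
# (hypothetical examples).
responses = [
    "The staff were friendly and helpful",
    "Wait times were too long",
    "Friendly staff, but the wait was long",
]

# Coding: count how many responses mention each theme keyword, turning
# narrative (qualitative) data into numbers (quantitative) data.
themes = ["friendly", "wait"]
counts = {t: sum(t in r.lower() for r in responses) for t in themes}
print(counts)  # a quantitative summary derived from qualitative data
```

The theme keywords are an analyst's choice; real qualitative coding schemes are developed far more carefully.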
Data Strategies
There are a variety of strategies for quantitative and qualitative analyses, many of
which go well beyond the scope of an introductory course. Different strategies provide
data analysts with an organized approach to working with data; they enable the analyst
to create a “logical sequence” for the use of different procedures. Four examples of
strategies for quantitative analysis are described below:
Strategy: Visualizing the Data
Involves: Looking at the data to identify or describe “what’s going on,” creating an
initial starting point (baseline) for future analysis.
Reason(s): To begin understanding and utilizing the data.
Strategy: Estimation
Visualizing Data
To visualize data is literally to create and then consider a visual display of the data.
Technically, visualization is not analysis, nor is it a substitute for analysis. However,
visualizing data can be a useful starting point prior to the analysis of data.
Consider, for example, someone who is interested in understanding Migrant and
Seasonal Head Start from a national perspective. Specifically, someone might be
interested in the differences in funded enrollment across all MSHS grantees. Looking at
a random list of funded enrollment numbers (AED, 2006) gives us one perspective:
In random order, it is a bit difficult to get a handle on the data. By ranking the values
in order, however (note: this can be done either lowest to highest or highest to
lowest), we gain a more organized perspective on the data set:
Source: AED (2006) Permission granted to copy for non-commercial use
By creating and viewing a graphic display of the data, we get a “feel” for how MSHS
grantees differed in terms of their funded enrollment in 2004. In particular, the size differences
between the two largest grantees and the rest of the region stand out, as do the more
basic differences between “small” and “large” programs. Again, this visual display of
data is not a substitute for analysis, but it can often provide an effective foundation to
guide subsequent analyses.
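The rank-then-display idea above can be sketched in a few lines. The grantee names and enrollment figures below are invented for illustration; they are not the AED (2006) data:

```python
# Hypothetical funded-enrollment figures for a handful of grantees.
enrollment = {"Grantee A": 40, "Grantee B": 320, "Grantee C": 75,
              "Grantee D": 610, "Grantee E": 55}

# Step 1: rank the values (here, highest to lowest) to organize the data set.
ranked = sorted(enrollment.items(), key=lambda kv: kv[1], reverse=True)

# Step 2: a crude text "bar chart" (one # per 25 children) that makes the gap
# between the largest grantees and the rest visible at a glance.
for name, n in ranked:
    print(f"{name:10s} {n:4d} {'#' * (n // 25)}")
```

Even this rough display shows the "small" versus "large" program split without any formal analysis.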
Exploratory Analysis
Exploratory analysis entails looking at data when there is a low level of knowledge
about a particular indicator. It can also include examining relationships between
indicators, or investigating what causes a particular indicator to behave as it does.
Trend Analysis
The most general goal of trend analysis is to look at data over time. For example, to
discern whether a given indicator has increased or decreased over time, and if it has,
how quickly or slowly the increase or decrease has occurred.
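A minimal sketch of the trend idea, assuming a hypothetical indicator measured yearly: fit a least-squares line and read the direction and speed of change off its slope.

```python
# Hypothetical yearly values of an indicator.
years = [2001, 2002, 2003, 2004, 2005]
values = [110, 118, 121, 130, 134]

# Least-squares slope: covariance of (year, value) over variance of year.
n = len(years)
mean_x = sum(years) / n
mean_y = sum(values) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
         / sum((x - mean_x) ** 2 for x in years))

print(f"trend: {'increasing' if slope > 0 else 'decreasing'}, "
      f"about {slope:.1f} units per year")
```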
Estimation
Estimation procedures may be used with either quantitative or qualitative data.
Estimation is one of many tools used to assist planning for the future, and it works
well for forecasting: it combines information from different data sources to project
information that is not available in any one source by itself.
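As a toy illustration of combining sources (both figures below are invented), a count that no single source reports can be projected from a survey share and an administrative total:

```python
# Source 1 (a small survey): share of families needing services.
share_needing_services = 0.32
# Source 2 (administrative records): total number of families served.
total_families = 5_000

# Estimation: combine the two sources to project a figure neither
# source reports on its own.
estimated_families = round(share_needing_services * total_families)
print(estimated_families)
```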
From this perspective, data analysis can be viewed as a process that includes the
following key components:
Purpose
Questions
Data Collection
Data Analysis Procedures and Methods
Interpretation/identification of Findings
Writing, Reporting, and Dissemination; and Evaluation
Therefore, the simplest possible answer to the question “what is data analysis?” is
probably: IT DEPENDS. Rather than choose to present data analysis as either linear or
cyclical, this course presents both approaches. Hopefully, this choice will give you the
options and flexibility to make informed decisions, to utilize skills that you already have,
and to grow and develop the ability to use data and its analysis to support
program/agency purposes and goals.
Process Component #1. Purpose(s):
What Do We Do? & Why?
An effective data analysis process is based upon the nature and mission of the
organization as well as upon the skills of the team that is charged with the task of
collecting and using data for program purposes. Above all, an effective data analysis
process is functional – i.e., it is useful and adds value to organizational services and
individual practices. In some cases, the data analysis process can even be regarded as
fun.
Before effective data collection or analytical procedures can proceed, one or more
specific questions should be formulated. These questions serve as the basis for an
organized approach to making decisions: first, about what data to collect; and second,
about which types of analysis to use with the data.
Different types of questions require different types of data – which makes a difference
in collecting data. In any case, the selection of one or more specific questions allows the
process of data collection and analysis to proceed. Based on the nature and scope of
the questions (i.e., what is included in the question) programs can then create a plan to
manage and organize the next step in the process – data collection. Finally, by
formulating specific questions at the beginning of the process, programs are also in a
position to develop skills in evaluating their data analysis process in the future.
Data collection is a process in and of itself, in addition to being a part of the larger
whole. Data come in many different types and can be collected from a variety of
sources, including:
Observations
Questionnaires
Interviews
Documents
Tests
Others
In order to successfully manage the data collection process, programs need a plan that
addresses the following:
By creating a data collection plan, programs can proceed to the next step of the overall
process. In addition, once a particular round of data analysis is completed, a program
can then step back and reflect upon the contents of the data collection plan and identify
“lessons learned” to inform the next round.
Once data have been collected, the next step is to look at and to identify what is going
on – in other words, to analyze the data. Here, we refer to “data analysis” in a more
narrow sense: as a set of procedures or methods that can be applied to data that has
been collected in order to obtain one or more sets of results. A list of specific analytical
procedures and methods is provided below.
Because there are different types of data, the analysis of data can proceed on different
levels. The wording of the questions, in combination with the actual data collected,
has an influence on which procedure(s) can be used – and to what effect. The task of
matching one or more analytical procedures or methods with the collected data often
involves considerable thought and reflection. “Balancing the analytic alternatives calls
for the exercise of considerable judgment.” This is a rather elegant way of saying that
there are no simple answers on many occasions.
Process Component #5. Interpretation:
What Do The Results Mean?
Once a set of results has been obtained from the data, we can then turn to the
interpretation of the results. In some cases, the results of the data analysis speak for
themselves. For example, if a program’s teaching staff all have bachelor’s degrees, the
program can report that 100% of their teachers are credentialed. In this case, the
results and the interpretation of the data are (almost) identical.
However, there are many other cases in which the results of the data analysis and the
interpretation of those results are not identical. For example, if a program reports that
30% of the teachers have an AA degree, the interpretation of this result is not so clear-
cut. In this case, interpretation of the data involves two parts:
On a final note, it is important to state that two observers may legitimately make
different interpretations of the same set of data and its results. While there is no easy
answer to this issue, the best approach seems to be to anticipate that disagreements
can and do occur in the data analysis process. As programs develop their skills in data
analysis, they are encouraged to create a process that can accomplish dual goals:
Once data have been analyzed and an interpretation has been developed, programs
face the next tasks of deciding how to write, report, and/or disseminate the findings.
First, good writing is structured to provide information in a logical sequence. In turn,
good writers are strategic – they use a variety of strategies to structure their writing.
One strategy is to have the purpose for the written work to be clearly and explicitly laid
out. This helps to frame the presentation and development of the structure of the
writing. Second, good writing takes its audience into account. Therefore, good writers
often specify who their audience is in order to shape their writing. A final thought is to
look upon the writing/reporting tasks as opportunities to tell the story of the data you
have collected, analyzed, and interpreted. From this perspective, the writing is intended
to inform others of what you – the data analysis team – have just discovered.
The final step of the data analysis process is evaluation. Here, we do not refer to
conducting a program evaluation, but rather, an evaluation of the preceding steps of
the data analysis process. Here, program staff can review and reflect upon:
Purpose: Was the data analysis process consistent with federal standards and
other relevant regulations?
Questions: Were the questions worded in a way that was consistent with federal
standards, other regulations, and organizational purposes? Were the questions
effective in guiding the collection and analysis of data?
Data Collection: How well did the data collection plan work? Was there enough
time allotted to obtain the necessary information? Were data sources used that
were not effective? Do additional data sources exist that were not utilized? Did
the team collect too little data or too much?
Data Analysis Procedures or Methods: Which procedures or methods were
chosen? Did these conform to the purposes and questions? Were there additional
procedures or methods that could be used in the future?
Interpretation/Identification of Findings: How well did the interpretation
process work? What information was used to provide a context for the
interpretation of the results? Was additional relevant information not utilized for
interpretation? Did team members disagree over the interpretation of the data or
was there consensus?
Writing, Reporting, and Dissemination: How well did the writing tell the
story of the data? Did the intended audience find the presentation of information
effective?
Correlation - a statistical relation between two or more variables such that systematic
changes in the value of one variable are accompanied by systematic changes in the
other; a statistic representing how closely two variables co-vary; it can range from -1
(perfect negative correlation) through 0 (no correlation) to +1 (perfect positive
correlation) (Example: “What is the correlation between those two variables?”).
Data - a collection of facts from which conclusions may be drawn (Example: “Statistical
data”).
Difference - the number that remains after subtraction; the number that when added
to the subtrahend gives the minuend; a variation that deviates from the standard or
norm.
Median - the value below which 50% of the cases fall; relating to or situated in or
extending toward the middle.
Mode - the most frequent value of a random variable.
Trend - a general direction in which something tends to move (Example: “The trend of
the stock market”).
Validity - the quality of having legal force or effectiveness; the quality of being
logically valid.
Variance - the second moment around the mean; the expected value of the square of
the deviations of a random variable from its mean value; the quality of being subject
to variation.
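Several of the terms above can be computed directly. The sketch below uses Python's standard statistics module and the small score set (9, 10, 8, 7, 7) that appears in the pre-test questions later in this document:

```python
import statistics

scores = [9, 10, 8, 7, 7]  # the score set used in the pre-test questions

print("mean:    ", statistics.mean(scores))       # sum / count
print("median:  ", statistics.median(scores))     # middle value when sorted
print("mode:    ", statistics.mode(scores))       # most frequent value
print("variance:", statistics.pvariance(scores))  # mean squared deviation
print("range:   ", max(scores) - min(scores))     # highest minus lowest

# Pearson correlation computed by hand for two perfectly related variables.
x, y = [1, 2, 3, 4], [2, 4, 6, 8]
mx, my = statistics.mean(x), statistics.mean(y)
r = (sum((a - mx) * (b - my) for a, b in zip(x, y))
     / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5)
print("correlation:", r)  # +1.0: perfect positive correlation
```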
KPIs (Key Performance Indicators) are the vital navigation instruments that help decision-makers see
how well an organisation, business unit, project or individual is performing in relation to their strategic
goals and objectives.
KPI https://bernardmarr.com/kpi-library/
When defining a KPI:
Define the strategic goal; be clear about the audience, what questions the indicator
answers, and how it will be used.
Name the indicator; specify the data collection methodology, how it will be measured,
and the performance threshold (indicators are not targets); state how often data are
collected, the responsible person, and an expiry or revision date.
Consider how much the indicator is costing, how complete it is, and any unintended
consequences.
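The checklist above can be captured as a simple record so that each indicator is fully defined before any data are collected. The field names and the sample KPI below are our paraphrase of Marr's checklist, not an official schema:

```python
from dataclasses import dataclass

@dataclass
class KPI:
    # One field per item in the KPI design checklist (paraphrased).
    name: str
    strategic_goal: str
    audience: str
    question_answered: str
    collection_method: str
    frequency: str
    threshold: float   # a performance threshold, not a target
    owner: str
    revision_date: str

# A hypothetical, fully specified indicator.
kpi = KPI(
    name="Average checkout wait (minutes)",
    strategic_goal="Improve customer experience",
    audience="Store managers",
    question_answered="Are customers waiting too long?",
    collection_method="Timestamped queue logs",
    frequency="Weekly",
    threshold=5.0,
    owner="Operations lead",
    revision_date="2025-01-01",
)
print(kpi.name)
```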
Technologies: Hadoop
5 / 5 points
In order to have a high quality impact, we must focus data analysis on quantitative data only.
True
False
Question 5/5
2 points
Data that is represented either in a verbal or narrative format is qualitative data.
True
False
Question 5/5
3 points
Data analytics programs must
f. All of these
e. None of these
True
False
Question 5/5
7 points
Researchers often try to trick us with “story time,” as they drift effortlessly from data to
theory/assumptions.
True
False
Question 5/5
8 points
Statisticians are a curious lot: when given a vertical set of numbers, they like to look sideways.
True
False
Question 5/5
9 points
Business intelligence can be described as:
None of these
All of these
Question 5/5
10 points
Statistics provide decision makers with
a. a focus on the big picture
b. evidence to back up assertions
c. evidence of relationships and connections
d. an assurance of quality
e. None of these
f. All of these
Pre Test 2
5 / 5 points
Although regression is a good tool for prediction, there is no way to know how accurate the
prediction is.
True
False
Question 5/5
2 points
The most important result of visualizing data with charts, graphs, etc., is that the
visuals help us to:
a. what works?
c. what is correlated?
e. None of these
f. All of these
Question 5/5
4 points
Strategies such as employing unified data architecture (e.g., Verizon) have found success in
creating
a. data strategies
e. None of these
f. All of these
Question 5/5
5 points
Tera mining and big data analytics are commonly used by:
An important characteristic of any set of data is the variation in the data. In some data sets, the data
values are concentrated closely near the mean; in other data sets, the data values are more widely
spread out from the mean. The most common measure of variation, or spread, is the standard
deviation. The standard deviation is a number that measures how far data values are from their mean.
The standard deviation:
provides a numerical measure of the overall amount of variation in a data set; and
can be used to determine whether a particular data value is close to or far from the mean.
The standard deviation provides a measure of the overall variation in a data set
The standard deviation is always positive or zero. The standard deviation is small when the data are all
concentrated close to the mean, exhibiting little variation or spread. The standard deviation is larger
when the data values are more spread out from the mean, exhibiting more variation.
Suppose that we are studying the amount of time customers wait in line at the checkout at supermarket
A and supermarket B. The average wait time at both supermarkets is five minutes. At supermarket A, the
standard deviation for the wait time is two minutes; at supermarket B the standard deviation for the
wait time is four minutes.
Because supermarket B has a higher standard deviation, we know that there is more variation in the
wait times at supermarket B. Overall, wait times at supermarket B are more spread out from the
average; wait times at supermarket A are more concentrated near the average.
The standard deviation can be used to determine whether a data value is close to or far from the
mean
Suppose that Rosa and Binh both shop at supermarket A. Rosa waits at the checkout counter for seven
minutes and Binh waits for one minute. At supermarket A, the mean waiting time is five minutes and the
standard deviation is two minutes. The standard deviation can be used to determine whether a data
value is close to or far from the mean.
Seven is two minutes longer than the average of five; two minutes is equal to one standard
deviation.
Rosa's wait time of seven minutes is two minutes longer than the average of five minutes.
Rosa's wait time of seven minutes is one standard deviation above the average of five minutes.
One is four minutes less than the average of five; four minutes is equal to two standard
deviations.
Binh's wait time of one minute is four minutes less than the average of five minutes.
Binh's wait time of one minute is two standard deviations below the average of five minutes.
A data value that is two standard deviations from the average is just on the borderline for what
many statisticians would consider to be far from the average. Considering data to be far from
the mean if it is more than two standard deviations away is more of an approximate "rule of
thumb" than a rigid rule. In general, the shape of the distribution of the data affects how much
of the data is further away than two standard deviations.
https://openstax.org/details/books/introductory-statistics
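The supermarket arithmetic above reduces to a z-score: the number of standard deviations a value sits from the mean. A quick check of Rosa's and Binh's wait times:

```python
# Supermarket A's wait-time distribution, from the example above.
mean_wait = 5.0   # minutes
stdev_wait = 2.0  # minutes

def z(value, mean, stdev):
    """Number of standard deviations `value` lies from `mean` (z-score)."""
    return (value - mean) / stdev

rosa = z(7, mean_wait, stdev_wait)  # one standard deviation above the mean
binh = z(1, mean_wait, stdev_wait)  # two standard deviations below the mean
print(rosa, binh)
```

Binh's -2.0 sits right on the informal "far from the mean" borderline described above.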
In his book Super Crunchers, Ayres provides numerous examples of how the statistical procedure called
regression is being used on very large data sets (Tera mining) to estimate the various causal factors that
influence a single variable of interest. For those who are a bit nervous about math, relax: we are not
going to go into the detail of doing a regression analysis. For this reading assignment, you will review
examples of how algorithms and the associated statistical analysis are being used to predict outcomes.
These organizations are using data to establish a competitive advantage. Many of these organizations
are also gaining additional capacity and financial savings by hosting these data using cloud services.
Zillow - is a leading real estate and rental marketplace that provides its consumers with key data about
their home and connects them with local professionals who can help if they want to sell their home. This
company crunches a data set of over 110 million home prices to help buyers and sellers price their
homes. They use regression techniques as the basis of their predictions. Zillow launched in 2006 and is
headquartered in Seattle, Washington. If you want to check on the estimated value of a home, go to
zillow.com and input the address.
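A toy, single-variable version of the regression idea (the square footage and price numbers are invented; Zillow's actual models use many variables and vastly more data):

```python
# Made-up sale data: square footage vs. price.
sqft  = [1000, 1500, 2000, 2500]
price = [200_000, 290_000, 410_000, 500_000]

# One-variable least-squares regression: slope and intercept.
n = len(sqft)
mx = sum(sqft) / n
my = sum(price) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(sqft, price))
         / sum((x - mx) ** 2 for x in sqft))
intercept = my - slope * mx

# Use the fitted line to predict the price of an 1,800 sq ft home.
predicted = slope * 1800 + intercept
print(round(predicted))
```

The point is only the mechanism: past data fixes the line, and the line then prices homes it has never seen.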
Capital One - Do you remember the commercials for Capital One where the actors are asking, "What's in
your wallet?" The key message was that the contents (of your wallet or purse) represents financial
security, purchase power, and prosperity. Capital One is on a mission to help their customers succeed by
bringing ingenuity, simplicity, and humanity to banking by harnessing the power of information and
technology. One example of how they use algorithms to improve the success of their customers and also
improve their bottom line involves customer service calls. Here is what Capital One does with customer
calls. As soon as the customer calls, the service representative sees key information about this customer
including a list of products or services that have been predicted based on specific customer
characteristics. The representative can solve the customer's issue (say, a request to reverse a late fee)
and then offer additional products or services. It is actually working: according to Ayres,
Capital One now makes more than a million sales a year through its customer-service marketing
channel.
Jo-Ann Fabrics - A popular fabrics store that, like many brick-and-mortar stores, has moved to offering its
products on the web through JoAnn.com. Because the site has over a million unique visitors a month, the
company is able to use regression techniques to test different promotions. One promotion was to buy two
sewing machines and save 10 percent. At first the employees of Jo-Ann Fabrics thought this was a silly idea:
who would ever want two sewing machines? Crazy, right? What they found was that this was a
very successful marketing campaign, generating the highest returns that year. Why? The customers
got their friends to join them in the purchase; the discount was turning their customers into sales
agents! This randomized testing was responsible for increasing revenue per visitor by 209%. Isn't that
"sew" impressive!
Continental Airlines - Uses regression techniques to improve customer loyalty. If Continental knows a
customer has had a bad experience because of a flight delay or cancellation, it contacts that customer
and tries different approaches to keep the customer for future flights. For example, one group was sent
a letter apologizing for the delay. A second group received the letter of apology plus a trial
membership in the Continental President's Club. For a third group, Continental did nothing. After eight
months, they looked at the results: the groups that got letters spent 8% more on tickets in the next
year, which amounted to an extra $6 million in revenue. They also found that many in the group given the
President's Club trial renewed their membership after the trial period (thus increasing revenue from
club fees).
For the last example, we are going to look at governments and the use of randomization.
Remember that you can make the most definitive causal conclusion ONLY when you use randomized
testing. Without randomization, validity and evidence of causality suffer. For instance, if you would like
to evaluate the impact of a new incentive or policy, your data must include recipients/participants of the
new program as well as those who did not participate (the control group). Without randomization,
confounding factors kick in (e.g., individual bias or starting differences between the groups), so you
cannot tell whether the difference in outcomes was due to the new program or something else.
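A minimal sketch of random assignment, the step that licenses the causal comparison (the participant IDs are hypothetical, and the seed is fixed only so the sketch is reproducible):

```python
import random

participants = list(range(20))  # hypothetical participant IDs

rng = random.Random(42)         # fixed seed for reproducibility
rng.shuffle(participants)       # chance, not choice, decides the groups
treatment = participants[:10]   # receives the new program
control = participants[10:]     # does not

print(len(treatment), len(control))
```

Because only chance separates the two groups, any later difference in outcomes can be attributed to the program rather than to pre-existing differences.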
Progresa Program in Mexico - In 1997, Mexico began a randomized experiment involving 24,000
households in 506 villages. Mothers assigned to the Progresa group were eligible for three years of
cash grants and supplements if their children had regular healthcare visits and attended school. The
results, compared to the non-Progresa population (the control group), showed that boys attended school
10% more and girls 20% more. The children in Progresa were healthier, too (12% lower incidence of
serious illness). The success of the program has resulted in its expansion.
5 / 5 points
Tera mining and big data analytics are commonly used by:
b. weather forecasters
d. Internet based-businesses
True
False
Question 5/5
3 points
Organizations must have the latest technology to be successfully utilizing and leveraging data in
decision making.
True
False
Question 5/5
4 points
Preference databases are powerful ways to improve personal decision making.
True
False
Question 5/5
5 points
The most effective measures for managing effectiveness of long lines and freeway traffic aim to
take _______ out of the system.
a. averages
b. means
c. standard deviations
d. variability
e. None of these
f. All of these
View question 5 feedback
5 / 5 points
The father of modern statistics is
a. Thomas Bayes
b. Francis Galton
c. James Lind
d. Ronald Fisher
e. none of these
f. all of these
Question 5/5
7 points
Statisticians focus on average values, not variations.
True
False
Question 5/5
8 points
Utilizing visual representations of data has been likened to:
a. new perspectives
Question 5/5
9 points
It is sometimes possible for firms to make more accurate predictions about how a consumer will
behave than the consumer could ever make themselves.
True
False
Question 5/5
10 points
Individuals tasked with running hyper-controlled experiments designed to test the effectiveness
of creative alternatives such as webpage design or book titles are known as:
d. data analysts
a. super crunchers
b. graphic designers
c. usability experts
Pre test 3
5 / 5 points
Choose one that best describes the following: Large set of objects, which is of interest as a
whole.
Population
Sample
Data collection
Data analysis
Question 5/5
2 points
The distribution of data can be known by:
a. number lines
b. plotting data
c. stacking data in line plots
d. none of these
e. all of these
Question 0/5
3 points
An example of qualitative data would be:
c. sunny day
d. none of these
e. all of these
Question 5/5
4 points
Standard terms that everyone on the team should understand include
a. mean
b. standard deviation
c. benchmarking
d. counts
e. None of these
f. All of these
Question 5/5
5 points
Decisions should be made and communicated to the group, only after:
5 / 5 points
An example of qualitative data would be:
c. sunny day
d. none of these
e. all of these
Question 5/5
2 points
Match the following Key Performance Indicators (KPIs) with the question each attempts to
answer:
True
False
Question 5/5
4 points
Complex data gathering and analysis allows an organization to examine
a. their current strategies
c. future goals
d. decision making
e. None of these
f. All of these
Question 5/5
5 points
Using the data set of scores 9, 10, 8, 7, & 7. What is the range?
3.5
802
none of these
all of these
Question 5/5
6 points
Match the following terms and definitions:
Data based decision making is cyclical, therefore it has a clear starting and ending and should be
scheduled rigorously.
True
False
Question 5/5
8 points
Using the data set of scores 9, 10, 8, 7, & 7. What is the mean score?
8
8.2
none of these
all of these
Question 5/5
9 points
Using the data set of scores 9, 10, 8, 7, & 7. What is the median score?
10
none of these
all of these
Question 0/5
10 points
Variables can be described as
a. quantitative data
b. measures or scales
c. distributions of measures
e. none of these
f. all of these
Pre test 4
5 / 5 points
Match the following Key Performance Indicators (KPIs) with the
question each attempts to answer:
a. Pie Charts
b. Column Charts
c. Bar Charts
e. none of these
f. all of these
Question 5/5
3 points
When multiple factors are contributing to a measure, it is impossible to
untangle them.
True
False
Question 5/5
4 points
Data and analytics have been shown to have the power to transform all kinds
of organizations, including units of the criminal justice system.
True
False
Question 5/5
5 points
Subjectivity must be relied upon in highly sensitive or risky decisions
True
False
A causal relation between two events exists if the occurrence of the first causes the
other. The first event is called the cause and the second event is called the effect. A
correlation between two variables does not imply causation. On the other hand, if
there is a causal relationship between two variables, they must be correlated.
Erin Palmer, a business intelligence and data mining expert, posits the following
benefits of using data to make decisions:
5 / 5 points
Today’s managers face fewer complexities than ever before due to all the easy software that can
calculate all important measures for them.
True
False
Question 5/5
2 points
Making decisions based on intuition and instinct is just as valuable as making decisions based on
data if the decision maker has enough experience.
True
False
Question 5/5
3 points
How racial gaps in test scores should be interpreted is an extremely challenging and contentious
matter for all concerned.
True
False
Question 5/5
4 points
There are no online resources to help me gain a better understanding of Excel.
True
False
Question 5/5
5 points
Match the following Key Performance Indicators (KPIs) with the question each attempts to
answer:
Match the following Key Performance Indicators (KPIs) with the question each attempts to
answer:
a. Pie Charts
b. Column Charts
c. Bar Charts
e. none of these
f. all of these
Question 5/5
8 points
When multiple factors are contributing to a measure, it is impossible to untangle them.
True
False
Question 5/5
9 points
Subjectivity must be relied upon in highly sensitive or risky decisions
True
False
Question 5/5
10 points
Data and analytics have been shown to have the power to transform all kinds of organizations,
including units of the criminal justice system.
True
False
Pre test 5
5 / 5 points
Evidence has shown the following actions to be ineffective, yet many organizations still follow
this line of action:
f. All of these
e. None of these
True
False
Question 5/5
3 points
The issue of false negatives has largely been ignored by the media.
True
False
Question 5/5
4 points
Critics fault drug tests for destroying careers (false positives) and polygraphs for missing
potential criminals (false negatives), therefore such technology is of little value due to the errors.
True
False
Question 5/5
5 points
There are few reasons that a company or organization should measure and track employees'
performance. Some employees just need to be fired.
True
False
For your Culminating Project, you will be creating a variety of pivot tables from sample
data, creating charts and graphs from the pivot tables, and then building a dashboard
using the charts. Please watch each video to learn the process that you will use in your
Culminating Project.
o Part 1: Building a Pivot Table (14:47) https://www.youtube.com/watch?v=9NUjHBNWe9M
o Part 2: Using a Pivot Table to Create Charts (14:47) https://www.youtube.com/watch?v=g530cnFfk8Y
o Part 3: Building a Dashboard (15:19) https://www.youtube.com/watch?v=FyggutiBKvU
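Before watching the videos, it may help to see what a pivot table actually computes. The sketch below groups a tiny set of made-up sales records (the region/product/amount fields are hypothetical, not taken from the course data) by one field for rows and another for columns, summing a value field, which is the core operation Excel performs:

```python
from collections import defaultdict

# Hypothetical sample records, standing in for the course's sample data.
records = [
    {"region": "East", "product": "Widget", "amount": 100},
    {"region": "East", "product": "Gadget", "amount": 150},
    {"region": "West", "product": "Widget", "amount": 200},
    {"region": "West", "product": "Widget", "amount": 50},
]

# Pivot: rows = region, columns = product, values = sum of amount.
pivot = defaultdict(lambda: defaultdict(int))
for rec in records:
    pivot[rec["region"]][rec["product"]] += rec["amount"]

for region, by_product in sorted(pivot.items()):
    print(region, dict(by_product))
```

Each cell of the resulting table aggregates every record that matches its row and column labels, which is why the two West "Widget" rows collapse into a single 250 total.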
Read Trust the Evidence, Not Your Instincts and note how the lack of
evidence in the interview process has contributed to poor recruitment
results:
Sampling Errors
When you analyze data, it is important to be aware of sampling errors and non-
sampling errors. The actual process of sampling causes sampling errors. For example,
the sample may not be large enough. Factors not related to the sampling process
cause non-sampling errors. A defective counting device can cause a non-sampling
error.
In reality, a sample will never be exactly representative of the population so there will
always be some sampling error. As a rule, the larger the sample, the smaller the
sampling error.
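The rule of thumb above can be checked with a short simulation. This sketch (the population parameters and sample sizes are invented for illustration) draws repeated samples of two different sizes from the same population and compares how far, on average, each sample's mean lands from the true population mean:

```python
import random

random.seed(0)
population = [random.gauss(50, 10) for _ in range(100_000)]
true_mean = sum(population) / len(population)

def avg_sampling_error(n, trials=200):
    """Average absolute gap between sample mean and population mean."""
    errs = []
    for _ in range(trials):
        sample = random.sample(population, n)
        errs.append(abs(sum(sample) / n - true_mean))
    return sum(errs) / trials

small, large = avg_sampling_error(25), avg_sampling_error(2500)
# The 2500-item samples miss the true mean by far less on average.
print(round(small, 2), round(large, 2))
```

Note that even the large samples still show some error; a bigger sample shrinks sampling error but never eliminates it, exactly as the passage states.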
Critical Evaluation
We need to evaluate the statistical studies we read about critically and analyze them before
accepting the results of the studies. Common problems to be aware of include:
Problems with samples: A sample must be representative of the population. A sample that is
not representative of the population is biased. Biased samples that are not representative of the
population give results that are inaccurate and not valid.
Self-selected samples: Responses only by people who choose to respond, such as call-in
surveys, are often unreliable.
Sample size issues: Samples that are too small may be unreliable. Larger samples are better, if
possible. In some situations, having small samples is unavoidable and can still be used to draw
conclusions. Examples: crash testing cars or medical testing for rare conditions
Undue influence: collecting data or asking questions in a way that influences the response
Non-response or refusal of subject to participate: The collected responses may no longer be
representative of the population. Often, people with strong positive or negative opinions may
answer surveys, which can affect the results.
Causality: A relationship between two variables does not mean that one causes the other to
occur. They may be related (correlated) because of their relationship through a different
variable.
Self-funded or self-interest studies: A study performed by a person or organization in order to
support their claim. Is the study impartial? Read the study carefully to evaluate the work. Do not
automatically assume that the study is good, but do not automatically assume the study is bad
either. Evaluate it on its merits and the work done.
Misleading use of data: improperly displayed graphs, incomplete data, or lack of context
Confounding: When the effects of multiple factors on a response cannot be separated.
Confounding makes it difficult or impossible to draw valid conclusions about the effect of each
factor.
5 / 5 points
A hypothesis-based approach to decision making and problem solving includes
f. none of these
g. all of these
Question 5/5
2 points
Reliance on indirect evidence and the sway of false negatives relative to false positives tends to
produce lots of false alarms
True
False
Question 5/5
3 points
The growing pile of studies on the human and financial costs of employee disengagement,
management distrust, poor group dynamics, faulty incentive schemes, and other preventable
damage suggests a need for an evidence-based management movement.
True
False
Question 5/5
4 points
People analytics can be described as
True
False
Question 5/5
6 points
There are few reasons that a company or organization should measure and track employees'
performance. Some employees just need to be fired.
True
False
Question 5/5
7 points
“Opinion equals myth, insight equals myth-busted.” This quote from Google proves that big
companies do not want employees to have opinions on anything.
True
False
Question 5/5
8 points
The Analytic Value Chain can be described as a progressive movement from opinion, to data, to
metrics, analysis, insight, and finally, to action.
True
False
Question 5/5
9 points
Critics fault drug tests for destroying careers (false positives) and polygraphs for missing
potential criminals (false negatives), therefore such technology is of little value due to the errors.
True
False
Question 5/5
10 points
Match the following Key Performance Indicators (KPIs) with the question each attempts to
answer:
Pre test 6
5 / 5 points
If a lot of people come to the same conclusions about something,
e. None of these
f. All of these
Question 5/5
2 points
The statistical testing framework demands a disbelief in miracles.
True
False
Question 5/5
3 points
Statistical thinking is central to the scientific method, which requires
e. None of these
f. All of these
Question 5/5
4 points
Planning out a research study must also include all the technological aspects of data management
below, EXCEPT:
b. Kaiser Fung
c. Tyrion Lannister
e. None of these
f. All of these
A statistician will make a decision about these claims. This process is called
"hypothesis testing." A hypothesis test involves collecting data from a sample and
evaluating the data. Then, the statistician makes a decision as to whether or not there is
sufficient evidence, based upon analyses of the data, to reject the null hypothesis.
https://openstax.org/details/books/introductory-statistics
True
False
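The hypothesis-testing steps described above (collect a sample, compute a test statistic, decide whether the evidence is strong enough to reject the null hypothesis) can be sketched in a few lines. The sample values and the null-hypothesis mean of 100 below are invented for illustration, and the p-value uses a normal approximation rather than the more exact t-distribution:

```python
from statistics import NormalDist, mean, stdev

# Hypothetical sample; H0: the true mean is 100.
sample = [102.1, 99.8, 104.5, 101.2, 103.7,
          100.9, 105.0, 102.8, 101.5, 103.3]
n = len(sample)
h0_mean = 100.0

# Test statistic: how many standard errors the sample mean sits from H0.
t = (mean(sample) - h0_mean) / (stdev(sample) / n ** 0.5)

# Two-sided p-value under a normal approximation.
p = 2 * (1 - NormalDist().cdf(abs(t)))

# Decision rule: reject H0 when the data would be very unlikely under it.
alpha = 0.05
print(round(t, 2), "reject H0" if p < alpha else "fail to reject H0")
```

The decision in the last step is exactly the statistician's judgment the passage describes: a small p-value says the observed sample would be surprising if the null hypothesis were true.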
Question 5/5
3 points
If a study is obviously about stigmatizing conditions, illegal activities, or life experiences that
subjects may not want others to know about, it is better not to pursue the research project, as
there is no way to avoid violating the subjects' right to privacy.
True
False
Question 5/5
4 points
What does the visionary Barnett say about two of our biggest fears?
a. irregularity in data
e. None of these
f. All of these
Question 5/5
6 points
Based on statistical testing, we should
f. All of these
e. None of these
Question 5/5
8 points
Retention requirements for data typically derive from the same sources that mandate data
sharing. All sources below may dictate how data should or should not be shared, EXCEPT:
Journals
Question 5/5
9 points
The statistical testing framework demands a disbelief in miracles.
True
False
Question 5/5
10 points
If a lot of people come to the same conclusions about something,