
5 / 5 points

The most reliable information that can be utilized in decision making includes

f. All of these

e. None of these

d. data

c. experience

b. prior system errors

a. instinct
Question 5/5
2 points
Data analytics programs must

f. All of these

e. None of these

d. create accountability for expected results

c. be standardized across all organizations

b. can start small, but must grow fast

a. start at the top


Question 5/5
3 points
Visualizing data is to literally create and then consider a visual display of data. Technically, it is
a substitute for data analysis.

True

False
Question 5/5
4 points
In order to have a high quality impact, we must focus data analysis on quantitative data only.

True

False
Question 5/5
5 points
Critics fault drug tests for destroying careers (false positives) and polygraphs for missing
potential criminals (false negatives), therefore such technology is of little value due to the errors.

True

False

The Merriam-Webster Online Dictionary defines data as the following:

 factual information (as measurements or statistics) used as a basis for
reasoning, discussion, or calculation (e.g., “the data is plentiful and easily
available” -- H. A. Gleason, Jr.; “comprehensive data on economic growth
have been published” -- N. H. Jacoby).
 information output by a sensing device or organ that includes both useful and
irrelevant or redundant information and must be processed to be meaningful.
 information in numerical form that can be digitally transmitted or processed.

Taking from the above definitions, a practical approach to defining data is that data is
numbers, characters, images, or other method of recording, in a form which can be
assessed to make a determination or decision about a specific action. Many believe that
data on its own has no meaning, only when interpreted does it take on meaning and
become information. By closely examining data we can find patterns to perceive
information, and then information can be used to enhance knowledge (Denis Howe,
2005).

The number 1,099 is one example of data.

What has been evident in disciplines such as education, public health, nutrition,
nursing, and management, is now becoming evident in early care and education.
Organizations are recognizing that quality and quantity of data are needed to:

 establish baselines;
 identify effective actions;
 create goals and targets;
 monitor progress;
 and evaluate impacts.

Before you can present and interpret information, there must be a process for gathering
and sorting data. Once again, 1,099 is a number - and this number is, in fact, data. The
number 1,099 is a raw number - on its own it has no meaning.
Why the Soliloquy?
In research circles there has been a long-term debate over the merits of Quantitative
versus Qualitative data. Key influences in this debate are based upon how researchers
were taught, compounded by differences among individuals and their preference in
relating to numbers or to words. “Qualitative and quantitative methods are not simply
different ways of doing the same thing. Instead, they have different strengths and
logics and are often best used to address different questions and purposes (Maxwell,
2005).” That being said, there are other times when it makes sense to “have the best
of both worlds,” and to use a combination of some quantitative and some qualitative
data in order to credibly address a particular question or make well informed decisions.

Types of Data
In order to have a high quality impact, we must collect both types of data. There are
times when a quantitative approach will be better suited to the situation and vice versa.

Qualitative data

Data that is represented either in a verbal or narrative format is qualitative data. These
types of data are collected through focus groups, interviews, open-ended
questionnaire items, and other less structured situations. A simple way to look at
qualitative data is to think of qualitative data in the form of words.

Quantitative data

Quantitative data is data that is expressed in numerical terms, in which the numeric
values could be large or small. Numerical values may correspond to a specific category
or label.
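
To make the distinction concrete, here is a tiny sketch in Python (the record and all of
its values are invented) showing quantitative fields stored as numbers and qualitative
fields stored as words:

    # One invented observation mixing the two data types.
    observation = {
        "wait_minutes": 7,       # quantitative: a measured number
        "rain_inches": 1.56,     # quantitative: a numeric measurement
        "sky": "sunny",          # qualitative: a descriptive category
        "comment": "Staff were friendly and the line moved quickly.",  # qualitative narrative
    }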

Data Strategies

There are a variety of strategies for quantitative and qualitative analyses, many of
which go well beyond the scope of an introductory course. Different strategies provide
data analysts with an organized approach to working with data; they enable the analyst
to create a “logical sequence” for the use of different procedures. Four examples of
strategies for quantitative analysis are described below:
Strategy: Visualizing the Data

Involves: Creating a visual “picture” or graphic display of the data.


Reason(s): a way to begin the analysis process; or as an aid to the
reporting/presentation of findings.

Strategy: Exploratory Analyses

Involves: Looking at data to identify or describe “what’s going on” – creating an initial
starting point (baseline) for future analysis.
Reason(s): to begin understanding and utilizing the data.

Strategy: Trend Analysis

Involves: Looking at data collected at different periods of time.


Reason(s): to identify and interpret (and, potentially, estimate) change.

Strategy: Estimation

Involves: Using actual data values to predict a future value.


Reason(s): to plan and make informed decisions.

Next, we will look at each of these strategies in more detail.

Visualizing Data

Visualizing data is to literally create and then consider a visual display of data.
Technically, it is not analysis, nor is it a substitute for analysis. However, visualizing
data can be a useful starting point prior to the analysis of data.
Consider, for example, someone who is interested in understanding Migrant and
Seasonal Head Start from a national perspective. Specifically, someone might be
interested in the differences in funded enrollment across all MSHS grantees. Looking at
a random list of funded enrollment numbers (AED, 2006) gives us one perspective:

Source: AED (2006) Permission granted to copy for non-commercial use

In random order, it is a bit difficult to get a handle on the data. By ranking the values in
order, however (note: this can be done either lowest to highest or highest to lowest),
we gain a more organized perspective of the data set:
Source: AED (2006) Permission granted to copy for non-commercial use

By creating and viewing a graphic display of the data, we can begin to get a “feel” for
how MSHS grantees differed in terms of their funded enrollment in 2004 and how that
enrollment varies across the region. In particular, the size differences between the two
largest grantees and the rest of the region stand out, as do the more basic differences
between “small” and “large” programs. Again, this visual display of data is not a
substitute for analysis, but it can often provide an effective foundation to guide
subsequent analyses.
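
As a rough sketch of how such a display might be produced (the enrollment numbers
below are invented, since the AED table is not reproduced here), a ranked bar chart
takes only a few lines of Python:

    # Hypothetical funded-enrollment figures for ten grantees.
    import matplotlib.pyplot as plt

    enrollment = [640, 3380, 190, 560, 2110, 320, 450, 120, 410, 280]
    ranked = sorted(enrollment, reverse=True)  # ranking organizes the data

    plt.bar(range(1, len(ranked) + 1), ranked)
    plt.xlabel("Grantee (ranked by size)")
    plt.ylabel("Funded enrollment")
    plt.title("Funded enrollment across grantees (hypothetical data)")
    plt.show()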

Exploratory Analysis

Exploratory analysis entails looking at data when there is a low level of knowledge
about a particular indicator. It can also include exploring the relationships between
indicators and/or the possible causes of a particular indicator.

Trend Analysis

The most general goal of trend analysis is to look at data over time - for example, to
discern whether a given indicator has increased or decreased over time and, if it has,
how quickly or slowly the increase or decrease has occurred.

Estimation

Estimation procedures may be used when working with either quantitative or qualitative
data. Estimation is one of many tools used to assist planning for the future, and it
works well for forecasting. Estimation combines information from different data sources
to project information not available in any one source by itself.
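
A minimal sketch of both strategies on an invented five-year series: fitting a straight
line gives the trend (the slope), and extrapolating that line gives an estimate for a
future year:

    import numpy as np

    years = np.array([2001, 2002, 2003, 2004, 2005])
    values = np.array([410, 435, 470, 505, 540])     # an indicator measured yearly

    slope, intercept = np.polyfit(years, values, 1)  # least-squares trend line
    print(f"Trend: about {slope:.1f} units of change per year")

    estimate_2006 = slope * 2006 + intercept         # extrapolate one year ahead
    print(f"Estimated 2006 value: {estimate_2006:.0f}")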

The “Problem” with Data Analysis


What does ‘data analysis’ mean? Does it refer to one method or many? A collection of
different procedures? Is it a process? If so, what does that mean? More importantly, can
staff – without a background in math or statistics – learn to identify and use data
analysis in their work? (P.S. - the answer to the last question is Yes! – assuming a
minimum investment of time, effort, and practice). Data analysis can refer to a variety
of specific procedures and methods. However, before one can effectively use these
procedures and methods, it is important to see data analysis as part of a process. Data
analysis involves goals; relationships; decision making; and ideas, in addition to
working with the actual data itself. Simply put, data analysis includes ways of working
with information (data) to support the work, goals and plans of your program or
agency.

From this perspective, data analysis can be viewed as a process that includes the
following key components:

 Purpose
 Questions
 Data Collection
 Data Analysis Procedures and Methods
 Interpretation/identification of Findings
 Writing, Reporting, and Dissemination; and Evaluation 

Data Analysis as a Linear Process


A strictly linear approach to data analysis is to work through the components in order,
from beginning to end. A possible advantage of this approach is that it is structured and
organized, as the steps of the process are arranged in a fixed order. In addition, this
linear conceptualization of the process may make it easier to learn. A possible
disadvantage is that the step-by-step nature of the decision making may obscure or
limit the power of the analyses – in other words, the structured nature of the process
limits its effectiveness.

Source: AED (2006) Permission granted to copy for non-commercial use

Data Analysis as a Cycle


A cyclical approach to data analysis provides much more flexibility to the nature of the
decision making and also includes more and different kinds of decisions to be made. In
this approach, different components of the process can be worked on at different times
and in different sequences – as long as everything comes “together” at the end. A
possible advantage of this approach is that program staff are not “bound” to work on
each step in order. The potential exists for program staff to “learn by doing” and to
make improvements to the process before it is completed.

Source: AED (2006) Permission granted to copy for non-commercial use

Therefore, the simplest possible answer to the question, what is data analysis, is
probably: IT DEPENDS. Rather than choose to present ‘data analysis’ as either linear or
cyclical, this course presents both approaches. Hopefully, this choice will give you the
options and flexibility to make informed decisions, to utilize skills that you already have,
and to grow and develop the ability to use data and its analysis to support
program/agency purposes and goals.


Process Component #1. Purpose(s):
What Do We Do? & Why?

An effective data analysis process is based upon the nature and mission of the
organization as well as upon the skills of the team that is charged with the task of
collecting and using data for program purposes. Above all, an effective data analysis
process is functional – i.e., it is useful and adds value to organizational services and
individual practices. In some cases, the data analysis process can even be regarded as
fun.

Process Component #2. Question(s):
What Do We Want To Know?

Before effective data collection or analytical procedures can proceed, one or more
specific questions should be formulated. These questions serve as the basis for an
organized approach to making decisions: first, about what data to collect; and second,
about which types of analysis to use with the data.

Different types of questions require different types of data – which makes a difference
in collecting data. In any case, the selection of one or more specific questions allows the
process of data collection and analysis to proceed. Based on the nature and scope of
the questions (i.e., what is included in the question) programs can then create a plan to
manage and organize the next step in the process – data collection. Finally, by
formulating specific questions at the beginning of the process, programs are also in a
position to develop skills in evaluating their data analysis process in the future.

Process Component #3. Data Collection:
What Information Can Help Us Answer Our Question(s)?

Data collection is a process in and of itself, in addition to being a part of the larger
whole. Data come in many different types and can be collected from a variety of
sources, including:

 Observations
 Questionnaires
 Interviews
 Documents
 Tests
 Others 

The value of carefully selecting the questions to be examined is therefore of major
importance: the way that the question is worded is the foundation for an effective data
collection plan. It is of utmost importance to develop a specific planning process for
data collection (no matter how brief) in order to avoid the common pitfalls of the
collection process, which include having:

 Too little data to answer the question;


 More data than is necessary to answer the question; and/or
 Data that is not relevant to answering the question.

In order to successfully manage the data collection process, programs need a plan that
addresses the following:

 What types of data are most appropriate to answer the questions?


 How much data are necessary?
 Who will do the collection?
 When and Where will the data be collected?
 How will the data be compiled and later stored?

By creating a data collection plan, programs can proceed to the next step of the overall
process. In addition, once a particular round of data analysis is completed, a program
can then step back and reflect upon the contents of the data collection plan and identify
“lessons learned” to inform the next round.

Process Component #4. Data Analysis:
What Are Our Results?

Once data have been collected, the next step is to look at and to identify what is going
on – in other words, to analyze the data. Here, we refer to “data analysis” in a more
narrow sense: as a set of procedures or methods that can be applied to data that has
been collected in order to obtain one or more sets of results. A list of specific analytical
procedures and methods is provided below.

Because there are different types of data, the analysis of data can proceed on different
levels. The wording of the questions, in combination with the actual data collected,
have an influence on which procedure(s) can be used – and to what effects. The task of
matching one or more analytical procedures or methods with the collected data often
involves considerable thought and reflection. “Balancing the analytic alternatives calls
for the exercise of considerable judgment.” This is a rather elegant way of saying that
there are no simple answers on many occasions.
Process Component #5. Interpretation:
What Do The Results Mean?

Once a set of results has been obtained from the data, we can then turn to the
interpretation of the results. In some cases, the results of the data analysis speak for
themselves. For example, if a program’s teaching staff all have bachelor’s degrees, the
program can report that 100% of their teachers are credentialed. In this case, the
results and the interpretation of the data are (almost) identical.

However, there are many other cases in which the results of the data analysis and the
interpretation of those results are not identical. For example, if a program reports that
30% of the teachers have an AA degree, the interpretation of this result is not so clear-
cut. In this case, interpretation of the data involves two parts:

 presenting the result(s) of the analysis; and


 providing additional information that will allow others to understand the meaning
of the results.

In other words, we are placing the results in a context of relevant information.


Obviously, interpretation involves both decision making and the use of good judgment!
The term “results” is used to refer to any information obtained from analysis
procedures. The term “findings” is used to refer to results which will be agreed upon by
the data analysis team as best representing their work. In other words, the team may
generate a large number of results, but a smaller number of findings will be written up,
reported, and disseminated.

On a final note, it is important to state that two observers may legitimately make
different interpretations of the same set of data and its results. While there is no easy
answer to this issue, the best approach seems to be to anticipate that disagreements
can and do occur in the data analysis process. As programs develop their skills in data
analysis, they are encouraged to create a process that can accomplish dual goals:

 to obtain a variety of perspectives on how to interpret a given set of results; and


 to develop procedures or methods to resolve disputes or disagreements over
interpretation. 

Process Component #6. Writing, Reporting & Dissemination:
What Do We Have To Say? How Do We Tell The Story of Our
Data?

Once data have been analyzed and an interpretation has been developed, programs
face the next tasks of deciding how to write, report, and/or disseminate the findings.
First, good writing is structured to provide information in a logical sequence. In turn,
good writers are strategic – they use a variety of strategies to structure their writing.
One strategy is to have the purpose for the written work to be clearly and explicitly laid
out. This helps to frame the presentation and development of the structure of the
writing. Second, good writing takes its audience into account. Therefore, good writers
often specify who their audience is in order to shape their writing. A final thought is to
look upon the writing/reporting tasks as opportunities to tell the story of the data you
have collected, analyzed, and interpreted. From this perspective, the writing is intended
to inform others of what you – the data analysis team – have just discovered.

Process Component #7. Evaluation:
What Did We Learn About Our Data Analysis Process?

The final step of the data analysis process is evaluation. Here, we do not refer to
conducting a program evaluation, but rather, an evaluation of the preceding steps of
the data analysis process. Here, program staff can review and reflect upon:

 Purpose: Was the data analysis process consistent with federal standards and
other relevant regulations?
 Questions: Were the questions worded in a way that was consistent with federal
standards, other regulations, and organizational purposes? Were the questions
effective in guiding the collection and analysis of data?
 Data Collection: How well did the data collection plan work? Was there enough
time allotted to obtain the necessary information? Were data sources used that
were not effective? Do additional data sources exist that were not utilized? Did
the team collect too little data or too much?
 Data Analysis Procedures or Methods: Which procedures or methods were
chosen? Did these conform to the purposes and questions? Were there additional
procedures or methods that could be used in the future?
 Interpretation/Identification of Findings: How well did the interpretation
process work? What information was used to provide a context for the
interpretation of the results? Was additional relevant information not utilized for
interpretation? Did team members disagree over the interpretation of the data or
was there consensus?
 Writing, Reporting, and Dissemination. How well did the writing tell the
story of the data? Did the intended audience find the presentation of information
effective?

In sum, data analysis is a process: a series of connected activities designed to obtain
meaningful information from data that have been collected. The process can be
conceptualized in different ways. However, either approach can be effective if each of
the individual components of the process is included. In turn, each part of the process
is based on decision making. Each stage of the process includes decision making; which
decisions are made will then influence the remaining stages of the process.

Important Terms and Concepts


Analysis - an investigation of the component parts of a whole and their relations in
making up the whole; the abstract separation of a whole into its constituent parts in
order to study the parts and their relations.

Code - a category deemed important by the individual(s) conducting the analysis. It
is a method used to label important pieces of information that are contained in the
narrative.

Correlation - a statistical relation between two or more variables such that systematic
changes in the value of one variable are accompanied by systematic changes in the
other; a statistic representing how closely two variables co-vary; it can vary from -1
(perfect negative correlation) through 0 (no correlation) to +1 (perfect positive
correlation) (Example: “What is the correlation between those two variables?”).

Data - a collection of facts from which conclusions may be drawn (Example: “Statistical
data”).

Denominator - the divisor of a fraction.

Difference - the number that remains after subtraction; the number that when added
to the subtrahend gives the minuend; a variation that deviates from the standard or
norm.

Estimation - a judgment of the qualities of something or somebody (Example: “In my
estimation the boy is innocent”); an approximate calculation of quantity or degree or
worth.

Interpretation - an explanation of something that is not immediately obvious
(Example: “The edict was subject to many interpretations”); an explanation that results
from interpreting something (Example: “The report included his interpretation of the
forensic evidence”).

Interview - the questioning of a person (or a conversation in which information is
elicited), often conducted by journalists (Example: “My interviews with teenagers
revealed a weakening of religious bonds”); to discuss formally with (somebody) for the
purpose of an evaluation (Example: “We interviewed the job candidates”).

Narrative - a message that tells the particulars of an act or occurrence or course of
events, presented in writing or drama or cinema or as a radio or television program
(Example: “His narrative was interesting”); consisting of or characterized by the telling
of a story (Example: “narrative poetry”).

Numerator - the dividend of a fraction.

Mean - an average of n numbers computed by adding some function of the numbers
and dividing by some function of n.

Median - the value below which 50% of the cases fall; relating to or situated in or
extending toward the middle.
Mode - the most frequent value of a random variable.

Percentage - a proportion multiplied by 100.

Qualitative - involving distinctions based on qualities (Example: “Qualitative change”);
relating to or involving comparisons based on qualities.

Quantitative - expressible as a quantity; of, relating to, or susceptible of measurement
(Example: “Export wheat without quantitative limitations”); relating to the
measurement of quantity (Example: “Quantitative studies”).

Questionnaire - a form containing a set of questions, submitted to people to gain
statistical information.

Reliability - the trait of being dependable or reliable.

Standard Deviation - the square root of the variance.

Statistics - a branch of applied mathematics concerned with the collection and
interpretation of quantitative data and the use of probability theory to estimate
population parameters.

Statistics (Descriptive) - a branch of statistics that denotes any of the many
techniques used to summarize a set of data. In a sense, we are using the data on
members of a set to describe the set.

Statistics (Inferential) - comprises the use of statistics to make inferences
concerning some unknown aspect (usually a parameter) of a population.

Sum - the whole amount; a quantity obtained by addition.

Survey - a short descriptive summary (of events); to look over comprehensively,
inspect (Example: “He surveyed his new classmates”); to make a survey of, for
statistical purposes.

Themes - a unifying idea that is a recurrent element within an interview or a narrative,
leading to a set of patterns. There is no agreed-upon methodology in narrative analysis
to derive themes from patterns. One practice, however, is to use an analysis team, with
“themes” being whatever sets of “like” information the team reaches consensus on,
based on discussion of transcripts and analysis of patterns; the subject matter of a
conversation or discussion.

Trend - a general direction in which something tends to move (Example: “The trend of
the stock market”).

Validity - the quality of having legal force or effectiveness; the quality of being
logically valid. 
Variance - the second moment around the mean; the expected value of the square of
the deviations of a random variable from its mean value; the quality of being subject to
variation.
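
Several of these terms can be computed directly. The sketch below uses Python's
standard statistics module on the five quiz scores (9, 10, 8, 7, 7) that appear later in
this guide; the two variables used for the correlation are invented:

    import statistics
    import numpy as np

    scores = [9, 10, 8, 7, 7]
    print("sum:     ", sum(scores))                   # 41
    print("mean:    ", statistics.mean(scores))       # 8.2
    print("median:  ", statistics.median(scores))     # 8
    print("mode:    ", statistics.mode(scores))       # 7
    print("variance:", statistics.pvariance(scores))  # population variance
    print("std dev: ", statistics.pstdev(scores))     # square root of the variance

    # Correlation: -1 (perfect negative) through 0 (none) to +1 (perfect positive).
    hours_studied = [1, 2, 3, 4, 5]
    test_score = [55, 63, 70, 76, 88]
    print("correlation:", round(np.corrcoef(hours_studied, test_score)[0, 1], 2))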

KPIs (Key Performance Indicators) are the vital navigation instruments that help decision-makers see
how well an organisation, business unit, project or individual is performing in relation to their strategic
goals and objectives.

KPI library: https://bernardmarr.com/kpi-library/

How to develop great KPIs:

 Define the strategic goal; be clear about the audience, what questions the
indicator answers, and how the indicator will be used.
 Name the indicator and define the data collection methodology: how it will be
measured, the performance threshold (indicators are not targets), how often
data is collected, the responsible person, and an expiry or revision date.
 Consider how much the indicator costs, how complete it is, and any unintended
consequences.
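
One way to keep those design questions together is a KPI "definition sheet." The sketch
below is only an invented example of such a record, not a prescribed standard:

    # Hypothetical KPI definition capturing the fields listed in the notes above.
    kpi = {
        "name": "Customer Retention Rate",
        "strategic_goal": "Grow recurring revenue",
        "audience": "Executive team",
        "question_answered": "Are we keeping the customers we win?",
        "collection_method": "CRM export",
        "formula": "retained_customers / customers_at_period_start",
        "performance_threshold": 0.85,  # an indicator, not a target
        "collection_frequency": "monthly",
        "responsible_person": "Head of Customer Success",
        "revision_date": "2026-01-01",  # expiry or revision date
        "cost_and_caveats": "Low cost; watch for unintended consequences such as "
                            "gaming via discount-driven renewals",
    }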

Big data characteristics: volume, velocity, variety.

Technologies: Hadoop

Post Test 1 100%

5 / 5 points
In order to have a high quality impact, we must focus data analysis on quantitative data only.

True

False
Question 5/5
2 points
Data that is represented either in a verbal or narrative format is qualitative data.

True

False
Question 5/5
3 points
Data analytics programs must

f. All of these

e. None of these

d. create accountability for expected results


c. be standardized across all organizations

b. can start small, but must grow fast

a. start at the top


Question 5/5
4 points
If a lot of people come to the same conclusions about something,

f. All of these

e. None of these

d. the data has been coded incorrectly

c. there must be a certain logic behind it

b. there must be no logic in the thought process

a. they are all wrong unless the data proves it


Question 5/5
5 points
Two groups of statistical modelers who have made lasting, positive impacts on our lives include:

f. All of these

e. None of these

d. vexillologists and entomologists

c. geologists and meteorologists

b. meteorologists and epidemiologists

a. epidemiologists and credit modelers


Question 5/5
6 points
Visualizing data is to literally create and then consider a visual display of data. Technically, it is
a substitute for data analysis.

True

False
Question 5/5
7 points
Researchers often try to trick us with “story time,” as they drift effortlessly from data to
theory/assumptions.

True

False
Question 5/5
8 points
Statisticians are a curious lot: when given a vertical set of numbers, they like to look sideways.

True

False
Question 5/5
9 points
Business intelligence can be described as:

the method of collecting, analyzing, and extracting information.

how organizations use data to make fact-based decisions.

the process of gathering information about competitors.

None of these

the correct method of transforming raw data into useful information.

All of these
Question 5/5
10 points
Statistics provide decision makers with

f. All of these
e. None of these
d. an assurance of quality
c. evidence of relationships and connections
b. evidence to back up assertions
a. a focus on the big picture

Pre Test 2
5 / 5 points
Although regression is a good tool for prediction, there is no way to know how accurate the
prediction is.

True

False
Question 5/5
2 points
The most important result of visualizing data information with charts, graphs, etc., is that the
visuals help us to:

e. All of these are important

d. None of these is important

c. focus only on the information that's important

b. design information so it makes more sense

a. see the patterns and connections that matter


Question 0/5
3 points
The bottom-line question that regression studies seek to answer is centered on:

f. All of these

e. None of these

d. what is dependent and independent?

c. what is correlated?

b. what factors are involved?

a. what works?
Question 5/5
4 points
Strategies such as employing a unified data architecture (e.g., Verizon) have found
success in creating

a. data strategies

b. value for customers


c. pure data disruption

d. platforms for integration

e. None of these

f. All of these
Question 5/5
5 points
Tera mining and big data analytics are commonly used by:

a. law enforcement agencies


b. weather forecasters
c. companies selecting job applicants
d. Internet based-businesses
e. all of the above
f. none of the above

Variation in the Data

An important characteristic of any set of data is the variation in the data. In some data sets, the data
values are concentrated closely near the mean; in other data sets, the data values are more widely
spread out from the mean. The most common measure of variation, or spread, is the standard
deviation. The standard deviation is a number that measures how far data values are from their mean.

The standard deviation

 provides a numerical measure of the overall amount of variation in a data set, and

 can be used to determine whether a particular data value is close to or far from the mean.

The standard deviation provides a measure of the overall variation in a data set

The standard deviation is always positive or zero. The standard deviation is small when the data are all
concentrated close to the mean, exhibiting little variation or spread. The standard deviation is larger
when the data values are more spread out from the mean, exhibiting more variation.

Suppose that we are studying the amount of time customers wait in line at the checkout at supermarket
A and supermarket B. The average wait time at both supermarkets is five minutes. At supermarket A, the
standard deviation for the wait time is two minutes; at supermarket B the standard deviation for the
wait time is four minutes.
Because supermarket B has a higher standard deviation, we know that there is more variation in the
wait times at supermarket B. Overall, wait times at supermarket B are more spread out from the
average; wait times at supermarket A are more concentrated near the average. 

The standard deviation can be used to determine whether a data value is close to or far from the
mean

Suppose that Rosa and Binh both shop at supermarket A. Rosa waits at the checkout counter for seven
minutes and Binh waits for one minute. At supermarket A, the mean waiting time is five minutes and the
standard deviation is two minutes. The standard deviation can be used to determine whether a data
value is close to or far from the mean.

Rosa waits for seven minutes:

 Seven is two minutes longer than the average of five; two minutes is equal to one standard
deviation.

 Rosa's wait time of seven minutes is two minutes longer than the average of five minutes.

 Rosa's wait time of seven minutes is one standard deviation above the average of five minutes.

Binh waits for one minute.

 One is four minutes less than the average of five; four minutes is equal to two standard
deviations.

 Binh's wait time of one minute is four minutes less than the average of five minutes.

 Binh's wait time of one minute is two standard deviations below the average of five minutes.

 A data value that is two standard deviations from the average is just on the borderline for what
many statisticians would consider to be far from the average. Considering data to be far from
the mean if it is more than two standard deviations away is more of an approximate "rule of
thumb" than a rigid rule. In general, the shape of the distribution of the data affects how much
of the data is further away than two standard deviations.
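
The Rosa and Binh arithmetic is a z-score: (value - mean) / standard deviation. A
minimal sketch using the numbers from the example:

    mean_wait = 5.0  # minutes, from the supermarket A example
    std_wait = 2.0   # minutes

    for name, wait in [("Rosa", 7.0), ("Binh", 1.0)]:
        z = (wait - mean_wait) / std_wait
        print(f"{name}: {wait} min is {z:+.1f} standard deviations from the mean")
    # Rosa: +1.0 (fairly close); Binh: -2.0 (borderline "far" by the rule of thumb)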

https://openstax.org/details/books/introductory-statistics

Using Algorithms for Prediction

In his book, Super Crunchers, Ayres provides numerous examples of how the statistical procedure called
regression is being used on very large data sets (Tera mining) to estimate the various causal factors that
influence a single variable of interest. For those who are a bit nervous about math, relax. We are not
going to go into the detail of doing a regression analysis. For this reading assignment, you will review
examples of how algorithms and the associated statistical analysis are being used to predict outcomes.
These organizations are using data to establish a competitive advantage. Many of these organizations
are also gaining additional capacity and financial savings by hosting these data using cloud services.

Zillow - A leading real estate and rental marketplace that provides its consumers with key data about
their homes and connects them with local professionals who can help if they want to sell their home. The
company crunches a data set of over 110 million home prices to help buyers and sellers price their
homes. They use regression techniques as the basis of their predictions. Zillow launched in 2006 and is
headquartered in Seattle, Washington. If you want to check on the estimated value of a home, go to
zillow.com and input the address.
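
Zillow's actual model is proprietary; purely to illustrate regression-based prediction,
here is a toy sketch with invented sales data:

    from sklearn.linear_model import LinearRegression
    import numpy as np

    sqft = np.array([[850], [1200], [1500], [1900], [2400]])  # hypothetical homes
    price = np.array([160_000, 210_000, 255_000, 310_000, 380_000])

    model = LinearRegression().fit(sqft, price)  # price as a function of size
    predicted = model.predict([[1700]])[0]
    print(f"Estimated price for a 1,700 sq ft home: ${predicted:,.0f}")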

Capital One - Do you remember the commercials for Capital One where the actors are asking, "What's in
your wallet?" The key message was that the contents (of your wallet or purse) represent financial
security, purchasing power, and prosperity. Capital One is on a mission to help their customers succeed by
bringing ingenuity, simplicity, and humanity to banking by harnessing the power of information and
technology. One example of how they use algorithms to improve the success of their customers and also
improve their bottom line involves customer service calls. Here is what Capital One does with customer
calls. As soon as the customer calls, the service representative sees key information about that customer,
including a list of products or services that have been predicted based on specific customer
characteristics. The representative can solve the customer's issue (e.g., say they wanted a late fee
reversed) and then offer up additional products or services. It is actually working! According to Ayres,
Capital One now makes more than a million sales a year through their customer-service marketing
channel.

Jo-Ann Fabrics - A popular fabrics store that, like many brick-and-mortar stores, has moved to offering its
products on the web through JoAnn.com. Because they have over a million unique visitors a month, they
are able to use regression techniques to test different promotions. One promotion was to buy two
sewing machines and save 10 percent. At first the employees of Jo-Ann Fabrics thought this was a silly
idea. Who would ever want to have two sewing machines? Crazy, right? Well, what they found was that
this was a very successful marketing campaign, generating the highest returns that year. Why? The
customers got their friends to join them in the purchase. The discount was turning their customers into
sales agents! This randomized testing was responsible for increasing revenue per visitor by 209%. Isn't
that "sew" impressive!

Continental Airlines - Uses regression techniques to improve customer loyalty. If they know a customer
has had a bad experience because of a flight delay or cancellation, they contact that customer and try
different approaches to keep that customer for future flights. For example, one group was sent a letter
apologizing for the delay. A second group received a letter of apology and a trial membership in the
Continental President's Club. The third group received nothing. After 8 months, they looked at the
results. The groups that got letters spent 8% more on tickets in the next year. This amounted to an extra
$6 million in revenue. They also found that the group given the President's Club trial renewed
membership after the trial period (thus increasing revenue from the club fee).

For the last example, we are going to look at governments and the use of randomization.
Remember that you can make the most definitive causal conclusion ONLY when you use randomized
testing. Without randomization, validity and evidence of causality suffer. For instance, if you would like
to evaluate the impact of a new incentive or policy, your data must include recipients/participants of the
new program as well as those who didn't participate (the control group). Without randomization,
confounding factors kick in (e.g., individual bias or starting differences between those groups), so you
can't tell whether the effect or outcome difference was due to the new program or something else.
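
A minimal sketch of randomized assignment in the spirit of the Continental example
(all outcome numbers are simulated, not real results):

    import random

    customers = list(range(1000))
    random.shuffle(customers)            # randomization removes selection bias
    treatment = set(customers[:500])     # e.g., the group that gets the letter
    control = [c for c in customers if c not in treatment]

    # Simulated next-year spend, with a small invented effect for the letter group.
    spend = {c: random.gauss(100 + (8 if c in treatment else 0), 20) for c in customers}
    avg = lambda group: sum(spend[c] for c in group) / len(group)
    print(f"treatment avg: {avg(treatment):.1f}  control avg: {avg(control):.1f}")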

Progresa Program in Mexico - In 1997, Mexico began a randomized experiment involving 24,000
households in 506 villages. The mothers assigned to the Progresa population were eligible for three years
of cash grants and supplements if their children had regular healthcare visits and attended school. The
results, compared to the non-Progresa population (the control group), showed that boys attended school
10% more and girls 20% more. The children in Progresa were healthier too (a 12% lower incidence of
serious illness). The success of the program has resulted in its expansion.

Post test 2 100%

5 / 5 points
Tera mining and big data analytics are commonly used by:

a. law enforcement agencies

b. weather forecasters

c. companies selecting job applicants

d. Internet based-businesses

e. all of the above

f. none of the above


Question 5/5
2 points
Algorithms are computer codes that can easily be understood and controlled.

True

False
Question 5/5
3 points
Organizations must have the latest technology to be successfully utilizing and leveraging data in
decision making.

True

False
Question 5/5
4 points
Preference databases are powerful ways to improve personal decision making.

True

False
Question 5/5
5 points
The most effective measures for managing effectiveness of long lines and freeway traffic aim to
take _______ out of the system.

a. averages
b. means
c. standard deviations
d. variability
e. None of these
f. All of these
View question 5 feedback
5 / 5 points
The father of modern statistics is

a. Thomas Bayes

b. Francis Galton

c. James Lind

d. Ronald Fisher

e. none of these

f. all of these
Question 5/5
7 points
Statisticians focus on average values, not variations.

True

False
Question 5/5
8 points
Utilizing visual representations of data has been likened to:

e. all of these are likely

d. none of these are likely

c. a new kind of language


b. a paradigm shift

a. new perspectives
Question 5/5
9 points
It is sometimes possible for firms to make more accurate predictions about how a consumer will
behave than the consumer could ever make themselves.

True

False
Question 5/5
10 points
Individuals tasked with running hyper-controlled experiments designed to test the effectiveness
of creative alternatives such as webpage design or book titles are known as:

d. data analysts
a. super crunchers
b. graphic designers
c. usability experts

Pre test 3

5 / 5 points
Choose one that best describes the following: Large set of objects, which is of interest as a
whole. 

Population

Sample

Data collection

Data analysis
Question 5/5
2 points
The distribution of data can be known by:

e. all of these

d. none of these
c. stacking data in line plots

b. plotting data

a. number lines
Question 0/5
3 points
An example of qualitative data would be:

a. 101 degree F day

b. 1.56 inches of rain this month

c. sunny day

d. none of these

e. all of these
Question 5/5
4 points
Standard terms that everyone on the team should understand include

a. mean

b. standard deviation

c. benchmarking

d. counts

e. None of these

f. All of these
Question 5/5
5 points
Decisions should be made and communicated to the group, only after:

figuring out how you will use the results

collecting, analyzing, and interpreting data


identifying the problem, developing a hypothesis, and
collecting, analyzing, and interpreting data
None of these
All of these

Post test 3 90%

5 / 5 points
An example of qualitative data would be:

a. 101 degree F day

b. 1.56 inches of rain this month

c. sunny day

d. none of these

e. all of these
Question 5/5
2 points
Match the following Key Performance Indicators (KPIs) with the question each attempts to
answer:

f business 1. OER - Operating Expense Ratio


n of bottom-line results 2. CAPEX to sales ratio
eration for each sales dollar 3. Net profit
ent of operating expense 4. Gross profit margin 
nt in future compared to competitors 5. Revenue Growth Rate  
estments to generate profits 6. ROE - Return on Equity  

Population is subset trying to represent the whole.

True

False
Question 5/5
4 points
Complex data gathering and analysis allows an organization to examine
a. their current strategies

b. their past performance

c. future goals

d. decision making

e. None of these

f. All of these
Question 5/5
5 points
Using the data set of scores 9, 10, 8, 7, & 7. What is the range? 

3.5

802

none of these

all of these
Question 5/5
6 points
Match the following terms and definitions:

Large set of objects, which is of interest as a whole. 1. Population

Subset trying to represent the whole. 2. Sample

Data based decision making is cyclical, therefore it has a clear starting and ending and should be
scheduled rigorously.

True

False
Question 5/5
8 points
Using the data set of scores 9, 10, 8, 7, & 7. What is the mean score?
8

8.2

none of these

all of these
Question 5/5
9 points
Using the data set of scores 9, 10, 8, 7, & 7. What is the median score?

10

none of these

all of these
Question 0/5
10 points
Variables can be described as

f. all of these

e. none of these

d. properties of an object or event that take on different values

c. distributions of measures
b. measures or scales
a. quantitative data

Pre test 4

5 / 5 points
Match the following Key Performance Indicators (KPIs) with the
question each attempts to answer:

ess of online and social media spheres 1. Market Growth Rate


customers into actual customers 2. Brand Equity
ess of internet strategy 3. Conversion Rate   
en by brand 4. Page views/Bounce
in markets with future potential 5. Klout Score

Visual representation of data can be accomplished by creating:

a. Pie Charts

b. Column Charts

c. Bar Charts

d. Line and X-Y Scatter Charts

e. none of these

f. all of these
Question 5/5
3 points
When multiple factors are contributing to a measure, it is impossible to
untangle them.

True

False
Question 5/5
4 points
Data and analytics has been shown to have the power to transform all kinds
of organizations, including units of the criminal justice system.

True

False
Question 5/5
5 points
Subjectivity must be relied upon in highly sensitive or risky decisions

True

False

A causal relation between two events exists if the occurrence of the first causes the
other. The first event is called the cause and the second event is called the effect. A
correlation between two variables does not imply causation. On the other hand, if
there is a causal relationship between two variables, they must be correlated.

Humans are evolutionarily predisposed to see patterns and psychologically inclined
to gather information that supports pre-existing views, a trait known
as confirmation bias. We confuse coincidence with correlation and correlation with
causality.

Erin Palmer, a business intelligence and data mining expert, posits the following
benefits of using data to make decisions:

 What gets measured gets done.


 Clearly-defined problems lead to better decisions.
 Data is neutral; it simply tells a factual story.
 Knowing data mining terms and concepts such as the mean, standard
deviation, counts and benchmarking is important in interpreting data
accurately. 

Post test 4 100%

5 / 5 points
Today’s managers face fewer complexities than ever before due to all the easy software that can
calculate all important measures for them.

True

False
Question 5/5
2 points
Making decisions based on intuition and instinct is just as valuable as making decisions based on
data if the decision maker has enough experience.

True

False
Question 5/5
3 points
How racial gaps in test scores should be interpreted is an extremely challenging and contentious
matter for all concerned.

True

False
Question 5/5
4 points
There are no online resources to help me gain a better understanding of Excel.

True

False
Question 5/5
5 points
Match the following Key Performance Indicators (KPIs) with the question each attempts to
answer:

ess of online and social media spheres 1. Market Growth Rate


en by brand 2. Brand Equity
in markets with future potential 3. Conversion Rate   
customers into actual customers 4. Page views/Bounce
ess of internet strategy 5. Klout Score

Match the following Key Performance Indicators (KPIs) with the question each attempts to
answer:

lean and effective processes 1. Six Sigma Level


y of processes – error free work 2. Process Waste Level
s get what they want when they want it 3. DIFOT 
o internal processes in relation to inventory 4. ISR 
products/services fit for purpose 5. Quality Index 

Visual representation of data can be accomplished by creating:

a. Pie Charts
b. Column Charts

c. Bar Charts

d. Line and X-Y Scatter Charts

e. none of these

f. all of these
Question 5/5
8 points
When multiple factors are contributing to a measure, it is impossible to untangle them.

True

False
Question 5/5
9 points
Subjectivity must be relied upon in highly sensitive or risky decisions

True

False
Question 5/5
10 points
Data and analytics has been shown to have the power to transform all kinds of organizations,
including units of the criminal justice system.

True
False

Pre test 5

5 / 5 points
Evidence has shown the following actions to be ineffective, yet many organizations still follow
this line of action:

f. All of these

e. None of these

d. face to face interviews


c. incentive pay for student performance

b. paying for performance

a. ineffective treatment for medical conditions


Question 5/5
2 points
People data is growing, and many companies/organizations are utilizing this type of data to
influence action.

True

False
Question 5/5
3 points
The issue of false negatives has largely been ignored by the media.

True

False
Question 5/5
4 points
Critics fault drug tests for destroying careers (false positives) and polygraphs for missing
potential criminals (false negatives), therefore such technology is of little value due to the errors.

True

False
Question 5/5
5 points
There are few reasons that a company or organization should measure and track employees'
performance. Some employees just need to be fired.

True
False

For your Culminating Project, you will be creating a variety of pivot tables from sample
data, creating charts and graphs from the pivot tables, and then building a dashboard
using the charts. Please watch each video to learn the process that you will use in your
Culminating Project. 
o Part 1: Building a Pivot Table (14:47) https://www.youtube.com/watch?v=9NUjHBNWe9M
o Part 2: Using a Pivot Table to Create Charts (14:47) https://www.youtube.com/watch?v=g530cnFfk8Y
o Part 3: Building a Dashboard (15:19) https://www.youtube.com/watch?v=FyggutiBKvU

Read Trust the Evidence, Not Your Instincts and note how the lack of
evidence in the interview process has contributed to poor recruitment
results:

 Studies have shown that unstructured, face-to-face interviews are biased.

 Without structured questions and clearly defined evaluation criteria, studies
have shown that interviewers may select candidates who are likeable, similar
to them, and physically attractive — even if these qualities are irrelevant to
performance.
 In order to complete the module discussion and quiz successfully, you should
complete the reading assignments.

Read the following information on Sampling Errors.


Sampling Errors
When you analyze data, it is important to be aware of sampling errors and non-
sampling errors. The actual process of sampling causes sampling errors. For example,
the sample may not be large enough. Factors not related to the sampling process
cause non-sampling errors. A defective counting device can cause a non-sampling
error.

In reality, a sample will never be exactly representative of the population so there will
always be some sampling error. As a rule, the larger the sample, the smaller the
sampling error.
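
The "larger sample, smaller sampling error" rule shows up clearly in a quick simulation
(the population below is synthetic, with a true mean of about 50):

    import random

    random.seed(1)
    population = [random.gauss(50, 10) for _ in range(100_000)]

    for n in [10, 100, 1_000, 10_000]:
        sample = random.sample(population, n)
        sample_mean = sum(sample) / n
        print(f"n = {n:>6}: sample mean = {sample_mean:.2f}")  # settles toward 50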

In statistics, a sampling bias is created when a sample is collected from a population
and some members of the population are not as likely to be chosen as others
(remember, each member of the population should have an equally likely chance of
being chosen). When a sampling bias happens, there can be incorrect conclusions
drawn about the population that is being studied.

Critical Evaluation
We need to evaluate the statistical studies we read about critically and analyze them before
accepting the results of the studies. Common problems to be aware of include:
 Problems with samples: A sample must be representative of the population. A sample that is
not representative of the population is biased. Biased samples that are not representative of the
population give results that are inaccurate and not valid.
 Self-selected samples: Responses only by people who choose to respond, such as call-in
surveys, are often unreliable.
 Sample size issues: Samples that are too small may be unreliable. Larger samples are better, if
possible. In some situations, having small samples is unavoidable and can still be used to draw
conclusions. Examples: crash testing cars or medical testing for rare conditions
 Undue influence:  collecting data or asking questions in a way that influences the response
 Non-response or refusal of subject to participate:  The collected responses may no longer be
representative of the population.  Often, people with strong positive or negative opinions may
answer surveys, which can affect the results.
 Causality: A relationship between two variables does not mean that one causes the other to
occur. They may be related (correlated) because of their relationship through a different
variable.
 Self-funded or self-interest studies: A study performed by a person or organization in order to
support their claim. Is the study impartial? Read the study carefully to evaluate the work. Do not
automatically assume that the study is good, but do not automatically assume the study is bad
either. Evaluate it on its merits and the work done.
 Misleading use of data: improperly displayed graphs, incomplete data, or lack of context
 Confounding:  When the effects of multiple factors on a response cannot be separated.
Confounding makes it difficult or impossible to draw valid conclusions about the effect of each
factor.


Post test 5 90%

5 / 5 points
A hypothesis-based approach to decision making and problem solving includes

a. clear articulation of the problem at hand

b. clear statement of the hypothesis

c. obtain the relevant data

d. conduct the right test based on the data

e. communicate the results to the proper people at the right time

f. none of these

g. all of these
Question 5/5
2 points
Reliance on indirect evidence and the sway of false negatives relative to false positives tend to
produce lots of false alarms.
True

False
Question 5/5
3 points
The growing pile of studies on the human and financial costs of employee disengagement,
management distrust, poor group dynamics, faulty incentive schemes, and other preventable
damage suggests a need for an evidence-based management movement.

True

False
Question 5/5
4 points
People analytics can be described as

A gut feeling, based on expert experience and opinion

Metrics such as ratios and trends


Measurement, collection, analysis and reporting of data for purposes of
understanding and optimizing
A new method of deciding who to hire and fire
 
Question 0/5
5 points
People data is growing, and many companies/organizations are utilizing this type of data to
influence action.

True

False
Question 5/5
6 points
There are few reasons that a company or organization should measure and track employees'
performance. Some employees just need to be fired.

True

False
Question 5/5
7 points
 “Opinion equals myth, insight equals myth-busted”   This quote from Google proves that big
companies do not want employees to have opinions on anything.

True

False
Question 5/5
8 points
The Analytic Value Chain can be described as a progressive movement from opinion, to data, to
metrics, analysis, insight, and finally, to action.

True

False
Question 5/5
9 points
Critics fault drug tests for destroying careers (false positives) and polygraphs for missing
potential criminals (false negatives), therefore such technology is of little value due to the errors.

True

False
Question 5/5
10 points
Match the following Key Performance Indicators (KPIs) with the question each attempts to
answer:

__5__ Performance in eyes of all stakeholders 1. HCVA

__4__ Employee retention 2. RPE
__1__ Extent our employees are adding value to bottom line 3. Employee Satisfaction
__3__ How happy are our employees 4. Employee Churn Rate
__2__ How productive are our employees 5. 360 feedback

Pre test 6

5 / 5 points
If a lot of people come to the same conclusions about something,

a. they are all wrong unless the data proves it

b. there must be no logic in the thought process

c. there must be a certain logic behind it


d. the data has been coded incorrectly

e. None of these

f. All of these
Question 5/5
2 points
The statistical testing framework demands a disbelief in miracles.

True

False
Question 5/5
3 points
Statistical thinking is central to the scientific method, which requires

a. a doctorate degree to figure out

b. specialized computer systems

c. data mining and storage capability

d. theories to generate testable hypotheses

e. None of these

f. All of these
Question 5/5
4 points
Planning out a research study must also include all the technological aspects of data management
below, EXCEPT: 

Ensuring that the public in general have access to the data.

The protection of intellectual property related to the research.

How the information will be recorded; stored and preserved.

How the confidentiality will be maintained.


Question 5/5
5 points
The statistician who set a minimum acceptable standard of evidence was
a. Phyllis LaPlante

b. Kaiser Fung

c. Tyrion Lannister

d. Sir Ronald Fisher

e. None of these

f. All of these

Hypothesis Testing Overview


One job of a statistician is to make statistical inferences about populations based on
samples taken from the population. Confidence intervals are one way to estimate a
population parameter. Another way to make a statistical inference is to make a decision
about a parameter. For instance, a car dealer advertises that its new small truck gets 35
miles per gallon, on average. A tutoring service claims that its method of tutoring helps
90% of its students get an A or a B. A company says that women managers in their
company earn an average of $60,000 per year.

A statistician will make a decision about these claims. This process is called
"hypothesis testing." A hypothesis test involves collecting data from a sample and
evaluating the data. Then, the statistician makes a decision as to whether or not there is
sufficient evidence, based upon analyses of the data, to reject the null hypothesis.

Hypothesis testing consists of two contradictory hypotheses or statements, a decision
based on the data, and a conclusion. To perform a hypothesis test, a statistician will:

1. Set up two contradictory hypotheses.
2. Collect sample data (in homework problems, the data or summary statistics will be given
to you).
3. Determine the correct distribution to perform the hypothesis test.
4. Analyze sample data by performing the calculations that ultimately will allow you to reject
or decline to reject the null hypothesis.
5. Make a decision and write a meaningful conclusion.
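
As a sketch of these five steps applied to the truck-mileage claim above (H0: mean mpg
= 35; Ha: mean mpg < 35), with invented sample data:

    from scipy import stats

    mpg_sample = [33.1, 34.4, 32.8, 35.2, 33.7, 34.0, 32.5, 34.9]  # step 2

    # Steps 3-4: a one-sample t-test is one appropriate choice here.
    t_stat, p_value = stats.ttest_1samp(mpg_sample, popmean=35, alternative="less")

    # Step 5: decide and conclude at the 5% significance level.
    if p_value < 0.05:
        print(f"p = {p_value:.3f}: reject H0 - evidence the trucks average under 35 mpg")
    else:
        print(f"p = {p_value:.3f}: do not reject H0")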

https://openstax.org/details/books/introductory-statistics

Post test 6 100%


5 / 5 points
Data may be collected for a specific study or it can be retrieved from existing sources. All
statements are true about using preexisting data, EXCEPT:

In some cases, obtaining preexisting data can be nearly costless.


One of the disadvantages of using data from existing sources is the possibility
of the original data collection methodology not being reliable.
When using secondary data sources, there are no restrictions concerning data
sharing.
In general, pre-existing data have the advantage of being much faster to
acquire.
Question 5/5
2 points
Data ownership implies obligations for long-term management, protecting data throughout the
data lifecycle from creation to destruction, so that appropriate sharing can occur and
unauthorized sharing is prevented.

True

False
Question 5/5
3 points
If a study is obviously about stigmatizing conditions, illegal activities, or life experiences that
subjects may not want others to know about, it's better to not pursue the research project as there
are no ways to avoid violating the subjects' right to privacy.

True

False
Question 5/5
4 points
What does the visionary Barnett say about two of our biggest fears?

none of the above


all of the above
only b and d
only a and b
d. Don’t avoid foreign airlines, even after one of their planes has crashed.
c. Choose airlines based on comparison of fatalities, not crashes.
b. Don’t choose between US national airlines based on safety.
a. Never fly in airplanes.
View question 4 feedback
5 / 5 points
One of the laws of statistics that can surely lead to bad decision making

a. irregularity in data

b. utilization of small numbers

c. the central market theorem

d. seek and ye shall find

e. None of these

f. All of these
Question 5/5
6 points
Based on statistical testing, we should

f. All of these

e. None of these

d. believe that rare is impossible

c. evaluate data against the background

b. willingly accept the risk of death

a. avoid playing lotteries


Question 5/5
7 points
Statistical thinking is central to the scientific method, which requires

a. a doctorate degree to figure out

b. specialized computer systems

c. data mining and storage capability

d. theories to generate testable hypotheses

e. None of these
f. All of these
Question 5/5
8 points
Retention requirements for data typically derive from the same sources that mandate data
sharing. All sources below may dictate how data should or should not be shared, EXCEPT:

State or national governments

The researcher's intuition only

One's own organization

Journals 
Question 5/5
9 points
The statistical testing framework demands a disbelief in miracles.

True

False
Question 5/5
10 points
If a lot of people come to the same conclusions about something,

a. they are all wrong unless the data proves it


b. there must be no logic in the thought process
c. there must be a certain logic behind it
d. the data has been coded incorrectly
e. None of these
f. All of these
