BA 216 Lecture 1 Notes
Stats Overview
● Statistics is a key part of the general process of investigation:
● 1. Identify a question or problem.
● Suppose you flip a coin 100 times. While the chance a coin lands heads in any
given coin flip is 50%, we probably won't observe exactly 50 heads due to natural
variation.
○ Would you be surprised if you flipped a coin 100 times and got 55 heads?
○ What about 100 flips and 80 heads?
○ What about 5 flips and 4 heads?
In this course, we’ll learn how to judge with more confidence whether a result observed
in a sample is “real” for the whole population, not just for the sample.
● The larger the difference we observe (for a particular sample size), the less
believable it is that the difference is due to chance.
● So what we are really asking is the following: is the difference so large that we
should reject the notion that it was due to chance?
● We haven't yet covered statistical tools to fully address this question – that’s what
this whole course is going to be about!
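As a preview of the coin-flip questions above, the chance of each result can be computed directly. This is a minimal sketch, assuming a fair coin and independent flips; the helper name `prob_at_least` is made up for illustration, and the binomial formula it uses is a tool the notes have not introduced yet.

```python
from math import comb


def prob_at_least(heads: int, flips: int, p: float = 0.5) -> float:
    """Probability of getting `heads` or more heads in `flips` independent flips."""
    return sum(
        comb(flips, k) * p**k * (1 - p) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# The three scenarios from the coin-flip questions (heads, flips).
scenarios = [(55, 100), (80, 100), (4, 5)]

for heads, flips in scenarios:
    chance = prob_at_least(heads, flips)
    print(f"{heads} or more heads in {flips} flips: {chance:.2g}")
```

Under these assumptions, 55 or more heads in 100 flips happens roughly 18% of the time and 4 or more heads in 5 flips about 19% of the time, so neither is very surprising. 80 or more heads in 100 flips has a chance far below one in a million, which is the kind of difference that is hard to attribute to chance.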
Introduction to data structure
● A survey was conducted on students in an introductory statistics course. Below
are a few of the questions on the survey, and the corresponding variables in
which the responses were stored:
Data Matrix:
Variables come in different types
● While deceptively simple, these concepts are extremely important for choosing
the right analytical techniques when trying to answer a research or business
question!
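To make the data matrix idea concrete, here is a small sketch in which each row is one student and each column is one variable. The rows are invented for illustration and are NOT the actual survey responses. The type labels are also assumptions: `sleep` is treated as hours of sleep per night (numerical) and `gender` as categorical. The helper `summarize` is hypothetical and only shows that the variable type decides which summary makes sense.

```python
from collections import Counter
from statistics import mean

# Hypothetical data matrix: one dict per student (row), one key per variable (column).
# These values are made up for illustration, not taken from the survey.
data_matrix = [
    {"gender": "female", "sleep": 7.0},
    {"gender": "male", "sleep": 6.5},
    {"gender": "female", "sleep": 8.0},
    {"gender": "male", "sleep": 5.5},
]

# Assumed variable types (an assumption of this sketch).
variable_types = {"gender": "categorical", "sleep": "numerical"}


def summarize(rows, variable, kind):
    """Average a numerical variable; count the categories of a categorical one."""
    values = [row[variable] for row in rows]
    if kind == "numerical":
        return mean(values)
    return dict(Counter(values))


summaries = {
    name: summarize(data_matrix, name, kind)
    for name, kind in variable_types.items()
}
print(summaries)
```

Averaging makes sense for `sleep` but not for `gender`, where counting how many students fall in each category is the natural summary.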
Ex:
● High or low?
● Light or dark?
● More or less?
Ex:
● What is your gender?
Types of variables (Exercise)
Gender:
Sleep:
Bedtime:
Countries:
Dread:
Solution:
● Explanatory variables are usually graphed on the x-axis, and response variables
are usually graphed on the y-axis.
● We name these variables according to the hypothesized relationship between the
two, that is, our best guess as to which variable might be affecting which.
BUT:
● Labeling variables as explanatory and response does not guarantee the
relationship between the two is actually causal, even if there is an association (a
“correlation”) identified between the two variables.
● We use these labels only to keep track of which variable we suspect affects the
other.
● When we suspect that two variables show some kind of connection with one
another, they are called associated variables.
○ Note*: Associated variables can also be called dependent variables and
vice-versa.
○ Note*: No pair of variables is both associated and independent.
Types of variables
Note*: Pay attention to the trend of the data; DO NOT jump to conclusions quickly.
Solution:
Ans: B
Reason: We can’t assume that one variable causes a change in the other, because we do
not know the full story behind the data trend. We can only describe what the data
show, so we MUST describe the trend itself (positive, negative, or no correlation)
INSTEAD of making any kind of causal inference or assumption.
● When we suspect that two variables show some kind of connection with one
another, they are called associated variables.
● We can even say that two variables appear to be positively associated, or
negatively associated.
● But, without MUCH more rigorous work, we cannot confidently say that one
variable is causing the changes in the second variable.
● Correlation does not (always) imply causation!
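One way to see why association alone cannot establish causation is to simulate two variables that never influence each other but are both driven by a hidden third variable. The sketch below is illustrative only: the data are randomly generated, the names `hidden`, `x`, and `y` are made up, and Pearson's correlation coefficient is used as one common measure of association (the notes do not specify a measure).

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(216)  # fixed seed so the simulation is repeatable
n = 2000

# A hidden variable drives both x and y; x never affects y, and y never affects x.
hidden = [rng.gauss(0, 1) for _ in range(n)]
x = [h + 0.5 * rng.gauss(0, 1) for h in hidden]
y = [h + 0.5 * rng.gauss(0, 1) for h in hidden]

r = pearson_r(x, y)
print(f"correlation between x and y: {r:.2f}")
```

Here `x` and `y` come out strongly positively associated even though neither causes the other. Data alone would show the same upward trend either way, which is why describing the trend is safe but claiming causation is not.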
Example Question:
Exercise 1 - Fisher's irises (3 points): Sir Ronald Aylmer Fisher was an English
statistician, evolutionary biologist, and geneticist who worked on a data set that
contained sepal length and width, and petal length and width from three species of iris
flowers (setosa, versicolor and virginica). There were 50 flowers from each species in
the data set.
Because the number of flowers for each species is already determined in this problem.
Example:
In a study of the relationship between socioeconomic status and unethical behavior, 129
undergraduates at the University of California, Berkeley were asked to identify themselves
as having low or high social class by comparing themselves to others with the most
(and least) money, most (and least) education, and most (and least) respected jobs.
They were also presented with a jar of individually wrapped candies and informed that
the candies were for children in a nearby laboratory, but that they could take "some" if
they wanted. After completing some unrelated tasks, participants reported the number
of candies they had taken. The study found that students who were identified as
upper-class took more candy than others.