Lecture 01
What is "Statistics"?
Statistics is the science of learning from data. Data are the facts and figures collected,
analyzed, and summarized for presentation and interpretation. All the data collected in a
particular study are referred to as the data set for the study. Any data set contains
information about some group of individuals, and that information is organized in variables.
Our objective is to make better-informed decisions based on the information we have.
• A variable is any characteristic of an individual. A variable can take different values for
different individuals.
Example: A survey was conducted on students in an introductory statistics class. Below are a
few of the questions on the survey and the corresponding variables in which the data from the
responses were stored.
Data set
2 female extravert 2
···
3 female introvert 4
···
36 male extravert 3
···
What are the types of variables?
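• A quantitative variable takes numerical values for which arithmetic operations such as
adding and averaging make sense. The values of a quantitative variable are usually
recorded in a unit of measurement, such as hours or kilograms.
• A categorical variable places each individual into one of several groups or categories.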
All variables are either numerical (quantitative) or categorical.
• gender: categorical
• sleep: quantitative
• bedtime: categorical
• countries: quantitative
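A minimal Python sketch of the difference between the two types, using three hypothetical survey records (the values, and the bedtime categories, are invented for illustration and are not the class data). Averaging is meaningful for the quantitative variables sleep and countries; for the categorical variables gender and bedtime we can only count how often each category occurs.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey responses (invented values, for illustration only).
responses = [
    {"gender": "female", "sleep": 7.0, "bedtime": "11pm-1am", "countries": 2},
    {"gender": "female", "sleep": 6.5, "bedtime": "9pm-11pm", "countries": 4},
    {"gender": "male",   "sleep": 8.0, "bedtime": "11pm-1am", "countries": 3},
]

# Quantitative variables: arithmetic such as averaging is meaningful.
avg_sleep = mean(r["sleep"] for r in responses)
avg_countries = mean(r["countries"] for r in responses)

# Categorical variables: we can count categories, but an "average gender" is meaningless.
gender_counts = Counter(r["gender"] for r in responses)
bedtime_counts = Counter(r["bedtime"] for r in responses)

print(f"average sleep: {avg_sleep:.2f} hours")
print(f"average countries visited: {avg_countries:.2f}")
print("gender counts:", dict(gender_counts))
print("bedtime counts:", dict(bedtime_counts))
```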
The distinction between population and sample is basic to statistics. To make sense of any
sample result, you must know what population the sample represents.
Population: described by parameters. Sample: described by statistics.
• The population in a statistical study is the entire group of individuals (every individual
of interest) about which we want information.
• A sample is a part of the population from which we actually collect information. We use a
sample to draw conclusions about the entire population; this is called statistical inference.
• A sampling design describes exactly how to choose a sample from the population.
• A parameter is a number that describes the population.
Eg: µ, p and σ represent the mean, proportion and standard deviation of a population.
– µ: population mean (the true mean); used with quantitative data
– p: population proportion (the true proportion); used with categorical data
• A statistic is a number that can be computed from the sample data without making use of any
unknown parameters.
Eg: x̄, p̄ and s represent the mean, proportion and standard deviation of a sample. These are
statistics.
– X: the variable
– x̄ (x-bar): sample mean
– p̄ (p-bar): sample proportion
Note: Remember the letters P and S: Parameters come from Populations, and Statistics come from Samples.
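To make the parameter/statistic distinction concrete, here is a small Python simulation under stated assumptions: the population of 10,000 individuals, its sleep times, and its 40% introvert rate are all invented, and a simple random sample is used as the sampling design because it is the simplest one to code, not because the notes prescribe it. The parameters µ, σ and p are computed from the whole population; the statistics x̄, s and p̄ are computed from the sample alone.

```python
import random
from statistics import mean, pstdev, stdev

random.seed(1)  # fixed seed so reruns give the same numbers

# Hypothetical population of 10,000 individuals (invented for illustration):
# a quantitative variable (hours of sleep) and a categorical one (introvert or not).
population = [
    {"sleep": random.gauss(7, 1), "introvert": random.random() < 0.4}
    for _ in range(10_000)
]
sleep_all = [ind["sleep"] for ind in population]

# Parameters: describe the whole population (unknown in a real study).
mu = mean(sleep_all)        # population mean
sigma = pstdev(sleep_all)   # population standard deviation (divides by N)
p = sum(ind["introvert"] for ind in population) / len(population)  # population proportion

# One possible sampling design: a simple random sample of n individuals.
n = 100
sample = random.sample(population, n)
sleep_sample = [ind["sleep"] for ind in sample]

# Statistics: computed from the sample data alone.
x_bar = mean(sleep_sample)  # sample mean
s = stdev(sleep_sample)     # sample standard deviation (divides by n - 1)
p_bar = sum(ind["introvert"] for ind in sample) / n  # sample proportion

print(f"parameters: mu={mu:.3f}  sigma={sigma:.3f}  p={p:.3f}")
print(f"statistics: x_bar={x_bar:.3f}  s={s:.3f}  p_bar={p_bar:.3f}")
```

In a real study only the second printed line would be available; the first line is what the statistics are trying to estimate.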
Statistical tools and ideas help us examine data in order to describe their main features.
Begin by examining each variable by itself. Then move on to study the relationships among
the variables.
• Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.
Distribution of a Variable
The distribution of a variable tells us what values it takes and how often it takes these values.
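As a sketch of "what values it takes and how often", the Python snippet below tabulates a hypothetical categorical variable (the answers are invented) and reports the same distribution three ways: count, proportion and percent. The row of # characters is only a crude stand-in for a bar chart.

```python
from collections import Counter

# Hypothetical answers to an introvert/extravert question (invented data).
answers = ["extravert", "introvert", "extravert", "extravert",
           "introvert", "extravert", "introvert", "extravert"]

n = len(answers)
counts = Counter(answers)  # which values occur, and how often

# The distribution can be reported as counts, proportions, or percents.
for category, count in counts.most_common():
    proportion = count / n
    bar = "#" * count  # crude text bar chart: one '#' per individual
    print(f"{category:<10} {bar:<6} count={count}  "
          f"proportion={proportion:.3f}  percent={100 * proportion:.1f}%")
```

Here 5 of the 8 answers are "extravert" (proportion 0.625, or 62.5%) and 3 are "introvert" (0.375, or 37.5%).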
• The distribution of a categorical variable lists the categories and gives either the count,
the percent, or the proportion of individuals who fall into each category.
• Bar charts and pie charts display the distribution of a categorical variable graphically.
• A pie chart must include all the categories that make up a whole. Use a pie chart only
when you want to emphasize each category’s relation to the whole.
• To describe the distribution of a categorical variable, write about its main features.
• To describe the distribution of a quantitative variable, look for the overall pattern and
for striking deviations from that pattern.
• You can describe the overall pattern by its shape, center, variability/spread and
location.
• The mean and median describe the center of the distribution of a quantitative variable
numerically.
• A percentile provides information about how the data are spread over the interval
from the smallest value to the largest value. The pth percentile of a data set is a
value such that at least p% of the items take on this value or less, and at least
(100 − p)% of the items take on this value or more.
• Q1 = 25th percentile (at least 25% of the data lie at or below it)
• Q3 = 75th percentile (at least 75% of the data lie at or below it)
• The minimum, maximum, first quartile (Q1) and third quartile (Q3) describe the location of
the distribution.
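The numerical summaries above can be sketched in Python as follows. The percentile function uses one common textbook rule for picking a value that meets the definition (an assumption: several conventions exist, and statistical software may return slightly different quartiles); satisfies_definition checks the definition from the notes directly. The sleep data are hypothetical.

```python
import math
from fractions import Fraction
from statistics import mean, median

def percentile(data, p):
    """pth percentile using one common textbook rule (an assumption; statistical
    software offers several slightly different conventions).

    Compute the position i = (p/100) * n in the sorted data. If i is not an
    integer, round up and take that value; if it is, average the values in
    positions i and i + 1 (positions counted from 1).
    """
    if not data:
        raise ValueError("data must be non-empty")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    xs = sorted(data)
    n = len(xs)
    if p == 0:
        return xs[0]
    if p == 100:
        return xs[-1]
    i = Fraction(p) * n / 100       # exact arithmetic avoids float rounding
    if i.denominator == 1:          # integer position: average two neighbors
        k = int(i)
        return (xs[k - 1] + xs[k]) / 2
    return xs[math.ceil(i) - 1]     # otherwise round the position up

def satisfies_definition(data, value, p):
    """Check the definition from the notes: at least p% of the items are <= value
    and at least (100 - p)% are >= value."""
    n = len(data)
    at_or_below = sum(x <= value for x in data)
    at_or_above = sum(x >= value for x in data)
    return 100 * at_or_below >= p * n and 100 * at_or_above >= (100 - p) * n

# Hypothetical hours of sleep for 8 students (invented data); note the 12.
hours = [7, 5, 8, 6, 12, 7, 9, 6]

q1, q3 = percentile(hours, 25), percentile(hours, 75)
print("mean:  ", mean(hours))
print("median:", median(hours))
print("min, Q1, median, Q3, max:", min(hours), q1, median(hours), q3, max(hours))
print("90th percentile:", percentile(hours, 90))
```

For these data the mean (7.5) is larger than the median (7), because the single large value 12 pulls the mean upward while the median is unaffected. The five summaries are min 5, Q1 6, median 7, Q3 8.5 and max 12.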
Exploratory analysis to inference
• Sampling is natural.
• Think about sampling something you are cooking - you taste (examine) a small part of
what you’re cooking to get an idea about the dish as a whole.
• When you taste a spoonful of soup and decide the spoonful you tasted isn’t salty enough,
that’s exploratory analysis.
• If you generalize and conclude that your entire soup needs salt, that’s an inference.
• For your inference to be valid, the spoonful you tasted (the sample) needs to be
representative of the entire pot (the population).
– If your spoonful comes only from the surface and the salt has settled at the bottom
of the pot, what you tasted is probably not representative of the whole pot.
– If you first stir the soup thoroughly before you taste, your spoonful will more likely
be representative of the whole pot.
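The soup analogy can be turned into a toy Python simulation. Everything here is an assumption made for illustration: the pot is 1,000 portions whose saltiness rises linearly from the surface to the bottom, and stirring is modeled as choosing portions at random from the whole pot (a simple random sample).

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so reruns give the same spoonful

# Hypothetical unstirred pot: 1,000 small portions ordered from surface (index 0)
# to bottom (index 999); salt has settled, so saltiness rises with depth.
pot = [0.2 + 0.8 * depth / 999 for depth in range(1000)]
true_saltiness = mean(pot)  # the "population" value we want to learn about

# Sample 1: a spoonful taken only from the surface (not representative).
surface_spoon = pot[:20]

# Sample 2: stirring first, modeled as picking 20 portions at random from anywhere.
stirred_spoon = random.sample(pot, 20)

print(f"whole pot:     {true_saltiness:.3f}")
print(f"surface spoon: {mean(surface_spoon):.3f}")
print(f"stirred spoon: {mean(stirred_spoon):.3f}")
```

The surface spoonful comes out far below the pot's average saltiness (about 0.21 versus 0.6) no matter how carefully it is tasted; the stirred spoonful is typically close to 0.6 because every portion had the same chance of being chosen.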
Statistical inference is primarily concerned with understanding and quantifying the
uncertainty of parameter estimates. While the equations and details change depending on
the setting, the foundations for inference are the same throughout all of statistics.
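A small Python sketch of the uncertainty that inference has to quantify: drawing many samples from one hypothetical population (invented sleep times) shows that the sample mean x̄ changes from sample to sample while the parameter µ stays fixed. In practice only one sample is observed, so its x̄ carries this sample-to-sample uncertainty.

```python
import random
from statistics import mean, stdev

random.seed(7)  # fixed seed so reruns give the same numbers

# Hypothetical population of 5,000 sleep times in hours (invented for illustration).
population = [random.gauss(7, 1.2) for _ in range(5000)]
mu = mean(population)  # the parameter, normally unknown

# Draw many independent samples of size 50 and record each sample mean.
sample_means = [mean(random.sample(population, 50)) for _ in range(1000)]

print(f"population mean mu:          {mu:.3f}")
print(f"average of the sample means: {mean(sample_means):.3f}")
print(f"spread (SD) of sample means: {stdev(sample_means):.3f}")
print(f"range of sample means:       {min(sample_means):.3f} to {max(sample_means):.3f}")
```

The sample means cluster around µ but no single one equals it exactly; describing how far a single x̄ is likely to be from µ is the kind of question statistical inference answers.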