STAT 1124 - Chapter 1

Contents
1 Picturing Distributions with Graphs
1.1 Individuals and Variables
1.2 Categorical variables: pie charts and bar graphs
1.3 Quantitative variables: Histograms
1.4 Interpreting Histograms
1.5 Quantitative variables: Stemplots
Chapter 1
Picturing Distributions with Graphs
What is statistics?
• The science of data (information).

• The science of collecting, classifying, summarizing, organizing, analyzing, presenting, and interpreting numerical and
categorical information.
1.1 Individuals and Variables
A dataset is a collection of information about some group of individuals (also called subjects, cases, items, or units), such as
people, cars, or nations.
Individuals are people, animals, cars, things, or objects described by a dataset and on which data are collected.
Variables of interest are characteristics or properties or attributes of the individuals which may take different values for
different individuals.
An observation is a row in the spreadsheet that contains the measurements (numbers, letters, or words) of one
individual's variables.
Categorical and Quantitative Variables
A categorical variable has two or more groups or categories or classes into which an individual would be placed.
A quantitative variable takes numeric values recorded with a unit of measurement such as hours, minutes, percentages,
or kilograms. Ordinary arithmetic operations are meaningful for quantitative data.
Note:
Sometimes the categories of a categorical variable are stored as numbers, but these numbers are just labels for the categories
and have no units of measurement (no numerical meaning).
Thus, the values of a categorical variable are labels or names. For some categorical variables, the order or rank of the
categories is meaningful.
Example 1.1.1 Consider the following dataset obtained in a study about seven selected students from a Statistics class.
Figure 1.1: A spreadsheet from seven students of a Statistics class
1. What individuals does the dataset describe? How many individuals are being studied?
2. How many variables does the dataset contain? What is the unit of measurement of each variable? Classify each
variable as categorical or quantitative.

1.2 Categorical variables: pie charts and bar graphs
Exploratory Data Analysis: Describing main features of data by statistical tools and ideas.
Exploring Data
1. Examining and describing one variable and then studying the relationships among the variables.
2. Creating a relevant graph or graphs and then calculating numerical summaries of specific aspects of the data.
We usually want to display the distribution of a single variable in order to examine it.
Distribution of a Variable
The distribution of a variable indicates what values it takes and how often it takes these values.
Now we are going to describe the distribution of a single categorical variable using graphs.
The distribution of a categorical variable lists the categories of the variable as well as the count (frequency) or the percent
of individuals who fall into each category.
Example 1.2.1 787 employees of a company were asked to complete a survey on their education level (some high school,
high school graduate, some college, or college graduate). Here are the counts of employees at each education level.
Complete the following table.

Education Level         Number of Employees (frequency or count)    Percent of Employees    Relative Frequency
Some High School        123
High School Graduate    251
Some College            189
College Graduate        224
Total

1. What percentage of employees did not go to college?
2. What proportion of employees are college graduates?
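The two missing columns can be filled in with simple arithmetic: relative frequency is the count divided by the total, and percent is 100 times the relative frequency. The following Python sketch shows one way to do this for the counts in Example 1.2.1; the variable names (counts, table) are illustrative choices, not part of the course material.

```python
# Counts taken from Example 1.2.1.
counts = {
    "Some High School": 123,
    "High School Graduate": 251,
    "Some College": 189,
    "College Graduate": 224,
}

total = sum(counts.values())  # 787 employees in all

# For each category store (count, percent, relative frequency).
table = {}
for level, count in counts.items():
    rel_freq = count / total          # proportion of all employees
    table[level] = (count, 100 * rel_freq, rel_freq)

# Print the completed table, one row per education level.
for level, (count, percent, rel_freq) in table.items():
    print(f"{level:<22} {count:>5} {percent:>7.2f}% {rel_freq:>7.4f}")
print(f"{'Total':<22} {total:>5}")
```

The relative frequencies should add up to 1 (and the percents to 100), apart from rounding, which is a quick check on the completed table.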
1- Pie charts
Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the
categories.
• A pie chart must include all the categories of a categorical variable.
• The slices of a pie represent the categories of the categorical variable.

• The size of each slice is proportional to the percents of the categories.
• Pie charts are used to emphasize each category’s relation to the whole.
Figure 1.2: Excel pie chart for Education Level
2- Bar graphs
Bar graphs represent each category of a categorical variable as a bar. The height of each bar over each category indicates
the count or percent of the corresponding category.
Figure 1.3: Excel bar graph for Education Level
Figure 1.3 describes the distribution of education levels of employees when the bars follow the alphabetical order of education
levels.
Figure 1.4: Excel bar graph for Education Level
Figure 1.5: Excel bar graph for Education Level when the bars appear in order of height
What is the height of the bar for the “High School Graduate” level?
1.3 Quantitative variables: Histograms
Quantitative variables often take many different values. The distribution of a quantitative variable tells us what values the
variable takes on and how often it takes those values.
The two commonly used graphs to display the distribution of a quantitative variable are:
• Histograms show the distribution of a quantitative variable by using bars whose height represents the number (or
percent) of individuals who take on a value within a particular class.
• Stemplots or stem-and-leaf plots separate each observation into a stem and a leaf that are then plotted to display
the distribution while maintaining the original values of the variable.

Histograms
• The range of the data set is divided into a number of classes (or class intervals), usually between 5 and 20, of equal width.
• Since the class widths are equal, the taller bar has a larger area and represents more individuals.
• You can use the following formula to calculate the approximate class width:
Class Width = (Largest value - Smallest value) / Number of Classes
• Each value in the data set falls into one and only one class. A class starts with the minimum value of the class called
the lower class limit and ends with the maximum value of the class called the upper class limit.
• The number of individuals that fall into each class is called the class frequency (or class count).
• The horizontal axis shows the classes and is marked in the units of measurement for the variable of interest while the
vertical axis shows the count or percent of each class.
• There are no gaps between the bars of a histogram, which shows that the bars cover all the values of the variable. If
there is a gap between bars in a histogram, no values fall into that particular class.
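The class-width formula and the counting of class frequencies can be sketched in Python as follows. This is only an illustration: the data values are hypothetical, the number of classes is an arbitrary choice, and rounding the width up to a whole number is one possible convention (the notes only ask for an approximate width).

```python
import math

# Hypothetical data, for illustration only (not from the course notes).
values = [12, 15, 21, 22, 24, 27, 30, 34, 41, 51]
num_classes = 4  # chosen by the analyst

smallest, largest = min(values), max(values)

# Class Width = (Largest value - Smallest value) / Number of Classes,
# rounded up here so that the classes are sure to cover the largest value.
class_width = math.ceil((largest - smallest) / num_classes)

# Class k covers: lower limit <= value < lower limit + class_width.
lower_limits = [smallest + k * class_width for k in range(num_classes)]

frequencies = [0] * num_classes
for v in values:
    # Clamp so a value equal to the final boundary lands in the last class.
    k = min((v - smallest) // class_width, num_classes - 1)
    frequencies[k] += 1

for lower, freq in zip(lower_limits, frequencies):
    print(f"{lower} to < {lower + class_width}: {freq}")
```

Each value is counted in exactly one class, so the class frequencies add up to the number of observations.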
Example 1.3.1 The following histogram summarizes the annual sales amounts (in thousands of dollars) of selected
salespeople in a company for the last fiscal year.
Figure 1.6: StatGraphics histogram of Annual Sales

1- How many salespeople were selected for this study?
2- Complete the following table:
Class Count (Frequency) Relative Frequency

50 to < 150
Consider the following histogram:
Figure 1.7: StatGraphics histogram of Annual Sales
What is the difference between the two histograms above (Figure 1.6 and Figure 1.7)?
Note:
• There are some recommendations for selecting the number of classes in a histogram but there is no unique right choice.
• Trial and error and judgement are used to choose a number of classes that describes the shape of the distribution well.
• Avoid too many classes, which leaves many classes with one or no observations (“pancake” graph).
• Avoid too few classes, which puts all the values into a few tall bars and loses information (“skyscraper”
graph).
Which histogram best describes the numerical variables Sales Amount and Age?
1.4 Interpreting Histograms
• We use a graph such as a histogram to describe the overall pattern of the data and to look for striking deviations from
that pattern.
• The overall pattern of a histogram can be described by its shape, center, and variability (spread).
• An important kind of deviation is an outlier, an individual that falls outside the overall pattern. We only look for
strong outliers that suggest something special about an observation, such as a typing error. Outliers usually appear
after a large gap in the distribution.
• The shape of a distribution can be described by the symmetry or skewness of the histogram and by whether
the distribution has a single peak (unimodal) or multiple peaks (multimodal). Look for major peaks, not minor ups
and downs in the bars of the histogram.
• The center of a distribution can be described by its midpoint: half of the observations have values smaller than
the midpoint and half have values larger than the midpoint.
• The variability or spread can be described by the difference between the largest value and the smallest value, called the range.
• A histogram of a very large data set with small classes appears as a smooth curve.
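As a small numerical illustration of center and spread, the sketch below computes the midpoint and the range for a hypothetical data set. It assumes that the midpoint described above corresponds to what is usually called the median, which Python's standard statistics module provides; the data values are made up for the example.

```python
import statistics

# Hypothetical data, for illustration only (not from the course notes).
values = [12, 15, 21, 22, 24, 27, 30, 34, 41, 51]

# Midpoint: half of the observations fall below it and half above it.
# With an even number of observations, median() averages the two middle values.
midpoint = statistics.median(values)

# Range: largest value minus smallest value.
data_range = max(values) - min(values)

below = sum(1 for v in values if v < midpoint)
above = sum(1 for v in values if v > midpoint)
print(midpoint, data_range, below, above)
```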
Figure 1.8: Data set size effect on a histogram
Describing Distributions: Symmetric and Skewed Distributions
• A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.
Figure 1.9: Symmetric

• A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther
out than the left side.
Figure 1.10: Right-skewed
• A distribution is skewed to the left if the left side of the histogram extends much farther out than the right side.
Figure 1.11: Left-skewed
Figure 1.12: Symmetric and Skewed Distributions

1.5 Quantitative variables: Stemplots
The second graph for describing a quantitative variable is a stemplot (or stem-and-leaf plot), which is mostly used for small
data sets (usually fewer than 100 observations) and provides more detail than a histogram.
How to make a stemplot:
1- Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, which is that remaining
final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
2- Write the stems in a vertical column with the smallest value at the top, and draw a vertical line at the right of this
column. Be sure to include all the stems needed to span the data, even when some stems have no leaves.
3- Write each leaf in the row to the right of its stem, in increasing order out from the stem.
Example 1.5.1 Sales Data
147, 232, 547, 328, 295, 194, 368, 456, 410, 298, 321, 190, 211, 413, 123, 128, 189, 136, 150, 129, 110, 250, 259, 200, 200, 650,
700, 600, 500, 800
Stems Leaves
1 1222345899
2 00135599
3 226
4 115
5 04
6 05
7 0
8 0
Leaf Unit = 10
What does the 6 stem contain?
Note: If you have too many leaves for one stem, you can split the stem.
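The construction steps can be sketched in Python for the sales data of Example 1.5.1. This is a minimal sketch, assuming that "Leaf Unit = 10" means each value is truncated (not rounded) to tens before it is split, so that the hundreds digit is the stem and the tens digit is the leaf; the names rows and stemplot are illustrative.

```python
from collections import defaultdict

# Sales data from Example 1.5.1.
sales = [147, 232, 547, 328, 295, 194, 368, 456, 410, 298, 321, 190, 211, 413, 123,
         128, 189, 136, 150, 129, 110, 250, 259, 200, 200, 650, 700, 600, 500, 800]

leaf_unit = 10  # each leaf counts tens; the ones digit is dropped (assumed truncation)

# Step 1: separate each observation into a stem and a single-digit leaf.
rows = defaultdict(list)
for value in sales:
    truncated = value // leaf_unit       # e.g. 147 -> 14
    stem, leaf = divmod(truncated, 10)   # e.g. 14 -> stem 1, leaf 4
    rows[stem].append(leaf)

# Steps 2 and 3: list every stem from smallest to largest (even if it has
# no leaves) and write the leaves in increasing order.
stemplot = {}
for stem in range(min(rows), max(rows) + 1):
    stemplot[stem] = "".join(str(leaf) for leaf in sorted(rows.get(stem, [])))

for stem, leaves in stemplot.items():
    print(f"{stem} | {leaves}")
```

Because every observation contributes exactly one leaf, counting the leaves is a quick check that no value was missed.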
Acknowledgement
The core content of the slides is from the textbook of this course:
The Basic Practice of Statistics (8th Edition)
by
MOORE, NOTZ, and FLIGNER
