
Descriptive Statistics

VEENA GOSWAMI
Getting started
In simple words, statistics is a way to get information from data. More
precisely, it is a science of collecting and analyzing data for the purpose
of drawing conclusions and making decisions.

Every application of statistics involves one or more of the following
three tasks:
• Collecting data,
• Summarizing and exploring data,
• Drawing conclusions and making decisions based on data
Statistics Defined

“Statistics is the science and practice of developing human knowledge
through the use of empirical data expressed in quantitative form. It is
based on statistical theory which is a branch of applied mathematics.
Within statistical theory, randomness and uncertainty are modelled by
probability theory.”
Why study statistics?
Areas that rely on statistical information and techniques include:
• Quality control
• Product planning
• Yearly reports
• Forecasting
• Market research
• Medical research
Applications of Statistics in Computer Science
A statistical background is essential for understanding algorithms and
statistical properties that form the backbone of computer science.

Statistics is used for:
• Data mining and analytics
• Machine learning
• Vision and image analysis
• Data compression
• Network and traffic modeling
Types of statistics
There are two types of statistics, namely, descriptive and inferential
statistics.
Descriptive statistics:
Descriptive statistics are the methods that help collect, summarize,
present, and analyze a set of data.

Inferential statistics:
Inferential statistics are the methods that use the data collected from a
small group to draw conclusions about a larger group.
Types of Variables
Categorical variables (also known as qualitative variables) have values
that can only be placed into categories such as yes and no.
“Do you currently own bonds?” (yes or no) and the level of risk of a
bond fund (below average, average, or above average) are examples of
categorical variables.

Numerical variables (also known as quantitative variables) have values
that represent quantities. Numerical variables are further identified as
being either discrete or continuous variables.
Quantitative Variables - Classifications

Quantitative variables can be classified as either discrete or
continuous.

Discrete variables have numerical values that arise from a counting
process. “The number of premium cable channels subscribed to” is an
example of a discrete numerical variable because the response is one
of a finite number of integers.

EXAMPLE: The number of rooms in a house, or the number of
hammers sold at the local Home Depot (1, 2, 3, …).
Quantitative Variables - Classifications
Continuous variables produce numerical responses that arise from a
measuring process. The time you wait for teller service at a bank is an
example of a continuous numerical variable because the response
takes on any value within a continuum, or an interval, depending on
the precision of the measuring instrument.

EXAMPLE: The pressure in a tire, the weight of a pork chop, or the
height of students in a class.
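
The counting-versus-measuring distinction can be sketched in Python. The observations below are invented for illustration, and `looks_discrete` is a hypothetical helper that only inspects the recorded values; it does not decide what kind of variable produced them.

```python
# Hypothetical observations, invented for illustration only.
rooms_per_house = [3, 5, 4, 6, 3]               # discrete: obtained by counting
tire_pressure_psi = [32.1, 31.85, 33.02, 30.4]  # continuous: obtained by measuring

def looks_discrete(values):
    """Return True when every recorded value is a whole number."""
    return all(float(v).is_integer() for v in values)

print(looks_discrete(rooms_per_house))    # True
print(looks_discrete(tire_pressure_psi))  # False

# A coarser instrument records fewer decimals, but the underlying
# variable (pressure) is still continuous.
rounded_pressure = [round(p) for p in tire_pressure_psi]
print(rounded_pressure)                   # [32, 32, 33, 30]
```

The rounded readings are whole numbers even though pressure is continuous, which is why the classification depends on how the values arise (counting or measuring) and not only on how they are recorded.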
Types of Variables
We have categorized data as either qualitative or quantitative.

Data can be classified according to levels of measurement. The level of
measurement of the data often dictates the calculations that can be
done to summarize and present the data. It will also determine the
statistical tests that should be performed.
These levels indicate the type of arithmetic that is appropriate for the
data, such as ordering, taking differences, or taking ratios.

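For example, there are six colors of candies in a bag of M&M's candies.
Color is only a category label, so the candies can be counted by color,
but the colors themselves cannot be ordered, subtracted, or divided.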
Levels of measurement
MUTUALLY EXCLUSIVE:
A property of a set of categories such that an individual or object is
included in only one category.

EXHAUSTIVE:
A property of a set of categories such that each individual or object
must appear in a category.
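
As a rough illustration, both properties can be checked with set operations. The student names, the category assignments, and the two helper functions below are hypothetical and are not part of the original slides.

```python
def is_mutually_exclusive(categories):
    """True when no item appears in more than one category."""
    seen = set()
    for members in categories.values():
        if seen & members:      # an item was already placed elsewhere
            return False
        seen |= members
    return True

def is_exhaustive(categories, population):
    """True when every item in the population appears in some category."""
    covered = set().union(*categories.values())
    return population <= covered

# Hypothetical students and class-standing categories.
students = {"Ana", "Ben", "Cho", "Dev"}
by_standing = {
    "freshman": {"Ana"},
    "sophomore": {"Ben", "Cho"},
    "junior": set(),
    "senior": {"Dev"},
}
print(is_mutually_exclusive(by_standing))    # True
print(is_exhaustive(by_standing, students))  # True

# Overlapping categories that also leave one student out.
by_club = {"chess": {"Ana", "Ben"}, "drama": {"Ben", "Cho"}}
print(is_mutually_exclusive(by_club))        # False
print(is_exhaustive(by_club, students))      # False
```

Class standing works as a classification because each student falls in exactly one category; club membership does not, because a student can belong to several clubs or to none.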
Levels of measurement
Nominal: data that are classified into categories and cannot be arranged in any
particular order.
EXAMPLES: eye color, gender, religious affiliation.
To summarize, nominal-level data have the following properties:
1. Data categories are mutually exclusive and exhaustive.
2. Data categories have no logical order.

Ordinal: data that are arranged in some order, but the differences between data
values cannot be determined or are meaningless.
EXAMPLE: During a taste test of 4 soft drinks, Mellow Yellow was ranked
number 1, Sprite number 2, Seven-up number 3, and Orange Crush number 4.
The properties of ordinal-level data are:
1. The data classifications are mutually exclusive and exhaustive.
2. Data classifications are ranked or ordered according to the particular trait they
possess.
Levels of measurement
Interval: similar to the ordinal level, with the additional
property that meaningful differences between data
values can be determined. There is no natural zero point.
EXAMPLE: Temperature on the Fahrenheit scale.

The properties of interval-level data are:
1. Data classifications are mutually exclusive and exhaustive.
2. Data classifications are ordered according to the amount of
the characteristics they possess.
3. Equal differences in the characteristic are represented by
equal differences in the measurements.
Levels of measurement
Ratio: similar to the interval level, but with an inherent zero starting point.
Differences and ratios are meaningful for this level of measurement.
EXAMPLES: Monthly income of surgeons, or distance traveled by
manufacturer’s representatives per month.

The properties of the ratio-level data are:
1. Data classifications are mutually exclusive and exhaustive.
2. Data classifications are ordered according to the amount of the
characteristics they possess.
3. Equal differences in the characteristic are represented by equal differences
in the numbers assigned to the classifications.
4. The zero point is the absence of the characteristic.
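
The four levels build on one another: each keeps what the previous level allows and adds one more kind of meaningful arithmetic. The lookup below is a minimal sketch of that cumulative structure; the names `LEVELS`, `ADDED_OPERATION`, and `meaningful_operations` are assumptions made for this illustration.

```python
# Levels listed from weakest to strongest. Each level keeps the
# operations of the levels before it and adds one more.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
ADDED_OPERATION = {
    "nominal": "classify and count",
    "ordinal": "order",
    "interval": "take differences",
    "ratio": "take ratios",
}

def meaningful_operations(level):
    """Return every operation that is meaningful at the given level."""
    idx = LEVELS.index(level)
    return [ADDED_OPERATION[name] for name in LEVELS[: idx + 1]]

for level in LEVELS:
    print(level, "->", meaningful_operations(level))
```

For example, `meaningful_operations("interval")` includes taking differences but not taking ratios, which matches the Fahrenheit temperature example given for interval-level data.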
Levels of measurement
The difference between interval and ratio measurements can be
confusing.
The fundamental difference involves the definition of a true zero and
the ratio between two values.
If you have $50 and your friend has $100, then your friend has twice as
much money as you. You may convert this money to Japanese yen or
English pounds, but your friend will still have twice as much money as
you. If you spend your $50, then you have no money.
This is an example of a true zero.
Levels of measurement
As another example, a sales representative travels 250 miles on
Monday and 500 miles on Tuesday. The ratio of the distances traveled
on the two days is 2/1; converting these distances to kilometers, or
even inches, will not change the ratio. It is still 2/1.
Suppose the sales representative works at home on Wednesday and
does not travel. The distance traveled on this date is zero, and this is a
meaningful value.
Hence, the variable distance has a true zero point.
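
The claim that a unit change leaves the ratio untouched can be checked numerically. This is a small sketch using the distances from the example; the helper `ratio_in_unit` is hypothetical, and the conversion factors are the standard definitions of the mile.

```python
import math

KM_PER_MILE = 1.609344   # international mile expressed in kilometers
INCHES_PER_MILE = 63360  # 5280 feet * 12 inches

monday_miles, tuesday_miles, wednesday_miles = 250, 500, 0

def ratio_in_unit(a_miles, b_miles, units_per_mile):
    """Ratio b/a after converting both distances to another unit."""
    return (b_miles * units_per_mile) / (a_miles * units_per_mile)

ratio_miles = tuesday_miles / monday_miles
ratio_km = ratio_in_unit(monday_miles, tuesday_miles, KM_PER_MILE)
ratio_inches = ratio_in_unit(monday_miles, tuesday_miles, INCHES_PER_MILE)

# All three ratios agree (up to floating-point rounding).
print(ratio_miles, ratio_km, ratio_inches)

# Zero miles is also zero kilometers: the zero point survives conversion.
print(wednesday_miles * KM_PER_MILE)
```

Because converting units only multiplies every value by a constant, the constant cancels in any ratio and zero stays zero. That is what makes distance a ratio-level variable.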
Example 1: Levels of measurement
Identify the type of data.

Taos, Acoma, Zuni, and Cochiti are the names of four Native American
villages from the population of names of all Native American villages in
Arizona and New Mexico.

Solution:
These data are at the nominal level. Notice that these data values are
simply names. By looking at the name alone, we cannot determine if
one name is “greater than or less than” another. Any ordering of the
names would be numerically meaningless.
Example 2: Levels of measurement
In a high school graduating class of 319 students, Jim ranked 25th, Kim
ranked 19th, Walter ranked 10th, and Julia ranked 4th, where 1 is the
highest rank.

Solution:
These data are at the ordinal level. Ordering the data clearly makes
sense. Walter ranked higher than Kim. Jim had the lowest rank, and
Julia the highest.

However, numerical differences in ranks do not have meaning.


The difference between Kim’s and Jim’s ranks is 6, and this is the same
difference that exists between Walter’s and Julia’s ranks. However, this
difference doesn’t really mean anything significant.

For instance, if you looked at grade point average, Walter and Julia may
have had a large gap between their grade point averages, whereas Kim
and Jim may have had closer grade point averages.

In any ranking system, it is only the relative standing that matters.
Differences between ranks are meaningless.
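
The point about rank differences can be made concrete with a short sketch. The ranks come from the example; the grade point averages are invented purely for illustration and do not appear in the source.

```python
# Ranks from the example (1 is the highest rank).
ranks = {"Julia": 4, "Walter": 10, "Kim": 19, "Jim": 25}

# Hypothetical grade point averages, invented for illustration only.
gpa = {"Julia": 3.98, "Walter": 3.70, "Kim": 3.55, "Jim": 3.52}

rank_gap_walter_julia = ranks["Walter"] - ranks["Julia"]
rank_gap_jim_kim = ranks["Jim"] - ranks["Kim"]

gpa_gap_julia_walter = round(gpa["Julia"] - gpa["Walter"], 2)
gpa_gap_kim_jim = round(gpa["Kim"] - gpa["Jim"], 2)

print(rank_gap_walter_julia, rank_gap_jim_kim)  # 6 6
print(gpa_gap_julia_walter, gpa_gap_kim_jim)    # 0.28 0.03
```

Both pairs are six ranks apart, yet under these assumed averages one gap is roughly nine times the other. The ranks preserve the ordering of the students and nothing more.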
Example 3: Levels of measurement
Body temperatures (in degrees Celsius) of fish in the Yellowstone River.

Solution:
These data are at the interval level. We can certainly order the data,
and we can compute meaningful differences. However, for Celsius-scale
temperatures, there is not an inherent starting point.

The value 0C may seem to be a starting point, but this value does not
indicate the state of “no heat.”
Furthermore, it is not correct to say that 20C is twice as hot as 10C.
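
One way to see why the "twice as hot" statement fails is to re-express the same two temperatures on other scales. The sketch below uses the standard Celsius-to-Fahrenheit and Celsius-to-Kelvin conversions; the function and variable names are chosen for this illustration.

```python
def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    return c + 273.15

cold_c, warm_c, hot_c = 10, 20, 30

# The ratio depends on the scale, so it carries no physical meaning.
ratio_celsius = warm_c / cold_c
ratio_fahrenheit = celsius_to_fahrenheit(warm_c) / celsius_to_fahrenheit(cold_c)
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)
print(ratio_celsius, ratio_fahrenheit, round(ratio_kelvin, 4))

# Equal differences stay equal after conversion, so differences are meaningful.
step_1_f = celsius_to_fahrenheit(warm_c) - celsius_to_fahrenheit(cold_c)
step_2_f = celsius_to_fahrenheit(hot_c) - celsius_to_fahrenheit(warm_c)
print(step_1_f, step_2_f)
```

The ratio is 2 in Celsius but 1.36 in Fahrenheit, while two equal 10-degree Celsius steps both become 18-degree Fahrenheit steps. This is the behavior expected of interval-level data: differences are meaningful, ratios are not.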
Example 4: Levels of measurement
Length of fish swimming in the Yellowstone River.

Solution:
These data are at the ratio level. An 18-inch fish is three times as long
as a 6-inch fish. Observe that we can divide 6 into 18 to determine a
meaningful ratio of fish lengths.
Example: Levels of measurement
What is the level of measurement for each of the following variables?
a. Distance students travel to class.

b. Student scores on the first statistics test.

c. A classification of students by state of birth.

d. A ranking of students by freshman, sophomore, junior, and senior.

e. Number of hours students study per week.


Example: Levels of measurement
What is the level of measurement for each of the following variables?
a. Distance students travel to class. Ratio

b. Student scores on the first statistics test. Interval

c. A classification of students by state of birth. Nominal

d. A ranking of students by freshman, sophomore, junior, and senior. Ordinal
e. Number of hours students study per week. Ratio
