03 Descriptive Statistics - Measures of Position

Descriptive Statistics: Measures Of Position

Measures of position (sometimes called measures of relative standing) are used
to compare data values (sometimes within the same data set, sometimes
between data sets). There are several measures of position; the four that will be
covered are ranking, percentiles, quartiles, and z-scores.

Ranking
The rank of a data value is a numerical value given to it to indicate its relative
position when all the data values are placed in some order. This assumes that
your data can be arranged in some meaningful order, which typically (not always)
means your data values are numerical in nature, and the arrangement is usually
from least to greatest or greatest to least.

There are several ways to assign ranks. The standard competition ranking,
modified competition ranking, dense ranking, ordinal ranking, and fractional
ranking can be seen in various fields. All of these rankings are the same in that
data are arranged in some order, but differ in how to deal with data that have the
same value (with respect to the arrangement). Details of these can be found at
https://en.wikipedia.org/wiki/Ranking. Fractional ranking will be discussed here
since it is typically used in statistics.

Fractional Ranking
As with all rankings, first arrange and list the data in some order. Then, assign
the number n to the nth number on the list. That is, 1 to the 1st number, 2 to the 2nd
number, 3 to the 3rd number, etc.

Example: The given data values are 7, 12, 15, 15, 15, 15, 16, 18, 18, 18, 20, 21, 21.
Note that these data values have already been arranged in increasing order, but
depending on the study or tool, they may also be arranged in decreasing order.

Data Value    Initial Rank
7             1
12            2
15            3
15            4
15            5
15            6
16            7
18            8
18            9
18            10
20            11
21            12
21            13

Once this is done, you will note that several of the data values are the same, so
they have the same arrangement value. Actually, this initial ranking is equivalent
to ordinal ranking. In statistics, it is frequently desirable to show that these have
the same rank value as well.

Thus, to deal with data values that have the same arrangement value, first get
the mean of the rank values for the data that have the same arrangement value.
Use this mean as their rank value.

Continuing with the example, you will note that 15, 18, and 21 occur more than
once, so we will have to give them new rank values. Following the instructions,
the new rank for the data value 15 will be

(3 + 4 + 5 + 6) ÷ 4 = 4.5

where 3, 4, 5, and 6 are the four rank values of 15.

Similarly, the new rank for 18 will be

(8 + 9 + 10) ÷ 3 = 9

Lastly, the new rank for 21 will be

(12 + 13) ÷ 2 = 12.5

Thus, the following table shows the fractional ranking, or simply ranking of the
data:

Data Value    Rank
7             1
12            2
15            4.5
15            4.5
15            4.5
15            4.5
16            7
18            9
18            9
18            9
20            11
21            12.5
21            12.5
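
The fractional ranking steps above can be sketched in Python. This is only a minimal illustration, not a reference implementation; the function name fractional_ranks is hypothetical, and the data are assumed to be ranked in increasing order.

```python
from collections import defaultdict


def fractional_ranks(values):
    """Return (value, fractional rank) pairs in increasing order of value."""
    ordered = sorted(values)
    # Step 1: record the ordinal ranks (1, 2, 3, ...) held by each distinct value.
    positions = defaultdict(list)
    for ordinal_rank, value in enumerate(ordered, start=1):
        positions[value].append(ordinal_rank)
    # Step 2: tied values share the mean of their ordinal ranks.
    return [(v, sum(positions[v]) / len(positions[v])) for v in ordered]


data = [7, 12, 15, 15, 15, 15, 16, 18, 18, 18, 20, 21, 21]
for value, rank in fractional_ranks(data):
    print(value, rank)
```

For the example data, the pairs produced should agree with the table above: 15 gets 4.5, 18 gets 9, and 21 gets 12.5.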

Percentiles
Percentiles arrange the data in non-decreasing order and try to divide them
into groups that contain a certain percent of the data. It is possible for a data
value to have several percentile ranks, especially if there are fewer than 100 data
values.

Unfortunately, there are varying definitions and methods for assigning percentile
ranks. One common definition is that the nth percentile is the smallest data value
that is greater than n percent of all the data values. Another common definition is
that the nth percentile is the smallest data value that is greater than or equal to
n percent of all the data values. The resulting data value may differ depending on
the definition used. An alternative method tries to accommodate both definitions,
but its result does not necessarily correspond to one of the actual data values (it
may be a kind of imagined ideal data value).

For notation, you may encounter Pn to denote the nth percentile. For example, P37
refers to the 37th percentile. Also, to avoid confusion in terminology, note that
the nth percentile has a percentile rank of n. That is, the 37th percentile has a
percentile rank of 37.

The first step in computing percentile ranks is the same as in ranking. Arrange
the data values and assign positive integers starting at 1. Remember though, for
percentiles, the ranking is in non-decreasing order. Thus, using the same
example as before the following table gives the result of the first step.

Data Value    Ordinal Rank
7             1
12            2
15            3
15            4
15            5
15            6
16            7
18            8
18            9
18            10
20            11
21            12
21            13

The second step is to determine the corresponding rank for that percentile. This
corresponding rank is generally a guide to help determine the data value for that
percentile. For the two definitions, the formula is:

Rank = Percentile × Number of data values ÷ 100.

So, if you want to find the 30th percentile for the example, the corresponding rank
would be

30 × 13 ÷ 100 = 3.9

Ideally, the data value at rank 3.9 would be the 30th percentile. Unfortunately,
there is no rank of 3.9. Instead, this rank suggests that the 30th percentile is
probably one of the data values near rank 3.9. So, we check the data values at
rank 3 (rounded down) and rank 4 (rounded up). In this example, ranks 3 and 4
both have a data value of 15, so we only need to check this data value.

The next step depends on the definition you use.

First definition: the nth percentile is the smallest data value that is greater than n
percent of all the data values.
For the first definition, we choose the smallest data value that is greater than
30% of the data values. Thus, let us check the data value of 15. 15 is greater
than 2 of the data values (7 and 12), thus, it is greater than about 15.4% (see **
after this paragraph if you are not sure how 15.4% was approximated) of the
data values. This is less than the 30% we are looking for. Hence, we check the
next highest data value, which is 16. 16 is greater than 6 of the data values (7,
12, and four of 15). Thus, it is greater than about 46.2% of the data values, which
exceeds 30%. Hence, the data value of 16 is the smallest data value that is
greater than 30% of the data values.

** The 15.4% was computed as 100 × 2 ÷ 13, where 2 is used because 15 is
greater than 2 of the data values, and 13 is used because there are 13 data
values. In other words, this simply determines what percent 2 is of 13.

So, by the first definition, P30 = 16.
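
The first definition can be sketched in Python as follows. This is a minimal illustration under one assumption: a data value qualifies when the percent of data values strictly below it is at least n. The function name percentile_first_definition is hypothetical.

```python
def percentile_first_definition(values, n):
    """Smallest data value that is greater than n percent of the values.

    Assumption: a value qualifies when the share of data values strictly
    below it is at least n percent. Returns None if no value qualifies.
    """
    ordered = sorted(values)
    total = len(ordered)
    for candidate in ordered:
        below = sum(1 for v in ordered if v < candidate)
        # Integer comparison of 100 * below / total >= n avoids rounding issues.
        if 100 * below >= n * total:
            return candidate
    return None


data = [7, 12, 15, 15, 15, 15, 16, 18, 18, 18, 20, 21, 21]
print(percentile_first_definition(data, 30))  # 16
```

Rather than starting near the guiding rank, the sketch simply scans every data value from the smallest upward, which gives the same answer for a small data set like this one.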

Second definition: the nth percentile is the smallest data value that is greater than
or equal to n percent of all the data values.
For the second definition, we choose the smallest data value that is greater than
or equal to 30% of the data values. Thus, let us check the data value of 15
(recall, this was derived by using the rank as a guide). 15 is greater than or equal
to 6 of the data values (7, 12, and four of 15). Hence, it is greater than or equal
to about 46.2% of the data values, which is at least 30%. The next lowest data
value, 12, is greater than or equal to only 2 of the data values (about 15.4%), so
15 is the smallest data value that qualifies.

So, by the second definition, P30 = 15.
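
The second definition can be sketched the same way. This is a minimal illustration, and the function name percentile_second_definition is hypothetical; the only change from the first definition is that values equal to the candidate are counted as well.

```python
def percentile_second_definition(values, n):
    """Smallest data value that is greater than or equal to n percent of the values."""
    ordered = sorted(values)
    total = len(ordered)
    for candidate in ordered:
        at_or_below = sum(1 for v in ordered if v <= candidate)
        # Integer comparison of 100 * at_or_below / total >= n.
        if 100 * at_or_below >= n * total:
            return candidate
    return None


data = [7, 12, 15, 15, 15, 15, 16, 18, 18, 18, 20, 21, 21]
print(percentile_second_definition(data, 30))  # 15
```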

A third way that tries to accommodate both definitions makes use of interpolation.
Again, the first step is to arrange the data in order.

The second step is to determine the corresponding rank for that percentile. This
corresponding rank is generally a guide to help determine the data value for that
percentile. Take note that this is similar to the previous methods, but the formula
is slightly different:

Rank = Percentile × ( Number of data values + 1 ) ÷ 100.

Recall that this alternative method may give a value for the percentile that is not
actually one of the data values collected. It is an imagined data value, but it is
still taken into consideration, hence 1 is added to the number of data values.
Also, please take note of the parentheses. If you remember your order of
operations, that means you add 1 to the number of data values first.

Again, we will try to compute P30 (the 30th percentile), by using this third method.
30 × ( 13 + 1 ) ÷ 100 = 30 × 14 ÷ 100 = 4.2

This gives us our guiding rank. In this case, we need both the data value at the
rounded down rank and the data value at the rounded up rank. So, we will look at
the data values for rank 4 (rounded down) and rank 5 (rounded up). In this case,
they are both the same at 15.

Next we note the fractional part of the guiding rank, which is 0.2 (from the
decimal part of the rank 4.2).
Also note the data value at the higher rank minus the data value at the lower
rank. In this case, it turns out to be 15 – 15 = 0.

Then multiply the two noted values, that is, 0.2 × 0 = 0.

Add this number to the data value at the lower rank, which will be 15 + 0 = 15.

That will be the value for the 30th percentile.

Note that in this case, because the data values were the same at both the
rounded down rank and the rounded up rank, it will turn out that the percentile
will also be the same.

Here is another example using the third method. This time, we will look for the
73rd percentile using the same set of data.

1) Guiding rank: 73 × ( 13 + 1 ) ÷ 100 = 73 × 14 ÷ 100 = 10.22

2) Rounded down rank = 10
   Rounded up rank = 11

3) Data value at rank 10 = 18
   Data value at rank 11 = 20

4) Fractional part of guiding rank = 0.22

5) Data value at rank 11 minus data value at rank 10 = 20 – 18 = 2

6) Fractional part (from step 4) times the difference (from step 5) = 0.22 × 2 = 0.44

7) Data value at rank 10 + result from step 6 = 18 + 0.44 = 18.44

8) P73 = 18.44
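
The interpolation steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation; the function name percentile_interpolated is hypothetical. The text does not say what to do when the guiding rank falls below 1 or above the number of data values, so raising an error in that case is an assumption made here.

```python
import math


def percentile_interpolated(values, n):
    """nth percentile using the guiding rank n × (N + 1) ÷ 100 and interpolation."""
    ordered = sorted(values)
    guiding_rank = n * (len(ordered) + 1) / 100
    # Assumption: ranks outside 1..N are rejected (the text does not cover them).
    if not 1 <= guiding_rank <= len(ordered):
        raise ValueError("guiding rank falls outside the ranks 1..N")
    lower_rank = math.floor(guiding_rank)    # rounded down rank
    fraction = guiding_rank - lower_rank     # fractional part of the guiding rank
    lower_value = ordered[lower_rank - 1]    # ranks start at 1, list indexes at 0
    if fraction == 0:
        return lower_value                   # whole-number guiding rank
    upper_value = ordered[lower_rank]        # data value at the rounded up rank
    return lower_value + fraction * (upper_value - lower_value)


data = [7, 12, 15, 15, 15, 15, 16, 18, 18, 18, 20, 21, 21]
print(round(percentile_interpolated(data, 30), 2))  # 15.0
print(round(percentile_interpolated(data, 73), 2))  # 18.44
```

The results are rounded before printing because floating-point arithmetic can leave a tiny error in the last decimal places.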

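**NOTE that with the second definition and with the interpolation method, if the
guiding rank is a whole number, then the data value at that rank is the percentile
value. With the first definition, the check against n percent must still be carried
out, because the data value at that rank is greater than fewer than n percent of
the data values. The examples showed how to deal with the case when you don’t
get a whole number.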

Quartiles
Quartiles are formed by particular percentiles. The first quartile is denoted by Q1;
the second quartile is denoted by Q2; and the third quartile is denoted by Q3. The
percentile equivalents are as follows: Q1 = P25; Q2 = P50; and Q3 = P75.

There are also deciles, which are formed by P10, P20, P30, P40, P50, P60, P70, P80,
and P90. These are usually denoted by D1, D2, D3, D4, D5, D6, D7, D8, and D9,
respectively.

For quartiles and deciles, you may get different values, depending on the
definition/method you used for percentiles. Also note that the median is actually
equivalent to P50, using the third method (interpolation).
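
As a rough check, the quartiles of the running example can be computed with Python's standard library. This sketch assumes Python 3.8 or later and assumes that the default "exclusive" method of statistics.quantiles matches the third (interpolation) method, since it also works with N + 1 positions. The data list is the example used earlier in these notes.

```python
import statistics

data = [7, 12, 15, 15, 15, 15, 16, 18, 18, 18, 20, 21, 21]

# n=4 asks for the three cut points that split the data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4)
print(q1, q2, q3)

# The second quartile should coincide with the median.
print(q2 == statistics.median(data))
```

By hand, the guiding ranks are 3.5, 7, and 10.5, which give Q1 = 15, Q2 = 16, and Q3 = 18 + 0.5 × (20 – 18) = 19. Passing n=10 instead of n=4 gives the nine deciles in the same way.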

Z-Score
The z-score assumes that your data is spread out a certain way, that is, the data
has a certain distribution. There are many types of distributions, but the z-score
assumes what is known as the normal distribution. You may have encountered
this in a previous statistics class. The normal distribution will be discussed further
in measures of dispersion.

To determine the z-score of a particular data value, you will need the mean and
the standard deviation of the data set. The standard deviation will be discussed
further in measures of dispersion. For now, just note that the standard deviation
differs depending on whether the data set is formed by the population or by a
sample. Usually, if the standard deviation is from a sample, it is denoted s, while
if it is from a population, it is denoted σ (the Greek letter sigma, in lowercase).

The z-score z of a data value x is given by:

z = (x − x̄) ÷ s

or

z = (x − μ) ÷ σ

The formula is essentially the same; the notation differs only to emphasize whether
the data are from a sample (mean x̄, standard deviation s) or from a population
(mean μ, standard deviation σ).

For example, if the mean of the data set is 10, and the standard deviation of the
data set is 5, then the z-score for a data value of 8 is

z = (8 − 10) ÷ 5 = −0.4
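
The same computation can be sketched in Python. This is a minimal illustration; the function name z_score and its parameter names are hypothetical, and the mean and standard deviation are assumed to be already known.

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / standard_deviation


print(z_score(8, mean=10, standard_deviation=5))  # -0.4
```

A negative z-score, as in this example, means the data value is below the mean.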

The z-score can then be used with z-tables to determine what percent of the data
set is below the data value. Technically, the z-table will give you the probability
that a randomly chosen z-score will be less than the particular z-score. This is
equivalent to the percent of z-scores that are less than the particular z-score.
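
Instead of a printed z-table, the lookup can be sketched with Python's standard library. This assumes Python 3.8 or later (for statistics.NormalDist) and, as with any z-table, assumes the data follow a normal distribution. The z-score of −0.4 is the one from the example above.

```python
from statistics import NormalDist

z = -0.4

# NormalDist() with no arguments is the standard normal (mean 0, standard deviation 1).
# cdf(z) is the probability that a randomly chosen z-score is less than z.
probability_below = NormalDist().cdf(z)

print(round(probability_below, 4))       # about 0.3446
print(round(100 * probability_below, 1)) # about 34.5 percent of the data
```

So, under the normality assumption, roughly 34.5% of the data set is below the data value of 8.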

You should be familiar with z-scores from your high school statistics class. If you
have forgotten, the first part of this website on how to use a z-table gives a
simple example. Do NOT read the part about how to create z-tables, unless you
really want to know. The link is:
https://towardsdatascience.com/how-to-use-and-create-a-z-table-standard-normal-table-240e21f36e53
