Measures of Variability Symmetry and Kurtosis


MEASURES OF VARIABILITY, SYMMETRY AND KURTOSIS


• DESCRIPTIVE MEASURES THAT ARE USED TO
INDICATE THE AMOUNT OF VARIATION IN A
DATA SET ARE CALLED MEASURES OF
VARIABILITY, DISPERSION, OR SPREAD. WHEN
DESCRIPTIVE STATISTICS ARE PRESENTED,
THERE IS USUALLY AT LEAST ONE MEASURE
OF CENTRAL TENDENCY AND AT LEAST ONE
MEASURE OF VARIABILITY REPORTED.
VARIABILITY
• VARIABILITY REFERS TO HOW
SPREAD OUT SCORES ARE IN A
DISTRIBUTION; THAT IS, IT
REFERS TO THE AMOUNT OF SPREAD
OF THE SCORES AROUND THE MEAN.
WHY DOES VARIABILITY
MATTER?
• WHILE THE CENTRAL TENDENCY, OR AVERAGE, TELLS YOU WHERE MOST OF YOUR POINTS
LIE, VARIABILITY SUMMARIZES HOW FAR APART THEY ARE. THIS IS IMPORTANT BECAUSE
THE AMOUNT OF VARIABILITY DETERMINES HOW WELL YOU CAN GENERALIZE RESULTS
FROM THE SAMPLE TO YOUR POPULATION.
• LOW VARIABILITY IS IDEAL BECAUSE IT MEANS THAT YOU CAN BETTER PREDICT
INFORMATION ABOUT THE POPULATION BASED ON SAMPLE DATA. HIGH VARIABILITY
MEANS THAT THE VALUES ARE LESS CONSISTENT, SO IT’S HARDER TO MAKE PREDICTIONS.
• DATA SETS CAN HAVE THE SAME CENTRAL TENDENCY BUT DIFFERENT LEVELS OF
VARIABILITY OR VICE VERSA. IF YOU KNOW ONLY THE CENTRAL TENDENCY OR THE
VARIABILITY, YOU CAN’T SAY ANYTHING ABOUT THE OTHER ASPECT. BOTH OF THEM
TOGETHER GIVE YOU A COMPLETE PICTURE OF YOUR DATA.
EXAMPLE: VARIABILITY IN NORMAL DISTRIBUTIONS
• YOU ARE INVESTIGATING THE AMOUNTS OF TIME SPENT ON PHONES DAILY BY DIFFERENT GROUPS OF PEOPLE.
USING SIMPLE RANDOM SAMPLES, YOU COLLECT DATA FROM 3 GROUPS:
• SAMPLE A: HIGH SCHOOL STUDENTS,
• SAMPLE B: COLLEGE STUDENTS,
• SAMPLE C: ADULT FULL-TIME EMPLOYEES.
• ALL THREE OF YOUR SAMPLES HAVE THE SAME
AVERAGE PHONE USE, AT 195 MINUTES OR 3
HOURS AND 15 MINUTES. THIS IS THE X-AXIS
VALUE WHERE EACH CURVE PEAKS.
• ALTHOUGH THE DATA IN EACH SAMPLE FOLLOW A
NORMAL DISTRIBUTION, THE SAMPLES HAVE
DIFFERENT SPREADS. SAMPLE A HAS THE
LARGEST VARIABILITY WHILE SAMPLE C HAS
THE SMALLEST VARIABILITY.
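• A MINIMAL SKETCH OF THE SAME IDEA: THREE NORMAL DISTRIBUTIONS THAT SHARE THE 195-MINUTE MEAN BUT DIFFER IN SPREAD. THE STANDARD DEVIATIONS BELOW ARE ASSUMED VALUES CHOSEN ONLY FOR ILLUSTRATION; THE SLIDES DO NOT REPORT THEM.

```python
from statistics import NormalDist

MEAN_MINUTES = 195  # shared mean taken from the example

# ASSUMPTION: these standard deviations are hypothetical, not from the slides.
ASSUMED_SD = {"A": 60, "B": 40, "C": 20}


def middle_95_width(mean, sd):
    """Width of the interval holding the middle 95% of a normal distribution."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.inv_cdf(0.975) - dist.inv_cdf(0.025)


widths = {name: middle_95_width(MEAN_MINUTES, sd) for name, sd in ASSUMED_SD.items()}

for name, width in widths.items():
    print(f"Sample {name}: mean = {MEAN_MINUTES} min, middle 95% spans about {width:.0f} min")
```

THE MEAN IS IDENTICAL IN ALL THREE CASES, SO ONLY THE WIDTH SEPARATES THE SAMPLES.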
RANGE
• THE RANGE TELLS YOU THE SPREAD OF YOUR DATA FROM THE
LOWEST TO THE HIGHEST VALUE IN THE DISTRIBUTION. IT’S
THE EASIEST MEASURE OF VARIABILITY TO CALCULATE.
• TO FIND THE RANGE, SIMPLY SUBTRACT THE LOWEST VALUE
FROM THE HIGHEST VALUE IN THE DATA SET.
RANGE = HIGHEST VALUE – LOWEST VALUE
• THE RANGE IS USED TO REPORT THE
MOVEMENT OF STOCK PRICES OVER
A PERIOD OF TIME, AND WEATHER
REPORTS TYPICALLY STATE THE
HIGHEST AND LOWEST TEMPERATURE
READINGS FOR A 24-HOUR PERIOD.
EXAMPLE
• FIND THE RANGE IN SETS A, B AND C.

SET A: 81, 83, 87, 90, 94    R = 94 - 81 = 13
SET B: 84, 86, 87, 88, 90    R = 90 - 84 = 6
SET C: 85, 86, 87, 88, 89    R = 89 - 85 = 4
BASED ON THE COMPUTED RANGES OF SETS A, B AND C, IT CAN BE CONCLUDED
THAT SET A HAS GREATER VARIABILITY THAN SETS B AND C.
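• THE SAME CALCULATION CAN BE SKETCHED IN PYTHON. THE FUNCTION NAME value_range IS A HYPOTHETICAL HELPER INTRODUCED HERE, NOT PART OF THE SLIDES.

```python
# Data sets A, B and C from the example.
data_sets = {
    "A": [81, 83, 87, 90, 94],
    "B": [84, 86, 87, 88, 90],
    "C": [85, 86, 87, 88, 89],
}


def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


ranges = {name: value_range(values) for name, values in data_sets.items()}

for name, r in ranges.items():
    print(f"Set {name}: range = {r}")

# The set with the largest range shows the greatest variability by this measure.
most_variable = max(ranges, key=ranges.get)
print(f"Set {most_variable} has the largest range.")
```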
MEAN AND RANGE
• SARAH AND CATHY RECORD THEIR TEN BEST TIMES FOR A SWIMMING RACE. THE TIMES (IN
SECONDS) ARE SHOWN BELOW:
SARAH 79 70 68 75 69 64 69 75 64 76
CATHY 78 79 76 81 89 76 64 85 82 78

A.) FIND THE MEAN TIME FOR SARAH AND CATHY.
B.) FIND THE RANGE FOR SARAH AND CATHY.
C.) USE THE MEAN AND RANGE TO COMPLETE THE CONCLUSION BELOW:
THE MEAN SHOWS THAT ON AVERAGE, SARAH WAS ______SECONDS ________ THAN CATHY.
THE RANGE SHOWS THAT ________ WAS MORE CONSISTENT WITH HER TIMES. ________ HAD A
LARGER SPREAD OF TIMES WHICH SHOWS THAT SHE WAS _________ CONSISTENT.
MEAN AND RANGE
• SOLUTION FOR THE TIMES RECORDED BY SARAH AND CATHY:
A.) MEAN: SARAH = 709 / 10 = 70.9 SECONDS; CATHY = 788 / 10 = 78.8 SECONDS.
B.) RANGE: SARAH = 79 - 64 = 15 SECONDS; CATHY = 89 - 64 = 25 SECONDS.
C.) THE MEAN SHOWS THAT ON AVERAGE, SARAH WAS 7.9 SECONDS FASTER THAN CATHY.
THE RANGE SHOWS THAT SARAH WAS MORE CONSISTENT WITH HER TIMES. CATHY HAD A
LARGER SPREAD OF TIMES WHICH SHOWS THAT SHE WAS LESS CONSISTENT.
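• ONE WAY TO CHECK THE MEAN AND RANGE COMPARISON IS A SHORT PYTHON SKETCH. THE HELPER NAME summarize IS HYPOTHETICAL; THE TIMES ARE THE ONES LISTED IN THE PROBLEM.

```python
from statistics import mean

# Ten best times in seconds, as listed in the problem.
sarah = [79, 70, 68, 75, 69, 64, 69, 75, 64, 76]
cathy = [78, 79, 76, 81, 89, 76, 64, 85, 82, 78]


def summarize(times):
    """Return (mean, range) for a list of times."""
    return mean(times), max(times) - min(times)


sarah_mean, sarah_range = summarize(sarah)
cathy_mean, cathy_range = summarize(cathy)

# A lower time is faster, so a positive gap means Sarah was faster on average.
mean_gap = cathy_mean - sarah_mean

print(f"Sarah: mean = {sarah_mean:.1f} s, range = {sarah_range} s")
print(f"Cathy: mean = {cathy_mean:.1f} s, range = {cathy_range} s")
print(f"Sarah was {mean_gap:.1f} s faster on average.")
```

THE SMALLER RANGE BELONGS TO THE MORE CONSISTENT SWIMMER.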
INTERQUARTILE RANGE
• IN DESCRIPTIVE STATISTICS, THE INTERQUARTILE RANGE TELLS
YOU THE SPREAD OF THE MIDDLE HALF OF YOUR DISTRIBUTION.
• QUARTILES SEGMENT ANY DISTRIBUTION THAT’S ORDERED
FROM LOW TO HIGH INTO FOUR EQUAL PARTS. THE
INTERQUARTILE RANGE (IQR) CONTAINS THE SECOND AND THIRD
QUARTILES, OR THE MIDDLE HALF OF YOUR DATA SET.
• WHEREAS THE RANGE GIVES YOU THE SPREAD OF THE
WHOLE DATA SET, THE INTERQUARTILE RANGE GIVES YOU
THE RANGE OF THE MIDDLE HALF OF A DATA SET.
• THE INTERQUARTILE RANGE IS FOUND BY SUBTRACTING THE Q1 VALUE FROM
THE Q3 VALUE:

Formula: IQR = Q3 – Q1
Explanation:
• IQR = interquartile range
• Q3 = 3rd quartile or 75th percentile
• Q1 = 1st quartile or 25th percentile

• Q1 IS THE VALUE BELOW WHICH 25 PERCENT OF THE DISTRIBUTION LIES, WHILE
Q3 IS THE VALUE BELOW WHICH 75 PERCENT OF THE DISTRIBUTION LIES.
• YOU CAN THINK OF Q1 AS THE MEDIAN OF THE FIRST HALF AND Q3 AS THE
MEDIAN OF THE SECOND HALF OF THE DISTRIBUTION.
METHODS FOR FINDING THE
INTERQUARTILE RANGE

• ALTHOUGH THERE’S ONLY ONE FORMULA, THERE ARE SEVERAL
METHODS FOR IDENTIFYING THE QUARTILES.
YOU MAY GET A DIFFERENT VALUE FOR THE INTERQUARTILE
RANGE DEPENDING ON THE METHOD YOU USE.
• HERE, WE’LL DISCUSS TWO OF THE MOST COMMONLY USED
METHODS. THESE METHODS DIFFER BASED ON HOW THEY USE
THE MEDIAN.
EXCLUSIVE METHOD VS INCLUSIVE METHOD

• THE EXCLUSIVE METHOD EXCLUDES THE MEDIAN WHEN IDENTIFYING Q1 AND
Q3, WHILE THE INCLUSIVE METHOD INCLUDES THE MEDIAN IN IDENTIFYING
THE QUARTILES.
• THE PROCEDURE FOR FINDING THE MEDIAN IS DIFFERENT DEPENDING ON
WHETHER YOUR DATA SET IS ODD- OR EVEN-NUMBERED.
• WHEN YOU HAVE AN ODD NUMBER OF DATA POINTS, THE MEDIAN IS THE VALUE
IN THE MIDDLE OF YOUR DATA SET. YOU CAN CHOOSE BETWEEN THE INCLUSIVE
AND EXCLUSIVE METHOD.
• WITH AN EVEN NUMBER OF DATA POINTS, THERE ARE TWO VALUES IN THE
MIDDLE, SO THE MEDIAN IS THEIR MEAN. IT’S MORE COMMON TO USE THE
EXCLUSIVE METHOD IN THIS CASE.
STEPS FOR THE EXCLUSIVE METHOD
• TO SEE HOW THE EXCLUSIVE METHOD WORKS BY HAND, WE’LL USE TWO EXAMPLES:
ONE WITH AN EVEN NUMBER OF DATA POINTS, AND ONE WITH AN ODD NUMBER.
Even-numbered data set
We’ll walk through four steps using a sample data set with 10 values.
Step 1: Order your values from low to high.

Step 2: Locate the median, and then separate the values below it from the values above it.
With an even-numbered data set, the median is the mean of the two values in the middle, so you simply divide your data
set into two halves.
• STEP 3: FIND Q1 AND Q3.
Q1 IS THE MEDIAN OF THE FIRST HALF AND Q3 IS THE MEDIAN OF THE SECOND HALF. SINCE
EACH OF THESE HALVES HAS AN ODD NUMBER OF VALUES, THERE IS ONLY ONE VALUE IN THE
MIDDLE OF EACH HALF.

STEP 4: CALCULATE THE INTERQUARTILE RANGE.


• ODD-NUMBERED DATA SET
• THIS TIME WE’LL USE A DATA SET WITH 11 VALUES.
STEP 1: ORDER YOUR VALUES FROM LOW TO HIGH.

STEP 2: LOCATE THE MEDIAN, AND THEN SEPARATE THE VALUES BELOW IT FROM THE
VALUES ABOVE IT.
IN AN ODD-NUMBERED DATA SET, THE MEDIAN IS THE NUMBER IN THE MIDDLE OF THE LIST.
THE MEDIAN ITSELF IS EXCLUDED FROM BOTH HALVES: ONE HALF CONTAINS ALL VALUES
BELOW THE MEDIAN, AND THE OTHER CONTAINS ALL THE VALUES ABOVE IT.
• STEP 3: FIND Q1 AND Q3.
Q1 IS THE MEDIAN OF THE FIRST HALF AND Q3 IS THE MEDIAN OF THE SECOND HALF.
SINCE EACH OF THESE HALVES HAS AN ODD NUMBER OF VALUES, THERE IS ONLY ONE VALUE IN
THE MIDDLE OF EACH HALF.

STEP 4: CALCULATE THE INTERQUARTILE RANGE.
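
• THE FOUR STEPS OF THE EXCLUSIVE METHOD CAN BE SKETCHED IN PYTHON. THE FUNCTION NAMES AND BOTH DATA SETS BELOW ARE HYPOTHETICAL, CHOSEN ONLY TO SHOW ONE EVEN-NUMBERED AND ONE ODD-NUMBERED CASE; THEY ARE NOT THE DATA FROM THE SLIDES.

```python
from statistics import median


def exclusive_quartiles(values):
    """Return (Q1, Q3) using the exclusive median-of-halves method."""
    data = sorted(values)                 # Step 1: order from low to high
    half = len(data) // 2
    lower = data[:half]                   # Step 2: values below the median
    if len(data) % 2 == 0:
        upper = data[half:]               # even count: split into two halves
    else:
        upper = data[half + 1:]           # odd count: leave the median out
    return median(lower), median(upper)   # Step 3: median of each half


def exclusive_iqr(values):
    q1, q3 = exclusive_quartiles(values)
    return q3 - q1                        # Step 4: IQR = Q3 - Q1


# ASSUMPTION: hypothetical data sets with 10 and 11 values.
even_example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
odd_example = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]

print(exclusive_quartiles(even_example), exclusive_iqr(even_example))
print(exclusive_quartiles(odd_example), exclusive_iqr(odd_example))
```

THE SKETCH EXPECTS AT LEAST TWO VALUES (FOUR IF THE COUNT IS ODD); WITH FEWER, A HALF IS EMPTY AND median RAISES AN ERROR.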


STEPS FOR THE INCLUSIVE METHOD
• ALMOST ALL OF THE STEPS FOR THE INCLUSIVE AND EXCLUSIVE METHODS ARE
IDENTICAL. THE DIFFERENCE IS IN HOW THE DATA SET IS SEPARATED INTO TWO
HALVES.
• THE INCLUSIVE METHOD IS SOMETIMES PREFERRED FOR ODD-NUMBERED DATA
SETS BECAUSE IT DOESN’T IGNORE THE MEDIAN, A REAL VALUE IN THIS TYPE OF
DATA SET.
• STEP 1: ORDER YOUR VALUES FROM LOW TO HIGH.

• STEP 2: FIND THE MEDIAN.


THE MEDIAN IS THE NUMBER IN THE MIDDLE OF THE DATA SET.
SEPARATE THE LIST INTO TWO HALVES, AND INCLUDE THE MEDIAN IN BOTH HALVES.
THE MEDIAN IS INCLUDED AS THE HIGHEST VALUE IN THE FIRST HALF AND THE LOWEST
VALUE IN THE SECOND HALF.

STEP 3: FIND Q1 AND Q3.


Q1 IS THE MEDIAN OF THE FIRST HALF AND Q3 IS THE MEDIAN OF THE SECOND HALF.
SINCE THE TWO HALVES EACH CONTAIN AN EVEN NUMBER OF VALUES, Q1 AND Q3 ARE
CALCULATED AS THE MEANS OF THE MIDDLE VALUES.
• STEP 4: CALCULATE THE INTERQUARTILE RANGE.

WE CAN SEE FROM THESE EXAMPLES THAT USING THE INCLUSIVE
METHOD GIVES US A SMALLER IQR. WITH THE SAME DATA SET,
THE EXCLUSIVE IQR IS 24, AND THE INCLUSIVE IQR IS 20.
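• A MINIMAL SKETCH COMPARING THE TWO METHODS ON ONE ODD-NUMBERED DATA SET. THE DATA AND FUNCTION NAMES ARE HYPOTHETICAL, SO THE RESULTS DIFFER FROM THE 24 AND 20 QUOTED ABOVE. TREATING AN EVEN-NUMBERED DATA SET THE SAME WAY UNDER BOTH METHODS IS AN ASSUMPTION OF THIS SKETCH.

```python
from statistics import median


def quartiles(values, inclusive):
    """Return (Q1, Q3) by taking the median of each half of the sorted data."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 0:
        # ASSUMPTION: with an even count the median is not a data point,
        # so both methods split the data into the same two halves.
        lower, upper = data[:half], data[half:]
    elif inclusive:
        # The median is the highest value of the first half
        # and the lowest value of the second half.
        lower, upper = data[:half + 1], data[half:]
    else:
        # The median is left out of both halves.
        lower, upper = data[:half], data[half + 1:]
    return median(lower), median(upper)


def iqr(values, inclusive):
    q1, q3 = quartiles(values, inclusive)
    return q3 - q1


# ASSUMPTION: hypothetical data set with 11 values.
hypothetical = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21]

exclusive_result = iqr(hypothetical, inclusive=False)
inclusive_result = iqr(hypothetical, inclusive=True)

print(f"Exclusive IQR: {exclusive_result}")
print(f"Inclusive IQR: {inclusive_result}")
```

FOR THIS DATA SET THE INCLUSIVE IQR IS AGAIN THE SMALLER OF THE TWO.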
WHEN IS THE INTERQUARTILE RANGE
USEFUL?
• THE INTERQUARTILE RANGE IS AN ESPECIALLY USEFUL MEASURE OF
VARIABILITY FOR SKEWED DISTRIBUTIONS.
• FOR THESE FREQUENCY DISTRIBUTIONS, THE MEDIAN IS THE BEST
MEASURE OF CENTRAL TENDENCY BECAUSE IT’S THE VALUE EXACTLY IN
THE MIDDLE WHEN ALL VALUES ARE ORDERED FROM LOW TO HIGH.
• ALONG WITH THE MEDIAN, THE IQR CAN GIVE YOU AN OVERVIEW OF
WHERE MOST OF YOUR VALUES LIE AND HOW CLUSTERED THEY ARE.
• THE IQR IS ALSO USEFUL FOR DATASETS WITH OUTLIERS. BECAUSE IT’S
BASED ON THE MIDDLE HALF OF THE DISTRIBUTION, IT’S LESS
INFLUENCED BY EXTREME VALUES.
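• A SHORT SKETCH OF WHY THE IQR RESISTS OUTLIERS: REPLACING ONE VALUE WITH AN EXTREME ONE CHANGES THE RANGE BUT NOT THE IQR. BOTH DATA SETS ARE HYPOTHETICAL, AND THE EXCLUSIVE METHOD IS USED HERE AS ONE POSSIBLE CHOICE.

```python
from statistics import median


def exclusive_iqr(values):
    """IQR using the exclusive median-of-halves method."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    upper = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return median(upper) - median(lower)


def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)


# ASSUMPTION: hypothetical data; the second list swaps 22 for the outlier 220.
typical = [10, 12, 13, 15, 16, 18, 19, 21, 22]
with_outlier = [10, 12, 13, 15, 16, 18, 19, 21, 220]

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    print(f"{label}: range = {value_range(data)}, IQR = {exclusive_iqr(data)}")
```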
EXERCISE
• DETERMINE THE INTERQUARTILE RANGE OF THE DATA BELOW USING THE
EXCLUSIVE AND INCLUSIVE METHODS: NUMBER OF TELEVISION SETS SOLD
MONTHLY BY 11 APPLIANCE STORES.

6, 8, 10, 10, 12, 14, 15, 22, 23, 24, 24


QUIZ

• DETERMINE THE IQR OF THE DATA USING THE EXCLUSIVE AND INCLUSIVE METHODS:
SCORES OBTAINED IN AN APTITUDE TEST BY 13 APPLICANTS FOR A CLERICAL
POSITION IN A LARGE COMPANY.

39 32 43 25 58 65 93 85 75 62 37 86 98
