Chapter One - 231003 - 141351
Data Analysis-
Descriptive statistics
Pr. Abdesselam Megnounif Course: ITS1.3- Statistics Calculation and Probability Academic Year 2023-2024 1
Descriptive statistics: Understanding the basics
1.1 Types of Data in Descriptive Statistics
1.1. Type of dataset
Data type:
• Categorical or qualitative data: nominal data, ordinal data
• Numerical or quantitative data: discrete data, continuous data
1.1.1 Types of Categorical Data
b. Ordinal Data: have categories in which only the ordering counts. The size of the difference between consecutive categories is not meaningful. Examples:
• Military ranks (Private; Corporal; Sergeant; Lieutenant; Captain; Colonel),
• Socio-economic status (poor, middle class, rich).
d. Continuous Data: The data are said to be continuous if the measurements can take any value, usually within some range. Examples:
• height,
• weight.
1.2. Frequency distribution Table
1.2.1 Frequency Distribution Table
pH class      Frequency
[6.0, 6.5[    5
[6.5, 7.0[    12
[7.0, 7.5[    18
[7.5, 8.0[    25
[8.0, 8.5[    20
[8.5, 9.0[    8
[9.0, 9.5[    2
Total         90

Date (Sept)  1    2    3    4    5    6    7    8    9    10   11   12   13   14   15   16   17   18   19   20
DOC (mg/L)   7.8  7.2  6.5  7.1  8.2  7.3  6.9  7.8  7.6  6.8  6.9  7.5  7.0  6.7  7.4  8.0  6.8  7.2  7.3  7.6
Example 1.3:
To understand the distribution of electricity consumption, the electricity
usage of a sample consisting of 1000 households is analyzed. The data is
presented in the following table:
Example 1.4: Frequency distribution of energy consumption in the United States across different sectors:

Sector            Frequency
Residential       21%
Commercial        18%
Transportation    28%
Industrial        22%
Electric Power    11%

Example 1.5: A survey of 2,500 families focuses on the number of children per family in a city.

Number of Kids    Frequency
0                 8%
1                 12%
2                 25%
3                 30%
4                 15%
5                 6%
6                 1.5%
7                 2%
8                 0
9                 0.5%

Example 1.6: Water quality of 50 water samples

Water Clarity        Frequency
Clear                12
Slightly Cloudy      20
Moderately Cloudy    10
Turbid               8

Example 1.7: Energy sources used in 200 households in a particular city

Energy Source       Frequency
Electricity         100
Natural Gas         60
Propane             20
Heating Oil         10
Renewable Energy    10
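Examples 1.6 and 1.7 report counts, while Examples 1.4 and 1.5 report percentages. The two forms are linked by dividing each count by the total. A minimal Python sketch for the water clarity counts of Example 1.6 follows; the names `clarity_counts` and `relative_frequencies` are illustrative and not part of the course material.

```python
# Counts taken from Example 1.6 (50 water samples).
clarity_counts = {
    "Clear": 12,
    "Slightly Cloudy": 20,
    "Moderately Cloudy": 10,
    "Turbid": 8,
}


def relative_frequencies(counts):
    """Return each category's share of the total, in percent."""
    total = sum(counts.values())
    return {category: 100 * count / total for category, count in counts.items()}


clarity_percent = relative_frequencies(clarity_counts)

# Print the frequency table with counts and percentages side by side.
for category, percent in clarity_percent.items():
    print(f"{category:<18} {clarity_counts[category]:>3} {percent:>6.1f}%")
```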
1.2.2. How to construct a frequency table for a series of observations?
Example 1.8: Measurements of the annual flow of the River Nile at Aswan, in 10^8 m^3 (1871-1970).

Yr    Q     Yr    Q     Yr    Q     Yr    Q     Yr    Q
1871  1120  1891  1100  1911  831   1931  781   1951  744
1872  1160  1892  1210  1912  726   1932  865   1952  749
1873  963   1893  1150  1913  456   1933  845   1953  838
1874  1210  1894  1250  1914  824   1934  944   1954  1050
1875  1160  1895  1260  1915  702   1935  984   1955  918
1876  1160  1896  1220  1916  1120  1936  897   1956  986
1877  813   1897  1030  1917  1100  1937  822   1957  797
1878  1230  1898  1100  1918  832   1938  1010  1958  923
1879  1370  1899  774   1919  764   1939  771   1959  975
1880  1140  1900  840   1920  821   1940  676   1960  815
1881  995   1901  874   1921  768   1941  649   1961  1020
1882  935   1902  694   1922  845   1942  846   1962  906
1883  1110  1903  940   1923  864   1943  812   1963  901
1884  994   1904  833   1924  862   1944  742   1964  1170
1885  1020  1905  701   1925  698   1945  801   1965  912
1886  960   1906  916   1926  845   1946  1040  1966  746
1887  1180  1907  692   1927  744   1947  860   1967  919
1888  799   1908  1020  1928  796   1948  874   1968  718
1889  958   1909  1050  1929  1040  1949  848   1969  714
1890  1140  1910  969   1930  759   1950  890   1970  740

Example 1.9: How to construct the frequency distribution of dissolved oxygen concentrations in a pond ecosystem, based on the following chronological measurements:
7.8, 7.2, 6.5, 7.1, 8.2, 7.3, 6.9, 7.8, 7.6, 6.8, 6.9, 7.5, 7.0, 6.7, 7.4, 8.0, 6.8, 7.2, 7.3, 7.6.
The following table shows the Nile River flow measurements sorted in ascending order.
1.2.2.1 Guideline for the construction of frequency table
3. Choose the number of intervals. Intervals should be non-overlapping and of equal length. (The objective is to use enough classes to display the data's variation, while avoiding numerous classes that each contain too few data points.)
5. The first interval should begin a little below the minimum value, and the last interval should end a little above the maximum value.
6. The intervals are called class intervals and the boundaries are called class boundaries.
1.2.2.2 Some rules commonly used to determine the number of classes
• Freedman's Rule: Bin Width = 2 * IQR * n^(-1/3); Number of Classes = (max - min) / Bin Width
Where:
• K: Number of classes.
• n: The total number of data points in the dataset.
• σ: Standard deviation of the dataset.
• Bin Width: The width of each class (bin).
• IQR (Interquartile Range): The range between the 75th percentile and the 25th percentile of the dataset.
• max: The maximum value in the dataset.
• min: The minimum value in the dataset.
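The rules can be compared numerically. The sketch below, in Python, evaluates Sturges' rule (K = 1 + log2(n), the rule applied in section 1.2.2.3) and Freedman's rule on the 50 student weights of that worked example. The function names are illustrative, and computing the quartiles with `statistics.quantiles(..., method="inclusive")` is an assumption: other quartile conventions give slightly different IQR values.

```python
import math
import statistics

# Weights (kg) of the 50 students from the worked example in section 1.2.2.3.
weights = [
    85, 67, 85, 107, 59, 76, 82, 79, 95, 74,
    59, 65, 78, 114, 63, 56, 115, 96, 99, 80,
    95, 55, 75, 65, 63, 66, 105, 65, 66, 54,
    57, 87, 110, 50, 58, 89, 64, 61, 64, 77,
    74, 84, 105, 82, 68, 85, 66, 68, 73, 105,
]


def sturges_classes(n):
    """Sturges' rule: K = 1 + log2(n)."""
    return 1 + math.log2(n)


def freedman_classes(data):
    """Freedman's rule: bin width = 2 * IQR * n^(-1/3).

    Returns (number of classes, bin width), both unrounded.
    """
    n = len(data)
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    bin_width = 2 * (q3 - q1) * n ** (-1 / 3)
    return (max(data) - min(data)) / bin_width, bin_width


k_sturges = sturges_classes(len(weights))
k_freedman, width_freedman = freedman_classes(weights)

print(f"Sturges:  K = {k_sturges:.2f}, rounded up to {math.ceil(k_sturges)}")
print(f"Freedman: bin width = {width_freedman:.2f}, K = {k_freedman:.2f}")
```

Both results are real numbers, so the analyst still has to round them to a whole number of classes.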
1.2.2.3 Creating a frequency table for the data in Example 1.8
Example 1.8: Weight of 50 students in kilograms.

85  67  85   107  59  76  82   79  95  74
59  65  78   114  63  56  115  96  99  80
95  55  75   65   63  66  105  65  66  54
57  87  110  50   58  89  64   61  64  77
74  84  105  82   68  85  66   68  73  105

1st: We sort the dataset in ascending order.

Value  Rank    Value  Rank    Value  Rank
50     1       66     18      85     35
54     2       66     19      85     36
55     3       67     20      85     37
56     4       68     21      87     38
57     5       68     22      89     39
58     6       73     23      95     40
59     7       74     24      95     41
59     8       74     25      96     42
61     9       75     26      99     43
63     10      76     27      105    44
63     11      77     28      105    45
64     12      78     29      105    46
64     13      79     30      107    47
65     14      80     31      110    48
65     15      82     32      114    49
65     16      82     33      115    50
66     17      84     34

2nd: We determine the number of classes (here we choose to apply Sturges' rule):
$K = 1 + \log_2 50 = 1 + \frac{\ln 50}{\ln 2} \approx 6.6$
We choose to take the number of classes as $k = 7$.

3rd: We calculate the range of the dataset:
$\text{Range} = \text{Max} - \text{Min} = 115 - 50 = 65$

4th: We approximate the width of the classes:
$\frac{\text{Range}}{k} = \frac{65}{7} \approx 9.29$
We estimate the width of the classes at $L = 9.3$.
5th: We identify the class boundaries $a_0, a_1, \dots, a_6, a_7$:
$a_0 = 50$; $a_1 = 50 + 9.3 = 59.3$; $a_2 = 59.3 + 9.3 = 68.6$; ...
The last boundary will be $a_7 = 115.1$.

6th: We complete the frequency table:

Class (kg)        Frequency
[50, 59.3[        8
[59.3, 68.6[      14
[68.6, 77.9[      6
[77.9, 87.2[      10
[87.2, 96.5[      4
[96.5, 105.8[     4
[105.8, 115.1[    4
Total             50

As commonly agreed upon in statistical studies, the lower bound of a class interval is included, while the upper bound is excluded.
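The six steps above can be sketched in Python as follows. This is a minimal illustration, not the course's reference implementation: the function names are hypothetical, the classes start at the minimum value as in the worked example, and rounding the width up to one decimal (65 / 7 to 9.3) is an assumption chosen so that the last boundary lies above the maximum.

```python
import math

# Weights (kg) of the 50 students from Example 1.8.
weights = [
    85, 67, 85, 107, 59, 76, 82, 79, 95, 74,
    59, 65, 78, 114, 63, 56, 115, 96, 99, 80,
    95, 55, 75, 65, 63, 66, 105, 65, 66, 54,
    57, 87, 110, 50, 58, 89, 64, 61, 64, 77,
    74, 84, 105, 82, 68, 85, 66, 68, 73, 105,
]


def class_boundaries(data, k):
    """Return k + 1 equally spaced boundaries starting at the minimum."""
    low = min(data)
    # Assumption: round the width up to one decimal (65 / 7 -> 9.3).
    width = math.ceil((max(data) - low) / k * 10) / 10
    return [round(low + i * width, 1) for i in range(k + 1)]


def class_frequencies(data, boundaries):
    """Count the values in each class [a_i, a_(i+1)[ (upper bound excluded)."""
    counts = [0] * (len(boundaries) - 1)
    for x in data:
        for i in range(len(counts)):
            if boundaries[i] <= x < boundaries[i + 1]:
                counts[i] += 1
                break
    return counts


# Number of classes from Sturges' rule, rounded up: 1 + log2(50) -> 7.
k = math.ceil(1 + math.log2(len(weights)))
boundaries = class_boundaries(weights, k)
frequencies = class_frequencies(weights, boundaries)

for i, count in enumerate(frequencies):
    print(f"[{boundaries[i]}, {boundaries[i + 1]}[  {count}")
```

The resulting frequencies can be checked against the Frequency row of the tables in section 1.3.1.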
1.3. Cumulative frequencies
1.3.1. Increasing & Decreasing Cumulative Frequencies
Cumulative frequencies are associated with the class boundaries.
The increasing cumulative frequency corresponding to the class boundary $a_i$ is the number of measurements strictly less than that boundary.
Conversely, the decreasing cumulative frequency corresponding to the class boundary $a_i$ is the number of measurements greater than or equal to that boundary.

Weight (kg)  50    59.3  68.6  77.9  87.2  96.5  105.8  115.1
Frequency       8     14    6     10    4     4      4
ICF          0     8     22    28    38    42    46     50
DCF          50    42    28    22    12    8     4      0

1.3.2. Relative Increasing & Decreasing Cumulative Frequencies (frequencies are in percentages %)

Weight (kg)  50    59.3  68.6  77.9  87.2  96.5  105.8  115.1
Frequency       8     14    6     10    4     4      4
RICF         0%    16%   44%   56%   76%   84%   92%    100%
RDCF         100%  84%   56%   44%   24%   16%   8%     0%
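The four rows ICF, DCF, RICF and RDCF follow mechanically from the class frequencies. A short Python sketch of that computation, using running sums over the frequencies of the weight example (variable names are illustrative):

```python
from itertools import accumulate

# Class boundaries (kg) and class frequencies from the weight example.
boundaries = [50.0, 59.3, 68.6, 77.9, 87.2, 96.5, 105.8, 115.1]
frequencies = [8, 14, 6, 10, 4, 4, 4]

# ICF at boundary a_i: number of measurements strictly less than a_i.
icf = [0] + list(accumulate(frequencies))
total = icf[-1]

# DCF at boundary a_i: number of measurements greater than or equal to a_i.
dcf = [total - count for count in icf]

# Relative versions, expressed in percent of the total.
ricf = [100 * count / total for count in icf]
rdcf = [100 * count / total for count in dcf]

for a, i, d, ri, rd in zip(boundaries, icf, dcf, ricf, rdcf):
    print(f"{a:>6}  ICF={i:>2}  DCF={d:>2}  RICF={ri:>5.1f}%  RDCF={rd:>5.1f}%")
```

At every boundary ICF + DCF equals the total number of measurements, which is a quick consistency check on a hand-built table.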
1.3. GRAPHICAL DISPLAY
1.3.1 Time series plot or chronological plot
1.3.2 Time series plot: anomalies display
[Figure: time series plot of anomalies; x-axis: years, 1860 to 1980; y-axis: -600 to 400]
1.3.3 Bar plot or Bar chart
Example 1.11: Two series of data represent the numbers of boys and girls that have a smartphone at the Secondary School from 2012 to 2019. The blue bar represents the number of boys, and the pink bar represents the number of girls.

Year     Number of boys    Number of girls
...
2016     305               280
2017     310               315
2018     315               305
2019     315               320
Total    2065              2000

[Figure: A graph representing the numbers of girls with smartphones; x-axis: 2012 to 2019; y-axis: frequency]
[Figure: A graph representing the numbers of boys with smartphones; x-axis: 2012 to 2019; y-axis: frequency]
[Figure: "Number of girls", bar charts by year using frequencies (0 to 350) and relative frequencies (0% to 18%)]
[Figure: "Number of boys", bar charts by year using frequencies (0 to 350) and relative frequencies (0% to 18%)]
1.3.3.1 Generating a bar plot that displays several variables
A graph representing the numbers of boys and girls with smartphones in both
series.
[Figure: grouped bar chart with the series "Number of boys" and "Number of girls"; x-axis: 2012 to 2019; y-axis: relative frequency, 0% to 18%]
1.4 Histogram
Example 1.9: The data refer to a certain type of chemical impurity measured in parts per million in 25 drinking-water samples randomly collected from different areas of a county.
[Figure, two panels: "Histogram of Data Impurity Using Frequencies" and "Histogram of Data Impurity Using Relative Frequencies"; class intervals on the x-axis: 10.8-15.7, 15.7-20.6, 20.6-25.5, 25.5-30.4, 30.4-35.3]
1.5 Cumulative Function
1.5.1 CUMULATIVE FREQUENCY POLYGON
Creating a cumulative frequency polygon for the data in example 1.8

Weight (kg)  50    59.3  68.6  77.9  87.2  96.5  105.8  115.1
Frequency       8     14    6     10    4     4      4
ICF          0     8     22    28    38    42    46     50
RICF         0%    16%   44%   56%   76%   84%   92%    100%

[Figure, two panels: cumulative frequency polygon using frequencies and using relative frequencies; x-axis: weight in kg, 40 to 120]
Weight (kg)  50    59.3  68.6  77.9  87.2  96.5  105.8  115.1
Freq.           8     14    6     10    4     4      4
ICF          0     8     22    28    38    42    46     50
DCF          50    42    28    22    12    8     4      0
RICF         0%    16%   44%   56%   76%   84%   92%    100%
RDCF         100%  84%   56%   44%   24%   16%   8%     0%

[Figure, two panels: "Increasing & Decreasing Cumulative Frequency Polygon" (y-axis: frequency, 0 to 50) and "Increasing & Decreasing Relative Cumulative Frequency Polygon" (y-axis: 0% to 100%); x-axis: weight in kg, 40 to 120]
1.5.2 CUMULATIVE FUNCTION
The cumulative function is an extension (red line) of the cumulative relative frequency polygon, starting at 0% for the lowest boundary and reaching 100% for the highest boundary.

[Figure: "Cumulative Function"; x-axis: weight in kg, 40 to 130; y-axis: relative cumulative frequency, 0% to 100%]
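Because the polygon joins the points (boundary, RICF) with straight segments, the cumulative function can be evaluated at any weight by linear interpolation between the two neighbouring boundaries. The Python sketch below illustrates this under two assumptions that extend the description above: the function is held at 0 below the lowest boundary and at 1 above the highest boundary. The name `cumulative_function` is hypothetical.

```python
from bisect import bisect_right

# Class boundaries (kg) and relative increasing cumulative frequencies (as fractions).
boundaries = [50.0, 59.3, 68.6, 77.9, 87.2, 96.5, 105.8, 115.1]
ricf = [0.00, 0.16, 0.44, 0.56, 0.76, 0.84, 0.92, 1.00]


def cumulative_function(x):
    """Estimated fraction of measurements below x, by linear interpolation."""
    # Assumption: 0 below the lowest boundary, 1 above the highest one.
    if x <= boundaries[0]:
        return 0.0
    if x >= boundaries[-1]:
        return 1.0
    # Index of the class [a_i, a_(i+1)[ that contains x.
    i = bisect_right(boundaries, x) - 1
    x0, x1 = boundaries[i], boundaries[i + 1]
    y0, y1 = ricf[i], ricf[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


for weight in (45, 59.3, 80, 120):
    print(f"F({weight}) = {cumulative_function(weight):.3f}")
```

Interpolation treats the measurements as evenly spread inside each class, so the values between boundaries are estimates rather than exact counts.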
EXERCISES
Test: Complete each variable with the correct data type.
Variable: Type
• Room Temperature: continuous
• Time
• The total number of players who participated in a competition
• Opinion on something (agree, disagree, or neutral)
• Nationality (Indian, German, American)
• "Time-taken" to finish the work
• Education Level (Higher, Secondary, Primary)
• Speed of a vehicle
• Economic Status (High, Medium, and Low)
• Market share price
• Favorite holiday destination
• Numbers of employees in a company
• Scores and Marks (Ex: 59, 80, 60, etc.)
• Marital status (Single, Widowed, Married)
• What language do you speak
• Colour of hair (Blonde, Red, Brown, Black, etc.)
• Gender (Male, Female)
• Wi-Fi Frequency
• Ranking of people in a competition (First, Second, Third, etc.)
• Letter grades in the exam (A, B, C, D, etc.)
• Cost of a cell phone
• Eye Color (Black, Brown, etc.)
• Total numbers of students present in a class
• Weight of object
• Height of a person
Exercises Chapter 1
Exercise 1.1
a) Create a frequency table for the data in example 1.2. Include two rows for increasing and decreasing cumulative frequencies.
b) Generate a frequency histogram.
c) Plot the polygon of increasing cumulative frequencies and the polygon of decreasing cumulative frequencies.
d) Deduce the coordinates of the intersection point of the two polygons and provide a commentary on the graph.
Exercise 1.2 For the Nile River Flow dataset, create a frequency table and draw the histogram
Exercise 1.3. Plot the chronological evolution of the DOC from example 1.2.
Exercise 1.6. The following data refer to a certain type of chemical impurity measured in parts per million in 25 drinking-water samples randomly collected from different areas of a county:
11 19 24 12 20 29 15 31 21
24 31 16 23 26 26 32 25 17
22 26 35 18 24 18 27
What type of data is it?
Make a frequency table displaying class intervals, frequency, relative frequency, and percentages.
Exercise 1.7 For the data in Example 1.8 (weight of 50 students in kilograms):
1. Draw the cumulative function.
2. Deduce the median and the interquartile range.
3. Determine the relative frequency of the students having a weight below 50 kg.
4. Determine the percentage of the students having a weight greater than 50 kg.
5. Determine the percentage of the students having a weight equal to 50 kg.
6. Determine the percentage of the students having a weight between 50 and 75 kg.
7. Determine the percentage of the students having a weight greater than 100 kg.