MMW Chapter 4

Mathematics as a Tool
Chapter 4. Data Management

Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets. It is an administrative process by which the required data are acquired, validated, stored, protected and processed, and by which their accessibility, reliability and timeliness are ensured to satisfy the needs of the data users. This process is essentially what statistics is all about.

Statistics is a branch of mathematics dealing with the collection, organization, presentation, analysis and interpretation of data. Statistical treatment of data is essential in order to make use of the data in the right form. Raw data collection is only one aspect of any experiment; the organization of data is equally important so that appropriate conclusions can be drawn. This is what statistical treatment of data is all about. There are many techniques in statistics that treat data in the required manner. Statistical treatment of data is essential in all experiments, whether social, scientific or of any other form, and it greatly depends on the kind of experiment and the desired result from the experiment (Kalla, 2009).

Objectives
After the students have gone through Chapter IV, Data Management, they should be able to:
1. Use a variety of statistical tools to process and manage numerical data;
2. Use the methods of linear regression and correlation to predict the value of a variable given certain conditions; and
3. Advocate the use of statistical data in making important decisions.

A. Gathering, Organizing, Representing and Interpreting Data
Any investigation which requires data should always be based on accurate data and proper data management. Correct methods of collecting data and the right way of organizing and presenting them will result in a precise analysis and interpretation. When the study is planned, the data should be carefully [...] for processing.
[...] data processing should be planned before any data are collected. All forms used for recording data should be designed and tested to ensure that the data can easily be [...]. This general principle applies to all [...], including laboratory assays (see Chapter [...]: Describing and presenting data).

1. Data Gathering
There are different methods used in gathering or collecting data. These are:
• Direct or interview method is a person-to-person encounter between the source of information, the interviewee, and the one who gathers the information, the interviewer. The interview can be done personally, through phone or through internet access.
• Indirect or questionnaire method is the technique in which a questionnaire is used to collect the information or data needed.
• Registration method obtains data from the records of government agencies authorized by law to keep such information and to make them available to researchers.
• Observation method is the technique in which data, particularly those pertaining to the behavior of individuals or groups of individuals during a given situation, are best obtained through observation.
• Experimental method is a system used to gather data from the results of a series of experiments performed on some controlled and experimental variables. This is commonly used in scientific inquiries.

2. Data Organization and Presentation
Data collected or obtained in whatever manner are called raw data. These can be classified according to the scale of measurement used. There are four levels of measurement, from lowest to highest: the nominal, ordinal, interval and ratio scales.
• Nominal scale assigns names or labels to observations in a purely arbitrary sequence. The labels are used to classify the respondents or objects without ordering.
For instance, if we need to classify the respondents' preferences on cellphone brands, such as Apple, Samsung, LG, OnePlus, Vivo, Oppo, etc., we measure on the nominal scale and the data gathered are nominal data.
• Ordinal scale assigns numbers or labels to observations with implied ordering. Ranking the respondents' preferences means measuring responses on the ordinal scale, and the data obtained are called ordinal data.
• Interval scale assigns real numbers to observations to reflect the distance between the rank positions of the respondents or objects in equal units. This scale gives the distance between any two numbers of known size and has a unit of measurement, but its zero point is arbitrary rather than absolute. The data collected can be manipulated algebraically by addition or subtraction but not by division or multiplication. Examples are temperature in degrees Celsius, IQ scores, etc.
• Ratio scale assigns numbers to observations to reflect the existence of a true absolute zero point as its origin. The ratio of two scale points is independent of the unit of measurement. The data collected have all the properties of interval data and can also be manipulated algebraically by multiplication and division. Examples are data on expenses, distance, weight, birth rate, unemployment rate, etc.
Take note that data sets obtained using the interval and ratio scales are called measured data.

An effective presentation of data is necessary in any investigation. Data which have been collected and organized well but not presented properly and clearly would be of little use. Data presentation should give a clear picture of the various relationships among the different data presented. Proper presentation of the gathered data is also important in deciding what tool or tools of analysis should be employed to come up with an intelligent judgment.

There are different ways or forms to present data. These are: textual, tabular and graphical.
• Textual form makes use of words, sentences and paragraphs in presentation.
It is commonly used when there are only a few numerical data to be enumerated or to be compared with other data.
• Tabular form is a systematic presentation of data in rows and columns. It is used when related numerical facts need to be classified in arrays.
• Graphical form shows numerical values or relationships in a pictorial form. It makes use of graphs, symbols or visual aids.

Tabular Presentation
A table should be simple. It should focus the reader's attention on the information rather than on the form, and it should make the meaning and significance of the information being presented clear.
Statistical tables should have the following parts:
• Heading, which shows the table number, title and head note. The title is a brief statement of the nature, classification and time reference of the information presented, and of the area to which the statistics refer. The head note is a statement enclosed in brackets between the title and the top rule of the table that provides additional information.
• Box head is the portion that contains the column heads, which describe the data in each column.
• Stub is the first column on the left of the table, which describes the data on the given row.
• Source note is a statement inserted at the bottom of the table that indicates the source of the data, included to acknowledge the origin of the data.

Example:
Table 1. Relationship Between Academic Performance and the Identified Variables
Variables | Correlation Coefficient | Significance | Remarks
GPA | 0.746… | 0.000 | HS
[illegible] | 0.4015 | 0.000 | S
[illegible] | 0.9892 | 0.000 | S
Gender | 0.1452 | 0.084 | NS
S - Significant; HS - Highly Significant; NS - Not Significant

Graphical Presentation
A good chart should possess the following properties:
• Accurate - the dimensional aspects should reflect the highest degree of accuracy possible within the practical limits imposed by the expert draftsman or the electronic computer being used.
It should not be deceptive, distorted or misleading, or in any way susceptible to wrong interpretation as a result of inaccurate or careless construction.
• Simple - the basic design should be simple and straightforward, and not loaded with irrelevant or trivial symbols and ornamentation.
• Clear - it should be easily read and understood. There should be a forceful and unmistakable focus on the message that the graph is trying to communicate, and a truthful and unambiguous representation of the facts, so that the message it conveys is meaningful.
• Attractive - it is designed and constructed to attract and hold attention by having a neat, dignified and professional appearance. It should be stylish.

Different types of graphs can be used in data presentation, depending on the purpose. These are:
• Line graph is used when (1) data cover a long period of time; (2) several series are compared; (3) movements are to be emphasized; (4) trends are to be established; and (5) estimates are to be forecasted.
Example: [Line graph; vertical axis: Number of Graduates; horizontal axis: Year]
Figure 1. BS Applied Statistics Graduates (2012-2017)
• Bar graph is used when numerical values of an item over a period of time are compared. It consists of rectangular bars where the height of each bar represents the quantity or frequency for each category.
Example: [Bar graph; axes: Year (2012-2017) and Number of Graduates]
Figure 2. BS Applied Statistics Graduates (2012-2017)
• Pie graph is used to show percentages or the composition of a whole by its parts.
Example: [Pie graph; legible slice labels include "2012, 6" and "2013, 4"]
Figure 3. BS Applied Statistics Graduates (2012-2017)
• Pictograph or pictogram is used to immediately suggest the nature of the data.
Example: [Pictograph] Growth Pattern of Philippine Population: 1960-2010

Organizing collected numerical data can be done in two ways:
1. Array is an arrangement of the numerical data or values according to order of magnitude, either ascending or descending.
2. Frequency distribution table (FDT) is a condensed version of an array. It categorizes the numerical data into intervals or classes. It has the following parts:
• Classes are mutually exclusive categories, each defined by a lower limit and an upper limit, with equal intervals.
• Class frequency is the number of observations in each class.
• Class mark or class midpoint is the midpoint of a class, (lower limit + upper limit)/2. It is used in computing the mean and some measures of variability.
• Cumulative frequency is the running total of the frequencies up to (less than) or from (greater than) a particular class of interest.
• Relative frequency is the percentage of observations in a particular class of interest, (class frequency / total number of observations) × 100%.

Steps in Constructing a Frequency Distribution with Equal Class Size
1. Determine the range R of the numerical data: $R = \text{Highest value} - \text{Lowest value}$.
2. Determine the number of classes K into which the data are to be grouped using Sturges' Approximation: $K = 1 + 3.322 \log N$, where N = total number of values to be grouped and log is the common (base-10) logarithm.
3. Determine the class size C: $C = R/K$.
4. Determine the lower limit of the first class. Note: There is no fixed rule in determining the lower limit of the first class. For the purpose of uniform results, the lowest value in the data set should be the lower limit of the first class.
5. Construct the class intervals and determine the class frequencies.
Remarks:
1. Sturges' Approximation is just a guide and a flexible rule.
2. The number of classes should be large enough to demonstrate the major characteristics of the data, yet not so large as to result in losing the advantage of summarizing the data. For instance, if in the constructed FDT the highest observed value fails to be included in the last class, the number of classes should not be increased just to accommodate the highest value; increase the class size instead.
3. The number of classes is usually taken between [...] and 20, depending on the nature of the data, without using Sturges' Approximation.
4. Class intervals are chosen so that the class marks coincide with actually observed data. However, class boundaries should not coincide with actually observed data.

Example: Raw scores of 50 students in a 200-item test:
144 112 156 122 168 172 141 159 156 145
134 137 123 149 144 160 136 142 138 159
151 147 150 126 152 147 135 132 146 133
150 122 139 149 152 131 155 116 140 145
135 160 125 172 154 139 136 129 163 [...]

1. The range: R = 172 - 112 = 60
2. K = 1 + 3.322 log 50 = 6.643978 ≈ 7
3. C = 60/7 = 8.571428571 ≈ 9
4. The lower limit of the first class is 112.
5. Construct the frequency distribution table:

Class Interval | Frequency | Class Mark | Class Boundary | Relative Frequency (%)
112-120 | 2 | 116 | 111.5-120.5 | 4
121-129 | 7 | 125 | 120.5-129.5 | 14
130-138 | 10 | 134 | 129.5-138.5 | 20
139-147 | 12 | 143 | 138.5-147.5 | 24
148-156 | 11 | 152 | 147.5-156.5 | 22
157-165 | 5 | 161 | 156.5-165.5 | 10
166-174 | 3 | 170 | 165.5-174.5 | 6
Total | 50 | | | 100

Graphical Presentation of Frequency Distributions with Equal Class Size
Steps in Constructing Frequency Charts
1. Label either class limits or class marks along the horizontal axis.
2. Plot the frequency of each class along the vertical axis above the class mark of the corresponding class.
3. The vertical scale must always include zero.
4. The horizontal scale must include only the range of the observed data and one extra interval at each end.
5. The height of the vertical axis should be approximately 3/4 the length of the horizontal axis.

FREQUENCY HISTOGRAM - a set of vertical bars whose areas are proportional to the frequencies presented.
[Figure: Frequency histogram; horizontal axis: Grades (class boundaries 49.5 to 99.5); vertical axis: frequency]
Note: The width of each bar is equal to the class size, and the height is numerically equal to the class frequency.
Try this! Make a Frequency Histogram from the constructed FDT (Raw scores of 50 students in a 200-item test).

FREQUENCY POLYGON - a line chart plotted along the same scale as the histogram. The class frequency is plotted against the class mark.
Try this! Make a Frequency Polygon from the constructed FDT (Raw scores of 50 students in a 200-item test).

LESS THAN OGIVE - the less than cumulative frequency is plotted against the upper class limit.
GREATER THAN OGIVE - the greater than cumulative frequency is plotted against the lower class limit.
[Figure: Ogives; vertical axis: Cumulative Frequency]

3. Data Analysis and Interpretation
Data analysis and interpretation is the process of making sense of numerical data that have been collected, organized and presented. The methods of describing the data on individual objects or groups of individuals under study are known as descriptive statistics, while analyzing and interpreting data to draw conclusions beyond the data at hand is known as inferential statistics.

Descriptive statistics give a single value which represents the set of values. There are three methods of describing a set of values: the measures of central tendency, the measures of dispersion, and the measures of skewness and kurtosis. Measures of central tendency refer to a value around which the set of values cluster; measures of dispersion describe how much the values differ from each other; while skewness and kurtosis measure the symmetry and the flatness or peakedness of the distribution.

Inferential statistics are techniques wherein samples are used to make generalizations about the populations from which the samples were drawn. It is important that the sample accurately represents the population. The process of achieving this is called sampling.
Inferential statistics arise out of the fact that sampling naturally incurs sampling error; thus, a sample is not expected to represent the population perfectly. There are two methods of inferential statistics: the estimation of parameters and hypothesis testing.

B. Measures of Central Tendency
Measures of central tendency are measures indicating the center of a set of data arranged in order of magnitude. A measure of central tendency is described as the point about which the scores tend to cluster; hence, it is regarded as a sort of average of the series. It is the center of the concentration of the scores, a single number which describes the totality of the set of data collected. It may describe either a population (a parameter) or a sample (a statistic). There are three commonly used measures of central tendency: the mean, the median and the mode.

1. Mean or arithmetic mean (or average) is the most popular and well-known measure of central tendency. It can be used with both discrete and continuous data, although it is used most often with continuous data.

for Ungrouped data:
The mean is the most frequently used measure of central tendency. It is denoted by the symbol μ (read as "mu") for the population and X̄ (read as "x-bar") for the sample. The mean of a series of values is equal to the sum of all the values divided by the number of values. Symbolically, the population mean and the sample mean are
$$\mu = \frac{\sum X_i}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} \qquad \bar{X} = \frac{\sum X_i}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n}$$
where μ - population mean; X̄ - sample mean
X_i - ith observed value in the population (or sample)
Σ - sum of all values
N, n - total number of observations in the population and in the sample, respectively

Example 1. The items listed below represent the scores of seven BS Applied Statistics students during the final examination. Compute the mean score: 89, 75, 90, 85, 78, 87 and 80.
$$\bar{X} = \frac{89 + 75 + 90 + 85 + 78 + 87 + 80}{7} = \frac{584}{7} = 83.43$$

Example 2. Suppose the BS Applied Math program has 10 students and their heights (in cm) are as follows: 170, 165, 155, 160, 150, 149, 152, 161, 163, 175.
Find the mean height of the students.
$$\mu = \frac{\sum X_i}{N} = \frac{170 + 165 + 155 + 160 + 150 + 149 + 152 + 161 + 163 + 175}{10} = \frac{1600}{10} = 160 \text{ cm}$$

for Grouped data:
The mean for grouped data is denoted by μ_g or X̄_g for the population and the sample, respectively.
$$\mu_g = \frac{\sum f_i X_i}{N} = \frac{f_1X_1 + f_2X_2 + f_3X_3 + \cdots + f_kX_k}{N}$$
where μ_g - mean from grouped data
X_i - class mark of the ith class
f_i - frequency of the ith class
Σ - sum of all values
N - total number of observations

Example: The table below represents the scores of 64 students in a long quiz.
Class Interval | Frequency (f) | Class Mark (X) | fX
5-9 | 7 | 7 | 49
10-14 | 10 | 12 | 120
15-19 | 13 | 17 | 221
20-24 | 18 | 22 | 396
25-29 | 8 | 27 | 216
30-34 | 5 | 32 | 160
35-39 | 3 | 37 | 111
Total | 64 | | 1273

N = f_1 + f_2 + ... + f_7 = 7 + 10 + 13 + 18 + 8 + 5 + 3 = 64
Σf_iX_i = 7(7) + 10(12) + 13(17) + 18(22) + 8(27) + 5(32) + 3(37) = 1273
$$\mu_g = \frac{\sum f_i X_i}{N} = \frac{1273}{64} = 19.89$$
Note: In the frequency distribution table, add another column for the product of the ith frequency and the ith class mark, then take the sum of this column.

Weighted Mean
The weighted mean is denoted by μ_w or X̄_w for the population and the sample, respectively.
$$\mu_w = \frac{\sum w_i X_i}{\sum w_i} = \frac{w_1X_1 + w_2X_2 + \cdots + w_nX_n}{w_1 + w_2 + \cdots + w_n}$$
where μ_w - weighted mean
X_i - ith quantity
w_i - weight of the ith quantity
Σ - sum of all values

Example: Consider the grades of a freshman student during the first semester.
Subject | Units (w_i) | Grade (X_i) | w_iX_i
Purposive Comm | 3 | 2.25 | 6.750
STS | 3 | 1.75 | 5.250
MMW | 3 | 2.00 | 6.000
Panitikan | 3 | 2.00 | 6.000
PHED 1 | 2 | 1.50 | 3.000
Military Science | 1.5 | 1.25 | 1.875
Total | 15.5 | | 28.875
$$\bar{X}_w = \frac{28.875}{15.5} = 1.86$$

Properties of the Mean
1. The sum of the deviations of the observations from the mean is zero. The deviation of the ith observation from the mean is denoted by d_i = X_i - X̄.
Given the observed values 3, 8 and 4, the mean is 5, and
$$\sum d_i = d_1 + d_2 + d_3 = -2 + 3 + (-1) = 0$$
2. The sum of the squared deviations of the observations from the mean is a minimum. Compare Σ(X_i - a)² for the mean a = 5 and for other values of a:
(3 - 5)² + (8 - 5)² + (4 - 5)² = 14
(3 - 3)² + (8 - 3)² + (4 - 3)² = 26
(3 - 8)² + (8 - 8)² + (4 - 8)² = 41
(3 - 4)² + (8 - 4)² + (4 - 4)² = 17
Hence, the sum of the squared deviations of the observations from the mean has a minimum value.
3. The mean reflects the magnitude of every observation, since every observation contributes to the value of the mean.
4. The mean can be easily affected by the presence of an extreme value; hence, it is not a good measure of central tendency when extreme values occur. For example, if 50 is added to the previous data (3, 8, 4 and 50), the mean becomes 16.25.
5. The means of subgroups may be combined when properly weighted; the combined mean is called the weighted mean.

2. Median is the middle score for a set of data arranged in order of magnitude. The median is best used when the data have several extreme entries.

for Ungrouped data:
The median is defined as the middle value when a set of observed values has been arranged in either ascending (from lowest to highest value) or descending (from highest to lowest value) order of magnitude. The median divides the array into two equal parts; that is, 50% of the observations are less than the median value while the other 50% are greater than the median value. The median is denoted by Md. Symbolically, if a given set of data is denoted by X_1, X_2, ..., X_n, the array is denoted by X_(1), X_(2), ..., X_(n), and the median is
$$Md = \begin{cases} X_{\left(\frac{n+1}{2}\right)} & \text{if } n \text{ is odd} \\ \dfrac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2} & \text{if } n \text{ is even} \end{cases}$$

[...]

Example 3. The result of a survey of the colors of cars owned by faculty members shows that 40 were white, 20 blue and 10 red. The modal color of cars owned by the faculty is white.

for Grouped data:
The mode from grouped data can be approximated using the formula
$$Mo_g = L_{mo} + C\left(\frac{f_{mo} - f_b}{2f_{mo} - f_b - f_a}\right)$$
where L_mo - lower class boundary of the modal class
C - class size
f_mo - frequency of the modal class
f_b - frequency of the class before the modal class
f_a - frequency of the class after the modal class
Note: The modal class is the class with the highest frequency.

Example: The table below represents the scores of 64 students in a long quiz.
Class Interval | Frequency | Class Mark | Class Boundary
5-9 | 7 | 7 | 4.5-9.5
10-14 | 10 | 12 | 9.5-14.5
15-19 | 13 | 17 | 14.5-19.5
20-24 | 18 | 22 | 19.5-24.5
25-29 | 8 | 27 | 24.5-29.5
30-34 | 5 | 32 | 29.5-34.5
35-39 | 3 | 37 | 34.5-39.5
Total | 64 | |
The modal class is the interval 20-24.
L_mo = 19.5, C = 5, f_b = 13, f_mo = 18, f_a = 8
$$Mo_g = 19.5 + 5\left(\frac{18 - 13}{2(18) - 13 - 8}\right) = 19.5 + 5\left(\frac{5}{15}\right) = 21.1\overline{6} \approx 21.17$$

Properties of the Mode
1. It may not be at the center of the data.
2. It does not make use of all observations.
3. It is difficult to manipulate algebraically.
4. It is ideal for qualitative types of data.

C. Measures of Dispersion
Measures of dispersion identify how a set of values spreads or fluctuates. The measures of dispersion are the range, the variance, the standard deviation, the coefficient of variation, the coefficient of skewness and the boxplot.

1. Range is the simplest measure of dispersion. It is the difference between the highest and the lowest scores. It does not reflect the variation in the data that lie between the highest and the lowest scores; therefore, it is not considered a reliable measure of variability and spread.

for Ungrouped data:
The range of a set of data is the absolute difference between the highest and the lowest value in the set. The range is denoted by R.
$$R = |HV - LV|$$
where R - range
HV - highest value
LV - lowest value

Example 1. The items listed below represent the scores of seven BS Applied Statistics students during the final examination. Compute the range: 89, 75, 90, 85, 78, 87 and 80.
R = |HV - LV| = |90 - 75| = 15

Example 2. Suppose the BS Applied Math program has 10 students and their heights (in cm) are as follows: 170, 165, 155, 160, 150, 149, 152, 161, 163, 175. Find the range of the heights of the BSAM students.
R = |HV - LV| = |175 - 149| = 26

for Grouped data:
The range for grouped data is denoted by R_g.
$$R_g = |ULHC - LLLC|$$
where R_g - range
ULHC - upper limit of the highest class
LLLC - lower limit of the lowest class

Example 3. Using the frequency distribution of the scores of 64 students in a long quiz (classes 5-9 through 35-39):
R_g = |ULHC - LLLC| = |39 - 5| = 34

Properties of the Range
1. It is a quick but rough measure of dispersion.
2. The larger the value of the range, the more dispersed the observations are.
3. It considers only the lowest and the highest value in the data set.

2. Variance takes into account the variation, or spread, of all items in a series about the point of central tendency. (It should not be confused with the mean absolute deviation, which averages the absolute rather than the squared deviations from the mean.) The variance considers the position of each observation relative to the mean: it is the average of the squared deviations of the observations from the mean. The population variance is denoted by σ² (read as "sigma squared") and the sample variance by s² (read as "s squared").

for Ungrouped data:
Given the set of values X_1, X_2, X_3, ..., X_N, the deviation of the ith observation from the mean is X_i - μ. The population variance σ² is
$$\sigma^2 = \frac{\sum (X_i - \mu)^2}{N} = \frac{(X_1 - \mu)^2 + (X_2 - \mu)^2 + \cdots + (X_N - \mu)^2}{N}$$
and its computational formula is
$$\sigma^2 = \frac{\sum X_i^2}{N} - \mu^2$$

Example 4. The following data represent the scores of 7 BS Applied Statistics students in a quiz: 4, 7, 8, 2, 2, 9, 3. Compute the population variance.
μ = (4 + 7 + 8 + 2 + 2 + 9 + 3)/7 = 35/7 = 5
Using the definitional formula:
$$\sigma^2 = \frac{(4-5)^2 + (7-5)^2 + (8-5)^2 + (2-5)^2 + (2-5)^2 + (9-5)^2 + (3-5)^2}{7} = \frac{52}{7} = 7.42857 \approx 7.43$$
Using the computational formula:
$$\sigma^2 = \frac{4^2 + 7^2 + 8^2 + 2^2 + 2^2 + 9^2 + 3^2}{7} - 5^2 = \frac{227}{7} - 25 = 7.42857 \approx 7.43$$
Note: The definitional and computational formulas give the same population variance, but the computational formula is faster and easier to apply.

The sample variance is
$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \cdots + (X_n - \bar{X})^2}{n - 1}$$
and its computational formula is
$$s^2 = \frac{n\sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)}$$

Example 5. Given a random sample of size n = 10: [...]. Compute the sample variance.
ΣX_i = 54, so X̄ = 54/10 = 5.4
Using the definitional formula: s² = Σ(X_i - X̄)²/(n - 1) = [...]
Using the computational formula, with ΣX_i² = 360:
$$s^2 = \frac{10(360) - (54)^2}{10(10 - 1)} = \frac{684}{90} = 7.6$$

The variance from grouped data can be obtained using the formulas
$$\sigma_g^2 = \frac{\sum f_i X_i^2}{N} - \mu_g^2 \qquad s_g^2 = \frac{n\sum f_i X_i^2 - \left(\sum f_i X_i\right)^2}{n(n - 1)}$$
where f_i - the frequency of the ith class
X_i - the class mark of the ith class
μ_g - the mean from the grouped data
Σf_iX_i² = f_1X_1² + f_2X_2² + ... + f_kX_k²

Example 6. The table below represents the scores of 64 students in a long quiz.
Class Interval | Frequency (f) | Class Mark (X) | fX | fX²
5-9 | 7 | 7 | 49 | 343
10-14 | 10 | 12 | 120 | 1440
15-19 | 13 | 17 | 221 | 3757
20-24 | 18 | 22 | 396 | 8712
25-29 | 8 | 27 | 216 | 5832
30-34 | 5 | 32 | 160 | 5120
35-39 | 3 | 37 | 111 | 4107
Total | 64 | | 1273 | 29311
$$\sigma_g^2 = \frac{29311}{64} - \left(\frac{1273}{64}\right)^2 = 62.347412 \approx 62.35$$
$$s_g^2 = \frac{64(29311) - (1273)^2}{64(64 - 1)} = 63.3370536 \approx 63.34$$

Properties of the Variance
1. The variance is always nonnegative;
2. The larger the value of the variance, the more dispersed the observations are;
3. The variance can be easily manipulated algebraically;
4. Each observation contributes to the magnitude of the variance;
5. The unit of measure of the variance is the square of the unit of measure of the original data set.

3. Standard deviation is based on the deviations of all the scores in a series, which are always computed from the mean. The standard deviation is defined as the positive square root of the variance. It is denoted by σ for the population standard deviation and s for the sample standard deviation.
$$\sigma = \sqrt{\sigma^2} \qquad s = \sqrt{s^2}$$

Example 7. Using the data in Example 4, compute the population standard deviation.
From Example 4, the population variance was 7.43; hence, the population standard deviation is σ = √7.43 = 2.7258.

Example 8. Using the data in Example 5, compute the sample standard deviation.
From Example 5, the sample variance was 7.6; hence, the sample standard deviation is s = √7.6 = 2.7568.

Example 9. Using the data in Example 6, compute the population and sample standard deviations.
From Example 6, the population variance was 62.35 and the sample variance was 63.34; hence, σ = √62.35 ≈ 7.896 and s = √63.34 ≈ 7.959.

Properties of the Standard Deviation
The standard deviation has the same properties as the variance except property 5: the unit of measure of the standard deviation is the same as the unit of measure of the raw data.

4. Coefficient of variation, also known as relative dispersion, is the ratio of the standard deviation to the mean and is usually expressed in percent, i.e.,
$$CV = \frac{\sigma}{\mu} \times 100\% \quad \text{or} \quad CV = \frac{s}{\bar{X}} \times 100\%$$
The coefficient of variation is a unitless measure of dispersion; hence, it can be used to compare the variability of two or more groups of data measured in the same or different units.

5. Skewness is a measure of how asymmetric the distribution of data is about the mean. Positive skewness indicates a distribution with an asymmetric tail extending toward the right side of the distribution, while negative skewness indicates a distribution with an asymmetric tail extending toward the left.
[Figure: Positive Skew and Negative Skew]
The distribution of data is said to be symmetric about the mean if its graph can be folded along a vertical axis through the mean so that the two sides coincide. Analytically, if the coefficient of skewness is zero, then the distribution of the data is symmetric about the mean.

Using Measures of Central Tendency
• If Mean = Median = Mode, the skewness is zero (symmetrical).
• If Mean > Median > Mode, the skewness is positive.
• If Mean < Median < Mode, the skewness is negative.
[Figure: Symmetric (Not Skewed) and Skewed distributions]
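To tie the chapter's formulas together, the sketch below recomputes several of the worked examples in Python. It is an illustrative sketch rather than the textbook's method: all function names are hypothetical, only the standard library is used, the grouped mode treats a missing neighbouring class as having frequency 0 (an assumption the text does not address), and `skew_direction` encodes only the mean-median-mode rule of thumb stated above. The quiz data 4, 7, 8, 2, 2, 9, 3 are those of Example 4 in the variance section.

```python
import math
from collections import Counter

# --- Ungrouped data ---------------------------------------------------------
def mean(xs):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(xs) / len(xs)

def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * X_i) / sum(w_i)."""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

def median(xs):
    """Middle value of the array; mean of the two middle values when n is even."""
    s = sorted(xs)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2

def modes(xs):
    """Every most-frequent value (a data set can have more than one mode)."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(x for x, count in counts.items() if count == top)

def data_range(xs):
    """R = |HV - LV|."""
    return abs(max(xs) - min(xs))

def variance(xs, sample=False):
    """Definitional formula: squared deviations over N (population) or n - 1 (sample)."""
    m = mean(xs)
    squared_deviations = sum((x - m) ** 2 for x in xs)
    return squared_deviations / (len(xs) - 1 if sample else len(xs))

def coefficient_of_variation(xs, sample=False):
    """CV = standard deviation / mean * 100 (percent)."""
    return math.sqrt(variance(xs, sample)) / mean(xs) * 100

def skew_direction(mn, md, mo):
    """Rule of thumb from the text comparing mean, median and mode."""
    if mn == md == mo:
        return "symmetric"
    if mn > md > mo:
        return "positive"
    if mn < md < mo:
        return "negative"
    return "inconclusive"

# --- Grouped data -----------------------------------------------------------
def grouped_mean(freqs, marks):
    """sum(f_i * X_i) / N, with X_i the class marks."""
    return sum(f * x for f, x in zip(freqs, marks)) / sum(freqs)

def grouped_variance(freqs, marks, sample=False):
    """Computational formulas for grouped data (population or sample)."""
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
    if sample:
        return (n * sum_fx2 - sum_fx ** 2) / (n * (n - 1))
    return sum_fx2 / n - (sum_fx / n) ** 2

def grouped_mode(lower_boundaries, freqs, class_size):
    """Mo = L_mo + C (f_mo - f_b) / (2 f_mo - f_b - f_a); a missing
    neighbouring class is treated as having frequency 0 (an assumption)."""
    i = max(range(len(freqs)), key=lambda j: freqs[j])  # first class with the top frequency
    f_mo = freqs[i]
    f_b = freqs[i - 1] if i > 0 else 0
    f_a = freqs[i + 1] if i + 1 < len(freqs) else 0
    return lower_boundaries[i] + class_size * (f_mo - f_b) / (2 * f_mo - f_b - f_a)

# --- Data from the chapter's worked examples --------------------------------
final_exam = [89, 75, 90, 85, 78, 87, 80]
quiz = [4, 7, 8, 2, 2, 9, 3]
units = [3, 3, 3, 3, 2, 1.5]
grades = [2.25, 1.75, 2.00, 2.00, 1.50, 1.25]
freqs = [7, 10, 13, 18, 8, 5, 3]
marks = [7, 12, 17, 22, 27, 32, 37]
lower_boundaries = [4.5, 9.5, 14.5, 19.5, 24.5, 29.5, 34.5]

print("mean of final exam scores:", round(mean(final_exam), 2))
print("range of final exam scores:", data_range(final_exam))
print("weighted mean grade:", round(weighted_mean(grades, units), 2))
print("grouped mean:", round(grouped_mean(freqs, marks), 2))
print("grouped mode:", round(grouped_mode(lower_boundaries, freqs, 5), 2))
print("population variance of quiz:", round(variance(quiz), 2))
print("grouped variances:", round(grouped_variance(freqs, marks), 2),
      round(grouped_variance(freqs, marks, sample=True), 2))
print("CV of quiz (%):", round(coefficient_of_variation(quiz), 2))
print("skewness:", skew_direction(mean(quiz), median(quiz), modes(quiz)[0]))
```

The printed results agree with the values worked out in the chapter, for example 83.43 for the final examination mean, 21.17 for the grouped mode, and 62.35 and 63.34 for the grouped population and sample variances. For the quiz data the mean (5) exceeds the median (4), which exceeds the mode (2), so the rule of thumb reports positive skewness.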
