Hypothesis Testing 1

INFERENTIAL STATISTICS

Hypothesis testing is the process used in a statistical experiment wherein two conflicting alternatives are presented as hypotheses concerning certain parameters of the population under study. Simply put, it is a method of deciding which of two contradictory hypotheses is the correct one; one of the hypotheses is accepted on the basis of inferences made from sample data. Applications of probability theory separate the set of all possible outcomes of the experiment into two mutually exclusive sets, one supporting the validity of one hypothesis and the other supporting the validity of the other. Of the two mutually exclusive hypotheses formulated during hypothesis testing, one is known as the null hypothesis, denoted H0; the other is called the alternative hypothesis, denoted H1. Oftentimes, the null hypothesis represents the current line of thought concerning population parameters, prior to any application of inferential statistics, while the alternative hypothesis is accepted only after the null hypothesis is statistically inferred to be incorrect.

HYPOTHESIS - an educated guess. It is a formulated statement which is yet to be proven. It comes in 2 kinds:
Null Hypothesis (H0) - expresses the idea of nonsignificance of a difference or relationship. It denotes neutrality and objectivity.
Alternative Hypothesis (H1) - also called the predictive hypothesis. It is the opposite of the null hypothesis.

Examples:
Topic/Title: A Comparative Study on the Consumers' Acceptance of A and B Dishwashing Liquids
H0: The consumers' acceptance of A and B dishwashing liquids is the same.
H1: The consumers' acceptance of A and B dishwashing liquids is not the same.
H1 (directional): The consumers' acceptance of A dishwashing liquid is higher than that of B.

Topic/Title: A Study on the Relationship of Study Habits and Math Performance Among Students of the UST College of Education
H0: There is no significant relationship between study habits and math performance among students in the College of Education.
H1: There is a significant relationship between study habits and math performance among students in the College of Education.

Topic/Title: A Comparative Study on the Performance of Secondary Education Students in Their In-Campus and Off-Campus Practicum
H0: There is no significant difference in the performance of Secondary Education students in their in-campus and off-campus practicum.
H1: There is a significant difference in the performance of Secondary Education students in their in-campus and off-campus practicum.

Exercise: Write the null and alternative hypotheses for the following topics/titles:
1. A comparative study on the performance of Education High School students in their Araling Panlipunan and Filipino subjects.
2. A study on the effectiveness of Brand X vitamin in improving the energy levels among teenagers.
3. Satisfaction of students in the different programs under the College of Education.

Scanned with CamScanner

2 TYPES OF ERROR
1. Type I error (α error) - rejecting the null hypothesis when it is actually true.
2. Type II error (β error) - accepting the null hypothesis when it is actually false.
α - the level of significance. It is the maximum probability of committing a Type I error.
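Because α is described above as the probability of committing a Type I error, a small simulation can make the idea concrete. The Python sketch below (standard library only, Python 3.8+) assumes a normal population with μ = 100 and σ = 15, samples of size 25 and a two-tailed z-test; these values, the seed and the number of trials are arbitrary choices for illustration. Since H0: μ = 100 is true for every simulated sample, the fraction of rejections estimates the Type I error rate.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(trials=10_000, n=25, mu=100.0, sigma=15.0, alpha=0.05, seed=1):
    """Estimate how often a two-tailed one-sample z-test rejects a TRUE H0.

    Every sample is drawn from a normal population whose mean really is mu,
    so each rejection counted here is a Type I error.
    """
    rng = random.Random(seed)                       # fixed seed -> reproducible run
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # two-tailed critical value
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(mu, sigma) for _ in range(n))
        z = (sample_mean - mu) / standard_error
        if abs(z) >= z_crit:
            rejections += 1
    return rejections / trials


rate_05 = type_i_error_rate(alpha=0.05)
rate_01 = type_i_error_rate(alpha=0.01)
print(rate_05, rate_01)   # each should be close to its alpha
```

The estimates differ from 0.05 and 0.01 only by simulation noise; raising `trials` shrinks that noise.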
At α = 0.05, a result is called significant; at α = 0.01, highly significant.

Type III error: asking the wrong question. "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise." - John Tukey
Type IV error: asking a question not worth answering.

ONE- AND TWO-TAILED TESTS
1) One-tailed (directional) - the rejection region is found at only one of the tails of the distribution. Critical value at the 0.05 level of significance: 1.645 or -1.645; at 0.01: 2.33 or -2.33.
[Figure: one-tailed test showing the acceptance region and a rejection region of α = 0.05 beyond the critical value 1.645]
2) Two-tailed (non-directional) - the rejection region lies on both sides of the mean. Critical values at the 0.05 level of significance: 1.96 and -1.96; at 0.01: 2.575 and -2.575.
[Figure: two-tailed test with α = 0.05 split into α/2 = 0.025 in each tail, critical values -1.96 and 1.96]

CRITICAL/TABULAR VALUES (z)
Alpha    One-tailed    Two-tailed
0.05     ±1.645        ±1.96
0.01     ±2.33         ±2.575

Note: If H1 is a statement of non-equality (≠), then the hypothesis is non-directional and the test is two-tailed. If it makes use of an order relation (> or <), then the test is one-tailed.

STEPS IN HYPOTHESIS TESTING
A. CRITICAL VALUE APPROACH
1. Formulate the null hypothesis (H0). State the alternative hypothesis (H1).
2. Set the level of significance. Determine the test to be used. Determine the tabular value for the test.
3. Make the necessary computations.
4. Compare the computed value with its corresponding tabular value. Reject H0 if the absolute computed value ≥ the absolute tabular value (for a one-tailed test, the computed value must also fall in the tail specified by H1). Otherwise, accept H0. Interpret.

B. P-VALUE APPROACH
1. Formulate your hypotheses.
2. Set the level of significance.
3. Determine the test to be used.
4. Find the p-value.
5. Give your decision (reject H0 if the p-value ≤ α; otherwise, accept H0) and interpret.

DECISION RULES
In hypothesis testing, a statistic of the sample data is computed, and its value determines the acceptance or rejection of the null hypothesis. This function of the sample data is known as the test statistic.
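The critical values in the table, and the two decision procedures, can be reproduced with Python's standard-library `statistics.NormalDist` (Python 3.8+). This is a sketch for z-tests only; the function names are illustrative helpers, not from any particular package. For a one-tailed test it also checks that z lies in the tail named by H1, rather than comparing absolute values only.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_critical(alpha, tails=2):
    """Positive critical z value: alpha in one tail, or alpha/2 in each of two tails."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    return STANDARD_NORMAL.inv_cdf(1 - alpha / tails)


def decide_by_critical_value(z, alpha, alternative="two-sided"):
    """Critical value approach; alternative is 'two-sided', 'greater' or 'less'."""
    if alternative == "two-sided":
        reject = abs(z) >= z_critical(alpha, 2)
    elif alternative == "greater":
        reject = z >= z_critical(alpha, 1)
    elif alternative == "less":
        reject = z <= -z_critical(alpha, 1)
    else:
        raise ValueError("unknown alternative")
    return "reject H0" if reject else "accept H0"


def decide_by_p_value(z, alpha, alternative="two-sided"):
    """p-value approach: reject H0 when p <= alpha. Returns (p, decision)."""
    if alternative == "two-sided":
        p = 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))
    elif alternative == "greater":
        p = 1 - STANDARD_NORMAL.cdf(z)
    elif alternative == "less":
        p = STANDARD_NORMAL.cdf(z)
    else:
        raise ValueError("unknown alternative")
    return p, ("reject H0" if p <= alpha else "accept H0")


for alpha in (0.05, 0.01):
    print(alpha, round(z_critical(alpha, tails=1), 3), round(z_critical(alpha, tails=2), 3))
```

The loop prints 1.645 and 1.96 for α = 0.05, and 2.326 and 2.576 for α = 0.01; printed tables commonly round the last two to 2.33 and 2.575.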
Specifically, the null hypothesis is accepted if the value of the test statistic falls within an interval of real numbers determined through the application of probability theory, and is rejected otherwise. The set of values of the test statistic that results in the acceptance of the null hypothesis is called the acceptance region; the set of values that supports rejection of the null hypothesis is called the critical region.

The acceptance region and the critical region are mutually exclusive sets. They are also complements of each other with respect to the set of possible values of the test statistic, i.e., the union of these two sets is the set of all possible test statistic values, and their intersection is the null set.

For most sample tests using hypothesis testing, the maximum probability of a Type I error can be designated at the beginning of the experiment. This value, denoted by α, is called the level of significance of the test. The value of α is generally at most 0.10. The level of significance is always associated with the critical (rejection) region of the test. The value 1 - α represents the level of confidence, which is always associated with the acceptance region of the test. The value that divides the distribution of the test statistic into the rejection and acceptance regions is called the critical value of the test.

Z-TEST
What is a z-test and when is it used?
A z-test is mainly used when the population standard deviation is known. Formula:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
where $\bar{x}$ is the sample mean, $\mu$ is the hypothesized population mean, $\sigma$ is the population standard deviation and $n$ is the number of items.

Error in statistics is not a mistake but a deviation or difference from the true or actual value. It is a variation resulting from chance sampling. The variability of a sampling distribution of means is measured by the standard error of the mean.
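The statement that the standard error of the mean measures the spread of sample means can be checked by simulation. This sketch assumes a normal population with μ = 100 and σ = 15 and samples of size n = 25 (illustrative values only), and compares the spread of 5,000 simulated sample means with σ/√n = 3.

```python
import random
from statistics import fmean, pstdev


def simulated_standard_error(mu, sigma, n, draws=5_000, seed=7):
    """Standard deviation of many simulated sample means (normal population assumed)."""
    rng = random.Random(seed)
    sample_means = [fmean(rng.gauss(mu, sigma) for _ in range(n)) for _ in range(draws)]
    return pstdev(sample_means)


sigma, n = 15.0, 25
simulated = simulated_standard_error(100.0, sigma, n)
formula = sigma / n ** 0.5        # standard error of the mean: 15 / 5 = 3.0
print(round(simulated, 2), formula)
```

The simulated spread should agree with the formula to within a few hundredths.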
Standard error of the mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Central limit theorem: If a population distribution has a mean of $\mu$ and a standard deviation $\sigma$, then the distribution of random sample means drawn from this population approaches a normal distribution with a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{n}$ as the sample size n increases.

Consider this example:
It is claimed that the average IQ for children in a certain region is 100 with a standard deviation of 15. At the 0.01 level of significance, would you agree with this claim if a sample of 2500 children showed that the average IQ is 101?
H0: μ = 100; H1: μ ≠ 100. α = 0.01, two-tailed, CV: ±2.575
$z = \frac{101 - 100}{15/\sqrt{2500}} = \frac{1}{0.3} = 3.33$
Since 3.33 > 2.575, reject H0. The average IQ does not appear to be 100.

Two-sample z-test (comparing two sample means):
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
where $\bar{x}_1$ and $\bar{x}_2$ are the means of the first and second samples, $\sigma$ is the population standard deviation, and $n_1$ and $n_2$ are the number of items in the first and second samples.
If the sample standard deviations $s_1$ and $s_2$ are known:
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$

Consider this example:
A group of … freshmen from one class in Trigonometry got an average grade of 85, while a group of 45 freshmen belonging to another class got an average grade of 82. The total population of students in that college taking up the subject has a standard deviation of …. Is there a significant difference between the two samples at the 0.05 level of significance?
Step 1: Formulate your hypotheses.
H0: μ1 = μ2
H1: μ1 ≠ μ2
Step 2: Set the level of significance. α = 0.05
Determine the test to be used. Test statistic: z-test
Determine the tabular value for the test. This is your critical value. CV: ±1.96
Step 3: Perform the necessary calculations. Computation: z = 1.57
Step 4: Compare the computed and critical values. Give your decision and interpretation.
Decision: Since 1.57 < 1.96, we accept H0. There is no significant difference between the two sample means.

TEST ON PROPORTIONS
A proportion is the fraction of a group possessing a given characteristic. A test of hypothesis on proportions can be about a single value of a population proportion or about the difference between two proportions.
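A sketch of the one- and two-sample z statistics from this section in Python (standard library only). The IQ example's numbers come from the text; `one_sample_z` and `two_sample_z` are hypothetical helper names, not a library API.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, mu0, sigma, n):
    """z = (x-bar - mu0) / (sigma / sqrt(n)) when the population SD is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


def two_sample_z(mean1, mean2, n1, n2, sigma=None, s1=None, s2=None):
    """Two-sample z statistic using either a common population SD (sigma)
    or the two sample SDs (s1, s2)."""
    if sigma is not None:
        standard_error = sigma * sqrt(1 / n1 + 1 / n2)
    elif s1 is not None and s2 is not None:
        standard_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    else:
        raise ValueError("give sigma, or both s1 and s2")
    return (mean1 - mean2) / standard_error


# IQ example: H0: mu = 100, H1: mu != 100, alpha = 0.01 (two-tailed)
z = one_sample_z(101, 100, 15, 2500)
z_crit = NormalDist().inv_cdf(1 - 0.01 / 2)
decision = "reject H0" if abs(z) >= z_crit else "accept H0"
print(round(z, 2), round(z_crit, 3), decision)
```

`two_sample_z` accepts either a common population σ or the two sample standard deviations, mirroring the two formulas above.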
Formula (two proportions):
$z = \frac{p_1 - p_2}{\sqrt{p_c(1 - p_c)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad p_1 = \frac{x_1}{n_1}, \quad p_2 = \frac{x_2}{n_2}, \quad p_c = \frac{x_1 + x_2}{n_1 + n_2}$
where $p_1$ is the proportion of the first sample, $p_2$ that of the second sample, $p_c$ is the combined estimate of the common population proportion, and $x_1$ is the number of respondents in the first sample possessing the given characteristic ($x_2$ for the second sample).

Consider this example:
A survey was conducted on male and female shoppers, and it shows that 87 out of 200 males and 96 out of 300 females buy only name-brand grocery products. Does this suggest that the proportion of male shoppers who buy name-brand items differs from the proportion of female shoppers who also buy branded items?
Step 1: Formulate your hypotheses.
H0: P1 = P2 (the proportions of male and female shoppers who buy name-brand items are equal)
H1: P1 ≠ P2
Step 2: Set the level of significance. α = 0.05
Determine the test to be used. Test statistic: z-test
Determine the tabular value for the test. This is your critical value. CV: ±1.96
Step 3: Do the necessary computations. Here $p_1 = 87/200 = 0.435$, $p_2 = 96/300 = 0.32$ and $p_c = 183/500 \approx 0.37$:
$z = \frac{0.435 - 0.32}{\sqrt{0.37(1 - 0.37)\left(\frac{1}{200} + \frac{1}{300}\right)}} = 2.61$
Step 4: Compare the computed and critical values. Give your decision and interpretation.
Decision: Since 2.61 > 1.96, we reject H0. There is a significant difference between the proportions of male and female shoppers who buy name-brand grocery items.

T-TEST
What is a t-test and when is it used?
The t-test was first developed by W. S. Gosset and later modified by Fisher. It is also called Student's t method or the t-distribution. It is used when the sample standard deviation is given and n < 30. The degrees of freedom (df) must also be computed.

(a) Test on a single sample mean:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}, \quad df = n - 1$

Example: testing whether a truck travels an average of 12 miles on a gallon of a certain gasoline (H0: μ = 12; H1: μ ≠ 12).
Decision: Since the computed |t| > 2.262 (the two-tailed critical value for df = 9 at α = 0.05), reject H0. The average distance traveled by the truck using a gallon of that gasoline is not 12 miles.
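The two-proportion z statistic can be sketched in a few lines of Python, using the shopper counts from the example; `two_proportion_z` is a hypothetical helper name.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: P1 = P2, using the combined (pooled) proportion p_c."""
    p1, p2 = x1 / n1, x2 / n2
    pc = (x1 + x2) / (n1 + n2)   # combined estimate of the common proportion
    standard_error = sqrt(pc * (1 - pc) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Shoppers example: 87 of 200 males vs. 96 of 300 females buy only name brands.
z = two_proportion_z(87, 200, 96, 300)
print(round(z, 2), "reject H0" if abs(z) >= 1.96 else "accept H0")
```

Without rounding p_c, the statistic comes out at about 2.62; the worked solution rounds p_c to 0.37 first and reports 2.61. Either way it exceeds 1.96, so the decision is unchanged.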
(b) Comparison between two sample means (independent samples):
$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \quad df = n_1 + n_2 - 2$
where $\bar{x}_1$ is the mean of the first sample, $\bar{x}_2$ that of the second sample, $s_1$ and $s_2$ are the standard deviations of the first and second samples, and $n_1$ and $n_2$ are the number of items in the first and second samples, respectively.

Consider this example:
As part of a training program, some trainees are trained by method A, which is straight instruction, while some are taught by method B. Random samples of size 10 are taken from large groups of trainees taught by each of these two methods. The following are their scores on an appropriate achievement test: …
Step 1: Formulate your hypotheses.
H0: μA = μB (the mean achievement scores of trainees taught by methods A and B are equal)
H1: μA < μB (trainees taught by method B score higher)
Step 2: Set the level of significance and degrees of freedom. α = 0.05, one-tailed; df = 10 + 10 - 2 = 18
Determine the test statistic to be used. Test statistic: t-test
Determine the tabular value for the test. This is your critical value. CV: -1.734 (lower tail, since H1 is μA < μB)
Step 3: Perform the necessary calculations. Computation: t = -1.99
Step 4: Compare the computed and critical values. Give your decision and interpretation.
Decision: Since |-1.99| > 1.734, reject H0. Method B is superior to method A: the mean achievement score of trainees taught by method B is significantly higher than that of trainees taught by method A.

(c) Comparison of paired (dependent) samples:
$t = \frac{\bar{D}}{s_D/\sqrt{n}}, \quad s_D = \sqrt{\frac{\sum (D - \bar{D})^2}{n - 1}}, \quad df = n - 1$
where $\bar{D}$ is the mean difference of the pairs, $s_D$ is the standard deviation of the differences and n is the number of pairs.

Consider this example:
A random sample of 10 female adults was taken to test the effectiveness of a weight-reducing pill. Their weights were taken before and after 2 weeks of taking the pill. The results are as follows:

Woman        1    2    3    4    5    6    7    8    9   10
Wt. before  156  160  170  140  148  154  150  160  138  150
Wt. after   152  155  164  138  144  150  148  158  134  140
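The pooled-variance t statistic for two independent samples can be sketched with Python's `statistics` module; `variance` uses the n - 1 divisor, matching s1 and s2 above. `pooled_t` is a hypothetical helper, and the scores in the usage lines are made up purely to show the call (they are not the training-method data).

```python
from math import sqrt
from statistics import fmean, variance


def pooled_t(sample1, sample2):
    """Independent-samples t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # pooled variance: weighted average of the two sample variances (n - 1 divisors)
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    t = (fmean(sample1) - fmean(sample2)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Hypothetical scores, for illustration only
method_a = [70, 74, 68, 72]
method_b = [75, 78, 74, 77]
t, df = pooled_t(method_a, method_b)
print(round(t, 2), df)
```

Here the sample means are 71 and 76 and the pooled variance is 5, so t = -5/√2.5 ≈ -3.16 with df = 6; |t| is then compared with the tabular value for that df.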
Solution:
Step 1: Formulate your hypotheses.
H0: μD = 0 (the pill is not effective)
H1: μD > 0 (the pill reduces weight)
Step 2: Set the level of significance and degrees of freedom. α = 0.05, one-tailed; df = 10 - 1 = 9
Determine the test statistic to be used. Test statistic: t-test
Determine the tabular value for the test. This is your critical value. CV: 1.833
Step 3: Perform the necessary calculations.

Woman   Wt. before   Wt. after    D    (D - D̄)²
1          156          152       4      0.09
2          160          155       5      0.49
3          170          164       6      2.89
4          140          138       2      5.29
5          148          144       4      0.09
6          154          150       4      0.09
7          150          148       2      5.29
8          160          158       2      5.29
9          138          134       4      0.09
10         150          140      10     32.49
Total                            43     52.10

Computation:
$\bar{D} = \frac{43}{10} = 4.3, \quad s_D^2 = \frac{52.10}{9} = 5.79, \quad s_D = 2.41, \quad t = \frac{4.3}{2.41/\sqrt{10}} \approx 5.65$
Step 4: Compare the computed and critical values. Give your decision and interpretation.
Decision: Since 5.65 > 1.833, reject H0. The pill is effective.

CORRELATION AND LINEAR REGRESSION
Correlation analysis - concerned with the association or relationship in the changes of two variables.
Degrees of correlation:
1. Perfect correlation (+1 or -1). Examples: stress and strain, period and frequency, pressure and temperature with volume constant.
2. Some degree of correlation. Examples: weight vs. height, academic engaged time and grades.
3. No correlation.

Pearson Product Moment Correlation Coefficient - used when a linear relationship is present and the level of measurement for the two variables is either interval or ratio.

Value of r          Degree of linear relationship
±0.90 to ±1.00      Very high correlation
±0.70 to ±0.90      High correlation
±0.50 to ±0.70      Moderate correlation
±0.30 to ±0.50      Low correlation
±0.00 to ±0.30      Little, if any, correlation
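The paired t-test for the weight-reducing pill can be checked in Python using the before and after weights listed in the example; `statistics.stdev` divides by n - 1, matching s_D above.

```python
from math import sqrt
from statistics import fmean, stdev

before = [156, 160, 170, 140, 148, 154, 150, 160, 138, 150]
after = [152, 155, 164, 138, 144, 150, 148, 158, 134, 140]

d = [w0 - w1 for w0, w1 in zip(before, after)]   # D = weight before - weight after
d_bar = fmean(d)                                  # mean difference
s_d = stdev(d)                                    # sample SD of the differences (n - 1 divisor)
t = d_bar / (s_d / sqrt(len(d)))
print(sum(d), round(d_bar, 2), round(s_d, 2), round(t, 2))
```

It reproduces ΣD = 43, D̄ = 4.3 and s_D ≈ 2.41, and gives t ≈ 5.65, well above the critical value 1.833.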
[Figure: scatter diagrams illustrating (a) perfect positive correlation, (b) perfect negative correlation, (c) no correlation, (d) some positive correlation]

Example:
The following shows the final grades of 10 students in Algebra and Visual Basic Programming.

Student        1    2    3    4    5    6    7    8    9   10
Algebra (x)   85   78   86   72   91   80   95   72   89   74
VBasic (y)    83   80   88   75   89   85   92   75   88   78

Solution:
 x     y      xy      x²      y²
85    83    7055    7225    6889
78    80    6240    6084    6400
86    88    7568    7396    7744
72    75    5400    5184    5625
91    89    8099    8281    7921
80    85    6800    6400    7225
95    92    8740    9025    8464
72    75    5400    5184    5625
89    88    7832    7921    7744
74    78    5772    5476    6084
Σ: 822   833   68906   68176   69721

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}} = \frac{10(68906) - (822)(833)}{\sqrt{[10(68176) - 822^2][10(69721) - 833^2]}} = 0.96$

[Figure: scatter diagram of Algebra (x) vs. Visual Basic (y) grades]
There is a very high positive correlation between the grades in Algebra and Visual Basic. But is the relationship significant?

TEST FOR SIGNIFICANCE:
H0: ρ = 0 (the population r-value is zero, i.e., no correlation)
H1: ρ ≠ 0
α = 0.05, df = 10 - 2 = 8, CV: ±2.306
Computation: $t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}} = \frac{0.96\sqrt{8}}{\sqrt{1 - 0.96^2}} = 9.70$
Decision: Since 9.70 > 2.306, reject H0. There exists a real correlation between the grades in Algebra and Visual Basic.
One can also check whether the points are close together or spread apart using a scatter diagram.

Note:
1. r does not necessarily imply a cause-effect relationship.
2. A high computed r does not necessarily mean that one variable strongly depends on the other. If one variable is supposed to depend on another and the computed r is high, this supports that dependence.

Other methods:
Phi (φ) coefficient - nominal by nominal
Spearman's rank correlation (rho) - ordinal by ordinal
Kendall's coefficient of concordance (W) - ordinal by ordinal
Point-biserial - nominal by interval/ratio

LINEAR REGRESSION - predicting or estimating one variable Y knowing the other variable X. Y is the dependent variable and X is the independent variable.
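A sketch of the raw-score Pearson formula and the t test for H0: ρ = 0, applied to the Algebra and Visual Basic grades from the table (Python, standard library only; `pearson_r` and `t_for_r` are hypothetical helper names).

```python
from math import sqrt

algebra = [85, 78, 86, 72, 91, 80, 95, 72, 89, 74]   # x
vbasic = [83, 80, 88, 75, 89, 85, 92, 75, 88, 78]    # y


def pearson_r(x, y):
    """Raw-score formula (S = sum): [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2][n*Syy - Sy^2])."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def t_for_r(r, n):
    """t = r*sqrt(n - 2) / sqrt(1 - r^2), with df = n - 2, for H0: rho = 0."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r = pearson_r(algebra, vbasic)
t_unrounded = t_for_r(r, len(algebra))
t_rounded = t_for_r(0.96, len(algebra))   # r rounded first, as in the worked solution
print(round(r, 3), round(t_unrounded, 2), round(t_rounded, 2))
```

With the unrounded r ≈ 0.965 the statistic is about 10.4; rounding r to 0.96 first, as in the worked solution, gives 9.70. Both exceed 2.306.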
Formula: $\hat{Y} = a + bX$
where $\hat{Y}$ is the predicted score, a is the y-intercept and b is the slope of the line. $\hat{Y} = a + bX$ is the least-squares line or regression line. The method of least squares fits the line to the given data: the most accurate trend line is the one for which the sum of the squares of the vertical distances of the points from the line is least (minimum).

Consider the previous example:
What would be the predicted grade of a student in Visual Basic Programming if his grade is 80 in Algebra?
$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(68906) - (822)(833)}{10(68176) - 822^2} = 0.71$
$\bar{x} = 82.2, \quad \bar{y} = 83.3, \quad a = \bar{y} - b\bar{x} = 83.3 - 0.71(82.2) = 24.94$
$\hat{Y} = a + bX = 24.94 + 0.71(80) = 81.74$
The regression line is $\hat{Y} = 24.94 + 0.71X$. The predicted grade in Visual Basic Programming is 82.

Computation for simple correlation and regression can be done using Excel. Explore the following functions: CORREL(array1, array2); LINEST(known_y's, known_x's, const, stats). Constructing a scatter diagram is also possible in Excel; make use of the Chart Wizard.

Example: DENSITY OF POPULATION AND DEATH RATE
City   Density of population (x)   Death rate (%) (y)        x²       y²      xy
A               200                       10               40000     100     2000
B               500                       16              250000     256     8000
C               400                       14              160000     196     5600
D               700                       20              490000     400    14000
E               600                       17              360000     289    10200
F               300                       13               90000     169     3900
Σ              2700                       90             1390000    1410    43700

$r = \frac{6(43700) - (2700)(90)}{\sqrt{[6(1390000) - 2700^2][6(1410) - 90^2]}} = 0.99$
There is a very high correlation between population density and death rate.

TEST FOR SIGNIFICANCE:
H0: ρ = 0; H1: ρ ≠ 0
α = 0.05, df = 6 - 2 = 4, CV: ±2.776
Computation: $t = \frac{0.99\sqrt{4}}{\sqrt{1 - 0.99^2}} = 14.04$
Decision: Since 14.04 > 2.776, reject H0. There exists a real correlation between population density and death rate.

Correlation quantifies the degree to which two variables are related. Computing a correlation coefficient (r) tells you how much one variable tends to change when the other one does. When r is 0.0, there is no linear relationship.
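The least-squares slope and intercept can be computed directly from the sums, as in the worked solution. The sketch below reuses the Algebra and Visual Basic grades; `least_squares` is a hypothetical helper name.

```python
algebra = [85, 78, 86, 72, 91, 80, 95, 72, 89, 74]   # x
vbasic = [83, 80, 88, 75, 89, 85, 92, 75, 88, 78]    # y


def least_squares(x, y):
    """Return (a, b) for the least-squares line Y-hat = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(p * q for p, q in zip(x, y))
    sum_x2 = sum(p * p for p in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
    a = sum_y / n - b * sum_x / n                                   # a = y-bar - b * x-bar
    return a, b


a, b = least_squares(algebra, vbasic)
predicted = a + b * 80   # predicted Visual Basic grade for an Algebra grade of 80
print(round(a, 2), round(b, 3), round(predicted, 2))
```

Keeping b unrounded (about 0.713) gives a ≈ 24.67 and a prediction of about 81.73; the worked solution rounds b to 0.71 first and gets a = 24.94 and 81.74, so both round to 82.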
• When r is positive, there is a trend that one variable goes up (down) as the other one goes up (down).
• When r is negative, there is a trend that one variable goes up as the other one goes down.

ACADEMIC ENGAGED TIME AND ACADEMIC PERFORMANCE
The course/subject grade of a student depends on the number of quality study hours devoted.
Number of observations: 10

Variable   Description
Study      Number of hours spent in studying per day (x)
Grade      Grade obtained (y)

Sums: n = 10, Σx = 20, Σy = 785, Σxy = 1611, Σx² = 46, Σy² = 61955
$r = \frac{10(1611) - (20)(785)}{\sqrt{[10(46) - 20^2][10(61955) - 785^2]}} = 0.91$
There is a very high correlation between hours spent in studying and the subject/course grade received.

COEFFICIENT OF DETERMINATION: $r^2 = (0.91)^2 = 0.83$
This means that 83% of the variance in grades (y) is explained by study hours (x). The remaining 17% is due to other factors such as IQ, learning environment, teacher, learning experiences, etc.

LINEAR REGRESSION
1. Compute the means. $\bar{x} = \frac{20}{10} = 2, \quad \bar{y} = \frac{785}{10} = 78.5$
2. Solve for b. $b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} = \frac{10(1611) - (20)(785)}{10(46) - 20^2} = 6.83$
3. Solve for a. $a = \bar{y} - b\bar{x} = 78.5 - 6.83(2) = 64.84$
4. Write the equation of the regression line. $\hat{y} = a + bx = 64.84 + 6.83x$

Linear regression finds the best line that predicts y from x. The decision of which variable you call "x" and which you call "y" matters in regression, as you'll get a different best-fit line if you swap the two. The line that best predicts y from x is not the same as the line that predicts x from y (however, both lines have the same value of r²).

What is the predicted grade if the student studies for 4 hours?
$\hat{y} = 64.84 + 6.83(4) = 92.16 \approx 92$
What is the predicted grade of a student who studies for only 30 minutes per day?
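When only summary sums are available, as in the study-hours example, the regression line can still be computed. The sketch below uses the sums n = 10, Σx = 20, Σy = 785, Σxy = 1611 and Σx² = 46 from the worked solution; `line_from_sums` is a hypothetical helper.

```python
def line_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares intercept a and slope b from summary sums only."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * sum_x / n        # a = y-bar - b * x-bar
    return a, b


a, b = line_from_sums(n=10, sum_x=20, sum_y=785, sum_xy=1611, sum_x2=46)
for hours in (4, 0.5):
    print(hours, "hours ->", round(a + b * hours, 2))
```

It gives b ≈ 6.83 and a ≈ 64.83 (64.84 in the worked solution, which rounds b first), and predicted grades of about 92.17 for 4 hours and 68.25 for 30 minutes.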
$\hat{y} = 64.84 + 6.83(0.5) \approx 68.26$

Example: AGE AND HEIGHT
Number of observations: 12
Obviously the height of a child is not constant but increases over time. On the other hand, it is well known that the growth pattern varies between children. In this dataset the focus is on determining the general growth pattern. One way to explore this is by using the average of several children's heights. The response variable is the average height of a group of 161 children in Kalama, an Egyptian village that was the site of a study of nutrition in developing countries. The data were obtained by measuring the heights of all 161 children in the village each month over several years. Time (age) is the explanatory variable.

Variable   Description
Age        Age in months (x)
Height     Average height in centimeters for children at this age (y)

Age (x)   Height (y)    x²        y²         xy
18          76.1        324     5791.21    1369.8
19          77.0        361     5929.00    1463.0
20          78.1        400     6099.61    1562.0
21          78.2        441     6115.24    1642.2
22          78.8        484     6209.44    1733.6
23          79.7        529     6352.09    1833.1
24          79.9        576     6384.01    1917.6
25          81.1        625     6577.21    2027.5
26          81.2        676     6593.44    2111.2
27          81.8        729     6691.24    2208.6
28          82.8        784     6855.84    2318.4
29          83.5        841     6972.25    2421.5
Σ: 282     958.2       6770    76570.58   22608.5

$r = \frac{12(22608.5) - (282)(958.2)}{\sqrt{[12(6770) - 282^2][12(76570.58) - 958.2^2]}} = 0.99$
There is a very high correlation between age and height of children in Kalama.

COEFFICIENT OF DETERMINATION: $r^2 = (0.9944)^2 \approx 0.989$
This means that about 98.9% of the variance in y is explained by x.

LINEAR REGRESSION
1. Compute the means. $\bar{x} = \frac{282}{12} = 23.5, \quad \bar{y} = \frac{958.2}{12} = 79.85$
2. Solve for b. $b = \frac{12(22608.5) - (282)(958.2)}{12(6770) - 282^2} = \frac{1089.6}{1716} = 0.63$
3. Solve for a. $a = \bar{y} - b\bar{x} = 79.85 - 0.63(23.5) = 65.05$
4. Write the equation of the regression line. $\hat{y} = a + bx = 65.05 + 0.63x$

What is the predicted height of a child 32 months old?
$\hat{y} = 65.05 + 0.63(32) = 85.21 \approx 85$ cm
