16 Statistical Inference: Hypothesis Testing

16.1 INTRODUCTION

Hypothesis testing is a very important phase of statistical inference. It is a procedure which enables us to decide, on the basis of information obtained from sample data, whether to accept or reject a statement or an assumption about the value of a population parameter. Such a statement or assumption, which may or may not be true, is called a statistical hypothesis. We accept the hypothesis as being true when it is supported by the sample data; we reject the hypothesis when the sample data fail to support it.

It is important to understand what we mean by the terms reject and accept in hypothesis testing. The rejection of a hypothesis is to declare it false. The acceptance of a hypothesis is to conclude that there is not sufficient evidence to reject it; acceptance does not necessarily mean that the hypothesis is true. The basic concepts associated with hypothesis testing are discussed below.

16.1.1. Null and Alternative Hypotheses. A null hypothesis, generally denoted by the symbol H₀, is any hypothesis which is to be tested for possible rejection under the assumption that it is true. Today the term is used for any hypothesis that is being tested. The word null in the term null hypothesis implies that usually H₀ is the hypothesis of no effect. A null hypothesis should always be precise, such as "the given coin is unbiased", "a drug is ineffective in curing a particular disease", or "the difference between the two teaching methods is null or zero". The hypothesis is usually assigned a numerical value. For example, suppose we think that the average height of students in all colleges is 62". This statement is taken as a hypothesis and is written symbolically as H₀: μ = 62". In other words, we hypothesize that μ = 62".

An alternative hypothesis is any other hypothesis which we accept when the null hypothesis H₀ is rejected. It is customarily denoted by H₁ or H_A. A null hypothesis H₀ is thus tested against an alternative hypothesis H₁. For example, if our null hypothesis is H₀: μ = 62", then our alternative hypothesis may be H₁: μ ≠ 62", H₁: μ > 62" or H₁: μ < 62".

16.1.2. Simple and Composite Hypotheses. A simple hypothesis is one in which all parameters of the distribution are specified. For example, if the heights of college students are normally distributed with σ² = 4, the hypothesis that the mean μ is, say, 62", that is H: μ = 62, is a simple hypothesis, as the mean and variance together specify a normal distribution completely. A simple hypothesis, in general, states that θ = θ₀, where θ₀ is a specified value of a parameter θ (θ may represent μ, p, μ₁ − μ₂, etc.).

A hypothesis which is not simple (i.e. in which not all of the parameters are specified) is called a composite hypothesis. For instance, if we hypothesize that H: μ > 62 (and σ² = 4), or that H: μ = 62 and σ² ≤ 4, the hypothesis becomes a composite hypothesis, because we cannot know the exact distribution of the population in either case. Obviously, the parameters μ > 62" and σ² ≤ 4 take more than one value and no single specified values are being assigned. The general form of a composite hypothesis is θ ≤ θ₀ or θ ≥ θ₀, that is, the parameter θ does not exceed or does not fall short of a specified value θ₀. The concept of simple and composite hypotheses applies to both the null hypothesis and the alternative hypothesis.
Hypotheses may also be classified as exact and inexact. A hypothesis is said to be an exact hypothesis if it selects a unique value for the parameter, such as H: μ = 62 or H: p = 0.5. A hypothesis is called an inexact hypothesis when it indicates more than one possible value for the parameter, such as H: μ ≠ 62 or H: p > 0.5. A simple hypothesis must be an exact one, while an exact hypothesis is not necessarily a simple hypothesis. An inexact hypothesis is a composite hypothesis.

16.1.3. Test-statistic. A sample statistic which provides a basis for testing a null hypothesis is called a test-statistic. Every test-statistic has a probability (sampling) distribution which gives the probability of obtaining a specified value of the test-statistic when the null hypothesis is true. It is important to remember that a test-statistic does not prove the hypothesis to be correct; it merely furnishes evidence against the hypothesis. The sampling distributions of the most commonly used test-statistics are the normal, t, chi-square and F distributions.

16.1.4. Acceptance and Rejection Regions. All possible values which a test-statistic may assume can be divided into two mutually exclusive groups: one group consisting of values which appear to be consistent with the null hypothesis, and the other having values which are unlikely to occur if H₀ is true. The first group is called the acceptance region and the second set of values is known as the rejection region for a test. The rejection region is also called the critical region. The value(s) separating the critical region from the acceptance region is called the critical value(s). The critical value, which can be in the same units as the parameter or in standardized units, is to be decided by the experimenter keeping in view the degree of confidence he or she is willing to have in the null hypothesis.

16.1.5. Type I and Type II Errors. When we perform a hypothesis test, we derive the evidence from the sample in the form of a test-statistic. There is a possibility that the sample evidence may lead us to make a wrong decision. We may reject a null hypothesis H₀ when it is, in fact, true, or we may accept a null hypothesis H₀ when it is actually false. The former is called an error of the first kind, or a Type I error, while the latter is an error of the second kind, or a Type II error. The decisions and the corresponding two types of error may be displayed in tabular form as below:

                                        DECISION
  True situation      Accept H₀                        Reject H₀ (or accept H₁)
  H₀ is true          Correct decision (no error)      Wrong decision (Type I error)
  H₀ is false         Wrong decision (Type II error)   Correct decision (no error)

A legal analogy will help in understanding the difference between Type I and Type II errors. In a court trial, the supposition of law is that the accused (the defendant) is innocent. This supposition of innocence may be regarded as a kind of null hypothesis H₀ that is to be rejected or accepted. After having heard the evidence presented during the trial, the judge arrives at a decision. Suppose the accused is, in fact, innocent (i.e. H₀ is true), but the finding of the judge is guilty: the judge has rejected a true null hypothesis and, in so doing, has made a Type I error. If, on the other hand, the accused is, in fact, guilty (i.e. H₀ is false) and the finding of the judge is innocent, the judge has accepted a false null hypothesis and, by accepting a false hypothesis, has committed a Type II error.
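To see what rejecting a true H₀ means operationally, the short simulation below may help. It is not part of the original text; it is a minimal sketch, assuming Python with NumPy is available, and it reuses the illustrative height figures from above (μ₀ = 62, σ² = 4, so σ = 2, with a sample of 25). It repeatedly draws samples from a population for which H₀ is true by construction and counts how often a simple 5% test wrongly rejects it; the long-run proportion of such wrongful rejections is the probability of a Type I error discussed next.

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, sigma, n = 62.0, 2.0, 25      # H0: mu = 62 is true by construction (sigma^2 = 4)
z_crit = 1.96                      # two-sided 5% critical value
trials, rejections = 100_000, 0
for _ in range(trials):
    sample = rng.normal(mu0, sigma, n)
    z = (sample.mean() - mu0) / (sigma / np.sqrt(n))
    if abs(z) > z_crit:            # rejecting a true H0 is a Type I error
        rejections += 1
print(rejections / trials)         # close to 0.05, the chance of a Type I error
```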
The probability of making a Type I error is conventionally denoted by α (alpha), and that of committing a Type II error is indicated by β (beta). Thus α is the probability of rejecting H₀ when H₀ is true, and β is the probability of accepting H₀ when H₀ is false (i.e. when H₁ is true). In symbols, we may write

α = P(Type I error) = P(reject H₀ | H₀ is true),
β = P(Type II error) = P(accept H₀ | H₀ is false).

Let us consider two distributions: one under the null hypothesis H₀: μ = μ₀ (i.e. the distribution assuming H₀ is true) and the other under the alternative hypothesis H₁: μ = μ₁ (i.e. the distribution assuming H₁ is true).

[Figure: the sampling distributions under H₀: μ = μ₀ and H₁: μ = μ₁, with α shown as the shaded area and β as the dotted area.]

The probabilities α and β are the shaded and dotted areas of the distributions under the null hypothesis and under the alternative hypothesis respectively. When our null hypothesis H₀ is true, any value greater than or equal to C (the critical point) falls in the rejection region, which has probability α (one-sided); that is, α is associated with the extreme values of the μ₀-distribution. The commonly used values of α are 0.05 and 0.01. On the other hand, β is associated with the area under the μ₁-distribution lying in the acceptance region established from the μ₀-distribution. The probability of accepting H₀ when H₁ is true, i.e. β, thus depends both on the null hypothesis H₀ and on the alternative hypothesis H₁. In order to determine β (the probability of a Type II error), we require α (the probability of a Type I error) and the values of both μ₀ and μ₁. When α becomes smaller, β tends to become larger, and when α becomes larger, β tends to become smaller; thus there is an inverse relationship between α and β. We can decrease both α and β by increasing the sample size.

Example 16.1. The proportion of adults living in a small town who are matriculates is estimated to be p = 0.3. To test this hypothesis, a random sample of 15 adults is selected. If the number of matriculates in our sample is anywhere from 2 to 7, we shall accept the null hypothesis that p = 0.3; otherwise we shall conclude that p ≠ 0.3. Evaluate α assuming p = 0.3. Evaluate β for the alternatives p = 0.2 and p = 0.4.

The null and alternative hypotheses are H₀: p = 0.3 and H₁: p ≠ 0.3. Let X denote the number of adults in the sample who are matriculates. Then the test-statistic X has the binomial distribution with p = 0.3 and n = 15. The acceptance region, as given, consists of all values from X = 2 to X = 7, so the critical region is composed of two parts: all values less than 2 and all values greater than 7. Thus the probability of making a Type I error, i.e. α, consists of P(X < 2) and P(X > 7). Hence

α = P(X < 2 when p = 0.3) + P(X > 7 when p = 0.3)
  = Σ_{x=0}^{1} b(x; 15, 0.3) + Σ_{x=8}^{15} b(x; 15, 0.3)
  = Σ_{x=0}^{1} b(x; 15, 0.3) + [1 − Σ_{x=0}^{7} b(x; 15, 0.3)]
  = 0.0353 + [1 − 0.9500]      (from binomial probability tables)
  = 0.0853

To compute β, the probability of a Type II error, we need a specific alternative hypothesis. We are given H₀: p = 0.3 and H₁: p = 0.2. A Type II error results when a false null hypothesis is accepted; that is, a Type II error occurs if a value of the distribution under H₁: p = 0.2 falls in the region X = 2 to X = 7, the acceptance region of the distribution under the null hypothesis H₀: p = 0.3.
Hence

β = P(2 ≤ X ≤ 7 when H₁: p = 0.2)
  = Σ_{x=2}^{7} b(x; 15, 0.2)
  = Σ_{x=0}^{7} b(x; 15, 0.2) − Σ_{x=0}^{1} b(x; 15, 0.2)
  = 0.9958 − 0.1671      (from binomial probability tables)
  = 0.8287

Similarly, when H₁: p = 0.4, we have

β = P(2 ≤ X ≤ 7 when p = 0.4)
  = Σ_{x=2}^{7} b(x; 15, 0.4)
  = Σ_{x=0}^{7} b(x; 15, 0.4) − Σ_{x=0}^{1} b(x; 15, 0.4)
  = 0.7869 − 0.0052
  = 0.7817

16.1.6. The Power of a Test, with respect to a specified alternative hypothesis, is the probability of rejecting a null hypothesis when it is actually false. The power is the complement of β, the probability of committing a Type II error; it is therefore numerically equal to one minus β. Symbolically,

Power = P(reject H₀ | H₀ is false) = 1 − β.

To represent α, β and the power of a test graphically, we show the distributions of the test-statistic under both hypotheses H₀ and H₁ as below:

[Figure: the distribution under H₀ (say μ = 20), with the rejection and acceptance regions separated by the critical point C, and the distribution under H₁ (μ > 20), with the shaded area beyond C representing the power.]

The shaded area in the lower diagram represents the power. This probability corresponds to the rejection region of the distribution under H₀. The power generally increases with an increase in the sample size. A test for which β is small is said to be a powerful test.

A curve giving the probabilities of making Type II errors for various parameter values under alternative hypotheses is called an Operating Characteristic Curve, or simply the OC curve. The power curve may be regarded as the complement of the OC curve, as it shows the probabilities of rejecting the null hypothesis H₀ for various values of the parameter θ.

16.1.7. The Significance Level of a test is the probability used as a standard for rejecting a null hypothesis H₀ when H₀ is assumed to be true. This probability is equal to some small pre-assigned value, conventionally denoted by α. The value α is also known as the size of the critical region. It is noteworthy that the significance level and the probability of a Type I error are equivalent. The most frequently used values of α, the significance level, are 0.05 and 0.01, i.e. 5 percent and 1 percent, but occasionally 0.10 or 0.001 is used. By α = 5%, we mean that there are about 5 chances in 100 of incorrectly rejecting a true null hypothesis. To put it another way, we say that we are 95% confident of making the correct decision.

16.1.8. Test of Significance. A test of significance is a rule or procedure by which sample results are used to decide whether to accept or reject a null hypothesis. Such a procedure is usually based on a test-statistic and the sampling distribution of that statistic under H₀. A value of the statistic is said to be statistically significant when the probability of its occurrence under H₀ is equal to or less than the significance level α, that is, when the value falls in the rejection region; H₀ in this case is rejected. If, on the other hand, the value falls in the acceptance region, it is said to be statistically insignificant, and H₀ may be accepted. There are two desirable qualities for a test of significance: first, when the null hypothesis is actually true, it must have a low probability of rejecting H₀; secondly, when H₀ is false, it must have a high probability of rejecting H₀. It is to be noted that the word significant is used here in a special sense.
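Before turning to one- and two-tailed tests, the tail probabilities of Example 16.1 and the corresponding powers 1 − β of Section 16.1.6 can be re-traced numerically. The sketch below is not from the text; it assumes Python with SciPy is available and simply replaces the printed binomial tables with cumulative-probability calls.

```python
from scipy.stats import binom

n = 15
# alpha = P(X <= 1) + P(X > 7) when H0: p = 0.3 is true
alpha = binom.cdf(1, n, 0.3) + (1 - binom.cdf(7, n, 0.3))
print(round(alpha, 4))                                    # 0.0853

for p1 in (0.2, 0.4):                                     # the alternatives considered above
    beta = binom.cdf(7, n, p1) - binom.cdf(1, n, p1)      # P(2 <= X <= 7 | p = p1)
    print(p1, round(beta, 4), round(1 - beta, 4))         # beta and power = 1 - beta
```

Repeating the loop over a fine grid of p values would trace out the OC curve (β against p) and its complement, the power curve.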
16.1.9. One-tailed and Two-tailed Tests. A test for which the entire rejection region is located in only one of the two tails, either the right tail or the left tail, of the sampling distribution of the test-statistic is called a one-tailed test or one-sided test. For example, if Z is the test-statistic, the rejection region consists of all z-values greater than +z_α (right-tailed) or of all z-values less than −z_α (left-tailed), where α is the size of the critical region. A one-tailed test is used when the alternative hypothesis H₁ is formulated in one of the following forms:

H₁: θ > θ₀  or  H₁: θ < θ₀.

A test for which the rejection region is divided between the two tails of the sampling distribution is called a two-tailed test or two-sided test; it is used when the alternative hypothesis is of the form H₁: θ ≠ θ₀, the rejection region consisting of all z-values greater than +z_{α/2} or less than −z_{α/2}.

Example 16.2. For testing H₀: μ = 140 lb against H₁: μ ≠ 140 lb on the basis of a random sample of n = 36 observations from a normal population with σ = 15 lb: (a) at the 5% level of significance the acceptance region works out to 135.1 lb ≤ x̄ ≤ 144.9 lb, H₀ being rejected if x̄ < 135.1 lb or x̄ > 144.9 lb; (b) the probability of a Type II error when, in fact, μ = 150 lb is found as follows.

(b) A Type II error can be committed only by accepting a false H₀. The hypothesis H₀: μ = 140 lb will be false if μ takes a value greater than 140 lb. We are given H₁: μ = 150 lb, so that H₀: μ = 140 lb is false. Therefore the probability of accepting the false H₀: μ = 140 lb when H₁: μ = 150 lb is true, i.e. the probability of a Type II error, is the area of the distribution under the alternative hypothesis H₁: μ = 150 lb that falls in the acceptance region. To compute this area, we use the distribution under H₁: μ = 150 lb.

Now, at x̄ = 144.9, we find z = (144.9 − 150)/(15/√36) = −2.04, and at x̄ = 135.1, we find z = (135.1 − 150)/(15/√36) = −5.96. Thus

β = area between z = −5.96 and z = −2.04, i.e. the area of the distribution under H₁: μ = 150 falling in the acceptance region of the test under H₀
  = 0.0207.

This is the probability of accepting the null hypothesis H₀: μ = 140 lb when, in fact, the alternative hypothesis H₁: μ = 150 lb is true.

Example 16.3. A random sample of size 4 is drawn from a normal population with known variance 15. A one-tailed test of the form H₀: μ ≤ 30 against H₁: μ > 30 at the 5% level of significance is performed. Calculate the probabilities of a Type II error (β) for the values μ = 31, 32, 34 and 36 in the alternative hypothesis. Also calculate the powers of the test and hence sketch the power curve for this test.

We are given H₀: μ ≤ 30 and H₁: μ > 30; α = 0.05, n = 4 and σ² = 15. Under H₀: μ ≤ 30, the distribution under consideration is N(30, 15/4). To find the values of β, the probability of a Type II error, we first need to calculate the critical value. For a one-tailed (upper-tailed) test with α = 0.05, the critical value is given by

C = μ₀ + z_α σ/√n = 30 + (1.645)√(15/4) = 30 + (1.645)√3.75 = 30 + (1.645)(1.94) = 33.19.

Since four values of μ in the alternative hypothesis are specified, we associate the variables X̄₁, X̄₂, X̄₃ and X̄₄ with each of the four alternative (H₁) distributions. Using the alternative (H₁) distribution with μ = 31, we calculate the value of β (say β_{μ=31}) as

β_{μ=31} = P(Type II error | μ = 31) = P(X̄₁ < 33.19)
        = P(Z < (33.19 − 31)/1.94) = P(Z < 1.13) = 0.8708.

Again, using the H₁-distribution with μ = 32,

β_{μ=32} = P(Type II error | μ = 32) = P(X̄₂ < 33.19)
        = P(Z < (33.19 − 32)/1.94) = P(Z < 0.61) = 0.7291.

Similarly,

β_{μ=34} = P(Type II error | μ = 34) = P(X̄₃ < 33.19)
        = P(Z < (33.19 − 34)/1.94) = P(Z < −0.42) = 0.3372,

β_{μ=36} = P(Type II error | μ = 36) = P(X̄₄ < 33.19)
        = P(Z < (33.19 − 36)/1.94) = P(Z < −1.45) = 0.0735.

The power of the test for μ = μ₁, say P_w(μ₁), is given by P_w(μ₁) = 1 − β_{μ=μ₁}. Thus the required powers are

P_w(31) = 1 − 0.8708 = 0.1292,
P_w(32) = 1 − 0.7291 = 0.2709,
P_w(34) = 1 − 0.3372 = 0.6628,
P_w(36) = 1 − 0.0735 = 0.9265.

[Figure: the power curve for this test, plotting the power against the value of μ.]
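The β values and powers of Example 16.3 can likewise be computed directly from the normal distribution. The following sketch, again an illustration assuming Python with SciPy rather than part of the text, reproduces the critical value C ≈ 33.19 and the four (β, power) pairs; small differences from the figures above are due only to rounding in the hand calculation.

```python
from math import sqrt
from scipy.stats import norm

sigma_xbar = sqrt(15) / sqrt(4)                  # sigma / sqrt(n), about 1.94
c = 30 + norm.ppf(0.95) * sigma_xbar             # critical value, about 33.19
for mu1 in (31, 32, 34, 36):
    beta = norm.cdf((c - mu1) / sigma_xbar)      # P(Xbar < C | mu = mu1)
    print(mu1, round(beta, 4), round(1 - beta, 4))   # beta and power for each alternative
```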
16.1.10. Sample Size when α and β are specified. We may determine the sample size necessary for discriminating between two hypotheses when the two types of error are specified. Let us state our null hypothesis as H₀: μ = μ₀ and the alternative hypothesis as H₁: μ = μ₁ (μ₁ > μ₀). There are then two sampling distributions, from the two normal populations N(μ₀, σ₀) and N(μ₁, σ₁).

[Figure: the sampling distributions under H₀: μ = μ₀ and H₁: μ = μ₁, with the critical point x̄ separating the acceptance and rejection regions.]

The rejection region under the μ₀-distribution is the area α (one-tailed test) and it lies to the right of the point x̄. The Type II error under the μ₁-distribution is represented by the area β and it lies to the left of the point x̄, i.e. it is associated with the area under the μ₁-distribution in the acceptance region established from the μ₀-distribution. We note that every value to the right of x̄, i.e. falling in the critical region, calls for the rejection of the null hypothesis H₀, and every value to the left of x̄, i.e. falling in the acceptance region, calls for the acceptance of the null hypothesis H₀.

When H₀ is true, the upper limit of the acceptance region (the point x̄) is determined by the expression

(x̄ − μ₀)/(σ₀/√n) = z_α,   or   x̄ = μ₀ + z_α σ₀/√n,

where z_α is the normal deviate corresponding to the lower limit of the critical region. Under the alternative hypothesis H₁, the area corresponding to the specified magnitude of β is the acceptance region established under H₀. Considering μ₁, we again determine the upper limit of the acceptance region by

(x̄ − μ₁)/(σ₁/√n) = −z_β   (negative sign, as the critical point x̄ lies to the left of μ₁),

or   x̄ = μ₁ − z_β σ₁/√n,

where z_β is the normal deviate corresponding to the upper limit of the area representing the Type II error. Hence the upper limit of the acceptance region can be represented by two relations, viz.

x̄ = μ₀ + z_α σ₀/√n   and   x̄ = μ₁ − z_β σ₁/√n.

Equating these two values of x̄, we get

μ₀ + z_α σ₀/√n = μ₁ − z_β σ₁/√n.

Solving for n, we obtain

n = (σ₀ z_α + σ₁ z_β)² / (μ₁ − μ₀)²,

or

n = σ²(z_α + z_β)² / (μ₁ − μ₀)²,   when σ₀ = σ₁ = σ.

The required sample size for a two-tailed test can be determined in a similar way.

Example 16.4. A firm wishes to test the hypothesis that the average monthly wage of its shop employees is Rs. 180 with a standard deviation of Rs. 4.50. They wish to run a risk of only 0.05 of rejecting the null hypothesis when it is true, and a risk of 0.05 of accepting the null hypothesis if the average is as high as Rs. 182 or as low as Rs. 178. How large a sample should be taken? (P.U., M.Sc. 1986)

A risk of only 0.05 of rejecting the null hypothesis when it is true implies that the probability of a Type I error is α = 0.05, and a risk of 0.05 of accepting the null hypothesis if the average is as high as Rs. 182 or as low as Rs. 178 implies that the probability of a Type II error is β = 0.05.

[Figure: the normal distributions of the average monthly wage under H₁: μ = 178, H₀: μ = 180 and H₁: μ = 182, with the acceptance and rejection regions at the specified levels of α and β.]

The value of Z for a two-tailed test at α = 0.05 under the H₀-distribution, from the table of areas, is z_α = 1.96, and the value of Z at β = 0.05 under the μ₁-distribution, from the table of areas, is z_β = 1.645. Substituting these values in the formula

n = (z_α + z_β)² σ² / (μ₁ − μ₀)²,

we get

n = (1.96 + 1.645)² (4.5)² / (182 − 180)² = (3.605)² (4.5)² / 4 = 65.79.

Hence the sample size must be at least 66, the next higher integer.
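The sample-size formula just derived is simple to evaluate. The sketch below is an illustration only, assuming Python with SciPy (the variable names are arbitrary); it applies the formula to the figures of Example 16.4 and returns approximately 65.8, so that a sample of 66 is taken.

```python
from scipy.stats import norm

sigma, mu0, mu1 = 4.5, 180.0, 182.0
z_alpha = norm.ppf(1 - 0.05 / 2)   # 1.96, the two-tailed value for alpha = 0.05
z_beta = norm.ppf(1 - 0.05)        # 1.645 for beta = 0.05
n = (sigma * (z_alpha + z_beta) / (mu1 - mu0)) ** 2
print(n)                           # about 65.8, so at least n = 66 is needed
```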
16.1.11. Formulation of Hypotheses. To formulate the null (H₀) and alternative (H₁) hypotheses, which are statements about parameters, not statistics, is perhaps the most difficult task. The hypotheses must be formulated in such a way that when one is true, the other is false, i.e. H₀ and H₁ are opposites. The basic rule in formulating hypotheses is to make H₁ the hypothesis that the experimenter thinks is true, or the hypothesis that he or she wants to establish as true.

Sometimes hypotheses are formulated from an experimenter's attitude toward a claim. If the experimenter wishes to establish a certain claim with substantive support from sample information, then the claim is taken as the alternative hypothesis H₁ and its negation becomes the null hypothesis H₀. If the experimenter wants to disprove or refute the claim, then the claim is made H₀. If the problem simply says to test a claim, it is interpreted to mean that the claim is to be disproved (and it is made H₀). In situations where we want to test for a change in the value of a parameter, the old or accepted value of the parameter is used for H₀ and H₁ includes the new value.

The null hypothesis H₀ will always contain some form of an equality sign, such as =, ≤ or ≥. If H₀ contains the exact equality sign =, then H₁ will have the not-equal sign ≠. If H₀ is stated using the less-than-or-equal-to sign ≤, then H₁ will contain the greater-than sign >, and if H₀ contains the sign ≥, then H₁ will have the sign <. Examples are

H₀: μ = 62 (say),   H₁: μ ≠ 62
H₀: μ ≤ 62,         H₁: μ > 62
H₀: μ ≥ 62,         H₁: μ < 62

It is important to note that by rejecting the null hypothesis H₀: μ ≤ 62 (and accepting the alternative H₁: μ > 62), we are automatically rejecting all values of μ that are less than 62, because the null and the alternative hypotheses, being opposites, cover all possible values of the parameter μ. In general, if θ₀ is a specified value of a parameter θ, then the null and alternative hypotheses in the case of a two-tailed test take the form

H₀: θ = θ₀   and   H₁: θ ≠ θ₀,

and in the case of a one-tailed test are stated in either of the following two forms:

(i) H₀: θ ≤ θ₀ and H₁: θ > θ₀,
(ii) H₀: θ ≥ θ₀ and H₁: θ < θ₀.

The sampling distribution of the estimator in the case of an inexact null hypothesis (H₀: θ ≤ θ₀ or H₀: θ ≥ θ₀) is not defined, and hence we cannot set up the acceptance and rejection regions. In such a case, we treat the null hypothesis as if it were an exact one, i.e. H₀: θ = θ₀.

16.1.12. General Procedure for Testing Hypotheses. The procedure for testing a hypothesis about a population parameter involves the following six steps:

(i) State your problem and formulate an appropriate null hypothesis H₀ together with an alternative hypothesis H₁, which is to be accepted when H₀ is rejected.

(ii) Decide upon a significance level α for the test, which is the probability of rejecting the null hypothesis if it is true.

(iii) Choose an appropriate test-statistic, and determine and sketch the sampling distribution of the test-statistic, assuming H₀ is true.

(iv) Determine the rejection or critical region in such a way that the probability of rejecting the null hypothesis H₀, if it is true, is equal to the significance level α. The location of the critical region depends upon the form of H₁. The significance level separates the acceptance region from the rejection region.

(v) Compute the value of the test-statistic from the sample data in order to decide whether to accept or reject the null hypothesis H₀.
(vi) Formulate the decision rule as below:
(a) Reject the null hypothesis H₀ if the computed value of the test-statistic falls in the rejection region, and conclude that H₁ is true.
(b) Accept the null hypothesis H₀ otherwise.

When a hypothesis is rejected, we can give a measure of the strength of the rejection by giving the P-value, the smallest significance level at which the null hypothesis is rejected.

16.2 TESTS BASED ON THE NORMAL DISTRIBUTION

Suppose we wish to test the hypothesis that a parameter θ of a normal distribution has some specified value θ₀. We draw a random sample of size n from the population and calculate θ̂ as an estimate of θ. It has been shown in the previous chapter that the sampling distribution of θ̂ is normal, or approximately normal, with mean θ and standard deviation σ_θ̂. If the null hypothesis θ = θ₀ is true, then the variable

Z = (θ̂ − θ₀)/σ_θ̂

is N(0, 1) and is used as the test-statistic for testing the hypothesis H₀: θ = θ₀. If the significance level is α, then the critical region consists of all values of Z which are (i) less than −z_{α/2} or greater than z_{α/2} in the case of a two-tailed test, (ii) less than −z_α or greater than z_α in the case of a one-tailed test. Critical values of Z for the most frequently used values of α are given below:

  Significance level (α)        0.10       0.05       0.01
  Two-tailed test (±z_{α/2})    ±1.645     ±1.96      ±2.58
  One-tailed test (±z_α)        ±1.28      ±1.645     ±2.33

The decision rule is then formulated as below: reject the null hypothesis H₀ when the value of the test-statistic Z exceeds these critical values, i.e. falls in the critical region; accept the null hypothesis H₀ otherwise.

In this chapter, we deal with the following tests of hypotheses:

(i) To test whether the mean μ of a normal population is equal to a specified value μ₀, when the population standard deviation σ is known. Symbolically, H₀: μ = μ₀, when σ is known.

(ii) To test whether the mean μ of a normal population is equal to a specified value μ₀, when the population standard deviation is not known and the sample size is large.

(iii) To test whether the mean μ of a non-normal population is equal to a specified value μ₀, when the sample size is large.

(iv) To test whether the difference between the means of two normal distributions (μ₁ − μ₂) is equal to a specified value Δ₀ (Δ₀ may be zero), when σ₁ and σ₂ are known, i.e. H₀: μ₁ − μ₂ = Δ₀ when σ₁ and σ₂ are known.

(v) To test whether the difference between the means of two normal distributions is equal to a specified value Δ₀, when σ₁ and σ₂ are not known and the sample sizes are large.

(vi) To test whether the difference between the means of two non-normal distributions is equal to a specified value Δ₀, when the sample sizes are large.

(vii) Other tests based on the normal distribution for large sample sizes, such as tests of standard deviations, proportions, etc.
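The critical values tabulated above are simply quantiles of the standard normal distribution and can be recovered numerically. The sketch below assumes Python with SciPy (the text itself relies on printed area tables); note that SciPy returns 2.576 and 2.326 where the table rounds to 2.58 and 2.33.

```python
from scipy.stats import norm

for alpha in (0.10, 0.05, 0.01):
    z_half = norm.ppf(1 - alpha / 2)   # critical value for a two-tailed test
    z_full = norm.ppf(1 - alpha)       # critical value for a one-tailed test
    print(alpha, round(z_half, 3), round(z_full, 3))
# 0.10 -> 1.645 and 1.282;  0.05 -> 1.960 and 1.645;  0.01 -> 2.576 and 2.326
```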
16.2.1. Testing a Hypothesis about the Mean of a Normal Population when σ is known. Suppose a random sample of size n is drawn from a normal population with mean μ having a specified value μ₀ and a known standard deviation σ. The sample mean is given by x̄. We wish to determine whether the sample accords with the hypothesis that the population mean μ has the specified value μ₀. For this purpose, we employ the normal-distribution test Z = (X̄ − μ₀)/(σ/√n), and the procedure is outlined below:

(i) Formulate the null and alternative hypotheses about μ. The three possible forms are
(a) H₀: μ = μ₀ and H₁: μ ≠ μ₀,
(b) H₀: μ ≤ μ₀ and H₁: μ > μ₀,
(c) H₀: μ ≥ μ₀ and H₁: μ < μ₀.

(ii) Decide on a significance level α, e.g. take α = 0.05 or 0.01.

(iii) The test-statistic in this case is Z = (X̄ − μ₀)/(σ/√n). Under the null hypothesis, Z has a standard normal distribution.

(iv) Determine the rejection region, which depends on the alternative hypothesis, from the table of areas under the normal curve by finding areas exactly equal to α. The rejection regions for H₀ corresponding to the different alternative hypotheses are given below:

  When the alternative hypothesis is      the rejection region is
  (a) H₁: μ ≠ μ₀ (two-sided)              z < −z_{α/2} or z > z_{α/2}
  (b) H₁: μ > μ₀ (one-sided)              z > z_α
  (c) H₁: μ < μ₀ (one-sided)              z < −z_α

For example, when H₁ is μ ≠ μ₀ and α = 0.05, the two tails together have area 0.05 and H₀ is rejected if z < −1.96 or z > 1.96, the critical values of Z being −z₀.₀₂₅ = −1.96 and z₀.₀₂₅ = 1.96. In this text, where the alternative hypothesis has not been stated explicitly or implicitly, a two-sided alternative hypothesis is assumed.

(v) Calculate the value of Z from the sample data.

(vi) Decide as below: reject H₀ when the calculated value of Z falls in the rejection region; otherwise, accept it. In case of rejection, the decision is that μ differs from μ₀.

Example 16.5. A random sample of n = 25 values gives x̄ = 83. Can this sample be regarded as drawn from a normal population with mean μ = 80 and σ = 7?

(i) We formulate our null and alternative hypotheses as H₀: μ = 80 and H₁: μ ≠ 80 (two-sided).
(ii) We set the significance level at α = 0.05.
(iii) The test-statistic to be used is Z = (X̄ − μ₀)/(σ/√n), which under the null hypothesis is a standard normal variable.
(iv) The critical region for α = 0.05 is |Z| ≥ 1.96. The hypothesis will be rejected if, for the sample, |z| ≥ 1.96.
(v) We calculate the value of Z from the sample data as
z = (83 − 80)/(7/√25) = (3 × 5)/7 = 2.14.
(vi) Conclusion. Since our calculated value z = 2.14 falls in the critical region, we reject our null hypothesis H₀: μ = 80 and accept H₁: μ ≠ 80. We may conclude that the sample with x̄ = 83 cannot be regarded as drawn from the population with μ = 80.

Example 16.6. Test the hypothesis that the mean of a normal population with known variance 70 is 31, if a sample of size 13 gave x̄ = 34. Let the alternative hypothesis be H₁: μ > 31, and let α = 0.10.

(i) We formulate our hypotheses as H₀: μ = 31 and H₁: μ > 31 (one-sided).
(ii) We are given the significance level α = 0.10.
(iii) The test-statistic to be used is Z = (X̄ − μ₀)/(σ/√n), which under the null hypothesis has a standard normal distribution.
(iv) The critical region for α = 0.10 is Z > 1.28.
(v) We calculate the value of Z from the sample data as
z = (34 − 31)/(√70/√13) = (3 × 3.606)/8.367 = 1.29.
(vi) Conclusion. Since our calculated value z = 1.29 falls in the critical region, we reject our null hypothesis H₀: μ = 31. We may conclude that there is evidence at the 10% level that the sample does not come from the given population.
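The six steps above can be collected into a small helper routine. The function z_test below is a hypothetical illustration assuming Python with SciPy, not part of the text; applied to the data of Examples 16.5 and 16.6 it reproduces z ≈ 2.14 and z ≈ 1.29 and the corresponding rejections.

```python
from math import sqrt
from scipy.stats import norm

def z_test(xbar, mu0, sigma, n, alpha, two_sided=True):
    """Return (z, critical value, reject?) for the normal test of H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    crit = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    reject = (abs(z) > crit) if two_sided else (z > crit)   # one-sided form here assumes H1: mu > mu0
    return round(z, 2), round(crit, 2), reject

print(z_test(83, 80, 7, 25, 0.05))                 # Example 16.5: (2.14, 1.96, True)
print(z_test(34, 31, sqrt(70), 13, 0.10, False))   # Example 16.6: (1.29, 1.28, True)
```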
16.2.2. Testing a Hypothesis about the Mean of a Normal Population when σ is unknown and n > 30. When the population standard deviation σ is not known, we use the sample standard deviation S as an estimate of the true but unknown population standard deviation. For a large sample size (n > 30), the central limit theorem allows us to assume that the sampling distribution of X̄ is approximately normal with mean μ and standard deviation σ/√n. In other words, when σ is unknown but n is large, we replace σ/√n with S/√n, and the variable

Z = (X̄ − μ₀)/(S/√n),

which is approximately N(0, 1), is then used as the test-statistic to test the hypothesis H₀: μ = μ₀. The rest of the procedure is the same.

Example 16.7. The marks obtained by students at a large number of colleges are known to be normally distributed with a mean of 25. A random sample of 36 students showed an average of 27 marks with a standard deviation of 5. What conclusion should be drawn?

(i) Since it is given that the average number of marks obtained by students at a large number of colleges is 25, we have H₀: μ = 25 and H₁: μ ≠ 25 (two-tailed).
(ii) Let us specify the significance level as α = 0.05.
(iii) Because σ is not known and the sample size n > 30, we use S in place of σ. Thus the test-statistic is Z = (X̄ − μ₀)/(S/√n), which has an approximate standard normal distribution under the given null hypothesis.
(iv) The rejection region is |Z| ≥ 1.96.
(v) Computing the value of Z, we find
z = (27 − 25)/(5/√36) = (2 × 6)/5 = 2.40.
(vi) Conclusion. Since the calculated value z = 2.40 falls in the rejection region, we reject H₀: μ = 25 and accept H₁: μ ≠ 25. On the basis of the evidence, we may conclude that this sample of students appears to be superior.

16.2.3. Testing a Hypothesis about the Mean of a Non-Normal Population when the sample size is large. The central limit theorem tells us that for large sample sizes the sampling distribution of X̄ is approximately normal even though the population sampled is non-normal. That is, the random variable

Z = (X̄ − μ₀)/(σ/√n)   or   Z = (X̄ − μ₀)/(S/√n),

according as σ is known or not known, is approximately standard normal and is used as the test-statistic to test the hypothesis H₀: μ = μ₀. The rest of the procedure is the same.

Example 16.8. A random sample of 100 workers with children in day care shows a mean day-care cost of Rs. 2,600 and a standard deviation of Rs. 500. Verify the department's claim that the mean exceeds Rs. 2,500 at the 0.05 level with this information.

We make H₁ what the department claims, namely that the mean exceeds Rs. 2,500, and take the negation of the claim as H₀. Thus we have
(i) H₀: μ ≤ 2,500 and H₁: μ > 2,500 (exceeds 2,500).
(ii) We are given the significance level α = 0.05.
(iii) The test-statistic under H₀ is Z = (X̄ − μ₀)/(S/√n), which is approximately normal, as n = 100 is large enough to make use of the central limit theorem.
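For the large-sample tests of Sections 16.2.2 and 16.2.3 the arithmetic is the same with S in place of σ. The sketch below is an illustration in Python, not part of the text; it recomputes the statistic of Example 16.7 and, purely as an illustration since the solution of Example 16.8 is cut short above, the corresponding statistic for that example.

```python
from math import sqrt

# Example 16.7: H0: mu = 25 vs H1: mu != 25, n = 36, xbar = 27, S = 5
z1 = (27 - 25) / (5 / sqrt(36))
print(round(z1, 2))      # 2.40, beyond +/-1.96, so H0 is rejected

# Example 16.8 (completed here only as an illustration):
# H0: mu <= 2500 vs H1: mu > 2500, n = 100, xbar = 2600, S = 500
z2 = (2600 - 2500) / (500 / sqrt(100))
print(round(z2, 2))      # 2.00, beyond the one-tailed value 1.645 at alpha = 0.05
```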
