Content: Economic Analysis and Forecasting

INTRODUCTORY STATISTICS

PREM S. MANN
EASTERN CONNECTICUT STATE UNIVERSITY

WILEY

Vice President and Director: Laurie Rosatone
Sponsoring Editor: Jennifer Brady
Acquisitions Editor: Joanna Dingle
Project Editor: Ellen Keohane
Market Solutions Assistant: Carolyn DeDeo
Senior Content Manager: Valerie Zaborski
Senior Production Editor: Ken Santor
Production Management Services: Jackie Henry, Aptara, Inc.
Senior Photo Editor: Billy Ray
Marketing Manager: John LaVacca III
Senior Designer: Wendy Lai
Senior Product Designer: David Dietz
Product Design Manager: Thomas Kulesa
Cover Designer: Wendy Lai
Cover photo: © Shai/Getty Images, Inc.

This book was set in 10/12 Times LT Std-Roman by Aptara, Inc. and printed and bound by R.R. Donnelley/Kendallville. The cover was printed by R.R. Donnelley/Kendallville.

This book is printed on acid free paper.

Copyright © 2016, 2013, 2010, 2007, 2004, 2001, John Wiley & Sons, Inc. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, website www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, (201) 748-6011, fax (201) 748-6008, website http://www.wiley.com/go/permissions.

Evaluation copies are provided to qualified academics and professionals for review purposes only, for use in their courses during the next academic year. These copies are licensed and may not be sold or transferred to a third party. Upon completion of the review period, please return the evaluation copy to Wiley. Return instructions and a free of charge return shipping label are available at www.wiley.com/go/returnlabel.
Outside of the United States, please contact your local representative.

The inside back cover will contain printing identification and country of origin if omitted from this page. In addition, if the ISBN on the back cover differs from the ISBN on this page, the one on the back cover is correct.

ISBN 978-1-119-14832-6 (Binder Ready Version)
ISBN 978-1-119-25544-9 (Evaluation/Desk Copy Version)
ISBN 978-1-119-23456-2 (ISB Version)

Printed in the United States of America

To the memory of my parents

Preface

Introductory Statistics is written for a one- or two-semester first course in applied statistics. This book is intended for students who do not have a strong background in mathematics. The only prerequisite for this text is knowledge of elementary algebra.

Today, college students from almost all fields of study are required to take at least one course in statistics. Consequently, the study of statistical methods has taken on a prominent role in the education of students from a variety of backgrounds and academic pursuits. From the first edition, the goal of Introductory Statistics has been to make the subject of statistics interesting and accessible to a wide and varied audience. Three major elements of this text support this goal:

1. Realistic content of its examples and exercises, drawing from a comprehensive range of applications from all facets of life
2. Clarity and brevity of presentation
3. Soundness of pedagogical approach

These elements are developed through the interplay of a variety of significant text features.

The feedback received from the users of the eighth edition (and earlier editions) of Introductory Statistics has been very supportive and encouraging. Positive experiences reported by instructors and students have served as evidence that this text offers an interesting and accessible approach to statistics, the author's goal from the very first edition.
The author has pursued the same goal through the refinements and updates in this ninth edition, so that Introductory Statistics can continue to provide a successful experience in statistics to a growing number of students and instructors.

New to the Ninth Edition

The following are some of the changes made in the ninth edition:

• New for the ninth edition are the videos that are accessible via the WileyPLUS course associated with this text. These videos provide step-by-step solutions to selected examples in the book.
• A large number of the examples and exercises are new or revised, providing contemporary and varied ways for students to practice statistical concepts.
• Coverage of sample surveys, sampling techniques, and design of experiments has been moved from Appendix A to Chapter 1.
• In Chapter 3, the discussions of weighted mean, trimmed mean, and coefficient of variation have been moved from the exercises to the main part of the chapter.
• The majority of the case studies are new or revised, drawing on current uses of statistics in areas of student interest.
• New data are integrated throughout, reinforcing the vibrancy of statistics and the relevance of statistics to student lives right now.
• The Technology Instructions sections have been updated to support the use of the latest versions of TI-84 Color/TI-84, Minitab, and Excel.
• Many of the Technology Assignments at the end of each chapter are either new or have been updated.
• The data sets posted on the book companion Web site and WileyPLUS have been updated.
• Most of the Uses and Misuses sections at the end of each chapter have been updated or replaced.
• Many of the Mini-Projects, which are now located on the book companion Web site, are either new or have been updated.
• Many of the Decide for Yourself sections, also located on the book companion Web site, are either new or have been updated.

Hallmark Features of This Text

Clear and Concise Exposition: The explanation of statistical methods and concepts is clear and concise. Moreover, the style is user-friendly and easy to understand. In chapter introductions and in transitions from section to section, new ideas are related to those discussed earlier.

Thorough Examples: The text contains a wealth of examples. The examples are usually presented in a format showing a problem and its solution. They are well sequenced and thorough, displaying all facets of concepts. Furthermore, the examples capture students' interest because they cover a wide variety of relevant topics. They are based on situations that practicing statisticians encounter every day. Finally, a large number of examples are based on real data taken from sources such as books, government and private data sources and reports, magazines, newspapers, and professional journals.

Step-by-Step Solutions: A clear, concise solution follows each problem presented in an example. When the solution to an example involves many steps, it is presented in a step-by-step format. For instance, examples related to tests of hypothesis contain five steps that are consistently used to solve such examples in all chapters.
Thus, procedures are presented in the concrete settings of applications rather than as isolated abstractions. Frequently, solutions contain highlighted remarks that recall and reinforce ideas critical to the solution of the problem. Such remarks add to the clarity of presentation.

Titles for Examples: Each example based on an application of concepts now contains a title that describes to what area, field, or concept the example relates.

Margin Notes for Examples: A margin note appears beside each example that briefly describes what is being done in that example. Students can use these margin notes to assist them as they read through sections and to quickly locate appropriate model problems as they work through exercises.

Frequent Use of Diagrams: Concepts can often be made more understandable by describing them visually with the help of diagrams. This text uses diagrams frequently to help students understand concepts and solve problems. For example, tree diagrams are used a few times in Chapters 4 and 5 to assist in explaining probability concepts and in computing probabilities. Similarly, solutions to all examples about tests of hypothesis contain diagrams showing rejection regions, nonrejection regions, and critical values.

Highlighting: Definitions of important terms, formulas, and key concepts are enclosed in colored boxes so that students can easily locate them.

Cautions: Certain items need special attention. These may deal with potential trouble spots that commonly cause errors, or they may deal with ideas that students often overlook. Special emphasis is placed on such items through the headings Remember, An Observation, or Warning. An icon is used to identify such items.

Real World Case Studies: These case studies, which appear in most of the chapters, provide additional illustrations of the applications of statistics in research and statistical analysis. Most of these case studies are based on articles or data published in journals, magazines, newspapers, or Web sites.
Almost all case studies are based on real data.

Variety of Exercises: The text contains a variety of exercises, including technology assignments. Moreover, a large number of these exercises contain several parts. Exercise sets appearing at the end of each section (or sometimes at the end of two or three sections) include problems on the topics of that section. These exercises are divided into two parts: Concepts and Procedures that emphasize key ideas and techniques, and Applications that use these ideas and techniques in concrete settings. Supplementary exercises appear at the end of each chapter and contain exercises on all sections and topics discussed in that chapter. A large number of these exercises are based on real data taken from varied data sources such as books, government and private data sources and reports, magazines, newspapers, and professional journals. Not only do the exercises given in the text provide practice for students, but the real data contained in the exercises provide interesting information and insight into economic, political, social, psychological, and other aspects of life. The exercise sets also contain many problems that demand critical thinking skills. The answers to selected odd-numbered exercises appear in the Answers section at the back of the book. Optional exercises are indicated by an asterisk (*).

Advanced Exercises: All chapters have a set of exercises that are of greater difficulty. Such exercises appear under the heading Advanced Exercises after the Supplementary Exercises.

Uses and Misuses: This feature toward the end of each chapter (before the Glossary) points out common misconceptions and pitfalls students will encounter in their study of statistics and in everyday life. Subjects highlighted include such diverse topics as do not feed the animal

Decide for Yourself: This feature is accessible online at www.wiley.com/college/mann.
Each Decide for Yourself discusses a real-world problem and raises questions that readers can think about and answer.

Glossary: Each chapter has a glossary that lists the key terms introduced in that chapter, along with a brief explanation of each term.

Self-Review Tests: Each chapter contains a Self-Review Test, which appears immediately after the Supplementary and Advanced Exercises. These problems can help students test their grasp of the concepts and skills presented in respective chapters and monitor their understanding of statistical methods. The problems marked by an asterisk (*) in the Self-Review Tests are optional. The answers to almost all problems of the Self-Review Tests appear in the Answers section.

Technology Usage: At the end of each chapter is a section covering uses of three major technologies of statistics and probability: the TI-84 Color/TI-84, Minitab, and Excel. For each technology, students are guided through performing statistical analyses in a step-by-step fashion, showing them how to enter, revise, format, and save data in a spreadsheet, workbook, or named and unnamed lists, depending on the technology used. Illustrations and screen shots demonstrate the use of these technologies. Additional detailed technology instruction is provided in the technology manuals that are online at www.wiley.com/college/mann.

Technology Assignments: Each chapter contains a few technology assignments that appear at the end of the chapter. These assignments can be completed using any of the statistical software.

Mini-projects: Associated with each chapter of the text are Mini-projects posted online at www.wiley.com/college/mann. These Mini-projects are either very comprehensive exercises or they ask students to perform their own surveys and experiments. They provide practical applications of statistical concepts to real life.

Data Sets: A large number of data sets appear on the book companion Web site at www.wiley.
com/college/mann. These large data sets are collected from various sources, and they contain information on several variables. Many exercises and assignments in the text are based on these data sets. These large data sets can also be used for instructor-driven analyses using a wide variety of statistical software packages as well as the TI-84. These data sets are available on the Web site of the text in numerous formats, including Minitab and Excel.

Videos: New for the Ninth Edition, videos for each text section illustrate concepts related to the topic covered in that section to more deeply engage the students. These videos are accessible via WileyPLUS.

GAISE Report Recommendations Adopted

In 2003, the American Statistical Association (ASA) funded the Guidelines for Assessment and Instruction in Statistics Education (GAISE) Project to develop ASA-endorsed guidelines for assessment and instruction in statistics for the introductory college statistics course. The report, which can be found at www.amstat.org/education/gaise, resulted in the following series of recommendations for the first course in statistics and data analysis.

• Emphasize statistical literacy and develop statistical thinking.
• Use real data.
• Stress conceptual understanding rather than mere knowledge of procedures.
• Foster active learning in the classroom.
• Use technology for developing concepts and analyzing data.
• Use assessments to improve and evaluate student learning.

Here are a few examples of how this Introductory Statistics text can assist in helping you, the instructor, in meeting the GAISE recommendations.

1. Many of the exercises require interpretation, not just answers in terms of numbers. Graphical and numeric summaries are combined in some exercises in order to emphasize looking at the whole picture, as opposed to using just one graph or one summary statistic.
2.
The Uses and Misuses and online Decide for Yourself features help to develop statistical thinking and conceptual understanding.
3. All of the data sets listed in Appendix A are available on the book's Web site. They have been formatted for a variety of statistical software packages. This eliminates the need to enter data into the software. A variety of software instruction manuals also allow the instructor to spend more time on concepts and less time teaching how to use technology.
4. The online Mini-projects help students to generate their own data by performing an experiment and/or taking random samples from the large data sets mentioned in Appendix A.

We highly recommend that all statistics instructors take the time to read the GAISE report. There is a wealth of information in this report that can be used by everyone.

Web Site

www.wiley.com/college/mann

After you go to the page exhibited by the above URL, click on Visit the Companion Sites. Then click on the site that applies to you out of the two choices. This Web site provides additional resources for instructors and students. The following items are available for instructors on this Web site:

• Key Formulas
• Printed Test Bank
• Mini-Projects
• Decide for Yourself
• PowerPoint Lecture Slides
• Instructor's Solutions Manual
• Data Sets (see Appendix A for a complete list of these data sets)
• Chapter 14: Multiple Regression
• Chapter 15: Nonparametric Methods
• Technology Resource Manuals:
  • TI Graphing Calculator Manual
  • Minitab Manual
  • Excel Manual

These manuals provide step-by-step instructions, screen captures, and examples for using technology in the introductory statistics course. Also provided are exercise lists and indications of which exercises from the text best lend themselves to the use of the package presented.

Using WileyPLUS

SUCCESS: WileyPLUS helps to ensure that each study session has a positive outcome by putting students in control.
Through instant feedback and study objective reports, students know if they did it right and where to focus next, so they achieve the strongest results.

Our efficacy research shows that with WileyPLUS, students improve their outcomes by as much as one letter grade. WileyPLUS helps students take more initiative, so you will have greater impact on their achievement in the classroom and beyond.

What Do Students Receive with WileyPLUS?

• The complete digital textbook, saving students up to 60% off the cost of a printed text.
• Question assistance, including links to relevant sections in the online digital textbook.
• Immediate feedback and proof of progress, 24/7.
• Integrated, multimedia resources, including videos, that provide multiple study paths and encourage more active learning.

What Do Instructors Receive with WileyPLUS?

• Reliable resources that reinforce course goals inside and outside of the classroom.
• The ability to easily identify those students who are falling behind.
• Media-rich course materials and assessment content, including Instructor's Solutions Manual, PowerPoint slides, Learning Objectives, Printed Test Bank, and much more.

Learn More: www.wileyplus.com

Supplements

The following supplements are available to accompany this text:

• Instructor's Solutions Manual (ISBN 978-1-119-14830-2). This manual contains complete solutions to all of the exercises in the text.
• Printed Test Bank. The printed copy of the test bank contains a large number of multiple choice questions, essay questions, and quantitative problems for each chapter. It can be downloaded and printed from WileyPLUS or from www.wiley.com/college/mann.
• Student Solutions Manual (ISBN 978-1-119-14829-6). This manual contains complete solutions to all of the odd-numbered exercises in the text.

Acknowledgments

I thank the following reviewers of the ninth edition of this book, whose comments and suggestions were invaluable in improving the text.
D. P. Adhikari, Marywood University
Wendy Ahrendsen, South Dakota State University
Nan Hutchins Bailey, The University of Texas at Tyler
Les Barnhouse, Athabasca University
Hossein Behforooz, Utica College
Joleen Beltrami, University of the Incarnate Word
Bill Burgin, Gaston College
David Bush, Villanova University
Ferry Butar Butar, Sam Houston State University
Cris Chapa, University of Texas at Tyler
Jerry Chen, Suffolk County Community College, Ammerman Campus
A. Choudhury, Illinois State University
Robert C. Forsythe, Frostburg State University
Daesung Ha, Marshall University
Rhonda Hatcher, Texas Christian University
Joanna Jeneralczuk, University of Massachusetts, Amherst
Annette Kanko, University of the Incarnate Word
Mohammad Kazemi, University of North Carolina, Charlotte
Janine Keown-Gerrard, Athabasca University
Hoon Kim, California State Polytechnic University, Pomona
Matthew Knowlen, Horry Georgetown Technical College
Maurice LeBlanc, Kennesaw State University
Min-Lin Lo, California State University, San Bernardino
Natalya Malakhova, Johnson County Community College
Nola McDaniel, McNeese State University
Stéphane Mechoulan, Dalhousie University
Robert L. Nichols, Florida Gulf Coast University
Jonathan Oaks, Macomb Community College, South Campus
Marty Rhoades, West Texas A&M University
Mary Beth Rollick, Kent State University, Kent
Seema Sehgal, Athabasca University
Linda Simonsen, University of Washington, Bothell
Criselda Toto, Chapman University
Viola Vajdova, Benedictine University
Daniel Weiner, Boston University

I express my thanks to the following for their contributions to earlier editions of this book that made it better in many ways: Chris Lacke (Rowan University), Gerald Geissert (formerly of Eastern Connecticut State University), Daniel S. Miller (Central Connecticut State University), and David Santana-Ortiz (Rand Organization).

I extend my special thanks to Doug Tyson, James Bush, and Chad Cross, who contributed to this edition in many significant ways. I take this opportunity to thank Mark McKibben for working on the solutions manuals and preparing the answer section and Julie M. Clark for checking the solutions for accuracy.

It is of utmost importance that a textbook be accompanied by complete and accurate supplements. I take pride in mentioning that the supplements prepared for this text possess these qualities and much more. I thank the authors of all these supplements.

It is my pleasure to thank all the professionals at John Wiley & Sons with whom I enjoyed working during this revision. Among them are Laurie Rosatone (Vice President and Director), Joanna Dingle (Acquisitions Editor), Wendy Lai (Senior Designer), Billy Ray (Senior Photo Editor), Valerie Zaborski (Senior Content Manager), Ken Santor (Senior Production Editor), Ellen Keohane (Project Editor), Jennifer Brady (Sponsoring Editor), Carolyn DeDeo (Market Solutions Assistant), David Dietz (Senior Product Designer), and John LaVacca (Marketing Manager). I also want to thank Jackie Henry (Full Service Manager) for managing the production process and Lisa Torri (Art Development Editor) for her work on the case study art.

Any suggestions from readers for future revisions would be greatly appreciated. Such suggestions can be sent to the author at premmann@yahoo.com or mann@easternct.edu. I recently retired from Eastern Connecticut State University and, hence, will prefer that readers email me at premmann@yahoo.com.

Prem S.
Mann
December 2015

Contents

CHAPTER 1 Introduction
1.1 Statistics and Types of Statistics
Case Study 1-1 2014 Lobbying Spending by Selected Companies
Case Study 1-2 Americans' Life Outlook, 2014
1.2 Basic Terms
1.3 Types of Variables
1.4 Cross-Section Versus Time-Series Data
1.5 Population Versus Sample
1.6 Design of Experiments
1.7 Summation Notation
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 2 Organizing and Graphing Data
2.1 Organizing and Graphing Qualitative Data
Case Study 2-1 Ideological Composition of the U.S. Public, 2014
Case Study 2-2 Millennials' Views on Their Level of Day-to-Day Banking Knowledge
2.2 Organizing and Graphing Quantitative Data
Case Study 2-3 Car Insurance Premiums per Year in 50 States
Case Study 2-4 Hours Worked in a Typical Week by Full-Time U.S. Workers
Case Study 2-5 How Many Cups of Coffee Do You Drink a Day?
2.3 Stem-and-Leaf Displays
2.4 Dotplots
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 3 Numerical Descriptive Measures
3.1 Measures of Center for Ungrouped Data
Case Study 3-1 2013 Average Starting Salaries for Selected Majors
Case Study 3-2 Education Level and 2014 Median Weekly Earnings
3.2 Measures of Dispersion for Ungrouped Data
3.3 Mean, Variance, and Standard Deviation for Grouped Data
3.4 Use of Standard Deviation
Case Study 3-3 Does Spread Mean the Same as Variability and Dispersion?
3.5 Measures of Position
3.6 Box-and-Whisker Plot
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Appendix 3.1 / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 4 Probability
4.1 Experiment, Outcome, and Sample Space
4.2 Calculating Probability
4.3 Marginal Probability, Conditional Probability, and Related Probability Concepts
Case Study 4-1 Do You Worry About Your Weight?
4.4 Intersection of Events and the Multiplication Rule
4.5 Union of Events and the Addition Rule
4.6 Counting Rule, Factorials, Combinations, and Permutations
Case Study 4-2 Probability of Winning a Mega Millions Lottery Jackpot
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 5 Discrete Random Variables and Their Probability Distributions
5.1 Random Variables
5.2 Probability Distribution of a Discrete Random Variable
5.3 Mean and Standard Deviation of a Discrete Random Variable
Case Study 5-1 All State Lottery
5.4 The Binomial Probability Distribution
5.5 The Hypergeometric Probability Distribution
5.6 The Poisson Probability Distribution
Case Study 5-2 Global Birth and Death Rates
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 6 Continuous Random Variables and the Normal Distribution
6.1 Continuous Probability Distribution and the Normal Probability Distribution
Case Study 6-1 Distribution of Time Taken to Run a Road Race
6.2 Standardizing a Normal Distribution
6.3 Applications of the Normal Distribution
6.4 Determining the z and x Values When an Area Under the Normal Distribution Curve Is Known
6.5 The Normal Approximation to the Binomial Distribution
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Appendix 6.1 / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 7 Sampling Distributions
7.1 Sampling Distribution, Sampling Error, and Nonsampling Errors
7.2 Mean and Standard Deviation of x̄
7.3 Shape of the Sampling Distribution of x̄
7.4 Applications of the Sampling Distribution of x̄
7.5 Population and Sample Proportions; and the Mean, Standard Deviation, and Shape of the Sampling Distribution of p̂
7.6 Applications of the Sampling Distribution of p̂
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 8 Estimation of the Mean and Proportion
8.1 Estimation, Point Estimate, and Interval Estimate
8.2 Estimation of a Population Mean: σ Known
Case Study 8-1 Annual Salaries of Registered Nurses, 2014
8.3 Estimation of a Population Mean: σ Not Known
8.4 Estimation of a Population Proportion: Large Samples
Case Study 8-2 Americans' Efforts to Lose Weight Still Trail Desires
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 9 Hypothesis Tests About the Mean and Proportion
9.1 Hypothesis Tests: An Introduction
9.2 Hypothesis Tests About μ: σ Known
Case Study 9-1 Average Student Loan Debt for the Class of 2012
9.3 Hypothesis Tests About μ: σ Not Known
9.4 Hypothesis Tests About a Population Proportion: Large Samples
Case Study 9-2 Are Upper-Income People Paying Their Fair Share in Federal Taxes?
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 10 Estimation and Hypothesis Testing: Two Populations
10.1 Inferences About the Difference Between Two Population Means for Independent Samples: σ₁ and σ₂ Known
10.2 Inferences About the Difference Between Two Population Means for Independent Samples: σ₁ and σ₂ Unknown but Equal
10.3 Inferences About the Difference Between Two Population Means for Independent Samples: σ₁ and σ₂ Unknown and Unequal
10.4 Inferences About the Mean of Paired Samples (Dependent Samples)
10.5 Inferences About the Difference Between Two Population Proportions for Large and Independent Samples
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 11 Chi-Square Tests
11.1 The Chi-Square Distribution
11.2 A Goodness-of-Fit Test
Case Study 11-1 Are People on Wall Street Honest and Moral?
11.3 A Test of Independence or Homogeneity
11.4 Inferences About the Population Variance
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 12 Analysis of Variance
12.1 The F Distribution
12.2 One-Way Analysis of Variance
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 13 Simple Linear Regression
13.1 Simple Linear Regression
Case Study 13-1 Regression of Weights on Heights for NFL Players
13.2 Standard Deviation of Errors and Coefficient of Determination
13.3 Inferences About B
13.4 Linear Correlation
13.5 Regression Analysis: A Complete Example
13.6 Using the Regression Model
Uses and Misuses / Glossary / Supplementary Exercises / Advanced Exercises / Self-Review Test / Technology Instructions / Technology Assignments

CHAPTER 14 Multiple Regression
This chapter is not included in this text but is available for download from WileyPLUS or from www.wiley.com/college/mann.

CHAPTER 15 Nonparametric Methods
This chapter is not included in this text but is available for download from WileyPLUS or from www.wiley.com/college/mann.

APPENDIX A Explanation of Data Sets
APPENDIX B Statistical Tables
ANSWERS TO SELECTED ODD-NUMBERED EXERCISES AND SELF-REVIEW TESTS
INDEX

CHAPTER 1 Introduction

Are you, as an American, thriving in your life? Or are you struggling? Or, even worse, are you suffering? A poll of 176,803 American adults, aged 18 and older, was conducted January 2 to December 30, 2014, as part of the Gallup-Healthways Well-Being Index survey. The poll found that while 54.1% of these Americans said that they were thriving, 42.1% indicated that they were struggling, and 3.8% mentioned that they were suffering. (See Case Study 1-2.)

The study of statistics has become more popular than ever over the past four decades. The increasing availability of computers and statistical software packages has enlarged the role of statistics as a tool for empirical research. As a result, statistics is used for research in almost all professions, from medicine to sports. Today, college students in almost all disciplines are required to take at least one statistics course. Almost all newspapers and magazines these days contain graphs and stories on statistical studies. After you finish reading this book, it should be much easier to understand these graphs and stories.

Every field of study has its own terminology. Statistics is no exception. This introductory chapter explains the basic terms and concepts of statistics. These terms and concepts will bridge our understanding of the concepts and techniques presented in subsequent chapters.

Chapter outline:
1.1 Statistics and Types of Statistics
Case Study 1-1 2014 Lobbying Spending by Selected Companies
Case Study 1-2 Americans' Life Outlook, 2014
1.2 Basic Terms
1.3 Types of Variables
1.4 Cross-Section Versus Time-Series Data
1.5 Population Versus Sample
1.6 Design of Experiments
1.7 Summation Notation

1.1 Statistics and Types of Statistics

In this section we will learn about statistics and types of statistics.

1.1.1 What Is Statistics?

The word statistics has two meanings. In the more common usage, statistics refers to numerical facts. The numbers that represent the income of a family, the age of a student, the percentage of passes completed by the quarterback of a football team, and the starting salary of a typical college graduate are examples of statistics in this sense of the word. A 1988 article in U.S. News & World Report mentioned that "Statistics are an American obsession."¹ During the 1988 baseball World Series between the Los Angeles Dodgers and the Oakland A's, the then NBC commentator Joe Garagiola reported to the viewers numerical facts about the players' performances. In response, fellow commentator Vin Scully said, "I love it when you talk statistics." In these examples, the word statistics refers to numbers.

The following examples present some statistics:

1. During March 2014, a total of 664,000,000 hours were spent by Americans watching March Madness live on TV and/or streaming (Fortune Magazine, March 15, 2015).
2. Approximately 30% of Google's employees were female in July 2014 (USA TODAY, July 24, 2014).
3. According to an estimate, an average family of four living in the United States needs $130,357 a year to live the American dream (USA TODAY, July 7, 2014).
4. Chicago's O'Hare Airport was the busiest airport in 2014, with a total of 881,933 flight arrivals and departures.
5. In 2013, author James Patterson earned $90 million from the sale of his books (Forbes, September 29, 2014).
6. About 22.8% of U.S. adults do not have a religious affiliation (Time, May 25, 2015).
7. Yahoo CEO Marissa Mayer was the highest paid female CEO in America in 2014, with a total compensation of $42.1 million.
The second meaning of statistics refers to the field or discipline of study. In this sense of the word, statistics is defined as follows.

Statistics
Statistics is the science of collecting, analyzing, presenting, and interpreting data, as well as of making decisions based on such analyses.

Every day we make decisions that may be personal, business related, or of some other kind. Usually these decisions are made under conditions of uncertainty. Many times, the situations or problems we face in the real world have no precise or definite solution. Statistical methods help us make scientific and intelligent decisions in such situations. Decisions made by using statistical methods are called educated guesses. Decisions made without using statistical (or scientific) methods are pure guesses and, hence, may prove to be unreliable. For example, opening a large store in an area with or without assessing the need for it may affect its success.

Like almost all fields of study, statistics has two aspects: theoretical and applied. Theoretical (or mathematical) statistics deals with the development, derivation, and proof of statistical theorems, formulas, rules, and laws. Applied statistics involves the applications of those theorems, formulas, rules, and laws to solve real-world problems. This text is concerned with applied statistics and not with theoretical statistics. By the time you finish studying this book, you will have learned how to think statistically and how to make educated guesses.

1.1.2 Types of Statistics

Broadly speaking, applied statistics can be divided into two areas: descriptive statistics and inferential statistics.

Footnote: "The Numbers Racket: How Polls and Statistics Lie," U.S. News & World Report, July 11, 1988, pp. 44-47.

Data source: Fortune Magazine, June 1, 2015.

The accompanying chart shows the lobbying spending by five selected companies during 2014. Many companies spend millions of dollars to win favors in Washington.
According to Fortune Magazine of June 1, 2015, "Comcast has remained one of the biggest corporate lobbyists in the country." In 2014, Comcast spent $17 million, Google spent $16.8 million, AT&T spent $14.2 million, Verizon spent $13.3 million, and Time Warner Cable spent $7.8 million on lobbying. These numbers simply describe the total amounts spent by these companies on lobbying. We are not drawing any inferences, decisions, or predictions from these data. Hence, this data set and its presentation are an example of descriptive statistics.

Case Study 1-1: 2014 Lobbying Spending by Selected Companies

Descriptive Statistics

Suppose we have information on the test scores of students enrolled in a statistics class. In statistical terminology, the whole set of numbers that represents the scores of students is called a data set, the name of each student is called an element, and the score of each student is called an observation. (These terms are defined in more detail in Section 1.2.)

Many data sets in their original forms are usually very large, especially those collected by federal and state agencies. Consequently, such data sets are not very helpful in drawing conclusions or making decisions. It is easier to draw conclusions from summary tables and diagrams than from the original version of a data set. So, we summarize data by constructing tables, drawing graphs, or calculating summary measures such as averages. The portion of statistics that helps us do this type of statistical analysis is called descriptive statistics.

Descriptive Statistics
Descriptive statistics consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures.

Chapters 2 and 3 discuss descriptive statistical methods. In Chapter 2, we learn how to construct tables and how to graph data. In Chapter 3, we learn how to calculate numerical summary measures, such as averages.

Case Study 1-1 presents an example of descriptive statistics.
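To make the idea of summary measures concrete, the short Python sketch below summarizes the five lobbying amounts quoted in Case Study 1-1. It is only an illustration: the variable names are chosen here for convenience, and the total, mean, and median are simply three possible summaries, not measures the case study itself reports.

```python
from statistics import mean, median

# 2014 lobbying spending in millions of dollars, as quoted in Case Study 1-1.
lobbying_spending = {
    "Comcast": 17.0,
    "Google": 16.8,
    "AT&T": 14.2,
    "Verizon": 13.3,
    "Time Warner Cable": 7.8,
}

amounts = list(lobbying_spending.values())

# Summary measures describe the data set; they make no claim beyond these five companies.
total = sum(amounts)
average = mean(amounts)
middle = median(amounts)
largest_spender = max(lobbying_spending, key=lobbying_spending.get)

print(f"Total spending:   ${total:.1f} million")
print(f"Average spending: ${average:.2f} million")
print(f"Median spending:  ${middle:.1f} million")
print(f"Largest spender:  {largest_spender}")
```

Nothing in this sketch generalizes beyond the five companies listed, which is what keeps it within descriptive statistics.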
Inferential Statistics

In statistics, the collection of all elements of interest is called a population. The selection of a portion of the elements from this population is called a sample. (Population and sample are discussed in more detail in Section 1.5.)

Case Study 1-2: Americans' Life Outlook, 2014
[Chart: percentages of Americans thriving, struggling, and suffering. Margin of sampling error: ±1%. Data source: Gallup-Healthways Well-Being Index.]

A poll of 176,903 American adults, aged 18 and older, was conducted January 2 to December 30, 2014, as part of the Gallup-Healthways Well-Being Index survey. Gallup and Healthways have been "tracking Americans' life evaluations daily" since 2008. According to this poll, in 2014 Americans' outlook on life was the best in seven years, as 54.1% "rated their lives highly enough to be considered thriving," 42.1% said they were struggling, and 3.8% mentioned that they were suffering. As mentioned in the chart, the margin of sampling error was ±1%. In Chapter 8, we will discuss the concept of margin of error, which can be combined with these percentages when making inferences. As we notice, the results described in the chart are obtained from a poll of 176,903 adults. We will learn in later chapters how to apply these results to the entire population of adults. Such decision making about the population based on sample results is called inferential statistics.

A major portion of statistics deals with making decisions, inferences, predictions, and forecasts about populations based on results obtained from samples. For example, we may make some decisions about the political views of all college and university students based on the political views of 1000 students selected from a few colleges and universities. As another example, we may want to find the starting salary of a typical college graduate. To do so, we may select 2000 recent college graduates, find their starting salaries, and make a decision based on this information.
The area of statistics that deals with such decision-making procedures is referred to as inferential statistics. This branch of statistics is also called inductive reasoning or inductive statistics.

Inferential Statistics
Inferential statistics consists of methods that use sample results to help make decisions or predictions about a population.

Case Study 1-2 presents an example of inferential statistics. It shows the results of a survey in which American adults were asked about their opinions about their lives.

Chapters 8 through 15 and parts of Chapter 7 deal with inferential statistics.

Probability, which gives a measurement of the likelihood that a certain outcome will occur, acts as a link between descriptive and inferential statistics. Probability is used to make statements about the occurrence or nonoccurrence of an event under uncertain conditions. Probability and probability distributions are discussed in Chapters 4 through 6 and parts of Chapter 7.

EXERCISES

CONCEPTS AND PROCEDURES
1.1 Briefly describe the two meanings of the word statistics.
1.2 Briefly explain the types of statistics.

APPLICATIONS
1.3 Which of the following is an example of descriptive statistics and which is an example of inferential statistics?
a. In a survey by Fortune Magazine and SurveyMonkey (Fortune, June 1, 2015), [the remainder of this item and its table of responses are illegible in the source].
b. The following table gives the earnings of the world's top seven female professional athletes for the year 2014 (ceoworld.biz). [Table: Female Professional Athlete; 2014 Earnings (millions of dollars). The rows are illegible in the source; recoverable names include Maria Sharapova, Serena Williams, Kim Yuna, Danica Patrick, and Caroline Wozniacki.]

1.2 Basic Terms

It is very important to understand the meaning of some basic terms that will be used frequently in this text. This section explains the meaning of an element (or member), a variable, an observation, and a data set.
An element and a data set were briefly defined in Section 1.1. This section defines these terms formally and illustrates them with the help of an example.

Table 1.1 gives information, based on Forbes magazine, on the total wealth of the world's eight richest persons as of March 2015. Each person listed in this table is called an element or a member of this group. Table 1.1 contains information on eight elements. Note that elements are also called observational units.

Element or Member
An element or member of a sample or population is a specific subject or object (for example, a person, firm, item, state, or country) about which the information is collected.

Table 1.1 Total Wealth of the World's Eight Richest Persons

Name                 Total Wealth (billions of dollars)
Bill Gates           79.2
Carlos Slim Helu     77.1
Warren Buffett       72.7
Amancio Ortega       64.5
Larry Ellison        54.3
Charles Koch         42.9
David Koch           42.9
Christy Walton       41.7

Source: Forbes, March 23, 2015.

The total wealth in our example is called a variable. The total wealth is a characteristic of these persons on which information is collected.

Variable
A variable is a characteristic under study that assumes different values for different elements. In contrast to a variable, the value of a constant is fixed.

A few other examples of variables are household incomes, the number of houses built in a city per month during the past year, the makes of cars owned by people, the gross profits of companies, and the number of insurance policies sold by a salesperson per day during the past month.

In general, a variable assumes different values for different elements, as illustrated by the total wealth for the eight persons in Table 1.1. For some elements in a data set, however, the values of the variable may be the same. For example, if we collect information on incomes of households, these households are expected to have different incomes, although some of them may have the same income.

A variable is often denoted by x, y, or z. For instance, in Table 1.1, the total wealth for persons may be denoted by any one of these letters.
Starting with Section 1.7, we will begin to use these letters to denote variables.

Each of the values representing the total wealths of the eight persons in Table 1.1 is called an observation or measurement.

Observation or Measurement
The value of a variable for an element is called an observation or measurement.

From Table 1.1, the total wealth of Warren Buffett was $72.7 billion. The value $72.7 billion is an observation or a measurement. Table 1.1 contains eight observations, one for each of the eight persons.

The information given in Table 1.1 on the total wealth of the eight richest persons is called the data or a data set.

Data Set
A data set is a collection of observations on one or more variables.

Other examples of data sets are a list of the prices of 25 recently sold homes, test scores of 15 students, opinions of 100 voters, and ages of all employees of a company.

EXERCISES

CONCEPTS AND PROCEDURES
1.4 Explain the meaning of an element, a variable, an observation, and a data set.

APPLICATIONS
1.5 The following table lists the number of deaths by cause as reported by the Centers for Disease Control and Prevention on February 6, 2015 (Source: www.cdc.gov).

Cause of Death             Number of Deaths
Heart disease              611,105
Cancer                     584,881
Accidents                  130,557
Stroke                     128,978
Alzheimer's disease        84,767
Diabetes                   75,578
Influenza and Pneumonia    56,979
Suicide                    41,149

Briefly explain the meaning of a member, a variable, a measurement, and a data set with reference to the information in this table.

1.6 The following table lists the number of deaths by cause as reported by the Centers for Disease Control and Prevention on February 6, 2015 (Source: www.cdc.gov).

Cause of Death             Number of Deaths
Heart disease              611,105
Cancer                     584,881
Accidents                  130,557
Stroke                     128,978
Alzheimer's disease        84,767
Diabetes                   75,578
Influenza and Pneumonia    56,979
Suicide                    41,149

a. What is the variable for this data set?
b. How many observations are in this data set?
c. How many elements does this data set contain?

1.3 Types of Variables

In Section 1.2, we learned that a variable is a characteristic under investigation that assumes different values for different elements. Family income, height of a person, gross sales of a company, price of a college textbook, make of the car owned by a family, number of accidents, and status (freshman, sophomore, junior, or senior) of a student enrolled at a university are examples of variables.

A variable may be classified as quantitative or qualitative. These two types of variables are explained next.

1.3.1 Quantitative Variables

Some variables (such as the price of a home) can be measured numerically, whereas others (such as hair color) cannot. The price of a home is an example of a quantitative variable while hair color is an example of a qualitative variable.

Quantitative Variable
A variable that can be measured numerically is called a quantitative variable. The data collected on a quantitative variable are called quantitative data.

Income, height, gross sales, price of a home, number of cars owned, and number of accidents are examples of quantitative variables because each of them can be expressed numerically. For instance, the income of a family may be $81,520.75 per year, the gross sales for a company may be $567 million for the past year, and so forth. Such quantitative variables may be classified as either discrete variables or continuous variables.

Discrete Variables

The values that a certain quantitative variable can assume may be countable or noncountable. For example, we can count the number of cars owned by a family, but we cannot count the height of a family member, as it is measured on a continuous scale. A variable that assumes countable values is called a discrete variable. Note that there are no possible intermediate values between consecutive values of a discrete variable.
Discrete Variable
A variable whose values are countable is called a discrete variable. In other words, a discrete variable can assume only certain values with no intermediate values.

For example, the number of cars sold on any given day at a car dealership is a discrete variable because the number of cars sold must be 0, 1, 2, 3, ... and we can count it. The number of cars sold cannot be between 0 and 1, or between 1 and 2. Other examples of discrete variables are the number of people visiting a bank on any day, the number of cars in a parking lot, the number of cattle owned by a farmer, and the number of students in a class.

Continuous Variables

Some variables assume values that cannot be counted, and they can assume any numerical value between two numbers. Such variables are called continuous variables.

Continuous Variable
A variable that can assume any numerical value over a certain interval or intervals is called a continuous variable.

The time taken to complete an examination is an example of a continuous variable because it can assume any value, let us say, between 30 and 60 minutes. The time taken may be 42.6 minutes, 42.67 minutes, or 42.674 minutes. (Theoretically, we can measure time as precisely as we want.) Similarly, the height of a person can be measured to the tenth of an inch or to the hundredth of an inch. Neither time nor height can be counted in a discrete fashion. Other examples of continuous variables are the weights of people, the amount of soda in a 12-ounce can (note that a can does not contain exactly 12 ounces of soda), and the yield of potatoes (in pounds) per acre. Note that any variable that involves money and can assume a large number of values is typically treated as a continuous variable.

1.3.2 Qualitative or Categorical Variables

Variables that cannot be measured numerically but can be divided into different categories are called qualitative or categorical variables.
Qualitative or Categorical Variable
A variable that cannot assume a numerical value but can be classified into two or more nonnumeric categories is called a qualitative or categorical variable. The data collected on such a variable are called qualitative data.

For example, the status of an undergraduate college student is a qualitative variable because a student can fall into any one of four categories: freshman, sophomore, junior, or senior. Other examples of qualitative variables are the gender of a person, the make of a computer, the opinions of people, and the make of a car.

Figure 1.1 summarizes the different types of variables.

Figure 1.1 Types of variables. [Diagram: a variable is either quantitative or qualitative (categorical). Quantitative variables are discrete (e.g., number of houses, cars, accidents) or continuous (e.g., length, age, height, weight, time). Examples of qualitative or categorical variables: make of a computer, opinions of people, gender.]

EXERCISES

CONCEPTS AND PROCEDURES
1.7 Explain the meaning of the following terms.
a. Quantitative variable
b. Qualitative variable
c. Discrete variable
d. Continuous variable
e. Quantitative data
f. Qualitative data

APPLICATIONS
1.8 Indicate which of the following variables are quantitative and which are qualitative.
a. The amount of time a student spent studying for an exam
b. The amount of rain last year in 30 cities
c. The arrival status of an airline flight (early, on time, late, canceled) at an airport
d. A person's blood type
e. The amount of gasoline put into a car at a gas station

1.9 Classify the following quantitative variables as discrete or continuous.
a. The amount of time a student spent studying for an exam
b. The amount of rain last year in 30 cities
c. The amount of gasoline put into a car at a gas station
d. The number of customers in the line waiting for service at a bank at a given time

1.10 A survey of families living in a certain city was conducted to collect information on the following variables: age of the oldest person in the family, number of family members, number of males in the family, number of females in the family, whether or not they own a house, income of the family, whether or not the family took vacations during the past one year, whether or not they are happy with their financial situation, and the amount of their monthly mortgage or rent.
a. Which of these variables are qualitative variables?
b. Which of these variables are quantitative variables?
c. Which of the quantitative variables of part b are discrete variables?
d. Which of the quantitative variables of part b are continuous variables?

1.4 Cross-Section Versus Time-Series Data

Based on the time over which they are collected, data can be classified as either cross-section or time-series data.

1.4.1 Cross-Section Data

Cross-section data contain information on different elements of a population or sample for the same period of time. The information on incomes of 100 families for 2015 is an example of cross-section data. All examples of data already presented in this chapter have been cross-section data.

Cross-Section Data
Data collected on different elements at the same point in time or for the same period of time are called cross-section data.

Table 1.1, reproduced here as Table 1.2, shows the total wealth of each of the eight richest persons in the world. Because this table presents data on the total wealth of eight persons for the same period, it is an example of cross-section data.

Table 1.2 Total Wealth of World's Eight Richest Persons

Name                 Total Wealth (billions of dollars)
Bill Gates           79.2
Carlos Slim Helu     77.1
Warren Buffett       72.7
Amancio Ortega       64.5
Larry Ellison        54.3
Charles Koch         42.9
David Koch           42.9
Christy Walton       41.7

Source: Forbes, March 23, 2015.
1.4.2 Time-Series Data

Time-series data contain information on the same element at different points in time. Information on U.S. exports for the years 2001 to 2015 is an example of time-series data.

Time-Series Data
Data collected on the same element for the same variable at different points in time or for different periods of time are called time-series data.

The data given in Table 1.3 are an example of time-series data. This table lists the average tuition and fee charges in 2014 dollars for in-state students at four-year public institutions in the United States for selected years.

Table 1.3 Average Tuition and Fees in 2014 Dollars at Four-Year Public Institutions

Years       Tuition and Fee (dollars)
1974-75     2469
1984-85     2810
1994-95     4343
2004-05     6448
2014-15     9139

Source: The College Board.

EXERCISES

CONCEPTS AND PROCEDURES
1.11 Explain the difference between cross-section and time-series data. Give an example of each of these two types of data.

APPLICATIONS
1.12 Classify the following as cross-section or time-series data.
a. Food bill of a family for each month of 2015
b. Number of accidents each year in Dallas from 2000 to 2015
c. Number of supermarkets in each of 40 cities as of December 31, 2015
d. Gross sales of 200 ice cream parlors in July 2015

1.5 Population Versus Sample

We will encounter the terms population and sample on almost every page of this text. Consequently, understanding the meaning of each of these two terms and the difference between them is crucial.

Suppose a statistician is interested in knowing the following:

1. The percentage of all voters in a city who will vote for a particular candidate in an election
2. Last year's gross sales of all companies in New York City
3. The prices of all homes in California

In these examples, the statistician is interested in all voters in a city, all companies in New York City, and all homes in California. Each of these groups is called the population for the respective example.
In statistics, a population does not necessarily mean a collection of people. It can, in fact, be a collection of people or of any kind of item such as houses, books, television sets, or cars. The population of interest is usually called the target population.

Population or Target Population
A population consists of all elements (individuals, items, or objects) whose characteristics are being studied. The population that is being studied is also called the target population.

Most of the time, decisions are made based on portions of populations. For example, the election polls conducted in the United States to estimate the percentages of voters who favor various candidates in any presidential election are based on only a few hundred or a few thousand voters selected from across the country. In this case, the population consists of all registered voters in the United States. The sample is made up of a few hundred or few thousand voters who are included in an opinion poll. Thus, the collection of a number of elements selected from a population is called a sample. Figure 1.2 illustrates the selection of a sample from a population.

Sample
A portion of the population selected for study is referred to as a sample.

Figure 1.2 Population and sample.

The collection of information from the elements of a sample is called a sample survey. When we collect information on all elements of the target population, it is called a census. Often the target population is very large. Hence, in practice, a census is rarely taken because it is expensive and time-consuming. In many cases, it is even impossible to identify each element of the target population. Usually, to conduct a survey, we select a sample and collect the required information from the elements included in that sample. We then make decisions based on this sample information. As an example, if we collect information on the 2015 incomes of all families in Connecticut, it will be referred to as a census.
On the other hand, if we collect information on the 2015 incomes of 50 families selected from Connecticut, it will be called a sample survey.

Census and Sample Survey
A survey that includes every member of the population is called a census. A survey that includes only a portion of the population is called a sample survey.

The purpose of conducting a sample survey is to make decisions about the corresponding population. It is important that the results obtained from a sample survey closely match the results that we would obtain by conducting a census. Otherwise, any decision based on a sample survey will not apply to the corresponding population. As an example, to find the average income of families living in New York City by conducting a sample survey, the sample must contain families who belong to different income groups in almost the same proportion as they exist in the population. Such a sample is called a representative sample. Inferences derived from a representative sample will be more reliable.

Representative Sample
A sample that represents the characteristics of the population as closely as possible is called a representative sample.

A sample may be selected with or without replacement. In sampling with replacement, each time we select an element from the population, we put it back in the population before we select the next element. Thus, in sampling with replacement, the population contains the same number of items each time a selection is made. As a result, we may select the same item more than once in such a sample. Consider a box that contains 25 marbles of different colors. Suppose we draw a marble, record its color, and put it back in the box before drawing the next marble. Every time we draw a marble from this box, the box contains 25 marbles. This is an example of sampling with replacement.
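The marble experiment can be sketched in a few lines of Python. This is a hedged illustration only: the labels `color_1` through `color_25`, the seed, and the choice of 10 draws are assumptions made for the example and are not part of the text.

```python
import random

# Fixed seed (an arbitrary choice) so that the run is repeatable.
rng = random.Random(2015)

# A box of 25 marbles of different colors; the labels are hypothetical.
box = [f"color_{i}" for i in range(1, 26)]

draws = []
for _ in range(10):
    marble = rng.choice(box)  # draw one marble and record its color
    draws.append(marble)      # the marble goes back: nothing is removed from the box

box_size_after = len(box)                 # still 25 after every draw
repeats = len(draws) - len(set(draws))    # number of draws that repeated an earlier color

print(f"Marbles in the box after 10 draws: {box_size_after}")
print(f"Draws that repeated an earlier color: {repeats}")
```

Because the box is never changed, every draw faces the same 25 marbles, and the same color can come up more than once.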
The experiment of rolling a die many times is another example of sampling with replacement because every roll has the same six possible outcomes.

Sampling without replacement occurs when the selected element is not replaced in the population. In this case, each time we select an item, the size of the population is reduced by one element. Thus, we cannot select the same item more than once in this type of sampling. Most of the time, samples taken in statistics are without replacement. Consider an opinion poll based on a certain number of voters selected from the population of all eligible voters. In this case, the same voter is not selected more than once. Therefore, this is an example of sampling without replacement.

1.5.1 Why Sample?

Most of the time, surveys are conducted by using samples and not a census of the population. Three of the main reasons for conducting a sample survey instead of a census are listed next.

Time
In most cases, the size of the population is quite large. Consequently, conducting a census takes a long time, whereas a sample survey can be conducted very quickly. It is time-consuming to interview or contact hundreds of thousands or even millions of members of a population. On the other hand, a survey of a sample of a few hundred elements may be completed in much less time. In fact, because of the amount of time needed to conduct a census, by the time the census is completed, the results may be obsolete.

Cost
The cost of collecting information from all members of a population may easily fall outside the limited budget of most, if not all, surveys. Consequently, to stay within the available resources, conducting a sample survey may be the best approach.

Impossibility of Conducting a Census
Sometimes it is impossible to conduct a census. First, it may not be possible to identify and access each member of the population.
For example, if a researcher wants to conduct a survey about homeless people, it is not possible to locate each member of the population and include him or her in the survey. Second, sometimes conducting a survey means destroying the items included in the survey. For example, to estimate the mean life of lightbulbs would necessitate burning out all the bulbs included in the survey. The same is true about finding the average life of batteries. In such cases, only a portion of the population can be selected for the survey.

1.5.2 Random and Nonrandom Samples

Depending on how a sample is drawn, it may be a random sample or a nonrandom sample.

Random and Nonrandom Samples
A random sample is a sample drawn in such a way that each member of the population has some chance of being selected in the sample. In a nonrandom sample, some members of the population may not have any chance of being selected in the sample.

Suppose we have a list of 100 students and we want to select 10 of them. If we write the names of all 100 students on pieces of paper, put them in a hat, mix them, and then draw 10 names, the result will be a random sample of 10 students. However, if we arrange the names of these 100 students alphabetically and pick the first 10 names, it will be a nonrandom sample because the students who are not among the first 10 have no chance of being selected in the sample.

A random sample is usually a representative sample. Note that for a random sample, each member of the population may or may not have the same chance of being included in the sample.

Two types of nonrandom samples are a convenience sample and a judgment sample. In a convenience sample, the most accessible members of the population are selected to obtain the results quickly. For example, an opinion poll may be conducted in a few hours by collecting information from certain shoppers at a single shopping mall.
In a judgment sample, the members are selected from the population based on the judgment and prior knowledge of an expert. Although such a sample may happen to be a representative sample, the chances of it being so are small. If the population is large, it is not an easy task to select a representative sample based on judgment.

The so-called pseudo polls are examples of nonrepresentative samples. For instance, a survey conducted by a magazine that includes only its own readers does not usually involve a representative sample. Similarly, a poll conducted by a television station giving two separate telephone numbers for yes and no votes is not based on a representative sample. In these two examples, respondents will be only those people who read that magazine or watch that television station, who do not mind paying the postage or telephone charges, or who feel compelled to respond.

Another kind of sample is the quota sample. To select such a sample, we divide the target population into different subpopulations based on certain characteristics. Then we select a subsample from each subpopulation in such a way that each subpopulation is represented in the sample in exactly the same proportion as in the target population. As an example of a quota sample, suppose we want to select a sample of 1000 persons from a city whose population has 48% men and 52% women. To select a quota sample, we choose 480 men from the male population and 520 women from the female population. The sample selected in this way will contain exactly 48% men and 52% women. Another way to select a quota sample is to select from the population one person at a time until we have exactly 480 men and 520 women.

Until the 1948 presidential election in the United States, quota sampling was the most commonly used sampling procedure to conduct opinion polls.
The voters included in the samples were selected in such a way that they represented the population proportions of voters based on age, sex, education, income, race, and so on. However, this procedure was abandoned after the 1948 presidential election, in which the underdog, Harry Truman, defeated Thomas E. Dewey, who was heavily favored based on the opinion polls. First, the quota samples failed to be representative because the interviewers were allowed to fill their quotas by choosing voters based on their own judgments. This caused the selection of more upper-income and highly educated people, who happened to be Republicans. Thus, the quota samples were unrepresentative of the population because Republicans were overrepresented in these samples. Second, the results of the opinion polls based on quota sampling happened to be false because a large number of factors differentiate voters, but the pollsters considered only a few of those factors. A quota sample based on a few factors will skew the results. A random sample (one that is not based on quotas) has a much better chance of being representative of the population of all voters than a quota sample based on a few factors.

1.5.3 Sampling and Nonsampling Errors

The results obtained from a sample survey may contain two types of errors: sampling and nonsampling errors. The sampling error is also called the chance error, and nonsampling errors are also called the systematic errors.

Sampling or Chance Error

Usually, all samples selected from the same population will give different results because they contain different elements of the population. Moreover, the results obtained from any one sample will not be exactly the same as the ones obtained from a census. The difference between a sample result and the result we would have obtained by conducting a census is called the sampling error, assuming that the sample is random and no nonsampling error has been made.
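To make the idea of sampling error concrete, the following is a minimal sketch, not part of the original text. It assumes a made-up population of 1,000 ages, and the function name `sampling_error` is hypothetical. It draws a few random samples and compares each sample mean with the mean a census would give.

```python
import random
import statistics

def sampling_error(population, sample_size, rng):
    """Return (sample_mean, census_mean, error) for one random sample.

    The error is the sample result minus the result a census would give.
    """
    census_mean = statistics.mean(population)
    sample = rng.sample(population, sample_size)  # drawn without replacement
    sample_mean = statistics.mean(sample)
    return sample_mean, census_mean, sample_mean - census_mean

rng = random.Random(1)
# Hypothetical population (an assumption for illustration): ages of 1,000 people.
population = [rng.randint(18, 90) for _ in range(1000)]

# Different samples from the same population generally give different results.
for _ in range(3):
    sample_mean, census_mean, error = sampling_error(population, 50, rng)
    print(f"sample mean = {sample_mean:.2f}, "
          f"census mean = {census_mean:.2f}, sampling error = {error:+.2f}")

# If the "sample" is the whole population, we have a census and the error is zero.
_, _, census_error = sampling_error(population, len(population), rng)
print("error when everyone is included:", census_error)
```

The last call illustrates why sampling error is specific to sample surveys: once every member of the population is included, nothing is left to chance.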
Sampling Error: The sampling error is the difference between the result obtained from a sample survey and the result that would have been obtained if the whole population had been included in the survey.

The sampling error occurs because of chance, and it cannot be avoided. A sampling error can occur only in a sample survey. It does not occur in a census. Sampling error is discussed in detail in Chapter 7, and an example of it is given there.

Nonsampling Errors or Biases

Nonsampling errors or biases can occur both in a sample survey and in a census. Such errors occur because of human mistakes and not chance. Nonsampling errors are also called systematic errors or biases.

Nonsampling Errors or Biases: The errors that occur in the collection, recording, and tabulation of data are called nonsampling errors or biases.

Nonsampling errors can be minimized if questions are prepared carefully and data are handled cautiously. Many types of systematic errors or biases can occur in a survey, including selection error, nonresponse error, response error, and voluntary response error. Figure 1.3 shows the types of errors.

[Figure 1.3 Types of errors: sampling or chance error; nonsampling or systematic errors or biases, the latter branching into selection, nonresponse, and response error or bias.]

(i) Selection Error or Bias

When we need to select a sample, we use a list of elements from which we draw a sample, and this list usually does not include many members of the target population. Most of the time it is not feasible to include every member of the target population in this list. This list of members of the population that is used to select a sample is called the sampling frame. For example, if we use a telephone directory to select a sample, the list of names that appears in this directory makes the sampling frame.
In this case we will miss the people who are not listed in the telephone directory. The people we miss, for example, will be poor people (including homeless people) who do not have telephones and people who do not want to be listed in the directory. Thus, the sampling frame that is used to select a sample may not be representative of the population. This may cause the sample results to be different from the population results. The error that occurs because the sampling frame is not representative of the population is called the selection error or bias.

Selection Error or Bias: The list of members of the target population that is used to select a sample is called the sampling frame. The error that occurs because the sampling frame is not representative of the population is called the selection error or bias.

If a sample is nonrandom (and, hence, nonrepresentative), the sample results may be quite different from the census results.

(ii) Nonresponse Error or Bias

Even if our sampling frame and, consequently, the sample are representative of the population, nonresponse error may occur because many of the people included in the sample may not respond to the survey.

Nonresponse Error or Bias: The error that occurs because many of the people included in the sample do not respond to a survey is called the nonresponse error or bias.

This type of error occurs especially when a survey is conducted by mail. A lot of people do not return the questionnaires. It has been observed that families with low and high incomes do not respond to surveys by mail. Consequently, such surveys overrepresent middle-income families. This kind of error occurs in other types of surveys, too. For instance, in a face-to-face survey where the interviewer interviews people in their homes, many people may not be home when the interviewer visits their homes.
The people who are home at the time the interviewer visits and the ones who are not home at that time may differ in many respects, causing a bias in the survey results. This kind of error may also occur in a telephone survey. Many people may not be home when the interviewer calls. This may distort the results. To avoid the nonresponse error, every effort should be made to contact all people included in the sample.

(iii) Response Error or Bias

The response error or bias occurs when the answer given by a person included in the survey is not correct. This may happen for many reasons. One reason is that the respondent may not have understood the question. Thus, the wording of the question may have caused the respondent to answer incorrectly. It has been observed that when the same question is worded differently, many people do not respond the same way. Usually such an error on the part of respondents is not intentional.

Response Error or Bias: The response error or bias occurs when people included in the survey do not provide correct answers.

Sometimes the respondents do not want to give correct information when answering a question. For example, many respondents will not disclose their true incomes on questionnaires or in interviews. When information on income is provided, it is almost always biased in the upward direction.

Sometimes the race of the interviewer may affect the answers of respondents. This is especially true if the questions asked are about race relations. The answers given by respondents may differ depending on the race of the interviewer.

(iv) Voluntary Response Error or Bias

Another source of systematic error is a survey based on a voluntary response sample.
Voluntary Response Error or Bias: Voluntary response error or bias occurs when a survey is not conducted on a randomly selected sample but on a questionnaire published in a magazine or newspaper and people are invited to respond to that questionnaire.

The polls conducted based on samples of readers of magazines and newspapers suffer from voluntary response error or bias. Usually only those readers who have very strong opinions about the issues involved respond to such surveys. Surveys in which the respondents are required to call some telephone numbers also suffer from this type of error. Here, to participate, many times a respondent has to pay for the call, and many people do not want to bear this cost. Consequently, the sample is usually neither random nor representative of the target population because participation is voluntary.

1.5.4 Random Sampling Techniques

There are many ways to select a random sample. Four of these techniques are discussed next.

Simple Random Sampling

From a given population, we can select a large number of samples of the same size. If each of these samples has the same probability of being selected, then it is called simple random sampling.

Simple Random Sampling: In this sampling technique, each sample of the same size has the same probability of being selected. Such a sample is called a simple random sample.

One way to select a simple random sample is by a lottery or drawing. For example, if we need to select 5 students from a class of 50, we write each of the 50 names on a separate piece of paper. Then, we place all 50 names in a hat and mix them thoroughly. Next, we draw 1 name randomly from the hat. We repeat this experiment four more times. The 5 drawn names make up a simple random sample.

The second procedure to select a simple random sample is to use a table of random numbers, which has become an outdated procedure.
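The lottery drawing described above can be imitated in software. The sketch below is only an illustration and not part of the original text. It uses Python's standard `random` module in place of a hat, and the student labels and the helper name `simple_random_sample` are made up for the example.

```python
import random

def simple_random_sample(names, n, seed=None):
    """Draw n distinct names so that every group of n names is equally likely."""
    rng = random.Random(seed)    # a fixed seed makes the draw repeatable
    return rng.sample(names, n)  # sampling without replacement, like the hat

# Hypothetical class list of 50 students (labels are placeholders).
students = [f"Student {i}" for i in range(1, 51)]

chosen = simple_random_sample(students, 5, seed=2016)
print(chosen)
```

Because `random.sample` never returns the same element twice, the result corresponds to drawing five slips from the hat without putting any of them back.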
In this age of technology, it is much easier to use a statistical package, such as Minitab, to select a simple random sample.

Systematic Random Sampling

The simple random sampling procedure becomes very tedious if the size of the population is large. For example, if we need to select 150 households from a list of 45,000, it is very time consuming either to write the 45,000 names on pieces of paper and then select 150 households or to use a table of random numbers. In such cases, it is more convenient to use systematic random sampling.

The procedure to select a systematic random sample is as follows. In the example just mentioned, we would arrange all 45,000 households alphabetically (or based on some other characteristic). Since the sample size should equal 150, the ratio of population to sample size is 45,000/150 = 300. Using this ratio, we randomly select one household from the first 300 households in the arranged list using either method. Suppose by using either of the methods, we select the 210th household. We then select the 210th household from every 300 households in the list. In other words, our sample includes the households with numbers 210, 510, 810, 1110, 1410, 1710, and so on.

Systematic Random Sample: In systematic random sampling, we first randomly select one member from the first k units of the list of elements arranged based on a given characteristic, where k is the number obtained by dividing the population size by the intended sample size. Then every kth member, starting with the first selected member, is included in the sample.

Stratified Random Sampling

Suppose we need to select a sample from the population of a city, and we want households with different income levels to be proportionately represented in the sample. In this case, instead of selecting a simple random sample or a systematic random sample, we may prefer to apply a different technique.
First, we divide the whole population into different groups based on income levels. For example, we may form three groups of low-, medium-, and high-income households. We will now have three subpopulations, which are usually called strata. We then select one sample from each subpopulation or stratum. The collection of all three samples selected from the three strata gives the required sample, called the stratified random sample. Usually, the sizes of the samples selected from different strata are proportionate to the sizes of the subpopulations in these strata. Note that the elements of each stratum are identical with regard to the possession of a characteristic.

Stratified Random Sample: In a stratified random sample, we first divide the population into subpopulations, which are called strata. Then, one sample is selected from each of these strata. The collection of all samples from all strata gives the stratified random sample.

Thus, whenever we observe that a population differs widely in the possession of a characteristic, we may prefer to divide it into different strata and then select one sample from each stratum. We can divide the population on the basis of any characteristic, such as income, expenditure, gender, education, race, employment, or family size.

Cluster Sampling

Sometimes the target population is scattered over a wide geographical area. Consequently, if a simple random sample is selected, it may be costly to contact each member of the sample. In such a case, we divide the population into different geographical groups or clusters and, as a first step, select a random sample of certain clusters from all clusters. We then take a random sample of certain elements from each selected cluster. For example, suppose we are to conduct a survey of households in the state of New York. First, we divide the whole state of New York into, say, 40 regions, which are called clusters or primary units.
We make sure that all clusters are similar and, hence, representative of the population. We then select at random, say, 5 clusters from 40. Next, we randomly select certain households from each of these 5 clusters and conduct a survey of these selected households. This is called cluster sampling. Note that all clusters must be representative of the population.

Cluster Sampling: In cluster sampling, the whole population is first divided into (geographical) groups called clusters. Each cluster is representative of the population. Then a random sample of clusters is selected. Finally, a random sample of elements from each of the selected clusters is selected.

EXERCISES

CONCEPTS AND PROCEDURES

1.13 Briefly explain the terms population, sample, representative sample, sampling with replacement, and sampling without replacement.

1.14 Give one example each of sampling with and sampling without replacement.

1.15 Briefly explain the difference between a census and a sample survey. Why is conducting a sample survey preferable to conducting a census?

1.16 Explain the following.
a. Random sample
b. Nonrandom sample
c. Convenience sample
d. Judgment sample
e. Quota sample

1.17 Explain the following four sampling techniques.
a. Simple random sampling
b. Systematic random sampling
c. Stratified random sampling
d. Cluster sampling

1.18 In which sampling technique do all samples of the same size selected from a population have the same chance of being selected?

APPLICATIONS

1.19 Explain whether each of the following constitutes data collected from a population or a sample.
a. The number of pizzas ordered on Fridays during 2015 at all of the pizza parlors in your town.
b. The dollar values of auto insurance claims filed in 2015 for 200 randomly selected policies.
c. The opening price of each of the 500 stocks in the S&P 500 stock index on January 4, 2016.
d. The total home attendance for each of the 18 teams in Major League Soccer during the 2015 season.
e. The living areas of 35 houses listed for sale on March 7, 2015 in Chicago, Illinois.

1.20 A statistics professor wanted to find out the average GPA (grade point average) for all students at her university. She used all students enrolled in her statistics class as a sample and collected information on their GPAs to find the average GPA.
a. Is this sample a random or a nonrandom sample? Explain.
b. What kind of sample is it? In other words, is it a simple random sample, a systematic sample, a stratified sample, a cluster sample, a convenience sample, a judgment sample, or a quota sample? Explain.
c. What kind of systematic error, if any, will be made with this kind of sample? Explain.

1.21 A statistics professor wanted to find the average GPA (grade point average) of all students at her university. The professor obtains a list of all students enrolled at the university from the registrar's office and then selects 150 students at random from this list using a statistical software package such as Minitab.
a. Is this sample a random or a nonrandom sample? Explain.
b. What kind of sample is it? In other words, is it a simple random sample, a systematic sample, a stratified sample, a cluster sample, a convenience sample, a judgment sample, or a quota sample? Explain.
c. Do you think any systematic error will be made in this case? Explain.

1.22 A statistics professor wanted to select 20 students from his class of 300 students to collect detailed information on the profiles of his students. The professor enters the names of all students enrolled in his class on a computer. He then selects a sample of 20 students at random using a statistical software package such as Minitab.
a. Is this sample a random or a nonrandom sample? Explain.
b. What kind of sample is it? In other words, is it a simple random sample, a systematic sample, a stratified sample, a cluster sample, a convenience sample, a judgment sample, or a quota sample?
Explain.
c. Do you think any systematic error will be made in this case? Explain.

1.23 A company has 1000 employees, of whom 58% are men and 42% are women. The research department at the company wanted to conduct a quick survey by selecting a sample of 50 employees and asking them about their opinions on an issue. They divided the population of employees into two groups, men and women, and then selected 29 men and 21 women from these respective groups. The interviewers were free to choose any 29 men and 21 women they wanted. What kind of sample is it? Explain.

1.24 A magazine published a questionnaire for its readers to fill out and mail to the magazine's office. In the questionnaire, cell phone owners were asked how much they would have to be paid to do without their cell phones for one month. The magazine received responses from 5439 cell phone owners.
a. What type of sample is this? Explain.
b. To what kind(s) of systematic error, if any, would this survey be subject?

1.25 A researcher wanted to conduct a survey of major companies to find out what benefits are offered to their employees. She mailed questionnaires to 2500 companies and received questionnaires back from 493 companies. What kind of systematic error does this survey suffer from? Explain.

1.26 An opinion poll agency conducted a survey based on a random sample in which the interviewers called the parents included in the sample and asked them the following questions:
i. Do you believe in spanking children?
ii. Have you ever spanked your children?
iii. If the answer to the second question is yes, how often?
What kind of systematic error, if any, does this survey suffer from? Explain.

1.27 A survey based on a random sample taken from one borough of New York City showed that 65% of the people living there would prefer to live somewhere other than New York City if they had the opportunity to do so. Based on this result, can the researcher say that 65% of people living in New York City would prefer to live somewhere else if they had the opportunity to do so? Explain.

1.6 Design of Experiments

To use statistical methods to make decisions, we need access to data. Consider the following examples about decision making:

1. A government agency wants to find the average income of households in the United States.
2. A company wants to find the percentage of defective items produced by a machine.
3. A researcher wants to know if there is an association between eating unhealthy food and cholesterol level.
4. A pharmaceutical company has developed a new medicine for a disease and it wants to check if this medicine cures the disease.

All of these cases relate to decision making. We cannot reach a conclusion in these examples unless we have access to data. Data can be obtained from observational studies, experiments, or surveys. This section is devoted mainly to controlled experiments. However, it also explains observational studies and how they differ from surveys.

Suppose two diets, Diet 1 and Diet 2, are being promoted by two different companies, and each of these companies claims that its diet is successful in reducing weight. A research nutritionist wants to compare these diets with regard to their effectiveness for losing weight. Following are the two alternatives for the researcher to conduct this research.

1. The researcher contacts the persons who are using these diets and collects information on their weight loss. The researcher may contact as many persons as she has the time and financial resources for. Based on this information, the researcher makes a decision about the comparative effectiveness of these diets.
2. The researcher selects a sample of persons who want to lose weight, divides them randomly into two groups, and assigns each group to one of the two diets. Then she compares these two groups with regard to the effectiveness of these diets.

The first alternative is an example of an observational study, and the second is an example of a controlled experiment. In an experiment, we exercise control over some factors when we design the study, but in an observational study we do not impose any kind of control.

Treatment: A condition (or a set of conditions) that is imposed on a group of elements by the experimenter is called a treatment.

In an observational study the investigator does not impose a treatment on subjects or elements included in the study. For instance, in the first alternative mentioned above, the researcher simply collects information from the persons who are currently using these diets. In this case, the persons were not assigned to the two diets at random; instead, they chose the diets voluntarily. In this situation the researcher's conclusion about the comparative effectiveness of the two diets may not be valid because the effects of the diets will be confounded with many other factors or variables. When the effects of one factor cannot be separated from the effects of some other factors, the effects are said to be confounded. The persons who chose Diet 1 may be completely different with regard to age, gender, and eating and exercise habits from the persons who chose Diet 2. Thus, the weight loss may not be due entirely to the diet but to other factors or variables as well. Persons in one group may aggressively manage both diet and exercise, for example, whereas persons in the second group may depend entirely on diet.
Thus, the effects of these other variables will get mixed up (confounded) with the effect of the diets.

Under the second alternative, the researcher selects a group of people, say 100, and randomly assigns them to two diets. One way to make random assignments is to write the name of each of these persons on a piece of paper, put them in a hat, mix them, and then randomly draw 50 names from this hat. These 50 persons will be assigned to one of the two diets, say Diet 1. The remaining 50 persons will be assigned to the second diet, Diet 2. This procedure is called randomization. Note that random assignments can also be made by using other methods such as a table of random numbers or technology.

Randomization: The procedure in which elements are assigned to different groups at random is called randomization.

When people are assigned to one or the other of two diets at random, the other differences among people in the two groups almost disappear. In this case these groups will not differ very much with regard to such factors as age, gender, and eating and exercise habits. The two groups will be very similar to each other. By using the random process to assign people to one or the other of two diets, we have controlled the other factors that can affect the weights of people. Consequently, this is an example of a designed experiment.

As mentioned earlier, a condition (or a set of conditions) that is imposed on a group of elements by the experimenter is called a treatment. In the example on diets, each of the two diet types is called a treatment. The experimenter randomly assigns the elements to these two treatments. Again, in such cases the study is called a designed experiment.
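The randomization step for the two diets can also be carried out with technology. The following is a minimal sketch under stated assumptions, not a procedure taken from the text: the participant labels and the function name `randomize` are hypothetical, and Python's standard `random` module stands in for the hat.

```python
import random

def randomize(participants, seed=None):
    """Shuffle the participants and split them into two equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original list is left untouched
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical group of 100 people who want to lose weight.
people = [f"Person {i}" for i in range(1, 101)]

diet_1, diet_2 = randomize(people, seed=7)
print(len(diet_1), "people assigned to Diet 1")
print(len(diet_2), "people assigned to Diet 2")
```

Taking the first 50 names of a shuffled list is equivalent to drawing 50 names from the hat: nobody chooses a diet, so factors such as age or exercise habits have no systematic way of lining up with one diet.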
Designed Experiment and Observational Study: When the experimenter controls the (random) assignment of elements to different treatment groups, the study is said to be a designed experiment. In contrast, in an observational study the assignment of elements to different treatments is voluntary, and the experimenter simply observes the results of the study.

The group of people who receive a treatment is called the treatment group, and the group of people who do not receive a treatment is called the control group. In our example on diets, both groups are treatment groups because each group is assigned to one of the two types of diet. That example does not contain a control group.

Treatment and Control Groups: The group of elements that receives a treatment is called the treatment group, and the group of elements that does not receive a treatment is called the control group.

EXAMPLE 1-1 Testing a Medicine

Suppose a pharmaceutical company has developed a new medicine to cure a disease. To see whether or not this medicine is effective in curing this disease, it will have to be tested on a group of humans. Suppose there are 100 persons who have this disease; 50 of them voluntarily decide to take this medicine, and the remaining 50 decide not to take it. The researcher then compares the cure rates for the two groups of patients. Is this an example of a designed experiment or an observational study?

Solution This is an example of an observational study because 50 patients voluntarily joined the treatment group; they were not randomly selected. In this case, the results of the study may not be valid because the effects of the medicine will be confounded with other variables. All of the patients who decided to take the medicine may not be similar to the ones who decided not to take it. It is possible that the persons who decided to take the medicine are in the advanced stages of the disease.
Consequently, they do not have much to lose by being in the treatment group. The patients in the two groups may also differ with regard to other factors such as age, gender, and so on.

EXAMPLE 1-2 Testing a Medicine

Reconsider Example 1-1. Now, suppose that out of the 100 people who have this disease, 50 are selected at random. These 50 people make up one group, and the remaining 50 belong to the second group. One of these groups is the treatment group, and the second is the control group. The researcher then compares the cure rates for the two groups of patients. Is this an example of a designed experiment or an observational study?

Solution In this case, the two groups are expected to be very similar to each other. Note that we do not expect the two groups to be exactly identical. However, when randomization is used, the two groups are expected to be very similar. After these two groups have been formed, one group will be given the actual medicine. This group is called the treatment group. The other group will be administered a placebo (a dummy medicine that looks exactly like the actual medicine). This group is called the control group. This is an example of a designed experiment because the patients are assigned randomly to one of two groups (the treatment group or the control group).
Usually in an experiment like the one in Example 1-2, patients do not know which group they belong to. Most of the time the experimenters do not know which group a patient belongs to. This is done to avoid any bias or distortion in the results of the experiment. When neither patients nor experimenters know who is taking the real medicine and who is taking the placebo, it is called a double-blind experiment. For the results of the study to be unbiased and valid, an experiment must be a double-blind designed experiment. Note that if either experimenters or patients or both have access to information regarding which patients belong to treatment or control groups, it will no longer be a double-blind experiment.

The use of a placebo in medical experiments is very important. A placebo is just a dummy pill that looks exactly like the real medicine. Often, patients respond to any kind of medicine. Many studies have shown that even when the patients were given sugar pills (and did not know it), many of them indicated a decrease in pain. Patients respond to a placebo because they have confidence in their physicians and medicines. This is called the placebo effect.

Note that there can be more than two groups of elements in an experiment. For example, an investigator may need to compare three diets for people with regard to weight loss. Here, in a designed experiment, the people will be randomly assigned to one of the three diets, which are the three treatments.

In some instances we have to base our research on observational studies because it is not feasible to conduct a designed experiment. For example, suppose a researcher wants to compare the starting salaries of business and psychology majors. The researcher will have to depend on an observational study. She will select two samples, one of recent business majors and another of recent psychology majors. Based on the starting salaries of these two groups, the researcher will make a decision.
Note that, here, the effects of the majors on the starting salaries of the two groups of graduates will be confounded with other variables. One of these other factors is that the business and psychology majors may be different in regard to intelligence level, which may affect their salaries. However, the researcher cannot conduct a designed experiment in this case. She cannot select a group of persons randomly and ask them to major in business and select another group and ask them to major in psychology. Instead, persons voluntarily choose their majors.

In a survey we do not exercise any control over the factors when we collect information. This characteristic of a survey makes it very close to an observational study. However, a survey may be based on a probability sample, which differentiates it from an observational study.

If an observational study or a survey indicates that two variables are related, it does not mean that there is a cause-and-effect relationship between them. For example, if an economist takes a sample of families, collects data on the incomes and rents paid by these families, and establishes an association between these two variables, it does not necessarily mean that families with higher incomes pay higher rents. Here the effects of many variables on rents are confounded. A family may pay a higher rent not because of higher income but because of various other factors, such as family size, preferences, or place of residence. We cannot make a statement about the cause-and-effect relationship between incomes and rents paid by families unless we control for these other variables. The association between incomes and rents paid by families may fit any of the following scenarios.

1. These two variables have a cause-and-effect relationship. Families that have higher incomes do pay higher rents. A change in incomes of families causes a change in rents paid.
2. The incomes and rents paid by families do not have a cause-and-effect relationship.
Both of these variables have a cause-and-effect relationship with a third variable. Whenever that third variable changes, these two variables change.
3. The effect of income on rent is confounded with other variables, and this indicates that income affects rent paid by families.

If our purpose in a study is to establish a cause-and-effect relationship between two variables, we must control for the effects of other variables. In other words, we must conduct a designed study.

EXERCISES

CONCEPTS AND PROCEDURES

1.28 Explain the difference between a survey and an experiment.

1.29 Explain the difference between an observational study and an experiment.

1.30 In March 2005, The New England Journal of Medicine published the results of a 10-year clinical trial of low-dose aspirin therapy for the cardiovascular health of women (Time, March 21, 2005). The study was based on 40,000 healthy women, most of whom were in their 40s and 50s when the trial began. Half of these women were administered 100 mg of aspirin every other day, and the others were given a placebo. Assume that the women were assigned randomly to these two groups.
a. Is this an observational study or a designed experiment? Explain.
b. From the information given above, can you determine whether or not this is a double-blind study? Explain. If not, what additional information would you need?

1.31 In March 2005, The New England Journal of Medicine published the results of a 10-year clinical trial of low-dose aspirin therapy for the cardiovascular health of women (Time, March 21, 2005). The study was based on 40,000 healthy women, most of whom were in their 40s and 50s when the trial began. Half of these women were administered 100 mg of aspirin every other day, and the others were given a placebo. The study looked at the incidences of heart attacks in the two groups of women. Overall the study did not find a statistically significant difference in heart attacks between the two groups of women.
However, the study noted that among women who were at least 65 years old when the study began, there was a lower incidence of heart attack for those who took aspirin than for those who took a placebo. Suppose that some medical researchers want to study this phenomenon more closely. They recruit 2000 healthy women aged 65 years and older, and randomly divide them into two groups. One group takes 100 mg of aspirin every other day, and the other group takes a placebo. The women do not know to which group they belong, but the doctors who are conducting the study have access to this information.
a. Is this an observational study or a designed experiment? Explain.
b. Is this a double-blind study? Explain.

1.32 In March 2005, The New England Journal of Medicine published the results of a 10-year clinical trial of low-dose aspirin therapy for the cardiovascular health of women (Time, March 21, 2005). The study was based on 40,000 healthy women, most of whom were in their 40s and 50s when the trial began. Half of these women were administered 100 mg of aspirin every other day, and the others were given a placebo. The study noted that among women who were at least 65 years old when the study began, there was a lower incidence of heart attack for those who took aspirin than for those who took a placebo. Some medical researchers want to study this phenomenon more closely. They recruit 2000 healthy women aged 65 years and older, and randomly divide them into two groups. One group takes 100 mg of aspirin every other day, and the other group takes a placebo. Neither patients nor doctors know which group patients belong to.
a. Is this an observational study or a designed experiment? Explain.
b. Is this study a double-blind study? Explain.

1.33 A federal government think tank wanted to investigate whether a job training program helps the families who are on welfare to get off the welfare program.
The researchers at this agency selected 5000 volunteer families who were on welfare and offered the adults in those families free job training. The researchers selected another group of 5000 volunteer families who were on welfare and did not offer them such job training. After 3 years the two groups were compared in regard to the percentage of families who got off welfare. Is this an observational study or a designed experiment? Explain.

1.34 A federal government think tank wanted to investigate whether a job training program helps the families who are on welfare to get off the welfare program. The researchers at this agency selected 10,000 families at random from the list of all families that were on welfare. Of these 10,000 families, the agency randomly selected 5000 families and offered them free job training. The remaining 5000 families were not offered such job training. After 3 years the two groups were compared in regard to the percentage of families who got off welfare. Is this an observational study or a designed experiment? Explain.

1.35 A federal government think tank wanted to investigate whether a job training program helps the families who are on welfare to get off the welfare program. The researchers at this agency selected 5000 volunteer families who were on welfare and offered the adults in those families free job training. The researchers selected another group of 5000 volunteer families who were on welfare and did not offer them such job training. After three years the two groups were compared in regard to the percentage of families who got off welfare. Based on that study, the researchers concluded that the job training program causes (helps) families who are on welfare to get off the welfare program. Do you agree with this conclusion?
Explain.

1.36 A federal government think tank wanted to investigate whether a job training program helps the families who are on welfare to get off the welfare program. The researchers at this agency selected 10,000 families at random from the list of all families that were on welfare. Of these 10,000 families, the researchers randomly selected 5000 families and offered the adults in those families free job training. The remaining 5000 families were not offered such job training. After three years the two groups were compared in regard to the percentage of families who got off welfare. Based on that study, the researchers concluded that the job training program causes (helps) families who are on welfare to get off the welfare program. Do you agree with this conclusion? Explain.

1.7 Summation Notation

Sometimes mathematical notation helps express a mathematical relationship concisely. This section describes the summation notation that is used to denote the sum of values.

Suppose a sample consists of five books, and the prices of these five books are $175, $80, $165, $97, and $88, respectively. The variable, price of a book, can be denoted by \(x\). The prices of the five books can be written as follows:

Price of the first book = \(x_1 = \$175\)

The subscript of \(x\) denotes the number of the book. Similarly,

Price of the second book = \(x_2 = \$80\)
Price of the third book = \(x_3 = \$165\)
Price of the fourth book = \(x_4 = \$97\)
Price of the fifth book = \(x_5 = \$88\)

In this notation, \(x\) represents the price, and the subscript denotes a particular book.

Now, suppose we want to add the prices of all five books.
We obtain

\[ x_1 + x_2 + x_3 + x_4 + x_5 = 175 + 80 + 165 + 97 + 88 = \$605 \]

The uppercase Greek letter \(\Sigma\) (pronounced sigma) is used to denote the sum of all values. Using \(\Sigma\) notation, we can write the foregoing sum as follows:

\[ \Sigma x = x_1 + x_2 + x_3 + x_4 + x_5 = \$605 \]

The notation \(\Sigma x\) in this expression represents the sum of all values of \(x\) and is read as "sigma x" or "sum of all values of x."

EXAMPLE 1-3 Annual Salaries of Workers
(Using summation notation: one variable.)

Annual salaries (in thousands of dollars) of four workers are 75, 90, 125, and 61, respectively. Find
(a) \(\Sigma x\)   (b) \((\Sigma x)^2\)   (c) \(\Sigma x^2\)

Solution Let \(x_1\), \(x_2\), \(x_3\), and \(x_4\) be the annual salaries (in thousands of dollars) of the first, second, third, and fourth worker, respectively. Then,

\[ x_1 = 75, \quad x_2 = 90, \quad x_3 = 125, \quad \text{and} \quad x_4 = 61 \]

(a) \(\Sigma x = x_1 + x_2 + x_3 + x_4 = 75 + 90 + 125 + 61 = 351 = \$351{,}000\)

(b) Note that \((\Sigma x)^2\) is the square of the sum of all \(x\) values. Thus,

\[ (\Sigma x)^2 = (351)^2 = 123{,}201 \]

(c) The expression \(\Sigma x^2\) is the sum of the squares of \(x\) values. To calculate \(\Sigma x^2\), we first square each of the \(x\) values and then sum these squared values. Thus,

\[ \Sigma x^2 = (75)^2 + (90)^2 + (125)^2 + (61)^2 = 5625 + 8100 + 15{,}625 + 3721 = 33{,}071 \]

EXAMPLE 1-4
(Using summation notation: two variables.)

The following table lists four pairs of \(m\) and \(f\) values:

m    12    15    20    30
f     5     9    10    16

Compute the following:
(a) \(\Sigma m\)   (b) \(\Sigma f^2\)   (c) \(\Sigma mf\)   (d) \(\Sigma m^2 f\)

Solution We can write

\[ m_1 = 12, \quad m_2 = 15, \quad m_3 = 20, \quad m_4 = 30 \]
\[ f_1 = 5, \quad f_2 = 9, \quad f_3 = 10, \quad f_4 = 16 \]

(a) \(\Sigma m = 12 + 15 + 20 + 30 = 77\)

(b) \(\Sigma f^2 = (5)^2 + (9)^2 + (10)^2 + (16)^2 = 25 + 81 + 100 + 256 = 462\)

(c) To compute \(\Sigma mf\), we multiply the corresponding values of \(m\) and \(f\) and then add the products as follows:

\[ \Sigma mf = m_1 f_1 + m_2 f_2 + m_3 f_3 + m_4 f_4 = 12(5) + 15(9) + 20(10) + 30(16) = 875 \]

(d) To calculate \(\Sigma m^2 f\), we square each \(m\) value, then multiply the corresponding \(f\) values, and add the products.
Thus,

\[ \Sigma m^2 f = (m_1)^2 f_1 + (m_2)^2 f_2 + (m_3)^2 f_3 + (m_4)^2 f_4 = (12)^2(5) + (15)^2(9) + (20)^2(10) + (30)^2(16) = 21{,}145 \]

The calculations done in parts (a) through (d) to find the values of \(\Sigma m\), \(\Sigma f^2\), \(\Sigma mf\), and \(\Sigma m^2 f\) can be performed in tabular form, as shown in Table 1.4.

Table 1.4

m          f          f²                 mf                 m²f
12         5          5 × 5 = 25         12 × 5 = 60        12 × 12 × 5 = 720
15         9          9 × 9 = 81         15 × 9 = 135       15 × 15 × 9 = 2025
20         10         10 × 10 = 100      20 × 10 = 200      20 × 20 × 10 = 4000
30         16         16 × 16 = 256      30 × 16 = 480      30 × 30 × 16 = 14,400
Σm = 77    Σf = 40    Σf² = 462          Σmf = 875          Σm²f = 21,145

The columns of Table 1.4 can be explained as follows.

1. The first column lists the values of \(m\). The sum of these values gives \(\Sigma m = 77\).
2. The second column lists the values of \(f\). The sum of this column gives \(\Sigma f = 40\).
3. The third column lists the squares of the \(f\) values. For example, the first value, 25, is the square of 5. The sum of the values in this column gives \(\Sigma f^2 = 462\).
4. The fourth column records products of the corresponding \(m\) and \(f\) values. For example, the first value, 60, in this column is obtained by multiplying 12 by 5. The sum of the values in this column gives \(\Sigma mf = 875\).
5. Next, the \(m\) values are squared and multiplied by the corresponding \(f\) values. The resulting products, denoted by \(m^2 f\), are recorded in the fifth column. For example, the first value, 720, is obtained by squaring 12 and multiplying this result by 5. The sum of the values in this column gives \(\Sigma m^2 f = 21{,}145\).

EXERCISES

CONCEPTS AND PROCEDURES

1.37 The following table lists five pairs of \(m\) and \(f\) values.

m     5    10    17    20    25
f    [illegible]

Compute the value of each of the following:
a. \(\Sigma m\)   b. \(\Sigma f^2\)   c. \(\Sigma mf\)   d. \(\Sigma m^2 f\)

APPLICATIONS

1.38 Eight randomly selected customers at a local grocery store spent the following amounts on groceries in a single visit: $216, $184, $35, $92, $14, $175, $11, and $57, respectively. Let \(y\) denote the amount spent by a customer on groceries in a single visit. Find:
a. \(\Sigma y\)   b. \((\Sigma y)^2\)   c. \(\Sigma y^2\)

1.39 A car was filled with 16 gallons of gas on seven occasions. The number of miles that the car was able to travel on each tankful was 387, 414, 404, 396, 410, 422, and 414. Let \(x\) denote the distance traveled on 16 gallons of gas. Find:
a. \(\Sigma x\)   b. \((\Sigma x)^2\)   c. \(\Sigma x^2\)

USES AND MISUSES
JUST A SAMPLE PLEASE

Have you ever read an article or seen an advertisement on television that claims to have found some important result based on a scientific study? Or maybe you have seen some results based on a poll conducted by a research firm. Have you ever wondered how samples are selected for such studies, or just how valid the results really are?

One of the most important considerations in conducting a study is to ensure that a representative sample has been taken in such a way that the sample represents the underlying population of interest. So, if you see a claim that a recent study found that a new drug was a magic bullet for curing cancer, how might we investigate how valid this claim is?

It is very important to consider how the population was sampled in order to understand both how useful the result is and whether the result is really generalizable. In research, a result that is generalizable means that it applies to the general population of interest, not just to the sample that was taken or to only a small segment of the population.

One of the most common sampling strategies is the simple random sample, in which each subject in a population has the same chance of being chosen for the study. Though there are numerous ways to take a valid statistical sample, simple random samples ensure that there is equal representation of members of the population of interest, and therefore we see this type of sampling strategy employed quite often. But how many samples taken for studies are truly random?

It is not uncommon in survey research, for example, to send people out to a mall or to a busy street corner with clipboard and survey in hand, to ask questions of those willing to take the survey. It is also not
Its also not Luncommen for poling fms to call fom a random set of telephone, Glossary 25 rnumbers—in fact, many ef you may have received a calltopartcinate ina pol In ether of these scenarios, it's tkely that those wing to ‘ake the time to arswer a survey may not represent everyone In the population of interest. Adltonaly, if «telephone pol is used, it is impossible to sample those without a telephone or those that do not answer the cal and hence this segment othe population may not be availabe tobe included inthe sample. (One of the major consequences of having a poor sample is the "isk of creating bias in the resus, Bias causes the esuls to potentially . Salaries of football players c, Number of pets owned by families 4. Favorite breed of dog fo each of 20 persons 5. Explain the following typos of samples: random sample, nonran- dom sample, representative sample, convenience sample, judgement sample, quota sample, sample with replacement, and sample without replacement 6. Explain the following concepts: sampling or chance extor, non- ‘sampling or systematic ertor, selection bias, nontesponse bias, response bias, and voluntary response bias. 7. Explain the following sampling techniques: simple random sampling, systematic random sampling, stratified random sampling snd cluster sampling, 8. Explain the following concepts: an observational stay, a designed experimen, randomization, treatment group, conto group, double-blind experiment, and placebo eee. 9. The following tale lists the mide inactas. n est scores of six students Student Score Joba 8 Destiny 7 Colleen 88 can n Joseph a Vieworia 68 measurement, and a Explain the meaning of a member, a variable, ala sot with reference to this table. 410, The sumber of ypes of cereal inthe panties of six households is 6,11, 3, 5, 6, and 2, respectively. Let xhe the number of types of cereal in the pantry of a houtehold. 
Find:
a. \(\Sigma x\)   b. \((\Sigma x)^2\)   c. \(\Sigma x^2\)

11. The following table lists the age (rounded to the nearest year) and price (rounded to the nearest thousand dollars) for each of the seven cars of the same model.

Age (years)    Price (thousand dollars)
[table values garbled in extraction: 284 172 216 BS 63 168 94]

Let age be denoted by \(x\) and price be denoted by \(y\). Calculate
a. \(\Sigma x\)   b. \(\Sigma y\)   c. \(\Sigma x^2\)   d. \(\Sigma xy\)   e. \(\Sigma x^2 y\)   f. \((\Sigma y)^2\)

12. A psychologist needs 10 pigs for a study of the intelligence of pigs. She goes to a pig farm where there are 40 young pigs in a large pen. Assume that these pigs are representative of the population of all pigs. She selects the first 10 pigs she can catch and uses them for her study.
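
The sums asked for in problems 10 and 11 follow the same pattern as Examples 1-3 and 1-4 of Section 1.7. As a minimal sketch, the Python snippet below recomputes the results of those two worked examples. Python and the variable names are assumptions made for illustration only; the text itself does these calculations by hand or in tabular form.

```python
# Example 1-3: annual salaries (in thousands of dollars) of four workers.
x = [75, 90, 125, 61]

sum_x = sum(x)                              # sum of x
square_of_sum = sum_x ** 2                  # (sum of x) squared
sum_of_squares = sum(xi ** 2 for xi in x)   # sum of x squared

# Example 1-4: four pairs of m and f values.
m = [12, 15, 20, 30]
f = [5, 9, 10, 16]

sum_m = sum(m)                                            # sum of m
sum_f_sq = sum(fi ** 2 for fi in f)                       # sum of f squared
sum_mf = sum(mi * fi for mi, fi in zip(m, f))             # sum of m*f
sum_m_sq_f = sum(mi ** 2 * fi for mi, fi in zip(m, f))    # sum of m squared * f

print(sum_x, square_of_sum, sum_of_squares)   # 351 123201 33071
print(sum_m, sum_f_sq, sum_mf, sum_m_sq_f)    # 77 462 875 21145
```

The first printed line shows why the order of operations matters: squaring the sum (123,201) and summing the squares (33,071) give very different values, so \((\Sigma x)^2\) and \(\Sigma x^2\) must not be confused.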
