
COURSE DESCRIPTION

It emphasizes descriptive and inferential statistical procedures through
simulation, sampling design, descriptive statistics, linear regression and
correlation, probability, sampling distributions, hypothesis testing and
confidence intervals, and technology to perform statistical analyses.

Laura Pereira
Salt Lake Community College
Math-1040-406
Spring Semester 2022

INTRO TO STATISTICS
MATH-1040-406
04/28/2022
Professor Robert Woodward

Part I: TB Data Project


Laura Pereira

Part one of my TB Data Project.


The method of sampling that I have chosen is simple random sampling, in
order to select 10 members of the WHO. There are 194 members of the World
Health Organization, so I used GeoGebra to help with the sampling. In
GeoGebra I used RandomBetween with the numbers 1 and 194 to randomly
select the countries that are in my study.

Below are the countries in the order in which they were randomly selected
using GeoGebra:

     Country                 Total TB Incidence   Success Rate   Cohort Size
 1   Tajikistan              84                   91%            5114
 2   Rwanda                  58                   87%            5633
 3   Fiji                    66                   30%            527
 4   Armenia                 23                   82%            485
 5   Burundi                 103                  94%            6753
 6   Belgium                 7.7                  81%            877
 7   Georgia                 70                   85%            1947
 8   Paraguay                48                   67%            2812
 9   United Arab Emirates    0.79                 81%            79
10   Trinidad and Tobago     18                   61%            217



In the next part we were asked to identify the age group which had the
most notified cases for women and the least notified cases for men.
This is for the incidence of the country of Rwanda. For the females, the
age group which had the most notified cases was ages 25-34. I estimate
the total females in this age group to be 400. For the total, I will add
the lengths of all of the columns for the females, starting with the top:
100+125+175+275+400+250+25+75 = 1425. So, the relative frequency
would be 400/1425, which equals 0.281.
For the males, the age group which had the least notified cases was 5-14. I
estimate the total males in this age group to be 30. For the total, I will add
the lengths of all of the columns for the males, starting with the top:
200+475+500+1000+950+475+30+120 = 3750. The relative frequency
would be 30/3750, which equals 0.008.
The next question asks: "if there were 5000 total notified cases for the
two genders, conjecture how many would be in the age categories which I
identified above". I will take each relative frequency and multiply it by
5000. So for the females it would be 0.281 X 5000, which equals 1405, and
for the males it would be 0.008 X 5000, which equals 40.
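
The relative frequency arithmetic above can be checked with a short Python sketch. The column heights are the estimates read off the chart in the text; the function and variable names are only illustrative and are not part of the project.

```python
# Estimated column heights for Rwanda, as listed in the text above.
female_counts = [100, 125, 175, 275, 400, 250, 25, 75]
male_counts = [200, 475, 500, 1000, 950, 475, 30, 120]


def relative_frequency(count, counts):
    """Share of all notified cases that falls in one age group."""
    return count / sum(counts)


# Largest female group and smallest male group, rounded to 3 decimals as in the text.
female_rf = round(relative_frequency(max(female_counts), female_counts), 3)
male_rf = round(relative_frequency(min(male_counts), male_counts), 3)

# Scale each relative frequency to the hypothetical total of 5000 cases.
female_expected = round(female_rf * 5000)
male_expected = round(male_rf * 5000)

print(female_rf, female_expected)
print(male_rf, male_expected)
```
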
Now I am going to carry out a simulation using the Rossman/Chance applet.
I will assume the sample size (n) is 100 and p = 0.85. I will complete 50
trials, which is the number of samples. I will compute a two-sided p-value
for the first country on my list, which is Tajikistan. I will use the
following null and alternative hypotheses:

Ho : p = 0.85
HA : p ≠ 0.85
I am going to choose the level of alpha to be 0.05.

As shown in red in the applet output, the p-value is 0.18. If the p-value
is less than alpha, then we reject the null hypothesis and say that there is
sufficient evidence to conclude that the alternative hypothesis is true. On
the other hand, if the p-value is greater than alpha, then we fail to reject
the null hypothesis and say that there is insufficient evidence to conclude
that the alternative hypothesis is true.
Since I chose the level of alpha to be 0.05 and the p-value of 0.18 is
larger than alpha, we fail to reject the null hypothesis and say that
there is insufficient evidence to conclude that the alternative hypothesis is
true.
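
The kind of simulation described above can be sketched in Python as follows. This is only an illustration, not the applet's actual code: the function name, the fixed seed, and the rule for counting a simulated sample as "at least as extreme" (at least as far from n x p as the observed count) are assumptions. The observed count of 83 successes out of 100 is the value reported for this simulation in Part Two of the project. With only 50 trials the result varies from run to run, so the sketch is not expected to reproduce 0.18 exactly.

```python
import random


def simulated_two_sided_p_value(observed, n, p_null, trials, seed=0):
    """Fraction of simulated samples at least as far from n*p_null as `observed`."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is repeatable
    expected = n * p_null
    observed_distance = abs(observed - expected)
    extreme = 0
    for _ in range(trials):
        # One simulated sample: count the successes among n cases under Ho.
        successes = sum(rng.random() < p_null for _ in range(n))
        if abs(successes - expected) >= observed_distance:
            extreme += 1
    return extreme / trials


alpha = 0.05
p_value = simulated_two_sided_p_value(observed=83, n=100, p_null=0.85, trials=50)
decision = "reject Ho" if p_value < alpha else "fail to reject Ho"
print(p_value, decision)
```
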

Part II: TB Data Project


Laura Pereira

Part two of my TB Data Project.



For this part of my project, I am going to choose two countries out of the ten
on my list to find the true proportion of successful TB treatment. For both
of these countries, I will need to check the following three conditions:
1- Randomness: I randomly selected the 2 countries, Fiji and Paraguay.
2- Independence: The condition for independence is n ≤ 0.05N. The cohort
size of Fiji is 527, which is less than 5% of all people in Fiji, and the
cohort size of Paraguay is 2812, which is less than 5% of all people in
Paraguay.
3- Enough successes and failures: I will use the cohort size as the sample
size (n) and the success rate as the sample proportion. Fiji has a cohort
size of 527 and a success rate of 30%, so I will take 527 X .30, which is
158 successes and 527 - 158 = 369 failures. Paraguay has a cohort size of
2812 and a success rate of 67%, so I will take 2812 X .67, which is 1884
successes and 2812 - 1884 = 928 failures.
Since these conditions are met, I believe that the results of the confidence
intervals in the next step will be valid.
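
The success and failure counts for the third condition can be reproduced with a small Python sketch. The threshold of 10 for "enough" successes and failures is an assumption (a commonly used rule of thumb) and is not stated in the project; the function name is illustrative.

```python
def success_failure_counts(cohort_size, success_rate):
    """Return (successes, failures), rounding successes to a whole case."""
    successes = round(cohort_size * success_rate)
    failures = cohort_size - successes
    return successes, failures


MIN_COUNT = 10  # assumed rule-of-thumb threshold, not taken from the project

for country, n, rate in [("Fiji", 527, 0.30), ("Paraguay", 2812, 0.67)]:
    s, f = success_failure_counts(n, rate)
    print(country, s, f, s >= MIN_COUNT and f >= MIN_COUNT)
```
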
I am going to open up GeoGebra and compute a 95% confidence interval.
After opening GeoGebra, click on the three dots and then open the
Probability Calculator. It may open on the Distribution tab, so click on the
Statistics tab to its right. Then pull down to Z Estimate of a Proportion.
Now enter the confidence level of 0.95 and then enter the number of
successes and the sample size n (the cohort size). The number on top (the
successes) will be the smaller number. This will give a lower limit and an
upper limit. Here are the screenshots for each of my selected countries:
As you can see, the lower limit for Paraguay is 0.6526 and the upper limit is
0.6874. The lower limit for Fiji is 0.2607 and the upper limit is 0.3389.
We are 95% confident that the interval of 0.653 to 0.687 contains the true
proportion of successful TB treatment for Paraguay, and we are 95% confident
that the interval of 0.261 to 0.339 contains the true proportion of
successful TB treatment for Fiji.
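
The limits above can be reproduced outside GeoGebra with a short Python sketch of a one-proportion z interval. It assumes the standard formula p-hat ± z* x sqrt(p-hat(1 - p-hat)/n); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """z interval for one proportion: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


paraguay = proportion_ci(1884, 2812)
fiji = proportion_ci(158, 527)
print([round(x, 4) for x in paraguay])
print([round(x, 4) for x in fiji])
```
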

[Screenshots: PARAGUAY confidence interval; FIJI confidence interval]

According to the PLOS One report, the global threshold for effective
treatment is set at 85%. Based on the confidence intervals of the two
countries that I selected, you may be wondering if 85% is a likely value for
the true success rate of TB treatment. Neither country meets the global
threshold of 85%, because 0.85 is above both intervals: Paraguay is 65% to
69% and Fiji is 26% to 34%.

Now I will carry out a two-sided test of whether p = 0.85. I will use the same
country from Part One that I performed a simulation on, which is Tajikistan.
Tajikistan has a cohort size of 5114 and a success rate of 91%. So, I will
take 5114 X .91 to get the number of successes, which is 4653.74, and I will
round up to 4654. I will perform a hypothesis test using GeoGebra. I will
follow the same steps as for the confidence intervals above, except instead
of using Z Estimate of a Proportion I will use Z Test of a Proportion. In
this test, I am testing the null hypothesis Ho: p = 0.85 against the
alternative hypothesis Ha: p ≠ 0.85. Here is the test that I performed in
GeoGebra:
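
The same test can be sketched in Python. This assumes the usual one-proportion z statistic, with the standard error computed from the null value 0.85; the function name is illustrative and the code is not GeoGebra's implementation.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p_null):
    """Two-sided z test of Ho: p = p_null."""
    p_hat = successes / n
    se = sqrt(p_null * (1 - p_null) / n)  # standard error under Ho
    z = (p_hat - p_null) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_proportion_z_test(4654, 5114, 0.85)
print(round(z, 2), p_value)
```
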

As shown in the GeoGebra output, the test statistic (Z) is 12.03 and the
p-value (P) is displayed as 0. I chose the level of alpha to be .05. Since
the p-value is less than alpha (0 < 0.05), we will REJECT Ho and state that
"We have sufficient evidence to conclude that the TB treatment success rate
of the country of Tajikistan is different from the global threshold of 85%."
Because the test statistic is positive, the sample success rate of 91% is
above the threshold.

Now I will compare this hypothesis test result with the results from the
Rossman/Chance applet simulation from Part One of my project.

Below is the Rossman/Chance result. The sample size was only 100, and the
number of successes was 83 out of 100. The p-value was 0.18 and the level
of alpha that I chose for this test was .05. So, in this case the p-value
(0.18) > alpha (.05).

Both tests use the same hypotheses about Tajikistan, but they lead to
different decisions. The simulation gave a p-value of 0.18, which is greater
than alpha, so we failed to reject Ho. The hypothesis test gave a p-value of
approximately 0, which is less than alpha, so we rejected Ho. The hypothesis
test had a much larger sample size than the Rossman/Chance simulation: 5114
compared to 100. In the hypothesis test there were only 460 failures out of
the sample size of 5114, while the Rossman/Chance simulation had 17 failures
out of the sample size of 100. As for whether one test is more valid than
the other, I think that they are both valid based on what they are testing
and on the sample size that was used for each test.



Part III: TB Data Project


Laura Pereira

Part three of my TB Data Project.

Country                 Total TB Incidence
Tajikistan              84
Rwanda                  58
Fiji                    66
Armenia                 23
Burundi                 103
Belgium                 7.7
Georgia                 70
Paraguay                48
United Arab Emirates    0.79
Trinidad and Tobago     18

[Boxplot: Total TB Incidence in 10 Sampled Countries; axis: Total TB Incidence]

The graph above is a boxplot which summarizes the Total TB Incidence of the 10
sampled countries. The values range from 0.79 to 103. I put the numbers on a
spreadsheet in GeoGebra and then highlighted the list to form a One Variable
Analysis. This boxplot has a shape that is approximately symmetric, so the best
measure of the center is the mean, which is 47.85, and the best measure of the
variability is the standard deviation, which is 34.33. The median is 53. We get
this number by placing the numbers in ascending order and finding the center
number. Since there are 2 center numbers, which are 48 and 58, we add the two
numbers together to get 106, and half of 106 is 53. Then we want to find the
spread of the boxplot, which is the IQR (Interquartile Range). We get this
number by taking Q3-Q1. Q3 is the 75th percentile, which is the median of the
second half of the data. Q1 is the 25th percentile, which is the median of the
first half of the data. Numbering from smallest to largest gives: 0.79 7.7 18 23
48 58 66 70 84 103. So 18 is Q1 and 70 is Q3. We take Q3-Q1 to get the IQR,
which is 70-18 = 52, which means that the spread is 52.

My five-number summary is: 0.79, 18, 53, 70, 103

Fences: The formula for the lower fence is Q1 - 1.5 x IQR = 18 - 1.5 x 52 = -60

The formula for the upper fence is Q3 + 1.5 x IQR = 70 + 1.5 x 52 = 148
There aren't any outliers in my TB Incidence data, because every value lies
between the two fences. An outlier is an observation that is extreme relative to
the rest of the data.
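
The quartiles and fences can be checked with a short Python sketch. It uses the same rule as the text (Q1 and Q3 are the medians of the lower and upper halves of the sorted data); other software may use a different quartile rule, so this is only one reasonable convention.

```python
from statistics import median

# Total TB Incidence for the 10 sampled countries, sorted ascending.
incidence = sorted([84, 58, 66, 23, 103, 7.7, 70, 48, 0.79, 18])

half = len(incidence) // 2
lower_half = incidence[:half]   # values below the median position
upper_half = incidence[-half:]  # values above the median position

q1, med, q3 = median(lower_half), median(incidence), median(upper_half)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value outside the fences would be flagged as an outlier.
outliers = [x for x in incidence if x < lower_fence or x > upper_fence]

print(min(incidence), q1, med, q3, max(incidence))
print(iqr, lower_fence, upper_fence, outliers)
```
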

Now I am going to compute a confidence interval. In order to do this I first
need to check the 3 conditions for using it:

(1) Random Sample or Randomized Experiment: This is a random sample of the 10
countries which I randomly selected.
(2) n ≤ 0.05N. I will add all 10 of the cohort sizes from the 10
countries together to get n:
5114+5633+527+485+6753+877+1947+2812+79+217=24,444
and to get N I am supposed to use the population of the whole world.
24,444 is well below 5% of the population of the whole world.
(3) Population normal or n ≥ 30. 24,444 > 30. In addition, the boxplot of the
10 incidence values is approximately symmetric with no outliers.
So, the three conditions are met and I will continue to compute a confidence
interval with GeoGebra. I am supposed to use a 95% confidence level and use
the sample size as 10.

Here is a copy of the confidence interval:

I used the mean from GeoGebra, which was 47.85; s is the sample standard
deviation, which is 34.33; and the number of countries is 10, so that is
entered for N. Then we use the Lower Limit and the Upper Limit as the ends
of the interval.

Now I will interpret this confidence interval:

We are 95% confident that the interval between 23.29 and 72.41 captures the
true mean total TB incidence of the world.
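
The interval can be reproduced with a short Python sketch of a one-sample t interval, mean ± t* x s/sqrt(n). Python's standard library has no t distribution, so the critical value 2.262 (95% confidence, 9 degrees of freedom) is hard-coded from a t-table; that constant and the function name are assumptions added for illustration.

```python
from math import sqrt

# t-table critical value for 95% confidence with n - 1 = 9 degrees of freedom.
T_STAR_DF9_95 = 2.262


def t_interval(mean, s, n, t_star):
    """One-sample t interval: mean +/- t* * s / sqrt(n)."""
    margin = t_star * s / sqrt(n)
    return mean - margin, mean + margin


lower, upper = t_interval(47.85, 34.33, 10, T_STAR_DF9_95)
print(round(lower, 2), round(upper, 2))
```
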

In the next step I am going to perform a two-sided hypothesis test. I will use
GeoGebra in order to do this. I will test whether the global incidence rate of
TB in the world is different from the value of 132 people reported by the CDC.
I will still use n as 10 countries. I have already checked the 3 conditions in
the step above, so I do not need to do this again because they are the same.
Since I am performing a hypothesis test on a mean, I will need to do a T Test
of a Mean. I will use the same mean and standard deviation as in the confidence
interval above: the mean is 47.85 and the sample standard deviation is 34.33.
So, the null hypothesis will be Ho : TB = 132 and the alternative hypothesis
will be Ha : TB ≠ 132.

Here is a screenshot of the results:



The test statistic for this test is t, which is -7.7514, and the p-value (P)
is displayed as 0. I am using .01 as my selected value of alpha. So, we will
reject Ho if the p-value is less than alpha and we will fail to reject Ho if
the p-value is greater than alpha.

Since 0 is less than .01, we will reject Ho.

Here is our conclusion: Reject Ho. There is sufficient evidence to conclude that
the global incidence of TB in the world is different than the reported value of 132.
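
The test statistic can be checked with a short Python sketch, t = (mean - 132) / (s / sqrt(n)). Because Python's standard library has no t distribution, the sketch makes the decision by comparing |t| with the t-table critical value 3.250 (two-sided alpha = .01, 9 degrees of freedom) instead of computing the p-value; that constant and the names are assumptions added for illustration.

```python
from math import sqrt

# t-table critical value for a two-sided test with alpha = 0.01 and 9 degrees of freedom.
T_CRIT_DF9_ALPHA01 = 3.250


def t_statistic(sample_mean, s, n, mu_null):
    """One-sample t statistic for Ho: mu = mu_null."""
    return (sample_mean - mu_null) / (s / sqrt(n))


t = t_statistic(47.85, 34.33, 10, 132)
reject = abs(t) > T_CRIT_DF9_ALPHA01  # equivalent to p-value < alpha
print(round(t, 4), "reject Ho" if reject else "fail to reject Ho")
```
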

Part IV: TB Data Project


Laura Pereira

Part four of my TB Data Project.

Treatment Success/Failure by Member of WHO

Member of WHO           Success   Failure   TOTAL
Tajikistan              91        9         100
Rwanda                  87        13        100
Fiji                    30        70        100
Armenia                 82        18        100
Burundi                 94        6         100
Belgium                 81        19        100
Georgia                 85        15        100
Paraguay                67        33        100
United Arab Emirates    81        19        100
Trinidad and Tobago     61        39        100
TOTAL                   759       241       1000

The table above is a contingency table. We focused on these in Chapter 5. A
contingency table is a table used to summarize data collected to analyze the
relationship between two categorical variables. The table above consists of the
members of WHO that I selected in Part One of my project.

I am going to be computing some probabilities below:

The first is: the probability that a randomly selected case is taken from the 5th
"or" the 6th member of WHO in my table above. The two members from my table above
would be Burundi and Belgium. The word "or" implies addition.
100/1000 + 100/1000 = 200/1000 = 0.2
The second is: the probability that a randomly selected case is taken from the 5th
member of WHO in my table or is a failure.
100/1000 + 241/1000 - 6/1000 = 335/1000 = 0.335
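The third is: the probability that a randomly selected case is taken from the 5th
member of WHO in my table "and" is a failure. This will require using the
multiplication rule since we are using the word "and". Being from Burundi and
being a failure are not independent events, so I multiply the probability of
Burundi by the probability of a failure given Burundi.
100/1000 x 6/100 = 6/1000 = 0.006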
The fourth is: the probability that a randomly selected case is from the 6th member
of WHO in my table "given" it is a failure.

The word "given" suggests that: P(A | B) = P(A and B) / P(B)
19/241 = 0.079
The fifth is: the probability that 3 randomly selected cases (without replacement)
are all successes from the 8th member of WHO in my table.

Without replacement means that the selections are dependent and that the number of
items in the pool will progressively decrease.
67/1000 x 66/999 x 65/998 = 287,430/997,002,000 = 0.000288
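
The five probabilities can be recomputed directly from the contingency table with a short Python sketch. Exact fractions are used so the counts stay visible; the dictionary layout and variable names are only illustrative.

```python
from fractions import Fraction

# (successes, failures) per member of WHO, in the order of the table.
table = {
    "Tajikistan": (91, 9), "Rwanda": (87, 13), "Fiji": (30, 70),
    "Armenia": (82, 18), "Burundi": (94, 6), "Belgium": (81, 19),
    "Georgia": (85, 15), "Paraguay": (67, 33),
    "United Arab Emirates": (81, 19), "Trinidad and Tobago": (61, 39),
}
total = sum(s + f for s, f in table.values())
failures = sum(f for _, f in table.values())
cases = {name: s + f for name, (s, f) in table.items()}

# 1) 5th or 6th member: disjoint rows, so the probabilities add.
p_5th_or_6th = Fraction(cases["Burundi"] + cases["Belgium"], total)
# 2) 5th member or failure: subtract the overlap (Burundi failures).
p_5th_or_failure = Fraction(cases["Burundi"] + failures - table["Burundi"][1], total)
# 3) 5th member and failure: the Burundi/Failure cell.
p_5th_and_failure = Fraction(table["Burundi"][1], total)
# 4) 6th member given failure: Belgium failures out of all failures.
p_6th_given_failure = Fraction(table["Belgium"][1], failures)
# 5) Three Paraguay successes in a row, drawn without replacement.
s8 = table["Paraguay"][0]
p_three = Fraction(s8, total) * Fraction(s8 - 1, total - 1) * Fraction(s8 - 2, total - 2)

for p in (p_5th_or_6th, p_5th_or_failure, p_5th_and_failure, p_6th_given_failure, p_three):
    print(p, float(p))
```
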
For the next step in my project, I will compute a confidence interval using
GeoGebra. I will use the 5th and 6th members of WHO in my table. I will use a 95%
confidence interval for the difference in proportions of successful treatments. I
will be using a sample size of 100 for each country. I am also assuming that all
of the conditions have been met to compute this confidence interval.

As shown in the GeoGebra output, we are 95% confident that the interval of 0.0401
to 0.2199 contains the true difference in the proportions of successful treatments
between Burundi and Belgium.
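
The interval can be reproduced with a short Python sketch of a two-proportion z interval. It assumes the standard unpooled standard error sqrt(p1(1 - p1)/n1 + p2(1 - p2)/n2); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """z interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * se, diff + z_star * se


# Burundi: 94 successes out of 100; Belgium: 81 successes out of 100.
lower, upper = two_proportion_ci(94, 100, 81, 100)
print(round(lower, 4), round(upper, 4))
```
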

Now I will be conducting a hypothesis test using the data from the 5th and 6th
members of WHO. I will be testing whether there is a difference in the proportions
of the two members. I will be using 0.01 as my selected value of alpha. I am also
assuming that all of the conditions to perform this hypothesis test are met.

First, I will clearly state the two hypotheses using symbolic notation:
Ho : pBurundi = pBelgium
HA : pBurundi ≠ pBelgium

Here is my screenshot below of the Hypothesis test.



The test statistic is represented by the letter "Z", which is 2.7795, and the
p-value is represented by the letter "P", which is 0.0054. I used 0.01 as alpha.
Since the p-value is less than alpha, we will REJECT Ho. We have SUFFICIENT
evidence to claim that the alternative hypothesis is true: the proportion of
successful treatments in Burundi is not equal to the proportion in Belgium.
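
The test can be sketched in Python as a two-proportion z test. It assumes the usual pooled standard error under Ho (both countries share one success proportion); the function name is illustrative and the code is not GeoGebra's implementation.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z test of Ho: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Burundi: 94 successes out of 100; Belgium: 81 successes out of 100.
z, p_value = two_proportion_z_test(94, 100, 81, 100)
alpha = 0.01
print(round(z, 4), round(p_value, 4), p_value < alpha)
```
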
