NAY Eset Assignment - biostatistics final
June, 2024
The following ungrouped data array gives the ages at which individuals stopped smoking:
30 34 35 37 37 38 38 38 38 39 39 40 40 42 42
43 43 43 43 43 43 44 44 44 44 44 44 44 45 45
45 46 46 46 46 46 46 47 47 47 47 47 47 48 48
48 48 48 48 48 49 49 49 49 49 49 49 50 50 50
50 50 50 50 50 51 51 51 51 52 52 52 52 52 52
53 53 53 53 53 53 53 53 53 53 53 53 53 53 53
53 53 54 54 54 54 54 54 54 54 54 54 54 55 55
55 56 56 56 56 56 56 57 57 57 57 57 57 57 58
58 59 59 59 59 59 59 60 60 60 60 61 61 61 61
61 61 61 61 61 61 61 62 62 62 62 62 62 62 63
63 64 64 64 64 64 64 65 65 66 66 66 66 66 66
67 68 68 68 69 69 69 70 71 71 71 71 71 71 71
72 73 75 76 77 78 78 78 82
From the data given above, calculate the following manually:
Mean of the ungrouped data
Answer:-
Summing all the observations written in the table gives 10,401.
Dividing this sum by the number of observations (N=189) gives
Mean = 10,401/189
= 55.03
Answer:-
Since the data is already arranged in ascending order, and the number of observations (N=189) is odd,
the median is the middle value, that is, the (N+1)/2 = 95th observation, which lies between the 94th and
96th observations. This gives a median of 54.
Answer:-
Mode is the observation that appears most frequently. To find it, we count the frequency of each
observation in the ordered data.
The observation 53 appears 17 times. All other observations appear fewer than 17 times. Thus,
the observation 53 is the modal value of the ungrouped data.
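As a cross-check on the manual mean, median, and mode, here is a minimal Python sketch. It assumes the 189 ages are re-encoded as a value-to-count dictionary; the name FREQ and the other variable names are illustrative and not part of the assignment. Only the standard library is used.

```python
import statistics

# Ages from the data array stored as value: count (an illustrative
# re-encoding of the 189 listed observations).
FREQ = {30: 1, 34: 1, 35: 1, 37: 2, 38: 4, 39: 2, 40: 2, 42: 2, 43: 6, 44: 7,
        45: 3, 46: 6, 47: 6, 48: 7, 49: 7, 50: 8, 51: 4, 52: 6, 53: 17, 54: 11,
        55: 3, 56: 6, 57: 7, 58: 2, 59: 6, 60: 4, 61: 11, 62: 7, 63: 2, 64: 6,
        65: 2, 66: 6, 67: 1, 68: 3, 69: 3, 70: 1, 71: 7, 72: 1, 73: 1, 75: 1,
        76: 1, 77: 1, 78: 3, 82: 1}

# Expand the counts back into the sorted list of observations.
ages = [age for age, count in sorted(FREQ.items()) for _ in range(count)]

total = sum(ages)                  # sum of all observations
n = len(ages)                      # number of observations
mean = total / n                   # arithmetic mean
median = statistics.median(ages)   # middle value (n is odd)
mode = statistics.mode(ages)       # most frequent value

print(f"sum={total}, N={n}, mean={mean:.2f}, median={median}, mode={mode}")
```

Run as written, this should reproduce the values obtained by hand: a sum of 10,401 over 189 observations, a mean of 55.03, a median of 54, and a mode of 53.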
Answer:-
As can be seen from the data arranged in ascending order, the minimum value, which appears
first, is 30, and the maximum value, which appears last, is 82.
Answer:-
Range is the difference between the maximum and minimum values. In our case, the maximum
value is 82 and the minimum value is 30, so
Range = 82 - 30 = 52.
Answer:-
The IQR describes the middle 50% of values when ordered from lowest to highest. To find the
interquartile range (IQR), first find the median (middle value) of the lower and upper halves of the
data. These values are quartile 1 (Q1) and quartile 3 (Q3). The IQR is the difference between Q3
and Q1.
IQR = Q3 - Q1
To get the IQR, first divide the data into two halves at the median. The median, as calculated
above, is the 95th observation, which is 54. This leaves 94 observations in each half, and we
identify the median of each half.
The median of the lower half is Q1 = 48, and the median of the upper half is Q3 = 61.5.
The IQR becomes
IQR = Q3 - Q1
IQR = 61.5 - 48
IQR = 13.5
Dividing the IQR by 2 gives the semi-interquartile range (quartile deviation):
(Q3 - Q1)/2 = 13.5/2 = 6.75
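The minimum, maximum, range, and quartiles can be checked with a short Python sketch. It assumes the median-of-halves method described in the text (the median itself is left out of both halves when N is odd); other quartile conventions can give slightly different values. The name FREQ and the variable names are illustrative.

```python
import statistics

# Ages stored as value: count (an illustrative re-encoding of the data array).
FREQ = {30: 1, 34: 1, 35: 1, 37: 2, 38: 4, 39: 2, 40: 2, 42: 2, 43: 6, 44: 7,
        45: 3, 46: 6, 47: 6, 48: 7, 49: 7, 50: 8, 51: 4, 52: 6, 53: 17, 54: 11,
        55: 3, 56: 6, 57: 7, 58: 2, 59: 6, 60: 4, 61: 11, 62: 7, 63: 2, 64: 6,
        65: 2, 66: 6, 67: 1, 68: 3, 69: 3, 70: 1, 71: 7, 72: 1, 73: 1, 75: 1,
        76: 1, 77: 1, 78: 3, 82: 1}
ages = [age for age, count in sorted(FREQ.items()) for _ in range(count)]
n = len(ages)

minimum, maximum = ages[0], ages[-1]   # first and last values of the sorted data
value_range = maximum - minimum

# Split at the median; with odd n the median is excluded from both halves.
half = n // 2
lower_half = ages[:half]
upper_half = ages[half + 1:] if n % 2 else ages[half:]

q1 = statistics.median(lower_half)     # first quartile
q3 = statistics.median(upper_half)     # third quartile
iqr = q3 - q1                          # interquartile range
semi_iqr = iqr / 2                     # semi-interquartile range

print(minimum, maximum, value_range, q1, q3, iqr, semi_iqr)
```

With this data the sketch gives a range of 52, Q1 = 48, Q3 = 61.5, an IQR of 13.5, and a semi-interquartile range of 6.75.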
1.7) 90th percentile of the ungrouped data
Answer:-
A percentile is the value below which a given percentage of the observations fall when the data
is arranged in ascending order. Thus, the 90th percentile is the value located 90% of the way
through the ordered data.
n=189
Position of P90 = 90(n+1)/100
= 90(189+1)/100
= 90(190)/100
= 17100/100
= 171
The 171st observation in the ordered data is 69, so
P90 = 69
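The percentile calculation can be sketched in Python as follows. The function assumes the p(n+1)/100 position rule used in the text, with linear interpolation added for positions that are not whole numbers (that interpolation step is an assumption, since the text only needs a whole-number position). The names percentile and FREQ are illustrative.

```python
# Ages stored as value: count (an illustrative re-encoding of the data array).
FREQ = {30: 1, 34: 1, 35: 1, 37: 2, 38: 4, 39: 2, 40: 2, 42: 2, 43: 6, 44: 7,
        45: 3, 46: 6, 47: 6, 48: 7, 49: 7, 50: 8, 51: 4, 52: 6, 53: 17, 54: 11,
        55: 3, 56: 6, 57: 7, 58: 2, 59: 6, 60: 4, 61: 11, 62: 7, 63: 2, 64: 6,
        65: 2, 66: 6, 67: 1, 68: 3, 69: 3, 70: 1, 71: 7, 72: 1, 73: 1, 75: 1,
        76: 1, 77: 1, 78: 3, 82: 1}
ages = [age for age, count in sorted(FREQ.items()) for _ in range(count)]


def percentile(sorted_values, p):
    """Return the p-th percentile using the position rule p(n+1)/100."""
    n = len(sorted_values)
    position = p * (n + 1) / 100          # 1-based position in the ordered data
    rank = int(position)                  # whole part of the position
    fraction = position - rank            # fractional part, used to interpolate
    if rank < 1:
        return sorted_values[0]
    if rank >= n:
        return sorted_values[-1]
    below = sorted_values[rank - 1]       # value at the whole-number position
    above = sorted_values[rank]           # next value up
    return below + fraction * (above - below)


position_90 = 90 * (len(ages) + 1) / 100
p90 = percentile(ages, 90)
print(position_90, p90)
```

For the 90th percentile the position is 171 and the value at that position is 69, in agreement with the manual result.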
The sum of the squared deviations of the 189 observations from the mean is 18,479.81. Dividing
this by the number of observations (N=189) gives the population variance:
σ² = 18,479.81/189
σ² = 97.78
The standard deviation, S.D., is the square root of the variance.
Thus, taking the square root of the population variance, which is 97.78, gives
SD = 9.89
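The variance and standard deviation can be verified with the Python sketch below. It follows the text in dividing by N (population formulas) and also shows the N-1 (sample) versions for comparison; the sample versions are an addition for illustration and are not used in the text. The name FREQ and the variable names are illustrative.

```python
import math

# Ages stored as value: count (an illustrative re-encoding of the data array).
FREQ = {30: 1, 34: 1, 35: 1, 37: 2, 38: 4, 39: 2, 40: 2, 42: 2, 43: 6, 44: 7,
        45: 3, 46: 6, 47: 6, 48: 7, 49: 7, 50: 8, 51: 4, 52: 6, 53: 17, 54: 11,
        55: 3, 56: 6, 57: 7, 58: 2, 59: 6, 60: 4, 61: 11, 62: 7, 63: 2, 64: 6,
        65: 2, 66: 6, 67: 1, 68: 3, 69: 3, 70: 1, 71: 7, 72: 1, 73: 1, 75: 1,
        76: 1, 77: 1, 78: 3, 82: 1}
ages = [age for age, count in sorted(FREQ.items()) for _ in range(count)]

n = len(ages)
mean = sum(ages) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in ages)

pop_variance = sum_sq_dev / n             # divides by N, as in the text
pop_sd = math.sqrt(pop_variance)

sample_variance = sum_sq_dev / (n - 1)    # divides by N-1 (for comparison only)
sample_sd = math.sqrt(sample_variance)

print(f"SS={sum_sq_dev:.2f}, variance={pop_variance:.2f}, SD={pop_sd:.2f}")
print(f"sample variance={sample_variance:.2f}, sample SD={sample_sd:.2f}")
```

The first line of output should match the manual figures: a sum of squared deviations of 18,479.81, a variance of 97.78, and a standard deviation of 9.89.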
The standard error represents the standard deviation of the sampling distribution of the sample
mean. It tells us how much the sample mean is likely to vary from the population mean.
To calculate the standard error, divide the standard deviation by the square root of the number
of observations:
Standard error = SD/sqrt N = 9.89/sqrt 189 = 9.89/13.75 = 0.72
1.11) 95% Confidence interval of the population age at which smoking stops
(a) Upper limit = 55.03+1.96(9.89/sqrt 189)
=55.03+1.96(9.89/13.75)
= 55.03+1.41
=56.44
(b) Lower limit = 55.03-1.96(9.89/sqrt 189)
=55.03-1.96(9.89/13.75)
= 55.03-1.41
= 53.62
95% CI = (53.62, 56.44)
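The standard error and the 95% confidence interval can be reproduced from the summary values with a few lines of Python. This sketch assumes the rounded mean (55.03) and standard deviation (9.89) reported in the text and the critical value z = 1.96; the variable names are illustrative.

```python
import math

n = 189        # number of observations
mean = 55.03   # mean reported in the text
sd = 9.89      # standard deviation reported in the text
z = 1.96       # critical value for 95% confidence, as used in the text

standard_error = sd / math.sqrt(n)   # SD divided by the square root of N
margin = z * standard_error          # half-width of the interval
lower = mean - margin
upper = mean + margin

print(f"SE={standard_error:.2f}, margin={margin:.2f}, CI=({lower:.2f}, {upper:.2f})")
```

The output should agree with the manual calculation: a standard error of 0.72, a margin of 1.41, and an interval of (53.62, 56.44).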
Thus, the mean (average) value, which is one measure of central tendency, of the ungrouped data
is 55.03.
Thus, the median (middle) value when arranged in ascending order, which is one measure of
central tendency, of the ungrouped data is 54.
Thus, the mode (most frequently appearing) value, which is one measure of central tendency, of
the ungrouped data is 53.
Thus, the Minimum (lowest) value and the maximum (highest) values of the ungrouped data are
30 and 82 respectively.
Thus, the range, which is the difference between the maximum (82) and the minimum (30), is 52.
Thus, the IQR (Interquartile Range), which is the difference between the third quartile (the point
that divides the upper half of the dataset into two) and the first quartile (the point that divides
the lower half of the dataset into two), is 61.5 - 48 = 13.5; half of this, 6.75, is the
semi-interquartile range.
Thus, when the whole dataset is arranged in ascending order, about 90% of the observations lie
at or below 69, which is the 90th percentile.
Thus, the variance, which is the average squared deviation of the data points from the mean, is
97.78.
Thus, the standard deviation is 9.89, meaning the data points typically deviate from the mean by
about 9.89 years.
The standard error is smaller than the sample standard deviation, indicating that the sample mean
is a more precise estimate of the population mean compared to individual data points.
With a sample size of 189, we can expect the sample mean to be within approximately 0.72
years of the population mean 68% of the time (based on the 68-95-99.7 rule).
If we were to take multiple samples of size 189 from the population, the sample means would be
distributed around the population mean with a standard deviation of 0.72.
The 95% confidence interval for the population mean age at which smoking stops is (53.62, 56.44).
We are 95% confident that the population mean lies within this range: if the sampling were
repeated many times, about 95% of the intervals constructed this way would contain the
population mean, and only about 5% would miss it.
Question 2. From the same data above, using SPSS, after entering the data directly into SPSS,
show the steps in your descriptive analysis
Answer:-
USING SPSS
Check Quartiles, Percentiles, Mean, Median, Mode, Standard Deviation, Variance, Range,
Minimum, Maximum, Standard Error Mean, Skewness, and Kurtosis as needed, then click OK.
The minimum of the ungrouped data is 30, and the maximum of the ungrouped data is 82.
The Range of the ungrouped data is 52.00.
There are several statistical measures that can be used to check the normality of a dataset. Here
are some of the most common ones:
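1) Skewness and Kurtosis:
Skewness measures the asymmetry of the distribution. A normal distribution has a
skewness of 0.
Kurtosis measures the "peakedness" of the distribution. A normal distribution has a
kurtosis of 3.
Values of skewness and kurtosis significantly different from 0 and 3, respectively, can
indicate a departure from normality.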
2) Shapiro-Wilk Test:
It compares the sample data to a normally distributed set of values with the same mean
and standard deviation.
The test statistic (W) ranges from 0 to 1, with values closer to 1 indicating a normal
distribution.
A p-value less than the chosen significance level (e.g., 0.05) suggests that the data is not
normally distributed.
3) Kolmogorov-Smirnov Test:
The Kolmogorov-Smirnov (K-S) test is another formal statistical test for normality.
It compares the cumulative distribution function (CDF) of the sample data to the CDF of
a normal distribution.
The test statistic (D) represents the maximum difference between the two CDFs.
A p-value less than the chosen significance level indicates a departure from normality.
4) Anderson-Darling Test:
The Anderson-Darling test is a variation of the K-S test that is more sensitive to
deviations from normality in the tails of the distribution.
The test statistic (A^2) measures the difference between the sample data and a normal
distribution.
A p-value less than the chosen significance level suggests that the data is not normally
distributed.
The normal probability plot, or Q-Q plot, compares the quantiles of the sample data to the
quantiles of a normal distribution.
If the data is normally distributed, the points on the Q-Q plot should align closely to a
straight line.
Deviations from the straight line indicate departures from normality, such as skewness or
kurtosis.
When checking the normality of a dataset, it's generally a good idea to use a combination of
these statistical measures, as well as visual inspections like histograms and Q-Q plots. This
provides a more comprehensive assessment of the normality of the data, which is important for
many statistical analyses and modeling techniques.
The choice of which specific normality test to use may depend on factors such as the sample
size, the expected distribution, and the specific research question or analytical needs.
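To make the skewness and kurtosis measures concrete, the sketch below computes them from central moments. It assumes the simple moment-based (population) definitions, under which a normal distribution has skewness 0 and kurtosis 3; statistical packages often apply small-sample corrections and may report excess kurtosis (kurtosis minus 3), so their output need not match these formulas exactly. The function names and the toy data are hypothetical.

```python
def central_moment(values, k):
    """Return the k-th central moment: the mean of (x - mean)**k."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** k for x in values) / n


def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based kurtosis: fourth central moment over variance**2."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


symmetric = [1, 2, 3, 4, 5]      # hypothetical symmetric toy data
right_tailed = [1, 1, 1, 10]     # hypothetical data with a long right tail

print(skewness(symmetric), kurtosis(symmetric))
print(skewness(right_tailed))
```

The symmetric toy data has a skewness of zero, while the data with a long right tail has a positive skewness. The same functions can be applied to the list of 189 ages.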
2.12) 95% Confidence interval of the population age at which smoking stops and interpret it
The 95% Confidence Interval (CI) for the population mean age at which smoking stops is from
53.62 to 56.44.
We are 95% confident that the population mean age at which smoking stops lies between 53.62
and 56.44; in repeated sampling, about 5% of such intervals would fail to contain the population
mean.
Overall, it can be seen that the manually calculated results are the same as the SPSS
calculated results.
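Question 3. After converting the above ungrouped data into a grouped data array, present the
data in the following data presentation styles
The number of classes (K) and the class width (W) are given by
K=1+3.322(log n)
W=(L-S)/K
where n is the number of observations, L is the largest observation, and S is the smallest
observation.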
In our case, the number of classes (K) can be computed using Sturges' rule as:
K= 1 +3.322Log(189)
K=1+3.322*2.28
K=1+7.57
K=8.57
K≈9 (rounded up to the next whole number)
Then, the width of each class can be calculated:
W=(Largest observation – Smallest observation)/K
W=(82-30)/9
W=52/9
W=5.78
Thus, rounding up, the class width is taken as 6.
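The number of classes and the class width can be checked with a short Python sketch of Sturges' rule. It assumes a base-10 logarithm and rounding up to the next whole number, as in the calculation above; the variable names are illustrative. Because the sketch does not round the logarithm to 2.28 first, the unrounded K comes out as about 8.56 rather than 8.57, which does not change the final result.

```python
import math

n = 189          # number of observations
smallest = 30    # smallest observation
largest = 82     # largest observation

# Sturges' rule: K = 1 + 3.322 * log10(n)
k_exact = 1 + 3.322 * math.log10(n)
k = math.ceil(k_exact)                     # round up to a whole number of classes

# Class width: W = (largest - smallest) / K
width_exact = (largest - smallest) / k
width = math.ceil(width_exact)             # round up to a convenient whole width

print(f"K={k_exact:.2f} -> {k}, W={width_exact:.2f} -> {width}")
```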
Based on these, we get the regrouped dataset as follows:
S.No.  Class Limit  Class boundary  Class mark  Frequency  RF(%)   Cumulative RF(%)
1      30-35        29.5-35.5       32.5        3          1.59    1.59
2      36-41        35.5-41.5       38.5        10         5.29    6.88
3      42-47        41.5-47.5       44.5        30         15.87   22.75
4      48-53        47.5-53.5       50.5        49         25.93   48.68
5      54-59        53.5-59.5       56.5        35         18.52   67.20
6      60-65        59.5-65.5       62.5        32         16.93   84.13
7      66-71        65.5-71.5       68.5        21         11.11   95.24
8      72-77        71.5-77.5       74.5        5          2.65    97.88
9      78-82        77.5-82.5       80          4          2.12    100.00
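The grouped table can be rebuilt from the raw ages with the Python sketch below. It assumes nine classes of width 6 starting at 30, with the last class cut off at the largest observation (82) so that it matches the 78-82 class in the table. The name FREQ and the variable names are illustrative.

```python
# Ages stored as value: count (an illustrative re-encoding of the data array).
FREQ = {30: 1, 34: 1, 35: 1, 37: 2, 38: 4, 39: 2, 40: 2, 42: 2, 43: 6, 44: 7,
        45: 3, 46: 6, 47: 6, 48: 7, 49: 7, 50: 8, 51: 4, 52: 6, 53: 17, 54: 11,
        55: 3, 56: 6, 57: 7, 58: 2, 59: 6, 60: 4, 61: 11, 62: 7, 63: 2, 64: 6,
        65: 2, 66: 6, 67: 1, 68: 3, 69: 3, 70: 1, 71: 7, 72: 1, 73: 1, 75: 1,
        76: 1, 77: 1, 78: 3, 82: 1}
ages = [age for age, count in sorted(FREQ.items()) for _ in range(count)]

n = len(ages)
start, width, num_classes = 30, 6, 9

rows = []
cumulative = 0
for i in range(num_classes):
    lower = start + i * width
    # Cap the last class at the largest observation, as in the table.
    upper = min(lower + width - 1, max(ages))
    frequency = sum(1 for x in ages if lower <= x <= upper)
    cumulative += frequency
    rows.append((
        f"{lower}-{upper}",              # class limits
        f"{lower - 0.5}-{upper + 0.5}",  # class boundaries
        (lower + upper) / 2,             # class mark (midpoint)
        frequency,
        100 * frequency / n,             # relative frequency in %
        100 * cumulative / n,            # cumulative relative frequency in %
    ))

for limits, bounds, mark, freq, rf, crf in rows:
    print(f"{limits:7} {bounds:10} {mark:5.1f} {freq:4d} {rf:6.2f} {crf:7.2f}")
```

The frequencies produced this way are 3, 10, 30, 49, 35, 32, 21, 5, and 4, which sum to 189 and match the table.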
3.1) A frequency distribution
Based on the above calculation, the Frequency is demonstrated in the following table.
3.2) A relative frequency distribution
The Relative Frequency (RF) in % is shown in the last column as follows:
S.No.  Class Limit  Class boundary  Class mark  Frequency  RF(%)
1      30-35        29.5-35.5       32.5        3          1.59
2      36-41        35.5-41.5       38.5        10         5.29
3      42-47        41.5-47.5       44.5        30         15.87
4      48-53        47.5-53.5       50.5        49         25.93
5      54-59        53.5-59.5       56.5        35         18.52
6      60-65        59.5-65.5       62.5        32         16.93
7      66-71        65.5-71.5       68.5        21         11.11
8      72-77        71.5-77.5       74.5        5          2.65
9      78-82        77.5-82.5       80          4          2.12
3.4) A histogram
Answer:-
Answer:-
As can be seen from the Histogram above, the data is positively skewed.
Answer:-
Yes, the data is normally distributed, and further analysis can be done.
3.7) Categorize the continuous data into categorical data using SPSS with four categories: 30-39,
40-49, 50-60, and above 60
Answer:-
Here is the continuous data categorized into four categories in SPSS.
4. Kanjanarat et al. (A-11) estimate the rate of preventable adverse drug events (ADE) in
hospitals to be 35.2 percent. Preventable ADEs typically result from inappropriate care or
medication errors, which include errors of commission and errors of omission. Suppose that
10 hospital patients experiencing an ADE are chosen at random. Let p = 0.35, and calculate the
probability that:
(a) Exactly seven of those drug events were preventable
Answer:-
Answer:-
Answer:-
(d) Between three and six inclusive were preventable
Answer:-
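Parts (a) and (d) are binomial probabilities, which can be sketched in Python as follows. The sketch assumes n = 10 independent patients and p = 0.35 (the rounded form of the 35.2 percent rate; this value of p is an assumption about the question as set). The function name binomial_pmf is illustrative.

```python
from math import comb

n = 10      # patients chosen at random
p = 0.35    # ASSUMPTION: probability that an ADE is preventable


def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with parameters n and p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# (a) Exactly seven preventable events.
p_exactly_7 = binomial_pmf(7, n, p)

# (d) Between three and six preventable events, inclusive.
p_3_to_6 = sum(binomial_pmf(k, n, p) for k in range(3, 7))

print(f"P(X = 7) = {p_exactly_7:.4f}")
print(f"P(3 <= X <= 6) = {p_3_to_6:.4f}")
```

Under these assumptions, P(X = 7) is approximately 0.0212 and P(3 <= X <= 6) is approximately 0.7124.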
5. The goal of a study by Klingler et al. (A-2) was to determine how symptom recognition and
perception influence clinical presentation as a function of race. They characterized symptoms
and care-seeking behavior in African-American patients with chest pain seen in the
Emergency Department. One of the presenting vital signs was systolic blood pressure. Among
157 African-American men, the mean systolic blood pressure was 146 mm Hg with a standard
deviation of 27. The investigator may want to conclude that the mean systolic blood pressure for
a population of African-American men is greater than 140 mm Hg.
Does the researcher provide sufficient evidence to conclude that the population mean is greater
than 140 mm Hg? Develop both null and alternative hypotheses
Answer:-
Null Hypothesis (H0): The population mean systolic blood pressure for African-American men is
less than or equal to 140 mmHg.
Alternative Hypothesis (H1): The population mean systolic blood pressure for African-American
men is greater than 140 mmHg.
Given information:
z = (x̄ - μ) / (s / √n)
z = (146 - 140) / (27 / √157)
z = 6 / (27 / 12.53)
z = 2.78
b) Do you think that the systolic blood pressure of the population is greater than 140 mm Hg?
Answer:-
Yes, the calculated z-statistic of 2.78 is greater than the critical value of 1.645 for a one-tailed
test at α = 0.05. This means there is sufficient evidence to conclude that the population mean
systolic blood pressure for African-American men is greater than 140 mmHg.
c) What is your decision: reject or fail to reject the null hypothesis (one-tailed, at α = 0.05)?
Answer:-
Since the calculated z-statistic of 2.78 is greater than the critical value of 1.645, we can reject the
null hypothesis.
Therefore, we can conclude that the population mean systolic blood pressure for African-
American men is greater than 140 mmHg, at a significance level of α = 0.05.
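The one-tailed z-test above can be sketched in Python using the standard library's normal distribution. The sketch assumes the sample standard deviation is used in place of the population value, which is reasonable here because n = 157 is large; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n = 157             # sample size
sample_mean = 146   # sample mean systolic blood pressure (mm Hg)
s = 27              # sample standard deviation (mm Hg)
mu_0 = 140          # hypothesized population mean under H0
alpha = 0.05        # significance level, one-tailed

standard_error = s / math.sqrt(n)
z = (sample_mean - mu_0) / standard_error      # test statistic

z_critical = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical value
p_value = 1 - NormalDist().cdf(z)              # one-tailed p-value

reject_h0 = z > z_critical
print(f"z={z:.2f}, critical={z_critical:.3f}, p={p_value:.4f}, reject H0: {reject_h0}")
```

The test statistic of about 2.78 exceeds the critical value of about 1.645, so the sketch reaches the same decision as the manual calculation: reject the null hypothesis at α = 0.05.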