
Saint Paul’s Hospital Millennium Medical College

School of Public Health


Introduction to computer and software application (Basic Statistical computing with SPSS)
(PHCA 6012)
Assessment (MM = 50)
Target group: MPH General Public Health, MPH Nutrition and MPH Epidemiology

(Program: weekend)

Submission due date: not more than two weeks


Part I: DATA ENTRY

Activity 1: Given below is an example of a questionnaire. Suppose you have information on
several such questionnaires. Prepare a data entry format that will help you to enter your data into
SPSS. Save the data set as trial1.sav.

HOUSEHOLD SURVEY QUESTIONNAIRE

IDENTIFICATION AND INFORMATION ABOUT THE RESPONDENT
Study site and address:
a) District___________________
b) Kebele ____________
c) Village _______________
d) Name of respondent ________________
Part I: BACKGROUND INFORMATION
1.1 Sex of respondent
1. Male
2. Female
1.2. How old are you? ____________
1.3. What is your current marital status?
1. Never married

2. Married/ living with partner

3. Divorced/ separated

4. Spouse died and living alone


1.4. What is your religion?
1. Orthodox Christian

2. Muslim

3. Protestant

4. Catholic

5. Traditional belief

6. Other specify ___________


1.5. What is the highest grade you completed?
1. Illiterate (cannot read and write)
2. Non-formal education (read and write only)
3. Primary level (Grade 1-6)
4. Primary level (Grade 7-8)
5. Grade 9-10
6. Grade 11-12
7. Above grade 12
1.6. Total members of the HH living together in the house
1. Male ______
2. Female _____
1.7. Sex of the head of the household
1. Male
2. Female
1.8. Average annual family income in 2012 in Birr --------------
1.9. Wealth category (Perceived wealth status)
1. Rich
2. Medium
3. Poor
4. Very poor
Part II: ACCESS TO ADEQUATE AND SAFE WATER SUPPLY
2.1. What are your most commonly used water sources for household drinking?

Type of water source                                        1. Dry season     2. Wet season
                                                            (1. yes, 2. no)   (1. yes, 2. no)
1. Spring with large scale gravity network
2. On spot protected spring
3. Deep well with motorized pump
4. Shallow well fitted with hand pump
5. Hand dug well fitted with hand pump
6. Domestic roof water harvesting
7. Small dams /with treatment plants and network/
8. Unprotected spring
9. Traditional hand dug well /not fitted with hand pump/
10. River/stream/pond
11. Other (specify)
2.2. Is the water source functional?
1. Yes
2. No
2.3. Who constructed /implemented / your water sources?
1. NGO operating in the locality
2. Government
3. The community by its own
4. Private people for business
5. Other (specify)--------------------------
2.4. Do you think the quantity of water in the area is adequate? Please put √ mark for each
answer

Season Yes No
Dry season
Wet season
2.5. Does your source of drinking water quality change seasonally?
1. Yes
2. No
2.6. If yes to Q 2.5, what describes the state of the water? (Multiple responses are possible)
1. It is muddy
2. It has worm/s
3. It is salty
4. It has bad taste
5. It has bad smell
6. It finishes soap /less lather/
7. It gives more lather
8. Others specify________________________________
2.7. Did you experience any of the following problems in the last thirty days?

Problem                                                              Yes   No
1. The household drank water that it thought might not be
safe for health
2. The household did not cook a desired food because there was not
enough water
3. Any boy in the household who is a student went to
school late or stayed home from school to help with water
collection
4. Any girl in the household who is a student went to
school late or stayed home from school to help with water
collection
5. Someone in the household slept very few hours because he/she
woke up very early in the morning to collect water
6. The household did not collect water when it wanted to
7. The household collected water from an undesirable or dirty source
because it could not collect from its preferred source
8. The household took water from someone else in the village
9. Anyone who is not a member of the family took water from
your house because of shortage
10. The household was unable to complete all of its work due to water
collection
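
One way to plan the data entry format asked for in Activity 1 is to write a codebook first: one entry per SPSS variable, with its type, label and value labels. The sketch below does this in Python for a handful of the questionnaire items. All variable names (sex, marital, q21_1_dry and so on) are illustrative assumptions, not names prescribed by the questionnaire, and the multiple-response item 2.1 is split into one yes/no variable per source and season.

```python
# Hypothetical codebook for a few questionnaire items. Every variable name
# below is an illustrative choice, not one required by the assignment.
YES_NO = {1: "Yes", 2: "No"}

codebook = {
    "district": {"type": "string", "label": "District"},
    "sex": {"type": "numeric", "label": "1.1 Sex of respondent",
            "values": {1: "Male", 2: "Female"}},
    "age": {"type": "numeric", "label": "1.2 Age in years"},
    "marital": {"type": "numeric", "label": "1.3 Current marital status",
                "values": {1: "Never married",
                           2: "Married/living with partner",
                           3: "Divorced/separated",
                           4: "Spouse died and living alone"}},
    "hh_male": {"type": "numeric", "label": "1.6 Male household members"},
    "hh_female": {"type": "numeric", "label": "1.6 Female household members"},
    "functional": {"type": "numeric", "label": "2.2 Water source functional",
                   "values": YES_NO},
}

# Item 2.1 allows several answers, so each of the 11 sources gets one
# yes/no variable per season (22 variables in total).
for source in range(1, 12):
    for season in ("dry", "wet"):
        codebook[f"q21_{source}_{season}"] = {
            "type": "numeric",
            "label": f"2.1 Source {source} used in {season} season",
            "values": YES_NO,
        }


def invalid_codes(record, book):
    """Return the variables in record whose value is not a defined code."""
    bad = []
    for name, value in record.items():
        values = book[name].get("values")
        if values is not None and value not in values:
            bad.append(name)
    return bad


# Example: sex = 3 is not a defined code, so it is flagged for re-checking.
bad_vars = invalid_codes({"sex": 3, "marital": 2, "q21_1_dry": 1}, codebook)
print(bad_vars)
```

The same entries can then be typed into the SPSS Variable View (Name, Type, Label, Values) before the data set is saved as trial1.sav.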

PART II: DATA ANALYSIS

Activity 2: The following small data set consists of four variables, namely Y = amount of body
fat, X1 = triceps, X2 = thigh circumference, and X3 = midarm circumference.

X1 X2 X3 Y
19.5 43.1 29.1 11.9
24.7 49.8 28.2 22.8
30.7 51.9 37.0 18.7
29.8 54.3 31.1 20.1
19.1 42.2 30.9 12.9
25.6 53.9 23.7 21.7
31.4 58.5 27.6 27.1
27.9 52.1 30.6 25.4
22.1 49.9 23.2 21.3
25.5 53.5 24.8 19.3

A. Obtain the scatter plots of Y and X1, Y and X2, and Y and X3. Is there any linear relationship
between these variables?
Figure 1. Scatter plot of amount of body fat and triceps
There is a positive linear relationship between amount of body fat and triceps.
Figure 2. Scatter plot of amount of body fat and thigh circumference
As the scatter plot above shows, there is a positive linear relationship between amount of body fat
and thigh circumference.

Figure 3. Scatter plot of amount of body fat and midarm circumference

As the graph above shows, the dots are far from the fit line, so there is no clear linear relationship
between amount of body fat and midarm circumference.

B. Find the bivariate correlations and interpret.


Correlations
                                              Triceps   Thigh           Midarm          amount of
                                                        Circumference   Circumference   body fat
Triceps                Pearson Correlation    1         .877**          .326            .709*
                       Sig. (2-tailed)                  .001            .358            .022
                       N                      10        10              10              10
Thigh Circumference    Pearson Correlation    .877**    1               -.165           .841**
                       Sig. (2-tailed)        .001                      .648            .002
                       N                      10        10              10              10
Midarm Circumference   Pearson Correlation    .326      -.165           1               -.228
                       Sig. (2-tailed)        .358      .648                            .526
                       N                      10        10              10              10
amount of body fat     Pearson Correlation    .709*     .841**          -.228           1
                       Sig. (2-tailed)        .022      .002            .526
                       N                      10        10              10              10
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).
Figure 4. Bivariate correlations
Interpretation of the bivariate correlation results
a. The correlation of Triceps with itself is r = 1, and the number of non-missing
observations for Triceps is n = 10.
b. The correlation between Triceps and Thigh Circumference is r = 0.877, based on
n = 10 observations with pairwise non-missing values. Triceps and Thigh
Circumference have a statistically significant linear relationship, since the
p-value (0.001) is less than the level of significance (α = 0.05).
c. The correlation between Triceps and Midarm Circumference is r = 0.326, based on
n = 10 observations with pairwise non-missing values. Triceps and Midarm
Circumference do not have a statistically significant linear relationship, since
the p-value (0.358) is greater than the level of significance (α = 0.05).
d. The correlation between Triceps and amount of body fat is r = 0.709, based on
n = 10 observations with pairwise non-missing values. Triceps and amount of
body fat have a statistically significant linear relationship, since the
p-value (0.022, two-tailed) is less than the level of significance (α = 0.05).
e. The correlation of Thigh Circumference with itself is r = 1, and the number of
non-missing observations for Thigh Circumference is n = 10.
f. The correlation between Thigh Circumference and Midarm Circumference is
r = -0.165, based on n = 10 observations with pairwise non-missing values. There
is no statistically significant linear relationship between Thigh Circumference
and Midarm Circumference, since the p-value (0.648) is greater than the level of
significance (α = 0.05).
g. The correlation between Thigh Circumference and amount of body fat is r = 0.841,
based on n = 10 observations with pairwise non-missing values. There is a
statistically significant linear relationship between Thigh Circumference and
amount of body fat, since the p-value (0.002) is less than the level of
significance (α = 0.01).
h. The correlation of Midarm Circumference with itself is r = 1, and the number of
non-missing observations for Midarm Circumference is n = 10.
i. The correlation between Midarm Circumference and amount of body fat is
r = -0.228, based on n = 10 observations with pairwise non-missing values. Since
the p-value (0.526) is greater than the level of significance (α = 0.05), there is
no statistically significant linear relationship between Midarm Circumference
and amount of body fat.
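
As a cross-check on the SPSS correlation table, the Pearson coefficient can be computed directly from the ten observations of Activity 2. The sketch below uses only the Python standard library; the function names pearson_r and t_statistic are illustrative, and the significance check relies on the usual t statistic for H0: ρ = 0 with n − 2 degrees of freedom.

```python
import math

# Data from Activity 2 (X1 = triceps, X2 = thigh, X3 = midarm, Y = body fat).
x1 = [19.5, 24.7, 30.7, 29.8, 19.1, 25.6, 31.4, 27.9, 22.1, 25.5]
x2 = [43.1, 49.8, 51.9, 54.3, 42.2, 53.9, 58.5, 52.1, 49.9, 53.5]
x3 = [29.1, 28.2, 37.0, 31.1, 30.9, 23.7, 27.6, 30.6, 23.2, 24.8]
y = [11.9, 22.8, 18.7, 20.1, 12.9, 21.7, 27.1, 25.4, 21.3, 19.3]


def pearson_r(a, b):
    """Pearson correlation from sums of squares and cross-products."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


r_x1_y = pearson_r(x1, y)
r_x2_y = pearson_r(x2, y)

for name, x in (("X1", x1), ("X2", x2), ("X3", x3)):
    r = pearson_r(x, y)
    print(f"r(Y, {name}) = {r:.3f}, t = {t_statistic(r, len(y)):.3f}")
```

With n = 10 the two-tailed 5% critical value of t on 8 degrees of freedom is 2.306, so a pair is significant at α = 0.05 when the absolute value of its t statistic exceeds that value.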
C. Fit the linear regression of Y on X1, X2 and X3. Interpret the coefficient of determination,
the ANOVA table and the parameter estimates.

Model Summary
Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .894a   .799       .699                2.6508
a. Predictors: (Constant), Midarm Circumference, Thigh Circumference, Triceps

ANOVA
Model          Sum of Squares   df   Mean Square   F       Sig.
1 Regression   167.894          3    55.965        7.964   .016b
  Residual     42.162           6    7.027
  Total        210.056          9
a. Dependent Variable: amount of body fat
b. Predictors: (Constant), Midarm Circumference, Thigh Circumference, Triceps

Coefficientsa
Model                    Unstandardized          Standardized   t        Sig.   95.0% Confidence Interval for B
                         Coefficients            Coefficients
                         B         Std. Error    Beta                           Lower Bound   Upper Bound
1 (Constant)             186.668   129.207                      1.445    .199   -129.491      502.826
  Triceps                6.380     4.024         5.825          1.586    .164   -3.466        16.226
  Thigh Circumference    -4.576    3.393         -4.750         -1.349   .226   -12.878       3.725
  Midarm Circumference   -3.391    2.082         -2.914         -1.628   .155   -8.486        1.705
a. Dependent Variable: amount of body fat

Interpretation of the regression results

a. The coefficient of determination is R Square = 0.799: 79.9% of the variation in the
dependent variable (amount of body fat) is explained by the independent variables.
b. The ANOVA table above indicates that the model, as a whole, is a significant fit to
the data at the 0.05 level of significance (F = 7.964, p = 0.016).
c. The coefficient table above shows that:
• The constant, or intercept term of the fitted equation, is 186.668.
• The coefficients of Thigh Circumference and Midarm Circumference are
negative, -4.576 and -3.391 respectively, whereas the coefficient of Triceps is
positive, 6.380.
• None of the explanatory variables is individually statistically significant at the
0.05 level of significance (p = 0.164, 0.226 and 0.155), which means that none of the
coefficients can be shown to differ from zero when the other two variables are in the model.
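
The figures quoted in the interpretation can be reproduced from the sums of squares in the ANOVA table. The following is a minimal sketch that recomputes R Square, Adjusted R Square, the F statistic and the standard error of the estimate; the variable names are illustrative, and the inputs are copied from the SPSS output rather than refitted from the raw data.

```python
import math

# Values copied from the ANOVA table (dependent variable: amount of body fat).
ss_regression = 167.894
ss_residual = 42.162
n = 10          # number of observations
p = 3           # number of predictors (X1, X2, X3)

ss_total = ss_regression + ss_residual
df_residual = n - p - 1

# Share of the total variation in Y explained by the model.
r_squared = ss_regression / ss_total

# R Square penalised for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual

# Overall F test: mean square regression over mean square residual.
f_stat = (ss_regression / p) / (ss_residual / df_residual)

# Standard error of the estimate is the square root of the residual mean square.
std_error = math.sqrt(ss_residual / df_residual)

print(f"R Square = {r_squared:.3f}, Adjusted R Square = {adj_r_squared:.3f}")
print(f"F = {f_stat:.3f}, Std. Error of the Estimate = {std_error:.4f}")
```

The gap between R Square and Adjusted R Square is large here because three predictors are being estimated from only ten observations.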

D. Which explanatory variable is the most determinant factor of body fat (Y)? Why?
The most determinant explanatory variable of body fat is Triceps, since it has the largest
standardized coefficient in absolute value (Beta = 5.825, compared with -4.750 and -2.914).

Activity 3:

3. Using the data SBP1.sav, describe the data using the appropriate summary statistics, graphs
and diagrams based on the properties of the variables.
Summary statistics include the following:
a. Measures of central tendency (mean, median and mode), measures of dispersion (range,
variance and standard deviation), measures of distribution shape (skewness and kurtosis),
and measures of position (percentiles)

Statistics

                   Age       Sex      weight     height    SBP       cholesterol
N Valid            866       866      864        864       835       825
N Missing          0         0        2          2         31        41
Mean               49.10     0.48     164.28     65.356    126.13    204.52
Median             46.00     0.00     158.60     65.300    122.00    201.00
Std. Deviation     20.101    0.500    39.840     3.8177    19.605    45.344
Variance           404.056   0.250    1587.229   14.575    384.347   2056.087
Skewness           0.309     0.093    1.348      0.168     1.008     1.009
Kurtosis           -1.119    -1.996   5.505      -0.134    1.545     3.419
Range              70        1        388        23.3      143       425
Percentiles 25     31.00     0.00     137.88     62.700    112.00    172.00
Percentiles 50     46.00     0.00     158.60     65.300    122.00    201.00
Percentiles 75     65.00     1.00     186.78     67.975    137.00    231.00

b. Graphs (histogram, bar chart, pie chart)

Figure 1. Pie Chart


Figure 2 Box plot

4. Using the SPSS data SBP.sav, please compute the following:

a. Transform the variable age into a four-level categorical variable named agecat.
Age category
Lowest thru 31 -> 1
31-46 -> 2
46-75 -> 3
75 thru highest -> 4
b. Compute the variable BMI using the variable weight and height.
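
The two transformations in items a and b can be sketched outside SPSS as follows. The recoding assumes that a boundary value (31, 46 or 75) falls into the first range that contains it, which is how overlapping ranges are normally resolved in a recode. The BMI formula assumes that weight is recorded in pounds and height in inches (the height table in this assignment is labelled in inches; the weight unit is an assumption based on the magnitude of the values), hence the conversion factor 703. Both function names are illustrative.

```python
def age_category(age):
    """Map age in years to agecat codes 1 to 4.

    Boundary ages (31, 46, 75) are assigned to the lower category,
    i.e. to the first matching range in the recode list.
    """
    if age <= 31:
        return 1
    if age <= 46:
        return 2
    if age <= 75:
        return 3
    return 4


def bmi_from_pounds_inches(weight_lb, height_in):
    """Body mass index from weight in pounds and height in inches (assumed units)."""
    return 703.0 * weight_lb / height_in ** 2


# Example with the median weight and height reported in the Statistics table.
example_bmi = bmi_from_pounds_inches(158.6, 65.3)
print(age_category(46), round(example_bmi, 2))
```

If the data were stored in kilograms and metres instead, the factor 703 would be dropped and BMI would simply be weight divided by height squared.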
c. Construct histogram for the variable weight showing all necessary components. Is the
weight distribution normal?

Fig 1. Histogram for weight.


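As the graph shows, the distribution of weight is approximately normal, although the summary
statistics for weight (skewness = 1.348, kurtosis = 5.505) indicate a right skew.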

d. Construct a bar chart showing the distribution of agecat by gender showing all necessary
components.
e. Recode the levels of the variable HBP, 0 to 1 and 1 to 2, into the same variable.

See SPSS

f. Construct grouped frequency distribution for the variable height.

height in inches
                        Frequency   Percent   Valid Percent   Cumulative Percent
Valid     < 60.0        65          7.5       7.5             7.5
          60.0 - 64.9   343         39.6      39.7            47.2
          65.0 - 69.9   351         40.5      40.6            87.8
          70.0 - 74.9   97          11.2      11.2            99.1
          75.0+         8           .9        .9              100.0
          Total         864         99.8      100.0
Missing   System        2           .2
Total                   866         100.0
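
The Valid Percent and Cumulative Percent columns follow directly from the class counts. The sketch below recomputes them and also shows one possible way to assign a raw height to its class; the function name height_class is illustrative, and the class limits are read from the table (a height of exactly 65.0 is assumed to open the 65.0 - 69.9 class).

```python
# Class counts copied from the frequency table for height (valid cases only).
counts = {
    "< 60.0": 65,
    "60.0 - 64.9": 343,
    "65.0 - 69.9": 351,
    "70.0 - 74.9": 97,
    "75.0+": 8,
}

n_valid = sum(counts.values())

# Valid percent: class count as a share of the non-missing cases.
valid_percent = [100 * c / n_valid for c in counts.values()]

# Cumulative percent: running total of counts as a share of the valid cases.
cumulative_percent = []
running = 0
for c in counts.values():
    running += c
    cumulative_percent.append(100 * running / n_valid)


def height_class(height_in):
    """Return the class label for a height in inches (lower limit inclusive)."""
    if height_in < 60:
        return "< 60.0"
    if height_in < 65:
        return "60.0 - 64.9"
    if height_in < 70:
        return "65.0 - 69.9"
    if height_in < 75:
        return "70.0 - 74.9"
    return "75.0+"


for label, vp, cp in zip(counts, valid_percent, cumulative_percent):
    print(f"{label:12s} {vp:6.1f} {cp:6.1f}")
```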

g. Construct the pie chart for the variable race showing all necessary components.

h. Construct box plot for the variable weight using gender for group variable?
i. Construct the scatter plot using the variable weight and BMI?
j. Describe the variable gender, smoke and race using frequency?

Frequency
Statistics
                          sex of the   race of the   does respondent
                          subject      subject       smoke now?
N Valid                   866          864           424
N Missing                 0            2             442
Mean                      .48          1.37          1.48
Std. Error of Mean        .017         .020          .024
Median                    .00          1.00          1.00
Mode                      0            1             1
Std. Deviation            .500         .580          .500
Variance                  .250         .337          .250
Skewness                  .093         1.333         .095
Std. Error of Skewness    .083         .083          .119
Kurtosis                  -1.996       .767          -2.000
Std. Error of Kurtosis    .166         .166          .237
Range                     1            2             1
Minimum                   0            1             1
Maximum                   1            3             2
Sum                       413          1181          626
Percentiles 25            .00          1.00          1.00
Percentiles 50            .00          1.00          1.00
Percentiles 75            1.00         2.00          2.00

k. Describe the variable age, weight and BMI using descriptive statistics.

Descriptive Statistics

          N     Range   Minimum   Maximum   Mean      Std.        Variance   Skewness                 Kurtosis
                                                      Deviation              Statistic   Std. Error   Statistic   Std. Error
age       866   70      20        90        49.10     20.101      404.056    0.309       0.083        -1.119      0.166
weight    864   388     82        470       164.28    39.840      1587.229   1.348       0.083        5.505       0.166
BMI       863   66.08   13.29     79.37     26.9879   5.94293     35.318     1.562       0.083        7.648       0.166
Valid N   863

l. State whether gender and HBP are related, and show column percentages (crosstabulation).

sex of the subject * high blood pressure Crosstabulation

                                                        high blood pressure        Total
                                                        SBP <= 140    SBP > 140
                                                        mmHg          mmHg
sex of the   female   Count                             365           88           453
subject               % within high blood pressure      52.6%         51.2%        52.3%
             male     Count                             329           84           413
                      % within high blood pressure      47.4%         48.8%        47.7%
Total                 Count                             694           172          866
                      % within high blood pressure      100.0%        100.0%       100.0%

Chi-Square Tests

                               Value   df   Asymptotic       Exact Sig.   Exact Sig.
                                            Significance     (2-sided)    (1-sided)
                                            (2-sided)
Pearson Chi-Square             .113a   1    .737
Continuity Correctionb         .063    1    .802
Likelihood Ratio               .113    1    .737
Fisher's Exact Test                                          .798         .401
Linear-by-Linear Association   .113    1    .737
N of Valid Cases               866

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 82.03.

b. Computed only for a 2x2 table

Hypothesis :

H0: There is no association between Gender and Blood Pressure.

H1: There is an association between Gender and Blood Pressure.

The key result in the Chi-Square Tests table is the Pearson Chi-Square.

• The value of the test statistic is 0.113.

• The footnote for this statistic pertains to the expected cell count assumption (i.e.,
expected cell counts are all greater than 5): no cells had an expected count less than 5, so
this assumption was met.
• Because the test statistic is based on a 2x2 crosstabulation table, the degrees of freedom
(df) for the test statistic is

df = (R − 1) × (C − 1) = (2 − 1) × (2 − 1) = 1

• The corresponding p-value of the test statistic is p = 0.737.

DECISION AND CONCLUSIONS

Since the p-value (p = 0.737) is greater than our chosen significance level (α = 0.05), we do not
reject the null hypothesis. Rather, we conclude that there is not enough evidence to suggest an
association between gender and high blood pressure.

Based on the results, we can state the following:

No association was found between gender and high blood pressure (χ²(1) = 0.113, p = 0.737).
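
The Pearson chi-square value can be verified from the four cell counts of the crosstabulation. The following sketch computes the expected counts, the test statistic and its degrees of freedom with the standard library only. The p-value line uses the identity that, for 1 degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(χ²/2)); this shortcut holds only for df = 1.

```python
import math

# Observed counts: rows = female, male; columns = SBP <= 140, SBP > 140.
observed = [[365, 88],
            [329, 84]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form is valid only because df = 1.
p_value = math.erfc(math.sqrt(chi2 / 2))

min_expected = min(min(row) for row in expected)
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
print(f"minimum expected count = {min_expected:.2f}")
```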

m. Test whether the mean BMI is equal to 25.

One-Sample Test

                                  Test Value = 25
       t       df    Sig.         Mean         95% Confidence Interval of
                     (2-tailed)   Difference   the Difference
                                               Lower      Upper
BMI    9.826   862   0.000        1.98787      1.5908     2.3849

Hypothesis

H0: The mean BMI is equal to 25.

H1: The mean BMI differs from 25.

Notice that the 95% confidence interval of the difference between the mean BMI and the test
value (1.5908 to 2.3849) does not contain 0, and the p-value is less than 0.05. Therefore we
reject the null hypothesis and conclude that the mean BMI differs from 25.
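
The one-sample t statistic can be reproduced from the descriptive statistics for BMI reported earlier (N = 863, mean = 26.9879, standard deviation = 5.94293). In the sketch below the critical value 1.9627 is an assumed approximation of the 97.5th percentile of the t distribution with 862 degrees of freedom; it is used only to rebuild the confidence interval of the difference.

```python
import math

# Summary statistics for BMI from the Descriptive Statistics table.
n = 863
sample_mean = 26.9879
sample_sd = 5.94293
test_value = 25.0

# Standard error of the mean and the one-sample t statistic.
standard_error = sample_sd / math.sqrt(n)
mean_difference = sample_mean - test_value
t_stat = mean_difference / standard_error
df = n - 1

# Assumed approximate two-tailed 5% critical value of t for 862 df.
t_critical = 1.9627
ci_lower = mean_difference - t_critical * standard_error
ci_upper = mean_difference + t_critical * standard_error

print(f"t = {t_stat:.3f}, df = {df}")
print(f"95% CI of the difference: {ci_lower:.4f} to {ci_upper:.4f}")
```

Because the interval for the difference lies entirely above 0, the same conclusion follows whether one looks at the p-value or at the confidence interval.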

n. Test whether the mean SBP differs between females and males.

ANOVA
systolic blood pressure * sex of the subject
                             Sum of Squares   df    Mean Square   F       Sig.
Between Groups (Combined)    3233.923         1     3233.923      8.490   0.004
Within Groups                317311.873       833   380.927
Total                        320545.796       834

Hypothesis
H0: The mean SBP of females and the mean SBP of males are equal.
H1: The mean SBP differs between males and females.
As the table shows, the p-value is 0.004, which is less than the level of significance (alpha =
0.05), so we reject the null hypothesis and conclude that the mean SBP differs between females
and males.
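
The F statistic in the ANOVA table is the ratio of the two mean squares, which can be checked from the sums of squares and degrees of freedom. The sketch below does this with values copied from the SPSS output; the variable names are illustrative. With only two groups, the square root of F equals the absolute value of the t statistic from an independent-samples t test that assumes equal variances, so both tests lead to the same decision.

```python
import math

# Values copied from the ANOVA table for SBP by sex.
ss_between, df_between = 3233.923, 1
ss_within, df_within = 317311.873, 833

# Mean squares: sum of squares divided by its degrees of freedom.
ms_between = ss_between / df_between
ms_within = ss_within / df_within

# F statistic for the difference in mean SBP between the two groups.
f_stat = ms_between / ms_within

# With two groups, |t| of the equal-variance t test is the square root of F.
t_equivalent = math.sqrt(f_stat)

ss_total = ss_between + ss_within
print(f"F = {f_stat:.3f}, equivalent |t| = {t_equivalent:.3f}")
```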
