Ambo University Woliso Campus: Advanced Biostatistics Assignment


AMBO UNIVERSITY WOLISO

CAMPUS

Advanced biostatistics assignment

Name: Setegn Gemechu

ID: MPHE/PGW/018/14

Submitted to Tariku Dejene (Associate professor, PhD fellow)

Submission date April 3, 2022

PART I: T-Test and ANOVA

A. How is the outcome variable, weight loss, generated?

It is generated by computing a new variable. The steps are:

• Click Transform on the main menu.
• Select Compute Variable.
• In the Compute Variable dialog box, click the Target Variable box and enter the name of the new variable, in this case "weight loss".
• Click the Numeric Expression area of the dialog box and enter "preweight - weight after 10weeks" in the box provided.
• Click OK and the weight loss outcome will be produced.
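The same computation can be sketched outside SPSS. The short Python example below is only an illustration: the field names `preweight` and `weight10weeks` and the three records are assumptions made for the example, not values from the assignment data file.

```python
# Hypothetical records; field names and values are assumptions for illustration only.
records = [
    {"id": 1, "preweight": 60.0, "weight10weeks": 56.2},
    {"id": 2, "preweight": 72.0, "weight10weeks": 69.5},
    {"id": 3, "preweight": 80.0, "weight10weeks": 80.4},
]


def add_weight_loss(rows):
    """Return copies of the rows with weightloss = preweight - weight10weeks (kg)."""
    return [
        {**row, "weightloss": round(row["preweight"] - row["weight10weeks"], 2)}
        for row in rows
    ]


with_loss = add_weight_loss(records)
for row in with_loss:
    # A negative value means the person gained weight over the 10 weeks.
    print(row["id"], row["weightloss"])
```

A positive value means weight was lost and a negative value means weight was gained, which matches the direction of the SPSS expression above.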

B. Is weight loss the same for males and females?

Weight loss is a continuous variable and we are comparing it between two groups (males and
females), so the appropriate test is the independent samples t-test.

1. State the claim

H0: there is no difference in mean weight loss between males and females

HA: mean weight loss is different between males and females

2. Check assumptions

Weight loss is normally distributed in both groups (female and male), as the p-value from the
Kolmogorov-Smirnov test (0.200) is greater than 0.05. The assumption of homogeneity of
variances is met because Levene's test is non-significant (p = 0.601, which is greater than 0.05):
we do not have sufficient evidence to reject the null hypothesis that the difference between the
variances is zero. In other words, we can assume that the variances are roughly equal and the
assumption is tenable, so we should read the test statistics in the row labeled Equal variances
assumed.

3. State the significance level (alpha = 0.05)


4. Compute the test statistic (t)

t = (observed difference between sample means - expected difference between population means) / (estimate of the standard error of the difference between the two sample means)

t = -0.12213 / 0.58365 = -0.209
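As a rough check on the arithmetic in step 4, the Python sketch below divides the reported mean difference by its reported standard error. The helper `pooled_t` is a hypothetical function showing how the same statistic would be obtained from raw data under the equal-variance assumption; the two small samples passed to it are made up for illustration and are not the assignment data.

```python
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Equal-variance independent-samples t statistic and its degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    se_diff = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    return (mean(group_a) - mean(group_b)) / se_diff, df


# Values reported in step 4: mean difference and standard error of the difference.
t_reported = -0.12213 / 0.58365
print(round(t_reported, 3))

# Hypothetical raw samples (assumed values, for illustration only).
t_demo, df_demo = pooled_t([3.1, 4.0, 5.2, 2.7], [3.5, 4.4, 5.0, 3.9])
print(df_demo)
```

The expected difference between population means is zero under H0, which is why it does not appear in the numerator of the code.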

5. Decision

We fail to reject the null hypothesis of no difference in weight loss between males and
females because the t-value is non-significant: the two-tailed p-value is 0.835, which is greater
than 0.05.

6. Conclusion

The value t(74) = -0.209 is non-significant because the two-tailed p-value is 0.835, which is
greater than 0.05, so we conclude that there was no significant difference between the mean
weight loss of males and females. On average, males had slightly greater weight loss
(Mean = 4.0152, SD = 2.52984) than females (Mean = 3.8930, SD = 2.51589), but this difference
was not significant, t(74) = -0.209, p = 0.835.

C. Which diet is best for losing weight?

We have three categories of diet and want to know which one is best for losing weight. The
outcome variable (weight loss) is continuous and normally distributed (p > 0.05), and the
explanatory variable (diet) is categorical with three categories. Therefore the appropriate test
for this claim is one-way ANOVA. To see which diet differs, we use custom tables.
Table 1: Weight loss by diet category

Diet category    Mean weight lost (kg)
Diet one         3.30
Diet two         3.03
Diet three       5.15

From the table we see that diets one and two have similar mean weight loss, but diet three has a
mean weight loss of 5.15 kg, which is different from the others. This indicates that diet three is
best for weight loss.
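The one-way ANOVA named above compares the variation between the diet group means with the variation within the groups. The Python sketch below shows how the F statistic is built; the function name `one_way_anova` and the three small samples are assumptions for illustration and are not the assignment data.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical weight-loss values (kg) for the three diets.
diet_one = [3.0, 3.5, 3.4]
diet_two = [2.8, 3.1, 3.2]
diet_three = [5.0, 5.3, 5.2]

f_stat, df_between, df_within = one_way_anova([diet_one, diet_two, diet_three])
print(df_between, df_within)
```

A significant F only says that at least one group mean differs from the others. Identifying which diet differs needs a follow-up comparison of the group means.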

PART II: Correlation and Linear Regression

The “BirthWeightData.sav” data has newborn and parental characteristics of 42 newborns. Use
any relevant types of data summarization technique (numerical summary measures and graphs)
and explore any form of relationship birth weight of a baby has with potential parental
characteristics (Any type of descriptive bivariate analysis can be utilized)

In the given data most of the parental variables are scale variables, so we can use scatter
plots to see their relationship with the birth weight of the baby. For the categorical variables
we can use bar graphs and descriptive frequencies.

Table 2: Mean birth weight by smoking status, mother's age category and low birth weight
baby

Variable                 Category                Mean birth weight (kg)
Smoker                   Non-smoker              3.51
                         Smoker                  3.13
Mother aged over 35      Aged < 35               3.33
                         Aged 35+                3.11
Low birth weight baby    Not low birth weight    3.47
                         Low birth weight        2.36

The frequency tables reveal that only 9.5% of the mothers are aged 35 or older, while almost
all (90.5%) are younger than 35 years. 85.7% of the mothers in the given data have no history of
a low birth weight baby, while 14.3% do. Almost half (47.6%) of the mothers and around one
third (35.7%) of the fathers do not smoke cigarettes.

On average, the children of non-smoking mothers (mean = 3.51) have greater birth weight than
those of smoking mothers (mean = 3.13). Mothers who smoke more than 25 cigarettes per day
have babies with lower birth weight (mean <= 2.93) than mothers who smoke fewer than 25
cigarettes per day. When we compare mothers based on their previous history of a low birth
weight baby, those with no such history have babies with greater birth weight in the current
pregnancy (mean = 3.47) than mothers who had a low birth weight baby in a past pregnancy
(mean = 2.36).

The points plotted on the scatter diagram (figure 1) are scattered around a straight line
crossing the Y-axis above two. This shows a linear relationship between gestational age at
birth and the birth weight of the baby. There is a moderate relationship between them
(r2 = 0.502). From the fitted straight line, for a one week increase in gestational age, the
birth weight of the baby increases by 0.16 kg.

The years the father spent in education (r2 = 0.005), the father's height (r2 = 0.00096) and the
father's age (r2 = 0.031) have almost no linear association with the birth weight of the baby
(fig 2). The scatter plot shows a weak association between birth weight and the mother's
pre-pregnancy weight (r2 = 0.161); the points are scattered away from the fitted line
(y = 1.38 + 0.03x).
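The slope, intercept and r2 values quoted for each scatter plot come from a least-squares line. The Python sketch below shows how those three quantities are computed for one predictor; the function name `simple_linear_fit` and the six data points are assumptions for illustration and are not taken from BirthWeightData.sav.

```python
from statistics import mean


def simple_linear_fit(x, y):
    """Least-squares intercept, slope and r-squared for y regressed on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For a single predictor, r-squared is the squared Pearson correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


# Hypothetical gestational ages (weeks) and birth weights (kg).
gestation_weeks = [33, 36, 38, 39, 40, 41]
birth_weight_kg = [1.9, 2.6, 3.0, 3.2, 3.4, 3.7]

intercept, slope, r_squared = simple_linear_fit(gestation_weeks, birth_weight_kg)
print(round(slope, 3), round(r_squared, 3))
```

The slope is read as the change in birth weight (kg) per one week increase in gestational age, and r2 as the share of the variation in birth weight accounted for by the fitted line.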

B. The potential predictors of the birth weight of the baby are gestational age at birth,
low birth weight baby, smoking status and maternal height.

Table 1: Model summary of the multiple linear regression

Model Summary(b)

Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .810(a)    .656        .619                 .37277

Change Statistics
Model    R Square Change    F Change    df1    df2    Sig. F Change
1        .656               17.651      4      37     .000

a. Predictors: (Constant), Low birth weight baby, Maternal height (cm), smoker, Gestational age at birth
(wks)
b. Dependent Variable: Birth weight (kg)

Table 2: ANOVA table of the multiple regression model

ANOVA(a)

Model            Sum of Squares    df    Mean Square    F         Sig.
1  Regression    9.811             4     2.453          17.651    .000(b)
   Residual      5.141             37    .139
   Total         14.952            41

a. Dependent Variable: Birth weight (kg)
b. Predictors: (Constant), Low birth weight baby, Maternal height (cm), smoker,
Gestational age at birth (weeks)

From the coefficients table (table 3):

Y = -4.005 + 0.111 gestation - 0.240 smoker + 0.19 mheight - 0.458 low birth weight baby

From this model, keeping the other predictors constant, when gestational age increases by
one week, birth weight increases by 0.111 kg. When the mother smokes, birth weight
decreases by 0.240 kg, keeping the other factors constant. Birth weight increases by 0.19 kg for
each unit increase in maternal height when all of the other explanatory variables are held
constant. In the same manner, keeping the others constant, birth weight decreases by 0.458 kg
if the mother had a low birth weight baby.

In the ANOVA table (table 2), the model sum of squares (9.811) is larger than the residual
sum of squares (5.141). This implies that the regression model predicts birth weight better
than the mean-only model. The F-test value is 17.651, which is significant with a p-value < 0.05.
Table 1 shows an adjusted R square of 0.619, which means that 61.9% of the variation in birth
weight is explained by the model.
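The F value, R square and adjusted R square quoted above can all be recovered from the sums of squares and degrees of freedom in the ANOVA table. The Python sketch below redoes that arithmetic from the rounded table entries, so small differences in the last decimal place from the SPSS output are to be expected; the variable names are chosen for the example.

```python
# Entries taken from the ANOVA table (table 2).
ss_regression, df_regression = 9.811, 4
ss_residual, df_residual = 5.141, 37

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual
r_squared = ss_regression / ss_total
# Adjusted R square penalises the number of predictors in the model.
adj_r_squared = 1 - (1 - r_squared) * df_total / df_residual

print(round(f_stat, 2), round(r_squared, 3), round(adj_r_squared, 3))
```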
C. Use your model to produce the predicted birth weight and residual for the
newborn with id = 27

The newborn with ID = 27 had a gestational age at birth of 33 weeks, a mother who smokes,
a maternal height of 1.61 m and a low birth weight baby. Putting that information into the
fitted equation:

Y = -4.005 + 0.111 gestation - 0.240 smoker + 0.19 mheight - 0.458 low birth weight baby

Y = -4.005 + 0.111 x 33 weeks - 0.240 x 1 + 0.19 x 1.61 m - 0.458 x 1

Y = -0.7341, which is the unstandardized predicted value. The unstandardized residual
corresponding to the predicted birth weight is -0.13577.
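The substitution above can be sketched in Python as a small prediction function. The coefficients are copied as quoted in the equation, and `predict_birth_weight` and `residual` are hypothetical helper names. One assumption should be checked before relying on the result: the model summary lists maternal height in cm while the substitution enters it as 1.61 m, so the unit used for `mheight` has to match the unit the model was fitted with.

```python
# Coefficients as quoted in the fitted equation above.
COEFFICIENTS = {
    "constant": -4.005,
    "gestation": 0.111,
    "smoker": -0.240,
    "mheight": 0.19,
    "lowbwt": -0.458,
}


def predict_birth_weight(gestation, smoker, mheight, lowbwt, coef=COEFFICIENTS):
    """Linear predictor: constant plus each coefficient times its predictor value."""
    return (
        coef["constant"]
        + coef["gestation"] * gestation
        + coef["smoker"] * smoker
        + coef["mheight"] * mheight
        + coef["lowbwt"] * lowbwt
    )


def residual(observed, predicted):
    """Unstandardized residual: observed value minus predicted value."""
    return observed - predicted


# Inputs as substituted above; mheight = 1.61 assumes metres (see the caveat in the text).
y_hat = predict_birth_weight(gestation=33, smoker=1, mheight=1.61, lowbwt=1)
print(round(y_hat, 4))
```

The residual is the observed birth weight minus the predicted value, so a negative residual means the model predicted a heavier baby than was observed.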

D. After fitting the model, you also need to check the assumptions of a linear regression
model, such as multicollinearity, homoscedasticity and normality, and comment on the
result.

The Durbin-Watson statistic is 1.549, which is close to 2, indicating that the residual terms are
uncorrelated. The following scatterplot shows homoscedasticity of the variance of the residual
terms, as the points are scattered evenly around the fitted straight line.
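For reference, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The Python sketch below shows the calculation; the function name `durbin_watson` and the residual values are assumptions for illustration and are not the residuals of the fitted model.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in observation order."""
    # Numerator: squared differences between each residual and the one before it.
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals (kg), for illustration only.
demo_residuals = [0.2, -0.1, 0.3, -0.4, 0.1, -0.2]
dw = durbin_watson(demo_residuals)
print(0 <= dw <= 4)
```

The statistic ranges from 0 to 4. Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation and values towards 4 suggest negative autocorrelation.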
