Case Study 3 Explanation

Part 1

For part one, we generated sample data for age and blood pressure, normalized the variables,
fitted a linear model, and created a plot showing the relationship between age and blood pressure.
The plot shows scaled age on the x-axis and scaled blood pressure on the y-axis, with a regression
line indicating the relationship between the two variables.

Code breakdown:

1. Data generation

Here, we set the seed for reproducibility, generate random ages between 25 and 70, and calculate
blood pressure using a linear relationship with some added noise. We then create a data frame
df_part1 to store these values.
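A minimal R sketch of this step is shown below. The seed (123), the sample size (100), the integer ages, and the coefficients of the linear relationship (intercept 90, slope 0.8, noise standard deviation 5) are all hypothetical values chosen for illustration; the case study does not state them.

```r
# Fix the random number generator so results are reproducible
set.seed(123)  # hypothetical seed value

n <- 100  # hypothetical sample size; the case study does not state it

# Integer ages drawn uniformly between 25 and 70 (inclusive)
age <- sample(25:70, n, replace = TRUE)

# Blood pressure = linear function of age + Gaussian noise.
# Intercept (90), slope (0.8) and noise SD (5) are illustrative assumptions.
blood_pressure <- 90 + 0.8 * age + rnorm(n, mean = 0, sd = 5)

# Store both variables in a data frame
df_part1 <- data.frame(Age = age, Blood_Pressure = blood_pressure)
head(df_part1)
```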

2. Normalization

We normalize the numeric variables (age and blood pressure) using the scale() function, which
centers and scales each variable to have a mean of 0 and a standard deviation of 1.
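The sketch below shows what scale() computes, assuming simulated data like that described in the data-generation step (seed, size and coefficients are hypothetical). The column names Age_scaled and BP_scaled are also hypothetical; the original code may overwrite the columns instead.

```r
# Hypothetical simulated data (seed, size and coefficients are assumptions)
set.seed(123)
df_part1 <- data.frame(Age = sample(25:70, 100, replace = TRUE))
df_part1$Blood_Pressure <- 90 + 0.8 * df_part1$Age + rnorm(100, sd = 5)

# scale() subtracts each column's mean and divides by its standard deviation.
# It returns a one-column matrix, so as.numeric() turns it back into a vector.
df_part1$Age_scaled <- as.numeric(scale(df_part1$Age))
df_part1$BP_scaled  <- as.numeric(scale(df_part1$Blood_Pressure))

# The same standardization written out by hand for age
age_manual <- (df_part1$Age - mean(df_part1$Age)) / sd(df_part1$Age)
all.equal(age_manual, df_part1$Age_scaled)  # TRUE

# Mean is 0 (up to floating-point error) and SD is 1
mean(df_part1$BP_scaled)
sd(df_part1$BP_scaled)
```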

3. Linear Model Fitting

We fit a linear model (lm()) to examine the relationship between blood pressure and age. The formula
Blood_Pressure ~ Age specifies that blood pressure is the response variable and age is the predictor
variable.
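A minimal sketch of the model fit, assuming it is run on the standardized columns (the hypothetical Age_scaled and BP_scaled) so that the fitted line matches the scaled axes of the plot. A useful property of this choice: when both variables are standardized, the intercept of a simple linear regression is 0 and the slope equals the Pearson correlation between the two variables.

```r
# Hypothetical simulated data, standardized as in the normalization step
set.seed(123)
df_part1 <- data.frame(Age = sample(25:70, 100, replace = TRUE))
df_part1$Blood_Pressure <- 90 + 0.8 * df_part1$Age + rnorm(100, sd = 5)
df_part1$Age_scaled <- as.numeric(scale(df_part1$Age))
df_part1$BP_scaled  <- as.numeric(scale(df_part1$Blood_Pressure))

# Response on the left of ~, predictor on the right
model <- lm(BP_scaled ~ Age_scaled, data = df_part1)
summary(model)

# With both variables standardized, the intercept is ~0 and the slope
# equals the Pearson correlation between age and blood pressure
coef(model)
cor(df_part1$Age, df_part1$Blood_Pressure)
```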

4. Plotting

We create a scatter plot of scaled age against scaled blood pressure using plot(). Then, we add a
regression line (abline()) based on the linear model we fitted earlier. The regression line helps
visualize the relationship between age and blood pressure.
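A base-R sketch of the scatter plot and regression line, under the same hypothetical data and column names. The plot is written to a temporary PDF so the sketch runs non-interactively; in an interactive session the pdf() and dev.off() calls can be dropped. Axis labels, point style and line color are illustrative choices.

```r
# Hypothetical simulated, standardized data
set.seed(123)
df_part1 <- data.frame(Age = sample(25:70, 100, replace = TRUE))
df_part1$Blood_Pressure <- 90 + 0.8 * df_part1$Age + rnorm(100, sd = 5)
df_part1$Age_scaled <- as.numeric(scale(df_part1$Age))
df_part1$BP_scaled  <- as.numeric(scale(df_part1$Blood_Pressure))
model <- lm(BP_scaled ~ Age_scaled, data = df_part1)

# Draw to a temporary PDF file
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
plot(df_part1$Age_scaled, df_part1$BP_scaled, pch = 19,
     xlab = "Age (scaled)", ylab = "Blood pressure (scaled)",
     main = "Blood pressure vs. age")
# abline() reads the intercept and slope directly from the lm object
abline(model, col = "red", lwd = 2)
invisible(dev.off())
```

Because abline() uses the model's coefficients as they are, the model must be fitted on the same (scaled) variables that are plotted, otherwise the line will not line up with the points.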

5. Enhanced plotting with ggplot2

This block uses the ggplot2 package to create a more visually appealing plot. We map age to the
x-axis and blood pressure to the y-axis. The geom_point() function adds the scatter plot points, and
geom_smooth() adds a fitted line with a confidence band. With method = "lm", this line is a linear
regression line (by default, geom_smooth() draws a LOESS smooth for small data sets).
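A ggplot2 sketch of the same plot, assuming the hypothetical data and column names used above. It passes method = "lm" explicitly so the smooth is a straight regression line, and formula = y ~ x to silence ggplot2's informational message. The plot is saved to a temporary file with ggsave() so the sketch runs without a display.

```r
library(ggplot2)

# Hypothetical simulated, standardized data
set.seed(123)
df_part1 <- data.frame(Age = sample(25:70, 100, replace = TRUE))
df_part1$Blood_Pressure <- 90 + 0.8 * df_part1$Age + rnorm(100, sd = 5)
df_part1$Age_scaled <- as.numeric(scale(df_part1$Age))
df_part1$BP_scaled  <- as.numeric(scale(df_part1$Blood_Pressure))

p <- ggplot(df_part1, aes(x = Age_scaled, y = BP_scaled)) +
  geom_point(alpha = 0.7) +
  # method = "lm" fits a straight line; se = TRUE draws the 95% confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Age (scaled)", y = "Blood pressure (scaled)",
       title = "Blood pressure vs. age")

# Save to a temporary file (use print(p) instead in an interactive session)
plot_file <- tempfile(fileext = ".pdf")
ggsave(plot_file, p, width = 6, height = 4)
```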

Result: the graph shows a positive linear trend. The shaded band around the regression line is its
confidence interval (95% by default in geom_smooth()), which shows the range within which the true
mean regression line plausibly lies.

Overall, the confidence interval does not change the conclusion here: the plot shows a positive
correlation between age and blood pressure, meaning that blood pressure tends to increase with age.
This is expected, since the simulated blood pressure values were generated from age.
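To put numbers on the band and the trend, the sketch below uses predict() with interval = "confidence", which is the same kind of pointwise interval that geom_smooth(method = "lm") shades, plus confint() for the slope. It works on the unscaled hypothetical data (seed, size and coefficients are assumptions) so the interval is in the original units; the ages 30, 50 and 70 are arbitrary illustrative points.

```r
# Hypothetical simulated data in original units
set.seed(123)
df_part1 <- data.frame(Age = sample(25:70, 100, replace = TRUE))
df_part1$Blood_Pressure <- 90 + 0.8 * df_part1$Age + rnorm(100, sd = 5)
model <- lm(Blood_Pressure ~ Age, data = df_part1)

# Pointwise 95% confidence interval for the mean blood pressure at chosen ages
new_ages <- data.frame(Age = c(30, 50, 70))
ci <- predict(model, newdata = new_ages, interval = "confidence", level = 0.95)
cbind(new_ages, ci)

# 95% confidence interval for the slope; a lower bound above 0 supports
# the positive association seen in the plot
confint(model, "Age", level = 0.95)

# Strength of the linear association
cor(df_part1$Age, df_part1$Blood_Pressure)
```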

Part 2

1. Here, we generate sample data for 120 patients divided into three drug groups, "Drug A",
"Drug B", and "Drug C". We assume that blood pressure reduction values follow a normal
distribution with a mean of 10 and a standard deviation of 3.

2. We convert the "Drug_Types" variable to a factor with specified levels so that the groups
appear in the intended order on the plot. Additionally, we scale the "Blood_PressureReduction"
variable for normalization.

3. A one-way ANOVA is performed to test whether mean blood pressure reduction differs between
drug types. Tukey's HSD test is then conducted to compare each pair of drug types and identify
which pairs differ significantly.

4. A boxplot is created to visualize the distribution of blood pressure reduction for each drug
type, with a separate color specified for each drug type.

5. Here, Tukey_Test$Drug_Types[, 4] retrieves the p-values for the pairwise comparisons between
drug types from Tukey's test results. The [, 4] part selects the fourth column of the results
matrix for Drug_Types, which holds the adjusted p-values (the "p adj" column, after diff, lwr
and upr).

6. These lines convert the Drug_Types factor into numeric indices and retrieve the levels of the
factor. These are used later to position the significance indicators correctly on the plot.

7. The loop iterates through the significant pairs identified by Tukey's test. It ensures that
"Drug C" is matched correctly and adds an asterisk ("*") to the boxplot at the appropriate
position for each significant difference between drug types.
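The steps above can be sketched end to end in R as follows. Several details are assumptions, not the case study's actual code: equal group sizes of 40, the seed, the box colors, the 0.05 significance threshold, and the way the loop recovers the two drug names by splitting Tukey's row names (such as "Drug B-Drug A") on "-". The original loop's special handling of "Drug C" is not reproduced here.

```r
set.seed(123)  # hypothetical seed

# 120 patients, 40 per drug type (equal group sizes are an assumption)
drug_levels <- c("Drug A", "Drug B", "Drug C")
df_part2 <- data.frame(
  Drug_Types = rep(drug_levels, each = 40),
  Blood_PressureReduction = rnorm(120, mean = 10, sd = 3)
)

# Factor with explicit level order so the plot shows A, B, C from left to right
df_part2$Drug_Types <- factor(df_part2$Drug_Types, levels = drug_levels)
# Standardize the response (as.numeric drops the matrix returned by scale)
df_part2$Blood_PressureReduction <- as.numeric(scale(df_part2$Blood_PressureReduction))

# One-way ANOVA followed by Tukey's HSD pairwise comparisons
anova_model <- aov(Blood_PressureReduction ~ Drug_Types, data = df_part2)
summary(anova_model)
Tukey_Test <- TukeyHSD(anova_model)

# Columns are diff, lwr, upr, p adj: column 4 holds the adjusted p-values
p_values <- Tukey_Test$Drug_Types[, 4]
p_values

# Box position (1, 2, 3) of every observation, and the level labels
group_index <- as.numeric(df_part2$Drug_Types)
group_levels <- levels(df_part2$Drug_Types)

# Boxplot with one color per drug type, drawn to a temporary PDF
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
boxplot(Blood_PressureReduction ~ Drug_Types, data = df_part2,
        col = c("skyblue", "salmon", "palegreen"),
        xlab = "Drug type", ylab = "Blood pressure reduction (scaled)")

# Add "*" midway between the two boxes of each significant pair (p < 0.05)
sig_pairs <- names(p_values)[p_values < 0.05]
for (pair_name in sig_pairs) {
  pair <- strsplit(pair_name, "-", fixed = TRUE)[[1]]
  idx <- match(pair, group_levels)  # box positions of the two drugs
  y_pos <- max(df_part2$Blood_PressureReduction[group_index %in% idx])
  text(mean(idx), y_pos + 0.3, "*", cex = 2)
}
invisible(dev.off())
```

Because all three groups in this sketch are drawn from the same distribution, there is no true difference between drugs: Tukey's test keeps the chance of any false "significant" pair at about 5%, so the plot will usually show no asterisks.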
