Case Study 3 Explanation
For part one, we generated sample data for age and blood pressure, normalized the variables, fitted
a linear model, and plotted the relationship between age and blood pressure. The plot shows scaled
age on the x-axis and scaled blood pressure on the y-axis, with a regression line indicating the
relationship between the two variables.
Code breakdown:
1. Data generation
Here, we set the seed for reproducibility, generate random ages between 25 and 70, and calculate
blood pressure using a linear relationship with some added noise. We then create a data frame
df_part1 to store these values.
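The data-generation step described above could look like the following minimal sketch. The seed value, the sample size of 100, the use of runif(), and the intercept, slope, and noise level are all assumptions chosen for illustration; only the age range (25 to 70), the column names, and the data frame name df_part1 come from the text.

```r
# Sketch of the data-generation step. The seed, sample size, and
# coefficients below are assumed values, not taken from the original code.
set.seed(123)   # assumed seed, fixed so the random draws are reproducible
n <- 100        # assumed sample size

# Random ages between 25 and 70
Age <- runif(n, min = 25, max = 70)

# Blood pressure as a linear function of age plus Gaussian noise
# (intercept 90, slope 0.6, and noise sd 8 are illustrative assumptions)
Blood_Pressure <- 90 + 0.6 * Age + rnorm(n, mean = 0, sd = 8)

# Store both variables in one data frame
df_part1 <- data.frame(Age = Age, Blood_Pressure = Blood_Pressure)
head(df_part1)
```

Because the seed is fixed, rerunning the block reproduces the same data frame.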
2. Normalization
We normalize the numeric variables (age and blood pressure) using the scale() function, which
centers and scales the variables to have a mean of 0 and standard deviation of 1.
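As a rough illustration of the normalization step, the sketch below applies scale() to a small simulated data frame and checks the resulting means and standard deviations. The simulated values and the name df_scaled are assumptions made so the example runs on its own.

```r
# Simulated stand-in for df_part1 (values are assumed for illustration)
set.seed(123)
df_part1 <- data.frame(Age = runif(100, min = 25, max = 70))
df_part1$Blood_Pressure <- 90 + 0.6 * df_part1$Age + rnorm(100, sd = 8)

# scale() centers each column on its mean and divides by its standard
# deviation. It returns a matrix, so we convert back to a data frame.
df_scaled <- as.data.frame(scale(df_part1))   # df_scaled is a hypothetical name

# Check: column means are numerically zero and standard deviations are 1
round(colMeans(df_scaled), 10)
apply(df_scaled, 2, sd)
```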
3. Linear model
We fit a linear model with lm() to examine the relationship between blood pressure and age. The
formula Blood_Pressure ~ Age specifies blood pressure as the response variable and age as the
predictor variable.
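The model fit can be sketched as follows, assuming the model is fitted on the scaled data (the text does not say explicitly which version is used). The simulated data and the names df_scaled and model are assumptions. With both variables standardized, the fitted slope equals the Pearson correlation between them and the intercept is essentially zero.

```r
# Simulated, scaled stand-in for the data (values are assumed)
set.seed(123)
df_part1 <- data.frame(Age = runif(100, min = 25, max = 70))
df_part1$Blood_Pressure <- 90 + 0.6 * df_part1$Age + rnorm(100, sd = 8)
df_scaled <- as.data.frame(scale(df_part1))

# Blood_Pressure is the response, Age is the predictor
model <- lm(Blood_Pressure ~ Age, data = df_scaled)

# Coefficient table, R-squared, and p-values
summary(model)

# On standardized data the slope matches the correlation coefficient
coef(model)
cor(df_scaled$Age, df_scaled$Blood_Pressure)
```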
4. Plotting
We create a scatter plot of scaled age against scaled blood pressure using plot(). Then, we add a
regression line (abline()) based on the linear model we fitted earlier. The regression line helps
visualize the relationship between age and blood pressure.
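A minimal base-graphics sketch of this plotting step is shown below. The simulated data, axis labels, title, and line color are assumptions for illustration; plot() and abline() are used as described in the text.

```r
# Simulated, scaled stand-in for the data (values are assumed)
set.seed(123)
df_part1 <- data.frame(Age = runif(100, min = 25, max = 70))
df_part1$Blood_Pressure <- 90 + 0.6 * df_part1$Age + rnorm(100, sd = 8)
df_scaled <- as.data.frame(scale(df_part1))
model <- lm(Blood_Pressure ~ Age, data = df_scaled)

# Scatter plot of scaled age against scaled blood pressure
plot(df_scaled$Age, df_scaled$Blood_Pressure,
     xlab = "Scaled age",
     ylab = "Scaled blood pressure",
     main = "Age vs. blood pressure")

# abline() accepts a fitted lm object and draws its regression line
abline(model, col = "red", lwd = 2)
```

Passing the fitted model directly to abline() only works for a simple regression with one predictor, which is the case here.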
5. Enhanced plotting with ggplot2
This block uses the ggplot2 package to create a more visually appealing plot. We map age to the
x-axis and blood pressure to the y-axis. The geom_point() function adds the scatter plot points, and
geom_smooth() adds a regression line with its confidence interval.
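The ggplot2 version might look like the sketch below. It assumes the ggplot2 package is installed and that the scaled data is plotted; the simulated data, labels, and the object name p are hypothetical.

```r
library(ggplot2)

# Simulated, scaled stand-in for the data (values are assumed)
set.seed(123)
df_part1 <- data.frame(Age = runif(100, min = 25, max = 70))
df_part1$Blood_Pressure <- 90 + 0.6 * df_part1$Age + rnorm(100, sd = 8)
df_scaled <- as.data.frame(scale(df_part1))

p <- ggplot(df_scaled, aes(x = Age, y = Blood_Pressure)) +
  geom_point() +                                            # scatter plot points
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +  # regression line with confidence band
  labs(x = "Scaled age",
       y = "Scaled blood pressure",
       title = "Age vs. blood pressure")

print(p)
```

Setting se = TRUE draws the shaded band around the line; by default it is a 95% confidence interval for the fitted mean.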
Result: the graph shows a positive linear trend. We also added the confidence interval, which
indicates the range in which the true regression line is likely to lie.
The interval matters less here, because the positive correlation is already clear: age and blood
pressure are linked, and blood pressure tends to increase with age.
Part 2
1. Here, we're generating sample data for 120 patients divided into three drug types, "Drug A",
"Drug B", and "Drug C". We're assuming a normal distribution of blood pressure reduction
values with a mean of 10 and standard deviation of 3.
2. We're converting the "Drug_Types" variable to a factor with specified levels to ensure correct
order for plotting. Additionally, we're scaling the "Blood_PressureReduction" variable for
normalization.
3. A one-way ANOVA is performed to test whether the mean blood pressure reduction differs
between drug types. Tukey's HSD test is then conducted to identify which pairs of drug types
differ significantly.
4. A boxplot is created to visualize the distribution of blood pressure reduction for each drug
type. The colors are specified for each drug type.
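The four Part 2 steps above can be sketched together as follows. The seed, the equal split of 40 patients per drug, the data frame name df_part2, the object names, and the specific colors are assumptions; the column names, group labels, sample size of 120, mean of 10, and standard deviation of 3 come from the text.

```r
# Step 1: sample data for 120 patients across three drug types
set.seed(123)        # assumed seed
n_per_group <- 40    # assumes the 120 patients are split equally

df_part2 <- data.frame(
  Drug_Types = rep(c("Drug A", "Drug B", "Drug C"), each = n_per_group),
  Blood_PressureReduction = rnorm(3 * n_per_group, mean = 10, sd = 3)
)

# Step 2: factor with explicit level order, then scale the response.
# as.numeric() drops the matrix structure that scale() returns.
df_part2$Drug_Types <- factor(df_part2$Drug_Types,
                              levels = c("Drug A", "Drug B", "Drug C"))
df_part2$Blood_PressureReduction <-
  as.numeric(scale(df_part2$Blood_PressureReduction))

# Step 3: one-way ANOVA followed by Tukey's HSD pairwise comparisons
anova_model <- aov(Blood_PressureReduction ~ Drug_Types, data = df_part2)
summary(anova_model)

tukey <- TukeyHSD(anova_model)
print(tukey)

# Step 4: boxplot with one color per drug type (colors are assumed)
boxplot(Blood_PressureReduction ~ Drug_Types, data = df_part2,
        col = c("skyblue", "lightgreen", "salmon"),
        xlab = "Drug type",
        ylab = "Scaled blood pressure reduction",
        main = "Blood pressure reduction by drug type")
```

Because all three groups in this sketch are drawn from the same distribution, any difference the ANOVA or Tukey's test reports would be due to chance.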