
NAME-VISHAL DEV HOTA

REGNO-21BLC1084

EXPERIMENT 8

ESSENTIALS OF DATA ANALYTICS

code
rm(list=ls())

# Sales counts for three product types, five observations each
data <- data.frame(
  sale.count = c(40, 60, 70, 30, 50, 30, 30, 10, 70, 60, 50, 60, 30, 20, 20),
  type = c("Can-A", "Can-A", "Can-A", "Can-A", "Can-A",
           "Can-B", "Can-B", "Can-B", "Can-B", "Can-B",
           "Can-C", "Can-C", "Can-C", "Can-C", "Can-C"))

library(dplyr)

group_by(data,type) %>% summarise(count = n(),mean = mean(sale.count, na.rm = TRUE))

# ANOVA

result<- aov(sale.count~type, data = data)

summary(result)

data<- PlantGrowth

library(dplyr)

group_by(data,group) %>% summarise(count = n(),mean = mean(weight, na.rm = TRUE))

# ANOVA
result<- aov(weight~group, data = data)

summary(result)

# Tukey HSD (Tukey Honest Significant Differences)

TukeyHSD(result)

# Homogeneity of variances (equal variances assumption)

plot(result, 1)

# Normality assumption

plot(result, 2)

# Kruskal-Wallis rank sum test (used when ANOVA assumptions are not met)

kruskal.test(weight~group, data = data)

output
explanation
1. Sales Data Analysis:
• A data frame data is created containing two variables: sale.count (sales
counts) and type (types of products).
• The dplyr library is loaded to perform data manipulation tasks.
• group_by() function groups the data by type.
• summarise() function calculates the count and mean sale counts for each
product type.
• ANOVA is performed using the aov() function to compare mean sale
counts across different product types.
• The results of ANOVA are summarized using summary() function.
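
As an illustration of what aov() computes for the sales data, the F statistic can be rebuilt by hand from the between-group and within-group sums of squares. This is a minimal sketch in base R, not part of the original experiment; the variable names (sales, ss_between, ss_within and so on) are chosen here for illustration only.

```r
# Same sales data as in the experiment, rebuilt with rep() for brevity
sales <- data.frame(
  sale.count = c(40, 60, 70, 30, 50, 30, 30, 10, 70, 60, 50, 60, 30, 20, 20),
  type = rep(c("Can-A", "Can-B", "Can-C"), each = 5)
)

# Per-group means and sizes, plus the overall mean
group_means <- tapply(sales$sale.count, sales$type, mean)
group_sizes <- tapply(sales$sale.count, sales$type, length)
grand_mean  <- mean(sales$sale.count)

# Between-group and within-group sums of squares
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((sales$sale.count - group_means[as.character(sales$type)])^2)

# Degrees of freedom: k - 1 and N - k
df_between <- length(group_means) - 1
df_within  <- nrow(sales) - length(group_means)

# F statistic and its upper-tail p-value
f_stat  <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

# Cross-check against aov()
fit   <- aov(sale.count ~ type, data = sales)
f_aov <- summary(fit)[[1]][["F value"]][1]
print(c(by_hand = f_stat, from_aov = f_aov, p_value = p_value))
```

The two F values should agree, which shows that summary(result) reports the ratio of the between-group mean square to the within-group mean square.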
2. Plant Growth Data Analysis:
• The PlantGrowth dataset, which is available in R by default, is used. It
contains data on plant growth under different treatment groups (group)
and the corresponding plant weights (weight).
• Again, dplyr library is loaded to manipulate the data.
• The data is grouped by group using group_by() function.
• summarise() function calculates the count and mean weight for each
treatment group.
• ANOVA is performed using the aov() function to compare mean
weights across different treatment groups.
• The results of ANOVA are summarized using summary() function.
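
The printed summary can also be read programmatically. The sketch below pulls the F value and p-value out of the ANOVA table and compares the p-value with a significance level. The 0.05 threshold and the variable names are assumptions made for this example; the original experiment only prints the table.

```r
# Fit the same one-way ANOVA on the built-in PlantGrowth data
fit <- aov(weight ~ group, data = PlantGrowth)

# summary() returns a list; its first element is the ANOVA table
anova_table <- summary(fit)[[1]]
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

alpha <- 0.05                      # assumed significance level
reject_null <- p_value < alpha     # TRUE means the group means differ

cat(sprintf("F = %.3f, p = %.4f, reject H0 at alpha = %.2f: %s\n",
            f_value, p_value, alpha, reject_null))
```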
3. Post hoc Analysis (Tukey's HSD Test):
• The TukeyHSD() function performs Tukey's HSD test, which compares
every pair of group means and reports which pairs differ significantly.
• The test is applied to the fitted model returned by aov().
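
To show how the Tukey output can be used, the sketch below keeps only the pairwise comparisons whose adjusted p-value falls below 0.05. The threshold and the variable names are assumptions for illustration. The matrix layout (columns diff, lwr, upr, p adj) is what TukeyHSD() returns for each factor.

```r
# Fit the ANOVA and run Tukey's HSD on the PlantGrowth data
fit   <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit, conf.level = 0.95)

# One row per pair of groups, with the adjusted p-value in column "p adj"
pairwise <- tukey$group

# Keep only the pairs that differ at the assumed 0.05 level
alpha <- 0.05
significant <- pairwise[pairwise[, "p adj"] < alpha, , drop = FALSE]
print(significant)
```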
4. Assumption Checks:
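• Diagnostic plots from plot() are used to check the assumptions of
ANOVA:
• plot(result, 1) draws residuals against fitted values; a similar spread
of residuals in every group supports the homogeneity of variances
assumption.
• plot(result, 2) draws a normal Q-Q plot of the residuals; points lying
close to the reference line support the normality assumption.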
• Additionally, if the assumptions of ANOVA are not met, the Kruskal-
Wallis rank sum test (kruskal.test()) can be performed as an alternative
non-parametric test.
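
The diagnostic plots are judged by eye. As a possible complement, the sketch below runs formal tests from the stats package: bartlett.test() for equal variances and shapiro.test() on the model residuals for normality. These two tests are not part of the original experiment and are added here as an assumption about how one might confirm the plots. The Kruskal-Wallis call matches the one in the code above.

```r
fit <- aov(weight ~ group, data = PlantGrowth)

# Equal variances across groups (null hypothesis: variances are equal)
variance_check <- bartlett.test(weight ~ group, data = PlantGrowth)

# Normality of residuals (null hypothesis: residuals are normal)
normality_check <- shapiro.test(residuals(fit))

# Non-parametric alternative for when the checks above fail
kw <- kruskal.test(weight ~ group, data = PlantGrowth)

cat(sprintf("Bartlett p = %.4f\n", variance_check$p.value))
cat(sprintf("Shapiro-Wilk p = %.4f\n", normality_check$p.value))
cat(sprintf("Kruskal-Wallis p = %.4f\n", kw$p.value))
```

A large p-value in the first two tests gives no evidence against the ANOVA assumptions. A small p-value suggests relying on the Kruskal-Wallis result instead.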

Overall, this code demonstrates a one-way ANOVA workflow in R: summarising each
group, fitting the model, running a post hoc test to identify which groups differ,
and checking the model assumptions. It uses dplyr for data manipulation and the
built-in stats package for the statistical tests.
