A statistical method called one way analysis of variance (ANOVA) is used to compare the means of three or more groups to see if there is a statistically significant difference between them. It is a parametric test that counts on data having a normal distribution and variances that are comparable between groups.
One way ANOVA’s null hypothesis is that there is no difference between the group means. The alternative theory is that one or more of the groups have different means.
The F-statistic, which is the proportion of between-group variation to within-group variance, must be calculated for the test. The null hypothesis is rejected if the F-statistic is large enough to show that the variance across groups is significantly higher than the variation within groups.
One way ANOVA’s presumptions include:
- Each group’s data is often disseminated differently.
- Each group’s variances are equal when there is homogeneity of variance.
- Independence: Each group’s observations are made independently of one another.
- Non-parametric tests like the Kruskal-Wallis test may be employed in place of parametric tests if the assumptions are not met.
One way ANOVA is commonly used in many fields, such as psychology, education, and business, to compare means of different groups and make statistical inferences about the population.
So let’s get started!
Contents
Clear R environment
As a first step, I always recommend to clear data objects and values in global environment with rm()
function. In R, rm()
is a function used to remove objects (variables, functions, etc.) from the R workspace. You can provide names of objects, variables or functions as list using ls()
function that takes an argument all
to remove all data from R workspace. You can close all open graphic devices using the graphics.off()
function. The function shell()
is a function that allows you to execute shell commands or scripts from within R. Typing “cls” in this function will clear the console environement.
rm(list = ls(all = TRUE))
graphics.off()
shell("cls")
Importing data
You may read Excel files into R using the read_excel()
function from the readxl package in R. It can read files in both the .xls and .xlsx formats. Here, path
is the path to the Excel file you want to read. The argument col_names
specifies whether or not to read the column names from the file. The default is TRUE
.
The function head()
in R is used to display the first few rows of a data frame or matrix. It requests the object’s name and the number of rows you wish to display as input (by default, 6). We have converted the first 3 variables to factors by using as.factor()
. The attach()
function in R is used to include a data frame in the environment’s search route. The variables in a data frame can be referred to by their names without mentioning the data frame name when it is attached.
library(readxl)
data = read_excel(
path = 'one-way-data.xlsx',
col_names = TRUE
)
head(data)
data$Treatment = as.factor(data$Treatment)
attach(data)
# # A tibble: 6 x 2 # Treatment Yield # <chr> <dbl> # 1 Ctrl 29.8 # 2 Ctrl 23 # 3 Ctrl 24.3 # 4 NP 16.2 # 5 NP 11.9 # 6 NP 14.4
There are two variables in the imported data. One is independent variable labelled as Treatment. Second is dependent variable labelled as Yield. Treatment factor comprised of four different levels. These levels are control, nutrient priming, hydro priming and osmo priming.
One way ANOVA
We wish to investigate if there are statistically significant variations in sweet potato yield between the groups of treatment variables. We must first do an analysis of variance. The aov()
function will return the analysis of variance table that takes certain arguments for example formula
and data
in this case. Here, formula
is a model formula that specifies the variables and factors to be analyzed in the dataset, and data is the name of the data frame containing the variables. We can use summary()
function to get results from the ANOVA model. This will reveal whether or not the treatment’s effect is significant.
anova <- aov(Yield ~ Treatment)
summary(anova)
# Df Sum Sq Mean Sq F value Pr(>F) # Treatment 3 1170.2 390.1 17.34 0.000733 *** # Residuals 8 179.9 22.5 # --- # Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can see the attributes of anova by using attributes function for anova object.
attributes(anova)
# $names # [1] "coefficients" "residuals" "effects" "rank" # [5] "fitted.values" "assign" "qr" "df.residual" # [9] "contrasts" "xlevels" "call" "terms" # [13] "model" # # $class # [1] "aov" "lm"
You can also get information for the means by typing model.tables()
function for ANOVA model.
model.tables(anova, "means")
# Tables of means # Grand mean # # 28.925 # # Treatment # Treatment # Ctrl HP NP OP # 25.70 37.63 14.17 38.20
Mean separation test
TukeyHSD test
Following an ANOVA, the TukeyHSD()
function in R is used to do a post-hoc analysis to determine which group differences are statistically significant. Specifically, it performs Tukey’s honestly significant difference (HSD) test to compare all possible pairs of means and determine which pairs are significantly different from each other.
TukeyHSD(x = anova,
which = "Treatment",
conf.level = 0.95)
# Tukey multiple comparisons of means # 95% family-wise confidence level # # Fit: aov(formula = Yield ~ Treatment) # # $Treatment # diff lwr upr p adj # HP-Ctrl 11.9333333 -0.4663365 24.3330031 0.0592490 # NP-Ctrl -11.5333333 -23.9330031 0.8663365 0.0685547 # OP-Ctrl 12.5000000 0.1003302 24.8996698 0.0482107 # NP-HP -23.4666667 -35.8663365 -11.0669969 0.0013612 # OP-HP 0.5666667 -11.8330031 12.9663365 0.9987858 # OP-NP 24.0333333 11.6336635 36.4330031 0.0011621
LSD test
The LSD.test()
is a function that is used to execute the Least Significant Difference (LSD) test, which is a post-hoc test used to compare the means of several groups following an ANOVA study. The LSD test determines whether pairs of groups have substantially different means by comparing the difference between the means of each pair of groups to the standard error of the difference.
library(agricolae)
LSD = LSD.test(y = anova,
trt = "Treatment",
DFerror = anova$df.residual,
MSerror = deviance(anova)/anova$df.residual,
alpha = 0.05,
group = TRUE,
console = TRUE)
# # Study: anova ~ "Treatment" # # LSD t Test for Yield # # Mean Square Error: 22.48917 # # Treatment, means and individual ( 95 %) CI # # Yield std r LCL UCL Min Max # Ctrl 25.70000 3.609709 3 19.386268 32.01373 23.0 29.8 # HP 37.63333 7.333030 3 31.319601 43.94707 29.3 43.1 # NP 14.16667 2.159475 3 7.852935 20.48040 11.9 16.2 # OP 38.20000 4.300000 3 31.886268 44.51373 33.4 41.7 # # Alpha: 0.05 ; DF Error: 8 # Critical Value of t: 2.306004 # # least Significant Difference: 8.928965 # # Treatments with the same letter are not significantly different. # # Yield groups # OP 38.20000 a # HP 37.63333 a # Ctrl 25.70000 b # NP 14.16667 c
Scheffe test
The Scheffe test, a post-hoc test used to compare the means of several groups following an ANOVA study, is carried out in R using the function scheffe.test()
. The Scheffe test is helpful when there are several groups being compared since it is a more conservative test when compared to other post-hoc tests, such Tukey’s HSD test.
scheffe.test(y = anova,
trt = "Treatment",
DFerror = anova$df.residual,
MSerror = deviance(anova)/anova$df.residual,
Fc = summary(anova)[[1]][1,4],
alpha = 0.05,
group = TRUE,
console = TRUE)
# # Study: anova ~ "Treatment" # # Scheffe Test for Yield # # Mean Square Error : 22.48917 # # Treatment, means # # Yield std r Min Max # Ctrl 25.70000 3.609709 3 23.0 29.8 # HP 37.63333 7.333030 3 29.3 43.1 # NP 14.16667 2.159475 3 11.9 16.2 # OP 38.20000 4.300000 3 33.4 41.7 # # Alpha: 0.05 ; DF Error: 8 # Critical Value of F: 4.066181 # # Minimum Significant Difference: 13.52368 # # Means with the same letter are not significantly different. # # Yield groups # OP 38.20000 a # HP 37.63333 a # Ctrl 25.70000 ab # NP 14.16667 b
Visualizing results
The results can be visualized using bar.err()
function. It plots bars of the averages of treatments and standard error or standard deviance. It uses the objects generated by a procedure of comparison like LSD, HSD, Kruskal and Waller-Duncan.
The argument x
specify the object means of the comparisons as mentioned above. In variation
you can specify the values as SE, range and IQR.
bar.err(x = LSD$means,
variation = "SE",
ylim = c(0,45),
angle = 125,
density = 6)
title(main = "Standard deviation",
cex.main = 1,
xlab = "Seed priming",ylab = "Yield")
bar.err(x = LSD$means,
variation = "range",
ylim = c(0,45),
bar = FALSE,
col = "green")
title(main = "Range",
cex.main = 1,
xlab = "Seed priming",ylab = "Yield")
From the above plots you can remove the bars to only show the error bars by setting the logical value of bar
argument to FALSE.
bar.err(x = LSD$means,
variation = "range",
ylim = c(0,45),
bar = FALSE,
col = 0)
abline(h = 0)
title(main = "bar = FALSE",
cex.main = 1,
xlab = "Seed priming",ylab = "Yield")
Your comments and suggestions are highly appreciated.
Download data file — Click here
Download R program — Click here
Download R studio — Click here
Outstanding feature
Hey! There Admins I like Your Blog And Appreciate your hard work keep it up.
Thank you for your kind words.
I really like your post because this post is very helpful to me. Thank You For Sharing.
Thank you for your kind words, I’m glad to hear that the post was helpful to you.