AGRON INFO TECH

One way analysis of variance in R

A statistical method called one way analysis of variance (ANOVA) is used to compare the means of three or more groups to see if there is a statistically significant difference between them. It is a parametric test that counts on data having a normal distribution and variances that are comparable between groups.

One way ANOVA’s null hypothesis is that there is no difference between the group means. The alternative theory is that one or more of the groups have different means.

The F-statistic, which is the proportion of between-group variation to within-group variance, must be calculated for the test. The null hypothesis is rejected if the F-statistic is large enough to show that the variance across groups is significantly higher than the variation within groups.

One way ANOVA’s presumptions include:

  • Each group’s data is often disseminated differently.
  • Each group’s variances are equal when there is homogeneity of variance.
  • Independence: Each group’s observations are made independently of one another.
  • Non-parametric tests like the Kruskal-Wallis test may be employed in place of parametric tests if the assumptions are not met.

One way ANOVA is commonly used in many fields, such as psychology, education, and business, to compare means of different groups and make statistical inferences about the population.

So let’s get started!

Clear R environment

As a first step, I always recommend to clear data objects and values in global environment with rm() function. In R, rm() is a function used to remove objects (variables, functions, etc.) from the R workspace. You can provide names of objects, variables or functions as list using ls() function that takes an argument all to remove all data from R workspace. You can close all open graphic devices using the graphics.off() function. The function shell() is a function that allows you to execute shell commands or scripts from within R. Typing “cls” in this function will clear the console environement.

rm(list = ls(all = TRUE))
graphics.off()
shell("cls")

Importing data

You may read Excel files into R using the read_excel() function from the readxl package in R. It can read files in both the .xls and .xlsx formats. Here, path is the path to the Excel file you want to read. The argument col_names specifies whether or not to read the column names from the file. The default is TRUE.

The function head() in R is used to display the first few rows of a data frame or matrix. It requests the object’s name and the number of rows you wish to display as input (by default, 6). We have converted the first 3 variables to factors by using as.factor(). The attach() function in R is used to include a data frame in the environment’s search route. The variables in a data frame can be referred to by their names without mentioning the data frame name when it is attached.

library(readxl)
data = read_excel(
          path = 'one-way-data.xlsx',
          col_names = TRUE
)

head(data)

data$Treatment = as.factor(data$Treatment)

attach(data)
# # A tibble: 6 x 2
#   Treatment Yield
#   <chr>     <dbl>
# 1 Ctrl       29.8
# 2 Ctrl       23  
# 3 Ctrl       24.3
# 4 NP         16.2
# 5 NP         11.9
# 6 NP         14.4

There are two variables in the imported data. One is independent variable labelled as Treatment. Second is dependent variable labelled as Yield. Treatment factor comprised of four different levels. These levels are control, nutrient priming, hydro priming and osmo priming.

One way ANOVA

We wish to investigate if there are statistically significant variations in sweet potato yield between the groups of treatment variables. We must first do an analysis of variance. The aov() function will return the analysis of variance table that takes certain arguments for example formula and data in this case. Here, formula is a model formula that specifies the variables and factors to be analyzed in the dataset, and data is the name of the data frame containing the variables. We can use summary() function to get results from the ANOVA model. This will reveal whether or not the treatment’s effect is significant.

anova <- aov(Yield ~ Treatment)

summary(anova)
#             Df Sum Sq Mean Sq F value   Pr(>F)    
# Treatment    3 1170.2   390.1   17.34 0.000733 ***
# Residuals    8  179.9    22.5                     
# ---
# Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see the attributes of anova by using attributes function for anova object.

attributes(anova)
# $names
#  [1] "coefficients"  "residuals"     "effects"       "rank"         
#  [5] "fitted.values" "assign"        "qr"            "df.residual"  
#  [9] "contrasts"     "xlevels"       "call"          "terms"        
# [13] "model"        
# 
# $class
# [1] "aov" "lm"

You can also get information for the means by typing model.tables() function for ANOVA model.

model.tables(anova, "means")
# Tables of means
# Grand mean
#        
# 28.925 
# 
#  Treatment 
# Treatment
#  Ctrl    HP    NP    OP 
# 25.70 37.63 14.17 38.20

Mean separation test

TukeyHSD test

Following an ANOVA, the TukeyHSD() function in R is used to do a post-hoc analysis to determine which group differences are statistically significant. Specifically, it performs Tukey’s honestly significant difference (HSD) test to compare all possible pairs of means and determine which pairs are significantly different from each other.

TukeyHSD(x = anova,
         which = "Treatment",
         conf.level = 0.95)
#   Tukey multiple comparisons of means
#     95% family-wise confidence level
# 
# Fit: aov(formula = Yield ~ Treatment)
# 
# $Treatment
#                diff         lwr         upr     p adj
# HP-Ctrl  11.9333333  -0.4663365  24.3330031 0.0592490
# NP-Ctrl -11.5333333 -23.9330031   0.8663365 0.0685547
# OP-Ctrl  12.5000000   0.1003302  24.8996698 0.0482107
# NP-HP   -23.4666667 -35.8663365 -11.0669969 0.0013612
# OP-HP     0.5666667 -11.8330031  12.9663365 0.9987858
# OP-NP    24.0333333  11.6336635  36.4330031 0.0011621

LSD test

The LSD.test() is a function that is used to execute the Least Significant Difference (LSD) test, which is a post-hoc test used to compare the means of several groups following an ANOVA study. The LSD test determines whether pairs of groups have substantially different means by comparing the difference between the means of each pair of groups to the standard error of the difference.

library(agricolae)
LSD = LSD.test(y = anova,
               trt = "Treatment",
               DFerror = anova$df.residual,
               MSerror = deviance(anova)/anova$df.residual,
               alpha = 0.05,
               group = TRUE,
               console = TRUE)
# 
# Study: anova ~ "Treatment"
# 
# LSD t Test for Yield 
# 
# Mean Square Error:  22.48917 
# 
# Treatment,  means and individual ( 95 %) CI
# 
#         Yield      std r       LCL      UCL  Min  Max
# Ctrl 25.70000 3.609709 3 19.386268 32.01373 23.0 29.8
# HP   37.63333 7.333030 3 31.319601 43.94707 29.3 43.1
# NP   14.16667 2.159475 3  7.852935 20.48040 11.9 16.2
# OP   38.20000 4.300000 3 31.886268 44.51373 33.4 41.7
# 
# Alpha: 0.05 ; DF Error: 8
# Critical Value of t: 2.306004 
# 
# least Significant Difference: 8.928965 
# 
# Treatments with the same letter are not significantly different.
# 
#         Yield groups
# OP   38.20000      a
# HP   37.63333      a
# Ctrl 25.70000      b
# NP   14.16667      c

Scheffe test

The Scheffe test, a post-hoc test used to compare the means of several groups following an ANOVA study, is carried out in R using the function scheffe.test(). The Scheffe test is helpful when there are several groups being compared since it is a more conservative test when compared to other post-hoc tests, such Tukey’s HSD test.

scheffe.test(y = anova,
             trt = "Treatment", 
             DFerror = anova$df.residual,
             MSerror = deviance(anova)/anova$df.residual,
             Fc = summary(anova)[[1]][1,4],
             alpha = 0.05,
             group = TRUE,
             console = TRUE)
# 
# Study: anova ~ "Treatment"
# 
# Scheffe Test for Yield 
# 
# Mean Square Error  : 22.48917 
# 
# Treatment,  means
# 
#         Yield      std r  Min  Max
# Ctrl 25.70000 3.609709 3 23.0 29.8
# HP   37.63333 7.333030 3 29.3 43.1
# NP   14.16667 2.159475 3 11.9 16.2
# OP   38.20000 4.300000 3 33.4 41.7
# 
# Alpha: 0.05 ; DF Error: 8 
# Critical Value of F: 4.066181 
# 
# Minimum Significant Difference: 13.52368 
# 
# Means with the same letter are not significantly different.
# 
#         Yield groups
# OP   38.20000      a
# HP   37.63333      a
# Ctrl 25.70000     ab
# NP   14.16667      b

Visualizing results

The results can be visualized using bar.err() function. It plots bars of the averages of treatments and standard error or standard deviance. It uses the objects generated by a procedure of comparison like LSD, HSD, Kruskal and Waller-Duncan.

The argument x specify the object means of the comparisons as mentioned above. In variation you can specify the values as SE, range and IQR.

bar.err(x = LSD$means,
        variation = "SE",
        ylim = c(0,45),
        angle = 125,
        density = 6)

title(main = "Standard deviation",
      cex.main = 1,
      xlab = "Seed priming",ylab = "Yield")

bar.err(x = LSD$means,
        variation = "range",
        ylim = c(0,45),
        bar = FALSE,
        col = "green")

title(main = "Range",
      cex.main = 1,
      xlab = "Seed priming",ylab = "Yield")

From the above plots you can remove the bars to only show the error bars by setting the logical value of bar argument to FALSE.

bar.err(x = LSD$means,
        variation = "range",
        ylim = c(0,45),
        bar = FALSE,
        col = 0)
abline(h = 0)

title(main = "bar = FALSE",
      cex.main = 1,
      xlab = "Seed priming",ylab = "Yield")

Your comments and suggestions are highly appreciated.


Download data file — Click here


Download R program — Click here

Download R studio — Click here

5 thoughts on “One way analysis of variance in R”

Comments are closed.