AGRON INFO TECH

Generate publication ready LSD test results with R

In this blog post, we’ll walk through how to generate publication-ready LSD test results using R. This method is quick, easy, and adaptable for multiple variables. Let’s dive in!

Required Packages

First, we need to load the necessary packages. These include readxl for reading Excel files, dplyr and tibble for data manipulation, flextable for creating flexible table layout, and agricolae for statistical analysis.

library(readxl)
library(dplyr)
library(tibble)
library(flextable)
library(agricolae)

Importing Data

Next, we import the data from an Excel file. Make sure to replace "Data.xlsx" with the path to your own data file.

data = read_excel(
          path = "Data.xlsx",
          col_names = TRUE
)
head(data)
## # A tibble: 6 × 10
##     Rep Water Priming `Plant height` Spikelets `Spike length` `Grain per spike`
##   <dbl> <chr> <chr>            <dbl>     <dbl>          <dbl>             <dbl>
## 1     1 CONV  HP                  93       293           15.3                59
## 2     2 CONV  HP                  99       261           13.2                54
## 3     3 CONV  HP                  85       231           15.2                56
## 4     1 CONV  NP                  90       204           12                  35
## 5     2 CONV  NP                  95       252           13                  32
## 6     3 CONV  NP                  92       191           14.5                33
## # ℹ 3 more variables: `grain weight` <dbl>, `Biological yield` <dbl>,
## #   `Grain yield` <dbl>

Preparing Data

We then convert categorical variables to factor variables. In this example, we’re converting the Water and Priming variables.

data$Water = as.factor(data$Water)
data$Priming = as.factor(data$Priming)
str(data)
## tibble [27 × 10] (S3: tbl_df/tbl/data.frame)
##  $ Rep             : num [1:27] 1 2 3 1 2 3 1 2 3 1 ...
##  $ Water           : Factor w/ 3 levels "CONV","FLOOD",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ Priming         : Factor w/ 3 levels "HP","NP","OP": 1 1 1 2 2 2 3 3 3 1 ...
##  $ Plant height    : num [1:27] 93 99 85 90 95 92 95 98 85 96 ...
##  $ Spikelets       : num [1:27] 293 261 231 204 252 191 322 287 312 304 ...
##  $ Spike length    : num [1:27] 15.3 13.2 15.2 12 13 ...
##  $ Grain per spike : num [1:27] 59 54 56 35 32 33 64 62 69 38 ...
##  $ grain weight    : num [1:27] 46 46.3 42.5 37 40.5 38.2 56.5 59.3 58.2 49.8 ...
##  $ Biological yield: num [1:27] 16.5 14.2 15.9 13.3 13.6 ...
##  $ Grain yield     : num [1:27] 5.3 5.05 5.5 4.2 4.03 4.45 5.85 6.15 6.58 5.3 ...

Analysis of Variance

We perform an analysis of variance (ANOVA) on our data. This is done for each column in our data frame, excluding the first three columns because the first variable is replication and second and third variables are factor variables.

First we created an empty list named aov.model. This list will be used to store the results of the ANOVA for each variable in the dataset. Then we used a for loop that iterates over each column in the dataset, excluding the first three columns. Inside the loop, we created a vector cols that contains the names of all response variables in the dataset.

Then we used the lapply function to apply a function to each element in the cols vector. The function being applied is specified in the following lines. The function that is going to apply on each response variable was sepcified in the argument termlabesl within aov() function.

It performs an ANOVA using the aov function. The reformulate function is used to create a formula for the ANOVA. The formula specifies that the response variable (the variable we’re interested in) is x (the current element in cols), and the predictor variables (the variables we think might affect the response variable) are RepWater, and Priming. The data = data argument specifies that the data for the ANOVA comes from the data object we created earlier.

The results of each ANOVA are stored in the aov.model list.

aov.model <- list()

for(i in 1:ncol(data[-c(1:3)])) {
          
          cols <- names(data)[4:ncol(data)]
          
          aov.model <- lapply(X = cols, FUN = function(x) 
                    aov(reformulate(termlabels = "Rep + Water*Priming", 
                                    response = x), 
                        data = data))
}

LSD Mean Comparison Test

Then we performed a Least Significant Difference (LSD) test on the results of the previously conducted Analysis of Variance (ANOVA). First we created an empty list named test to store the results of the LSD test.

We again used a for loop that iterates over each response variable in the dataset. Inside the loop, we performed an LSD test on the ith ANOVA model stored in aov.model. The LSD test is used to compare the means of groups in the ANOVA to find out if they are significantly different from each other. The y argument is the response variable (the ANOVA model), trt is the treatment variable (in this case, “Water”), DFerror is the degrees of freedom of the residuals, MSerror is the mean square error, group is a logical argument indicating whether to calculate the groups or not, and console is a logical argument indicating whether to print the results to the console or not.

The results of each LSD test are stored in the test list.

test <- list()
for(i in 1:ncol(data[-c(1:3)])) {
          
          test[[i]] = LSD.test(y = aov.model[[i]], 
                               trt = "Water", 
                               DFerror = aov.model[[i]]$df.residual,
                               MSerror = deviance(aov.model[[i]])/aov.model[[i]]$df.residual,
                               group = T,
                               console = F)
}

Separating Groups and Adding LSD Values

We separate the groups and add LSD values. If the ANOVA result for the Water (factor A) variable is significant (p < 0.05), we round the values to 2 decimal places and add the LSD value. If the result is not significant, we add “ns” instead of the LSD value.

We first created an empty list named groups. This list will be used to store the results of the groupings for each variable in the dataset. Then again we used a for loop that iterates over each column in the dataset. Inside if() command we ensured to check if the p-value of the “Water” factor in the ANOVA model for the ith column is less than 0.05. If it is, it means that the “Water” factor has a significant effect on the ith variable.

The it creates a new row in the groups list for the ith variable, rounds the values to two decimal places, adds the LSD value from the test list, converts the list to a data frame, rounds the “LSD” row to four decimal places, and finally keeps only the first column of the data frame.

If the p-value is not less than 0.05, the else statement is executed. This code creates a new row in the groups list for the ith variable, rounds the values to two decimal places, and adds “ns” (not significant) to the end of the list.

groups <- list()
for(i in 1:ncol(data[-c(1:3)])) {
          if(anova(aov.model[[i]])[5]["Water",] < 0.05) { 
                    groups[[i]] = test[[i]][["groups"]]
                    groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
                    groups[[i]][nrow(groups[[i]]) + 1,] = c(test[[i]]$statistics$LSD, "")
                    rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
                    groups[[i]][,1] = paste(groups[[i]][,1], groups[[i]][,2])
                    groups[[i]] = as.data.frame(groups[[i]])
                    groups[[i]]["LSD",] = round(as.numeric(groups[[i]]["LSD",]), digits = 4)
                    groups[[i]] = groups[[i]][1]
          }
          else{
                    groups[[i]] = test[[i]][["groups"]][1]
                    groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
                    groups[[i]][nrow(groups[[i]]) + 1,] = "ns"
                    rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
          }
}

groups
## [[1]]
##         Plant height
## FLOOD             94
## PARTIAL        93.15
## CONV           92.44
## LSD               ns
## 
## [[2]]
##         Spikelets
## FLOOD      263.11
## CONV       261.44
## PARTIAL    252.96
## LSD            ns
## 
## [[3]]
##         Spike length
## CONV           15.43
## PARTIAL        15.34
## FLOOD          14.79
## LSD               ns
## 
## [[4]]
##         Grain per spike
## CONV            51.56 a
## PARTIAL         45.33 b
## FLOOD           44.56 b
## LSD              3.0844
## 
## [[5]]
##         grain weight
## FLOOD          48.17
## CONV           47.17
## PARTIAL        47.06
## LSD               ns
## 
## [[6]]
##         Biological yield
## CONV               15.87
## PARTIAL             15.8
## FLOOD              15.59
## LSD                   ns
## 
## [[7]]
##         Grain yield
## PARTIAL      5.53 a
## FLOOD       5.42 ab
## CONV         5.23 b
## LSD          0.1929

Combining Results

Finally, we combine both lists (significant and non-significant) into a single data frame and print the ANOVA table.

First we combined all the elements of the groups list into a single data frame. The do.call(cbind, groups) part is applying the cbind function (column bind) to all elements of the groups list, effectively binding all the columns together. The result is then converted into a data frame and stored in groups.tab.

The we created a flexible table (for MS word document) from the groups.tab data frame. The rownames_to_column("Treatment") part is converting the row names of the data frame into a column named “Treatment”. The resulting table is stored in the table variable.

We finally made the header of the table bold. The bold = TRUE part is specifying that the text should be bold, and the part = "header" part is specifying that this formatting should be applied to the header of the table.

groups.tab = as.data.frame(do.call(cbind, groups))

table = flextable(data = groups.tab %>%
                            rownames_to_column("Treatment")) 

bold(table, bold = TRUE, part = "header")
TreatmentPlant heightSpikeletsSpike lengthGrain per spikegrain weightBiological yieldGrain yield
FLOOD94263.1115.4351.56 a48.1715.875.53 a
PARTIAL93.15261.4415.3445.33 b47.1715.85.42 ab
CONV92.44252.9614.7944.56 b47.0615.595.23 b
LSDnsnsns3.0844nsns0.1929

Printing Results for Second Factor Variable

To do the same for the second factor variable (priming) we need to just edit two arguments as shown below:

The trt argument in LSD.test() function used above, replace “water” with “priming”. Secondly replace “water” with “priming” in the command where we created list for groups inside the if conditional statement.

test <- list()
for(i in 1:ncol(data[-c(1:3)])) {
          
          test[[i]] = LSD.test(y = aov.model[[i]], 
                               trt = "Priming", 
                               DFerror = aov.model[[i]]$df.residual,
                               MSerror = deviance(aov.model[[i]])/aov.model[[i]]$df.residual,
                               group = T,
                               console = F)
}

groups <- list()
for(i in 1:ncol(data[-c(1:3)])) {
          if(anova(aov.model[[i]])[5]["Priming",] < 0.05) { 
                    groups[[i]] = test[[i]][["groups"]]
                    groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
                    groups[[i]][nrow(groups[[i]]) + 1,] = c(test[[i]]$statistics$LSD, "")
                    rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
                    groups[[i]][,1] = paste(groups[[i]][,1], groups[[i]][,2])
                    groups[[i]] = as.data.frame(groups[[i]])
                    groups[[i]]["LSD",] = round(as.numeric(groups[[i]]["LSD",]), digits = 4)
                    groups[[i]] = groups[[i]][1]
          }
          else{
                    groups[[i]] = test[[i]][["groups"]][1]
                    groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
                    groups[[i]][nrow(groups[[i]]) + 1,] = "ns"
                    rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
          }
}


groups.tab = as.data.frame(do.call(cbind, groups))

Printing Results for Interaction Term

Again we shall use the flextable function to print the MS word friendly table for the second factor variable.

table = flextable(data = groups.tab %>%
                            rownames_to_column("Treatment")) 

bold(table, bold = TRUE, part = "header")
TreatmentPlant heightSpikeletsSpike lengthGrain per spikegrain weightBiological yieldGrain yield
HP93.89295.04 a17.51 a65.56 a57.39 a18.19 a6.38 a
OP93.44265.93 b14.68 b43.89 b46.88 b15.39 b5.38 b
NP92.26216.56 c13.37 c32 c38.13 c13.68 c4.43 c
LSDns25.80681.0953.08443.63930.82060.1929

In similar way to print the interaction table for both the factor variables you can simply replace “Priming” with c(“Water”, “Priming”) in the trt argument in LSD.test() function. Also replace “Priming” with “Water:Priming” in the groups command as shown below:

test <- list()
for(i in 1:ncol(data[-c(1:3)])) {
          
          test[[i]] = LSD.test(y = aov.model[[i]], 
                               trt = c("Water", "Priming"), 
                               DFerror = aov.model[[i]]$df.residual,
                               MSerror = deviance(aov.model[[i]])/aov.model[[i]]$df.residual,
                               group = T,
                               console = F)
}

groups <- list()
for(i in 1:ncol(data[-c(1:3)])) {
          if(anova(aov.model[[i]])[5]["Water:Priming",] < 0.05) { 
                    groups[[i]] = test[[i]][["groups"]]
                    groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
                    groups[[i]][nrow(groups[[i]]) + 1,] = c(test[[i]]$statistics$LSD, "")
                    rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
                    groups[[i]][,1] = paste(groups[[i]][,1], groups[[i]][,2])
                    groups[[i]] = as.data.frame(groups[[i]])
                    groups[[i]]["LSD",] = round(as.numeric(groups[[i]]["LSD",]), digits = 4)
                    groups[[i]] = groups[[i]][1]
          }
          else{
                    groups[[i]] = test[[i]][["groups"]][1]
                    groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
                    groups[[i]][nrow(groups[[i]]) + 1,] = "ns"
                    rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
          }
}


groups.tab = as.data.frame(do.call(cbind, groups))

Again we shall print the results using the flextable function.

table = flextable(data = groups.tab %>%
                            rownames_to_column("Treatment")) 

bold(table, bold = TRUE, part = "header")
TreatmentPlant heightSpikeletsSpike lengthGrain per spikegrain weightBiological yieldGrain yield
FLOOD:HP96.3330718.5366 a58.4318.386.58
PARTIAL:OP95294.3317.165.67 a5818.16.36
FLOOD:NP93283.7816.965 a55.7318.096.19
PARTIAL:HP93277.6715.0256.33 b48.5815.525.44
CONV:OP92.67261.6714.640 c47.1315.385.41
FLOOD:OP92.67258.4414.4335.33 cd44.9315.275.28
CONV:HP92.33217.3314.1133.33 d38.5714.034.58
CONV:NP92.33216.6713.1732.33 d38.3313.724.47
PARTIAL:NP91.44215.6712.8330.33 d37.513.284.23
LSDnsnsns5.3424nsnsns

And there you have it! With this R code, you can generate publication-ready LSD test results in a matter of minutes. Just by changing the factor variable name, you can quickly generate the same output for a second variable. Happy coding!

👉 For more details and informative videos 📺, you can also subscribe to our YouTube Channel AGRON Info Tech.


Download R program and R studio — Click_here

Download data file — Click_here