In this blog post, we’ll walk through how to generate publication-ready LSD test results using R. This method is quick, easy, and adaptable for multiple variables. Let’s dive in!
Contents
Required Packages
First, we need to load the necessary packages. These include readxl
for reading Excel files, dplyr
and tibble
for data manipulation, flextable
for creating flexible table layout, and agricolae
for statistical analysis.
library(readxl)
library(dplyr)
library(tibble)
library(flextable)
library(agricolae)
Importing Data
Next, we import the data from an Excel file. Make sure to replace "Data.xlsx"
with the path to your own data file.
data = read_excel(
path = "Data.xlsx",
col_names = TRUE
)
head(data)
## # A tibble: 6 × 10 ## Rep Water Priming `Plant height` Spikelets `Spike length` `Grain per spike` ## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <dbl> ## 1 1 CONV HP 93 293 15.3 59 ## 2 2 CONV HP 99 261 13.2 54 ## 3 3 CONV HP 85 231 15.2 56 ## 4 1 CONV NP 90 204 12 35 ## 5 2 CONV NP 95 252 13 32 ## 6 3 CONV NP 92 191 14.5 33 ## # ℹ 3 more variables: `grain weight` <dbl>, `Biological yield` <dbl>, ## # `Grain yield` <dbl>
Preparing Data
We then convert categorical variables to factor variables. In this example, we’re converting the Water
and Priming
variables.
data$Water = as.factor(data$Water)
data$Priming = as.factor(data$Priming)
str(data)
## tibble [27 × 10] (S3: tbl_df/tbl/data.frame) ## $ Rep : num [1:27] 1 2 3 1 2 3 1 2 3 1 ... ## $ Water : Factor w/ 3 levels "CONV","FLOOD",..: 1 1 1 1 1 1 1 1 1 2 ... ## $ Priming : Factor w/ 3 levels "HP","NP","OP": 1 1 1 2 2 2 3 3 3 1 ... ## $ Plant height : num [1:27] 93 99 85 90 95 92 95 98 85 96 ... ## $ Spikelets : num [1:27] 293 261 231 204 252 191 322 287 312 304 ... ## $ Spike length : num [1:27] 15.3 13.2 15.2 12 13 ... ## $ Grain per spike : num [1:27] 59 54 56 35 32 33 64 62 69 38 ... ## $ grain weight : num [1:27] 46 46.3 42.5 37 40.5 38.2 56.5 59.3 58.2 49.8 ... ## $ Biological yield: num [1:27] 16.5 14.2 15.9 13.3 13.6 ... ## $ Grain yield : num [1:27] 5.3 5.05 5.5 4.2 4.03 4.45 5.85 6.15 6.58 5.3 ...
Analysis of Variance
We perform an analysis of variance (ANOVA) on our data. This is done for each column in our data frame, excluding the first three columns because the first variable is replication and second and third variables are factor variables.
First we created an empty list named aov.model
. This list will be used to store the results of the ANOVA for each variable in the dataset. Then we used a for
loop that iterates over each column in the dataset, excluding the first three columns. Inside the loop, we created a vector cols
that contains the names of all response variables in the dataset.
Then we used the lapply
function to apply a function to each element in the cols
vector. The function being applied is specified in the following lines. The function that is going to apply on each response variable was sepcified in the argument termlabesl
within aov()
function.
It performs an ANOVA using the aov
function. The reformulate
function is used to create a formula for the ANOVA. The formula specifies that the response variable (the variable we’re interested in) is x
(the current element in cols
), and the predictor variables (the variables we think might affect the response variable) are Rep
, Water
, and Priming
. The data = data
argument specifies that the data for the ANOVA comes from the data
object we created earlier.
The results of each ANOVA are stored in the aov.model
list.
aov.model <- list()
for(i in 1:ncol(data[-c(1:3)])) {
cols <- names(data)[4:ncol(data)]
aov.model <- lapply(X = cols, FUN = function(x)
aov(reformulate(termlabels = "Rep + Water*Priming",
response = x),
data = data))
}
LSD Mean Comparison Test
Then we performed a Least Significant Difference (LSD) test on the results of the previously conducted Analysis of Variance (ANOVA). First we created an empty list named test
to store the results of the LSD test.
We again used a for
loop that iterates over each response variable in the dataset. Inside the loop, we performed an LSD test on the i
th ANOVA model stored in aov.model
. The LSD test is used to compare the means of groups in the ANOVA to find out if they are significantly different from each other. The y
argument is the response variable (the ANOVA model), trt
is the treatment variable (in this case, “Water”), DFerror
is the degrees of freedom of the residuals, MSerror
is the mean square error, group
is a logical argument indicating whether to calculate the groups or not, and console
is a logical argument indicating whether to print the results to the console or not.
The results of each LSD test are stored in the test
list.
test <- list()
for(i in 1:ncol(data[-c(1:3)])) {
test[[i]] = LSD.test(y = aov.model[[i]],
trt = "Water",
DFerror = aov.model[[i]]$df.residual,
MSerror = deviance(aov.model[[i]])/aov.model[[i]]$df.residual,
group = T,
console = F)
}
Separating Groups and Adding LSD Values
We separate the groups and add LSD values. If the ANOVA result for the Water
(factor A) variable is significant (p < 0.05), we round the values to 2 decimal places and add the LSD value. If the result is not significant, we add “ns” instead of the LSD value.
We first created an empty list named groups
. This list will be used to store the results of the groupings for each variable in the dataset. Then again we used a for
loop that iterates over each column in the dataset. Inside if()
command we ensured to check if the p-value of the “Water” factor in the ANOVA model for the i
th column is less than 0.05. If it is, it means that the “Water” factor has a significant effect on the i
th variable.
The it creates a new row in the groups
list for the i
th variable, rounds the values to two decimal places, adds the LSD value from the test
list, converts the list to a data frame, rounds the “LSD” row to four decimal places, and finally keeps only the first column of the data frame.
If the p-value is not less than 0.05, the else
statement is executed. This code creates a new row in the groups
list for the i
th variable, rounds the values to two decimal places, and adds “ns” (not significant) to the end of the list.
groups <- list()
for(i in 1:ncol(data[-c(1:3)])) {
if(anova(aov.model[[i]])[5]["Water",] < 0.05) {
groups[[i]] = test[[i]][["groups"]]
groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
groups[[i]][nrow(groups[[i]]) + 1,] = c(test[[i]]$statistics$LSD, "")
rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
groups[[i]][,1] = paste(groups[[i]][,1], groups[[i]][,2])
groups[[i]] = as.data.frame(groups[[i]])
groups[[i]]["LSD",] = round(as.numeric(groups[[i]]["LSD",]), digits = 4)
groups[[i]] = groups[[i]][1]
}
else{
groups[[i]] = test[[i]][["groups"]][1]
groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
groups[[i]][nrow(groups[[i]]) + 1,] = "ns"
rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
}
}
groups
## [[1]] ## Plant height ## FLOOD 94 ## PARTIAL 93.15 ## CONV 92.44 ## LSD ns ## ## [[2]] ## Spikelets ## FLOOD 263.11 ## CONV 261.44 ## PARTIAL 252.96 ## LSD ns ## ## [[3]] ## Spike length ## CONV 15.43 ## PARTIAL 15.34 ## FLOOD 14.79 ## LSD ns ## ## [[4]] ## Grain per spike ## CONV 51.56 a ## PARTIAL 45.33 b ## FLOOD 44.56 b ## LSD 3.0844 ## ## [[5]] ## grain weight ## FLOOD 48.17 ## CONV 47.17 ## PARTIAL 47.06 ## LSD ns ## ## [[6]] ## Biological yield ## CONV 15.87 ## PARTIAL 15.8 ## FLOOD 15.59 ## LSD ns ## ## [[7]] ## Grain yield ## PARTIAL 5.53 a ## FLOOD 5.42 ab ## CONV 5.23 b ## LSD 0.1929
Combining Results
Finally, we combine both lists (significant and non-significant) into a single data frame and print the ANOVA table.
First we combined all the elements of the groups
list into a single data frame. The do.call(cbind, groups)
part is applying the cbind
function (column bind) to all elements of the groups
list, effectively binding all the columns together. The result is then converted into a data frame and stored in groups.tab
.
The we created a flexible table (for MS word document) from the groups.tab
data frame. The rownames_to_column("Treatment")
part is converting the row names of the data frame into a column named “Treatment”. The resulting table is stored in the table
variable.
We finally made the header of the table
bold. The bold = TRUE
part is specifying that the text should be bold, and the part = "header"
part is specifying that this formatting should be applied to the header of the table.
groups.tab = as.data.frame(do.call(cbind, groups))
table = flextable(data = groups.tab %>%
rownames_to_column("Treatment"))
bold(table, bold = TRUE, part = "header")
Treatment | Plant height | Spikelets | Spike length | Grain per spike | grain weight | Biological yield | Grain yield |
---|---|---|---|---|---|---|---|
FLOOD | 94 | 263.11 | 15.43 | 51.56 a | 48.17 | 15.87 | 5.53 a |
PARTIAL | 93.15 | 261.44 | 15.34 | 45.33 b | 47.17 | 15.8 | 5.42 ab |
CONV | 92.44 | 252.96 | 14.79 | 44.56 b | 47.06 | 15.59 | 5.23 b |
LSD | ns | ns | ns | 3.0844 | ns | ns | 0.1929 |
Printing Results for Second Factor Variable
To do the same for the second factor variable (priming) we need to just edit two arguments as shown below:
The trt
argument in LSD.test()
function used above, replace “water” with “priming”. Secondly replace “water” with “priming” in the command where we created list for groups inside the if
conditional statement.
test <- list()
for(i in 1:ncol(data[-c(1:3)])) {
test[[i]] = LSD.test(y = aov.model[[i]],
trt = "Priming",
DFerror = aov.model[[i]]$df.residual,
MSerror = deviance(aov.model[[i]])/aov.model[[i]]$df.residual,
group = T,
console = F)
}
groups <- list()
for(i in 1:ncol(data[-c(1:3)])) {
if(anova(aov.model[[i]])[5]["Priming",] < 0.05) {
groups[[i]] = test[[i]][["groups"]]
groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
groups[[i]][nrow(groups[[i]]) + 1,] = c(test[[i]]$statistics$LSD, "")
rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
groups[[i]][,1] = paste(groups[[i]][,1], groups[[i]][,2])
groups[[i]] = as.data.frame(groups[[i]])
groups[[i]]["LSD",] = round(as.numeric(groups[[i]]["LSD",]), digits = 4)
groups[[i]] = groups[[i]][1]
}
else{
groups[[i]] = test[[i]][["groups"]][1]
groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
groups[[i]][nrow(groups[[i]]) + 1,] = "ns"
rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
}
}
groups.tab = as.data.frame(do.call(cbind, groups))
Printing Results for Interaction Term
Again we shall use the flextable function to print the MS word friendly table for the second factor variable.
table = flextable(data = groups.tab %>%
rownames_to_column("Treatment"))
bold(table, bold = TRUE, part = "header")
Treatment | Plant height | Spikelets | Spike length | Grain per spike | grain weight | Biological yield | Grain yield |
---|---|---|---|---|---|---|---|
HP | 93.89 | 295.04 a | 17.51 a | 65.56 a | 57.39 a | 18.19 a | 6.38 a |
OP | 93.44 | 265.93 b | 14.68 b | 43.89 b | 46.88 b | 15.39 b | 5.38 b |
NP | 92.26 | 216.56 c | 13.37 c | 32 c | 38.13 c | 13.68 c | 4.43 c |
LSD | ns | 25.8068 | 1.095 | 3.0844 | 3.6393 | 0.8206 | 0.1929 |
In similar way to print the interaction table for both the factor variables you can simply replace “Priming” with c(“Water”, “Priming”) in the trt
argument in LSD.test()
function. Also replace “Priming” with “Water:Priming” in the groups command as shown below:
test <- list()
for(i in 1:ncol(data[-c(1:3)])) {
test[[i]] = LSD.test(y = aov.model[[i]],
trt = c("Water", "Priming"),
DFerror = aov.model[[i]]$df.residual,
MSerror = deviance(aov.model[[i]])/aov.model[[i]]$df.residual,
group = T,
console = F)
}
groups <- list()
for(i in 1:ncol(data[-c(1:3)])) {
if(anova(aov.model[[i]])[5]["Water:Priming",] < 0.05) {
groups[[i]] = test[[i]][["groups"]]
groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
groups[[i]][nrow(groups[[i]]) + 1,] = c(test[[i]]$statistics$LSD, "")
rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
groups[[i]][,1] = paste(groups[[i]][,1], groups[[i]][,2])
groups[[i]] = as.data.frame(groups[[i]])
groups[[i]]["LSD",] = round(as.numeric(groups[[i]]["LSD",]), digits = 4)
groups[[i]] = groups[[i]][1]
}
else{
groups[[i]] = test[[i]][["groups"]][1]
groups[[i]][,1] = round(as.numeric(groups[[i]][,1]), digits = 2)
groups[[i]][nrow(groups[[i]]) + 1,] = "ns"
rownames(groups[[i]]) = c(rownames(test[[i]]$groups), "LSD")
}
}
groups.tab = as.data.frame(do.call(cbind, groups))
Again we shall print the results using the flextable
function.
table = flextable(data = groups.tab %>%
rownames_to_column("Treatment"))
bold(table, bold = TRUE, part = "header")
Treatment | Plant height | Spikelets | Spike length | Grain per spike | grain weight | Biological yield | Grain yield |
---|---|---|---|---|---|---|---|
FLOOD:HP | 96.33 | 307 | 18.53 | 66 a | 58.43 | 18.38 | 6.58 |
PARTIAL:OP | 95 | 294.33 | 17.1 | 65.67 a | 58 | 18.1 | 6.36 |
FLOOD:NP | 93 | 283.78 | 16.9 | 65 a | 55.73 | 18.09 | 6.19 |
PARTIAL:HP | 93 | 277.67 | 15.02 | 56.33 b | 48.58 | 15.52 | 5.44 |
CONV:OP | 92.67 | 261.67 | 14.6 | 40 c | 47.13 | 15.38 | 5.41 |
FLOOD:OP | 92.67 | 258.44 | 14.43 | 35.33 cd | 44.93 | 15.27 | 5.28 |
CONV:HP | 92.33 | 217.33 | 14.11 | 33.33 d | 38.57 | 14.03 | 4.58 |
CONV:NP | 92.33 | 216.67 | 13.17 | 32.33 d | 38.33 | 13.72 | 4.47 |
PARTIAL:NP | 91.44 | 215.67 | 12.83 | 30.33 d | 37.5 | 13.28 | 4.23 |
LSD | ns | ns | ns | 5.3424 | ns | ns | ns |
And there you have it! With this R code, you can generate publication-ready LSD test results in a matter of minutes. Just by changing the factor variable name, you can quickly generate the same output for a second variable. Happy coding!
👉 For more details and informative videos 📺, you can also subscribe to our YouTube Channel AGRON Info Tech.
Download R program and R studio —
Click_here
Download data file —
Click_here