Introduction
In this blog post, we will explore an improved method for generating publication-ready ANOVA tables in R. Conducting analysis of variance (ANOVA) is a common statistical technique used to compare means across multiple groups or treatments. The ANOVA table provides valuable information about the sources of variation and the significance of each factor in the analysis.
Setup and Data Import
To begin, we need to load the necessary libraries for our analysis. We will be using the readxl, dplyr, tibble, and flextable packages. The readxl package allows us to import data from Excel files, while dplyr and tibble provide helpful functions for data manipulation. Finally, the flextable package will help us generate the publication-ready ANOVA table.
# Library
library(readxl)
library(dplyr)
library(tibble)
library(flextable)
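The library() calls above fail if a package is not installed. As a minimal sketch, the check below reports which of the four packages are missing without installing anything; the variable names pkgs and not_installed are only illustrative.

```r
# Packages used in this post
pkgs <- c("readxl", "dplyr", "tibble", "flextable")
# requireNamespace() returns FALSE (instead of stopping) when a package is absent
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
not_installed <- pkgs[!is_installed]
if (length(not_installed) > 0) {
  message("Install first: install.packages(c(",
          paste0('"', not_installed, '"', collapse = ", "), "))")
} else {
  message("All required packages are installed.")
}
```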
Next, we will import the data for our analysis. Assuming the data is stored in an Excel file named “Data.xlsx”, we can use the read_excel() function from the readxl package to import the data into R. We set the col_names parameter to TRUE to indicate that the first row of the Excel file contains column names.
# Importing data
data <- read_excel(path = "Data.xlsx", col_names = TRUE)
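If you do not have Data.xlsx at hand, a simulated stand-in with the same layout lets you follow along. The sketch below assumes three replications and three levels each of Water and Priming (27 rows), which is consistent with the degrees of freedom in the ANOVA table later in this post. The level labels, the two response columns, and all values are made up for illustration.

```r
# Hypothetical stand-in for Data.xlsx: 3 reps x 3 water levels x 3 priming levels
set.seed(123)
sim <- expand.grid(Rep = 1:3,
                   Water = c("W1", "W2", "W3"),
                   Priming = c("P1", "P2", "P3"),
                   stringsAsFactors = FALSE)
# Two made-up response variables (random numbers, not real measurements)
sim$`Plant height` <- rnorm(nrow(sim), mean = 90, sd = 4)
sim$`Grain yield`  <- rnorm(nrow(sim), mean = 4, sd = 0.5)
# Inspect the structure: three design columns followed by the responses
str(sim)
```

Assigning sim to data would let the remaining steps run unchanged, since the first three columns are the design variables and everything after them is a response.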
Data Preparation
Before conducting the ANOVA analysis, it is important to ensure that the categorical variables are correctly identified as factors in R. In our data, the variables “Rep”, “Water”, and “Priming” are categorical. We can use the as.factor() function to convert these variables into factors.
# Convert categorical variables to factor variables
data$Rep <- as.factor(data$Rep)
data$Water <- as.factor(data$Water)
data$Priming <- as.factor(data$Priming)
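The conversion matters because a numeric Rep column would be fitted as a single linear term rather than as a grouping factor. As an alternative sketch, the same conversion can be done in one call with dplyr's across(); the small data frame demo and its values are made up for illustration.

```r
library(dplyr)

# Small made-up data frame standing in for the imported data
demo <- data.frame(Rep = c(1, 2, 3),
                   Water = c("W1", "W2", "W1"),
                   Priming = c("P1", "P1", "P2"),
                   Yield = c(4.1, 3.8, 4.4))
# Convert all three design columns to factors in one step
demo <- demo %>%
  mutate(across(c(Rep, Water, Priming), as.factor))
# Confirm the column classes: the design columns are factors, Yield stays numeric
sapply(demo, class)
```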
Analysis of Variance
Now we are ready to perform the ANOVA analysis. We will use lapply() to fit one ANOVA model per response variable with the aov() function and collect the models in a list. Every column after the first three is treated as a response, and each model contains Rep, the main effects of Water and Priming, and their interaction.
# Analysis of variance
# Response variables are all columns after Rep, Water, and Priming
cols <- names(data)[4:ncol(data)]
# Fit one model per response; as.name() keeps column names with spaces valid
aov.model <- lapply(X = cols, FUN = function(x)
  aov(reformulate(termlabels = "Rep + Water*Priming",
                  response = as.name(x)),
      data = data))
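The reformulate() function builds the model formula from strings, which is what makes it possible to fit the same model to many response columns. The sketch below shows the formula for a single response. Wrapping the name in as.name() is one way to handle column names that contain spaces, such as "Plant height" in the table later in this post.

```r
# Response name with a space, as it might come from a spreadsheet header
resp <- "Plant height"
f <- reformulate(termlabels = "Rep + Water*Priming",
                 response = as.name(resp))
print(f)
# Terms that aov() would fit: main effects plus the Water:Priming interaction
attr(terms(f), "term.labels")
```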
Next, we will create another list to store the ANOVA tables for each response variable. We will iterate over the ANOVA models and use the anova() function to extract the relevant information from each model. We will round the values to three decimal places, assign appropriate column names, add asterisks to indicate significance levels, and merge the mean squares (MS) and asterisk columns. Finally, we will remove unnecessary columns and update the column names.
# Creating list of ANOVA tables
aov.anova <- list()
for (i in seq_along(aov.model)) {
  # Keep Df, Mean Sq, and Pr(>F) from the ANOVA table
  tab <- as.data.frame(anova(aov.model[[i]]))[, c(1, 3, 5)]
  # Setting column names
  colnames(tab) <- c("DF", "MS", "P-value")
  # Assign asterisks from the unrounded p-values
  # (the Residuals row has no p-value and gets a blank)
  p <- tab[["P-value"]]
  tab$sign <- ifelse(is.na(p), " ",
                     ifelse(p < 0.01, "**",
                            ifelse(p < 0.05, "*", "ns")))
  # Merge the rounded MS and the significance marker into one column
  tab$MS_comb <- paste(round(tab$MS, digits = 3), tab$sign)
  # Keep only DF and the merged column, named after the response variable
  tab <- tab[, c("DF", "MS_comb")]
  colnames(tab)[2] <- cols[i]
  aov.anova[[i]] <- tab
}
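The significance coding inside the loop can also be factored into a small helper, which makes the thresholds easier to read and test. The sketch below uses a hypothetical function name, p_to_sign, and treats a p-value of exactly 0.05 as not significant; that boundary choice is an assumption, not something the original code settles.

```r
# Map p-values to significance markers; NA (e.g. the Residuals row) becomes blank
p_to_sign <- function(p) {
  ifelse(is.na(p), " ",
         ifelse(p < 0.01, "**",
                ifelse(p < 0.05, "*", "ns")))
}
# One value per case: p < 0.01, p < 0.05, exactly 0.05, clearly ns, and missing
p_to_sign(c(0.003, 0.03, 0.05, 0.4, NA))
```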
Generating the ANOVA Table
Now we will combine all the ANOVA tables from the list into a single data frame. We will remove any duplicate columns for degrees of freedom (DF) and create the final ANOVA table.
# Combine all ANOVA tables into a single data frame
aov.table <- as.data.frame(do.call(cbind, aov.anova))
# Remove duplicate columns for DF
# (logical indexing also works when there is only one response variable)
aov.table <- aov.table[, !duplicated(names(aov.table)), drop = FALSE]
# Creating the ANOVA table using flextable
aov.flex <- flextable(data = aov.table %>%
                        rownames_to_column("SOV"))
# Formatting the table header; assign the result so the bold header is kept
aov.flex <- bold(aov.flex, bold = TRUE, part = "header")
# Print the table
aov.flex
SOV | DF | Plant height | Spikelets | Spike length | Grain per spike | grain weight | Biological yield | Grain yield |
Rep | 2 | 0.449 ns | 7947.473 ** | 8.709 ** | 24.412 ns | 79.416 ** | 1.927 ns | 0.265 ** |
Water | 2 | 29.93 ns | 266.547 ns | 1.095 ns | 21.115 ns | 3.372 ns | 0.199 ns | 0.196 * |
Priming | 2 | 0.498 ns | 14166.35 ** | 40.299 ** | 1374.041 ** | 836.568 ** | 46.744 ** | 8.577 ** |
Water:Priming | 4 | 15.85 ns | 229.535 ns | 1.437 ns | 30.412 ns | 7.004 ns | 0.177 ns | 0.02 ns |
Residuals | 16 | 15.499 | 709.575 | 1.273 | 42.05 | 5.229 | 0.703 | 0.034 |
Values are mean squares. ** = p < 0.01, * = p < 0.05, ns = not significant.
Conclusion
In this blog post, we have demonstrated an improved method for generating publication-ready ANOVA tables in R. By looping over the response variables and leveraging the flexibility of the flextable package, we can efficiently generate a single ANOVA table that reports degrees of freedom, mean squares, and significance levels for every response.
This method saves time and reduces the transcription errors that come with copying values by hand. The resulting table can be exported to Word or HTML with flextable's save functions, or rendered to PDF through R Markdown, so it can go straight into research reports and publications.
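As a rough sketch of the export step, flextable provides save_as_docx() and save_as_html(). The example below uses a made-up two-row table (the two Grain yield entries are copied from the table above) and writes to temporary files; in practice you would pass your own flextable object and a real path such as "anova_table.docx".

```r
library(flextable)

# Made-up two-row table standing in for the combined ANOVA table
demo.table <- data.frame(SOV = c("Water", "Residuals"),
                         DF = c(2, 16),
                         `Grain yield` = c("0.196 *", "0.034"),
                         check.names = FALSE)
ft <- bold(flextable(demo.table), bold = TRUE, part = "header")
# Temporary output paths; replace with real file names when exporting for a report
docx_file <- tempfile(fileext = ".docx")
html_file <- tempfile(fileext = ".html")
save_as_docx(ft, path = docx_file)
save_as_html(ft, path = html_file)
```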
We hope you find this method useful in your statistical analyses. Happy researching!
Please note that the above R Markdown code assumes that you have the necessary packages installed and the data file named “Data.xlsx” in your working directory. You may need to modify the code accordingly to match your specific setup.