Contents

### Introduction

In this blog post, we will go through the steps of quickly creating multiple bar plots with standard error and lettering using R. We will perform an Analysis of Variance (ANOVA) and a Least Significant Difference (LSD) test to understand the effects of different nitrogen (N) rates on rice. Finally, we will create visualizations to present the results. Let’s dive in!

### Load Necessary Libraries

We begin by loading the required libraries. These libraries will help us with data manipulation, statistical analysis, and visualization.

**library**(ggplot2)
**library**(dplyr)
**library**(tidyr)
**library**(agricolae)
**library**(reshape2)
**library**(readxl)

### Import the Data Set

Next, we import the dataset from an Excel file named “data_rice.xlsx”. We display the first few rows and the column names to understand the structure of the data.

```
df <- read_excel(path = "data_rice.xlsx", col_names = TRUE)
head(df)
```

## # A tibble: 6 × 5 ## rep Nrate Heading Maturity Yield ## <dbl> <chr> <dbl> <dbl> <dbl> ## 1 1 Control (No Urea) 54.2 26.5 2.88 ## 2 2 Control (No Urea) 78.3 27.0 3.03 ## 3 3 Control (No Urea) 72.7 26.7 2.76 ## 4 1 Urea 25 kg/ha 72.5 36.6 3.48 ## 5 2 Urea 25 kg/ha 84.3 32.1 3.17 ## 6 3 Urea 25 kg/ha 75.3 34.8 2.93

`names(df)`

## [1] "rep" "Nrate" "Heading" "Maturity" "Yield"

### Convert Variables to Factors

We convert the columns rep and Nrate to factors because they represent categorical variables in our analysis.

```
df$rep <- as.factor(df$rep)
df$Nrate <- as.factor(df$Nrate)
str(df)
```

## tibble [15 × 5] (S3: tbl_df/tbl/data.frame) ## $ rep : Factor w/ 3 levels "1","2","3": 1 2 3 1 2 3 1 2 3 1 ... ## $ Nrate : Factor w/ 5 levels "Control (No Urea)",..: 1 1 1 3 3 3 4 4 4 5 ... ## $ Heading : num [1:15] 54.2 78.3 72.7 72.5 84.3 ... ## $ Maturity: num [1:15] 26.5 27 26.7 36.6 32.1 ... ## $ Yield : num [1:15] 2.88 3.03 2.76 3.48 3.17 ...

### Define Response Variables

We define response_vars as all columns from the third to the last column in the dataset. These columns represent different growth and yield characteristics.

`response_vars <- colnames(df)[3:ncol(df)]`

### Analysis of Variance (ANOVA)

We are performing ANOVA (Analysis of Variance) for multiple response variables stored in a list called ** response_vars**. We first initialize an empty list called

**to store the ANOVA results for each response variable. The code then iterates over each response variable in**

`anova_result`

**. If a response variable contains a space in its name, it is wrapped in backticks to ensure it is correctly interpreted within the formula.**

`response_vars`

A formula is then dynamically constructed for each response variable, where the response variable is modeled as a function of two factors: “rep” and “Nrate”. This formula is passed to the ** aov** function, which performs the ANOVA using the data frame

**. The resulting ANOVA object is stored in the**

`df`

**list, with the response variable name as the key. For each response variable, the code prints a message indicating that ANOVA is being performed, followed by the summary of the ANOVA results. This process allows us to systematically conduct and review ANOVA for multiple response variables in a dataset.**

`anova_result`

```
anova_result <- list()
```**for** (i **in** response_vars) {
**if**(grepl(" ", i)) {
i <- paste0("`", i, "`")
}
formula <- as.formula(paste(i, "~", "rep", "+", "Nrate"))
anova_result[[i]] <- aov(formula, data = df)
print(paste("ANOVA for", i))
print(summary(anova_result[[i]]))
}

## [1] "ANOVA for Heading" ## Df Sum Sq Mean Sq F value Pr(>F) ## rep 2 345.8 172.88 3.147 0.0981 . ## Nrate 4 1211.6 302.90 5.514 0.0198 * ## Residuals 8 439.5 54.94 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## [1] "ANOVA for Maturity" ## Df Sum Sq Mean Sq F value Pr(>F) ## rep 2 26.2 13.08 1.456 0.28879 ## Nrate 4 507.3 126.82 14.126 0.00107 ** ## Residuals 8 71.8 8.98 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## [1] "ANOVA for Yield" ## Df Sum Sq Mean Sq F value Pr(>F) ## rep 2 0.160 0.0802 0.377 0.69737 ## Nrate 4 7.487 1.8716 8.801 0.00501 ** ## Residuals 8 1.701 0.2127 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

### Least Significant Difference (LSD) Test

Then we conducted post-hoc analysis using the Least Significant Difference (LSD) test for multiple response variables stored in ** response_vars**. We first initialize an empty list called

**to store the LSD test results for each response variable. The code iterates over each response variable in**

`lsd_result`

**, and if a variable’s name contains a space, it is enclosed in backticks to ensure it is correctly processed. For each response variable, the LSD test is performed using the corresponding ANOVA result from the previously stored**

`response_vars`

**list, focusing on the treatment factor “Nrate” with a significance level (alpha) of 0.05. The test results, including means and groups, are then merged into a single data frame.**

`anova_result`

The standard error (SE) for each response variable is calculated by dividing the standard deviation by the square root of the number of replicates. The resulting data frame is stored in the ** lsd_result** list, with each data frame’s column containing the response variable values, means, groups, and standard errors. The column name for the response variable is standardized to “response”. Finally, the results for each response variable are printed. This process ensures a detailed post-hoc analysis following ANOVA, allowing for a clear comparison of treatment means.

```
lsd_result <- list()
```**for** (i **in** response_vars) {
**if**(grepl(" ", i)) {
i <- paste0("`", i, "`")
}
lsd_result[[i]] <- LSD.test(anova_result[[i]], trt = "Nrate", alpha = 0.05)
lsd_result[[i]] <- merge(x = lsd_result[[i]]$means[1:3], y = lsd_result[[i]]$groups[2], by.x = "row.names", by.y = "row.names", all.x = FALSE)
lsd_result[[i]]$SE <- lsd_result[[i]]$std / sqrt(lsd_result[[i]]$r)
lsd_result[[i]] <- data.frame(lsd_result[[i]])
colnames(lsd_result[[i]])[2] <- "response"
print(lsd_result[[i]])
}

## Row.names response std r groups SE ## 1 Control (No Urea) 68.44509 12.609047 3 c 7.279837 ## 2 Urea 100 kg/ha 87.97074 5.763612 3 ab 3.327623 ## 3 Urea 25 kg/ha 77.39595 6.182959 3 bc 3.569733 ## 4 Urea 50 kg/ha 80.05120 4.757606 3 bc 2.746805 ## 5 Urea 75 kg/ha 94.64645 11.813072 3 a 6.820281 ## Row.names response std r groups SE ## 1 Control (No Urea) 26.74720 0.2452797 3 d 0.1416123 ## 2 Urea 100 kg/ha 39.30555 3.5689391 3 ab 2.0605280 ## 3 Urea 25 kg/ha 34.49546 2.2648864 3 bc 1.3076328 ## 4 Urea 50 kg/ha 33.48755 5.2644154 3 c 3.0394116 ## 5 Urea 75 kg/ha 44.04475 1.8292321 3 a 1.0561076 ## Row.names response std r groups SE ## 1 Control (No Urea) 2.890565 0.1376631 3 b 0.07947983 ## 2 Urea 100 kg/ha 2.710516 0.3322514 3 b 0.19182546 ## 3 Urea 25 kg/ha 3.191620 0.2721221 3 b 0.15710978 ## 4 Urea 50 kg/ha 2.861858 0.7981537 3 b 0.46081427 ## 5 Urea 75 kg/ha 4.636202 0.3007376 3 a 0.17363091

### Prepare Data for Plotting

Next, we are combining the results of the LSD tests for multiple response variables into a single data frame for further analysis or visualization. First, we use `do.call()`

to merge all individual data frames stored in `lsd_result`

into one large data frame called `combined_df`

. We then add a new column, `variables`

, to `combined_df`

to store the row names, which represent the response variable names. The original row names are subsequently removed using `rownames(combined_df) <- NULL`

. The first column of `combined_df`

is renamed to “Nrate” to reflect the treatment levels.

Next, we clean up the `variables`

column by removing any characters following a period (.) using `sub()`

function, which simplifies the variable names. Additionally, backticks are removed from the variable names using `gsub()`

function.

Finally, the `Nrate`

column is converted to a factor with specific levels to ensure that the treatment levels are ordered meaningfully: “Control (No Urea)”, “Urea 25 kg/ha”, “Urea 50 kg/ha”, “Urea 75 kg/ha”, and “Urea 100 kg/ha”. This transformation facilitates easier plotting and interpretation of the results. The resulting `combined_df`

contains the LSD test results for all response variables, with standardized treatment levels and cleaned variable names, ready for further analysis.

```
combined_df <- do.call(rbind, lsd_result)
combined_df$variables <- rownames(combined_df)
rownames(combined_df) <- NULL
colnames(combined_df)[1] <- "Nrate"
combined_df$variables <- sub("\\..*", "", combined_df$variables)
combined_df$variables <- gsub("`", "", combined_df$variables)
combined_df$Nrate <- factor(combined_df$Nrate, levels = c(
"Control (No Urea)",
"Urea 25 kg/ha",
"Urea 50 kg/ha",
"Urea 75 kg/ha",
"Urea 100 kg/ha"
))
```

### Create the Bar Plot

Next, we created a detailed and customized bar plot using the ** ggplot2** package to visualize the results of the LSD tests for different response variables. The code plots the

**data frame, specifically the rows where**

`combined_df`

**match the names in**

`variables`

**.**

`response_vars`

**Setting up the plot**: Thefunction initializes the plot with the data filtered to include only the relevant response variables. The`ggplot`

function sets up the aesthetic mappings, specifying`aes`

on the x-axis,`Nrate`

on the y-axis, and filling the bars based on`response`

.`Nrate`

**Adding bars**: Thefunction creates bar plots with the specified aesthetics, drawing bars with black borders and setting their position to dodge each other for clarity. The width of the bars is set to 0.9.`geom_bar`

**Adding error bars**: Thefunction adds error bars to each bar, representing the standard error (`geom_errorbar`

). The error bars are positioned to align with the bars and have a width of 0.10.`SE`

**Adding text labels**: Thefunction adds text labels above each bar, displaying the group labels. The labels are positioned slightly above the error bars for better visibility.`geom_text`

**Customizing labels and theme**: Thefunction sets empty titles and labels for the x-axis and fill legend, while the y-axis label is set to “Growth and yield characteristics”. The`labs`

function customizes the axis text to rotate the x-axis labels by 45 degrees for better readability and sets other aesthetic elements like legend position, strip placement, and panel spacing.`theme`

**Faceting**: Thefunction splits the plot into multiple panels, one for each response variable. Each panel has its own y-axis scale and is arranged in a single column.`facet_wrap`

**Further customization**: The plot’s theme is further customized to remove major and minor grid lines, set the panel background to blank, and ensure the axis lines are clearly visible. Thefunction ensures the y-axis scales appropriately with a slight expansion for better spacing.`scale_y_continuous`

**Customizing fill colors**: Thefunction applies a specific color palette from the`scale_fill_manual`

package, using the “Journal of Clinical Oncology” (JCO) color palette for the bars.`ggsci`

This comprehensive code creates a clear, well-labeled, and aesthetically pleasing visualization of the LSD test results, making it easy to compare the effects of different ** Nrate** treatments on various response variables.

`plot <- ggplot(combined_df[combined_df$variables %`**in**% response_vars, ],
aes(x = factor(Nrate),
y = response,
fill = factor(Nrate))) +
geom_bar(stat = "identity",
color = "black",
position = position_dodge(width = 0.5),
width = 0.9) +
geom_errorbar(aes(ymin = response - SE,
ymax = response + SE),
position = position_dodge(width = 0.9),
width = 0.10) +
geom_text(aes(x = Nrate,
y = response + SE,
label = as.matrix(groups)),
position = position_dodge(width = 0.9),
vjust = -0.5,
hjust = 0.5) +
labs(title = "",
x = "",
y = "Growth and yield characteristics",
fill = "") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
facet_wrap(facets = ~variables,
ncol = 1,
scales = "free_y",
**switch** = "y") +
theme(legend.position = "right",
strip.placement = "outside",
strip.background = element_blank(),
panel.spacing = unit(1, "lines")) +
scale_y_continuous(expand = expansion(mult = c(0.2, 0.2))) +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.background = element_blank(),
axis.line.y = element_line(color = "black", size = 0.5),
axis.line.x = element_line(color = "black", size = 0.5)) +
scale_fill_manual(values = ggsci::pal_jco()(5))
plot

### Customize Labels and Update Plot

Then we defined a custom labeller using the ** as_labeller** function from the

**package to create more descriptive and formatted facet labels for our plot. The custom labeller is created as a named vector that maps the original variable names to more detailed and readable labels. Specifically, the**

`ggplot2`

**object associates:**

`custom_labeller`

- “Heading” with the label “Heading~(days)”
- “Maturity” with the label “Maturity~(days)”
- “Yield” with the label “Yield~(t~ha^{-1})”

The ** label_parsed** argument is used to ensure that the labels are parsed as expressions, allowing the inclusion of mathematical notation and formatting in the labels. For example,

**represents tons per hectare with appropriate superscripting. These custom labels enhance the readability and presentation quality of the facet labels in the ggplot, making the plot more informative and visually appealing.**

`t~ha^{-1}`

```
custom_labeller <- as_labeller(c(
Heading = "Heading~(days)",
Maturity = "Maturity~(days)",
Yield = "Yield~(t~ha^{-1})"
), label_parsed)
```

Finally, we customize the facet labels for better readability and update the plot accordingly. We updated the labels in facet_wrap() function to use the above modified labels. We display the updated plot using the below code.

```
plot + facet_wrap(facets = ~variables,
ncol = 1,
scales = "free_y",
```**switch** = "y",
labeller = custom_labeller)

👉 For more details and informative videos 📺, you can also subscribe to our YouTube Channel AGRON Info Tech.

Download R program and R studio —

`Click_here`

Download data file —

`Click_here`