### What is a scatterplot?

A scatterplot is a type of graph used to display the relationship between two continuous variables. It is called a scatterplot because it displays the individual data points as scattered dots on the graph.

Each dot in a scatterplot represents a single observation. One variable is represented by the horizontal axis, and the other is represented by the vertical axis. Each dot's location on the graph indicates the value of each variable for that particular observation. In this blog post, we shall create an elegantly styled scatter plot with a regression equation in the R programming language.

### Loading iris data

The **iris** dataset is a commonly used dataset in machine learning and data analysis. It contains measurements of the sepal length, sepal width, petal length, and petal width for three different species of iris flowers (setosa, versicolor, and virginica), with 50 samples of each species.

In R, the iris dataset is included in the base installation, so we can load it directly without any additional packages. Here's how to load and explore the iris dataset in R. We use the `head()` function to print the first six rows of the dataset.

```
data("iris")
head(iris)
```

    #   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    # 1          5.1         3.5          1.4         0.2  setosa
    # 2          4.9         3.0          1.4         0.2  setosa
    # 3          4.7         3.2          1.3         0.2  setosa
    # 4          4.6         3.1          1.5         0.2  setosa
    # 5          5.0         3.6          1.4         0.2  setosa
    # 6          5.4         3.9          1.7         0.4  setosa
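As a quick check of the structure described above, the following sketch uses only base R functions to confirm the dimensions of the dataset and the number of samples per species. The variable name `species_counts` is only illustrative.

```r
# Load the built-in dataset (it ships with base R)
data('iris')

# Dimensions: 150 observations and 5 columns (4 measurements + Species)
dim(iris)

# Column types: four numeric measurements and one factor
str(iris)

# Number of samples per species
species_counts <- table(iris$Species)
species_counts
```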

### Correlation analysis

The `cor.test()` function from the *stats* package tests for correlation between two variables in a dataset. It calculates the correlation coefficient (r), the p-value, and the confidence interval for the correlation. We shall compute these values to test the correlation between sepal length and petal length in the iris dataset.

```
res <- cor.test(iris$Sepal.Length,
                iris$Petal.Length,
                method = 'pearson')
res
```

    #
    #  Pearson's product-moment correlation
    #
    # data:  iris$Sepal.Length and iris$Petal.Length
    # t = 21.646, df = 148, p-value < 2.2e-16
    # alternative hypothesis: true correlation is not equal to 0
    # 95 percent confidence interval:
    #  0.8270363 0.9055080
    # sample estimates:
    #       cor
    # 0.8717538
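The object returned by `cor.test()` is a list, so the individual values can be pulled out by name rather than read off the printed output. A minimal sketch follows; the variable names `r`, `p` and `ci` are only illustrative.

```r
data('iris')
res <- cor.test(iris$Sepal.Length,
                iris$Petal.Length,
                method = 'pearson')

r  <- unname(res$estimate)      # correlation coefficient
p  <- res$p.value               # p-value of the test
ci <- as.numeric(res$conf.int)  # lower and upper bounds of the 95% confidence interval

round(r, 2)   # 0.87
round(ci, 3)  # 0.827 0.906
p < 0.001     # TRUE
```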

### Creating scatterplot

Alongside `ggplot2`, we use `smplot2`, an R package for statistical data visualisation. This package is an example of what I wish I had when I first started learning R. It seeks to simplify each phase of data visualisation. We shall use smplot2 together with ggplot2 to enhance the style of the visualisation.

#### Creating ggplot object

The `ggplot()` function is the main function of the ggplot2 package in R, used to create a new ggplot object. The ggplot object is a blank canvas onto which we can layer different graphical elements to create a visualization.

```
library(ggplot2)
library(smplot2)
plot <- ggplot(data = iris,
               mapping = aes(x = Sepal.Length,
                             y = Petal.Length))
plot
```

#### Adding points to scatter plot

`geom_point()` is a function in the ggplot2 package that adds a layer of individual points to a plot created using `ggplot()`, which turns it into a scatter plot.

```
plot <- plot +
  geom_point(shape = 21,
             fill = '#0f993d',
             color = 'white',
             size = 3)
plot
```
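With `shape = 21`, `fill` sets the interior of each point and `color` sets its outline. As a variation that is not part of the plot built in this post, the sketch below maps `fill` to `Species` inside `aes()` so that the three species get different colours; the object name `plot_species` is only illustrative.

```r
library(ggplot2)
data('iris')

# Illustrative variation: fill each point by species instead of one fixed colour.
# With shape = 21, `fill` controls the interior and `color` the outline.
plot_species <- ggplot(data = iris,
                       mapping = aes(x = Sepal.Length,
                                     y = Petal.Length,
                                     fill = Species)) +
  geom_point(shape = 21,
             color = 'white',
             size = 3)
plot_species
```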

#### Annotating correlation and p value

`annotate()` is a function in the ggplot2 package that adds annotations to a plot created using `ggplot()`. Annotations include text, labels, arrows, and other graphical elements that provide additional information about the plot.

```
plot <- plot +
  annotate('text', x = 5, y = 6,
           label = paste0('R = ', round(res$estimate, 2), ', p < 0.001'))
plot
```
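The label above types the p-value text by hand. As an alternative sketch, the p-value part can be built from the test result, so the label stays correct if the variables change. The names `p_text` and `label` and the 0.001 threshold are assumptions chosen for illustration.

```r
library(ggplot2)
data('iris')

res <- cor.test(iris$Sepal.Length,
                iris$Petal.Length,
                method = 'pearson')

# Report very small p-values as "p < 0.001", otherwise print two significant digits
p_text <- if (res$p.value < 0.001) {
  'p < 0.001'
} else {
  paste0('p = ', signif(res$p.value, 2))
}
label <- paste0('R = ', round(res$estimate, 2), ', ', p_text)
label  # "R = 0.87, p < 0.001"

ggplot(data = iris,
       mapping = aes(x = Sepal.Length,
                     y = Petal.Length)) +
  geom_point(shape = 21,
             fill = '#0f993d',
             color = 'white',
             size = 3) +
  annotate('text', x = 5, y = 6, label = label)
```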

#### Adding trend line

`sm_statCorr()` is a function in the smplot2 package that draws the line of best fit from a linear regression through the points of a scatter plot. By default it also prints the correlation coefficient and p-value on the plot. Here we set `show_text = FALSE` because the plot already carries that annotation, and we style the line through `fit.params`.

```
plot +
  sm_statCorr(show_text = FALSE,
              fit.params = list(color = 'black',
                                linetype = 'solid'))
```
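To put the regression equation itself on the plot, one option is to fit the line with `lm()` from base R and format its coefficients as text. This is a sketch under that assumption, not a feature of smplot2. It uses `geom_smooth(method = 'lm')` from ggplot2 to draw the fitted line, and the names `fit`, `b` and `eq` and the label position are chosen for illustration.

```r
library(ggplot2)
data('iris')

# Fit a simple linear regression of petal length on sepal length
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)
b <- unname(coef(fit))  # b[1] = intercept, b[2] = slope

# Format the equation as text with two decimals
eq <- sprintf('y = %.2f %s %.2fx',
              b[1],
              ifelse(b[2] < 0, '-', '+'),
              abs(b[2]))
eq  # "y = -7.10 + 1.86x"

ggplot(data = iris,
       mapping = aes(x = Sepal.Length,
                     y = Petal.Length)) +
  geom_point(shape = 21,
             fill = '#0f993d',
             color = 'white',
             size = 3) +
  # Draw the fitted straight line without the confidence band
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE,
              color = 'black', linetype = 'solid') +
  annotate('text', x = 5, y = 6.5, label = eq)
```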
