In this blog post, we will do exploration of the Nile Riverâ€™s annual streamflow data using time series analysis. The data, which spans from 1871 to 1970, offers a rich context for understanding the riverâ€™s flow patterns over time. We will employ ARIMA models, a popular choice for time series forecasting, to analyze this data. The process involves identifying the optimal parameters for our model, fitting the model to the data, and then generating forecasts. By comparing different ARIMA models, we aim to select the one that best fits our data and provides the most accurate forecasts.

Letâ€™s start!

Contents

### Introduction

We will use the famous Nile Riverâ€™s annual streamflow data, conveniently stored in a time series data set named `Nile`

. This dataset goes beyond just numbers; it also includes metadata, providing valuable context. When you call `print(Nile)`

, youâ€™ll see details like `Start = 1871`

, indicating the first year of data collection, and `End = 1970`

, marking the final year covered in this specific dataset.

```
data("Nile")
print(Nile)
```

## Time Series: ## Start = 1871 ## End = 1970 ## Frequency = 1 ## [1] 1120 1160 963 1210 1160 1160 813 1230 1370 1140 995 935 1110 994 1020 ## [16] 960 1180 799 958 1140 1100 1210 1150 1250 1260 1220 1030 1100 774 840 ## [31] 874 694 940 833 701 916 692 1020 1050 969 831 726 456 824 702 ## [46] 1120 1100 832 764 821 768 845 864 862 698 845 744 796 1040 759 ## [61] 781 865 845 944 984 897 822 1010 771 676 649 846 812 742 801 ## [76] 1040 860 874 848 890 744 749 838 1050 918 986 797 923 975 815 ## [91] 1020 906 901 1170 912 746 919 718 714 740

To make sure the Nile is a time series data you can use `is.ts()`

and `class()`

function to verify this.

`is.ts(Nile)`

## [1] TRUE

`class(Nile)`

## [1] "ts"

The `plot()`

function visualizes the historical flow patterns of data. We can observe variations in flow throughout the years, potentially indicating trends or long-term changes.

`plot(Nile)`

### Model Identification

We shall use the two most commonly used functions (`acf`

and `pacf`

) to analyze the time series properties of this data.

```
par(mfrow = c(1,2))
acf(Nile)
pacf(Nile)
```

Based on the partial autocorrelation (ACF) plot in the image, it appears that an ARIMA order of (1, 1, 0) might be appropriate. This is because the ACF plot shows a single spike at lag 1, and then dies off afterwards. This suggests that there is an autoregressive term of order 1 (AR(1)) in the model, but no higher order terms are needed. Further we can also see that the autocorrelation values also do not come to zero reasonably quickly, indicates no-stationarity and the series need to be differenced. The autocorrelation (ACF) plot also shows some spikes that cut off after a few lags, but they are not as clear-cut as the PACF plot. This further suggests that a moving average term (MA) is not required in the model (MA(0)).

There is no evidence in the plots to suggest that a seasonal component should be included in the model. The PACF and ACF plots do not exhibit any periodic patterns, which would be indicative of a seasonal component.

Based on these facts we will select ARIMA model of order (1, 1, 0).

Letâ€™s use the `auto.arima()`

function to identify the optimal order for ARIMA model for this time series data. This function will automatically search for the best combination of parameters (p, d, q) based on statistical criteria.

**library**(forecast)
auto.arima(Nile)

## Series: Nile ## ARIMA(1,1,1) ## ## Coefficients: ## ar1 ma1 ## 0.2544 -0.8741 ## s.e. 0.1194 0.0605 ## ## sigma^2 = 20177: log likelihood = -630.63 ## AIC=1267.25 AICc=1267.51 BIC=1275.04

### Fitting ARIMA models

While `auto.ARIMA`

suggests ARIMA(1,1,1), itâ€™s wise to compare it to ARIMA(1,1,0) (from the ACF plot) and ARIMA(1,0,0) (no differencing). This will ensure to select the best possible model by comparing their characteristics.

*# Fit an AR model to Nile*
Auto_fit <- auto.arima(Nile)
AR_fit <- arima(Nile, order = c(1, 0, 0))
ARd_fit <- arima(Nile, order = c(1, 1, 0))

### Comparing ARIMA models

We can use statistical tests like AIC (Akaike Information Criterion) or BIC (Bayesian Information Criterion) to see which model achieves a better fit for this data.

*# Define the model fits*
fits <- list(Auto_fit, AR_fit, ARd_fit)
fit_names <- c("Auto_fit", "AR_fit", "ARd_fit")
*# Initialize an empty data frame*
results_df <- data.frame(AIC = numeric(length(fits)),
BIC = numeric(length(fits)),
row.names = fit_names)
*# Calculate AIC and BIC for each model and store in the data frame*
**for** (i **in** seq_along(fits)) {
results_df[i, "AIC"] <- AIC(fits[[i]])
results_df[i, "BIC"] <- BIC(fits[[i]])
}
*# Print the results data frame*
print(results_df)

## AIC BIC ## Auto_fit 1267.255 1275.040 ## AR_fit 1285.904 1293.720 ## ARd_fit 1281.480 1286.671

Both AIC (1267.255) and BIC (1275.04) for Auto_fit are lower than the corresponding values for AR_fit and ARd_fit. Generally, lower AIC and BIC indicate a better fit for the data. While the difference in AIC between Auto_fit and ARd_fit is relatively small (around 14), it still suggests a slight preference for Auto_fit. BIC also favors Auto_fit with a clearer margin.

### Generating Forecasts

Given that the ARIMA(1,1,1) model demonstrated a better fit compared to the other two models, we can proceed with generating 10-step forecasts using this model. Itâ€™s important to remember that these forecasts are based on the assumption that the underlying patterns in the data will continue to hold for the forecasted period.

While the ARIMA(1,1,1) model will be our primary forecast due to its superior fit, it can also be informative to generate forecasts for the other two models. This comparison can help us visualize the potential range of future values and assess the confidence level in our primary forecast.

*# Use predict to make 1-step through 10-step forecasts*
Auto_pred <- predict(Auto_fit, n.ahead = 10)
AR_pred <- predict(AR_fit, n.ahead = 10)
ARd_pred <- predict(ARd_fit, n.ahead = 10)

**Comparing ARIMA Model Forecasts**

This code block generates three visualizations comparing the forecasted Nile River streamflow using different ARIMA models. The `par(mfrow = c(3,1))`

line sets up a grid layout with three rows and one column to display the plots efficiently.

The `ts.plot()`

line displays the actual Nile River streamflow data (`Nile`

) along with a time axis limited to the years 1871 to 1980 (`xlim`

). The `points()`

function plots the predicted streamflow values (stored in `Auto_pred$pred`

) as a red line.

```
par(mfrow = c(3,1))
```*# Predictions of Auto model ARIMA(1, 1, 1)*
ts.plot(Nile, xlim = c(1871, 1980),
ylab = "Nile River Flow",
main = "Predictions of Auto_fit model ARIMA(1, 1, 1)")
points(Auto_pred$pred, type = "l", col = "red")
points(Auto_pred$pred - Auto_pred$se, type = "l", col = "red", lty = 2)
points(Auto_pred$pred + Auto_pred$se, type = "l", col = "red", lty = 2)
*# Predictions of AR model ARIMA(1, 0, 0)*
ts.plot(Nile, xlim = c(1871, 1980),
ylab = "Nile River Flow",
main = "Predictions of AR_fit model ARIMA(1, 0, 0)")
points(AR_pred$pred, type = "l", col = "red")
points(AR_pred$pred - AR_pred$se, type = "l", col = "red", lty = 2)
points(AR_pred$pred + AR_pred$se, type = "l", col = "red", lty = 2)
*# Predictions of ARd model ARIMA(1, 1, 0)*
ts.plot(Nile, xlim = c(1871, 1980), ylab = "Nile River Flow",
main = "Predictions of ARd_fit model ARIMA(1, 1, 0)")
points(ARd_pred$pred, type = "l", col = "red")
points(ARd_pred$pred - ARd_pred$se, type = "l", col = "red", lty = 2)
points(ARd_pred$pred + ARd_pred$se, type = "l", col = "red", lty = 2)

Here are some observations based on the above predictions:

**Auto_fit**model provided relatively small prediction ranges and standard errors compared to the other two models, indicating potentially more precise predictions.**AR_fit**model has larger prediction ranges but still reasonable standard errors, suggesting moderate precision.**ARd_fit**model has the smallest prediction range but also the largest standard errors, indicating potentially less reliable predictions due to higher uncertainty.

We can enhance the visualization of predictions from various models using the ggplot function, enabling more effective comparisons.

*# Initialize an empty list to store the data frames*
pred_dfs <- list()
*# Loop through the models and make forecasts*
models <- list(Auto_fit, AR_fit, ARd_fit)
model_names <- c("Auto_pred", "AR_pred", "ARd_pred")
**for** (i **in** seq_along(models)) {
pred <- predict(models[[i]], n.ahead = 10)
pred_df <- data.frame(
x = c(time(pred$pred)),
pred = c(pred$pred),
se = c(pred$se)
)
pred_dfs[[i]] <- cbind(pred_df, model = model_names[i])
}
*# Combine the data frames into one*
combined_df <- do.call(rbind, pred_dfs)
*# Load required libraries*
**library**(ggplot2)
*# Plotting*
ggplot(
combined_df, aes(x = x, y = pred, color = model)
) +
geom_line(linewidth = 1) +
geom_ribbon(
aes(ymin = pred - se, ymax = pred + se, fill = model),
alpha = 0.3, lty = 2
) +
labs(x = "Year",
y = expression("Predicted flow of River Nile (m"^3*")"),
title = "Prediction with Standard Error") +
theme_minimal() +
scale_x_continuous(breaks = seq(1971, 1980, by = 2)) +
facet_wrap(facets = ~model) +
theme(legend.position = "none")

### Conclusion

This analysis of the Nile Riverâ€™s annual streamflow data demonstrated the utility of ARIMA models for time series forecasting. The ARIMA(1,1,1) model was identified as the best fit for the data, outperforming the ARIMA(1,0,0) and ARIMA(1,1,0) models according to AIC and BIC criteria. The forecasts generated by the ARIMA(1,1,1) model were more precise, as indicated by smaller prediction ranges and standard errors. However, itâ€™s important to remember that these forecasts are based on the assumption that the underlying patterns in the data will continue to hold for the forecasted period. This analysis underscores the importance of model selection in time series forecasting and the value of comparing different models to ensure the best possible predictions.

For more details and informative videos you can also subscribe to our YouTube Channel AGRON Info Tech.

Download R program and R studio â€”

`Click_here`