Spatial Analysis with AR1 × AR1 Model: Theory & Complete R Analysis
Updated:
Field experiments are routinely affected by spatial heterogeneity — systematic variation in soil fertility, moisture, drainage, pH, and microclimate that creates patches of high and low performance across the trial. When this variation is ignored, it inflates the residual variance, reduces heritability estimates, biases treatment comparisons, and misranks genotypes. The AR1 × AR1 model (first-order autoregressive process in both row and column directions) is the gold standard for capturing and removing this spatial structure from field trial data.
1. What is Spatial Autocorrelation?
Definition
Two plots are spatially autocorrelated if their residuals are correlated as a function of the distance between them. Nearby plots tend to be more similar than distant plots — a phenomenon described by Tobler’s First Law of Geography:
“Everything is related to everything else, but near things are more related than distant things.”
Sources in field trials
- Soil nutrient gradients (nitrogen, phosphorus, organic matter)
- Moisture gradients (slope, drainage, irrigation patterns)
- Historical land-use effects (previous crops, tillage)
- Microclimate variation (shade, wind, temperature)
Consequences of ignoring spatial structure
| Effect | Consequence |
|---|---|
| Inflated $MS_E$ | Lower F-statistics, reduced power |
| Biased treatment means | Incorrect ranking of genotypes |
| Underestimated $H^2$ | Conservative breeding decisions |
| Inflated PEV | Overestimated prediction errors |
2. Visualising Spatial Heterogeneity
Before fitting any model, always visualise raw yield as a spatial heatmap. Patterns indicate the type and direction of spatial gradient:
| Pattern | Likely cause |
|---|---|
| Row gradient (north–south) | Soil fertility, slope |
| Column gradient (east–west) | Irrigation, wind |
| Patchy clusters | Historical cropping, drainage pockets |
| Random | No spatial structure — standard ANOVA sufficient |
3. The AR1 Process
One-dimensional AR1
For a sequence of residuals $\xi_1, \xi_2, \ldots, \xi_n$ along a row (or column):
\[\xi_t = \rho\,\xi_{t-1} + \eta_t, \qquad \eta_t \overset{\text{iid}}{\sim} \mathcal{N}(0,\, \sigma^2(1-\rho^2))\]where $\rho \in (-1, 1)$ is the autocorrelation parameter (lag-1 correlation).
Variance of the AR1 process:
\[\text{Var}(\xi_t) = \frac{\sigma^2(1-\rho^2)}{1-\rho^2} = \sigma^2\]Covariance between plots at lag $k$:
\[\text{Cov}(\xi_t, \xi_{t+k}) = \sigma^2\,\rho^k\]Correlation matrix for $n$ equally-spaced plots:
\[\mathbf{R}(\rho) = \begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & & & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}\]| Key property: $[\mathbf{R}(\rho)]_{ij} = \rho^{ | i-j | }$ |
4. The AR1 × AR1 Model
Two-dimensional extension
The AR1 × AR1 model extends the one-dimensional AR1 to two spatial directions simultaneously using the Kronecker product:
\[\text{Var}(\boldsymbol{\xi}) = \sigma_s^2\; \mathbf{R}_r(\rho_r) \otimes \mathbf{R}_c(\rho_c)\]where:
- $\mathbf{R}_r(\rho_r)$ is the AR1 correlation matrix along rows ($n_r \times n_r$)
- $\mathbf{R}_c(\rho_c)$ is the AR1 correlation matrix along columns ($n_c \times n_c$)
- $\otimes$ is the Kronecker product
- $\rho_r,\, \rho_c \in (-1, 1)$ are row and column autocorrelation parameters
Covariance between plots $(r_1, c_1)$ and $(r_2, c_2)$:
\[\text{Cov}(y_{r_1 c_1},\, y_{r_2 c_2}) = \sigma_s^2\;\rho_r^{|r_1 - r_2|}\;\rho_c^{|c_1 - c_2|}\]This separable structure means the spatial correlation decays multiplicatively in both directions — plots far apart in rows AND columns have very low correlation.
5. Full Mixed Model Specification
Complete model
\[\mathbf{y} = \mathbf{X}\boldsymbol{\tau} + \mathbf{Z}_g\mathbf{g} + \mathbf{Z}_b\mathbf{b} + \boldsymbol{\xi} + \boldsymbol{\varepsilon}\]| Term | Distribution | Meaning |
|---|---|---|
| $\mathbf{X}\boldsymbol{\tau}$ | Fixed | Overall mean, trial, year effects |
| $\mathbf{Z}_g\mathbf{g}$ | $\mathbf{g} \sim \mathcal{N}(\mathbf{0},\,\sigma_g^2\mathbf{I})$ | Random genotype effects |
| $\mathbf{Z}_b\mathbf{b}$ | $\mathbf{b} \sim \mathcal{N}(\mathbf{0},\,\sigma_b^2\mathbf{I})$ | Random incomplete block effects |
| $\boldsymbol{\xi}$ | $\mathcal{N}(\mathbf{0},\,\sigma_s^2\,\mathbf{R}_r\otimes\mathbf{R}_c)$ | Spatially structured error (AR1×AR1) |
| $\boldsymbol{\varepsilon}$ | $\mathcal{N}(\mathbf{0},\,\sigma_n^2\mathbf{I})$ | Independent nugget error |
Total residual variance
\[\text{Var}(\mathbf{y}\,|\,\mathbf{g}) = \underbrace{\sigma_s^2\,\mathbf{R}_r(\rho_r)\otimes\mathbf{R}_c(\rho_c)}_{\text{spatial}} + \underbrace{\sigma_n^2\,\mathbf{I}}_{\text{nugget}}\]Marginal model
\[\mathbf{y} \sim \mathcal{N}\!\left(\mathbf{X}\boldsymbol{\tau},\; \sigma_g^2\mathbf{Z}_g\mathbf{Z}_g^\top + \sigma_s^2\,\mathbf{R}_r\otimes\mathbf{R}_c + \sigma_n^2\mathbf{I}\right)\]6. Estimation
REML (Restricted Maximum Likelihood)
Parameters $(\sigma_g^2, \sigma_s^2, \rho_r, \rho_c, \sigma_n^2)$ are estimated by maximising the restricted log-likelihood:
\[\ell_R(\boldsymbol{\theta}) = -\frac{1}{2}\left[ \log|\mathbf{V}| + \log|\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X}| + (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\tau}})^\top \mathbf{V}^{-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\tau}}) \right]\]where $\mathbf{V} = \text{Var}(\mathbf{y})$.
BLUPs (Best Linear Unbiased Predictors)
\[\hat{\mathbf{g}} = \sigma_g^2\mathbf{Z}_g^\top\mathbf{V}^{-1} (\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\tau}})\]The BLUPs shrink towards zero — the amount of shrinkage depends on $\sigma_g^2 / (\sigma_g^2 + \sigma^2/\bar{r})$; low $H^2$ → more shrinkage.
7. Heritability and Accuracy
Broad-sense heritability
\[H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \bar{v}/2}\]where $\bar{v} = \frac{1}{\binom{n}{2}}\sum_{i < j}(\hat{g}_i - \hat{g}_j)^2$ is the mean pairwise prediction error variance (Cullis et al., 2006).
Approximation using variance components
\[H^2 \approx \frac{\sigma_g^2}{\sigma_g^2 + \sigma_s^2/\bar{r}}\]Selection accuracy
\[r_{g\hat{g}} = \sqrt{1 - \frac{\bar{v}}{2\sigma_g^2}}\]Effective replication (Fasoulas)
\[\bar{r}_{\text{eff}} = \frac{\sigma_g^2 + \sigma_s^2/\bar{r}}{\sigma_s^2/\bar{r}}\]8. Model Selection Criteria
Akaike Information Criterion (AIC)
\[AIC = -2\ell_R + 2k\]Bayesian Information Criterion (BIC)
\[BIC = -2\ell_R + k\ln(n)\]Likelihood Ratio Test
\[\Lambda = -2(\ell_{R,0} - \ell_{R,1}) \sim \chi^2_{df}\]Used to test whether spatial parameters ($\rho_r$, $\rho_c$) significantly improve fit.
9. Full R Analysis
Step 1 — Packages
pkgs <- c("sommer", "SpATS", "lme4", "lmerTest",
"emmeans", "ggplot2", "dplyr", "tidyr",
"tibble", "patchwork", "viridis",
"fields", "gstat", "sp", "car", "ggrepel")
install.packages(setdiff(pkgs, rownames(installed.packages())))
library(sommer) # AR1×AR1 mixed models
library(SpATS) # P-spline spatial models
library(lme4) # standard mixed models
library(lmerTest) # p-values for lmer
library(emmeans) # estimated marginal means
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)
library(patchwork)
library(viridis)
library(fields) # variogram, kriging
library(gstat) # empirical variogram
library(sp) # spatial data structures
library(car)
library(ggrepel)
Step 2 — Simulate a Spatially Correlated Field Trial
# ── Trial parameters ───────────────────────────────────────────────────────
set.seed(42)
n_row <- 20 # field rows
n_col <- 15 # field columns
n_plots <- n_row * n_col # 300 total plots
n_geno <- 250 # 250 genotypes (some replicated)
# ── True parameters ────────────────────────────────────────────────────────
mu_true <- 55 # grand mean (q/ha)
sigma_g <- 6 # genotype SD
sigma_s <- 5 # spatial SD
rho_r_true <- 0.75 # row autocorrelation
rho_c_true <- 0.65 # column autocorrelation
sigma_n <- 1.5 # nugget (independent error)
# ── Build AR1 correlation matrix ──────────────────────────────────────────
ar1_matrix <- function(n, rho) {
idx <- 0:(n-1)
outer(idx, idx, function(i, j) rho^abs(i - j))
}
R_row <- ar1_matrix(n_row, rho_r_true)
R_col <- ar1_matrix(n_col, rho_c_true)
# ── Simulate spatial surface via Cholesky ─────────────────────────────────
R_combined <- kronecker(R_row, R_col) # (n_row*n_col) × (n_row*n_col)
L <- chol(sigma_s^2 * R_combined + sigma_n^2 * diag(n_plots))
xi_vec <- as.vector(t(L) %*% rnorm(n_plots))
# ── Assign genotypes (p-rep: 50 genotypes replicated twice) ──────────────
geno_ids <- c(
rep(paste0("G", sprintf("%03d", 1:50)), 2), # 50 p-rep genotypes
paste0("G", sprintf("%03d", 51:n_geno)) # 200 once-replicated
)
geno_ids <- sample(geno_ids) # randomise order
# ── True genotype values ───────────────────────────────────────────────────
tbv <- setNames(rnorm(n_geno, 0, sigma_g),
paste0("G", sprintf("%03d", 1:n_geno)))
# ── Assemble field data ────────────────────────────────────────────────────
field_df <- data.frame(
ROW = rep(1:n_row, each = n_col),
COL = rep(1:n_col, times = n_row),
GENO = geno_ids,
spatial = xi_vec
) |>
mutate(
Yield = mu_true + tbv[GENO] + spatial + rnorm(n(), 0, 0.5),
ROW_f = factor(ROW),
COL_f = factor(COL),
PLOT = row_number()
)
cat("Field trial summary\n")
cat(" Rows :", n_row, "\n")
cat(" Columns :", n_col, "\n")
cat(" Total plots:", nrow(field_df), "\n")
cat(" Genotypes :", n_geno, "\n")
cat(" Grand mean :", round(mean(field_df$Yield), 3), "\n")
cat(" Yield SD :", round(sd(field_df$Yield), 3), "\n")
Step 3 — Exploratory Spatial Analysis
# ── 3a. Raw yield heatmap ─────────────────────────────────────────────────
p_raw <- ggplot(field_df,
aes(x = COL, y = ROW, fill = Yield)) +
geom_tile(colour = NA) +
scale_fill_viridis_c(option = "plasma", name = "Yield\n(q/ha)") +
scale_y_reverse() +
labs(title = "Raw Yield Heatmap",
subtitle = "Spatial gradient clearly visible",
x = "Column", y = "Row") +
theme_minimal(base_size = 11) +
theme(panel.grid = element_blank())
# ── 3b. Row marginal means ────────────────────────────────────────────────
p_row <- field_df |>
group_by(ROW) |>
summarise(Mean = mean(Yield), SE = sd(Yield)/sqrt(n())) |>
ggplot(aes(ROW, Mean)) +
geom_ribbon(aes(ymin = Mean - SE, ymax = Mean + SE),
fill = "#9ECAE1", alpha = 0.5) +
geom_line(colour = "#2C7BB6", linewidth = 1) +
geom_point(colour = "#2C7BB6", size = 2) +
geom_smooth(method = "loess", se = FALSE,
colour = "#E41A1C", linetype = "dashed",
linewidth = 1) +
labs(title = "Row Marginal Means",
x = "Row", y = "Mean Yield (q/ha)") +
theme_minimal(base_size = 11)
# ── 3c. Column marginal means ─────────────────────────────────────────────
p_col <- field_df |>
group_by(COL) |>
summarise(Mean = mean(Yield), SE = sd(Yield)/sqrt(n())) |>
ggplot(aes(COL, Mean)) +
geom_ribbon(aes(ymin = Mean - SE, ymax = Mean + SE),
fill = "#A1D99B", alpha = 0.5) +
geom_line(colour = "#31A354", linewidth = 1) +
geom_point(colour = "#31A354", size = 2) +
geom_smooth(method = "loess", se = FALSE,
colour = "#E41A1C", linetype = "dashed",
linewidth = 1) +
labs(title = "Column Marginal Means",
x = "Column", y = "Mean Yield (q/ha)") +
theme_minimal(base_size = 11)
(p_raw | (p_row / p_col)) +
plot_layout(widths = c(1.5, 1)) +
plot_annotation(title = "Exploratory Spatial Analysis")
Step 4 — Empirical Variogram
The variogram quantifies spatial autocorrelation by plotting semi-variance against distance:
\[\hat{\gamma}(h) = \frac{1}{2|N(h)|}\sum_{(i,j)\in N(h)}(y_i - y_j)^2\]where $N(h)$ is the set of plot pairs separated by lag $h$.
# ── Compute empirical variogram ────────────────────────────────────────────
library(gstat); library(sp)
sp_data <- field_df
coordinates(sp_data) <- ~ COL + ROW
vgm_emp <- variogram(Yield ~ 1, data = sp_data,
cutoff = 12, width = 1)
# ── Fit theoretical variogram model ───────────────────────────────────────
vgm_fit <- fit.variogram(
vgm_emp,
model = vgm(psill = var(field_df$Yield) * 0.8,
model = "Exp",
range = 5,
nugget = var(field_df$Yield) * 0.1)
)
print(vgm_fit)
# ── Plot variogram ────────────────────────────────────────────────────────
vgm_df <- as.data.frame(vgm_emp)
vgm_th <- variogramLine(vgm_fit,
maxdist = max(vgm_df$dist))
ggplot(vgm_df, aes(dist, gamma)) +
geom_point(size = 3, colour = "#2C7BB6") +
geom_line(data = vgm_th,
aes(dist, gamma),
colour = "#E41A1C", linewidth = 1.2) +
geom_hline(yintercept = var(field_df$Yield),
linetype = "dashed", colour = "grey50") +
annotate("text", x = 10, y = var(field_df$Yield) * 1.05,
label = "Sill (total variance)", colour = "grey40",
size = 3.5) +
labs(title = "Empirical Variogram",
subtitle = "Red = fitted exponential model",
x = "Lag distance (plot units)",
y = "Semi-variance γ(h)") +
theme_minimal(base_size = 13)
Reading the variogram:
| Feature | Meaning |
|---|---|
| Nugget (y-intercept) | Micro-scale variation / measurement error |
| Sill (plateau) | Total spatial variance |
| Range (x at sill) | Distance beyond which plots are uncorrelated |
Step 5 — Autocorrelation Function (ACF)
# ── Row-direction ACF ──────────────────────────────────────────────────────
row_means <- field_df |>
group_by(ROW) |>
summarise(m = mean(Yield)) |>
pull(m)
par(mfrow = c(1, 2))
acf(row_means, main = "ACF — Row Direction", lag.max = 15)
pacf(row_means, main = "PACF — Row Direction", lag.max = 15)
par(mfrow = c(1, 1))
# ── Column-direction ACF ───────────────────────────────────────────────────
col_means <- field_df |>
group_by(COL) |>
summarise(m = mean(Yield)) |>
pull(m)
par(mfrow = c(1, 2))
acf(col_means, main = "ACF — Column Direction", lag.max = 10)
pacf(col_means, main = "PACF — Column Direction", lag.max = 10)
par(mfrow = c(1, 1))
Interpretation: Significant ACF at lag 1, cutting off after lag 1 in PACF → AR(1) structure is appropriate in both directions.
Step 6 — Model 0: Naive ANOVA (Ignore Spatial)
# ── Baseline: genotype as fixed, no spatial correction ────────────────────
model_0 <- aov(Yield ~ GENO, data = field_df)
anova_0 <- summary(model_0)[[1]]
MS_T_0 <- anova_0["GENO", "Mean Sq"]
MS_E_0 <- anova_0["Residuals", "Mean Sq"]
F_0 <- MS_T_0 / MS_E_0
CV_0 <- sqrt(MS_E_0) / mean(field_df$Yield) * 100
cat("=== Model 0: Naive ANOVA ===\n")
cat("MS Treatment :", round(MS_T_0, 3), "\n")
cat("MS Error :", round(MS_E_0, 3), "\n")
cat("F statistic :", round(F_0, 3), "\n")
cat("CV (%) :", round(CV_0, 2), "\n")
cat("AIC :", round(AIC(model_0), 2), "\n\n")
Step 7 — Model 1: Row + Column as Fixed Covariates
# ── Row and column trends as fixed polynomials ────────────────────────────
model_1 <- lm(Yield ~ GENO + poly(ROW, 2) + poly(COL, 2),
data = field_df)
MS_E_1 <- mean(residuals(model_1)^2)
CV_1 <- sqrt(MS_E_1) / mean(field_df$Yield) * 100
cat("=== Model 1: Row + Column Polynomial ===\n")
cat("Residual SD :", round(sigma(model_1), 3), "\n")
cat("CV (%) :", round(CV_1, 2), "\n")
cat("AIC :", round(AIC(model_1), 2), "\n\n")
Step 8 — Model 2: Simple Mixed Model (No Spatial)
# ── Genotype as random, no spatial structure ──────────────────────────────
model_2 <- lmer(Yield ~ 1 + (1 | GENO),
data = field_df, REML = TRUE)
vc_2 <- as.data.frame(VarCorr(model_2))
sg2_2 <- vc_2[vc_2$grp == "GENO", "vcov"]
se2_2 <- vc_2[vc_2$grp == "Residual", "vcov"]
# Approximate heritability
r_bar <- nrow(field_df) / n_geno
H2_2 <- sg2_2 / (sg2_2 + se2_2 / r_bar)
cat("=== Model 2: Simple Mixed Model ===\n")
cat("σ²_g :", round(sg2_2, 4), "\n")
cat("σ²_e :", round(se2_2, 4), "\n")
cat("H² :", round(H2_2, 3), "\n")
cat("AIC :", round(AIC(model_2), 2), "\n\n")
Step 9 — Model 3: AR1 × AR1 with sommer
This is the full spatial model. sommer fits the AR1 correlation structure using the AR1() function for the residual variance-covariance matrix.
# ── AR1×AR1 model via sommer ───────────────────────────────────────────────
model_ar1 <- mmer(
fixed = Yield ~ 1,
random = ~ vsr(GENO),
rcov = ~ vsr(ROW_f, Gu = AR1(field_df$ROW_f, rho = 0.7))
: vsr(COL_f, Gu = AR1(field_df$COL_f, rho = 0.6)),
data = field_df,
verbose = FALSE
)
# ── Variance components ────────────────────────────────────────────────────
vc_ar1 <- summary(model_ar1)$varcomp
print(vc_ar1)
sg2_ar1 <- vc_ar1["u:GENO", "VarComp"]
ss2_ar1 <- vc_ar1[grep("AR1", rownames(vc_ar1))[1], "VarComp"]
cat("\n=== Model 3: AR1×AR1 (sommer) ===\n")
cat("σ²_g (genotype) :", round(sg2_ar1, 4), "\n")
cat("σ²_s (spatial) :", round(ss2_ar1, 4), "\n")
cat("Log-likelihood :", round(model_ar1$logLik, 4), "\n")
cat("AIC :", round(model_ar1$AIC, 4), "\n")
# ── GBLUPs ────────────────────────────────────────────────────────────────
gblups_ar1 <- randef(model_ar1)$`u:GENO`
blup_ar1_df <- data.frame(
GENO = names(gblups_ar1),
BLUP = as.numeric(gblups_ar1)
) |>
arrange(desc(BLUP))
cat("\nTop 10 genotypes (AR1×AR1 BLUPs):\n")
print(head(blup_ar1_df, 10))
Step 10 — Model 4: SpATS (P-Spline Spatial Model)
SpATS uses 2D penalised B-splines to model the spatial trend — a more flexible alternative that captures non-stationary spatial patterns.
# ── SpATS spatial model ───────────────────────────────────────────────────
model_spats <- SpATS(
response = "Yield",
genotype = "GENO",
genotype.as.random = TRUE,
fixed = NULL,
spatial = SAP(ROW, COL),
data = as.data.frame(field_df),
control = list(
tolerance = 1e-06,
monitoring = 0,
update.psi = TRUE
)
)
summary(model_spats)
Output:
Variance components:
Variance Ratio
GENO 34.127 1.000
Residual 2.318 0.068
Effective dimensions:
Intercept GENO f(ROW, COL)
1.000 192.4 48.3
Heritability (H²): 0.934
# ── Extract key statistics ────────────────────────────────────────────────
H2_spats <- getHeritability(model_spats)
vc_spats <- model_spats$var.comp
sg2_spats <- vc_spats["GENO"]
se2_spats <- vc_spats["Residual"]
cat("\n=== Model 4: SpATS ===\n")
cat("σ²_g (genotype) :", round(sg2_spats, 4), "\n")
cat("σ²_e (residual) :", round(se2_spats, 4), "\n")
cat("H² :", round(H2_spats, 4), "\n")
cat("AIC :", round(model_spats$model$aic, 4), "\n")
Step 11 — Model Comparison
# ── Compare all models ─────────────────────────────────────────────────────
model_comp <- data.frame(
Model = c("M0: Naive ANOVA",
"M1: Row+Col Polynomial",
"M2: Simple Mixed",
"M3: AR1×AR1 (sommer)",
"M4: SpATS P-splines"),
AIC = c(AIC(model_0),
AIC(model_1),
AIC(model_2),
model_ar1$AIC,
model_spats$model$aic),
H2 = c(NA, NA,
round(H2_2, 3),
round(sg2_ar1 / (sg2_ar1 + ss2_ar1 / r_bar), 3),
round(H2_spats, 3)),
CV = c(round(CV_0, 2),
round(CV_1, 2),
NA, NA, NA)
) |>
arrange(AIC)
print(model_comp)
# ── Visualise model comparison ────────────────────────────────────────────
ggplot(na.omit(model_comp[, c("Model", "H2")]),
aes(x = reorder(Model, H2), y = H2, fill = Model)) +
geom_col(alpha = 0.85, width = 0.6,
colour = "grey20", linewidth = 0.4) +
geom_text(aes(label = round(H2, 3)),
hjust = -0.2, size = 4, fontface = "bold") +
scale_fill_brewer(palette = "Set2") +
coord_flip() +
ylim(0, 1) +
labs(title = "Heritability Estimates by Model",
subtitle = "Spatial models recover more genetic signal",
x = NULL, y = "H²") +
theme_minimal(base_size = 13) +
theme(legend.position = "none")
Step 12 — Spatial Trend Decomposition
# ── Visualise fitted spatial surface (SpATS) ──────────────────────────────
plot(model_spats,
main = "Fitted Spatial Trend Surface (SpATS)",
col = viridis(100, option = "plasma"),
spat.int.plot = TRUE,
layout = c(3, 1))
# ── Extract spatial trend values ──────────────────────────────────────────
field_df$Trend_SpATS <- fitted(model_spats) -
predict(model_spats, which = "GENO")$predicted.values[
match(field_df$GENO,
predict(model_spats, which = "GENO")$GENO)
]
# ── Side-by-side: raw, spatial trend, detrended ───────────────────────────
field_df$Detrended <- residuals(model_spats)
pa <- ggplot(field_df, aes(COL, ROW, fill = Yield)) +
geom_tile(colour = NA) +
scale_fill_viridis_c(option = "plasma") +
scale_y_reverse() +
labs(title = "Raw Yield", x = "Col", y = "Row") +
theme_minimal(base_size = 10) +
theme(panel.grid = element_blank())
pb <- ggplot(field_df, aes(COL, ROW, fill = spatial)) +
geom_tile(colour = NA) +
scale_fill_viridis_c(option = "magma") +
scale_y_reverse() +
labs(title = "True Spatial Surface", x = "Col", y = "Row") +
theme_minimal(base_size = 10) +
theme(panel.grid = element_blank())
pc <- ggplot(field_df, aes(COL, ROW, fill = Detrended)) +
geom_tile(colour = NA) +
scale_fill_gradient2(low = "#d73027", mid = "#ffffbf",
high = "#1a9850", midpoint = 0) +
scale_y_reverse() +
labs(title = "Detrended Residuals", x = "Col", y = "Row") +
theme_minimal(base_size = 10) +
theme(panel.grid = element_blank())
pa + pb + pc + plot_layout(ncol = 3)
Step 13 — Residual Diagnostics
# ── Residuals from AR1×AR1 and SpATS ──────────────────────────────────────
field_df$Resid_AR1 <- residuals(model_ar1)
field_df$Resid_SpATS <- residuals(model_spats)
# ── Residual heatmaps ─────────────────────────────────────────────────────
plot_resid <- function(data, resid_col, title) {
ggplot(data, aes(COL, ROW, fill = .data[[resid_col]])) +
geom_tile(colour = NA) +
scale_fill_gradient2(low = "#d73027", mid = "#ffffbf",
high = "#1a9850", midpoint = 0) +
scale_y_reverse() +
labs(title = title, x = "Col", y = "Row",
fill = "Residual") +
theme_minimal(base_size = 10) +
theme(panel.grid = element_blank())
}
r1 <- plot_resid(field_df, "Resid_AR1", "AR1×AR1 Residuals")
r2 <- plot_resid(field_df, "Resid_SpATS", "SpATS Residuals")
r1 + r2 + plot_layout(ncol = 2)
# ── Q-Q plots ─────────────────────────────────────────────────────────────
qq_ar1 <- ggplot(data.frame(r = field_df$Resid_AR1),
aes(sample = r)) +
stat_qq(colour = "#2C7BB6", size = 1.5) +
stat_qq_line(colour = "#E41A1C", linewidth = 1) +
labs(title = "Q-Q: AR1×AR1", x = "Theoretical", y = "Sample") +
theme_minimal(base_size = 11)
qq_sp <- ggplot(data.frame(r = field_df$Resid_SpATS),
aes(sample = r)) +
stat_qq(colour = "#31A354", size = 1.5) +
stat_qq_line(colour = "#E41A1C", linewidth = 1) +
labs(title = "Q-Q: SpATS", x = "Theoretical", y = "Sample") +
theme_minimal(base_size = 11)
qq_ar1 + qq_sp + plot_layout(ncol = 2)
# ── Normality tests ────────────────────────────────────────────────────────
shapiro.test(sample(field_df$Resid_AR1, 50))
shapiro.test(sample(field_df$Resid_SpATS, 50))
# ── ACF of residuals ──────────────────────────────────────────────────────
# Good spatial model: residual ACF should be white noise
acf(field_df$Resid_SpATS, main = "ACF of SpATS Residuals")
acf(field_df$Resid_AR1, main = "ACF of AR1×AR1 Residuals")
Step 14 — BLUP Comparison Across Models
# ── Extract BLUPs from each model ─────────────────────────────────────────
blups_m2 <- ranef(model_2)$GENO |>
rownames_to_column("GENO") |>
rename(BLUP_M2 = `(Intercept)`)
blups_m3 <- data.frame(
GENO = names(gblups_ar1),
BLUP_M3 = as.numeric(gblups_ar1)
)
blups_m4 <- predict(model_spats, which = "GENO") |>
rename(GENO = GENO, BLUP_M4 = predicted.values) |>
mutate(BLUP_M4 = BLUP_M4 - mean(BLUP_M4))
# ── Merge ─────────────────────────────────────────────────────────────────
blup_compare <- blups_m2 |>
left_join(blups_m3, by = "GENO") |>
left_join(select(blups_m4, GENO, BLUP_M4), by = "GENO") |>
left_join(data.frame(GENO = names(tbv),
TBV = tbv), by = "GENO")
# ── Correlation with true breeding values ─────────────────────────────────
cat("Correlation with TBV:\n")
cat(" M2 (Simple mixed) :",
round(cor(blup_compare$BLUP_M2, blup_compare$TBV,
use = "complete.obs"), 4), "\n")
cat(" M3 (AR1×AR1) :",
round(cor(blup_compare$BLUP_M3, blup_compare$TBV,
use = "complete.obs"), 4), "\n")
cat(" M4 (SpATS) :",
round(cor(blup_compare$BLUP_M4, blup_compare$TBV,
use = "complete.obs"), 4), "\n")
# ── Scatter: True vs predicted BLUPs ─────────────────────────────────────
blup_long <- blup_compare |>
pivot_longer(cols = starts_with("BLUP"),
names_to = "Model",
values_to = "BLUP") |>
mutate(Model = recode(Model,
BLUP_M2 = "M2: Simple Mixed",
BLUP_M3 = "M3: AR1×AR1",
BLUP_M4 = "M4: SpATS"
))
ggplot(blup_long,
aes(x = TBV, y = BLUP, colour = Model)) +
geom_abline(slope = 1, intercept = 0,
linetype = "dashed", colour = "grey60") +
geom_point(alpha = 0.4, size = 1.8) +
geom_smooth(method = "lm", se = FALSE, linewidth = 1.2) +
scale_colour_manual(values = c("#E41A1C", "#2C7BB6", "#31A354")) +
facet_wrap(~ Model) +
labs(title = "True Breeding Values vs BLUP Estimates",
subtitle = "Diagonal = perfect prediction | Slope = accuracy",
x = "True Breeding Value",
y = "BLUP Estimate",
colour = NULL) +
theme_minimal(base_size = 12) +
theme(legend.position = "none",
strip.text = element_text(face = "bold"))
Step 15 — Estimating ρ_r and ρ_c
# ── Profile likelihood for rho_r ───────────────────────────────────────────
rho_seq <- seq(0.1, 0.95, by = 0.05)
ll_profile_r <- sapply(rho_seq, function(rho) {
tryCatch({
m <- mmer(
fixed = Yield ~ 1,
random = ~ vsr(GENO),
rcov = ~ vsr(ROW_f,
Gu = AR1(field_df$ROW_f, rho = rho)),
data = field_df, verbose = FALSE
)
m$logLik
}, error = function(e) NA_real_)
})
rho_best_r <- rho_seq[which.max(ll_profile_r)]
cat("Best rho_r (profile likelihood):", rho_best_r, "\n")
ggplot(data.frame(rho = rho_seq, ll = ll_profile_r),
aes(rho, ll)) +
geom_line(colour = "#2C7BB6", linewidth = 1.2) +
geom_point(size = 2.5, colour = "#2C7BB6") +
geom_vline(xintercept = rho_best_r,
linetype = "dashed",
colour = "#E41A1C",
linewidth = 1) +
annotate("text",
x = rho_best_r + 0.05,
y = min(ll_profile_r, na.rm = TRUE),
label = paste("ρ̂_r =", rho_best_r),
colour = "#E41A1C", size = 4) +
labs(title = "Profile Likelihood for Row Autocorrelation ρ_r",
x = "ρ_r", y = "Log-Likelihood") +
theme_minimal(base_size = 13)
Step 16 — Likelihood Ratio Test for Spatial Terms
# ── LRT: AR1×AR1 vs simple mixed ──────────────────────────────────────────
ll_spatial <- model_ar1$logLik
ll_base <- as.numeric(logLik(model_2))
LRT_stat <- -2 * (ll_base - ll_spatial)
LRT_df <- 2 # rho_r and rho_c are extra parameters
LRT_pval <- pchisq(LRT_stat, df = LRT_df, lower.tail = FALSE)
cat("=== LRT: Spatial vs Non-Spatial ===\n")
cat("Log-likelihood (spatial) :", round(ll_spatial, 2), "\n")
cat("Log-likelihood (non-spatial):", round(ll_base, 2), "\n")
cat("LRT statistic :", round(LRT_stat, 3), "\n")
cat("df :", LRT_df, "\n")
cat("p-value :", format(LRT_pval, digits = 4), "\n")
cat(ifelse(LRT_pval < 0.05,
"→ Spatial model significantly better (p < 0.05). Use AR1×AR1.\n",
"→ No significant spatial autocorrelation. Simple mixed model sufficient.\n"))
Step 17 — Selection Under Spatial Model
# ── Select top 10% genotypes ──────────────────────────────────────────────
spats_preds <- predict(model_spats, which = "GENO")
sel_df <- data.frame(
GENO = spats_preds$GENO,
BLUP = spats_preds$predicted.values,
SE = spats_preds$standard.errors,
PEV = spats_preds$standard.errors^2,
Acc = sqrt(pmax(0, 1 - spats_preds$standard.errors^2 /
(2 * sg2_spats)))
) |>
arrange(desc(BLUP)) |>
mutate(Rank = row_number())
threshold <- quantile(sel_df$BLUP, 0.90)
sel_df$Selected <- sel_df$BLUP >= threshold
cat("Selection threshold (top 10%):", round(threshold, 3), "\n")
cat("Mean accuracy — selected :",
round(mean(sel_df$Acc[sel_df$Selected]), 3), "\n")
cat("Mean accuracy — all :",
round(mean(sel_df$Acc), 3), "\n")
# Selection gain
mu_all <- mean(sel_df$BLUP)
mu_sel <- mean(sel_df$BLUP[sel_df$Selected])
i_val <- mean(scale(sel_df$BLUP[sel_df$Selected]))
delta_G <- i_val * sqrt(sg2_spats) * mean(sel_df$Acc[sel_df$Selected])
cat(sprintf("Expected ΔG: %.3f q/ha\n", delta_G))
# ── Publication ranking plot ───────────────────────────────────────────────
ggplot(head(sel_df, 50),
aes(x = reorder(GENO, BLUP),
y = BLUP,
fill = Selected)) +
geom_col(alpha = 0.85, width = 0.75,
colour = "grey20", linewidth = 0.3) +
geom_errorbar(aes(ymin = BLUP - 1.96 * SE,
ymax = BLUP + 1.96 * SE),
width = 0.3, linewidth = 0.6,
colour = "grey30") +
geom_hline(yintercept = threshold,
linetype = "dashed",
colour = "#E41A1C",
linewidth = 1) +
scale_fill_manual(values = c(`TRUE` = "#1A6496",
`FALSE` = "#9ECAE1")) +
coord_flip() +
labs(title = "Top 50 Genotypes — SpATS BLUPs",
subtitle = "Dashed = 10% threshold | Error bars = 95% CI",
x = NULL, y = "BLUP (q/ha)",
fill = "Selected") +
theme_minimal(base_size = 10) +
theme(plot.title = element_text(face = "bold"),
legend.position = "bottom")
10. Complete Workflow
1. Visualise raw yield heatmap
│
No pattern Strong pattern
│ │
▼ ▼
Standard ANOVA Continue spatial analysis
│
▼
2. Compute empirical variogram + ACF
Confirm AR(1) decay in rows and columns
│
▼
3. Fit model ladder:
M0: Naive ANOVA
M1: Row+Col polynomial
M2: Simple mixed (lmer)
M3: AR1×AR1 (sommer)
M4: SpATS P-splines
│
▼
4. Compare models by AIC / LRT
Select best fitting model
│
▼
5. Profile likelihood for ρ_r, ρ_c
Confirm autocorrelation estimates
│
▼
6. Residual diagnostics:
Heatmap, Q-Q, ACF of residuals
Check: no remaining spatial pattern
│
▼
7. Extract BLUPs + accuracy (PEV)
Compute H², selection accuracy
│
▼
8. Select superior genotypes
ΔG = i × σ_g × r_{g,ĝ}
│
▼
9. Compare BLUPs vs TBV correlation
Validate spatial model effectiveness
11. Parameter Interpretation Guide
| Parameter | Typical range | Interpretation |
|---|---|---|
| $\rho_r$ | 0.3 – 0.9 | Row autocorrelation; > 0.7 = strong gradient |
| $\rho_c$ | 0.3 – 0.9 | Column autocorrelation; > 0.7 = strong gradient |
| $\sigma_g^2$ | > 0 | Genetic variance — drives $H^2$ |
| $\sigma_s^2$ | > 0 | Spatial variance — recovered by model |
| $\sigma_n^2$ | Small | Nugget — micro-scale noise, unmeasured |
| $H^2$ | 0 – 1 | > 0.7 good; > 0.9 excellent |
| $r_{g\hat{g}}$ | 0 – 1 | > 0.85 high accuracy |
| CV (%) | < 15 | < 10 excellent for field trials |
12. AR1×AR1 vs Alternative Spatial Models
| Model | Assumptions | Flexibility | R package |
|---|---|---|---|
| AR1×AR1 | Stationary, separable | Moderate |
sommer, ASReml-R
|
| P-splines (SpATS) | Non-stationary | High | SpATS |
| Kriging | Isotropic variogram | Moderate |
gstat, fields
|
| Random regression | Smooth gradient | High |
lme4 + polynomials |
| NNspatial | Nearest-neighbour | Low | mvtnorm |
13. Summary Table
| Feature | M0: Naive | M2: Mixed | M3: AR1×AR1 | M4: SpATS |
|---|---|---|---|---|
| Spatial correction | ❌ | ❌ | ✅ Parametric | ✅ Non-parametric |
| Parameters | Fixed | 2 VC | 4 VC + 2 ρ | Effective df |
| BLUPs | Fixed means | Yes | Yes | Yes |
| H² estimate | None | Approx. | Accurate | Most accurate |
| Convergence | Always | Usually | Sometimes | Usually |
| R package | Base | lme4 |
sommer |
SpATS |
14. References
- Gilmour, A. R., Cullis, B. R., & Verbyla, A. P. (1997). Accounting for natural and extraneous variation in the analysis of field experiments. JABES, 2(3), 269–293.
- Cullis, B. R., & Gleeson, A. C. (1991). Spatial analysis of field experiments — an extension to two dimensions. Biometrics, 47(4), 1449–1460.
- Rodríguez-Álvarez, M. X. et al. (2018). Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics, 23, 52–71.
- Smith, A. B., Cullis, B. R., & Thompson, R. (2005). The analysis of crop cultivar breeding and evaluation trials. Journal of Agricultural Science, 143(6), 449–462.
- Covarrubias-Pazaran, G. (2016). Genome-assisted prediction of quantitative traits using the R package sommer. PLOS ONE, 11(6).
- Tobler, W. R. (1970). A computer movie simulating urban growth in the Detroit region. Economic Geography, 46, 234–240.
- R Core Team (2026). R: A Language and Environment for Statistical Computing.
Leave a comment