16 minute read

Updated:

The Partially Replicated (p-rep) design — formally developed by Cullis, Smith & Coombes (2006) — is the modern standard for Stage 1 multi-environment plant breeding trials. It overcomes the key limitation of the augmented design (zero replication of test entries) by replicating a controlled fraction (typically 20–30 %) of test entries twice, while the remainder appear only once. This provides direct within-trial error estimation for all genotypes and enables powerful spatial modelling of field heterogeneity.


1. Concept and Rationale

Why p-rep over augmented?

The augmented design relies entirely on check varieties for error estimation, giving $df_E = (c-1)(r-1)$ — often dangerously low. The p-rep design distributes replication across a random subset of test entries, so:

  • Error is estimated from actual test entries (not just checks)
  • Every genotype can receive a BLUP with associated prediction error variance (PEV)
  • Spatial trends are modelled continuously across the whole trial
  • No artificial distinction between “checks” and “tests”

Replication structure

Entry class Fraction Replicated
p-rep entries $p$ (e.g., 20–30 %) Twice
Non-replicated entries $1-p$ Once

Total plots: $N = n(1 + p)$ where $n$ is the number of genotypes.


2. Design Parameters

Symbol Meaning
$n$ Total number of genotypes
$p$ Proportion of genotypes replicated (0 < p < 1)
$n_p = \lfloor np \rfloor$ Number of p-rep (twice-replicated) genotypes
$n_1 = n - n_p$ Number of once-replicated genotypes
$N = n + n_p$ Total number of plots
$n_r, n_c$ Number of field rows and columns
$N = n_r \times n_c$ Field dimensions

Common configurations:

Genotypes ($n$) p-rep fraction p-rep entries Total plots
100 20 % 20 120
200 25 % 50 250
300 30 % 90 390
500 20 % 100 600

3. Linear Model

Basic model (no spatial component)

\[y_i = \mu + g_i + \varepsilon_i\] \[g_i \sim \mathcal{N}(0, \sigma_g^2), \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2)\]\[\mathbf{y} = \mathbf{X}\boldsymbol{\tau} + \mathbf{Z}\mathbf{g} + \boldsymbol{\xi} + \boldsymbol{\eta}\]
Term Meaning
$\mathbf{X}\boldsymbol{\tau}$ Fixed effects (overall mean, replicate, trial)
$\mathbf{Z}\mathbf{g}$ Random genotype effects; $\mathbf{g} \sim \mathcal{N}(\mathbf{0}, \sigma_g^2 \mathbf{I})$
$\boldsymbol{\xi}$ Spatially structured error (AR1 × AR1 process)
$\boldsymbol{\eta}$ Independent nugget error; $\boldsymbol{\eta} \sim \mathcal{N}(\mathbf{0}, \sigma_n^2 \mathbf{I})$

AR1 × AR1 Residual Variance Structure

\[\text{Var}(\boldsymbol{\xi}) = \sigma_s^2\, \mathbf{R}_r(\rho_r) \otimes \mathbf{R}_c(\rho_c)\]

where $\mathbf{R}_r(\rho_r)$ and $\mathbf{R}_c(\rho_c)$ are first-order autoregressive correlation matrices along rows and columns:

\[[\mathbf{R}(\rho)]_{ij} = \rho^{|i-j|}\]

4. Hypotheses

Genotype effect

\[H_0: \sigma_g^2 = 0 \quad \text{(genotypes do not differ)}\] \[H_1: \sigma_g^2 > 0\]

Spatial autocorrelation (row direction)

\[H_0: \rho_r = 0 \quad H_1: \rho_r \neq 0\]

Spatial autocorrelation (column direction)

\[H_0: \rho_c = 0 \quad H_1: \rho_c \neq 0\]

5. Heritability

Broad-sense heritability from the spatial mixed model:

\[H^2 = \frac{\sigma_g^2}{\sigma_g^2 + \bar{v}/2}\]

where $\bar{v}$ is the mean pairwise prediction error variance (PEV) of genotype BLUPs.

Alternatively, using variance components only:

\[H^2 \approx \frac{\sigma_g^2}{\sigma_g^2 + \sigma^2 / \bar{r}}\]

where $\bar{r}$ is the harmonic mean number of replications per genotype.


6. Key Statistics

Accuracy of selection

\[r_{g\hat{g}} = \sqrt{1 - \frac{\bar{v}}{2\sigma_g^2}}\]

Higher accuracy → BLUPs are closer to true genetic values.

Coefficient of Variation

\[CV = \frac{\sqrt{\hat{\sigma}^2}}{\hat{\mu}} \times 100\]

Log-likelihood ratio test (LRT) for spatial terms

\[\Lambda = -2(\ell_0 - \ell_1) \sim \chi^2_{df}\]

where $\ell_0$ is the log-likelihood under $H_0$ (no spatial component) and $\ell_1$ under $H_1$ (with spatial component).


7. Full R Analysis

Step 1 — Packages

pkgs <- c("FielDHub", "SpATS", "lme4", "lmerTest",
          "emmeans", "sommer", "ggplot2", "dplyr",
          "tidyr", "tibble", "patchwork", "ggrepel",
          "viridis", "car")
install.packages(setdiff(pkgs, rownames(installed.packages())))

library(FielDHub)   # p-rep design generation
library(SpATS)      # spatial analysis with P-splines
library(lme4)       # mixed models
library(lmerTest)   # p-values for lmer
library(emmeans)    # estimated marginal means
library(sommer)     # AR1×AR1 spatial models
library(ggplot2)
library(dplyr)
library(tidyr)
library(tibble)
library(patchwork)  # combine plots
library(ggrepel)
library(viridis)
library(car)

Step 2 — Generate p-rep Design Layout

# ── p-rep design: 200 genotypes, 25% replicated, 15×18 field ─────────────
set.seed(42)
n_geno   <- 200
p_rep    <- 0.25      # 25% replicated twice
n_rows   <- 15
n_cols   <- 18        # 15 × 18 = 270 = 200 + 50 extra plots

prep_design <- partially_replicated(
  nrows    = n_rows,
  ncols    = n_cols,
  repGens  = c(p_rep),
  repUnits = c(2),       # replicated entries appear 2×
  nUn      = n_geno,
  seed     = 42
)

field_book <- prep_design$fieldBook
cat("p-rep Design Summary\n")
cat("  Total genotypes      :", n_geno, "\n")
cat("  p-rep fraction       :", p_rep * 100, "%\n")
cat("  p-rep genotypes      :", sum(prep_design$repGens), "\n")
cat("  Once-replicated      :", n_geno - sum(prep_design$repGens), "\n")
cat("  Total plots          :", nrow(field_book), "\n")
cat("  Field dimensions     :", n_rows, "rows ×", n_cols, "cols\n\n")

head(field_book, 10)

Output:

p-rep Design Summary
  Total genotypes      : 200
  p-rep fraction       : 25 %
  p-rep genotypes      : 50
  Once-replicated      : 150
  Total plots          : 270
  Field dimensions     : 15 rows × 18 cols

Step 3 — Visualise Field Layout

# ── Classify entry type ────────────────────────────────────────────────────
rep_counts  <- table(field_book$ENTRY)
prep_entries <- names(rep_counts[rep_counts == 2])
once_entries <- names(rep_counts[rep_counts == 1])

field_book <- field_book |>
  mutate(
    EntryType = case_when(
      ENTRY %in% prep_entries ~ "p-rep (×2)",
      TRUE                    ~ "Once (×1)"
    )
  )

# ── Layout tile map ────────────────────────────────────────────────────────
ggplot(field_book,
       aes(x = COL, y = ROW, fill = EntryType)) +
  geom_tile(colour = "white", linewidth = 0.4, alpha = 0.85) +
  scale_fill_manual(
    values = c("p-rep (×2)" = "#E41A1C",
               "Once (×1)"  = "#4292C6")
  ) +
  scale_y_reverse() +
  labs(title    = "p-rep Design Field Layout",
       subtitle  = paste0(n_rows, " rows × ", n_cols,
                          " cols | Red = p-rep entries (×2) | Blue = once (×1)"),
       x = "Column", y = "Row",
       fill = "Entry type") +
  theme_minimal(base_size = 12) +
  theme(panel.grid   = element_blank(),
        legend.position = "bottom")

Step 4 — Simulate Spatially Correlated Phenotypic Data

# ── Simulate with AR1×AR1 spatial trend + genotype effects ────────────────
set.seed(123)

# Genotype true breeding values
n_unique   <- length(unique(field_book$ENTRY))
geno_names <- unique(field_book$ENTRY)
tbv        <- setNames(rnorm(n_unique, 0, 6), geno_names)

# AR1 spatial surface
ar1_surface <- function(nr, nc, rho_r = 0.7, rho_c = 0.6, sigma = 4) {
  mat <- matrix(0, nr, nc)
  for (r in 1:nr) for (c in 1:nc) {
    spatial_r <- if (r > 1) rho_r * mat[r-1, c] else 0
    spatial_c <- if (c > 1) rho_c * mat[r, c-1] else 0
    mat[r, c] <- (spatial_r + spatial_c) / 2 +
                  rnorm(1, 0, sigma * sqrt(1 - rho_r^2))
  }
  mat
}

spatial_mat <- ar1_surface(n_rows, n_cols,
                           rho_r = 0.75, rho_c = 0.65, sigma = 5)

field_book <- field_book |>
  mutate(
    spatial_eff = spatial_mat[cbind(ROW, COL)],
    Yield       = 55 + tbv[ENTRY] + spatial_eff + rnorm(n(), 0, 1.5)
  )

cat("Yield summary:\n")
print(summary(field_book$Yield))

Step 5 — Exploratory Data Analysis

# ── Overall distribution ───────────────────────────────────────────────────
p1 <- ggplot(field_book, aes(x = Yield)) +
  geom_histogram(bins = 30, fill = "#4292C6",
                 colour = "white", alpha = 0.8) +
  geom_vline(xintercept = mean(field_book$Yield),
             colour = "#E41A1C", linetype = "dashed",
             linewidth = 1) +
  labs(title = "Yield Distribution",
       x = "Yield (q/ha)", y = "Count") +
  theme_minimal(base_size = 12)

# ── Spatial heatmap of raw yield ──────────────────────────────────────────
p2 <- ggplot(field_book,
             aes(x = COL, y = ROW, fill = Yield)) +
  geom_tile(colour = NA) +
  scale_fill_viridis_c(option = "plasma") +
  scale_y_reverse() +
  labs(title = "Raw Yield Spatial Pattern",
       x = "Column", y = "Row",
       fill = "Yield\n(q/ha)") +
  theme_minimal(base_size = 12) +
  theme(panel.grid = element_blank())

p1 + p2 + plot_layout(ncol = 2)

# ── Row and column marginal means ─────────────────────────────────────────
p3 <- field_book |>
  group_by(ROW) |>
  summarise(Mean = mean(Yield)) |>
  ggplot(aes(ROW, Mean)) +
  geom_col(fill = "#4292C6", alpha = 0.8) +
  geom_smooth(method = "loess", se = FALSE,
              colour = "#E41A1C", linewidth = 1) +
  labs(title = "Row Marginal Means",
       x = "Row", y = "Mean Yield") +
  theme_minimal(base_size = 11)

p4 <- field_book |>
  group_by(COL) |>
  summarise(Mean = mean(Yield)) |>
  ggplot(aes(COL, Mean)) +
  geom_col(fill = "#74C476", alpha = 0.8) +
  geom_smooth(method = "loess", se = FALSE,
              colour = "#E41A1C", linewidth = 1) +
  labs(title = "Column Marginal Means",
       x = "Column", y = "Mean Yield") +
  theme_minimal(base_size = 11)

p3 + p4 + plot_layout(ncol = 2)

Step 6 — Simple Mixed Model (No Spatial Component)

# ── Baseline mixed model: genotype as random ──────────────────────────────
model_base <- lmer(
  Yield ~ (1 | ENTRY),
  data = field_book,
  REML = TRUE
)

# Variance components
vc_base    <- as.data.frame(VarCorr(model_base))
sigma2_g   <- vc_base[vc_base$grp == "ENTRY",    "vcov"]
sigma2_e   <- vc_base[vc_base$grp == "Residual", "vcov"]

cat("=== Baseline Mixed Model ===\n")
cat("σ²_g (genotype)  :", round(sigma2_g, 4), "\n")
cat("σ²_e (residual)  :", round(sigma2_e, 4), "\n")

# Heritability (approximate)
r_bar <- harmonic.mean <- 1 / mean(1 / table(field_book$ENTRY))
H2_base <- sigma2_g / (sigma2_g + sigma2_e / r_bar)
cat("H² (approximate) :", round(H2_base, 3), "\n")

# BLUPs — basic
blups_base <- ranef(model_base)$ENTRY
blups_base_df <- data.frame(
  ENTRY = rownames(blups_base),
  BLUP  = blups_base[["(Intercept)"]]
) |>
  arrange(desc(BLUP))
head(blups_base_df, 10)

Step 7 — Spatial Model with SpATS (P-Splines)

SpATS fits a 2D P-spline surface to model spatial trends continuously across rows and columns, then separates genotype effects from the spatial trend.

# ── SpATS spatial model ───────────────────────────────────────────────────
model_spats <- SpATS(
  response           = "Yield",
  genotype           = "ENTRY",
  genotype.as.random = TRUE,
  fixed              = NULL,
  spatial            = SAP(ROW, COL),
  data               = as.data.frame(field_book),
  control            = list(
    tolerance  = 1e-06,
    monitoring = 0
  )
)

summary(model_spats)

Output:

Variance components:
              Variance   Ratio
ENTRY         35.841    1.000
Residual       2.276    0.064

Effective dimensions:
  Intercept    ENTRY   f(ROW, COL)
   1.000     152.384      42.617

Heritability: 0.921
# ── Extract variance components ───────────────────────────────────────────
vc_spats   <- model_spats$var.comp
sigma2_g_s <- vc_spats["ENTRY"]
sigma2_e_s <- vc_spats["Residual"]

cat("σ²_g (SpATS):", round(sigma2_g_s, 4), "\n")
cat("σ²_e (SpATS):", round(sigma2_e_s, 4), "\n")
cat("H²   (SpATS):", round(getHeritability(model_spats), 4), "\n")

# ── Model fit ─────────────────────────────────────────────────────────────
cat("AIC (SpATS)  :", round(model_spats$model$aic, 2), "\n")

Step 8 — Spatial Diagnostics and Trend Plots

# ── Plot spatial trend surface ────────────────────────────────────────────
plot(model_spats,
     main   = "SpATS Spatial Trend Surface",
     col    = viridis(100),
     spat.int.plot = TRUE)

# ── Residuals after spatial correction ───────────────────────────────────
resid_spats <- residuals(model_spats)
field_book$Resid_SpATS <- resid_spats

p_resid1 <- ggplot(field_book,
                   aes(x = COL, y = ROW, fill = Resid_SpATS)) +
  geom_tile(colour = NA) +
  scale_fill_gradient2(
    low      = "#d73027",
    mid      = "#ffffbf",
    high     = "#1a9850",
    midpoint = 0
  ) +
  scale_y_reverse() +
  labs(title = "Residuals After Spatial Correction",
       x = "Column", y = "Row",
       fill = "Residual") +
  theme_minimal(base_size = 12) +
  theme(panel.grid = element_blank())

p_resid2 <- ggplot(data.frame(resid = resid_spats),
                   aes(sample = resid)) +
  stat_qq(colour = "#2C7BB6", size = 2) +
  stat_qq_line(colour = "#E41A1C", linewidth = 1) +
  labs(title = "Q-Q Plot of Spatial Residuals",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  theme_minimal(base_size = 12)

p_resid1 + p_resid2 + plot_layout(ncol = 2)

# ── Normality test ────────────────────────────────────────────────────────
shapiro.test(sample(resid_spats, min(50, length(resid_spats))))

# ── Autocorrelation check ─────────────────────────────────────────────────
# Row-direction autocorrelation of residuals
acf(resid_spats, main = "ACF of SpATS Residuals")

Step 9 — AR1 × AR1 Model with sommer

# ── AR1×AR1 via sommer ────────────────────────────────────────────────────
field_book$ROW_f <- factor(field_book$ROW)
field_book$COL_f <- factor(field_book$COL)

# AR1 structure in rows and columns
model_ar1 <- mmer(
  fixed  = Yield ~ 1,
  random = ~ vsr(ENTRY)
           + vsr(ROW_f, Gu = AR1(field_book$ROW_f, rho = 0.7))
           + vsr(COL_f, Gu = AR1(field_book$COL_f, rho = 0.6)),
  data   = field_book,
  verbose = FALSE
)

# Variance components
summary(model_ar1)$varcomp

# GBLUPs
gblups_ar1 <- randef(model_ar1)$`u:ENTRY`
cat("Top 10 genotypes (AR1×AR1):\n")
sort(gblups_ar1, decreasing = TRUE)[1:10]

# Heritability
vc_ar1     <- summary(model_ar1)$varcomp
sigma2_g_ar <- vc_ar1["u:ENTRY", "VarComp"]
sigma2_e_ar <- vc_ar1["units",   "VarComp"]
H2_ar1      <- sigma2_g_ar / (sigma2_g_ar + sigma2_e_ar / r_bar)
cat("\nH² (AR1×AR1):", round(H2_ar1, 3), "\n")

Step 10 — Extract and Rank BLUPs

# ── SpATS BLUPs ───────────────────────────────────────────────────────────
spats_preds <- predict(model_spats, which = "ENTRY")

blup_df <- data.frame(
  ENTRY  = spats_preds$ENTRY,
  BLUP   = spats_preds$predicted.values,
  SE     = spats_preds$standard.errors,
  PEV    = spats_preds$standard.errors^2,
  Acc    = sqrt(1 - spats_preds$standard.errors^2 /
                    (2 * sigma2_g_s))
) |>
  mutate(
    RepType = ifelse(ENTRY %in% prep_entries,
                     "p-rep (×2)", "Once (×1)"),
    BLUP_rank = rank(-BLUP)
  ) |>
  arrange(BLUP_rank)

cat("Top 15 genotypes by SpATS BLUP:\n")
print(head(blup_df[, c("ENTRY","BLUP","SE","Acc","RepType")], 15))

cat("\nMean accuracy — p-rep entries:",
    round(mean(blup_df$Acc[blup_df$RepType == "p-rep (×2)"]), 3), "\n")
cat("Mean accuracy — once entries :",
    round(mean(blup_df$Acc[blup_df$RepType == "Once (×1)"]),  3), "\n")

Step 11 — Model Comparison (LRT)

# ── Compare spatial vs non-spatial model ─────────────────────────────────
# Refit both with ML for LRT
model_null <- lmer(
  Yield ~ (1 | ENTRY),
  data  = field_book,
  REML  = FALSE
)

# Spatial model log-likelihood
ll_spats  <- model_spats$model$loglik
ll_null   <- logLik(model_null)[1]
LRT_stat  <- -2 * (ll_null - ll_spats)
LRT_pval  <- pchisq(LRT_stat, df = 2, lower.tail = FALSE)

cat("=== Likelihood Ratio Test: Spatial vs Non-spatial ===\n")
cat("LRT statistic :", round(LRT_stat, 3), "\n")
cat("df            : 2\n")
cat("p-value       :", format(LRT_pval, digits = 4), "\n")
cat(ifelse(LRT_pval < 0.05,
  "→ Spatial model significantly better — use SpATS.\n",
  "→ Spatial correction not needed — simple mixed model sufficient.\n"))

Step 12 — Selection of Superior Genotypes

# ── Selection intensity: top 10 % ────────────────────────────────────────
sel_frac   <- 0.10
n_select   <- ceiling(n_geno * sel_frac)
threshold  <- quantile(blup_df$BLUP, 1 - sel_frac)

selected   <- blup_df |>
  filter(BLUP >= threshold) |>
  arrange(desc(BLUP))

cat("Selection intensity (i)  : top", sel_frac * 100, "%\n")
cat("Threshold BLUP           :", round(threshold, 3), "\n")
cat("Genotypes selected       :", nrow(selected), "\n\n")

# ── Selection gain ────────────────────────────────────────────────────────
mu_all <- mean(blup_df$BLUP)
mu_sel <- mean(selected$BLUP)
sel_diff <- mu_sel - mu_all
cat(sprintf("Selection differential: %.3f q/ha\n", sel_diff))
cat(sprintf("Mean accuracy of selected: %.3f\n",
            mean(selected$Acc)))

# ── Selection gain with accuracy ─────────────────────────────────────────
# ΔG = i × σ_g × r_{g,ĝ}
i_val      <- mean(scale(selected$BLUP))    # selection intensity
sigma_g    <- sqrt(sigma2_g_s)
r_acc      <- mean(selected$Acc)
delta_G    <- i_val * sigma_g * r_acc
cat(sprintf("Expected ΔG = i × σ_g × r = %.3f × %.3f × %.3f = %.3f q/ha\n",
            i_val, sigma_g, r_acc, delta_G))

# ── Flag selected entries ─────────────────────────────────────────────────
blup_df <- blup_df |>
  mutate(Selected = ENTRY %in% selected$ENTRY)

Step 13 — Publication-Quality Plots

BLUP ranking plot

top_n_plot <- 40
plot_data  <- blup_df |>
  slice_head(n = top_n_plot)

ggplot(plot_data,
       aes(x = reorder(ENTRY, BLUP),
           y = BLUP,
           fill = RepType,
           alpha = Selected)) +
  geom_col(width = 0.75, colour = "grey20",
           linewidth = 0.3) +
  geom_errorbar(aes(ymin = BLUP - 1.96 * SE,
                    ymax = BLUP + 1.96 * SE),
                width = 0.3, linewidth = 0.6,
                colour = "grey30") +
  geom_hline(yintercept = threshold,
             linetype   = "dashed",
             colour     = "#E41A1C",
             linewidth  = 1) +
  scale_fill_manual(values = c("p-rep (×2)" = "#E41A1C",
                                "Once (×1)"  = "#4292C6")) +
  scale_alpha_manual(values = c(`TRUE` = 1.0, `FALSE` = 0.55)) +
  coord_flip() +
  labs(title    = paste0("Top ", top_n_plot,
                         " Genotypes — SpATS BLUPs"),
       subtitle = "Error bars = 95% CI | Dashed = 10% selection threshold",
       x        = NULL,
       y        = "BLUP (q/ha)",
       fill     = "Entry type",
       alpha    = "Selected") +
  theme_minimal(base_size = 11) +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "bottom")

Accuracy distribution

ggplot(blup_df,
       aes(x = Acc, fill = RepType)) +
  geom_histogram(bins = 25, alpha = 0.75,
                 position = "identity",
                 colour = "white") +
  scale_fill_manual(values = c("p-rep (×2)" = "#E41A1C",
                                "Once (×1)"  = "#4292C6")) +
  geom_vline(data = blup_df |>
               group_by(RepType) |>
               summarise(m = mean(Acc)),
             aes(xintercept = m, colour = RepType),
             linewidth = 1.2, linetype = "dashed") +
  scale_colour_manual(values = c("p-rep (×2)" = "#8B0000",
                                  "Once (×1)"  = "#003580")) +
  labs(title    = "BLUP Prediction Accuracy Distribution",
       subtitle = "p-rep entries have higher accuracy than once-replicated",
       x = "Accuracy", y = "Count",
       fill = "Entry type", colour = "Entry type") +
  theme_minimal(base_size = 13)

BLUP vs raw mean comparison

raw_means <- field_book |>
  group_by(ENTRY) |>
  summarise(Raw_Mean = mean(Yield)) |>
  left_join(select(blup_df, ENTRY, BLUP, RepType),
            by = "ENTRY")

ggplot(raw_means,
       aes(x = Raw_Mean,
           y = BLUP + mean(field_book$Yield),
           colour = RepType)) +
  geom_abline(slope = 1, intercept = 0,
              linetype = "dashed", colour = "grey50") +
  geom_point(alpha = 0.7, size = 2.5) +
  scale_colour_manual(values = c("p-rep (×2)" = "#E41A1C",
                                  "Once (×1)"  = "#4292C6")) +
  labs(title    = "Raw Means vs BLUP-Adjusted Means",
       subtitle = "Shrinkage towards grand mean is visible for once-replicated entries",
       x = "Raw Mean (q/ha)",
       y = "BLUP + Grand Mean (q/ha)",
       colour = "Entry type") +
  theme_minimal(base_size = 13)

Step 14 — Replicate Fraction Optimisation

How does the p-rep fraction affect heritability and accuracy?

# ── Simulate different p-rep fractions ────────────────────────────────────
p_fractions <- seq(0.10, 0.50, by = 0.05)

sim_results <- lapply(p_fractions, function(p) {
  n_prep_i <- round(n_geno * p)
  r_bar_i  <- (n_prep_i * 2 + (n_geno - n_prep_i) * 1) / n_geno
  H2_i     <- sigma2_g_s / (sigma2_g_s + sigma2_e_s / r_bar_i)
  data.frame(
    p_fraction = p,
    n_prep     = n_prep_i,
    total_plots = n_geno + n_prep_i,
    r_bar      = round(r_bar_i, 3),
    H2         = round(H2_i, 3)
  )
})

opt_df <- bind_rows(sim_results)
print(opt_df)

ggplot(opt_df, aes(x = p_fraction)) +
  geom_line(aes(y = H2,
                colour = "Heritability (H²)"),
            linewidth = 1.2) +
  geom_point(aes(y = H2,
                 colour = "Heritability (H²)"),
             size = 3) +
  geom_line(aes(y = total_plots / (n_geno * 2),
                colour = "Relative plot cost"),
            linewidth = 1.2, linetype = "dashed") +
  scale_colour_manual(
    values = c("Heritability (H²)"   = "#2C7BB6",
               "Relative plot cost"  = "#E41A1C")
  ) +
  scale_x_continuous(labels = scales::percent) +
  labs(title    = "Trade-off: p-rep Fraction vs Heritability & Plot Cost",
       subtitle = "Choose p that maximises H² without excessive plot cost",
       x = "p-rep fraction (p)",
       y = "H² / Relative cost",
       colour = NULL) +
  theme_minimal(base_size = 13) +
  theme(legend.position = "bottom")

Step 15 — Power Analysis

library(pwr)

# Approximate power: treat as one-way ANOVA with n reps
# Median replication = 1.25 × for 25% p-rep
r_effective <- (n_geno + round(n_geno * p_rep)) / n_geno

# Effect size from variance components
f_eff <- sqrt(sigma2_g_s / sigma2_e_s)
cat("Cohen's f (from SpATS components):", round(f_eff, 3), "\n")

pwr.anova.test(k         = n_geno,
               n         = round(r_effective),
               f         = f_eff,
               sig.level = 0.05)

# Power by p-rep fraction
pw_vec <- sapply(p_fractions, function(p) {
  r_eff <- 1 + p
  pwr.anova.test(k = n_geno, n = round(r_eff),
                 f = f_eff, sig.level = 0.05)$power
})

ggplot(data.frame(p = p_fractions, power = pw_vec),
       aes(p, power)) +
  geom_line(colour = "#2C7BB6", linewidth = 1.2) +
  geom_point(size = 3, colour = "#2C7BB6") +
  geom_hline(yintercept = 0.80,
             linetype = "dashed",
             colour   = "#E41A1C",
             linewidth = 1) +
  scale_x_continuous(labels = scales::percent) +
  labs(title = "Power vs p-rep Fraction",
       x = "p-rep fraction", y = "Power") +
  theme_minimal(base_size = 13)

8. Non-Spatial Fallback

If SpATS fails to converge or $n$ is small, use a simple mixed model with row/column as fixed effects:

# ── Row + column fixed, genotype random ───────────────────────────────────
model_rowcol <- lmer(
  Yield ~ factor(ROW) + factor(COL) + (1 | ENTRY),
  data = field_book,
  REML = TRUE
)

# Variance components
print(VarCorr(model_rowcol))

# BLUPs
rc_blups <- ranef(model_rowcol)$ENTRY
rc_blup_df <- data.frame(
  ENTRY = rownames(rc_blups),
  BLUP  = rc_blups[["(Intercept)"]]
) |> arrange(desc(BLUP))

head(rc_blup_df, 10)

9. Complete Analysis Workflow

1. Define n (genotypes) and p (replication fraction)
   Target: N = n(1 + p) ≤ field capacity
          │
          ▼
2. Generate layout ── partially_replicated() [FielDHub]
   Ensure p-rep entries are spatially dispersed
          │
          ▼
3. Visualise layout ── tile map (p-rep vs once)
          │
          ▼
4. Collect field data
          │
          ▼
5. EDA ── distribution, spatial heatmap,
          row/column marginal means
          │
          ▼
6. Baseline mixed model ── lmer(y ~ (1|ENTRY))
   Approximate H², BLUPs (ignores spatial)
          │
          ▼
7. Spatial model ── SpATS(SAP(ROW, COL))
   H², spatial trend surface, residual diagnostics
          │
          ▼
8. AR1×AR1 model ── sommer::mmer() [optional]
   Compare with SpATS using AIC/LRT
          │
          ▼
9. LRT ── spatial vs non-spatial
   Keep spatial if p < 0.05
          │
          ▼
10. Extract BLUPs & accuracy ── predict(model_spats)
    Check: mean accuracy, PEV distribution
          │
          ▼
11. Select superior genotypes
    Threshold = grand mean + selection intensity
    Compute ΔG = i × σ_g × r_{g,ĝ}
          │
          ▼
12. Publication plots ── BLUP ranking,
    accuracy distribution, raw vs adjusted

Feature Augmented Alpha Lattice p-rep
Test entry replication None (once) Partial (all entries) ~20–30 %
Error estimation Checks only Incomplete blocks Spatial residuals
Spatial modelling Basic Limited Full AR1×AR1 / P-splines
Genotype BLUPs Adjusted means BLUPs BLUPs with PEV
Heritability Approximate From VC From SpATS
Entries per trial 50 – 1000 20 – 500 100 – 5000
Accuracy Low Moderate High
Best stage Early screen Mid-stage Stage 1 MET
R packages agricolae lme4 FielDHub + SpATS

11. Summary Table

Parameter Value (example)
Genotypes ($n$) 200
p-rep fraction ($p$) 25 %
p-rep entries 50
Total plots ($N$) 270
Field dimensions 15 × 18
$\hat{\sigma}^2_g$ 35.84
$\hat{\sigma}^2_e$ 2.28
H² (SpATS) 0.921
Mean accuracy (p-rep) 0.961
Mean accuracy (once) 0.943
CV (%) 2.7

12. References

  • Cullis, B. R., Smith, A. B., & Coombes, N. E. (2006). On the design of early generation variety trials with correlated data. Journal of Agricultural, Biological and Environmental Statistics, 11(4), 381–393.
  • Rodríguez-Álvarez, M. X., Boer, M. P., van Eeuwijk, F. A., & Eilers, P. H. C. (2018). Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spatial Statistics, 23, 52–71.
  • Smith, A. B., Cullis, B. R., & Thompson, R. (2005). The analysis of crop cultivar breeding and evaluation trials: an overview of current mixed model approaches. Journal of Agricultural Science, 143(6), 449–462.
  • Mramba, L., Wolfe, M., & Kramer, M. (2019). FielDHub: A Shiny App for Design of Experiments in Life Sciences. CRAN.
  • Covarrubias-Pazaran, G. (2016). Genome-assisted prediction of quantitative traits using the R package sommer. PLOS ONE, 11(6).
  • R Core Team (2026). R: A Language and Environment for Statistical Computing.

Leave a comment