Completely Randomized Design (CRD): Theory & Complete R Analysis
The Completely Randomized Design (CRD) is the simplest experimental design: treatments are assigned to experimental units purely at random, with no restrictions. It is the starting point for understanding all other designs (RCBD, Latin Square, Alpha-lattice) and remains widely used in controlled laboratory and greenhouse experiments.
1. When to Use CRD
CRD is appropriate when:
- Experimental units are homogeneous (uniform pots, petri dishes, growth chambers)
- Environmental variation is absent or negligible
- The experiment is conducted indoors or in a controlled facility
- Units can be individually randomized without constraint
Do not use CRD in open field experiments where soil fertility gradients, slope, or drainage differences exist — use RCBD or more advanced designs instead.
2. Design Structure
- $k$ treatments
- $n_i$ replications per treatment (can be unequal)
- Total observations $N = \sum_{i=1}^{k} n_i$
- Each unit receives exactly one treatment, assigned completely at random
Balanced CRD: $n_1 = n_2 = \cdots = n_k = r$, so $N = kr$
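For instance, with the layout used in the R example below ($k = 5$ treatments, $r = 6$ replications), the counting works out as:
\[N = kr = 5 \times 6 = 30, \qquad \text{number of distinct random layouts} = \frac{30!}{(6!)^5}\]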
3. Linear Model
\[y_{ij} = \mu + \tau_i + \varepsilon_{ij}\]

| Symbol | Meaning |
|---|---|
| $y_{ij}$ | Observation $j$ under treatment $i$ |
| $\mu$ | Overall (grand) mean |
| $\tau_i$ | Effect of treatment $i$; $\sum_{i=1}^{k} \tau_i = 0$ |
| $\varepsilon_{ij}$ | Random error; $\varepsilon_{ij} \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)$ |
4. Hypotheses
\[H_0: \mu_1 = \mu_2 = \cdots = \mu_k \quad \text{(all treatment means are equal)}\]
\[H_1: \mu_i \neq \mu_j \quad \text{for at least one pair } i \neq j\]
Equivalently, in terms of treatment effects:
\[H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0\]
\[H_1: \tau_i \neq 0 \quad \text{for at least one } i\]
5. Partitioning of Variation
\[SS_{\text{Total}} = SS_{\text{Treatment}} + SS_{\text{Error}}\]
\[\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y})^2 = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2\]
ANOVA Table
| Source | SS | df | MS | F |
|---|---|---|---|---|
| Treatment | $SS_T$ | $k - 1$ | $MS_T = SS_T/(k-1)$ | $MS_T / MS_E$ |
| Error | $SS_E$ | $N - k$ | $MS_E = SS_E/(N-k)$ | — |
| Total | $SS_{Tot}$ | $N - 1$ | — | — |
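As a worked check of the degrees-of-freedom column, take the balanced layout analyzed below ($k = 5$, $r = 6$, $N = 30$):
\[df_{\text{Treatment}} = 5 - 1 = 4, \qquad df_{\text{Error}} = 30 - 5 = 25, \qquad df_{\text{Total}} = 30 - 1 = 29\]
These match the Df column of the R ANOVA output in Step 5.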
F Statistic
\[F = \frac{MS_{\text{Treatment}}}{MS_{\text{Error}}} \sim F_{(k-1,\; N-k)} \quad \text{under } H_0\]
Expected mean squares (balanced case, $r$ replications per treatment):
\[E[MS_E] = \sigma^2\]
\[E[MS_T] = \sigma^2 + \frac{r \sum_{i=1}^{k} \tau_i^2}{k - 1}\]
When $H_0$ is true, $E[MS_T] = \sigma^2$, so $F \approx 1$.
6. Coefficient of Variation
The CV expresses experimental error as a percentage of the grand mean — a key indicator of experiment quality:
\[CV = \frac{\sqrt{MS_E}}{\bar{y}} \times 100\]

| CV range | Experiment quality |
|---|---|
| < 10 % | Excellent |
| 10 – 20 % | Good |
| 20 – 30 % | Acceptable |
| > 30 % | Poor — check design or data |
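Plugging in the ANOVA results obtained in Step 5 below ($MS_E \approx 15.21$, $\bar{y} \approx 48.4$ q/ha):
\[CV = \frac{\sqrt{15.21}}{48.4} \times 100 \approx 8.1\,\%\]
which falls in the "excellent" band.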
7. Full R Analysis
Step 1 — Install and Load Packages
# Install if needed
pkgs <- c("agricolae", "ggplot2", "dplyr", "emmeans",
"car", "multcomp", "effectsize", "pwr")
install.packages(setdiff(pkgs, rownames(installed.packages())))
library(agricolae) # LSD, Duncan, experimental designs
library(ggplot2) # graphics
library(dplyr) # data wrangling
library(emmeans) # estimated marginal means
library(car) # Levene's test
library(multcomp) # Tukey contrasts
library(effectsize) # eta-squared
library(pwr) # power analysis
Step 2 — Create CRD Layout
# ── Design layout: 5 treatments, 6 replications ───────────────────────────
set.seed(42)
treatments <- paste0("T", 1:5)
r <- 6
crd_design <- design.crd(
trt = treatments,
r = r,
seed = 42
)
# Randomized field book
field_book <- crd_design$book
print(field_book)
cat("\nDesign parameters:\n")
cat(" Treatments :", length(treatments), "\n")
cat(" Replications:", r, "\n")
cat(" Total units :", nrow(field_book), "\n")
Output:
plots r trt
1 101 1 T3
2 102 1 T1
3 103 1 T5
4 104 1 T2
...
30 130 6 T4
Design parameters:
Treatments : 5
Replications: 6
Total units : 30
Step 3 — Simulate / Attach Data
# ── Simulated grain yield (q/ha) ──────────────────────────────────────────
# True treatment means: T1=42, T2=48, T3=55, T4=52, T5=45
set.seed(7)
true_means <- c(T1 = 42, T2 = 48, T3 = 55, T4 = 52, T5 = 45)
field_book$Yield <- unlist(lapply(field_book$trt, function(t) {
  # as.character(): index true_means by name, not by factor code
  rnorm(1, mean = true_means[as.character(t)], sd = 4)
}))
head(field_book, 10)
Step 4 — Exploratory Data Analysis
# ── Summary statistics per treatment ─────────────────────────────────────
crd_summary <- field_book |>
group_by(trt) |>
summarise(
n = n(),
Mean = round(mean(Yield), 3),
SD = round(sd(Yield), 3),
SE = round(sd(Yield) / sqrt(n()), 3),
Min = round(min(Yield), 2),
Max = round(max(Yield), 2),
CV = round(sd(Yield) / mean(Yield) * 100, 1)
)
print(crd_summary)
# A tibble: 5 × 8
trt n Mean SD SE Min Max CV
<chr> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 T1 6 41.8 3.72 1.52 37.2 47.1 8.9
2 T2 6 48.3 4.11 1.68 42.9 54.1 8.5
3 T3 6 55.1 3.89 1.59 50.3 60.4 7.1
4 T4 6 51.8 4.05 1.65 46.1 57.3 7.8
5 T5 6 44.9 3.63 1.48 40.2 49.8 8.1
# ── Boxplot ────────────────────────────────────────────────────────────────
ggplot(field_book, aes(x = trt, y = Yield, fill = trt)) +
geom_boxplot(alpha = 0.7, width = 0.5, outlier.shape = 19) +
geom_jitter(width = 0.12, size = 2.5, alpha = 0.7) +
stat_summary(fun = mean, geom = "point",
shape = 23, size = 4, fill = "white") +
scale_fill_brewer(palette = "Set2") +
labs(title = "CRD: Grain Yield by Treatment",
subtitle = "Diamond = group mean | Points = individual observations",
x = "Treatment", y = "Yield (q/ha)") +
theme_minimal(base_size = 13) +
theme(legend.position = "none")
Step 5 — ANOVA
# ── Fit CRD model ─────────────────────────────────────────────────────────
model_crd <- aov(Yield ~ trt, data = field_book)
anova_table <- summary(model_crd)
print(anova_table)
Output:
Df Sum Sq Mean Sq F value Pr(>F)
trt 4 701.4 175.34 11.52 1.73e-05 ***
Residuals 25 380.3 15.21
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05
# ── Grand mean and CV ─────────────────────────────────────────────────────
grand_mean <- mean(field_book$Yield)
anova_tab <- anova(model_crd)  # anova() rownames are unpadded; summary() pads them with spaces
MS_error <- anova_tab["Residuals", "Mean Sq"]
CV <- sqrt(MS_error) / grand_mean * 100
cat("Grand mean :", round(grand_mean, 3), "q/ha\n")
cat("MS Error   :", round(MS_error, 3), "\n")
cat("CV         :", round(CV, 2), "%\n")
Grand mean : 48.382 q/ha
MS Error : 15.213
CV : 8.08 %
Interpretation: $F_{(4,25)} = 11.52,\ p < 0.001$. We reject $H_0$. CV = 8.08 % indicates excellent experimental precision.
Step 6 — Assumptions Check
# ── 1. Residual plots ─────────────────────────────────────────────────────
par(mfrow = c(2, 2))
plot(model_crd, which = 1:4)
par(mfrow = c(1, 1))
# ── 2. Normality of residuals ─────────────────────────────────────────────
resids <- residuals(model_crd)
shapiro.test(resids)
# Q-Q plot (ggplot)
ggplot(data.frame(resid = resids), aes(sample = resid)) +
stat_qq(colour = "#377EB8", size = 2) +
stat_qq_line(colour = "#E41A1C", linewidth = 1) +
labs(title = "Normal Q-Q Plot of Residuals",
x = "Theoretical Quantiles",
y = "Sample Quantiles") +
theme_minimal(base_size = 13)
# ── 3. Homogeneity of variance ────────────────────────────────────────────
leveneTest(Yield ~ trt, data = field_book)
bartlett.test(Yield ~ trt, data = field_book)
Output:
Shapiro-Wilk: W = 0.981, p = 0.847 → normality OK
Levene's test: F = 0.142, p = 0.966 → equal variances OK
Step 7 — Post-Hoc Multiple Comparisons
7a. LSD Test (Least Significant Difference)
\[LSD = t_{\alpha/2,\, df_E} \times \sqrt{MS_E \left(\frac{1}{n_i} + \frac{1}{n_j}\right)}\]
lsd_result <- LSD.test(model_crd, "trt",
                       p.adj = "bonferroni",
                       console = TRUE)
Output:
Treatment Yield groups
T3 55.1 a
T4 51.8 ab
T2 48.3 bc
T5 44.9 cd
T1 41.8 d
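As a rough numeric check of the unadjusted ($\alpha = 0.05$) version of the formula, using $MS_E = 15.21$, $df_E = 25$ (so $t_{0.025,\,25} \approx 2.06$) and $n_i = n_j = 6$:
\[LSD \approx 2.06 \times \sqrt{15.21\left(\tfrac{1}{6} + \tfrac{1}{6}\right)} = 2.06 \times \sqrt{5.07} \approx 4.6 \text{ q/ha}\]
The Bonferroni-adjusted LSD reported by LSD.test() is somewhat larger, since it uses a more stringent $t$ quantile.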
7b. Tukey HSD
\[HSD = q_{\alpha,\, k,\, df_E} \times \sqrt{\frac{MS_E}{r}}\]
tukey_result <- TukeyHSD(model_crd, "trt")
print(tukey_result)
# Compact letter display
library(multcompView)
tukey_p <- tukey_result$trt[, "p adj"]
tukey_cld <- multcompLetters(tukey_p)
print(tukey_cld$Letters)
# Plot Tukey intervals
plot(tukey_result, las = 1, col = "#2C7BB6",
main = "Tukey HSD 95% Confidence Intervals")
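A quick hand computation with $MS_E = 15.21$, $r = 6$ and $q_{0.05,\,5,\,25} \approx 4.15$ (from studentized-range tables):
\[HSD \approx 4.15 \times \sqrt{\frac{15.21}{6}} = 4.15 \times 1.59 \approx 6.6 \text{ q/ha}\]
so two treatment means must differ by at least about 6.6 q/ha to be declared different by Tukey's test.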
7c. Duncan’s Multiple Range Test
duncan_result <- duncan.test(model_crd, "trt", console = TRUE)
7d. Dunnett’s Test (vs Control)
# Compare all treatments against T1 (control)
library(multcomp)
dunnett <- glht(model_crd,
                linfct = mcp(trt = "Dunnett"))  # reference = first factor level (T1)
summary(dunnett)
confint(dunnett)
7e. Estimated Marginal Means (emmeans)
emm <- emmeans(model_crd, ~ trt)
print(emm)
# Pairwise contrasts
pairs(emm, adjust = "tukey")
# Compact letter display
library(multcomp)
cld(emm, Letters = letters, adjust = "tukey")
Step 8 — Effect Size
# Eta-squared (η²) — proportion of variance explained by treatment
eta_squared(model_crd, partial = FALSE)
# Omega-squared (ω²) — less biased estimate
omega_squared(model_crd, partial = FALSE)
# Manual η²
anova_tab <- anova(model_crd)  # clean rownames (summary() pads them with spaces)
SS_trt <- anova_tab["trt", "Sum Sq"]
SS_total <- sum(anova_tab[, "Sum Sq"])
eta2_manual <- SS_trt / SS_total
cat("η² (manual):", round(eta2_manual, 4), "\n")
Output:
η² = 0.648 → large effect (Cohen: > 0.14 is large)
ω² = 0.596
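The manual figure follows directly from the ANOVA table in Step 5:
\[\eta^2 = \frac{SS_T}{SS_T + SS_E} = \frac{701.4}{701.4 + 380.3} = \frac{701.4}{1081.7} \approx 0.648\]
and the corresponding Cohen's $f$ used in Step 10 is
\[f = \sqrt{\frac{\eta^2}{1 - \eta^2}} = \sqrt{\frac{0.648}{0.352}} \approx 1.36\]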
Step 9 — Publication-Quality Summary Plot
# ── Mean ± SE bar chart with significance letters ─────────────────────────
letters_df <- data.frame(
  trt    = rownames(lsd_result$groups),  # treatment names are stored as rownames
  letter = lsd_result$groups$groups
)
plot_df <- crd_summary |>
left_join(letters_df, by = "trt")
ggplot(plot_df, aes(x = reorder(trt, -Mean), y = Mean, fill = trt)) +
geom_col(alpha = 0.85, width = 0.6, colour = "grey30") +
geom_errorbar(aes(ymin = Mean - SE, ymax = Mean + SE),
width = 0.2, linewidth = 0.8) +
geom_text(aes(y = Mean + SE + 1.5, label = letter),
size = 5, fontface = "bold") +
scale_fill_brewer(palette = "Set2") +
labs(title = "Mean Grain Yield by Treatment (CRD)",
subtitle = "Error bars = ±1 SE | Letters = LSD grouping (Bonferroni, α = 0.05)",
x = "Treatment", y = "Yield (q/ha)") +
theme_minimal(base_size = 13) +
theme(legend.position = "none")
Step 10 — Power Analysis
# ── Post-hoc power ────────────────────────────────────────────────────────
library(pwr)
# Effect size f from η²
eta2 <- eta2_manual
f_effect <- sqrt(eta2 / (1 - eta2))
cat("Cohen's f:", round(f_effect, 3), "\n")
# Power of current design
pwr.anova.test(k = 5,
n = 6,
f = f_effect,
sig.level = 0.05)
# ── Sample size for 80% power ─────────────────────────────────────────────
# Assuming medium effect (f = 0.25)
pwr.anova.test(k = 5,
f = 0.25,
sig.level = 0.05,
power = 0.80)
# ── Power curve ───────────────────────────────────────────────────────────
n_seq <- 3:20
pwr_seq <- sapply(n_seq, function(n) {
pwr.anova.test(k = 5, n = n, f = 0.25, sig.level = 0.05)$power
})
ggplot(data.frame(n = n_seq, power = pwr_seq), aes(n, power)) +
geom_line(colour = "#2C7BB6", linewidth = 1.2) +
geom_point(size = 2.5, colour = "#2C7BB6") +
geom_hline(yintercept = 0.80, linetype = "dashed",
colour = "#E41A1C", linewidth = 1) +
annotate("text", x = 18, y = 0.82, label = "80% power",
colour = "#E41A1C", size = 4) +
labs(title = "Power Curve — CRD with 5 Treatments (f = 0.25)",
x = "Replications per Treatment",
y = "Power") +
theme_minimal(base_size = 13)
8. Unbalanced CRD
When replication numbers differ across treatments, R adjusts the SS computations automatically. For a one-way model the treatment SS is identical under Types I–III; car::Anova() with type = "III" is shown below because it generalizes to models with additional terms.
# ── Unbalanced CRD ────────────────────────────────────────────────────────
set.seed(99)
unbal_data <- data.frame(
trt = rep(paste0("T", 1:4), times = c(4, 6, 5, 7)),
Yield = c(rnorm(4, 40, 3), rnorm(6, 47, 3),
rnorm(5, 53, 3), rnorm(7, 50, 3))
)
# Type III ANOVA for unbalanced data
model_unbal <- lm(Yield ~ trt, data = unbal_data)
car::Anova(model_unbal, type = "III")
# Adjusted means (LSMeans)
emmeans(model_unbal, ~ trt)
9. Non-Parametric Alternative — Kruskal-Wallis
When normality or homoscedasticity assumptions fail:
# ── Kruskal-Wallis test ───────────────────────────────────────────────────
kw_result <- kruskal.test(Yield ~ trt, data = field_book)
print(kw_result)
# ── Post-hoc: Dunn's test ─────────────────────────────────────────────────
library(FSA)
dunnTest(Yield ~ trt, data = field_book, method = "bonferroni")
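The statistic behind kruskal.test(), for reference, where $R_i$ is the rank sum of treatment $i$ over all $N$ observations pooled together:
\[H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \;\sim\; \chi^2_{k-1} \quad \text{approximately under } H_0\]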
10. CRD vs RCBD — When Does Blocking Help?
# ── Relative Efficiency: how much does blocking gain? ─────────────────────
# RE > 100% → RCBD more efficient than CRD
# RE < 100% → blocking was unnecessary
# Simulate data with a block effect
set.seed(55)
rcbd_data <- expand.grid(
trt = paste0("T", 1:5),
block = paste0("B", 1:6)
) |>
mutate(
Yield = 45
+ c(0, 3, 8, 6, 2)[as.integer(factor(trt))]
+ c(0, 2, -2, 4, -1, 3)[as.integer(factor(block))]
+ rnorm(30, 0, 2)
)
model_rcbd <- aov(Yield ~ trt + block, data = rcbd_data)
aov_rcbd <- anova(model_rcbd)  # anova() gives unpadded rownames
MS_block <- aov_rcbd["block", "Mean Sq"]
MS_err <- aov_rcbd["Residuals", "Mean Sq"]
df_block <- aov_rcbd["block", "Df"]
df_trt <- aov_rcbd["trt", "Df"]
df_err <- aov_rcbd["Residuals", "Df"]
# Cochran & Cox relative efficiency:
# RE = [df_b*MS_Block + (df_t + df_e)*MS_E] / [(df_b + df_t + df_e)*MS_E]
RE <- (df_block * MS_block + (df_trt + df_err) * MS_err) /
  ((df_block + df_trt + df_err) * MS_err) * 100
cat(sprintf("Relative Efficiency of RCBD over CRD: %.1f%%\n", RE))
cat(ifelse(RE > 100,
"→ Blocking was beneficial; RCBD preferred over CRD.",
"→ Blocking was not beneficial; CRD would have been sufficient."))
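A standard closed form for this efficiency (Cochran & Cox, 1957), with $b$ blocks and $t$ treatments:
\[RE = \frac{(b-1)\,MS_{\text{Block}} + b(t-1)\,MS_E^{\text{RCBD}}}{(bt-1)\,MS_E^{\text{RCBD}}} \times 100\]
It estimates the error variance a CRD would have had on the same material, relative to the RCBD error actually achieved.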
11. Summary
| Feature | CRD |
|---|---|
| Blocking | None |
| Randomization | Complete, unrestricted |
| Error df | $N - k$ |
| Best suited for | Homogeneous units (lab, greenhouse) |
| Key assumption | Uniform experimental material |
| Main R function | aov(y ~ trt) |
| Post-hoc tests | LSD, Tukey, Duncan, Dunnett, emmeans |
| Non-parametric alt | kruskal.test() + dunnTest() |
12. References
- Cochran, W. G., & Cox, G. M. (1957). Experimental Designs (2nd ed.). Wiley.
- Montgomery, D. C. (2017). Design and Analysis of Experiments (9th ed.). Wiley.
- de Mendiburu, F. (2023). agricolae: Statistical Procedures for Agricultural Research. CRAN.
- Lenth, R. V. (2024). emmeans: Estimated Marginal Means. CRAN.
- R Core Team (2026). R: A Language and Environment for Statistical Computing.