Statistical Summary in R: A Complete Guide

Prerequisites: Basic R knowledge, R (≥ 4.0) installed, optionally RStudio.
Packages used: dplyr, skimr, psych, ggplot2

Why Statistical Summaries Matter

Before building models or drawing conclusions, you need to understand your data. Statistical summaries answer three core questions:

Question	Statistic
Where is the data centered?	Mean, Median, Mode
How spread out is it?	SD, Variance, IQR, Range
What shape does it have?	Skewness, Kurtosis, Quantiles
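The shape statistics in the table have no dedicated base R function. As a minimal sketch, assuming the simple moment-based definitions (packages such as psych report skew and kurtosis through `describe()` and may apply small-sample corrections, so their numbers can differ slightly), they could be computed as follows. The helper name `shape_stats` is hypothetical, introduced only for illustration.

```r
# Moment-based shape statistics in base R (no small-sample correction).
# shape_stats is a hypothetical helper name used only for this sketch.
shape_stats <- function(x) {
  x  <- x[!is.na(x)]        # drop missing values
  d  <- x - mean(x)         # deviations from the mean
  m2 <- mean(d^2)           # second central moment
  m3 <- mean(d^3)           # third central moment
  m4 <- mean(d^4)           # fourth central moment
  c(skewness        = m3 / m2^1.5,
    excess_kurtosis = m4 / m2^2 - 3)
}

shape_stats(mtcars$mpg)   # positive skewness: the right tail is longer
```

A skewness near zero suggests a roughly symmetric distribution; excess kurtosis compares tail weight against the normal distribution, which scores zero.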

The Built-in `summary()` Function

R ships with a powerful one-liner for a quick overview:

# Using the built-in mtcars dataset
data(mtcars)

summary(mtcars)

Sample output (first three columns shown):

      mpg             cyl             disp      
 Min.   :10.40   Min.   :4.000   Min.   : 71.1  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8  
 Median :19.20   Median :6.000   Median :196.3  
 Mean   :20.09   Mean   :6.188   Mean   :230.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0  

For a single vector, summary() returns the five-number summary plus the mean:

summary(mtcars$mpg)
#  Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 10.40   15.43   19.20   20.09   22.80   33.90

What each value means

Min / Max — the smallest and largest observed values
1st Qu. / 3rd Qu. — the 25th and 75th percentiles (the middle 50 % of data lies between these)
Median — the middle value; robust to outliers
Mean — the arithmetic average; sensitive to outliers
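The difference in outlier sensitivity between the mean and the median is easy to see on a toy vector. The numbers below are made up for illustration and are not taken from mtcars.

```r
# Toy data (illustrative values only)
x     <- c(10, 12, 13, 15, 16)
x_out <- c(x, 100)            # same data plus one extreme value

mean(x)         # 13.2
median(x)       # 13

mean(x_out)     # 27.67 (pulled strongly toward the outlier)
median(x_out)   # 14    (barely moves)
```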

Measures of Central Tendency

x <- mtcars$mpg

# Mean
mean(x)          # 20.09

# Median
median(x)        # 19.2

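# Mode (base R's mode() reports an object's storage type, not the most
# frequent value, so write a helper)
mode_val <- function(v) {
  tbl <- table(v)                           # count each distinct value
  as.numeric(names(tbl)[tbl == max(tbl)])   # all values tied for the top count; assumes numeric input
}
mode_val(mtcars$cyl)   # 8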

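Tip: When the mean exceeds the median, the distribution is usually right-skewed; when the mean is below the median, it is usually left-skewed. Treat this as a rule of thumb and confirm it with a histogram.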
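As a quick check of this rule of thumb, the horsepower column of mtcars has a mean well above its median, which hints at a long right tail. This is only a sketch of how one might apply the tip; the comparison is a heuristic rather than a formal test.

```r
hp <- mtcars$hp

mean(hp)                 # 146.6875
median(hp)               # 123
mean(hp) > median(hp)    # TRUE: suggests right skew

# A quick base R histogram to confirm the visual impression
hist(hp, main = "Horsepower", xlab = "hp")
```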

Measures of Dispersion

x <- mtcars$mpg

# Variance and Standard Deviation
var(x)    # 36.32
sd(x)     # 6.027

# Range
range(x)           # 10.40  33.90
diff(range(x))     # 23.5  (max − min)

# Interquartile Range
IQR(x)    # 7.375

# Quantiles (any percentile)
quantile(x, probs = c(0.10, 0.25, 0.50, 0.75, 0.90))
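To make these definitions concrete, the sketch below recomputes the sample variance, the standard deviation, and the IQR by hand and compares them with the built-in functions. It assumes R's defaults: `var()` divides by n - 1, and `IQR()` uses the same quantile method as `quantile()`.

```r
x <- mtcars$mpg
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)
all.equal(manual_var, var(x))          # TRUE

# Standard deviation is the square root of the variance
all.equal(sqrt(manual_var), sd(x))     # TRUE

# IQR is the distance between the 75th and 25th percentiles
q <- quantile(x, probs = c(0.25, 0.75))
all.equal(unname(q[2] - q[1]), IQR(x)) # TRUE
```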

Coefficient of Variation (CV)

CV expresses spread relative to the mean — useful for comparing variables on different scales:

cv <- function(x) sd(x) / mean(x) * 100
cv(mtcars$mpg)    # ~30 %
cv(mtcars$disp)   # ~54 %

Frequency Tables & Proportions

Categorical variables need counts, not averages.

# Absolute frequency
table(mtcars$cyl)
#  4  6  8 
# 11  7 14

# Relative frequency (proportions)
prop.table(table(mtcars$cyl))
#        4        6        8 
# 0.34375  0.21875  0.43750

# Cross-tabulation
table(mtcars$cyl, mtcars$am)  # cylinders × transmission type
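A cross-tabulation is often easier to read with totals or with proportions computed within rows or columns. One possible sketch uses base R's `addmargins()` and the `margin` argument of `prop.table()`; the dimension labels `cyl` and `am` are added here only for readability.

```r
# Name the dimensions so the printed table is self-describing
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

addmargins(tab)               # counts with row and column totals
prop.table(tab, margin = 1)   # row proportions: each cyl row sums to 1
prop.table(tab, margin = 2)   # column proportions: each am column sums to 1
```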

Grouped Summaries with dplyr

dplyr makes group-wise statistics clean and readable.

# Install once: install.packages("dplyr")
library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  summarise(
    n          = n(),
    mean_mpg   = mean(mpg),
    median_mpg = median(mpg),
    sd_mpg     = sd(mpg),
    min_mpg    = min(mpg),
    max_mpg    = max(mpg),
    .groups    = "drop"
  )

Output (values rounded):

cyl	n	mean_mpg	median_mpg	sd_mpg	min_mpg	max_mpg
4	11	26.66	26.0	4.51	21.4	33.9
6	7	19.74	19.7	1.45	17.8	21.4
8	14	15.10	15.2	2.56	10.4	19.2
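If dplyr is not available, much the same table can be produced in base R. The sketch below is one way to do it, not a replacement for the dplyr pipeline; the object name `stats_by_cyl` is hypothetical.

```r
# One statistic per call with the formula interface of aggregate()
aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# Several statistics at once: tapply() returns one named vector per group,
# and rbind stacks them into a matrix with one row per cylinder count
stats_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, function(v) {
  c(n = length(v), mean = mean(v), median = median(v), sd = sd(v))
})
res <- do.call(rbind, stats_by_cyl)
round(res, 2)
```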

Summarising multiple columns at once

mtcars %>%
  group_by(cyl) %>%
  summarise(across(c(mpg, hp, wt), list(mean = mean, sd = sd),
                   .names = "{.col}_{.fn}"))

Rich Summaries with skimr

skimr produces a richer, better-formatted summary with histograms in the console.

# Install once: install.packages("skimr")
library(skimr)

skim(mtcars)

Highlights of skim() output:

n_missing: count of NA values
complete_rate: proportion of non-missing values
hist: inline histogram drawn with Unicode block characters
p0, p25, p50, p75, p100: the full five-number summary (minimum, quartiles, maximum)

You can also skim by group:

mtcars %>%
  group_by(cyl) %>%
  skim()
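Because `skim()` returns a data frame with one row per variable, its results can be reused in code instead of only being printed. The sketch below assumes skimr version 2 or later, where numeric statistics are stored in columns prefixed with `numeric.`; check `names(s)` if your version differs.

```r
library(skimr)

# Column names after the data frame limit the report to those variables
s <- skim(mtcars, mpg, hp)

s$skim_variable   # names of the variables that were skimmed
s$numeric.mean    # their means, in the same order
```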

Handling Missing Data

Real data has gaps. Always check before summarising.

# Detect missing values
sum(is.na(mtcars))       # total NAs (0 here: mtcars is complete)
colSums(is.na(mtcars))   # NAs per column

# Safe mean ignoring NAs
mean(c(1, 2, NA, 4), na.rm = TRUE)   # 2.33

# Drop rows with any NA
clean_df <- na.omit(mtcars)

# Fill NAs with column median (example)
library(dplyr)
mtcars_filled <- mtcars %>%
  mutate(across(where(is.numeric),
                ~ ifelse(is.na(.), median(., na.rm = TRUE), .)))
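Since mtcars contains no missing values, the effect of these steps is easier to see on data that does. The sketch below uses the built-in airquality dataset, swapped in purely for illustration, and performs the median fill in base R. The name `aq_filled` is hypothetical.

```r
# airquality ships with R and contains genuine missing values
colSums(is.na(airquality))             # Ozone: 37, Solar.R: 7, others: 0

mean(airquality$Ozone)                 # NA, because any NA propagates
mean(airquality$Ozone, na.rm = TRUE)   # mean of the observed values only

# Median imputation, column by column
aq_filled <- airquality
for (col in names(aq_filled)) {
  v <- aq_filled[[col]]
  if (is.numeric(v)) {
    v[is.na(v)] <- median(v, na.rm = TRUE)   # replace NAs with the column median
    aq_filled[[col]] <- v
  }
}
sum(is.na(aq_filled))                  # 0
```

Median imputation keeps every row but shrinks the variability of the filled columns, so it is worth comparing summaries before and after filling.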

Visualising Your Summary

Numbers alone can hide patterns. Pair summaries with plots.

Histogram + density

library(ggplot2)

ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 2, fill = "#3B82F6", colour = "white") +
  geom_density(colour = "#EF4444", linewidth = 1) +
  labs(title = "Distribution of MPG", x = "Miles per Gallon", y = "Density") +
  theme_minimal()
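One way to pair a plot directly with the numeric summary is to draw the mean and median on top of the histogram. The following is a sketch under the same ggplot2 setup; the colours and line types are arbitrary choices.

```r
library(ggplot2)

mpg_mean   <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)

p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, fill = "grey80", colour = "white") +
  geom_vline(xintercept = mpg_mean, linetype = "dashed") +    # mean
  geom_vline(xintercept = mpg_median, linetype = "solid") +   # median
  labs(title = "MPG with mean (dashed) and median (solid)",
       x = "Miles per Gallon", y = "Count") +
  theme_minimal()
p
```

The gap between the two vertical lines gives a visual reading of the skew that the summary statistics describe numerically.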

Box plot by group

ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_boxplot(alpha = 0.7, outlier.colour = "red") +
  scale_fill_brewer(palette = "Set2") +
  labs(title = "MPG by Cylinder Count",
       x = "Cylinders", y = "Miles per Gallon", fill = "Cyl") +
  theme_minimal()

Violin plot (shows distribution shape)

ggplot(mtcars, aes(x = factor(cyl), y = mpg, fill = factor(cyl))) +
  geom_violin(trim = FALSE, alpha = 0.6) +
  geom_boxplot(width = 0.1, fill = "white") +
  labs(title = "MPG Distribution by Cylinder (Violin + Box)",
       x = "Cylinders", y = "MPG") +
  theme_minimal(base_size = 13)

Putting It All Together

Here is a reusable summary report function you can drop into any project:

library(dplyr)
library(skimr)

full_summary <- function(df, group_var = NULL) {

  cat("========================================\n")
  cat(" DATASET OVERVIEW\n")
  cat("========================================\n")
  cat("Rows     :", nrow(df), "\n")
  cat("Columns  :", ncol(df), "\n")
  cat("Total NAs:", sum(is.na(df)), "\n\n")

  cat("--- Numeric Summary (skimr) ---\n")
  print(skim(df))

  if (!is.null(group_var)) {
    cat("\n--- Grouped Means by", group_var, "---\n")
    result <- df %>%
      group_by(across(all_of(group_var))) %>%
      # Use a lambda: passing extra arguments through across() is deprecated in dplyr >= 1.1.0
      summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)),
                n = n(), .groups = "drop")
    print(result)
  }
}

# Example usage
full_summary(mtcars, group_var = "cyl")

Quick Reference Cheatsheet

Task	Function
Five-number summary	`summary(x)`
Mean	`mean(x, na.rm=TRUE)`
Median	`median(x, na.rm=TRUE)`
Standard deviation	`sd(x, na.rm=TRUE)`
Variance	`var(x, na.rm=TRUE)`
IQR	`IQR(x, na.rm=TRUE)`
Quantiles	`quantile(x, probs=...)`
Frequency table	`table(x)`
Proportions	`prop.table(table(x))`
Count NAs	`sum(is.na(x))`
Rich summary	`skimr::skim(df)`
Group summary	`dplyr::group_by() %>% summarise()`

Dr. M. Shamshad
