R - table() and prop.table() | Application Architect

Key Insights

The table() function creates frequency tables from categorical data, supporting one-way, two-way, and multi-way contingency tables with optional NA handling
prop.table() converts frequency counts to proportions by dividing by row, column, or total margins, essential for percentage-based analysis
These functions form the foundation for categorical data analysis in R, integrating seamlessly with visualization tools and statistical tests like chi-square

Understanding table() Fundamentals

The table() function counts occurrences of unique values in vectors or factor combinations. It returns an object of class “table” that behaves like a named array.

# One-way frequency table
colors <- c("red", "blue", "red", "green", "blue", "red", "green", "blue", "blue")
table(colors)
# colors
#  blue green   red 
#     4     2     3

# With factors - includes all levels even if count is zero
color_factor <- factor(colors, levels = c("red", "blue", "green", "yellow"))
table(color_factor)
# color_factor
#    red   blue  green yellow 
#      3      4      2      0
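
The "named array" behavior mentioned above can be sketched as follows. This is a minimal illustration reusing the colors vector; the variable name counts is introduced here only for the example.

```r
colors <- c("red", "blue", "red", "green", "blue", "red", "green", "blue", "blue")
counts <- table(colors)

class(counts)                    # "table"
names(counts)                    # category labels: "blue" "green" "red"
counts[["red"]]                  # 3: look up a single count by name
counts[counts >= 3]              # keep only categories seen at least 3 times
sort(counts, decreasing = TRUE)  # order categories by frequency
as.vector(counts)                # 4 2 3: drop the labels, keep the raw counts
```

Because the result is indexable by name, it can be filtered and sorted like any other named vector before being passed on to plotting or reporting code.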

For character vectors, table() converts the input to a factor first, so categories appear in sorted order (the exact order of mixed-case labels depends on the locale). For factors, it respects level ordering, which matters for ordinal data.

# Ordinal data with proper ordering
satisfaction <- factor(
  c("low", "high", "medium", "low", "high", "medium", "high"),
  levels = c("low", "medium", "high"),
  ordered = TRUE
)
table(satisfaction)
# satisfaction
#    low medium   high 
#      2      2      3

Two-Way Contingency Tables

Pass multiple vectors to table() for cross-tabulation. The first argument defines rows, the second defines columns.

# Sample dataset
gender <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "M")
department <- c("IT", "HR", "IT", "HR", "Sales", "IT", "Sales", "HR", "IT", "Sales")

# Two-way table
table(gender, department)
#       department
# gender HR IT Sales
#      F  3  1     0
#      M  0  3     3

Access table elements using bracket notation. The dimnames attribute stores row and column labels.

cross_tab <- table(gender, department)

# Access specific cell
cross_tab["M", "IT"]  # 3

# Get dimensions
dim(cross_tab)  # 2 3

# Extract dimension names
dimnames(cross_tab)
# $gender
# [1] "F" "M"
# 
# $department
# [1] "HR"    "IT"    "Sales"

Handling Missing Values

By default, table() ignores NA values. Use the useNA argument to include them in the counts.

data_with_na <- c("A", "B", NA, "A", "B", NA, "C", "A")

# Default behavior - excludes NA
table(data_with_na)
# data_with_na
# A B C 
# 3 2 1

# Include NA counts
table(data_with_na, useNA = "always")
# data_with_na
#    A    B    C <NA> 
#    3    2    1    2

# Include NA only if present
table(data_with_na, useNA = "ifany")
# data_with_na
#    A    B    C <NA> 
#    3    2    1    2
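
The two calls above print the same result because the data contains NA values. The settings only differ when no NA is present, which the following sketch shows with a hypothetical vector named complete.

```r
complete <- c("A", "B", "A")   # no missing values

ifany_tab  <- table(complete, useNA = "ifany")
always_tab <- table(complete, useNA = "always")

length(ifany_tab)    # 2: no <NA> cell, because no NA occurs
length(always_tab)   # 3: an <NA> cell is added with a count of 0
```

"always" is useful when several tables must share the same layout regardless of whether a given batch of data happens to contain missing values.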

For multi-way tables, useNA applies to all dimensions.

x <- c("A", "B", NA, "A")
y <- c("X", NA, "Y", "X")

table(x, y, useNA = "ifany")
#       y
# x      X Y <NA>
#   A    2 0    0
#   B    0 0    1
#   <NA> 0 1    0

Converting to Proportions with prop.table()

The prop.table() function divides table values by marginal sums. The margin parameter controls the direction: 1 for rows, 2 for columns, NULL for total.

# Create sample table
survey <- table(gender, department)

# Overall proportions (sum to 1)
prop.table(survey)
#       department
# gender   HR   IT Sales
#      F 0.3 0.10   0.0
#      M 0.0 0.30   0.3

# Row proportions (each row sums to 1)
prop.table(survey, margin = 1)
#       department
# gender        HR        IT     Sales
#      F 0.7500000 0.2500000 0.0000000
#      M 0.0000000 0.5000000 0.5000000

# Column proportions (each column sums to 1)
prop.table(survey, margin = 2)
#       department
# gender HR        IT     Sales
#      F  1 0.2500000 0.0000000
#      M  0 0.7500000 1.0000000

Convert to percentages by multiplying by 100.

# Percentage by row
round(prop.table(survey, margin = 1) * 100, 1)
#       department
# gender   HR   IT Sales
#      F 75.0 25.0   0.0
#      M  0.0 50.0  50.0
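
A quick way to confirm which margin was used is to check which sums come out as 1. The sketch below rebuilds the survey table and also reproduces the row proportions by hand; the names row_props, col_props, and manual are introduced only for this example.

```r
gender <- c("M", "F", "M", "F", "M", "F", "M", "F", "M", "M")
department <- c("IT", "HR", "IT", "HR", "Sales", "IT", "Sales", "HR", "IT", "Sales")
survey <- table(gender, department)

row_props <- prop.table(survey, margin = 1)
col_props <- prop.table(survey, margin = 2)

rowSums(row_props)        # F and M are both 1
colSums(col_props)        # HR, IT and Sales are all 1
sum(prop.table(survey))   # 1: overall proportions

# Row proportions by hand: divide every cell by its row total
manual <- survey / rowSums(survey)
all.equal(as.vector(manual), as.vector(row_props))   # TRUE
```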

Three-Way and Higher Dimensional Tables

The table() function handles multiple dimensions, though interpretation becomes more complex.

# Three-way table
gender <- c("M", "F", "M", "F", "M", "F", "M", "F")
dept <- c("IT", "IT", "HR", "HR", "IT", "IT", "HR", "HR")
level <- c("Junior", "Senior", "Junior", "Senior", "Senior", "Junior", "Senior", "Junior")

table(gender, dept, level)
# , , level = Junior
# 
#       dept
# gender HR IT
#      F  1  1
#      M  1  1
# 
# , , level = Senior
# 
#       dept
# gender HR IT
#      F  1  1
#      M  1  1

Apply prop.table() with specific margins for multi-dimensional proportions.

three_way <- table(gender, dept, level)

# Proportions within each level (margin 3)
prop.table(three_way, margin = 3)
# Each slice sums to 1

# Proportions by gender and department (margins c(1,2))
prop.table(three_way, margin = c(1, 2))
# Each gender-department combination sums to 1 across levels
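
The margin comments above can be checked with apply(), and ftable() offers a flatter view of the same counts. This is a sketch under the same three-way data; by_level and by_pair are names chosen for the example.

```r
gender <- c("M", "F", "M", "F", "M", "F", "M", "F")
dept <- c("IT", "IT", "HR", "HR", "IT", "IT", "HR", "HR")
level <- c("Junior", "Senior", "Junior", "Senior", "Senior", "Junior", "Senior", "Junior")
three_way <- table(gender, dept, level)

# margin = 3: proportions are computed within each level slice
by_level <- prop.table(three_way, margin = 3)
apply(by_level, 3, sum)          # Junior 1, Senior 1
by_level["F", "IT", "Junior"]    # 0.25: one of the four Junior employees

# margin = c(1, 2): proportions are computed within each gender-dept pair
by_pair <- prop.table(three_way, margin = c(1, 2))
apply(by_pair, c(1, 2), sum)     # every gender-dept cell is 1

# Flat view: row variables on the left, the last variable across the top
ftable(three_way)
```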

Practical Applications

Data Quality Checks

# Check for unexpected values
survey_data <- data.frame(
  response = c("Yes", "No", "Yes", "yes", "NO", "Yes"),
  stringsAsFactors = FALSE
)

table(survey_data$response)
#  NO  No Yes yes 
#   1   1   3   1
# (label order depends on the locale's collation rules)

# Reveals case inconsistencies requiring cleaning
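
One possible cleanup, sketched here under the assumption that case and stray whitespace are the only problems, is to normalize the strings and tabulate again. The names responses, cleaned, and checked are hypothetical.

```r
responses <- c("Yes", "No", "Yes", "yes", "NO", "Yes")

# Normalize: strip surrounding whitespace and lower-case everything
cleaned <- tolower(trimws(responses))
table(cleaned)
# cleaned
#  no yes 
#   2   4

# Restricting to the expected levels turns anything else into NA,
# so useNA = "ifany" exposes values that survived the cleanup
checked <- factor(cleaned, levels = c("yes", "no"))
table(checked, useNA = "ifany")
```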

Creating Summaries for Reports

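# Customer segmentation analysis
# (sample() is random, so the counts shown below are illustrative and vary between runs)
customers <- data.frame(
  region = sample(c("North", "South", "East", "West"), 100, replace = TRUE),
  # Explicit levels keep the tiers in Bronze/Silver/Gold order instead of alphabetical order
  tier = factor(
    sample(c("Bronze", "Silver", "Gold"), 100, replace = TRUE),
    levels = c("Bronze", "Silver", "Gold")
  )
)

segment_table <- table(customers$region, customers$tier)

# Row percentages: tier mix within each region
segment_pct <- round(prop.table(segment_table, margin = 1) * 100, 1)

# Add margin totals
addmargins(segment_table)
#        Bronze Silver Gold Sum
# East       11      6    8  25
# North       8      8    8  24
# South       9      6   10  25
# West        7      5   14  26
# Sum        35     25   40 100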

Statistical Test Preparation

# Prepare data for chi-square test
# (outcomes are simulated, so results vary between runs unless a seed is set)
treatment <- factor(rep(c("A", "B"), each = 50))
outcome <- c(
  sample(c("Success", "Failure"), 50, replace = TRUE, prob = c(0.7, 0.3)),
  sample(c("Success", "Failure"), 50, replace = TRUE, prob = c(0.5, 0.5))
)

contingency <- table(treatment, outcome)

# Run the test once and reuse the result
test_result <- chisq.test(contingency)
test_result

# Examine expected vs observed
test_result$expected
test_result$observed
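
For reference, the expected counts under independence are row total times column total divided by the grand total. The sketch below checks this against chisq.test() on a fixed 2x2 table; the counts are made up for illustration and are not drawn from the simulation above.

```r
# Hypothetical fixed counts so the arithmetic is easy to follow
fixed <- as.table(matrix(
  c(35, 25, 15, 25), nrow = 2,
  dimnames = list(treatment = c("A", "B"), outcome = c("Success", "Failure"))
))

result <- chisq.test(fixed)

# Expected count for each cell: (row total * column total) / grand total
by_hand <- outer(rowSums(fixed), colSums(fixed)) / sum(fixed)
by_hand["A", "Success"]                                 # 50 * 60 / 100 = 30
all.equal(unname(result$expected), unname(by_hand))     # TRUE

# Common rule of thumb: the approximation is questionable when
# expected counts are small (chisq.test() warns in that case)
min(result$expected) >= 5                               # TRUE
```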

Performance Considerations

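table() allocates a cell for every combination of levels, including combinations that never occur, so crossing high-cardinality variables can produce very large, mostly empty arrays. For data with millions of rows or many distinct values, dplyr::count() or data.table, which return only the observed combinations, are usually a better fit.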

# Memory-efficient alternative for large data
library(data.table)
dt <- data.table(
  category = sample(letters[1:5], 1e6, replace = TRUE),
  group = sample(LETTERS[1:3], 1e6, replace = TRUE)
)

# Typically faster than table() on large data; returns only observed combinations
dt[, .N, by = .(category, group)]
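
The memory cost described above can be seen on a small scale in base R. This sketch uses simulated user and item identifiers (hypothetical names) and compares the dense two-way table with a one-way table over pasted keys, which only holds the combinations that occur.

```r
set.seed(1)
n <- 1000
user <- sample(sprintf("u%04d", 1:500), n, replace = TRUE)
item <- sample(sprintf("i%04d", 1:500), n, replace = TRUE)

# Dense two-way table: one cell per (distinct user, distinct item) pair
dense <- table(user, item)
length(dense)      # distinct users * distinct items, far more than n
sum(dense > 0)     # non-empty cells, at most n

# Sparse alternative: count each observed pair as a single pasted key
observed_pairs <- table(paste(user, item, sep = " | "))
length(observed_pairs)   # equals the number of non-empty cells above
```

The pasted-key approach is only a base R workaround; dplyr::count() and data.table give the same observed-combinations result with the keys kept in separate columns.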

The table() and prop.table() functions remain essential for exploratory data analysis, providing quick insights into categorical data distributions. Their integration with base R’s statistical functions and their straightforward syntax make them the first choice for frequency analysis in most scenarios.