How to Calculate Relative Frequency in Python

Key Insights

  • Relative frequency transforms raw counts into proportions, making it possible to compare datasets of different sizes and estimate probabilities from observed data.
  • Python’s collections.Counter handles basic relative frequency calculations without dependencies, while pandas’ value_counts(normalize=True) provides a one-liner for most real-world scenarios.
  • For continuous data, you must bin values into ranges first—use pd.cut() for custom bins or np.histogram() for automatic binning before calculating relative frequencies.

Introduction

When you count how many times each value appears in a dataset, you get absolute frequency. When you divide those counts by the total number of observations, you get relative frequency. This simple transformation unlocks powerful capabilities: comparing datasets of different sizes, estimating probabilities, and normalizing data for visualization.

Consider analyzing survey responses from two departments—one with 50 employees, another with 500. Comparing raw counts is meaningless. But relative frequencies let you say “15% of Department A prefers remote work versus 22% of Department B.” That’s actionable insight.

Relative frequency appears everywhere in data analysis: probability estimation from empirical data, creating normalized histograms, preprocessing categorical features for machine learning, and analyzing A/B test results. Let’s explore how to calculate it efficiently in Python.

The Math Behind Relative Frequency

The formula is straightforward:

relative_frequency = frequency_of_value / total_observations

For a dataset [1, 2, 2, 3, 3, 3]:

  • Value 1 appears 1 time → 1/6 = 0.167
  • Value 2 appears 2 times → 2/6 = 0.333
  • Value 3 appears 3 times → 3/6 = 0.500

The relative frequencies always sum to 1.0 (or 100% if you prefer percentages). This property makes relative frequency a valid probability distribution when you’re estimating probabilities from observed data.

Here’s the manual calculation in Python:

data = [1, 2, 2, 3, 3, 3]
total = len(data)

# Count occurrences manually
frequencies = {}
for value in data:
    frequencies[value] = frequencies.get(value, 0) + 1

# Calculate relative frequencies
relative_frequencies = {}
for value, count in frequencies.items():
    relative_frequencies[value] = count / total

print(relative_frequencies)
# Output: {1: 0.16666666666666666, 2: 0.3333333333333333, 3: 0.5}

This works, but it’s verbose. Let’s improve it.

Calculating Relative Frequency with Pure Python

The collections.Counter class handles frequency counting elegantly. Combined with a dictionary comprehension, you get a clean, dependency-free solution:

from collections import Counter

def relative_frequency(data):
    """Calculate relative frequency for each unique value in data."""
    counts = Counter(data)
    total = len(data)
    return {value: count / total for value, count in counts.items()}

# Example usage
survey_responses = ['yes', 'no', 'yes', 'yes', 'maybe', 'no', 'yes', 'maybe']
result = relative_frequency(survey_responses)

for value, freq in sorted(result.items(), key=lambda x: -x[1]):
    print(f"{value}: {freq:.1%}")

# Output:
# yes: 50.0%
# no: 25.0%
# maybe: 25.0%

This approach works with any hashable data type—strings, numbers, tuples. For percentage output, multiply by 100 or use Python’s :.1% format specifier.
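Because Counter accepts any hashable value, the same function works on tuples, such as (x, y) coordinate pairs, and the results always form a valid distribution that sums to 1. A small sketch on synthetic points (the function is repeated here so the snippet runs on its own):

```python
from collections import Counter
import math

def relative_frequency(data):
    """Calculate relative frequency for each unique value in data."""
    counts = Counter(data)
    total = len(data)
    return {value: count / total for value, count in counts.items()}

# Tuples are hashable, so coordinate pairs work as keys
points = [(0, 0), (1, 1), (0, 0), (2, 3), (0, 0), (1, 1)]
freqs = relative_frequency(points)
print(freqs[(0, 0)])  # 0.5

# Sanity check: the proportions sum to 1 (up to floating-point rounding)
print(math.isclose(sum(freqs.values()), 1.0))  # True
```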

If you need the results sorted by frequency (most common first), Counter provides the most_common() method:

from collections import Counter

def relative_frequency_sorted(data):
    """Return relative frequencies sorted by frequency (descending)."""
    counts = Counter(data)
    total = len(data)
    return [(value, count / total) for value, count in counts.most_common()]

responses = ['cat', 'dog', 'cat', 'bird', 'dog', 'cat', 'cat', 'dog', 'fish']
for animal, freq in relative_frequency_sorted(responses):
    print(f"{animal}: {freq:.2f}")

# Output:
# cat: 0.44
# dog: 0.33
# bird: 0.11
# fish: 0.11

Using Pandas for Relative Frequency

Pandas makes relative frequency calculation trivial with value_counts(normalize=True). This is the method you’ll use 90% of the time in practice:

import pandas as pd

# Create sample data
df = pd.DataFrame({
    'department': ['Engineering', 'Sales', 'Engineering', 'Marketing', 
                   'Sales', 'Engineering', 'Sales', 'Marketing', 
                   'Engineering', 'Engineering'],
    'satisfaction': ['High', 'Medium', 'High', 'Low', 'High', 
                     'Medium', 'Low', 'High', 'High', 'Medium']
})

# Absolute frequency
print("Absolute Frequency:")
print(df['department'].value_counts())

# Relative frequency
print("\nRelative Frequency:")
print(df['department'].value_counts(normalize=True))

# Output (pandas 2.x):
# Absolute Frequency:
# department
# Engineering    5
# Sales          3
# Marketing      2
# Name: count, dtype: int64

# Relative Frequency:
# department
# Engineering    0.5
# Sales          0.3
# Marketing      0.2
# Name: proportion, dtype: float64
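The same proportions can come from groupby, which is handy when you're already aggregating. A minimal sketch, with the department data rebuilt so the snippet runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'department': ['Engineering', 'Sales', 'Engineering', 'Marketing',
                   'Sales', 'Engineering', 'Sales', 'Marketing',
                   'Engineering', 'Engineering']
})

# size() counts rows per group; dividing by len(df) normalizes to proportions
rel = df.groupby('department').size() / len(df)
print(rel.sort_values(ascending=False))
```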

For cross-tabulations (relative frequency across multiple categories), use pd.crosstab() with the normalize parameter:

# Cross-tabulation with relative frequencies
cross_tab = pd.crosstab(
    df['department'], 
    df['satisfaction'], 
    normalize='index'  # Normalize across rows
)
print(cross_tab)

# This shows what percentage of each department falls into each satisfaction level

The normalize parameter accepts three values:

  • 'index': Normalize across rows (each row sums to 1)
  • 'columns': Normalize across columns (each column sums to 1)
  • 'all': Normalize across entire table (all cells sum to 1)
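You can verify what each option does by checking the sums directly. A sketch with a minimal two-column DataFrame (the column names here are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'dept': ['Eng', 'Sales', 'Eng', 'Sales', 'Eng', 'Sales'],
    'level': ['High', 'Low', 'High', 'High', 'Low', 'Low']
})

by_row = pd.crosstab(df['dept'], df['level'], normalize='index')
by_col = pd.crosstab(df['dept'], df['level'], normalize='columns')
overall = pd.crosstab(df['dept'], df['level'], normalize='all')

print(by_row.sum(axis=1))    # each row sums to 1.0
print(by_col.sum(axis=0))    # each column sums to 1.0
print(overall.values.sum())  # all cells together sum to 1.0
```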

Binning Continuous Data

Continuous variables don’t have discrete values to count. You must first bin them into ranges. Pandas’ pd.cut() creates bins with custom edges, while pd.qcut() creates quantile-based bins:

import pandas as pd
import numpy as np

# Sample continuous data: customer ages
np.random.seed(42)
ages = np.random.normal(loc=35, scale=12, size=1000).astype(int)
ages = np.clip(ages, 18, 80)  # Clip to realistic range

df = pd.DataFrame({'age': ages})

# Create age bins
bins = [18, 25, 35, 45, 55, 65, 81]  # upper edge 81 so right=False keeps age 80 in '65+'
labels = ['18-24', '25-34', '35-44', '45-54', '55-64', '65+']

df['age_group'] = pd.cut(df['age'], bins=bins, labels=labels, right=False)

# Calculate relative frequency of age groups
age_distribution = df['age_group'].value_counts(normalize=True).sort_index()
print(age_distribution)

# Output:
# age_group
# 18-24    0.123
# 25-34    0.298
# 35-44    0.301
# 45-54    0.187
# 55-64    0.072
# 65+      0.019
# Name: proportion, dtype: float64
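pd.qcut, mentioned above, picks the bin edges for you so that each bin holds roughly the same share of observations, which is useful when you want quartiles or deciles rather than hand-picked ranges. A sketch on synthetic data (the variable names are made up for illustration):

```python
import pandas as pd
import numpy as np

np.random.seed(42)
incomes = np.random.exponential(scale=40000, size=1000)

# Four quantile-based bins: each holds ~25% of the observations
quartiles = pd.qcut(incomes, q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])
print(pd.Series(quartiles).value_counts(normalize=True).sort_index())
# Each quartile's relative frequency is ~0.25 by construction
```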

For automatic binning, NumPy’s histogram() function returns both counts and bin edges:

import numpy as np

# Automatic binning with numpy
prices = np.random.exponential(scale=50, size=500)

# Get counts and bin edges
counts, bin_edges = np.histogram(prices, bins=10)
total = counts.sum()

# Calculate relative frequencies
relative_freqs = counts / total

# Display results
for i in range(len(counts)):
    print(f"${bin_edges[i]:.0f}-${bin_edges[i+1]:.0f}: {relative_freqs[i]:.1%}")

Visualizing Relative Frequency

Visualizations communicate relative frequency effectively. Here’s how to create both absolute and relative frequency plots for comparison:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np

# Sample data
np.random.seed(42)
data = np.random.choice(['A', 'B', 'C', 'D'], size=200, p=[0.4, 0.3, 0.2, 0.1])
df = pd.DataFrame({'category': data})

# Calculate frequencies
abs_freq = df['category'].value_counts().sort_index()
rel_freq = df['category'].value_counts(normalize=True).sort_index()

# Create side-by-side plots
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Absolute frequency
axes[0].bar(abs_freq.index, abs_freq.values, color='steelblue', edgecolor='black')
axes[0].set_title('Absolute Frequency')
axes[0].set_ylabel('Count')
axes[0].set_xlabel('Category')

# Relative frequency
axes[1].bar(rel_freq.index, rel_freq.values, color='coral', edgecolor='black')
axes[1].set_title('Relative Frequency')
axes[1].set_ylabel('Proportion')
axes[1].set_xlabel('Category')
axes[1].yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))

plt.tight_layout()
plt.savefig('frequency_comparison.png', dpi=150)
plt.show()

For continuous data, you can normalize a histogram in two ways. Passing density=True makes the bar areas sum to 1 (a probability density), while weighting each observation by 1/n makes the bar heights sum to 1, which is a true relative frequency histogram. Seaborn's histplot exposes the latter as stat='proportion' (an alias for 'probability'). Here's the matplotlib version:

import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(100, 15, 1000)

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Absolute frequency histogram
axes[0].hist(data, bins=30, color='steelblue', edgecolor='black')
axes[0].set_title('Absolute Frequency Histogram')
axes[0].set_ylabel('Count')

# Relative frequency histogram: weights make bar heights sum to 1
axes[1].hist(data, bins=30, color='coral', edgecolor='black',
             weights=np.ones_like(data) / len(data))
axes[1].set_title('Relative Frequency Histogram')
axes[1].set_ylabel('Proportion')

plt.tight_layout()
plt.show()

Practical Application: Comparing Distributions

Here’s where relative frequency proves its worth. Imagine comparing customer feedback from two product launches with vastly different sample sizes:

import pandas as pd
import matplotlib.pyplot as plt

# Product A: 150 responses
product_a = ['Excellent'] * 45 + ['Good'] * 52 + ['Average'] * 38 + ['Poor'] * 15

# Product B: 1,200 responses  
product_b = ['Excellent'] * 300 + ['Good'] * 480 + ['Average'] * 312 + ['Poor'] * 108

# Calculate relative frequencies
rel_freq_a = pd.Series(product_a).value_counts(normalize=True)
rel_freq_b = pd.Series(product_b).value_counts(normalize=True)

# Combine into DataFrame for easy comparison
comparison = pd.DataFrame({
    'Product A': rel_freq_a,
    'Product B': rel_freq_b
}).reindex(['Excellent', 'Good', 'Average', 'Poor'])

print("Relative Frequency Comparison:")
print(comparison.apply(lambda x: x.map('{:.1%}'.format)))

# Output:
# Relative Frequency Comparison:
#           Product A Product B
# Excellent     30.0%     25.0%
# Good          34.7%     40.0%
# Average       25.3%     26.0%
# Poor          10.0%      9.0%

# Visualize the comparison
comparison.plot(kind='bar', figsize=(10, 6), color=['steelblue', 'coral'])
plt.title('Customer Satisfaction: Product A vs Product B')
plt.ylabel('Relative Frequency')
plt.xlabel('Rating')
plt.legend(title='Product')
plt.xticks(rotation=0)
plt.gca().yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0%}'))
plt.tight_layout()
plt.show()

Despite Product B having 8x more responses, relative frequency lets us compare distributions directly. We can now see that Product A has a higher proportion of “Excellent” ratings (30% vs 25%), while Product B has more “Good” ratings (40% vs 34.7%).

This pattern applies to any comparison scenario: A/B tests, demographic analysis, time-period comparisons, or cross-market studies. Relative frequency normalizes the playing field, letting the underlying distributions speak for themselves.
