How to Create a Frequency Table in Python

Key Insights

  • Python’s collections.Counter handles simple frequency counting, but pandas’ value_counts() and crosstab() provide richer functionality for real-world data analysis
  • Always include relative frequencies (percentages) alongside raw counts—stakeholders care more about proportions than absolute numbers
  • Use pd.cut() for equal-width bins and pd.qcut() for equal-frequency bins when converting continuous data into categorical frequency tables

Introduction to Frequency Tables

A frequency table counts how often each unique value appears in your dataset. It’s one of the first tools you should reach for when exploring new data. Before running complex models or generating fancy visualizations, you need to understand what’s actually in your data—and frequency tables tell you exactly that.

Frequency tables answer fundamental questions: What categories exist in my data? How are values distributed? Are there unexpected entries that suggest data quality issues? They’re essential for categorical analysis, identifying outliers, validating data imports, and communicating distributions to stakeholders.

Python offers multiple approaches to building frequency tables, from lightweight standard library solutions to full-featured pandas methods. The right choice depends on your data structure and what you need to do with the results.

Basic Frequency Table with Python Collections

For simple frequency counting without external dependencies, Python’s collections.Counter does the job efficiently. It’s fast, memory-efficient, and works with any iterable.

from collections import Counter

# Survey responses about preferred programming language
survey_responses = [
    'Python', 'JavaScript', 'Python', 'Java', 'Python',
    'JavaScript', 'C++', 'Python', 'Java', 'Python',
    'JavaScript', 'Python', 'Go', 'Python', 'JavaScript',
    'Python', 'Java', 'Python', 'JavaScript', 'Python'
]

# Create frequency table
freq_table = Counter(survey_responses)

print("Language Frequency Table")
print("-" * 25)
for language, count in freq_table.most_common():
    print(f"{language:12} | {count}")

# Output:
# Language Frequency Table
# -------------------------
# Python       | 10
# JavaScript   | 5
# Java         | 3
# C++          | 1
# Go           | 1

Counter is a dictionary subclass, so you can access counts directly with freq_table['Python']. The most_common() method returns items sorted by frequency, descending, which is usually what you want for reporting.

However, Counter has limitations. It doesn’t integrate with DataFrames, doesn’t calculate percentages automatically, and requires manual work for more sophisticated analysis. For anything beyond quick counts, pandas is the better choice.
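If you do stay with Counter, percentages are only a dictionary comprehension away. A minimal sketch, using a shortened version of the survey data for illustration:

```python
from collections import Counter

# Shortened version of the survey data for illustration
responses = ['Python', 'JavaScript', 'Python', 'Java', 'Python']
counts = Counter(responses)
total = sum(counts.values())

# Counter won't compute percentages for you; a dict comprehension fills the gap
percentages = {lang: count / total * 100 for lang, count in counts.most_common()}
for lang, pct in percentages.items():
    print(f"{lang:12} | {pct:5.1f}%")
```

This is exactly the kind of manual work that value_counts() does for you, as the next section shows.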

Creating Frequency Tables with Pandas

Pandas provides two primary methods for frequency tables: value_counts() for single-variable analysis and crosstab() for examining relationships between two variables.

Single Variable with value_counts()

import pandas as pd

# Sample e-commerce data
sales_data = pd.DataFrame({
    'order_id': range(1, 101),
    'product_category': ['Electronics'] * 35 + ['Clothing'] * 28 + 
                        ['Home & Garden'] * 20 + ['Books'] * 12 + ['Sports'] * 5,
    'payment_method': ['Credit Card'] * 45 + ['PayPal'] * 30 + 
                      ['Debit Card'] * 15 + ['Apple Pay'] * 10,
    'order_value': [150, 45, 200, 35, 80] * 20
})

# Basic frequency table
category_counts = sales_data['product_category'].value_counts()
print("Product Category Frequency Table")
print(category_counts)

# Output (pandas ≥ 2.0):
# product_category
# Electronics      35
# Clothing         28
# Home & Garden    20
# Books            12
# Sports            5
# Name: count, dtype: int64

By default, value_counts() sorts by frequency descending. Pass sort=False to maintain the original order, or ascending=True to flip the sort direction.
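value_counts() also accepts normalize=True for proportions and dropna=False to surface missing values as their own row. A quick sketch, using a small made-up series with one missing entry:

```python
import pandas as pd

# Hypothetical column with a missing entry
categories = pd.Series(['Electronics', 'Clothing', 'Electronics', None, 'Books'])

# normalize=True returns proportions instead of raw counts
proportions = categories.value_counts(normalize=True)

# dropna=False keeps missing values as their own row, which helps surface data-quality issues
with_missing = categories.value_counts(dropna=False)

print(proportions)
print(with_missing)
```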

Two Variables with crosstab()

When you need to examine the relationship between two categorical variables, pd.crosstab() creates a contingency table:

# Create more realistic sample data
import numpy as np
np.random.seed(42)

orders = pd.DataFrame({
    'category': np.random.choice(['Electronics', 'Clothing', 'Books'], 200),
    'payment': np.random.choice(['Credit Card', 'PayPal', 'Debit Card'], 200),
    'region': np.random.choice(['North', 'South', 'East', 'West'], 200)
})

# Two-way frequency table
cross_table = pd.crosstab(orders['category'], orders['payment'])
print("Category vs Payment Method")
print(cross_table)

# Output:
# payment     Credit Card  Debit Card  PayPal
# category                                    
# Books                24          18      20
# Clothing             22          25      26
# Electronics          22          20      23

Add margins=True to include row and column totals:

cross_table_with_totals = pd.crosstab(
    orders['category'], 
    orders['payment'], 
    margins=True,
    margins_name='Total'
)
print(cross_table_with_totals)

Adding Relative and Cumulative Frequencies

Raw counts rarely tell the whole story. A complete frequency table includes relative frequencies (percentages) and cumulative frequencies for distribution analysis.

def create_complete_frequency_table(series, name='Value'):
    """
    Build a comprehensive frequency table with counts, 
    relative frequencies, and cumulative frequencies.
    """
    # Get counts
    counts = series.value_counts().sort_index()
    
    # Build the table
    freq_table = pd.DataFrame({
        name: counts.index,
        'Frequency': counts.values,
        'Relative Freq': counts.values / counts.sum(),
        'Percentage': (counts.values / counts.sum()) * 100,
        'Cumulative Freq': counts.cumsum().values,
        'Cumulative %': (counts.cumsum().values / counts.sum()) * 100
    })
    
    return freq_table

# Example: Customer satisfaction ratings
ratings = pd.Series([1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5])

satisfaction_table = create_complete_frequency_table(ratings, 'Rating')
print("Customer Satisfaction Frequency Table")
print(satisfaction_table.to_string(index=False))

# Output:
# Customer Satisfaction Frequency Table
#  Rating  Frequency  Relative Freq  Percentage  Cumulative Freq  Cumulative %
#       1          1           0.05         5.0                1           5.0
#       2          2           0.10        10.0                3          15.0
#       3          4           0.20        20.0                7          35.0
#       4          6           0.30        30.0               13          65.0
#       5          7           0.35        35.0               20         100.0

The cumulative percentage tells you that 65% of customers rated their experience 4 or below—information that’s immediately actionable for a product team.

For crosstabs, use the normalize parameter:

# Normalize by rows (each row sums to 1)
pd.crosstab(orders['category'], orders['payment'], normalize='index')

# Normalize by columns (each column sums to 1)
pd.crosstab(orders['category'], orders['payment'], normalize='columns')

# Normalize by total (entire table sums to 1)
pd.crosstab(orders['category'], orders['payment'], normalize='all')
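Normalized crosstabs come back as fractions; for reporting you often want percentages. A sketch with hypothetical two-category data, scaling and rounding the row-normalized table:

```python
import numpy as np
import pandas as pd

# Hypothetical orders data
np.random.seed(0)
df = pd.DataFrame({
    'category': np.random.choice(['A', 'B'], 200),
    'payment': np.random.choice(['Card', 'Cash'], 200),
})

# Scale the row-normalized table to percentages and round for reporting
row_pct = (pd.crosstab(df['category'], df['payment'], normalize='index') * 100).round(1)
print(row_pct)
```

Each row now sums to roughly 100, making it easy to compare payment preferences across categories of different sizes.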

Handling Continuous Data with Binning

Continuous variables like age, income, or temperature don’t work directly with frequency tables—you’d get one row per unique value. Instead, you group values into bins.

Pandas offers two binning functions:

  • pd.cut(): Creates bins at the boundaries you specify, or equal-width bins when passed an integer
  • pd.qcut(): Creates equal-frequency bins (quantiles)

# Sample age data
np.random.seed(42)
customer_ages = pd.Series(np.random.randint(18, 75, 500))

# Custom bins for standard age groups (18-25, then ten-year ranges)
age_bins = pd.cut(customer_ages, bins=[17, 25, 35, 45, 55, 65, 75], 
                  labels=['18-25', '26-35', '36-45', '46-55', '56-65', '66-75'])

age_frequency = age_bins.value_counts().sort_index()
print("Age Group Distribution")
print(age_frequency)

# Output:
# 18-25     70
# 26-35     87
# 36-45     91
# 46-55     85
# 56-65     86
# 66-75     81
# Name: count, dtype: int64

Use pd.qcut() when you want each bin to contain approximately the same number of observations:

# Equal-frequency bins (quartiles)
income = pd.Series(np.random.exponential(50000, 1000))
income_quartiles = pd.qcut(income, q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

print("Income Quartile Distribution")
print(income_quartiles.value_counts().sort_index())

# Each quartile contains ~250 observations

Choose pd.cut() when the bin boundaries matter (regulatory thresholds, standard age groups). Choose pd.qcut() when you want to compare groups of equal size.
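When you don't need named boundaries, pd.cut() will compute equal-width edges itself if you pass an integer bin count. A short sketch comparing the two functions on the same data:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
ages = pd.Series(np.random.randint(18, 75, 500))

# Passing an integer asks pd.cut to compute equal-width edges itself
equal_width = pd.cut(ages, bins=5)
print(equal_width.value_counts().sort_index())

# pd.qcut targets roughly equal counts per bin instead
equal_freq = pd.qcut(ages, q=5)
print(equal_freq.value_counts().sort_index())
```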

Visualization Options

Frequency tables become more impactful when visualized. Bar charts work for categorical data; histograms work for binned continuous data.

import matplotlib.pyplot as plt
import seaborn as sns

# Set up the figure
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Sample data
categories = ['Electronics', 'Clothing', 'Home & Garden', 'Books', 'Sports']
frequencies = [35, 28, 20, 12, 5]

# Bar chart for categorical frequency
ax1 = axes[0]
colors = sns.color_palette('viridis', len(categories))
bars = ax1.bar(categories, frequencies, color=colors, edgecolor='black')
ax1.set_xlabel('Product Category')
ax1.set_ylabel('Frequency')
ax1.set_title('Sales by Product Category')
ax1.tick_params(axis='x', rotation=45)

# Add value labels on bars
for bar, freq in zip(bars, frequencies):
    ax1.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.5, 
             str(freq), ha='center', va='bottom', fontweight='bold')

# Histogram for continuous data
ax2 = axes[1]
np.random.seed(42)
ages = np.random.normal(40, 12, 500)
ax2.hist(ages, bins=10, color='steelblue', edgecolor='black', alpha=0.7)
ax2.set_xlabel('Age')
ax2.set_ylabel('Frequency')
ax2.set_title('Customer Age Distribution')

plt.tight_layout()
plt.savefig('frequency_visualization.png', dpi=150, bbox_inches='tight')
plt.show()

For quick exploratory analysis, pandas integrates directly with matplotlib:

# One-liner visualization from value_counts
sales_data['product_category'].value_counts().plot(kind='bar', rot=45)
plt.ylabel('Count')
plt.title('Product Category Frequency')
plt.tight_layout()
plt.show()

Conclusion

Frequency tables are foundational to data analysis. Start with collections.Counter for quick, dependency-free counts on simple iterables. Move to pandas value_counts() when working with DataFrames or when you need sorting, percentages, or integration with other pandas operations. Use crosstab() to examine relationships between two categorical variables.

Always enhance raw counts with relative frequencies—percentages communicate better than absolute numbers. For continuous data, choose between pd.cut() (equal-width bins) and pd.qcut() (equal-frequency bins) based on whether the bin boundaries or the group sizes matter more for your analysis.

These techniques form the basis for more advanced statistical analysis, including chi-square tests for independence, distribution fitting, and feature engineering for machine learning models.
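As a pointer toward that next step: a crosstab feeds directly into SciPy's chi-square test of independence. A minimal sketch, assuming scipy is installed and rebuilding the sample orders data from earlier:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Rebuild the sample orders data from earlier in the article
np.random.seed(42)
orders = pd.DataFrame({
    'category': np.random.choice(['Electronics', 'Clothing', 'Books'], 200),
    'payment': np.random.choice(['Credit Card', 'PayPal', 'Debit Card'], 200),
})

# The contingency table from crosstab feeds straight into the test
table = pd.crosstab(orders['category'], orders['payment'])
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.3f}, dof={dof}")
```

A large p-value here would suggest payment method is independent of product category, which is what you'd expect from randomly generated data.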
