How to Use Value Counts in Pandas

Key Insights

  • value_counts() is the fastest way to understand categorical data distribution, returning frequency counts sorted by occurrence in a single method call.
  • The normalize parameter transforms raw counts into percentages, while bins lets you apply frequency analysis to continuous numeric data.
  • Applying value_counts() to multiple DataFrame columns simultaneously reveals combination patterns that single-column analysis misses.

Introduction to Value Counts

When you’re exploring a new dataset, one of the first questions you’ll ask is “what values exist in this column and how often do they appear?” The value_counts() method answers this question instantly.

This method counts unique values in a Series or DataFrame column and returns them sorted by frequency. It’s the go-to tool for quick frequency analysis during exploratory data analysis, data validation, and feature engineering. You’ll use it constantly—whether you’re checking for data quality issues, understanding categorical distributions, or identifying rare values that might need special handling.

Let’s dig into how to use it effectively.

Basic Syntax and Usage

The simplest form of value_counts() operates on a pandas Series and returns another Series with unique values as the index and their counts as values.

import pandas as pd

# Sample e-commerce data
df = pd.DataFrame({
    'product_category': ['Electronics', 'Clothing', 'Electronics', 'Home', 
                         'Clothing', 'Electronics', 'Clothing', 'Home',
                         'Electronics', 'Clothing', 'Books', 'Electronics'],
    'city': ['New York', 'Chicago', 'New York', 'Chicago', 'New York',
             'Los Angeles', 'Chicago', 'New York', 'Chicago', 'New York',
             'Los Angeles', 'New York'],
    'order_value': [299, 45, 599, 120, 89, 450, 67, 85, 199, 55, 25, 899]
})

# Basic value counts
print(df['product_category'].value_counts())

Output:

product_category
Electronics    5
Clothing       4
Home           2
Books          1
Name: count, dtype: int64

By default, value_counts() sorts results in descending order by count. The most frequent value appears first. This behavior is intentional—you typically care most about dominant categories.
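Because of that descending sort, the first index label is always the most frequent value. A minimal sketch (using the same category data as above) showing two equivalent ways to grab the dominant category:

```python
import pandas as pd

s = pd.Series(['Electronics', 'Clothing', 'Electronics', 'Home',
               'Clothing', 'Electronics', 'Clothing', 'Home',
               'Electronics', 'Clothing', 'Books', 'Electronics'])

# The first index label of a sorted value_counts() result
# is the most frequent value; idxmax() gives the same answer
top_category = s.value_counts().index[0]
also_top = s.value_counts().idxmax()
print(top_category)  # Electronics
```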

Key Parameters Explained

The method signature includes several parameters that dramatically change its behavior:

Series.value_counts(normalize=False, sort=True, ascending=False, 
                    bins=None, dropna=True)

The normalize Parameter

When you need percentages instead of raw counts, normalize=True converts the output to proportions that sum to 1.0.

# Get percentage distribution
print(df['product_category'].value_counts(normalize=True))

Output:

product_category
Electronics    0.416667
Clothing       0.333333
Home           0.166667
Books          0.083333
Name: proportion, dtype: float64

This tells you Electronics represents about 42% of orders. For reporting, multiply by 100:

# Format as percentages
percentages = df['product_category'].value_counts(normalize=True) * 100
print(percentages.round(1))

Output:

product_category
Electronics    41.7
Clothing       33.3
Home           16.7
Books           8.3
Name: proportion, dtype: float64

The bins Parameter

Here’s where value_counts() gets interesting for numeric data. The bins parameter discretizes continuous values into intervals, then counts occurrences in each bin.

# Bin order values into price ranges
print(df['order_value'].value_counts(bins=4, sort=False))

Output:

(24.126, 243.5]     8
(243.5, 462.0]      2
(462.0, 680.5]      1
(680.5, 899.0]      1
Name: count, dtype: int64

I used sort=False to display bins in ascending order rather than by frequency. This reveals that most orders fall in the lowest price bracket.

For more control, pass explicit bin edges:

# Custom price tiers
bins = [0, 50, 100, 250, 500, 1000]
print(df['order_value'].value_counts(bins=bins, sort=False))

Output:

(-0.001, 50.0]     2
(50.0, 100.0]      4
(100.0, 250.0]     2
(250.0, 500.0]     2
(500.0, 1000.0]    2
Name: count, dtype: int64

The sort and ascending Parameters

Control the output order with these two parameters:

# Sort alphabetically by category name (ascending by index)
print(df['product_category'].value_counts().sort_index())

# Sort by count ascending (least frequent first)
print(df['product_category'].value_counts(ascending=True))

The sort=False option returns values in the order they first appear in the data—useful when your categories have a natural ordering you want to preserve.
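A quick sketch of that behavior, using a hypothetical size column where first-appearance order happens to match the natural small-to-large ordering:

```python
import pandas as pd

sizes = pd.Series(['small', 'medium', 'small', 'large', 'medium', 'small'])

# sort=False reports categories in first-appearance order
# (small, medium, large) instead of by frequency
counts = sizes.value_counts(sort=False)
print(counts)
```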

Using Value Counts on DataFrames

Starting with pandas 1.1.0, you can call value_counts() directly on a DataFrame to count unique row combinations. This is powerful for analyzing multi-dimensional categorical patterns.

# Add region data
df['region'] = ['East', 'Midwest', 'East', 'Midwest', 'East',
                'West', 'Midwest', 'East', 'Midwest', 'East',
                'West', 'East']

# Count combinations of region and product category
combo_counts = df[['region', 'product_category']].value_counts()
print(combo_counts)

Output:

region   product_category
East     Electronics         3
         Clothing            2
Midwest  Clothing            2
         Electronics         1
         Home                1
East     Home                1
West     Electronics         1
         Books               1
Name: count, dtype: int64

This immediately shows that East region dominates Electronics purchases. You can reset the index for a cleaner DataFrame format:

combo_df = df[['region', 'product_category']].value_counts().reset_index()
combo_df.columns = ['region', 'product_category', 'order_count']
print(combo_df)

Output:

    region product_category  order_count
0     East      Electronics            3
1     East         Clothing            2
2  Midwest         Clothing            2
3  Midwest      Electronics            1
4  Midwest             Home            1
5     East             Home            1
6     West      Electronics            1
7     West            Books            1

Handling Missing Data

By default, value_counts() excludes NaN values. The dropna parameter controls this behavior.

# Add some missing data
df_with_nulls = df.copy()
df_with_nulls.loc[0, 'product_category'] = None
df_with_nulls.loc[5, 'product_category'] = None

# Default behavior - NaN excluded
print("dropna=True (default):")
print(df_with_nulls['product_category'].value_counts())
print()

# Include NaN in counts
print("dropna=False:")
print(df_with_nulls['product_category'].value_counts(dropna=False))

Output:

dropna=True (default):
product_category
Clothing       4
Electronics    3
Home           2
Books          1
Name: count, dtype: int64

dropna=False:
product_category
Clothing       4
Electronics    3
Home           2
NaN            2
Books          1
Name: count, dtype: int64

Always check with dropna=False during data validation. Those two NaN values might indicate a data pipeline issue you need to fix upstream.

Common Use Cases and Patterns

Data Validation

Use value_counts() to catch unexpected values:

# Check for valid status codes
valid_statuses = ['pending', 'shipped', 'delivered', 'cancelled']
status_series = pd.Series(['pending', 'shipped', 'SHIPPED', 'delivered', 
                           'canceled', 'pending', 'shipped'])

counts = status_series.value_counts()
invalid = counts.index.difference(valid_statuses)
if len(invalid) > 0:
    print(f"Invalid status values found: {list(invalid)}")
    print(counts[invalid])

Output:

Invalid status values found: ['SHIPPED', 'canceled']
SHIPPED     1
canceled    1
Name: count, dtype: int64

This catches both the case sensitivity issue and the typo (“canceled” vs “cancelled”).
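Once value_counts() has surfaced the problems, one possible cleanup (a sketch, not the only approach) is to lowercase everything and then map known typos to their canonical spelling:

```python
import pandas as pd

status_series = pd.Series(['pending', 'shipped', 'SHIPPED', 'delivered',
                           'canceled', 'pending', 'shipped'])

# Normalize case, then map the known typo to the canonical value
cleaned = status_series.str.lower().replace({'canceled': 'cancelled'})
print(cleaned.value_counts())
```

Re-running value_counts() on the cleaned Series confirms that 'SHIPPED' has merged into 'shipped' and the typo is gone.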

Filtering by Frequency Threshold

Remove rare categories that might cause issues in machine learning models:

# Keep only categories appearing 3+ times
category_counts = df['product_category'].value_counts()
frequent_categories = category_counts[category_counts >= 3].index

df_filtered = df[df['product_category'].isin(frequent_categories)]
print(f"Kept {len(df_filtered)} rows with frequent categories: {list(frequent_categories)}")

Output:

Kept 9 rows with frequent categories: ['Electronics', 'Clothing']

Finding the Top N Values

Grab the most common values for focused analysis:

# Top 3 product categories
top_3 = df['product_category'].value_counts().head(3)
print(top_3)

# Use nlargest for the same result
top_3_alt = df['product_category'].value_counts().nlargest(3)

Quick Cardinality Check

Before encoding categorical variables, check how many unique values exist:

# Cardinality summary for all object columns
for col in df.select_dtypes(include='object').columns:
    n_unique = df[col].value_counts().shape[0]
    print(f"{col}: {n_unique} unique values")

Output:

product_category: 4 unique values
city: 3 unique values
region: 3 unique values

Conclusion

The value_counts() method handles the majority of frequency analysis tasks you’ll encounter. Remember these key patterns:

  • Use normalize=True for percentage distributions
  • Use bins to analyze continuous numeric data
  • Apply to multiple columns simultaneously for combination analysis
  • Always check dropna=False during data validation
  • Filter rare values by indexing the resulting Series

This method shines during exploratory analysis when you need quick answers about data distribution. It’s faster than writing groupby aggregations and more readable than custom counting logic. Reach for it first when you need to understand what’s in your data.
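To make the comparison with groupby concrete, here is a small sketch showing that value_counts() collapses a groupby-size-sort pipeline into a single call (toy data, labels are arbitrary):

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

# One method call...
vc = s.value_counts()

# ...versus the equivalent groupby pipeline
gb = s.groupby(s).size().sort_values(ascending=False)

# Same counts either way; value_counts() is the one-liner
print(vc.to_dict() == gb.to_dict())  # True
```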
