Pandas - Convert DataFrame to Dictionary


Key Insights

  • Pandas converts DataFrames to dictionaries through a single method, to_dict(), whose orient argument selects the output structure. This article covers six orientations (‘dict’, ‘list’, ‘series’, ‘split’, ‘records’, ‘index’), each suited to different use cases
  • The orient parameter fundamentally changes the dictionary structure: ‘records’ creates row-based lists suited to JSON APIs, while ‘dict’ produces nested column-based dictionaries better suited to column-wise manipulation
  • Conversion cost varies by orientation: column-oriented outputs such as ‘dict’ and ‘list’ are often cheaper to build, while ‘records’ creates one dictionary per row, trading some overhead for the most intuitive structure for row-based processing

Understanding to_dict() Orientations

The to_dict() method accepts an orient parameter that determines the resulting dictionary structure. Each orientation serves different use cases, from API responses to data transformation pipelines.

import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103],
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [999.99, 29.99, 79.99],
    'stock': [15, 150, 45]
})

# Default orientation: 'dict'
result_dict = df.to_dict()
print(result_dict)

Output:

{
    'product_id': {0: 101, 1: 102, 2: 103},
    'name': {0: 'Laptop', 1: 'Mouse', 2: 'Keyboard'},
    'price': {0: 999.99, 1: 29.99, 2: 79.99},
    'stock': {0: 15, 1: 150, 2: 45}
}
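
Before going through the orientations one by one, it can help to see their shapes side by side. The snippet below is a small sketch, not part of the pandas API: it uses a trimmed two-column version of the DataFrame above and records the type of the outer container and of one inner value for each orientation. The `shapes` name is only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103],
    'price': [999.99, 29.99, 79.99],
})

# For each orientation, record (outer container type, type of one inner value)
shapes = {}
for orient in ['dict', 'list', 'series', 'split', 'records', 'index']:
    result = df.to_dict(orient=orient)
    if isinstance(result, list):
        inner = result[0]                    # 'records' returns a list of row dicts
    else:
        inner = next(iter(result.values()))  # first value of the outer dict
    shapes[orient] = (type(result).__name__, type(inner).__name__)

for orient, (outer, inner) in shapes.items():
    print(f"{orient:8} outer={outer:5} inner={inner}")
```

Only ‘records’ returns a list at the top level; every other orientation returns a dictionary whose values differ in type.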

Orient=‘dict’: Nested Column Dictionaries

The default ‘dict’ orientation creates a nested structure where each column becomes a key mapping to another dictionary of index-value pairs. This format excels when you need to access entire columns or perform column-wise operations.

# Accessing specific column data
product_dict = df.to_dict(orient='dict')
all_prices = product_dict['price']
laptop_price = product_dict['price'][0]

# Use case: Column-based data validation
def validate_stock_levels(data_dict):
    low_stock = {idx: val for idx, val in data_dict['stock'].items() if val < 50}
    return low_stock

print(validate_stock_levels(product_dict))
# Output: {0: 15, 2: 45}

Orient=‘list’: Column Arrays

The ‘list’ orientation converts each column to a list, creating a flatter structure than ‘dict’. This format integrates seamlessly with systems expecting array-based data.

list_dict = df.to_dict(orient='list')
print(list_dict)

Output:

{
    'product_id': [101, 102, 103],
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [999.99, 29.99, 79.99],
    'stock': [15, 150, 45]
}

Practical application for bulk operations:

# Calculate statistics without iteration
import statistics

list_data = df.to_dict(orient='list')
price_stats = {
    'mean': statistics.mean(list_data['price']),
    'median': statistics.median(list_data['price']),
    'total_inventory_value': sum(p * s for p, s in zip(list_data['price'], list_data['stock']))
}
print(price_stats)
# Output (up to floating-point rounding): {'mean': 369.99, 'median': 79.99, 'total_inventory_value': 23097.9}

Orient=‘records’: Row-Based List of Dictionaries

The ‘records’ orientation produces a list where each element is a dictionary representing one row. The index is not included in the output. This shape maps directly onto a JSON array of objects, which makes it a common choice for REST API payloads.

records_list = df.to_dict(orient='records')
print(records_list)

Output:

[
    {'product_id': 101, 'name': 'Laptop', 'price': 999.99, 'stock': 15},
    {'product_id': 102, 'name': 'Mouse', 'price': 29.99, 'stock': 150},
    {'product_id': 103, 'name': 'Keyboard', 'price': 79.99, 'stock': 45}
]

Real-world API integration:

import json

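def prepare_api_payload(df, endpoint_type):
    """Serialize df as JSON for the given endpoint type."""
    records = df.to_dict(orient='records')

    if endpoint_type == 'bulk_create':
        # One JSON document containing every row
        return json.dumps({'products': records})
    elif endpoint_type == 'individual':
        # One JSON document per row
        return [json.dumps(record) for record in records]
    # Fail loudly instead of silently returning None
    raise ValueError(f"Unknown endpoint_type: {endpoint_type!r}")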
    
# Bulk API payload
payload = prepare_api_payload(df, 'bulk_create')
print(payload)
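
On the receiving side, a list of records can be passed straight back to the DataFrame constructor. The following is a minimal sketch of that round trip, assuming a recent pandas release in which to_dict() returns native Python scalars that json.dumps can encode. The `restored` name is illustrative.

```python
import json
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103],
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [999.99, 29.99, 79.99],
    'stock': [15, 150, 45]
})

# Producer side: rows -> JSON text
payload = json.dumps({'products': df.to_dict(orient='records')})

# Consumer side: JSON text -> list of row dicts -> DataFrame
records = json.loads(payload)['products']
restored = pd.DataFrame(records)

print(list(restored.columns))
print(len(restored))
```

Because ‘records’ drops the index, the restored frame gets a fresh default index. Call reset_index() before converting if the index carries information.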

Orient=‘index’: Index-Based Nested Dictionaries

The ‘index’ orientation inverts the ‘dict’ structure, using DataFrame indices as top-level keys. Each index maps to a dictionary of column-value pairs.

index_dict = df.to_dict(orient='index')
print(index_dict)

Output:

{
    0: {'product_id': 101, 'name': 'Laptop', 'price': 999.99, 'stock': 15},
    1: {'product_id': 102, 'name': 'Mouse', 'price': 29.99, 'stock': 150},
    2: {'product_id': 103, 'name': 'Keyboard', 'price': 79.99, 'stock': 45}
}

Useful with custom indices:

df_custom = df.set_index('product_id')
product_lookup = df_custom.to_dict(orient='index')

# Direct product access by ID
print(product_lookup[102])
# Output: {'name': 'Mouse', 'price': 29.99, 'stock': 150}

# Update inventory
product_lookup[102]['stock'] -= 10
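
Edits made to the lookup dictionary do not flow back into the DataFrame, since to_dict() produces plain Python objects. If the updated values need to return to pandas, one option is DataFrame.from_dict() with orient='index', sketched below on the same data. The index name has to be restored by hand because the dictionary does not store it.

```python
import pandas as pd

df_custom = pd.DataFrame({
    'product_id': [101, 102, 103],
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [999.99, 29.99, 79.99],
    'stock': [15, 150, 45]
}).set_index('product_id')

product_lookup = df_custom.to_dict(orient='index')
product_lookup[102]['stock'] -= 10  # modifies the dict only, not df_custom

# Outer keys become the index, inner keys become the columns
updated_df = pd.DataFrame.from_dict(product_lookup, orient='index')
updated_df.index.name = 'product_id'  # not preserved by the dict

print(updated_df.loc[102, 'stock'])  # 140
print(df_custom.loc[102, 'stock'])   # 150 (original is unchanged)
```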

Orient=‘series’: Column Series Objects

The ‘series’ orientation maintains pandas Series objects as values, preserving pandas functionality within the dictionary structure.

series_dict = df.to_dict(orient='series')
print(type(series_dict['price']))  # <class 'pandas.core.series.Series'>

# Leverage pandas operations
expensive_items = series_dict['price'][series_dict['price'] > 50]
print(expensive_items)

Output:

0    999.99
2     79.99
Name: price, dtype: float64

Orient=‘split’: Separated Components

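The ‘split’ orientation decomposes the DataFrame into separate index, columns, and data components. Because column names are stored once rather than repeated in every row, the result is compact to transmit, and its three keys map directly onto the arguments of the DataFrame constructor.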

split_dict = df.to_dict(orient='split')
print(split_dict)

Output:

{
    'index': [0, 1, 2],
    'columns': ['product_id', 'name', 'price', 'stock'],
    'data': [
        [101, 'Laptop', 999.99, 15],
        [102, 'Mouse', 29.99, 150],
        [103, 'Keyboard', 79.99, 45]
    ]
}

Reconstruction example:

# Serialize and reconstruct
split_data = df.to_dict(orient='split')
reconstructed_df = pd.DataFrame(
    split_data['data'],
    index=split_data['index'],
    columns=split_data['columns']
)
print(reconstructed_df.equals(df))  # True
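
The same reconstruction works across a JSON boundary. Since the keys of a ‘split’ dictionary ('index', 'columns', 'data') match the constructor's keyword arguments, the decoded dictionary can be unpacked directly. This is a sketch under the assumption that all values are JSON-friendly (numbers and strings, as in this example); the `wire` and `restored` names are illustrative.

```python
import json
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103],
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [999.99, 29.99, 79.99],
    'stock': [15, 150, 45]
})

# Serialize the split dictionary to JSON text
wire = json.dumps(df.to_dict(orient='split'))

# Decode and unpack: keys 'index', 'columns', 'data' match the constructor
restored = pd.DataFrame(**json.loads(wire))

print(list(restored.columns))
print(restored.index.tolist())
```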

Handling MultiIndex DataFrames

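MultiIndex DataFrames require special consideration. With orient='index', each row key becomes a tuple of the index levels. With orient='records', the index is dropped entirely, so call reset_index() first if the index levels should appear as fields in each record.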

# Create MultiIndex DataFrame
multi_df = pd.DataFrame({
    'sales': [100, 150, 200, 120],
    'returns': [5, 8, 10, 3]
}, index=pd.MultiIndex.from_tuples([
    ('Q1', 'North'), ('Q1', 'South'), 
    ('Q2', 'North'), ('Q2', 'South')
], names=['Quarter', 'Region']))

# MultiIndex with orient='index' creates tuple keys
multi_index_dict = multi_df.to_dict(orient='index')
print(multi_index_dict[('Q1', 'North')])
# Output: {'sales': 100, 'returns': 5}

# Reset index for records orientation
multi_records = multi_df.reset_index().to_dict(orient='records')
print(multi_records[0])
# Output: {'Quarter': 'Q1', 'Region': 'North', 'sales': 100, 'returns': 5}
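
One practical consequence of tuple keys is that the result of orient='index' cannot be passed to json.dumps as is, because JSON object keys cannot be tuples. The sketch below demonstrates the failure and one possible workaround: joining the index levels into a single string key. The '|' separator is an arbitrary choice for illustration and assumes the level values never contain that character.

```python
import json
import pandas as pd

multi_df = pd.DataFrame({
    'sales': [100, 150, 200, 120],
    'returns': [5, 8, 10, 3]
}, index=pd.MultiIndex.from_tuples([
    ('Q1', 'North'), ('Q1', 'South'),
    ('Q2', 'North'), ('Q2', 'South')
], names=['Quarter', 'Region']))

by_index = multi_df.to_dict(orient='index')

# Tuple keys are rejected by the JSON encoder
try:
    json.dumps(by_index)
    serializable = True
except TypeError:
    serializable = False
print(serializable)  # False

# Workaround: flatten each tuple key into one string (separator is arbitrary)
flat = {'|'.join(key): row for key, row in by_index.items()}
payload = json.dumps(flat)
print(json.loads(payload)['Q1|North'])  # {'sales': 100, 'returns': 5}
```

When the consumer needs the levels as separate fields, the reset_index() approach shown above is usually the simpler choice.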

Performance Considerations

Different orientations have varying performance characteristics. For large DataFrames, choosing the right orientation impacts execution time.

import time
import pandas as pd
import numpy as np

# Create large DataFrame
large_df = pd.DataFrame(
    np.random.rand(10000, 50),
    columns=[f'col_{i}' for i in range(50)]
)

orientations = ['dict', 'list', 'records', 'index', 'split']
timings = {}

for orient in orientations:
    # perf_counter is a monotonic, high-resolution clock intended for timing
    start = time.perf_counter()
    result = large_df.to_dict(orient=orient)
    timings[orient] = time.perf_counter() - start

for orient, duration in sorted(timings.items(), key=lambda x: x[1]):
    print(f"{orient}: {duration:.4f}s")

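In runs like this one, ‘dict’ and ‘list’ typically come out fastest and ‘records’ slowest, because it builds a separate dictionary for every row. Exact numbers depend on the pandas version, the dtypes, and the DataFrame shape, and a single run is noisy, so repeat the measurement on your own data. Choose based on downstream usage patterns rather than conversion speed alone.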

Converting Specific Columns

Select specific columns before conversion to reduce payload size and improve performance.

# Convert subset of columns
subset_dict = df[['product_id', 'price']].to_dict(orient='records')
print(subset_dict)
# Output: [{'product_id': 101, 'price': 999.99}, ...]

# Conditional column selection
numeric_cols = df.select_dtypes(include=['number']).to_dict(orient='list')
print(numeric_cols.keys())
# Output: dict_keys(['product_id', 'price', 'stock'])

This approach is essential when interfacing with external systems that expect specific data schemas or when minimizing memory usage in data pipelines.
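
When the external system uses different field names, selection and renaming can be combined in one step before conversion. The sketch below assumes a hypothetical target schema: the field names 'id' and 'unit_price' are invented for illustration and do not come from any real API.

```python
import pandas as pd

df = pd.DataFrame({
    'product_id': [101, 102, 103],
    'name': ['Laptop', 'Mouse', 'Keyboard'],
    'price': [999.99, 29.99, 79.99],
    'stock': [15, 150, 45]
})

# Hypothetical mapping: DataFrame column -> field name expected downstream
schema = {'product_id': 'id', 'price': 'unit_price'}

# Keep only the mapped columns, rename them, then convert
payload = (
    df[list(schema)]
    .rename(columns=schema)
    .to_dict(orient='records')
)

print(payload[0])  # {'id': 101, 'unit_price': 999.99}
```

Keeping the mapping in one dictionary means the selected columns and their output names cannot drift apart.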
