How to Create a DataFrame from a Dictionary in Pandas
Key Insights
- Dictionaries map naturally to DataFrames: keys become column names, values become column data, making pd.DataFrame(dict) the most intuitive way to create structured data in Pandas.
- The structure of your dictionary determines your approach: column-oriented dictionaries (keys → columns) use the default constructor, while row-oriented data (list of dicts) requires understanding how Pandas infers structure.
- Use pd.DataFrame.from_dict() with the orient parameter when you need explicit control over how dictionary keys and values map to rows versus columns.
Why Dictionaries Are the Natural Starting Point
When you’re working with Pandas, the DataFrame is everything. It’s the central data structure you’ll manipulate, analyze, and transform. And more often than not, your data starts life as a Python dictionary—whether you’re parsing JSON from an API, collecting results from a loop, or simply organizing data in your code.
The good news: Pandas makes dictionary-to-DataFrame conversion straightforward. The complexity comes from understanding which method to use based on your dictionary’s structure. Let’s break down every approach you’ll need.
Basic Dictionary to DataFrame Conversion
The simplest case is a dictionary where each key represents a column name and each value is a list of data for that column. This is column-oriented data, and it’s what pd.DataFrame() expects by default.
import pandas as pd
# Column-oriented dictionary
employee_data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [95000, 72000, 88000, 91000],
    'years_employed': [3, 5, 2, 7]
}
df = pd.DataFrame(employee_data)
print(df)
Output:
      name   department  salary  years_employed
0    Alice  Engineering   95000               3
1      Bob    Marketing   72000               5
2  Charlie  Engineering   88000               2
3    Diana        Sales   91000               7
Each dictionary key becomes a column header. Each list becomes the column’s values. The index is automatically assigned as integers starting from 0. This is the pattern you’ll reach for most of the time.
One critical requirement: all lists must have the same length. Pandas can’t construct a rectangular DataFrame from jagged data without explicit instructions on how to handle the mismatch.
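To see the length requirement in action, here is a minimal sketch. The jagged dictionary is made-up illustration data, and the exact wording of the error message may differ between Pandas versions, so the example only checks that a ValueError is raised.

```python
import pandas as pd

# Illustrative data: 'salary' is one value shorter than 'name'
jagged = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [95000, 72000],
}

try:
    pd.DataFrame(jagged)
    error_message = None
except ValueError as exc:
    # Pandas refuses to guess how to align lists of different lengths
    error_message = str(exc)

print(error_message)
```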
Controlling Column Order and Selection
Python 3.7+ preserves dictionary insertion order, and Pandas follows it when building columns. But the order in which a dictionary happened to be built is not always the order you want in your output. The columns parameter gives you explicit control.
# Specify exact column order
df_ordered = pd.DataFrame(
    employee_data,
    columns=['name', 'salary', 'department', 'years_employed']
)
print(df_ordered)
Output:
      name  salary   department  years_employed
0    Alice   95000  Engineering               3
1      Bob   72000    Marketing               5
2  Charlie   88000  Engineering               2
3    Diana   91000        Sales               7
You can also use columns to select a subset of keys:
# Select only specific columns
df_subset = pd.DataFrame(
    employee_data,
    columns=['name', 'salary']
)
print(df_subset)
Output:
      name  salary
0    Alice   95000
1      Bob   72000
2  Charlie   88000
3    Diana   91000
If you specify a column name that doesn’t exist in the dictionary, Pandas creates that column filled with NaN values. This behavior can be useful for initializing placeholder columns, but it’s also a source of silent bugs if you mistype a key name.
# Typo creates NaN column instead of raising an error
df_typo = pd.DataFrame(
    employee_data,
    columns=['name', 'salry']  # typo: 'salry' instead of 'salary'
)
print(df_typo)
Output:
      name  salry
0    Alice    NaN
1      Bob    NaN
2  Charlie    NaN
3    Diana    NaN
This is a gotcha worth remembering. Pandas won’t warn you about the mismatch.
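One way to guard against this is to validate the requested columns against the dictionary's keys before building the DataFrame. The sketch below is one possible approach, not a built-in Pandas feature; the helper name frame_with_checked_columns is hypothetical.

```python
import pandas as pd

def frame_with_checked_columns(data, columns):
    """Hypothetical helper: build a DataFrame, failing loudly on unknown columns."""
    missing = [col for col in columns if col not in data]
    if missing:
        raise KeyError(f"Columns not found in data: {missing}")
    return pd.DataFrame(data, columns=columns)

employee_data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'salary': [95000, 72000, 88000, 91000],
}

# Valid column names pass straight through
df_checked = frame_with_checked_columns(employee_data, ['name', 'salary'])

# The typo now raises instead of silently producing a NaN column
try:
    frame_with_checked_columns(employee_data, ['name', 'salry'])
    typo_caught = False
except KeyError:
    typo_caught = True
```

If you do want placeholder columns, skip the check for those names and let Pandas fill them with NaN on purpose.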
Creating DataFrames from Different Dictionary Structures
Real-world data doesn’t always arrive in the convenient column-oriented format. Here are the three structures you’ll encounter and how to handle each.
Dictionary of Lists (Column-Oriented)
This is the default case we’ve already covered. Keys are columns, values are column data.
# Column-oriented (default)
col_oriented = {
    'product': ['Widget', 'Gadget', 'Gizmo'],
    'price': [29.99, 49.99, 19.99],
    'stock': [150, 75, 300]
}
df = pd.DataFrame(col_oriented)
print(df)
List of Dictionaries (Row-Oriented)
When each dictionary represents a single record, you have row-oriented data. This is common when parsing JSON arrays or collecting results iteratively.
# Row-oriented: each dict is one row
row_oriented = [
    {'product': 'Widget', 'price': 29.99, 'stock': 150},
    {'product': 'Gadget', 'price': 49.99, 'stock': 75},
    {'product': 'Gizmo', 'price': 19.99, 'stock': 300}
]
df = pd.DataFrame(row_oriented)
print(df)
Output:
   product  price  stock
0   Widget  29.99    150
1   Gadget  49.99     75
2    Gizmo  19.99    300
Pandas automatically infers the structure. Each dictionary’s keys become column names, and the values populate the corresponding row. If dictionaries have different keys, Pandas fills missing values with NaN.
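Nested Dictionaries (Dictionary of Dictionaries)
A dictionary of dictionaries maps to a labeled table: outer keys become columns and inner keys become the row index. Despite the two levels of nesting, the result has an ordinary single-level index, not a MultiIndex.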
# Nested dictionary
nested = {
    'Q1': {'revenue': 100000, 'expenses': 75000, 'profit': 25000},
    'Q2': {'revenue': 120000, 'expenses': 80000, 'profit': 40000},
    'Q3': {'revenue': 115000, 'expenses': 78000, 'profit': 37000}
}
df = pd.DataFrame(nested)
print(df)
Output:
              Q1      Q2      Q3
revenue   100000  120000  115000
expenses   75000   80000   78000
profit     25000   40000   37000
Notice that outer keys (Q1, Q2, Q3) became columns, and inner keys became the index. If you want the opposite orientation—outer keys as rows—transpose the result or use from_dict() with explicit orientation.
# Transpose to flip rows and columns
df_transposed = pd.DataFrame(nested).T
print(df_transposed)
Output:
revenue expenses profit
Q1 100000 75000 25000
Q2 120000 80000 40000
Q3 115000 78000 37000
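The from_dict() route mentioned above gets to the same orientation without a transpose. This is a small sketch reusing the quarterly figures from the nested example; orient='index' tells Pandas to treat the outer keys as row labels.

```python
import pandas as pd

nested = {
    'Q1': {'revenue': 100000, 'expenses': 75000, 'profit': 25000},
    'Q2': {'revenue': 120000, 'expenses': 80000, 'profit': 40000},
    'Q3': {'revenue': 115000, 'expenses': 78000, 'profit': 37000},
}

# Outer keys (Q1, Q2, Q3) become rows; inner keys become columns
df_rows = pd.DataFrame.from_dict(nested, orient='index')
print(df_rows)
```

Which form to use is mostly a readability choice: from_dict() states the intended orientation up front, while .T flips a frame you have already built.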
Setting Custom Indexes
The default integer index works fine for many use cases, but meaningful row labels make data more readable and enable label-based selection.
Using the index Parameter
Pass an index argument directly to the constructor:
dates = ['2024-01-15', '2024-01-16', '2024-01-17', '2024-01-18']
daily_metrics = {
    'visitors': [1250, 1340, 1180, 1420],
    'signups': [45, 52, 38, 61],
    'revenue': [2340.50, 2890.00, 1950.75, 3120.25]
}
df = pd.DataFrame(daily_metrics, index=pd.to_datetime(dates))
print(df)
Output:
            visitors  signups  revenue
2024-01-15      1250       45  2340.50
2024-01-16      1340       52  2890.00
2024-01-17      1180       38  1950.75
2024-01-18      1420       61  3120.25
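With a date index in place, label-based selection becomes possible. The following sketch rebuilds the same daily metrics frame and looks values up by date with .loc; the variable names signups_on_16th and mid_week are illustrative only.

```python
import pandas as pd

dates = ['2024-01-15', '2024-01-16', '2024-01-17', '2024-01-18']
daily_metrics = {
    'visitors': [1250, 1340, 1180, 1420],
    'signups': [45, 52, 38, 61],
    'revenue': [2340.50, 2890.00, 1950.75, 3120.25],
}
df = pd.DataFrame(daily_metrics, index=pd.to_datetime(dates))

# Look up a single cell by row label and column name
signups_on_16th = df.loc[pd.Timestamp('2024-01-16'), 'signups']

# Label slices include both endpoints, unlike positional slices
mid_week = df.loc['2024-01-16':'2024-01-17']

print(signups_on_16th)
print(mid_week)
```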
Using set_index() After Creation
If one of your dictionary keys should be the index, create the DataFrame first, then promote that column:
employee_data = {
    'employee_id': ['E001', 'E002', 'E003', 'E004'],
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales']
}
df = pd.DataFrame(employee_data).set_index('employee_id')
print(df)
Output:
                name   department
employee_id
E001           Alice  Engineering
E002             Bob    Marketing
E003         Charlie  Engineering
E004           Diana        Sales
Handling Missing or Uneven Data
Real data is messy. Dictionaries might have lists of different lengths, or row-oriented dictionaries might have inconsistent keys. Here’s how Pandas handles these cases.
Uneven Lists with from_dict()
The standard pd.DataFrame() constructor raises a ValueError if list lengths don’t match. For more control, use pd.DataFrame.from_dict() with the orient parameter.
# This would fail with pd.DataFrame()
uneven_data = {
    'A': [1, 2, 3],
    'B': [4, 5],        # shorter list
    'C': [6, 7, 8, 9]   # longer list
}
# Using orient='index' treats keys as rows, values as row data
df = pd.DataFrame.from_dict(uneven_data, orient='index')
print(df)
Output:
   0  1    2    3
A  1  2  3.0  NaN
B  4  5  NaN  NaN
C  6  7  8.0  9.0
The orient='index' parameter interprets dictionary keys as row labels and values as row data. Pandas pads shorter rows with NaN to create a rectangular DataFrame. Columns that receive padding are upcast to float, because NaN is a floating-point value; columns with no gaps keep their integer dtype.
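If you would rather keep the keys as columns, one common alternative (a sketch, not the only option) is to wrap each list in a pd.Series first. Series align on their index, so the shorter ones are padded with NaN at the bottom.

```python
import pandas as pd

uneven_data = {
    'A': [1, 2, 3],
    'B': [4, 5],
    'C': [6, 7, 8, 9],
}

# Each list becomes a Series with its own 0..n-1 index;
# the DataFrame constructor aligns them and fills gaps with NaN
df_cols = pd.DataFrame({key: pd.Series(values) for key, values in uneven_data.items()})
print(df_cols)
```

The result has one row per position in the longest list, with the dictionary keys as column names.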
Missing Keys in Row-Oriented Data
When using a list of dictionaries, missing keys are automatically filled with NaN:
incomplete_records = [
    {'name': 'Alice', 'age': 30, 'city': 'NYC'},
    {'name': 'Bob', 'age': 25},          # missing 'city'
    {'name': 'Charlie', 'city': 'LA'}    # missing 'age'
]
df = pd.DataFrame(incomplete_records)
print(df)
Output:
      name   age  city
0    Alice  30.0   NYC
1      Bob  25.0   NaN
2  Charlie   NaN    LA
This behavior is helpful when working with inconsistent API responses or user-submitted data, but always check for unexpected NaN values before analysis.
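A quick way to run that check is to count missing values per column and pull out the affected rows. This is a minimal sketch using the same incomplete records; the names missing_per_column and rows_with_gaps are illustrative.

```python
import pandas as pd

incomplete_records = [
    {'name': 'Alice', 'age': 30, 'city': 'NYC'},
    {'name': 'Bob', 'age': 25},
    {'name': 'Charlie', 'city': 'LA'},
]
df = pd.DataFrame(incomplete_records)

# Number of missing values in each column
missing_per_column = df.isna().sum()

# Rows that have at least one missing value
rows_with_gaps = df[df.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_gaps)
```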
Conclusion
Creating DataFrames from dictionaries comes down to understanding your data’s structure:
- Column-oriented dictionaries (keys = columns, values = lists): Use pd.DataFrame(dict) directly. This is the most common case.
- Row-oriented data (list of dictionaries): Pass the list to pd.DataFrame(). Pandas infers columns from dictionary keys.
- Nested dictionaries: Use pd.DataFrame(dict) for outer-keys-as-columns, or transpose for outer-keys-as-rows.
- Uneven or messy data: Use pd.DataFrame.from_dict() with orient='index' for explicit control over how keys map to the DataFrame structure.
Start with the simplest approach—pd.DataFrame(your_dict)—and reach for from_dict() only when you need to handle edge cases or non-standard structures. The goal is readable, maintainable code, not clever one-liners.