How to Create a DataFrame from a Dictionary in Pandas

Key Insights

  • Dictionaries map naturally to DataFrames: keys become column names, values become column data, making pd.DataFrame(dict) the most intuitive way to create structured data in Pandas.
  • The structure of your dictionary determines your approach: column-oriented dictionaries (keys → columns) use the default constructor, while row-oriented data (list of dicts) requires understanding how Pandas infers structure.
  • Use pd.DataFrame.from_dict() with the orient parameter when you need explicit control over how dictionary keys and values map to rows versus columns.

Why Dictionaries Are the Natural Starting Point

When you’re working with Pandas, the DataFrame is everything. It’s the central data structure you’ll manipulate, analyze, and transform. And more often than not, your data starts life as a Python dictionary—whether you’re parsing JSON from an API, collecting results from a loop, or simply organizing data in your code.

The good news: Pandas makes dictionary-to-DataFrame conversion straightforward. The complexity comes from understanding which method to use based on your dictionary’s structure. Let’s break down every approach you’ll need.

Basic Dictionary to DataFrame Conversion

The simplest case is a dictionary where each key represents a column name and each value is a list of data for that column. This is column-oriented data, and it’s what pd.DataFrame() expects by default.

import pandas as pd

# Column-oriented dictionary
employee_data = {
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales'],
    'salary': [95000, 72000, 88000, 91000],
    'years_employed': [3, 5, 2, 7]
}

df = pd.DataFrame(employee_data)
print(df)

Output:

      name   department  salary  years_employed
0    Alice  Engineering   95000               3
1      Bob    Marketing   72000               5
2  Charlie  Engineering   88000               2
3    Diana        Sales   91000               7

Each dictionary key becomes a column header. Each list becomes the column’s values. The index is automatically assigned as integers starting from 0. This is the pattern you’ll reach for most often.

One critical requirement: all lists must have the same length. Pandas can’t construct a rectangular DataFrame from jagged data without explicit instructions on how to handle the mismatch.
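To see this requirement in action, here is a minimal sketch. The dictionary below is made up for illustration, and the exact wording of the error message may vary between Pandas versions, so the code only captures it rather than matching on it.

```python
import pandas as pd

# Hypothetical data: 'salary' has one fewer entry than 'name'
mismatched = {
    'name': ['Alice', 'Bob', 'Charlie'],
    'salary': [95000, 72000],
}

try:
    pd.DataFrame(mismatched)
    error_message = None
except ValueError as exc:
    # Pandas refuses to build a non-rectangular DataFrame
    error_message = str(exc)

print(error_message)
```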

Controlling Column Order and Selection

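Dictionary key order is preserved in Python 3.7+, and Pandas keeps that order by default. When a specific column order matters for your output, though, it’s safer to state it than to depend on how the dictionary happened to be built. The columns parameter gives you explicit control.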

# Specify exact column order
df_ordered = pd.DataFrame(
    employee_data,
    columns=['name', 'salary', 'department', 'years_employed']
)
print(df_ordered)

Output:

      name  salary   department  years_employed
0    Alice   95000  Engineering               3
1      Bob   72000    Marketing               5
2  Charlie   88000  Engineering               2
3    Diana   91000        Sales               7

You can also use columns to select a subset of keys:

# Select only specific columns
df_subset = pd.DataFrame(
    employee_data,
    columns=['name', 'salary']
)
print(df_subset)

Output:

      name  salary
0    Alice   95000
1      Bob   72000
2  Charlie   88000
3    Diana   91000

If you specify a column name that doesn’t exist in the dictionary, Pandas creates that column filled with NaN values. This behavior can be useful for initializing placeholder columns, but it’s also a source of silent bugs if you mistype a key name.

# Typo creates NaN column instead of raising an error
df_typo = pd.DataFrame(
    employee_data,
    columns=['name', 'salry']  # typo: 'salry' instead of 'salary'
)
print(df_typo)

Output:

      name salry
0    Alice   NaN
1      Bob   NaN
2  Charlie   NaN
3    Diana   NaN

This is a gotcha worth remembering. Pandas won’t warn you about the mismatch.
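One way to guard against this is to check the requested names against the dictionary’s keys before building the DataFrame. The sketch below assumes a small helper, frame_with_columns, which is a hypothetical name and not part of Pandas.

```python
import pandas as pd

employee_data = {
    'name': ['Alice', 'Bob'],
    'salary': [95000, 72000],
}

def frame_with_columns(data, columns):
    """Hypothetical helper: build a DataFrame, rejecting unknown column names."""
    missing = [col for col in columns if col not in data]
    if missing:
        raise KeyError(f"Columns not found in dictionary: {missing}")
    return pd.DataFrame(data, columns=columns)

# Valid names work exactly like the plain constructor
df_checked = frame_with_columns(employee_data, ['name', 'salary'])

# A mistyped name now fails loudly instead of producing a NaN column
try:
    frame_with_columns(employee_data, ['name', 'salry'])
    typo_caught = False
except KeyError:
    typo_caught = True

print(df_checked)
print(typo_caught)
```

If placeholder columns are something you rely on, this check is probably too strict; it suits cases where every requested column is expected to exist.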

Creating DataFrames from Different Dictionary Structures

Real-world data doesn’t always arrive in the convenient column-oriented format. Here are the three structures you’ll encounter and how to handle each.

Dictionary of Lists (Column-Oriented)

This is the default case we’ve already covered. Keys are columns, values are column data.

# Column-oriented (default)
col_oriented = {
    'product': ['Widget', 'Gadget', 'Gizmo'],
    'price': [29.99, 49.99, 19.99],
    'stock': [150, 75, 300]
}

df = pd.DataFrame(col_oriented)
print(df)

List of Dictionaries (Row-Oriented)

When each dictionary represents a single record, you have row-oriented data. This is common when parsing JSON arrays or collecting results iteratively.

# Row-oriented: each dict is one row
row_oriented = [
    {'product': 'Widget', 'price': 29.99, 'stock': 150},
    {'product': 'Gadget', 'price': 49.99, 'stock': 75},
    {'product': 'Gizmo', 'price': 19.99, 'stock': 300}
]

df = pd.DataFrame(row_oriented)
print(df)

Output:

  product  price  stock
0  Widget  29.99    150
1  Gadget  49.99     75
2   Gizmo  19.99    300

Pandas automatically infers the structure. Each dictionary’s keys become column names, and the values populate the corresponding row. If dictionaries have different keys, Pandas fills missing values with NaN.
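The JSON case mentioned above follows the same pattern. This is a small sketch assuming the payload is a JSON array of flat objects; the payload string itself is invented for illustration.

```python
import json
import pandas as pd

# Hypothetical JSON payload, e.g. the body of an API response
payload = '[{"product": "Widget", "price": 29.99}, {"product": "Gadget", "price": 49.99}]'

# json.loads turns the JSON array into a list of dictionaries
records = json.loads(payload)

# The list of dictionaries goes straight into the constructor
df_json = pd.DataFrame(records)
print(df_json)
```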

Nested Dictionaries (Dictionary of Dictionaries)

Nested dictionaries create a two-level structure where outer keys become columns and inner keys become the row index.

# Nested dictionary
nested = {
    'Q1': {'revenue': 100000, 'expenses': 75000, 'profit': 25000},
    'Q2': {'revenue': 120000, 'expenses': 80000, 'profit': 40000},
    'Q3': {'revenue': 115000, 'expenses': 78000, 'profit': 37000}
}

df = pd.DataFrame(nested)
print(df)

Output:

              Q1      Q2      Q3
revenue   100000  120000  115000
expenses   75000   80000   78000
profit     25000   40000   37000

Notice that outer keys (Q1, Q2, Q3) became columns, and inner keys became the index. If you want the opposite orientation (outer keys as rows), transpose the result or use from_dict() with orient='index'.

# Transpose to flip rows and columns
df_transposed = pd.DataFrame(nested).T
print(df_transposed)

Output:

    revenue  expenses  profit
Q1   100000     75000   25000
Q2   120000     80000   40000
Q3   115000     78000   37000
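The from_dict() route gets to the same layout without transposing. A minimal sketch, reusing the quarterly figures from above:

```python
import pandas as pd

nested = {
    'Q1': {'revenue': 100000, 'expenses': 75000, 'profit': 25000},
    'Q2': {'revenue': 120000, 'expenses': 80000, 'profit': 40000},
    'Q3': {'revenue': 115000, 'expenses': 78000, 'profit': 37000}
}

# orient='index' tells Pandas to treat the outer keys as row labels
df_rows = pd.DataFrame.from_dict(nested, orient='index')
print(df_rows)
```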

Setting Custom Indexes

The default integer index works fine for many use cases, but meaningful row labels make data more readable and enable label-based selection.

Using the index Parameter

Pass an index argument directly to the constructor:

dates = ['2024-01-15', '2024-01-16', '2024-01-17', '2024-01-18']

daily_metrics = {
    'visitors': [1250, 1340, 1180, 1420],
    'signups': [45, 52, 38, 61],
    'revenue': [2340.50, 2890.00, 1950.75, 3120.25]
}

df = pd.DataFrame(daily_metrics, index=pd.to_datetime(dates))
print(df)

Output:

            visitors  signups   revenue
2024-01-15      1250       45   2340.50
2024-01-16      1340       52   2890.00
2024-01-17      1180       38   1950.75
2024-01-18      1420       61   3120.25

Using set_index() After Creation

If one of your dictionary keys should be the index, create the DataFrame first, then promote that column:

employee_data = {
    'employee_id': ['E001', 'E002', 'E003', 'E004'],
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales']
}

df = pd.DataFrame(employee_data).set_index('employee_id')
print(df)

Output:

                 name   department
employee_id                       
E001            Alice  Engineering
E002              Bob    Marketing
E003          Charlie  Engineering
E004            Diana        Sales
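With a meaningful index in place, label-based selection becomes available through .loc. A short sketch using the same employee data:

```python
import pandas as pd

employee_data = {
    'employee_id': ['E001', 'E002', 'E003', 'E004'],
    'name': ['Alice', 'Bob', 'Charlie', 'Diana'],
    'department': ['Engineering', 'Marketing', 'Engineering', 'Sales']
}

df = pd.DataFrame(employee_data).set_index('employee_id')

# Look up one row by its label; the result is a Series
row = df.loc['E002']

# Look up a single cell by row label and column name
dept = df.loc['E003', 'department']

print(row)
print(dept)
```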

Handling Missing or Uneven Data

Real data is messy. Dictionaries might have lists of different lengths, or row-oriented dictionaries might have inconsistent keys. Here’s how Pandas handles these cases.

Uneven Lists with from_dict()

The standard pd.DataFrame() constructor raises a ValueError if list lengths don’t match. For more control, use pd.DataFrame.from_dict() with the orient parameter.

# This would fail with pd.DataFrame()
uneven_data = {
    'A': [1, 2, 3],
    'B': [4, 5],  # shorter list
    'C': [6, 7, 8, 9]  # longer list
}

# Using orient='index' treats keys as rows, values as row data
df = pd.DataFrame.from_dict(uneven_data, orient='index')
print(df)

Output:

   0  1    2    3
A  1  2  3.0  NaN
B  4  5  NaN  NaN
C  6  7  8.0  9.0

The orient='index' parameter interprets dictionary keys as row labels and values as row data. Pandas pads shorter rows with NaN to create a rectangular DataFrame.
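If the keys need to stay as columns, one common workaround is to wrap each list in a pd.Series so Pandas can align the values on their integer positions. This is a sketch of that alternative, not a replacement for orient='index', and it assumes padding with NaN at the end is what you want.

```python
import pandas as pd

uneven_data = {
    'A': [1, 2, 3],
    'B': [4, 5],        # shorter list
    'C': [6, 7, 8, 9]   # longer list
}

# Each Series carries its own 0..n-1 index; Pandas aligns on it
# and fills the positions a shorter Series lacks with NaN
df_cols = pd.DataFrame(
    {key: pd.Series(values) for key, values in uneven_data.items()}
)
print(df_cols)
```

Columns that receive padding are converted to floats, since NaN is a floating-point value.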

Missing Keys in Row-Oriented Data

When using a list of dictionaries, missing keys are automatically filled with NaN:

incomplete_records = [
    {'name': 'Alice', 'age': 30, 'city': 'NYC'},
    {'name': 'Bob', 'age': 25},  # missing 'city'
    {'name': 'Charlie', 'city': 'LA'}  # missing 'age'
]

df = pd.DataFrame(incomplete_records)
print(df)

Output:

      name   age city
0    Alice  30.0  NYC
1      Bob  25.0  NaN
2  Charlie   NaN   LA

This behavior is helpful when working with inconsistent API responses or user-submitted data, but always check for unexpected NaN values before analysis.
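A quick way to run that check is to count missing values per column and pull out the affected rows. A minimal sketch on the same records:

```python
import pandas as pd

incomplete_records = [
    {'name': 'Alice', 'age': 30, 'city': 'NYC'},
    {'name': 'Bob', 'age': 25},        # missing 'city'
    {'name': 'Charlie', 'city': 'LA'}  # missing 'age'
]

df = pd.DataFrame(incomplete_records)

# Number of missing values in each column
missing_per_column = df.isna().sum()

# Rows that have at least one missing value
rows_with_gaps = df[df.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_gaps)
```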

Conclusion

Creating DataFrames from dictionaries comes down to understanding your data’s structure:

  • Column-oriented dictionaries (keys = columns, values = lists): Use pd.DataFrame(dict) directly. This is the most common case.
  • Row-oriented data (list of dictionaries): Pass the list to pd.DataFrame(). Pandas infers columns from dictionary keys.
  • Nested dictionaries: Use pd.DataFrame(dict) for outer-keys-as-columns, or transpose for outer-keys-as-rows.
  • Uneven or messy data: Use pd.DataFrame.from_dict() with orient='index' for explicit control over how keys map to the DataFrame structure.

Start with the simplest approach—pd.DataFrame(your_dict)—and reach for from_dict() only when you need to handle edge cases or non-standard structures. The goal is readable, maintainable code, not clever one-liners.
