Pandas - Create DataFrame from List | Application Architect

Key Insights

DataFrames can be created from simple lists, lists of lists, or lists of dictionaries, each serving different data structure needs
Column names and indexes can be explicitly defined during DataFrame creation or modified afterward for better data organization
Understanding the orientation of your list data (row-wise vs column-wise) is critical for correctly structuring your DataFrame
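The second point above, changing labels after creation, can be sketched with standard pandas attributes and methods. The scores data and the labels used here are made-up examples, not part of the original material.

```python
import pandas as pd

# Hypothetical example data: two rows, two columns
scores = [['Alice', 91], ['Bob', 85]]
df = pd.DataFrame(scores)  # column labels default to 0 and 1

# Replace all column labels at once (the list length must match the column count)
df.columns = ['student', 'score']

# Rename selected labels with a mapping; labels not in the mapping are unchanged
df = df.rename(columns={'score': 'exam_score'})

# Replace the default 0..n-1 row index with custom labels
df.index = ['s1', 's2']
print(df)
```

Assigning to df.columns replaces every label, so it suits a full rename; rename with a mapping is safer when only a few labels change.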

Creating DataFrames from Simple Lists

A simple Python list becomes a single-column DataFrame by default. This is the most straightforward conversion when you have a one-dimensional dataset.

import pandas as pd

# Single column from a simple list
fruits = ['apple', 'banana', 'orange', 'grape', 'mango']
df = pd.DataFrame(fruits)
print(df)

Output:

        0
0   apple
1  banana
2  orange
3   grape
4   mango

To replace the default column label (0) with a meaningful name:

df = pd.DataFrame(fruits, columns=['fruit_name'])
print(df)

Output:

  fruit_name
0      apple
1     banana
2     orange
3      grape
4      mango
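Orientation matters even for a flat list. A minimal sketch using the same fruits list: passed directly, the list is read as one column; wrapped in an outer list, it is read as a single row.

```python
import pandas as pd

fruits = ['apple', 'banana', 'orange', 'grape', 'mango']

# A flat list is read as one column: 5 rows x 1 column
as_column = pd.DataFrame(fruits, columns=['fruit_name'])

# Wrapping the list in another list makes it one row: 1 row x 5 columns
as_row = pd.DataFrame([fruits])

print(as_column.shape)  # (5, 1)
print(as_row.shape)     # (1, 5)
```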

Creating DataFrames from Lists of Lists

Lists of lists represent tabular data where each inner list corresponds to a row. This is the most common pattern for creating multi-column DataFrames.

# Each inner list is a row
data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'San Francisco'],
    ['Charlie', 35, 'Los Angeles'],
    ['Diana', 28, 'Chicago']
]

df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)

Output:

      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
3    Diana   28        Chicago

You can also specify custom index values:

df = pd.DataFrame(
    data, 
    columns=['Name', 'Age', 'City'],
    index=['emp001', 'emp002', 'emp003', 'emp004']
)
print(df)

Output:

            Name  Age           City
emp001     Alice   25       New York
emp002       Bob   30  San Francisco
emp003   Charlie   35    Los Angeles
emp004     Diana   28        Chicago
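Data often arrives as separate lists, one per field, rather than as rows. A minimal sketch, assuming the lists are the same length: zip pairs them into one tuple per row, and pandas treats each tuple like an inner list. The variable names are illustrative.

```python
import pandas as pd

# Hypothetical parallel lists, one per field
names = ['Alice', 'Bob', 'Charlie', 'Diana']
ages = [25, 30, 35, 28]
cities = ['New York', 'San Francisco', 'Los Angeles', 'Chicago']

# zip() yields one (name, age, city) tuple per row; list() materialises them
rows = list(zip(names, ages, cities))
df = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])
print(df)
```

Note that zip stops silently at the shortest list. On Python 3.10 and later, zip(..., strict=True) raises a ValueError instead, which is usually preferable when the lengths are expected to match.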

Creating DataFrames from Lists of Dictionaries

Lists of dictionaries are the most flexible input: each dictionary is a row, and its keys become column names. Rows do not need identical keys; any key missing from a dictionary becomes NaN in that row.

# Each dictionary is a row
records = [
    {'product': 'Laptop', 'price': 1200, 'quantity': 5},
    {'product': 'Mouse', 'price': 25, 'quantity': 50},
    {'product': 'Keyboard', 'price': 75, 'quantity': 30},
    {'product': 'Monitor', 'price': 300, 'quantity': 15}
]

df = pd.DataFrame(records)
print(df)

Output:

    product  price  quantity
0    Laptop   1200         5
1     Mouse     25        50
2  Keyboard     75        30
3   Monitor    300        15

Handling missing values with lists of dictionaries:

records_incomplete = [
    {'product': 'Laptop', 'price': 1200, 'quantity': 5, 'warranty': '2 years'},
    {'product': 'Mouse', 'price': 25, 'quantity': 50},
    {'product': 'Keyboard', 'price': 75, 'warranty': '1 year'},
    {'product': 'Monitor', 'price': 300, 'quantity': 15}
]

df = pd.DataFrame(records_incomplete)
print(df)

Output:

    product  price  quantity  warranty
0    Laptop   1200       5.0   2 years
1     Mouse     25      50.0       NaN
2  Keyboard     75       NaN    1 year
3   Monitor    300      15.0       NaN
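Two follow-ups are common with incomplete records: choosing which keys become columns, and replacing the NaN gaps. A sketch, assuming the records_incomplete data above; the fill values ('none' and 0) are illustrative defaults, not a recommendation.

```python
import pandas as pd

records_incomplete = [
    {'product': 'Laptop', 'price': 1200, 'quantity': 5, 'warranty': '2 years'},
    {'product': 'Mouse', 'price': 25, 'quantity': 50},
    {'product': 'Keyboard', 'price': 75, 'warranty': '1 year'},
    {'product': 'Monitor', 'price': 300, 'quantity': 15}
]

# columns= keeps only the listed keys, in the listed order ('price' is dropped)
df = pd.DataFrame(records_incomplete, columns=['product', 'warranty', 'quantity'])

# Fill gaps with per-column defaults (illustrative values)
df = df.fillna({'warranty': 'none', 'quantity': 0})

# quantity was float64 only because of NaN; restore integers now the gaps are filled
df['quantity'] = df['quantity'].astype('int64')
print(df)
```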

Creating Column-Oriented DataFrames from Dictionary of Lists

When your data is organized by columns rather than rows, use a dictionary where keys are column names and values are lists of column data.

# Dictionary of lists - each list is a column
data_dict = {
    'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'],
    'temperature': [72, 75, 68, 71],
    'humidity': [45, 50, 55, 48]
}

df = pd.DataFrame(data_dict)
print(df)

Output:

         date  temperature  humidity
0  2024-01-01           72        45
1  2024-01-02           75        50
2  2024-01-03           68        55
3  2024-01-04           71        48
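The dictionary above is keyed by column. When a dictionary is keyed by row instead, DataFrame.from_dict with orient='index' treats each key as a row label and each list as that row's values. A short sketch with illustrative values reusing the weather fields:

```python
import pandas as pd

# Each key is a row label; each list holds that row's values
rows_by_date = {
    '2024-01-01': [72, 45],
    '2024-01-02': [75, 50],
}

df = pd.DataFrame.from_dict(
    rows_by_date,
    orient='index',
    columns=['temperature', 'humidity'],  # columns= is only accepted with orient='index'
)
print(df)
```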

All lists in the dictionary must have the same length, or pandas will raise a ValueError:

# This will raise an error
try:
    bad_data = {
        'col1': [1, 2, 3],
        'col2': [4, 5]  # Different length
    }
    df = pd.DataFrame(bad_data)
except ValueError as e:
    print(f"Error: {e}")
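If columns really do have different lengths, one option is to wrap each list in a pd.Series. pandas then aligns the Series on their index labels instead of requiring equal lengths, and fills positions missing from a shorter Series with NaN. A minimal sketch:

```python
import pandas as pd

# Series are aligned on their index (0, 1, 2 here) rather than length-checked
uneven = {
    'col1': pd.Series([1, 2, 3]),
    'col2': pd.Series([4, 5]),  # no value for index 2, so that cell becomes NaN
}
df = pd.DataFrame(uneven)
print(df)
```

Use this only when the shorter column genuinely has missing trailing values; it pads silently rather than flagging what may be a data error.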

Using List Comprehensions for Dynamic DataFrame Creation

List comprehensions enable dynamic DataFrame generation from computed values or filtered data.

# Generate DataFrame from computed values
import math

angles = [0, 30, 45, 60, 90]
trig_data = [
    {
        'angle': angle,
        'radians': math.radians(angle),
        'sine': round(math.sin(math.radians(angle)), 4),
        'cosine': round(math.cos(math.radians(angle)), 4)
    }
    for angle in angles
]

df = pd.DataFrame(trig_data)
print(df)

Output:

   angle   radians    sine  cosine
0      0  0.000000  0.0000  1.0000
1     30  0.523599  0.5000  0.8660
2     45  0.785398  0.7071  0.7071
3     60  1.047198  0.8660  0.5000
4     90  1.570796  1.0000  0.0000

Filtering data during DataFrame creation:

# Create DataFrame from filtered list
numbers = range(1, 21)
even_squares = [
    {'number': n, 'square': n**2}
    for n in numbers
    if n % 2 == 0
]

df = pd.DataFrame(even_squares)
print(df)

Output:

   number  square
0       2       4
1       4      16
2       6      36
3       8      64
4      10     100
5      12     144
6      14     196
7      16     256
8      18     324
9      20     400
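Comprehensions can also yield tuples instead of dictionaries. Each tuple is treated like an inner list, so the column names are supplied once through columns=. A sketch that rebuilds the even-squares table this way and checks it against the dictionary version:

```python
import pandas as pd

# Each tuple is one row: (number, square)
even_square_rows = [(n, n**2) for n in range(1, 21) if n % 2 == 0]
df_tuples = pd.DataFrame(even_square_rows, columns=['number', 'square'])

# The dictionary-based construction used above
df_dicts = pd.DataFrame(
    [{'number': n, 'square': n**2} for n in range(1, 21) if n % 2 == 0]
)

print(df_tuples.equals(df_dicts))  # True
```

Dictionaries keep each name next to its value, which is easier to read; tuples avoid repeating the keys on every row.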

Specifying Data Types During Creation

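Set data types explicitly so that values behave correctly and memory use stays predictable. In this example every value starts as a string: order IDs stay strings so leading zeros such as 001 are preserved, amounts become integers, and dates become datetimes.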

# Specify data types
data = [
    ['001', '100', '2024-01-15'],
    ['002', '200', '2024-01-16'],
    ['003', '150', '2024-01-17']
]

df = pd.DataFrame(
    data,
    columns=['order_id', 'amount', 'date']
)

# The list holds only strings, so convert column types right after creation
df = df.astype({
    'order_id': 'string',  # keep leading zeros such as '001'
    'amount': 'int64'
})
df['date'] = pd.to_datetime(df['date'])

print(df.dtypes)
print()
print(df)

Output:

order_id            string
amount               int64
date        datetime64[ns]
dtype: object

  order_id  amount       date
0      001     100 2024-01-15
1      002     200 2024-01-16
2      003     150 2024-01-17
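The dtype argument of pd.DataFrame does apply at construction time, but it takes a single dtype for every column, so it suits homogeneous data; mixed columns like the ones above still need astype afterwards. A sketch with illustrative numeric readings, comparing memory use against the float64 default:

```python
import pandas as pd

readings = [[1.5, 2.0], [3.25, 4.0], [5.0, 6.5]]

# One dtype applied to all columns at construction time
df32 = pd.DataFrame(readings, columns=['x', 'y'], dtype='float32')
df64 = pd.DataFrame(readings, columns=['x', 'y'])  # Python floats default to float64

print(df32.dtypes)
# Bytes for the data columns only: 2 columns x 3 rows x 4 bytes vs x 8 bytes
print(df32.memory_usage(index=False).sum(), df64.memory_usage(index=False).sum())  # 24 48
```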

Creating DataFrames with MultiIndex from Lists

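A MultiIndex (hierarchical index) labels each row with several levels at once. With row-oriented list data, a common approach is to build a flat DataFrame first and then promote the grouping columns to index levels with set_index: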

# Create MultiIndex DataFrame
data = [
    ['Q1', 'Jan', 1000, 800],
    ['Q1', 'Feb', 1200, 900],
    ['Q1', 'Mar', 1100, 850],
    ['Q2', 'Apr', 1300, 950],
    ['Q2', 'May', 1400, 1000],
    ['Q2', 'Jun', 1350, 980]
]

df = pd.DataFrame(
    data,
    columns=['Quarter', 'Month', 'Revenue', 'Costs']
)

# Set MultiIndex
df = df.set_index(['Quarter', 'Month'])
print(df)

Output:

               Revenue  Costs
Quarter Month                
Q1      Jan       1000    800
        Feb       1200    900
        Mar       1100    850
Q2      Apr       1300    950
        May       1400   1000
        Jun       1350    980
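When the index labels and the values are already separate, the MultiIndex can instead be built up front with pd.MultiIndex.from_tuples and passed as index=, skipping set_index. A sketch using the same quarterly figures:

```python
import pandas as pd

# Index labels and row values kept as separate lists
index_labels = [('Q1', 'Jan'), ('Q1', 'Feb'), ('Q1', 'Mar'),
                ('Q2', 'Apr'), ('Q2', 'May'), ('Q2', 'Jun')]
values = [[1000, 800], [1200, 900], [1100, 850],
          [1300, 950], [1400, 1000], [1350, 980]]

# Build the hierarchical index directly, naming both levels
index = pd.MultiIndex.from_tuples(index_labels, names=['Quarter', 'Month'])
df = pd.DataFrame(values, index=index, columns=['Revenue', 'Costs'])

# Selecting on the outer level returns that quarter's months
print(df.loc['Q2'])
```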

Creating DataFrames from lists is fundamental to pandas workflows. Choose the list structure that matches your data organization: simple lists for single columns, lists of lists for row-oriented data, lists of dictionaries for flexible schemas with potential missing values, and dictionary of lists for column-oriented data. Understanding these patterns enables efficient data ingestion and manipulation in your data processing pipelines.
