Pandas - Create DataFrame from List
Key Insights
- DataFrames can be created from simple lists, lists of lists, or lists of dictionaries, each serving different data structure needs
- Column names and indexes can be explicitly defined during DataFrame creation or modified afterward for better data organization
- Understanding the orientation of your list data (row-wise vs column-wise) is critical for correctly structuring your DataFrame
Creating DataFrames from Simple Lists
A simple Python list becomes a single-column DataFrame by default. This is the most straightforward conversion when you have a one-dimensional dataset.
import pandas as pd
# Single column from a simple list
fruits = ['apple', 'banana', 'orange', 'grape', 'mango']
df = pd.DataFrame(fruits)
print(df)
Output:
        0
0   apple
1  banana
2  orange
3   grape
4   mango
To assign a meaningful column name instead of the default column label 0:
df = pd.DataFrame(fruits, columns=['fruit_name'])
print(df)
Output:
  fruit_name
0      apple
1     banana
2     orange
3      grape
4      mango
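Column names and index labels can also be changed after the DataFrame exists. The following is a minimal sketch of that; the `item1`, `item2`, ... index labels are hypothetical and chosen only for illustration.

```python
import pandas as pd

fruits = ['apple', 'banana', 'orange', 'grape', 'mango']
df = pd.DataFrame(fruits)

# Rename the default column label 0 after creation
df = df.rename(columns={0: 'fruit_name'})

# Replace the default RangeIndex with custom labels, one per row
# (the 'itemN' labels are made up for this example)
df.index = [f'item{i}' for i in range(1, len(df) + 1)]

print(df)
```

Assigning to `df.index` requires exactly one label per row; a list of the wrong length raises a ValueError.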
Creating DataFrames from Lists of Lists
Lists of lists represent tabular data where each inner list corresponds to a row. This is a common pattern for creating multi-column DataFrames, and every inner list should have the same number of elements as there are column names.
# Each inner list is a row
data = [
    ['Alice', 25, 'New York'],
    ['Bob', 30, 'San Francisco'],
    ['Charlie', 35, 'Los Angeles'],
    ['Diana', 28, 'Chicago']
]
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
Output:
      Name  Age           City
0    Alice   25       New York
1      Bob   30  San Francisco
2  Charlie   35    Los Angeles
3    Diana   28        Chicago
You can also specify custom index values:
df = pd.DataFrame(
    data,
    columns=['Name', 'Age', 'City'],
    index=['emp001', 'emp002', 'emp003', 'emp004']
)
print(df)
Output:
           Name  Age           City
emp001    Alice   25       New York
emp002      Bob   30  San Francisco
emp003  Charlie   35    Los Angeles
emp004    Diana   28        Chicago
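Orientation matters when the data arrives as separate lists, one per field, rather than one list per row. One possible approach, sketched below, is to combine them with the built-in `zip()` so that each resulting tuple is a row. The variable names `names`, `ages`, and `cities` are assumptions made for this example.

```python
import pandas as pd

# Hypothetical column-wise lists: one list per field
names = ['Alice', 'Bob', 'Charlie', 'Diana']
ages = [25, 30, 35, 28]
cities = ['New York', 'San Francisco', 'Los Angeles', 'Chicago']

# zip() pairs the i-th element of every list, yielding one tuple per row
rows = list(zip(names, ages, cities))
df = pd.DataFrame(rows, columns=['Name', 'Age', 'City'])
print(df)

# Passing the lists directly treats each list as a row instead
wide = pd.DataFrame([names, ages, cities])
print(wide.shape)
```

Here `df` has four rows and three columns, while `wide` has three rows and four columns, which is usually not what is wanted for this kind of data.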
Creating DataFrames from Lists of Dictionaries
A list of dictionaries is the most flexible input: each dictionary is one row, and its keys become the column names. The dictionaries do not need identical keys; any key that a row lacks is filled with NaN.
# Each dictionary is a row
records = [
    {'product': 'Laptop', 'price': 1200, 'quantity': 5},
    {'product': 'Mouse', 'price': 25, 'quantity': 50},
    {'product': 'Keyboard', 'price': 75, 'quantity': 30},
    {'product': 'Monitor', 'price': 300, 'quantity': 15}
]
df = pd.DataFrame(records)
print(df)
Output:
    product  price  quantity
0    Laptop   1200         5
1     Mouse     25        50
2  Keyboard     75        30
3   Monitor    300        15
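With a list of dictionaries, the `columns` argument can also be used to choose which keys to keep and in what order. The sketch below reuses the same records and keeps only two of the three keys.

```python
import pandas as pd

records = [
    {'product': 'Laptop', 'price': 1200, 'quantity': 5},
    {'product': 'Mouse', 'price': 25, 'quantity': 50},
    {'product': 'Keyboard', 'price': 75, 'quantity': 30},
    {'product': 'Monitor', 'price': 300, 'quantity': 15}
]

# Only the listed keys become columns, in the order given;
# 'price' is dropped because it is not listed
df = pd.DataFrame(records, columns=['quantity', 'product'])
print(df)
```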
Handling missing values with lists of dictionaries (note that quantity becomes a float column, because NaN is a float value):
records_incomplete = [
    {'product': 'Laptop', 'price': 1200, 'quantity': 5, 'warranty': '2 years'},
    {'product': 'Mouse', 'price': 25, 'quantity': 50},
    {'product': 'Keyboard', 'price': 75, 'warranty': '1 year'},
    {'product': 'Monitor', 'price': 300, 'quantity': 15}
]
df = pd.DataFrame(records_incomplete)
print(df)
Output:
    product  price  quantity warranty
0    Laptop   1200       5.0  2 years
1     Mouse     25      50.0      NaN
2  Keyboard     75       NaN   1 year
3   Monitor    300      15.0      NaN
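If the float conversion of quantity is unwanted, one option is pandas' nullable integer dtype `Int64` (capital I), which can hold missing values alongside integers. This is a sketch of one way to clean up the result; the `'none'` placeholder for a missing warranty is an assumption made for illustration.

```python
import pandas as pd

records_incomplete = [
    {'product': 'Laptop', 'price': 1200, 'quantity': 5, 'warranty': '2 years'},
    {'product': 'Mouse', 'price': 25, 'quantity': 50},
    {'product': 'Keyboard', 'price': 75, 'warranty': '1 year'},
    {'product': 'Monitor', 'price': 300, 'quantity': 15}
]
df = pd.DataFrame(records_incomplete)

# The missing quantity forced the column to float64
dtype_before = df['quantity'].dtype
print(dtype_before)

# Nullable Int64 keeps whole numbers and marks the gap as <NA>
df['quantity'] = df['quantity'].astype('Int64')

# Replace missing warranty text with a placeholder (hypothetical choice)
df['warranty'] = df['warranty'].fillna('none')

print(df)
```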
Creating Column-Oriented DataFrames from Dictionary of Lists
When your data is organized by columns rather than rows, use a dictionary where keys are column names and values are lists of column data.
# Dictionary of lists - each list is a column
data_dict = {
    'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'],
    'temperature': [72, 75, 68, 71],
    'humidity': [45, 50, 55, 48]
}
df = pd.DataFrame(data_dict)
print(df)
Output:
         date  temperature  humidity
0  2024-01-01           72        45
1  2024-01-02           75        50
2  2024-01-03           68        55
3  2024-01-04           71        48
All lists in the dictionary must have the same length, or pandas will raise a ValueError:
# This will raise an error
try:
    bad_data = {
        'col1': [1, 2, 3],
        'col2': [4, 5]  # Different length
    }
    df = pd.DataFrame(bad_data)
except ValueError as e:
    print(f"Error: {e}")
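When the lists really do differ in length and padding with missing values is acceptable, one possible workaround is to wrap each list in a `pd.Series` first. pandas then aligns the Series on their index and fills the gaps with NaN. The sketch below assumes that padding at the end is the desired behavior.

```python
import pandas as pd

uneven = {
    'col1': [1, 2, 3],
    'col2': [4, 5]  # One element shorter
}

# Each Series gets its own 0..n-1 index; pandas aligns on those labels
# and fills the positions that col2 lacks with NaN
df = pd.DataFrame({name: pd.Series(values) for name, values in uneven.items()})
print(df)
```

As with missing dictionary keys, the padded column becomes float64 because it now contains NaN.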
Using List Comprehensions for Dynamic DataFrame Creation
List comprehensions enable dynamic DataFrame generation from computed values or filtered data.
# Generate DataFrame from computed values
import math
angles = [0, 30, 45, 60, 90]
trig_data = [
    {
        'angle': angle,
        'radians': math.radians(angle),
        'sine': round(math.sin(math.radians(angle)), 4),
        'cosine': round(math.cos(math.radians(angle)), 4)
    }
    for angle in angles
]
df = pd.DataFrame(trig_data)
print(df)
Output:
   angle   radians    sine  cosine
0      0  0.000000  0.0000  1.0000
1     30  0.523599  0.5000  0.8660
2     45  0.785398  0.7071  0.7071
3     60  1.047198  0.8660  0.5000
4     90  1.570796  1.0000  0.0000
Filtering data during DataFrame creation:
# Create DataFrame from filtered list
numbers = range(1, 21)
even_squares = [
    {'number': n, 'square': n**2}
    for n in numbers
    if n % 2 == 0
]
df = pd.DataFrame(even_squares)
print(df)
Output:
   number  square
0       2       4
1       4      16
2       6      36
3       8      64
4      10     100
5      12     144
6      14     196
7      16     256
8      18     324
9      20     400
Specifying Data Types During Creation
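Control data types explicitly to optimize memory usage and ensure correct data handling. When each column needs a different type, a common approach is to build the DataFrame first and then convert the columns with astype and pd.to_datetime.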
# Specify data types
data = [
    ['001', '100', '2024-01-15'],
    ['002', '200', '2024-01-16'],
    ['003', '150', '2024-01-17']
]
df = pd.DataFrame(
    data,
    columns=['order_id', 'amount', 'date']
)
# Convert column types after creation with astype
df = df.astype({
    'order_id': 'string',
    'amount': 'int64'
})
df['date'] = pd.to_datetime(df['date'])
print(df.dtypes)
print("\n", df)
Output:
order_id string
amount int64
date datetime64[ns]
dtype: object
order_id amount date
0 001 100 2024-01-15
1 002 200 2024-01-16
2 003 150 2024-01-17
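The DataFrame constructor also accepts a `dtype` argument, but it applies one type to every column, so it fits best when all columns are the same kind of data. The sketch below uses made-up sensor readings (`readings` is a hypothetical name) to show the difference from the default integer type.

```python
import pandas as pd

# Hypothetical numeric readings: [temperature, humidity] per row
readings = [[72, 45], [75, 50], [68, 55]]

# Default inference gives int64 columns
default_df = pd.DataFrame(readings, columns=['temperature', 'humidity'])

# dtype= sets a single type for all columns at creation time
compact_df = pd.DataFrame(
    readings,
    columns=['temperature', 'humidity'],
    dtype='float32'
)

print(default_df.dtypes)
print(compact_df.dtypes)
```

For mixed column types, the per-column `astype` mapping remains the more practical route.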
Creating DataFrames with MultiIndex from Lists
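A hierarchical index (MultiIndex) labels each row with more than one level. Starting from a list of lists, build a regular DataFrame and then promote the grouping columns to index levels.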
# Create MultiIndex DataFrame
data = [
    ['Q1', 'Jan', 1000, 800],
    ['Q1', 'Feb', 1200, 900],
    ['Q1', 'Mar', 1100, 850],
    ['Q2', 'Apr', 1300, 950],
    ['Q2', 'May', 1400, 1000],
    ['Q2', 'Jun', 1350, 980]
]
df = pd.DataFrame(
    data,
    columns=['Quarter', 'Month', 'Revenue', 'Costs']
)
# Set MultiIndex
df = df.set_index(['Quarter', 'Month'])
print(df)
Output:
               Revenue  Costs
Quarter Month
Q1      Jan       1000    800
        Feb       1200    900
        Mar       1100    850
Q2      Apr       1300    950
        May       1400   1000
        Jun       1350    980
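If the index labels already live in separate lists, the MultiIndex can instead be built directly with `pd.MultiIndex.from_arrays` and passed to the constructor. The sketch below assumes the quarter and month labels are held in two parallel lists (`quarters` and `months` are names chosen for this example) and also shows two basic lookups with `.loc`.

```python
import pandas as pd

# Hypothetical parallel lists of index labels
quarters = ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2']
months = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun']

# One [Revenue, Costs] pair per row
values = [
    [1000, 800], [1200, 900], [1100, 850],
    [1300, 950], [1400, 1000], [1350, 980]
]

index = pd.MultiIndex.from_arrays([quarters, months], names=['Quarter', 'Month'])
df = pd.DataFrame(values, index=index, columns=['Revenue', 'Costs'])

# Select every row under the outer label 'Q1'
q1 = df.loc['Q1']
print(q1)

# Select a single cell with a full (Quarter, Month) key
may_revenue = df.loc[('Q2', 'May'), 'Revenue']
print(may_revenue)
```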
Creating DataFrames from lists is fundamental to pandas workflows. Choose the structure that matches how your data is organized: a simple list for a single column, a list of lists for row-oriented data, a list of dictionaries for flexible schemas that may have missing values, and a dictionary of lists for column-oriented data. Understanding these patterns enables efficient data ingestion and manipulation in your data processing pipelines.