How to Create a Scatter Plot in Plotly
Key Insights
- Plotly Express provides a high-level API for creating scatter plots in one line, while Graph Objects offers granular control for complex visualizations
- Scatter plots in Plotly are interactive by default with built-in zoom, pan, and hover capabilities that require zero configuration
- For datasets exceeding 100,000 points, use scattergl instead of scatter to leverage WebGL rendering and maintain smooth performance
Introduction to Plotly Scatter Plots
Plotly stands out among Python visualization libraries for its interactive capabilities and publication-ready output. Scatter plots are fundamental for exploring relationships between continuous variables, identifying clusters, and detecting outliers in your data.
Plotly offers two distinct approaches for creating scatter plots. Plotly Express (px) provides a concise, declarative syntax ideal for rapid exploration and standard visualizations. Graph Objects (go) gives you fine-grained control over every visual element, necessary for complex multi-trace plots or highly customized dashboards.
Use scatter plots when you need to visualize correlation between two numeric variables, display distributions across categories, or create interactive exploratory tools for stakeholders who aren’t comfortable with code.
Basic Scatter Plot with Plotly Express
Plotly Express reduces scatter plot creation to a single function call. Here’s the minimal viable implementation:
import plotly.express as px
import pandas as pd
# Create sample data
df = pd.DataFrame({
    'hours_studied': [1, 2, 3, 4, 5, 6, 7, 8],
    'test_score': [65, 70, 75, 80, 82, 88, 90, 95],
    'student_type': ['part-time', 'full-time', 'part-time', 'full-time',
                     'full-time', 'full-time', 'part-time', 'full-time']
})
fig = px.scatter(df, x='hours_studied', y='test_score')
fig.show()
This creates a fully interactive scatter plot with automatic axis labels derived from column names. The plot includes hover tooltips, zoom controls, and pan functionality without additional configuration.
In practice you will usually plot an existing DataFrame rather than hand-built data. Plotly Express ships sample datasets that make this easy to try:
# Using the built-in iris dataset
df = px.data.iris()
fig = px.scatter(df, x='sepal_width', y='sepal_length')
fig.show()
The iris dataset comes bundled with Plotly Express, making it perfect for testing visualizations before applying them to your production data.
Customizing Scatter Plot Appearance
Basic plots rarely meet presentation standards. Plotly Express makes customization straightforward through function parameters:
df = px.data.iris()
fig = px.scatter(
    df,
    x='sepal_width',
    y='sepal_length',
    color='species',              # Color by category
    size='petal_length',          # Marker size by value
    hover_data=['petal_width'],   # Additional hover info
    title='Iris Dataset: Sepal Dimensions by Species',
    labels={
        'sepal_width': 'Sepal Width (cm)',
        'sepal_length': 'Sepal Length (cm)'
    },
    color_discrete_map={          # Custom color scheme
        'setosa': '#FF6B6B',
        'versicolor': '#4ECDC4',
        'virginica': '#45B7D1'
    }
)
# Further customization via update methods
fig.update_traces(
    marker=dict(
        line=dict(width=1, color='DarkSlateGray'),
        opacity=0.8
    )
)
fig.update_layout(
    font=dict(size=14),
    plot_bgcolor='rgba(240, 240, 240, 0.5)',
    hovermode='closest'
)
fig.show()
The color parameter automatically creates a legend and assigns distinct colors per category. The size parameter maps a numeric column to marker size, creating a bubble chart effect. Use labels to override default axis titles with more descriptive text.
The update_traces() and update_layout() methods provide post-creation customization. This two-step approach—initial creation with Express, refinement with update methods—balances convenience and control.
Advanced Features: Hover Data and Interactivity
Default hover tooltips show x and y values, but custom templates provide richer context:
df = pd.DataFrame({
    'product': ['Widget A', 'Widget B', 'Widget C', 'Widget D', 'Widget E'],
    'revenue': [120000, 95000, 180000, 150000, 110000],
    'units_sold': [450, 380, 620, 510, 425],
    'profit_margin': [0.32, 0.28, 0.35, 0.31, 0.29]
})
fig = px.scatter(
    df,
    x='units_sold',
    y='revenue',
    size='profit_margin',
    hover_name='product',
    hover_data={
        'units_sold': ':,',        # Thousand separator
        'revenue': ':$,.0f',       # Currency format
        'profit_margin': ':.1%'    # Percentage format
    },
    title='Product Performance Analysis'
)
fig.update_traces(
    hovertemplate='<b>%{hovertext}</b><br>' +
                  'Units Sold: %{x:,}<br>' +
                  'Revenue: $%{y:,.0f}<br>' +
                  'Profit Margin: %{marker.size:.1%}<br>' +
                  '<extra></extra>'  # Removes secondary box
)
fig.show()
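The hovertemplate uses a specific syntax: %{x} references x-values, %{y} references y-values, and %{marker.size} accesses the size dimension. Format specifiers after the colon follow d3-format syntax, which closely resembles Python's format specification mini-language for common cases such as ,.0f and .1%. The <extra></extra> tag removes the default secondary hover box that shows trace names. Because update_traces() sets hovertemplate explicitly, it replaces the template Plotly Express generated from hover_data; the hover_name column remains available as %{hovertext}.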
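Hover text is formatted in the browser, so a quick way to preview a specifier is to try it with Python's built-in format(). This is only a sketch: Plotly uses d3-format rather than Python, but for the simple specifiers in this example the two should produce the same text. The numbers are illustrative.

```python
# Preview the specifiers used in the hover template with plain Python.
units = format(1234567, ",")             # thousands separator
revenue = "$" + format(180000, ",.0f")   # fixed-point, no decimals, with separator
margin = format(0.35, ".1%")             # percentage with one decimal place

print(units)
print(revenue)
print(margin)
```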
For datasets requiring drill-down capabilities, store additional metadata in the DataFrame and reference it in hover templates using customdata:
df['details_url'] = df['product'].apply(lambda x: f"https://example.com/{x.replace(' ', '-')}")
fig = px.scatter(df, x='units_sold', y='revenue', custom_data=['details_url'])
fig.update_traces(hovertemplate='%{customdata[0]}')
Using Graph Objects for Fine-Grained Control
Graph Objects become necessary when you need multiple traces with different styling, complex annotations, or subplot arrangements:
import plotly.graph_objects as go
import numpy as np
# Generate sample data for three product lines
np.random.seed(42)
quarters = np.arange(1, 13)
fig = go.Figure()
# Add first product line
fig.add_trace(go.Scatter(
    x=quarters,
    y=50 + np.random.randn(12) * 10,
    mode='markers',
    name='Product Line A',
    marker=dict(
        size=12,
        color='rgba(255, 107, 107, 0.8)',
        symbol='circle',
        line=dict(width=2, color='DarkRed')
    )
))
# Add second product line with different marker style
fig.add_trace(go.Scatter(
    x=quarters,
    y=75 + np.random.randn(12) * 8,
    mode='markers',
    name='Product Line B',
    marker=dict(
        size=12,
        color='rgba(78, 205, 196, 0.8)',
        symbol='diamond',
        line=dict(width=2, color='DarkCyan')
    )
))
# Add trend line
fig.add_trace(go.Scatter(
    x=quarters,
    y=50 + quarters * 2,
    mode='lines',
    name='Growth Target',
    line=dict(color='gray', width=2, dash='dash')
))
fig.update_layout(
    title='Quarterly Sales Performance',
    xaxis_title='Quarter',
    yaxis_title='Revenue ($M)',
    legend=dict(x=0.02, y=0.98),
    hovermode='x unified'
)
fig.show()
The mode parameter controls whether traces render as markers, lines, or both ('markers+lines'). The hovermode='x unified' setting displays all trace values at a given x-coordinate in a single hover box—useful for comparing multiple series.
Graph Objects require more verbose code but expose every configurable property. Use them when Plotly Express limitations become apparent, typically when building dashboards or publication figures.
Styling and Layout Options
Professional visualizations require attention to typography, spacing, and visual hierarchy:
df = px.data.gapminder().query("year == 2007")
fig = px.scatter(
    df,
    x='gdpPercap',
    y='lifeExp',
    size='pop',
    color='continent',
    log_x=True,  # Logarithmic scale for GDP
    size_max=60
)
fig.update_layout(
    template='plotly_white',  # Clean theme
    title={
        'text': 'Global Life Expectancy vs GDP (2007)',
        'x': 0.5,
        'xanchor': 'center',
        'font': {'size': 20, 'family': 'Arial Black'}
    },
    xaxis={
        'title': 'GDP per Capita (log scale)',
        'gridcolor': 'lightgray',
        'showline': True,
        'linewidth': 2,
        'linecolor': 'black'
    },
    yaxis={
        'title': 'Life Expectancy (years)',
        'gridcolor': 'lightgray',
        'showline': True,
        'linewidth': 2,
        'linecolor': 'black'
    },
    legend={
        'title': {'text': 'Continent'},
        'orientation': 'v',
        'x': 1.02,
        'y': 1
    },
    width=1000,
    height=600
)
# Add annotation (on a log axis, x is given as log10 of the data value)
fig.add_annotation(
    x=np.log10(40000),
    y=50,
    text="High GDP doesn't<br>guarantee longevity",
    showarrow=True,
    arrowhead=2,
    arrowsize=1,
    arrowwidth=2,
    arrowcolor='red',
    ax=-80,
    ay=-40
)
fig.show()
Plotly includes several built-in templates: plotly, plotly_white, plotly_dark, ggplot2, seaborn, and simple_white. Choose templates based on your presentation medium—plotly_white works well for reports, while plotly_dark suits presentations.
Export plots for static use with fig.write_image('plot.png'), which requires the kaleido package to be installed, or use fig.write_html('plot.html') for interactive sharing. By default the HTML export embeds the full Plotly.js library, producing a self-contained interactive file at the cost of several megabytes.
Best Practices and Performance Tips
Choose the right tool: Start with Plotly Express for exploration and standard plots. Switch to Graph Objects only when you need features Express doesn’t support, like multiple y-axes or custom shapes.
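Optimize large datasets: For scatter plots exceeding 100,000 points, render with WebGL instead of SVG. In Plotly Express, pass render_mode='webgl' to px.scatter(); in Graph Objects, use go.Scattergl in place of go.Scatter. The WebGL renderer stays interactive at point counts where SVG rendering becomes sluggish:
# large_df: any DataFrame with numeric 'x' and 'y' columns
# SVG rendering becomes sluggish with large data
# fig = px.scatter(large_df, x='x', y='y', render_mode='svg') # Slow
# WebGL rendering handles large datasets efficiently
fig = px.scatter(large_df, x='x', y='y', render_mode='webgl') # Fast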
Downsample when appropriate: If your scatter plot contains millions of points but will be viewed at screen resolution, consider downsampling or aggregating data before visualization. A 1920x1080 display can’t show more than ~2 million distinct pixels.
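Two common ways to reduce the point count are random sampling and grid aggregation. Both are sketched below with pandas on hypothetical random data; the sample size and bin width are arbitrary choices for illustration, not recommendations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Hypothetical large dataset
big = pd.DataFrame({"x": rng.normal(size=200_000), "y": rng.normal(size=200_000)})

# Option 1: uniform random sample, which keeps the overall shape of the cloud
sampled = big.sample(n=20_000, random_state=0)

# Option 2: snap points to a coarse grid and keep one row per occupied cell
binned = (
    big.assign(x_bin=big["x"].round(1), y_bin=big["y"].round(1))
       .groupby(["x_bin", "y_bin"])
       .size()
       .reset_index(name="count")
)

print(len(big), len(sampled), len(binned))
```

The binned frame can then be plotted with x_bin and y_bin as coordinates and count mapped to marker size or color, so density is preserved even though individual points are not.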
Use categorical colors sparingly: Limit color-coded categories to 8-10 maximum. Beyond this, colors become difficult to distinguish and legends grow unwieldy.
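One way to stay under that limit is to keep the most frequent categories and fold the rest into a single bucket before plotting. A minimal pandas sketch, with a hypothetical label column and an assumed cap of three categories:

```python
import pandas as pd

# Hypothetical category column with a long tail of rare values
labels = pd.Series(["a"] * 5 + ["b"] * 4 + ["c"] * 3 + ["d", "e", "f"])

max_categories = 3  # assumed cap for this example
top = labels.value_counts().nlargest(max_categories).index

# Keep the most frequent labels and fold everything else into "Other"
lumped = labels.where(labels.isin(top), "Other")

print(lumped.value_counts())
```

The lumped column can be passed to the color argument in place of the original one.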
Test interactivity: Always test zoom, pan, and hover behavior with representative data. Hover templates that work for small datasets may perform poorly with thousands of points.
Version your plots: Save plot configurations as functions or classes, especially for recurring reports. This makes updates consistent across multiple visualizations and simplifies maintenance.
Plotly scatter plots excel at creating interactive, publication-ready visualizations with minimal code. Master both Express and Graph Objects APIs to handle everything from quick exploratory plots to complex multi-trace dashboards.