How to Create a Bubble Chart in Plotly


Key Insights

  • Bubble charts visualize three to four dimensions of data simultaneously using x-position, y-position, bubble size, and optionally color—making them ideal for exploring correlations with magnitude or comparing entities across multiple metrics.
  • Plotly’s sizeref and sizemode parameters are critical for proper bubble scaling; incorrect values create either microscopic dots or overlapping blobs that obscure your data.
  • Bubble charts fail when you have too many data points (causing overlap) or when the size dimension doesn’t add meaningful information—in these cases, switch to scatter plots or alternative visualizations.

Introduction to Bubble Charts

Bubble charts extend traditional scatter plots by adding a third dimension through bubble size, with an optional fourth dimension represented by color. Each bubble’s position on the x and y axes represents two variables, while the bubble’s area encodes a third metric. This makes bubble charts particularly effective for comparing entities across multiple dimensions simultaneously.

Use bubble charts when you need to visualize relationships between three or more variables. Common scenarios include comparing products by price, sales volume, and profit margin; analyzing companies by revenue, growth rate, and market share; or exploring geographic data with population, GDP per capita, and life expectancy. The key requirement is that your size dimension should represent magnitude or importance—something where “bigger” or “smaller” carries meaningful information.

Bubble charts work best with 10-50 data points. Too few, and you’re wasting the visualization’s potential. Too many, and bubbles overlap into an unreadable mess.

Setting Up Your Environment

Install Plotly and its common companions using pip. Plotly works standalone, but you’ll typically use it alongside pandas for data manipulation and numpy for numerical operations.

pip install plotly pandas numpy

Verify your installation with a quick import check:

import plotly
import plotly.graph_objects as go
import plotly.express as px
import pandas as pd
import numpy as np

print(f"Plotly version: {plotly.__version__}")
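To check every dependency at once, including the optional kaleido package that static image export relies on (see the export section at the end), you can query installed package metadata with the standard library instead of importing each package. This is a minimal sketch; `installed_versions` is a hypothetical helper name, not part of Plotly.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package_name: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


for name, ver in installed_versions(["plotly", "pandas", "numpy", "kaleido"]).items():
    print(f"{name:8} {ver or 'NOT INSTALLED'}")
```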

Plotly offers two interfaces: plotly.express for quick, high-level visualizations and plotly.graph_objects for fine-grained control. We’ll use both, starting with graph_objects to understand the fundamentals.

Creating a Basic Bubble Chart

Let’s create a bubble chart analyzing fictional SaaS products across three dimensions: monthly cost, user satisfaction score, and market share.

import plotly.graph_objects as go

# Sample data: SaaS products
products = ['Product A', 'Product B', 'Product C', 'Product D', 'Product E',
            'Product F', 'Product G', 'Product H', 'Product I', 'Product J']
monthly_cost = [29, 49, 99, 149, 199, 79, 39, 129, 89, 159]
satisfaction = [4.2, 4.5, 4.7, 4.1, 4.8, 4.3, 3.9, 4.6, 4.4, 4.2]
market_share = [12, 8, 15, 5, 20, 10, 6, 18, 9, 14]

fig = go.Figure(data=[go.Scatter(
    x=monthly_cost,
    y=satisfaction,
    mode='markers',
    marker=dict(
        size=market_share,
        sizemode='diameter',
        sizeref=0.5,
        sizemin=4
    ),
    text=products,
    hovertemplate='<b>%{text}</b><br>' +
                  'Cost: $%{x}/month<br>' +
                  'Satisfaction: %{y}<br>' +
                  'Market Share: %{marker.size}%<extra></extra>'
)])

fig.update_layout(
    title='SaaS Product Comparison',
    xaxis_title='Monthly Cost ($)',
    yaxis_title='User Satisfaction (1-5)',
    showlegend=False,
    hovermode='closest'
)

fig.show()

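The sizemode='diameter' parameter treats each size value as the bubble's diameter rather than its area, so a product with twice the market share gets twice the diameter and four times the area; use sizemode='area' when you want the area to be proportional to the value. The sizeref parameter is a divisor applied to the size values, so smaller values create larger bubbles (here, sizeref=0.5 doubles every diameter compared with the default of 1). The sizemin parameter sets a minimum bubble size in pixels so that small values remain visible.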
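The effect of the two sizing modes can be made concrete with a little arithmetic. The sketch below assumes the scaling rule plotly.js applies to array-valued sizes: in diameter mode the rendered diameter is size / sizeref pixels, and in area mode it is 2 * sqrt(size / (2 * sizeref)) pixels. Both helper names are hypothetical, and sizemin clamping is ignored.

```python
import math


def rendered_diameter(size, sizeref, sizemode="diameter"):
    """Approximate on-screen diameter (px) of one bubble; ignores sizemin clamping."""
    if sizemode == "diameter":
        return size / sizeref
    if sizemode == "area":
        return 2 * math.sqrt(size / (2 * sizeref))
    raise ValueError(f"unknown sizemode: {sizemode!r}")


def sizeref_for(max_size, max_diameter_px, sizemode="area"):
    """sizeref that renders the largest size value at max_diameter_px."""
    if sizemode == "area":
        return 2.0 * max_size / max_diameter_px ** 2  # same rule of thumb as the next example
    return max_size / max_diameter_px


# Basic example above: sizemode='diameter', sizeref=0.5
d_big, d_small = rendered_diameter(20, 0.5), rendered_diameter(10, 0.5)
print(d_big, d_small)          # 40.0 20.0
print((d_big / d_small) ** 2)  # 4.0 -> twice the value, four times the area

# Area mode: scale so the largest value (20) renders at 50 px
ref = sizeref_for(20, 50)
print(round(rendered_diameter(20, ref, "area"), 6))  # 50.0
print(round((rendered_diameter(20, ref, "area") / rendered_diameter(10, ref, "area")) ** 2, 6))  # 2.0
```

In area mode, doubling a value doubles the bubble's area, which matches how readers tend to compare bubble sizes.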

Customizing Bubble Appearance

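Proper bubble sizing is crucial. With the defaults (sizemode='diameter', sizeref=1), each size value is used directly as a pixel diameter, so values in the thousands produce giant bubbles and fractional values produce invisible dots. Here's how to create visually balanced bubbles with custom colors and enhanced hover information: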

import plotly.graph_objects as go
import numpy as np

# Generate synthetic data (random values, for illustration)
np.random.seed(42)
n_products = 15

products = [f'Product {chr(65+i)}' for i in range(n_products)]
monthly_cost = np.random.randint(20, 200, n_products)
satisfaction = np.random.uniform(3.5, 5.0, n_products)
market_share = np.random.uniform(2, 25, n_products)

# Calculate proper sizeref
# Rule of thumb: sizeref = 2 * max(size) / (desired_max_diameter ** 2)
desired_max_diameter = 50
sizeref = 2.0 * max(market_share) / (desired_max_diameter ** 2)

fig = go.Figure(data=[go.Scatter(
    x=monthly_cost,
    y=satisfaction,
    mode='markers',
    marker=dict(
        size=market_share,
        sizemode='area',
        sizeref=sizeref,
        sizemin=4,
        color=satisfaction,  # Color by satisfaction
        colorscale='Viridis',
        showscale=True,
        colorbar=dict(title="Satisfaction"),
        line=dict(width=2, color='white'),
        opacity=0.8
    ),
    text=products,
    customdata=market_share,
    hovertemplate='<b>%{text}</b><br>' +
                  'Cost: $%{x}/month<br>' +
                  'Satisfaction: %{y:.2f}/5.0<br>' +
                  'Market Share: %{customdata:.1f}%<extra></extra>'
)])

fig.update_layout(
    title='Enhanced SaaS Product Analysis',
    xaxis_title='Monthly Cost ($)',
    yaxis_title='User Satisfaction Score',
    width=900,
    height=600,
    plot_bgcolor='rgba(240,240,240,0.5)'
)

fig.show()

The white border around bubbles (line=dict(width=2, color='white')) helps separate overlapping bubbles. Setting opacity=0.8 allows you to see through overlapping areas. The customdata parameter lets you store additional information for hover templates without affecting the visualization.

Adding a Fourth Dimension with Color

Color encoding adds a fourth dimension to your bubble chart. Use continuous color scales for numerical data and discrete colors for categories.

import plotly.express as px
import pandas as pd
import numpy as np

# Create dataset with regional data
np.random.seed(42)
n = 40

df = pd.DataFrame({
    'product': [f'Product {i}' for i in range(n)],
    'price': np.random.randint(10, 300, n),
    'satisfaction': np.random.uniform(3.0, 5.0, n),
    'users': np.random.randint(100, 10000, n),
    'region': np.random.choice(['North America', 'Europe', 'Asia', 'Latin America'], n),
    'growth_rate': np.random.uniform(-10, 50, n)
})

# Using plotly.express for categorical colors
fig = px.scatter(df, 
                 x='price', 
                 y='satisfaction',
                 size='users',
                 color='region',
                 hover_name='product',
                 size_max=40,
                 title='Product Analysis by Region',
                 labels={
                     'price': 'Price ($)',
                     'satisfaction': 'Satisfaction Score',
                     'users': 'Active Users'
                 })

fig.update_traces(marker=dict(line=dict(width=1, color='white')))
fig.show()

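For continuous color mapping (like growth rate), a diverging scale such as RdYlGn is easiest to read when its midpoint sits at zero:

fig = px.scatter(df,
                 x='price',
                 y='satisfaction',
                 size='users',
                 color='growth_rate',
                 hover_name='product',
                 size_max=40,
                 color_continuous_scale='RdYlGn',
                 color_continuous_midpoint=0,  # yellow = 0% growth; negative values fall on the red side
                 title='Product Analysis by Growth Rate',
                 labels={
                     'price': 'Price ($)',
                     'satisfaction': 'Satisfaction Score',
                     'users': 'Active Users',
                     'growth_rate': 'Growth Rate (%)'
                 })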

fig.show()

Real-World Example: Analyzing Tech Companies

Let’s analyze a set of fictional technology companies using realistic metrics: revenue, profit margin, employee count, and sector.

import plotly.express as px
import pandas as pd

# Fictional tech company data (illustrative values)
companies_data = {
    'company': ['TechCorp', 'DataSystems', 'CloudNet', 'SecureIT', 'AIVentures',
                'MobileTech', 'SoftwarePlus', 'DevTools', 'Analytics Co', 'Platform Inc',
                'CodeBase', 'NetSolutions', 'AppBuilder', 'DataFlow', 'CyberGuard'],
    'revenue_millions': [450, 280, 820, 150, 390, 560, 210, 180, 340, 920,
                        125, 410, 290, 670, 230],
    'profit_margin': [22, 15, 28, 18, 12, 25, 20, 16, 24, 30,
                     14, 19, 17, 26, 21],
    'employees': [1200, 850, 2400, 450, 980, 1600, 720, 580, 1100, 3200,
                 380, 1350, 920, 2100, 780],
    'sector': ['Enterprise', 'Cloud', 'Cloud', 'Security', 'AI/ML',
              'Mobile', 'Enterprise', 'Developer Tools', 'Analytics', 'Cloud',
              'Developer Tools', 'Enterprise', 'Mobile', 'Analytics', 'Security']
}

df = pd.DataFrame(companies_data)

# Revenue per employee in $ thousands (shown in the hover tooltip; bubble size uses 'employees')
df['revenue_per_employee'] = (df['revenue_millions'] * 1000) / df['employees']

fig = px.scatter(df,
                 x='revenue_millions',
                 y='profit_margin',
                 size='employees',
                 color='sector',
                 hover_name='company',
                 hover_data={
                     'revenue_millions': ':,.0f',
                     'profit_margin': ':.1f',
                     'employees': ':,',
                     'revenue_per_employee': ':,.0f'
                 },
                 size_max=50,
                 title='Tech Company Analysis: Revenue vs. Profitability',
                 labels={
                     'revenue_millions': 'Annual Revenue ($M)',
                     'profit_margin': 'Profit Margin (%)',
                     'employees': 'Employee Count',
                     'revenue_per_employee': 'Revenue per Employee ($K)'
                 })

fig.update_layout(
    width=1000,
    height=700,
    hovermode='closest'
)

fig.show()

This example demonstrates data preparation with pandas, calculated fields, and formatted hover tooltips. The visualization immediately reveals which sectors are most profitable, which companies are largest, and the relationship between revenue and profitability.
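Which sector counts as "most profitable" depends on how you aggregate, so it is worth checking the visual impression numerically. The sketch below reuses the sector, revenue, and margin columns from the example and compares a simple average of margins with a revenue-weighted margin (total profit divided by total revenue).

```python
import pandas as pd

# Sector, revenue, and margin columns from the example above
df = pd.DataFrame({
    "sector": ["Enterprise", "Cloud", "Cloud", "Security", "AI/ML",
               "Mobile", "Enterprise", "Developer Tools", "Analytics", "Cloud",
               "Developer Tools", "Enterprise", "Mobile", "Analytics", "Security"],
    "revenue_millions": [450, 280, 820, 150, 390, 560, 210, 180, 340, 920,
                         125, 410, 290, 670, 230],
    "profit_margin": [22, 15, 28, 18, 12, 25, 20, 16, 24, 30,
                      14, 19, 17, 26, 21],
})
df["profit_millions"] = df["revenue_millions"] * df["profit_margin"] / 100

summary = df.groupby("sector").agg(
    mean_margin=("profit_margin", "mean"),       # every company counts equally
    total_revenue=("revenue_millions", "sum"),
    total_profit=("profit_millions", "sum"),
)
# Revenue-weighted margin: each company counts in proportion to its revenue
summary["weighted_margin"] = 100 * summary["total_profit"] / summary["total_revenue"]

print(summary.sort_values("weighted_margin", ascending=False).round(1))
```

With this data the two measures disagree: Analytics has the highest simple average (25.0%), while Cloud has the highest revenue-weighted margin (about 27.1%) because its largest companies are also its most profitable ones.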

Best Practices and Tips

Avoid these common mistakes:

  1. Poor size scaling: Always calculate sizeref from your data, typically from the largest size value. If sizeref is too large, every bubble shrinks to a near-identical dot; if it is too small, bubbles balloon and overlap completely.

  2. Too many bubbles: Beyond 50-60 data points, consider filtering, aggregating, or using a different visualization. Bubble charts rely on being able to distinguish individual entities.

  3. Meaningless size dimension: Don’t use bubble size just because you can. The size variable should represent something where magnitude matters—population, sales volume, market share. Using bubble size for categorical data or arbitrary IDs wastes the dimension.

  4. Ignoring interactivity: Plotly’s strength is interactivity. Always implement informative hover templates. Users will explore your data by hovering over bubbles.
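Overlap, the main symptom of too many bubbles, can be measured before you settle on a chart. The sketch below assumes linear axes and that you know the displayed axis range and the plot area's size in pixels; `to_pixels` and `overlap_fraction` are hypothetical helpers, not Plotly functions. Two bubbles overlap when the distance between their centers is less than the sum of their radii.

```python
import math
from itertools import combinations


def to_pixels(values, lo, hi, length_px):
    """Map data values on a linear axis spanning [lo, hi] onto [0, length_px] pixels."""
    span = hi - lo
    return [(v - lo) / span * length_px for v in values]


def overlap_fraction(xs_px, ys_px, diameters_px):
    """Fraction of bubbles whose circle intersects at least one other bubble."""
    n = len(xs_px)
    overlapping = set()
    for i, j in combinations(range(n), 2):  # O(n^2) pairs, fine for a few hundred points
        distance = math.hypot(xs_px[i] - xs_px[j], ys_px[i] - ys_px[j])
        if distance < (diameters_px[i] + diameters_px[j]) / 2:  # sum of the two radii
            overlapping.update((i, j))
    return len(overlapping) / n if n else 0.0


# Toy check on a 900 x 600 px plot area (screen-space y flipping does not change distances)
xs = to_pixels([10, 12, 90], lo=0, hi=100, length_px=900)
ys = to_pixels([5.0, 5.2, 8.0], lo=0, hi=10, length_px=600)
print(overlap_fraction(xs, ys, [40, 40, 40]))  # first two bubbles overlap, the third is clear
```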

Here’s a comparison of poor vs. well-configured bubble charts:

import plotly.subplots as sp
import plotly.graph_objects as go
import numpy as np

np.random.seed(42)
x = np.random.randint(10, 100, 20)
y = np.random.uniform(1, 10, 20)
sizes = np.random.uniform(5, 50, 20)

# Create subplots
fig = sp.make_subplots(rows=1, cols=2, 
                       subplot_titles=('Poor Configuration', 'Good Configuration'))

# Poor: No size scaling, no transparency, no borders
fig.add_trace(go.Scatter(
    x=x, y=y, mode='markers',
    marker=dict(size=sizes, color='blue'),
    name='Poor'
), row=1, col=1)

# Good: Proper scaling, transparency, borders, color scale
sizeref = 2.0 * max(sizes) / (40 ** 2)
fig.add_trace(go.Scatter(
    x=x, y=y, mode='markers',
    marker=dict(
        size=sizes,
        sizemode='area',
        sizeref=sizeref,
        color=y,
        colorscale='Plasma',
        line=dict(width=1, color='white'),
        opacity=0.7
    ),
    name='Good'
), row=1, col=2)

fig.update_layout(showlegend=False, height=400, width=900)
fig.show()

When NOT to use bubble charts: If your third dimension doesn’t vary significantly, use a regular scatter plot. If you have more than 60 data points, consider a heatmap or contour plot. If relationships aren’t continuous, use grouped bar charts instead.
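The first two rules of thumb can be checked numerically. The sketch below measures how much the size dimension varies with the coefficient of variation (population standard deviation divided by mean); `size_variation`, `suggest_chart`, and the 0.15 threshold are hypothetical choices for illustration, while the 60-point cutoff follows the guidance above.

```python
import statistics


def size_variation(sizes):
    """Coefficient of variation of the size values (population std / mean)."""
    mean = statistics.fmean(sizes)
    if mean == 0:
        raise ValueError("mean of sizes is zero; coefficient of variation is undefined")
    return statistics.pstdev(sizes) / mean


def suggest_chart(sizes, n_max=60, min_cv=0.15):
    """Hypothetical rule of thumb; thresholds are illustrative, not Plotly defaults."""
    if len(sizes) > n_max:
        return "aggregate, or use a heatmap/contour plot"
    if size_variation(sizes) < min_cv:
        return "scatter plot (size adds little information)"
    return "bubble chart"


print(suggest_chart([10, 10, 10.5]))  # sizes barely differ
print(suggest_chart([2, 10, 25]))     # sizes differ a lot
```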

Export options: Plotly supports exporting to static images (PNG, SVG, PDF) and interactive HTML. For presentations, export as PNG. For web dashboards, embed the interactive HTML.

# Static export (requires the kaleido package: pip install kaleido)
fig.write_image("bubble_chart.png", width=1200, height=800)

# Interactive HTML
fig.write_html("bubble_chart.html")
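Because write_image fails when kaleido is missing, a script that should run anywhere can always write HTML and attempt the PNG only when kaleido can be found. A minimal sketch; `export_figure` is a hypothetical helper that accepts any Plotly figure. Note that find_spec only confirms the package is importable, and depending on the kaleido version, static export may need additional setup.

```python
import importlib.util


def export_figure(fig, basename, width=1200, height=800):
    """Always write interactive HTML; also write a PNG when kaleido is importable."""
    written = []
    html_path = f"{basename}.html"
    fig.write_html(html_path)
    written.append(html_path)
    if importlib.util.find_spec("kaleido") is not None:
        png_path = f"{basename}.png"
        fig.write_image(png_path, width=width, height=height)
        written.append(png_path)
    else:
        print("kaleido not installed; skipped PNG export (pip install kaleido)")
    return written
```

Usage: `export_figure(fig, "bubble_chart")` returns the list of files it actually wrote.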

Bubble charts are powerful tools when used correctly. Master the sizing parameters, choose appropriate datasets, and leverage Plotly’s interactivity to create visualizations that reveal insights across multiple dimensions simultaneously.
