How to Create a Log-Scale Plot in Matplotlib
Key Insights
- Log-scale plots compress exponential data into linear visual patterns, making them essential for visualizing phenomena that span multiple orders of magnitude like viral spread, financial growth, or power-law distributions.
- Matplotlib offers both convenience functions (semilogy, semilogx, loglog) and object-oriented methods (set_xscale, set_yscale) for creating log plots; use the latter for complex multi-axes figures where you need precise control.
- Zero and negative values break logarithmic scales mathematically; always filter, offset, or handle these values explicitly before plotting to avoid silent errors or misleading visualizations.
Introduction to Logarithmic Scales
Logarithmic scales transform multiplicative relationships into additive ones. When your data spans several orders of magnitude—think bacteria doubling every hour or earthquake intensities ranging from imperceptible tremors to catastrophic events—linear scales fail. They either compress the small values into invisibility or force you to truncate the large ones.
Use log scales when your data exhibits exponential growth, follows power laws, or when you care about relative changes rather than absolute differences. A stock moving from $1 to $2 represents the same percentage gain as moving from $100 to $200, and log scales make this equivalence visually apparent.
Common applications include epidemiological modeling (infection curves), seismology (Richter scale), acoustics (decibels), finance (compound returns), and network analysis (degree distributions). If you find yourself saying “this grew by a factor of X,” a log scale is probably the right choice.
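The point about relative changes can be checked with a few lines of arithmetic. The sketch below uses made-up price moves (the numbers are illustrative, not from any dataset) and compares the distance each move would cover on a linear axis with the distance on a base-10 log axis.

```python
import math

# Hypothetical price moves: each one doubles, but the absolute gaps differ widely
moves = [(1.0, 2.0), (100.0, 200.0), (5_000.0, 10_000.0)]

linear_gaps = []
log_gaps = []
for start, end in moves:
    linear_gaps.append(end - start)                          # distance on a linear axis
    log_gaps.append(math.log10(end) - math.log10(start))     # distance on a log10 axis

for (start, end), lin, lg in zip(moves, linear_gaps, log_gaps):
    print(f"{start:>8.0f} -> {end:>8.0f}   linear gap: {lin:>7.0f}   log10 gap: {lg:.3f}")
```

Every move has the same log10 gap (log10 of 2, about 0.301), which is why equal percentage changes occupy equal vertical distance on a log axis.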
Basic Log-Scale Plots
Matplotlib provides three convenience functions for creating log-scale plots directly:
import matplotlib.pyplot as plt
import numpy as np
# Generate exponential data
time = np.linspace(0, 10, 100)
exponential_growth = np.exp(0.5 * time)
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
# Linear scale (for comparison)
axes[0, 0].plot(time, exponential_growth)
axes[0, 0].set_title('Linear Scale')
axes[0, 0].set_xlabel('Time')
axes[0, 0].set_ylabel('Value')
axes[0, 0].grid(True, alpha=0.3)
# Semi-log Y (logarithmic y-axis)
axes[0, 1].semilogy(time, exponential_growth)
axes[0, 1].set_title('Semi-Log Y')
axes[0, 1].set_xlabel('Time')
axes[0, 1].set_ylabel('Value (log scale)')
axes[0, 1].grid(True, alpha=0.3)
# Semi-log X (logarithmic x-axis)
x_exp = np.logspace(0, 3, 100) # 1 to 1000
y_linear = 2 * x_exp + 10
axes[1, 0].semilogx(x_exp, y_linear)
axes[1, 0].set_title('Semi-Log X')
axes[1, 0].set_xlabel('Value (log scale)')
axes[1, 0].set_ylabel('Response')
axes[1, 0].grid(True, alpha=0.3)
# Log-log (both axes logarithmic)
power_law_x = np.logspace(0, 3, 100)
power_law_y = power_law_x**(-2) # Inverse square law
axes[1, 1].loglog(power_law_x, power_law_y)
axes[1, 1].set_title('Log-Log (Power Law)')
axes[1, 1].set_xlabel('Distance (log scale)')
axes[1, 1].set_ylabel('Intensity (log scale)')
axes[1, 1].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Notice how the exponential curve becomes a straight line in the semi-log plot. This is the key insight: exponential relationships appear linear on semi-log scales, and power laws appear linear on log-log scales. This transformation makes it trivial to identify the underlying mathematical relationship in your data.
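Because these relationships become straight lines after taking logarithms, an ordinary linear fit on the transformed data recovers the parameters. The following is a minimal sketch on noise-free synthetic data; the constants a_true, b_true, c_true, and k_true are arbitrary values chosen for illustration, and real data would need a proper treatment of noise.

```python
import numpy as np

# Exponential data: y = a * exp(b * t)
t = np.linspace(0, 10, 100)
a_true, b_true = 3.0, 0.5
y_exp = a_true * np.exp(b_true * t)

# ln(y) = ln(a) + b * t, so a straight-line fit in t gives b (slope) and ln(a) (intercept)
growth_rate, log_a = np.polyfit(t, np.log(y_exp), deg=1)
print(f"exponential: b ~ {growth_rate:.3f}, a ~ {np.exp(log_a):.3f}")

# Power-law data: y = c * x**k
x = np.logspace(0, 3, 100)
c_true, k_true = 2.0, -2.0
y_pow = c_true * x**k_true

# ln(y) = ln(c) + k * ln(x), so a straight-line fit in ln(x) gives k (slope) and ln(c)
exponent, log_c = np.polyfit(np.log(x), np.log(y_pow), deg=1)
print(f"power law:   k ~ {exponent:.3f}, c ~ {np.exp(log_c):.3f}")
```

The slope on a semi-log plot is the exponential growth rate, and the slope on a log-log plot is the power-law exponent.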
Converting Existing Plots to Log Scale
When working with complex figures or multiple subplots, you’ll often want to create a plot first and then convert it to log scale. The object-oriented interface gives you this flexibility:
import matplotlib.pyplot as plt
import numpy as np
# Simulate population growth data
years = np.arange(1900, 2025)
population = 1.65e9 * np.exp(0.012 * (years - 1900)) # Starting from 1.65B in 1900
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))
# Create standard plot
ax1.plot(years, population, 'b-', linewidth=2)
ax1.set_xlabel('Year')
ax1.set_ylabel('Population')
ax1.set_title('World Population (Linear Scale)')
ax1.grid(True, alpha=0.3)
# Same data, convert to log scale
ax2.plot(years, population, 'r-', linewidth=2)
ax2.set_yscale('log') # Convert y-axis to log scale
ax2.set_xlabel('Year')
ax2.set_ylabel('Population (log scale)')
ax2.set_title('World Population (Log Scale)')
ax2.grid(True, alpha=0.3, which='both') # Show both major and minor grid lines
plt.tight_layout()
plt.show()
For dual-log scaling, simply call both set_xscale('log') and set_yscale('log'). This approach is particularly useful when you’re building complex dashboards or need conditional log scaling based on data characteristics.
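One way to implement conditional log scaling is sketched below. The helper apply_log_if_wide and its min_decades threshold are hypothetical names invented for this example, not part of Matplotlib; the rule it applies (switch to log only when strictly positive data spans at least two decades) is one reasonable heuristic among many.

```python
import matplotlib.pyplot as plt
import numpy as np

def apply_log_if_wide(ax, values, axis="y", min_decades=2.0):
    """Hypothetical helper: switch one axis to log scale when the data is
    strictly positive, finite, and spans at least `min_decades` powers of ten.
    Returns True if the axis was switched."""
    values = np.asarray(values, dtype=float)
    if values.size == 0 or not np.all(np.isfinite(values)) or np.any(values <= 0):
        return False  # a log axis cannot represent this data faithfully
    decades = np.log10(values.max() / values.min())
    if decades < min_decades:
        return False  # narrow range: a linear axis reads fine
    if axis == "x":
        ax.set_xscale("log")
    else:
        ax.set_yscale("log")
    return True

fig, ax = plt.subplots()
x = np.logspace(0, 3, 50)   # spans 3 decades
y = x**-2                   # spans 6 decades
ax.plot(x, y)
switched_x = apply_log_if_wide(ax, x, axis="x")
switched_y = apply_log_if_wide(ax, y, axis="y")
print(switched_x, switched_y, ax.get_xscale(), ax.get_yscale())
```

Because the scale is set on the Axes object after plotting, the same helper works for any subplot in a larger figure.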
Customizing Log-Scale Plots
Log scales require careful formatting to remain readable. You’ll want control over tick placement, grid density, and numeric formatting:
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import LogLocator, LogFormatter, FuncFormatter
# Generate data
x = np.logspace(-2, 4, 100)
y = x**2
fig, axes = plt.subplots(2, 2, figsize=(14, 10))
# Basic log-log plot
axes[0, 0].loglog(x, y)
axes[0, 0].set_title('Default Formatting')
axes[0, 0].grid(True, which='both', alpha=0.3)
# Custom tick locators
axes[0, 1].loglog(x, y)
axes[0, 1].set_title('Custom Tick Density')
axes[0, 1].xaxis.set_major_locator(LogLocator(base=10, numticks=15))
axes[0, 1].yaxis.set_major_locator(LogLocator(base=10, numticks=12))
axes[0, 1].grid(True, which='major', linewidth=1.2, alpha=0.5)
axes[0, 1].grid(True, which='minor', linewidth=0.5, alpha=0.2)
# Custom base (log2 instead of log10)
axes[1, 0].plot(x, y)
axes[1, 0].set_xscale('log', base=2)
axes[1, 0].set_yscale('log', base=2)
axes[1, 0].set_title('Base-2 Logarithm')
axes[1, 0].grid(True, which='both', alpha=0.3)
# Custom formatter
def custom_formatter(value, pos):
    # Abbreviate thousands; print small values without rounding them to zero
    if value >= 1e3:
        return f'{value/1e3:.0f}K'
    elif value >= 1:
        return f'{value:.0f}'
    else:
        return f'{value:g}'
axes[1, 1].loglog(x, y)
axes[1, 1].set_title('Custom Label Formatting')
axes[1, 1].yaxis.set_major_formatter(FuncFormatter(custom_formatter))
axes[1, 1].grid(True, which='both', alpha=0.3)
plt.tight_layout()
plt.show()
The which='both' parameter in the grid function is crucial—it displays gridlines for both major and minor ticks, making it easier to read intermediate values on log scales. Without minor gridlines, readers struggle to interpolate between powers of ten.
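When an axis spans many decades, Matplotlib may thin out or omit minor ticks, in which case which='both' has little to draw. A possible workaround, sketched here for the y-axis only, is to set the minor locator explicitly with LogLocator and its subs argument. The numticks values are generous guesses chosen so that no ticks are skipped for this particular data range.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import LogLocator, NullFormatter

x = np.logspace(-2, 4, 100)
y = x**2   # y spans twelve decades

fig, ax = plt.subplots(figsize=(7, 5))
ax.loglog(x, y)

# Major ticks at powers of ten
ax.yaxis.set_major_locator(LogLocator(base=10, numticks=15))
# Minor ticks at 2x, 3x, ..., 9x each power of ten
ax.yaxis.set_minor_locator(LogLocator(base=10, subs=np.arange(2, 10), numticks=100))
ax.yaxis.set_minor_formatter(NullFormatter())  # keep the minor ticks unlabeled

# Heavier lines for major ticks, lighter lines for minor ticks
ax.grid(True, which='major', linewidth=1.0, alpha=0.5)
ax.grid(True, which='minor', linewidth=0.5, alpha=0.2)
```

Call plt.show() or fig.savefig(...) afterwards to view the result.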
Handling Edge Cases and Common Pitfalls
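Logarithms of zero and negative numbers are undefined. Matplotlib does not raise an error when such values land on a log axis; it masks or clips them (the nonpositive option of the log scale controls which), so points can vanish or lines can dive off the bottom of the plot with no obvious signal. Handle this explicitly: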
import matplotlib.pyplot as plt
import numpy as np
import warnings
# Problematic data with zeros and negatives
x = np.linspace(0, 10, 100)
y_with_zeros = np.sin(x) + 0.5 # Dips below zero (range is roughly -0.5 to 1.5)
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
# Naive approach - Matplotlib masks or clips the non-positive values without raising an error
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    axes[0].semilogy(x, y_with_zeros)
axes[0].set_title('Naive (Drops Invalid Points)')
axes[0].grid(True, alpha=0.3)
# Filter zeros and negatives
y_filtered = y_with_zeros.copy()
valid_mask = y_filtered > 0
axes[1].semilogy(x[valid_mask], y_filtered[valid_mask], 'o-', markersize=3)
axes[1].set_title('Filtered (Only Positive Values)')
axes[1].grid(True, alpha=0.3)
# Offset technique - add small constant
epsilon = 1e-10
y_offset = y_with_zeros + epsilon
# Only plot where original was positive
y_offset[y_with_zeros <= 0] = np.nan
axes[2].semilogy(x, y_offset, 'o-', markersize=3)
axes[2].set_title('Offset Technique')
axes[2].grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
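The filtering step can also be wrapped in a small helper that reports how many points it removed, so the loss of data is never invisible. The function keep_positive below is a hypothetical name for this sketch, and the sample arrays are made up.

```python
import numpy as np

def keep_positive(x, y):
    """Hypothetical helper: return the (x, y) pairs where y is finite and > 0,
    together with the number of points that were dropped."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mask = np.isfinite(y) & (y > 0)
    return x[mask], y[mask], int((~mask).sum())

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([5.0, 0.0, -2.0, 10.0, np.nan])

x_ok, y_ok, dropped = keep_positive(x, y)
print(f"kept {y_ok.size} points, dropped {dropped}")
# kept 2 points, dropped 3
```

Logging or asserting on the dropped count makes it easy to notice when an upstream change starts producing invalid values.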
Never silently ignore this issue in production code. Either filter your data explicitly or use symlog scale (set_yscale('symlog')) which handles negative values by using linear scale near zero and log scale elsewhere.
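A minimal symlog sketch follows, using synthetic signed data. The linthresh value of 1.0 is an arbitrary choice for this example; it sets how far from zero the linear region extends and should be tuned to the data.

```python
import matplotlib.pyplot as plt
import numpy as np

# Illustrative signed data that passes through zero and spans several decades
x = np.linspace(-10, 10, 400)
y = np.sinh(x)   # roughly -11000 to 11000

fig, ax = plt.subplots(figsize=(7, 4))
ax.plot(x, y)
# Linear between -linthresh and +linthresh, logarithmic outside that band
ax.set_yscale('symlog', linthresh=1.0)
ax.set_xlabel('x')
ax.set_ylabel('sinh(x) (symlog scale)')
ax.grid(True, which='both', alpha=0.3)
```

Unlike filtering, this keeps every data point on the plot, at the cost of a mixed scale that readers need to be told about in the axis label.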
Real-World Application
Let’s analyze GitHub repository star growth, a classic exponential growth scenario:
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime, timedelta
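# Simulated GitHub star data (synthetic, for illustration only)
rng = np.random.default_rng(42) # Fixed seed so the figure is reproducible
days = np.arange(0, 730) # 2 years
# Daily stars: exponential trend plus a linear trend and noise
stars = 10 * np.exp(0.008 * days) + 0.5 * days + rng.normal(0, 50, len(days))
stars = np.maximum(stars, 1) # At least one star per day, so the cumulative total stays positive on a log axis
cumulative_stars = np.cumsum(stars)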
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))
# Linear scale - hard to see early growth
ax1.plot(days, cumulative_stars, linewidth=2, color='#2ea44f')
ax1.set_xlabel('Days Since Launch')
ax1.set_ylabel('Total Stars')
ax1.set_title('Repository Growth - Linear Scale')
ax1.grid(True, alpha=0.3)
ax1.fill_between(days, 0, cumulative_stars, alpha=0.2, color='#2ea44f')
# Log scale - reveals growth phases
ax2.semilogy(days, cumulative_stars, linewidth=2, color='#0969da')
ax2.set_xlabel('Days Since Launch')
ax2.set_ylabel('Total Stars (log scale)')
ax2.set_title('Repository Growth - Log Scale (Growth Phases Visible)')
ax2.grid(True, which='both', alpha=0.3)
ax2.axhline(y=1000, color='r', linestyle='--', alpha=0.5, label='1K milestone')
ax2.axhline(y=10000, color='r', linestyle='--', alpha=0.5, label='10K milestone')
ax2.legend()
# Annotate growth rate changes
ax2.annotate('Viral phase', xy=(100, cumulative_stars[100]),
             xytext=(200, cumulative_stars[100]*0.3),
             arrowprops=dict(arrowstyle='->', color='black', lw=1.5),
             fontsize=11, fontweight='bold')
plt.tight_layout()
plt.show()
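The log-scale version makes changes in the relative growth rate visible: the curve climbs steeply while the total is multiplying quickly, then flattens toward a straight line as growth settles into a steadier exponential rate. In the linear plot the early part of the curve is compressed against the axis, so these differences are hard to see.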
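Growth phases can also be quantified instead of read by eye: the slope of ln(total) with respect to time is the instantaneous relative growth rate. The sketch below uses a separate, noise-free hypothetical series with a deliberate change of rate at day 200, so the two phases are unambiguous; noisy data would need smoothing before taking the gradient.

```python
import numpy as np

days = np.arange(0, 730)

# Hypothetical series: 3% daily growth for 200 days, then 0.2% daily growth
level_at_200 = 10 * np.exp(0.03 * 200)
total = np.where(days < 200,
                 10 * np.exp(0.03 * days),
                 level_at_200 * np.exp(0.002 * (days - 200)))

# Slope of ln(total) per day = relative growth rate (the slope on a semi-log plot)
growth_rate = np.gradient(np.log(total), days)

early = growth_rate[:190].mean()   # well inside the first phase
late = growth_rate[210:].mean()    # well inside the second phase
print(f"early phase: {early:.4f} per day, doubling time ~{np.log(2) / early:.0f} days")
print(f"late phase:  {late:.4f} per day, doubling time ~{np.log(2) / late:.0f} days")
```

A sustained drop in this rate corresponds to the point where the semi-log curve bends toward a shallower slope.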
Log scales are not just a visualization trick—they’re a tool for understanding the underlying mathematics of your data. When you see a straight line on a semi-log plot, you’re looking at exponential growth. When you see it on a log-log plot, you’ve found a power law. Use them deliberately, format them carefully, and always handle edge cases explicitly.