NumPy Interview Questions and Answers

Key Insights

  • NumPy interviews test your understanding of vectorization, memory management, and broadcasting—not just API knowledge. Interviewers want to see you think in arrays, not loops.
  • The gap between beginner and advanced candidates often comes down to understanding views vs. copies and memory layout. These concepts directly impact production code performance.
  • Practical problem-solving questions matter most. Memorizing function signatures won’t help if you can’t normalize a dataset or implement a moving average on the spot.

Why NumPy Interview Questions Matter

NumPy sits at the foundation of Python’s scientific computing stack. Every pandas DataFrame, every TensorFlow tensor, every scikit-learn model relies on NumPy arrays under the hood. When interviewers ask NumPy questions, they’re testing whether you understand the machinery that powers modern data science.

Expect questions ranging from basic array creation to nuanced discussions about memory layout and broadcasting semantics. This guide covers the questions I’ve seen repeatedly in interviews—both as an interviewer and a candidate.

Fundamental Concepts

Q: What’s the difference between a Python list and a NumPy array?

This question appears in nearly every data science interview. The answer goes beyond “NumPy is faster.”

import numpy as np

# Python list - heterogeneous, flexible
python_list = [1, "two", 3.0, [4, 5]]

# NumPy array - homogeneous, fixed type
numpy_array = np.array([1, 2, 3, 4])

print(numpy_array.dtype)  # int64 on most platforms
print(numpy_array.ndim)   # 1
print(numpy_array.shape)  # (4,)

The key differences: NumPy arrays are homogeneous (single data type), stored in contiguous memory blocks, and support vectorized operations. Python lists are flexible but pay for that flexibility with performance overhead.
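A quick, illustrative timing comparison makes the performance point concrete. Exact numbers vary by machine, but the vectorized version runs a single C-level loop over contiguous memory instead of dispatching Python bytecode per element:

```python
import time
import numpy as np

n = 1_000_000
python_list = list(range(n))
numpy_array = np.arange(n)

# Pure-Python loop: one bytecode dispatch and object allocation per element
start = time.perf_counter()
list_result = [x * 2 for x in python_list]
list_time = time.perf_counter() - start

# Vectorized: a single C loop over a contiguous buffer
start = time.perf_counter()
array_result = numpy_array * 2
array_time = time.perf_counter() - start

print(f"list comprehension: {list_time:.4f}s")
print(f"vectorized NumPy:   {array_time:.4f}s")
```

On most machines the vectorized version is one to two orders of magnitude faster at this size.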

Q: Explain reshape() and when it fails.

arr = np.arange(12)
print(arr.shape)  # (12,)

# Valid reshape - total elements match
reshaped = arr.reshape(3, 4)
print(reshaped.shape)  # (3, 4)

# Using -1 lets NumPy infer the dimension
auto_reshaped = arr.reshape(2, -1)
print(auto_reshaped.shape)  # (2, 6)

# This fails - 12 elements can't form a 5x3 array
try:
    arr.reshape(5, 3)
except ValueError as e:
    print(f"Error: {e}")  # cannot reshape array of size 12 into shape (5,3)

The rule: the product of the new dimensions must equal the total number of elements. Interviewers often follow up by asking about the difference between reshape() and resize(): unlike reshape(), resize() can change the total number of elements, truncating or padding the data as needed.
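It's worth knowing that the two resize spellings behave differently: the in-place ndarray.resize() method zero-pads when growing, while the np.resize() function returns a new array that repeats the data. A minimal sketch:

```python
import numpy as np

arr = np.arange(6)  # [0 1 2 3 4 5]

# ndarray.resize() works in place and zero-pads when growing
a = arr.copy()
a.resize(8, refcheck=False)  # refcheck=False avoids spurious errors in REPLs
print(a)  # [0 1 2 3 4 5 0 0]

# np.resize() returns a new array and repeats the data when growing
b = np.resize(arr, 8)
print(b)  # [0 1 2 3 4 5 0 1]

# Shrinking truncates in both cases
c = np.resize(arr, 4)
print(c)  # [0 1 2 3]
```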

Indexing, Slicing, and Boolean Masking

Q: Demonstrate fancy indexing and explain when you’d use it.

arr = np.array([10, 20, 30, 40, 50])

# Fancy indexing with an array of indices
indices = np.array([0, 2, 4])
print(arr[indices])  # [10 30 50]

# 2D fancy indexing
matrix = np.arange(12).reshape(3, 4)
print(matrix)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

# Select specific elements
rows = np.array([0, 1, 2])
cols = np.array([1, 2, 3])
print(matrix[rows, cols])  # [1 6 11] - elements at (0,1), (1,2), (2,3)
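Since this section also covers boolean masking, here is a short sketch of the masking idioms interviewers commonly ask about:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# A comparison produces a boolean mask of the same shape
mask = arr > 25
print(mask)          # [False False  True  True  True]

# Indexing with the mask selects matching elements (always a copy)
print(arr[mask])     # [30 40 50]

# Masks combine with & and | (note the parentheses around each comparison)
print(arr[(arr > 15) & (arr < 45)])  # [20 30 40]

# Assignment through a mask modifies the original in place
arr[arr > 40] = 0
print(arr)           # [10 20 30 40  0]
```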

Q: How does np.where() work? Show multiple use cases.

arr = np.array([1, -2, 3, -4, 5])

# Find indices where condition is true
indices = np.where(arr > 0)
print(indices)  # (array([0, 2, 4]),)

# Conditional replacement - the ternary form
result = np.where(arr > 0, arr, 0)  # Replace negatives with 0
print(result)  # [1 0 3 0 5]

# Combining conditions
arr2d = np.arange(9).reshape(3, 3)
rows, cols = np.where(arr2d > 4)
print(list(zip(rows, cols)))  # [(1, 2), (2, 0), (2, 1), (2, 2)]

The three-argument form of np.where(condition, x, y) is a vectorized if-else. It’s cleaner and faster than list comprehensions for conditional array operations.

Array Operations and Broadcasting

Q: Explain broadcasting rules. Why does this code work?

# 2D array (3, 4)
matrix = np.ones((3, 4))

# 1D array (4,)
row_vector = np.array([1, 2, 3, 4])

# This works - row_vector broadcasts across all rows
result = matrix + row_vector
print(result)
# [[2. 3. 4. 5.]
#  [2. 3. 4. 5.]
#  [2. 3. 4. 5.]]

Broadcasting rules:

  1. Compare shapes from right to left
  2. Dimensions are compatible if they’re equal or one of them is 1
  3. Missing dimensions are treated as 1

# Common broadcasting error
col_vector = np.array([1, 2, 3])  # shape (3,)

try:
    # Fails: (3, 4) and (3,) - rightmost dimensions 4 and 3 don't match
    matrix + col_vector
except ValueError as e:
    print(f"Error: {e}")

# Fix: reshape to (3, 1) for proper broadcasting
col_vector_fixed = col_vector.reshape(3, 1)
result = matrix + col_vector_fixed  # Now works: (3, 4) + (3, 1)
print(result)
# [[2. 2. 2. 2.]
#  [3. 3. 3. 3.]
#  [4. 4. 4. 4.]]
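An equivalent idiom worth mentioning in an interview is np.newaxis (an alias for None), which inserts a length-1 axis without reshaping:

```python
import numpy as np

matrix = np.ones((3, 4))
col_vector = np.array([1, 2, 3])

# np.newaxis inserts a length-1 axis: (3,) -> (3, 1)
result = matrix + col_vector[:, np.newaxis]
print(result.shape)  # (3, 4)
print(result[:, 0])  # [2. 3. 4.]
```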

Performance and Memory

Q: What’s the difference between a view and a copy? Why does it matter?

This question separates intermediate candidates from beginners.

original = np.arange(10)

# Slicing creates a VIEW - shares memory
view = original[2:5]
view[0] = 999
print(original)  # [  0   1 999   3   4   5   6   7   8   9] - modified!

# Explicit copy - independent memory
original = np.arange(10)
copy = original[2:5].copy()
copy[0] = 999
print(original)  # [0 1 2 3 4 5 6 7 8 9] - unchanged

# Fancy indexing ALWAYS creates a copy
original = np.arange(10)
fancy = original[[2, 3, 4]]
fancy[0] = 999
print(original)  # [0 1 2 3 4 5 6 7 8 9] - unchanged
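When you're unsure whether an operation returned a view or a copy, NumPy can tell you directly. A quick sketch using np.shares_memory and the .base attribute:

```python
import numpy as np

original = np.arange(10)
view = original[2:5]
copy = original[2:5].copy()

# np.shares_memory reports whether two arrays overlap in memory
print(np.shares_memory(original, view))  # True
print(np.shares_memory(original, copy))  # False

# A view's .base points at the array that owns the data
print(view.base is original)  # True
print(copy.base is None)      # True
```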

Q: Explain C-order vs Fortran-order. When does it affect performance?

# C-order (row-major) - default
c_array = np.ones((1000, 1000), order='C')

# Fortran-order (column-major)
f_array = np.ones((1000, 1000), order='F')

# Row iteration is faster for C-order
# Column iteration is faster for F-order

# Check memory layout
print(c_array.flags['C_CONTIGUOUS'])  # True
print(f_array.flags['F_CONTIGUOUS'])  # True

Memory layout matters when iterating over large arrays or interfacing with libraries that expect specific layouts. BLAS/LAPACK operations, for instance, are optimized for Fortran-order.
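A related follow-up: transposing doesn't move any data, it just swaps strides, so a transposed C-order array is F-contiguous. When a library demands a specific layout, np.ascontiguousarray forces a C-contiguous copy only if one is needed:

```python
import numpy as np

c_array = np.ones((1000, 500), order='C')

# Transposing returns a view with swapped strides, not a copy
t = c_array.T
print(t.flags['F_CONTIGUOUS'])        # True
print(np.shares_memory(c_array, t))   # True

# Force a C-contiguous copy when a library requires it
t_c = np.ascontiguousarray(t)
print(t_c.flags['C_CONTIGUOUS'])      # True
```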

Linear Algebra and Mathematical Functions

Q: Compute the mean of each row and each column in a 2D array.

matrix = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

# Mean of each row (collapse columns)
row_means = np.mean(matrix, axis=1)
print(row_means)  # [2. 5. 8.]

# Mean of each column (collapse rows)
col_means = np.mean(matrix, axis=0)
print(col_means)  # [4. 5. 6.]

# Keep dimensions for broadcasting
row_means_2d = np.mean(matrix, axis=1, keepdims=True)
print(row_means_2d.shape)  # (3, 1)

The axis parameter confuses many candidates. Think of it as “collapse this axis”—axis=0 collapses rows, leaving column-wise results.

Q: Solve a system of linear equations using NumPy.

# Solve: 2x + 3y = 8
#        3x + 4y = 11

A = np.array([[2, 3],
              [3, 4]])
b = np.array([8, 11])

# Method 1: np.linalg.solve (preferred for square systems)
x = np.linalg.solve(A, b)
print(x)  # [1. 2.] -> x=1, y=2

# Method 2: Using inverse (less numerically stable)
x_inv = np.linalg.inv(A) @ b
print(x_inv)  # [1. 2.]

# Verify
print(np.allclose(A @ x, b))  # True

Always prefer np.linalg.solve() over computing the inverse explicitly. It’s more numerically stable and faster.
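A common follow-up: np.linalg.solve() requires a square, non-singular matrix. For overdetermined systems (more equations than unknowns), np.linalg.lstsq() finds the least-squares solution. A sketch with hypothetical data fitting a line c0 + c1*t:

```python
import numpy as np

# Overdetermined system: 3 equations, 2 unknowns (no exact solution in general)
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Least-squares solution minimizes ||Ax - b||
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x)  # approximately [0.667, 0.5] -> intercept 2/3, slope 1/2
```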

Practical Problem-Solving Questions

Q: Normalize a dataset to have zero mean and unit variance.

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]], dtype=float)

# Z-score normalization (standardization)
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)
normalized = (data - mean) / std

print(normalized)
# [[-1.22474487 -1.22474487 -1.22474487]
#  [ 0.          0.          0.        ]
#  [ 1.22474487  1.22474487  1.22474487]]

# Verify
print(np.mean(normalized, axis=0))  # [0. 0. 0.]
print(np.std(normalized, axis=0))   # [1. 1. 1.]
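Interviewers sometimes follow up by asking for min-max scaling instead, which maps each column to [0, 1]. A sketch using the same data:

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]], dtype=float)

# Min-max scaling: (x - min) / (max - min), per column
col_min = data.min(axis=0)
col_max = data.max(axis=0)
scaled = (data - col_min) / (col_max - col_min)

print(scaled)
# [[0.  0.  0. ]
#  [0.5 0.5 0.5]
#  [1.  1.  1. ]]
```

Unlike z-score normalization, min-max scaling is sensitive to outliers, which is a trade-off worth mentioning.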

Q: Implement a moving average without loops.

def moving_average(arr, window_size):
    """Compute moving average using convolution."""
    weights = np.ones(window_size) / window_size
    return np.convolve(arr, weights, mode='valid')

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
ma = moving_average(data, 3)
print(ma)  # [2. 3. 4. 5. 6. 7. 8. 9.]

Q: Handle missing values represented as np.nan.

data = np.array([1, 2, np.nan, 4, 5, np.nan, 7])

# Find NaN positions
nan_mask = np.isnan(data)
print(nan_mask)  # [False False  True False False  True False]

# Compute mean ignoring NaN
mean_value = np.nanmean(data)
print(mean_value)  # 3.8

# Replace NaN with mean
data_filled = np.where(np.isnan(data), mean_value, data)
print(data_filled)  # [1.  2.  3.8 4.  5.  3.8 7. ]

The np.nan* family of functions (nanmean, nanstd, nansum) ignores NaN values in computations. This is essential for real-world data processing.

Final Advice

When answering NumPy interview questions, think out loud. Explain why you’re choosing a particular approach. Mention trade-offs between readability and performance. If you’re unsure about a function’s exact signature, describe what you’re trying to accomplish—interviewers care more about your problem-solving approach than perfect syntax recall.

Practice writing NumPy code without autocomplete. The muscle memory of typing np.reshape(), np.where(), and axis= parameters will serve you well under interview pressure.
