K-Nearest Neighbors: Complete Guide with Examples

Key Insights

  • KNN is a non-parametric, instance-based algorithm that makes predictions by finding the K most similar training examples to a new data point—no explicit training phase required
  • Feature scaling is absolutely critical for KNN because the algorithm relies on distance calculations; unscaled features with larger ranges will dominate the distance metric and destroy model performance
  • The optimal K value balances bias and variance: small K values create flexible but noisy decision boundaries, while large K values smooth predictions but may miss local patterns

Introduction to KNN

K-Nearest Neighbors (KNN) is one of the simplest yet most effective machine learning algorithms. Unlike models that learn parameters during training, KNN is a lazy learner—it simply stores the training data and defers all computation until prediction time. When you ask it to classify a new data point, it looks at the K closest training examples and lets them vote on the answer.

The intuition is straightforward: similar things tend to cluster together. If you want to know whether a fruit is an apple or an orange, look at the fruits most similar to it. If most of its neighbors are apples, it’s probably an apple too.

KNN works for both classification (predicting categories) and regression (predicting continuous values). You’ll find it used in recommendation systems, pattern recognition, credit scoring, and anywhere you need quick predictions based on similarity.

How KNN Works

The algorithm follows a simple process:

  1. Calculate distances between the new point and all training points
  2. Select the K nearest neighbors based on those distances
  3. Make a prediction by majority vote (classification) or averaging (regression)

The distance metric matters. The most common options are:

  • Euclidean distance: Straight-line distance, the default choice for most problems
  • Manhattan distance: Sum of absolute differences along each axis, often a better choice in high-dimensional spaces or for grid-like data
  • Minkowski distance: Generalization of both (Euclidean when p=2, Manhattan when p=1)
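These metrics are easy to compare directly. Here is a quick NumPy sketch computing all three on the same pair of points:

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

# Euclidean: straight-line distance
euclidean = np.sqrt(np.sum((a - b) ** 2))

# Manhattan: sum of absolute differences along each axis
manhattan = np.sum(np.abs(a - b))

# Minkowski: general form with parameter p
def minkowski(x1, x2, p):
    return np.sum(np.abs(x1 - x2) ** p) ** (1 / p)

print(euclidean)           # 5.0
print(manhattan)           # 7.0
print(minkowski(a, b, 2))  # 5.0, matches Euclidean
print(minkowski(a, b, 1))  # 7.0, matches Manhattan
```

Because Minkowski reduces to the other two at p=2 and p=1, scikit-learn exposes a single `p` parameter on its KNN estimators rather than separate metric classes.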

Here’s a visualization of how KNN classifies a new point:

import numpy as np
import matplotlib.pyplot as plt

# Training data: two classes
class_a = np.array([[1, 2], [2, 3], [2, 1.5], [3, 2]])
class_b = np.array([[6, 5], [7, 6], [6, 6.5], [7.5, 5.5]])

# New point to classify
new_point = np.array([4, 4])

# Plot
plt.figure(figsize=(8, 6))
plt.scatter(class_a[:, 0], class_a[:, 1], c='blue', marker='o', s=100, label='Class A')
plt.scatter(class_b[:, 0], class_b[:, 1], c='red', marker='s', s=100, label='Class B')
plt.scatter(new_point[0], new_point[1], c='green', marker='*', s=300, label='New Point')

# Calculate distances and find 3 nearest neighbors
all_points = np.vstack([class_a, class_b])
distances = np.sqrt(np.sum((all_points - new_point)**2, axis=1))
k_nearest_indices = np.argsort(distances)[:3]

# Draw lines to nearest neighbors
for idx in k_nearest_indices:
    plt.plot([new_point[0], all_points[idx, 0]], 
             [new_point[1], all_points[idx, 1]], 
             'g--', alpha=0.5)

plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.title('KNN Classification (K=3)')
plt.grid(True, alpha=0.3)
plt.show()

Implementing KNN from Scratch

Building KNN from scratch reveals how simple it really is. Here’s a complete implementation:

import numpy as np
from collections import Counter

class KNNClassifier:
    def __init__(self, k=3):
        self.k = k
        self.X_train = None
        self.y_train = None
    
    def fit(self, X, y):
        """Store training data"""
        self.X_train = X
        self.y_train = y
    
    def _euclidean_distance(self, x1, x2):
        """Calculate Euclidean distance between two points"""
        return np.sqrt(np.sum((x1 - x2)**2))
    
    def predict(self, X):
        """Predict class labels for samples in X"""
        predictions = [self._predict_single(x) for x in X]
        return np.array(predictions)
    
    def _predict_single(self, x):
        """Predict class label for a single sample"""
        # Calculate distances to all training points
        distances = [self._euclidean_distance(x, x_train) 
                    for x_train in self.X_train]
        
        # Get indices of K nearest neighbors
        k_indices = np.argsort(distances)[:self.k]
        
        # Get labels of K nearest neighbors
        k_nearest_labels = [self.y_train[i] for i in k_indices]
        
        # Return most common label
        most_common = Counter(k_nearest_labels).most_common(1)
        return most_common[0][0]

# Test the implementation
X_train = np.array([[1, 2], [2, 3], [3, 1], [6, 5], [7, 7], [8, 6]])
y_train = np.array([0, 0, 0, 1, 1, 1])

knn = KNNClassifier(k=3)
knn.fit(X_train, y_train)

X_test = np.array([[2, 2], [7, 6]])
predictions = knn.predict(X_test)
print(f"Predictions: {predictions}")  # Output: [0 1]

This implementation shows the core mechanics: store data, calculate distances, find neighbors, vote. No complex math or optimization required.
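As noted earlier, the same mechanics handle regression by replacing the majority vote with an average of the neighbors' target values. A minimal sketch of that variant:

```python
import numpy as np

class KNNRegressor:
    """Minimal KNN regression sketch: same neighbor search, averaged targets."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Store training data, exactly as in the classifier
        self.X_train = np.asarray(X, dtype=float)
        self.y_train = np.asarray(y, dtype=float)

    def predict(self, X):
        preds = []
        for x in np.asarray(X, dtype=float):
            # Euclidean distances to every training point
            distances = np.sqrt(np.sum((self.X_train - x) ** 2, axis=1))
            k_indices = np.argsort(distances)[:self.k]
            # Average the neighbors' targets instead of voting
            preds.append(self.y_train[k_indices].mean())
        return np.array(preds)

# Toy 1-D example where y follows x
X_train = np.array([[1], [2], [3], [4], [5]])
y_train = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

reg = KNNRegressor(k=3)
reg.fit(X_train, y_train)
print(reg.predict([[3]]))  # [3.], the mean of targets 2, 3, and 4
```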

Using Scikit-learn’s KNN

For production use, scikit-learn provides an optimized implementation with additional features:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load dataset
iris = load_iris()
X, y = iris.data, iris.target

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Create and train model
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Make predictions
y_pred = knn.predict(X_test)

# Evaluate
print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
print("\nConfusion Matrix:")
print(confusion_matrix(y_test, y_pred))
print("\nClassification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

This typically gives you accuracy in the high 90s on the Iris dataset with minimal code (the exact number depends on the split).
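Beyond hard labels, scikit-learn's estimator exposes neighbor-level detail: `predict_proba` reports each class's share of the K votes, and `kneighbors` returns the neighbors themselves. A sketch (the data loading mirrors the snippet above):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

# Each row is the fraction of the 5 neighbors voting for each class
probs = knn.predict_proba(X_test[:3])
print(probs.shape)  # (3, 3): three samples, three classes; rows sum to 1

# Distances and training-set indices of each sample's neighbors
distances, indices = knn.kneighbors(X_test[:1])
print(distances.shape, indices.shape)  # (1, 5) (1, 5)
```

Inspecting `kneighbors` output is also a cheap way to explain individual predictions: you can show a user exactly which training examples drove the vote.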

Choosing the Right K Value

K is your only hyperparameter, and it matters. Small K (like 1) makes the model sensitive to noise—a single outlier can flip predictions. Large K smooths things out but may blur class boundaries.

Here’s how to find the optimal K:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score

# Test different K values
k_values = range(1, 31)
accuracies = []

for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train, y_train, cv=5)
    accuracies.append(scores.mean())

# Plot results
plt.figure(figsize=(10, 6))
plt.plot(k_values, accuracies, marker='o')
plt.xlabel('K Value')
plt.ylabel('Cross-Validation Accuracy')
plt.title('Finding Optimal K')
plt.grid(True, alpha=0.3)
plt.axvline(x=k_values[np.argmax(accuracies)], color='r', 
            linestyle='--', label=f'Optimal K = {k_values[np.argmax(accuracies)]}')
plt.legend()
plt.show()

print(f"Best K: {k_values[np.argmax(accuracies)]} with accuracy: {max(accuracies):.3f}")

Use odd K values for binary classification to avoid ties. Start with K=5 as a reasonable default.
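The manual loop above works, but scikit-learn's `GridSearchCV` can run the same search and tune distance-weighted voting at the same time. A sketch on the same Iris setup as before:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Search K and the voting scheme jointly with 5-fold cross-validation
param_grid = {
    "n_neighbors": list(range(1, 31)),
    "weights": ["uniform", "distance"],  # 'distance' weights closer neighbors more
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(f"Best CV accuracy: {search.best_score_:.3f}")
```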

Feature Scaling and Preprocessing

This is where many KNN applications go wrong. Because KNN uses distances, features with larger scales dominate the calculation. Consider a dataset with age (0-100) and income (0-1,000,000). Without scaling, income will completely overshadow age in distance calculations.

Here’s the impact:

from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

# Create dataset with features at different scales
X, y = make_classification(n_samples=500, n_features=2, n_redundant=0, 
                          n_informative=2, random_state=42)
X[:, 0] = X[:, 0] * 1000  # Scale first feature

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Without scaling
knn_unscaled = KNeighborsClassifier(n_neighbors=5)
knn_unscaled.fit(X_train, y_train)
unscaled_accuracy = knn_unscaled.score(X_test, y_test)

# With scaling
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

knn_scaled = KNeighborsClassifier(n_neighbors=5)
knn_scaled.fit(X_train_scaled, y_train)
scaled_accuracy = knn_scaled.score(X_test_scaled, y_test)

print(f"Accuracy without scaling: {unscaled_accuracy:.3f}")
print(f"Accuracy with scaling: {scaled_accuracy:.3f}")
print(f"Improvement: {(scaled_accuracy - unscaled_accuracy) * 100:.1f}%")

Always scale your features. Use StandardScaler for normally distributed data or MinMaxScaler when you need values in a specific range.
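One practical caveat: fitting the scaler on the full dataset before cross-validation leaks test statistics into training. Wrapping the scaler and model in a `Pipeline` keeps scaling inside each fold. A sketch on scikit-learn's breast cancer dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The pipeline refits StandardScaler on each training fold, so no
# statistics from the held-out fold leak into the transform
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
scores = cross_val_score(pipe, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f}")
```

The same pipeline object can go straight into `GridSearchCV`, so K tuning and leak-free scaling compose cleanly.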

Advantages, Limitations, and Best Practices

Advantages:

  • Simple to understand and implement
  • No training phase—just store the data
  • Naturally handles multi-class problems
  • Non-parametric—makes no assumptions about data distribution
  • Effective for small to medium datasets

Limitations:

  • Computationally expensive for large datasets (must compare to all training points)
  • Memory intensive (stores entire training set)
  • Sensitive to irrelevant features and feature scales
  • Curse of dimensionality—performance degrades in high-dimensional spaces
  • Poor performance with imbalanced datasets

When to use KNN:

  • You have a small to medium dataset (< 100k samples)
  • Features are meaningful and can be scaled appropriately
  • You need a simple baseline model
  • Your data has clear clustering patterns
  • Interpretability matters—you can explain predictions by showing similar examples

When to avoid KNN:

  • Large datasets where prediction speed matters
  • High-dimensional data (> 20 features without dimensionality reduction)
  • Features have unclear relationships to the target
  • You need a model that trains once and predicts fast

KNN shines as a baseline model and for problems where similarity is well-defined. For production systems with large datasets, consider tree-based methods or neural networks instead. But for quick prototyping and understanding your data, KNN remains an invaluable tool in your machine learning toolkit.
