SQL - Normalization (1NF, 2NF, 3NF, BCNF)

Before diving into normal forms, you need to understand functional dependencies. A functional dependency X → Y means that if you know the value of X, you can determine the value of Y. In a table with...

Key Insights

  • Normalization eliminates data redundancy and update anomalies by systematically decomposing tables according to functional dependencies, with each normal form building upon the previous one.
  • First Normal Form (1NF) requires atomic values, Second Normal Form (2NF) eliminates partial dependencies, Third Normal Form (3NF) removes transitive dependencies, and Boyce-Codd Normal Form (BCNF) addresses remaining anomalies with overlapping candidate keys.
  • While normalization improves data integrity and reduces storage waste, real-world applications often require strategic denormalization for performance optimization—understanding the trade-offs is essential for effective database design.

Understanding Functional Dependencies

Before diving into normal forms, you need to understand functional dependencies. A functional dependency X → Y means that if you know the value of X, you can determine the value of Y. In a table with columns (StudentID, StudentName, CourseID, CourseName), StudentID → StudentName is a functional dependency because each student ID uniquely determines the student’s name.

-- Example table with multiple functional dependencies
CREATE TABLE student_enrollments (
    student_id INT,
    student_name VARCHAR(100),
    student_email VARCHAR(100),
    course_id INT,
    course_name VARCHAR(100),
    instructor_name VARCHAR(100),
    grade CHAR(2)
);

-- Functional dependencies present:
-- student_id → student_name, student_email
-- course_id → course_name, instructor_name
-- (student_id, course_id) → grade

First Normal Form (1NF): Atomic Values

A table is in 1NF when every column contains only atomic (indivisible) values, and each row is unique. No repeating groups or arrays are allowed.

Violation Example:

-- NOT in 1NF: phone_numbers contains multiple values
CREATE TABLE customers_bad (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100),
    phone_numbers VARCHAR(200)  -- "555-1234, 555-5678, 555-9012"
);

1NF Solution:

-- Approach 1: Separate table for phone numbers
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(100)
);

CREATE TABLE customer_phones (
    customer_id INT,
    phone_number VARCHAR(20),
    phone_type VARCHAR(20),  -- 'mobile', 'home', 'work'
    PRIMARY KEY (customer_id, phone_number),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

-- Insert example
INSERT INTO customers VALUES (1, 'John Doe');
INSERT INTO customer_phones VALUES 
    (1, '555-1234', 'mobile'),
    (1, '555-5678', 'home');

Another common 1NF violation involves repeating columns:

-- NOT in 1NF: repeating column groups
CREATE TABLE orders_bad (
    order_id INT PRIMARY KEY,
    product1_id INT,
    product1_qty INT,
    product2_id INT,
    product2_qty INT,
    product3_id INT,
    product3_qty INT
);

-- 1NF solution
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    order_date DATE
);

CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (order_id) REFERENCES orders(order_id)
);

Second Normal Form (2NF): Eliminate Partial Dependencies

A table is in 2NF if it’s in 1NF and every non-key attribute is fully functionally dependent on the entire primary key. This matters only for tables with composite primary keys.

Violation Example:

-- NOT in 2NF: partial dependencies exist
CREATE TABLE course_enrollments (
    student_id INT,
    course_id INT,
    student_name VARCHAR(100),    -- depends only on student_id
    student_major VARCHAR(50),    -- depends only on student_id
    course_name VARCHAR(100),     -- depends only on course_id
    instructor VARCHAR(100),      -- depends only on course_id
    grade CHAR(2),                -- depends on both (full dependency)
    PRIMARY KEY (student_id, course_id)
);

The problem: student_name depends only on student_id, not on the full key (student_id, course_id). This creates redundancy and update anomalies.

2NF Solution:

-- Decompose into three tables
CREATE TABLE students (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(100),
    student_major VARCHAR(50)
);

CREATE TABLE courses (
    course_id INT PRIMARY KEY,
    course_name VARCHAR(100),
    instructor VARCHAR(100)
);

CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    grade CHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

-- Query to get full enrollment details
SELECT 
    s.student_name,
    s.student_major,
    c.course_name,
    c.instructor,
    e.grade
FROM enrollments e
JOIN students s ON e.student_id = s.student_id
JOIN courses c ON e.course_id = c.course_id;

Third Normal Form (3NF): Eliminate Transitive Dependencies

A table is in 3NF if it’s in 2NF and no non-key attribute depends on another non-key attribute (no transitive dependencies).

Violation Example:

-- NOT in 3NF: transitive dependency exists
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(100),
    department_id INT,
    department_name VARCHAR(100),  -- transitively dependent via department_id
    department_location VARCHAR(100)  -- transitively dependent via department_id
);

-- Problem: department_name depends on department_id, not directly on employee_id
-- employee_id → department_id → department_name (transitive)

3NF Solution:

CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100),
    department_location VARCHAR(100)
);

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(100),
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

-- Query with department details
SELECT 
    e.employee_id,
    e.employee_name,
    d.department_name,
    d.department_location
FROM employees e
JOIN departments d ON e.department_id = d.department_id;

Another practical example:

-- NOT in 3NF
CREATE TABLE orders_bad (
    order_id INT PRIMARY KEY,
    customer_id INT,
    customer_name VARCHAR(100),
    customer_credit_limit DECIMAL(10,2),  -- depends on customer_id
    order_date DATE,
    order_total DECIMAL(10,2)
);

-- 3NF solution
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_credit_limit DECIMAL(10,2)
);

CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    order_total DECIMAL(10,2),
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);

Boyce-Codd Normal Form (BCNF): Stricter Than 3NF

BCNF is a stronger version of 3NF. A table is in BCNF if for every functional dependency X → Y, X must be a superkey. Most tables in 3NF are also in BCNF, but violations occur with overlapping candidate keys.

Violation Example:

-- NOT in BCNF despite being in 3NF
CREATE TABLE course_schedule (
    student_id INT,
    course_id INT,
    instructor VARCHAR(100),
    PRIMARY KEY (student_id, course_id)
);

-- Constraint: Each instructor teaches only one course
-- Functional dependencies:
-- (student_id, course_id) → instructor (satisfies 3NF)
-- instructor → course_id (violates BCNF: instructor is not a superkey)

The issue: if an instructor teaches only one course, instructor → course_id, but instructor isn’t a superkey.

BCNF Solution:

CREATE TABLE instructor_courses (
    instructor VARCHAR(100) PRIMARY KEY,
    course_id INT
);

CREATE TABLE student_instructors (
    student_id INT,
    instructor VARCHAR(100),
    PRIMARY KEY (student_id, instructor),
    FOREIGN KEY (instructor) REFERENCES instructor_courses(instructor)
);

-- Query to get student course assignments
SELECT 
    si.student_id,
    ic.course_id,
    si.instructor
FROM student_instructors si
JOIN instructor_courses ic ON si.instructor = ic.instructor;

Practical Considerations and Denormalization

While normalization improves data integrity, it can impact query performance due to multiple joins. Strategic denormalization is often necessary:

-- Denormalized for read-heavy workloads
CREATE TABLE order_summary (
    order_id INT PRIMARY KEY,
    customer_id INT,
    customer_name VARCHAR(100),  -- denormalized
    order_date DATE,
    order_total DECIMAL(10,2),
    item_count INT  -- computed/denormalized
);

-- Maintain with triggers or application logic
CREATE TRIGGER update_order_summary
AFTER INSERT ON order_items
FOR EACH ROW
BEGIN
    UPDATE order_summary
    SET item_count = item_count + 1,
        order_total = order_total + NEW.quantity * NEW.unit_price
    WHERE order_id = NEW.order_id;
END;

Use denormalization when:

  • Query performance is critical and data rarely changes
  • Aggregations are expensive to compute on-the-fly
  • Read operations vastly outnumber writes

Maintain normalized tables as the source of truth and create denormalized views or materialized views for performance:

CREATE MATERIALIZED VIEW order_details_mv AS
SELECT 
    o.order_id,
    c.customer_name,
    c.customer_email,
    o.order_date,
    SUM(oi.quantity * oi.unit_price) as order_total,
    COUNT(oi.product_id) as item_count
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.order_id, c.customer_name, c.customer_email, o.order_date;

-- Refresh periodically
REFRESH MATERIALIZED VIEW order_details_mv;

Normalization isn’t dogma—it’s a tool. Apply it to maintain data integrity, then selectively denormalize based on measured performance needs.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.