SQL - PRIMARY KEY Constraint | Application Architect

Key Insights

• PRIMARY KEY constraints enforce uniqueness and non-null values on one or more columns, serving as the fundamental mechanism for row identification in relational databases • Composite primary keys enable unique identification through multiple columns while introducing performance trade-offs compared to single-column surrogate keys • Primary key selection impacts foreign key relationships, index performance, and clustering behavior—choosing between natural and surrogate keys requires understanding your access patterns and data characteristics

Understanding PRIMARY KEY Constraints

A PRIMARY KEY constraint uniquely identifies each record in a table. It combines two fundamental properties: uniqueness (no duplicate values) and non-nullability (no NULL values allowed). Every table should have exactly one primary key, though that key may consist of multiple columns.

When you define a primary key, the database automatically creates a unique index on those columns. This index serves dual purposes: enforcing uniqueness and optimizing queries that search by primary key values.

CREATE TABLE users (
    user_id INT PRIMARY KEY,
    email VARCHAR(255) NOT NULL,
    username VARCHAR(50) NOT NULL
);

This creates a table where user_id serves as the primary key. The database ensures no two rows can have the same user_id, and no row can have a NULL user_id.

Single-Column vs Composite Primary Keys

Single-column primary keys use one field for unique identification:

CREATE TABLE orders (
    order_id BIGINT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    total_amount DECIMAL(10,2)
);

Composite primary keys combine multiple columns when no single column provides uniqueness:

CREATE TABLE order_items (
    order_id BIGINT,
    product_id INT,
    quantity INT NOT NULL,
    unit_price DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id, product_id)
);

In this example, neither order_id nor product_id alone uniquely identifies a row—an order can have multiple products, and a product appears in multiple orders. Together, they form a unique combination.

Defining Primary Keys: Inline vs Table-Level

You can define primary keys inline with the column definition or at the table level. Inline syntax works only for single-column keys:

-- Inline definition
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL,
    price DECIMAL(10,2)
);

-- Table-level definition
CREATE TABLE products (
    product_id INT,
    product_name VARCHAR(100) NOT NULL,
    price DECIMAL(10,2),
    PRIMARY KEY (product_id)
);

Table-level syntax is required for composite keys and allows you to name the constraint:

CREATE TABLE enrollment (
    student_id INT,
    course_id INT,
    enrollment_date DATE NOT NULL,
    CONSTRAINT pk_enrollment PRIMARY KEY (student_id, course_id)
);

Naming constraints makes them easier to identify in error messages and system catalogs.

Adding and Dropping Primary Keys

For existing tables, use ALTER TABLE to add primary keys:

CREATE TABLE categories (
    category_id INT NOT NULL,
    category_name VARCHAR(50) NOT NULL
);

ALTER TABLE categories
ADD PRIMARY KEY (category_id);

Before adding a primary key, ensure the column(s) contain no NULL values or duplicates:

-- Check for NULLs
SELECT COUNT(*) FROM categories WHERE category_id IS NULL;

-- Check for duplicates
SELECT category_id, COUNT(*)
FROM categories
GROUP BY category_id
HAVING COUNT(*) > 1;

To drop a primary key:

-- MySQL/MariaDB
ALTER TABLE categories DROP PRIMARY KEY;

-- PostgreSQL/SQL Server/Oracle
ALTER TABLE categories DROP CONSTRAINT pk_category_id;

Auto-Incrementing Primary Keys

Most databases support auto-incrementing integers for primary keys, eliminating manual ID assignment:

-- MySQL/MariaDB
CREATE TABLE customers (
    customer_id INT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL
);

-- PostgreSQL
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL
);

-- SQL Server
CREATE TABLE customers (
    customer_id INT IDENTITY(1,1) PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL
);

Insert operations don’t require specifying the primary key:

INSERT INTO customers (first_name, last_name)
VALUES ('John', 'Smith');

-- Retrieve the generated ID (method varies by database)
-- MySQL: SELECT LAST_INSERT_ID();
-- PostgreSQL: INSERT ... RETURNING customer_id;
-- SQL Server: SELECT SCOPE_IDENTITY();

Natural vs Surrogate Keys

Natural keys use existing data attributes:

CREATE TABLE countries (
    country_code CHAR(2) PRIMARY KEY,  -- ISO 3166-1 alpha-2
    country_name VARCHAR(100) NOT NULL
);

Surrogate keys are artificial identifiers with no business meaning:

CREATE TABLE countries (
    country_id INT AUTO_INCREMENT PRIMARY KEY,
    country_code CHAR(2) UNIQUE NOT NULL,
    country_name VARCHAR(100) NOT NULL
);

Natural keys work well when the attribute is truly immutable and naturally unique. Surrogate keys provide stability when natural attributes might change. For example, email addresses seem unique but users might want to change them.

Primary Keys and Foreign Key Relationships

Primary keys serve as targets for foreign key constraints, establishing relationships between tables:

CREATE TABLE departments (
    dept_id INT PRIMARY KEY,
    dept_name VARCHAR(100) NOT NULL
);

CREATE TABLE employees (
    emp_id INT PRIMARY KEY,
    emp_name VARCHAR(100) NOT NULL,
    dept_id INT,
    CONSTRAINT fk_emp_dept FOREIGN KEY (dept_id) 
        REFERENCES departments(dept_id)
);

The foreign key in employees must reference values that exist in departments.dept_id. This maintains referential integrity.

With composite primary keys:

CREATE TABLE course_sections (
    course_id INT,
    section_num INT,
    instructor VARCHAR(100),
    PRIMARY KEY (course_id, section_num)
);

CREATE TABLE student_enrollments (
    student_id INT,
    course_id INT,
    section_num INT,
    enrollment_date DATE,
    PRIMARY KEY (student_id, course_id, section_num),
    FOREIGN KEY (course_id, section_num) 
        REFERENCES course_sections(course_id, section_num)
);

Performance Considerations

Primary keys determine physical data organization in clustered index structures (default in SQL Server and InnoDB). Sequential integer keys enable efficient insertion:

-- Efficient: Sequential inserts append to the index
CREATE TABLE logs (
    log_id BIGINT AUTO_INCREMENT PRIMARY KEY,
    log_message TEXT,
    created_at TIMESTAMP
);

-- Less efficient: UUID keys cause random index insertion
CREATE TABLE logs (
    log_id CHAR(36) PRIMARY KEY,  -- UUID
    log_message TEXT,
    created_at TIMESTAMP
);

Composite keys create larger indexes, impacting join performance:

-- 4 bytes for primary key
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    username VARCHAR(50)
);

-- 108 bytes for composite primary key
CREATE TABLE users (
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100),
    PRIMARY KEY (first_name, last_name, email)
);

For high-volume tables, smaller primary keys reduce index size and improve cache efficiency. Foreign key columns referencing composite keys also consume more space.

Common Pitfalls

Using mutable values as primary keys creates update cascading challenges:

-- Problematic: Email might change
CREATE TABLE users (
    email VARCHAR(255) PRIMARY KEY,
    full_name VARCHAR(100)
);

-- Better: Surrogate key with unique constraint on email
CREATE TABLE users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(255) UNIQUE NOT NULL,
    full_name VARCHAR(100)
);

Forgetting to define primary keys on junction tables:

-- Missing primary key allows duplicate enrollments
CREATE TABLE student_courses (
    student_id INT,
    course_id INT,
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

-- Correct: Primary key prevents duplicates
CREATE TABLE student_courses (
    student_id INT,
    course_id INT,
    enrollment_date DATE DEFAULT CURRENT_DATE,
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);

Primary key constraints form the foundation of relational data integrity. Choose them carefully based on your data characteristics, access patterns, and relationship requirements.