MongoDB Schema Design: Embedding vs Referencing

Key Insights

  • Embedding documents optimizes for read performance and atomic operations but can lead to unbounded document growth and data duplication
  • Referencing normalizes data and prevents document bloat but requires multiple queries or $lookup operations, impacting read performance
  • The choice between embedding and referencing depends on relationship cardinality, query patterns, update frequency, and data growth characteristics

Understanding Document Structure Trade-offs

MongoDB’s flexible schema allows you to structure related data through embedding (denormalization) or referencing (normalization). Unlike relational databases where normalization is the default, MongoDB requires deliberate decisions about data modeling based on access patterns.

Embedding stores related data within a single document:

// Embedded document
{
  _id: ObjectId("507f1f77bcf86cd799439011"),
  title: "Building Microservices",
  author: {
    name: "Sam Newman",
    email: "sam@example.com",
    bio: "Software architect..."
  },
  publisher: {
    name: "O'Reilly Media",
    location: "Sebastopol, CA"
  }
}

Referencing stores related data in separate collections with references:

// Books collection
{
  _id: ObjectId("507f1f77bcf86cd799439011"),
  title: "Building Microservices",
  authorId: ObjectId("507f191e810c19729de860ea"),
  publisherId: ObjectId("507f191e810c19729de860eb")
}

// Authors collection
{
  _id: ObjectId("507f191e810c19729de860ea"),
  name: "Sam Newman",
  email: "sam@example.com",
  bio: "Software architect..."
}

One-to-One Relationships

For one-to-one relationships, embedding is typically the better choice unless the subdocument can grow without bound or is frequently accessed on its own.

Embed when: The related data is always queried together with the parent and has a small, bounded size.

// User profile with embedded address
db.users.insertOne({
  _id: ObjectId("507f1f77bcf86cd799439011"),
  username: "jdoe",
  email: "jdoe@example.com",
  address: {
    street: "123 Main St",
    city: "Portland",
    state: "OR",
    zip: "97204"
  },
  preferences: {
    theme: "dark",
    notifications: true,
    language: "en"
  }
});

// Single query retrieves everything
const user = await db.users.findOne({ username: "jdoe" });

Reference when: The subdocument is large, rarely accessed with the parent, or updated frequently.

// User with referenced medical records
db.users.insertOne({
  _id: ObjectId("507f1f77bcf86cd799439011"),
  username: "jdoe",
  email: "jdoe@example.com",
  medicalRecordId: ObjectId("507f191e810c19729de860ea")
});

db.medicalRecords.insertOne({
  _id: ObjectId("507f191e810c19729de860ea"),
  userId: ObjectId("507f1f77bcf86cd799439011"),
  bloodType: "O+",
  allergies: ["penicillin"],
  history: [/* large array */]
});

One-to-Many Relationships

The cardinality of “many” determines the strategy. Consider whether it’s “one-to-few” (bounded) or “one-to-millions” (unbounded).

Embedding for one-to-few:

// Blog post with embedded comments (assuming < 100 comments)
db.posts.insertOne({
  _id: ObjectId("507f1f77bcf86cd799439011"),
  title: "MongoDB Schema Design",
  content: "...",
  comments: [
    {
      _id: ObjectId("507f191e810c19729de860ea"),
      author: "Alice",
      text: "Great article!",
      createdAt: ISODate("2024-01-15T10:30:00Z")
    },
    {
      _id: ObjectId("507f191e810c19729de860eb"),
      author: "Bob",
      text: "Very helpful",
      createdAt: ISODate("2024-01-15T11:45:00Z")
    }
  ]
});

// Append a comment with a single atomic update to the post document
await db.posts.updateOne(
  { _id: ObjectId("507f1f77bcf86cd799439011") },
  { 
    $push: { 
      comments: {
        _id: ObjectId(),
        author: "Charlie",
        text: "Thanks for sharing",
        createdAt: new Date()
      }
    }
  }
);
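Embedded comments can also be edited in place. The following is a minimal sketch, assuming each comment carries its own _id as in the example above; buildCommentEdit is a hypothetical helper name, and the plain string ids stand in for ObjectId values. It relies on the positional $ operator, which targets the first array element matched by the filter.

```javascript
// Hypothetical helper: builds the filter and update for editing one embedded comment.
// "comments.$" refers to the array element matched by "comments._id" in the filter.
function buildCommentEdit(postId, commentId, newText) {
  return {
    filter: { _id: postId, "comments._id": commentId },
    update: { $set: { "comments.$.text": newText } }
  };
}

// Placeholder string ids; a real application would pass ObjectId values.
const commentEdit = buildCommentEdit("post-1", "comment-2", "Very helpful, thanks!");

// With a live connection the result would be applied as:
//   await db.posts.updateOne(commentEdit.filter, commentEdit.update);
```

Because the comment lives inside the post document, the edit is a single-document write and needs no coordination with other collections.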

Referencing for one-to-many (unbounded):

// E-commerce: products with many reviews
db.products.insertOne({
  _id: ObjectId("507f1f77bcf86cd799439011"),
  name: "Wireless Keyboard",
  price: 79.99,
  reviewCount: 1523,
  averageRating: 4.5
});

db.reviews.insertMany([
  {
    _id: ObjectId("507f191e810c19729de860ea"),
    productId: ObjectId("507f1f77bcf86cd799439011"),
    userId: ObjectId("507f191e810c19729de860ec"),
    rating: 5,
    text: "Excellent product",
    createdAt: ISODate("2024-01-10T14:20:00Z")
  },
  {
    _id: ObjectId("507f191e810c19729de860eb"),
    productId: ObjectId("507f1f77bcf86cd799439011"),
    userId: ObjectId("507f191e810c19729de860ed"),
    rating: 4,
    text: "Good value",
    createdAt: ISODate("2024-01-12T09:15:00Z")
  }
]);

// First page: the 10 most recent reviews for this product
const reviews = await db.reviews
  .find({ productId: ObjectId("507f1f77bcf86cd799439011") })
  .sort({ createdAt: -1 })
  .limit(10)
  .toArray();
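The query above returns only the first page. One way to fetch later pages without skip() is range-based (keyset) pagination, sketched below. buildReviewPage is a hypothetical helper, the string ids are placeholders for ObjectId values, and using _id as a tie-breaker for reviews with identical createdAt values is an assumption of this sketch.

```javascript
// Hypothetical helper: builds filter, sort, and limit for one page of reviews.
// `lastSeen` is the final review of the previous page, or null for the first page.
function buildReviewPage(productId, lastSeen, pageSize = 10) {
  const filter = { productId };
  if (lastSeen) {
    // Continue strictly after the last review seen; _id breaks ties on createdAt.
    filter.$or = [
      { createdAt: { $lt: lastSeen.createdAt } },
      { createdAt: lastSeen.createdAt, _id: { $lt: lastSeen._id } }
    ];
  }
  return { filter, sort: { createdAt: -1, _id: -1 }, limit: pageSize };
}

const firstPage = buildReviewPage("product-1", null);
const nextPage = buildReviewPage("product-1", {
  _id: "review-10",
  createdAt: new Date("2024-01-10T14:20:00Z")
});

// With a live connection:
//   await db.reviews.find(nextPage.filter).sort(nextPage.sort).limit(nextPage.limit).toArray();
```

Each page then costs roughly the same regardless of how deep the reader has scrolled, whereas skip() has to walk past all skipped documents.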

Many-to-Many Relationships

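For many-to-many relationships, store the references on the side you query from most often. If both directions are queried equally, store them on both sides and accept the cost of keeping the two copies in sync.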

Embedding an array of references:

// Students and courses (query from student perspective)
db.students.insertOne({
  _id: ObjectId("507f1f77bcf86cd799439011"),
  name: "Jane Smith",
  email: "jsmith@university.edu",
  enrolledCourses: [
    ObjectId("507f191e810c19729de860ea"),
    ObjectId("507f191e810c19729de860eb"),
    ObjectId("507f191e810c19729de860ec")
  ]
});

// Retrieve student with course details
const student = await db.students.aggregate([
  { $match: { _id: ObjectId("507f1f77bcf86cd799439011") } },
  {
    $lookup: {
      from: "courses",
      localField: "enrolledCourses",
      foreignField: "_id",
      as: "courseDetails"
    }
  }
]).toArray();

Bidirectional references for equal access patterns:

// Tags and articles (frequently queried both ways)
db.articles.insertOne({
  _id: ObjectId("507f1f77bcf86cd799439011"),
  title: "Introduction to MongoDB",
  content: "...",
  tagIds: [
    ObjectId("507f191e810c19729de860ea"),
    ObjectId("507f191e810c19729de860eb")
  ]
});

db.tags.insertMany([
  {
    _id: ObjectId("507f191e810c19729de860ea"),
    name: "mongodb",
    articleIds: [
      ObjectId("507f1f77bcf86cd799439011"),
      ObjectId("507f191e810c19729de860ed")
    ]
  },
  {
    _id: ObjectId("507f191e810c19729de860eb"),
    name: "databases",
    articleIds: [
      ObjectId("507f1f77bcf86cd799439011"),
      ObjectId("507f191e810c19729de860ee")
    ]
  }
]);

// Find articles by tag (no $lookup needed)
const articles = await db.articles
  .find({ tagIds: ObjectId("507f191e810c19729de860ea") })
  .toArray();
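Bidirectional references mean every link has to be written twice. The sketch below shows one way to build both writes; buildTagLink is a hypothetical helper and the string ids are placeholders for ObjectId values. $addToSet is used so that repeating the operation does not create duplicate entries.

```javascript
// Hypothetical helper: builds the two updates needed to link an article and a tag.
// $addToSet only adds the id if it is not already present in the array.
function buildTagLink(articleId, tagId) {
  return {
    article: {
      filter: { _id: articleId },
      update: { $addToSet: { tagIds: tagId } }
    },
    tag: {
      filter: { _id: tagId },
      update: { $addToSet: { articleIds: articleId } }
    }
  };
}

const link = buildTagLink("article-1", "tag-mongodb");

// With a live connection:
//   await db.articles.updateOne(link.article.filter, link.article.update);
//   await db.tags.updateOne(link.tag.filter, link.tag.update);
```

The two updates touch different collections, so they are not atomic as a pair. If a half-applied link is unacceptable, they would need to run inside a multi-document transaction.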

Hybrid Approach: Extended Reference Pattern

Store frequently accessed fields with the reference to minimize $lookup operations:

// Orders with extended product references
db.orders.insertOne({
  _id: ObjectId("507f1f77bcf86cd799439011"),
  userId: ObjectId("507f191e810c19729de860ea"),
  orderDate: ISODate("2024-01-15T10:30:00Z"),
  items: [
    {
      productId: ObjectId("507f191e810c19729de860eb"),
      name: "Wireless Keyboard",  // Denormalized
      price: 79.99,               // Denormalized (at time of order)
      quantity: 1
    },
    {
      productId: ObjectId("507f191e810c19729de860ec"),
      name: "USB Mouse",          // Denormalized
      price: 29.99,               // Denormalized
      quantity: 2
    }
  ],
  total: 139.97
});

// Query orders without joining products
const orders = await db.orders
  .find({ userId: ObjectId("507f191e810c19729de860ea") })
  .toArray();

This pattern accepts data duplication for historical accuracy and query performance. The order preserves product names and prices as they were at purchase time.
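As a rough sketch of how the snapshot might be taken at checkout: buildOrderItems below is a hypothetical helper, the cart shape ({ product, quantity }) is an assumption, and the string ids stand in for ObjectId values. Prices are summed in integer cents to avoid floating-point drift in the total.

```javascript
// Hypothetical helper: copies the product fields an order needs and computes the total.
function buildOrderItems(cart) {
  const items = cart.map(({ product, quantity }) => ({
    productId: product._id,
    name: product.name,   // snapshot of the name at purchase time
    price: product.price, // snapshot of the price at purchase time
    quantity
  }));
  // Sum in integer cents so that repeated additions do not accumulate rounding error.
  const totalCents = items.reduce(
    (sum, item) => sum + Math.round(item.price * 100) * item.quantity,
    0
  );
  return { items, total: totalCents / 100 };
}

const keyboard = { _id: "product-keyboard", name: "Wireless Keyboard", price: 79.99 };
const mouse = { _id: "product-mouse", name: "USB Mouse", price: 29.99 };

const order = buildOrderItems([
  { product: keyboard, quantity: 1 },
  { product: mouse, quantity: 2 }
]);

// A later price change on the product does not affect the stored order.
keyboard.price = 89.99;
```

Here order.total matches the 139.97 in the example document, and order.items[0].price is still 79.99 after the product price changes.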

Subset Pattern for Large Documents

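When documents contain large arrays but queries typically need only the most recent items, embed that subset in the parent document and keep the full history in a separate collection: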

// User activity with subset pattern
db.users.insertOne({
  _id: ObjectId("507f1f77bcf86cd799439011"),
  username: "jdoe",
  email: "jdoe@example.com",
  recentActivity: [  // Last 10 activities embedded
    {
      type: "login",
      timestamp: ISODate("2024-01-15T10:30:00Z"),
      ip: "192.168.1.1"
    },
    // ... 9 more recent activities
  ]
});

// Full activity history in separate collection
db.activityHistory.insertMany([
  {
    _id: ObjectId("507f191e810c19729de860ea"),
    userId: ObjectId("507f1f77bcf86cd799439011"),
    type: "login",
    timestamp: ISODate("2024-01-15T10:30:00Z"),
    ip: "192.168.1.1"
  },
  // ... thousands of historical records
]);

// Most queries only need recent data (fast)
const user = await db.users.findOne({ username: "jdoe" });

// Historical analysis requires separate query (rare)
const fullHistory = await db.activityHistory
  .find({ userId: ObjectId("507f1f77bcf86cd799439011") })
  .sort({ timestamp: -1 })
  .toArray();
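The embedded subset has to be trimmed on every write, or it grows like the full history. A minimal sketch of that write follows, assuming the limit of 10 from the example above; buildRecentActivityPush is a hypothetical helper. It combines $push with the $each, $sort, and $slice modifiers so the array is capped in the same update that appends to it.

```javascript
// Hypothetical helper: appends one activity and keeps only the newest `maxRecent` entries.
// $sort orders the array newest-first, then $slice keeps the first `maxRecent` elements.
function buildRecentActivityPush(activity, maxRecent = 10) {
  return {
    $push: {
      recentActivity: {
        $each: [activity],
        $sort: { timestamp: -1 },
        $slice: maxRecent
      }
    }
  };
}

const activity = {
  type: "login",
  timestamp: new Date("2024-01-16T08:00:00Z"),
  ip: "192.168.1.1"
};
const recentUpdate = buildRecentActivityPush(activity);

// With a live connection, the same activity is written to both places:
//   await db.users.updateOne({ username: "jdoe" }, recentUpdate);
//   await db.activityHistory.insertOne({ userId: user._id, ...activity });
```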

Performance Considerations and Indexing

Embedding reduces the need for joins, but queries on fields inside embedded arrays still need an index. MongoDB builds a multikey index when the indexed field is inside an array:

// Index for querying embedded arrays
db.posts.createIndex({ "comments.author": 1 });

// Query embedded documents efficiently
const posts = await db.posts
  .find({ "comments.author": "Alice" })
  .toArray();

// For referenced data, index the reference field; this compound index
// also supports sorting a product's reviews by createdAt
db.reviews.createIndex({ productId: 1, createdAt: -1 });

Choose embedding when read performance is critical and documents stay well under the 16 MB BSON document size limit. Choose referencing when data is frequently updated independently, grows without bound, or needs to be queried in multiple contexts. Monitor document sizes and query patterns in production to validate your design decisions.
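One way to monitor document sizes is an aggregation that ranks documents by their BSON size. The sketch below assumes MongoDB 4.4 or later, where the $bsonSize operator is available; largestDocumentsPipeline and the sizeBytes field name are hypothetical.

```javascript
// The BSON document size limit, in bytes (16 MB).
const MAX_BSON_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: builds a pipeline that lists the largest documents in a collection.
// "$$ROOT" refers to the whole document, so $bsonSize measures its full stored size.
function largestDocumentsPipeline(limit = 5) {
  return [
    { $project: { sizeBytes: { $bsonSize: "$$ROOT" } } },
    { $sort: { sizeBytes: -1 } },
    { $limit: limit }
  ];
}

const sizePipeline = largestDocumentsPipeline(3);

// With a live connection:
//   const largest = await db.posts.aggregate(sizePipeline).toArray();
//   largest.forEach(d => console.log(d._id, d.sizeBytes / MAX_BSON_BYTES));
```

Documents that approach a meaningful fraction of the limit are candidates for the subset pattern or for moving the growing array into its own collection.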
