MongoDB Replication: Replica Sets
Key Insights
- Replica sets provide automatic failover and data redundancy through a minimum of three nodes, with one primary accepting writes and secondaries replicating data asynchronously using the oplog
- Elections complete within seconds when the primary fails; a candidate must be among the most up-to-date reachable members, and configured member priorities influence which node wins
- Read preference strategies and write concerns allow fine-grained control over consistency versus performance trade-offs in distributed MongoDB deployments
Understanding Replica Set Architecture
A MongoDB replica set consists of multiple mongod instances that maintain identical data sets. The architecture includes one primary node that receives all write operations and multiple secondary nodes that replicate the primary’s oplog (operations log).
The minimum recommended configuration is three nodes: one primary and two secondaries. This configuration survives the failure of any single node while maintaining data availability. For production environments, odd numbers of voting members (3, 5, or 7) prevent split-brain scenarios during elections.
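The odd-number recommendation falls out of simple majority arithmetic. A quick sketch in plain JavaScript (illustration only, not a driver API):

```javascript
// Votes needed to elect a primary, and how many voting-member
// failures the set can survive while still electing one.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

console.log(majority(3), faultTolerance(3)); // 2 votes needed, survives 1 failure
console.log(majority(4), faultTolerance(4)); // 3 votes needed, still survives only 1
console.log(majority(5), faultTolerance(5)); // 3 votes needed, survives 2
```

Adding a fourth voting member raises the majority requirement without improving fault tolerance, which is why 3, 5, or 7 voters are preferred.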
Here’s a basic replica set configuration:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017", priority: 2 },
    { _id: 1, host: "mongodb1.example.net:27017", priority: 1 },
    { _id: 2, host: "mongodb2.example.net:27017", priority: 1 }
  ]
})
The priority field determines election preference. Higher values make a node more likely to become primary during elections. Setting priority to 0 creates a passive node that can never become primary but still holds data and can vote.
Oplog and Data Replication
The oplog is a capped collection that records all operations modifying data. Secondaries continuously tail the primary’s oplog and apply operations in the same order. This asynchronous replication means secondaries may lag behind the primary.
Check oplog status:
use local
db.oplog.rs.stats()
// Get oplog time window
db.oplog.rs.find().sort({$natural: 1}).limit(1).pretty()
db.oplog.rs.find().sort({$natural: -1}).limit(1).pretty()
The oplog size determines how long a secondary can be offline before requiring a full resync. Default sizing is typically 5% of free disk space. Configure it explicitly:
# mongod.conf
replication:
  oplogSizeMB: 10240
  replSetName: "rs0"
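The oplog window itself is just the spread between the first and last retained entries. The same calculation sketched in plain JavaScript (the timestamps below are made up for illustration):

```javascript
// Estimate the oplog window in hours from the timestamps of the
// oldest and newest retained oplog entries.
function oplogWindowHours(firstEntryTs, lastEntryTs) {
  const ms = lastEntryTs.getTime() - firstEntryTs.getTime();
  return ms / (1000 * 60 * 60);
}

const first = new Date("2024-01-01T00:00:00Z"); // oldest retained operation
const last  = new Date("2024-01-03T12:00:00Z"); // newest operation
console.log(oplogWindowHours(first, last)); // 60 hours
```

A secondary that stays offline longer than this window falls off the oplog and must perform a full initial sync.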
Monitor replication lag across members:
rs.printReplicationInfo()
rs.printSecondaryReplicationInfo()
Configuring Automatic Failover
When the primary becomes unavailable, remaining members hold an election. A node must receive votes from a majority of all voting members to become primary.
Configure heartbeat intervals and timeouts:
cfg = rs.conf()
// Update fields individually rather than replacing cfg.settings wholesale,
// which would discard the set's other settings
cfg.settings.heartbeatIntervalMillis = 2000
cfg.settings.heartbeatTimeoutSecs = 10
cfg.settings.electionTimeoutMillis = 10000
rs.reconfig(cfg)
With default settings, failover typically completes within about 12 seconds. Reducing electionTimeoutMillis (default 10000 ms) speeds up failover but increases sensitivity to transient network issues, which can trigger unnecessary elections.
Arbiter nodes participate in elections without holding data:
rs.addArb("mongodb3.example.net:27017")
Use arbiters cautiously in production. They’re suitable for two-data-center deployments where you need a tiebreaker in a third location without full data replication costs. Note that arbiters vote but hold no data and cannot acknowledge writes, so in a primary-secondary-arbiter set, w: "majority" writes stall whenever the lone secondary is unavailable.
Write Concerns for Durability
Write concerns specify acknowledgment requirements before returning success. This directly impacts data durability and performance.
// Wait for majority acknowledgment
db.inventory.insertOne(
  { item: "widget", qty: 100 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)

// Wait for a specific number of nodes
db.inventory.updateOne(
  { item: "widget" },
  { $inc: { qty: 5 } },
  { writeConcern: { w: 3, wtimeout: 5000 } }
)

// Write to primary only (fastest, least durable)
db.inventory.insertOne(
  { item: "gadget", qty: 50 },
  { writeConcern: { w: 1 } }
)
The majority write concern ensures data survives most failure scenarios. It waits until writes replicate to a majority of voting members. This prevents rollbacks during failover in most cases.
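The acknowledgment rule itself is mechanical: a numeric w needs that many acknowledging members, while "majority" needs a majority of voting members. A minimal sketch in plain JavaScript (an illustration of the rule, not driver internals):

```javascript
// Check whether a write concern is satisfied by the number of
// members that have acknowledged the write.
function writeConcernSatisfied(w, ackCount, votingMembers) {
  if (w === "majority") {
    return ackCount >= Math.floor(votingMembers / 2) + 1;
  }
  return ackCount >= w; // numeric write concern
}

console.log(writeConcernSatisfied("majority", 2, 3)); // true: 2 of 3 is a majority
console.log(writeConcernSatisfied(3, 2, 3));          // false: only 2 of 3 required acks
```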
Configure a cluster-wide default write concern (setDefaultRWConcern applies to the entire deployment, not a single database):
db.runCommand({
  setDefaultRWConcern: 1,
  defaultWriteConcern: { w: "majority", wtimeout: 5000 }
})
Read Preferences and Distribution
Read preferences control which replica set members receive read operations. The default primary mode sends all reads to the primary, ensuring strict consistency.
// Read from primary (default)
db.products.find().readPref("primary")
// Read from secondaries only
db.products.find().readPref("secondary")
// Prefer secondary, fallback to primary
db.products.find().readPref("secondaryPreferred")
// Nearest node by latency
db.products.find().readPref("nearest")
Tag-based read preferences enable geographic routing:
// Configure member tags
cfg = rs.conf()
cfg.members[0].tags = { dc: "east", usage: "production" }
cfg.members[1].tags = { dc: "west", usage: "production" }
cfg.members[2].tags = { dc: "east", usage: "analytics" }
rs.reconfig(cfg)
// Read from specific tagged members
db.products.find().readPref("nearest", [{ dc: "east" }])
// Multi-level preference
db.analytics.find().readPref("secondary", [
  { usage: "analytics" },
  { dc: "east" }
])
This approach isolates analytics workloads from production traffic and routes reads to geographically closer nodes.
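The tag-set list is ordered: the driver tries the first tag set, and only falls through to the next when no member matches. A rough sketch of that selection logic in plain JavaScript (an illustration, not the driver's actual server-selection code; hosts and tags mirror the example above):

```javascript
// Ordered tag-set matching: the first tag set with at least one
// matching member wins; later tag sets are fallbacks only.
function membersForTagSets(members, tagSets) {
  for (const tagSet of tagSets) {
    const matches = members.filter((m) =>
      Object.entries(tagSet).every(([k, v]) => m.tags[k] === v)
    );
    if (matches.length > 0) return matches;
  }
  return [];
}

const members = [
  { host: "mongodb0", tags: { dc: "east", usage: "production" } },
  { host: "mongodb1", tags: { dc: "west", usage: "production" } },
  { host: "mongodb2", tags: { dc: "east", usage: "analytics" } },
];

// Prefer analytics members, fall back to anything in the east DC
const chosen = membersForTagSets(members, [{ usage: "analytics" }, { dc: "east" }]);
console.log(chosen.map((m) => m.host)); // ["mongodb2"]
```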
Monitoring and Maintenance
Regular monitoring prevents replication issues from escalating. Key metrics include replication lag, oplog window, and member health.
// Detailed replica set status
rs.status()
// Check specific member lag
rs.status().members.forEach(function(member) {
  print(member.name + ": " + member.optimeDate)
})

// Identify sync source
rs.status().members.forEach(function(member) {
  if (member.syncSourceHost) {
    print(member.name + " syncs from " + member.syncSourceHost)
  }
})
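Lag is simply the gap between the primary's optimeDate and each secondary's. A sketch over a status-shaped object in plain JavaScript (field names mirror rs.status() output; the hosts and timestamps are made up):

```javascript
// Compute per-member replication lag in seconds from
// rs.status()-style member data.
function replicationLagSecs(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSecs: (primary.optimeDate - m.optimeDate) / 1000,
    }));
}

const status = {
  members: [
    { name: "mongodb0:27017", stateStr: "PRIMARY",   optimeDate: new Date("2024-01-01T00:00:10Z") },
    { name: "mongodb1:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:08Z") },
    { name: "mongodb2:27017", stateStr: "SECONDARY", optimeDate: new Date("2024-01-01T00:00:10Z") },
  ],
};
console.log(replicationLagSecs(status)); // lagSecs: 2 for mongodb1, 0 for mongodb2
```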
Perform rolling maintenance without downtime:
// Step down current primary (triggers election)
db.adminCommand({ replSetStepDown: 60 })
// Remove member for maintenance
rs.remove("mongodb2.example.net:27017")
// After maintenance, add back
rs.add("mongodb2.example.net:27017")
For priority-based controlled failover:
cfg = rs.conf()
cfg.members[0].priority = 0 // Current primary
cfg.members[1].priority = 2 // Preferred new primary
rs.reconfig(cfg)
// After election completes, restore original priorities
Handling Network Partitions
Network partitions split the replica set into isolated groups. The majority partition continues operating while minority partitions become read-only.
// Demote a member so it can neither vote nor become primary
// (it continues to replicate data)
cfg = rs.conf()
cfg.members[2].priority = 0
cfg.members[2].votes = 0
rs.reconfig(cfg)
Prevent split-brain with proper member distribution. In a three-node set across two data centers, place two members in one location and one in the other. The two-member location maintains majority during partition.
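The same majority rule decides which side of a partition keeps a primary. A sketch checking one side's votes against the set's total (plain JavaScript, illustration only):

```javascript
// A partition side can elect (or keep) a primary only if it
// holds a strict majority of ALL configured votes.
function canElectPrimary(sideVotes, totalVotes) {
  return sideVotes > totalVotes / 2;
}

// Three voting members split two-and-one across two data centers:
console.log(canElectPrimary(2, 3)); // true: the two-member DC keeps a primary
console.log(canElectPrimary(1, 3)); // false: the single-member DC goes read-only
```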
Configure member voting explicitly:
cfg = rs.conf()
cfg.members[3].votes = 0 // Non-voting member
cfg.members[3].priority = 0
rs.reconfig(cfg)
Non-voting members don’t participate in elections but maintain data copies. Use them for dedicated reporting or backup purposes without impacting election dynamics.
Connection String Configuration
Application connection strings must specify all replica set members for automatic failover:
const { MongoClient } = require('mongodb');
const uri = "mongodb://mongodb0.example.net:27017,mongodb1.example.net:27017,mongodb2.example.net:27017/?replicaSet=rs0";
const client = new MongoClient(uri, {
  readPreference: 'secondaryPreferred',
  w: 'majority',
  wtimeoutMS: 5000
});

async function run() {
  try {
    await client.connect();
    const db = client.db('myapp');

    // Writes go to the primary with majority concern
    await db.collection('orders').insertOne(
      { order_id: 12345, total: 99.99 },
      { writeConcern: { w: 'majority' } }
    );

    // Reads prefer secondaries (read preference passed as a find option)
    const products = await db.collection('products')
      .find({}, { readPreference: 'secondaryPreferred' })
      .toArray();
  } finally {
    await client.close();
  }
}
The driver automatically discovers topology changes and routes operations appropriately. Recent driver versions enable retryWrites by default, retrying eligible write operations once after transient failures such as a failover; set it explicitly in the URI or client options if you need to control this behavior.
Replica sets form the foundation of MongoDB high availability. Proper configuration of voting members, write concerns, and read preferences balances consistency requirements against performance needs while maintaining resilience against node and network failures.