Design a Feature Flag System: Gradual Rollouts

Key Insights

  • Consistent hashing is the foundation of reliable gradual rollouts—it ensures users get the same experience across sessions without storing individual assignments
  • Your evaluation engine should run locally with cached rules rather than making network calls per request; a 50ms flag check defeats the purpose of feature flags
  • Treat feature flags as technical debt from day one—implement automated cleanup processes before your codebase drowns in conditional logic

Introduction: Why Gradual Rollouts Matter

Feature flags let you separate code deployment from feature release. Gradual rollouts take this further: instead of a binary on/off switch, you expose new functionality to a controlled percentage of users, expanding that percentage as confidence grows.

The business case is straightforward. When you deploy a new payment flow to 100% of users and it breaks, you’ve broken payments for everyone. When you roll it out to 1% first, you’ve limited blast radius to a manageable incident. Beyond risk mitigation, gradual rollouts enable A/B testing, canary releases, and customer-specific beta programs—all from the same underlying system.

A production-grade feature flag system needs three things: reliable storage for flag configurations, a fast evaluation engine that makes targeting decisions, and client SDKs that minimize latency while staying synchronized. Let’s build each component.

Core Architecture Components

The architecture splits into a control plane (where you configure flags) and a data plane (where flags get evaluated). The control plane is a standard CRUD service with an admin UI. The data plane is where performance matters.

interface FeatureFlag {
  key: string;
  name: string;
  description: string;
  enabled: boolean;
  defaultValue: boolean;
  rolloutPercentage: number;
  targetingRules: TargetingRule[];
  killSwitch: boolean;
  createdAt: Date;
  updatedAt: Date;
  owner: string;
  expiresAt?: Date;
}

interface TargetingRule {
  id: string;
  priority: number;
  conditions: Condition[];
  rolloutPercentage: number;
  value: boolean;
}

interface Condition {
  attribute: string;
  operator: 'equals' | 'contains' | 'in' | 'gt' | 'lt' | 'regex';
  values: string[];
}

interface EvaluationContext {
  userId: string;
  email?: string;
  country?: string;
  planTier?: string;
  customAttributes: Record<string, string | number | boolean>;
}

The TargetingRule array is evaluated in priority order. Each rule has conditions that must all match (AND logic), and if they do, that rule’s rolloutPercentage and value determine the outcome. This gives you flexibility: 100% rollout for internal employees, 10% for free tier users, 50% for premium users.

Your client SDK should fetch the full flag configuration on initialization and cache it locally. Evaluation happens in-process against this cache. A background process polls for updates or subscribes to a push channel.
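A minimal SDK bootstrap might look like this. The `FlagClient` class, the `/flags` endpoint, and the pared-down `FlagConfig` shape are illustrative names for this sketch, not a real SDK API:

```typescript
// Minimal client SDK sketch: fetch the full configuration once, cache it,
// and evaluate in-process. FlagClient, FlagConfig, and the /flags endpoint
// are illustrative names, not a real SDK API.
type FlagConfig = { key: string; enabled: boolean; rolloutPercentage: number };

class FlagClient {
  private cache = new Map<string, FlagConfig>();

  constructor(private baseUrl: string) {}

  // One network call at startup; a background poller would reuse this.
  async initialize(): Promise<void> {
    const res = await fetch(`${this.baseUrl}/flags`);
    const flags: FlagConfig[] = await res.json();
    flags.forEach(f => this.cache.set(f.key, f));
  }

  // Pure in-memory lookup -- no network call on the hot path.
  isEnabled(key: string): boolean {
    return this.cache.get(key)?.enabled ?? false;
  }
}
```

Note the default: an unknown flag evaluates to `false`, so a misconfigured caller fails closed rather than exposing an unfinished feature.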

Rollout Strategies & Targeting Rules

Percentage-based rollouts need to be deterministic. If a user sees the new checkout flow on Monday, they must see it on Tuesday. Random number generation per request creates a jarring experience where features flicker in and out.

Consistent hashing solves this. Hash the user ID combined with the flag key, normalize to a 0-100 range, and compare against the rollout percentage. The same inputs always produce the same hash, ensuring sticky assignment without storing state.

package flags

import (
    "crypto/sha256"
    "encoding/binary"
)

func GetBucket(userID, flagKey string, bucketCount int) int {
    input := flagKey + ":" + userID
    hash := sha256.Sum256([]byte(input))
    
    // Use first 8 bytes as uint64
    hashValue := binary.BigEndian.Uint64(hash[:8])
    
    // Normalize to bucket range
    return int(hashValue % uint64(bucketCount))
}

func IsUserInRollout(userID, flagKey string, percentage int) bool {
    if percentage <= 0 {
        return false
    }
    if percentage >= 100 {
        return true
    }
    
    bucket := GetBucket(userID, flagKey, 100)
    return bucket < percentage
}

func GetVariant(userID, flagKey string, variants []string) string {
    if len(variants) == 0 {
        return "" // avoid a modulo-by-zero panic in GetBucket
    }
    bucket := GetBucket(userID, flagKey, len(variants))
    return variants[bucket]
}

Including the flag key in the hash input is crucial. Without it, users bucketed into the bottom 10% would be in the bottom 10% for every flag. By including the flag key, each flag gets independent randomization while maintaining per-flag consistency.

For targeting rules, evaluation order matters. Process rules by priority, returning on the first match. If no rules match, fall back to the default rollout percentage.

The Evaluation Engine

The evaluation engine is the hot path—it runs on every feature check. It must be fast, deterministic, and handle edge cases gracefully.

class FlagEvaluator {
  private flagCache = new Map<string, FeatureFlag>();

  evaluate(flagKey: string, context: EvaluationContext): EvaluationResult {
    const flag = this.flagCache.get(flagKey);
    
    if (!flag) {
      return { value: false, reason: 'FLAG_NOT_FOUND' };
    }

    if (flag.killSwitch) {
      return { value: false, reason: 'KILL_SWITCH' };
    }

    if (!flag.enabled) {
      return { value: flag.defaultValue, reason: 'FLAG_DISABLED' };
    }

    // Evaluate targeting rules in priority order
    const sortedRules = [...flag.targetingRules]
      .sort((a, b) => a.priority - b.priority);

    for (const rule of sortedRules) {
      if (this.matchesAllConditions(rule.conditions, context)) {
        const inRollout = this.isInRollout(
          context.userId,
          flagKey,
          rule.rolloutPercentage
        );
        
        return {
          value: inRollout ? rule.value : flag.defaultValue,
          reason: 'RULE_MATCH',
          ruleId: rule.id
        };
      }
    }

    // No rules matched, use default rollout
    const inRollout = this.isInRollout(
      context.userId,
      flagKey,
      flag.rolloutPercentage
    );

    return {
      value: inRollout,
      reason: 'DEFAULT_ROLLOUT'
    };
  }

  private matchesAllConditions(
    conditions: Condition[],
    context: EvaluationContext
  ): boolean {
    return conditions.every(condition => {
      const contextValue = this.getContextValue(condition.attribute, context);
      return this.evaluateCondition(condition, contextValue);
    });
  }

  private evaluateCondition(
    condition: Condition,
    contextValue: string | number | boolean | undefined
  ): boolean {
    if (contextValue === undefined) return false;

    switch (condition.operator) {
      case 'equals':
        return condition.values.includes(String(contextValue));
      case 'in':
        return condition.values.includes(String(contextValue));
      case 'contains':
        return condition.values.some(v => 
          String(contextValue).includes(v)
        );
      case 'gt':
        return Number(contextValue) > Number(condition.values[0]);
      case 'lt':
        return Number(contextValue) < Number(condition.values[0]);
      default:
        return false;
    }
  }

  private isInRollout(
    userId: string,
    flagKey: string,
    percentage: number
  ): boolean {
    const hash = this.hashUserFlag(userId, flagKey);
    const bucket = hash % 100;
    return bucket < percentage;
  }

  private hashUserFlag(userId: string, flagKey: string): number {
    // Simple djb2 hash for illustration; production SDKs typically use
    // a stronger hash (e.g. murmur3) for more even bucket distribution
    const input = `${flagKey}:${userId}`;
    let hash = 5381;
    for (let i = 0; i < input.length; i++) {
      // |0 keeps the accumulator in 32-bit range on each iteration
      hash = ((hash << 5) + hash + input.charCodeAt(i)) | 0;
    }
    return hash >>> 0; // reinterpret as unsigned to guarantee a non-negative bucket
  }
}

Cache the flag configurations aggressively. A typical pattern is to refresh the cache every 30 seconds via polling, with an immediate refresh capability triggered by webhooks when flags change. For latency-sensitive applications, the SDK should never block on a network call during evaluation.
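That refresh loop can be sketched as a small poller whose `refreshNow()` method a webhook handler calls directly for immediate propagation. The class name and the `fetchFlags` callback are assumptions for illustration:

```typescript
// Polling refresher sketch. refreshNow() runs on the interval and can also
// be invoked directly by a webhook endpoint when a flag changes.
// FlagRefresher and the fetchFlags callback are illustrative names.
class FlagRefresher {
  private timer?: ReturnType<typeof setInterval>;

  constructor(
    private fetchFlags: () => Promise<Map<string, unknown>>,
    private onUpdate: (flags: Map<string, unknown>) => void,
    private intervalMs = 30_000 // 30-second poll, per the pattern above
  ) {}

  start(): void {
    this.timer = setInterval(() => void this.refreshNow(), this.intervalMs);
  }

  async refreshNow(): Promise<void> {
    try {
      this.onUpdate(await this.fetchFlags());
    } catch {
      // Keep serving the stale cache; never block evaluation on a fetch.
    }
  }

  stop(): void {
    if (this.timer) clearInterval(this.timer);
  }
}
```

Swallowing the fetch error is deliberate: a stale cache is a better failure mode than an evaluation path that throws.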

Consistency & Distribution

Distributed systems complicate flag synchronization. Your API servers, background workers, and edge functions all need consistent flag state. A user hitting different servers shouldn’t see different feature states.

Redis pub/sub provides a lightweight solution for flag propagation:

import Redis from 'ioredis';

class FlagSyncService {
  private redis: Redis;
  private subscriber: Redis;
  private flagCache = new Map<string, FeatureFlag>();
  private readonly CHANNEL = 'flag-updates';
  private readonly CACHE_KEY = 'feature-flags:all';

  constructor(redisUrl: string) {
    // Pub/sub needs a dedicated connection: once an ioredis client
    // subscribes, it can no longer issue regular commands.
    this.redis = new Redis(redisUrl);
    this.subscriber = new Redis(redisUrl);
  }

  async initialize(): Promise<void> {
    // Load initial state
    const flagData = await this.redis.get(this.CACHE_KEY);
    if (flagData) {
      const flags: FeatureFlag[] = JSON.parse(flagData);
      flags.forEach(f => this.flagCache.set(f.key, f));
    }

    // Subscribe to updates
    await this.subscriber.subscribe(this.CHANNEL);
    this.subscriber.on('message', (channel, message) => {
      if (channel === this.CHANNEL) {
        this.handleFlagUpdate(JSON.parse(message));
      }
    });
  }

  private handleFlagUpdate(update: FlagUpdateEvent): void {
    switch (update.type) {
      case 'UPDATED':
      case 'CREATED':
        this.flagCache.set(update.flag.key, update.flag);
        break;
      case 'DELETED':
        this.flagCache.delete(update.flagKey);
        break;
    }
  }

  async publishFlagUpdate(flag: FeatureFlag): Promise<void> {
    // Apply the change locally first so the serialized snapshot includes it
    this.flagCache.set(flag.key, flag);

    // Update persistent storage
    const allFlags = Array.from(this.flagCache.values());
    await this.redis.set(this.CACHE_KEY, JSON.stringify(allFlags));

    // Notify all subscribers
    await this.redis.publish(this.CHANNEL, JSON.stringify({
      type: 'UPDATED',
      flag,
      timestamp: Date.now()
    }));
  }
}

For anonymous users, generate a stable device identifier stored in a cookie or local storage. Use this identifier for hashing instead of a user ID. When the user authenticates, you can choose to maintain their anonymous bucket assignment or re-bucket them based on their user ID—just be consistent about which approach you take.
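A sketch of that stable identifier, written against the Web Storage `getItem`/`setItem` shape so it works with `localStorage` or a cookie wrapper. The key name is arbitrary, and Node's `randomUUID` stands in for the browser's `crypto.randomUUID()`:

```typescript
import { randomUUID } from 'node:crypto';

// Stable anonymous identifier for bucketing pre-auth users.
// DEVICE_ID_KEY is an arbitrary choice; in a browser, persist via
// localStorage or a cookie and use crypto.randomUUID() instead.
const DEVICE_ID_KEY = 'ff_device_id';

interface KVStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function getOrCreateDeviceId(storage: KVStorage): string {
  let id = storage.getItem(DEVICE_ID_KEY);
  if (!id) {
    id = randomUUID(); // generated once, then reused across sessions
    storage.setItem(DEVICE_ID_KEY, id);
  }
  return id;
}
```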

Observability & Safety Controls

You can’t improve what you don’t measure. Every flag evaluation should emit an exposure event for analytics.

interface ExposureEvent {
  flagKey: string;
  userId: string;
  value: boolean;
  reason: string;
  ruleId?: string;
  timestamp: number;
}

class ExposureLogger {
  private buffer: ExposureEvent[] = [];
  private readonly FLUSH_INTERVAL = 5000;
  private readonly BATCH_SIZE = 100;

  constructor(private analyticsClient: AnalyticsClient) {
    setInterval(() => this.flush(), this.FLUSH_INTERVAL);
  }

  log(event: ExposureEvent): void {
    this.buffer.push(event);
    
    if (this.buffer.length >= this.BATCH_SIZE) {
      this.flush();
    }
  }

  private async flush(): Promise<void> {
    if (this.buffer.length === 0) return;

    const events = this.buffer.splice(0, this.buffer.length);
    
    try {
      await this.analyticsClient.trackBatch('flag_exposure', events);
    } catch (error) {
      // Re-queue failed events (with limit to prevent memory issues)
      this.buffer.unshift(...events.slice(0, 1000));
    }
  }
}

Kill switches deserve special attention. When a flag’s killSwitch is true, the evaluation engine should immediately return false, bypassing all other logic. This gives you a one-click emergency shutoff that doesn’t require reasoning about targeting rules during an incident.

Audit logging is non-negotiable for compliance. Log every flag configuration change with the actor, timestamp, and before/after state. This becomes critical when debugging why a feature behaved differently last Tuesday.
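A minimal audit entry needs only a handful of fields. This is a sketch with illustrative names; production code would append to durable storage rather than an in-memory array:

```typescript
// Audit entry sketch: who changed which flag, when, with before/after
// state. Field names are illustrative.
interface AuditEntry {
  actor: string;
  flagKey: string;
  timestamp: number;
  before: unknown; // full flag config prior to the change
  after: unknown;  // full flag config after the change
}

function recordChange(
  log: AuditEntry[],
  actor: string,
  flagKey: string,
  before: unknown,
  after: unknown
): AuditEntry {
  const entry: AuditEntry = { actor, flagKey, timestamp: Date.now(), before, after };
  log.push(entry); // in production: append to durable, queryable storage
  return entry;
}
```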

Operational Considerations

Feature flags accumulate. A two-year-old codebase can easily have hundreds of flags, most of which are fully rolled out and forgotten. This creates maintenance burden and cognitive overhead.

Set expiration dates on flags at creation time. Run weekly reports showing flags past their expiration. Better yet, automate it: if a flag has been at 100% rollout for 30 days with no incidents, open a PR removing it.
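That report can be a simple filter over flag metadata. A sketch, using a pared-down `FlagSummary` stand-in for the `FeatureFlag` shape defined earlier:

```typescript
// Stale-flag report sketch: surfaces flags past their expiration date or
// sitting at 100% rollout as cleanup candidates. FlagSummary is an
// illustrative subset of the FeatureFlag interface.
interface FlagSummary {
  key: string;
  rolloutPercentage: number;
  expiresAt?: Date;
}

function findStaleFlags(flags: FlagSummary[], now: Date = new Date()): FlagSummary[] {
  return flags.filter(f =>
    (f.expiresAt !== undefined && f.expiresAt < now) ||
    f.rolloutPercentage >= 100
  );
}
```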

For testing, your test suite should exercise both flag states. Create test helpers that explicitly set flag values rather than relying on production configuration:

describe('checkout flow', () => {
  it('shows new payment form when flag enabled', async () => {
    await withFeatureFlag('new-payment-form', true, async () => {
      const result = await renderCheckout();
      expect(result).toContain('new-payment-form');
    });
  });
});
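The helper itself can be a scoped override that restores prior state in a `finally` block, so a failing test can't leak flag state into its neighbors. One possible sketch, using an in-memory override store (both names are illustrative):

```typescript
// Test helper sketch: set a flag override, run the callback, then restore
// the previous state. testFlagOverrides is an illustrative in-memory
// store that your evaluator would consult before its normal rules.
const testFlagOverrides = new Map<string, boolean>();

async function withFeatureFlag(
  key: string,
  value: boolean,
  fn: () => Promise<void>
): Promise<void> {
  const previous = testFlagOverrides.get(key);
  testFlagOverrides.set(key, value);
  try {
    await fn();
  } finally {
    // Restore prior state so tests stay isolated even on failure
    if (previous === undefined) testFlagOverrides.delete(key);
    else testFlagOverrides.set(key, previous);
  }
}
```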

On build versus buy: LaunchDarkly is the market leader with excellent SDKs and targeting capabilities—but it’s expensive at scale. Unleash and Flagsmith offer solid open-source alternatives you can self-host. Build your own only if you have unusual requirements or feature flags are core to your product. For most teams, the operational overhead of maintaining a custom system isn’t worth the savings.

Start simple. A basic flag service with percentage rollouts covers 80% of use cases. Add targeting rules when you actually need them. The best feature flag system is one your team will actually use.
