Python - Set Operations (Union, Intersection, Difference)

Key Insights

  • Python sets provide O(1) average-case membership testing, compared with O(n) for lists, which makes them far faster for lookups and deduplication on large collections
  • Set operations (union, intersection, difference, symmetric difference) use both operator syntax (|, &, -, ^) and method calls, with methods supporting any iterable while operators require sets
  • Frozen sets (frozenset) are immutable and hashable, so they can serve as dictionary keys or set elements, while regular sets remain mutable and unhashable
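
The difference between operator and method syntax can be checked directly. The following is a minimal sketch with illustrative variable names (tags and extra are not from any real codebase):

```python
tags = {"python", "web"}
extra = ["api", "web"]  # a list, not a set

# Methods accept any iterable
combined = tags.union(extra)

# Operators require both operands to be sets (or frozensets)
try:
    tags | extra
    operator_error = None
except TypeError as exc:
    operator_error = str(exc)

print(combined)        # contains 'python', 'web', 'api' (display order may vary)
print(operator_error)  # a TypeError message about unsupported operand types
```

Converting the right-hand side first, as in tags | set(extra), gives the same result as the method call.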

Understanding Python Sets

Sets are unordered collections of unique elements implemented as hash tables. Unlike lists or tuples, sets automatically eliminate duplicates and provide constant-time membership testing.

# Creating sets
numbers = {1, 2, 3, 4, 5}
empty_set = set()  # {} creates a dict, not a set
from_list = set([1, 2, 2, 3, 3, 3])  # {1, 2, 3}

# Sets only contain hashable elements
valid_set = {1, "text", (1, 2), frozenset([3, 4])}
# invalid_set = {[1, 2]}  # TypeError: unhashable type: 'list'
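
Uniqueness is decided by hash and equality rather than by type, which can be surprising with mixed numeric values. A short sketch of both rules, using made-up values:

```python
# Elements that compare equal and hash equally count as the same element
mixed = {1, 1.0, True}
print(len(mixed))  # 1

# A tuple is hashable only if everything inside it is hashable
ok = {(1, 2), (3, 4)}
try:
    bad = {(1, [2, 3])}
    tuple_error = None
except TypeError as exc:
    tuple_error = str(exc)

print(tuple_error)  # unhashable type: 'list'
```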

Performance characteristics make sets ideal for specific use cases:

import timeit

large_list = list(range(100000))
large_set = set(range(100000))

# List membership test: O(n), scans the list (99999 is the last element, the worst case)
list_time = timeit.timeit(lambda: 99999 in large_list, number=100)

# Set membership test: O(1) on average, a single hash lookup
set_time = timeit.timeit(lambda: 99999 in large_set, number=100)

print(f"List: {list_time:.6f}s, Set: {set_time:.6f}s (100 lookups each)")
# Exact timings vary by machine; the list lookup is typically orders of magnitude slower

Union Operations

Union combines elements from multiple sets, eliminating duplicates. Use the | operator or union() method.

set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}

# Operator syntax (requires both operands to be sets)
union_result = set_a | set_b  # {1, 2, 3, 4, 5, 6}

# Method syntax (accepts any iterable)
union_result = set_a.union(set_b)
union_result = set_a.union([5, 6, 7])  # Works with lists
union_result = set_a.union([5, 6], {7, 8}, (9, 10))  # Multiple iterables

# In-place union with |=
set_a |= set_b  # Modifies set_a
# Or use update()
set_a.update(set_b, [7, 8])

Practical example - combining user permissions:

class PermissionManager:
    def __init__(self):
        self.role_permissions = {
            'admin': {'read', 'write', 'delete', 'manage_users'},
            'editor': {'read', 'write'},
            'viewer': {'read'}
        }
    
    def get_user_permissions(self, roles):
        """Get combined permissions for all user roles"""
        permissions = set()
        for role in roles:
            permissions |= self.role_permissions.get(role, set())
        return permissions

pm = PermissionManager()
user_perms = pm.get_user_permissions(['editor', 'viewer'])
print(user_perms)  # {'read', 'write'}
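
Because union() accepts several iterables at once, the loop above could also be written as a single call. This is one possible sketch, assuming a module-level role_permissions table and a hypothetical combined_permissions helper that mirror the class above:

```python
# Hypothetical role table mirroring the PermissionManager example
role_permissions = {
    "admin": {"read", "write", "delete", "manage_users"},
    "editor": {"read", "write"},
    "viewer": {"read"},
}

def combined_permissions(roles):
    """Union of permissions for all known roles; unknown roles contribute nothing."""
    permission_sets = (role_permissions.get(role, set()) for role in roles)
    # Starting from an empty set keeps this safe when roles is empty
    return set().union(*permission_sets)

print(combined_permissions(["editor", "viewer"]))  # {'read', 'write'} (display order may vary)
print(combined_permissions([]))                    # set()
```

Calling union() on a fresh empty set means none of the stored role sets is modified.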

Intersection Operations

Intersection returns elements common to all sets. Use the & operator or the intersection() method.

set_a = {1, 2, 3, 4, 5}
set_b = {3, 4, 5, 6, 7}
set_c = {4, 5, 6, 7, 8}

# Operator syntax
common = set_a & set_b  # {3, 4, 5}
common_all = set_a & set_b & set_c  # {4, 5}

# Method syntax
common = set_a.intersection(set_b)
common_all = set_a.intersection(set_b, set_c)
common = set_a.intersection([3, 4, 9])  # Works with any iterable

# In-place intersection
set_a &= set_b  # Modifies set_a
# Or use intersection_update()
set_a.intersection_update(set_b, set_c)
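
In-place operators mutate the existing set object, which matters when another name refers to the same set. A small sketch of the difference between &= and plain & (variable names are illustrative):

```python
original = {1, 2, 3, 4, 5}
alias = original                  # second name for the same set object

original &= {3, 4, 5, 6}          # in-place: mutates the shared object
print(alias)                      # {3, 4, 5}

rebound = {1, 2, 3, 4, 5}
snapshot = rebound
rebound = rebound & {3, 4, 5, 6}  # builds a new set and rebinds the name
print(snapshot)                   # {1, 2, 3, 4, 5}
print(rebound)                    # {3, 4, 5}
```

The same distinction applies to |=, -=, and ^= and their non-mutating counterparts.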

Real-world example - finding common tags across documents:

def find_common_tags(documents, min_documents=2):
    """Find tags appearing in at least min_documents"""
    if not documents:
        return set()
    
    tag_sets = [set(doc.get('tags', [])) for doc in documents]
    
    if min_documents == len(documents):
        # All documents must have the tag
        return set.intersection(*tag_sets) if tag_sets else set()
    
    # Count tag occurrences
    tag_count = {}
    for tags in tag_sets:
        for tag in tags:
            tag_count[tag] = tag_count.get(tag, 0) + 1
    
    return {tag for tag, count in tag_count.items() if count >= min_documents}

docs = [
    {'id': 1, 'tags': ['python', 'web', 'api']},
    {'id': 2, 'tags': ['python', 'data', 'api']},
    {'id': 3, 'tags': ['python', 'automation']}
]

print(find_common_tags(docs, min_documents=3))  # {'python'}
print(find_common_tags(docs, min_documents=2))  # {'python', 'api'}

Difference Operations

Difference returns elements in the first set but not in others. Use the - operator or the difference() method.

set_a = {1, 2, 3, 4, 5}
set_b = {3, 4, 5, 6, 7}

# Elements in set_a but not in set_b
diff = set_a - set_b  # {1, 2}

# Method syntax
diff = set_a.difference(set_b)
diff = set_a.difference(set_b, {1})  # {2}
diff = set_a.difference([3, 4, 5])  # Works with iterables

# In-place difference
set_a -= set_b  # Modifies set_a
# Or use difference_update()
set_a.difference_update(set_b)
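
Unlike union and intersection, difference depends on operand order. A brief sketch with hypothetical package names shows how each direction answers a different question:

```python
installed = {"requests", "numpy", "flask"}
required = {"numpy", "flask", "pandas"}

to_install = required - installed  # needed but not present
unused = installed - required      # present but not needed

print(to_install)            # {'pandas'}
print(unused)                # {'requests'}
print(to_install == unused)  # False
```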

Practical application - tracking changes in data:

class DataChangeTracker:
    def __init__(self, initial_data):
        self.current_data = set(initial_data)
        self.previous_data = set(initial_data)
    
    def update(self, new_data):
        """Update data and return changes"""
        new_set = set(new_data)
        
        added = new_set - self.current_data
        removed = self.current_data - new_set
        unchanged = self.current_data & new_set
        
        self.previous_data = self.current_data
        self.current_data = new_set
        
        return {
            'added': added,
            'removed': removed,
            'unchanged': unchanged
        }

tracker = DataChangeTracker(['user1', 'user2', 'user3'])
changes = tracker.update(['user2', 'user3', 'user4', 'user5'])
print(f"Added: {changes['added']}")      # {'user4', 'user5'}
print(f"Removed: {changes['removed']}")  # {'user1'}

Symmetric Difference

Symmetric difference returns elements in either set but not in both. Use the ^ operator or the symmetric_difference() method.

set_a = {1, 2, 3, 4}
set_b = {3, 4, 5, 6}

# Elements in either set but not both
sym_diff = set_a ^ set_b  # {1, 2, 5, 6}

# Method syntax
sym_diff = set_a.symmetric_difference(set_b)
sym_diff = set_a.symmetric_difference([3, 4, 5])

# In-place symmetric difference
set_a ^= set_b
# Or use symmetric_difference_update()
set_a.symmetric_difference_update(set_b)
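
Symmetric difference can be expressed in terms of the other operations, which is a handy way to sanity-check it. A minimal sketch with arbitrary example values:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

sym = a ^ b
print(sym == (a - b) | (b - a))  # True
print(sym == (a | b) - (a & b))  # True

# Unlike union() and intersection(), symmetric_difference() takes exactly
# one argument; chain the operator for more than two sets
c = {1, 6, 7}
print(a ^ b ^ c)  # {2, 5, 7}
```

When chained across several sets, ^ keeps the elements that appear in an odd number of them, which is not the same as "in exactly one set".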

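Example - detecting configuration drift. The missing and extra keys computed below together make up the symmetric difference of the two key sets (expected_keys ^ actual_keys); keeping them separate shows the direction of each change: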

def detect_config_drift(expected_config, actual_config):
    """Compare expected vs actual configuration keys"""
    expected_keys = set(expected_config.keys())
    actual_keys = set(actual_config.keys())
    
    missing = expected_keys - actual_keys
    extra = actual_keys - expected_keys
    common = expected_keys & actual_keys
    
    mismatched = {
        key for key in common 
        if expected_config[key] != actual_config[key]
    }
    
    return {
        'missing_keys': missing,
        'extra_keys': extra,
        'mismatched_values': mismatched,
        'has_drift': bool(missing or extra or mismatched)
    }

expected = {'host': 'localhost', 'port': 8080, 'debug': False}
actual = {'host': 'localhost', 'port': 9000, 'ssl': True}

drift = detect_config_drift(expected, actual)
print(drift)
# {'missing_keys': {'debug'}, 'extra_keys': {'ssl'}, 
#  'mismatched_values': {'port'}, 'has_drift': True}

Advanced Set Operations

Sets support subset, superset, and disjoint testing:

set_a = {1, 2, 3}
set_b = {1, 2, 3, 4, 5}
set_c = {6, 7, 8}

# Subset testing
print(set_a.issubset(set_b))     # True
print(set_a <= set_b)             # True
print(set_a < set_b)              # True (proper subset)

# Superset testing
print(set_b.issuperset(set_a))   # True
print(set_b >= set_a)             # True
print(set_b > set_a)              # True (proper superset)

# Disjoint testing (no common elements)
print(set_a.isdisjoint(set_c))   # True
print(set_a.isdisjoint(set_b))   # False
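
As with the other operations, the comparison methods accept any iterable while the comparison operators need sets on both sides. The sketch below uses a hypothetical permission check and also shows that subset ordering is only partial:

```python
required = ["read", "write"]  # a list of needed permissions (hypothetical)
user_perms = {"read", "write", "delete"}

# Methods accept any iterable; the operators need a set on both sides
print(user_perms.issuperset(required))  # True
print(set(required) <= user_perms)      # True

# Equal sets are subsets of each other, but not proper subsets
same = {"read", "write"}
print(same <= set(required), same < set(required))  # True False

# Two sets can be unrelated in both directions
x, y = {1, 2}, {2, 3}
print(x <= y, y <= x)  # False False
```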

Frozen sets for immutable collections:

# Frozen sets are hashable
regular_set = {1, 2, 3}
frozen = frozenset([1, 2, 3])

# Can be used as dictionary keys
cache = {frozen: "cached_value"}

# Can be elements of sets
set_of_sets = {frozenset([1, 2]), frozenset([3, 4])}

# Support all read-only operations
fs1 = frozenset([1, 2, 3])
fs2 = frozenset([2, 3, 4])
print(fs1 | fs2)  # frozenset({1, 2, 3, 4})
print(fs1 & fs2)  # frozenset({2, 3})

# No modification methods
# fs1.add(4)  # AttributeError
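
One common use of hashable frozen sets is a cache keyed by an unordered combination of values. The following is a sketch only: results_cache and lookup are hypothetical names, and the cached string stands in for real work.

```python
# Hypothetical cache keyed by an unordered combination of tags
results_cache = {}

def lookup(tags):
    """Return a cached value for this tag combination, computing it on first use."""
    key = frozenset(tags)  # same key regardless of order or duplicates
    if key not in results_cache:
        results_cache[key] = f"results for {sorted(key)}"  # stand-in for real work
    return results_cache[key]

lookup(["python", "web"])
lookup(["web", "python", "web"])
print(len(results_cache))  # 1

# Mixing set and frozenset: the result takes the type of the left operand
print(type(frozenset([1]) | {2}).__name__)  # frozenset
print(type({2} | frozenset([1])).__name__)  # set
```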

Set comprehensions for concise set creation:

# Basic comprehension
squares = {x**2 for x in range(10)}  # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}

# With conditions
even_squares = {x**2 for x in range(10) if x % 2 == 0}

# From nested data
users = [
    {'id': 1, 'tags': ['python', 'web']},
    {'id': 2, 'tags': ['python', 'data']}
]
all_tags = {tag for user in users for tag in user['tags']}

Set operations provide efficient solutions for deduplication, membership testing, and comparing collections to find shared, missing, or changed items. Choose sets over lists when element uniqueness matters and order doesn’t.
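
When order does matter, a set can still do the membership checks while a list keeps the sequence. A minimal sketch, where unique_in_order is an illustrative helper name:

```python
def unique_in_order(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # O(1) average membership test
            seen.add(item)
            result.append(item)
    return result

print(unique_in_order(["b", "a", "b", "c", "a"]))  # ['b', 'a', 'c']

# dict keys preserve insertion order (Python 3.7+), giving a one-line equivalent
print(list(dict.fromkeys(["b", "a", "b", "c", "a"])))  # ['b', 'a', 'c']
```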
