Python - Set Tutorial with Examples | Application Architect

Key Insights

• Python sets are unordered collections of unique elements that provide O(1) average time complexity for membership testing, making them significantly faster than lists for checking element existence
• Sets support mathematical operations like union, intersection, and difference through both operators and methods, enabling elegant solutions to common data manipulation problems
• Mutable sets allow in-place modifications while frozensets provide immutable alternatives that can be used as dictionary keys or elements of other sets

Creating Sets

Python offers multiple ways to create sets. The most common approach uses curly braces with comma-separated values, while the set() constructor converts iterables into sets.

# Using curly braces
fruits = {'apple', 'banana', 'orange'}

# Using set() constructor
numbers = set([1, 2, 3, 4, 5])

# Creating from string (each character becomes an element)
letters = set('hello')
print(letters)  # {'h', 'e', 'l', 'o'} - the repeated 'l' is stored once; display order may vary

# Empty set - must use set(), not {}
empty_set = set()  # Correct
empty_dict = {}    # This creates a dictionary, not a set

# Set with mixed types
mixed = {1, 'two', 3.0, (4, 5)}  # Tuples of hashable values are allowed; lists are not hashable
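
The restriction on lists comes from hashability: a set can only hold hashable elements. The following is a minimal sketch with made-up values (the variable names are illustrative only) showing what happens when an unhashable element is used, and how values that compare equal collapse into one element.

```python
# Set elements must be hashable: tuples of hashable values qualify, lists do not
valid = {1, 'two', 3.0, (4, 5)}

try:
    invalid = {1, [2, 3]}  # a list is mutable and therefore unhashable
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Values that compare equal and hash equal are treated as the same element
collapsed = {1, 1.0, True}
print(len(collapsed))  # 1
```

If a list-like element is needed inside a set, converting it to a tuple first is the usual workaround.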

Sets automatically eliminate duplicates during creation, making them ideal for deduplication tasks:

# Remove duplicates from a list
numbers_with_dupes = [1, 2, 2, 3, 3, 3, 4, 5, 5]
unique_numbers = set(numbers_with_dupes)
print(unique_numbers)  # {1, 2, 3, 4, 5}

# Convert back to list if needed
unique_list = list(unique_numbers)
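
Because sets are unordered, the round trip through `set()` does not promise to keep the original order of the list. When order matters, one common alternative is sketched below with sample data chosen for illustration; it relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward.

```python
numbers_with_dupes = [3, 1, 3, 2, 1, 5, 2]

# set() removes duplicates but does not promise any particular order
as_set = set(numbers_with_dupes)

# dict keys keep insertion order, so this keeps the first occurrence of each value
ordered_unique = list(dict.fromkeys(numbers_with_dupes))
print(ordered_unique)  # [3, 1, 2, 5]

# sorted() is the option when sorted order, not original order, is wanted
sorted_unique = sorted(as_set)
print(sorted_unique)  # [1, 2, 3, 5]
```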

Adding and Removing Elements

Sets provide several methods for modifying their contents. Understanding the differences between similar methods prevents runtime errors.

colors = {'red', 'green', 'blue'}

# add() - adds a single element
colors.add('yellow')
colors.add('red')  # No effect, already exists
print(colors)  # {'red', 'green', 'blue', 'yellow'}

# update() - adds multiple elements from an iterable
colors.update(['purple', 'orange'])
colors.update('pink')  # Adds each character as element
print(colors)  # Contains 'p', 'i', 'n', 'k' as separate elements

# Correct way to add a string as single element
colors = {'red', 'green', 'blue'}
colors.add('pink')  # Adds 'pink' as one element

Removal methods have important behavioral differences:

numbers = {1, 2, 3, 4, 5}

# remove() - raises KeyError if element doesn't exist
numbers.remove(3)
# numbers.remove(10)  # Raises KeyError

# discard() - silent if element doesn't exist
numbers.discard(4)
numbers.discard(10)  # No error

# pop() - removes and returns an arbitrary element (raises KeyError if the set is empty)
element = numbers.pop()
print(f"Popped: {element}")

# clear() - removes all elements
numbers.clear()
print(numbers)  # set()
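
One detail worth illustrating is that `pop()` fails on an empty set. The sketch below, which uses hypothetical task names, drains a set safely by testing its truthiness first and then shows the error that an unguarded call produces.

```python
pending = {'task-a', 'task-b', 'task-c'}
processed = []

# An empty set is falsy, so the loop stops before pop() can fail
while pending:
    processed.append(pending.pop())

print(sorted(processed))  # ['task-a', 'task-b', 'task-c']

# Calling pop() on the now-empty set raises KeyError
try:
    pending.pop()
    pop_failed = False
except KeyError:
    pop_failed = True

print(pop_failed)  # True
```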

Set Operations and Methods

Sets excel at mathematical operations. Python provides both operator and method-based syntax for these operations.

set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}

# Union - all elements from both sets
union_op = set_a | set_b
union_method = set_a.union(set_b)
print(union_op)  # {1, 2, 3, 4, 5, 6, 7, 8}

# Intersection - elements in both sets
intersection_op = set_a & set_b
intersection_method = set_a.intersection(set_b)
print(intersection_op)  # {4, 5}

# Difference - elements in first but not second
difference_op = set_a - set_b
difference_method = set_a.difference(set_b)
print(difference_op)  # {1, 2, 3}

# Symmetric difference - elements in either but not both
sym_diff_op = set_a ^ set_b
sym_diff_method = set_a.symmetric_difference(set_b)
print(sym_diff_op)  # {1, 2, 3, 6, 7, 8}

Unlike the operators, which require both operands to be sets, the methods accept multiple arguments and work with any iterable:

set_a = {1, 2, 3}
set_b = {2, 3, 4}
set_c = {3, 4, 5}

# Union of multiple sets
all_numbers = set_a.union(set_b, set_c)
print(all_numbers)  # {1, 2, 3, 4, 5}

# Intersection with list
common = set_a.intersection([2, 3, 6, 7])
print(common)  # {2, 3}
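
The difference between operators and methods can be shown directly. This is a small sketch with illustrative values; it also includes the in-place operators, which modify the left-hand set rather than building a new one.

```python
set_a = {1, 2, 3}

# Methods accept any iterable, and several at once
via_method = set_a.union([3, 4], (5,), range(6, 8))
print(via_method)  # {1, 2, 3, 4, 5, 6, 7}

# Operators require both operands to be sets (or frozensets)
try:
    set_a | [3, 4]
    operator_rejected_list = False
except TypeError:
    operator_rejected_list = True

# In-place variants mutate the left-hand set
working = {1, 2, 3, 4}
working |= {5}               # same as working.update({5})
working &= {2, 3, 4, 5, 6}   # same as working.intersection_update({2, 3, 4, 5, 6})
working -= {2}               # same as working.difference_update({2})
print(working)  # {3, 4, 5}
```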

Membership Testing and Comparisons

Sets are hash-based, so a membership test takes O(1) time on average. A list lookup scans elements one by one in O(n) time, which makes sets the better choice when membership is checked often.

import time

# Performance comparison
large_list = list(range(100000))
large_set = set(range(100000))

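# List lookup - O(n): scans until it reaches the last element
start = time.perf_counter()
found_in_list = 99999 in large_list
list_time = time.perf_counter() - start

# Set lookup - O(1) on average: a single hash lookup
start = time.perf_counter()
found_in_set = 99999 in large_set
set_time = time.perf_counter() - start

print(f"List: {list_time:.6f}s, Set: {set_time:.6f}s")
# The set lookup is typically far faster at this size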
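
A single timed lookup is sensitive to timer resolution and background noise. As a rough sketch of a more stable measurement, the standard-library `timeit` module can repeat each lookup many times; the repetition count of 200 is an arbitrary choice for illustration, and absolute timings will vary by machine.

```python
import timeit

size = 100_000
large_list = list(range(size))
large_set = set(range(size))
target = size - 1  # worst case for the list: the last element

# Repeat each lookup so that timer resolution does not dominate the result
list_time = timeit.timeit(lambda: target in large_list, number=200)
set_time = timeit.timeit(lambda: target in large_set, number=200)

print(f"List: {list_time:.6f}s, Set: {set_time:.6f}s for 200 lookups each")
```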

Set comparison operations:

set_a = {1, 2, 3}
set_b = {1, 2, 3, 4, 5}
set_c = {1, 2, 3}

# Equality
print(set_a == set_c)  # True
print(set_a == set_b)  # False

# Subset and superset
print(set_a.issubset(set_b))    # True (set_a <= set_b)
print(set_b.issuperset(set_a))  # True (set_b >= set_a)

# Proper subset (subset but not equal)
print(set_a < set_b)   # True
print(set_a < set_c)   # False (they're equal)

# Disjoint sets (no common elements)
set_d = {6, 7, 8}
print(set_a.isdisjoint(set_d))  # True
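
The comparison operators describe containment rather than size, so two sets can be neither smaller, larger, nor equal. A short sketch with illustrative values:

```python
left = {1, 2}
right = {2, 3}

# Neither set contains the other, so every comparison is False
print(left < right)   # False
print(left > right)   # False
print(left == right)  # False

# The method forms accept any iterable; the operators need a set on both sides
print(left.issubset([1, 2, 3]))  # True
print(left <= {1, 2, 3})         # True
```

One consequence is that sorting a list of sets with `sorted()` does not give a meaningful order, since `<` is only a partial ordering here.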

Practical Applications

Sets solve many everyday data problems concisely. Here are some common use cases.

# Find unique words in text
text = "the quick brown fox jumps over the lazy dog the fox"
unique_words = set(text.lower().split())
print(f"Unique words: {len(unique_words)}")

# Find duplicate entries
def find_duplicates(items):
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates

emails = ['user@example.com', 'admin@test.com', 'user@example.com']
print(find_duplicates(emails))  # {'user@example.com'}

# Compare two sets of users to find what changed
old_users = {'alice', 'bob', 'charlie', 'david'}
new_users = {'bob', 'charlie', 'eve', 'frank'}

added = new_users - old_users
removed = old_users - new_users
unchanged = old_users & new_users

print(f"Added: {added}")      # {'eve', 'frank'}
print(f"Removed: {removed}")  # {'alice', 'david'}
print(f"Unchanged: {unchanged}")  # {'bob', 'charlie'}

Tag filtering example:

# Filter items by multiple tags
items = [
    {'name': 'Item1', 'tags': {'python', 'web', 'backend'}},
    {'name': 'Item2', 'tags': {'python', 'data', 'ml'}},
    {'name': 'Item3', 'tags': {'javascript', 'web', 'frontend'}},
]

required_tags = {'python', 'web'}
matching_items = [
    item for item in items
    if required_tags.issubset(item['tags'])
]
print([item['name'] for item in matching_items])  # ['Item1']
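
The example above requires every tag to be present. Two related filters, matching any of several tags and excluding a tag, can be sketched with `isdisjoint()`. The `any_of` and `excluded` names are introduced here purely for illustration.

```python
items = [
    {'name': 'Item1', 'tags': {'python', 'web', 'backend'}},
    {'name': 'Item2', 'tags': {'python', 'data', 'ml'}},
    {'name': 'Item3', 'tags': {'javascript', 'web', 'frontend'}},
]

# Match items that carry at least one of the given tags
any_of = {'data', 'frontend'}
matching_any = [
    item['name'] for item in items
    if not any_of.isdisjoint(item['tags'])
]
print(matching_any)  # ['Item2', 'Item3']

# Match items tagged 'python' that carry none of the excluded tags
excluded = {'ml'}
python_not_ml = [
    item['name'] for item in items
    if 'python' in item['tags'] and excluded.isdisjoint(item['tags'])
]
print(python_not_ml)  # ['Item1']
```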

Frozensets and Immutability

Frozensets are immutable sets. Because they are hashable, they can serve as dictionary keys or as elements of other sets.

# Creating frozensets
immutable_set = frozenset([1, 2, 3, 4, 5])

# Cannot modify
# immutable_set.add(6)  # AttributeError

# Use as dictionary keys
permissions = {
    frozenset(['read', 'write']): 'editor',
    frozenset(['read']): 'viewer',
    frozenset(['read', 'write', 'delete']): 'admin'
}

user_perms = frozenset(['read', 'write'])
print(permissions[user_perms])  # 'editor'

# Sets of sets using frozensets
set_of_sets = {
    frozenset([1, 2]),
    frozenset([3, 4]),
    frozenset([1, 2])  # Duplicate, ignored
}
print(len(set_of_sets))  # 2
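
Immutability only rules out in-place changes; the non-mutating operations still work on frozensets. The sketch below uses illustrative values, including a hypothetical `edge_weights` mapping, to show how frozensets mix with regular sets and why they suit order-independent keys.

```python
frozen = frozenset([1, 2, 3])
regular = {3, 4}

# Mixed operations return the type of the left-hand operand
combined = frozen | regular
print(type(combined).__name__)  # frozenset
reverse = regular | frozen
print(type(reverse).__name__)   # set

# A frozenset compares equal to a set with the same elements
print(frozen == {1, 2, 3})  # True

# Mutating methods such as add() are simply absent
print(hasattr(frozen, 'add'))  # False

# Order-independent keys: ('a', 'b') and ('b', 'a') give the same frozenset
edge_weights = {frozenset(('a', 'b')): 7}
print(edge_weights[frozenset(('b', 'a'))])  # 7
```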

Set Comprehensions

Set comprehensions provide concise syntax for creating sets from iterables with optional filtering.

# Basic set comprehension
squares = {x**2 for x in range(10)}
print(squares)  # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}

# With condition
even_squares = {x**2 for x in range(10) if x % 2 == 0}
print(even_squares)  # {0, 4, 16, 36, 64}

# Extract unique domains from email list
emails = ['user@gmail.com', 'admin@yahoo.com', 'test@gmail.com']
domains = {email.split('@')[1] for email in emails}
print(domains)  # {'gmail.com', 'yahoo.com'}

# Flatten and deduplicate nested lists
nested = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
unique_values = {num for sublist in nested for num in sublist}
print(unique_values)  # {1, 2, 3, 4, 5, 6, 7}

Sets provide powerful tools for data manipulation with excellent performance characteristics. Their unique element constraint and mathematical operations make them indispensable for deduplication, membership testing, and set-theoretic computations. Choose sets when element uniqueness matters and order doesn’t, use frozensets when immutability is required, and leverage set operations to write cleaner, more efficient code.