Python - Set Tutorial with Examples
Key Insights
• Python sets are unordered collections of unique elements that provide O(1) average time complexity for membership testing, making them significantly faster than lists for checking element existence
• Sets support mathematical operations like union, intersection, and difference through both operators and methods, enabling elegant solutions to common data manipulation problems
• Mutable sets allow in-place modifications while frozensets provide immutable alternatives that can be used as dictionary keys or elements of other sets
Creating Sets
Python offers multiple ways to create sets. The most common approach uses curly braces with comma-separated values, while the set() constructor converts iterables into sets.
# Using curly braces
fruits = {'apple', 'banana', 'orange'}
# Using set() constructor
numbers = set([1, 2, 3, 4, 5])
# Creating from string (each character becomes an element)
letters = set('hello')
print(letters) # e.g. {'h', 'e', 'l', 'o'} - the duplicate 'l' is dropped; display order may vary
# Empty set - must use set(), not {}
empty_set = set() # Correct
empty_dict = {} # This creates a dictionary, not a set
# Set with mixed types
mixed = {1, 'two', 3.0, (4, 5)} # Elements must be hashable: tuples are allowed, lists are not
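Set elements must be hashable, which is why the tuple above is accepted while a list would be rejected. The following is a minimal sketch of that failure and one way around it; the names `points`, `rows`, and `unique_rows` are illustrative and not part of the original examples.

```python
# Elements must be hashable: tuples of hashable values work
points = {(0, 0), (1, 2)}

# Lists are mutable and therefore unhashable, so this raises TypeError
try:
    bad = {[1, 2], [3, 4]}
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

# Converting the inner lists to tuples makes them storable in a set
rows = [[1, 2], [3, 4], [1, 2]]
unique_rows = {tuple(row) for row in rows}
print(len(unique_rows))  # 2
```

The exact wording of the error message differs between Python versions, so code should catch `TypeError` rather than match on the text.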
Sets automatically eliminate duplicates during creation, making them ideal for deduplication tasks:
# Remove duplicates from a list
numbers_with_dupes = [1, 2, 2, 3, 3, 3, 4, 5, 5]
unique_numbers = set(numbers_with_dupes)
print(unique_numbers) # {1, 2, 3, 4, 5}
# Convert back to a list if needed (the original order is not guaranteed)
unique_list = list(unique_numbers)
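Converting through a set does not preserve the order of the original list. When order matters, one common alternative is `dict.fromkeys`, which relies on dictionaries keeping insertion order (guaranteed since Python 3.7). A short sketch with made-up sample data:

```python
numbers_with_dupes = [3, 1, 3, 2, 1, 5, 2]

# Dictionary keys are unique and keep insertion order (Python 3.7+)
ordered_unique = list(dict.fromkeys(numbers_with_dupes))
print(ordered_unique)  # [3, 1, 2, 5]

# If sorted order is acceptable, sort the set instead
sorted_unique = sorted(set(numbers_with_dupes))
print(sorted_unique)  # [1, 2, 3, 5]
```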
Adding and Removing Elements
Sets provide several methods for modifying their contents. Understanding the differences between similar methods prevents runtime errors.
colors = {'red', 'green', 'blue'}
# add() - adds a single element
colors.add('yellow')
colors.add('red') # No effect, already exists
print(colors) # e.g. {'red', 'green', 'blue', 'yellow'} (display order may vary)
# update() - adds multiple elements from an iterable
colors.update(['purple', 'orange'])
colors.update('pink') # Adds each character as element
print(colors) # Contains 'p', 'i', 'n', 'k' as separate elements
# Correct way to add a string as single element
colors = {'red', 'green', 'blue'}
colors.add('pink') # Adds 'pink' as one element
Removal methods have important behavioral differences:
numbers = {1, 2, 3, 4, 5}
# remove() - raises KeyError if element doesn't exist
numbers.remove(3)
# numbers.remove(10) # Raises KeyError
# discard() - silent if element doesn't exist
numbers.discard(4)
numbers.discard(10) # No error
# pop() - removes and returns arbitrary element
element = numbers.pop()
print(f"Popped: {element}")
# clear() - removes all elements
numbers.clear()
print(numbers) # set()
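One detail the removal methods share: `pop()` raises `KeyError` when the set is empty, just as `remove()` does for a missing element. A small sketch of guarding against that; `pop_or_default` is a hypothetical helper written for this illustration, not a built-in.

```python
def pop_or_default(items, default=None):
    """Remove and return an arbitrary element, or default if the set is empty."""
    try:
        return items.pop()
    except KeyError:  # set.pop() raises KeyError on an empty set
        return default

pending = {'task-a'}
first = pop_or_default(pending)                           # removes the only element
second = pop_or_default(pending, default='nothing left')  # set is now empty
print(first)   # task-a
print(second)  # nothing left
```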
Set Operations and Methods
Sets excel at mathematical operations. Python provides both operator and method-based syntax for these operations.
set_a = {1, 2, 3, 4, 5}
set_b = {4, 5, 6, 7, 8}
# Union - all elements from both sets
union_op = set_a | set_b
union_method = set_a.union(set_b)
print(union_op) # {1, 2, 3, 4, 5, 6, 7, 8}
# Intersection - elements in both sets
intersection_op = set_a & set_b
intersection_method = set_a.intersection(set_b)
print(intersection_op) # {4, 5}
# Difference - elements in first but not second
difference_op = set_a - set_b
difference_method = set_a.difference(set_b)
print(difference_op) # {1, 2, 3}
# Symmetric difference - elements in either but not both
sym_diff_op = set_a ^ set_b
sym_diff_method = set_a.symmetric_difference(set_b)
print(sym_diff_op) # {1, 2, 3, 6, 7, 8}
Methods support multiple arguments and work with any iterable:
set_a = {1, 2, 3}
set_b = {2, 3, 4}
set_c = {3, 4, 5}
# Union of multiple sets
all_numbers = set_a.union(set_b, set_c)
print(all_numbers) # {1, 2, 3, 4, 5}
# Intersection with list
common = set_a.intersection([2, 3, 6, 7])
print(common) # {2, 3}
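The flexibility with iterables applies to the methods only; the operators expect a set or frozenset on both sides. The sketch below shows that difference, along with the in-place operator forms that modify the left-hand set. The variable names are illustrative.

```python
set_a = {1, 2, 3}

# Methods accept any iterable
print(sorted(set_a.union([3, 4])))  # [1, 2, 3, 4]

# Operators require both operands to be sets (or frozensets)
operator_failed = False
try:
    set_a | [3, 4]
except TypeError:
    operator_failed = True
    print("set | list raises TypeError")

# In-place variants modify the left-hand set instead of building a new one
values = {1, 2, 3}
values |= {4}        # same as values.update({4})
values &= {2, 3, 4}  # same as values.intersection_update({2, 3, 4})
values -= {2}        # same as values.difference_update({2})
print(sorted(values))  # [3, 4]
```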
Membership Testing and Comparisons
Membership tests on a set take O(1) time on average because they use a hash lookup, while a list must be scanned element by element in O(n) time. The gap grows with the size of the collection.
import timeit
# Performance comparison: look up the last element, the worst case for a list
large_list = list(range(100000))
large_set = set(range(100000))
# List lookup - O(n): scans the elements one by one
list_time = timeit.timeit(lambda: 99999 in large_list, number=1000)
# Set lookup - O(1) on average: a single hash lookup
set_time = timeit.timeit(lambda: 99999 in large_set, number=1000)
print(f"List: {list_time:.6f}s, Set: {set_time:.6f}s (1000 lookups each)")
# The set lookup is typically orders of magnitude faster at this size
Set comparison operations:
set_a = {1, 2, 3}
set_b = {1, 2, 3, 4, 5}
set_c = {1, 2, 3}
# Equality
print(set_a == set_c) # True
print(set_a == set_b) # False
# Subset and superset
print(set_a.issubset(set_b)) # True (set_a <= set_b)
print(set_b.issuperset(set_a)) # True (set_b >= set_a)
# Proper subset (subset but not equal)
print(set_a < set_b) # True
print(set_a < set_c) # False (they're equal)
# Disjoint sets (no common elements)
set_d = {6, 7, 8}
print(set_a.isdisjoint(set_d)) # True
Practical Applications
Sets solve many everyday problems concisely. Here are common use cases with short, self-contained examples.
# Find unique words in text
text = "the quick brown fox jumps over the lazy dog the fox"
unique_words = set(text.lower().split())
print(f"Unique words: {len(unique_words)}")
# Find duplicate entries
def find_duplicates(items):
    seen = set()
    duplicates = set()
    for item in items:
        if item in seen:
            duplicates.add(item)
        else:
            seen.add(item)
    return duplicates
emails = ['user@example.com', 'admin@test.com', 'user@example.com']
print(find_duplicates(emails)) # {'user@example.com'}
# Compare two sets of users to find what changed
old_users = {'alice', 'bob', 'charlie', 'david'}
new_users = {'bob', 'charlie', 'eve', 'frank'}
added = new_users - old_users
removed = old_users - new_users
unchanged = old_users & new_users
print(f"Added: {added}") # {'eve', 'frank'}
print(f"Removed: {removed}") # {'alice', 'david'}
print(f"Unchanged: {unchanged}") # {'bob', 'charlie'}
Tag filtering example:
# Filter items by multiple tags
items = [
    {'name': 'Item1', 'tags': {'python', 'web', 'backend'}},
    {'name': 'Item2', 'tags': {'python', 'data', 'ml'}},
    {'name': 'Item3', 'tags': {'javascript', 'web', 'frontend'}},
]
required_tags = {'python', 'web'}
matching_items = [
    item for item in items
    if required_tags.issubset(item['tags'])
]
print([item['name'] for item in matching_items]) # ['Item1']
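The example above keeps items that carry all of the required tags. Two nearby variants, matching any of several tags and excluding certain tags, can be sketched with `isdisjoint()`. The names `wanted_tags` and `excluded_tags` are assumptions made for this illustration.

```python
items = [
    {'name': 'Item1', 'tags': {'python', 'web', 'backend'}},
    {'name': 'Item2', 'tags': {'python', 'data', 'ml'}},
    {'name': 'Item3', 'tags': {'javascript', 'web', 'frontend'}},
]
wanted_tags = {'data', 'frontend'}
excluded_tags = {'javascript'}

# Keep items that share at least one tag with wanted_tags
any_match = [
    item['name'] for item in items
    if not wanted_tags.isdisjoint(item['tags'])
]
print(any_match)  # ['Item2', 'Item3']

# Keep items that carry none of the excluded tags
allowed = [
    item['name'] for item in items
    if excluded_tags.isdisjoint(item['tags'])
]
print(allowed)  # ['Item1', 'Item2']
```

`isdisjoint()` can stop as soon as it finds a shared element, so it avoids building the intermediate set that `wanted_tags & item['tags']` would create.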
Frozensets and Immutability
Frozensets are immutable versions of sets. Because they cannot change, they are hashable and can be used as dictionary keys or as elements of other sets.
# Creating frozensets
immutable_set = frozenset([1, 2, 3, 4, 5])
# Cannot modify
# immutable_set.add(6) # AttributeError
# Use as dictionary keys
permissions = {
    frozenset(['read', 'write']): 'editor',
    frozenset(['read']): 'viewer',
    frozenset(['read', 'write', 'delete']): 'admin'
}
user_perms = frozenset(['read', 'write'])
print(permissions[user_perms]) # 'editor'
# Sets of sets using frozensets
set_of_sets = {
    frozenset([1, 2]),
    frozenset([3, 4]),
    frozenset([1, 2]) # Duplicate, ignored
}
print(len(set_of_sets)) # 2
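The dictionary lookup above works because two frozensets with the same elements compare equal and hash equally, regardless of the order in which they were built. The sketch below illustrates that, and also shows that set operations on a frozenset return a new object rather than modifying it. The variable names are illustrative.

```python
a = frozenset(['read', 'write'])
b = frozenset(['write', 'read'])  # same elements, different order

# Equal frozensets hash equally, so either one finds the same dictionary key
print(a == b, hash(a) == hash(b))  # True True

# Operations return a new frozenset; the original is unchanged
with_delete = a | {'delete'}
print(type(with_delete).__name__)  # frozenset
print(sorted(with_delete))         # ['delete', 'read', 'write']

# With mixed operands, the result takes the type of the left operand
mixed_result = {'read'} | a
print(type(mixed_result).__name__)  # set
```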
Set Comprehensions
Set comprehensions provide concise syntax for creating sets from iterables with optional filtering.
# Basic set comprehension
squares = {x**2 for x in range(10)}
print(squares) # {0, 1, 4, 9, 16, 25, 36, 49, 64, 81}
# With condition
even_squares = {x**2 for x in range(10) if x % 2 == 0}
print(even_squares) # {0, 4, 16, 36, 64}
# Extract unique domains from email list
emails = ['user@gmail.com', 'admin@yahoo.com', 'test@gmail.com']
domains = {email.split('@')[1] for email in emails}
print(domains) # {'gmail.com', 'yahoo.com'}
# Flatten and deduplicate nested lists
nested = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
unique_values = {num for sublist in nested for num in sublist}
print(unique_values) # {1, 2, 3, 4, 5, 6, 7}
Sets provide powerful tools for data manipulation with excellent performance characteristics. Their unique element constraint and mathematical operations make them indispensable for deduplication, membership testing, and set-theoretic computations. Choose sets when element uniqueness matters and order doesn’t, use frozensets when immutability is required, and leverage set operations to write cleaner, more efficient code.