Python - id() and hash() Functions
Key Insights
- id() returns an object’s memory address and answers “is this the exact same object?”, while hash() returns a computed integer for dictionary/set operations and answers “can this object be used as a key?”
- Objects with the same hash aren’t necessarily identical (hash collisions), but objects used as dictionary keys must maintain consistent hashes throughout their lifetime
- Implementing __hash__() without __eq__() (or vice versa) breaks Python’s object model and leads to subtle, maddening bugs in collections
Understanding Object Identity vs. Hashability
Python developers frequently conflate id() and hash(), assuming they serve similar purposes. They don’t. These functions answer fundamentally different questions about objects, and understanding the distinction is critical when debugging reference issues, implementing custom classes, or reasoning about dictionary and set behavior.
id() tells you where an object lives in memory. hash() tells you how an object can be indexed in hash-based collections. One is about identity; the other is about lookup efficiency. Let’s dig into both.
The id() Function Deep Dive
The id() function returns an integer that identifies an object. In CPython (the standard implementation), this is the object’s memory address. The only guarantee Python makes is that this identifier is unique among objects alive at the same time and constant for the object’s lifetime; once an object has been garbage-collected, a new object may receive the same id.
# Basic id() behavior
x = [1, 2, 3]
print(f"id of x: {id(x)}") # e.g., 140234866534720
y = x # y references the same object
print(f"id of y: {id(y)}") # Same as x
z = [1, 2, 3] # New list with same contents
print(f"id of z: {id(z)}") # Different from x
print(f"x is y: {x is y}") # True - same object
print(f"x is z: {x is z}") # False - different objects
print(f"x == z: {x == z}") # True - equal contents
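The is operator tests object identity. For two objects that are alive at the same time, a is b gives the same result as id(a) == id(b), although the interpreter compares the references directly rather than calling id().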
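One caveat is worth a sketch: an id() value is only meaningful while the object it came from is still alive. The snippet below uses a hypothetical helper, make_list(), to compare short-lived objects. The result of the id() comparison on temporaries is implementation-dependent, so no output is claimed for it.

```python
def make_list():
    """Hypothetical helper: returns a brand-new list on every call."""
    return [1, 2, 3]

# Both operands are alive during the comparison, so they are distinct objects.
distinct = make_list() is make_list()
print(f"make_list() is make_list(): {distinct}")  # False

# Each temporary list may be freed before the next one is created, so the
# interpreter is free to reuse the address. The result can be True or False.
maybe_equal = id(make_list()) == id(make_list())
print(f"ids of two temporaries equal: {maybe_equal}")

# Keeping references alive makes the id comparison trustworthy again.
first, second = make_list(), make_list()
ids_differ = id(first) != id(second)
print(f"ids differ while both are alive: {ids_differ}")  # True
```

The practical takeaway is to compare ids only for objects you still hold references to, or simply to use is.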
Immutable objects can appear to behave differently, because the interpreter is free to reuse a single object for equal immutable values:
# Immutable objects and id()
a = 42
b = 42
print(f"id(a): {id(a)}, id(b): {id(b)}")
print(f"a is b: {a is b}") # True - Python caches small integers
# String interning
s1 = "hello"
s2 = "hello"
print(f"s1 is s2: {s1 is s2}") # True in CPython - identifier-like string literals are interned
# But not always...
s3 = "hello world!"
s4 = "hello world!"
print(f"s3 is s4: {s3 is s4}") # May be False - depends on context
# Reassignment rebinds the name to a different object; the original string is not modified
x = "original"
original_id = id(x)
x = "modified"
print(f"ID changed: {id(x) != original_id}") # True
CPython caches small integers (-5 to 256) and interns certain strings for performance. Never rely on this behavior: it is an implementation detail that varies across Python versions and implementations. Compare values with == and reserve is for singletons such as None.
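When sharing a single object between equal strings genuinely matters, the standard library's sys.intern() makes that sharing explicit instead of leaving it to the cache. A minimal sketch; the variable names and sample text are illustrative:

```python
import sys

# Build two equal strings at runtime so the compiler cannot merge them.
parts = ["hello", "world!"]
s1 = " ".join(parts)
s2 = " ".join(parts)

print(s1 == s2)  # True: value comparison is always safe

# Whether s1 and s2 happen to share an object is an implementation detail,
# so this sketch makes no claim about `s1 is s2`.

# sys.intern() returns the canonical copy from the interned-string table,
# so two interned equal strings are the same object.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)  # True
```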
The hash() Function Deep Dive
hash() returns an integer computed from an object’s value, designed for fast dictionary key lookup and set membership testing. The critical requirement: an object’s hash must remain constant for its entire lifetime.
# Hashing immutable built-in types
print(hash("python")) # Consistent within a session
print(hash((1, 2, 3))) # Tuples are hashable
print(hash(42)) # Integers hash to themselves (mostly)
print(hash(3.14)) # Floats are hashable
# Unhashable types raise TypeError
try:
    hash([1, 2, 3])
except TypeError as e:
    print(f"Lists: {e}") # unhashable type: 'list'
try:
    hash({"a": 1})
except TypeError as e:
    print(f"Dicts: {e}") # unhashable type: 'dict'
try:
    hash({1, 2, 3})
except TypeError as e:
    print(f"Sets: {e}") # unhashable type: 'set'
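The "(mostly)" in the integer comment above can be made concrete. The sketch below assumes CPython, whose documented numeric hashing scheme reduces integers modulo sys.hash_info.modulus and never returns -1 as a hash value:

```python
import sys

modulus = sys.hash_info.modulus  # a prime; 2**61 - 1 on typical 64-bit builds

small_hash = hash(42)          # small non-negative ints hash to themselves
wrapped_hash = hash(modulus)   # reduced modulo the prime, so it wraps to 0
minus_one_hash = hash(-1)      # -1 is never returned; -2 is substituted

print(small_hash, wrapped_hash, minus_one_hash)

# Equal numbers of different types share a hash.
print(hash(1) == hash(1.0) == hash(True))  # True
```

String hashes, by contrast, are salted per process unless PYTHONHASHSEED is set, which is why they are only consistent within a session.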
Why can’t you hash mutable objects? Because if an object’s value changes after it’s added to a dictionary, its hash would change, and the dictionary would lose track of it. Python prevents this catastrophe by making mutable built-in types unhashable.
# frozenset is the immutable (hashable) version of set
fs = frozenset([1, 2, 3])
print(hash(fs)) # Works fine
# Tuples containing only hashable elements are hashable
print(hash((1, "a", (2, 3)))) # Works
# But tuples containing unhashable elements aren't
try:
    hash((1, [2, 3]))
except TypeError as e:
    print(f"Mixed tuple: {e}") # unhashable type: 'list'
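A practical consequence is that unhashable data has to be converted to an immutable equivalent before it can serve as a key. A minimal sketch with made-up sample data (it assumes the dict's values are themselves hashable):

```python
# Lists and dicts cannot be keys, but immutable snapshots of them can.
coords = [3, 4]
options = {"color": "red", "size": 2}

coords_key = tuple(coords)                 # list -> tuple
options_key = frozenset(options.items())   # dict -> frozenset of (key, value) pairs

cache = {coords_key: "point", options_key: "config"}

# Later lookups rebuild an equal key from equal data.
print(cache[tuple([3, 4])])                                   # point
print(cache[frozenset({"size": 2, "color": "red"}.items())])  # config
```

The snapshot is independent of the source: mutating coords afterwards does not affect coords_key or the entry stored in cache.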
Key Differences Between id() and hash()
The relationship between id() and hash() is neither one-to-one nor predictable:
# Same id implies same hash (trivially - it's the same object)
x = "test"
y = x
print(f"Same id: {id(x) == id(y)}") # True
print(f"Same hash: {hash(x) == hash(y)}") # True (same object)
# Same hash does NOT imply same id
a = 0
b = 0.0
print(f"Same id: {id(a) == id(b)}") # False - different objects
print(f"Same hash: {hash(a) == hash(b)}") # True - equal values hash equally
print(f"Equal: {a == b}") # True
# Different ids, same hash (by design for equal values)
s1 = "hello"
s2 = "".join(['h', 'e', 'l', 'l', 'o']) # Constructed differently
print(f"Same id: {id(s1) == id(s2)}") # Likely False
print(f"Same hash: {hash(s1) == hash(s2)}") # True - same content
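Python’s data model requires a critical invariant: if two objects compare equal, they must have the same hash. The built-in types honor it, but the interpreter does not check it for custom classes, so upholding it is your responsibility. The reverse isn’t required: hash collisions are expected and handled by dictionaries and sets.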
# Demonstrating hash collisions
# These have the same hash but aren't equal
class AlwaysSameHash:
    def __init__(self, value):
        self.value = value
    def __hash__(self):
        return 42 # Terrible hash function, but legal
    def __eq__(self, other):
        return isinstance(other, AlwaysSameHash) and self.value == other.value
a = AlwaysSameHash(1)
b = AlwaysSameHash(2)
print(f"Same hash: {hash(a) == hash(b)}") # True
print(f"Equal: {a == b}") # False
# Both can exist in a set (hash collision handled)
s = {a, b}
print(f"Set size: {len(s)}") # 2
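The rule that equal objects share a hash also explains a behavior that surprises many people: keys that compare equal collapse into a single entry, even when their types differ. A short illustration:

```python
# 1, 1.0 and True compare equal and hash equally, so they are one key.
d = {}
d[1] = "int"
d[1.0] = "float"
d[True] = "bool"

print(len(d))  # 1
print(d)       # {1: 'bool'}  (the first key is kept, the last value wins)
```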
Implementing hash() in Custom Classes
By default, instances of custom classes are hashable, with a hash derived from their identity (in CPython, from id()). This works because the default __eq__() also compares by identity:
class DefaultHashable:
    def __init__(self, value):
        self.value = value
a = DefaultHashable(1)
b = DefaultHashable(1)
print(f"hash(a): {hash(a)}")
print(f"hash(b): {hash(b)}") # Different from a
print(f"a == b: {a == b}") # False - identity comparison
When you override __eq__() without also defining __hash__(), Python implicitly sets __hash__ to None, making instances unhashable:
class BrokenHashable:
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return isinstance(other, BrokenHashable) and self.value == other.value
a = BrokenHashable(1)
try:
    hash(a)
except TypeError as e:
    print(f"Error: {e}") # unhashable type: 'BrokenHashable'
To create properly hashable objects, implement both methods consistently:
class Point:
    def __init__(self, x, y):
        self._x = x
        self._y = y
    @property
    def x(self):
        return self._x
    @property
    def y(self):
        return self._y
    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        return self._x == other._x and self._y == other._y
    def __hash__(self):
        return hash((self._x, self._y))
    def __repr__(self):
        return f"Point({self._x}, {self._y})"
# Now it works correctly
p1 = Point(1, 2)
p2 = Point(1, 2)
p3 = Point(3, 4)
print(f"p1 == p2: {p1 == p2}") # True
print(f"hash(p1) == hash(p2): {hash(p1) == hash(p2)}") # True
points = {p1, p2, p3}
print(f"Set: {points}") # Two unique points
lookup = {p1: "first", p3: "second"}
print(f"lookup[p2]: {lookup[p2]}") # "first" - p2 equals p1
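For simple value types, the standard library's dataclasses module can generate a consistent __eq__() and __hash__() pair. This is offered as an alternative to the hand-written class above, not as something the approach above requires; FrozenPoint is an illustrative name.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenPoint:
    """Illustrative value type: eq=True (the default) plus frozen=True yields __hash__."""
    x: int
    y: int

p1 = FrozenPoint(1, 2)
p2 = FrozenPoint(1, 2)

print(p1 == p2)              # True
print(hash(p1) == hash(p2))  # True
print(len({p1, p2}))         # 1

# Attribute assignment is blocked, so the hash cannot drift after insertion.
try:
    p1.x = 100
except FrozenInstanceError as e:
    print(f"Blocked: {e}")
```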
Common Pitfalls and Best Practices
Pitfall 1: Mutating objects used as dictionary keys
class MutablePoint:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return isinstance(other, MutablePoint) and self.x == other.x and self.y == other.y
    def __hash__(self):
        return hash((self.x, self.y)) # Danger: based on mutable state
p = MutablePoint(1, 2)
d = {p: "value"}
print(d[p]) # "value"
p.x = 100 # Mutate the key
print(p in d) # False - hash changed, lookup fails
print(list(d.keys())) # [MutablePoint] - it's still there, just unfindable
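If a key object really must change, one workable pattern is to remove the entry, mutate the object, then reinsert it. The sketch below assumes a class shaped like MutablePoint above; it is redefined under the hypothetical name KeyPoint so the block runs on its own.

```python
class KeyPoint:
    """Stand-in for MutablePoint: equality and hash both use mutable state."""
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        return isinstance(other, KeyPoint) and (self.x, self.y) == (other.x, other.y)
    def __hash__(self):
        return hash((self.x, self.y))

p = KeyPoint(1, 2)
d = {p: "value"}

# Take the entry out while the hash still matches the stored one...
value = d.pop(p)
p.x = 100
# ...and put it back under the new hash.
d[p] = value

print(p in d)  # True
print(len(d))  # 1
```

Making the key immutable remains the more robust fix, since this pattern depends on every caller remembering to follow it.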
Pitfall 2: Using id() for equality checks
# Wrong: using id() to check equality
def bad_contains(lst, item):
    return any(id(x) == id(item) for x in lst)
# Right: use equality
def good_contains(lst, item):
    return item in lst # Uses __eq__
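The difference shows up as soon as the list holds an equal but distinct object. The snippet repeats the two functions so that it runs standalone; make_row() is a hypothetical helper.

```python
def bad_contains(lst, item):
    return any(id(x) == id(item) for x in lst)

def good_contains(lst, item):
    return item in lst

def make_row():
    """Hypothetical helper: builds a fresh list so the two rows are distinct objects."""
    return [1, 2]

stored = make_row()
rows = [stored]
query = make_row()  # equal to stored, but a different object

print(bad_contains(rows, query))   # False: identities differ
print(good_contains(rows, query))  # True: values are equal
print(bad_contains(rows, stored))  # True: the very same object
```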
Using id() for debugging object lifecycles:
def debug_references():
    data = {"key": "value"}
    print(f"Created dict: id={id(data)}")
    cache = {}
    cache["data"] = data
    print(f"After caching: id={id(cache['data'])}") # Same id
    data = {"key": "new_value"} # Reassignment
    print(f"After reassignment: id={id(data)}") # New id
    print(f"Cache still has original: id={id(cache['data'])}")
debug_references()
Conclusion
Use id() when you need to verify object identity—debugging reference issues, understanding Python’s memory model, or confirming that two variables point to the exact same object. Use hash() when implementing objects that need to work as dictionary keys or set members.
| Aspect | id() | hash() |
|---|---|---|
| Returns | Memory address (CPython) | Computed integer |
| Purpose | Object identity | Collection indexing |
| Mutable built-ins (list, dict, set) | Always works | Raises TypeError |
| Equal objects | Different ids possible | Must have same hash |
| Custom classes | Always available | Requires __hash__() if __eq__() defined |
The golden rule: if you implement __eq__(), implement __hash__() using the same attributes, and make sure those attributes are immutable. Your future self debugging a dictionary lookup failure will thank you.