Python - Remove Duplicates from List
Key Insights
- Python offers multiple approaches to remove duplicates: set() for simple cases, dict.fromkeys() for order preservation in Python 3.7+, and manual iteration for complex object deduplication
- Performance varies significantly: set() is O(n) but does not preserve order, while a loop or list comprehension with a tracking set is also O(n) and maintains insertion order
- For custom objects or deduplication logic based on specific attributes, implement solutions using loops with seen-tracking, or leverage the itertools and operator modules
Using set() for Simple Deduplication
The most straightforward method to remove duplicates is converting a list to a set and back to a list. Sets inherently contain only unique elements.
numbers = [1, 2, 2, 3, 4, 4, 5, 6, 6, 6]
unique_numbers = list(set(numbers))
print(unique_numbers)  # [1, 2, 3, 4, 5, 6] (order not guaranteed)
This approach works well for hashable types (integers, strings, tuples) but has a critical limitation: it does not preserve the original order. Unlike dictionaries, which maintain insertion order from Python 3.7 onward, sets make no ordering guarantee in any Python version; elements come back in an arbitrary, implementation-dependent order.
# The result's order is arbitrary in every Python version
fruits = ['apple', 'banana', 'apple', 'orange', 'banana', 'grape']
unique_fruits = list(set(fruits))
print(unique_fruits)  # e.g. ['grape', 'orange', 'apple', 'banana'] -- order may vary
Performance: O(n) time complexity, O(n) space complexity. This is the fastest method for simple cases.
Preserving Order with dict.fromkeys()
For guaranteed order preservation across all Python 3.7+ versions, dict.fromkeys() is the preferred approach. Dictionary keys are unique and maintain insertion order.
items = [1, 2, 2, 3, 4, 4, 5, 1, 2]
unique_items = list(dict.fromkeys(items))
print(unique_items) # [1, 2, 3, 4, 5]
This method is more explicit about order preservation and works with any hashable type:
mixed = ['a', 1, 'b', 2, 'a', 1, 'c']
unique_mixed = list(dict.fromkeys(mixed))
print(unique_mixed) # ['a', 1, 'b', 2, 'c']
Performance: O(n) time complexity, O(n) space complexity. Slightly slower than set() but maintains order reliably.
Manual Order Preservation with List Comprehension
For Python versions before 3.7 or when you need explicit control, use a list comprehension with a tracking set:
def remove_duplicates_ordered(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
data = [5, 2, 3, 2, 1, 5, 4, 3]
unique_data = remove_duplicates_ordered(data)
print(unique_data) # [5, 2, 3, 1, 4]
The same logic as a list comprehension with a side effect:
def remove_duplicates_compact(items):
    seen = set()
    return [x for x in items if not (x in seen or seen.add(x))]
data = [5, 2, 3, 2, 1, 5, 4, 3]
print(remove_duplicates_compact(data)) # [5, 2, 3, 1, 4]
This works because set.add() returns None, which is falsy, so x in seen or seen.add(x) evaluates to True only if x is already in seen.
Deduplicating Custom Objects
For custom objects, you need to define what makes an object unique. Use a key function to extract the comparison attribute:
class Product:
    def __init__(self, id, name, price):
        self.id = id
        self.name = name
        self.price = price

    def __repr__(self):
        return f"Product({self.id}, {self.name}, ${self.price})"
products = [
    Product(1, "Laptop", 999),
    Product(2, "Mouse", 25),
    Product(1, "Laptop", 899),  # Duplicate ID
    Product(3, "Keyboard", 75),
    Product(2, "Mouse", 30)     # Duplicate ID
]
def deduplicate_by_attribute(items, key_func):
    seen = set()
    result = []
    for item in items:
        key = key_func(item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result
unique_products = deduplicate_by_attribute(products, lambda p: p.id)
print(unique_products)
# [Product(1, Laptop, $999), Product(2, Mouse, $25), Product(3, Keyboard, $75)]
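As the Key Insights mention, the operator module can supply the key function: operator.attrgetter does the same job as the lambda above. A self-contained sketch (using SimpleNamespace objects as hypothetical stand-ins for Product, and restating the helper so the snippet runs on its own):

```python
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical records standing in for the Product instances above
items = [
    SimpleNamespace(id=1, name="Laptop"),
    SimpleNamespace(id=2, name="Mouse"),
    SimpleNamespace(id=1, name="Laptop"),  # duplicate id
]

def deduplicate_by_attribute(items, key_func):
    seen = set()
    result = []
    for item in items:
        key = key_func(item)
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result

# attrgetter('id') is equivalent to lambda p: p.id
unique = deduplicate_by_attribute(items, attrgetter('id'))
print([i.id for i in unique])  # [1, 2]
```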
For multiple attributes, use a tuple as the key; the same deduplicate_by_attribute function handles this unchanged:
# Deduplicate by both name and price
unique_by_name_price = deduplicate_by_attribute(
    products,
    lambda p: (p.name, p.price)
)
print(unique_by_name_price)
Using itertools for Complex Cases
The itertools.groupby() function is useful when you need to deduplicate sorted data or perform grouping operations:
from itertools import groupby
from operator import itemgetter
records = [
    {'id': 1, 'category': 'A', 'value': 10},
    {'id': 2, 'category': 'B', 'value': 20},
    {'id': 3, 'category': 'A', 'value': 15},
    {'id': 4, 'category': 'A', 'value': 12},
    {'id': 5, 'category': 'B', 'value': 25}
]

# Sort first (required for groupby)
sorted_records = sorted(records, key=itemgetter('category'))

# Get the first item from each category
unique_by_category = [
    next(group)
    for key, group in groupby(sorted_records, key=itemgetter('category'))
]
print(unique_by_category)
# [{'id': 1, 'category': 'A', 'value': 10}, {'id': 2, 'category': 'B', 'value': 20}]
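When the data cannot be sorted up front, a lazy generator in the spirit of the unique_everseen recipe from the itertools documentation deduplicates while preserving order; the optional key parameter mirrors the attribute-based deduplication above (the function name follows the recipe, not this article):

```python
def unique_everseen(iterable, key=None):
    """Yield unique elements in order of first appearance.

    Based on the unique_everseen recipe in the itertools docs.
    """
    seen = set()
    for element in iterable:
        k = element if key is None else key(element)
        if k not in seen:
            seen.add(k)
            yield element  # lazily emits each element the first time its key appears

print(list(unique_everseen([5, 2, 3, 2, 1, 5])))            # [5, 2, 3, 1]
print(list(unique_everseen(['A', 'a', 'b'], key=str.lower)))  # ['A', 'b']
```

Because it is a generator, it works on arbitrarily large or streaming inputs without materializing the whole result first.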
Handling Unhashable Types
Lists and dictionaries cannot be added to sets because they’re unhashable. Convert them to hashable equivalents:
# List of lists
list_of_lists = [[1, 2], [3, 4], [1, 2], [5, 6]]
unique_lists = [list(t) for t in dict.fromkeys(tuple(l) for l in list_of_lists)]
print(unique_lists) # [[1, 2], [3, 4], [5, 6]]
# List of dictionaries
list_of_dicts = [
    {'a': 1, 'b': 2},
    {'a': 3, 'b': 4},
    {'a': 1, 'b': 2},
]
unique_dicts = [dict(t) for t in dict.fromkeys(tuple(sorted(d.items())) for d in list_of_dicts)]
print(unique_dicts) # [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
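Note that the sorted-items trick breaks down if a dictionary's values are themselves unhashable (for example, nested lists). One common workaround, assuming every value is JSON-serializable, is to use a canonical JSON string as the deduplication key:

```python
import json

records = [
    {'a': 1, 'tags': ['x', 'y']},
    {'a': 2, 'tags': ['z']},
    {'a': 1, 'tags': ['x', 'y']},  # duplicate, despite the unhashable list value
]

seen = set()
unique = []
for rec in records:
    # sort_keys=True gives equal dicts an identical canonical string
    key = json.dumps(rec, sort_keys=True)
    if key not in seen:
        seen.add(key)
        unique.append(rec)

print(unique)  # [{'a': 1, 'tags': ['x', 'y']}, {'a': 2, 'tags': ['z']}]
```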
Performance Comparison
Here’s a practical benchmark comparing different approaches:
import timeit
data = list(range(1000)) * 10 # 10,000 items with duplicates
# Method 1: set()
def method_set():
    return list(set(data))

# Method 2: dict.fromkeys()
def method_dict():
    return list(dict.fromkeys(data))

# Method 3: Manual tracking
def method_manual():
    seen = set()
    result = []
    for item in data:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result
print(f"set(): {timeit.timeit(method_set, number=1000):.4f}s")
print(f"dict.fromkeys(): {timeit.timeit(method_dict, number=1000):.4f}s")
print(f"manual: {timeit.timeit(method_manual, number=1000):.4f}s")
Results typically show set() as fastest, dict.fromkeys() slightly slower, and manual iteration slowest but most flexible.
Choosing the Right Approach
Use set() when order doesn’t matter and you’re working with hashable types. Use dict.fromkeys() when you need guaranteed order preservation with hashable types. Implement manual tracking when working with custom objects or when you need deduplication based on specific attributes. For unhashable types, convert to hashable equivalents before deduplication. Always consider whether order preservation matters for your use case, as it impacts both performance and implementation choice.
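As a rough summary, these decision rules could be folded into a single helper. This is an illustrative sketch only; the dedupe name and its parameters are invented here, not part of any standard library:

```python
def dedupe(items, key=None, preserve_order=True):
    """Remove duplicates from items.

    key:            optional function extracting each item's identity
    preserve_order: keep first occurrences in their original positions
    """
    if key is None and not preserve_order:
        return list(set(items))            # fastest; result order is arbitrary
    if key is None:
        return list(dict.fromkeys(items))  # ordered; items must be hashable
    # Attribute/key-based deduplication with a tracking set
    seen = set()
    return [x for x in items if not (key(x) in seen or seen.add(key(x)))]

print(dedupe([3, 1, 3, 2, 1]))                   # [3, 1, 2]
print(dedupe(['aa', 'b', 'AA'], key=str.lower))  # ['aa', 'b']
```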