Python - Get Unique Values from List

Key Insights

• Python offers multiple methods to extract unique values from lists, each with different performance characteristics and ordering guarantees: set() is fastest but loses order, while dict.fromkeys() preserves insertion order in Python 3.7+
• For complex objects or custom uniqueness criteria, list comprehensions with tracking sets provide fine-grained control while maintaining O(n) time complexity
• When working with unhashable types like nested lists or dictionaries, conversion strategies or specialized libraries like pandas become necessary

Using set() for Basic Uniqueness

The simplest approach to getting unique values from a list is converting it to a set. This method is fast, concise, and works well for basic data types.

numbers = [1, 2, 2, 3, 4, 4, 5, 1, 6]
unique_numbers = list(set(numbers))
print(unique_numbers)  # [1, 2, 3, 4, 5, 6] (order not guaranteed)

The set() approach has O(n) time complexity and is the most performant option for simple cases. However, it has two critical limitations: it doesn’t preserve the original order, and it only works with hashable types.

# Works with strings
words = ["apple", "banana", "apple", "cherry", "banana"]
unique_words = list(set(words))
print(unique_words)  # Order varies

# Fails with unhashable types
nested_lists = [[1, 2], [3, 4], [1, 2]]
# unique = list(set(nested_lists))  # TypeError: unhashable type: 'list'

Preserving Order with dict.fromkeys()

When order matters, dict.fromkeys() is the most efficient built-in solution. Since Python 3.7, dictionaries maintain insertion order as part of the language specification.

numbers = [1, 2, 2, 3, 4, 4, 5, 1, 6]
unique_numbers = list(dict.fromkeys(numbers))
print(unique_numbers)  # [1, 2, 3, 4, 5, 6] (order preserved)

This method maintains O(n) time complexity while preserving the order of first occurrence. It’s particularly useful when processing user input or maintaining sequence integrity.

user_selections = ["red", "blue", "red", "green", "blue", "yellow"]
unique_selections = list(dict.fromkeys(user_selections))
print(unique_selections)  # ['red', 'blue', 'green', 'yellow']

Manual Tracking with List Comprehension

For custom uniqueness logic or when you need more control, use a list comprehension with a tracking set. This pattern is versatile and maintains readability.

numbers = [1, 2, 2, 3, 4, 4, 5, 1, 6]
seen = set()
unique_numbers = [x for x in numbers if not (x in seen or seen.add(x))]
print(unique_numbers)  # [1, 2, 3, 4, 5, 6]

This approach leverages the fact that set.add() returns None, which is falsy. The expression x in seen or seen.add(x) first checks membership: for a duplicate it short-circuits to True, so the surrounding not excludes it; for a new element it falls through to seen.add(x), which records the element and returns None, so the not includes it.
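If the short-circuit trick feels too clever for your codebase, an explicit loop expresses exactly the same logic and is easier to read:

```python
numbers = [1, 2, 2, 3, 4, 4, 5, 1, 6]

seen = set()
unique_numbers = []
for x in numbers:
    if x not in seen:        # keep only the first occurrence
        seen.add(x)
        unique_numbers.append(x)

print(unique_numbers)  # [1, 2, 3, 4, 5, 6]
```

Both versions do the same work; the comprehension form is merely a one-line packaging of this loop.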

For case-insensitive string uniqueness:

words = ["Apple", "banana", "APPLE", "Cherry", "banana"]
seen = set()
unique_words = [
    word for word in words 
    if word.lower() not in seen and not seen.add(word.lower())
]
print(unique_words)  # ['Apple', 'banana', 'Cherry']

Handling Complex Objects

When working with objects or custom classes, define uniqueness criteria explicitly. For dictionaries, convert to hashable tuples:

data = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 1, "name": "Alice"},
    {"id": 3, "name": "Charlie"}
]

seen = set()
unique_data = []
for item in data:
    # Create hashable representation
    key = tuple(sorted(item.items()))
    if key not in seen:
        seen.add(key)
        unique_data.append(item)

print(unique_data)
# [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}, {'id': 3, 'name': 'Charlie'}]

For uniqueness based on specific attributes:

class User:
    def __init__(self, user_id, name):
        self.user_id = user_id
        self.name = name
    
    def __repr__(self):
        return f"User({self.user_id}, {self.name})"

users = [
    User(1, "Alice"),
    User(2, "Bob"),
    User(1, "Alice Updated"),
    User(3, "Charlie")
]

seen_ids = set()
unique_users = [
    user for user in users 
    if user.user_id not in seen_ids and not seen_ids.add(user.user_id)
]

print(unique_users)  # [User(1, Alice), User(2, Bob), User(3, Charlie)]

Performance Comparison

Different methods have varying performance characteristics depending on list size and data type:

import time

def benchmark(func, data, iterations=1000):
    start = time.perf_counter()
    for _ in range(iterations):
        func(data.copy())
    return time.perf_counter() - start

# Test data
large_list = list(range(1000)) * 10

# Method 1: set()
def method_set(lst):
    return list(set(lst))

# Method 2: dict.fromkeys()
def method_dict(lst):
    return list(dict.fromkeys(lst))

# Method 3: list comprehension
def method_comprehension(lst):
    seen = set()
    return [x for x in lst if not (x in seen or seen.add(x))]

print(f"set(): {benchmark(method_set, large_list):.4f}s")
print(f"dict.fromkeys(): {benchmark(method_dict, large_list):.4f}s")
print(f"comprehension: {benchmark(method_comprehension, large_list):.4f}s")

Typical results show set() as fastest, followed closely by dict.fromkeys(), with list comprehension slightly slower due to Python-level iteration overhead.

Using Pandas for Data Analysis

When working with data analysis workflows, pandas provides efficient unique value extraction with additional functionality:

import pandas as pd

data = [1, 2, 2, 3, 4, 4, 5, 1, 6]
unique_values = pd.Series(data).unique()
print(unique_values)  # [1 2 3 4 5 6]

# With value counts
counts = pd.Series(data).value_counts()
print(counts)
# 2    2
# 4    2
# 1    2
# 3    1
# 5    1
# 6    1

For DataFrames, extract unique values across columns or rows:

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'C', 'B', 'A'],
    'value': [10, 20, 10, 30, 20, 15]
})

# Unique values in a column
unique_categories = df['category'].unique()
print(unique_categories)  # ['A' 'B' 'C']

# Drop duplicate rows
unique_rows = df.drop_duplicates()
print(unique_rows)
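drop_duplicates also accepts a subset parameter to deduplicate on selected columns only, and a keep parameter to control which duplicate survives. For example, keeping the first row seen for each category:

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['A', 'B', 'A', 'C', 'B', 'A'],
    'value': [10, 20, 10, 30, 20, 15]
})

# Deduplicate on 'category' alone; keep='first' (the default)
# retains the earliest row per category.
first_per_category = df.drop_duplicates(subset='category', keep='first')
print(first_per_category)
# Rows at index 0, 1, and 3 survive: A/10, B/20, C/30
```

Passing keep='last' would instead retain the final occurrence of each category.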

Handling Nested Structures

For lists containing unhashable types, convert to hashable representations or use specialized comparison:

nested = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]

# Convert to tuples
unique_nested = [list(x) for x in dict.fromkeys(tuple(item) for item in nested)]
print(unique_nested)  # [[1, 2], [3, 4], [5, 6]]

# For mixed nested structures
complex_data = [
    [1, [2, 3]],
    {"a": 1},
    [1, [2, 3]],
    {"a": 1}
]

def make_hashable(obj):
    if isinstance(obj, list):
        return tuple(make_hashable(item) for item in obj)
    elif isinstance(obj, dict):
        return tuple(sorted((k, make_hashable(v)) for k, v in obj.items()))
    return obj

seen = set()
unique_complex = []
for item in complex_data:
    key = make_hashable(item)
    if key not in seen:
        seen.add(key)
        unique_complex.append(item)

print(unique_complex)  # [[1, [2, 3]], {'a': 1}]

Best Practices

Choose your method based on requirements:

  • Order doesn’t matter, simple types: Use set()
  • Order matters, simple types: Use dict.fromkeys()
  • Custom uniqueness logic: Use list comprehension with tracking set
  • Complex objects: Implement custom hashable keys or attribute-based tracking
  • Data analysis context: Use pandas for integrated functionality

Always consider the hashability of your data types and the performance implications of your chosen approach. For production code with large datasets, profile your specific use case to validate performance assumptions.
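The decision points above can be folded into one small reusable helper. This is a sketch (the name unique is our own, modeled on the unique_everseen recipe from the itertools documentation), using an optional key function to cover the custom-uniqueness cases from earlier sections:

```python
def unique(iterable, key=None):
    """Yield items in first-seen order, deduplicated by key(item).

    key defaults to the item itself; pass key=str.lower for
    case-insensitive strings, or key=lambda u: u.user_id for objects.
    """
    seen = set()
    for item in iterable:
        k = item if key is None else key(item)
        if k not in seen:
            seen.add(k)
            yield item

print(list(unique([1, 2, 2, 3, 1])))  # [1, 2, 3]
print(list(unique(["Apple", "APPLE", "banana"], key=str.lower)))
# ['Apple', 'banana']
```

Because it is a generator, it also works lazily on large or streaming inputs without materializing the whole result.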
