Python - Filter List with Examples
Key Insights
- Python offers several ways to filter a list, including list comprehensions, the filter() function, and comprehensions with compound conditions; each suits a different level of complexity and readability requirement
- List comprehensions provide the most Pythonic and performant approach for simple to moderate filtering operations, while filter() with lambda or named functions excels in functional programming patterns
- Understanding when to use generator expressions versus list comprehensions can significantly impact memory efficiency when working with large datasets
Basic List Comprehension Filtering
List comprehensions provide the most readable and Pythonic way to filter lists. The syntax places the filtering condition at the end of the comprehension, creating a new list containing only elements that satisfy the condition.
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Filter even numbers
evens = [n for n in numbers if n % 2 == 0]
print(evens) # [2, 4, 6, 8, 10]
# Filter numbers greater than 5
greater_than_five = [n for n in numbers if n > 5]
print(greater_than_five) # [6, 7, 8, 9, 10]
# Filter strings by length
words = ["cat", "elephant", "dog", "hippopotamus", "ant"]
long_words = [word for word in words if len(word) > 5]
print(long_words) # ['elephant', 'hippopotamus']
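In CPython, list comprehensions usually run faster than an equivalent for-loop that calls append, because the interpreter appends each element with a dedicated bytecode instruction instead of looking up and calling the list's append method on every iteration. They are also more concise and state the intent of the filter directly.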
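To make the comparison concrete, here is a minimal sketch that builds the same result both ways. The variable names `evens_loop` and `evens_comp` are illustrative only.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Explicit loop: build the result with repeated append calls
evens_loop = []
for n in numbers:
    if n % 2 == 0:
        evens_loop.append(n)

# Comprehension: the same filter as a single expression
evens_comp = [n for n in numbers if n % 2 == 0]

print(evens_loop == evens_comp)  # True
```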
Using the filter() Function
The filter() function applies a predicate to each element of an iterable and returns an iterator over the elements for which the predicate returns a truthy value. Because the result is a lazy iterator, wrap it in list() when you need an actual list. This approach aligns with functional programming paradigms.
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# Using lambda function
evens = list(filter(lambda x: x % 2 == 0, numbers))
print(evens) # [2, 4, 6, 8, 10]
# Using named function for complex logic
def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n ** 0.5) + 1):
        if n % i == 0:
            return False
    return True
primes = list(filter(is_prime, numbers))
print(primes) # [2, 3, 5, 7]
# Remove falsy values (None and 0 here) by passing None as the predicate
mixed_list = [1, None, 2, None, 3, 0, 4]
filtered = list(filter(None, mixed_list))
print(filtered) # [1, 2, 3, 4]
When filter() receives None as its first argument, it removes all falsy values (None, False, 0, empty strings, empty containers). This provides a concise way to clean data.
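Because filter(None, ...) also discards 0 and other falsy values, it may remove data you meant to keep. A minimal sketch, assuming the goal is to drop only None, uses an explicit identity check instead:

```python
mixed_list = [1, None, 2, None, 3, 0, 4]

# Drop only None; the legitimate value 0 is kept
without_none = [x for x in mixed_list if x is not None]
print(without_none)  # [1, 2, 3, 0, 4]
```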
Filtering with Multiple Conditions
Complex filtering often requires multiple conditions combined with logical operators. Both list comprehensions and filter() support this, though list comprehensions typically offer better readability.
products = [
    {"name": "Laptop", "price": 1200, "in_stock": True},
    {"name": "Mouse", "price": 25, "in_stock": True},
    {"name": "Keyboard", "price": 75, "in_stock": False},
    {"name": "Monitor", "price": 300, "in_stock": True},
    {"name": "Webcam", "price": 80, "in_stock": False},
]
# AND condition - both must be true
affordable_available = [
    p for p in products
    if p["price"] < 100 and p["in_stock"]
]
print(affordable_available)
# [{'name': 'Mouse', 'price': 25, 'in_stock': True}]
# OR condition - either can be true
expensive_or_unavailable = [
    p for p in products
    if p["price"] > 500 or not p["in_stock"]
]
print([p["name"] for p in expensive_or_unavailable])
# ['Laptop', 'Keyboard', 'Webcam']
# Complex nested conditions
premium_available = [
    p for p in products
    if 100 < p["price"] < 500 and p["in_stock"]
]
print([p["name"] for p in premium_available])
# ['Monitor']
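The examples above all use comprehensions. For comparison, here is a sketch of the same AND condition written with filter(). The predicate name `is_affordable_and_available` is hypothetical, and the product list is shortened to keep the example self-contained.

```python
products = [
    {"name": "Laptop", "price": 1200, "in_stock": True},
    {"name": "Mouse", "price": 25, "in_stock": True},
    {"name": "Keyboard", "price": 75, "in_stock": False},
]

def is_affordable_and_available(product):
    """Hypothetical predicate: priced under 100 and currently in stock."""
    return product["price"] < 100 and product["in_stock"]

affordable_available = list(filter(is_affordable_and_available, products))
print([p["name"] for p in affordable_available])  # ['Mouse']
```

Naming the predicate keeps the filter() call short and lets the same condition be reused or tested separately.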
Filtering with String Methods
String filtering leverages Python’s rich string methods. These operations are common in data cleaning and text processing workflows.
emails = [
    "user@gmail.com",
    "admin@company.com",
    "test@gmail.com",
    "info@company.org",
    "support@gmail.com",
]
# Filter by domain (include the "@" so that domains such as "notgmail.com" do not match)
gmail_accounts = [email for email in emails if email.endswith("@gmail.com")]
print(gmail_accounts)
# ['user@gmail.com', 'test@gmail.com', 'support@gmail.com']
# Filter by prefix
admin_emails = [email for email in emails if email.startswith("admin")]
print(admin_emails) # ['admin@company.com']
# Filter names that are entirely lowercase
names = ["Alice", "bob", "Charlie", "DAVID", "Eve"]
lowercase_names = [name for name in names if name.islower()]
print(lowercase_names) # ['bob']
# Filter strings containing substring
comments = [
    "This is great!",
    "Needs improvement",
    "Great work!",
    "Could be better",
]
positive_comments = [c for c in comments if "great" in c.lower()]
print(positive_comments) # ['This is great!', 'Great work!']
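The same normalize-then-compare idea applies to whole-string matches. The sketch below assumes a hypothetical lookup set named `wanted` that is already lowercase, and uses str.casefold(), which normalizes case more aggressively than lower() for some non-ASCII text.

```python
names = ["Alice", "bob", "Charlie", "DAVID", "Eve"]
wanted = {"alice", "david"}  # hypothetical lookup set, already lowercase

# Normalize each name before the membership test; the original spelling is kept
matches = [name for name in names if name.casefold() in wanted]
print(matches)  # ['Alice', 'DAVID']
```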
Filtering with Regular Expressions
For pattern-based filtering, regular expressions provide powerful matching capabilities. The re module integrates seamlessly with filtering operations.
import re
phone_numbers = [
    "123-456-7890",
    "555.123.4567",
    "(800) 555-1234",
    "invalid-number",
    "999-888-7777",
]
# Filter valid US phone number format
pattern = re.compile(r'^\d{3}[-\.]\d{3}[-\.]\d{4}$')
valid_phones = [num for num in phone_numbers if pattern.match(num)]
print(valid_phones) # ['123-456-7890', '555.123.4567', '999-888-7777']
# Filter emails with specific pattern
emails = [
    "john.doe@company.com",
    "jane_smith@company.com",
    "invalid.email",
    "admin@company.co.uk",
]
email_pattern = re.compile(r'^[a-zA-Z0-9._]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,}$')
valid_emails = list(filter(email_pattern.match, emails))
print(valid_emails)
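# ['john.doe@company.com', 'jane_smith@company.com']
# Note: 'admin@company.co.uk' is rejected because this simplified pattern allows only one dot after the "@"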
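One subtlety with anchored patterns: `$` also matches just before a trailing newline, so `match()` with `^...$` accepts a string such as "123-456-7890\n". A minimal sketch of a stricter alternative uses `fullmatch()`, which needs no anchors. The names `anchored` and `strict` are illustrative.

```python
import re

anchored = re.compile(r"^\d{3}[-.]\d{3}[-.]\d{4}$")
strict = re.compile(r"\d{3}[-.]\d{3}[-.]\d{4}")

candidates = ["123-456-7890", "123-456-7890\n", "555.123.4567"]

# match() with ^...$ lets the trailing newline through
loose_result = [c for c in candidates if anchored.match(c)]

# fullmatch() requires the entire string to match the pattern
strict_result = [c for c in candidates if strict.fullmatch(c)]

print(len(loose_result))  # 3
print(strict_result)      # ['123-456-7890', '555.123.4567']
```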
Generator Expressions for Memory Efficiency
When filtering large datasets, generator expressions provide memory-efficient alternatives to list comprehensions. They evaluate lazily, producing values on-demand rather than creating entire lists in memory.
# List comprehension - creates entire list in memory
numbers = range(1000000)
evens_list = [n for n in numbers if n % 2 == 0]
# Generator expression - evaluates lazily
evens_gen = (n for n in numbers if n % 2 == 0)
# Process generator values one at a time
total = sum(n for n in numbers if n % 2 == 0)
print(f"Sum of even numbers: {total}")
# Chain multiple filters efficiently
from itertools import islice

large_dataset = range(1, 1000000)
filtered = (
    n for n in large_dataset
    if n % 2 == 0
    if n % 3 == 0
    if n > 100000
)
# Take only the first 10 results; islice stops the generator early,
# whereas list(filtered)[:10] would first build the entire list
result = list(islice(filtered, 10))
print(result)
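Two properties of generators are worth checking directly: the generator object stays small regardless of how many values it will produce, and it can be consumed only once. The following is a rough sketch. Note that sys.getsizeof reports only the container's own size, not the integers it refers to.

```python
import sys

numbers = range(100_000)

evens_list = [n for n in numbers if n % 2 == 0]
evens_gen = (n for n in numbers if n % 2 == 0)

# The list holds references to all 50,000 results; the generator holds only its state
print(sys.getsizeof(evens_list) > sys.getsizeof(evens_gen))  # True

# A generator is exhausted after one pass
first_pass = sum(evens_gen)
second_pass = sum(evens_gen)  # nothing left to yield, so the sum is 0
print(second_pass)  # 0
```

If the filtered values are needed more than once, either build a list or recreate the generator for each pass.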
Filtering with Custom Objects
When working with custom classes, implement filtering logic that leverages object attributes and methods. This approach maintains clean separation between data structures and filtering logic.
from datetime import datetime, timedelta
class Task:
    def __init__(self, title, priority, due_date):
        self.title = title
        self.priority = priority
        self.due_date = due_date

    def is_overdue(self):
        return self.due_date < datetime.now()

    def __repr__(self):
        return f"Task('{self.title}', {self.priority})"
tasks = [
    Task("Deploy app", 1, datetime.now() - timedelta(days=1)),
    Task("Write docs", 2, datetime.now() + timedelta(days=3)),
    Task("Fix bug", 1, datetime.now() + timedelta(days=1)),
    Task("Code review", 3, datetime.now() - timedelta(days=2)),
]
# Filter high priority tasks
high_priority = [t for t in tasks if t.priority == 1]
print(high_priority)
# Filter using custom method
overdue_tasks = [t for t in tasks if t.is_overdue()]
print(overdue_tasks)
# Combine multiple conditions
urgent_overdue = [
    t for t in tasks
    if t.priority <= 2 and t.is_overdue()
]
print(urgent_overdue)
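Because is_overdue() reads the current time, the output above depends on when the code runs. One possible variation, sketched here under the assumption that a dataclass and an explicit `now` parameter are acceptable, makes the filter deterministic and easier to test. The fixed date is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Task:
    title: str
    priority: int
    due_date: datetime

    def is_overdue(self, now: datetime) -> bool:
        # Compare against an explicit reference time instead of datetime.now()
        return self.due_date < now

now = datetime(2024, 1, 10)  # fixed, hypothetical reference time
tasks = [
    Task("Deploy app", 1, now - timedelta(days=1)),
    Task("Write docs", 2, now + timedelta(days=3)),
    Task("Code review", 3, now - timedelta(days=2)),
]

overdue_titles = [t.title for t in tasks if t.is_overdue(now)]
print(overdue_titles)  # ['Deploy app', 'Code review']
```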
Performance Considerations
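Different filtering approaches have different performance characteristics. In CPython, a list comprehension generally outperforms filter() with a lambda because the condition is evaluated inline rather than through a function call for every element. A named function has the same call overhead as a lambda; its benefit for complex logic is readability and reuse, not speed.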
import timeit
numbers = list(range(100000))
# Benchmark list comprehension
comp_time = timeit.timeit(
    lambda: [n for n in numbers if n % 2 == 0],
    number=100
)
# Benchmark filter with lambda
filter_lambda_time = timeit.timeit(
    lambda: list(filter(lambda x: x % 2 == 0, numbers)),
    number=100
)
# Benchmark filter with named function
def is_even(n):
    return n % 2 == 0

filter_named_time = timeit.timeit(
    lambda: list(filter(is_even, numbers)),
    number=100
)
print(f"List comprehension: {comp_time:.4f}s")
print(f"Filter with lambda: {filter_lambda_time:.4f}s")
print(f"Filter with named function: {filter_named_time:.4f}s")
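A single timeit measurement can be skewed by other activity on the machine. A common refinement, sketched below with smaller hypothetical sizes so it runs quickly, is to repeat each measurement with timeit.repeat and keep the minimum. Actual timings vary by machine and Python version, so no expected numbers are shown.

```python
import timeit

numbers = list(range(10_000))

def is_even(n):
    return n % 2 == 0

candidates = {
    "comprehension": lambda: [n for n in numbers if n % 2 == 0],
    "filter + lambda": lambda: list(filter(lambda x: x % 2 == 0, numbers)),
    "filter + named": lambda: list(filter(is_even, numbers)),
}

# Repeat each measurement and keep the minimum, which is least affected by system noise
results = {
    name: min(timeit.repeat(fn, number=20, repeat=5))
    for name, fn in candidates.items()
}

# Print the approaches from fastest to slowest
for name, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds:.4f}s")
```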
Choose list comprehensions for straightforward filtering where readability and performance matter. Use filter() when working in a functional programming style or when you already have predicate functions defined. Use generator expressions when memory efficiency is critical for large datasets.