Python - String isdigit()/isalpha()/isalnum()
Key Insights
- Python’s isdigit(), isalpha(), and isalnum() methods provide efficient string validation without regex overhead, but their Unicode behavior catches many developers off guard.
- isdigit() returns True for Unicode digit characters beyond 0-9, including superscripts and circled digits. isdecimal() is stricter: it accepts only decimal digits, though these can still come from any script (for example, Arabic-Indic or Kannada numerals).
- Combining these methods with proper input sanitization helps catch malformed input in form validation, data parsing, and user input processing.
Understanding the Character Classification Methods
Python strings include several built-in methods for character type validation. The three most commonly used are isdigit(), isalpha(), and isalnum(). Each returns a boolean indicating whether all characters in the string match specific criteria.
# Basic usage examples
"12345".isdigit() # True
"hello".isalpha() # True
"hello123".isalnum() # True
"".isdigit() # False - empty strings return False
These methods test the entire string: a single non-matching character makes the method return False. isalpha() and isalnum() accept both uppercase and lowercase letters.
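To make the whole-string rule concrete, here is isdigit() restated as a per-character check. The helper `isdigit_by_hand` is a hypothetical name used only for illustration. The `bool(s)` term matters because `all()` over an empty sequence returns True. That is why the built-in needs an explicit "at least one character" rule to return False for "".

```python
def isdigit_by_hand(s: str) -> bool:
    """Hypothetical restatement of str.isdigit() for illustration only."""
    # all() alone returns True for an empty iterable, so require at least one character
    return bool(s) and all(c.isdigit() for c in s)


for s in ["12345", "123a5", "", "²", "①"]:
    # Both columns should agree for every sample
    print(repr(s), s.isdigit(), isdigit_by_hand(s))
```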
The isdigit() Method and Unicode Gotchas
The isdigit() method checks whether all characters are digits. However, “digit” here means any character whose Unicode Numeric_Type is Digit or Decimal, not just ASCII 0-9.
# ASCII digits
"42".isdigit() # True
"007".isdigit() # True
# Negative numbers and decimals
"-42".isdigit() # False - minus sign isn't a digit
"3.14".isdigit() # False - decimal point isn't a digit
# Unicode digits - this surprises many developers
"²".isdigit() # True - superscript two
"½".isdigit() # False - fractions are numeric (isnumeric() is True) but not digits
"①".isdigit() # True - circled digit one
"೧೨೩".isdigit() # True - Kannada digits
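For stricter validation, use isdecimal() instead. It accepts only decimal digits (Unicode category Nd), so superscripts, circled digits, and fractions are rejected. It still accepts decimal digits from other scripts, such as the Kannada ೧೨೩ above, so pair it with isascii() (Python 3.7+) if you need plain 0-9 only. A basic isdecimal() validator looks like this: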
def validate_numeric_input(value):
    """Return True if value consists only of decimal digits (a non-negative integer)."""
    # isdecimal() rejects superscripts, circled digits, and fractions
    return value.isdecimal()
validate_numeric_input("42") # True
validate_numeric_input("²") # False - superscripts rejected
validate_numeric_input("½") # False - fractions rejected
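isdecimal() still accepts decimal digits from other scripts, and `int()` converts them as well. As a result, a value that passes validation may not look like the number you expected. The sketch below adds an isascii() check (available since Python 3.7) for inputs that must be plain 0-9. The function name `parse_ascii_int` is hypothetical.

```python
from typing import Optional


def parse_ascii_int(value: str) -> Optional[int]:
    """Hypothetical helper: return int(value) only for plain ASCII digits 0-9."""
    value = value.strip()
    # isascii() rules out Kannada, Arabic-Indic, and full-width digits,
    # all of which isdecimal() (and int()) would otherwise accept
    if value.isascii() and value.isdecimal():
        return int(value)
    return None


print(int("೧೨೩"))             # prints 123: int() converts Unicode decimal digits too
print(parse_ascii_int("42"))   # accepted
print(parse_ascii_int("೧೨೩"))  # rejected, returns None
print(parse_ascii_int("²"))    # rejected, returns None
```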
Here’s a practical comparison of digit validation methods:
test_strings = ["123", "²³", "½", "①", "-5", "3.14"]
for s in test_strings:
    # !s formats the bool as "True"/"False"; a bare :<5 would print 1/0
    print(f"{s:6} | isdigit: {s.isdigit()!s:<5} | isdecimal: {s.isdecimal()!s:<5} | isnumeric: {s.isnumeric()}")
# Output:
# 123    | isdigit: True  | isdecimal: True  | isnumeric: True
# ²³     | isdigit: True  | isdecimal: False | isnumeric: True
# ½      | isdigit: False | isdecimal: False | isnumeric: True
# ①      | isdigit: True  | isdecimal: False | isnumeric: True
# -5     | isdigit: False | isdecimal: False | isnumeric: False
# 3.14   | isdigit: False | isdecimal: False | isnumeric: False
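The three columns track the Unicode Numeric_Type property. The standard-library `unicodedata` module exposes this property through `decimal()`, `digit()`, and `numeric()`. The sketch below is one way to inspect a character before deciding which method fits. It relies only on documented `unicodedata` functions; each raises ValueError for a non-numeric character unless a default is passed, so the sketch passes None.

```python
import unicodedata

for ch in ["7", "೨", "²", "①", "½"]:
    print(
        f"U+{ord(ch):04X} {unicodedata.name(ch)}",
        unicodedata.category(ch),
        unicodedata.decimal(ch, None),  # set only for Numeric_Type=Decimal, matches isdecimal()
        unicodedata.digit(ch, None),    # set for Decimal or Digit, matches isdigit()
        unicodedata.numeric(ch, None),  # set for any numeric value, matches isnumeric()
        sep=" | ",
    )
```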
The isalpha() Method for Letter Validation
The isalpha() method returns True if all characters are alphabetic, meaning each belongs to one of the Unicode letter categories (Lu, Ll, Lt, Lm, Lo). Like isdigit(), it is Unicode-aware, so letters from any script count.
# Basic ASCII letters
"Hello".isalpha() # True
"WORLD".isalpha() # True
# Spaces and punctuation fail
"Hello World".isalpha() # False - space isn't alphabetic
"Hello!".isalpha() # False - punctuation isn't alphabetic
# Unicode letters work
"Café".isalpha() # True - accented characters are alphabetic
"Москва".isalpha() # True - Cyrillic letters
"北京".isalpha() # True - Chinese characters
"مرحبا".isalpha() # True - Arabic letters
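One catch with accented text is that "é" can be stored in two ways: as one precomposed character, or as "e" followed by a combining acute accent (U+0301). Combining marks are not letters, so the decomposed form fails isalpha(). A common fix is to normalize to NFC with the standard `unicodedata` module before checking, as sketched below. Normalization cannot help scripts whose words inherently contain combining marks, such as Devanagari vowel signs, so isalpha() alone is a poor fit there.

```python
import unicodedata

composed = "Caf\u00e9"     # "é" as a single code point (U+00E9)
decomposed = "Cafe\u0301"  # "e" + COMBINING ACUTE ACCENT (U+0301)

print(composed == decomposed)                    # False: different code points
print(composed.isalpha(), decomposed.isalpha())  # True False
print(unicodedata.category("\u0301"))            # Mn (nonspacing mark), not a letter

# NFC composes "e" + U+0301 into U+00E9, so the check passes afterwards
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == composed, normalized.isalpha())  # True True
```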
Real-world name validation example:
def validate_name(name):
    """
    Validate a name allowing letters, spaces, hyphens, and apostrophes.
    Letters from any script count, since isalpha() is Unicode-aware.
    """
    if not name or len(name) > 100:
        return False
    # Remove allowed special characters
    cleaned = name.replace(" ", "").replace("-", "").replace("'", "")
    # Check if remaining characters are all alphabetic
    return cleaned.isalpha()
# Test cases
print(validate_name("John Smith")) # True
print(validate_name("Mary-Jane")) # True
print(validate_name("O'Brien")) # True
print(validate_name("François")) # True
print(validate_name("José García")) # True
print(validate_name("John123")) # False
print(validate_name("")) # False
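The replace-then-check approach works, but it only strips the three separators it lists. A name typed with a typographic apostrophe (U+2019, as in O’Brien) is therefore rejected. A per-character variant makes the allowed set explicit, though a string made only of separators must still be rejected. The name `validate_name_chars` and the exact separator set below are assumptions; adjust them to your own policy.

```python
# Hypothetical separator policy: space, hyphen, ASCII apostrophe, and U+2019
ALLOWED_SEPARATORS = frozenset(" -'\u2019")


def validate_name_chars(name: str, max_length: int = 100) -> bool:
    """Accept letters from any script plus the separators above."""
    if not name or len(name) > max_length:
        return False
    # Every character must be a letter or an allowed separator...
    if not all(c.isalpha() or c in ALLOWED_SEPARATORS for c in name):
        return False
    # ...and at least one letter must be present ("---" is not a name)
    return any(c.isalpha() for c in name)


for candidate in ["O'Brien", "O\u2019Brien", "José García", "John123", "---", ""]:
    print(repr(candidate), validate_name_chars(candidate))
```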
The isalnum() Method for Alphanumeric Validation
The isalnum() method returns True if every character is alphabetic or numeric. It is broader than isalpha() plus isdigit(), because it also accepts characters that only isnumeric() recognizes, such as fractions.
# Alphanumeric strings
"User123".isalnum() # True
"ABC".isalnum() # True
"123".isalnum() # True
# Special characters fail
"User_123".isalnum() # False - underscore
"User-123".isalnum() # False - hyphen
"User 123".isalnum() # False - space
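Python's documentation defines a character as alphanumeric if any of isalpha(), isdecimal(), isdigit(), or isnumeric() is True for it. The sketch below restates that rule per character and checks it against the built-in. It also shows that isalnum() accepts a fraction like ½, which isdigit() rejects. The helper name `isalnum_by_hand` is hypothetical.

```python
def isalnum_by_hand(s: str) -> bool:
    """Hypothetical restatement of str.isalnum() for illustration only."""
    # A character passes if any of the four classification methods accepts it
    return bool(s) and all(
        c.isalpha() or c.isdecimal() or c.isdigit() or c.isnumeric() for c in s
    )


for s in ["User123", "x²", "½", "User_123", "User 123", ""]:
    print(repr(s), s.isalnum(), isalnum_by_hand(s))
```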
Username validation implementation:
import re
# ASCII-only: isalnum() alone would also accept letters and digits from other scripts
USERNAME_PATTERN = re.compile(r"[a-zA-Z0-9_]+")
def validate_username(username, min_length=3, max_length=20):
    """
    Validate username with specific requirements:
    - Only ASCII letters, digits, and underscores
    - Length between min_length and max_length
    - Cannot start with a digit
    """
    if not username or len(username) < min_length or len(username) > max_length:
        return False
    # Remove underscores for the alphanumeric check
    cleaned = username.replace("_", "")
    # Must contain at least one letter or digit (rejects "___")
    if not cleaned or not cleaned.isalnum():
        return False
    # Cannot start with a digit
    if username[0].isdigit():
        return False
    # fullmatch() requires the whole string to match the pattern
    return USERNAME_PATTERN.fullmatch(username) is not None
# Test validation
test_usernames = [
"john_doe", # True
"user123", # True
"123user", # False - starts with digit
"user-name", # False - hyphen not allowed
"ab", # False - too short
"user name", # False - space not allowed
]
for username in test_usernames:
    print(f"{username:15} -> {validate_username(username)}")
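The regex in validate_username is there mainly to keep out non-ASCII letters and digits, which isalnum() would accept. If that is its only job, isascii() (Python 3.7+) can do the same work. The hypothetical regex-free variant below is intended to give the same results on the test list above.

```python
def validate_username_no_regex(username: str, min_length: int = 3, max_length: int = 20) -> bool:
    """Hypothetical variant of validate_username without the regex."""
    if not (min_length <= len(username) <= max_length):
        return False
    cleaned = username.replace("_", "")
    # isascii() limits the characters to ASCII, so isalnum() can only pass [A-Za-z0-9]
    if not cleaned or not (cleaned.isascii() and cleaned.isalnum()):
        return False
    # Cannot start with a digit
    return not username[0].isdigit()


for name in ["john_doe", "user123", "123user", "user-name", "ab", "user name", "usér_1"]:
    print(f"{name:15} -> {validate_username_no_regex(name)}")
```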
Practical Application: Form Input Sanitization
Here’s a form validation class that combines these methods:
class FormValidator:
    """Validate common form inputs using string classification methods."""
    @staticmethod
    def validate_postal_code(code, country="US"):
        """Validate postal codes for different countries."""
        code = code.replace(" ", "").replace("-", "")
        # Postal codes are ASCII; isdecimal()/isalpha() alone would accept other scripts
        if not code.isascii():
            return False
        if country == "US":
            # US ZIP: 5 or 9 digits
            return code.isdecimal() and len(code) in (5, 9)
        elif country == "CA":
            # Canadian: A1A1A1 format (letter-digit alternating)
            if len(code) != 6:
                return False
            return (code[0].isalpha() and code[1].isdecimal() and
                    code[2].isalpha() and code[3].isdecimal() and
                    code[4].isalpha() and code[5].isdecimal())
        return False
    @staticmethod
    def validate_product_code(code):
        """Validate product code: uppercase letters and digits only."""
        # isupper() needs at least one cased character, so all-digit codes fail
        return code.isupper() and code.isalnum() and len(code) >= 4
    @staticmethod
    def validate_account_number(account):
        """Validate account number: ASCII digits only, 8-12 characters."""
        return account.isascii() and account.isdecimal() and 8 <= len(account) <= 12
# Usage examples
validator = FormValidator()
print(validator.validate_postal_code("90210")) # True
print(validator.validate_postal_code("K1A0B1", "CA")) # True
print(validator.validate_postal_code("ABC123")) # False
print(validator.validate_product_code("PROD1234")) # True
print(validator.validate_product_code("prod1234")) # False
print(validator.validate_product_code("PROD-1234")) # False
print(validator.validate_account_number("12345678")) # True
print(validator.validate_account_number("123")) # False
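Users typing with East Asian input methods may enter full-width digits such as "９０２１０". These are Unicode decimal digits, so isdecimal() accepts them, yet they are not the ASCII digits a downstream system may expect. One option, sketched below, is to apply NFKC normalization before validating, which maps full-width forms to ASCII. The function name `is_us_zip` is hypothetical, and whether to normalize at all is a policy choice. NFKC also turns superscripts and circled digits into plain digits, so "²" becomes "2".

```python
import unicodedata


def is_us_zip(raw: str) -> bool:
    """Hypothetical check: 5- or 9-digit US ZIP after NFKC normalization."""
    # NFKC maps full-width digits (U+FF10-U+FF19) to ASCII 0-9
    code = unicodedata.normalize("NFKC", raw)
    code = code.replace(" ", "").replace("-", "")
    # isascii() still rejects digits from other scripts, e.g. Arabic-Indic
    return code.isascii() and code.isdecimal() and len(code) in (5, 9)


print("\uff19\uff10\uff12\uff11\uff10".isdecimal())  # full-width "90210": True
print(is_us_zip("\uff19\uff10\uff12\uff11\uff10"))   # True after NFKC
print(is_us_zip("90210-1234"))                       # True
print(is_us_zip("\u0669\u0660\u0662\u0661\u0660"))   # Arabic-Indic digits: False
print(is_us_zip("\u00b2" * 5))                       # True: NFKC turns superscripts into digits
```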
Performance Considerations and Best Practices
These methods are implemented in C and avoid the regex engine's setup and matching overhead, so for simple validations they are usually faster than an equivalent regex:
import re
import timeit
test_string = "a" * 1000
# Precompile so the regex side isn't also paying for pattern-cache lookups
ascii_letters = re.compile(r"[a-zA-Z]+")
# Using isalpha()
time_isalpha = timeit.timeit(lambda: test_string.isalpha(), number=100000)
# Using regex (note: ASCII-only, unlike isalpha())
time_regex = timeit.timeit(lambda: ascii_letters.fullmatch(test_string) is not None, number=100000)
print(f"isalpha(): {time_isalpha:.4f}s")
print(f"regex: {time_regex:.4f}s")
print(f"Speedup: {time_regex/time_isalpha:.2f}x")
# The speedup varies with Python version, hardware, and input; measure on your own system
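The benchmark above compares checks that are not quite equivalent: isalpha() accepts letters from any script, while [a-zA-Z] is ASCII-only. Before trusting a speedup number, it helps to confirm that the two checks agree on your inputs. The sketch below pairs isalpha() with isascii() so that it matches the ASCII-only pattern. It also flags where plain isalpha() diverges. The helper names are hypothetical.

```python
import re

ASCII_LETTERS = re.compile(r"[A-Za-z]+")


def regex_check(s: str) -> bool:
    # fullmatch() anchors both ends; "$" with re.match also matches before a trailing newline
    return ASCII_LETTERS.fullmatch(s) is not None


def method_check(s: str) -> bool:
    # isascii() narrows isalpha() to the same A-Z/a-z set the regex accepts
    return s.isascii() and s.isalpha()


samples = ["hello", "WORLD", "Café", "abc123", "", "hello\n", "北京"]
for s in samples:
    same = regex_check(s) == method_check(s)
    print(f"{s!r:10} regex={regex_check(s)!s:5} methods={method_check(s)!s:5} "
          f"isalpha={s.isalpha()!s:5} agree={same}")
```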
Best practices when using these methods:
def sanitize_input(value):
    """Demonstrate proper input sanitization."""
    # Always strip whitespace first
    value = value.strip()
    # Check for empty string explicitly
    if not value:
        return None
    # Stricter than isdigit(); add isascii() if only 0-9 is acceptable
    if value.isdecimal():
        return int(value)
    # Combine methods for complex validation
    if value.replace("_", "").isalnum():
        return value
    raise ValueError(f"Invalid input: {value}")
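When validation fails, a message that names the offending characters is friendlier than a bare "Invalid input". The hypothetical helper below reports each failing position along with its Unicode name from the standard `unicodedata` module. The predicate can be any of the str methods discussed here. It is especially handy for invisible characters such as a zero-width space (U+200B). str.strip() leaves that character in place because Python does not classify it as whitespace.

```python
import unicodedata
from typing import Callable, List, Tuple


def find_invalid_chars(value: str, is_valid: Callable[[str], bool]) -> List[Tuple[int, str, str]]:
    """Hypothetical helper: list (index, char, Unicode name) for characters failing is_valid."""
    return [
        (i, c, unicodedata.name(c, "<unnamed>"))  # default avoids ValueError for unnamed code points
        for i, c in enumerate(value)
        if not is_valid(c)
    ]


for index, char, name in find_invalid_chars("user name!", str.isalnum):
    print(f"position {index}: {char!r} ({name})")

# A zero-width space survives strip() and makes isdecimal() fail
print(find_invalid_chars("123\u200b".strip(), str.isdecimal))
```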
Remember that these methods return False for empty strings, so handle the empty case separately when empty input is acceptable. For production systems that handle international input, keep the Unicode behavior in mind. isdigit() also accepts superscripts and circled digits, while isdecimal() accepts only decimal digits, though still from any script. Adding isascii() is a simple way to restrict input to 0-9.