Python - String Tutorial (Complete Guide)
Key Insights
- Python strings are immutable sequences of Unicode characters with rich built-in methods for manipulation, formatting, and validation
- String operations range from basic concatenation and slicing to advanced formatting with f-strings, template strings, and the format() method
- Understanding encoding, raw strings, and multiline strings is essential for handling real-world text processing scenarios
String Creation and Basic Operations
Python strings can be created with single quotes, double quotes, or triple quotes (for multiline strings). Whichever quoting style is used, the result is an instance of the str class.
# Basic string creation
single = 'Hello'
double = "World"
multiline = """This is a
multiline string
spanning multiple lines"""
# String concatenation
greeting = single + " " + double # "Hello World"
# String repetition
repeated = "Ha" * 3 # "HaHaHa"
# String length
length = len(greeting) # 11
# Membership testing
exists = "Hello" in greeting # True
not_exists = "Python" not in greeting # True
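Because strings are immutable, none of the operations above changes a string in place; each one returns a new object. A minimal sketch of what that means in practice (the variable names are illustrative only):

```python
# Strings are immutable: item assignment raises TypeError.
word = "Hello"
try:
    word[0] = "J"
    mutated = True
except TypeError:
    mutated = False

# Build a new string instead of modifying the existing one.
jello = "J" + word[1:]

# Methods return new strings and leave the original untouched.
shout = word.upper()

print(mutated, word, jello, shout)
```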
String Indexing and Slicing
Strings support zero-based indexing and powerful slicing operations. Negative indices count from the end.
text = "Python Programming"
# Indexing
first_char = text[0] # 'P'
last_char = text[-1] # 'g'
sixth_char = text[5] # 'n'
# Slicing [start:end:step]
substring = text[0:6] # 'Python'
from_start = text[:6] # 'Python'
to_end = text[7:] # 'Programming'
last_five = text[-5:] # 'mming'
# Step slicing
every_second = text[::2] # 'Pto rgamn'
reversed_str = text[::-1] # 'gnimmargorP nohtyP'
# Slicing never raises IndexError
safe_slice = text[100:200] # Returns empty string ''
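Slicing is often enough to build small text helpers without any explicit loops. The two functions below are a sketch, not standard-library functions; the names truncate and is_palindrome and the default suffix are assumptions made for illustration.

```python
def truncate(text: str, limit: int, suffix: str = "...") -> str:
    """Shorten text to at most `limit` characters, ending with `suffix` if cut."""
    if limit < len(suffix):
        raise ValueError("limit must be at least len(suffix)")
    if len(text) <= limit:
        return text
    # Keep room for the suffix so the result is exactly `limit` characters long.
    return text[: limit - len(suffix)] + suffix


def is_palindrome(text: str) -> bool:
    """Compare a string with its reversed copy produced by a step of -1."""
    return text == text[::-1]


print(truncate("Python Programming", 10))
print(is_palindrome("level"))
```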
String Methods: Case and Whitespace
Python provides comprehensive methods for case manipulation and whitespace handling.
text = " Python Programming "
# Case conversion
upper = text.upper() # " PYTHON PROGRAMMING "
lower = text.lower() # " python programming "
title = text.title() # " Python Programming "
capitalize = text.capitalize() # " python programming "
swapcase = text.swapcase() # " pYTHON pROGRAMMING "
# Whitespace handling
stripped = text.strip() # "Python Programming"
left_strip = text.lstrip() # "Python Programming "
right_strip = text.rstrip() # " Python Programming"
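# Custom character stripping: strip() takes a set of characters, not a prefix,
# so strip("https://") would also eat leading or trailing h, t, p, s, : and / of the host name
url = "https://example.com/"
clean_url = url.removeprefix("https://").rstrip("/") # "example.com" (removeprefix requires Python 3.9+)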
# Case checking
is_upper = "HELLO".isupper() # True
is_lower = "hello".islower() # True
is_title = "Hello World".istitle() # True
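A common use of these methods is normalizing user input before comparing it. The helper below is one possible approach; the name normalize is hypothetical, and it uses casefold(), which is a more aggressive form of lower() intended for caseless comparison.

```python
def normalize(text: str) -> str:
    """Trim the ends, collapse internal whitespace runs, and fold case."""
    # split() with no argument splits on any whitespace run and drops empty pieces.
    return " ".join(text.split()).casefold()


a = "  Python   Programming "
b = "python programming"
same = normalize(a) == normalize(b)

# casefold() also maps the German sharp s to "ss", which lower() does not do.
german_match = normalize("STRASSE") == normalize("straße")

print(same, german_match)
```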
String Searching and Validation
Methods for finding substrings and validating string content are essential for text processing.
text = "Python is awesome. Python is powerful."
# Finding substrings
index = text.find("Python") # 0 (first occurrence)
last_index = text.rfind("Python") # 19 (last occurrence)
not_found = text.find("Java") # -1 (not found)
# Index method (raises ValueError if not found)
try:
    pos = text.index("awesome") # 10
except ValueError:
    print("Substring not found")
# Counting occurrences
count = text.count("Python") # 2
count_in_range = text.count("is", 0, 20) # 1
# Checking prefixes and suffixes
starts = text.startswith("Python") # True
ends = text.endswith("powerful.") # True
starts_tuple = text.startswith(("Java", "Python")) # True
# Content validation
alpha = "Hello".isalpha() # True
digit = "12345".isdigit() # True
alnum = "Hello123".isalnum() # True
space = " ".isspace() # True
numeric = "123.45".isnumeric() # False (decimal point)
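find() returns only the first match, but its optional start argument makes it straightforward to collect every occurrence. The function below is a sketch of that idea; find_all is a hypothetical helper, not a built-in string method, and it reports non-overlapping matches only.

```python
def find_all(text: str, sub: str) -> list[int]:
    """Return the start index of every non-overlapping occurrence of sub."""
    if not sub:
        raise ValueError("sub must be non-empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume the search just past the match that was found.
        start = text.find(sub, start + len(sub))
    return positions


text = "Python is awesome. Python is powerful."
print(find_all(text, "Python"))
```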
String Splitting and Joining
Converting between strings and lists is a common operation in text processing.
# Splitting strings
csv_line = "John,Doe,30,Engineer"
fields = csv_line.split(",") # ['John', 'Doe', '30', 'Engineer']
# Split with maxsplit
limited = csv_line.split(",", 2) # ['John', 'Doe', '30,Engineer']
# Splitting on whitespace
text = "Python is awesome"
words = text.split() # ['Python', 'is', 'awesome']
# Splitting lines
multiline = "Line 1\nLine 2\nLine 3"
lines = multiline.splitlines() # ['Line 1', 'Line 2', 'Line 3']
# Partition (splits into 3-tuple)
email = "user@example.com"
parts = email.partition("@") # ('user', '@', 'example.com')
# Joining strings
words = ["Python", "is", "awesome"]
sentence = " ".join(words) # "Python is awesome"
csv = ",".join(fields) # "John,Doe,30,Engineer"
# Join path components with the platform's separator
import os
path = os.path.join("home", "user", "documents") # "home/user/documents" on POSIX; backslashes on Windows
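Splitting and partitioning combine naturally when parsing simple line-oriented text. The following sketch parses key=value lines; the parse_pairs name, the sample text, and the rule of skipping lines that start with # are assumptions for illustration, not a standard format.

```python
def parse_pairs(text: str) -> dict[str, str]:
    """Parse 'key=value' lines, skipping blank lines and '#' comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # partition() splits at the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # no '=' on this line; ignore it
        result[key.strip()] = value.strip()
    return result


sample = "# settings\nname = Alice\nurl=https://example.com/?a=1\n\nbroken line"
settings = parse_pairs(sample)
print(settings)
```

Using partition() rather than split("=") avoids unpacking errors when a value itself contains the separator.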
String Formatting
Python offers multiple approaches to string formatting, from old-style to modern f-strings.
name = "Alice"
age = 30
salary = 75000.50
# F-strings (Python 3.6+) - Recommended
message = f"Name: {name}, Age: {age}, Salary: ${salary:,.2f}"
# "Name: Alice, Age: 30, Salary: $75,000.50"
# Expression evaluation in f-strings
result = f"{name.upper()} is {age * 12} months old"
# "ALICE is 360 months old"
# Format method
formatted = "Name: {}, Age: {}, Salary: ${:,.2f}".format(name, age, salary)
indexed = "Name: {0}, Age: {1}, {0} is {1} years old".format(name, age)
named = "Name: {n}, Age: {a}".format(n=name, a=age)
# Old-style formatting (legacy)
old_style = "Name: %s, Age: %d, Salary: $%.2f" % (name, age, salary)
# Advanced formatting
# Alignment and padding
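left = f"{name:<10}" # "Alice     " (left-aligned, padded to width 10)
right = f"{name:>10}" # "     Alice" (right-aligned, padded to width 10)
center = f"{name:^10}" # "  Alice   " (centered, padded to width 10)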
# Number formatting
binary = f"{42:b}" # "101010"
hex_val = f"{255:x}" # "ff"
octal = f"{64:o}" # "100"
percentage = f"{0.875:.1%}" # "87.5%"
# Date formatting with f-strings
from datetime import datetime
now = datetime.now()
formatted_date = f"{now:%Y-%m-%d %H:%M:%S}"
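The alignment and number specifiers above can be combined to print fixed-width tables. This is a small sketch with made-up rows; the column widths of 10, 5, and 12 are arbitrary choices.

```python
# Sample data invented for illustration.
rows = [("Alice", 30, 75000.5), ("Bob", 4, 1200.0)]

# Header and data rows share the same column widths: 10, 5, and 12 characters.
header = f"{'Name':<10}{'Age':>5}{'Salary':>12}"
lines = [header, "-" * len(header)]
for name, age, salary in rows:
    lines.append(f"{name:<10}{age:>5}{salary:>12,.2f}")

table = "\n".join(lines)
print(table)
```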
String Replacement and Translation
Modifying string content through replacement and character mapping.
text = "Python is great. Python is fun."
# Simple replacement
replaced = text.replace("Python", "Programming")
# "Programming is great. Programming is fun."
# Limited replacement
limited = text.replace("Python", "Code", 1)
# "Code is great. Python is fun."
# Translation table for character mapping
translation_table = str.maketrans({
'a': '4',
'e': '3',
'i': '1',
'o': '0'
})
leet = "hello world".translate(translation_table) # "h3ll0 w0rld"
# Remove characters
remove_vowels = str.maketrans('', '', 'aeiou')
no_vowels = "hello world".translate(remove_vowels) # "hll wrld"
# Expandtabs
tabbed = "Name\tAge\tCity"
expanded = tabbed.expandtabs(15) # columns now start at offsets 0, 15, and 30
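The deletion form of str.maketrans is handy for stripping punctuation before splitting text into words. The sketch below assumes ASCII punctuation only (string.punctuation); the tokenize name is hypothetical.

```python
import string

# The third argument of str.maketrans lists the characters to delete.
strip_punct = str.maketrans("", "", string.punctuation)


def tokenize(text: str) -> list[str]:
    """Remove ASCII punctuation, lowercase the text, and split on whitespace."""
    return text.translate(strip_punct).lower().split()


tokens = tokenize("Python is great. Python is fun!")
print(tokens)
```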
Raw Strings and Escape Sequences
Handling special characters and escape sequences correctly is crucial for file paths and regex patterns.
# Escape sequences
newline = "Line 1\nLine 2"
tab = "Column1\tColumn2"
backslash = "C:\\Users\\Documents" # Needs escaping
quote = "He said \"Hello\""
# Raw strings (backslashes are kept literally instead of starting escape sequences)
raw_path = r"C:\Users\Documents" # Preferred for Windows paths
regex_pattern = r"\d+\.\d+" # Regex patterns
# Unicode characters
unicode_str = "\u0041\u0042\u0043" # "ABC"
emoji = "\U0001F600" # "😀"
# Byte strings
byte_str = b"Hello" # bytes object
encoded = "Hello".encode('utf-8') # b'Hello'
decoded = encoded.decode('utf-8') # "Hello"
# The same text produces different bytes in different encodings
text = "Café"
latin1 = text.encode('latin-1') # b'Caf\xe9'
utf8 = text.encode('utf-8') # b'Caf\xc3\xa9'
# Error handling during encoding
try:
    problematic = "Hello 你好".encode('ascii')
except UnicodeEncodeError:
    safe = "Hello 你好".encode('ascii', errors='ignore') # b'Hello '
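Besides 'ignore', the errors parameter of encode() accepts several other standard handlers, and decoding bytes with the wrong codec is a frequent source of garbled text. The sketch below compares them on one sample string; the variable names are illustrative.

```python
text = "Café 你好"

# Each handler deals differently with characters ASCII cannot represent.
ignored = text.encode("ascii", errors="ignore")             # drops them silently
replaced = text.encode("ascii", errors="replace")           # substitutes '?'
escaped = text.encode("ascii", errors="backslashreplace")   # keeps a readable escape
xml_refs = text.encode("ascii", errors="xmlcharrefreplace") # numeric character references

# Decoding UTF-8 bytes as Latin-1 does not fail, but yields mojibake.
mojibake = "Café".encode("utf-8").decode("latin-1")

print(ignored, replaced, escaped, xml_refs, mojibake)
```

Of these, 'ignore' and 'replace' lose information, so they are best reserved for cases where the dropped characters really do not matter.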
Template Strings
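The string.Template class offers simpler, $-based substitution. Because its placeholders cannot evaluate expressions or access attributes, it is a safer choice than format() or f-strings when the template text comes from users.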
from string import Template
# Basic template
template = Template("Hello, $name! You have $count messages.")
result = template.substitute(name="Alice", count=5)
# "Hello, Alice! You have 5 messages."
# Safe substitution (doesn't raise KeyError)
incomplete = template.safe_substitute(name="Bob")
# "Hello, Bob! You have $count messages."
# Custom delimiter
class CustomTemplate(Template):
    delimiter = '%'

custom = CustomTemplate("Hello, %name!")
custom_result = custom.substitute(name="Charlie") # "Hello, Charlie!"
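Templates also accept a mapping, support ${name} for placeholders that touch other text, and use $$ for a literal dollar sign. A short sketch, with invented sample data, that also shows how a missing key surfaces:

```python
from string import Template

# "$$" is a literal dollar sign; "${amount}" is a braced placeholder.
template = Template("Hello, $name! You owe $$${amount}.")
data = {"name": "Alice", "amount": "12.50"}
message = template.substitute(data)

# substitute() raises KeyError naming the first placeholder it cannot fill.
try:
    template.substitute({"name": "Bob"})
    missing = None
except KeyError as exc:
    missing = exc.args[0]

print(message, missing)
```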
Performance Considerations
Because strings are immutable, each concatenation may create a new string, so building a large string piece by piece in a loop can be costly.
import time
# AVOID: repeated concatenation in a loop (worst case O(n²), since each += may copy the whole string)
start = time.perf_counter()
result = ""
for i in range(10000):
    result += str(i)
slow_time = time.perf_counter() - start
# PREFER: collect the parts and join once (O(n))
start = time.perf_counter()
parts = []
for i in range(10000):
    parts.append(str(i))
result = "".join(parts)
fast_time = time.perf_counter() - start
# String interning (a CPython implementation detail; compare strings with ==, not `is`)
a = "hello"
b = "hello"
print(a is b) # True in CPython (identical literals share one object)
# Force interning for strings built at runtime
import sys
dynamic = "".join(["hel", "lo"])
interned = sys.intern(dynamic)
print(interned is a) # True in CPython
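Single timings like the loop comparison above are noisy, so the timeit module is usually a better tool for such measurements. The sketch below is one way to set that up; the function names and the repeat count of 20 are arbitrary, and the actual numbers depend on the interpreter and machine. CPython can often extend a string in place, so the gap may be smaller than the worst-case analysis suggests.

```python
import timeit


def concat_loop(n: int) -> str:
    """Build the result with repeated += concatenation."""
    result = ""
    for i in range(n):
        result += str(i)
    return result


def join_list(n: int) -> str:
    """Build the result with a single join over all parts."""
    return "".join(str(i) for i in range(n))


n = 10_000
# Both approaches must produce the same string before timing them.
same_output = concat_loop(n) == join_list(n)

concat_time = timeit.timeit(lambda: concat_loop(n), number=20)
join_time = timeit.timeit(lambda: join_list(n), number=20)
print(f"+= loop: {concat_time:.4f}s, join: {join_time:.4f}s")
```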
Python’s string handling is both powerful and intuitive. Master these operations to efficiently process text in any application, from data parsing to web scraping to log analysis.