R stringr - str_detect() with Examples

Key Insights

str_detect() returns a logical vector indicating whether each element matches a pattern, making it ideal for filtering operations with dplyr::filter()
Use fixed() for literal string matching to gain significant performance improvements over regex, especially on large datasets
The negate = TRUE parameter provides a cleaner alternative to wrapping calls in ! when you need to find non-matching elements

Introduction to str_detect()

The str_detect() function from R’s stringr package answers a simple question: does this string contain this pattern? It examines each element of a character vector and returns TRUE or FALSE based on whether the pattern exists within that element.

This function sits at the foundation of string manipulation workflows. While str_extract() pulls out matched content and str_replace() swaps patterns for new text, str_detect() simply tells you what’s there. That logical output makes it the go-to choice for subsetting and filtering operations.

library(stringr)

# Basic syntax (shown as a comment; negate defaults to FALSE)
# str_detect(string, pattern, negate = FALSE)

# Simple example
fruits <- c("apple", "banana", "cherry", "apricot")
str_detect(fruits, "ap")
# [1]  TRUE FALSE FALSE  TRUE

The function processes each element independently, returning a logical vector of the same length as your input. This vectorized behavior eliminates the need for explicit loops and integrates cleanly with tidyverse pipelines.
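Because the result is an ordinary logical vector, it can be counted, averaged, or used as an index directly. A minimal sketch, reusing the fruits vector from above (the has_ap name is only for illustration):

```r
library(stringr)

fruits <- c("apple", "banana", "cherry", "apricot")
has_ap <- str_detect(fruits, "ap")

# One result per input element
length(has_ap) == length(fruits)
# [1] TRUE

# TRUE counts as 1, so sum() gives the number of matches
sum(has_ap)
# [1] 2

# mean() gives the proportion of matching elements
mean(has_ap)
# [1] 0.5

# Logical indexing keeps only the matching elements
fruits[has_ap]
# [1] "apple"   "apricot"
```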

Basic Pattern Matching

At its simplest, str_detect() searches for literal character sequences. Pass a string vector and a pattern, and you get back a logical vector.

sentences <- c(
  "The quick brown fox jumps over the lazy dog",
  "Pack my box with five dozen liquor jugs",
  "How vexingly quick daft zebras jump"
)

# Detect sentences containing "quick"
str_detect(sentences, "quick")
# [1]  TRUE FALSE  TRUE

# Case sensitivity matters
str_detect(sentences, "Quick")
# [1] FALSE FALSE FALSE

str_detect(sentences, "the")
# [1]  TRUE FALSE FALSE

Notice that str_detect() is case-sensitive by default. The pattern “Quick” with a capital Q doesn’t match “quick” in the text. When case-insensitive matching is required, wrap your pattern with regex() and set the ignore_case argument:

str_detect(sentences, regex("quick", ignore_case = TRUE))
# [1]  TRUE FALSE  TRUE

str_detect(sentences, regex("THE", ignore_case = TRUE))
# [1]  TRUE FALSE FALSE
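There are a few other ways to get case-insensitive matching, and which one reads best depends on the surrounding code. The sketch below reuses the sentences vector and shows three options that should give the same result for this input: the inline (?i) regex flag, lowercasing the input first, and fixed() with ignore_case for literal patterns.

```r
library(stringr)

sentences <- c(
  "The quick brown fox jumps over the lazy dog",
  "Pack my box with five dozen liquor jugs",
  "How vexingly quick daft zebras jump"
)

# Inline flag: (?i) turns on case-insensitive matching inside the pattern
by_flag <- str_detect(sentences, "(?i)QUICK")
by_flag
# [1]  TRUE FALSE  TRUE

# Normalise the input to lowercase, then match a lowercase pattern
by_lower <- str_detect(str_to_lower(sentences), "quick")
by_lower
# [1]  TRUE FALSE  TRUE

# Literal (non-regex) matching that ignores case
by_fixed <- str_detect(sentences, fixed("QUICK", ignore_case = TRUE))
by_fixed
# [1]  TRUE FALSE  TRUE
```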

For checking multiple elements against a single pattern, the vectorization handles everything:

email_addresses <- c(
  "user@example.com",
  "admin@company.org",
  "support@example.com",
  "hello@domain.net"
)

# Find all example.com addresses (escape the dot so it matches a literal ".")
str_detect(email_addresses, "example\\.com")
# [1]  TRUE FALSE  TRUE FALSE
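One detail matters when the search text contains a dot: str_detect() treats a plain string as a regular expression, so an unescaped . matches any single character. The sketch below uses a made-up tricky vector (not part of the data above) to show the difference between an unescaped dot, an escaped dot, and fixed().

```r
library(stringr)

# Hypothetical addresses chosen to expose the wildcard behaviour
tricky <- c("user@example.com", "user@exampleXcom.org", "user@example-com.net")

# Unescaped dot: matches "X" and "-" as well as "."
loose <- str_detect(tricky, "example.com")
loose
# [1] TRUE TRUE TRUE

# Escaped dot: only a literal "." matches
escaped <- str_detect(tricky, "example\\.com")
escaped
# [1]  TRUE FALSE FALSE

# fixed(): the whole pattern is taken literally
literal <- str_detect(tricky, fixed("example.com"))
literal
# [1]  TRUE FALSE FALSE
```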

Using Regular Expressions

The real power of str_detect() emerges when you move beyond literal patterns to regular expressions. Regex lets you define flexible patterns that match entire categories of strings.

Anchors pin your pattern to specific positions:

words <- c("apple", "application", "pineapple", "app")

# Strings starting with "app"
str_detect(words, "^app")
# [1]  TRUE  TRUE FALSE  TRUE

# Strings ending with "app"
str_detect(words, "app$")
# [1] FALSE FALSE FALSE  TRUE

# Strings that ARE exactly "app"
str_detect(words, "^app$")
# [1] FALSE FALSE FALSE  TRUE

Character classes match categories of characters:

mixed_data <- c("order123", "invoice456", "report", "data2024", "summary")

# Contains any digit
str_detect(mixed_data, "\\d")
# [1]  TRUE  TRUE FALSE  TRUE FALSE

# Contains only letters
str_detect(mixed_data, "^[a-zA-Z]+$")
# [1] FALSE FALSE  TRUE FALSE  TRUE

Quantifiers control how many times a pattern must appear:

codes <- c("A1", "AB12", "ABC123", "ABCD1234", "A")

# At least two consecutive digits
str_detect(codes, "\\d{2,}")
# [1] FALSE  TRUE  TRUE  TRUE FALSE

# Exactly three letters followed by exactly three digits
str_detect(codes, "^[A-Z]{3}\\d{3}$")
# [1] FALSE FALSE  TRUE FALSE FALSE

Alternation matches one pattern or another:

file_names <- c("report.pdf", "data.csv", "image.png", "document.pdf", "sheet.xlsx")

# PDF or CSV files
str_detect(file_names, "\\.(pdf|csv)$")
# [1]  TRUE  TRUE FALSE  TRUE FALSE
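These building blocks combine naturally. As an illustration, the sketch below validates a made-up ID format (the ids vector and the INV/ORD prefixes are assumptions for the example): an uppercase prefix, a hyphen, and exactly four digits, with anchors so nothing else may appear before or after.

```r
library(stringr)

# Hypothetical identifiers, some valid and some not
ids <- c("INV-2024", "ORD-0001", "INV-24", "inv-2024", "REF-1234", "INV-20245")

# ^ and $ anchor the whole string, (INV|ORD) is alternation,
# \\d{4} requires exactly four digits
valid <- str_detect(ids, "^(INV|ORD)-\\d{4}$")
valid
# [1]  TRUE  TRUE FALSE FALSE FALSE FALSE

ids[valid]
# [1] "INV-2024" "ORD-0001"
```

Without the anchors, "INV-20245" would also match, because the pattern would be satisfied by its first eight characters.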

Practical Use Cases with Data Frames

The combination of str_detect() and dplyr::filter() handles most real-world string filtering tasks. When you need to subset rows based on text content, this pairing delivers clean, readable code.

library(dplyr)

# Sample customer data
customers <- tibble(
  id = 1:6,
  name = c("John Smith", "Jane Doe", "Bob Johnson", "Alice Smith", "Charlie Brown", "Diana Prince"),
  email = c("john@gmail.com", "jane@company.org", "bob@gmail.com", "alice@yahoo.com", "charlie@company.org", "diana@gmail.com"),
  notes = c("Premium customer", "New signup", "Requested refund", "Premium tier", "Trial user", "Premium member")
)

# Filter customers with Gmail addresses
customers %>%
  filter(str_detect(email, "gmail\\.com"))
# Returns rows 1, 3, and 6

# Filter customers with "Premium" in notes
customers %>%
  filter(str_detect(notes, "Premium"))
# Returns rows 1, 4, and 6

# Filter by name containing "Smith"
customers %>%
  filter(str_detect(name, "Smith"))
# Returns rows 1 and 4

Combining multiple string conditions creates powerful filters:

# Gmail users who are also Premium
customers %>%
  filter(
    str_detect(email, "gmail\\.com"),
    str_detect(notes, "Premium")
  )
# Returns rows 1 and 6

# Company email OR Premium status
customers %>%
  filter(
    str_detect(email, "company\\.org") | str_detect(notes, "Premium")
  )
# Returns rows 1, 2, 4, 5, and 6

You can also use str_detect() within mutate() to create indicator columns:

customers %>%
  mutate(
    is_gmail = str_detect(email, "gmail\\.com"),
    is_premium = str_detect(notes, "Premium")
  )
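The same logical output also works inside summarise(), where sum() counts matches and mean() gives their share. A short sketch, using a trimmed copy of the customers table with only the two columns needed here (the summary_tbl name is just for illustration):

```r
library(dplyr)
library(stringr)

# Trimmed copy of the customers data above
customers <- tibble(
  email = c("john@gmail.com", "jane@company.org", "bob@gmail.com",
            "alice@yahoo.com", "charlie@company.org", "diana@gmail.com"),
  notes = c("Premium customer", "New signup", "Requested refund",
            "Premium tier", "Trial user", "Premium member")
)

summary_tbl <- customers %>%
  summarise(
    n_gmail = sum(str_detect(email, "gmail\\.com")),      # count of Gmail users
    share_premium = mean(str_detect(notes, "Premium"))    # proportion flagged Premium
  )

summary_tbl
# n_gmail is 3 and share_premium is 0.5 for this data
```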

Negation and Advanced Options

Sometimes you need to find strings that don’t match a pattern. The negate parameter handles this cleanly:

log_entries <- c(
  "[INFO] Application started",
  "[ERROR] Database connection failed",
  "[INFO] User logged in",
  "[WARNING] Memory usage high",
  "[ERROR] File not found"
)

# Find non-error entries
str_detect(log_entries, "ERROR", negate = TRUE)
# [1]  TRUE FALSE  TRUE  TRUE FALSE

# Equivalent but less readable
!str_detect(log_entries, "ERROR")
# [1]  TRUE FALSE  TRUE  TRUE FALSE

The negate parameter becomes especially valuable in filter operations where the intent is clearer:

logs_df <- tibble(entry = log_entries)

# Filter out errors - clear intent
logs_df %>%
  filter(str_detect(entry, "ERROR", negate = TRUE))

# Same result, less obvious
logs_df %>%
  filter(!str_detect(entry, "ERROR"))

Handling NA values requires attention. str_detect() returns NA for every input element that is NA:

data_with_na <- c("apple", NA, "banana", "cherry", NA)

str_detect(data_with_na, "a")
# [1]  TRUE    NA  TRUE FALSE    NA

When filtering, these NA values get dropped automatically by filter(). If you need explicit control, handle them before or after detection:

# Replace NA with FALSE
coalesce(str_detect(data_with_na, "a"), FALSE)
# [1]  TRUE FALSE  TRUE FALSE FALSE

# Or filter out NA first
data_with_na[!is.na(data_with_na)] %>%
  str_detect("a")
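To make the filter() behaviour concrete, the sketch below puts the same values in a small tibble (the df, matched, kept_na, and not_matched names are illustrative). Note that negate = TRUE does not rescue the NA rows: a negated NA is still NA, so those rows are dropped in both directions unless you keep them explicitly.

```r
library(dplyr)
library(stringr)

df <- tibble(fruit = c("apple", NA, "banana", "cherry", NA))

# Rows where the condition is NA are dropped along with the FALSE rows
matched <- df %>% filter(str_detect(fruit, "a"))
matched$fruit
# [1] "apple"  "banana"

# Keep the NA rows explicitly
kept_na <- df %>% filter(str_detect(fruit, "a") | is.na(fruit))
kept_na$fruit
# [1] "apple"  NA       "banana" NA

# Negation also drops NA rows: only "cherry" remains
not_matched <- df %>% filter(str_detect(fruit, "a", negate = TRUE))
not_matched$fruit
# [1] "cherry"
```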

Combining with str_subset() provides a shortcut when you want the actual matching strings rather than a logical vector:

# These produce the same result
fruits[str_detect(fruits, "ap")]
str_subset(fruits, "ap")
# [1] "apple"   "apricot"

Performance Considerations

For literal string matching without regex features, fixed() delivers substantial performance gains. It tells str_detect() to skip regex parsing and perform a direct character comparison:

# Large dataset simulation
large_vector <- rep(c("error_log_2024", "info_message", "debug_trace", "error_report"), 250000)

# Regex matching (slower)
system.time(str_detect(large_vector, "error"))
#    user  system elapsed 
#   0.156   0.004   0.160 

# Fixed matching (faster)
system.time(str_detect(large_vector, fixed("error")))
#    user  system elapsed 
#   0.052   0.000   0.052

The timings above come from a single run and will differ from machine to machine; the speedup also varies with pattern complexity and data size. Even so, fixed() generally outperforms regex for literal matches. Use it when your pattern contains no special regex characters and you don’t need regex features.
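Speed aside, fixed() also changes what the pattern means, which is worth seeing once. In the sketch below (the prices vector is invented for the example), the search text contains $, which is an end-of-string anchor in regex. With fixed() it is just a dollar sign.

```r
library(stringr)

# Hypothetical strings containing a regex metacharacter
prices <- c("cost: $5", "cost: 5 dollars", "total $50")

# fixed(): "$5" is matched character for character
as_fixed <- str_detect(prices, fixed("$5"))
as_fixed
# [1]  TRUE FALSE  TRUE

# Regex with the metacharacter escaped gives the same answer
as_escaped <- str_detect(prices, "\\$5")
as_escaped
# [1]  TRUE FALSE  TRUE

# Unescaped regex: "$" anchors to the end, so "5" after it can never match
as_regex <- str_detect(prices, "$5")
as_regex
# [1] FALSE FALSE FALSE
```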

Comparison with base R’s grepl():

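# These return the same result for this input
str_detect(fruits, "ap")
grepl("ap", fruits)

The stringr version offers consistent argument order (data first, pattern second) that works better in pipes, plus the negate parameter. The two are not fully interchangeable, though: grepl() returns FALSE for NA input where str_detect() returns NA, and they use different regex engines (stringr builds on ICU through stringi, while grepl() uses TRE by default or PCRE with perl = TRUE), so complex patterns can behave differently at the edges. Performance is comparable for most use cases. Choose based on your coding style and whether you’re already using stringr for other operations.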
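One behavioural difference is easy to miss when switching between the two functions: missing values. The sketch below uses a small made-up vector x to show that grepl() reports FALSE for NA while str_detect() propagates NA.

```r
library(stringr)

x <- c("apple", NA, "cherry")

# Base R: NA input is reported as "no match"
base_result <- grepl("a", x)
base_result
# [1]  TRUE FALSE FALSE

# stringr: NA input stays NA
stringr_result <- str_detect(x, "a")
stringr_result
# [1]  TRUE    NA FALSE
```

This matters mostly outside filter(), for example when the logical vector feeds sum() or an if condition, where an NA behaves very differently from FALSE.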

Tips for large datasets:

Use fixed() for literal patterns
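Store complex regex patterns in a variable if reusing them (stringr has no explicit pre-compilation step, so this mainly helps readability and consistency)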
Consider str_which() when you only need indices, not the full logical vector
For very large datasets, data.table’s string functions may offer better performance

Summary

The str_detect() function provides the foundation for string-based filtering in R. Its key parameters include the input string vector, the pattern to match, and the optional negate argument for inverse matching.

Common pitfalls to avoid:

Forgetting case sensitivity (use regex(pattern, ignore_case = TRUE))
Not escaping special regex characters like . when matching literally
Using regex when fixed() would be faster and sufficient

Related stringr functions worth exploring:

str_which(): returns indices of matching elements instead of a logical vector
str_subset(): returns the actual matching strings directly
str_count(): counts how many times a pattern appears in each string
str_locate(): finds the position of the first match within each string
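As a quick side-by-side, the sketch below runs each of these functions on the fruits vector from the introduction, so the difference in return types is visible at a glance.

```r
library(stringr)

fruits <- c("apple", "banana", "cherry", "apricot")

# Logical vector: one TRUE/FALSE per element
str_detect(fruits, "ap")
# [1]  TRUE FALSE FALSE  TRUE

# Integer positions of the matching elements
str_which(fruits, "ap")
# [1] 1 4

# The matching strings themselves
str_subset(fruits, "ap")
# [1] "apple"   "apricot"

# Number of matches within each string
str_count(fruits, "a")
# [1] 1 3 0 1

# Start and end of the first match in each string (NA when there is none)
str_locate(fruits, "ap")
#      start end
# [1,]     1   2
# [2,]    NA  NA
# [3,]    NA  NA
# [4,]     1   2
```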

Master str_detect() and you’ll handle the majority of string filtering tasks in R with clean, readable code that integrates naturally into tidyverse workflows.