R - Data Types (Numeric, Character, Logical, Integer)
Key Insights
- R’s type system includes six atomic data types, with numeric (double), character, logical, and integer being the most commonly used for data analysis and statistical computing
- R performs implicit type coercion along a hierarchy (logical → integer → numeric → character), so mixing types in one vector silently converts every element to the most general type present
- An integer takes 4 bytes per element and a double takes 8, so the choice between integer and numeric affects memory use, and sometimes speed, with large datasets
Understanding R’s Atomic Data Types
R operates with six atomic vector types: logical, integer, numeric (double), complex, character, and raw. This article focuses on the four essential types you’ll use daily: numeric, character, logical, and integer.
Every object in R has a type, which determines how R stores and manipulates that data. You can check an object’s type using typeof() and its class using class().
x <- 42
typeof(x) # "double"
class(x) # "numeric"
y <- 42L
typeof(y) # "integer"
class(y) # "integer"
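To compare typeof() and class() across the four types this article covers, here is a minimal sketch. The list `values` and its element names are illustrative only.

```r
# Illustrative values, one per type discussed in this article
values <- list(numeric = 3.14, integer = 3L, character = "3", logical = TRUE)

# vapply() applies typeof() to each element and returns a named character vector
types <- vapply(values, typeof, character(1))
print(types)

# class() agrees with typeof() here except for doubles, which report "numeric"
classes <- vapply(values, class, character(1))
print(classes)
```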
Numeric Type (Double-Precision Floating Point)
By default, any number you type in R is stored as a numeric type, specifically a double-precision floating-point number. This type holds both whole numbers (exactly, up to 2^53 in magnitude) and fractional values, with approximately 15-17 significant decimal digits of precision.
# All of these are numeric (double)
a <- 10
b <- 10.5
c <- 1e6
d <- -3.14159
typeof(a) # "double"
typeof(b) # "double"
typeof(c) # "double"
# Checking if numeric
is.numeric(a) # TRUE
is.double(a) # TRUE
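A double stores whole numbers exactly only up to a limit. The sketch below illustrates that limit, assuming the usual IEEE 754 doubles. The variable name `limit` is illustrative.

```r
# A double has a 53-bit significand on IEEE 754 platforms
.Machine$double.digits       # 53

limit <- 2^53                # 9007199254740992
limit - 1 == limit           # FALSE: 2^53 - 1 is still exactly representable
limit + 1 == limit           # TRUE: the +1 is lost to rounding

# is.numeric() is TRUE for integers as well as doubles; is.double() is stricter
is.numeric(10L)              # TRUE
is.double(10L)               # FALSE
```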
Numeric Precision and Limitations
Floating-point arithmetic has inherent precision limitations due to binary representation. This can lead to unexpected comparison results:
# Classic floating-point precision issue
0.1 + 0.2 == 0.3 # FALSE
# Why?
print(0.1 + 0.2, digits = 20) # 0.30000000000000004441
# Solution: use all.equal() for floating-point comparisons
all.equal(0.1 + 0.2, 0.3) # TRUE
# Or specify tolerance
abs((0.1 + 0.2) - 0.3) < 1e-10 # TRUE
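One detail worth knowing when using all.equal() in conditions: it returns TRUE on a match but a descriptive message, not FALSE, on a mismatch. A short sketch with illustrative variable names:

```r
x <- sqrt(2)^2

x == 2                       # FALSE: x is slightly above 2
all.equal(x, 2)              # TRUE: equal within the default tolerance (about 1.5e-8)

# When the values differ, all.equal() returns a character message, not FALSE
res <- all.equal(1, 1.1)
is.character(res)            # TRUE

# So wrap it in isTRUE() whenever the result feeds an if()
if (isTRUE(all.equal(x, 2))) {
  message("x equals 2 within tolerance")
}
```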
Special Numeric Values
R includes special values for handling edge cases in numerical computation:
# Infinity
1 / 0 # Inf
-1 / 0 # -Inf
is.infinite(Inf) # TRUE
# Not a Number
0 / 0 # NaN
Inf - Inf # NaN
is.nan(NaN) # TRUE
# Missing values
x <- NA
is.na(x) # TRUE
# NaN is NA, but NA is not NaN
is.na(NaN) # TRUE
is.nan(NA) # FALSE
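Missing values propagate through most calculations. The sketch below shows the common ways to handle them. The vector `readings` is made-up data for illustration.

```r
readings <- c(1.5, NA, 3.5)

mean(readings)                  # NA: the missing value propagates
mean(readings, na.rm = TRUE)    # 2.5: NA is dropped before averaging

# NA has typed variants; a bare NA is logical and is coerced as needed
typeof(NA)                      # "logical"
typeof(NA_real_)                # "double"
typeof(NA_integer_)             # "integer"
typeof(NA_character_)           # "character"

# Comparing with NA yields NA, so test with is.na() rather than ==
NA == NA                        # NA
```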
Integer Type
Integers are whole numbers stored as 32-bit signed values. To create an integer, append L to the number. An integer uses 4 bytes per element versus 8 bytes for a double, which roughly halves memory use for large vectors. Note that is.numeric() returns TRUE for integers as well as doubles.
# Creating integers
int_val <- 100L
typeof(int_val) # "integer"
# Vector of integers
int_vec <- c(1L, 2L, 3L, 4L, 5L)
typeof(int_vec) # "integer"
# The colon operator creates integers when its endpoints are whole numbers
seq_int <- 1:1000000
typeof(seq_int) # "integer"
# Memory comparison
object.size(1:1000000) # 4 MB
object.size(as.numeric(1:1000000)) # 8 MB
When to Use Integers
Use integers when:
- Working with count data or indices
- Memory efficiency matters (large datasets)
- You need exact whole number representations
- Interfacing with APIs or databases that expect integers
# Practical example: indexing
data <- c("apple", "banana", "cherry", "date")
indices <- c(1L, 3L, 4L)
data[indices] # "apple" "cherry" "date"
# Integer range limits
.Machine$integer.max # 2147483647
2147483647L + 1L # NA, with a warning (integer overflow)
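When a calculation may exceed the integer range, promoting to double avoids the overflow. A minimal sketch follows. The names `big` and `overflowed` are illustrative, and suppressWarnings() is used only to keep the demo output quiet.

```r
big <- .Machine$integer.max

# integer + integer stays integer, so the result overflows to NA (with a warning)
overflowed <- suppressWarnings(big + 1L)
is.na(overflowed)            # TRUE

# Mixing in a double promotes the whole calculation to double, which has room
big + 1                      # 2147483648
typeof(big + 1)              # "double"

# Converting explicitly first works the same way
as.numeric(big) + 1          # 2147483648
```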
Converting Between Numeric and Integer
# Conversion functions
as.integer(3.7) # 3 (truncates, doesn't round)
as.numeric(5L) # 5
# Rounding before converting
round(3.7) # 4
as.integer(round(3.7)) # 4 (stored as an integer)
# Floor and ceiling
floor(3.7) # 3
ceiling(3.7) # 4
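The conversion functions differ most visibly on negative numbers and on values exactly halfway between two integers. A short sketch of those edge cases:

```r
# as.integer() and trunc() drop the fractional part, moving toward zero
as.integer(-3.7)     # -3
trunc(-3.7)          # -3

# floor() always moves down
floor(-3.7)          # -4

# round() sends exact halves to the nearest even number
round(0.5)           # 0
round(1.5)           # 2
round(2.5)           # 2

# round(), floor(), and ceiling() return doubles, not integers
typeof(round(3.7))   # "double"
```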
Character Type
Character types store text data as strings. In R, there’s no distinction between single characters and strings—both are character vectors.
# Creating character vectors
char1 <- "Hello"
char2 <- 'World'
char3 <- "R"
typeof(char1) # "character"
# Multi-element character vector
fruits <- c("apple", "banana", "cherry")
length(fruits) # 3
nchar(fruits) # 5 6 6 (character count per element)
String Manipulation
# Concatenation
paste("Hello", "World") # "Hello World"
paste0("Hello", "World") # "HelloWorld"
paste(fruits, collapse = ", ") # "apple, banana, cherry"
# Substring operations
substr("Hello World", 1, 5) # "Hello"
substring("Hello World", 7) # "World"
# Case conversion
toupper("hello") # "HELLO"
tolower("HELLO") # "hello"
# Pattern matching
grepl("app", fruits) # TRUE FALSE FALSE
grep("app", fruits) # 1 (returns index)
sub("a", "A", fruits) # "Apple" "bAnana" "cherry"
gsub("a", "A", fruits) # "Apple" "bAnAnA" "cherry"
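The pattern functions above treat their first argument as a regular expression, which matters once a pattern contains characters such as a dot. A small sketch, using a made-up `files` vector:

```r
files <- c("report.csv", "report_csv", "notes.txt")

# "." is a regex wildcard, so this pattern also matches "report_csv"
grepl(".csv", files)                  # TRUE TRUE FALSE

# fixed = TRUE treats the pattern as literal text
grepl(".csv", files, fixed = TRUE)    # TRUE FALSE FALSE

# Alternatively, escape the dot and anchor the pattern to the end of the string
grepl("\\.csv$", files)               # TRUE FALSE FALSE
```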
Character Encoding and Special Characters
# Escape sequences
cat("Line 1\nLine 2") # Newline
cat("Tab\tseparated") # Tab
cat("Quote: \"Hello\"") # Escaped quotes
# Unicode characters
"\u03B1" # α (Greek alpha)
"\u2665" # ♥ (heart)
# Raw strings (R 4.0+)
r"(C:\Users\name\file.txt)" # No need to escape backslashes
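To check what these literals actually contain, the sketch below inspects a Unicode character and a raw string. It assumes R 4.0 or later for the raw string syntax. The names `alpha` and `path` are illustrative.

```r
alpha <- "\u03B1"

nchar(alpha)                    # 1: one character
nchar(alpha, type = "bytes")    # 2: stored as two bytes in UTF-8
utf8ToInt(alpha)                # 945, the code point U+03B1

# A raw string holds the same characters as its escaped equivalent
path <- r"(C:\Users\name\file.txt)"
path == "C:\\Users\\name\\file.txt"   # TRUE
nchar(path)                           # 22: each backslash counts once
```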
Logical Type
Logical types represent Boolean values: TRUE, FALSE, and NA. R also accepts T and F as shortcuts, but these are ordinary variables that can be reassigned, so avoid them in production code.
# Creating logical values
bool1 <- TRUE
bool2 <- FALSE
typeof(bool1) # "logical"
# Logical vectors
results <- c(TRUE, FALSE, TRUE, TRUE)
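The risk with T and F can be shown in a few lines. In this sketch the function `shadow_demo` is a hypothetical name, and the reassignment is kept inside the function so the global T is not touched.

```r
shadow_demo <- function() {
  T <- 0                      # legal: T is an ordinary variable, shadowed locally here
  if (T) "treated as true" else "treated as false"
}
shadow_demo()                 # "treated as false"

# TRUE and FALSE are reserved words, so an assignment such as TRUE <- 0 is an error
isTRUE(T)                     # TRUE: the global T was not modified
```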
Logical Operations
# Comparison operators
5 > 3 # TRUE
5 == 5 # TRUE
5 != 3 # TRUE
# Logical operators
TRUE & FALSE # AND: FALSE
TRUE | FALSE # OR: TRUE
!TRUE # NOT: FALSE
# Vectorized operations
x <- c(1, 2, 3, 4, 5)
x > 3 # FALSE FALSE FALSE TRUE TRUE
x > 3 & x < 5 # FALSE FALSE FALSE TRUE FALSE
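R also has reducing and short-circuit forms alongside the element-wise operators. The following sketch contrasts them. The variables `v` and `ok` are illustrative.

```r
x <- c(1, 2, 3, 4, 5)

# & and | compare element by element and return a vector
x > 1 & x < 4                # FALSE TRUE TRUE FALSE FALSE

# any() and all() reduce a logical vector to a single value
any(x > 4)                   # TRUE
all(x > 0)                   # TRUE

# && and || take single values and short-circuit:
# the right-hand side is evaluated only if it can change the result
v <- NULL
ok <- !is.null(v) && v[1] > 0
ok                           # FALSE; v[1] > 0 was never evaluated
```

The short-circuit forms are the usual choice inside if(), where a single TRUE or FALSE is required.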
Logical Indexing
Logical vectors are powerful for subsetting data:
# Filtering with logical vectors
numbers <- c(10, 25, 30, 45, 50)
numbers > 30 # FALSE FALSE FALSE TRUE TRUE
numbers[numbers > 30] # 45 50
# Multiple conditions
numbers[numbers > 20 & numbers < 50] # 25 30 45
# which() returns indices
which(numbers > 30) # 4 5
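Logical indexing behaves differently when the data contain missing values. A short sketch with a made-up `readings` vector:

```r
readings <- c(10, NA, 30, 45)

readings > 20                                 # FALSE NA TRUE TRUE

# An NA in the logical index produces an NA in the result
readings[readings > 20]                       # NA 30 45

# which() returns only the positions that are TRUE, so the NA is dropped
readings[which(readings > 20)]                # 30 45

# An explicit is.na() check gives the same result
readings[!is.na(readings) & readings > 20]    # 30 45
```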
Logical Arithmetic
R treats TRUE as 1 and FALSE as 0 in arithmetic operations:
TRUE + TRUE # 2
TRUE * 5 # 5
sum(c(TRUE, FALSE, TRUE)) # 2
mean(c(TRUE, FALSE, TRUE, TRUE)) # 0.75
# Practical: counting TRUE values
results <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
sum(results) # 3 (count of TRUE)
mean(results) # 0.6 (proportion of TRUE)
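The same arithmetic works directly on a comparison, which gives a compact way to count or summarise by condition. In this sketch the `scores` vector and the pass mark of 60 are invented for illustration.

```r
scores <- c(55, 72, 90, NA, 68)

# Count and proportion of scores at or above 60, ignoring the missing one
sum(scores >= 60, na.rm = TRUE)     # 3
mean(scores >= 60, na.rm = TRUE)    # 0.75

# Proportion of missing values
mean(is.na(scores))                 # 0.2
```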
Type Coercion
R automatically converts (coerces) types when mixing them in vectors or operations. The coercion hierarchy is: logical → integer → numeric → character.
# Automatic coercion
c(TRUE, 1) # 1 1 (logical to numeric)
c(1L, 2.5) # 1.0 2.5 (integer to numeric)
c(1, "a") # "1" "a" (numeric to character)
c(TRUE, 1L, 2.5, "a") # "TRUE" "1" "2.5" "a" (all to character)
# Explicit coercion
as.logical(c(0, 1, 2)) # FALSE TRUE TRUE
as.integer(c(TRUE, FALSE)) # 1 0
as.character(c(1, 2, 3)) # "1" "2" "3"
# Coercion failures and surprises
as.numeric("abc") # NA (with warning)
as.integer("3.7") # 3 (parsed as a number, then truncated; no warning)
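When converting text input, it helps to find out which values failed rather than just reading the warning. The sketch below is one possible approach. The names `raw_input`, `parsed`, and `failed` are illustrative.

```r
raw_input <- c("12", "3.5", "abc", NA, "1e3")

# Convert, silencing the coercion warning because failures are checked explicitly below
parsed <- suppressWarnings(as.numeric(raw_input))
parsed                       # 12.0 3.5 NA NA 1000.0

# A failure is an NA in the output where the input was not already NA
failed <- is.na(parsed) & !is.na(raw_input)
raw_input[failed]            # "abc"
```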
Type Checking Functions
# Type checking
is.numeric(5) # TRUE
is.integer(5) # FALSE
is.integer(5L) # TRUE
is.character("a") # TRUE
is.logical(TRUE) # TRUE
# Class vs typeof
x <- 5
class(x) # "numeric"
typeof(x) # "double"
# Comprehensive check
str(x) # num 5
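Because is.integer() tests the storage type rather than the value, a separate check is needed to ask whether a number is whole. The helper below is a hypothetical sketch modeled on the is.wholenumber() example in R's help page for integer. The name `is_whole_number` and its default tolerance are assumptions.

```r
is.integer(5)                     # FALSE: 5 is stored as a double

# Hypothetical helper: TRUE where a value is within tol of a whole number
is_whole_number <- function(x, tol = sqrt(.Machine$double.eps)) {
  abs(x - round(x)) < tol
}

is_whole_number(5)                # TRUE
is_whole_number(5.5)              # FALSE
is_whole_number(c(1, 2.5, 3L))    # TRUE FALSE TRUE
```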
Performance Considerations
Understanding type differences impacts performance, especially with large datasets:
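# Benchmark: integer vs numeric (requires the microbenchmark package)
library(microbenchmark)
# Build both vectors up front so that only sum() is timed, not the conversion
int_vec <- sample.int(100L, 1000000, replace = TRUE)
dbl_vec <- as.numeric(int_vec)
microbenchmark(
  integer = sum(int_vec),
  numeric = sum(dbl_vec),
  times = 100
)
# Timings vary with the operation, R version, and hardware, so measure on your own data
# Memory usage: an integer vector takes about half the memory of a double vector of the same length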
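The memory saving only holds while a vector stays integer, and several common operations quietly return doubles. A short sketch of which operations preserve the type (`counts` is an illustrative name):

```r
# Integer arithmetic stays integer for +, -, *, and integer division
typeof(5L + 2L)              # "integer"
typeof(5L * 2L)              # "integer"
typeof(5L %/% 2L)            # "integer"

# Ordinary division and any mix with a double return a double
typeof(5L / 2L)              # "double"
typeof(5L * 2)               # "double"

# Assigning a double into an integer vector converts the whole vector
counts <- c(1L, 2L, 3L)
counts[2] <- 10
typeof(counts)               # "double"
```

Writing the replacement value as 10L instead of 10 would keep `counts` as an integer vector.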
Choose the appropriate data type based on your specific requirements: integers for exact whole numbers and memory efficiency, numeric for general-purpose calculations, character for text data, and logical for Boolean operations. Understanding implicit coercion prevents bugs and unexpected behavior in your R programs.