R - Data Types (Numeric, Character, Logical, Integer)

Key Insights

R’s type system includes six atomic data types, with numeric (double), character, logical, and integer being the most commonly used for data analysis and statistical computing
R performs implicit type coercion following a hierarchy (logical → integer → numeric → character), which can lead to unexpected behavior if not understood properly
An integer takes 4 bytes per element and a double takes 8, so the choice between integer and numeric types affects memory use, and sometimes speed, on large datasets

Understanding R’s Atomic Data Types

R operates with six atomic vector types: logical, integer, numeric (double), complex, character, and raw. This article focuses on the four essential types you’ll use daily: numeric, character, logical, and integer.
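As a quick illustration, the sketch below builds one value of each atomic type and asks typeof() for its storage type. The list name `examples` and the sample values are arbitrary choices made for this illustration.

```r
# One example value per atomic type (values chosen arbitrarily)
examples <- list(
  logical   = TRUE,
  integer   = 1L,
  double    = 1.5,
  complex   = 1 + 2i,
  character = "a",
  raw       = as.raw(255)
)

# typeof() reports the internal storage type of each value
types <- vapply(examples, typeof, character(1))
print(types)
```

Each list name matches the type that typeof() reports for its value, which is a handy way to confirm how a literal is stored.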

Every object in R has a type, which determines how R stores and manipulates that data. You can check an object’s type using typeof() and its class using class().

x <- 42
typeof(x)  # "double"
class(x)   # "numeric"

y <- 42L
typeof(y)  # "integer"
class(y)   # "integer"

Numeric Type (Double-Precision Floating Point)

By default, any number you type in R is stored as a numeric type, specifically a double-precision floating-point number. This type can represent both integers and decimals, with approximately 15-17 decimal digits of precision.

# All of these are numeric (double)
a <- 10
b <- 10.5
c <- 1e6
d <- -3.14159

typeof(a)  # "double"
typeof(b)  # "double"
typeof(c)  # "double"

# Checking if numeric
is.numeric(a)  # TRUE
is.double(a)   # TRUE
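To make the precision limit concrete, here is a small sketch using the constants in .Machine. It assumes the usual IEEE 754 double format, which R uses on all common platforms; the variable names are illustrative.

```r
# Number of significand bits in a double (53 on IEEE 754 platforms)
sig_bits <- .Machine$double.digits

# Smallest positive x for which 1 + x is distinguishable from 1
eps <- .Machine$double.eps

# Whole numbers are stored exactly up to 2^53; beyond that, gaps appear
largest_exact <- 2^53
same_value <- (largest_exact + 1) == largest_exact   # TRUE: the +1 is lost

print(sig_bits)
print(eps)
print(same_value)
```

In practice this means a double can hold any whole number that fits in an integer without loss, but very large counters or IDs above 2^53 can silently lose precision.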

Numeric Precision and Limitations

Floating-point arithmetic has inherent precision limitations due to binary representation. This can lead to unexpected comparison results:

# Classic floating-point precision issue
0.1 + 0.2 == 0.3  # FALSE

# Why?
print(0.1 + 0.2, digits = 20)  # 0.30000000000000004441

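# Solution: use all.equal() for floating-point comparisons
# (it returns TRUE or a description of the difference, so wrap it in isTRUE() inside if statements)
all.equal(0.1 + 0.2, 0.3)  # TRUE

# Or compare against an explicit tolerance
abs((0.1 + 0.2) - 0.3) < 1e-10  # TRUE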
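One detail worth seeing in action: all.equal() does not return FALSE on a mismatch, it returns a character description of the difference. The sketch below shows why isTRUE() is the usual wrapper, and adds a hypothetical helper, `near_equal`, for vectorized absolute-tolerance checks; the helper name and default tolerance are assumptions, not part of base R.

```r
# all.equal() returns TRUE on a match, but a description (not FALSE) on a mismatch
match_result    <- all.equal(0.1 + 0.2, 0.3)
mismatch_result <- all.equal(1, 1.1)

# isTRUE() turns either outcome into a single TRUE/FALSE that is safe in if()
safe_match    <- isTRUE(match_result)
safe_mismatch <- isTRUE(mismatch_result)

# Hypothetical helper: absolute-tolerance comparison, vectorized over a and b
near_equal <- function(a, b, tol = 1e-8) {
  abs(a - b) < tol
}

print(mismatch_result)                            # a character description
print(near_equal(c(0.1 + 0.2, 1), c(0.3, 1.1)))   # TRUE FALSE
```

Note that all.equal() uses a relative tolerance by default, while the helper above uses an absolute one, so the two can disagree for very large or very small values.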

Special Numeric Values

R includes special values for handling edge cases in numerical computation:

# Infinity
1 / 0      # Inf
-1 / 0     # -Inf
is.infinite(Inf)  # TRUE

# Not a Number
0 / 0      # NaN
Inf - Inf  # NaN
is.nan(NaN)  # TRUE

# Missing values
x <- NA
is.na(x)   # TRUE

# NaN is NA, but NA is not NaN
is.na(NaN)   # TRUE
is.nan(NA)   # FALSE
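A plain NA is itself a logical value; R also provides typed missing values for the other atomic types. The following sketch shows the typed variants and how NA propagates through calculations. The `readings` vector is made-up data for illustration.

```r
# Plain NA is logical; typed variants exist for the other atomic types
na_types <- c(
  typeof(NA),
  typeof(NA_integer_),
  typeof(NA_real_),
  typeof(NA_character_)
)
print(na_types)   # "logical" "integer" "double" "character"

# NA propagates through arithmetic and comparisons
readings <- c(1, NA, 3)                            # hypothetical data
mean_with_na    <- mean(readings)                  # NA
mean_without_na <- mean(readings, na.rm = TRUE)    # 2
na_compare      <- NA == NA                        # NA, so test with is.na() instead

print(mean_without_na)
```

Because comparing against NA yields NA rather than TRUE or FALSE, is.na() is the reliable way to detect missing values.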

Integer Type

Integers are whole numbers stored as 32-bit signed values. To create an integer, append L to the number. An integer uses 4 bytes per element versus 8 bytes for a double, which roughly halves memory use for large vectors and can sometimes improve performance.

# Creating integers
int_val <- 100L
typeof(int_val)  # "integer"

# Vector of integers
int_vec <- c(1L, 2L, 3L, 4L, 5L)
typeof(int_vec)  # "integer"

# Using sequences (automatically creates integers)
seq_int <- 1:1000000
typeof(seq_int)  # "integer"

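# Memory comparison (object.size() reports bytes; figures are for a typical 64-bit build)
object.size(1:1000000)              # 4000048 bytes (about 4 MB)
object.size(as.numeric(1:1000000))  # 8000048 bytes (about 8 MB)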

When to Use Integers

Use integers when:

Working with count data or indices
Memory efficiency matters (large datasets)
You need exact whole number representations
Interfacing with APIs or databases that expect integers

# Practical example: indexing
data <- c("apple", "banana", "cherry", "date")
indices <- c(1L, 3L, 4L)
data[indices]  # "apple" "cherry" "date"

# Integer range limits
.Machine$integer.max  # 2147483647
2147483647L + 1L      # NA, with a warning (integer overflow)
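Whether a result stays an integer depends on the operands and the operator. The sketch below, with illustrative variable names, shows which operations keep the integer type and one way to sidestep overflow by converting to double first.

```r
# Integer arithmetic stays integer only when every operand is an integer
type_int_times_int <- typeof(5L * 2L)     # "integer"
type_int_times_dbl <- typeof(5L * 2)      # "double" (2 is a double literal)
type_division      <- typeof(5L / 2L)     # "double" (/ always returns a double)
type_int_division  <- typeof(5L %/% 2L)   # "integer" (integer division)

# Avoid overflow by converting to double before the operation
big        <- .Machine$integer.max
overflowed <- suppressWarnings(big + 1L)  # NA
widened    <- as.numeric(big) + 1         # 2147483648

print(format(widened, scientific = FALSE))
```

Converting to double trades the 4-byte storage for a much larger exact range, which is usually the right call for sums and products of large counts.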

Converting Between Numeric and Integer

# Conversion functions
as.integer(3.7)    # 3 (truncates toward zero, doesn't round)
as.numeric(5L)     # 5

# Rounding before converting
round(3.7)         # 4 (still a double)
as.integer(round(3.7))  # 4 (now stored as an integer)

# Floor and ceiling
floor(3.7)         # 3
ceiling(3.7)       # 4
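Negative numbers and exact halves are where these functions differ most. The sketch below compares them and adds a hypothetical helper, `to_integer_strict`, that refuses to convert values with a fractional part; the helper is an illustration, not a base R function.

```r
# as.integer() and trunc() drop the fractional part (toward zero)
truncated <- as.integer(-3.7)   # -3
floored   <- floor(-3.7)        # -4 (toward negative infinity)

# round() sends exact halves to the nearest even number
half_down <- round(2.5)         # 2
half_up   <- round(3.5)         # 4

# Hypothetical helper: convert only when every value is already whole
to_integer_strict <- function(x) {
  stopifnot(is.numeric(x), all(x == trunc(x), na.rm = TRUE))
  as.integer(x)
}

print(to_integer_strict(c(1, 2, 3)))   # 1 2 3
```

Calling `to_integer_strict(1.5)` stops with an error instead of silently returning 1, which can be useful when a fractional value signals bad input.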

Character Type

Character types store text data as strings. In R, there’s no distinction between single characters and strings—both are character vectors.

# Creating character vectors
char1 <- "Hello"
char2 <- 'World'
char3 <- "R"

typeof(char1)  # "character"

# Multi-element character vector
fruits <- c("apple", "banana", "cherry")
length(fruits)  # 3
nchar(fruits)   # 5 6 6 (character count per element)

String Manipulation

# Concatenation
paste("Hello", "World")           # "Hello World"
paste0("Hello", "World")          # "HelloWorld"
paste(fruits, collapse = ", ")    # "apple, banana, cherry"

# Substring operations
substr("Hello World", 1, 5)       # "Hello"
substring("Hello World", 7)       # "World"

# Case conversion
toupper("hello")                  # "HELLO"
tolower("HELLO")                  # "hello"

# Pattern matching
grepl("app", fruits)              # TRUE FALSE FALSE
grep("app", fruits)               # 1 (returns index)
sub("a", "A", fruits)            # "Apple" "bAnana" "cherry"
gsub("a", "A", fruits)           # "Apple" "bAnAnA" "cherry"
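The pattern-matching functions above treat their first argument as a regular expression by default, which matters as soon as the pattern contains characters such as a dot. A minimal sketch, using made-up file names:

```r
files <- c("report.csv", "report_csv", "notes.txt")   # hypothetical file names

# "." is a regex wildcard, so this pattern also matches "report_csv"
regex_hits <- grepl(".csv", files)                    # TRUE TRUE FALSE

# fixed = TRUE treats the pattern as literal text
literal_hits <- grepl(".csv", files, fixed = TRUE)    # TRUE FALSE FALSE

# Escaping the dot (and anchoring to the end) also works with a regex
escaped_hits <- grepl("\\.csv$", files)               # TRUE FALSE FALSE

# strsplit() returns a list with one character vector per input element
parts <- strsplit("a,b,c", ",", fixed = TRUE)[[1]]    # "a" "b" "c"

print(regex_hits)
print(literal_hits)
```

Using fixed = TRUE is the simpler choice when no regex features are needed.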

Character Encoding and Special Characters

# Escape sequences
cat("Line 1\nLine 2")    # Newline
cat("Tab\tseparated")    # Tab
cat("Quote: \"Hello\"")  # Escaped quotes

# Unicode characters
"\u03B1"  # α (Greek alpha)
"\u2665"  # ♥ (heart)

# Raw strings (R 4.0+)
r"(C:\Users\name\file.txt)"  # No need to escape backslashes
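Characters and bytes are not the same thing once non-ASCII text is involved. The sketch below counts both for the Greek alpha and checks that a raw string equals its escaped form. It assumes R 4.0 or later for the raw string syntax, and uses enc2utf8() so the byte count refers to UTF-8 regardless of locale.

```r
# Force UTF-8 so the byte count below does not depend on the session locale
alpha <- enc2utf8("\u03B1")

n_chars <- nchar(alpha)                   # 1 character
n_bytes <- nchar(alpha, type = "bytes")   # 2 bytes in UTF-8
code_pt <- utf8ToInt(alpha)               # 945, i.e. 0x3B1

# A raw string and its escaped equivalent are the same value
raw_path     <- r"(C:\Users\name\file.txt)"
escaped_path <- "C:\\Users\\name\\file.txt"
same_path    <- identical(raw_path, escaped_path)   # TRUE

print(c(n_chars, n_bytes, code_pt))
print(same_path)
```

The distinction matters when a file format or database column has a byte limit rather than a character limit.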

Logical Type

Logical types represent Boolean values: TRUE and FALSE, plus NA for a missing logical value. R also accepts T and F as shortcuts, but these are ordinary variables that can be reassigned, so they are discouraged in production code.

# Creating logical values
bool1 <- TRUE
bool2 <- FALSE

typeof(bool1)  # "logical"

# Logical vectors
results <- c(TRUE, FALSE, TRUE, TRUE)
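The difference between the reserved words and the T and F shortcuts can be sketched as follows. The reassignment happens inside local() so it does not leak into the global environment, and the failed assignment is wrapped in tryCatch() so the example runs to completion.

```r
# T and F are ordinary variables bound to TRUE and FALSE, so they can be rebound
shadowed <- local({
  T <- FALSE
  T            # now FALSE inside this block
})

# TRUE and FALSE are reserved words; assigning to them is an error
assign_attempt <- tryCatch(
  eval(parse(text = "TRUE <- FALSE")),
  error = function(e) "error"
)

print(shadowed)         # FALSE
print(assign_attempt)   # "error"
```

Code that spells out TRUE and FALSE cannot be broken by a stray variable named T or F elsewhere in the session.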

Logical Operations

# Comparison operators
5 > 3      # TRUE
5 == 5     # TRUE
5 != 3     # TRUE

# Logical operators
TRUE & FALSE   # AND: FALSE
TRUE | FALSE   # OR: TRUE
!TRUE          # NOT: FALSE

# Vectorized operations
x <- c(1, 2, 3, 4, 5)
x > 3          # FALSE FALSE FALSE TRUE TRUE
x > 3 & x < 5  # FALSE FALSE FALSE TRUE FALSE
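Alongside the element-wise & and |, R has the scalar operators && and ||, which short-circuit, and the reducers any() and all(). A brief sketch of how they differ, reusing a vector like the one above:

```r
x <- c(1, 2, 3, 4, 5)

# & and | work element by element; any() and all() reduce to one value
any_big      <- any(x > 3)    # TRUE
all_positive <- all(x > 0)    # TRUE

# && and || expect single values and short-circuit:
# the right-hand side is skipped when the left side decides the result
short_circuit <- FALSE && stop("never evaluated")   # FALSE, no error raised

# Typical use in a condition: put the cheap, safe test first
if (length(x) > 0 && x[1] == 1) {
  message("x starts with 1")
}

exclusive <- xor(TRUE, FALSE)   # TRUE
```

As a rule of thumb, use & and | for filtering vectors, and && and || inside if() conditions.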

Logical Indexing

Logical vectors are powerful for subsetting data:

# Filtering with logical vectors
numbers <- c(10, 25, 30, 45, 50)
numbers > 30                    # FALSE FALSE FALSE TRUE TRUE
numbers[numbers > 30]           # 45 50

# Multiple conditions
numbers[numbers > 20 & numbers < 50]  # 25 30 45

# which() returns indices
which(numbers > 30)             # 4 5
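Missing values deserve care when subsetting, because a comparison involving NA yields NA, and an NA index produces an NA element in the result. A small sketch with made-up data:

```r
vals <- c(10, NA, 45, 50)   # hypothetical data with a missing value

# The NA in the logical index carries through to the result
with_na <- vals[vals > 30]                  # NA 45 50

# which() keeps only the positions that are TRUE, so the NA is dropped
via_which <- vals[which(vals > 30)]         # 45 50

# An explicit is.na() check gives the same result without which()
explicit <- vals[!is.na(vals) & vals > 30]  # 45 50

print(with_na)
print(via_which)
```

Either of the last two forms avoids stray NA values in filtered output.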

Logical Arithmetic

R treats TRUE as 1 and FALSE as 0 in arithmetic operations:

TRUE + TRUE        # 2
TRUE * 5           # 5
sum(c(TRUE, FALSE, TRUE))  # 2
mean(c(TRUE, FALSE, TRUE, TRUE))  # 0.75

# Practical: counting TRUE values
results <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
sum(results)       # 3 (count of TRUE)
mean(results)      # 0.6 (proportion of TRUE)
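The same idea applies directly to a condition on data. The sketch below counts and averages a comparison result and shows the effect of missing values; the `scores` vector and the pass mark of 50 are invented for illustration.

```r
scores <- c(72, 48, 91, 55, 39, 64)   # hypothetical exam scores
passed <- scores >= 50

n_passed  <- sum(passed)     # 4 (stored as an integer)
pass_rate <- mean(passed)    # 4 / 6

# Missing values propagate unless removed explicitly
with_missing  <- c(TRUE, NA, FALSE, TRUE)
count_na      <- sum(with_missing)                 # NA
count_present <- sum(with_missing, na.rm = TRUE)   # 2

print(n_passed)
print(pass_rate)
```

Summing a logical vector returns an integer, while mean() returns a double.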

Type Coercion

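R automatically converts (coerces) types when you mix them in a vector or an operation, always moving toward the more general type. For the four types covered here, the coercion hierarchy is: logical → integer → numeric → character. (Complex, which this article does not cover, sits between numeric and character.)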

# Automatic coercion
c(TRUE, 1)           # 1 1 (logical to numeric)
c(1L, 2.5)           # 1.0 2.5 (integer to numeric)
c(1, "a")            # "1" "a" (numeric to character)
c(TRUE, 1L, 2.5, "a")  # "TRUE" "1" "2.5" "a" (all to character)

# Explicit coercion
as.logical(c(0, 1, 2))     # FALSE TRUE TRUE
as.integer(c(TRUE, FALSE)) # 1 0
as.character(c(1, 2, 3))   # "1" "2" "3"

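# Coercion failures
as.numeric("abc")          # NA (with warning)
as.integer("abc")          # NA (with warning)

# Numeric-looking strings are parsed as numbers first, then truncated
as.integer("3.7")          # 3 (no warning)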
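Coercion also happens in comparisons, and failed conversions are easy to miss in a long vector. The following sketch shows both, using a made-up `raw_input` vector to stand in for values read from a file or form.

```r
# Comparisons coerce to the more general type before comparing
num_vs_chr <- 1 == "1"     # TRUE: 1 becomes "1"
lgl_vs_num <- TRUE == 1    # TRUE: TRUE becomes 1

# Finding entries that could not be converted
raw_input <- c("12", "3.5", "abc", NA)   # hypothetical input
parsed    <- suppressWarnings(as.numeric(raw_input))

# Present in the input, but NA after conversion
failed <- is.na(parsed) & !is.na(raw_input)

print(parsed)              # 12.0  3.5   NA   NA
print(raw_input[failed])   # "abc"
```

Separating values that were missing from values that failed to parse makes it easier to report bad input instead of silently dropping it.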

Type Checking Functions

# Type checking
is.numeric(5)      # TRUE
is.integer(5)      # FALSE
is.integer(5L)     # TRUE
is.character("a")  # TRUE
is.logical(TRUE)   # TRUE

# Class vs typeof
x <- 5
class(x)           # "numeric"
typeof(x)          # "double"

# Comprehensive check
str(x)             # num 5
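A few of these checks are broader or narrower than their names suggest. The sketch below highlights the common surprises and shows one way to validate a function argument; `mean_checked` is a hypothetical wrapper written for this example.

```r
# is.numeric() is TRUE for both integers and doubles
int_is_numeric <- is.numeric(5L)    # TRUE
int_is_double  <- is.double(5L)     # FALSE

# Factors are stored as integer codes but are not numeric
f <- factor("5")
factor_type       <- typeof(f)      # "integer"
factor_class      <- class(f)       # "factor"
factor_is_numeric <- is.numeric(f)  # FALSE

# Hypothetical wrapper: fail early with a clear error on the wrong type
mean_checked <- function(x) {
  stopifnot(is.numeric(x), length(x) > 0)
  mean(x)
}

print(mean_checked(c(1L, 2L, 3L)))   # 2
```

Passing a character vector such as c("1", "2") to `mean_checked` stops with an error rather than returning NA with a warning, as mean() alone would.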

Performance Considerations

Understanding type differences impacts performance, especially with large datasets:

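# Benchmark: integer vs numeric
# Requires the microbenchmark package: install.packages("microbenchmark")
library(microbenchmark)

# Build both vectors up front so that only sum() is timed.
# sample.int() returns an ordinary integer vector; 1:n is stored compactly
# in recent R versions and would not give a fair comparison.
int_vec <- sample.int(100L, 1000000, replace = TRUE)
dbl_vec <- as.numeric(int_vec)

microbenchmark(
  integer = sum(int_vec),
  numeric = sum(dbl_vec),
  times = 100
)

# Timings vary by machine and R version, so measure rather than assume
# Memory usage: an integer vector takes about half the memory of a double vector
object.size(int_vec)  # about 4 MB
object.size(dbl_vec)  # about 8 MB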
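A related pitfall is losing the integer type by accident: assigning a single double into an integer vector converts the whole vector. The sketch below uses made-up data and base R only; the exact byte counts depend on the platform, so it compares sizes as a ratio.

```r
ids <- sample.int(100L, 1000000, replace = TRUE)   # hypothetical integer data

# Assigning a double literal converts the entire vector to double
ids_dbl <- ids
ids_dbl[1] <- 0          # 0 is a double
type_after_dbl <- typeof(ids_dbl)   # "double"

# Using an integer literal keeps the integer type
ids_int <- ids
ids_int[1] <- 0L
type_after_int <- typeof(ids_int)   # "integer"

# The double copy is roughly twice the size
size_ratio <- as.numeric(object.size(ids_dbl)) / as.numeric(object.size(ids_int))
print(size_ratio)
```

Checking typeof() after a data-cleaning step is a cheap way to confirm that a large vector is still stored the way you intended.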

Choose the appropriate data type based on your specific requirements: integers for exact whole numbers and memory efficiency, numeric for general-purpose calculations, character for text data, and logical for Boolean operations. Understanding implicit coercion prevents bugs and unexpected behavior in your R programs.