R - Data Types (Numeric, Character, Logical, Integer)

Key Insights

  • R’s type system includes six atomic data types, with numeric (double), character, logical, and integer being the most commonly used for data analysis and statistical computing
  • R performs implicit type coercion following a hierarchy (logical → integer → numeric → character), which can lead to unexpected behavior if not understood properly
  • Understanding the memory footprint and computational differences between numeric and integer types is critical for optimizing performance with large datasets

Understanding R’s Atomic Data Types

R operates with six atomic vector types: logical, integer, numeric (double), complex, character, and raw. This article focuses on the four essential types you’ll use daily: numeric, character, logical, and integer.

Every object in R has a type, which determines how R stores and manipulates that data. You can check an object’s type using typeof() and its class using class().

x <- 42
typeof(x)  # "double"
class(x)   # "numeric"

y <- 42L
typeof(y)  # "integer"
class(y)   # "integer"

Numeric Type (Double-Precision Floating Point)

By default, any number you type in R without an L suffix is stored as a numeric type, specifically a double-precision floating-point number. This type can represent both whole numbers and decimals, with approximately 15-17 significant decimal digits of precision.

# All of these are numeric (double)
a <- 10
b <- 10.5
c <- 1e6
d <- -3.14159

typeof(a)  # "double"
typeof(b)  # "double"
typeof(c)  # "double"

# Checking if numeric
is.numeric(a)  # TRUE
is.double(a)   # TRUE

Numeric Precision and Limitations

Floating-point arithmetic has inherent precision limitations due to binary representation. This can lead to unexpected comparison results:

# Classic floating-point precision issue
0.1 + 0.2 == 0.3  # FALSE

# Why?
print(0.1 + 0.2, digits = 20)  # 0.30000000000000004441

# Solution: use all.equal() for floating-point comparisons
all.equal(0.1 + 0.2, 0.3)  # TRUE

# Or specify tolerance
abs((0.1 + 0.2) - 0.3) < 1e-10  # TRUE
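
One detail worth knowing: when two values differ, all.equal() returns a character description of the difference rather than FALSE, so it should be wrapped in isTRUE() before being used in an if() condition. A minimal sketch, where nearly_equal() is a hypothetical helper name and the default tolerance is an arbitrary choice:

```r
# Hypothetical helper: TRUE when x and y agree within a tolerance
nearly_equal <- function(x, y, tol = 1e-8) {
  isTRUE(all.equal(x, y, tolerance = tol))
}

nearly_equal(0.1 + 0.2, 0.3)  # TRUE
nearly_equal(1, 1.1)          # FALSE

# Without isTRUE(), a mismatch yields a description, not FALSE
is.character(all.equal(1, 1.1))  # TRUE
```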

Special Numeric Values

R includes special values for handling edge cases in numerical computation:

# Infinity
1 / 0      # Inf
-1 / 0     # -Inf
is.infinite(Inf)  # TRUE

# Not a Number
0 / 0      # NaN
Inf - Inf  # NaN
is.nan(NaN)  # TRUE

# Missing values
x <- NA
is.na(x)   # TRUE

# NaN is NA, but NA is not NaN
is.na(NaN)   # TRUE
is.nan(NA)   # FALSE
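
Missing values interact with types as well. The sketch below shows that a bare NA is logical, that typed NA constants exist, and that NA propagates through arithmetic unless removed. The scores vector is made up for illustration.

```r
# A bare NA is logical, but typed variants exist for each vector type
typeof(NA)             # "logical"
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# NA propagates through arithmetic unless it is removed explicitly
scores <- c(10, NA, 30)
sum(scores)                 # NA
sum(scores, na.rm = TRUE)   # 40
mean(scores, na.rm = TRUE)  # 20
```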

Integer Type

Integers are whole numbers stored in 4 bytes each, compared with 8 bytes for a double, so an integer vector takes roughly half the memory of the equivalent numeric vector. To create an integer, append L to the number.

# Creating integers
int_val <- 100L
typeof(int_val)  # "integer"

# Vector of integers
int_vec <- c(1L, 2L, 3L, 4L, 5L)
typeof(int_vec)  # "integer"

# Using sequences (automatically creates integers)
seq_int <- 1:1000000
typeof(seq_int)  # "integer"

# Memory comparison
object.size(1:1000000)              # about 4 MB (4000048 bytes on a typical 64-bit build)
object.size(as.numeric(1:1000000))  # about 8 MB (8000048 bytes on a typical 64-bit build)

When to Use Integers

Use integers when:

  • Working with count data or indices
  • Memory efficiency matters (large datasets)
  • You need exact whole number representations
  • Interfacing with APIs or databases that expect integers

# Practical example: indexing
data <- c("apple", "banana", "cherry", "date")
indices <- c(1L, 3L, 4L)
data[indices]  # "apple" "cherry" "date"

# Integer range limits
.Machine$integer.max  # 2147483647
2147483647L + 1L      # NA, with a warning (integer overflow)
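
Whether a result stays an integer depends on the operator. A short sketch of the common cases, plus one way to sidestep overflow by converting to double first (the variable name big is only for illustration):

```r
typeof(5L + 2L)    # "integer"
typeof(5L * 2L)    # "integer"
typeof(5L / 2L)    # "double": true division always returns a double
typeof(5L %/% 2L)  # "integer": integer division
typeof(5L + 2)     # "double": mixing with a double promotes the result

# Converting to double first avoids the overflow, at the cost of 8 bytes per value
big <- as.numeric(.Machine$integer.max) + 1
big          # 2147483648
typeof(big)  # "double"
```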

Converting Between Numeric and Integer

# Conversion functions
as.integer(3.7)    # 3 (truncates, doesn't round)
as.numeric(5L)     # 5

# Rounding before converting
round(3.7)         # 4
as.integer(round(3.7))  # 4 (stored as an integer)

# Floor and ceiling
floor(3.7)         # 3
ceiling(3.7)       # 4
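
The differences between these functions show up most clearly with negative numbers and exact halves. The following sketch uses only base R functions:

```r
# Truncation and floor differ for negative values
as.integer(-3.7)  # -3 (drops the fractional part, toward zero)
trunc(-3.7)       # -3
floor(-3.7)       # -4 (toward negative infinity)
ceiling(-3.7)     # -3

# round() sends exact halves to the nearest even number
round(0.5)  # 0
round(1.5)  # 2
round(2.5)  # 2

# round(), floor(), and ceiling() return doubles, not integers
typeof(round(3.7))              # "double"
typeof(as.integer(round(3.7)))  # "integer"
```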

Character Type

Character types store text data as strings. In R, there’s no distinction between single characters and strings—both are character vectors.

# Creating character vectors
char1 <- "Hello"
char2 <- 'World'
char3 <- "R"

typeof(char1)  # "character"

# Multi-element character vector
fruits <- c("apple", "banana", "cherry")
length(fruits)  # 3
nchar(fruits)   # c(5, 6, 6) - character count per element

String Manipulation

# Concatenation
paste("Hello", "World")           # "Hello World"
paste0("Hello", "World")          # "HelloWorld"
paste(fruits, collapse = ", ")    # "apple, banana, cherry"

# Substring operations
substr("Hello World", 1, 5)       # "Hello"
substring("Hello World", 7)       # "World"

# Case conversion
toupper("hello")                  # "HELLO"
tolower("HELLO")                  # "hello"

# Pattern matching
grepl("app", fruits)              # TRUE FALSE FALSE
grep("app", fruits)               # 1 (returns index)
sub("a", "A", fruits)            # "Apple" "bAnana" "cherry"
gsub("a", "A", fruits)           # "Apple" "bAnAnA" "cherry"
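
The pattern argument of grepl(), grep(), sub(), and gsub() is treated as a regular expression by default, which matters as soon as a pattern contains characters such as a dot. A small sketch with made-up file names:

```r
files <- c("data.csv", "data_csv", "notes.txt")

# "." is a regex wildcard, so this matches "data_csv" too
grepl("data.csv", files)                # TRUE TRUE FALSE

# fixed = TRUE matches the pattern literally
grepl("data.csv", files, fixed = TRUE)  # TRUE FALSE FALSE

# Escaping the metacharacter works as well
grepl("data\\.csv", files)              # TRUE FALSE FALSE

# sub() with fixed = TRUE replaces the first literal dot
sub(".", "_", files, fixed = TRUE)      # "data_csv" "data_csv" "notes_txt"
```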

Character Encoding and Special Characters

# Escape sequences
cat("Line 1\nLine 2")    # Newline
cat("Tab\tseparated")    # Tab
cat("Quote: \"Hello\"")  # Escaped quotes

# Unicode characters
"\u03B1"  # α (Greek alpha)
"\u2665"  # ♥ (heart)

# Raw strings (R 4.0+)
r"(C:\Users\name\file.txt)"  # No need to escape backslashes

Logical Type

Logical types represent Boolean values: TRUE, FALSE, and NA. R also accepts T and F as shortcuts, but these are ordinary variables that can be reassigned, so avoid them in production code.

# Creating logical values
bool1 <- TRUE
bool2 <- FALSE

typeof(bool1)  # "logical"

# Logical vectors
results <- c(TRUE, FALSE, TRUE, TRUE)
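
To see why the T and F shortcuts are risky, the sketch below reassigns T, something R allows without complaint, and then removes the shadowing variable again:

```r
# TRUE and FALSE are reserved words; T and F are ordinary variables
T <- 0         # allowed, silently shadows the shortcut
isTRUE(T)      # FALSE
as.logical(T)  # FALSE

# TRUE cannot be reassigned: `TRUE <- 0` is an error
rm(T)          # remove the shadowing variable; T refers to the base value again
isTRUE(T)      # TRUE
```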

Logical Operations

# Comparison operators
5 > 3      # TRUE
5 == 5     # TRUE
5 != 3     # TRUE

# Logical operators
TRUE & FALSE   # AND: FALSE
TRUE | FALSE   # OR: TRUE
!TRUE          # NOT: FALSE

# Vectorized operations
x <- c(1, 2, 3, 4, 5)
x > 3          # FALSE FALSE FALSE TRUE TRUE
x > 3 & x < 5  # FALSE FALSE FALSE TRUE FALSE
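
R also has the scalar operators && and ||, which are intended for single values (for example inside if() conditions) and stop evaluating once the answer is known. The sketch below contrasts them with any() and all(), and shows how NA behaves as an unknown value:

```r
x <- c(1, 2, 3, 4, 5)

# && and || expect single values and short-circuit
length(x) > 0 && x[1] == 1    # TRUE
is.null(x) || length(x) == 5  # TRUE

# Collapse a logical vector to one value with any() or all()
any(x > 3)  # TRUE
all(x > 3)  # FALSE

# NA behaves as "unknown" in logical operations
NA & FALSE  # FALSE
NA | TRUE   # TRUE
NA & TRUE   # NA
```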

Logical Indexing

Logical vectors are powerful for subsetting data:

# Filtering with logical vectors
numbers <- c(10, 25, 30, 45, 50)
numbers > 30                    # FALSE FALSE FALSE TRUE TRUE
numbers[numbers > 30]           # 45 50

# Multiple conditions
numbers[numbers > 20 & numbers < 50]  # 25 30 45

# which() returns indices
which(numbers > 30)             # 4 5
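
Missing values deserve care here: an NA in the logical index produces an NA in the result, whereas which() keeps only the positions that are TRUE. A sketch with a made-up readings vector:

```r
readings <- c(10, NA, 30, 45)

readings > 20                               # FALSE NA TRUE TRUE
readings[readings > 20]                     # NA 30 45
readings[which(readings > 20)]              # 30 45
readings[!is.na(readings) & readings > 20]  # 30 45
sum(readings > 20, na.rm = TRUE)            # 2
```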

Logical Arithmetic

R treats TRUE as 1 and FALSE as 0 in arithmetic operations:

TRUE + TRUE        # 2
TRUE * 5           # 5
sum(c(TRUE, FALSE, TRUE))  # 2
mean(c(TRUE, FALSE, TRUE, TRUE))  # 0.75

# Practical: counting TRUE values
results <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
sum(results)       # 3 (count of TRUE)
mean(results)      # 0.6 (proportion of TRUE)

Type Coercion

R automatically converts (coerces) types when mixing them in vectors or operations. The coercion hierarchy is: logical → integer → numeric → character, and the result takes the most general type present.

# Automatic coercion
c(TRUE, 1)           # 1 1 (logical to numeric)
c(1L, 2.5)           # 1.0 2.5 (integer to numeric)
c(1, "a")            # "1" "a" (numeric to character)
c(TRUE, 1L, 2.5, "a")  # "TRUE" "1" "2.5" "a" (all to character)

# Explicit coercion
as.logical(c(0, 1, 2))     # FALSE TRUE TRUE
as.integer(c(TRUE, FALSE)) # 1 0
as.character(c(1, 2, 3))   # "1" "2" "3"

# Coercion failures
as.numeric("abc")          # NA (with warning)
as.integer("abc")          # NA (with warning)

# Numeric strings are parsed first, then truncated
as.integer("3.7")          # 3
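
Coercion also happens silently inside comparisons, which is where it most often surprises people. A brief sketch of the usual cases (the string comparisons assume a typical locale in which digits sort in their usual order):

```r
1L == 1           # TRUE  (integer promoted to double)
identical(1L, 1)  # FALSE (identical() also compares types)
TRUE == 1         # TRUE  (logical promoted to numeric)
1 == "1"          # TRUE  (1 becomes "1")

# Numbers compared with strings are compared as text
10 < "9"          # TRUE  ("10" sorts before "9")

# Converting explicitly restores numeric comparison
as.numeric("10") < as.numeric("9")  # FALSE
```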

Type Checking Functions

# Type checking
is.numeric(5)      # TRUE
is.integer(5)      # FALSE
is.integer(5L)     # TRUE
is.character("a")  # TRUE
is.logical(TRUE)   # TRUE

# Class vs typeof
x <- 5
class(x)           # "numeric"
typeof(x)          # "double"

# Comprehensive check
str(x)             # num 5
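
Note that is.integer() tests how a value is stored, not whether it happens to be a whole number. If the question is "does this number have a fractional part?", a tolerance-based check is one option. In this sketch, is_whole() is a hypothetical helper and the tolerance is an assumption:

```r
is.numeric(5L)  # TRUE: "numeric" covers both integer and double
is.integer(5)   # FALSE: 5 is stored as a double

# Hypothetical helper: TRUE where a numeric value has no fractional part
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  stopifnot(is.numeric(x))
  abs(x - round(x)) < tol
}

is_whole(5)           # TRUE
is_whole(5.5)         # FALSE
is_whole(c(1, 2.25))  # TRUE FALSE
```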

Performance Considerations

Understanding type differences impacts performance, especially with large datasets:

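# Benchmark: integer vs numeric (requires the microbenchmark package)
library(microbenchmark)

# Build both vectors up front so that only sum() is timed
int_vec <- sample.int(1000, 1e6, replace = TRUE)
dbl_vec <- as.numeric(int_vec)

microbenchmark(
  integer = sum(int_vec),
  numeric = sum(dbl_vec),
  times = 100
)

# Timings vary by machine and R version, so measure on your own workload
# Memory usage: an integer vector needs about half the memory of a double vector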

Choose the appropriate data type based on your specific requirements: integers for exact whole numbers and memory efficiency, numeric for general-purpose calculations, character for text data, and logical for Boolean operations. Understanding implicit coercion prevents bugs and unexpected behavior in your R programs.
