# How to Check Data Types in Pandas
## Key Insights

- Use `df.dtypes` for quick inspection and `df.info()` for memory-aware analysis, but rely on `pd.api.types` functions for programmatic validation in production code.
- Object dtype is Pandas’ catch-all type that often hides data quality issues—always investigate object columns for mixed types or incorrectly parsed data.
- Checking data types immediately after loading data prevents subtle bugs that surface much later in your analysis pipeline.
## Introduction
Data types in Pandas aren’t just metadata—they determine what operations you can perform, how much memory your DataFrame consumes, and whether your calculations produce correct results. A column that looks numeric might actually be stored as strings, silently breaking your aggregations. A datetime column parsed as objects will fail every time-series operation you throw at it.
The most common source of dtype problems? Loading data. When you read a CSV, Pandas infers types based on the values it sees. A single “N/A” string in an otherwise numeric column forces the entire column to object dtype. A date format Pandas doesn’t recognize becomes a string. These issues compound quickly in real-world datasets.
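To see this inference in action, here's a minimal sketch using an in-memory CSV (the column names and the `unknown` placeholder are invented for illustration):

```python
import io
import pandas as pd

# One unrecognized placeholder string drags the whole column to object dtype
raw = "order_id,amount\n1,19.99\n2,unknown\n3,5.00"
df_raw = pd.read_csv(io.StringIO(raw))
print(df_raw['amount'].dtype)  # object

# Declaring the placeholder via na_values lets pandas infer float64 again
df_clean = pd.read_csv(io.StringIO(raw), na_values=['unknown'])
print(df_clean['amount'].dtype)  # float64
```

Passing `na_values` (or cleaning the column after loading) restores the numeric dtype, and the bad cells become NaN you can handle explicitly.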
This article covers every method for inspecting and validating data types in Pandas, from quick interactive checks to robust programmatic validation you can build into data pipelines.
## Understanding Pandas Data Types
Pandas uses NumPy dtypes as its foundation but extends them with additional types optimized for tabular data. Here are the types you’ll encounter most often:
| Dtype | Description | Example Values |
|---|---|---|
| `int64` | 64-bit integer | 1, -5, 1000000 |
| `float64` | 64-bit floating point | 3.14, -0.001, NaN |
| `object` | Python objects (usually strings) | "hello", mixed types |
| `bool` | Boolean | True, False |
| `datetime64[ns]` | Timestamp with nanosecond precision | 2024-01-15 |
| `timedelta64[ns]` | Duration | 5 days, 3 hours |
| `category` | Categorical data | "red", "green", "blue" |
The object dtype deserves special attention. It’s Pandas’ fallback type that can hold any Python object. While flexible, it’s memory-inefficient and prevents vectorized operations. When you see object, investigate whether it should be a more specific type.
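To make the memory cost concrete, here's a small sketch comparing the same repeated labels stored as object versus category (exact byte counts vary by pandas version, so only the relative difference matters):

```python
import pandas as pd

# 300,000 rows drawn from just three repeated labels
colors = pd.Series(['red', 'green', 'blue'] * 100_000)

as_object = colors.memory_usage(deep=True)
as_category = colors.astype('category').memory_usage(deep=True)

print(f"object:   {as_object:,} bytes")
print(f"category: {as_category:,} bytes")
# The categorical version stores each label once plus small integer codes,
# so it comes out many times smaller here
```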
Let’s create a sample DataFrame we’ll use throughout this article:
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'user_id': [1001, 1002, 1003, 1004, 1005],
    'username': ['alice', 'bob', 'charlie', 'diana', 'eve'],
    'signup_date': pd.to_datetime(['2023-01-15', '2023-02-20', '2023-03-10', '2023-04-05', '2023-05-12']),
    'account_balance': [150.50, 200.00, 75.25, 500.00, 0.00],
    'is_premium': [True, False, False, True, False],
    'subscription_tier': pd.Categorical(['gold', 'free', 'free', 'platinum', 'free']),
    'last_login': ['2024-01-10', '2024-01-08', None, '2024-01-11', '2024-01-09'],  # Intentionally not parsed
    'referral_code': [None, 'REF100', None, 'REF200', 'REF150']
})
```
This DataFrame includes integers, strings, datetimes, floats, booleans, categoricals, and a couple of columns with issues we’ll detect later.
## Checking Data Types with dtypes and info()
The dtypes attribute returns a Series mapping column names to their data types. It’s the fastest way to see what you’re working with:
```python
print(df.dtypes)
```

Output:

```
user_id                      int64
username                    object
signup_date         datetime64[ns]
account_balance            float64
is_premium                    bool
subscription_tier         category
last_login                  object
referral_code               object
dtype: object
```
Notice that last_login is object rather than datetime64—that’s because we passed strings without parsing them. This is exactly the kind of issue dtype checking catches.
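The fix is a single `pd.to_datetime` call, sketched here on a standalone copy of the column so the sample DataFrame keeps its intentional flaw:

```python
import pandas as pd

# Re-create the unparsed column from the sample DataFrame above
logins = pd.Series(['2024-01-10', '2024-01-08', None, '2024-01-11', '2024-01-09'],
                   name='last_login')
print(logins.dtype)  # object

parsed = pd.to_datetime(logins)
print(parsed.dtype)          # datetime64[ns]
print(parsed.isna().sum())   # 1 -- the None became NaT
```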
For a single column, access the dtype attribute directly:
```python
print(df['signup_date'].dtype)      # datetime64[ns]
print(df['account_balance'].dtype)  # float64
```
The info() method provides a more comprehensive view, including non-null counts and memory usage:
```python
df.info()
```

Output:

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   user_id            5 non-null      int64
 1   username           5 non-null      object
 2   signup_date        5 non-null      datetime64[ns]
 3   account_balance    5 non-null      float64
 4   is_premium         5 non-null      bool
 5   subscription_tier  5 non-null      category
 6   last_login         4 non-null      object
 7   referral_code      3 non-null      object
dtypes: bool(1), category(1), datetime64[ns](1), float64(1), int64(1), object(3)
memory usage: 449.0+ bytes
```
For large DataFrames, add memory_usage='deep' to get accurate memory consumption for object columns:
```python
df.info(memory_usage='deep')
```
This reveals the true memory cost of storing strings, which can be substantially higher than the default estimate suggests.
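A quick sketch of the difference (the exact byte counts depend on your platform, but deep counting is reliably much larger for string columns):

```python
import pandas as pd

words = pd.DataFrame({'word': ['antidisestablishmentarianism'] * 10_000})

shallow = words.memory_usage().sum()         # counts only the 8-byte object pointers
deep = words.memory_usage(deep=True).sum()   # counts the string payloads too

print(f"shallow: {shallow:,} bytes")
print(f"deep:    {deep:,} bytes")
```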
## Selecting Columns by Data Type
When you need to operate on all columns of a specific type, select_dtypes() filters your DataFrame accordingly:
```python
# Select all numeric columns
numeric_cols = df.select_dtypes(include=['number'])
print(numeric_cols.columns.tolist())
# ['user_id', 'account_balance']

# Select all object (string) columns
string_cols = df.select_dtypes(include=['object'])
print(string_cols.columns.tolist())
# ['username', 'last_login', 'referral_code']
```
You can include multiple types and use exclusion:
```python
# Include integers and floats explicitly
df.select_dtypes(include=['int64', 'float64'])

# Exclude object and category columns
df.select_dtypes(exclude=['object', 'category'])

# Combine include and exclude
df.select_dtypes(include=['number'], exclude=['int64'])  # Only floats
```
The include parameter accepts NumPy dtype names, Python types, or these convenient shortcuts:
- `'number'` — all numeric types
- `'datetime'` — datetime types
- `'timedelta'` — timedelta types
- `'category'` — categorical type
- `'bool'` — boolean type
A practical use case—applying string operations only to string columns:
```python
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].str.strip()  # .str methods skip NaN/None automatically
```
## Programmatic Type Checking
For validation logic in scripts and pipelines, the pd.api.types module provides functions that return boolean values:
```python
from pandas.api.types import (
    is_numeric_dtype,
    is_string_dtype,
    is_datetime64_any_dtype,
    is_categorical_dtype,
    is_bool_dtype,
    is_integer_dtype,
    is_float_dtype
)

# Check individual columns
print(is_numeric_dtype(df['account_balance']))        # True
print(is_datetime64_any_dtype(df['signup_date']))     # True
print(is_string_dtype(df['username']))                # True
print(is_categorical_dtype(df['subscription_tier']))  # True
```
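The same predicates compose into a quick whole-DataFrame type report. This sketch (with an invented sample frame) checks `is_bool_dtype` before `is_numeric_dtype`, because pandas considers boolean columns numeric:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype, is_datetime64_any_dtype, is_bool_dtype

def type_report(df):
    """Classify every column using the pd.api.types predicates."""
    kinds = {}
    for col in df.columns:
        s = df[col]
        if is_bool_dtype(s):          # must come before the numeric check
            kinds[col] = 'bool'
        elif is_numeric_dtype(s):
            kinds[col] = 'numeric'
        elif is_datetime64_any_dtype(s):
            kinds[col] = 'datetime'
        elif isinstance(s.dtype, pd.CategoricalDtype):
            kinds[col] = 'category'
        else:
            kinds[col] = 'other'
    return kinds

sample = pd.DataFrame({
    'n': [1, 2],
    'when': pd.to_datetime(['2024-01-01', '2024-01-02']),
    'flag': [True, False],
    'label': ['a', 'b'],
})
print(type_report(sample))
# {'n': 'numeric', 'when': 'datetime', 'flag': 'bool', 'label': 'other'}
```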
One caveat: recent pandas versions (2.x) deprecate `is_categorical_dtype`; the forward-compatible check is `isinstance(series.dtype, pd.CategoricalDtype)`. These functions shine in validation code:
```python
def validate_dataframe(df):
    """Validate expected data types for user data."""
    errors = []
    if not is_integer_dtype(df['user_id']):
        errors.append("user_id must be integer type")
    if not is_datetime64_any_dtype(df['signup_date']):
        errors.append("signup_date must be datetime type")
    if not is_numeric_dtype(df['account_balance']):
        errors.append("account_balance must be numeric type")
    if not is_bool_dtype(df['is_premium']):
        errors.append("is_premium must be boolean type")
    if errors:
        raise ValueError(f"Validation failed: {'; '.join(errors)}")
    return True

# This passes
validate_dataframe(df)
```
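It's worth confirming the failure path too. A common way a column like `user_id` loses its integer dtype is a single missing value, which silently upcasts it to float64; the nullable `Int64` extension dtype is one way to repair that (a standalone sketch, not tied to the sample DataFrame above):

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# A user_id column that picked up a missing value is upcast to float64,
# exactly the kind of drift the validation function would flag
broken = pd.DataFrame({'user_id': [1001, 1002, None]})
print(broken['user_id'].dtype)  # float64

if not is_integer_dtype(broken['user_id']):
    print("validation would fail: user_id must be integer type")

# The nullable Int64 extension dtype keeps integers alongside missing values
repaired = broken['user_id'].astype('Int64')
print(repaired.dtype)  # Int64
```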
You can also build type-aware processing functions:
```python
def summarize_column(series):
    """Generate appropriate summary based on column type."""
    if is_numeric_dtype(series):
        return series.describe()
    elif is_datetime64_any_dtype(series):
        return pd.Series({
            'min': series.min(),
            'max': series.max(),
            'range': series.max() - series.min()
        })
    elif is_categorical_dtype(series):
        return series.value_counts()
    else:
        return pd.Series({
            'unique': series.nunique(),
            'most_common': series.mode().iloc[0] if not series.mode().empty else None
        })
```
## Common Data Type Issues and Detection
Real-world data rarely arrives clean. Here are the issues you’ll encounter most often and how to detect them.
**Mixed types in object columns** occur when a column contains values of different Python types:
```python
# Create a problematic column
df_messy = pd.DataFrame({
    'value': [1, 2, 'three', 4, None, 6.0]
})

# Detect mixed types
type_counts = df_messy['value'].apply(type).value_counts()
print(type_counts)
```

Output:

```
<class 'int'>         3
<class 'str'>         1
<class 'float'>       1
<class 'NoneType'>    1
dtype: int64
```
A helper function to check all object columns:
```python
def detect_mixed_types(df):
    """Find object columns with mixed Python types."""
    mixed = {}
    for col in df.select_dtypes(include=['object']).columns:
        types = df[col].dropna().apply(type).unique()
        if len(types) > 1:
            mixed[col] = [t.__name__ for t in types]
    return mixed

print(detect_mixed_types(df_messy))
# {'value': ['int', 'str', 'float']}
```
**Numeric data stored as strings** happens when CSVs contain formatting characters:
```python
df_currency = pd.DataFrame({
    'price': ['$100.00', '$250.50', '$75.25', 'N/A', '$300.00']
})
print(df_currency['price'].dtype)  # object

# Detect: try converting and see what fails
def check_numeric_convertibility(series):
    """Check if string series can be converted to numeric."""
    converted = pd.to_numeric(series, errors='coerce')
    failed = series[converted.isna() & series.notna()]
    return failed

print(check_numeric_convertibility(df_currency['price']))
```
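Once you know which values fail, the repair is usually a string cleanup followed by a coerced conversion; a sketch on the same currency data:

```python
import pandas as pd

prices = pd.Series(['$100.00', '$250.50', '$75.25', 'N/A', '$300.00'], name='price')

# Strip the currency symbol, then coerce; only the genuine 'N/A' becomes NaN
cleaned = pd.to_numeric(prices.str.replace('$', '', regex=False), errors='coerce')
print(cleaned.dtype)     # float64
print(cleaned.tolist())  # [100.0, 250.5, 75.25, nan, 300.0]
```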
**Datetime parsing failures** leave you with object columns instead of datetime64:
```python
def check_datetime_columns(df, date_columns):
    """Verify expected date columns are actually datetime type."""
    issues = {}
    for col in date_columns:
        if col not in df.columns:
            issues[col] = "Column not found"
        elif not is_datetime64_any_dtype(df[col]):
            issues[col] = f"Expected datetime, got {df[col].dtype}"
    return issues

# Check our sample DataFrame
print(check_datetime_columns(df, ['signup_date', 'last_login']))
# {'last_login': 'Expected datetime, got object'}
```
## Conclusion
Data type checking should be the first step after loading any dataset. Start with `df.dtypes` or `df.info()` for interactive exploration, use `select_dtypes()` when you need to operate on columns by type, and rely on `pd.api.types` functions for validation in production code.
Build type validation into your data pipelines early. A validation function that runs immediately after data loading catches dtype issues before they propagate through your analysis. The few minutes spent writing these checks saves hours of debugging mysterious calculation errors later.
The pattern I recommend: load data, check types, fix issues, validate, then proceed with analysis. Make this habitual, and you’ll eliminate an entire category of data bugs from your work.
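As a closing sketch, the whole pattern might look like this in a pipeline (the CSV layout and the `when` column are hypothetical):

```python
import io
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

def load_and_validate(csv_text):
    """Load -> check -> fix -> validate, then hand off for analysis."""
    df = pd.read_csv(io.StringIO(csv_text))      # 1. load
    if not is_datetime64_any_dtype(df['when']):  # 2. check
        df['when'] = pd.to_datetime(df['when'])  # 3. fix
    if not is_datetime64_any_dtype(df['when']):  # 4. validate
        raise ValueError("when must be datetime")
    return df                                    # 5. proceed

df_ok = load_and_validate("when,value\n2024-01-01,10\n2024-01-02,20")
print(df_ok.dtypes)
```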