Pandas - Sort by Column Data Type (Custom Sort)

• Pandas doesn't natively sort by column data types, but you can create custom sort keys using dtype information to reorder columns programmatically

Key Insights

• Pandas doesn’t natively sort by column data types, but you can create custom sort keys using dtype information to reorder columns programmatically • Type-based sorting is essential when dealing with wide DataFrames where you need to group numeric, categorical, datetime, and object columns together for analysis or display • Custom sorting functions can combine multiple criteria—data type priority, column names, and statistical properties—to create sophisticated column ordering schemes

Understanding Column Data Types in Pandas

Before implementing custom sorting, you need to understand how Pandas represents data types. Every column in a DataFrame has a dtype that falls into categories like numeric (int64, float64), object (strings), datetime64, categorical, and boolean.

import pandas as pd
import numpy as np

# Create a sample DataFrame with mixed types
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'salary': [50000.0, 60000.0, 75000.0],
    'hire_date': pd.to_datetime(['2020-01-15', '2019-06-20', '2021-03-10']),
    'department': pd.Categorical(['Sales', 'Engineering', 'Sales']),
    'is_manager': [False, True, False],
    'city': ['NYC', 'SF', 'LA']
})

# Examine data types
print(df.dtypes)

This outputs each column’s dtype, which forms the basis for type-based sorting.

Basic Type-Based Column Sorting

The simplest approach reorders columns by grouping them according to their data type. This uses a mapping dictionary to assign priority values to each dtype category.

def sort_columns_by_type(df):
    # Define type priority (lower number = appears first)
    type_priority = {
        'int': 0,
        'float': 1,
        'datetime': 2,
        'bool': 3,
        'category': 4,
        'object': 5
    }
    
    # Create list of (column_name, priority) tuples
    column_order = []
    for col in df.columns:
        dtype_str = str(df[col].dtype)
        
        # Determine type category
        if 'int' in dtype_str:
            priority = type_priority['int']
        elif 'float' in dtype_str:
            priority = type_priority['float']
        elif 'datetime' in dtype_str:
            priority = type_priority['datetime']
        elif 'bool' in dtype_str:
            priority = type_priority['bool']
        elif 'category' in dtype_str:
            priority = type_priority['category']
        else:
            priority = type_priority['object']
        
        column_order.append((col, priority))
    
    # Sort by priority, then alphabetically by column name
    column_order.sort(key=lambda x: (x[1], x[0]))
    
    # Reorder DataFrame
    sorted_columns = [col for col, _ in column_order]
    return df[sorted_columns]

sorted_df = sort_columns_by_type(df)
print(sorted_df.columns.tolist())
# Output: ['age', 'salary', 'hire_date', 'is_manager', 'department', 'city', 'name']

Advanced Multi-Criteria Sorting

Real-world scenarios often require combining type-based sorting with other criteria. This example sorts by data type first, then by column name length, then alphabetically.

def advanced_column_sort(df, type_order=None, reverse=False):
    """
    Sort DataFrame columns by multiple criteria.
    
    Parameters:
    -----------
    df : DataFrame
    type_order : list, optional
        Custom order of dtype categories
    reverse : bool
        Reverse the sort order
    """
    if type_order is None:
        type_order = ['int', 'float', 'datetime', 'bool', 'category', 'object']
    
    def get_sort_key(col_name):
        dtype_str = str(df[col_name].dtype)
        
        # Map dtype to category
        if 'int' in dtype_str:
            type_cat = 'int'
        elif 'float' in dtype_str:
            type_cat = 'float'
        elif 'datetime' in dtype_str:
            type_cat = 'datetime'
        elif 'bool' in dtype_str:
            type_cat = 'bool'
        elif 'category' in dtype_str:
            type_cat = 'category'
        else:
            type_cat = 'object'
        
        # Get type priority
        try:
            type_idx = type_order.index(type_cat)
        except ValueError:
            type_idx = len(type_order)
        
        # Return tuple: (type_priority, name_length, alphabetical)
        return (type_idx, len(col_name), col_name)
    
    sorted_columns = sorted(df.columns, key=get_sort_key, reverse=reverse)
    return df[sorted_columns]

# Custom type order: datetime first, then numeric, then others
custom_order = ['datetime', 'int', 'float', 'bool', 'category', 'object']
result = advanced_column_sort(df, type_order=custom_order)
print(result.columns.tolist())

Grouping Columns by Type with Pandas API

A more Pandas-idiomatic approach uses select_dtypes() to group columns, then concatenates them in the desired order.

def group_columns_by_type(df):
    """Group columns by data type using Pandas select_dtypes."""
    
    # Select columns of each type
    int_cols = df.select_dtypes(include=['int']).columns.tolist()
    float_cols = df.select_dtypes(include=['float']).columns.tolist()
    datetime_cols = df.select_dtypes(include=['datetime']).columns.tolist()
    bool_cols = df.select_dtypes(include=['bool']).columns.tolist()
    category_cols = df.select_dtypes(include=['category']).columns.tolist()
    object_cols = df.select_dtypes(include=['object']).columns.tolist()
    
    # Combine in desired order
    ordered_columns = (
        sorted(int_cols) + 
        sorted(float_cols) + 
        sorted(datetime_cols) + 
        sorted(bool_cols) + 
        sorted(category_cols) + 
        sorted(object_cols)
    )
    
    return df[ordered_columns]

result = group_columns_by_type(df)
print(result.head())

This approach is cleaner and leverages Pandas’ built-in type selection, making it easier to maintain and understand.

Sorting by Type and Statistical Properties

For analytical workflows, you might want to sort numeric columns by their variance or range, helping identify high-variation features first.

def sort_by_type_and_variance(df):
    """Sort numeric columns by variance, others alphabetically."""
    
    numeric_df = df.select_dtypes(include=[np.number])
    non_numeric_df = df.select_dtypes(exclude=[np.number])
    
    # Calculate variance for numeric columns
    variances = numeric_df.var().sort_values(ascending=False)
    numeric_cols_sorted = variances.index.tolist()
    
    # Sort non-numeric alphabetically
    non_numeric_cols_sorted = sorted(non_numeric_df.columns)
    
    # Combine: high-variance numerics first
    final_order = numeric_cols_sorted + non_numeric_cols_sorted
    return df[final_order]

# Add more numeric columns for demonstration
df['bonus'] = [5000, 8000, 3000]
df['years_exp'] = [3, 8, 2]

sorted_df = sort_by_type_and_variance(df)
print(sorted_df.columns.tolist())

Dynamic Type-Based Column Reordering

When working with DataFrames that change structure dynamically, create a reusable class that maintains sorting logic.

class TypeBasedColumnSorter:
    def __init__(self, type_priority=None):
        self.type_priority = type_priority or {
            'datetime64': 0,
            'int': 1,
            'float': 2,
            'bool': 3,
            'category': 4,
            'object': 5
        }
    
    def _get_type_category(self, dtype):
        """Map Pandas dtype to category."""
        dtype_str = str(dtype)
        
        if 'datetime' in dtype_str:
            return 'datetime64'
        elif 'int' in dtype_str:
            return 'int'
        elif 'float' in dtype_str:
            return 'float'
        elif 'bool' in dtype_str:
            return 'bool'
        elif 'category' in dtype_str:
            return 'category'
        else:
            return 'object'
    
    def sort(self, df, secondary_sort='alpha'):
        """
        Sort DataFrame columns by type.
        
        Parameters:
        -----------
        df : DataFrame
        secondary_sort : str
            'alpha' for alphabetical, 'length' for name length
        """
        def sort_key(col):
            dtype_cat = self._get_type_category(df[col].dtype)
            priority = self.type_priority.get(dtype_cat, 999)
            
            if secondary_sort == 'alpha':
                return (priority, col)
            elif secondary_sort == 'length':
                return (priority, len(col), col)
            else:
                return (priority, col)
        
        sorted_cols = sorted(df.columns, key=sort_key)
        return df[sorted_cols]

# Usage
sorter = TypeBasedColumnSorter()
result = sorter.sort(df, secondary_sort='alpha')
print(result.columns.tolist())

# Custom priority: objects first
custom_sorter = TypeBasedColumnSorter(type_priority={
    'object': 0,
    'category': 1,
    'int': 2,
    'float': 3,
    'datetime64': 4,
    'bool': 5
})
result2 = custom_sorter.sort(df)
print(result2.columns.tolist())

Practical Application: Data Pipeline Integration

In production pipelines, type-based sorting helps standardize DataFrame structures before saving or processing.

def standardize_dataframe_structure(df, config=None):
    """
    Standardize DataFrame column order for consistent outputs.
    Useful in ETL pipelines.
    """
    if config is None:
        config = {
            'id_columns_first': True,
            'type_order': ['datetime', 'int', 'float', 'bool', 'category', 'object'],
            'id_pattern': ['id', 'key', 'index']
        }
    
    columns = df.columns.tolist()
    
    # Separate ID columns
    id_cols = []
    other_cols = []
    
    if config['id_columns_first']:
        for col in columns:
            if any(pattern in col.lower() for pattern in config['id_pattern']):
                id_cols.append(col)
            else:
                other_cols.append(col)
    else:
        other_cols = columns
    
    # Sort other columns by type
    sorter = TypeBasedColumnSorter()
    temp_df = df[other_cols]
    sorted_df = sorter.sort(temp_df)
    
    # Combine ID columns + sorted columns
    final_columns = id_cols + sorted_df.columns.tolist()
    return df[final_columns]

# Example with ID column
df['user_id'] = [101, 102, 103]
standardized = standardize_dataframe_structure(df)
print(standardized.columns.tolist())
# ID columns appear first, then sorted by type

Type-based column sorting provides structure to wide DataFrames and enables consistent data handling patterns across analytical workflows. Combine these techniques with your domain knowledge to create sorting schemes that make your data more accessible and your code more maintainable.

Liked this? There's more.

Every week: one practical technique, explained simply, with code you can use immediately.