Columns

Feb 05, 2026 Engineering

SQL - GROUP BY Multiple Columns

GROUP BY is fundamental to SQL analytics, but single-column grouping only gets you so far. Real business questions rarely fit into one dimension. You don’t just want total sales—you want sales by…

Read more →

Feb 05, 2026 SQLite

SQL: GROUP BY with Multiple Columns

When you need to analyze data across multiple dimensions simultaneously, single-column grouping falls short. Multi-column GROUP BY creates distinct groups based on unique combinations of values…

Read more →

Jan 28, 2026 SQL

SQL - Aliases (AS) for Columns and Tables

• Aliases improve query readability by providing meaningful names for columns and tables, especially when dealing with complex joins, calculated fields, or ambiguous column names

Read more →

Dec 19, 2025 R

R tidyr - unite() Columns into One

The unite() function from the tidyr package merges multiple columns into one. The basic syntax requires the data frame, the name of the new column, and the columns to combine.

Read more →

Dec 10, 2025 R

R dplyr - select() Columns

The select() function from dplyr extracts columns from data frames using intuitive syntax. Unlike base R’s bracket notation, select() returns a tibble and allows unquoted column names.

Read more →

Dec 09, 2025 R

R dplyr - mutate() - Add/Modify Columns

The mutate() function from dplyr adds new variables or transforms existing ones in your data frame. Unlike base R’s approach of modifying columns with $ or [], mutate() keeps your data…

Read more →

Dec 09, 2025 R

R dplyr - relocate() - Reorder Columns

The relocate() function from dplyr moves columns to new positions within a data frame. By default, it moves specified columns to the leftmost position.

Read more →

Dec 09, 2025 R

R dplyr - rename() Columns

The rename() function from dplyr uses a straightforward syntax where you specify the new name on the left and the old name on the right. This reversed assignment feels natural when reading code…

Read more →

Dec 05, 2025 R

R - Access Rows and Columns in Data Frame

• R data frames support multiple indexing methods including bracket notation [], double brackets [[]], and the $ operator, each with distinct behaviors for subsetting rows and columns

Read more →

Dec 05, 2025 R

R - Add/Remove Columns in Data Frame

• Data frames in R support multiple methods for adding columns: direct assignment ($), bracket notation ([]), and functions like cbind() and mutate() from dplyr

Read more →

Oct 31, 2025 Python

PySpark - Unpivot DataFrame (Columns to Rows)

Unpivoting transforms wide-format data into long-format data by converting column headers into row values. This operation is the inverse of pivoting and is fundamental when preparing data for…

Read more →

Oct 27, 2025 Python

PySpark - Select Columns from DataFrame

Column selection is fundamental to PySpark DataFrame operations. Unlike Pandas where you might casually select all columns and filter later, PySpark’s distributed nature makes selective column…

Read more →

Oct 26, 2025 Python

PySpark - Rename Multiple Columns

Column renaming is one of the most common data preparation tasks in PySpark. Whether you’re standardizing column names across datasets for joins, cleaning up messy source data, or conforming to your…

Read more →

Oct 26, 2025 Python

PySpark - Select All Columns Except One

When working with PySpark DataFrames, you’ll frequently encounter situations where you need to select all columns except one or a few specific ones. This is a common pattern in data engineering…

Read more →

Oct 26, 2025 Python

PySpark - Select Columns by Index

PySpark DataFrames are designed around named column access, but there are legitimate scenarios where selecting columns by their positional index becomes necessary. You might be processing CSV files…

Read more →

Oct 25, 2025 Python

PySpark - Rename All Columns in DataFrame

Column renaming in PySpark DataFrames is a frequent requirement in data engineering workflows. Unlike Pandas where you can simply assign a dictionary to df.columns, PySpark’s distributed nature…

Read more →

Oct 19, 2025 Python

PySpark - Join on Multiple Columns

Multi-column joins in PySpark are essential when your data relationships require composite keys. Unlike simple joins on a single identifier, multi-column joins match records based on multiple…

Read more →

Oct 18, 2025 Python

PySpark - GroupBy Multiple Columns

When working with large-scale data processing in PySpark, grouping by multiple columns is a fundamental operation that enables multi-dimensional analysis. Unlike single-column grouping, multi-column…

Read more →

Oct 17, 2025 Python

PySpark - Get Number of Columns in DataFrame

When working with PySpark DataFrames, knowing the number of columns is a fundamental operation that serves multiple critical purposes. Whether you’re validating data after a complex transformation,…

Read more →

Oct 15, 2025 Python

PySpark - Drop Multiple Columns

Working with large datasets in PySpark often means dealing with DataFrames that contain far more columns than you actually need. Whether you’re cleaning data, reducing memory consumption, removing…

Read more →

Oct 11, 2025 Python

PySpark - Add Multiple Columns to DataFrame

Adding multiple columns to PySpark DataFrames is one of the most common operations in data engineering and machine learning pipelines. Whether you’re performing feature engineering, calculating…

Read more →

Oct 03, 2025 Pandas

Pandas - str.split() and Expand to Columns

• The str.split() method combined with expand=True directly converts delimited strings into separate DataFrame columns, eliminating the need for manual column assignment

Read more →

Oct 01, 2025 Pandas

Pandas - Sort by Multiple Columns

• Pandas provides multiple methods for multi-column sorting including sort_values() with column lists, custom sort orders per column, and performance optimizations for large datasets

Read more →

Sep 29, 2025 Pandas

Pandas - Select Columns by Data Type

• Use select_dtypes() to filter DataFrame columns by data type with include/exclude parameters, supporting both NumPy and pandas-specific types like ’number’, ‘object’, and ‘category’

Read more →

Sep 29, 2025 Pandas

Pandas - Select Columns by Index Position

The iloc[] indexer is the primary method for position-based column selection in Pandas. It uses zero-based integer indexing, making it ideal when you know the exact position of columns regardless…

Read more →

Sep 29, 2025 Pandas

Pandas - Select Multiple Columns

The most straightforward method for selecting multiple columns uses bracket notation with a list of column names. This approach is readable and works well when you know the exact column names.

Read more →

Sep 28, 2025 Pandas

Pandas - Rename Columns Using Dictionary

The rename() method accepts a dictionary where keys are current column names and values are new names. This approach only affects specified columns, leaving others unchanged.

Read more →

Sep 28, 2025 Pandas

Pandas - Reorder/Rearrange Columns

The most straightforward approach to reorder columns is passing a list of column names in your desired sequence. This creates a new DataFrame with columns arranged according to your specification.

Read more →

Sep 27, 2025 Pandas

Pandas - Read Specific Columns from CSV

The usecols parameter in read_csv() is the most straightforward approach for reading specific columns. You can specify columns by name or index position.

Read more →

Sep 24, 2025 Pandas

Pandas - Merge on Multiple Columns

Merging on multiple columns follows the same syntax as single-column merges, but passes a list to the on parameter. This creates a composite key where all specified columns must match for rows to…

Read more →

Sep 21, 2025 Pandas

Pandas - GroupBy Multiple Columns

• GroupBy with multiple columns creates hierarchical indexes that enable multi-dimensional data aggregation, essential for analyzing data across multiple categorical dimensions simultaneously.

Read more →

Sep 17, 2025 Pandas

Pandas - Drop Columns by Index

• Pandas provides multiple methods to drop columns by index position including drop() with column names, iloc for selection-based dropping, and direct DataFrame manipulation

Read more →

Sep 17, 2025 Pandas

Pandas - Drop Multiple Columns

• Pandas offers multiple methods to drop columns: drop() with column names, drop() with indices, and direct column selection—each suited for different scenarios and data manipulation patterns.

Read more →

Sep 12, 2025 Pandas

Pandas - Add Multiple Columns

The most straightforward approach to adding multiple columns is direct assignment. You can assign multiple columns at once using a list of column names and corresponding values.

Read more →

Jun 10, 2025 Pandas

How to Sort by Multiple Columns in Pandas

Sorting data by a single column is straightforward, but real-world analysis rarely stays that simple. You need to sort sales data by region first, then by revenue within each region. You need…

Read more →

Jun 10, 2025 Python

How to Sort by Multiple Columns in Polars

Polars has rapidly become the go-to DataFrame library for Python developers who need speed. Built in Rust with a focus on parallel execution, it routinely outperforms pandas by 10-100x on common…

Read more →

Jun 09, 2025 Engineering

How to Select Columns in PySpark

Column selection is the most fundamental DataFrame operation you’ll perform in PySpark. Whether you’re preparing data for a machine learning pipeline, reducing memory footprint before a join, or…

Read more →

Jun 08, 2025 Pandas

How to Select Columns in Pandas

Column selection is the bread and butter of pandas work. Before you can clean, transform, or analyze data, you need to extract the specific columns you care about. Whether you’re dropping irrelevant…

Read more →

Jun 08, 2025 Python

How to Select Columns in Polars

Polars has rapidly become the go-to DataFrame library for Python developers who need speed. Built in Rust with a lazy execution engine, it consistently outperforms pandas by 10-100x on common…

Read more →

Jun 06, 2025 Pandas

How to Rename Columns in Pandas

Every data scientist has opened a CSV file only to find column names like Unnamed: 0, cust_nm_1, or Total Revenue (USD) - Q4 2023. Messy column names create friction throughout your analysis…

Read more →

Jun 06, 2025 Python

How to Rename Columns in Polars

Column renaming sounds trivial until you’re staring at a dataset with columns named Customer ID, customer_id, CUSTOMER ID, and cust_id that all need to become customer_id. Or you’ve…

Read more →

Jun 06, 2025 Engineering

How to Rename Columns in PySpark

Column renaming in PySpark seems trivial until you’re knee-deep in a data pipeline with inconsistent schemas, spaces in column names, or the need to align datasets from different sources. Whether…

Read more →

May 15, 2025 Pandas

How to Merge on Multiple Columns in Pandas

Single-column merges work fine until they don’t. Consider a sales database where you need to join transaction records with inventory data. Using just product_id fails when you have multiple…

Read more →

Apr 28, 2025 Pandas

How to GroupBy Multiple Columns in Pandas

Single-column groupby operations are fine for tutorials, but real data analysis rarely works that way. You need to group sales by region and product category. You need to analyze user behavior by…

Read more →

Apr 28, 2025 Python

How to GroupBy Multiple Columns in Polars

Polars has rapidly become the go-to DataFrame library for Python developers who need speed. Built in Rust with a lazy execution engine, it routinely outperforms Pandas by 10-100x on real workloads….

Read more →

Mar 10, 2025 Pandas

How to Apply a Function to Multiple Columns in Pandas

Applying functions to multiple columns is one of the most common operations in pandas. Whether you’re calculating derived metrics, cleaning inconsistent data, or engineering features for machine…

Read more →