Data

SQL

Spark SQL - Data Types Reference

• Spark SQL supports 20+ data types organized into numeric, string, binary, boolean, datetime, and complex categories, with specific handling for nullable values and schema evolution

Read more →
Rust

Rust Enums: Algebraic Data Types

Algebraic data types (ADTs) come from type theory and functional programming, but Rust brings them to systems programming with zero runtime overhead. Unlike C-style enums that are glorified integers,…

Read more →
Engineering

R-Tree: Spatial Data Indexing

Traditional B-trees excel at one-dimensional data. Finding all users with IDs between 1000 and 2000 is straightforward—the data has a natural ordering. But what about finding all restaurants within 5…

Read more →
R

R - merge() Data Frames

The merge() function combines two data frames based on common columns, similar to SQL JOIN operations. The basic syntax requires at least two data frames, with optional parameters controlling join…

Read more →
R

R dplyr - arrange() - Sort Data Frame

The arrange() function from dplyr provides an intuitive interface for sorting data frames. Unlike base R’s order(), it returns the entire data frame in sorted order rather than just indices.

Read more →
R

R - cut() - Bin Continuous Data

The cut() function divides a numeric vector into intervals and returns a factor representing which interval each value falls into. The basic syntax requires two arguments: the data vector and the…

Read more →
Engineering

Python - Data Types Overview

Python is dynamically typed, meaning you don’t declare variable types explicitly—the interpreter figures it out at runtime. This doesn’t mean Python is weakly typed; it’s actually strongly typed. You…

Read more →
Engineering

PySpark: Handling Skewed Data

Data skew occurs when certain keys in your dataset appear far more frequently than others, causing uneven distribution of work across your Spark cluster. In a perfectly balanced world, each partition…

Read more →
Pandas

Pandas: Handling Missing Data

Every real-world dataset has holes. Missing data shows up as NaN (Not a Number), None, or NaT (Not a Time) in Pandas, and how you handle these gaps directly impacts the quality of your analysis.

Read more →
Python

NumPy: Data Types Explained

Python’s dynamic typing is convenient for scripting, but it comes at a cost. Every Python integer carries type information, reference counts, and other overhead—a single int object consumes 28…

Read more →
Python

NumPy - Array Data Types (dtype)

• NumPy’s dtype system provides 21+ data types optimized for numerical computing, enabling precise memory control and performance tuning—a float32 array uses half the memory of float64 while…

Read more →
Pandas

How to Check Data Types in Pandas

Data types in Pandas aren’t just metadata—they determine what operations you can perform, how much memory your DataFrame consumes, and whether your calculations produce correct results. A column that…

Read more →
Python

How to Cast Data Types in Polars

Data type casting is one of those operations you’ll perform constantly but rarely think about until something breaks. In Polars, getting your types right matters for two reasons: memory efficiency…

Read more →
Pandas

How to Bin Data in Pandas

Binning—also called discretization or bucketing—converts continuous numerical data into discrete categories. You take a range of values and group them into bins, turning something like ‘age: 27’ into…

Read more →
Go

Go Race Detector: Finding Data Races

A data race happens when two or more goroutines access the same memory location concurrently, and at least one of those accesses is a write. The result is undefined behavior—your program might crash,…

Read more →
Go

Go Maps: Key-Value Data Structures

Maps are Go’s built-in hash table implementation, providing fast key-value lookups with O(1) average time complexity. They’re the go-to data structure when you need to associate unique keys with…

Read more →
Go

Go Data Types: Complete Reference

Go provides a comprehensive set of basic types that map directly to hardware primitives. Unlike dynamically typed languages, you must declare types explicitly, and unlike C, there are no implicit…

Read more →
Go

Go Byte Slices: Binary Data Handling

The []byte type is Go’s primary mechanism for handling binary data. Unlike strings, which are immutable sequences of UTF-8 characters, byte slices are mutable arrays of raw bytes that give you…

Read more →