Operations

Engineering

Spark Scala - RDD Operations

Resilient Distributed Datasets (RDDs) are Spark’s original abstraction for distributed data processing. While DataFrames and Datasets have become the preferred API for most workloads, understanding…

Scala

Scala - zip and unzip Operations

Scala’s zip operation combines two collections element-wise into tuples, while unzip separates a collection of tuples back into individual collections, which is essential for parallel data processing and…

Scala

Scala - reduce and fold Operations

The reduce operation processes a collection by repeatedly applying a binary function to combine elements. It takes the first element as the initial accumulator and applies the function to…
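As a rough illustration of the same idea, here is a sketch in Python rather than Scala (the language choice and the sample list are assumptions made only for this example). `functools.reduce` without an initial value seeds the accumulator with the first element, as described above, while passing an initial value is comparable to a fold:

```python
from functools import reduce

numbers = [1, 2, 3, 4]  # hypothetical sample data

# No initial value: the first element seeds the accumulator (reduce-style).
total = reduce(lambda acc, x: acc + x, numbers)

# Explicit initial value: comparable to a fold, and safe on an empty collection.
total_from_zero = reduce(lambda acc, x: acc + x, numbers, 0)
empty_total = reduce(lambda acc, x: acc + x, [], 0)

print(total, total_from_zero, empty_total)
```

Without an initial value, reducing an empty collection raises an error in Python, which is the usual reason to prefer the fold-style form when the input may be empty.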

Scala

Scala - Date and Time Operations

The java.time package provides separate classes for dates, times, and combined date-times. Use LocalDate for calendar dates without time information and LocalTime for time without date context.

Engineering

Python - Boolean Operations

Python’s boolean type represents one of two values: True or False. These aren’t just abstract concepts—they’re first-class objects that inherit from int, making True equivalent to 1 and…
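A minimal sketch of what inheriting from int means in practice; the `flags` list is a made-up example, not taken from the article:

```python
# bool is a subclass of int, so booleans take part in integer arithmetic.
is_subclass = issubclass(bool, int)
true_equals_one = (True == 1)

# Summing a list of booleans counts how many of them are True.
flags = [True, False, True, True]  # hypothetical sample data
true_count = sum(flags)

print(is_subclass, true_equals_one, true_count)
```

Counting matches with `sum()` over a sequence of booleans is a common idiom that relies on exactly this behavior.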

Python

PySpark - SQL JOIN Operations

Join operations in PySpark differ fundamentally from their single-machine counterparts. When you join two DataFrames in Pandas, everything happens in memory on one machine. PySpark distributes your…

Python

PySpark - RDD join Operations

RDD joins in PySpark support multiple join types (inner, full outer, left outer, right outer) through operations on pair RDDs, where data must be structured as key-value tuples before joining.

Python

PySpark - Pair RDD Operations

Pair RDDs are the foundation for distributed key-value operations in PySpark, enabling efficient aggregations, joins, and grouping across partitions through hash-based data distribution.

Pandas

Pandas: String Operations Guide

Text data is messy. Customer names have inconsistent casing, addresses contain extra whitespace, and product codes follow patterns that need parsing. If you’re reaching for a for loop or apply()…

Python

NumPy: Array Operations Explained

NumPy is the foundation of Python’s scientific computing ecosystem. Every major data science library—pandas, scikit-learn, TensorFlow, PyTorch—builds on NumPy’s array operations. If you’re doing…

Go

Go Unsafe Package: Low-Level Operations

The unsafe package is Go’s escape hatch from type safety. It provides operations that bypass Go’s memory safety guarantees, allowing you to manipulate memory directly like you would in C. This…

Go

Go Strings: Operations and Manipulation

Go strings are immutable sequences of bytes, typically containing UTF-8 encoded text. Under the hood, a string is a read-only slice of bytes with a pointer and length. This immutability has critical…

Go

Go atomic Package: Lock-Free Operations

Concurrent programming in Go typically involves protecting shared data with mutexes. While effective, mutexes introduce overhead: goroutines block waiting for locks, the scheduler gets involved, and…

Go

Go bufio: Buffered I/O Operations

Every system call has overhead. When you read or write data byte-by-byte or in small chunks, your program spends more time context-switching to the kernel than actually processing data. Buffered I/O…
