Streaming

Jan 28, 2026 Data Engineering

Spark Structured Streaming - Architecture Guide

Structured Streaming builds on Spark SQL’s engine, treating streaming data as an unbounded input table. Each micro-batch incrementally processes new rows, updating result tables that can be written…

Jan 27, 2026 Data Engineering

Spark Streaming - Output Modes Explained

Spark Structured Streaming’s output modes determine how the engine writes query results to external storage systems. When you work with streaming aggregations, the result table continuously changes…

Jan 27, 2026 Data Engineering

Spark Streaming - Rate Source for Testing

The rate source is a built-in streaming source in Spark Structured Streaming that generates rows at a specified rate. Unlike file-based or socket sources, it requires no external setup and produces…

Jan 27, 2026 Data Engineering

Spark Streaming - Sources and Sinks Overview

Structured Streaming sources define where your streaming application reads data from. Each source type provides different guarantees around fault tolerance and data ordering.

Jan 27, 2026 Data Engineering

Spark Streaming - Stateful Processing (mapGroupsWithState)

Structured Streaming’s built-in aggregations handle simple cases, but real-world scenarios often require custom state management. Consider session tracking where you need to group events by user,…

Jan 27, 2026 Data Engineering

Spark Streaming - Stream-Stream Joins

Stream-stream joins combine records from two independent data streams based on matching keys and time windows. Unlike stream-static joins, both sides continuously receive new data, requiring Spark to…

Jan 27, 2026 Data Engineering

Spark Streaming - Triggers (ProcessingTime, Once, Continuous)

Spark Structured Streaming processes data as a series of incremental queries against an unbounded input table. Triggers determine the timing and frequency of these query executions. Without an…

Jan 27, 2026 Data Engineering

Spark Streaming - Watermarking for Late Data

Watermarks define how long Spark Streaming waits for late-arriving data before finalizing aggregations, trading data completeness against processing latency.

Jan 27, 2026 Data Engineering

Spark Streaming - Window Operations

Window operations partition streaming data into finite chunks based on time intervals. Unlike batch processing where you work with complete datasets, streaming windows let you perform aggregations…

Jan 26, 2026 Data Engineering

Spark Streaming - Deduplication in Streaming

Streaming data pipelines frequently encounter duplicate records due to at-least-once delivery semantics in message brokers, network retries, or upstream system failures. Unlike batch processing where…

Jan 26, 2026 Data Engineering

Spark Streaming - Exactly-Once Semantics

Exactly-once semantics ensures each record is processed once and only once, even during failures and restarts. This differs from at-least-once (potential duplicates) and at-most-once (potential data…

Jan 26, 2026 Data Engineering

Spark Streaming - Fault Tolerance and Checkpointing

Spark Streaming achieves fault tolerance through Write-Ahead Logs (WAL) and checkpointing, ensuring exactly-once semantics for stateful operations and at-least-once for receivers.

Jan 26, 2026 Data Engineering

Spark Streaming - File Source Processing

Spark Structured Streaming treats file sources as unbounded tables, continuously monitoring a directory for new files. Unlike traditional batch processing, the file source uses checkpoint metadata to…

Jan 26, 2026 Data Engineering

Spark Streaming - Join Streaming with Static Data

Joining streaming data with static reference data is essential for enrichment scenarios like adding customer details, product catalogs, or configuration lookups to real-time events.

Jan 26, 2026 Data Engineering

Spark Streaming - Kafka Source Integration

Spark Structured Streaming integrates with Kafka through the kafka source format. The minimal configuration requires bootstrap servers and topic subscription:

Jan 26, 2026 Data Engineering

Spark Streaming - Monitoring and Metrics

Spark Streaming exposes metrics through multiple layers: the Spark UI, REST API, and programmatic listeners. The streaming tab in Spark UI displays real-time statistics, but production systems…

Jan 23, 2026 Engineering

Spark Scala - Structured Streaming Example

Spark Structured Streaming fundamentally changed how we think about stream processing. Instead of treating streams as sequences of discrete events that require specialized APIs, Spark presents…

Jan 17, 2026 Engineering

Server-Sent Events: Unidirectional Streaming

Server-Sent Events (SSE) is a web technology that enables servers to push data to clients over a single, long-lived HTTP connection. Unlike WebSockets, which provide full-duplex communication, SSE is…

Dec 24, 2025 Databases

Redis Streams: Event Streaming Data Structure

Redis Streams implements an append-only log structure where each entry contains a unique ID and field-value pairs. Unlike Redis Pub/Sub, which delivers messages to active subscribers only, Streams…

Dec 22, 2025 JavaScript

React Server Components: Streaming and Suspense

React Server Components fundamentally change how we think about server-side rendering. Traditional SSR forces you to wait for all data fetching to complete before sending any HTML to the client. If…

Oct 30, 2025 Python

PySpark - Streaming from File Source

PySpark Structured Streaming treats file sources as unbounded tables, continuously monitoring directories for new files. Unlike batch processing, the streaming engine maintains state through…

Oct 30, 2025 Python

PySpark - Streaming from Socket Source

PySpark’s socket streaming provides a lightweight way to process real-time data streams over TCP connections, ideal for development, testing, and scenarios where you need to integrate with legacy…

Oct 30, 2025 Python

PySpark - Streaming Join with Static DataFrame

Stream-static joins combine a streaming DataFrame with a static (batch) DataFrame. This pattern is essential when enriching streaming events with reference data like user profiles, product catalogs,…

Oct 30, 2025 Python

PySpark - Streaming Output Modes (Append, Complete, Update)

PySpark Structured Streaming output modes determine how the streaming query writes data to external storage systems. The choice of output mode depends on your query type, whether you’re performing…

Oct 30, 2025 Python

PySpark - Streaming Triggers Explained

Streaming triggers in PySpark determine when the streaming engine processes new data. Unlike traditional batch jobs that run once and complete, streaming queries continuously monitor data sources and…

Oct 30, 2025 Python

PySpark - Streaming Watermark and Late Data

Watermarks solve a fundamental problem in stream processing: when can you safely finalize an aggregation? In batch processing, you know when all data has arrived. In streaming, data arrives…

Oct 30, 2025 Python

PySpark - Streaming Window Operations

Streaming window operations partition unbounded data streams into finite chunks for aggregation. Unlike batch processing where you operate on complete datasets, streaming windows define temporal…

Oct 30, 2025 Python

PySpark Structured Streaming Tutorial

PySpark Structured Streaming requires Spark 2.0 or later. Install PySpark and create a SparkSession configured for streaming:

Oct 08, 2025 Databases

PostgreSQL Replication: Streaming and Logical

PostgreSQL offers two fundamentally different replication mechanisms, each suited for distinct operational requirements. Streaming replication creates exact physical copies of your entire database…

Aug 23, 2025 JavaScript

Node.js Streaming: Readable and Writable Streams

Node.js streams solve a fundamental problem: how do you process data that’s too large to fit in memory? The naive approach loads everything at once, which works fine until you’re dealing with…

Feb 08, 2025 Architecture

Design a Video Streaming Platform: Content Delivery

Video streaming is the hardest content delivery problem you’ll face. Unlike static assets where you cache once and serve forever, video introduces unique challenges: files measured in gigabytes,…
