Bucketing

Jan 03, 2025 Engineering

Apache Spark - Bucketing for Performance

Bucketing is Spark’s mechanism for pre-shuffling data at write time. Instead of paying the shuffle cost during every query, you pay it once when writing the data. The result: joins and aggregations…
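To make the "pay once at write time" idea concrete, here is a minimal pure-Python sketch of the concept, not Spark's actual implementation. It assumes a hypothetical bucket count of 4 and uses CRC32 as a stand-in hash (Spark uses its own Murmur3-based hash), and the helper names `bucket_of`, `write_bucketed`, and `bucketed_join` are invented for illustration. In PySpark itself, the write-side step corresponds to `df.write.bucketBy(numBuckets, col).saveAsTable(name)`.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for this sketch


def bucket_of(key, num_buckets=NUM_BUCKETS):
    # Stand-in hash: Spark uses a Murmur3-based hash, not CRC32.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def write_bucketed(rows, key_index=0, num_buckets=NUM_BUCKETS):
    """Pay the 'shuffle' once: group rows by bucket id at write time."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_of(row[key_index], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right, num_buckets=NUM_BUCKETS):
    """Join bucket i of `left` only against bucket i of `right`.

    Equal keys always hash to the same bucket, so no rows need to move
    between buckets at query time.
    """
    out = []
    for b in range(num_buckets):
        lookup = defaultdict(list)
        for key, value in right.get(b, []):
            lookup[key].append(value)
        for key, value in left.get(b, []):
            for other in lookup.get(key, []):
                out.append((key, value, other))
    return out


# Both "tables" are bucketed on the join key with the same bucket count.
orders = write_bucketed([(1, "book"), (2, "pen"), (3, "lamp"), (1, "desk")])
users = write_bucketed([(1, "ana"), (2, "bo"), (4, "cy")])

joined = sorted(bucketed_join(orders, users))
print(joined)
# [(1, 'book', 'ana'), (1, 'desk', 'ana'), (2, 'pen', 'bo')]
```

The sketch only works because both sides use the same key and the same number of buckets, which mirrors the condition under which a bucketed layout can let a join skip the shuffle.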