Apache Spark - Bucketing for Performance
Bucketing is Spark’s mechanism for pre-shuffling data at write time. Instead of paying the shuffle cost during every query, you pay it once when writing the data. The result: joins and aggregations…
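To make the idea concrete, here is a minimal pure-Python sketch of the principle rather than Spark code. It assumes integer join keys and uses plain modulo as a stand-in for Spark's own hash function. The names `bucket_id`, `write_bucketed`, `bucketed_join`, and `NUM_BUCKETS` are hypothetical and exist only for this illustration. In Spark itself, the write-time step roughly corresponds to `DataFrameWriter.bucketBy(...)` followed by `saveAsTable(...)`.

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count, chosen only for this sketch


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Assumption: plain modulo stands in for Spark's real hash function.
    return key % num_buckets


def write_bucketed(rows, num_buckets=NUM_BUCKETS):
    # "Write time": pay the redistribution cost once by grouping rows
    # according to the bucket of their join key.
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return buckets


def bucketed_join(left, right):
    # "Query time": matching keys always live in the same bucket number,
    # so each bucket pair is joined locally with no further redistribution.
    joined = []
    for b in sorted(left.keys() & right.keys()):
        lookup = defaultdict(list)
        for key, value in right[b]:
            lookup[key].append(value)
        for key, value in left[b]:
            for other in lookup.get(key, []):
                joined.append((key, value, other))
    return joined


users = write_bucketed([(1, "ana"), (2, "bo"), (5, "cy")])
orders = write_bucketed([(1, "book"), (5, "pen"), (5, "ink"), (7, "cup")])
result = bucketed_join(users, orders)
```

Both tables must use the same bucket count and the same key for this to work. That constraint is what lets the join skip the redistribution step, and it applies to the sketch in the same way it applies to bucketed tables in general.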