In a Spark cluster, data is typically read in as 128 MB partitions, which ensures an even distribution of data. However, as the data is transformed (e.g. aggregated), it is possible to have significantly…
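The contrast between evenly sized read partitions and uneven partitions after a transformation can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's actual implementation: the 128 MB figure matches Spark's default `spark.sql.files.maxPartitionBytes`, but the helper names `input_partitions` and `shuffle_partition_sizes`, the sample keys, and the use of Python's `hash` in place of Spark's own hash partitioner are all hypothetical stand-ins.

```python
import math
from collections import Counter

# 128 MB, the default value of spark.sql.files.maxPartitionBytes
MAX_PARTITION_BYTES = 128 * 1024 * 1024


def input_partitions(file_bytes: int) -> int:
    """Roughly estimate read partitions for one splittable file.

    Simplification: ignores file open cost and default parallelism,
    which Spark also takes into account.
    """
    return max(1, math.ceil(file_bytes / MAX_PARTITION_BYTES))


def shuffle_partition_sizes(keys, num_partitions):
    """Count rows per partition when each row goes to hash(key) % num_partitions.

    Python's hash() is a stand-in here; Spark uses a different hash function.
    """
    counts = Counter(hash(k) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


# Reading: a 1 GiB file splits into equally sized chunks.
read_parts = input_partitions(1024 ** 3)

# After grouping by key: hypothetical data where one "hot" key (0)
# accounts for 9,000 of 10,000 rows.
keys = [0] * 9_000 + list(range(1, 1_001))
sizes = shuffle_partition_sizes(keys, 8)

print("read partitions:", read_parts)
print("rows per shuffle partition:", sizes)
print("largest / smallest:", max(sizes) / min(sizes))
```

In this toy data the partition that receives the hot key ends up far larger than the rest, so the task processing it would run long after the others finish.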
1.5 Years of Spark Knowledge in 8 Tips, by Michael Berk
Spark Performance Tuning: Skewness Part 1, by Wasurat Soontronchai
How to Optimize Spark Applications for Performance using Sparklens
3. A Case Study Of Spark Performance Optimization On Large Dataframes, by Jiahui Wang
List: Apache Spark, Curated by Luan Moreno M. Maciel
Himansu Sekhar – Medium
High Performance Spark: Best Practices for Scaling and Optimizing Apache Spark, 1st ed., by Holden Karau and Rachel Warren (eBook)
List: DataEng, Curated by Bruno Servilha
Spark Performance Optimization Series: #1. Skew, by Himansu Sekhar, road to data engineering