For Assignment 1, I used generative AI (ChatGPT) to help me:

  • Understand the difference between RDD and DataFrame approaches – I asked for explanations of how Spark reads data line by line with RDDs versus column-wise with DataFrames, and why this leads to different word counts.

  • Interpret the Spark code – I got step-by-step explanations for code snippets like the RDD word count pipeline (flatMap, map, reduceByKey, sortBy) and the DataFrame equivalent using split, explode, and groupBy.

  • Explain performance metrics – I asked for a breakdown of the wall time, RSS, and peak memory metrics displayed by the %%timemem cell magic.
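
For reference, the tokenization issue behind the differing word counts (first bullet) can be illustrated without Spark. This is a plain-Python sketch, not actual Spark code, and the sample lines are made up; it shows one plausible cause of a mismatch — `str.split()` with no argument drops empty tokens (roughly like an RDD `flatMap` over `line.split()`), while splitting on a single space keeps empty strings (similar to how Spark SQL's `split(col, " ")` behaves). Whether this was the exact cause in my assignment depends on the code in question.

```python
from collections import Counter

# Made-up sample input: one blank line and one double space included on purpose.
lines = ["to be or not to be", "", "that is  the question"]

# RDD-style tokenization: str.split() with no argument drops empty tokens,
# so the blank line and the double space contribute nothing.
rdd_counts = Counter(w for line in lines for w in line.split())

# DataFrame-style tokenization: splitting on a literal " " keeps empty
# strings, so the blank line yields one "" token and the double space
# yields another, inflating the total word count.
df_counts = Counter(w for line in lines for w in line.split(" "))

print(rdd_counts["to"])  # 2 in both pipelines
print(df_counts[""])     # 2 empty-string "words" in the DataFrame-style count
```

Here the RDD-style total is 10 words while the DataFrame-style total is 12, entirely due to empty-string tokens.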
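
The `%%timemem` cell magic is course-provided and its internals are not shown here. As a rough stdlib-only sketch of what the wall-time and peak-memory numbers represent, a hypothetical `measure` helper (my own naming, not the magic's actual implementation) could look like this; it uses the Unix-only `resource` module, and the units of `ru_maxrss` differ by platform (kilobytes on Linux, bytes on macOS).

```python
import time
import resource


def measure(fn):
    """Run fn() and report (result, wall-clock seconds, peak RSS).

    Hypothetical sketch of the kind of measurement a %%timemem-style
    magic might perform; not the course magic's actual code.
    """
    start = time.perf_counter()
    result = fn()
    wall = time.perf_counter() - start  # elapsed wall-clock time
    # Peak resident set size of this process so far (KB on Linux, bytes on macOS).
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result, wall, peak


result, wall, peak = measure(lambda: sum(range(1000)))
print(result, wall, peak)
```

Note that `ru_maxrss` is a high-water mark for the whole process, so it can only stay flat or grow between cells; it does not isolate one cell's allocation the way a before/after RSS delta would.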