I used Generative AI to assist in the development of this project.
- Debugging & Error Resolution: I encountered PySpark errors related to file paths (e.g., `java.net.URISyntaxException` linked to hidden `:Zone.Identifier` files) and Py4J compatibility issues with `SparkContext`. The AI helped diagnose the root cause (OS-specific metadata files) and suggested robust code patterns (`pathGlobFilter`, `try`/`except` blocks) to handle environment-specific issues.
- Code Optimization & Refactoring: Implementing the `SparkMetricsCollector` class and optimizing Gold queries. I used AI to refactor repetitive metric collection code into a reusable Python class. It also suggested syntax for the `sortWithinPartitions` and `coalesce` logic, in line with best practices.
- Technical Writing & Reporting: Drafting the final report. I used AI to help structure the technical arguments, specifically for explaining the “Optimization Paradox” (why sorting adds write latency but improves read performance), and to refine the English terminology for the “Physical Design” section.
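To illustrate the file-path pattern from the first item, here is a minimal sketch, not the project's actual code. It assumes CSV input with a header row and an existing `SparkSession`; the function names `is_data_file` and `read_data_files` and the `*.csv` pattern are hypothetical. The idea is that a glob filter only admits names matching the pattern, so sidecar files ending in `:Zone.Identifier` are not treated as data.

```python
from fnmatch import fnmatchcase


def is_data_file(file_name: str, pattern: str = "*.csv") -> bool:
    """Return True if the file name matches the glob pattern.

    Hypothetical helper: it only illustrates why a glob such as '*.csv'
    skips sidecar files like 'trips.csv:Zone.Identifier'.
    """
    return fnmatchcase(file_name, pattern)


def read_data_files(spark, directory: str, pattern: str = "*.csv"):
    """Read only the files in `directory` whose names match `pattern`.

    `spark` is assumed to be an existing SparkSession.
    """
    try:
        return (
            spark.read
            .option("pathGlobFilter", pattern)  # file-source option, Spark 3.0+
            .option("header", "true")           # assumption: files have a header row
            .csv(directory)
        )
    except Exception as exc:
        # Re-raise with the path and filter so the failing input is easy to spot.
        raise RuntimeError(
            f"Could not read {directory!r} with filter {pattern!r}"
        ) from exc
```

Whether the filter alone avoids the exception may depend on the Spark version and file system, which is why the read is also wrapped in `try`/`except`.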
All code generated or suggested by AI was reviewed, executed, and validated against the project requirements. The logic for business queries (Q1, Q2, Q3) and the choice of dataset optimizations remain my own architectural decisions.
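For reference, the `coalesce` and `sortWithinPartitions` pattern mentioned above can be sketched as follows. This is an assumption-laden illustration rather than the project's Gold-layer code: the function name `write_sorted`, the Parquet output format, the overwrite mode, and the default of 8 output files are all hypothetical. Coalescing first and sorting afterwards keeps each output file sorted, which is where the extra write latency and the later read benefit both come from.

```python
from typing import Sequence


def write_sorted(df, path: str, sort_cols: Sequence[str], num_files: int = 8) -> None:
    """Write `df` as a small number of files, each sorted by `sort_cols`.

    `df` is assumed to be a PySpark DataFrame.
    """
    (
        df.coalesce(num_files)             # reduce partition count without a full shuffle
        .sortWithinPartitions(*sort_cols)  # sort rows inside each partition (extra write cost)
        .write.mode("overwrite")           # assumption: the target may be replaced
        .parquet(path)                     # assumption: Parquet output
    )
```

A call such as `write_sorted(df, "/path/to/gold/table", ["some_column"])` uses placeholder values; the path and column name are not taken from the project.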