GENAI.md - Generative AI Usage Declaration
Assignment: DE2 - Lab 1
Course: Data Engineering II - ESIEE Paris 2025-2026
Track: D - Aviation (METAR weather reports)
1. Tool Used
- Model: Gemini / Claude
- Purpose: Technical support, debugging, and performance analysis.
2. Assisted Tasks
- Debugging: Assisted in resolving an `AnalysisException` related to the `event_time` column resolution and a `NameError` for the `window` function.
- Code Optimization: Provided guidance on configuring `spark.sql.shuffle.partitions` and explaining the impact of different `trigger` values.
- Metrics Capture: Provided a helper function (`capture_lab_evidence`) to systematically log Spark `lastProgress` metrics into a CSV file.
- Documentation: Helped structure the engineering note and format the comparison tables based on raw Spark UI screenshots.
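To illustrate the kind of metrics capture described above, here is a minimal sketch of appending one progress report to a CSV file. It is not the `capture_lab_evidence` helper from the lab. The function name `log_progress_to_csv`, the `run` column, and the choice of fields are assumptions made for illustration, and the sketch assumes the progress report is available as a dict-like object with the usual Structured Streaming keys (`batchId`, `numInputRows`, `durationMs`, and so on).

```python
import csv
import os

# Columns copied from each progress report. The keys exist in the progress
# JSON emitted by Structured Streaming; which ones to keep is an assumption.
FIELDS = ["batchId", "timestamp", "numInputRows",
          "inputRowsPerSecond", "processedRowsPerSecond", "triggerExecutionMs"]


def log_progress_to_csv(progress, csv_path, run_label):
    """Append one progress report (dict-like) as a CSV row.

    Returns True if a row was written, False if no progress was available.
    """
    if not progress:  # no report yet, e.g. before the first batch completes
        return False

    row = {"run": run_label}
    for name in FIELDS[:-1]:
        row[name] = progress.get(name)
    # The trigger duration is nested under "durationMs" in the report.
    row["triggerExecutionMs"] = (progress.get("durationMs") or {}).get("triggerExecution")

    # Write the header only when the file is new or empty.
    write_header = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["run"] + FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return True
```

With a running streaming query, a call such as `log_progress_to_csv(query.lastProgress, "metrics.csv", "run1")` after each test run would add one row per call. Here `query` and the file name are placeholders.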
3. Manual Work & Verification
- Code Implementation: All streaming logic, schema definitions, and file path management were manually implemented or adapted to the specific track requirements.
- Environment Setup: Local installation of Spark 4.0.0, Java 21, and Python 3.10 was performed manually.
- Data Handling: Manual management of the `landing` directory and execution of the three test runs.
- Validation: Every line of code was reviewed, tested, and explained during the development process to ensure full understanding of Spark Structured Streaming mechanics.
4. Conclusion
AI was used as a “Pair Programmer” to accelerate troubleshooting and documentation, ensuring the pipeline followed best practices for stateful streaming.
Declared by Justine Guirauden & Volcy Desmazures — ESIEE Paris, April 2026