GENAI.md - Generative AI Usage Declaration

Assignment: DE2 - Lab 1
Course: Data Engineering II - ESIEE Paris 2025-2026
Track: D - Aviation (METAR weather reports)

1. Tools Used

  • Models: Gemini and Claude
  • Purpose: Technical support, debugging, and performance analysis.

2. Assisted Tasks

  • Debugging: Helped resolve an AnalysisException raised while resolving the event_time column, and a NameError raised for the window function.
  • Code Optimization: Provided guidance on configuring spark.sql.shuffle.partitions and explained the impact of different trigger values.
  • Metrics Capture: Provided a helper function (capture_lab_evidence) to systematically log Spark lastProgress metrics into a CSV file.
  • Documentation: Helped structure the engineering note and format the comparison tables based on raw Spark UI screenshots.
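
As an illustration of the metrics-capture task listed above, the following is a minimal sketch of how progress metrics could be appended to a CSV file. It is not the capture_lab_evidence helper used in the lab: the function names, the column selection, and the run_label argument are hypothetical. It assumes the progress object behaves like a plain mapping with the keys found in Spark's progress JSON (batchId, numInputRows, durationMs, and so on).

```python
import csv
from pathlib import Path

# Hypothetical column selection: these keys appear in Spark's progress JSON,
# but which ones the lab actually logged is an assumption.
FIELDS = [
    "run_label", "batchId", "timestamp", "numInputRows",
    "inputRowsPerSecond", "processedRowsPerSecond", "triggerExecutionMs",
]


def progress_to_row(progress, run_label):
    """Flatten one progress mapping into a flat dict keyed by FIELDS."""
    duration = progress.get("durationMs") or {}
    return {
        "run_label": run_label,
        "batchId": progress.get("batchId"),
        "timestamp": progress.get("timestamp"),
        "numInputRows": progress.get("numInputRows"),
        "inputRowsPerSecond": progress.get("inputRowsPerSecond"),
        "processedRowsPerSecond": progress.get("processedRowsPerSecond"),
        "triggerExecutionMs": duration.get("triggerExecution"),
    }


def append_progress_row(progress, run_label, csv_path):
    """Append one row to csv_path, writing the header if the file is new.

    Returns False without touching the file when progress is None,
    which is the case before the first micro-batch has completed.
    """
    if progress is None:
        return False
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(progress_to_row(progress, run_label))
    return True
```

With a running streaming query, a call would look like append_progress_row(query.lastProgress, "run1", "metrics.csv"), assuming the object returned by lastProgress supports .get() in the installed PySpark version. Keeping the flattening step separate from the file write makes it easy to change the logged columns between test runs.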

3. Manual Work & Verification

  • Code Implementation: All streaming logic, schema definitions, and file path management were manually implemented or adapted to the specific track requirements.
  • Environment Setup: Local installation of Spark 4.0.0, Java 21, and Python 3.10 was performed manually.
  • Data Handling: The landing directory was managed manually, and the three test runs were executed manually.
  • Validation: Every line of code was reviewed, tested, and explained during the development process to ensure full understanding of Spark Structured Streaming mechanics.

4. Conclusion

AI was used as a “Pair Programmer” to accelerate troubleshooting and documentation, and to help check that the pipeline followed best practices for stateful streaming.

Declared by Justine Guirauden & Volcy Desmazures — ESIEE Paris, April 2026