- Build an annotated dataset for evaluating different LLMs, log ingestion methods, parameters, etc.
- Evaluate the model's ability to detect previously unseen errors by leveraging the LLM's understanding of language patterns.
- Test the model on logs with injected synthetic errors to simulate new error scenarios.
- Propose strategies for integrating the LLM into our existing CI pipeline and developer tools for real-time error detection.
- Conduct user studies to gather feedback from engineers on the tool's utility.
- Define metrics such as precision, recall, and time savings to evaluate the model's effectiveness compared to the current system.
- Prompt Engineering and Input Formatting: Experiment with various input representations to optimize the LLM's focus on critical log sections. Develop prompts that elicit more accurate and concise error identifications from the model.
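As an illustration of how the synthetic-error test and the precision/recall metrics above could fit together, here is a minimal sketch in Python. All names in it (`inject_errors`, `score`, `keyword_detector`, the error templates) are hypothetical and chosen for this example only; the keyword detector is merely a stand-in for whatever LLM-based detector the thesis produces, and line-level labels are an assumption about how the annotated dataset might be structured.

```python
import random
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float


def inject_errors(lines, templates, count, seed=0):
    """Insert `count` synthetic error lines at random positions.

    Returns the new log and the set of line indices that hold injected errors.
    """
    rng = random.Random(seed)  # seeded so experiments are repeatable
    out = list(lines)
    injected = set()
    for _ in range(count):
        pos = rng.randrange(len(out) + 1)
        out.insert(pos, rng.choice(templates))
        # Lines at or after `pos` moved down by one, so shift their indices.
        injected = {i + 1 if i >= pos else i for i in injected}
        injected.add(pos)
    return out, injected


def score(predicted, actual):
    """Line-level precision and recall; 0.0 by convention when undefined."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return Scores(precision, recall)


def keyword_detector(lines):
    """Hypothetical baseline: flags every line containing the word 'error'."""
    return {i for i, line in enumerate(lines) if "error" in line.lower()}


if __name__ == "__main__":
    clean_log = ["[INFO] compiling module a", "[INFO] linking", "[INFO] done"]
    templates = ["[ERROR] undefined reference to `foo`", "fatal: out of memory"]
    log, truth = inject_errors(clean_log, templates, count=2, seed=42)
    print(score(keyword_detector(log), truth))
```

A keyword baseline like this would miss the `fatal:` template whenever it is injected, which is the kind of gap a comparison against an LLM-based detector would be expected to expose.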
Expected Outcomes:
- A robust LLM-based tool capable of automatically detecting and classifying errors in build logs with high accuracy.
- A comprehensive evaluation of the LLM's performance compared to the existing pattern-based system (Build Failure Analyzer), highlighting strengths and limitations.
- Guidelines and best practices for integrating LLMs into industrial CI pipelines.
- Insights into the scalability and adaptability of LLMs for log analysis across different projects and domains.
- Recommendations for future enhancements and potential extensions of this approach.
- You are most likely enrolled in a Master's program in Engineering and are interested in the areas stated above.
- The announced thesis is open only to students affiliated with a Swedish university or college, either directly or through an exchange program.
- When the thesis proposal states that it includes two students working together, we would like you to apply in pairs. In these cases, send one application each but make sure to clearly state in your application who your co-applicant is. If you have any questions regarding this, please do not hesitate to contact us.
TBD