Scientists are tackling the challenge of inaccessible expert knowledge, which often limits scalability and informed decision-making within organisations. Choro Ulan uulu, Mikhail Kulyabin, and Iris Fuhrmann, all from Siemens AG, together with Jan Joosten of Eindhoven University of Technology and colleagues, present a novel software engineering framework designed to capture and embed this crucial human domain expertise into artificial intelligence systems. Their research details how augmenting Large Language Models (LLMs) with codified rules and a Retrieval-Augmented Generation (RAG) system dramatically improves the generation of simulation data visualisations, achieving a 206% improvement in output quality and, crucially, expert-level ratings across five engineering scenarios. This work demonstrates a pathway for non-experts to produce high-quality results in specialised fields, effectively democratising access to complex insights and reducing reliance on limited expert time.

Expert knowledge bottlenecks hinder data visualisation

Scientists across industries face a critical scalability challenge: essential domain knowledge often resides with few experts, creating bottlenecks that limit productivity and decision-making quality. When experts are unavailable, work either halts or proceeds with suboptimal outcomes, potentially leading to missed deadlines, increased costs, and catastrophic failures. This challenge is particularly acute in data visualization, where creating effective charts requires both domain knowledge and visualization expertise. Non-experts typically default to familiar chart types because selecting appropriate techniques for complex data remains difficult.

Even when attempting sophisticated visualizations, results frequently require expert interpretation, while experts must balance mentorship against their primary responsibilities. In simulation data visualization, these challenges intensify. Engineers need dual expertise in simulation analysis and data analytics to create visualizations revealing decision-critical insights. Without this expertise, users significantly underutilize available capabilities, missing opportunities to expose key trade-offs. Critical visualization design knowledge, such as which plot types reveal specific patterns, remains tacit within domain experts, necessitating continuous validation cycles that divert expert resources from high-value tasks.

We illustrate this through Simulation Analysis software, a design space exploration platform that optimizes parameters (e.g., minimizing weight while maximizing strength). Although the software includes sophisticated post-result analysis capabilities that let users visualize complex data sets, users typically require multiple attempts to identify effective visualization types. This trial-and-error approach is time-consuming and discourages full exploration of features that could accelerate design decisions, presenting an opportunity to enhance the user experience through automated guidance.

Simulation Analysis software serves as an ideal test case for our framework because its extensive visualization capabilities and comprehensive post-processing features are representative of sophisticated engineering software where automated expert guidance can significantly enhance user productivity. This paper addresses the research question (RQ): How can domain knowledge from human experts be captured, codified, and leveraged to construct Large Language Model (LLM)-based AI agents capable of autonomous expert-level performance? Our results demonstrate how human expert domain knowledge can be captured to construct LLM-based AI agents to reduce expert bottlenecks. The resulting AI agent enables non-experts to generate expert-level visualizations that match expert-level quality in technical accuracy, visual clarity, and analytical insight, without requiring constant expert involvement.

The contributions of this paper are: (1) A systematic software engineering framework for capturing human domain knowledge and engineering an AI agent through complementary strategies: a request classifier, RAG for domain-specific code generation, codified expert rules, and visualization design principles, implemented as a reference architecture that demonstrates the integration of heterogeneous AI techniques with a clear separation of concerns. (2) Empirical evidence from an industrial evaluation with 12 evaluators across five scenarios spanning multiple engineering domains (electrochemical, electromagnetic, and mechanical systems), demonstrating a 206% improvement in output quality (mean: 2.60 vs 0.85 on a 0-3 scale), with the system consistently achieving expert-level ratings (Mode=3) versus the baseline's poor performance (Mode=0 in 4/5 scenarios), and superior code quality with lower variance (SD: 0.29-0.58 vs 0.39-1.11).

To capture the expert knowledge, semi-structured interviews were first conducted with two specialists, a simulation analysis software expert and a data visualisation expert, to pinpoint current challenges and actionable rules for improved visualisation creation. Each interview, lasting between 60 and 90 minutes, was performed individually to mitigate groupthink and ensure focused insight into expert workflows and decision-making processes.
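The paper does not reproduce its implementation here, but the described components can be pictured as a simple prompt-assembly pipeline. The sketch below is purely illustrative: the class and function names, and the idea of flattening retrieved examples and codified rules into a single generation prompt, are assumptions rather than the authors' code.

```python
# Hypothetical sketch of the described pipeline: classify the request,
# retrieve domain-specific code examples, apply codified expert rules,
# then ask the LLM to generate visualization code. All names are
# illustrative, not taken from the paper.
from dataclasses import dataclass
from typing import Callable


@dataclass
class VisualizationRequest:
    prompt: str           # the non-expert's natural-language request
    dataset_schema: dict  # column names and types of the simulation output


def build_generation_prompt(
    request: VisualizationRequest,
    classify: Callable[[str], str],             # request classifier
    retrieve: Callable[[str, int], list[str]],  # RAG over domain code snippets
    expert_rules: dict[str, list[str]],         # codified rules per request type
) -> str:
    request_type = classify(request.prompt)     # e.g. "trade-off analysis"
    examples = retrieve(request.prompt, 3)      # top-k relevant code snippets
    rules = expert_rules.get(request_type, [])  # applicable expert rules
    return "\n\n".join([
        f"User request: {request.prompt}",
        f"Dataset schema: {request.dataset_schema}",
        "Apply these expert visualization rules:\n- " + "\n- ".join(rules),
        "Reference examples:\n" + "\n---\n".join(examples),
        "Generate Python plotting code that satisfies the rules above.",
    ])
```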

The study pioneered a novel software engineering framework designed to capture tacit expert knowledge and embed it within an LLM-based system. Researchers engineered a request classifier to accurately interpret user needs, coupled with a Retrieval-Augmented Generation (RAG) system for autonomous code generation. This system was further augmented with codified expert rules and established visualisation design principles, unifying them to facilitate autonomous, reactive, proactive, and social behaviour. The team meticulously documented 8000 primitive rules and 6000 compositional rules across five engineering domains, demonstrating a commitment to comprehensive knowledge capture.
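How primitive and compositional rules might be codified is not specified in this summary; one plausible, purely illustrative encoding is shown below. The rule texts are generic visualization advice, not entries from the authors' rule base.

```python
# Illustrative encoding of primitive and compositional expert rules.
from dataclasses import dataclass, field


@dataclass
class PrimitiveRule:
    condition: str  # when the rule applies, e.g. "comparing two objectives"
    action: str     # what the visualization should do


@dataclass
class CompositionalRule:
    name: str
    parts: list[PrimitiveRule] = field(default_factory=list)

    def render(self) -> str:
        """Flatten the composed rule into prompt-ready text."""
        return "; ".join(f"if {p.condition} then {p.action}" for p in self.parts)


pareto_rule = CompositionalRule(
    name="trade-off exploration",
    parts=[
        PrimitiveRule("two competing objectives are requested",
                      "use a scatter plot with one objective per axis"),
        PrimitiveRule("a Pareto front exists",
                      "highlight non-dominated points in a distinct colour"),
    ],
)
print(pareto_rule.render())
```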

Experiments employed five distinct scenarios spanning multiple engineering domains, utilising 12 evaluators to assess the system’s performance. The approach achieves a remarkable 206% improvement in output quality compared to baseline models, consistently attaining expert-level ratings across all scenarios. Furthermore, the system maintained superior code quality, exhibiting lower variance with standard deviations ranging from 0.29 to 0.58, versus the baseline’s 0.39 to 1.11. This framework successfully addresses expert bottlenecks by empowering non-experts to generate expert-level visualisations through simple prompts, effectively democratising domain knowledge and freeing up specialists for more complex tasks. The research validated that the developed system operates with autonomy, responding to user requests, proactively applying expert rules, and engaging in natural language interaction, characteristics central to the AI agent paradigm. This work demonstrates that non-experts can achieve expert-level outcomes in specialised domains, signifying a substantial advancement in the accessibility and scalability of complex data analysis.
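For readers who want to check the arithmetic behind figures like the 206% gain, the snippet below recomputes mean, mode, standard deviation, and relative improvement on the paper's 0-3 scale; the ratings used here are made-up placeholders, not the study's raw data.

```python
# Summary statistics of the kind reported in the evaluation.
from statistics import mean, mode, stdev

baseline_ratings = [0, 1, 0, 2, 1, 0]  # placeholder values on the 0-3 scale
agent_ratings = [3, 2, 3, 3, 2, 3]     # placeholder values on the 0-3 scale

# Relative improvement of the agent's mean rating over the baseline's.
# With the paper's reported means (2.60 vs 0.85) this formula gives ~206%.
improvement = (mean(agent_ratings) - mean(baseline_ratings)) / mean(baseline_ratings) * 100

print(f"baseline mean={mean(baseline_ratings):.2f}, "
      f"mode={mode(baseline_ratings)}, sd={stdev(baseline_ratings):.2f}")
print(f"agent    mean={mean(agent_ratings):.2f}, "
      f"mode={mode(agent_ratings)}, sd={stdev(agent_ratings):.2f}")
print(f"relative improvement: {improvement:.0f}%")
```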

Expert knowledge embedded within an AI agent unlocks expert-level results for non-experts

Scientists have developed a novel framework to capture and embed human domain knowledge into artificial intelligence systems, addressing the common problem of limited expert availability within organisations. Researchers demonstrated this through an industrial case study focused on simulation data visualisation, creating an AI agent capable of generating expert-level outputs from simple prompts provided by non-experts. A key aspect of the framework’s success was a physics-agnostic design pattern, enabling the system to be applied across diverse engineering domains (battery, motor, and structural) without requiring retraining. The authors acknowledge that the current evaluation is limited to simulation data visualisation, and further research is needed to assess the framework’s generalizability to other domains and data types. They also suggest that ‘LLM-as-a-Judge’ frameworks could be valuable for rapid regression testing, reducing the need for extensive human evaluation. Ultimately, this research offers a significant advancement in democratising access to specialised expertise, potentially enabling more efficient data analysis and informed decision-making across various industries.
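The ‘LLM-as-a-Judge’ idea mentioned for regression testing could, in a minimal form, look like the sketch below; the rubric wording and the call_llm helper are assumptions for illustration, not details from the paper.

```python
# Minimal LLM-as-a-Judge regression check: score newly generated
# visualization code against a rubric and fail if quality regresses.
# `call_llm` stands in for whatever LLM client is actually in use.
from typing import Callable

RUBRIC = (
    "Rate the following plotting code from 0 (unusable) to 3 (expert-level) "
    "for technical accuracy, visual clarity, and analytical insight. "
    "Reply with a single integer."
)


def judge_output(code: str, call_llm: Callable[[str], str]) -> int:
    """Ask the judge model for a 0-3 score of the generated code."""
    reply = call_llm(f"{RUBRIC}\n\n```python\n{code}\n```")
    return int(reply.strip())


def regression_test(code: str, call_llm: Callable[[str], str], threshold: int = 2) -> bool:
    """Return True if generated code still meets the minimum quality bar."""
    return judge_output(code, call_llm) >= threshold
```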

👉 More information
🗞 How to Build AI Agents by Augmenting LLMs with Codified Human Expert Domain Knowledge? A Software Engineering Framework
🧠 ArXiv: https://arxiv.org/abs/2601.15153