At QCon AI NYC 2025, Aaron Erickson presented agentic AI as an engineering problem, not a prompt-crafting exercise. His central message was that reliability comes from combining probabilistic components with deterministic boundaries.
Erickson argued agentic AI becomes more interesting when it is treated as a layer over real operational systems rather than a replacement for them. The model can interpret questions, retrieve evidence, classify situations, and suggest actions. Deterministic systems execute the actions, enforce the constraints, and provide the telemetry that allows the whole loop to be evaluated.
He described a common trap in natural language to SQL and similar query generation patterns. The first few demos work because the questions are simple and the schema is small. Accuracy falls sharply when the schema is complex and the query space includes many joins, edge cases, or overloaded fields. One mitigation he emphasized was reducing degrees of freedom: flatten the schema, constrain the query forms, and treat expressiveness as a cost that must be paid for with more evaluation and more safeguards.
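One way to read "reducing degrees of freedom" is to stop asking the model for arbitrary SQL and instead let it pick from a small set of vetted, parameterized templates over a flattened schema. The sketch below is illustrative, not from the talk; the template names and schema are hypothetical.

```python
# Constrained alternative to free-form NL-to-SQL: the model may only name a
# template; the SQL text itself is fixed. Template names and the flattened
# "orders_flat" schema are hypothetical.
ALLOWED_TEMPLATES = {
    "orders_by_status": "SELECT order_id, status FROM orders_flat WHERE status = ?",
    "orders_by_customer": "SELECT order_id, total FROM orders_flat WHERE customer_id = ?",
}

def build_query(template_name: str) -> str:
    """Return a vetted SQL string, or raise if the model picked an unknown form."""
    if template_name not in ALLOWED_TEMPLATES:
        raise ValueError(f"unknown query template: {template_name}")
    # Parameters are bound by the database driver, never interpolated, so the
    # query space stays exactly as large as the template set.
    return ALLOWED_TEMPLATES[template_name]
```

Expressiveness can be bought back later, template by template, with each addition paying its way in evaluation coverage.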

He also observed a pragmatic difference between classification and code generation. When the system’s job is to select among a small set of known categories, a model can be very effective. When the system’s job is to invent an arbitrary program over a large search space, error rates climb. That gap becomes a design lever. You can ask the model to classify an intent, then route to a deterministic query template or a bounded tool call.
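That classify-then-route lever can be sketched as a small dispatch table: the model's only job is to name one of a few known intents, and a deterministic handler does the rest. The intent names and handlers here are hypothetical.

```python
# Classify-then-route: the model selects among a small, known set of intents;
# each intent maps to a bounded, deterministic tool call. All names are
# illustrative.
def route(intent: str, record_id: str) -> str:
    handlers = {
        "refund_status": lambda rid: f"refund-lookup:{rid}",
        "shipping_eta": lambda rid: f"eta-lookup:{rid}",
        "escalate": lambda rid: f"ticket-create:{rid}",
    }
    if intent not in handlers:
        # Unknown intents fall to a safe default instead of free generation.
        return f"human-review:{record_id}"
    return handlers[intent](record_id)
```

The error surface shrinks to "did the model pick the right label," which is far cheaper to evaluate than "did the model write a correct program."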
He showed a slide of a sprawling cheesecake menu to make the point that tool choice itself is a reliability problem: "LLMs can suffer from 'paradox of choice'." When too many tools look similar, selection quality degrades, and the model may confidently choose a suboptimal or unsafe path. The engineering implication is that tool catalogs and tool interfaces are part of the product. Tooling should be differentiated, well described, and constrained, he said, or the agent will behave like a user staring at an enormous menu.
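If the catalog is part of the product, it can be linted like one. A minimal sketch, assuming a catalog of name-to-description pairs: flag tool pairs whose descriptions overlap so heavily that a model could confuse them. The Jaccard measure and the 0.6 threshold are illustrative choices, not from the talk.

```python
# Lint a tool catalog for ambiguity: pairs of tools with near-duplicate
# descriptions are exactly the "enormous menu" failure mode. Threshold and
# catalog contents are hypothetical.
def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def ambiguous_pairs(catalog, threshold=0.6):
    names = sorted(catalog)
    return [
        (x, y)
        for i, x in enumerate(names)
        for y in names[i + 1:]
        if jaccard(catalog[x], catalog[y]) >= threshold
    ]

catalog = {
    "get_order": "look up an order by id",
    "get_order_v2": "look up an order by id with extra fields",
    "cancel_order": "cancel a pending order and notify the customer",
}
```

A check like this can run in CI whenever the catalog changes, before an agent ever sees the new tool.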

Erickson then described why role specialization matters. A general purpose agent that “knows a bit about everything” can be useful for routing and summarization, but the system’s correctness depends on purpose built components that do specific tasks with narrow contracts. He described a manager like layer that delegates, but he treated it as orchestration, not as the place where domain logic should live. In his view, the important work is in the specialized agents and deterministic tools that actually touch the underlying systems.
This set up his taxonomy of agent behaviors. One of the most concrete examples was the “Worker Agent” slide showing someone painting spirals on rocks, paired with a prompt to examine large numbers of clusters and flag the ones worth attention. He argued agents can be deployed across thousands of similar records, do the same analysis repeatedly, and store structured outputs for later review.
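The worker-agent pattern he described can be sketched as a fan-out loop: the same bounded analysis runs over many similar records, and structured verdicts accumulate for later review. Here `classify_cluster` is a hypothetical stand-in for the model call, with a fixed rule so the sketch stays deterministic.

```python
# Worker-agent sketch: repeat one narrow analysis across many records and
# persist structured outputs. classify_cluster stands in for an LLM call; the
# 5% threshold and field names are illustrative.
def classify_cluster(cluster: dict) -> dict:
    flagged = cluster["error_rate"] > 0.05
    return {
        "cluster_id": cluster["id"],
        "flagged": flagged,
        "reason": "error rate above 5%" if flagged else "within bounds",
    }

def run_workers(clusters):
    # Reviewers get a structured queue of verdicts, not raw chat transcripts.
    return [classify_cluster(c) for c in clusters]

results = run_workers([
    {"id": "c1", "error_rate": 0.12},
    {"id": "c2", "error_rate": 0.01},
])
```

Because every record gets the same analysis and the same output shape, the results can be sampled, audited, and regression-tested like any other batch job.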
He described additional roles that help control complexity as systems grow. A tool selection agent can be used to reduce ambiguity when there are multiple ways to achieve an outcome. An observer or consulting style agent can monitor interactions between components and flag unsafe communication patterns, policy violations, or quality regressions. A director agent can delegate work across other agents and track progress toward a measurable outcome. The message mirrored classic testing guidance: push as much confidence as possible down into cheap tests and reserve full system runs for validating integration behavior.
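The observer role above lends itself to exactly the cheap checks he recommended: inspect messages passing between components and flag policy violations before any full system run. The policy names and patterns below are illustrative assumptions, not his examples.

```python
import re

# Observer-agent sketch: cheap, deterministic policy checks on inter-component
# messages. Policy names and regexes are hypothetical.
POLICIES = {
    "no_raw_credentials": re.compile(r"password\s*=", re.IGNORECASE),
    "no_prod_deletes": re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
}

def observe(message: str):
    """Return the names of every policy the message violates."""
    return [name for name, pattern in POLICIES.items() if pattern.search(message)]
```

Checks like these are the "cheap tests" of his analogy; the director-level end-to-end runs then validate only integration behavior, not every policy case.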

He also used a simple operational analogy to justify deterministic anchors. He asked whether you reinvent routine operations every time, then answered that you do not: you give operators a deterministic runbook. His argument was that agentic systems should inherit this habit. Where repeatability matters, encode repeatability in tools and runbooks, and let the agent decide when the runbook applies instead of letting the agent invent a new process for every incident.
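That split can be made concrete: the agent's only decision is which runbook matches the incident; the steps themselves are fixed data. The runbook names and incident fields in this sketch are hypothetical, and the matching here is keyed on a label so the example stays deterministic where a real system would classify.

```python
# Runbook sketch: the agent matches an incident to a runbook; the steps are
# deterministic data, never invented per incident. Names are illustrative.
RUNBOOKS = {
    "disk_full": ["alert on-call", "rotate logs", "expand volume"],
    "cert_expiry": ["alert on-call", "renew certificate", "reload service"],
}

def select_runbook(incident: dict):
    # A real system would have an agent classify the incident into one of the
    # known kinds; an unrecognized incident yields no runbook and goes to a human.
    return RUNBOOKS.get(incident.get("kind"))
```

Incidents with no matching runbook fall through to a person, which is the safe failure mode the analogy implies.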
Erickson finally returned to the split between certainty and discovery. Discovery is where agents explore, propose, and surface anomalies. Certainty is where deterministic tools execute bounded operations and enforce policy. He argued the boundary between them is where platform engineering lives: authentication, authorization, auditing, telemetry, and safe degradation.
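The boundary he described can be sketched as an execution gate: an agent may propose anything in discovery, but every proposal passes through authorization and auditing before it touches a real system. The allow-list and audit log below are illustrative stand-ins for real platform services.

```python
# Sketch of the certainty/discovery boundary: agent proposals pass through a
# deterministic authorization-and-audit gate. Action names are hypothetical.
AUTHORIZED_ACTIONS = {"restart_service", "scale_up"}
AUDIT_LOG = []

def execute(action: str, actor: str) -> bool:
    """Run an agent-proposed action only if policy allows it; audit either way."""
    allowed = action in AUTHORIZED_ACTIONS
    AUDIT_LOG.append({"actor": actor, "action": action, "allowed": allowed})
    return allowed
```

Because denied proposals are logged rather than silently dropped, the same telemetry that enforces policy also feeds the evaluation loop he described earlier.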
Developers who want to learn more can watch the full video of the talk, which will be made available on January 15.