By Abhijit Ahaskar & Uday Bhaskarwar
The proliferation of AI models and agents has left enterprises spoilt for choice, while also exposing critical gaps in their data infrastructure. Enterprises are increasingly realizing that the success of GenAI and agentic AI is inextricably linked to the quality of the underlying data, making robust data governance and oversight essential prerequisites.
In an interaction with CXOtoday, Srikant Gokulnatha, Senior Vice President, Oracle AI Data Platform, Analytics, and Analytical Applications Products, Oracle, talks about the critical role data plays in agentic AI’s success and what enterprises need to do to solve the data fragmentation issue. He also points out that enterprises should not lock themselves in with any vendor or training method, as the AI model and agent ecosystem is evolving rapidly. Edited excerpts:
Q. What are some of the essential data quality standards that you recommend to companies to get the best value out of GenAI and agentic AI?
The quality of your data often determines the success of your AI application’s agentic capabilities. The challenge is that, unlike classical analytics implementations, where you have the luxury of bringing your data into a warehouse for regular loading, cleansing, and transformation, agentic applications involve data that moves at velocity.
You may have data coming out of the warehouse, but you may also have unstructured data or real-time transaction data flowing in simultaneously. The standard data quality techniques still apply, but we are also seeing the use of LLMs to gauge data quality and enrich the data. We are still in a learning phase. However, as we do more AI deployments, we are discovering new techniques to ensure the data is of sufficient quality to deliver robust outcomes.
Q. With growing AI adoption, the volume of AI-generated data is also growing. Gartner recently warned that future LLMs will increasingly train on the output of previous AI models, raising the risk of model collapse, where the quality of AI output progressively deteriorates. Is this something that enterprises should be worried about?
In general, it is a concern because many of the solutions we are building use the power of LLMs. However, our solutions apply that power to real data that sits within an enterprise, so it is less of a concern for the applications we drive. If you are trying to build a generic app based on public data that was synthetically generated using LLMs, then model collapse is a real concern. For business applications, it is not, because the data is internal enterprise data.
Q. Many enterprises struggle with fragmented data stored in silos. How is Oracle’s integrated platform simplifying this complexity and helping CXOs make better decisions in real time?
Our answer to that is allowing our customers to use an enterprise lakehouse approach, which inherently recognizes that data resides all over the enterprise and could be coming from various systems. It could be Oracle, it could be non-Oracle, it could be relational, it could be unstructured, it could be historical, it could be real-time, it could be images, or it could be video.
So, data can come in all kinds of formats. A lakehouse approach is valid for those solutions because you want to bring together data from many places, in many shapes and modalities. The data part is often the hardest part of building an AI application. We are able to bring data from all those different sources and have it managed centrally.
Q. Is data fragmentation a bigger problem in India as compared to other regions?
I don’t think so. It is a fairly general-purpose problem and an attribute of how businesses grow. If you are starting a new business and you have to define everything, you can take a much more centralized data approach. Companies grow organically. Often, various departments do different things or acquire new types of businesses. This complexity is inherent to how business models evolve. In some ways, India may have leapfrogged some legacy challenges. When I look at banks in the US, for instance, they are often running mission-critical databases built 30 years ago. These systems still do their job, so there is a reluctance to migrate from them, even as newer systems are being introduced alongside them.
Q. Some hyperscalers are offering enterprises the option to train frontier models directly on proprietary data. In your view, does this early domain pre-training offer a significant performance upgrade, or does fine-tuning remain the best way to get the most out of a frontier model for enterprise customers?
Right now, I see more fine-tuning. There are so many new techniques emerging. For instance, there is the notion of relational foundation models, which are pre-trained on large volumes of data and can predict patterns and produce forecasts without building a dedicated machine learning (ML) model. So, various forms of models are actively being researched and built.
We obviously look at all of those as well. But I don’t see the pre-training part as becoming very common yet.
The other challenge is that these models are evolving so fast that, for a particular agent or application, a model you picked six months ago may no longer be the right choice, and the best option is often not from the same vendor. We don’t restrict customers to a particular model, and some of the agents we are building use two or three different models from different vendors. You don’t want to lock yourself into a pattern, because the capabilities of these models have changed dramatically in the space of three or four months.
Q. Whatever is happening in the AI space is a continuum where models will constantly iterate and improve. But if this is a gradual transition, how do you explain the sudden volatility we saw a few weeks ago?
I am not too worried about that, since we don’t control that. What we do know is that for our customers, there are three things that we bring to the table that are very valuable.
First, much of the world’s data resides in Oracle systems. We make it easy for you to utilize that data without moving it, because moving your data leads to high cost-of-ownership problems and brittle data pipelines. We let you leverage all of your Oracle data very effectively.
Second, our systems carry a lot of business context. When you build and deploy one of our systems, you configure it in a particular way, putting a lot of knowledge into constructing and configuring it for specific business processes.
We are able to automatically gauge what that context is and use it to inform the models. This notion of a context graph has become a really popular one, because it is what lets AI agents operate more effectively. Third, once you have the insights or actions that the agents want to perform, you have to execute them in the context of the business processes, and most of those business processes live within an Oracle system.
Q. As AI begins to make more autonomous decisions in Oracle Fusion applications, how do you ensure oversight and trust in AI-driven outcomes?
That actually exists in multiple layers. Within OCI itself, for all of these models, regardless of who the vendor is, there is a set of built-in guardrails as well as a set of guardrails that are configurable by the customer. For instance, if you’re using the Oracle AI Data Platform and restricting the set of models available to end users or developers, you can define, as part of that configuration process, the guardrails that apply to each of those models.
At the agent level, you have to apply various techniques, such as using an LLM as a judge to check for hallucinations or biases. But I still see many customers keeping a human in the loop for the most critical processes. Some actions are fairly automated, but there is often a threshold, such as a dollar amount or a severity level, below which things are automated without much oversight because they are considered low risk.