{"id":113818,"date":"2025-05-19T07:56:12","date_gmt":"2025-05-19T07:56:12","guid":{"rendered":"https:\/\/www.europesays.com\/uk\/113818\/"},"modified":"2025-05-19T07:56:12","modified_gmt":"2025-05-19T07:56:12","slug":"the-new-markets-for-ai-data","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/uk\/113818\/","title":{"rendered":"The new markets for AI data"},"content":{"rendered":"<p>The writer is the global co-head of investment banking at Goldman Sachs.<\/p>\n<p>Data is the foundation of the artificial intelligence revolution, but AI is also revolutionising the market for data. Developers are racing to invest billions of dollars to build the infrastructure to power vast AI systems. That rapid expansion has led to a surge in demand for data, creating the potential for companies to generate significant economic value.<\/p>\n<p>AI systems are typically described as having three main components \u2014 power, compute and data. These refer to the electricity required to power data centres, the chips needed to conduct computations at mind-boggling speeds, and the data necessary to train AI models. Of these critical components, it is data that is least discussed, perhaps because data centres and <a href=\"https:\/\/www.ft.com\/semiconductors\" data-trackable=\"link\" target=\"_blank\" rel=\"noopener\">semiconductors<\/a> are physical things you can see and touch. (It\u2019s admittedly difficult to hold up a data packet during an onstage keynote.)<\/p>\n<p>But sourcing data is an essential aspect of the rapidly expanding AI ecosystem. 
According to some estimates, the world is running out of \u201corganic\u201d <a href=\"https:\/\/www.ft.com\/big-data\" data-trackable=\"link\" target=\"_blank\" rel=\"noopener\">data<\/a>, with model developers reaching the limits of publicly available data \u2014 essentially copies of the entire internet \u2014 to pre-train ever-bigger models.<\/p>\n<p>After AI models are constructed and pre-trained on huge data sets, they still require additional \u201ctest time compute\u201d where a model is asked to answer specific questions or solve problems. This requires the right kind of data, which is sometimes lacking.<\/p>\n<p>There is too little training data in which humans \u201cshow their work\u201d, step by step, as they address complex problems. This is where companies with focused, well-organised, or highly logical data sets can become newly relevant. Imagine how a textbook company might use its archives of technical manuals and coursework to train an AI system to do complex scientific processes.<\/p>\n<p>Recent data licensing deals show how different companies are selling access to their data to AI companies. Expect this trend to accelerate as companies get even more creative in doing so. So far, these deals have been negotiated individually with special terms, but you can imagine a marketplace \u2014 or multiple markets \u2014 for training data emerging.<\/p>\n<p>Synthetic data, or data created at least in part by AI systems, is a critical part of the development of large language models and has emerged as one path for expanding the set of options for developers looking for new data sets.<\/p>\n<p>For example, as robotic technology becomes more sophisticated, AI systems can increasingly create maps of our physical environment. 
Synthetic data for self-driving might involve setting up a \u201cdigital twin\u201d of Los Angeles and having millions of \u201cmock\u201d vehicles navigate the city in a virtual space as training data.<\/p>\n<p>And it is possible that types of data that have previously been difficult to analyse or use become newly accessible and valuable with the incredible computational power of AI systems. Think about what data we\u2019ve collected about complex systems such as weather, quantum mechanics or viral mutations. As robots can perceive entire categories of data that are imperceptible to humans, collections of video and spatial data may also suddenly have a newfound value.<\/p>\n<p>Tesla uses the data collected by its fleet of autonomous driving vehicles to train the AI models that power its underlying self-driving technology. And Nvidia recently announced an expansion of its robot simulation environment, where it trains its robots in a virtual, digital representation of the physical world.<\/p>\n<p>One of the most valuable repositories of data is human-generated data that remains locked away \u2014 proprietary research behind corporate and government firewalls. Today, the holders of this data are reluctant to make it accessible without knowing the implications. 
But the right structures and incentives can invite more deals.<\/p>\n<p>In practical terms, different companies will devise different strategies. Some will treat data as a core business asset, not a byproduct, and work to monetise it through licensing or subscriptions. Others will need to upgrade their data infrastructure to make the best use of future AI capabilities.<\/p>\n<p>How different jurisdictions decide to regulate AI and further regulate data usage will have profound implications for how those markets evolve \u2014 and where. Data privacy and security, along with questions of data provenance, ownership and authentication, are all potential areas for new legislation.<\/p>\n<p>This period of incredible innovation and upheaval offers opportunities for the companies that get their data strategy right.<\/p>\n","protected":false},"excerpt":{"rendered":"Unlock the Editor\u2019s Digest for free Roula Khalaf, Editor of the FT, selects her favourite stories in this&hellip;\n","protected":false},"author":2,"featured_media":113819,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3163],"tags":[323,1942,53,16,15],"class_list":{"0":"post-113818","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-artificial-intelligence","8":"tag-ai","9":"tag-artificial-intelligence","10":"tag-technology","11":"tag-uk","12":"tag-united-kingdom"},"share_on_mastodon":{"url":"https:\/\/pubeurope.com\/@uk\/114533432221486644","error":""},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/113818","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.e
uropesays.com\/uk\/wp-json\/wp\/v2\/comments?post=113818"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/posts\/113818\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media\/113819"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/media?parent=113818"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/categories?post=113818"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/uk\/wp-json\/wp\/v2\/tags?post=113818"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}