{"id":28155,"date":"2026-05-05T15:51:08","date_gmt":"2026-05-05T15:51:08","guid":{"rendered":"https:\/\/www.europesays.com\/ai\/28155\/"},"modified":"2026-05-05T15:51:08","modified_gmt":"2026-05-05T15:51:08","slug":"causal-dynamics-lab-launches-cielara-code-for-ai-coding","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ai\/28155\/","title":{"rendered":"Causal Dynamics Lab launches Cielara Code for AI coding"},"content":{"rendered":"<p>Causal Dynamics Lab has launched Cielara Code, a tool it says outperformed Anthropic&#8217;s Claude Code and OpenAI&#8217;s Codex in code localisation tests.<\/p>\n<p>The research focused on a common weakness in AI coding agents: finding the right files to change. An analysis of thousands of coding sessions found that 56.8% of agent actions involved reading files and 24.2% involved using grep, while less than 1% involved actual code edits.<\/p>\n<p>That pattern suggests a search problem rather than a code-generation problem. The findings also showed that agents became less reliable as tasks spanned more files, and that failed attempts used four times as much computing power as successful runs when a correct fix involved more than six files.<\/p>\n<p>Cielara Code is designed to address that issue by building what Causal Dynamics Lab describes as a code dependency causal graph before an agent begins exploring a codebase. The graph maps relationships across software components so the agent can navigate structure and dependencies instead of moving through files one by one.<\/p>\n<p>The system models a production environment in six layers: what code does, why it was created, who owns it, its constraints, where it runs and what happens at runtime. 
According to the company, this approach can also trace a failure back to a specific code change, the developer who approved it and the reason for the change.<\/p>\n<p>In benchmark results cited by Causal Dynamics Lab, Cielara Code recorded overall localisation accuracy of 0.774, compared with 0.738 for Claude Code and 0.707 for Codex. On MULocBench, which covers 1,033 issues across 46 repositories, it posted recall@5 of 0.752 versus 0.727 for Claude Code and cut mean task time to 128.62 seconds from 141.84 seconds.<\/p>\n<p>The approach also reduced compute cost per task by 30% to 40%, according to the company. The benchmarks were run against Claude Code (Opus-4.6) and OpenAI Codex (GPT-5.4) using public test harnesses including MULocBench, UltraDomain, LoCoMo and LongMemEval.<\/p>\n<p><img decoding=\"async\" alt=\"\" src=\"https:\/\/www.europesays.com\/ai\/wp-content\/uploads\/2026\/05\/Built_for_Coding.webp\" style=\"width: 100%; height: 100%;\"\/><\/p>\n<p>Chief Executive Officer Hasibul Haque set out the company&#8217;s case for a different approach to software search.<\/p>\n<p>&#8220;Every coding agent out there today uses grep, which is like a surgeon operating without imaging,&#8221; Haque said. &#8220;We created Cielara Code to help agents see better: it provides a clear understanding of the working environment, making the reasons behind each change clear and verifiable.&#8221;<\/p>\n<p>Verification gap<\/p>\n<p>Causal Dynamics Lab positioned the launch against wider concerns about reliability in AI-assisted software development. 
It cited the 2025 DORA report, which found that use of AI coding tools was associated with a 7.2% drop in deployment stability, and pointed to what AWS Chief Technology Officer Werner Vogels has described as &#8220;dynamic verification debt&#8221;.<\/p>\n<p>The company argued that current coding agents often treat code as flat text and do not show how files connect, how functions call one another or how a change affects the wider system. It said this becomes a bigger problem in production environments, where engineering teams need to understand not only what changed but also why the change was made and what knock-on effects might follow.<\/p>\n<p>The launch also introduces REASONARA, a graph-structured memory layer used by Cielara Code. According to Causal Dynamics Lab, REASONARA stores more than 125 million tokens of context while retrieving only the material relevant to a given query.<\/p>\n<p>A typical lookup uses between 1,000 and 2,500 tokens, compared with 23,000 to 115,000 tokens for full-context approaches, the company said. It reported benchmark scores of 94% on UltraDomain, 92% on LoCoMo, 73% on LoCoMo-plus and 87.4% on LongMemEval, and said the system ran five to eight times faster than Codex high-reasoning mode.<\/p>\n<p>Causal Dynamics Lab said 11 Fortune 100 companies and more than 40 Fortune 500 companies are using Cielara Code on their codebases. It described the product as a safety layer for AI coding agents rather than a replacement for them.<\/p>\n<p>Security leaders are among those weighing the governance implications of more autonomous development tools.<\/p>\n<p>&#8220;Board members and auditors expect more proactive risk management. 
Leaders now want proof that security can anticipate risks caused by fast-moving AI and automation, instead of just reacting after incidents,&#8221; said the CISO of one of the largest law firms in the United States, who is also a Cielara Code customer.<\/p>\n<p>Phillip Miller, Vice President and Global Chief Information Security Officer at H&amp;R Block, said the problem had moved beyond what teams could manage manually.<\/p>\n<p>&#8220;Enterprises need solutions to problems they cannot solve with people alone. Cielara&#8217;s technology is a generational leap towards the original promise of AI: tackling complexity 7&#215;24 with acquired knowledge, deep reasoning, and unbeatable accuracy. For engineering teams, this means a single engine to discover faults in real-world deployments, including legacy and cloud, and provide clear resolution steps. When I wrote Hacking Success, I described a world where AI needs strong, directive policy, not rules or guardrails, to be safe and effective. Information security lags behind the innovation curve, as most options rely on legacy thinking including posture, gateways, and logging. Enterprises now have an option to leverage Cielara&#8217;s models to oversee deployments of AI agents, models, and their supporting infrastructure,&#8221; Miller said.<\/p>\n<p>The company was founded by former Uber platform engineers and researchers with backgrounds including Microsoft Research and Emory University. Haque previously led platform engineering at Uber, while Chief Technology Officer Ryan Turner was a staff engineer at Uber and worked on the SPIRE Project within the Cloud Native Computing Foundation. 
Research and development is led by Dr Xuchao Zhang and Dr Liang Zhao, and the company has a formal research partnership with Emory&#8217;s AI Lab.<\/p>\n<p>Matt Fisher, former co-founder and chief technology officer of Daydream and an adjunct professor at Brown University, linked the company&#8217;s work to a broader shift in how AI systems are used.<\/p>\n<p>&#8220;AI has already changed how people find information. The next step is to change how people make decisions by exploring possibilities, comparing options, and understanding the outcomes before making a choice,&#8221; Fisher said. &#8220;That shift towards exploring outcomes is what CDL is focusing on.&#8221;<\/p>\n","protected":false},"excerpt":{"rendered":"Causal Dynamics Lab has launched Cielara Code, a tool it says outperformed Anthropic&#8217;s Claude Code and OpenAI&#8217;s Codex&hellip;\n","protected":false},"author":2,"featured_media":28156,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[7903,8923,4989,53,3154,8918,10348,1237,10140,1929,182,14006,313,8763,7897,14654,5282,8526,10583,2112,641,424,5411],"class_list":{"0":"post-28155","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-anthropic","8":"tag-ai-agents-agentic-ai","9":"tag-ai-ethics-governance","10":"tag-ai-safety","11":"tag-anthropic","12":"tag-anthropic-claude","13":"tag-application-security","14":"tag-appsec","15":"tag-audit","16":"tag-benchmarking","17":"tag-change-management","18":"tag-claude","19":"tag-cloud-native-computing-foundation","20":"tag-cybersecurity","21":"tag-devops","22":"tag-enterprise-resource-planning-erp","23":"tag-fortune-100","24":"tag-fortune-500","25":"tag-large-language-models-llms","26":"tag-risk-compliance","27":"tag-risk-management","28":"tag-software-development","29":"tag-software-engineering","30":"tag-uber"},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/a
i\/wp-json\/wp\/v2\/posts\/28155","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/comments?post=28155"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts\/28155\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/media\/28156"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/media?parent=28155"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/categories?post=28155"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/tags?post=28155"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}