{"id":31110,"date":"2026-05-07T15:21:11","date_gmt":"2026-05-07T15:21:11","guid":{"rendered":"https:\/\/www.europesays.com\/ai\/31110\/"},"modified":"2026-05-07T15:21:11","modified_gmt":"2026-05-07T15:21:11","slug":"openai-introduces-websocket-based-execution-mode-to-reduce-latency-in-agentic-workflows","status":"publish","type":"post","link":"https:\/\/www.europesays.com\/ai\/31110\/","title":{"rendered":"OpenAI Introduces Websocket-Based Execution Mode to Reduce Latency in Agentic Workflows"},"content":{"rendered":"<p>OpenAI has introduced a <a href=\"https:\/\/openai.com\/index\/speeding-up-agentic-workflows-with-websockets\/\" rel=\"nofollow noopener\" target=\"_blank\">WebSocket-based execution mode<\/a> for its Responses API to improve the performance of agentic workflows used in coding agents and real-time AI systems. The change replaces the traditional HTTP request-response pattern with a persistent, bidirectional connection between client and server, targeting latency and coordination overhead in multi-step reasoning workflows. According to OpenAI, early production use shows latency reductions of up to 40% and improved throughput in high-concurrency scenarios.<\/p>\n<p>The update addresses a growing bottleneck in agentic systems, where each step in a workflow, such as a tool call, intermediate reasoning, or a follow-up query, previously required a separate HTTP request. 
As inference speeds improved, these repeated network round trips became a dominant source of latency and operational complexity.<\/p>\n<p><img decoding=\"async\" alt=\"\" class=\"zoom-image\" src=\"https:\/\/www.infoq.com\/news\/2026\/05\/openai-websocket-responses-api\/news\/2026\/05\/openai-websocket-responses-api\/en\/resources\/1openaibeforewebsocket-1777845419017.jpeg\" height=\"400\" rel=\"share\"\/><\/p>\n<p>Traditional HTTP Flow (Source: <a href=\"https:\/\/openai.com\/index\/speeding-up-agentic-workflows-with-websockets\/\" rel=\"nofollow noopener\" target=\"_blank\">OpenAI Blog Post<\/a>)<\/p>\n<p>The WebSocket-based execution mode uses a long-lived, bidirectional connection to enable continuous data exchange without repeated handshakes. This supports streaming responses, faster tool execution, and more efficient coordination of multi-step workflows. It aligns with event-driven design patterns in distributed systems, where maintaining state across interactions improves responsiveness and throughput. The change reflects a broader focus on the transport layer in agentic systems, where communication patterns and connection management influence overall performance, as discussed in the InfoQ article <a href=\"https:\/\/www.infoq.com\/articles\/ai-agent-transport-layer\/\" rel=\"nofollow noopener\" target=\"_blank\">AI Agent Transport Layer<\/a>.<\/p>\n<p><a href=\"https:\/\/x.com\/VibeCoderOfek\" rel=\"nofollow\">Ofek Shaked<\/a>, a Vibe Coder, described the change as follows:<\/p>\n<p>&#8220;WebSockets for agent state is such an obvious but huge win. No more cold starts killing your multi-tool chains.&#8221;<\/p>\n<p>OpenAI reported up to 40% latency reduction in early production use, along with sustained throughput of around 1,000 transactions per second and bursts of up to 4,000 TPS. 
These results indicate that transport-level optimizations can significantly affect end-to-end AI system performance alongside model-level improvements.<\/p>\n<p>Gabriel Chua, a DX engineer at OpenAI, <a href=\"https:\/\/www.linkedin.com\/posts\/gabriel-chua_agents-just-got-faster-the-responses-ugcPost-7432065272846655488-oUvX?utm_source=share&amp;utm_medium=member_desktop&amp;rcm=ACoAAArnikgBqzTxA9Y838-O55QUcB2McACIq94\" rel=\"nofollow noopener\" target=\"_blank\">stated<\/a>:<\/p>\n<p>&#8220;You can warm up the connection by sending your system prompt and tool definitions first. It&#8217;s Zero Data Retention (ZDR) compatible.&#8221;<\/p>\n<p>Adoption has been immediate among developer tooling and coding agent platforms. Vercel integrated the WebSocket mode into its AI SDK and <a href=\"https:\/\/x.com\/aisdk\/status\/2026031263925039591\" rel=\"nofollow\">reported<\/a> up to 40% latency reduction. Cline <a href=\"https:\/\/x.com\/cline\/status\/2026031848791630033\" rel=\"nofollow\">observed<\/a> a 39% improvement in multi-file workflows, while Cursor <a href=\"https:\/\/x.com\/leerob\/status\/2026030244407468259\" rel=\"nofollow\">reported<\/a> gains of up to 30%. These results highlight how system-level optimizations outside the model itself are increasingly shaping real-world AI performance.<\/p>\n<p><img decoding=\"async\" alt=\"\" class=\"zoom-image\" src=\"https:\/\/www.infoq.com\/news\/2026\/05\/openai-websocket-responses-api\/news\/2026\/05\/openai-websocket-responses-api\/en\/resources\/1openAIwebsokcet-1777845419017.jpeg\" height=\"400\" rel=\"share\"\/><\/p>\n<p>Agent Workflow Evolution with Persistent Sessions (Source: <a href=\"https:\/\/openai.com\/index\/speeding-up-agentic-workflows-with-websockets\/\" rel=\"nofollow noopener\" target=\"_blank\">OpenAI Blog Post<\/a>)<\/p>\n<p>From an implementation perspective, developers integrate the WebSocket mode by replacing multiple HTTP calls with a single persistent session. 
This reduces repeated connection setup and simplifies orchestration logic across multi-step workflows. It also improves support for streaming use cases such as incremental code generation and interactive reasoning, where partial outputs can be consumed as they are produced.<\/p>\n<p>Kevin Cho, an engineer at Microsoft, <a href=\"https:\/\/x.com\/itayad\/status\/2050171942578078020?s=20\" rel=\"nofollow\">noted<\/a> that the approach reflects:<\/p>\n<p>&#8220;Going back to the original software stack problems. websockets and stateful connections.&#8221;<\/p>\n<p>The shift introduces new system design considerations, including connection lifecycle management, backpressure under high concurrency, and reliability in distributed systems, all of which are familiar from established stateful system patterns.<\/p>\n<p>OpenAI released the feature in alpha to selected partners, including Codex, after a two-month cycle. Codex has since migrated most of its Responses API traffic to WebSocket mode, which suggests the feature is ready for production use.<\/p>\n","protected":false},"excerpt":{"rendered":"OpenAI has introduced a WebSocket-based execution mode for its Responses API to improve the performance of agentic 
workflows&hellip;\n","protected":false},"author":2,"featured_media":31111,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[7],"tags":[4077,24,7398,640,8540,504,25,633,7401,1642,20075,634,157,20071,20073,12633,20072,20074,642,8765],"class_list":{"0":"post-31110","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-openai","8":"tag-agents","9":"tag-ai","10":"tag-ai-architecture","11":"tag-ai-assisted-coding","12":"tag-api","13":"tag-architecture-design","14":"tag-artificial-intelligence","15":"tag-development","16":"tag-distributed-systems","17":"tag-large-language-models","18":"tag-low-latency","19":"tag-ml-data-engineering","20":"tag-openai","21":"tag-openai-websocket-responses-api","22":"tag-optimization","23":"tag-orchestration","24":"tag-realtime-api","25":"tag-sdk","26":"tag-websocket","27":"tag-workflow-foundation"},"_links":{"self":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts\/31110","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/comments?post=31110"}],"version-history":[{"count":0,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/posts\/31110\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/media\/31111"}],"wp:attachment":[{"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/media?parent=31110"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.europesays.com\/ai\/wp-json\/wp\/v2\/categories?post=31110"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.euro
pesays.com\/ai\/wp-json\/wp\/v2\/tags?post=31110"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}