Anthropic released Claude Sonnet 4.6, offering Opus 4.6-level performance in coding and agentic workflows at a more accessible Sonnet price and speed. It excels in agentic computer use and coding, as shown by top-tier scores on SWE-bench Verified (79.6%), BrowseComp (74.7%), and OSWorld-Verified (72.5%). It's SOTA at professional AI office tasks, scoring 1633 on the GDPval-AA Elo benchmark of real-world computer tasks. Sonnet 4.6 also features a 1-million-token context window with improved reasoning across the longer context.
Within Claude Code and developer environments, Claude Sonnet 4.6 now supports adaptive thinking, extended thinking, and context compaction, along with improvements to tool use. In addition, Claude is now integrated into PowerPoint for Pro users, extending Claude further into desktop applications. Bottom line: this positions Claude Sonnet 4.6 as a powerful top-tier AI model for use in Claude Code, Claude Cowork, and other agentic AI applications.
Claude Sonnet 4.6 is available on all Claude plans, including the free tier, while API pricing is $3/$15 per million input/output tokens.
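For developers, here is a minimal sketch of calling the new model through Anthropic's Python SDK with extended thinking enabled; the model identifier `claude-sonnet-4-6` is an assumption based on Anthropic's usual naming, not a confirmed string.

```python
# Minimal sketch: Claude Sonnet 4.6 via the Anthropic Python SDK.
# Assumes ANTHROPIC_API_KEY is set in the environment; the model id
# "claude-sonnet-4-6" is an assumption based on Anthropic's naming.
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",  # assumed model identifier
    max_tokens=4096,
    # Extended thinking: grant a budget of reasoning tokens the model
    # can spend before writing its final answer.
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {"role": "user", "content": "Refactor this recursive function to be iterative."}
    ],
)

# The reply interleaves "thinking" and "text" content blocks; print the text.
for block in response.content:
    if block.type == "text":
        print(block.text)
```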
Google debuted Gemini 3.1 Pro, a more powerful version of its flagship model, with improved abstract reasoning, coding, agentic, and multimodal capabilities. Designed for complex reasoning tasks, Gemini 3.1 Pro shows dramatic gains on reasoning benchmarks: 77.1% on ARC AGI 2 (intelligence), 44.4% on Humanity's Last Exam without tools, and 94.3% on GPQA Diamond (science knowledge). On coding tasks, it scores 80.6% on SWE-bench Verified and a SOTA 2887 on LiveBenchPro. Overall, it matches Opus 4.6 on coding.
Gemini 3.1 Pro offers improved adjustable reasoning with three selectable reasoning levels. The recently released Gemini 3 “Deep Think” model was already using Gemini 3.1 Pro in “high” reasoning mode as its underlying engine.
Figure 2. Gemini 3.1 Pro excels in raw intelligence and reasoning, long-context reasoning, and multi-modal understanding and visual generation. It’s also competitive with Opus 4.6 on coding but falls short on GDPval-AA Elo (where Claude Sonnet 4.6 excels).
Gemini 3.1 Pro also excels at visual generation, including complex, animated SVGs, and users are reporting great results with Gemini 3.1 Pro for building websites and visual applications in Gemini canvas. Combining visual intelligence and reasoning makes it useful for 3D graphics, physics simulations, and 3D CAD drawing. Gemini 3.1 Pro is rolling out in preview across Google’s ecosystem: AI Studio, the Gemini app, Vertex AI APIs, and NotebookLM.
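As a sketch of trying the preview programmatically via the google-genai Python SDK: the model name `gemini-3.1-pro-preview` and the exact reasoning-level value are assumptions for illustration, not confirmed identifiers.

```python
# Minimal sketch: Gemini 3.1 Pro via the google-genai SDK.
# Assumes GEMINI_API_KEY is set; the model name and the "high"
# thinking level value are assumptions for illustration.
from google import genai
from google.genai import types

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # assumed preview model name
    contents="Generate an animated SVG of a bouncing ball.",
    config=types.GenerateContentConfig(
        # Adjustable reasoning: select one of the model's reasoning levels.
        thinking_config=types.ThinkingConfig(thinking_level="high"),
    ),
)
print(response.text)
```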
Google added Lyria 3 generative music to Gemini. The Lyria 3 music generation model, now in the Gemini app, lets users generate 30-second high-fidelity music tracks from text prompts. It functions as Google’s answer to Suno, supporting remixes and genre-specific music creation, and is available to all Gemini app users 18 and older.
Alibaba’s Qwen team released Qwen 3.5, an open-weight, sparse mixture-of-experts (MoE) AI model designed for multimodal understanding and generation. The first Qwen 3.5 model is Qwen3.5-397B-A17B, an MoE architecture with 397B total parameters but just 17B active per token (only about 4% of the weights are used in each forward pass). It offers native text and image capabilities and can extend its context to 1 million tokens. It is competitive with frontier AI models (such as GPT-5.2) on coding and agentic benchmarks, scoring 76.4% on SWE-Bench and 78.6% on BrowseComp.
Qwen3.5 is built for efficiency, combining a sparse MoE design, a hybrid attention scheme (Gated DeltaNet plus Gated Attention), and multi-token prediction, enabling it to decode 8 times faster than Qwen 3 Max while achieving higher performance. This efficiency makes Qwen 3.5 a price-performance king, providing frontier-model performance at only $0.40/$2.40 per million input/output tokens (on OpenRouter). The Qwen3.5 open-weight model is available on Hugging Face, in Qwen’s chat interface, and via API.
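Since OpenRouter exposes an OpenAI-compatible endpoint, a minimal sketch of querying Qwen 3.5 through it looks like the following; the model slug `qwen/qwen3.5-397b-a17b` is an assumption based on OpenRouter's naming pattern.

```python
# Minimal sketch: Qwen 3.5 through OpenRouter's OpenAI-compatible API.
# Assumes OPENROUTER_API_KEY is set; the model slug is an assumption.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="qwen/qwen3.5-397b-a17b",  # assumed OpenRouter model slug
    messages=[{"role": "user", "content": "Summarize this diff in one line."}],
)
print(response.choices[0].message.content)
```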
xAI launched Grok 4.20 with minimal fanfare on X. Grok 4.20 uses a unique “Council of Four” architecture in which four specialized agents (Harper, Benjamin, Lucas, and a coordinator) debate and reach a consensus before answering. Each agent focuses on a specific area, such as logic, creativity, or fact-checking, to produce a more refined final answer. Early reviews call it a downgrade, and few details are available beyond this.
Google Labs announced a new AI marketing feature in Pomelli called Photoshoot, which turns simple product photos into professional studio-quality images. Users provide a product image and brand context and choose image templates; the tool then automatically applies these to create polished, on-brand visuals. Pomelli also adds improved image generation, editing capabilities, and campaign controls. These AI marketing tools can help businesses create consistent, high-quality marketing materials. You can try Pomelli in Google Labs.
Anthropic has launched a security review feature for Claude that can scan software codebases for vulnerabilities and suggest patches. Anthropic believes embedded vulnerability scanning will become essential as software development is increasingly automated with AI coding.
YouTube is bringing its mobile conversational AI to Smart TVs, allowing viewers to ask questions about the videos they are watching. The tool uses voice-driven interactions to provide summaries and content recommendations without interrupting the playback experience.
Reddit is testing an AI-powered shopping search tool designed to synthesize community reviews and product recommendations into concise product summaries for users. Reddit seeks to monetize its data by providing a more direct path from community discovery to product purchase.
Zyphra released a brain-computer interface (BCI) model called ZUNA, an open-source 380M-parameter model designed to translate thought into text in real time. ZUNA reconstructs clinical-grade brain activity from EEG signals, and it can enhance noisy inputs, including from inexpensive EEG headsets, to professional-level signal quality without retraining. This is a small but exciting step toward integrating human thoughts, via neural input, as an interface to AI systems.
The authors of “SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks” developed a new benchmark, SkillsBench, to evaluate how well agent Skills perform across various task domains. Their research shows that even advanced agentic AI systems perform significantly better when guided by explicit human procedural instructions: curated Skills outperform an AI model’s self-generated Skills. It also shows that Skills extend AI performance in a context-dependent way and that focused Skills are most useful.
First Proof is a set of ten math questions used to evaluate AI capabilities in autonomous mathematical research. OpenAI published its First Proof submissions, demonstrating the ability of GPT-5-level models to derive new results in theoretical physics and advanced mathematics. They claim:
We believe at least five of the model’s proof attempts (problems 4, 5, 6, 9, and 10) have a high chance of being correct.
As with AI models achieving gold medal-level performance at the 2025 International Math Olympiad, this is a sign that AI can solve challenges in fundamental scientific discovery.
OpenAI hired OpenClaw founder Peter Steinberger and will continue supporting OpenClaw, the viral AI platform Steinberger vibe-coded, which will remain an open-source AI agent framework. OpenAI’s Sam Altman said that Steinberger will “drive the next generation of personal agents” and OpenClaw will “live in a foundation as an open source project that OpenAI will continue to support.” OpenAI also confirmed that ChatGPT Plus/Pro subscriptions are compatible with OpenClaw deployment.
Georgi Gerganov’s GGML team has officially joined Hugging Face to secure long-term resources for the development of local AI infrastructure. While the team behind llama.cpp and ggml moves to Hugging Face, their core projects like llama.cpp will remain open-source and autonomous, ensuring that the community continues to have access to efficient, hardware-agnostic AI runners.
Major Hollywood studios and SAG-AFTRA issued strong condemnations against ByteDance’s Seedance 2.0 for its unauthorized use of actor likenesses and studio IP. In response to the backlash, ByteDance announced it would implement stricter safeguards to prevent the infringement of intellectual property.
World leaders and tech executives convened in New Delhi for the India AI Impact Summit 2026 this week, with discussions on human-centric AI governance, economic opportunities from AI, and digital development worldwide. The Indian government used the summit to tout India’s potential as an AI hub. Eighty-six countries and two international bodies signed a joint declaration on AI at the New Delhi summit, recognizing their collective responsibility for ethical AI deployment.
Figure 4. India’s PM Modi led tech leaders in a come-together moment at the AI Impact Summit, but OpenAI’s Sam Altman and Anthropic’s Dario Amodei didn’t hold hands.
Speaking in India, Mistral AI CEO Arthur Mensch predicted that 50% of current enterprise SaaS software could be replaced by AI in the coming years. Mensch argued that the ability of AI agents to build custom applications in days will fundamentally disrupt the $800 billion traditional software market.
Nvidia released new data showing that AI is accelerating the telecommunications industry’s transformation by becoming the foundation of network automation. The report suggests that AI-native wireless infrastructure is now a primary driver for ROI, unlocking new revenue streams through automated network management.
OpenAI CEO Sam Altman said something both true and arrogant in an “AI is better than people” way. Yes, it takes a lot of energy to raise a child. This fact won’t placate people worried that AI data centers will raise their electric bills.
“People talk about how much energy it takes to train an AI model … But it also takes a lot of energy to train a human. It takes like 20 years of life and all of the food you eat during that time before you get smart.” – Sam Altman