TLDR AI 2026-05-07
Claude self-improving agents 🤖, Anthropic SpaceX deal 🚀, ProgramBench launch 💻
Higher usage limits for Claude and a compute deal with SpaceX (3 minute read)
Anthropic increased usage limits for Claude through a new compute partnership with SpaceX, accessing over 220,000 NVIDIA GPUs. This expansion follows deals with Amazon, Google, Broadcom, Microsoft, NVIDIA, and Fluidstack for significant compute capacity. The company also plans international expansion to address compliance needs for enterprise customers in regulated industries.
Claude adds Self-Improving Agents (5 minute read)
Claude Managed Agents launched with features including dreaming, outcomes, and multiagent orchestration. Dreaming improves agents by analyzing past sessions to identify patterns, while outcomes let agents self-correct against predefined success criteria. Multiagent orchestration handles complex tasks by letting agents delegate work to specialized subagents, an approach used by companies like Harvey, Netflix, Spiral by Every, and Wisedocs.
China to Invest in DeepSeek at $50 Billion Valuation (4 minute read)
DeepSeek is in talks to raise money from China's National Artificial Intelligence Industry Investment Fund, a one-year-old government-backed fund with around $8.8 billion in capital. The startup aims to raise a few billion dollars in the new round, which values it at around $50 billion. DeepSeek is a key component in China's plan to have top-class homegrown companies in a range of AI fields. The strategy is a way to hedge against US export controls and to take leadership in bringing AI to the world.
🧠
Deep Dives & Analysis
OpenAI Flips the Script (10 minute read)
OpenAI's Codex now surpasses Anthropic's Claude Code following its integration of GPT-5.5 and improvements to app performance. Austin Tedesco highlights Codex's use in creating strategy documents from diverse sources, while Dan Shipper uses it for recruiting based on career trajectories. Marcus Moretti takes a cautious approach to new AI tools, adopting only those that solve real problems and have proven themselves in reputable use.
How AI agent memory works (28 minute read)
Large language models forget everything the moment they finish replying. Memory systems help them 'remember' things so they can sustain ongoing conversations. Agent memory systems are the part of the loop that carries information forward. This article examines different ideas about what information should be passed along in each iteration of the loop.
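The loop the article describes can be sketched minimally: each turn, the exchange is distilled into a note, and the accumulated notes are re-injected as context on the next turn. `call_llm` and `summarize` below are hypothetical stand-ins, not any real API.

```python
# Minimal sketch of an agent memory loop, under assumed placeholder
# functions: each turn's exchange is compressed into a note that is
# carried forward into the next turn's context.

def call_llm(prompt: str) -> str:
    # Placeholder for a real model call; echoes the prompt for illustration.
    return f"reply to: {prompt}"

def summarize(text: str) -> str:
    # Placeholder: a real system might ask the model itself to compress the turn.
    return text[:60]

def agent_loop(user_turns: list[str]) -> list[str]:
    memory: list[str] = []           # notes carried across turns
    replies = []
    for turn in user_turns:
        context = "\n".join(memory)  # memory is re-injected every turn
        reply = call_llm(f"{context}\n{turn}")
        replies.append(reply)
        memory.append(summarize(f"user: {turn} | agent: {reply}"))
    return replies
```

The design question the article explores is exactly what `summarize` should keep: raw transcripts, distilled facts, or task-specific state.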
👨‍💻
Engineering & Research
Four levers to specialize your AI agents (Sponsor)
General-purpose AI agents fail in specialized domains: subtly wrong in edge cases. Domain specialization fixes this.
Build AI agents with four levers: system prompt, knowledge corpus, tool selection, guardrails. Demonstrated across customer engagement, logistics, and voice on AWS.
Workshop + guide.
TokenSpeed: A Speed-of-Light LLM Inference Engine for Agentic Workloads (5 minute read)
TokenSpeed, a high-performance LLM inference engine, optimizes agentic workloads with speed-of-light efficiency, leveraging a compiler-backed modeling mechanism and a high-performance scheduler. It delivers faster throughput than TensorRT-LLM for coding agents, with optimizations like TokenSpeed MLA to enhance performance on NVIDIA Blackwell. Developed with NVIDIA DevTech and other collaborators, TokenSpeed significantly reduces latency and increases throughput in typical agentic workloads.
ProgramBench (5 minute read)
ProgramBench challenges agents to recreate software executables without source code, using only documentation and experimentation. The tasks range from terminal utilities to complex software like compilers and libraries, offering over 248,000 behavioral tests across 200 tasks. Agents must design and implement entirely from scratch in a secure, sandboxed environment, emphasizing software architecture skills without external aids or decompilation.
NVIDIA Spectrum-X, the Open, AI-Native Ethernet Fabric, Sets the Standard for Gigascale AI, Now With MRC (3 minute read)
Multipath Reliable Connection (MRC) is an RDMA transport protocol that enables a single RDMA connection to distribute traffic across multiple network paths. This improves throughput, load balancing, and availability for large-scale AI training fabrics. MRC delivers high levels of GPU utilization by load-balancing traffic across all available paths. It gives administrators fine-grained visibility and control over traffic paths to simplify operations and accelerate troubleshooting at scale.
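The core multipath idea can be illustrated with a toy model: one logical connection spraying its messages across several paths so no single path becomes a hotspot. Real MRC operates at the RDMA transport layer with hardware support; the round-robin over path IDs below is only a conceptual sketch with made-up names.

```python
from itertools import cycle

# Toy illustration of multipath traffic distribution: messages from one
# logical connection are assigned round-robin across available paths.
# This is a conceptual sketch, not MRC's actual algorithm.

def spray(messages, paths):
    assignment = {}
    path_cycle = cycle(paths)
    for msg in messages:
        # Each message takes the next path in rotation, spreading load evenly.
        assignment.setdefault(next(path_cycle), []).append(msg)
    return assignment
```

Even spreading is the simplest policy; the article notes MRC also load-balances adaptively and exposes per-path visibility to operators.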
vLLM V0 to V1: Correctness Before Corrections in RL (8 minute read)
The vLLM V1 update improved inference correctness by addressing discrepancies in logprob computation, runtime defaults, inflight weight updates, and final projection precision. Key fixes included adjusting processed logprobs, disabling prefix caching, matching weight update models, and ensuring fp32 lm_head computation to align with vLLM V0's behavior. These changes resolved initial training mismatches, ensuring the new engine maintains expected RL performance without unnecessary objective-side corrections.
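The fp32 lm_head fix can be sketched numerically: the final projection and log-softmax are computed in float32 rather than half precision, so the sampler's logprobs match the trainer's. Shapes and names below are illustrative assumptions, not vLLM's actual code.

```python
import numpy as np

# Hedged sketch of the fp32 final-projection fix: half-precision logits
# can drift enough to break RL objectives that compare trainer and
# sampler logprobs, so both the lm_head matmul and the log-softmax are
# upcast to float32.

def logprobs_fp32(hidden: np.ndarray, lm_head: np.ndarray) -> np.ndarray:
    # Upcast inputs before the final projection.
    logits = hidden.astype(np.float32) @ lm_head.astype(np.float32).T
    # Numerically stable log-softmax over the vocabulary axis.
    logits -= logits.max(axis=-1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))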
Google is not building a consultancy. It is writing a licensing agreement. That may be the smarter play (9 minute read)
Google is betting that enterprise AI is a platform problem, not a services problem. It is in talks with Blackstone, KKR, and EQT to give their portfolio companies access to Gemini models through omnibus licensing agreements. The discussions are not exclusive, and no deals have been finalized. Google is offering private equity firms a commercial wrapper that gives their entire portfolio access to Gemini, then relying on the consulting ecosystem it has already financed to handle implementation. The approach trades consulting revenue for distribution speed.
AI inference just plays by different rules (9 minute read)
AI inference demands extreme data performance, overwhelming traditional storage and data infrastructures. Vector DBs, sub-millisecond access times, and decoupled cloud storage are essential to handle unprecedented concurrency and unpredictable workloads. Silk offers a solution that boosts storage performance without heavy provisioning, keeping systems resilient against AI-driven demand spikes.
World Models Can Change Everything (20 minute read)
World models aim to advance AI from mere pattern recognition to understanding and interacting with the physical world, which poses challenges such as data friction and variation. AI pioneers like Yann LeCun are backing the effort with billions of dollars in investment to develop models that capture complex physical interactions beyond current LLM capabilities. The struggle remains obtaining the diverse, high-quality, real-world data these models need to function effectively, which is both a major challenge and a major opportunity in AI's progression.
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email