TLDR AI 2026-05-06
GPT-5.5 Instant, SubQ 12M context, Gemini Flash upgrades
GPT-5.5 Instant (8 minute read)
OpenAI released GPT-5.5 Instant, updating its default ChatGPT model with improved factual accuracy, reduced hallucinations, and stronger personalization based on user context.
The context window has been shattered: Subquadratic debuts a 12-million-token window (8 minute read)
Subquadratic has launched a new AI model with a 12-million-token context window. It outperforms GPT-5.5 on retrieval benchmarks. Attention cost scales quadratically with context length, so doubling the input quadruples the work. Subquadratic claims to have solved the problem. It plans to offer a model with a 50-million-token context window soon.
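The quadratic-scaling claim can be checked with a few lines of arithmetic. This is a minimal sketch that assumes standard full self-attention, where every token is scored against every other token; the function names are illustrative and say nothing about how Subquadratic's model works.

```python
def attention_pair_count(num_tokens: int) -> int:
    """Query-key pairs scored by full self-attention: n * n."""
    return num_tokens * num_tokens


def cost_ratio(base_tokens: int, scaled_tokens: int) -> float:
    """How much more pairwise work the longer context needs."""
    return attention_pair_count(scaled_tokens) / attention_pair_count(base_tokens)


if __name__ == "__main__":
    # Doubling the context length quadruples the pairwise work.
    print(cost_ratio(1_000_000, 2_000_000))
    # Going from 1M to 12M tokens multiplies it by 12 squared.
    print(cost_ratio(1_000_000, 12_000_000))
```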
Meta plans advanced 'agentic' AI assistant for users (2 minute read)
Meta is building a highly personalized AI assistant that will be able to carry out everyday tasks. The digital assistant will be powered by the company's new Muse Spark AI model. It can connect several hardware and software tools and learn from data with less human intervention than a chatbot. Meta is targeting a launch before the fourth quarter of this year.
Deep Dives & Analysis
In search of wasted bits: how much information do LLM weights carry? (11 minute read)
A lot of LLM inference is transferring data from one place to another and then computing on it when it's there. The most frustrating bottleneck in the system is when compute units sit idle because the data bus feeding them isn't fast enough. The solution is to transform memory into compute. Quantization is a nice trick, but it doesn't actually trade memory for compute - it transfers half as much data to a place to do twice as much computation.
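The point about quantization moving half as much data can be made concrete with a rough estimate. This sketch assumes a purely memory-bound decode step in which every weight is streamed once per generated token; the function name and all numbers are hypothetical, not figures from the article.

```python
def weight_transfer_seconds(
    num_params: int, bits_per_weight: int, bandwidth_bytes_per_s: float
) -> float:
    """Time to stream all weights once over the memory bus."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    params = 7_000_000_000        # hypothetical model size
    bandwidth = 1e12              # hypothetical bus: 1 TB/s
    for bits in (16, 8, 4):
        seconds = weight_transfer_seconds(params, bits, bandwidth)
        print(f"{bits}-bit weights: {seconds * 1000:.1f} ms per token")
```

Halving the bits per weight halves the transfer time, but the weights still have to cross the bus before any computation happens.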
Computer use is 45x More Expensive Than Structured APIs (7 minute read)
Vision agents are the default for operating web apps that don't expose APIs, because the alternative, writing an MCP or REST surface, is too expensive to build. The cost of the vision approach is treated as a fixed price. Current vision agents require detailed prompts to succeed in tasks, and they are still prone to making mistakes. Better vision models reduce error rates, but they do not reduce the number of screenshots required to reach the relevant data, and each screenshot costs thousands of input tokens.
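A back-of-the-envelope model shows why the screenshot count drives the bill. This is a simplified sketch with hypothetical parameter names and placeholder numbers; it does not reproduce the article's 45x measurement and ignores output tokens and conversation history.

```python
def vision_input_tokens(
    num_screenshots: int, tokens_per_screenshot: int, prompt_tokens: int
) -> int:
    """Input tokens when each step sends the prompt plus one screenshot."""
    return num_screenshots * (tokens_per_screenshot + prompt_tokens)


def api_input_tokens(
    num_calls: int, tokens_per_response: int, prompt_tokens: int
) -> int:
    """Input tokens when each step sends the prompt plus one structured response."""
    return num_calls * (tokens_per_response + prompt_tokens)


if __name__ == "__main__":
    # All values below are placeholders chosen for illustration only.
    vision = vision_input_tokens(num_screenshots=10, tokens_per_screenshot=2000, prompt_tokens=500)
    api = api_input_tokens(num_calls=2, tokens_per_response=300, prompt_tokens=500)
    print(vision, api, vision / api)
```

A more accurate vision model changes none of these inputs unless it also needs fewer screenshots.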
Engineering & Research
AI built for the >80% of the world that doesn't think in English (Sponsor)
Does your AI know how people convey tone, humor, and feelings in their mother tongue, or does it just translate from English? Welo Data's native-language training data & human evaluation lets you build for your users, everywhere. Surface multilingual quality and safety issues before your users find them.
See how
How to Scale Your Model (14 minute read)
This book discusses the science of scaling language models. It covers how TPUs and GPUs work, how they communicate with each other, how LLMs run on real hardware, and how to parallelize models during training and inference so they run efficiently at massive scale. The book answers questions about how expensive training a model should be, how much memory is needed to serve models, and more.
Google Rethinks Hallucinations Through Uncertainty (25 minute read)
The paper reframed hallucinations as failures to express uncertainty rather than gaps in knowledge, proposing "faithful uncertainty" as a mechanism for aligning model confidence with actual reliability.
Accelerating Gemma 4: faster inference with multi-token prediction drafters (4 minute read)
Gemma 4 models reduce latency bottlenecks and achieve improved responsiveness for developers by using Multi-Token Prediction drafters. Thanks to a specialized speculative decoding architecture, these drafters deliver up to a 3x speedup without any degradation in output quality or reasoning logic. Speculative decoding decouples token generation from verification. It uses idle compute to 'predict' several future tokens at once with the drafter, in less time than the target model takes to process a single token. The target model then verifies all of these suggested tokens in parallel.
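The draft-then-verify loop can be sketched with toy models. This is a minimal greedy illustration, not Gemma 4's implementation: `toy_target` and `toy_drafter` are made-up stand-ins for real models, and verification is written as a plain loop, whereas a real target model would score all drafted positions in one parallel forward pass.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def toy_target(tokens: List[int]) -> int:
    """Hypothetical 'large' model: deterministic next-token rule."""
    return (tokens[-1] * 3 + 1) % 7


def toy_drafter(tokens: List[int]) -> int:
    """Hypothetical cheap drafter: agrees with the target most of the time."""
    guess = toy_target(tokens)
    return guess if len(tokens) % 5 else (guess + 1) % 7


def speculative_step(prefix: List[int], target: NextToken,
                     drafter: NextToken, k: int) -> List[int]:
    # 1. The drafter proposes k tokens autoregressively.
    draft: List[int] = []
    for _ in range(k):
        draft.append(drafter(prefix + draft))
    # 2. The target checks every drafted position (in parallel on real hardware).
    accepted: List[int] = []
    for i in range(k):
        expected = target(prefix + draft[:i])
        if draft[i] != expected:
            # First mismatch: keep the target's token, discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(draft[i])
    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target(prefix + draft))
    return accepted


def speculative_generate(prompt: List[int], target: NextToken, drafter: NextToken,
                         max_new_tokens: int, k: int = 4) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        tokens.extend(speculative_step(tokens, target, drafter, k))
    return tokens[: len(prompt) + max_new_tokens]


def greedy_generate(prompt: List[int], target: NextToken,
                    max_new_tokens: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target(tokens))
    return tokens


if __name__ == "__main__":
    fast = speculative_generate([1], toy_target, toy_drafter, max_new_tokens=20)
    slow = greedy_generate([1], toy_target, max_new_tokens=20)
    print(fast == slow)
```

Because a drafted token is kept only when it equals the target's own greedy choice, the output matches plain greedy decoding with the target model; the drafter affects speed, not content.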
AI2 Released MolmoAct 2 (9 minute read)
MolmoAct 2 is an upgraded action reasoning model that improves real-world robot task performance and is paired with a large open bimanual manipulation dataset.
Gemini API File Search is now multimodal: build efficient, verifiable RAG (3 minute read)
Multimodal support, custom metadata filtering, and page-level citations are now available in the Gemini API File Search tool. The features can help developers bring structure to unstructured data for efficient, verifiable RAG. Users' RAG systems can now natively process and better organize text and visual data. The File Search tool handles the heavy infrastructure so users can focus on building products.
73% of enterprises say this is the #1 issue with scaling AI [Webinar] (Sponsor)
It's not the models, it's the data connectivity. To get an architecture blueprint made for prod-ready AI agents, join CData and Microsoft on May 13th.
Save your seat
Google Launches $3.5M Future Vision Film Competition (1 minute read)
Google partnered with XPRIZE and Range Media to launch a global competition encouraging short films about optimistic, tech-driven futures, with AI tools supported in production.
Agents for financial services (12 minute read)
Anthropic has released 10 ready-to-run templates for the most time-consuming work in financial services, including building pitchbooks, screening KYC files, and closing the books at month-end.
Apple Explores Multi-Model AI in iOS 27 (3 minute read)
Apple reportedly planned a system allowing users to select third-party AI models within iOS 27, integrating them into features like Siri and writing tools.
Become a curator for TLDR AI (3-5 hrs/week)
TLDR is looking for an engineer/researcher at a major AI lab or startup to help write for 1M+ subscribers. Our curators have been invited to Google I/O and OpenAI DevDay, scouted for Tier 1 VCs, and get early access to unreleased TLDR products.
Learn more.
OpenAI releases a separate ChatGPT iOS app for enterprise users (2 minute read)
OpenAI has released a new iOS app created specifically for school and work organizations.
Get the most interesting AI stories and breakthroughs delivered in a free daily email.
Join 920,000 readers for one daily email