TLDR Data 2026-05-07
Netflix’s ML Metadata Graph 🧬, Inside DuckDB’s Speed 🦆, Searchable S3 Storage 🔎
Democratizing Machine Learning at Netflix: Building the Model Lifecycle Graph (14 minute read)
Netflix's Model Lifecycle Graph is a centralized Metadata Service (MDS) that connects fragmented ML assets (models, features, pipelines, datasets, and experiments) across the entire company into a single, queryable graph. By ingesting real-time events, normalizing them with a unified URI-based model, enriching relationships, and storing them in Datomic + Elasticsearch, Netflix enables easy discovery, lineage tracking, impact analysis, and cross-domain reuse of models.
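To make the idea concrete, here is a minimal Python sketch of the URI-based normalization step the post describes; the mds:// scheme, field names, and relation labels are illustrative assumptions, not Netflix's actual schema.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: every ML asset (model, feature, pipeline, dataset,
# experiment) becomes a node keyed by a canonical URI, and relationships
# become typed edges between URIs. Names below are illustrative only.
@dataclass
class AssetNode:
    uri: str                                  # e.g. "mds://model/recs-ranker/v42"
    kind: str                                 # "model" | "feature" | "pipeline" | ...
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: str        # URI of this asset
    dst: str        # URI of a related asset
    relation: str   # e.g. "trained_on", "produced_by"

def normalize_event(event: dict) -> tuple[AssetNode, list[Edge]]:
    """Turn a raw lifecycle event into a node plus lineage edges."""
    node = AssetNode(
        uri=f"mds://{event['type']}/{event['name']}/{event['version']}",
        kind=event["type"],
        properties=event.get("metadata", {}),
    )
    edges = [
        Edge(src=node.uri, dst=parent, relation=event.get("relation", "derived_from"))
        for parent in event.get("parents", [])
    ]
    return node, edges
```

Once assets share one URI scheme, lineage and impact-analysis queries reduce to graph traversals over these edges.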
DuckDB Internals: Why is DuckDB Fast? (17 minute read)
DuckDB is fast because it runs in-process, avoids server/client data movement, and combines columnar storage, query optimization, predicate pushdown, vectorized execution, and row-group pruning to scan only the data it needs. This post explains how DuckDB turns SQL into an executable plan and why its storage and Parquet-reading model make analytics feel unusually fast on a single machine.
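A small example of what that looks like from Python: DuckDB runs in-process and, for a query like the one below, decodes only the referenced Parquet columns and prunes row groups whose min/max statistics cannot match the filter. The file path is a placeholder.

```python
import duckdb

# In-process: no server, no client/server data movement; the query runs
# inside this Python process against Parquet files on disk.
con = duckdb.connect()

# Only two columns are ever read, and the WHERE clause is pushed down so
# row groups whose statistics can't match are skipped entirely.
query = """
    SELECT customer_id, SUM(amount) AS total
    FROM read_parquet('orders/*.parquet')
    WHERE order_date >= DATE '2025-01-01'
    GROUP BY customer_id
"""

# EXPLAIN shows the physical plan DuckDB compiles the SQL into.
print(con.sql("EXPLAIN " + query))
print(con.sql(query).df().head())
```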
Building Self-Healing Data Pipelines at Halodoc (9 minute read)
Build targeted self-healing layers for recurring pipeline failures: CDC auto-restarts with safe checkpoint rewind, source-vs-lake consistency checks, size-aware mini-batching, Spark retry memory scaling, warehouse lock cleanup using query watermarks, and dependency-aware backfills. The design pattern is: alert first, validate eligibility, recover safely, measure impact. Results included CDC recovery dropping from 45+ min to <5 min and backfill setup from 4-8 h to <15 min.
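A rough Python sketch of that alert, validate, recover, measure loop; all helpers here are illustrative stand-ins rather than Halodoc's actual code.

```python
import logging
import time

log = logging.getLogger("self_healing")

def is_known_failure(error: Exception) -> bool:
    # Placeholder eligibility check: in practice this matches specific
    # connector error codes that are known to be safe to auto-recover.
    return isinstance(error, ConnectionError)

def rewind_to_safe_checkpoint(task_id: str) -> str:
    return f"{task_id}-last-committed-offset"   # stand-in checkpoint id

def restart_cdc_task(task_id: str, from_checkpoint: str) -> None:
    log.info("Restarting %s from %s", task_id, from_checkpoint)

def handle_cdc_failure(task_id: str, error: Exception) -> None:
    log.error("CDC task %s failed: %s", task_id, error)      # 1. alert first

    if not is_known_failure(error):                           # 2. validate eligibility
        log.warning("Unknown failure type, leaving to on-call")
        return

    started = time.monotonic()
    checkpoint = rewind_to_safe_checkpoint(task_id)           # 3. recover safely
    restart_cdc_task(task_id, from_checkpoint=checkpoint)

    minutes = (time.monotonic() - started) / 60               # 4. measure impact
    log.info("Recovered %s in %.1f min", task_id, minutes)
```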
From SSH to REST: A Security-Driven Modernization of Slack's EMR Data Pipelines (15 minute read)
Slack modernized its data pipelines by migrating over 700 SSH-based operators on AWS EMR to a secure REST-based architecture with zero downtime across 8 regions. Its team replaced direct SSH access with Quarry, their internal REST job submission gateway, and used YARN's Distributed Shell to run arbitrary commands while gaining proper resource management, reliable tracking, clean cancellation, and server-side lifecycle handling.
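For the flavor of the change, here is a hypothetical sketch of an operator submitting through a REST gateway instead of SSH. Quarry is Slack-internal, so the endpoint paths, payload fields, and status values below are assumptions about the general shape of such a flow, not Quarry's real API.

```python
import time
import requests

GATEWAY = "https://quarry.internal.example/api/v1"   # hypothetical URL

def submit_and_wait(command: list[str], cluster: str) -> str:
    # Submit the job; the gateway runs it via YARN Distributed Shell
    # server-side instead of the client SSH-ing into the cluster.
    resp = requests.post(
        f"{GATEWAY}/jobs",
        json={"cluster": cluster, "command": command},
        timeout=30,
    )
    resp.raise_for_status()
    job_id = resp.json()["job_id"]

    # Server-side lifecycle: the gateway tracks YARN application state,
    # so the client only polls (and could cancel with a DELETE).
    while True:
        state = requests.get(f"{GATEWAY}/jobs/{job_id}", timeout=30).json()["state"]
        if state in {"SUCCEEDED", "FAILED", "KILLED"}:
            return state
        time.sleep(15)
```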
Can Agents Replace the Search Stack? (6 minute read)
A lightweight LLM agent, given basic retrieval tools (BM25 and/or embeddings), can outperform complex search backends and reranking pipelines, simplifying the search architecture. In experiments on Amazon ESCI data, agentic setups delivered big gains (NDCG from ~0.29 baseline to 0.41-0.45), with agents intelligently rewriting queries, exploring, and evaluating results.
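A minimal sketch of the agentic-retrieval loop using the rank-bm25 package as the retrieval tool; the rewrite and relevance-judging steps are placeholders for LLM calls, and the corpus is toy data rather than ESCI.

```python
from rank_bm25 import BM25Okapi   # pip install rank-bm25

corpus = ["red running shoes", "wireless earbuds", "trail running shoe men"]
bm25 = BM25Okapi([doc.split() for doc in corpus])

def retrieve(query: str, k: int = 3) -> list[str]:
    # Plain BM25 retrieval tool exposed to the agent.
    return bm25.get_top_n(query.split(), corpus, n=k)

def looks_relevant(query: str, hits: list[str]) -> bool:
    # Placeholder for an LLM judging the results.
    return any(word in hit for hit in hits for word in query.split())

def rewrite(query: str, hits: list[str]) -> str:
    # Placeholder for an LLM-generated reformulation of the query.
    return query + " shoes"

def agent_search(user_query: str, max_rounds: int = 3) -> list[str]:
    # The agent rewrites, retrieves, inspects, and retries until the
    # results look good or the budget runs out.
    query = user_query
    hits: list[str] = []
    for _ in range(max_rounds):
        hits = retrieve(query)
        if looks_relevant(user_query, hits):
            return hits
        query = rewrite(user_query, hits)
    return hits

print(agent_search("running"))
```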
Beyond the hype: The enterprise AI architecture we actually need (7 minute read)
Enterprise AI is moving toward a federated stack: native AI inside systems of record like SAP, Salesforce, Workday, and ServiceNow; sovereign private models hosted on internal infrastructure; curated data lakes; and AI analytics layers that can federate queries across domains. Agent orchestration sits on top, with full traceability, timestamps, and auditability to satisfy compliance demands such as the EU AI Act. Two missing capabilities: a trusted marketplace for external agents using verifiable identities, and an employee intelligence layer that embeds AI into workspaces so users can query operational data without switching tools.
We're Missing Data: The Other Half of AI Transformation (6 minute read)
AI in data and engineering orgs is overfocused on tools and underinvested in the operating model needed to absorb them. Technical gains from coding agents, eval infra, and internal assistants are real, but without redesigning management, career ladders, team composition, trust mechanics, and communication norms, productivity typically rises for about 6 months and then plateaus. AI transformation is multiplicative, not additive: fund both the technical stack and the operating stack, or the investment will underdeliver.
An open lakehouse — any engine, your cloud or ours (Sponsor)
Apache Iceberg + Parquet under the hood. Query it from Trino, Spark, or DuckDB over one governance layer. SaaS or self-install in your own Azure tenant. Databasin One writes schema-aware SQL with Claude + GPT — built in, not an add-on SKU. Per-minute billing, no commits. TLDR readers get $100 to start.
See how we compare to the warehouse guys
How We Accelerated Transpilation by Compiling SQLGlot with mypyc (8 minute read)
Fivetran dramatically accelerated SQLGlot (the popular pure-Python SQL parser, transpiler, and optimizer) by compiling it with mypyc, a tool that turns well-typed Python code into fast C extensions. They ship the compiled version as an optional package that delivers ~5x faster parsing, ~2.5x faster SQL generation, and 2-2.5x faster optimization, while keeping the original pure-Python version as the default for maximum compatibility.
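SQLGlot's public API stays the same whichever build you install, so the code paths that get faster (parsing and generation) look like ordinary SQLGlot usage; the dialects below are arbitrary.

```python
import sqlglot

sql = "SELECT user_id, COUNT(*) AS n FROM events WHERE ts > '2025-01-01' GROUP BY 1"

# Parse once (the ~5x-faster path in the compiled build)...
ast = sqlglot.parse_one(sql, read="duckdb")

# ...then generate dialect-specific SQL (the ~2.5x-faster path).
print(ast.sql(dialect="snowflake"))

# Or transpile in one call.
print(sqlglot.transpile(sql, read="duckdb", write="bigquery")[0])
```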
Integrating AI Into Apache Kafka Architectures: Patterns and Best Practices (11 minute read)
When integrating LLMs with Apache Kafka, use Kafka strictly as a durable event backbone and keep all model inference outside the broker. Use one of three main inference patterns (external RPC, embedded models like ONNX/TFLite, or sidecar), and follow best practices for topic design (raw-events → enriched-context → model-outputs), replayability, dead-letter queues, idempotency, and cost/latency/governance considerations.
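A minimal sketch of the external-RPC pattern under that topic layout, using kafka-python; the broker address, topic names, and inference URL are placeholders, and a real deployment would add schema validation and retry policies.

```python
import json
import requests
from kafka import KafkaConsumer, KafkaProducer   # pip install kafka-python

# Consume enriched context, call a model endpoint outside the broker,
# publish results to model-outputs, and route failures to a dead-letter topic.
consumer = KafkaConsumer(
    "enriched-context",
    bootstrap_servers="localhost:9092",
    group_id="llm-enricher",
    enable_auto_commit=False,                 # commit only after we produced a result
    value_deserializer=lambda b: json.loads(b),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode(),
)

for msg in consumer:
    key = msg.key.decode() if msg.key else None
    try:
        resp = requests.post("http://inference.internal/v1/score",   # hypothetical endpoint
                             json=msg.value, timeout=10)
        resp.raise_for_status()
        producer.send("model-outputs", {"key": key, "result": resp.json()})
    except Exception as exc:
        producer.send("model-outputs-dlq", {"key": key, "event": msg.value, "error": str(exc)})
    consumer.commit()    # at-least-once delivery; downstream must dedupe on a stable key
```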
S3 is the perfect place to store data, until you try to search it (11 minute read)
Firn is an open-source API for fast vector and full-text search on S3-backed data, using Lance plus caching to make repeated queries extremely fast. It's useful for teams that want searchable object storage without the cost or complexity of running OpenSearch.
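Firn's own API isn't shown here, but the underlying idea can be sketched with LanceDB, which reads and writes Lance tables directly in S3; the bucket path, schema, and vectors below are placeholders.

```python
import lancedb   # pip install lancedb

# Lance tables live directly in object storage; no search cluster to run.
db = lancedb.connect("s3://my-bucket/search-data")

table = db.create_table(
    "docs",
    data=[
        {"id": 1, "text": "object storage search", "vector": [0.1, 0.9]},
        {"id": 2, "text": "opensearch alternative", "vector": [0.8, 0.2]},
    ],
)

# Nearest-neighbour query against the S3-backed table; repeated queries
# get fast once hot data is cached locally, which is Firn's main trick.
print(table.search([0.1, 0.8]).limit(2).to_list())
```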
Redis Array Type: Short Story of a Long Development (3 minute read)
Redis Array is a proposed new data type, currently under review in a pull request, that natively supports numerical indexing as part of its semantics. It combines efficient sparse and dense representations with automatic internal reshaping for optimal memory usage and performance, making it well suited to ring buffers, large indexed collections, and storing documents or files that need fast access, scanning, and search.
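A purely conceptual Python sketch of the sparse/dense dual representation with automatic reshaping; this is not Redis code and does not use the proposed command syntax.

```python
class AdaptiveArray:
    """Indexed array that stays sparse (dict) while mostly empty and
    reshapes to dense (list) storage once it fills up. Threshold is
    illustrative, not the value chosen in the Redis proposal."""

    DENSE_THRESHOLD = 0.5

    def __init__(self):
        self._sparse: dict[int, object] | None = {}
        self._dense: list | None = None
        self._max_index = -1

    def set(self, index: int, value) -> None:
        self._max_index = max(self._max_index, index)
        if self._dense is not None:
            if index >= len(self._dense):
                self._dense.extend([None] * (index + 1 - len(self._dense)))
            self._dense[index] = value
            return
        self._sparse[index] = value
        # Automatic internal reshaping: switch to dense storage once the
        # array is full enough that a flat list is cheaper than a dict.
        if len(self._sparse) / (self._max_index + 1) >= self.DENSE_THRESHOLD:
            self._dense = [None] * (self._max_index + 1)
            for i, v in self._sparse.items():
                self._dense[i] = v
            self._sparse = None

    def get(self, index: int):
        if self._dense is not None:
            return self._dense[index] if index < len(self._dense) else None
        return self._sparse.get(index)
```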
Implementing Statistical Guardrails for Non-Deterministic Agents (5 minute read)
Statistical guardrails add an automated safety layer for non-deterministic agents: semantic drift detection using cosine-distance z-scores against a safe baseline embedding, and confidence thresholding using Shannon entropy over token probabilities.
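A minimal sketch of both checks with NumPy, assuming you already have embeddings for a safe baseline corpus and per-token probability distributions from the model; the thresholds and function names are illustrative, not taken from the article.

```python
import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def drift_zscore(response_emb: np.ndarray, baseline_embs: np.ndarray) -> float:
    # Distances of baseline samples to their centroid define "normal";
    # the response's distance is scored as a z-score against that.
    centroid = baseline_embs.mean(axis=0)
    dists = np.array([cosine_distance(e, centroid) for e in baseline_embs])
    d = cosine_distance(response_emb, centroid)
    return (d - dists.mean()) / (dists.std() + 1e-9)

def mean_token_entropy(token_probs: list[np.ndarray]) -> float:
    # Shannon entropy of each token's probability distribution, averaged;
    # high entropy means the model was unsure and the answer should be gated.
    ents = [-(p * np.log2(p + 1e-12)).sum() for p in token_probs]
    return float(np.mean(ents))

def guardrail(response_emb, baseline_embs, token_probs,
              z_max: float = 3.0, entropy_max: float = 2.5) -> bool:
    """Return True if the response passes both statistical checks."""
    return (drift_zscore(response_emb, baseline_embs) <= z_max
            and mean_token_entropy(token_probs) <= entropy_max)
```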
Curated deep dives, tools and trends in big data, data science and data engineering 📊
Join 400,000 readers for one daily email