AI Evals from Google DeepMind & More Upcoming Lightning Lessons
Learn from AI Evals researchers and practitioners
AI evaluation is a fast-evolving space, and many practices are still taking shape.
In our AI Evals & Analytics Playbook course, we regularly invite researchers and practitioners to share recent developments and ideas that can inspire more reliable and effective AI evaluation in practice.
We’re excited to share two upcoming Lightning Lessons that dive into how AI evals actually hold up in production.
⚡ Evals for Voice AI: Learnings from Google Evals Team
Feb 10 · 4pm PT
Ravin Kumar, Senior Researcher @ Google DeepMind
Voice AI introduces evaluation challenges that text-only systems don’t face. In this Lightning Lesson, Ravin uses Google’s NotebookLM as a case study to share practical insights on designing evals that capture real-world conversational quality, reasoning depth, and responsiveness in multimodal and voice-based contexts.
👉 Sign up here.
⚡ Calibrate LLM-as-a-Judge for Real-world Impact
Feb 6 · 11am PT
Eddie Landesberg, Research Scientist @ CIMO Labs
LLM-as-a-judge is fast and inexpensive, but often noisy and biased. Human labels are reliable, but costly. This session focuses on calibration as a design choice: how combining a small amount of human judgment with automated judges can meaningfully reduce decision risk.
👉 Sign up to watch the recording here.
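To make the calibration idea above concrete, here is a minimal sketch of one common approach: measure the judge's disagreement with human labels on a small labeled subset, then use that average correction to adjust the judge's estimate over the full dataset. All numbers and variable names are illustrative assumptions, not from the session.

```python
import random

random.seed(0)

# Hypothetical data: an LLM judge's pass/fail verdicts (True/False)
# for 1,000 model outputs, plus human labels on a random subset of 50.
judge = [random.random() < 0.7 for _ in range(1000)]  # noisy automated judge
human_idx = random.sample(range(1000), 50)
# Simulate humans disagreeing with the judge ~15% of the time.
human = {i: (judge[i] if random.random() < 0.85 else not judge[i])
         for i in human_idx}

# Naive estimate: trust the judge everywhere (inherits the judge's bias).
naive = sum(judge) / len(judge)

# Calibrated (difference) estimate: judge mean plus the average
# human-minus-judge correction measured on the labeled subset.
correction = sum(human[i] - judge[i] for i in human_idx) / len(human_idx)
calibrated = naive + correction

print(f"naive={naive:.3f}  calibrated={calibrated:.3f}")
```

The labeled subset is cheap (50 human labels) but anchors the large automated sample, which is the core trade-off the session examines.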
Hope to see you in both sessions.


