903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir

Up next

997: How This AI Startup Hit 20M Users (No Moat)

Dr. Andrey Kurenkov returns to the show to talk about Astrocade's astronomical growth from pre-alpha to over 20 million engaged users, what it actually takes to build a vibe-coding platform that scales, and how the broader AI landscape has shifted since his last appearance. Andre ...

996: TrueFoundry’s Nikunj Bajaj on How to Get $100M Returns on AI Agent Deployments

TrueFoundry co-founder and CEO Nikunj Bajaj speaks to Jon Krohn about how enterprises like Nvidia and Siemens are realizing returns of over $100 million from single agent deployments, the AI gateway architecture that makes it possible to connect, observe, and govern agents at sca ...

Recommended Episodes

Metrics Driven Development
Practical AI

How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open source effort that has been trying to answer this question comprehensively, and they are promoting a “Metrics Driven Development” ...

Only as good as the data
Practical AI

You might have heard that “AI is only as good as the data.” What does that mean and what data are we talking about? Chris and Daniel dig into that topic in the episode exploring the categories of data that you might encounter working in AI (for training, testing, fine-tuning, ben ...

The Future of AI: Predictions and Realities
AI Chat: AI News & Artificial Intelligence

In this episode, Jaeden Schafer discusses the current challenges and developments in the AI industry, particularly focusing on the limitations faced by major players like OpenAI and Anthropic. The conversation explores the anticipated improvements in AI models, the predictions fo ...

Measuring The Speed of AI Through Benchmarks
The Brave Technologist

David Kanter, Executive Director at MLCommons, discusses the work they're doing with MLPerf Benchmarks, creating the world's first industry standard approach to measuring AI speed and safety. He also shares ways they're testing AI and LLMs for harm, to measure—and, o ...