903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir

Up next

984: Building AI Agents Where 99.9% Accuracy Isn't Good Enough, with Raju Malhotra

Raju Malhotra, Chief Product and Technology Officer at Certinia, talks to Jon Krohn about the so-called SaaSpocalypse and how agentic AI is proving the doomsayers wrong. Listen to the episode to hear more about Certinia’s work with Salesforce and building with Agentforce 360, the ...

983: AI in the Classroom: How a Top Elementary School Is Doing It Right, with Principal Traci Walker Griffith

My guest today took a public school that was about to be shut down and turned it into the number one school in Boston, and AI is her latest secret weapon. In a long-overdue episode on AI for supporting children’s education, hear directly from Principal Traci Walker Griffith how h ...

Recommended Episodes

Metrics Driven Development
Practical AI

How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open source effort that has been trying to answer this question comprehensively, and they are promoting a “Metrics Driven Development” ...

Only as good as the data
Practical AI

You might have heard that “AI is only as good as the data.” What does that mean, and what data are we talking about? Chris and Daniel dig into that topic in this episode, exploring the categories of data that you might encounter working in AI (for training, testing, fine-tuning, ben ...

The Future of AI: Predictions and Realities
AI Chat: AI News & Artificial Intelligence

In this episode, Jaeden Schafer discusses the current challenges and developments in the AI industry, particularly focusing on the limitations faced by major players like OpenAI and Anthropic. The conversation explores the anticipated improvements in AI models, the predictions fo ...

Measuring The Speed of AI Through Benchmarks
The Brave Technologist

David Kanter, Executive Director at MLCommons, discusses the work they're doing with MLPerf Benchmarks, creating the world's first industry standard approach to measuring AI speed and safety. He also shares ways they're testing AI and LLMs for harm, to measure—and, o ...