903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir


Up next

987: AI Infrastructure, Ray, and Why Nonlinear Careers Win, with Linda Haviv

Linda Haviv talks to Jon Krohn about staying current on AI matters, why open-source technology is narrowing the gap in its race with proprietary models, and how being a content creator in tech is key to career growth and longevity. She emphasizes that non-linear pathways to a car ...

986: Building Hardware is Hard but AI Agents Help, with Kishore Subramanian

Kishore Subramanian, CTO of Propel Software, talks to Jon Krohn about how product lifecycle management (PLM) software and quality management systems (QMS) help ensure compliance, record management, and quality assurance. Listen to the episode to hear him talk about ...

Recommended Episodes

Metrics Driven Development
Practical AI

How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open source effort that has been trying to answer this question comprehensively, and they are promoting a “Metrics Driven Development” ...

Only as good as the data
Practical AI

You might have heard that “AI is only as good as the data.” What does that mean and what data are we talking about? Chris and Daniel dig into that topic in the episode, exploring the categories of data that you might encounter working in AI (for training, testing, fine-tuning, ben ...

The Future of AI: Predictions and Realities
AI Chat: AI News & Artificial Intelligence

In this episode, Jaeden Schafer discusses the current challenges and developments in the AI industry, particularly focusing on the limitations faced by major players like OpenAI and Anthropic. The conversation explores the anticipated improvements in AI models, the predictions fo ...

Measuring The Speed of AI Through Benchmarks
The Brave Technologist

David Kanter, Executive Director at MLCommons, discusses the work they're doing with MLPerf Benchmarks, creating the world's first industry standard approach to measuring AI speed and safety. He also shares ways they're testing AI and LLMs for harm, to measure—and, o ...