903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir

Up next

971: 90% of The World’s Data is Private; Lin Qiao’s Fireworks AI is Unlocking It

Lin Qiao, CEO of Fireworks AI, talks to Jon Krohn about how she builds effective models quickly, why coding agents can perform at the level of a junior engineer, and what she credits for the success of Fireworks AI: True to its name, the company exploded into the AI industry wi ...

970: The “100x Engineer”: How to Be One, But Should You?

Working with code-gen models and Claude Code: In this Five-Minute Friday, Jon Krohn addresses how AI superstars like Andrej Karpathy are using AI agents in their coding work, the outlook for code-gen in 2026, and how you can get started. Hear about Karpathy’s work as well as the ...

Recommended Episodes

Metrics Driven Development
Practical AI

How do you systematically measure, optimize, and improve the performance of LLM applications (like those powered by RAG or tool use)? Ragas is an open source effort that has been trying to answer this question comprehensively, and they are promoting a “Metrics Driven Development” ...

Only as good as the data
Practical AI

You might have heard that “AI is only as good as the data.” What does that mean and what data are we talking about? Chris and Daniel dig into that topic in the episode exploring the categories of data that you might encounter working in AI (for training, testing, fine-tuning, ben ...

The Future of AI: Predictions and Realities
AI Chat: ChatGPT, AI News, Artificial Intelligence, OpenAI, Machine Learning

In this episode, Jaeden Schafer discusses the current challenges and developments in the AI industry, particularly focusing on the limitations faced by major players like OpenAI and Anthropic. The conversation explores the anticipated improvements in AI models, the predictions fo ...

Measuring The Speed of AI Through Benchmarks
The Brave Technologist

David Kanter, Executive Director at MLCommons, discusses the work they're doing with MLPerf Benchmarks, creating the world's first industry standard approach to measuring AI speed and safety. He also shares ways they're testing AI and LLMs for harm, to measure—and, o ...
