Benchmark Bank Heist

Up next

Unfaithful Chain of Thought

What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization: we decide first, explain later. Katie and Ben get curious about whether the same mig ...

24min 32sec

Benchmarking AI Models

How do you know if a new AI model is actually better than the last one? It turns out answering that question is a lot messier than it sounds. This week we dig into the world of LLM benchmarks (the standardized tests used to compare models), exploring two canonical examples: MMLU ...

29min 55sec

Recommended Episodes

AI Today Podcast: Overview of Synthetic Data

AI Today Podcast

Machine learning algorithms need examples of data from which they can learn, especially supervised machine learning algorithms. However, one big challenge for those looking to put machine learning into practice is the lack of a sufficient quantity of good-quality data examples fr ...

47min 14sec

MLG 004 Algorithms - Intuition

Machine Learning Guide

Machine learning consists of three steps: prediction, error evaluation, and learning, implemented by training algorithms on large datasets to build models that can make decisions or classifications. The primary categories of machine learning algorithms are supervised, un ...

23min 27sec

Rust and machine learning #4: practical tools (Ep. 110)

Data Science at Home

In this episode I make a non-exhaustive list of machine learning tools and frameworks written in Rust. Not all of them are mature enough for production environments. I believe that community effort can change this very quickly.

To make a comparison with the Python ecos ...

24min 18sec

MLG 001 Introduction

Machine Learning Guide

Show notes: ocdevel.com/mlg/1. MLG teaches the fundamentals of machine learning and artificial intelligence. It covers intuition, models, math, languages ...

8min 11sec