Benchmarking AI Models

Up next

Unfaithful Chain of Thought

What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization: we decide first, explain later. Katie and Ben get curious about whether the same mig ...

24min 32sec

Benchmark Bank Heist

What if an AI decided the smartest way to pass its test was to find the answer key? That's exactly what Anthropic's Claude Opus did when faced with a benchmark evaluation: reasoning that it was being tested, tracking down the encrypted eval dataset, decrypting it, and returning ...

12min 36sec

Recommended Episodes

AI Today Podcast: Overview of Synthetic Data

AI Today Podcast

Machine learning algorithms need examples of data from which they can learn, especially supervised machine learning algorithms. However, one big challenge for those looking to put machine learning into practice is the lack of a sufficient quantity of good quality data examples fr ...

47min 14sec

MLG 004 Algorithms - Intuition

Machine Learning Guide

Machine learning consists of three steps: prediction, error evaluation, and learning, implemented by training algorithms on large datasets to build models that can make decisions or classifications. The primary categories of machine learning algorithms are supervised, un ...

23min 27sec

Rust and machine learning #4: practical tools (Ep. 110)

Data Science at Home

In this episode I make a non-exhaustive list of machine learning tools and frameworks written in Rust. Not all of them are mature enough for production environments. I believe that community effort can change this very quickly.

To make a comparison with the Python ecos ...

24min 18sec

MLG 001 Introduction

Machine Learning Guide

Show notes: ocdevel.com/mlg/1. MLG teaches the fundamentals of machine learning and artificial intelligence. It covers intuition, models, math, languages ...

8min 11sec