We live in an era of charts that go up and to the right. That image obviously describes the stock market, particularly any company whose business is adjacent to artificial intelligence. But beyond stocks, another sort of chart we keep seeing shows AI capabilities also going up and to the right. The most famous and viral of these comes from an organization called METR, which stands for Model Evaluation and Threat Research. The organization is focused on understanding the degree to which AI models can engage in autonomous, complex tasks. METR sees this as a particularly important benchmark, given the risk that AI could one day engage in recursive self-improvement, taking humans out of the loop. But how do you really gauge a model's ability to do complex tasks? And what exactly is being measured? On this episode, we speak with METR's President Chris Painter, as well as Joel Becker, a member of the technical staff who works on evaluation methods for the organization. We discuss both the mechanics and the philosophy of METR's work, and what it means when we see a chart showing that Claude Opus 4.6 can do a task that would take a human nearly 12 hours.
Read more:
DeepSeek Unveils Flagship AI Model a Year After Breakthrough
Meta Inks Deal to Use Amazon’s Graviton Processors for AI
Only Bloomberg.com subscribers can get the Odd Lots newsletter in their inbox each week, plus unlimited access to the site and app. Subscribe at bloomberg.com/subscriptions/oddlots
Subscribe to the Odd Lots Newsletter
Join the conversation: discord.gg/oddlots