One of the most amazing things about ChatGPT and other, similar AI models: Nobody really understands what they can do. Not even the people who build them. On today’s show, we talk with Sam Bowman about some of the mysteries at the heart of so-called large language models. Sam is on the faculty at NYU, he runs a research group at the AI company Anthropic and he is the author of the illuminating paper Eight Things to Know About Large Language Models.
Pushkin. Today's show is about known unknowns and unknown unknowns, which is to say, we're talking about AI, specifically a type of AI called a large language model, or an LLM. The most famous LLM is ChatGPT, but there are lots of others, and at their core, they all do the same thing. They read a piece of text and they predict what the next series of words should be. LLMs are, obviously and quite suddenly, a huge deal in a lot of ways. One thing about them that is particularly wild to me: LLMs behave in ways that are surprising even to the people who built them. In other words, large language models are this profoundly powerful, disruptive new thing, and right now we urgently need to figure out what they mean and how they work. I'm Jacob Goldstein and this is What's Your Problem, the show where I talk to people who are trying to make technological progress. My guest today is Sam Bowman. He's an expert in large language models, in LLMs. He's on the faculty at NYU, and he runs a research group at an AI company called Anthropic. All the recent talk about LLMs inspired Sam to write a paper to clear up what he thought were some misconceptions. The paper is called Eight Things to Know About Large Language Models. I am a fan of lists in general, and I loved this list in particular. Among other things, it gave me a deeper sense of the ways in which large language models are still a mystery, even to experts like Sam. That mystery, those unknowns, have important implications for the way we think about, and regulate, and develop AI. We're going to start by discussing a pretty simple item on Sam's list. The item is this: brief interactions with LLMs are often misleading. You write this, you write, brief interactions with LLMs are often misleading. What does that mean?
So when, especially when GPT-4 came out, and I guess also when ChatGPT first came out, there was very predictably this wave of people on Twitter saying, hey, this system is sentient and it knows where I live and it's ready to take over the world tomorrow, because they had one chat with it and it said that it was sentient and it made a few educated guesses that happened to be right. And you'll get other people on Twitter saying, hey, this system is dumb as bricks, I told it a really simple story and asked it what happened in the story and it got it wrong. There's a couple of things going on here. There's this great analogy that came up in a recent, I think, Time article by Helen Toner, saying they're basically improv players, where if you put them in some situation, if you put them in this situation of, oh, this is a conversation between a human who thinks the AI is sentient and the AI, then maybe the AI is going to say it's sentient.
So specifically, they're improv players in the sense that famously in improv you're supposed to say yes to everything that your improv partner suggests, and so ChatGPT and the other LLMs are there to say "yes, and," and that's what's going on.
That's a decent part of it. Yeah, they're going to say yes. They're going to go along with what you're doing if you make it clear what you expect, if you make it clear, like what kind of narrative you're putting them in, what kind of environment you're putting them in, they'll go along with that.
Uh, there are a couple of items on your list that seem directly contrary to assertions I've heard from other people about LLMs, so that's fun and exciting. One is: human performance on a task is not an upper bound on LLM performance.
So one of the reasons I think these systems can be better at a lot of tasks than humans is just that they've learned more stuff that they've read and mostly memorized, not just sort of all of the important papers in one little branch of chemistry or all of the important papers in all of chemistry. They've just read and mostly memorized, sort of all of the research papers.
In everything, all of the papers in everything.
Yeah, and many of the novels and many of the news stories. And even if these systems aren't really great at drawing connections between these and sort of synthesizing new knowledge out of them, they can do that a little bit. So you can sort of imagine what happens if you get someone who's not especially bright, but a basically reasonably intelligent, reasonably competent person, who's just gotten a PhD in every single thing you can get a PhD in. I'd expect them to figure some things out and to be able to do some things that no one person can do, and probably they'll notice some things that'll be really hard for even a team or an organization to do, just because it's really important that it kind of is, in some sense, living in this one person's head.
Let me just like lean into that one for a sec. So do you think that in some amount of time, in the next few years, say, an LLM will make some kind of, you know, breakthrough in knowledge, will figure something out that no human has ever figured out, that will be a meaningful breakthrough?
Yeah, I think so. Almost by definition, I don't have a good guess of what that's going to look like or when that's going to be.
Otherwise you'd be figuring it out right now, right? Yeah, yeah, yeah. But no, I can imagine some story like, hey, kind of a bunch of chemists in this subfield of chemistry have noticed this thing, and some biologists in this other subfield have noticed this other thing, and some doctors have noticed this third thing, and together they mean that some very unexpected kind of drug design might treat some new disease.
And maybe if you had enough medical researchers trying enough different things, eventually they'd stumble into that. But it seems possible at some point that something like a large language model is just going to notice that, and if you ask it the right way, it's going to tell you, and you might have to second-guess it a lot. These systems also make stuff up. But I think it's quite possible that you start seeing these things pretty often tell you surprising new things that happen to be true.
There's another item on your list that seems to me to be like a provocation, it seems to me in a good way. It seems like directly contradictory to what I have read, specifically to this idea that all large language models are doing is guessing what the next word in a series is likely to be. And that list item is this: LLMs often appear to learn and use representations of the outside world. LLMs often appear to learn and use representations of the outside world. So that sounds quite different from just guessing the next word. Is it, or is it not, different in a way that I just don't understand?
It turns out it's not that different. Okay, this is, I want to say, the big discovery. But it's this big discovery that's spread out over dozens of experiments over the last few years.
Can you give me a specific example? It's such an abstract assertion that I think it would be helpful to have a specific example that we can think about.
One great example of this is if you tell a model a story, a simple story that takes place in some sort of physical space, where it's some characters walking around a house and they're having a conversation while they're walking, and they're picking stuff up and they're putting it down. You can see inside the activations of the neurons when the model is reading that story. You can pull out a map of the house. You can see that there's a piece of the network that says, oh, okay, now they're in the living room, and another piece that says, oh, the living room is connected to the bedroom. And you can mess with this in ways that show that it really is representing the house. That if you find the piece of the network that says, oh, Susan is in the living room, and you flip that from a positive number to a negative number, then the story will continue as though Susan is not in the living room, or couldn't possibly have been in the living room.
So that does seem like it's representing the physical world in a way that is not just guessing the next word.
Yeah. Yeah, so we're finding out these systems are actually representing the objects they're talking about, at least some of the time.
They're creating a representation of physical space.
Yeah. I should be clear that this doesn't always work. When you're giving these systems something really hard and subtle, they're just going to totally botch this stuff. Their internal representations are a mess. But more and more of the time they're really doing it. And as these things get bigger and bigger, they're doing it more and more. And so this feels like this important turning point where it's like, oh, okay, there is some understanding going on here and it's getting better, and that really radically opens up the possibilities for where this technology might go.
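[Editor's note: here is a toy sketch of the kind of probing experiment Sam is describing, not from the episode itself. It trains a simple linear probe to read a character's location out of a small open model's hidden activations. The model name, the stories, the labels, and the layer index are all illustrative assumptions.]

```python
# A toy sketch, not the actual experiments described above: can a character's
# location be read out of a model's internal activations with a linear probe?
# Model name, stories, labels, and layer index are illustrative assumptions.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_hidden_states=True)

stories = [
    ("Susan walked into the living room and sat down.", 1),    # 1 = living room
    ("Susan went out to the garden to water the plants.", 0),  # 0 = elsewhere
    ("After dinner, Susan relaxed in the living room.", 1),
    ("Susan stood in the kitchen chopping onions.", 0),
]

def last_token_activation(text, layer=6):
    """Return the hidden-state vector for the final token at the given layer."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[layer][0, -1].numpy()

X = [last_token_activation(text) for text, _ in stories]
y = [label for _, label in stories]

# If even a simple linear probe can recover the location from the activations,
# that is (weak) evidence the model represents it internally.
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.predict([last_token_activation("Susan curled up on the living room sofa.")]))
```

The real experiments are much more careful than this toy, but the basic move is the same: look inside the activations and ask whether facts about the story's world can be decoded from them.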
What you're saying seems very much at odds with what people generally say about LLMs, right? Like, the standard line is they're just predicting what the next word is going to be, and they're very good at predicting what the next word is going to be, and there's a lot of powerful things you can do, but what you're saying sounds fundamentally different from that. And so, I mean, are the people saying they're just predicting the next word, are they wrong? Is what you're saying a point of debate among experts, or what? Why is this so different than what I've heard before?
There's a few things going on. So first, saying that they're just predicting the next word is mostly right. But it turns out that saying that they just predict the next word is a lot like saying humans are just chemical reactions. It turns out that if you're trying to predict the next word, and if you've got a small neural network that's trying to predict the next word, it's going to learn that sort of the words the and of and an and a show up often, and that's about all it's going to learn. If you take a medium-sized neural network, it's going to learn how to write fluent sentences. It's going to learn, oh, okay, sort of adjectives come before nouns, these kinds of nouns come before these kinds of nouns. It might even learn some facts. It might learn that if you talk about the president of the United States, you'll get names like Obama and Bush and Biden and Trump, and it'll start to kind of make sense, but it's still just kind of learning statistics. And if you make the neural network even bigger, it will abstract further away. It will start to reason about the people and the objects and the spaces themselves and use that abstraction to predict the next word. So kind of the more these systems learn about the world, the farther and farther their internal representations get from just sort of literally what word comes after what other word.
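[Editor's note: to make "just predicting the next word" concrete, here is a minimal sketch, not from the episode, that asks a small open model for its probability distribution over the next token. The model choice and the prompt are illustrative assumptions; the idea is the same for the far larger models discussed here.]

```python
# A minimal sketch of what "predict the next word" means in practice.
# "gpt2" is just a small, convenient open model; the prompt is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The capital of France is"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

# Turn the final position's logits into a probability distribution over the vocabulary,
# then show the five most likely next tokens.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode([token_id.item()])!r}: {prob:.3f}")
```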
So there's another item on your list that seems like it should have interesting implications for the AI industry, right, for the business of building LLMs. I'll just read that one. It goes: LLMs predictably get more capable with increasing investment, even without targeted innovation. So we'll get into it. But just top line, what does that mean?
We had language models in almost their modern form back in twenty ten, eleven, twelve. Most of the building blocks for them go back even farther, to the eighties or even the sixties. You might have noticed that we didn't have ChatGPT ten or twenty or fifty years ago. What people have been gradually discovering, and sort of discovering to a greater and greater degree, is that if you just take this really simple technology and throw more data at it and run it in its sort of training phase for longer and longer, buy fancier and fancier computers to run it on, it just keeps getting better.
But if the technology is not special, I mean, everybody knows the basic sauce, it suggests that OpenAI, the company that makes ChatGPT, might not have like that much of a moat, right? I mean, Google is clearly in this business, as is Anthropic, the company where you're working. Is there any reason to think OpenAI is going to stay ahead?
I think there's not a lot of secret sauce. There are some details of how to build these things that don't get published, but the basic idea is very much out there. And yeah, I think the closest thing you can really have to a moat is just enormous amounts of money. I think at some point you're going to have a relatively small number of labs building the really impressive frontier systems, just because at some point these are going to be ten billion dollar projects, and it just seems unlikely that you're going to get that many ten billion dollar projects.
If it's the case, as you say, that essentially what you need to build a frontier-level LLM is a lot of money, I would guess that governments around the world, certainly, say, China, to pick a salient government, are probably building giant LLMs right now. Does that seem like a reasonable guess?
Yeah, that seems right. I know there are a lot of private and public groups in China working on this stuff, and when I sort of hear people in the field who are following the geopolitical side of this more closely, they're paying a lot of attention to things like the CHIPS Act and global trade in chips, in that you really do need them. When you're spending these millions or billions of dollars, you're basically spending them to buy or rent very fancy, state of the art computer chips. And it has become a priority for the US to try to make it hard for China to do that, and.
To try and make it hard for China to get at the processor level, which in a sense is like the cement that LLMs are built from. There is a physical thing, we forget that, but it's fancy chips, basically.
That's right.
Yeah, we've been talking so far about what we know about how large language models work. After the break, we'll get into what I think is the most interesting thing about LLMs: what we don't know about how they work. That's the end of the ads.
Now we're going back to the show.
So far, we've basically been talking about how LLMs work, what's going on. There is another bucket in your list, several items, three items, that are, it seems to me, in quite a different category, and they get at this very, very interesting idea about LLMs, and that is: to some significant degree, nobody knows how they work. The people who build LLMs, people like you, people who build them and study them, don't understand a lot of what is going on, which is amazing to me and super interesting. So let's start with this list item. It says: specific important behaviors in LLMs tend to emerge unpredictably as a byproduct of increasing investment.
And you give a couple of examples of this happening for real in the world. I think the best way to understand what's going on here is to talk about one of those examples. Can you just, like, talk me through one of those examples of this unpredictable new behavior emerging? Yeah.
So a specific large language model that people working on this stuff talk about a lot is GPT-3. This came out a little less than three years ago and I think sort of kicked off the modern wave of research on this stuff. And one thing researchers would do as these systems would come out is give them math puzzles and logic puzzles and see how they did. And this could be as simple as just sort of giving the model reasonably hard arithmetic, sort of asking the model, what is one hundred and twenty five plus four hundred and sixty seven? And what they found is, sort of, GPT-1 was bad at this, and GPT-2 was bad at this, and at least for some of these tasks, GPT-3 was also bad at this. And they released it. They put it out in the world, they wrote a paper about it, they did some demos to researchers, and then eventually just let anyone sign up and use it. And after a few months people started noticing, oh, there are some tricks you can use to actually make it quite a bit better at this. If you ask the model the right way, sometimes it'll just kind of reason out loud. Sometimes it will actually do long addition, it'll actually write out its steps.
So give me a specific example. How do you ask it the right way?
So it took even a few more months for people to figure out how to do this systematically, but it turned out the trick was you literally say, let's think step by step.
You actually type that in, you say that to the machine, to the model.
Yes. And if you say, what is this number plus this number, question mark, it'll give a wrong answer. If you say, what is this number plus this number, let's think step by step, dot dot dot, it's going to list out, okay, let's start with the ones digit, and then the tens digit, and then the hundreds digit, and then give you the answer, and it'll very often be right, huh. And it turns out this works really generally, that for many kinds of sort of math and reasoning problems, even some sort of ethics problems, there's a huge range of things you might ask one of these neural networks to do where, if you just tell it, let's think step by step, it will bring out this whole reasoning ability that is actually really useful, that allows it to do much better at a lot of things, and that it didn't have before. And when this technology was first released, the people who built it did not know this was a possibility.
That's wild, right, Like it means this thing is incredibly powerful in a way that the people who built it didn't know. And let's think step by step is just like this incantation. It's just like saying abracadabra or something, and the builders didn't know it was there.
Yeah, it's a bizarre time to be working on this stuff.
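[Editor's note: for readers who want to try the trick Sam just described, here is a minimal sketch, not from the episode: the same arithmetic question asked plainly and then with "let's think step by step" appended. The model name and prompts are illustrative assumptions, and the effect is really only pronounced on models far larger than this small open one.]

```python
# A minimal sketch of the "let's think step by step" prompting trick.
# "gpt2" is a placeholder small open model; the effect described above shows up
# clearly only on much larger models. Prompts are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

question = "What is 125 + 467?"
plain_prompt = question + "\nAnswer:"                  # direct answer, often wrong
cot_prompt = question + "\nLet's think step by step."  # invites written-out reasoning

for prompt in (plain_prompt, cot_prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(
        **inputs,
        max_new_tokens=60,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,  # avoid GPT-2's missing-pad-token warning
    )
    print(tokenizer.decode(output[0], skip_special_tokens=True))
    print("-" * 40)
```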
Like, here's where it's getting a little sketchy to me at a certain level, right? I mean, you've also done a lot of work in AI safety, and in this kind of section of the interview, I feel like we're getting more toward that, the section of, like, the people building this stuff don't understand what it can do. And here, should we add another list item here? Like, this might be the place to check one off. So there's this other item on your eight things to know list that seems germane here: experts are not yet able to interpret the inner workings of LLMs. Which is also wild, and also kind of goes with this idea of not knowing what the thing can do, right? And very not intuitive for a piece of technology, right? If you go back to, say, the Internet, sure, we didn't know all the social implications of the Internet, but we knew how the technology worked. We knew what was going on with the chips and the wires and the electrons and whatever, right? Like, the amazing thing here is clearly we don't know the social implications of AI, but you're saying we don't even know what it's doing inside the box.
Yeah, that's right. We've got these very crude tools for sort of opening the box and looking inside. I mean, in a literal sense, we know what's going on. We can say, oh, when you put in this word, then it makes this number bigger, which makes that number smaller, which makes this number bigger. And you could keep saying that for twenty years and then you'd have explained what happened. But we haven't figured out any other way of talking about these systems that actually gives us any clarity about what's possible, why these systems are doing what they're doing, where they're reliable and not. It's just this huge mess of connections that we don't really know what to do with.
I mean, what should we make of this set of facts that these are incredibly powerful tools that nobody understands at a pretty deep level, that can do unpredictable things, that are able to do things that even their makers don't know they can do.
I think it's pretty exciting and also pretty sobering. I think we don't have a good way of predicting how fast this is moving or what we're going to get when. But in the big picture, it seems like there's a lot of momentum toward building these really powerful AI systems over the next few years. We don't understand how they work. Another one of these list items is, we also aren't very good at controlling them, we aren't very good at making them do what we want.
Yes, let me just pause there, because it's the last list item and you have just walked up to it. So the last item, the item that we haven't mentioned on your list: there are no reliable techniques for steering the behavior of LLMs. So they're powerful, we don't really understand how they work, they can do things we don't know they're going to do, and we can't really control them. Now we're through the list. Now let's just talk it out.
Yeah, and so we're, yeah, we're building these systems. They're getting better, they're developing new capabilities. We don't really know how they work. We can't predict which capabilities are showing up when, and if they're doing something we don't want, we don't really know how to notice that and mitigate it and prevent it. And that just feels like it's playing with fire at a scale that I'm not sure we've seen before, at least outside of things like nuclear weapons. It's this very sort of sobering situation to be in.
What do we do about it?
I'm not sure. I wish I had a better answer. There are a few things that will definitely help. Maybe one obvious thing here is just, there's probably a lot of regulation that would be good to have here. You really don't want the move fast and break things ethos to be behind a technology that is close to human-level ability at a lot of cognitive tasks. That seems like the setup for a bad sci-fi movie.
Specifically, what regulation do you think is a good idea?
One outline of an idea that I'm excited about, and I think this is definitely not the best idea or the only good idea, is mandating or standardizing some tests for particularly scary capabilities, for things that would be particularly important. And this includes things like, OpenAI has actually started doing this and Anthropic is also doing something like this, testing sort of, if you ask the system to walk you, a layperson, through building a new biological weapon, through sort of seeding the start of a new pandemic in your garage, will it help you? Will it help you sort of much, much better than just googling around or talking to your friend with the PhD?
And so then you have to think of all of the versions of that. You can think of, whatever, shutting down the electric grid, poisoning the water supply, building a nuclear bomb, right? I mean, are there people who are just making that list and making sure that ChatGPT can't do it?
There are people who are making this list, and I'm not sure there are enough of them, and I'm not sure they are involved in testing every system that's being built. Yeah, but it's kind of, yeah, running through this checklist of, what are the capabilities these systems could have that would be just really disruptive, that we don't want to move fast and break things with, that we want to see coming, and we want these to sort of influence our decisions about what actually gets deployed when and where.
And that there's not some unpredictable abracadabra that nobody can see, but that three months later somebody will figure out.
Right, yeah. This is the big gap here: I think we can say, okay, once your system is this dangerous, only deploy it if it's really under control. We don't even know how to define that. We don't even know how you would be sure that a system is under control.
We'll be back in a minute with the lightning round. Now let's get back to the show. Okay, time for the lightning round.
Are you ready? All right, let's go.
Let's go. What's your favorite fictional representation of AI.
Off the top of my head, Ex Machina was pretty good. The premise is around, I think, what people in AI tend to worry about. It actually looks not that far off of it, except that right now we're dealing with bots instead of seductive robots.
I liked the vibe of Ex Machina a lot. I like the aesthetic. I like how spare and empty it is. What's your favorite theory for how LLMs could destroy humanity?
Oh, there's so many options and it's so hard to know where this goes.
What's one that's worth mentioning, because it's surprising or because it's particularly worrisome, or for any reason?
One kind of thing I'm particularly worried about is this sort of slow-moving train wreck by way of politics, where you get, sort of, totalitarian states get better and better at surveillance, political persuasion gets better and better, and so democratic political campaigns go more and more off the rails. You wind up with more and more totalitarian states, they're more and more effective, and they themselves are leaning more and more on AI to do important work. And at that point, sort of something like an AI takeover doesn't seem that crazy.
And in particular, what is the large language model doing there in that story.
Persuading people one on one, surveilling people one on one, also making political decisions, sort of deciding how resources should be allocated and who should be empowered within a government, and eventually making military decisions and eventually making big economic decisions. I just sort of worry about this world where people put more and more trust in systems because they work, and that helps centralize things more and more into fewer and fewer institutions, and that makes those institutions really, really delicate. And if an AI system goes off the rails and starts doing something that even its creators don't want, that gets pretty arbitrarily bad.
What's your favorite theory for how LLMs can help humanity?
I think the big ones are education and science. I think it would be pretty cool if you could hire a really world-class, like sort of Oxford-tutorial-quality tutor for just everyone with access to a computer of any kind, and that feels like.
On your phone. Close You could do it on your phone.
Yeah, yeah, And I don't think we've really figured out how to make that work, but I think if that really works, that could be really transformative for the better. On science, I think there's a lot of just really thorny problems around things like drug development, things like sort of fusion power and clean energy, where it could be that just having these systems that can kind of digest more information understand more at once could unlock a bunch of important stuff that would otherwise take us many more generations to get to.
On balance, you think the potential upside of AI outweighs the potential downside.
Probably. But I think that really depends on us being careful right now. I think this makes me optimistic in the long run, but I think there's a real chance that things sort of go off the rails if this keeps being kind of a free-for-all commercial product for more than a few more years.
You went viral on Twitter a while ago when you wrote, quote, doing a PhD is in most cases a terrible idea. I should point out, you have a PhD. Also, it's worth pointing out that PhDs have been saying this for, I guess, as long as there have been PhDs. So there's a lot of questions you could ask here. Well, there's two, really. Like, why do people with PhDs keep saying don't get a PhD? And also, why do people keep ignoring them? Why do people keep going to get PhDs?
This was in a moment of being particularly horrified at some of the sort of common outcomes in PhD programs, and I think the average case is really bad. The average case, literally, I think the median PhD gets an actual diagnosis of depression or anxiety, and often doesn't get that much out of the program, like kind of really struggles in it, and because they're really struggling in it, doesn't accomplish that much and doesn't have great job prospects near the end. The best case, if you get a sort of top-five-percent PhD, is really great. You get to play around with great resources and do whatever you want and explore new ideas for a few years, and it opens up really tremendous opportunities. But yeah, I think it's the kind of thing that people should really, really check their motivations and check their resilience before going into it, and kind of brace themselves, just because it is so often such a difficult experience.
Why do you think people keep going to get PhDs?
I mean, there is some real upside. There are some really cool jobs that you can only get if you have one. But I think there's also this piece, and this is maybe why I had my snippy tweet about this, that if you're a sort of smart, nerdy college student at a research university where you've got lots of opportunities to kind of work in research labs, then you can get this really strong social signal that just, like, you're good at school, you should keep doing school. Like, doing a PhD is what it looks like to keep doing school. This is just the obvious way to use your talents. And people just kind of jump into that out of momentum, and that can be, I think, a riskier decision than it looks like.
If everything goes well, what problem will you be trying to solve in say, five years?
But I don't know. I got into this stuff sort of through the cognitive science, sort of through the idea that you don't really understand something until you can build it, and I really want to understand how minds work, why it is that sort of hooking neurons together in your head this way makes something that can think and that can experience and sort of mixed in with all of this very consequential real world stuff that's going on with AI, as we're building all these tools, we're also building really great tools for just doing cognitive science and sort of figuring out the answers to a lot of really old questions about how the human mind works and if all the practical problems are solved and everything's under control and going great, then I'd be happy to get back into that stuff.
So you would have to be less worried about the world than you are now.
I think that's right.
Well, I hope it goes well. I hope you become less worried about the world. I guess I'm not super optimistic about that. I feel like I'm generally a reasonably optimistic person, but this one, it seems like there's a lot to worry about on this one.
Yeah, yeah, thanks, thanks for the well wishes. And yeah, it feels like there's sort of a decent chance things go badly, a decent chance things go very well. But it seems pretty sure that stuff is just getting weird, that research five years from now is not going to look like research now, and probably the same with many, many, many other things we do.
Sam Bowman is an associate professor at NYU, and he runs a research group at the AI company Anthropic. Today's show was edited by Lydia Jean Kott. It was produced by David Jah and Edith Russelo and engineered by Amanda k Wong. I'm Jacob Goldstein, and we'll be back next week with another episode of What's Your Problem.