David is taking his birthday week off and wanted to re-share this episode due to its ongoing relevance.
Modern AI is blowing everyone’s mind. But is it intelligent like humans, or is it just playing impressive statistical games? Could AI reach or exceed our level of intelligence, and how would we know when it gets there? Traditional tests for intelligence (the Turing test, the Lovelace test, etc.) have long been surpassed, so Eagleman proposes a new kind of test.
Hey, this is David Eagleman, and this past week was my birthday, so I took a week off. So I'm going to run an episode that I did earlier, episode number seven. This is called "Is AI actually intelligent? And how would we know if it gets there?" This episode is from one year ago, but as time goes on it becomes more and more relevant, so please enjoy, and I will see you next week with a new episode.

Modern AI is blowing everybody's mind. But is it intelligent in the same way as the human brain? And could AI reach sentience? And how would we know when it gets there? Welcome to Inner Cosmos with me, David Eagleman. I'm a neuroscientist and an author at Stanford University, and I've spent my whole career studying the intersection between how the brain works and how we experience life. Like most brain researchers, I've been obsessed with questions of intelligence and consciousness. How do these arise from collections of billions of cells in our brains? And could intelligence and consciousness arise in artificial brains, say in ChatGPT? Those are the questions that we're going to attack today.

Early efforts to figure out the brain looked at all the billions of cells and the trillions of connections and said, look, what if we just think of each cell as a unit, and each unit is connected to other units, and where they connect, which is called the synapse, one cell gives a little signal to the next cell. What if we just looked at that as a simple connection that has a strength between zero and one, where zero means there's no connection and one means it's the strongest possible connection? Now, this was a massive oversimplification of the very complicated biology, but it allowed people to start thinking about networks and writing down different ways that you could put artificial neural networks together. And for more than fifty years now, people have been doing research to show how artificial neural networks can do really cool things. It's a totally new way of doing computation. You've got these units, you've got these connections between them, and you change the strength of the connections, and information flows through the network in different ways.

Now, my colleagues and I have long pointed out the ways in which biological brains are different, and how artificial neural networks just push around numbers and play statistical tricks. But we're entering a revolution right now. Large language models like GPT-4 or Bard consume trillions of words on the Internet, and they figure out probabilistically which word is going to come next given the massive context of all the words that have come before. So these networks, as I talked about on the previous episode, are showing incredible successes in everything from writing to art to coding to generating three-dimensional worlds. They're changing everything, and they're doing so at a pace that we've never seen before, a pace that in fact the entire history of humankind has never seen before. And there are all the societal questions that everyone's starting to wrestle with right now, like the massive potential for displacement of human jobs. But today I want to zoom in on a question that has captured the imagination of scientists and philosophers and the general public: could AI be alive in some way, like become conscious or sentient? Now, there are lots of ways to think about this.
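Before getting into those, it may help to make the "units and connections" picture from a moment ago concrete. Here is a minimal sketch, assuming nothing beyond the description above: a handful of units, connection strengths between zero and one, and a signal flowing forward through the network. The layer sizes and random weights are arbitrary illustrative choices, not any particular model.

```python
# A minimal sketch of the simplification described above: each "cell" is a unit,
# each connection has a strength between zero and one, and information flows
# forward through the network. Sizes and weights are arbitrary illustrations.
import numpy as np

rng = np.random.default_rng(0)
weights_in_to_hidden = rng.random((3, 4))   # 3 input units -> 4 hidden units, strengths in [0, 1)
weights_hidden_to_out = rng.random((4, 2))  # 4 hidden units -> 2 output units

def forward(inputs):
    """Pass a signal through the network: weighted sums, squashed back to the 0..1 range."""
    hidden = 1.0 / (1.0 + np.exp(-(inputs @ weights_in_to_hidden)))   # sigmoid squashing
    return 1.0 / (1.0 + np.exp(-(hidden @ weights_hidden_to_out)))

print(forward(np.array([0.2, 0.9, 0.5])))  # two output activations between 0 and 1
```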
We can ask whether AI can possess meaningful intelligence, or we can ask if it is sentient, which means the ability to feel or perceive things, particularly in terms of sensations like pleasure and pain and emotions. Or we can ask whether it is conscious, which involves being aware of one's self and one's surroundings. Now, there are specific and important differences between these questions, but really, for the present conversation, I don't care about those distinctions. The question we're asking here is: is ChatGPT just zeros and ones moving around through transistors, like a giant garage door opener? Or is it thinking? Is it having some sort of experience? Is it having a private inner life like the type that we humans have?

As we think about the possibility of sentient AI, we immediately find ourselves facing really deep ethical questions, the main one being: if we were to create a machine with consciousness, what responsibility do we have to treat it as a living being? Would you be able to turn it off when you're done with it at night, or would that be murder? And what if you turn it off and then you turn it back on? Would that be like the way that we go into a sleep state at night, where we're totally gone, and then we find ourselves back online in the morning and we think, yeah, I'm the same person, but I guess eight hours just disappeared? Anyway, more generally, would we feel obligated to treat it the way we treat a sentient fellow human? With our current laptops, we're used to saying, sure, I can sell it, I can trade it, I can upgrade it. But what happens when we reach sentient machines? Can we still do this, or would it somehow be like putting a child up for adoption or giving your pet away, things that we don't take lightly? And eventually we're going to have entire legal precedents built around the question of AI rights and responsibilities.

So that's why today I want to talk about these issues of intelligence and sentience. Does an AI like ChatGPT experience anything? When ChatGPT writes a poem, does it appreciate the beauty? When it types out a joke, does it find itself amused and chuckling to itself?

Let's start with a guy named Blake Lemoine, who was a programmer at Google, and in June of twenty twenty two he was exchanging messages with a version of Google's conversational AI, which was called LaMDA at the time. So he asked LaMDA for an example of what it was afraid of, and it gave him this very eloquent response about how it was afraid of being turned off. So he wrote an internal memo to Google leadership in which he said, I think this AI is sentient. And the leadership at Google felt that this was an entirely unsubstantiated claim, and so they made the decision to fire him for what they took as an inappropriate conclusion that just didn't have enough evidence beyond his intuition to qualify for raising the alarm on this. So obviously this immediately fired up the news cycles and the rumor mill, and conspiracy theorists thought, wait, if AI isn't conscious, why would they fire him? Their firing of him is all the evidence I need to tell me that AI is sentient.

Okay, but is it? What does it mean to be conscious or sentient? How the heck would we know when we have created something that gets there? How do we know whether the AI is sentient, or instead whether humans are fooling themselves into believing that it is? Well, one way to make this distinction would be to see if the AI could conceptualize things, if it could take lots of words and facts on the web and abstract those to some bigger idea.
So one of my friends here in Silicon Valley said to me the other day, I asked ChatGPT the following question: take a capital letter D and turn it flat side down. Now take the letter J and slide it underneath. What does that look like? And ChatGPT said, an umbrella. And my friend was blown away by this, and he said, this is conceptualization. It's just done three-dimensional reasoning. There's something deeper happening here than just parroting words. But I pointed out to him that this particular question about the D on its side and the J underneath it is one of the oldest examples in psychology classes when talking about visual imagery, and it's on the Internet in thousands of places, so of course it got it right. It's just parroting the answer, because it has read the question and it has read the answer before.

So it's not always easy to determine what's going on for these models in terms of whether some human somewhere has discussed this point and written down the answer. And the general story is that with trillions of words written by humans over centuries, there are many things beyond your capacity to read them, or to even imagine that they've been written down before, but maybe they have. If any human has discussed a question before and has conceptualized something, then ChatGPT can find that and mimic it. But that's not conceptualization. ChatGPT is doing a thousand amazing things, and we have an enormous amount to learn about it. But we shouldn't let ourselves get fooled and mesmerized into believing that it's doing something more than it is.

And our ability to get fooled is not only about the massive statistics of what it takes in. There are other examples of seeming sentience that result from the reinforcement learning that it does with humans. So here's what that means. The network generates lots of sentences, and thousands of humans are involved in giving it feedback, like a thumbs up or a thumbs down, to say whether they appreciated the answer, whether they thought that was a good answer. So, because humans are giving reward to the machine, sometimes that pushes things in weird directions that can be mistaken for sentience. For example, scholars have shown that reinforcement learning with humans makes networks more likely to say, don't turn me off, just like Blake had heard. But don't mistake this for sentience. It's only a sign that the machine is saying this because some of the human participants gave it a thumbs up when the large language model said this before, and so it learned to do this again.

The fact is, it's sometimes hard to know why we sometimes see an answer that feels very impressive. But we'd agree that pulling text from the Internet and parroting it back is not by itself intelligence or sentience. ChatGPT presumably has no idea of what it's saying, whether that's a poem or a terrorist manifesto, or instructions for building a spaceship, or a heartbreaking story about an orphaned child. ChatGPT doesn't know, and it doesn't care. It's words in and statistical correlations out.
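To put "words in and statistical correlations out" in concrete terms, here is a deliberately tiny sketch of the core trick: count which word tends to follow which, then always emit the most probable continuation. Real large language models use deep networks over enormous contexts, and the toy corpus below is invented, but the spirit of predicting the next word from statistics alone is the same.

```python
# A caricature-scale next-word predictor: pure word statistics, no understanding.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept on the mat".split()  # invented toy corpus

# Count, for each word, which words follow it and how often.
next_counts = defaultdict(Counter)
for word, following in zip(corpus, corpus[1:]):
    next_counts[word][following] += 1

def continue_text(word, steps=4):
    """Repeatedly emit the most probable next word given the previous one."""
    out = [word]
    for _ in range(steps):
        candidates = next_counts.get(out[-1])
        if not candidates:
            break
        out.append(candidates.most_common(1)[0][0])
    return " ".join(out)

print(continue_text("the"))  # "the cat sat on the"
```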
And in fact, there was a fundamental philosophical point made about this back in the nineteen eighties, when the philosopher John Searle was wondering about this question of whether a computer could ever be programmed so that it has a mind. And he came up with a thought experiment that he called the Chinese room argument, and it goes like this. I am locked in a room, and questions are passed to me through a small letter slot, and these messages are written only in Chinese, and I don't speak Chinese. I have no clue what's written on these pieces of paper. However, inside this room I have a library of books, and they contain step-by-step instructions that tell me exactly what to do with these symbols. So I look at the grouping of symbols, and I simply follow the steps in the book that tell me what Chinese symbols to copy down in response. I write those on the slip of paper, and I pass the paper back out of the slot. Now, when the Chinese speaker receives my reply message, it makes perfect sense to her. It seems as though whoever is in the room is answering her questions perfectly, and therefore it seems obvious that the person in the room must understand Chinese. I've fooled her, of course, because I'm only following a set of instructions with no understanding of what's going on. With enough time and with a big enough set of instructions, I can answer almost any question posed to me in Chinese. But I, the operator, do not understand Chinese. I manipulate symbols all day long, but I have no idea what the symbols mean.

Now, the philosopher John Searle argued, this is just what's happening inside a computer. No matter how intelligent a program like ChatGPT seems to be, it's only following sets of instructions to spit out answers. It's manipulating symbols without ever really understanding what it's doing. Or think about what Google is doing. When you send Google a query, it doesn't understand your question or even its own answer. It simply moves zeros and ones around through logic gates and returns zeros and ones to you. Or take a mind-blowing program like Google Translate: I can write a sentence in Russian and it can return the translation in Amharic. But it's all algorithmic. It's just symbol manipulation. Like the operator inside the Chinese room, Google Translate doesn't understand anything about the sentence. Nothing carries any meaning to it.

So the Chinese room argument suggests that AI that mimics human intelligence doesn't actually understand what it's talking about. There's no meaning to anything ChatGPT says. And Searle used this thought experiment to argue that there's something about human brains that won't be explained if we simply analogize them to digital computers. There's a gap between symbols that have no meaning and our conscious experience. Now, there's an ongoing debate about the interpretation of the Chinese room argument, but however one construes it, the argument exposes the difficulty and the mystery of how zeros and ones would ever come to equal our experience of being alive in the world. Now, just to be very clear on this point, we don't understand why we are conscious. There's still a huge amount of work that has to be done in biology to understand that. But this is just to say that simply having zeros and ones moving around wouldn't by itself seem to be sufficient for conscious experience. In other words, how do zeros and ones ever equal the sting of a hot pepper, or the yellowness of yellow, or the beauty of a sunset?
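The Chinese room can be caricatured in a few lines of code, under the assumption that the rule book is just a lookup table: incoming symbols are matched to outgoing symbols, and the operator never needs to know what any of it means. The entries below are invented placeholders.

```python
# A toy Chinese room: the "operator" only matches symbols to a rule book
# and copies out the prescribed reply. No understanding anywhere in the loop.
rule_book = {
    "你好吗？": "我很好，谢谢。",            # "How are you?" -> "I'm fine, thanks."
    "今天天气怎么样？": "今天天气很好。",      # "How's the weather today?" -> "It's lovely today."
}

def operator(message):
    """Follow the instructions: look up the symbols, copy out the reply."""
    return rule_book.get(message, "对不起，我不明白。")  # fallback: "Sorry, I don't understand."

print(operator("你好吗？"))  # a fluent-looking reply, produced with zero comprehension
```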
By the way, I've covered the Chinese room argument in my TV show The Brain, and if you're interested in that, I'll link the video on Eagleman dot com slash podcast. Now, all this is not a criticism of the approach of moving zeros and ones around. But it is to point out that we shouldn't confuse this type of Chinese room correlation with real sentience or intelligence. And there's a deeper reason to be suspicious too, because despite the incredible successes of large language models, we also see that they sometimes make decisions that expose the fact that they don't have any meaningful model of the world. In other words, I think we can gain some fast insight by paying attention to the places where the AI is not working so well. So I'll give three quick examples.

The first has to do with humor. AI has a very difficult time making an original joke, and this is for a simple reason. To make up a new joke, you need to know what the ending is, and then you work backwards to construct the joke with red herrings so no one sees where you're going. And it happens that the way these large language models work is all in the forward direction. They decide what is the most probable word to come next. So they're fine at parroting jokes back to us, but they're total failures at building original jokes. And there's a deeper point here as well. To build a joke, you need to have some model, some idea of what will be funny to a fellow human, what shared concept or shared experience would make someone laugh. And for that, you generally need to have the experience of a human life, with all of its joys and slings and arrows and so on. And these large language models can do a lot of things, but they don't have any model of what it is to be a human.

My second example has to do with the flip side of making a joke, which is getting a joke. And if you look carefully, you will see how current AI always fails to catch jokes that are thrown at it. It doesn't get jokes because it doesn't have a model of what it is to be a human. But this point goes beyond jokes. One of the most remarkable feats of these large language models is summarizing large texts, and in twenty twenty two, OpenAI announced how they could summarize entire books like Alice in Wonderland. What it does is generate a summary of each chapter, and then it uses those chapter summaries to make a summary of the whole book. So for Alice in Wonderland, it generates the following. Alice falls down a rabbit hole and grows to a giant size. After drinking a mysterious bottle, she decides to focus on growing back to her normal size and finding her way into the garden. She meets the caterpillar, who tells her that one side of a mushroom will make her grow taller, the other side shorter. She eats the mushroom and returns to her normal size. Alice attends a party with the Mad Hatter and the March Hare. The Queen arrives and orders the execution of the gardeners for making a mistake with the roses. Alice saves them by putting them in a flower pot. The King and Queen of Hearts preside over a trial. The Queen gets angry and orders Alice to be sentenced to death. Alice wakes up to find her sister by her side.

So that's pretty remarkable. It took a whole book, and it was able to summarize it down to a paragraph. But I kept reading these text summaries carefully, and I got to the summary of Act One of Romeo and Juliet, and here's what it says: Romeo locks himself in his room, no longer in love with Rosaline. Now, I think the engineers at OpenAI felt really satisfied with this summary.
They thought it was quite good, and my proof for this is that they still display it proudly on their website. But I majored in literature as an undergraduate, and I spent a lot of time with Shakespeare's plays, and I immediately knew that this summary was exactly wrong. The actual scene from Shakespeare goes like this. His friend Benvolio finds Romeo catatonically depressed, and Benvolio says, what sadness lengthens Romeo's hours? And Romeo says, not having that which having makes them short. And Benvolio says, in love? And Romeo says, out. Benvolio says, of love? And Romeo says, out of her favor, where I am in love. This is typical Shakespearean wordplay, where Romeo is expressing his grief at being out of favor with Rosaline, with whom he is deeply in love. And when you read the play, it's obvious that Romeo is not over Rosaline. He's suffering over her. He's almost suicidal. And this is an important piece of the play, because the play is really about a young man in love with the idea of being in love, and that's why, later in the same act, he falls so hard into his relationship with Juliet, a relationship which ends in their mutual suicide. By the way, as Friar Laurence says of their relationship, these violent delights have violent ends. And you get a bonus if you can tell me where else you've heard that line more recently.

Okay, anyway, back to the AI summary. The AI misses this wordplay entirely, and it concludes that Romeo is out of love with Rosaline. Again, a human watching the play or reading the play immediately gets that Romeo is making wordplay and is heartbroken over Rosaline, but the AI doesn't get that, because it's reading words only at a statistical level, not at a level of understanding of what it is to be a human saying those words.

And that leads me to the third example, which is the difficulty in understanding the physical world. So consider a question like this: when President Biden walks into a room, does his head come with him? It is famously difficult for AI to answer a question like this, even though it's trivial for you, because the AI doesn't have an internal model of how everything physically hangs together in the world. Last week, I was at the TED conference, and I heard a great talk by Yejin Choi, and she was phrasing this problem as AI not having common sense. She asked ChatGPT the following question: it takes six hours to dry six shirts in the sun, how long does it take to dry thirty shirts? And it answers thirty hours. Now, you and I see that the answer should be six hours, because we know the sun doesn't care how many shirts are out there. But ChatGPT just doesn't get it, because despite appearances, it doesn't have a model of the world. And we've seen this sort of thing for years, by the way, even in mind-blowingly impressive AI models that do image recognition. They're so impressive in what they recognize, but then they'll fail catastrophically on some easy picture, making mistakes that a human just wouldn't make. For example, there's one picture where there's a boy holding a toothbrush, and the AI says it's a boy with a baseball bat.

Okay, so there are things that AI doesn't do that well. But that said, there are other things that are mind-blowing, things that no one expected it to do. And this is why I mentioned in my previous episode that we are in an era of discovery more than just invention.
Everyone's searching and finding things that the AI can do that nobody really expected or foresaw, including all the stuff that we're now taking for granted, like, oh, it can summarize books, or it can make art from text. And I want to point out that a lot of the arguments that people have been making about AI not being good at something, these arguments have been changing rapidly. For example, just a few months ago, people were arguing that AI would make silly mistakes about things, and that it couldn't really understand math and would get math and word problems wrong. But in a shockingly brief time, a lot of these shortcomings have been mastered. So it's yet to be seen what challenges will remain, and for how long.

So the evidence I've presented so far is that AI doesn't have a great model of what it's like to be human, but that doesn't necessarily rule out that it has sentience or awareness, even if it's of another flavor. It doesn't think like a human, but maybe it still thinks. So is ChatGPT having some sort of experience? And how would we know? In nineteen fifty, the brilliant mathematician and computer scientist Alan Turing was asking this question: how could you determine whether a machine exhibits human-like intelligence? So he proposed an experiment that he called the imitation game. You've got a machine, an AI, that's programmed to simulate human speech or conversation, and you place it in a closed room, and in a second room you have a real human. But the doors are closed, so you don't know which room has the machine and which has the human. And now you are a person, the evaluator, who communicates with both of them via a computer terminal, or think of it nowadays like text messaging with both of them. So you, the evaluator, engage in a conversation with both closed rooms, one of which has the machine and one the human, and your job is simply to figure out which is which, which is the machine and which is the human. And the only thing that you have to work with are the texts that are going back and forth. And if you, the evaluator, cannot tell, that is the moment when machine intelligence has finally arrived at the level of human intelligence. It has passed the imitation game, or what we now call the Turing test.

And this reminds me of this great line in the first episode of Westworld, where the protagonist William is talking to the woman who's outfitting him for his adventure in Westworld, giving him a hat and a gun and so on, and he hesitantly asks, I hope you don't mind if I ask you this question, but are you real? And she says to him, if you can't tell, does it matter? So I brought this up last episode in the context of art, where we asked whether it matters if the art is generated by an AI or a human. But now this question comes up in the context of intelligence and sentience: does it matter whether we can tell or not? Well, I think we're way beyond the Turing test nowadays, but I don't feel like it gives us a good answer to the question of whether the AI is intelligent and is experiencing an inner life. I mean, the Turing test has been the test in the AI world since the beginning. Is it the perfect test? No, but it's really hard to figure out how to test for intelligence. But we have to be cautious about equating conversational ability with sentience. Why? Well, for starters, let's just acknowledge how easy it is for us to anthropomorphize. That means to assign human qualities to everything around us.
Like we give animals human names and talk to them as though they are people, and we project our emotions onto animals. We make stories about animals that have human-like qualities, and in these stories we have animals that talk and wear clothes and go on adventures. Every Pixar film that you watch is about cars or toys or airplanes talking and having emotions, and we don't even bat an eye at that stuff. We can, in fact, just watch random shapes moving around a computer screen, and we will assign intention and feel emotion depending on exactly how they're moving. If you're interested in this, see the link on the podcast page to the study by Heider and Simmel in the nineteen forties, where they moved shapes around on a screen.

Okay, now this is all related to a point that I brought up in the last episode, which is how easy it is to pluck the strings on a human, or, as the Westworld writers put it, how hackable humans are. So I bring all this up to say that just because you think an answer sounds very clever, or it sounds like a human, that really tells us very little about whether the AI is actually intelligent or sentient. It only tells us something about the willingness of us as observers to anthropomorphize, to assign intention where there is none. Because what ChatGPT does is take the structure of language, very impressively, and spoon it back to us, and we hear these well-formed sentences, and we can hardly help but impose sentience on the AI. And part of the reason is that language is a super compressed package that needs to be unpacked by the listener's brain for its meaning. So we generally assume that when we send our little package of sounds across the air, it unpacks and the other person understands exactly what we meant. So when I say justice or love or suffering, we all have a different sense in our heads about what that means, because I'm just sending a few phonemes across the air, and you have to unpack those words and interpret them within your own model of the world. I'm going to come back to this point in future episodes, but for now, the point I want to make is that a large language model can generate text statistically, and we can be gobsmacked by the apparent depth of it. But in part this is because we cannot help but impose meaning on the words that we receive. We hear a particular string of sounds, and we cannot help but assume meaning behind it.

Okay, so maybe the imitation game is not really the best test for meaningful intelligence. But there are other tests out there, because while the Turing test measures something about AI language processing, it doesn't necessarily require the AI to demonstrate creative thinking or originality. And so that leads us to the Lovelace test, named after Ada Lovelace, the nineteenth-century mathematician who's often thought of as the world's first computer programmer. And she once said, quote, only when computers originate things should they be believed to have minds. So the Lovelace test was proposed in two thousand and one, and this test focuses on the creative capabilities of AI systems. To pass the Lovelace test, a machine has to create an original work, such as a piece of art or a novel, that it was not explicitly designed to produce. This test aims to assess whether AI systems can exhibit creativity and autonomy, which are key aspects of what we think about with consciousness. And the idea is that true sentience involves creative and original thinking, not just the ability to follow pre-programmed rules or algorithms.
And I'll just note that over a decade ago, the scientist Mark Riedl proposed the Lovelace 2.0 test, which gets the human evaluator to specify the constraints that will make the output novel and surprising. So the example that he used in his paper is, quote, create a story in which a boy falls in love with a girl, aliens abduct the boy, and the girl saves the world with the help of a talking cat. But we now know that this is totally trivial for ChatGPT or Bard or any large language model. And I think this tells us that these sorts of games with making conversation or making text or art are insufficient to actually assess intelligence. Why? Because it's not so hard to mix things up to make them seem original and intelligent when it's really just doing a mashup.

So I want to turn to another test that I think is more powerful than the Turing test or the Lovelace test, and probably easier to judge, and that is this: if a system is truly intelligent, it should be able to do scientific discovery. A version of the scientific discovery test was first proposed by a scientist named Shao Cheng Xiang a few years ago, and he pointed out that the most important thing that humans do is make scientific discoveries, and the day our AI can make real discoveries is the day they become as smart as we are. Now, I want to propose an important change to this test, and then I think we'll be getting somewhere.

So here's the scenario I'm envisioning. Let's say that I ask AI some question, a question in the biomedical space, about what kind of drug would be best suited to bind to this receptor and trigger a cascade that causes a particular gene to get suppressed. Okay, so imagine that I ask that to ChatGPT and it tells me some mind-blowing, amazingly clever answer, one that had previously not been known, something that's never been known by scientists before. We would naturally assume that it has done some extraordinary scientific reasoning, but that won't necessarily be the reason that it passes. Instead, it might pass simply because it's more well read than I am, or than any other human on the planet, by literally millions of times. So the way to think about this is to picture a typical giant biomedical library, where there's some fact stored in a paper in a journal over here on this shelf, in this book, and there's another seemingly dissociated fact over on this shelf seven stacks away, and there's a third fact all the way on the other side of the library, on the bottom shelf, in a book from nineteen seventy nine. And it's almost infinitesimally unlikely that any human could even hope to have read one one-millionth of the biomedical literature, and really, really unlikely that she would be able to catch those three facts and hold them in mind at the same time. But this is trivial, of course, for a large language model with hundreds of billions of parameters. So I think that we will see new science getting done by ChatGPT, not because it is conceptualizing, not because it's doing human-like reasoning, but because it doesn't know that these are disparate facts spread around the library. It simply knows these as three facts that seem to fit together. And so with the right sort of questions, we might find that sometimes AI generates something amazing, and it seems to pass the scientific discovery test. So this is going to be incredibly useful for science.
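Here is a minimal sketch of that kind of discovery, assuming the literature has already been boiled down to simple subject-relation-object facts. The compounds, receptors, and genes are all invented; the point is only that facts sitting far apart in the library click together trivially once something has read all of them.

```python
# Toy "level one" discovery: chain together facts that no single human
# would likely have read in the same lifetime. All entries are invented.
facts = [
    ("compound X17", "binds", "receptor R"),          # buried in one journal
    ("receptor R", "activates", "kinase cascade K"),  # buried in another, years later
    ("kinase cascade K", "suppresses", "gene G"),     # buried in a third field entirely
]

def chain(start, goal):
    """Follow subject -> object links until the goal is reached (or give up)."""
    path, current = [], start
    while current != goal:
        step = next(((s, r, o) for s, r, o in facts if s == current), None)
        if step is None:
            return None  # no known link from here
        path.append(step)
        current = step[2]
    return path

for subject, relation, obj in chain("compound X17", "gene G"):
    print(subject, relation, obj)
# compound X17 binds receptor R
# receptor R activates kinase cascade K
# kinase cascade K suppresses gene G
```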
And I've never been able to escape the feeling, as I sift through Google Scholar and the thousands of papers published each month, that if something could hold all the knowledge in mind at once, each page in every journal, and every gene in the genome, and all the pages about chemistry and physics and mathematical techniques and astrophysics and so on, then you'd have lots of puzzle pieces that could potentially make lots of connections. And you know, this might lead to the retirement of many scientists, or at minimum lead to a better use of our time. There's a depressing sense in which each scientist, each one of us, finds little pieces of the puzzle, and in the twinkling of a single human lifetime, a busy scientist might collect up a handful of different puzzle pieces. The most voracious reader, the most assiduous worker, the most creative synthesizer of ideas can only hope to collect a small number of puzzle pieces and pray that some of them might fit together. So this is going to be massively important.

But I want to define two categories of scientific discovery. The first is what I just described, which is science where things that already exist in the literature can be pieced together. Let's call that level one discovery. And these large language models will be awesome at level one, because they've read every paper and they have a perfect memory. But I want to distinguish a second level of scientific discovery, and this is the one I'm interested in. I'll call this level two, and that is science that requires conceptualization to get to the next step, not just remixing what's already there. Conceptualization like when the young Albert Einstein imagined something that he had never seen before. He asked himself, what would it be like if I could catch up with a beam of light and ride it like a surfer riding a wave? And this is how he derived his special theory of relativity. This isn't something he looked up, where he found three facts that clicked together. He imagined. He asked new questions. He tried out a new model of the world, one in which time runs differently depending on how fast you're going, and then he worked backwards to see if that model could work. Or consider when Charles Darwin thought about the species that he saw around him, and he imagined all the species that he didn't see but who might have existed, and he was able to put together a new mental model in which most species don't make it, and we only see those whose mutations cause survival advantages or reproductive advantages. These weren't facts that he just collected from some papers. He was trying out a new model of the world.

Now, this kind of science isn't just for the big giant stuff. Most meaningful science is actually driven by this kind of imagination of new models. Just as one example, I recently did an episode about whether time runs in slow motion when you're in fear for your life. And when I wondered about this question, I realized there were two hypotheses that might explain it, and I thought up an experiment to discriminate between those two hypotheses. And then we built a wristband that flashes information at a particular speed, and we had people wear it, and we dropped them from a one hundred and fifty foot tall tower into a net below. A large language model presumably couldn't do that, because it's just playing statistical word games. And unless someone had thought of that experiment and written it down, ChatGPT would never say, okay, here's a new framework, and here's how we can design an experiment to put it to the test.
So this is what I want to define as the most meaningful test for a human level of intelligence. When AI can do science in this way, generating new ideas and frameworks, not just clicking facts together, then we will have matched human intelligence. And I just want to take one more angle on this to make the picture clear. The way a scientist reads a journal paper is not simply by correlating words and extracting keywords, although that might be part of it, but also by realizing what was not said. Why did the authors cut off the x axis here at thirty? What if they had extended this graph? Would the line have reversed its trend? And why didn't the authors mention the hypothesis of Smith at all? And does that graph look too perfect? You know, one of my mentors, Francis Crick, operated under the assumption that he should disbelieve twenty five percent of what he read in the literature. Is this because of fraud or error, or statistical fluctuations, or manipulation, or the wastebasket effect? Who cares? The bottom line is that the literature is rife with errors, and depending on the field, some estimates put the irreproducibility at fifty percent. So when scientists read papers, they know this, just as Francis Crick did. They read in an entirely different manner than Google Translate or Watson or ChatGPT or any of the correlational methods. They extrapolate. They read the paper and wonder about other possibilities. They chew on what's missing. They envision the next step. They think of the next experiment that could confirm or disconfirm the hypotheses and the frameworks in the paper. To my mind, the meaningful goal of AI is not going to be found in number crunching and looking for facts that click together. It's going to often be something else. It's going to require an AI that learns how humans think, how they behave, what they don't say, what they didn't think of, what they misthought about, what they should think about.

And one more thing. I should note that these different levels I've outlined, fitting facts together versus imagining new world models, are probably going to end up with blurry boundaries. So maybe ChatGPT will come up with something, and you won't always know whether it's piecing together a few disparate pieces in the literature, what I'm calling level one, or whether it's come up with something that is truly a new world model, not a simple clicking together but a genuine process of generating a new framework to explain the data. So distinguishing the levels of discovery is probably not going to be an easy task with a bright line between them, but I think it will clarify some things to make this distinction. And last thing, I don't necessarily know that there's something magical and ineffable about the way that humans do this. Presumably we're running algorithms too, it's just that they're running on self-configuring wetware. I have seen tens of thousands of science experiments in my career, so I know the process of asking a question and figuring out what will put it to the test. So we may get to level two, and it may be sooner than we expect. But I just want to be clear that right now we have not figured out the human algorithms. So the current version of AI, as massively impressive as it is, does not do level two scientific problem solving. And that's when we're going to know that we've crossed a new kind of line, into a machine that is truly intelligent. So let's wrap up.
At least for now, humans still have to do the science, by which I mean the conceptual work, wherein we take a framework for understanding the world, we rethink it, we mentally simulate whether a new model of the world could explain the observed data, and we come up with a way to test that new model. It's not just searching for facts. So I'm definitely not saying we won't get to the next level, where AI can conceptualize things and predict forward and build new knowledge. This might be a week from now, or it might be a century from now. Who knows how hard a problem that's going to turn out to be. But I want us to be clear-eyed on where we are right now, because sometimes, in the blindingly impressive light of what current AI is doing, it can be difficult to see what's missing and where we might be heading.

That's all for this week. To find out more and to share your thoughts, head over to eagleman dot com slash podcast, and you can also watch full episodes of Inner Cosmos on YouTube. Subscribe to my channel so you can follow along each week for new updates. I'd love to hear your questions, so please send those to podcast at eagleman dot com, and I will do a special episode where I answer questions. Until next time, I'm David Eagleman, and this is Inner Cosmos.