Google's Journey to the Edge of Search

Published Nov 3, 2022, 4:05 AM

Cathy Edwards is vice president and GM of Search at Google. Cathy's problem is this: how do you teach computers to tell people what they want to know, even if they don't know how to ask?

Google's last leap: Moving from search results based on keywords to search results based on concepts. The next step: Figuring out how to let people search using not just words, but combinations of words and images.

If you’d like to keep up with the most recent news from this and other Pushkin podcasts, be sure to subscribe to our email list.

Pushkin. It feels like searching the web is a problem that's been solved. You know, it's ridiculously easy for me to say, find out when Alexander Hamilton was shot (eighteen oh four), or whether they are making Sing three (not yet, but Matthew McConaughey has expressed interest). And yet, and maybe this is not surprising, the people who spend their lives working on search do not think search is solved. This is partly because the people at the frontier of search don't just want to search the web. They want to answer every question that might cross your mind, even questions you can't put into words. I'm Jacob Goldstein, and this is What's Your Problem?, the show where entrepreneurs and engineers talk about how they're going to change the world once they solve a few problems. My guest today is Cathy Edwards, vice president and GM of Search at Google. Cathy's problem is this: how do you teach computers to tell people what they want to know, even if they don't know how to ask? Later in the conversation we get to the frontier of what Cathy and Google are working on now, but we started with the problem they have largely solved in the six years Cathy has been at Google: the jump from search results based on keywords to search results based on natural language, the way people talk in everyday life. So one of the problems that we were working on around six years ago is this problem of natural language queries. So, if you're old enough to remember the early days of search on the Internet, there was this idea of keyword-ese, right, that you had to sort of take this idea you had in your mind of what you needed to know and figure out what were the exact right keywords to enter into the search engine to get your results back, right.
I mean, as an example, very early on, I remember being taught how to query back in, you know, nineteen ninety nine and being told never use the word "and," never use the word "the," because the word "and" or the word "the" is in almost every document on the Internet. And the way it worked back then is you did this word matching, right, and so if you had a word that was in your query and there was that same word in the document, then that document would be returned and potentially scored, right. And that was very helpful if it was a word like "genetics," right, which was highly specific and wasn't in a heap of documents on the internet. But the word "and"? Not very specific. And you know, in the very early days of the Internet, these words weren't even weighted particularly, right. The word "and" counted for as much as the word "genetics," and so a document might have a ton of uses of the word "and" and one use of the word "genetics," and it would score really highly, even though it wasn't particularly genetics-focused. Now, by the time you get to Google, that part is solved, right? Google by that point, years ago, is weighting "genetics" more heavily than it's weighting "the." But what part six years ago was not solved, that's solved now, or solved-ish now? We were still seeing people do these very keyword-oriented queries. So they weren't saying things like "what wine pairs best with chicken?" Or if they were, they were doing those queries and getting not the best results, because not only is there a question of word matching and how much each word counts for, there's also the question of, like, does the word "what" appear at all? Right? Like, are the answers to that question actually just documents that say the best wine to pair with chicken is, you know, chardonnay, right, and not so much documents restating the question.
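The weighting fix Cathy describes here can be sketched with TF-IDF, a classic term-weighting scheme. This is a stand-in for illustration; the transcript doesn't say which formula early engines or Google actually used, and the toy corpus and query below are invented:

```python
import math

# Toy corpus: each document is a list of words (invented for illustration).
docs = [
    "and and and the gene study and genetics and and".split(),
    "the history of jazz and blues".split(),
    "genetics of fruit flies and heredity".split(),
]

def idf(term, docs):
    # Inverse document frequency: rare terms (like "genetics") get high
    # weight; ubiquitous terms (like "and") get a weight of zero.
    n_containing = sum(term in d for d in docs)
    return math.log(len(docs) / n_containing)

def score(query_terms, doc, docs):
    # TF-IDF score: raw term count in the document, damped by how rare
    # each term is across the whole corpus.
    return sum(doc.count(t) * idf(t, docs) for t in query_terms)

query = ["and", "genetics"]
for d in docs:
    print(round(score(query, d, docs), 3), " ".join(d))
```

Under naive word matching, the first document wins easily (seven matching tokens); under TF-IDF, "and" contributes nothing, so the "and"-stuffed document no longer outranks the genuinely genetics-focused one.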
You know, they didn't include the question, and so we sort of saw these SEO documents spring up that would have the questions kind of baked in, in an attempt to match. But those documents weren't necessarily the best answers. And so this is when we started to go just that next level deeper in our language understanding with these AI models, these language models that really can start to map out, in a concept space, things like this sort of translation between how you might ask a query and then what that might look like in the document. So, to take the example you gave of "what wine pairs best with chicken," even as late as six years ago when you got to Google, you're saying Google wasn't great at delivering the best results to a query like that, because it was written as speech, not written as a series of keywords. So six years ago, I would have been better off typing "chicken wine pairing." I would have gotten better results if I did that, you're saying, because that's the way Google had mapped the web. It was like a series of important words and what sites are reliable, and the technology just wasn't there to actually try and understand the way people ask questions in real life. Absolutely. And it was this idea of bringing AI into search and having these large-scale language models. The first one was called BERT. We now use one called MUM, which we'll get to. But let's try talking about BERT for a second. So how do you get from search results that are fundamentally keyword-based to search results that are fundamentally, you know, answering questions that are posed in a more natural way? Like, how do you make that leap?
So the fundamental insight is you go from looking at these words as tokens that get matched against each other to suddenly you look at all the words in all the documents on the Internet and you create what's called an embedding space, which you can essentially think of as a map of the concepts that these documents know about. And suddenly, by being able to say, okay, you can take a query, map that into this concept embedding space, you take these documents, map those into the concept embedding space, you can start to actually match together not these words, but what people actually mean, what they actually mean when they ask these questions, and what they actually mean when they write these web pages on the Internet. That seems, I mean, A, it seems super hard, right, and B, as I'm parsing that, I'm tempted to use a lot of anthropomorphic language, right? I'm tempted to say, like, you have to go from the computer just sort of having a list of words and kind of weights around those words to a computer understanding what people mean. Like, am I right to say that? Or is that just my, like, layperson intuition getting in the way of what's going on? I mean, the first thing I'll say is I think we're very far away from the computer having any sort of sentience and truly understanding. But I think it is fair to say that there is a level of deeper understanding, that you're not just looking at these words as, you know, bits in a computer, but you're actually starting to model, in a way that a human might, a brain might, what the concepts are. And I do think that's a first step of getting closer to this sort of natural human understanding. So is there a way to talk about how that works? It's pattern matching, effectively, right. And it just so happens that if you magnify pattern matching on a very large scale, that can be a pretty compelling understanding. And so that's the sort of big idea, the theory of how it works.
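The embedding-space matching Cathy describes can be sketched like this. The three-dimensional "concept space" and the hand-made vectors below are invented for illustration; real models like BERT learn vectors with hundreds of dimensions from data rather than having them assigned by hand:

```python
import math

# Hypothetical concept dimensions: [food, drink, pairing-advice].
# Both the query and the candidate pages are mapped into the same space.
embed = {
    "what wine pairs best with chicken": [0.8, 0.9, 0.9],   # the query
    "the best wine to pair with chicken is chardonnay": [0.9, 0.9, 0.8],
    "a history of chicken farming": [0.9, 0.0, 0.0],
    "top ten wines under twenty dollars": [0.0, 0.9, 0.1],
}

def cosine(a, b):
    # Cosine similarity: how closely two concept vectors point
    # in the same direction, regardless of their length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

query = "what wine pairs best with chicken"
ranked = sorted(
    (d for d in embed if d != query),
    key=lambda d: cosine(embed[query], embed[d]),
    reverse=True,
)
print(ranked)
```

The point of the sketch: the chardonnay page ranks first because its concept vector is closest to the query's, not because it shares surface keywords like "what" or "best."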
I'm sure in actually building the thing, in building BERT, which was this big model that did work, it wasn't that easy, right? I mean, is there a story version of how you built it? So I think there were two hard points along the journey. The first hard point was just that these models were being built at a scale that was unprecedented in the amount of information. You know, traditional neural networks would run on thousands, maybe millions of training examples. Suddenly you're trying to model all the words on the Internet, and this scale, firstly, this scale is what gets you the amount of training to actually get the concept model to be compelling. But frankly, the computers just couldn't process it. So you're building this model and saying, okay, now to learn what you need to learn, read literally every word on the Internet, is that right? Yes, and not read it once, because every layer of the neural net needs to read it and reprocess it, right. So you're reading every word, you know, a massive number of times. And at the time we didn't really have the compute power. You just needed more computers, essentially, more and more chips, more engines, to just process and process and process. So our research team had developed these chips, these processors, that were really optimized for doing this sort of deep learning work. And it was these chips, and the way that we could sort of put all the chips together to work in concert to solve this problem, that really unlocked the amount of processing power needed to even build these models in the first place. So the binding constraint wasn't, like, the theory or the ideas of it. Like, you knew how to do it, you just didn't have enough horsepower to actually make it happen. Well, we knew that we could do it. We didn't know if it would be any good, right? And you couldn't even try, right? Yeah, right. And so then we tried it and we found out, actually, this thing is pretty compelling.
It can understand things that our models previously have never understood, you know. But I will say, and this gets to the second hard part: once we had these large-scale language models, we didn't quite know how to put them into search ranking. This was not something that had been done before. So we have in search this incredibly rigorous methodology for testing any given change to our algorithm, and it's based in statistics, and it statistically samples queries, and we look at the before and the after, and there's a scoring system to say, is it better or not? And I remember looking at the early experiments from this BERT integration into our search engine, and the queries that it was impacting were just queries that, honestly, before, we would have said, we don't know how we can solve this query. And suddenly the model was just able to figure out these sort of unspoken concepts that our previous technology just would not have even been able to come close to. Like, give me an example. Like, what kind of thing? So here's a really great example. This is directly from one of the very first BERT evaluations that we did internally, and the query is "can you get medicine for someone? pharmacy." Right? And so what's interesting about this question is the user's looking for something very specific. They're looking, like, maybe my partner is sick. Can I go and pick up their prescription at the pharmacy for them? Or do they have to go and get it? Right? It's also a little janky, where it's half in natural language, "can you get medicine for someone?," and half in, like, keyword-ese, they're just typing "pharmacy" at the end, right. It's a weird query, exactly, yeah. And so previously we didn't know how to parse out this intent, right, this idea. You know, we could tell that it was about getting a prescription from a pharmacy, but this notion of "for someone" was a concept that was just slightly too complex. Oh, I didn't even understand it until now.
What they mean is, can I pick up someone else's prescription? That's what they're actually asking, but it's poorly worded, frankly, and therefore hard to figure out. Exactly right. And so previously, before BERT, we would return these wonderful web pages saying, this is how you get a prescription filled, which, you can imagine, if you're this user doing this query, you're like, yeah, I already know how to get a prescription filled for me. What I need is to get one filled for somebody else. Exactly. And with BERT we were able to pick up this idea of the "for someone" and put the appropriate weight on it, that that was the sort of, you know, discriminating thing in the query, that that was the key thing that the query turned on. And then we were able to show this web page that talked about, can I have a friend or family member pick up a prescription for me? And that was the sort of aha moment where we could all just sit around and be like, wow, this is a new level of understanding that we haven't gotten to previously. So with BERT, Google got to the point where it was very, very good at dealing with words in a deep, complex way. But words make up less and less of the Internet. Pictures and videos are a whole other story. That's coming up in a minute. Now back to the show. So you have gotten to this place now where you, Google, have made the leap from keyword-based searches to intention-based searches, what do people mean, right? Which is this big, interesting leap. And so I'm interested in kind of the next leap. Like, what's the next big hard problem you're trying to solve? What's really interesting to me is this idea of how many questions you don't ask because you don't even know the words, right. Like, this is a bit of a sad story, but I have at my house this oak tree, and the oak tree, I think, is dead, and it's very sad for me because it's a very beautiful oak tree.
And what's interesting is, you know, I looked at the oak tree and I'm like, wow, those leaves are kind of brown. Like, that doesn't seem right to me. I wonder if something's wrong with the oak tree, right. But I can't necessarily, right now, really articulate to a computer this fundamental question of, is this oak tree dead? And if not, what can I do to save it? Right? So what I do is I go and type in some queries. I say, you know, "oak tree dead," "how do I know if my oak tree is dead?" You know? And I get back results. But those results aren't taking in any context of this particular tree and what the leaves look like. And so there's this idea of, how can you start to ask these questions using all of the information around you, using your camera to actually capture this particular oak tree, using your location to know, you know, what are the native oaks in this area, and what's the current incidence of sudden oak death syndrome, which is a thing that I have recently learned exists. Okay, so I get why this is a hard thing to search in a text box, right. And so the thing that's interesting to me is, how can we facilitate asking those types of questions where it's a mix of, here's something that you're looking at, here's something that you're saying with your words that adds to the picture. You know, here's a lemon tree that's got some black spots on it. What's wrong with it? Like, how can you help me understand what I should do about this? You know, these sorts of questions, I think, are, right now, we have to do a tremendous amount of work to try and translate these questions into text that we would issue to a search engine. And yeah, we all do that. Yeah, yeah, normal people. Yes, we're all doing it. And when you think about it, you know, sometimes it's very easy, right, but sometimes you're, like, really having to work hard to come up with a query that will actually get you the answers that you need.
And I think that's really the next frontier for us: how do we, on the query side, help users just naturally, intuitively express whatever information need they have? And then how do we understand the whole universe of information, not just the web pages, but all the images and video and audio out there, and take that next level of, like, concept understanding to match those together so that we can get users even more precise answers that really help them? Great. So that's the, like, vast dream slash big problem. Can we reduce it a little bit so we can talk in sort of practical terms about what you're working on? I mean, I know there's this new AI model that integrates images. Like you can, you know, whatever, take a picture with the camera on your phone and put in text. So, like, well, you have this new model, and it, like the old one, has this warm, fuzzy acronym, right? It's called MUM, which stands for, hold on, I gotta look at my notes, the Multitask Unified Model. So, like, tell me about MUM. So MUM is our next-level model. You know, BERT was about language. MUM is about all these different modalities of information coming together, in particular images and language. I mean, is that it? Really, images and language? And we've got some limited applications of it in search today. So for example, you can take the photo of somebody's handbag and say you want to shop for it, and that will work today. And that is, like, we were not able to do this previously, and that in and of itself is a big breakthrough. But there's still so much headroom, right? Like, there's still so much ability to say, you know, I would classify our current ability to process words in this multimodal context as, you know, kind of like back in the early days of the internet. You can say "near me" to find where you can buy it.
You can say "buy," but you can't necessarily, like, ask an incredibly complicated question about a picture, right? Like, so we're kind of back to keywords in this new pictures-plus-words universe. Let me ask a dumb question: why can't you just take all of your brilliant intent AI and copy and paste it to fit with the image AI? So, a couple of things. The first is that anytime we develop sort of this new technology, we also need to see how users start using it, right. And so I think it's also fair to say that, you know, we have a ton of people using this, but there hasn't been time for that new technology to really be accepted by the world. And then we have this vast set of queries that we're doing poorly on, right. So that's the other thing you should know about Google. We spend a lot of time looking at the queries where we're failing. That's one of the other reasons we have a deep appreciation of how search is an unsolved problem, because we're just constantly looking at queries where the user's clearly not getting what they're looking for. And we'll use that as a seed to figure out how to make things better. So do I understand you that the fundamental thing you need now is just lots of people to use this thing so you can see the weird ways people search and the things they sort of do that are hard to understand? That's certainly one of the things we need. I mean, fundamentally, search works in service of our users, right, and so understanding the failures is critical to how we get better.
I think there are also just things that we know we need to do on the AI and the model side that we'll continue working through, right: the ability to really bring together more of the two-step process of how do you conceptually understand the words, conceptually understand the image, and then bring those two things together, and have that be a bit deeper on both sides rather than just the combination together, and those sorts of things. But yeah, I mean, people coming in and using it, and then having a bad time, will then make it better. It seems like there have been two main threads of AI research. One is basically language, and the other is basically vision and images. I mean, is it right to think of what you're trying to do as the synthesis of those two sort of main AI traditions? Yeah, I think so. I think it is clearly the case that, just like, you know, with BERT we took all these words and we got down to concepts, right, it is clearly the case that human beings understand the world through concepts, and they do that visually, and they do that with language, and ultimately the concepts are the same, right. So being able to say, okay, here's a concept, and we can attach to that what that concept looks like, the visual representation of that concept, insofar as it has one, and the words surrounding that concept. That's when we can really unlock this true, natural way of understanding the world that we think is going to enable people to ask all those questions that they have that they're not asking right now. Are there applications that go beyond search that come to mind, if you figure this out? Yeah, I mean, I think that search has this connotation of kind of, find what's out there. I think there's something, you know, we're thinking about what this looks like in the generative space.
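The two-step process described above, conceptually understand each modality and then bring them together, might be sketched like this. Everything here is a hypothetical illustration, not MUM's actual architecture: the concept dimensions, the hand-made vectors standing in for the outputs of an image encoder and a text encoder, and the simple averaging used for fusion (real systems learn the fusion step):

```python
import math

def cosine(a, b):
    # Cosine similarity between two concept vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Step one: encode each modality into a shared concept space.
# Hypothetical dimensions: [oak tree, leaf damage, care advice].
image_vec = [0.9, 0.8, 0.0]   # photo of a browning oak (from an image encoder)
text_vec = [0.1, 0.3, 0.9]    # typed text: "is it dying? what do I do?"

# Step two: fuse the per-modality vectors into one query vector.
# An element-wise average is the simplest possible fusion.
query_vec = [(i + t) / 2 for i, t in zip(image_vec, text_vec)]

# Candidate pages, also embedded into the same concept space.
pages = {
    "diagnosing and treating sudden oak death": [0.9, 0.7, 0.8],
    "oak furniture buying guide": [0.8, 0.0, 0.0],
    "general houseplant watering tips": [0.0, 0.2, 0.6],
}

best = max(pages, key=lambda p: cosine(query_vec, pages[p]))
print(best)
```

Neither modality alone pins down the intent: the photo says "damaged oak" and the text says "help me fix it." Only the fused vector is close to the page that covers both.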
So for example, you know, I bake birthday cakes for my kids, and sometimes what my kids want in a birthday cake just actually doesn't exist on the internet, right, or there's, like, only one or two, so then I have to come up with it myself. And, like, what if AI could help us generate a sample image just based on these concepts, that I could then use for inspiration? I think that's a pretty interesting concept. There's obviously a lot of things that we need to be very thoughtful about in this space as we do it, but I think this idea of extending search past the notion of connecting you with the information that's out there, to actually synthesizing new information for you, is pretty interesting and something we're talking about a lot. You know, one of the things that has become clear to me talking with you is, clearly, I think too narrowly about search, right. I have this very kind of twenty-years-ago idea of, like, searching text on the web, and the web has become much less text-based in that time, right. The web includes Instagram, the web includes TikTok, and those are places where, weirdly to me, lots of people go to search. Like, people go on TikTok to find, whatever, where to go out to eat, which would never occur to me. So, I mean, is that part of the sort of motivation on some level for you, to figure out, oh, right, text, that's not enough, we've got to figure out how to search in video, and what does that even mean? I think we're really driven by what our users are telling us, and we just have really robust mechanisms for understanding what our users are doing. And it's pretty clear that people around the world find image and video content to be pretty compelling, right? I mean, that's sort of a very obvious statement, but you know, the Internet in the early days, it was bandwidth-constrained. It was technology-constrained.
It had to be words, because that's what the technology enabled, not necessarily because that's what human beings most enjoy in terms of an information consumption experience. And so we really are driven by what we're seeing in the user trends, and we're really driven by just this mission of, how do we keep helping people get the best answers to their questions that we can give them? In a minute, the lightning round, including what you learn about the Internet when you spend six years working at Google Search. Now, let's get back to the show. I want to do a lightning round. We close usually with a lightning round on this show, just a bunch of relatively quick questions. So in this instance, I googled "best lightning round questions," and right at the top of the search results page, I didn't even have to click through, is this bulleted list. I'm just going to give you a few from there. Sounds good. Favorite day of the week? Oh, Monday, because I get to go to work and not deal with my kids all day. Who I love very dearly. Good. Favorite city in the US besides the one you live in? Just reading here. New York City. Thank you. Would you rather be able to speak every language in the world or be able to talk to animals? Speak every language in the world. I'm shocked, to be honest, although I get that, like, Google might actually figure that out. Does it not seem like you can already get a translator for every language? Talking to animals would be like a revolution in human understanding of the natural world, I guess. But I do not speak any languages, really, and I constantly feel bad about it. So maybe that was just a very personal feeling of weakness. So okay, now we're pivoting out of the Google lightning round questions into my own bespoke lightning round questions. What's your favorite kind of cake to bake? Oh!
Well, so I really make these quite elaborate cakes for my children, because I want them to be able to grow up and say, wow, I remember you making great cakes for us, mostly. So recently, one of my kids plays Minecraft a lot, and there is a character called a slime, which is a sort of jelly blob that kind of jumps on top of you and kills you if you don't fight it off. And so I made a slime cake with a cake embedded in the jelly. So, big idea one here: what do you think you understand about the Internet that most people don't understand? Oh, I like this one. I think most people don't understand how much it changes every day. And you know, we have this astonishing stat that even I didn't believe when I heard it, which is that fifteen percent of the queries Google sees every day we have never seen before. And that happens every day. There's fifteen percent that are just completely new. And the same happens on the internet side. Every day we index a ton of new content we've never seen before, about ideas that are completely new to humanity at that time, right. And you know, we have to be able to continually understand that and keep up. And I think that people sort of have this idea that there's a fixed amount of information out there, but actually human beings are astonishingly productive and are constantly coming up with new ideas. If everything goes well, what problem will you be trying to solve in five years? I will still be working on making Google Search better for all our users. I think we will be working on this for the next hundred years. Is there a narrower answer, like, this particular problem you're working on now of integrating image and words? Basically, like, you think, obviously it won't be completely solved, but you think that'll basically work? And if so, is there a next thing? I think the problem of video will continue to be hard, because there's just such a large amount of information in a given video.
The other problem that I'm really interested in is helping people parse information with helpful context. So, like, you know, we've unleashed all of the world's information on people. How do you actually help them sift through that and make good decisions, whether it's choosing a reliable merchant to buy from or finding reliable medical information? How do you help people make those decisions for themselves and be literate with their information choices? What's one piece of advice you'd give to someone trying to solve a hard problem? I would say, find a really great group of people to solve it with you, because generally, trying to solve hard things by yourself ends up being an exercise in frustration. Cathy Edwards is vice president and GM of Search at Google. Today's show was edited by Robert Smith, produced by Edith Russelo, and engineered by Amanda K Waugh. I'm Jacob Goldstein, and we'll be back next week with another episode of What's Your Problem.
