Emad Mostaque, the controversial founder and CEO of Stability AI, makes the case to Azeem Azhar that open source generative AI is the best path to reducing global inequality.
Within five years, open source AI will have raised the GDP in the world's poorest countries. That is a premise for today's conversation. I'm Azeem Azhar. Welcome to the Exponentially podcast. Recent reports from the banks and consultancies have suggested that new breakthroughs in AI, particularly generative AI, could add trillions of dollars to global GDP. But the most advanced of these models are built in the West, on expensive supercomputers, and trained using English-language data sets. So how can the Global South, with its young populations, tighter finances and shakier infrastructure, share in this productivity boom? Today's guest is Emad Mostaque, a Jordanian-born Bangladeshi immigrant to Britain who is the founder and CEO of Stability AI, a firm that has accelerated to a billion-dollar valuation in less than three years. Emad has been vocal about the potential that open source AI offers to the poorest in the world, but there have been serious criticisms leveled against him, most prominently in a recent story by Forbes magazine. These include questions about taking credit for the company's technology, about some of the partnerships the firm is meant to have, and also governance practices at Stability AI, where Mostaque remains CEO. Emad has publicly rejected these as gross mischaracterizations, and a debate over the story and others critical of Stability AI continues to play out on social media and elsewhere. But as long as investors who have poured hundreds of millions of dollars into Stability AI continue to stick by Emad, and as long as the firm continues to innovate and bring out new AI models, he will remain a powerful force in the expanding AI universe. What is generative AI, and how is it different from all of the AI systems that we saw previously?
Generative AI is a new type of AI that started in twenty seventeen. There was a seminal paper called Attention Is All You Need, because not all data is the same: you pay attention to what's important. Classical AI was built on this concept of big data. So you had all this data at Facebook and Google and they used it to sell you coconut shampoo, but it couldn't go outside its boundaries. It's a very logical, the-future-is-like-the-past, stable kind of environment. This new type of AI said: pay attention to the important parts of the data to compress it. So people listening to this will see they might take away a few points; they're not going to remember our entire conversation. That's what the human mind does. You've got the very logical part that can memorize stuff, and you've got the part that builds principles, stories, frameworks for understanding.
And this paper, Attention Is All You Need, in a sense analogized that for a computer system?
Exactly. It was the first one that said, this is how you show it at scale, and let's simplify it down to a problem of better data and bigger computers. So using gigantic supercomputers, you can take these big data sets of text, images and others and compress them down to a file of just a few gigabytes that learns principles, not facts.
And so this was the missing piece in AI, and that's why using these systems feels actually quite surprisingly human.
Surprisingly human. But how do we get to the generative part of all of that?
The generative part is that you put a prompt in or some words in, and then it gives you something back. It generates the outputs and the outputs are not always even the same because it has principles as a base.
So in the same sense that if you and I meet one day, the way the conversation plays out could be quite different to the next time we meet, because we have principles of socialization and of behavior and of how well we know each other, and those get applied in real time at that moment each time we shake hands.
In real time, it's a file with just a bunch of what are called neural net weights.
Words go in and get shaken out, like in a pinball machine, and then the output comes. The input can be a painting of a cup in the style of Vermeer, and then it understands the nature of painting, cup, Vermeer. But cup has so many different meanings: it can mean this cup, or cup your hands, or cup your ears, or a World Cup, and it understands those things in place because it's been trained on images and text. Similarly, a lot of these language models have been trained on sentences, so they look at the context of the sentence and they say, what's coming next? You know, like that game of improvisation where you start with a sentence and then you provide something in the next one, and on we go.
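To make that "what's coming next" idea concrete, here is a minimal sketch of next-token prediction, assuming the Hugging Face transformers library and the small open GPT-2 model as an illustrative stand-in for the larger models discussed here.

```python
# A minimal sketch of next-token prediction: the model looks at the context
# and repeatedly predicts the most plausible continuation, one token at a time.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

result = generator("A painting of a cup in the style of", max_new_tokens=15)
print(result[0]["generated_text"])
```

There is no lookup table of answers in the model file; the continuation is generated fresh each time from the learned weights.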
And it's moving so, so quickly. Last summer I was away in Tanzania on a safari and there's really no mobile signal. We were away for about two and a half weeks, and when I came back, the Internet was full of Stable Diffusion, Stable Diffusion, Stable Diffusion, Stable Diffusion. And that is one of your generative products. What it does is take text and produce images. So I can say I want an image of a badger playing football on a bicycle, and Stable Diffusion will produce that image. It could also produce more useful, commercially interesting images. Recently, you've brought out a new generative product, which is StableLM, which looks a lot like these text models that we've seen running around to date.
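Before moving on to the language models, here is what calling Stable Diffusion looks like for a developer: a minimal sketch, assuming the Hugging Face diffusers library, the publicly released stabilityai/stable-diffusion-2-1 weights, and a CUDA GPU.

```python
# A minimal sketch of text-to-image generation with an open Stable Diffusion
# checkpoint: text goes in, an image comes out.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
).to("cuda")

image = pipe("a badger playing football on a bicycle").images[0]
image.save("badger.png")
```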
StableLM is a language model suite, and what we do is a bit different from a lot of the other companies, in that a lot of the focus and the breakthroughs came because these research labs, the OpenAIs and Anthropics of the world, the DeepMinds, have this focus on AGI, artificial general intelligence.
Can you build an AI that can do anything?
Yes?
It turns out maybe you can.
So, making a machine that's in our own image, in a sense; that's what AGI sounds like?
More than that, it's a general intelligence. So it's the kid that made you look bad at school because he was good at everything, you know, the top performer. GPT-4 now can pass the bar exam, the medical licensing exam, the GRE. It's probably going to Stanford, you know, next fall. But then our take was: that's great, you can have these amazing giant general models. What will be better is to mimic humanity, where our companies are not one generalist doing everything. What if we made it so you could bring your own data to the models and have lots of specialists working together, and have those models working for you, that you own?
Right. What if you had open data, open models, and allowed them to be customized and specialized, so rather than relying on one model to do everything, you instead optimize them.
Right.
So, but the idea is still the same. The idea is still constructing a model that is generative. I can give it some text and it can produce something that is commercially useful or emotionally useful: working software code, or a working invitation to a meeting, or, you know, things that will save us time and help us perhaps be a bit more creative.
So again, they're like a talented graduate, and they can learn very quickly from a few examples. They have this base of generalized knowledge: they've been through kindergarten, high school and university, but they're not specialized yet. You can train them yourself, or you can just show them some examples and they learn very quickly, unlike classical AI models that you had to train on the whole data set.
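That "just show them some examples" step is often nothing more than putting the examples into the prompt. A minimal sketch of this in-context learning, reusing the same illustrative transformers pipeline as earlier; the reviews and labels are made up, and a larger instruction-tuned model would follow the pattern far more reliably than GPT-2.

```python
# A minimal sketch of in-context learning: no weights are retrained, the
# generalist model picks up the pattern from the examples and continues it.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = (
    "Classify the sentiment of each review.\n"
    "Review: The food was wonderful. Sentiment: positive\n"
    "Review: Delivery took three hours. Sentiment: negative\n"
    "Review: Staff were friendly and helpful. Sentiment:"
)

print(generator(prompt, max_new_tokens=2)[0]["generated_text"])
```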
They weren't good at adaptation, like the badger riding a bike.
That's not really a normal thing, you know. And so that has been the province of, well, just us, right? We were able to put these concepts together, whereas a computer could never merge concepts. Now we've got that missing link of being able to take concepts, merge them together, and understand some of these hidden meanings.
I'm curious about what you think the economic impact of all of this will be. I mean, there have been any number of papers coming out from the investment banks and the research houses and economists in the last few months. I think Goldman Sachs had a report that suggested that, in one of their scenarios, they could see fast implementation of generative AI across the world leading to a seven percent increase in global GDP in about ten years. What's your sense of what it could do economically?
I think it's the biggest thing since the Gutenberg Press, maybe even fire.
The Gutenberg Press six hundred years ago, or fire two million years ago.
I think that humans are driven by stories. It's what allowed us to form tribes, and then money and things like that are stories. The press allowed us to write down the stories, but it's very lossy. Again, you, me, everyone listening, you're looking through your things right now. We do PowerPoints, we write things, but it doesn't capture the richness of humanity. Our organizations are built on layers of text, which is painful, and that's why it turns us into cogs in the machine, shall we say.
I mean, there's something quite powerful about the models, I think, that you are getting at here, which is that you can feed them text and, through the machinations of the billions of different switches and cogs (you guys call them parameters) in the system, they start to find those underlying relationships that we know, probably deeply in our brains, but don't express. What we do is express words, one word at a time, and they look at all these words and they're able to find some representations of reality that humans actually use but can't touch and describe.
I think there's that part of it. Another part of it, again, is just being able to classify information; AI is about information classification. When you're writing, the hardest thing is to write something terse and compressed. It's a bit easier to write big, but it's still difficult. The easiest thing for us to do is talk.
Now there's the old adage, you know: I couldn't send you a short letter, so I've sent you a long one.
Right now, anyone anywhere can create any image, and soon any PowerPoint slide, almost instantly, that looks beautiful. So the fact that it understands concepts is a big deal, because the barriers to information flow are reduced, so information can flow better around our organizations and systems.
As you eliminate barriers to information flow, you're taking friction out of systems. You're taking friction out of daily life, you're taking friction out of business processes, you're taking friction out of the economy, and so we would hope to see improvements in productivity and, with that, improvements in prosperity.
All of finance can be broken down into two things, securitization and leverage. Securitization is a representation of an asset of some sort. It's money, the trust of the American government; it is a bond, it is a property deed, something like that.
But you can only have so much information on that.
You and I, we have our credit score based on the information of who we are, what we do, our functional identities.
Most of the world is invisible.
The Global South. This is people who are underbanked, people who perhaps don't have formal IDs, and so on.
You need identity and you need information to allow for banking. You need that for finance, and our financial systems are quite slow. As you get increases in information flow, you get increases in prosperity, because you can direct assets to where they're needed. You can direct resources to where they're needed. It's like I always tell people in the team about roadmaps: are they resource constrained or story constrained? Because if it's a good idea, as a leader, I hope I will find you the resources. But you have to convince me first.
I can imagine these models in rich, advanced economies where there's a huge service sector. There are lots of people who sit behind computers typing away, creating spreadsheets and PowerPoint slides. You can imagine these models helping economies like that. But how can we see them helping the Global South, or poorer, less advanced economies?
One of the reasons these models have got everywhere is they've become good enough, fast enough and cheap enough. Stuff that used to cost dollars, tens of dollars, hundreds of dollars, thousands of dollars, you can now do with a few simple prompts. I think it will remove a lot of the basic tasks and make people more productive, as opposed to leading to mass unemployment.
And we're not seeing demand for coders drop off, right? Coders can still get work pretty quickly as they need it.
Because you will write better code, and again, it's a productivity increase. Smartphones can take amazing pictures, but there are more employed photographers in the world now than ever. You know, again, we adapt, we improve, we use the technology. However, the Global South is a very interesting phenomenon. We had mobile phones; you remember they used to be for the rich only, these big, big things.
And now in the Global South there are mobile phones everywhere.
There, they leapt over the PC to mobile. Yeah, they leapt over to instant payments, whereas we took a while to catch up; in certain Western countries, you still haven't caught up to instant payments. I think what will happen is these models become good enough, fast enough and cheap enough, just within the next few years, that they will leap forward to intelligence augmentation.
To the benefit of these emerging markets. Yes. So let's think about where we've got to. We've got this very powerful technology that we've characterized; it's extremely helpful in many, many different ways. But it is the case that these systems are extremely expensive to build, to train, as it's called, and they require lots and lots of data. The British government has allocated more than a billion dollars to build a supercomputer just to train these models. The rumors are that the GPT-4 model from OpenAI cost hundreds of millions of dollars to train. But they're also trained on pretty much everything that you can find on the internet, a large part of which will be Western, American, English-language, with a strong cultural bias. So it sounds like not only can poorer countries not afford this, but even if they could, the technologies wouldn't necessarily be suitable for the economic requirements or the cultural requirements of Tanzania or Bangladesh.
Yeah, I think this is a real problem. I think the quality of data we're feeding to these incredible models is poor; it's scraped from the whole Internet. We need better data. We need that as infrastructure. There is a monetary equation if we need giant supercomputers, but more of it is a question of talent and expertise. It's complicated to build these things. This is one of the reasons, again, we had Stability do an open version and build these data sets for each country on an open basis.
So what do you actually mean by an open model, and how does that solve the computational problem?
We got the giant supercomputers, and then we, Stability, built the models, yes, and then we made them available and released them open source, so people could take these models as a base and then extend them.
But I'm familiar with open source in software, whereas with closed source, if you're getting Microsoft Word, you buy it from Microsoft and you can't inspect the source code, the instructions that make it run. You just run it, so you are fundamentally a consumer of it. But with an open source project like LibreOffice, which is an open source office product, you just download the code. You can look at the code, you can inspect the code, you can modify the code, and you can tailor it to your own requirements. So that's open source in software. What's open source in a model?
So in a model, you can inspect the code, you can inspect the data sets, and the model weights themselves are freely available, as a freshly trained graduate, as it were. And by releasing this openly so you could take it and adapt it, it's a massive development boom where you start seeing it everywhere.
Help me understand the mechanics of all this, because I think it's important. Stability has its own machine learning AI supercomputer. So you run up the cost of training these models for the first time. Yeah, you then release them as models, data and weights which any developer can take and use. And when the developer runs them, they run them on their own computing hardware, and then they're paying for that.
Yes, in some sense, they don't pay for it.
But if you want enterprise support, then you work with us and our partners, or if you want customized versions, because it's our view that every enterprise will want their own version with their own data sets underlying it. Every country will want their own version, because this is the next generation of infrastructure. The actual comparison is 5G. This is how 5G technology works, right? Yes, this is 5G for creativity, as it were; it's 5G for information flow. And around a trillion dollars has been spent on 5G.
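For developers, that take-the-open-weights-and-run-them-yourself workflow looks roughly like the sketch below, assuming the Hugging Face transformers library; the StableLM repository name is quoted from memory and may differ.

```python
# A minimal sketch of pulling openly released weights and running them on your
# own hardware. The model ID is an assumed example of an open StableLM release.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "stabilityai/stablelm-base-alpha-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Once the weights are downloaded, inference runs entirely on your own machine;
# you pay for your own compute, not for access to the model.
inputs = tokenizer("Open models let every country", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```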
Right, so we can spend a lot more on AI systems across our economies. If we come back to your open source models, it strikes me that one of the things that you can do with them is make them very culturally relevant. And I think back to this idea that Western values get exported to every country regardless. Back in the day, when you registered for Facebook, it would ask you what your marital status was, and it was sort of single, divorced, married or it's complicated.
Yes.
And my mum, who was in her late seventies at the time, is registering for Facebook and is on the phone to me going, what does it's complicated mean?
Because it just didn't exist within her sort of mental space. And it seems like, given how important AI is going to be as infrastructure (I mean, it is going to be the layer between me and the services that I access as a consumer, as a citizen or as an employee), it's a really critical gatekeeper. So is that part of your vision? An Indian version of ChatGPT, a Brazilian version of ChatGPT, an Indonesian version of ChatGPT?
Yes.
My vision is that every person, company, country, culture has their own models that they themselves build and have the data sets for, because this is vital infrastructure to represent themselves, to extend their abilities.
How much of what you're saying is theory? Do you actually have national models being built across the Global South?
A lot of this stuff is still in the research phase and now is only just entering the engineering phase, and so there's a lot we still don't know about these models. But they're good enough, fast enough, and cheap enough to do it.
Okay, even if the models are cheaper, you still have to get the relevant data, because, you know, the Internet doesn't necessarily have lots of information about Pakistani culture, in Pakistani broadcasts, in Pakistani media.
Exactly. And so what you need to have is Pakistani newspapers, Pakistani broadcasts, and then have Pakistanis come together to build better data sets that teach these models. We know the technology now, but we lack the data, and so that is the key blocking point.
To get access to the data that is needed?
No, to enable it, so that as these data sets are built, the models can then come from there, and then people can build on those models for their own people.
It almost sounds philanthropic, right? So what is your model for making money from these open source models that you are effectively giving away after all your hard work?
The whole of the infrastructure here in London, in the West, is all based on open source. And the model for open source is that you have an open version that anyone can use, that they can start experimenting with, and then there are variant enterprise versions that you provide full support around, other services, and integration, and you make money that way with these models. We have our open models based on open data, and we have open models based on licensed data from our partners. Because when you talk to regulated industries and others, they want models they own. They don't want to send their data away to other people, and they want to know every single piece of data in there, and they want it to be the best data, you know.
Essentially, your average young developer or a small startup can download your models and use them, but if there's an issue, there's not a lot of support. But if you're a big company or a national government, you might enter into a more detailed contract where there is support and advice, and potentially even data, yes, coming through. We know that AI is a very powerful technology, and Stability has taken a different path to other firms by going with an open source approach, which could democratize the technology, making it culturally, linguistically, locally relevant for any nation, any business, any region, any individual. But you're up against firms like Microsoft and OpenAI and Google and DeepMind and others. How do you plan to compete?
I think there's a caveat here on addressable market. Our addressable market is all the private data in the world: data you can't send to anyone, whether your personal data or enterprise data, financial regulated data. And so our models will go in and transform that into knowledge, and we'll have a hybrid AI, I think: you've got our models on your private data, and we standardize all of that, make it very predictable, loads of support, and then you use these proprietary systems for the best outcomes. You'll have your own graduates that you hire, and you'll hire from McKinsey.
You'll put them together.
You'll put them together.
But I'm curious, because other companies are taking a closed source approach to proprietary data. There are companies like Cohere and Anthropic who will build a powerful generative AI model just on a company's own private data. They're competing with you as well, right? So why is your approach better than that?
They will not give that company ownership of that model. They will not share the detail of every single piece of data that's in that model.
But that's the case today with lots of the technology that we use. You know, when I'm running my e-commerce application on the cloud, I don't know the details of every configuration of the servers that I'm renting from Amazon or Microsoft Azure, so businesses are used to that.
One hundred percent, and you have data that you can share with people, but there is a core of regulated data and other things, HIPAA-compliant data, medical data, that you cannot send to other companies, and you have to build your own systems inside regulated environments. The feedback we've got from regulated entities, and again from policymakers and others, is that open, transparent models, even if built on licensed data, are something that we would like a lot, and we would like to own this technology if it's going to be vital infrastructure.
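In practice, building your own system often means taking an open base model and adapting it to in-house data without that data ever leaving the building. A minimal sketch, assuming the Hugging Face transformers and peft libraries; the model name, target module names and hyperparameters are illustrative placeholders rather than anything Stability prescribes.

```python
# A minimal sketch of adapting an open base model to private, regulated data
# inside your own environment. Model ID and settings are assumed examples.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("stabilityai/stablelm-base-alpha-7b")

# LoRA adds a small set of trainable weights on top of the frozen base model,
# so adapting it is cheap compared with training from scratch.
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query_key_value"],  # attention projection names vary by architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()

# From here you would train on your own private corpus (e.g. with
# transformers.Trainer), keeping both the data and the adapter weights in-house.
```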
Right, we're in London now, and there's a deep bench of AI skills. How do we expand this and democratize it out to countries where there's just less talent in these breakthrough areas?
I think there's talent, it just hasn't been accessed.
And these models are very interesting and you can use AI to help you develop applications of the AI.
Quite a funny kind of recursion there.
Right, So you effectively will start to support places where perhaps the workforce doesn't have the depth of San Francisco's AI talent with the tools themselves.
Yes. Because the AI models are pre-computed, the actual running of the AI is very computationally non-intensive. The creation of the AI is ridiculously intensive, right? So you've got all the energy at the start rather than the end.
So that takes us back to what Stability will do. Stability will take the cost upfront, and then it'll find rich companies, rich nations, rich clients to tailor the models, which allows you to continue to make the base foundation model, which can then be given as open source to anyone else?
Yes. It stimulates demand, and then as people scale up and they need support, they come to us and our partners. For any customization, they come to us and our partners, right? And it's open models for private data, and then other models for data that is either semi-private or that you don't mind sharing, and you combine the two so you have models of both types.
There are, of course, concerns with the safety of AI systems, and one argument is that with closed models that are controlled, there's a lot more safety, because if I'm accessing it over the web, the organization that's running the AI system can stop me if I'm trying to do something bad with it. With open source models, of course, they're just available for anyone to download, so the cat is literally out of the bag many, many times over, millions of times over. Is your approach less safe than the closed approach?
I think it's more safe. There's a reason that our infrastructure is based on open source databases and servers and others: because it can be checked, it can be tested, and it can be fully audited and battle tested. Our approach at Stability is to create the standard around this, so there aren't thousands of different models. There is an entity, in a partnership and an ecosystem, that standardizes around principles like safety, watermarking and other things, so it becomes predictable.
But you've put the models out for any bad actor, any hacker, any annoyed employee to build something difficult with.
We weren't the ones that came up with open models. We're standardizing it. We're supporting open innovation for detection and prevention as well as creation.
But it does sound like bad actors will end up having a little bit of a field day, which creates, I suspect, an enormous opportunity for an AI-driven security and resilience industry.
The reality is that we're stronger together when things are open, and open is required for all the private, regulated and other data out there. If you don't have open systems, then you will only have proprietary entities, and they become the choke points on the Internet, and that's far more dangerous than the other side. Open is there anyway. But like I said, let's standardize it, let's make it safer, and let's work together to combat the bad, as opposed to leaving it with a few unelected giant companies.
Part of the story of technology is that technology has been exported from one place, really, for everyone else to use. I think one exciting opportunity now is the idea that the people on whom this technology is going to operate could potentially build their own. Now, many of those people are going to be in the Global South, and the premise of our conversation is that within five years, safe and open source generative AI could make a meaningful contribution to increasing the GDP of the world's poorest nations. How likely is it that this vision could become reality?
I think it's incredibly likely. The desire, talent and passion to adopt technology like this is huge within the Global South, and it is where it can have the most impact, the highest ROI. So I think they'll take the building blocks that we and others provide and they'll build some amazing things to activate their potential.
Emad, it's a great vision. Thank you so much.
My pleasure. Thank you for having me.
Reflecting on my conversation with Emad, I'm reminded that much of the software that powers the Internet today, used by billions of us, is actually open source. It's proven to be resilient, stable and, importantly, affordable. The open approach is one reason why the Internet is today ubiquitous. So why wouldn't that be true for generative AI? And if the technology can live up to its promise of improving productivity, wouldn't the open approach make it more widely accessible to the poorest countries in the world? That seems to make sense to me.
Thanks for listening to the Exponentially podcast. If you enjoy the show, please leave a review or rating; it really does help others find us. The podcast is presented by me, Azeem Azhar. The sound designer is Will Horrocks. The research was led by Chloe Ippah, and music composed by Emily Green and John Zarcone. The show is produced by Frederick Cassella, Maria Garrilov and me, Azeem Azhar. Special thanks to Sage Bauman, Jeff Grocott and Magnus Henrikson. The executive producers are Andrew Barden, Adam Kamiski and Kyle Kramer. David Rivella is the managing editor. Exponentially was created by Frederick Cassella and is an e to the pi i plus one Limited production, in association with Bloomberg LP.