Andrew Mason is the founder and CEO of Descript. Descript's software has made editing audio and video much simpler.
The company recently received a large investment from OpenAI, the company behind ChatGPT. It's a sign that Descript is moving toward using generative AI to generate words and pictures. What will that mean for the people who currently do that work?
Pushkin.
After every interview we do for the show, we upload the audio to a piece of software called Descript. Descript turns the audio into a transcript, and then I can edit that transcript, cut out the boring parts, move sections around, and when I do that, Descript edits the underlying audio to match. As software, Descript is pretty janky, buggy. It's constantly changing in ways that can make it hard to use, and sometimes it just blows stuff up. But we use it anyway because Descript is an incredible advance over what came before.
Before Descript, audio software represented audio files not as words that you can read and edit, but as waveforms, as squiggly lines presented on a timeline. So when Descript came along, being able to edit audio by editing words on a screen was this huge advance, and it was an advance made possible by artificial intelligence. Eventually, Descript expanded to allow people to edit not just audio but also video, and last fall, OpenAI, the company that makes ChatGPT, led a fifty million dollar investment round in Descript.
It's a sign that Descript is moving out to the new AI frontier, the frontier of generative AI, AI that creates words and pictures. This is of immediate interest to me, as in: is AI gonna help me do my job? Is AI gonna do my job? But there is also a bigger question here: what is AI going to mean more broadly for people whose jobs involve writing things and creating visuals? Which is to say, what is AI going to mean for almost all white-collar workers?
I'm Jacob Goldstein, and this is What's Your Problem, a show about people trying to make technological progress. My guest today is Andrew Mason, founder and CEO of Descript, or maybe it's Descript. By the way, I've always said Descript, and I'm pretty sure that's wrong, right? It's Descript, like Detour.
We're noncommittal on the issue.
Let's do the subjective version.
You're just one man. How do you say the name of your company?
Yeah, I've kind of cultivated the ability to flip between them as I speak.
You're killing me.
The world still needs a little mystery.
Okay, how about this? Say your name and your job.
My name is Andrew Mason. I work at Descript. That's dscript, dscript.
Well played.
Earlier in his career, Andrew Mason was the co-founder of Groupon. He took the company public and then got fired after its stock fell by something like seventy-five percent. After that, he started a company called Detour, or maybe it's... I don't know. The company made these highly produced audio walking tours that you could listen to on your phone. In that job, Andrew saw the challenges of working with the old waveform-based audio editing software.
At the same time, AI generated transcripts were getting better and cheaper, and new technology was making it possible to automatically match a transcript to an audio file. Andrew looked at those two developments and thought, we should make an audio editor that works like a word processor, which he admits was a distraction from what he was supposed to be doing, which was making walking tours.
If I'm being honest, it was a bit of an indulgence. It just felt like an incredibly cool problem to work on. I went to school for music technology and worked in a recording studio after I graduated, and just always loved tools, and audio visual tools in particular. It was just so fun to start thinking about this puzzle.
Uh huh.
So we told ourselves it was kind of a way of diversifying, but that's just, like, a ridiculous way for a startup that hasn't even found product-market fit in their core product to be thinking about the world. You know, all of the advice textbooks will tell you not to do that, and it's probably generally good advice, but it was just irresistible, you know.
So I am a fan of Descript. I started using it around when it came out several years ago. Certainly I think it's great. It is kind of janky, and it's always kind of janky, right? And my guess is... janky meaning, like, a little bit unstable, things don't quite work. It's always telling you to restart.
By the way, I'm not sure if I have your side, so I may ask you to send me this entire portion of the interview so I can share it with the team.
So the thing is, like, I wonder, why is it always kind of janky? Why is it never just, like, stable and it works? And my guess is it's because you're pushing forward really fast, right? You're trying to make it better and better and better, and presumably there is some trade-off, right? Like, the faster you try and push it forward, the more janky it's gonna be. You could, I'm sure, just perfect the way it was four years ago, but then it would never get better, but it would be stable, right? And so this is, like, a big whatever startup-founder-type question: is that some balance you're always trying to figure out, how fast do we iterate versus how much do we try and make it just stable and work?
Yeah, that's an astute observation, not the fact that it's janky. That doesn't take a genius.
Respectfully, respectfully, as a fan, I'm telling you it's...
But I think, like, your attempt to make sense out of it... I think a good story to tell here is maybe going back to the very beginning of Descript. So when we became Descript, we sold off Detour to Bose and we decided to just focus on building out this media word processor thing. And some of the public radio producers who had worked at Detour went on back into public radio, and they became some of the earliest customers of Descript. And what we found was that they pushed it so much farther than we were ready for.
Ah, so quickly, what do you mean by that? Like, what is an example of that?
Yeah, I mean, specifically in the case of some of these shows, it means putting together three to five hour cuts of tape from many different files, with tons and tons of edits and notes mixed into the edits, and just, like, stuff that we hadn't pressure tested from a performance...
Just giant. The files are really big, right? Like, a three hour audio file is actually a giant file, right? And if you're stacking up a bunch of those, so you have all these giant files and you're making tons of cuts, that's just, like, computationally intensive, that kind of thing, storage intensive.
Yeah, it was just something that we hadn't optimized for. It's an eminently solvable problem, but it was something that in the earliest versions we hadn't done. And so that has kind of in many ways been the story of Descript up to this point, where there's been that element of it, and there were kind of realities of needing to make quick progress that we had to balance against stability. And what we had for our customers, in terms of the core product idea of being able to edit by text, was still for them so much better than the alternative that there was just a tolerance of the stability issues that, honestly, made us sick to our stomachs that we had to put people through. And it's not like we... but it was like we had to make trade-offs there. So all of this pushing kind of culminated with this release of a pretty major overhaul that we did at the end of last year. And since then, since last November, and really through the first half of this year, is when we think we start to get to a good place. Our goal is that if we're having this conversation, like, we're not going to be having the same conversation in, say, July, for sure at the very latest. Like, the conversation we'll be having with someone like you will be, "Wow, it's gotten... it's, like, not an issue anymore."
So you say all that, but also, you just got this big investment from OpenAI. You got a thing on Descript that says "sign up to try GPT-4 with Descript," which I just signed up for and I'm very curious about. That doesn't sound like, "Oh, we've arrived, and now we've got our product and we've just got to hone it." That sounds like there's this whole giant new universe of things you're about to try and figure out.
That's true. And that's the funny thing about all of this, is that at the same time that we're turning to focus on quality, it's a moment where generative AI has arrived at a scale and with a force that no one really saw coming this quickly.
So, okay, I know from the beginning Descript was built on top of AI, you know, the technology for transcription, for matching audio to text. But was Descript itself an AI company?
So we had some really smart people on the team with machine learning experience, but I wouldn't say in the early days we were, like, a company with anybody that was doing original AI research or anything like that. We saw that as a gap that we wanted to solve. And so, I forget exactly what year it was, it was maybe about four years ago, we saw this company called Lyrebird. It was a company out of Y Combinator with some really smart PhD candidates. They had built a model that would build a clone of your voice based on, I think, about three minutes or five minutes of training data, of just talking to it.
Let me just say, I know Lyrebird is spelled L-Y-R-E, but I assume they're aware of the homonym. Right? This is a thing that is cloning your voice so that you can make it sound like you're talking even if you're not talking. And the company is called Lyrebird, and this is a somewhat fraught thing, right? Like, I feel like they're throwing it in my face that this is a sketchy product that they're developing.
Did it cross your mind?
Did it cross my mind? As in, like, the ethical quandary that we were getting into, or, like, the branding implications of the name?
More the ethical quandary.
Yeah, the ethical quandary absolutely entered our mind. And our point of view on that, and it has been our point of view on these things in general, has been that we don't want to be out there paving the way for any new paths to the apocalypse, so to speak. We have always felt, like, not really sure how society was going to put the brakes on this sort of thing. We just knew that we didn't want to be part of it, and we tried to put guardrails in place on our product that would make it easy to stay off the slippery slope. So in the case of Lyrebird, once we bought them, we integrated their technology and released it as something that we call Overdub. It's a way that you can clone your voice. We require you to authenticate that it's actually you, and we only let you clone your own voice, and that's worked really well. We're now in a world where there's other people that have similar models and they're not putting those protections in place. And the use case that we've always been the most excited about is making it possible to edit your natural recordings, so going in and changing an individual word, and we've built some special stuff that will kind of listen to the audio on either side and make sure that it blends in from an intonation perspective. We started with the ability to delete stuff and move stuff around. Now you can just type and really make it feel like it's a word processor.
Presumably the better you get, the better the technology you use to clone a voice gets, the more words it can do, right? I mean, every week, for What's Your Problem, I write a little introduction and then I read it. But presumably at some point Overdub will be good enough that no one will know whether it's me reading it or I'm just typing it, right?
We have a new version of Overdub that will release in the next couple of months, and it's the first time that I've heard my own voice doing a narration of something that made me say, like, this sounds so much like me, in a way that it's not distracting or the AI does not get in the way.
Can I try that new version now, like, not this minute, but like for the show?
Yeah, for the show.
I bet we could find a way to do it. It's just so you could hear it and stuff.
There's a universe where I say, at this moment in the show, guess what? Today, that voice, me reading the intro at the top of the show, that was Overdubbed. It wasn't really me.
Yeah, we tried Overdub for the voice doing the intro at the top of the show, and we decided it wasn't quite good enough, but we decided it would work for this part of the show. What you're hearing right now, it's not really me. It's Overdubbed.
In a minute, what Overdub and ChatGPT and generative AI will mean for Descript and for the world, and also for me.
Now back to the show. Descript is expanding from podcasts to video, and it just took a big investment from OpenAI, the company that makes ChatGPT, and also this system called DALL-E that uses AI to generate images. So Descript is clearly pointing toward a future where it's going to be software for creating AI-generated, or at least AI-enhanced, audio and video.
And I asked Andrew, what does that future look like? How is generative AI going to work in Descript?
I don't think we know entirely yet. In a lot of ways, it feels to me like you're letting this alien into your app. You're just giving it the keys, and then the interface is: how do you find a way to kind of give the alien some buttons in your UI, give them the ability to press the buttons, and then how do you talk to the alien?
What do you mean? Like, that is a striking metaphor, a little scary, right? It suggests a certain level of uncertainty and potential downside. It's not like, "Oh, this is great, this is going to solve a problem." Like, why do you say it's like letting an alien in, as opposed to letting a human in? It's a really interesting choice of words. Tell me more about it.
So let's start by just saying, like, very specifically, what I mean. I think, when implemented well, what this will feel like is as if you had a co-editor in a document with you, in our case, in a video or a podcast that you're working on, that is smart, knows how to do everything, definitely knows how to do the tedious busy work, and that you can kind of guide or direct through giving these tasks. You know, it's almost like it's the production assistant or something like that, and you're the director, and you're able to just guide it and give it feedback on how it's doing and what it's doing well and what it's not doing well.
There's a version of it where it's like, we've gotten used to the graphical user interface, right? We've been trained since the Macintosh computer in the mid nineteen eighties that the way you interact with a computer is, like, there's little pictures and little folders and you point and click, one way or another, right? And one possibility here is the new standard interface is chat. You just type in, like, whatever, "please trim all the ums from this file," or even "please turn this thirty minute interview into a twenty minute interview, in the way that makes it most interesting," right? And you just type that in and it happens. I mean, that's a version of what I hear you saying there.
I think some people believe that chat or a text field will become the primary interface for making things. I think of it more as, like, it's the primary interface for interacting with the alien, and then you and the alien are still going to be working, like, have other buttons that they can press. Sometimes you still just want to take the thing in your hands and do it yourself.
The alien metaphor, I mean, there's a real, like, "do we welcome our alien overlords?" question. When you choose that metaphor, it makes me...
I mean, maybe it feels that way.
It doesn't... it doesn't make me feel better.
I'll say that I think it feels the way that an alien arrival would probably feel, where, you know, maybe you shake its hand and immediately it has something in its skin that cures your cancer, and you feel hopeful, but you also want to know what they're up to.
And yeah, and "cure your cancer" is definitely the happy version. Not usually, in the alien movie, what happens, but I guess that could happen.
Well, there's a good part, right? But you never really know, I think, is the point. And I think we're all living in this, kind of, like, pushing forward in this mystery, kind of stuck between awe and terror.
You sound more ambivalent than I might have thought. Why is that? Because you just took a giant investment from OpenAI.
I think, like, at moments like this, you have a choice between either renunciation and just, like, stopping, out of a place of fear. Which maybe that's right, you know. Maybe fulfillment and happiness, everything we have for that is already here, and we should focus our energies on making peace with our inevitable death.
In any case, we should do that. But go on.
The other way to think of it is to just forge ahead and realize that the potential of what's on the other end of this might make us feel, in retrospect, like we were just in the earliest possible innings of the human experiment. So, you know, I feel like we're all going to die one way or another, might as well forge ahead. It's not ambivalence, but it's more just being clear-eyed about the fact that... not trying to pretend that there's parts of it that don't seem scary.
I mean, one of the things that's really striking to me with AI, and that seems quite different from other technologies in the past, is the people who are working on it, the people who really understand it, seem more scared than everybody else.
I'm not a first-time founder. I went through the experience of being a young person building Groupon, telling myself a story about how it was going to revolutionize local commerce and all the good stuff, and it just didn't turn out that way. And I think we've seen a generation of tech companies that just, like, didn't turn out the way that the super rose-colored-glasses mission statement would have suggested. And I think we just have that experience, that recent experience, top of mind, and are trying to think about it in a way that has guardrails around repeating that history, and just make sure we're really proud of what we build. Does that make sense?
It makes sense.
Am I Am I going to regret saying all this?
I don't think so. You haven't said anything, like, incriminating, as far as I can tell. You know, I heard somebody saying the other day, like, it's an interesting question to ask somebody: what was the first thing you asked ChatGPT to do? And the first thing I asked ChatGPT to do was write an episode of Planet Money, a podcast I used to host, of which there are, you know, a thousand transcripts on the internet. Write an episode of Planet Money about whether the Fed is going to raise interest rates by twenty-five basis points or leave them unchanged, right? And it wrote something that was pretty good. Like, not a whole show, it's not there now, but at the rate of current improvement, you could definitely imagine it writing that episode pretty well in, whatever, a year or two years, or some amount of time when I will still want to be gainfully employed. And, like, I do wonder on this one, is there a day, slash how far are we from the day, when generative AI can just make a podcast without me?
How does that make you feel?
I mean somewhat afraid, also like interested in figuring out how to use it, right, Like it feels like a steamroller. It's like, oh, maybe I should go get in that steamroller. If my choices are get in the steamroller or get run over by it.
Yeah, I think, like, before I comment on it, I think it's important that people understand, like, it's very true that it's easy to think that I'll have a bullshitty answer to a question like this, because I work at a tech company that's working on a lot of this stuff. But you have to remember that, like, if that's true, we're out of jobs as soon as a human is no longer in the loop. That's really bad for us. Like, does that make sense? Do you buy that?
At some margin, right? There's a long way between all the people who are doing it now and zero people. There's a lot of intermediate cases between the way it is now and, like, a fully AI-generated podcast, right? And, like, we're already starting down the road, right? Getting AI to write show notes or something, that basically has happened now. And, you know, like, I know the history of technology and the labor market pretty well, you know, from the Industrial Revolution on. I'm pro technological innovation. I believe in productivity gains and efficiency gains. I'm also aware that there are instances when highly skilled craftspeople are displaced by technology, right? That is definitely a thing that happens. And I recognize that the pie gets bigger and everybody's better off in the long run. But, like, I just want to not get pinched, right? I just want to be, you know...
You don't want to be the one.
I don't want to be the one. And, you know, I'm not out on using it. It's getting really good, really fast. It's doing a lot of the things that I can do.
There's one other thing I wanted to say, just about the fear-for-your-job thing, which is something we say around here a lot: that you should struggle with your story and not your tools. That's almost like a guiding light for us. We want to take all of the cognitive friction away from using the tools. The funny thing about all of these things is, like, there's a brief moment in time where you feel like you have superpowers, but then everybody has them, and humans once again become the differentiator. And we really think making great stuff is always going to be a thing, and great is always going to be determined by the human that's in the loop.
I mean, you know, there's this story about chess, right? A computer chess program beat a person a long time ago, decades ago now. But then after that, people pointed out the fact, optimistically from my point of view, that a computer plus a person could still beat any computer, right? A person working with a computer was better than the best computer in the world. And that was, like, the metaphor for, like, yes, if we work with machines, we can be better. That is no longer true. Now the computers kept getting better, and now people can't make them better. Even a person plus a computer cannot beat a computer. And I know that chess is less complex than the real world, and so perhaps there's still a reason for optimism. I certainly think I'm clever and good at making podcasts and hope that I can do that. I hope that I can work with AI to make something better than any AI, or more like me, or something.
It might not be true, though. But here's the amazing thing: people are still playing chess, right? It's, like, true, some separation happens where the machines become so good and we just say, "Okay, you machines, you go off and do your thing, and we're going to be here kind of reveling in our humanity with each other." I think what we'll see is there's going to be a certain category of content that's really just about, like, the transmission of bits of information from your brain to my brain, and that's all that it's about, where maybe we do one day see humans taken out of the loop. But I really do believe there will always be space for, like... at the core, great content, storytelling, whatever you call it, it's about feeling connected to the humans, and other people. And as soon as machines play too heavy a hand, it's just not interesting anymore.
We'll be back in a minute with the Lightning Round, which includes a message from Andrew to his future self.
That's the end of the ads. Now we're going back to the show.
Okay, so this is the Lightning Round now. You ready? It's just a bunch of questions. Do you use generative AI in your life outside of work?
Now?
You know what's interesting? I did something this morning where I was actually like, I don't even care if it's wrong. I don't even care if it's...
Like, the test of a theory is not "is it correct?" but "is it interesting?"
Yeah, exactly. I was asking it about, I think, like, my son got hit in the head with a baseball, and I was trying to... I really should care about this, actually.
You should not ask ChatGPT anything significant about that. Not to give you parenting advice.
It's stuff like that. Like, I've pretty quickly been able to, like...
You should not be asking for medical advice about your child.
I know. But when I say stuff like that, like, I would have googled it and probably just done what I was going to do anyway. So it was almost just a curiosity. He was fine. He didn't need to go see a doctor. Not according to ChatGPT, no.
I'm curious about your time working in a recording studio, right? You worked in a recording studio where musicians came in and recorded. Did you see there any, like, moments of musical genius? Is there one in particular?
I worked for this guy named Steve Albini, who is a pretty well-known engineer and producer that was in some popular kind of punk rock bands in the eighties and currently, and I definitely saw some cool bands. But I think also I really feel like I learned a ton from watching him work. He's so talented, so articulate, so smart, in many ways an example of what I aspired to be at the time. And so seeing that output, but then also seeing him every day and how hard he worked, it was a real, like, "oh, this is how it happens" kind of moment for me. It inspired within me a kind of work ethic that I'm not sure I would have gotten to otherwise.
What's the best deal you ever got from Groupon?
Man, you know, it's so funny, because, like, obviously I used to be asked that question all the time. I think it was a sensory deprivation tank. They had a sensory deprivation tank center somewhere in Chicago. Had never tried it. It was really cool.
This is a Descript question now. How will you know when it's time to do something else?
But leave? Dude?
I don't know if I want to say this on a podcast, because if I do decide to take the company public, it'll come back to haunt me. But I almost want to say it specifically for that reason. Andrew, I'm talking to future Andrew right now. You do not want to be a public company CEO again, okay? Hire someone else to do that. I know you're talking yourself into it and saying it's going to be different this time. It's okay, but you hate it. The things that those people are good at and are interested in are different than you. Go do something else.
Amazing.
I've never had someone leave themselves a time capsule on a podcast before.
I'm going to send that to you. If you go public, I'm going to have you back on the show and I'm going to play it to you.
Thank you, Thank you for being so generous with your time. I appreciate your candor and I'm grateful for that.
I appreciate that. I had fun too. You're good at your job in the sense that, like, uh, you bring it out in me.
I'm better than a machine, for now. That's gonna... that's my motto: better than a machine, for now.
Andrew Mason is the founder and CEO of Descript. Today's show was edited by Sarah Nix, produced by Edith Russolo, and engineered by Amanda K Wong. I'm Jacob Goldstein. We'll be back next week with another episode of What's Your Problem.
And here, finally, is the top of today's show. The intro to the show as read, if that's what you'd call it, as generated by Overdub, Descript's AI-powered voice, whatever, emulator.
After every interview we do for the show, we upload the audio to a piece of software called Descript. Descript turns the audio into a transcript, and then I can edit the transcript, cut out the boring parts, move sections around, and when I do that, Descript edits the underlying audio to match. As software, Descript is pretty janky, it's buggy, it's constantly changing in ways that can make it hard to use, and sometimes it just blows stuff up. But we use it anyway because Descript is an incredible advance over what came before. Before Descript, audio software represented audio files not as words, but as waveforms, squiggly lines presented on a timeline. So when Descript came along, being able to edit audio by editing words on a screen was a huge advance, and it was an advance made possible by artificial intelligence.