The sold-out chips at the heart of AI

Published Oct 19, 2023, 4:05 AM

Brannin McBee is the co-founder and Chief Strategy Officer at CoreWeave.

A few years ago, he and a few friends started buying hardware to mine cryptocurrency. It turns out, the same hardware -- chips known as GPUs -- is essential for running state-of-the-art AI models. Today, Brannin and his friends have turned their hobby into a company that’s competing against some of the biggest companies in the world to provide the hardware and computing power to run AI.

Pushkin. Brannin McBee was a commodities trader. His job was buying and selling things like natural gas and soybeans, and in twenty seventeen, he and a few of his friends, who were also commodities traders, got interested in crypto, obviously. And as a hobby, Brannin and his friends started buying specialized computer hardware to mine crypto, specifically to mine a token called ether. They were setting up computers in their offices and in garages, and then after a little while, other traders they knew wanted in on the action, and so Brannin and his friends started buying more hardware and charging the other traders to use it.

We were basically retrofitting warehouses in New Jersey.

That sounds sketchy.

It was. It was extremely sketchy, lots of wires everywhere, and could it be any more different than where we're at today? But we figured out how to run tens of thousands of GPUs in the harshest environments imaginable.

GPUs. Those were the key pieces of hardware that Brannin and his friends were using. GPU stands for graphics processing unit, and GPUs were originally developed to power things like video games, but they also turn out to be really good for some other things like mining cryptocurrency, running complex biological simulations, and also GPUs are really good for AI. And as Brannin and his buddies kept buying more and more GPUs, they thought to themselves, maybe we shouldn't just rent these out to people who want to mine crypto. Maybe we should see if people want to use them for other things like training and running AI. That turned out to be a very good call. Today, companies building and running AI models are clamoring to get access to GPUs, and Brannin and his friends' hobby has grown into a multibillion-dollar company called CoreWeave that competes against companies like Google, Microsoft, and Amazon.

I'm Jacob Goldstein. This is What's Your Problem, the show where I talk to people who are trying to make technological progress. My guest today is Brannin McBee. He's the chief strategy officer at CoreWeave, and I wanted to talk to Brannin because, you know, obviously we talk about AI all the time, and usually we talk about it as this very abstract thing, but in fact you need physical stuff for AI to exist. You need, in particular, GPUs. And so if you want to understand what's going on with AI, you need to understand what's going on with GPUs, and Brannin is at the center of that.

We grew up building computers, right? Like, I grew up building computers with my brother.

Like were you a gamer? Is that one? Yeah?

Absolutely.

Yes, yeah, and then you were very early to GPUs.

Yes, yeah, in the nineties, like, your Christmas present was to go pick your new hard drive, or to get like a new stick of RAM and bring it into your computer, or to, you know, raid your uncle's office for their old compute equipment and take parts out of it.

Uh huh.

It's just a really fun space to be.

Then let's start macro and go micro. Okay, so like macro, Like, let me think of planet Earth, Like where on planet Earth are the data centers that you are in?

So we operate in North America currently. I think you'll see us expand globally within the next twelve to eighteen months or so.

Where in North America? Like how many data centers in?

Yeah. So in North America we'll be in about fourteen data centers by Q one. Okay, so that's co-located, which means someone else built and operates the actual shell and all the power and cooling and all these really engineering-intensive components within that data center. Then we come in and build out our infrastructure. And the infrastructure in the data center, it's one of the coolest things you can ever walk into, right? It's like walking into a spaceship.

I love it. So tell me what it's like when you go to one of these data centers. What does it look like? Just like a giant warehouse out in the middle of nowhere, somewhere where power is cheap?

Uh, you know, power's in there, but it's more about redundancy of power. It's more about connectivity to the broader Internet backbone across North America. And it's about security as well. So when you pull up to one of these sites, you know, it's twenty-foot walls, it's security guards, it's multiple badges' worth of security to get back to where you need to. It's no cameras, it's completely clean rooms. Like, no food, no drinks. You can't have anything being pulled into the servers, right, because there's so much airflow moving through the entire room.

Okay, so you badge, you pass the security guy, you park, you get in, walk into the room. What's it look like? Is it hot? Is it loud?

It is. It feels like it's screaming at you. It is extremely loud. You can barely have a conversation. And that's a reflection of the amount of power intensity, ultimately, right, because that's just fans pushing air through the servers at a very high rate to cool them. To cool them, yes, because they're consuming power on an extremely dense basis, right? So the more power you consume in a smaller area, the more, you know, associated cooling you have to have there. So it's loud. And I'd say one of the more interesting things when you're in there is the miles of fiber optic that's required to transport information and connect everything. So a typical, let's say, sixteen thousand GPU cluster that we build has forty eight thousand discrete connections amongst it that all have to be done perfectly. Then it has about five hundred miles of fiber optic cabling run between all those connections. So this is one sort of cluster that you have built? Yeah. And it's like one part of a bigger data center? Yeah, that's right.

Okay, So let's talk. Let's talk about the cluster. You said it has forty eight thousand connections, meaning like does a human being when you're building it, actually have to go and like plug forty eight thousand things into forty eight thousand things one at a time, no or yes.

Yes, yeah. And it all has to be done correctly. And then all the components have to provision correctly. When I say provisioning, that means they have to get their, you know, operational instructions. Like, you turn something on, it doesn't know who it is, right, what network it's on, what it's supposed to do. So you have to develop a bunch of software, which we've done, that informs the machine: hey, hello world, here's what we'd like for you to do. So that's called provisioning.

Okay, so forty eight thousand connections and how many GPUs did you say are in this cluster?

So that's for a sixteen thousand GPU cluster.

And so we've started at the macro. We started at planet Earth, then we got to the data center, and then we got to your cluster, and now let's get to that GPU, because that is really interesting. It's the center of this whole thing, right? In this cluster you're talking about, is it, in fact, the Nvidia H one hundred?

Yes, that's right.

So I want to talk about the Nvidia H one hundred for a minute, because it's a big deal in the world right now, right? Like, everybody wants them. Apparently you can't get them, or you can't get as many as you want. You can search for Nvidia H one hundred on Amazon and it comes up, and the first one that came up for me costs thirty seven thousand, seven hundred and ninety dollars. Doesn't say whether shipping is included. And that's just one of them. That's just one of them. And you're talking about how many in this cluster?

So that would be sixteen thousand in a sixteen thousand GPU cluster, and, you know, we're building multiple of those.

And so presumably you're not paying retail. One hopes you're not paying thirty seven thousand dollars per. But what is the order of magnitude of what the GPUs cost you in one of these clusters?

We're not going to talk about that.

Okay, but Let's be clear, I'm just the same as everyone else's thirty seven thousand.

Yeah, it's less. It's going to be comparable with what the hyperscalers pay, or so we think, right? End of the day, we don't really know.

How much does it cost to build one of these clusters, like, to an order of magnitude? Sure. One hundred million dollars, is that the right order of magnitude?

It's in the hundreds of millions of dollars, that's correct.

To build one of these clusters?

To build one sixteen thousand GPU cluster. But to be clear, we build clusters of different sizes, we build multiples of these clusters. It's all kind of based on what that end client is looking for. And, you know, we do on-demand compute. I'll kind of qualify it as: our capital expense at the company level is measured in the billions of dollars a year.

Sure, it's a very capital intensive business because you have to buy and for the fundamental reason that you have to buy these GPUs that everybody wants that are profoundly expensive.

That's correct.

I mean, an interesting twist in this part of the story is, you know, Nvidia is this really interesting company right now. Their stock has gone crazy this year because they make the GPUs that are in wild demand right now, and they are also an investor in your company, right, which is very interesting. Like, how does that piece of it, that Nvidia investment in CoreWeave, how does that work in terms of the broader dynamics?

Yeah, I think you see strategic investors on cap tables pretty often. We have a differentiated product, We have a fantastic way to get their infrastructure into the market, but we don't have any preferential treatment. We don't have any prioritization.

So it's not easier for you to get the GPUs?

No, it is not easier for us to get the GPUs whatsoever.

So these GPUs, these Nvidia GPUs, why is this the one that everybody wants? What's so special about the H one hundred?

So Jensen and the Nvidia team did this...

Jensen is the founder, CEO.

Yeah. Yes, did this amazing job of enabling the artificial intelligence space from an early, early onset, like back in the early twenty tens, by releasing the drivers, the software that actually controls the GPUs, into the market and making it accessible to those developers. Right, so when you go into GitHub, for example, and you search AI project, they're all going to come back referencing CUDA drivers, which is the software in Cressent right there, using x eighty six.

So Nvidia very cleverly sort of created the language that most of the people who write software, who write AI software, speak, essentially. Yeah, that's totally right. And so there might be another chip where the sort of specs are comparable, but the language is different, and everybody speaks this one language, and there's sort of a network effect at play where it's like everybody develops on this chip because everybody develops on this chip, and so that is like a classic killer network effect.

Yep, they just did an amazing job.

Interesting, you're competing against Amazon, Microsoft, Google, and Oracle, so more or less the biggest companies in the world. Yes, you're like four guys who were just in a warehouse in New Jersey, Like what what happened there?

Like, how is that happening?

It was ultimately through that specialization. That's really what got us here. Like, you have an at-scale existing market and you figure out how to do something better. You've decommoditized a market through performance. So the analogy I like is, we figured out how to refine a barrel of oil better.

Yeah, so everyone has to come use us because you can do it more efficiently, more cheaply.

Yeah, which means it's more efficient for our clients. At the end of the day.

Presumably these companies are spending billions, tens of billions, maybe hundreds of billions of dollars to catch up with you. I mean, I wouldn't be inclined to bet against them, respectfully.

I couldn't agree with you more. They absolutely...

Isn't that bad for you? Isn't that scary for you? Like, aren't you going to have to be sprinting to try and not get caught by Google and Amazon?

Look, these are amazing companies with amazing depth of engineering talent and a massive amount of resources. To your point, they're the largest companies on the planet. Super smart.

Very good at building big, efficient things.

Yes, but they are generalized. These are huge teams, and we both know how long it takes to move a big ship in a new direction. And I mean, to put a sense of scale to it, they're working with tens of thousands of engineers. We're in the hundreds of engineers, and it just allows for us to focus on a task much more discretely. This is all we do. We build supercomputers. I'm not trying to serve every single corner of the cloud. I'm trying to serve the part of the cloud that needs access to supercomputers, and we've just become really good at doing it.

We'll be back in a minute after a break for some ads.

I know some of your clients I've read about, right? I imagine some might be private, but some are public, right? So who are some of the people who are using the things that you have built?

Some of the public ones: Inflection AI is one. We really enjoy working with that team, especially on gen AI.

Mustafa Suleyman, right? Who was one of the founders of DeepMind, and who just wrote a book, and who we're trying to book for the show. That's his company, right? That's the company that he founded.

That's correct.

And so they are training like a big, sort of GPT-size model, right? And they're using your computers to do it.

That's right, yep.

Who else?

You know, a smaller team that we just love the engineers at is a company called NovelAI. NovelAI has kind of a storytelling-based inference product in market. They do text to image, text to story. But we have over six hundred AI clients on our platform today.

And are you still doing, you know, drug discovery stuff? Are you still doing entertainment stuff? Or is the demand for AI and willingness to pay just so high that it's mostly that now?

That's just where all the attention is, on AI. But we're absolutely recruiting clients on the media and entertainment side and the drug discovery side.

And is it getting way more expensive for those clients because of the AI demand? Are they in the sort of unfortunate kind of backwash of like this thing they need to use just got wildly more expensive because there's so much demand and so much money flowing in for these other use cases.

We have not increased pricing due to that. I think what you've seen instead is it's just more difficult for people to get access to.

Supply is constrained. Price hasn't gone up, but supply is constrained. Well, let's talk about that, because supply constraints are like an interesting part of the story, right? There was a quote from a guy at OpenAI who's like, what my friends talk about when we get together is, how are we going to get more GPUs? Like, have you heard where there are GPUs? It's like Mad Max or something, but instead of fuel, what everybody needs is GPUs. Like, that is really interesting, right? One doesn't think of these, you know, tech firms with all the money they could want being constrained because they just can't get these chips, these GPUs, at any price. That's a really interesting state of affairs.

Yes, and I think the driver, ultimately, look, AI software adoption is the steepest adoption curve we've ever seen, and we're trying to build supercomputers at that pace, and it's a massive logistical and engineering challenge to keep pace with that.

Right, stuff doesn't scale like software.

It's a massive engineering challenge as well.

Like, are the GPUs the rate-limiting step? I mean, even if you had them, it would be hard. But is the fundamental bottleneck the GPUs?

It's a good question.

GPUs themselves are in extremely high demand, but increasingly it's data center space, which we've found has been extremely interesting, because data center space is an area that's been undercapitalized or underinvested in for probably the last decade or so, and there's only a certain number of data centers that can actually run this infrastructure due to that increase in power density relative to kind of legacy infrastructure.

Right. So, like, in a given amount of space, the kinds of clusters you're building, the kinds of machines you're building, are going to use way more power. And that's a problem because these data centers aren't set up to provide that much power per unit of space.

And cooling, more importantly, as well. Like, you have to be able to cool that area.

And they generate more heat because they consume more power.

Exactly right. So there's only a certain number of data centers in the US, and I'm really speaking to the US here, that can run that. And now that's the bottleneck, and that's going to be the bottleneck for the next two years, arguably, because all that space has now been leased.

What are you trying to figure out that you have not yet figured out?

It's that scaling, it's that onslaught of demand. I think we have it figured out. We know the engineering, we have the right product, we have the capital, the funding to go do it. We have the right people on the ground, we have the right management team. Now it's just growth. Like, we have the next twelve to eighteen months in front of us outlined, and it's just go build, be focused on building. Arguably, some of the largest technology infrastructure projects in the world are happening at our company.

When you think about, like, the next year, you have to spend billions of dollars. Billions of dollars. It's like three guys who were playing around in the garage four years ago spending billions of dollars. Like, are you worried you might do it wrong?

Am I worried? So, I think it's an excellent question. But we've also had some of the most well-informed, intelligent participants in the artificial intelligence space look at our engineering teams, look at our product, and say, that's who I'm going with. We're doing it every day, we're building it every day, and now it's just keep doing it, remain focused, keep building. Don't get distracted.

So now you're just grinding? I mean, you seem super successful, but what you're telling me is now you just have to grind it out for a year, like, every day.

Grind it out and keep growing. Yes.

We'll be back in a minute with the lightning round.

Okay, now let's do the lightning round. How many screens do you have? How many monitors do you have on your computer?

Six.

It's like, former commodities trader once, commodities trader always.

I can't help it. I literally can't help it.

It's so like one screen to you, you feel like you're looking at an Apple Watch or something. One monitor. Do you use all six monitors?

I do. I use all six. Now, whether it's all relevant at the same time, I don't know, but I do use them all.

So you're looking at six different things?

No, it's like dedicated chat windows. There's dedicated, you know, browser windows, information windows. I can get stats on all of our infrastructure. I just can't help it. I can't get that commodity trader out of me.

Will you go to eight?

I've been at eight. I stepped down to six. Six. It hurt me a bit.

As a former commodities trader, what do you understand about the world that most people don't?

Risk management.

Say more.

So, acknowledging that there's always going to be trade-offs in what you do, but reducing the potential downside of that trade-off as much as possible through hedging, through negotiations, through contractual agreements, and being able to filter out the noise.

Overrated or underrated? Bitcoin: underrated. Ethereum: overrated. I would not have guessed Bitcoin underrated, Ethereum overrated, given in particular your experience that you started your company mining Ethereum. Why is Ethereum overrated and why is Bitcoin underrated?

I think Ethereum had so many promises of what it would do.

Global computer.

I just, what have they delivered on? I just haven't seen, like, any real use cases. They've never got enterprise adoption, nothing, whereas Bitcoin is getting government...

It was supposed to be new global money.

It was.

So why do you think it's underrated?

Because I think it's still moving that direction. It has the potential for it. It remains really interesting. It's slow moving, but it's making progress, whereas I feel like Ethereum has kind of been more sideways, if anything, to down even.

Interesting. What's one time when you gave up on something?

Oh my gosh, I mean, in the trading world, every single day.

Like, you abandon a thesis, basically? Abandon a trade? Is that...

Yeah, I mean, look, you're wrong every single day. And that was like one of the really interesting kind of outcomes of spending a decade in that career, is learning how to be wrong. You're wrong all the time, but it's like, how do you admit you're wrong, accept it, learn from it, and move on to the next. And I thought that the commodity trading world was just such a fascinating platform for that. You just get it reinforced every single day. It's not like a quarterly performance. It's intraday, twenty times a day.

Seeing it on eight screens in front of your face in real time.

I said, red numbers telling you how badly you're doing. Like it's immediately quantified. So I've been wrong a lot.

Great, I appreciate your time.

Yeah, that was fantastic. Really enjoyed the conversation.

Brannin McBee is the chief strategy officer at CoreWeave.

What's Your Problem?

Every week on What’s Your Problem, entrepreneurs and engineers talk about the future they’re trying  