On Friday July 19, millions of Windows PCs entered a doom-loop that rendered them non-functional thanks to an update sent by a little-known company called CrowdStrike - and in this emergency Better Offline dispatch, Ed Zitron walks you through exactly what happened, why it's so bad, and why both CrowdStrike and Microsoft executives should face criminal prosecution for such a catastrophic managerial failure. These are the dark consequences of the growth-at-all-costs movement.
Newsletter: wheresyoured.at
Reddit: http://www.reddit.com/r/betteroffline
Discord: chat.wheresyoured.at
Ed's Socials - http://www.twitter.com/edzitron instagram.com/edzitron https://bsky.app/profile/zitron.bsky.social https://www.threads.net/@edzitron
LINKS: https://tinyurl.com/betterofflinelinks
Cool Zone Media. Hello and welcome to a very special emergency episode of Better Offline. I'm Ed Zitron, I'm your host, and I'm recording this from inside a closet in a hotel in San Francisco. You're very important to me. On Friday afternoon, I sat at my desk and just started writing without any clear aim or objective other than a desire to wrap my head around probably the most cataclysmic technological meltdown that I've seen in my career. And of course I'm referring to the CrowdStrike situation. How was it that a piece of software, one that few people understood, made by a company that people really didn't know, was able to shut down our banking system, air travel, TV, logistics chains, those weird screens that you see around and, of course, hospitals? And as I wrote this script, I found myself returning to some of the themes that I wrote about in The Rot Economy, in Shareholder Supremacy, and many other pieces that speak to a larger problem in the tech industry: a complete misalignment in the incentives of most major tech companies, which have become less about building new technologies, maintaining them and then selling them to people who would then use them over time, and more about capturing monopolies and gearing organizations to extract value from the things around them. Every problem you see is a result of the tech industry, from the people funding the earliest startups to the trillion-dollar juggernauts that dominate our lives, and the fact that it's no longer focused on the creation of technology with a purpose and organizations driven towards said purpose. Everything's about expressing growth and about showing how you will dominate an industry rather than serve it, and providing metrics that speak to the paradoxical notion that you'll grow forever without any consideration of how you'll actually live that long.
Legacies are now subordinate to monopolies, current customers are subordinate to new customers, and products, well, they're considered the means to introduce a customer to a form of parasite designed to punish the user for even thinking about moving to a competitor. The key difference between what happened on Friday with CrowdStrike (and by the way, it's still being fixed, and as I'll explain later, will really take some time to be fully resolved) and my criticisms of other companies like Facebook and Google is the sheer violent nature of this failure. The decline of the search and social tools we use is a gradual, incremental kind of rot. CrowdStrike, meanwhile, was a demonstration of what happens when the rot fully consumes the timber holding up the building. What's happened with CrowdStrike is completely unprecedented, I'll get to why shortly, and on the scale of the much-feared Y2K bug that threatened to ground the entirety of the world's computer-based infrastructure once the year two thousand began. You'll note that I'm not saying that Y2K was overhyped or dismissing the scale, because Y2K was a huge, society-threatening calamity waiting to happen, and said calamity was averted not through any kind of magical thinking, but through a remarkable half-trillion-dollar industrial effort that took a decade to manifest, because such a significant single point of failure would have likely crippled governments, banks, and airlines. People laughed when nothing happened on January first, two thousand, assuming that all that money and time had been wasted and that the media was just being hysterical, rather than being grateful that an infrastructural weakness was identified and taken seriously, that a single point of failure was dealt with, and that the crisis was averted by investing in stopping bad stuff happening before it does. Crazy goddamn idea, huh?
But as we speak, millions or even hundreds of millions of different Windows-based computers are now stuck in a doom loop, repeatedly showing the infamous Blue Screen of Death, thanks to a single point of failure in a company called CrowdStrike, the developer of a globally adopted cybersecurity product designed, ironically, to prevent the kinds of disruption that we witnessed on Friday and are still witnessing today. And for reasons we'll get into shortly, this nightmare is going to drag on for several days, if not weeks, to come. The product, called CrowdStrike Falcon Sensor, is an EDR system, which stands for Endpoint Detection and Response. If you aren't a security professional and your eyes are glazing over, I'll keep it brief: an EDR system is designed to identify hacking attempts and to remediate or prevent them. They're big, sophisticated and complicated products, and they do a lot of things that are quite hard to build with the standard tools available to Windows developers. (But, as I'll get to later, not Microsoft.) And so to make Falcon Sensor work, CrowdStrike had to build its own kernel driver. Now, kernel drivers operate at the lowest level of the computer. They have the highest possible permissions, but they operate with the fewest guardrails, because they have massive control and they're very important to the system. Very technical people can hear that and be like, "that's not the right way to put it." Get out, it's not your podcast. But if you've ever built your own computer, or you remember what computers were like in the dark days of Windows 98, you know that a single faulty kernel driver can wreak havoc on the stability of your system.
The problem here is that CrowdStrike pushed out an evidently broken kernel driver that locked whatever system installed it in a permanent boot loop, meaning that it just started, hit the Blue Screen of Death, restarted, and kept doing it. The system would start loading Windows, encounter a fatal error and reboot, and then reboot, and then reboot again and again and again, in essence rendering the machine useless. It's convenient to blame CrowdStrike here, and perhaps that's fair, and I intend to do so several times. This should not have happened on a basic level. Whenever you write or update a kernel driver, you need to know it's actually robust and won't shit the bed immediately. Regrettably, CrowdStrike seemed to borrow Boeing's approach to quality control, except instead of building planes where the doors fly off (and "Boeing" is the noise it makes when they fly off at the most inopportune times), it released a piece of software that blew up the transportation and banking sectors, to name just a few. It created a global IT outage that has grounded flights and broken banking services. It took down the BBC's flagship TV channel for kids, infuriating parents across the British Isles, as well as Sky News, which, when it was able to resume live broadcasts, was forced to do so without graphics. In essence, it was forced back to the nineteen fifties, giving it an aesthetic that matches the politics of its founder and former owner, Rupert Murdoch. By no means is this an exhaustive list of those affected, either. The scale of the disruption caused by this incident is unlike anything we've ever seen before. Previous incidents like this, particularly viral ransomware outbreaks like WannaCry, simply can't compare, especially when we're looking at the disruption and the sheer scale of this problem. Still, if your day has been ruined by this outage, at least spare a thought for those who will have to actually fix it, because the machines affected are now locked in this boot loop.
It's not like CrowdStrike can just release a new software patch and call it a day. Undoing this update requires some users to individually go to each computer, load up safe mode (a limited version of Windows with most non-essential software and drivers disabled) and manually remove the faulty code. And if you have encrypted your computer, that process gets a lot harder. Servers running on cloud services like Amazon Web Services and Microsoft Azure, you know, the way that most of the internet's infrastructure works, require an entirely different and much more annoying separate series of actions. If you're on a small IT team, and you're supporting hundreds of workstations across several far-flung locations, which really isn't unusual these days, especially in sectors like retail and social care, you're especially fucked. Say goodbye to your weekend and your evenings. Say goodbye to your spouse and your kids. You won't be seeing them for a while, and I'm really sorry. I'll buy you a drink some time. Your life will be driving from site to site, applying the fix and moving on. Forget about sleeping in your own bed or eating a meal that wasn't brought to you by DoorDash. Good luck, godspeed, God bless. I do not envy you. I so gratefully have a fake job. You know who I do envy? Those buying the products that follow this utterly seamless ad break, which will likely echo my exact sentiments on literally every issue ever. And we're back. The significance of this failure, which isn't a breach, by the way, and in many respects is far worse, at least in the destruction it caused, is not its damage to individual users, but to the amount of technical infrastructure that runs on Windows, and the fact that so much of our global infrastructure relies on automated enterprise software that, when it goes wrong, breaks everything.
It isn't about the number of computers, but the number of them that underpin things like security checkpoints or systems that run airlines or banks or hospitals, all running as much automated software as possible so that the costs can be kept down. Hey, remember the rot economy? Jesus fucking. The problem here is systemic: there's a company that the majority of people affected by the outage had no idea existed until, well, a day or two ago, that Microsoft trusted to the extent that it was able to push an update that broke the back of a chunk of the world's digital infrastructure. Microsoft, as a company, instead of building the kind of rigorous security protocols that would, say, I don't know, rigorously test something that connects to what seems to be a huge portion of Windows computers, well, they just chose to do something else. They've just screwed the fuck up. As pointed out by Wired, the company vets and cryptographically signs all kernel drivers, which is sensible and good, because kernel drivers have an incredible amount of access and thus can inflict serious harm, with this testing process usually taking several weeks. What happened, Microsoft? How did this slip through Microsoft's fingers? Well, for this to have happened, two companies needed to screw up epically, and boy, fucking howdy, did they. What we're seeing isn't just one major fuck-up, but the first of what will be many systemic failures, some small, some potentially larger, that are the natural byproduct of the growth-at-all-costs ecosystem, where any attempt to save money by outsourcing major systems is one that must simply be taken to please the beautiful, sexy shareholder that they all love so much. And this is a problem with the digitization of society, or more specifically, the automation of once-manual tasks. It introduces a single point of failure, or rather several of them, all clustered together like a rat king or a Katamari.
Our world, our lifestyle, and our economy are dependent on automation and computerization, with these systems in turn dependent on other systems to work, and if one of those systems breaks, the effects ricochet outwards like ripples when you cast a rock in a lake, or throw a body in. For some listeners, Friday's CrowdStrike cock-up is just the latest example of this, but it isn't the only one. Some of you might remember the SolarWinds hack back in twenty twenty, where Russian state-linked hackers gained access to an estimated eighteen thousand companies and public sector organizations, including NATO, the European Parliament, the US Treasury Department, and the UK's National Health Service, by compromising just one service, SolarWinds Orion. Remember Okta? Some of you might know Okta is a company that makes software that handles authentication for a bunch of websites, governments and businesses. Well, when they got hacked in twenty twenty three, they then lied about the scale of the breach. Hey, do you remember when those hackers leapfrogged from Okta to a bunch of other companies like Cloudflare? Yeah, they provide the content delivery services and the services that protect websites from being, well, brought down by a bunch of bots, for pretty much the entire internet. Everything feels like it's being held up by, like, twigs and chewing gum. You probably know the quote "no man is an island," and it's especially true when we're talking about tech, because when you scratch beneath the surface, every system that looks like it's independent is actually heavily, heavily dependent on services and software provided by a very small number of companies, many of whom are not particularly good. And this is as much a cultural failing as it is a technological one, the result of a management culture geared towards value extraction, building systems that build monopolies by attaching themselves to other monopolies.
CrowdStrike went public in twenty nineteen and immediately popped on its first day of trading thanks to Wall Street's appreciation of them moving away from a focused approach of serving large enterprise clients to building products for small and medium-sized businesses by selling through channel partners, in effect outsourcing both product sales and the relationship with the client that would tailor a solution to said client, especially when something is as serious as this. I want you to really think about this problem, because the problem isn't so much selling to small or medium businesses. It's the fact that CrowdStrike made its money selling to the enterprise and specializing in that. And that's the thing: when you broaden out, when you must grow in all directions, at all times, in all ways, to please the horny beasts of Wall Street, you lose your focus. But that isn't the only problem, because CrowdStrike's culture appears to also fucking suck. A recent Glassdoor entry referred to CrowdStrike as "great tech with terrible culture," with no work-life balance and with leadership that does not care about employee well-being. Another, from June twenty twenty four, claimed that CrowdStrike was changing its culture for the Street, with KPIs, as in metrics related to your success at the company, driving behavior more than building relationships, and with a serious lack of experience in the public sector in senior management. So glad that this company is selling into the government. Anyway, moving on. Others complained of micromanagement, with one claiming that management is the biggest issue, with managers asking way too much of you, and that it doesn't matter if you do what they ask since they're not even around to check on you, and another saying that management is arrogant and needed to stop lying to the market on product capability. That's what I love to see. We all love to see that.
I'm very happy to read that. And while I can't say for sure, I'd imagine an organization with such powerful signs of growth-at-all-costs thinking, a place where you, and I quote, "have to get used to the pressure," and that's "a clique that you're not in," likely isn't giving its quality assurance teams the time and the space to make sure that there aren't any kaiju-level security threats baked into an update. And that assumes it actually has a significant QA team in house and hasn't just, as with many companies, outsourced the work to a body shop like Wipro or Infosys or Tata Consultancy. But for a moment, I'm going to change gears a little to try and explain what actually happened, and why it suggests that the issue is likely the product of cost-cutting and institutional failure within CrowdStrike. In the aftermath of Friday's incident, we've seen some analyses of what actually went down. But first, some throat-clearing. I haven't verified this stuff independently. From what I've read, though, and from speaking to developers, this all seems relatively plausible, but it may be worth googling this a little yourself. But I'm going to give it a go. So the kernel driver at fault was written in a programming language called C++. This language was developed in the nineteen eighties and it's very good for writing high-performance applications, anything where you're concerned about speed, like the internals of an operating system or a video game. It's pretty popular for that, and it's also pretty dangerous too, so dangerous, in fact, that it's often referred to as an unsafe language. Without getting too far into the weeds, C++ makes it incredibly easy to shoot yourself in the foot, the arse, and the dick at the same time. It's big, it's complex, and it has few safeguards while providing many opportunities for developers to screw up very badly.
Like C, the language it's derived from, it forces developers to deal with a lot of low-level stuff, like handling memory allocation, that you don't really have to deal with in many popular languages like Python, Java, Rust, Swift or C#. And this matters because if you screw this up, your code will break, or, I don't know, it might introduce some kind of potentially disastrous security vulnerability. In twenty nineteen, Microsoft researchers said that seventy percent of all security vulnerabilities were the result of memory management issues, and I doubt that figure has changed much since then. And earlier this year, the White House Office of the National Cyber Director urged developers to stop using unsafe languages like C and C++ and start using modern and safer alternatives like Rust. With me so far? So, from what I've read, the CrowdStrike Falcon Sensor kernel driver crashed because it had something called a null pointer error. Essentially, the developer wrote some code that told the program to look for a memory location that didn't exist, and didn't write any safeguards to protect against that. When this happened, the driver, and so the operating system, crashed. This is a rookie mistake, and I've talked to multiple developers that have backed this up. If you take an introductory C++ programming class at university, they'll cover this in the first year. It kind of boggles the mind how trivial a mistake this is, how it made it into production code, which is the code that goes out into the real world, and how it wasn't caught either by CrowdStrike or by Microsoft, who are supposedly obligated to vet this driver. And if the reports are true, someone really, really, really screwed up, really badly. But if you don't want to screw up, if you want to really do well in life, I advise you to buy one of the following products or services, which I of course fully understand, know all about and won't be embarrassed by.
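For readers following along in text, the null pointer error described above can be sketched in a few lines of C++. This is a minimal illustration of the general class of bug, not CrowdStrike's code: `ConfigEntry`, `read_rule_unchecked` and `read_rule_checked` are hypothetical names invented for this example, and, as the host says, the reports about the root cause had not been independently verified.

```cpp
#include <cstdint>

// Hypothetical record that a driver might parse out of a configuration file.
struct ConfigEntry {
    std::uint32_t rule_id;
};

// Unsafe: trusts that `entry` points at real memory. If `entry` is null,
// the dereference is undefined behaviour. In an ordinary program that
// usually crashes the process; in kernel-mode code it can take the whole
// operating system down with it.
std::uint32_t read_rule_unchecked(const ConfigEntry* entry) {
    return entry->rule_id;
}

// Safer: checks the pointer before using it and reports failure to the
// caller instead of touching memory that is not there.
bool read_rule_checked(const ConfigEntry* entry, std::uint32_t& out) {
    if (entry == nullptr) {
        return false;  // nothing to read; let the caller decide what to do
    }
    out = entry->rule_id;
    return true;
}
```

Calling `read_rule_checked(nullptr, value)` simply returns `false`, whereas calling `read_rule_unchecked(nullptr)` is the kind of mistake the transcript is describing. The one-line guard is the sort of safeguard an introductory C++ course teaches.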
And we're back.
And to be clear, I don't want you to think that I'm letting Microsoft off the hook either. Assuming the kernel driver testing roles are still being done in house, do you think that these testers, who have likely seen their friends laid off at a time when Microsoft was highly profitable, and been denied raises when their well-fed CEO probably took home over one hundred million dollars in salary for a job he's eminently bad at, do you think these people are doing their best work? Do you think they go into their jobs full of piss and vinegar, ready to save the world, or do you think they hate their job and they're being forced to do too much and they're miserable? And the people that knew what the fuck was going on have been fired, and the people who managed those people, and the people that wrote the code that they're auditing. Do you think anyone knows what the hell is going on? No, they don't. And this is the culture that's poisoned almost the entirety of Silicon Valley. What we're seeing now is the societal cost of moving fast and breaking things, of people like Marc Andreessen considering risk management the enemy, of hiring and firing thousands of people, tens of thousands in some cases, to please Wall Street, of seeking as many new ways as possible to make as much money as possible, to show shareholders that you'll grow, even if doing so means growing at a pace that makes it impossible to sustain organizational and cultural stability. When you aren't intentional about the people you hire and retain, the people you fire, the things that you build, the way that they are deployed, maintaining your systems, and understanding how and why things were written and the decisions that were made five, ten, and fifteen years ago, you're going to lose the people who understand the problems they're solving, and thus lack the organizational ability to understand the ways the problems might be solved in the future, or disasters might be averted.
This is dangerous, and it's also a dark warning for the future. Do you think that Facebook or Microsoft or Google, all of whom have laid off over ten thousand people in the last year, have done so in a conscientious way, in a knowledgeable way, a people-focused way, an organizationally rigorous way that means that the people who are left understand how their systems run and the inherent issues built into them? Do you think the management types obsessed with unsustainable AI bullshit are investing heavily in making sure that their organizations are rigorously protected against, say, one bad line of code or one dipshit error? Do they even know who wrote the code of their current systems? Is that person still there? Do they have their email and their phone number? Is that person at least contracted to make sure that something nuanced about the system in question isn't mistakenly removed or changed or, quote, "fixed"? No. No, they're not. They're gone. They're not there anymore. Only a few months ago, Google laid off two hundred employees in the core of its organization, outsourcing their roles to Mexico and India in a cost-cutting measure, the quarter after the company made twenty three billion dollars in profit. I'm jumping to Google because they're just probably next in one of these horrible breaches, or sorry, not breaches. Silicon Valley, and big tech writ large, is not built to protect against situations like the one we saw on Friday and the damage we're going to get from CrowdStrike, because the culture's cancerous. It values growth at all costs, with no respect for the human capital that empowers organizations or the value of building rigorous, quality-focused products that are maintained over time. You know me, I'm a nasty little bitch.
Want a more on-the-nose example? George Kurtz, the CEO and co-founder of CrowdStrike, said in twenty twenty that not one time has he regretted firing someone too fast, in a conversation where he argued that tech executives were becoming too obsessed with culture. And in a stunning act of foreshadowing, when he was the chief technology officer at McAfee, best known as the company that makes antivirus software that they sell to your granddad and that they ship with computers and you immediately uninstall, he oversaw an update that treated an essential part of Windows XP as a virus, quarantining it and sending the computer into a boot loop. It's almost a little too on the nose. They're calling him the Prabhakar Raghavan of security. It's a very bad deal. But dear listener, this is just the beginning. Big Tech is, to quote Trivium, in the throes of perdition, teetering over the edge of the abyss, finally paying the harsh cost of building systems as fast as possible. But let's be honest, they're not paying the cost. We are. This isn't simply moving fast and breaking things, but doing so without any regard for the speed at which you're doing so, and firing the people that could fix them or might have broken them, the people that know what's broken, possibly the people who might have an idea of how to stop this happening in the future. And it's not just tech. Boeing, a company I've already shat on plenty and one I'll likely return to in the future, largely because it exemplifies the short-sightedness of managerial fuckery, has over the past twenty years or so spun off huge parts of the company, parts that at one point were vitally important and probably still are, into multiple other separate companies, laid off thousands of employees at a time, and outsourced software development to nine-dollar-an-hour body shop engineers. It fucking hollowed itself out until there was nothing left, and then the planes started breaking.
And tell me, knowing what you know about Boeing today, would you rather get on a 737 Max or an Airbus A320neo? I guess it depends how much of a Buddy Holly fan you are. Anyway, as these organizations push their engineers harder and harder, and have fewer of them because they've been laying them off, said engineers will need to find a way to write code quickly, and perhaps they'll turn to AI-generated code, which poisons codebases with insecure and buggy writing. Companies are shedding staff to keep up with Wall Street's demands in ways that I'm not really sure people are capable of understanding yet. When you have fewer engineers and bigger time constraints (and by the way, Prabhakar Raghavan at Google specifically told people they'd be doing things faster with fewer people. It's so cool. I love tech), they're going to turn to whatever little tricks they can, and wouldn't you in that situation too? You have to ship faster than is possible. Of course you're going to do that. But the companies that run the critical parts of our digital lives do not invest in maintenance, or cultural unity, or any kind of rigorous infrastructure. If I'm honest, you need intentionality as well when building these things. You need it. It's required to prevent the kinds of things that happened on Friday with CrowdStrike, and the kind of systemic failures that you're going to see in the future. And I need you to be ready for this to happen again. And all of this is the horrifying cost of the rot economy: systems used by billions of people, held up by flimsy cultures and brittle infrastructure, maintained with the diligence of an absentee parent. This is the cost of arrogance, of rewarding managerial malpractice, of promoting speed over safety and profit over people.
Every single major organization should see CrowdStrike's failure as a wake-up call, a time to reevaluate the fundamental infrastructure behind every single tech stack. What I fear is they won't, that they'll see it as someone else's problem, just like Microsoft did. And that's exactly how we got here in the first place. And this is going to keep happening. I'm going to make a daring suggestion at the end of this one, based on a guest of the show, Daron Acemoglu. I believe it's time to start bringing in criminal prosecution for executives. If you, as the executive, are pushing the kind of cultures where basic security practices are failing, where managers do not exist, where checks and balances don't exist, you should be held responsible. And I don't mean a fine, by the way. A fine for a multi-trillion-dollar, or even multi-billion-dollar, company is just a fee with a different hat on. No, I believe there should actually be a criminal inquiry into CrowdStrike and into Microsoft, and the people responsible are not necessarily the workers. No, the people responsible are people like Satya Nadella, the CEO of Microsoft, and George Kurtz, the CEO of CrowdStrike, both of whom should face criminal investigations. We do not know at this time the significance of this event, but we know it's more significant than almost any computer infrastructure failure in history, and it affected hospitals. Do you think people didn't die? Do you think that something didn't break? Do you think that there's not a corpse on Satya Nadella and George Kurtz's goddamn hands? Yes, it would be blood, but still. We keep going. These people are responsible and they're not afraid, and they should be. There must be consequences for this level of fuck-up. Microsoft made over ten billion dollars of profit in the last quarter. By the way, the market cap of CrowdStrike before this happened was around eighty nine billion dollars.
Microsoft could probably, in the space of a year's profits, buy them in cash or build their own goddamn system. But they chose not to, to save money, and CrowdStrike in turn found other ways to save money, and saving money will likely have ended lives and ruined them. This is why I'm so pissed off, everyone. This is why I'm so frustrated. This is what I've been talking about from the goddamn beginning of this goddamn show. This is the consequence. This is what will happen, and will happen again and again and again. This is the first of many calamities that will happen as a direct result of companies run by people that don't give a shit, of a Silicon Valley culture built on exploitation and value extraction, and of a business cartel run by people all agreeing to do the same level of shitty job, of holding no one accountable, of not calling out their peers for running shitty companies because everyone's in on the scam. And it's a culture that is failing society, and a culture that I will continue to eviscerate every goddamn week until they, well, kick me out of this closet I'm in, reading to you. It's such a pleasure reading this stuff, and I hope I've given you more clarity. If you have any questions, you'll hear my email address after this, but it's ez@betteroffline.com. That's the letter E, the letter Z, at betteroffline.com, and it's E, Zed at betteroffline.com for my wonderful British listeners. Thank you for listening, and if this affected you, I'm so sorry, and it likely did. Normal people, people in hospitals, banks, airports, people traveling got their lives fucked up by this, and I'm one hundred percent sure people have died. It's time for criminal inquiries, and it's time for criminal prosecution. It's time for real consequences for executives who don't give a shit. You heard it here first. Well, I guess Daron said it first. Be safe out there. Thank you for listening to Better Offline.
The editor and composer of the Better Offline theme song is Mattosowski. You can check out more of his music and audio projects at mattosowski.com, M A T T O S O W S K I dot com. You can email me at ez@betteroffline.com or visit betteroffline.com to find more podcast links and, of course, my newsletter. I also really recommend you go to chat.wheresyoured.at to visit the Discord, and go to r/BetterOffline to check out our Reddit. Thank you so much for listening.
Better Offline is a production of Cool Zone Media. For more from Cool Zone Media, visit our website coolzonemedia.com, or check us out on the iHeartRadio app, Apple Podcasts, or wherever you get your podcasts.