Why Corporate America Still Runs on Ancient Software That Breaks

Published Jan 26, 2023, 9:00 AM

Southwest Airlines had a disastrous holiday season, thanks in part to a software bug that left crews out of place and grounded thousands of flights. But Southwest isn't alone in having software in the headlines lately. The New York Stock Exchange recently had a software error that caused weird pricing on stocks, and the FAA had its own computer issue that grounded planes earlier this month. So what's the deal with corporate software? Why do these crashes happen? And why does the user experience typically leave something to be desired? On this episode of the podcast we speak with Patrick McKenzie, an expert on engineering and infrastructure, who writes the Bits About Money newsletter and recently left payments company Stripe after six years. We talked about the challenges of keeping any software system alive after years of upgrades and updates, the distribution of tech talent across industries, and whether non-tech companies can close the gap with Silicon Valley.

Hello, and welcome to another episode of the Odd Lots podcast. I'm Joe Weisenthal.

And I'm Tracy Alloway.

Tracy, I forgot to ask you, and it's kind of embarrassing this late in January, but how was your New Year's? How were your holidays? Did you have a good Christmas and stuff?

Thanks, Joe. I had an excellent Christmas. I stayed at home for a week with my husband and my dog, and we did hardly anything, and it was absolutely glorious. How about you?

It was all right. So the thing was, I was down in Texas visiting family, but there was that huge cold blast, and when the weather gets really cold, it's worse in a place like Texas, because none of the buildings are insulated particularly well and none of them have that heat. So it's like, when it's really cold, is it better to be in a cold place where people are used to it? So it was a little uncomfortable. But the good news is that somehow we managed to travel back and forth without having any major airline disruptions.

Right. So this is the key thing that happened right before Christmas, which is we had that very big winter storm, the Arctic bomb blast, and it disrupted a ton of flights, first off because of the weather. But then what happened is you had this sort of cascade effect, because the weather event was so large. A number of airlines, but one airline in particular, experienced a lot of problems with its software, and Southwest had to cancel, I think in the end, something like sixteen thousand flights. You had millions of passengers affected, and a weather-related disruption that should have lasted one or two days ended up lasting, I think, more than a week because of the impact of the computer glitches, I guess.

Yeah, yeah, that's right. Airline travel always sort of cascades and ripples out, right, because a canceled flight is going to affect other flights and so forth.
But it seemed like Southwest experienced something unique, which is that it turned into this major software problem. And it's sort of a reminder that, okay, when we use the Internet, when we use modern consumer software, it's all very zippy and quick, and it has nice interfaces. And then when you use back-end corporate software, particularly at large legacy institutions, it's nothing like the consumer Internet. It's clunky. We all know what it's like, right?

Right. So this is something that came up quite a lot. I used to cover the big banks, and one of the crazy things that I learned relatively early on while I was doing this was just how much of their IT system was still these old, creaky, big-iron mainframes, some of them still running on COBOL, which is a programming language that I think dates back to the nineteen fifties or nineteen sixties. And I remember, you know, you hear this, you hear, oh, I can't believe that these big banks, our entire financial system in some respects, is still running on these legacy computer systems. But on the other hand, if you look at what a big bank is, it's structurally a series of mergers and acquisitions. There used to be this great flow chart that showed the formation of a JPMorgan or a Bank of America, and you can just see it's a series of roll-ups of smaller banks. And you think, every time they acquire a new bank, they have to integrate another system into their own system, and in the end you kind of end up with this incredibly complex and kind of patchy IT structure that in some ways very much resembles the amalgamation of all these smaller banks into a larger bank.

That's true, right? Like, you think of banks, and when they merge, it's just, okay, the capital is all merged together, and the assets, etcetera.
But none of them are going to have IT systems that work perfectly together, so they get glued together with duct tape over time. I think technical debt is a term that software engineers use, and so you accumulate all this technical debt, and I'm sure nobody likes it inside the bank at any level. But I've always been curious: what are the economics such that it is just so impossible for these legacy institutions, banking, airlines, obviously the public sector, to get with the times?

Absolutely. And when it comes to Southwest, first of all, let me declare a small self-interest in Southwest, which is that my dad was a pilot for Southwest for a long time. I remember when the outages were happening this Christmas, I sent him a text going, well, what do you think about this? And he just said, well, all the crew are out of position and they need to get them back. So a very straightforward ex-pilot answer about what was going on. Not that helpful for this podcast. But I do think in the case of Southwest, this has been a long-running issue, and you have had people talking at various times about the need to upgrade the technological infrastructure, and yet it hasn't happened. And so the question is why. The question is whether it is cheaper just to keep running these old mainframes and assume that you are going to have these outages during major events than it would cost to actually upgrade.

Absolutely. Well, let's talk to somebody who knows about software and knows about economics and can walk us through and help us understand the problem. We're going to be speaking with Patrick McKenzie. He's an expert on software and infrastructure. He's the writer of the Bits About Money newsletter and knows a lot about finance, and he just left Stripe after six years. He's still an adviser there. I've been reading his stuff for a long time.
He's one of those people I would trust on almost any topic. Patrick, thank you so much for coming on Odd Lots.

Thanks very much for having me.

What is the deal? Let's just start straight up. I mean, Tracy mentioned all these institutions, they just squish it all together, they tie it together with duct tape and so forth, and you get these unwieldy things. Just give us the high-level view: why is it so difficult, at an abstract level, to modernize legacy software?

So I'll start with a disclaimer, which is sort of mandatory in engineering culture. We have this thing that we've come up with over the course of the last two decades or so, called blameless postmortems. When there is a failure within a company, where, you know, planes cannot be up in the air for a week at a time, rather than trying to point the finger at someone and say it was your decision or your inaction that caused this event, we as engineers want to look at the objective reality of the system and figure out what went wrong, for the benefit of both that organization and the larger community. And so this isn't to rub anyone's nose in it; it's just, as an engineering matter, what probably happened. In an ideal world, and if this had happened at a Google, for example, the engineering teams would push, because of the culture of these things, to do a very public postmortem of what the decisions were, what the background is, etcetera, etcetera. In more traditional industries, I don't think that culture is fully baked yet, as it were, although there very well might be a postmortem by the FAA and federal regulators, because, you know, the liveness constraint is a real thing for extremely important economic systems like airlines. Anyhow, what probably happened:
It wouldn't surprise anyone in the bowels of an airline that if you put off maintenance on airplanes for decades at a time, eventually bad things would happen, and no one would countenance that. Software systems are quite similar. They aren't build-once-and-then-run-for-the-rest-of-eternity sort of things. There were some decisions made early in their lifetimes which are no longer accurate for the world we live in. They suffer from something engineers somewhat misleadingly call bit rot, where software which worked in the past will tend to succumb to entropy over time and not work perfectly for all time afterwards. And so you need to be doing an ongoing program of maintenance for your software, just like you would for your airplanes. That, bluntly, was not done, and it seems to be credibly reported that the cultural factors at Southwest that caused it not to be done might have included an overly accounting-focused, slash penny-pinching, management culture, which thought, well, it costs money in the short term to do maintenance; we can cram down our engineering costs by doing less of this and relying more on external vendors, etcetera, etcetera. And then when stuff hits the oscillating plate, the right people have not done the years of work that are required to get to a posture where you can quickly recover from failures. And so they were left in a position where what we euphemistically call, in the industry, heroics was required from folks in operations, trying to contact thousands and tens of thousands of employees by phone and pass around their information in, probably, spreadsheets, to figure out where the crews actually are, to be able to tick the boxes that are required by regulation to allow people to get back up in the air. So that's probably the high-level root cause of what happened.
But a reason to conduct these postmortems is that it's rarely one single decision made by one person. It's the result of cultural factors and business decisions made over the course of probably decades in this place, and we want to tease out the various nuances there and make both Southwest and other organizations aware of them, so that they don't suffer critical systemic failures in their own places.

So, first of all, bit rot is a fantastic term that I am going to have to try to work into all my conversations going forward. But secondly, just to step back a bit, can you maybe explain, when we talk about mainframe computer systems, what exactly are we talking about? And what is the counterpoint to the mainframe? I'm assuming it's more like cloud-based applications and things like that, but could you maybe define that basic term very quickly? And then, my understanding of the Southwest debacle was that they had this in-house software, I think it was called SkySolver, but it was based on an application that GE had been selling, and then Southwest kind of customized it. And I guess my question is: how endemic is that type of software, where you get something off the shelf, but then you customize it in such a way that it becomes, I guess, special to you, and therefore your problem when it goes awry?

So let's talk about the mainframe first, and then we'll talk about the problem of who owns the problem, whether it's your problem or the vendor's problem, and the way that balls tend to land in the middle between people and get dropped. So, mainframes: back in the day, many, many decades ago, computers were approximately the size of a room. This is before the personal computer revolution, and banks were some of the earliest adopters of computers for sort of scaled usage in industry.
And it's funny, the earliest users of computers tended to be either the financial industry or the military, either attempting to move numbers which represented money, or numbers which represented, literally, artillery shells flying through the air, and both were very important to society, for better or worse. Banks, because they standardized on what was the best available technology at the time, ended up with a lot of mainframes, and they have kept those mainframes running for a good portion of seventy years now, in some cases. As for the alternative to mainframes that you mentioned earlier: cloud is a bit of a buzzword. So there's the personal computer form factor that you're familiar with, where something sits on your desk. There are servers, which would typically sit in a server rack somewhere. And the difference between servers and, quote unquote, the cloud is that the traditional way to manage servers would be to have a data center that would be owned or at least leased by yourself, and you would put hardware that you owned in that data center. In the cloud case, the data center is owned by Amazon or Google or Microsoft. You have rented access to a machine there, which sits on their balance sheet, and it's probably not one machine; it's probably an awful lot of machines, potentially with some virtualization layer, and your engineers can cause systems to scale up or scale down based on how many machines you need on a minute-by-minute or second-by-second basis. That's sort of the high-level version of the sales pitch that cloud vendors will give you. So that is a quick run-through of four generations of the mainstay of how technology gets done at scale. On the question of where software gets written: it is quite common for businesses in the traditional economy to not have an internal software engineering competence, and as a result, they'll go to vendors.
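To make the scale-up, scale-down pitch concrete, here is a minimal sketch, assuming a made-up service where each rented machine can handle a fixed number of requests per second. The function name `machines_needed`, the capacity figure, and the minimum and maximum fleet sizes are all hypothetical, not any cloud vendor's API; real autoscalers weigh many more signals than this.

```python
import math


def machines_needed(requests_per_second: float,
                    capacity_per_machine: float,
                    min_machines: int = 1,
                    max_machines: int = 100) -> int:
    """Hypothetical sizing rule: enough machines for current load, clamped to a range."""
    if capacity_per_machine <= 0:
        raise ValueError("capacity_per_machine must be positive")
    # Round up so the fleet never has less capacity than the load requires.
    wanted = math.ceil(requests_per_second / capacity_per_machine)
    # Keep at least min_machines running and never rent more than max_machines.
    return max(min_machines, min(max_machines, wanted))


if __name__ == "__main__":
    # Illustrative load samples, re-evaluated minute by minute (made-up numbers).
    for load in [120, 950, 4200, 300]:
        print(f"{load} req/s -> {machines_needed(load, capacity_per_machine=500)} machines")
```

With an assumed 500 requests per second per machine, those four samples come out to 1, 2, 9, and 1 machines: the fleet grows with the spike and shrinks again afterwards, which is the elasticity the rented-capacity model is selling.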
In this case, you probably know this better than me, but for example GE, the vendor, will sell, quote unquote, packaged software, which might not be fully responsive to the needs of the business, and then some customization happens. The customization might happen within the business, if they have some level of software engineers internally. What happens more frequently in a lot of places, particularly in Japan, where I live, is that the business will contract with what we call system integrators here, which might be called consultancies in America, you know, a Deloitte or another large consulting firm like Accenture, and say, okay, we have this packaged system, we have these business requirements, clearly some software engineers have to be involved, we don't have enough on our staff, can you figure out the missing link for us? And so there is an extensive contract negotiation, the consultancy goes off and does the customizations, and they deliver the work to the organization. And then there are various different ways to do maintenance, but often that will involve the original team standing down for a while. That always sounds like a great idea when it's pitched, because it's like, oh great, I don't have to pay expensive engineers every week to just sit around waiting for something to happen. And then when something actually happens, you realize the other end of the option value there, where, oh, not so great, I don't have a team of experts who understand the system, ready to call up at a moment's notice and bring in to debug the problems that we're seeing right now. And so, I have no specific knowledge about what went on in this individual instance, but a thing that happens during a lot of outages is a quick look into the history of the system to say, hey, wait, who actually built this for us? Are they still in business? Can we get on their calendar immediately?
Is there an engineering team there that is ready to hop in and work on this? Goodness, they're going to charge us eye-bleeding rates to do it, but we're going to have to pay that; that's a tertiary consideration at this point. The bigger consideration is, how quickly can we get our organization-to-organization legal paperwork, etcetera, spun up? And then how quickly can they get the engineering team spun up to address the system in real time? And this sort of thing is why the major scaled software companies in the economy, Google, Microsoft, etcetera, etcetera, you all know the names, largely treat engineering as an internal competence. You know, the software for the iPhone isn't built by Deloitte; it's built by Apple, and almost everything in the critical path will either be open source or something where there is a team at Apple that owns the totality of the experience.

So this is something that I sort of hinted at in the beginning, and we talked about this a little bit. We did an episode on petroleum engineering. But I'm really curious about the distribution of engineering talent. In my mind, I would have to imagine that a talented engineer would be more excited to work for Stripe than Google, and more excited to work for Google than Deloitte, and more excited to work for Deloitte than Southwest. And I'm curious, one, if that's the case, and whether this is a problem, and whether it would be better if there were just more high-tech experts, software experts, who would want to work for a Southwest directly or work for a bank directly. Does this sort of distribution of engineering talent contribute to bad software overall?
So I have nuanced thoughts on engineering talent. On one hand, that is certainly a thing that exists in the world, and talent is not necessarily distributed equally among different industries, individuals, etcetera. On the other hand, I think that Silicon Valley, an ecosystem to which I owe a lot, occasionally has an overly enamored view of itself, thinking that, oh yes, we have the best engineers in the world, and all the engineers who made every other system that we rely on in our lives are sort of second-rate. And clearly that's not true. The phone system works. Airplanes, what's the old Louis C.K. routine? Like, it flies through the sky, and then it beams up to space. None of that happened by accident. And so it's important not to focus on the idea that engineers working in traditional industry are worse engineers than engineers who work at software companies. The larger problem they have is, one, they don't drive the bus. They have less ability to control the situations within their organizations and to influence larger decisions, like what the maintenance schedule would look like, or who gets to make decisions with respect to whether something ships or not. And there is a bit of a life-cycle-of-engineers thing that happens, where the original architects of the system built it forty years ago. Careers are about as long as they are.
Many of the original architects of the system will be aging into the retirement years at this point, and so there is a question: did the organization put in the work over the years to recruit newer engineers and inculcate them into how the system is made, etcetera, etcetera? Or did they just allow all of the knowledge to walk out the door? And a thing I think comes up a lot is, do they put in enough work on a day-to-day basis to maintain an engineering brand, so that they can get talented new engineers to join them in twenty twenty-three, such that those engineers become, the word used is often graybeards, but you know, the wise veterans thirty years from now? So that when something happens, there are people who have been around the block and know where the skeletons are buried in the system. Interestingly, traditional industry is getting better over the years at having an engineering brand, moving away from the world where engineering was largely seen as a cost center, where the goal is just to cram down the amount of money you spend on it and improve your margins. I think there are a few things that played into that. One of them was that, particularly in the Internet age, it became obvious... You know, in finance, you talk about the front office and the back office. Engineering used to be in the back office, and the front office, where the salespeople live, is the one that generates all the money for the bank. And increasingly, because experiences in the palm of the user's hand were the things actually generating the sales, those experiences became institutionally important within banks and airlines and other firms. And once those experiences became important, it took a while, but gradually the people and teams that build those experiences became more institutionally important than they had previously been.
And so if you look at the large money center banks in the US, they certainly have no small number of technical challenges. But the thing that is largely true now, which was not true before, is that their mobile apps are actually kind of good these days. If you download, you know, not to endorse anybody in particular, but Chase or Capital One, when you play with their mobile apps, it's like, oh, this kind of feels like a mobile app made in Silicon Valley. And the reason is, well, yeah, they hired a lot of the people who made the apps in Silicon Valley, and those folks brought their skills, their level of competence, and their taste, and now exercise them on behalf of old-world companies. One hopes, knock on wood, that that seeps into the back end of these systems too. Apple, Google, Microsoft: they have very talented teams on the front end of their systems, but they also have very talented teams on the back end of their systems. And that's, bluntly, why you don't see core systems at Google going down for a week at a time; that is almost unimaginable. You know how much of capitalism would break if Google Docs was just down for a week. There are many parts of the world where there is a liveness constraint: all else being equal, if it is safe to fly airplanes, we would prefer that there be airplanes flying versus all the airplanes being on the ground, because airplanes generate value for human society. If that is true, then it must be the case that the back-end systems of airlines that control whether airplanes are allowed to fly at a given time are at least as important as Google Docs is, which implies that the airlines need to put in at least as much work as Google does, roughly, into having true mastery of their own back-end systems and the various problems that could happen there.
So just on this note, there's another travel disruption that happened recently that we haven't even mentioned, which was the FAA experiencing some sort of computer event and grounding, I think, all the domestic departures for one morning. And it was recently reported that the proximate cause of that was that some software engineers were trying to upgrade the system and accidentally deleted a bunch of critical files as they were doing it. Can you talk a little bit more about the technical challenges when it comes to trying to fix some of these legacy systems? Why exactly is it so difficult? You sort of talked about it from an organizational perspective, but from a technical perspective, why does this seem to be such a big challenge?

Sure. So one thing is that that symptom, where the underlying cause is that a problem happened during an upgrade, is extremely well understood in the software engineering field. It's called out in, among other places, Google's book Site Reliability Engineering; site reliability engineers are essentially the subcategory of engineers that Google relies on to keep the world running. And it is often the case that the people who are attempting to make an incremental change do not have full context on how the system came to be in the state that it is currently in, and a change that they thought would have a limited area of impact ends up having a larger area of impact. The term of art we use in the industry is blast radius. You hope to quantify the blast radius of something that goes wrong, such that, you know, if I make a mistake, am I going to bring down our blog? Or am I going to bring down credit card processing worldwide?
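One way to picture blast radius is as a walk over a dependency graph: if a component fails, everything that depends on it, directly or indirectly, falls inside the radius. The sketch below assumes a tiny, made-up service map (the service names, `DEPENDS_ON`, and the `blast_radius` helper are all hypothetical) and runs a breadth-first search over the reversed dependencies.

```python
from collections import deque

# Hypothetical service map: each service lists the services it depends on.
DEPENDS_ON: dict[str, list[str]] = {
    "blog": ["web_frontend"],
    "web_frontend": ["auth"],
    "card_processing": ["auth", "ledger"],
    "ledger": ["database"],
    "auth": ["database"],
    "database": [],
}


def blast_radius(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the graph so we can walk from a component to the services that use it.
    used_by: dict[str, list[str]] = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            used_by.setdefault(dep, []).append(service)

    # Breadth-first search outward from the failed component.
    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in used_by.get(current, []):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    print(sorted(blast_radius("blog", DEPENDS_ON)))
    print(sorted(blast_radius("database", DEPENDS_ON)))
```

In this toy map, a mistake in `blog` touches nothing else, while a mistake in `database` reaches every other service, including `card_processing`. Knowing which of those two cases a change falls into is much of what the undocumented systems described here make hard.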
And you'd be much more careful if the blast radius includes worldwide credit card processing. Plausibly, you should engineer systems such that there's no way to take down worldwide credit card processing, but that's actually harder to do than it sounds. So anyhow, how does one get to the point where it is difficult to understand what the implications of the changes you make truly are? These are meat-and-potatoes questions, and the meat-and-potatoes answers are often things like: was the system adequately documented when it was made? Frequently the answer is no, and a lot of information about how systems are put together survives as oral lore within the engineering teams at various companies. That's an uncomfortable bit of information to hold in your head when you start talking about the life cycle of engineers and the fact that the original architects of many of these systems are literally no longer with us, either because they have retired or because they might be beyond our ability to call up out of retirement at this point. And so you have to write down what you do. That concept was not new to governments and bureaucracies as a result of software engineering happening in the last seventy years; it's sort of fundamental to the operation of large organizations. Software is just how people choose to do work with each other, but it is a lesson that we keep relearning. There is often the issue where, because software is how people in an organization choose to work with each other, software will interface various systems together, and problems will happen at the boundaries between systems, either between literal computer systems or between organizations. So the thing that you will see frequently is: my software didn't fail. Your software didn't fail.
We mutually failed together at the point where we were supposed to transfer information, and then both sides end up pointing their fingers at their counterparty. And so part of the discipline of software engineering is, one, creating a culture where you don't point fingers at the counterparty, and two, creating structures and incentives such that complex systems that involve multiple different parties, with multiple different engineering teams who might not report to the same managerial department, will converge on correct outcomes. And there are a variety of ways to do that in the industry; some of them are better than others. So, without naming the particular company, there exists a credit card system which is extremely... So, credit card systems as a baseline are extremely reliable. You probably don't remember the last time that you were unable to use a credit card for a week, because literally that does not happen. So A-plus-plus for achieving that outcome. What one credit card company does to achieve that is they are willing to make changes to their system precisely twice a year, after six months of testing every change that they make. That's an incredible amount of upfront work to do relatively small amounts of engineering, and so the pace at which that ecosystem evolves is much, much slower than at more software-forward companies like Google, Apple, Amazon, etcetera, etcetera, where they're shipping thousands of changes to their systems every day.
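The gap between shipping twice a year and shipping continuously can be put in rough numbers. This sketch assumes, purely for illustration, a team that finishes ten changes per day at a steady rate, and computes how many of those changes get bundled into each release under different cadences; the rate and the `changes_per_release` helper are hypothetical.

```python
DAYS_PER_YEAR = 365


def changes_per_release(changes_per_day: float, releases_per_year: float) -> float:
    """Average number of finished changes bundled into each release (hypothetical model)."""
    if releases_per_year <= 0:
        raise ValueError("releases_per_year must be positive")
    return changes_per_day * DAYS_PER_YEAR / releases_per_year


if __name__ == "__main__":
    # Assumed steady rate of ten finished changes per day.
    cadences = {"twice a year": 2, "quarterly": 4, "biweekly": 26, "daily": 365}
    for name, per_year in cadences.items():
        print(f"{name}: about {changes_per_release(10, per_year):.0f} changes per release")
```

Under that assumption, a twice-yearly release carries roughly 1,825 changes at once, while a daily release carries about 10, which gives some sense of why moving a company from one end of that range toward the other is a gradual process.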
So one of the interesting bits about remixing the skills and techniques of Silicon Valley is attempting to get people who are comfortable with the engineering practices that allow you to ship software thousands of times a day into positions of authority at old-line companies, such that they can gradually transition from the point that they're at, where they might be able to ship software once or twice a year, maybe quarterly, to the point where they will be shipping software, let's start with biweekly, and move up from there.

So we've been talking a lot about failures, or sort of collapses. But the other thing that I associate with large business software is just that the user experience is not as good. And I believe the UX was part of the story with the infamous Citi error, where they transmitted nine hundred million dollars they shouldn't have to some counterparties, and I think some of the users were confused by the internal software about whether they were actually sending that or not. I once heard a story from someone who worked at the VC arm of a major bank about the hoops that they have to go through just to share documents with each other, like PowerPoints, because they get flagged, and I assume there's some sort of internal regulatory issue. Or booking travel: going to booking.com is really easy. When Tracy and I book travel for work here, it's not that bad, but the usability of the software internally is not as smooth and snappy as consumer travel sites. Why is that? Why is internal business software just not as easy to use and visually appealing as the consumer Internet?

So this is getting better over time, and we'll talk about that in a moment, but broadly, your observation is entirely accurate.
If you were the software czar for the entire world, you might rank applications by their importance to the world and say, okay, an application on someone's phone that allows someone to share cat photos is important in some sense, but probably not as important as sending a billion dollars out of a bank. And so we would have a lot more talent and time spent on the question of, can you send a billion dollars out of a bank, versus, does a thirteen-year-old have an awesome experience when sending a cat photo? In actual fact, though, much, much more time and talent is spent on the cat photo question than is spent on the wiring-billions-of-dollars-out-of-banks question. That was a choice. It sounds silly to say maybe we should stop choosing stupid things, but the true answer is that business software gets better when the people and organizations that cause that software to be built decide that the quality of that software is very relevant to their interests. And so one way to come to that realization is to lose a billion dollars, and then, hopefully, the next time your engineering management says we should spend a little more on maintenance, you will say, yes, I agree with you, we should spend a little more on maintenance, versus taking another billion-dollar charge at a time not of our choosing. Part of it is just the culture of quality coming back to these things. Part of it is also teaching the users inside of organizations that software doesn't have to be terrible, because most software in the world exists inside of companies and runs business processes. This is something that is not broadly known, but of all the lines of software in the world, most exist inside of companies. And for a very long time, most software people interacted with was at their employer, and it was generally kind of terrible.
People just had an image that software is generally kind of terrible. And then the iPhone came around. Everyone has a powerful computer in their hand. For X number of hours a day, you use applications where you tap three times and, you know, interesting things happen in the world as a result of you tapping three times, and you broadly like the experience. And then you go back to work and say, "Wait, I've used software that doesn't suck. All the stuff I use at work sucks. Hey, IT department, hey, senior management, can you please make our expense tracking software not be terrible?" And that is starting to happen, both as a result of internal software producers advocating for change, as a result of that user feedback within companies, and also as a result of various startups saying, not to throw a particular expense solution under the bus, that the thing most old-line-economy companies probably use is not a thing that people love using to book their travel. If you use TripActions or something else designed by a modern team with modern UX affordances, it's a much nicer solution for the end user. And in some companies, end users are starting to have some ability to advocate for what software gets adopted, whereas previously that decision was made by processes that were not user-centric, where whichever team was better at wining and dining the person in charge of the purchasing decision won, rather than whichever won on the basis of product quality. In the last couple of years, even some enterprise software is starting to win largely on the basis of product quality versus the more traditional sales motion, although goodness knows the traditional sales motion is still very important to enterprise software companies.

So I have a slightly weird question, but I'm thinking a lot about it as we have this conversation.
But it feels to me like software engineering and computer programming are always in flux. If your job is software engineer, it feels like there's always something to do. You're always trying to fix a problem or adapt a system. And I guess my question is why? I fully admit my own programming experience is confined to HTML, which I learned from that website HTML Goodies in, like... But back then, you know, you programmed your website, you designed it in HTML, you released it into the wild, and you were kind of done. And yet it seems with these large-scale systems that there's always change. Something is always in motion. Something is always in flux. Why is that?

So let me push back a tiny bit on this here. How many years have we had lawyers available? And does anyone ever go up to the lawyers and say, "Come on, guys, it's twenty twenty-three, haven't you figured out all the laws?"

Yeah.

And so why does the law change on a week-to-week basis? Well, it doesn't change per se. It's just that the world is complicated. The number of commercial relationships between organizations is increasing all the time. We have increasing demands on what those relationships will do. And the job of lawyers is to adapt to that increasingly complex world every week and continue delivering the law that society needs, and the outcomes that come from competently executing on the ability of organizations to collaborate internally with their employees and with other organizations. What's my answer for software engineering? Well, software engineers are, you know, working this week in an increasingly complex world where software has more leverage than it had even last week, where there are increasing demands on the world, etcetera, etcetera, etcetera. Is there going to be a time when the last line of software is written? Probably not. There will never be a last bit of software written.
There will never be a last contract written, there will never be a last book written, because humans want more things out of the world than we have. We kind of have infinite capacity for want at the margin.

That was a compelling answer. Can you talk a little bit about... you know, we've been talking about banks, airlines, startups. Can you tell us what the challenges are for the public sector? I remember Obama had a thing about wanting to bring government websites, or government tech, into the modern age. It just seems like whatever problems exist for big companies are even worse or more tricky when you're dealing with the public sector.

Because I think at one point New Jersey was explicitly begging the internet for COBOL programmers, wasn't it? In twenty twenty, I think that was a thing that happened.

Yeah. So, full disclosure here: I ran a, uh, nonprofit organization last year where a few of us in the tech industry banded together to work on the vaccine location information infrastructure for the United States, because the public sector was having a great deal of difficulty creating websites that would track where the vaccine was and route vaccine seekers to it. So I have lots of thoughts here. So many issues. Again, we did not wake up in twenty twenty-three with these issues magically. They're a result of decisions that we've collectively made as a society over many years. One decision that we've made in the United States in particular is that government pay scales are what they are. If you compare those government pay scales to what private industry pays technologists, they are sharply out of whack. If you look at GS-whatever, the highest-paid public sector employees in the United States make less than Google interns. That sets the equilibrium: if you can get hired by Google, that really does matter in this realm, right?
And one of the things the government has been attempting to do over the years is to create groups of people who are officially government employees, but who, unofficially, I think, are sort of doing active service to the nation, in places like the U.S. Digital Service, etcetera, etcetera. They already made their money in tech, they're now on GS-whatever making a fraction of what they previously made, but they are contributing software expertise to these various problems, where what the government needs is some competent software written, and that requires having competent software people available in quantity. Another issue governments have is: what is the true goal you are solving for? Without getting too political about it, in some parts of the government an organization might exist largely as a jobs program, and IT modernization might sharply decrease the effectiveness of that organization at employing a large number of people to repeatedly do a process that a machine could do faster. And so sometimes the powers that be within an organization say, "Well, you know, I don't necessarily consider IT modernization one of my top priorities at the moment, because that would cause me to break faith with a number of people that I employ." Sometimes it's also my own career trajectory as a bureaucrat, and this is true within private industry as well. There's a bit of empire building involved, where you want the number of people you manage and your budget to go up every year, and you don't want to say, "Okay, I've solved my problems, so I can deal with five percent as much budget next year, thank you." That is incentive-incompatible. It's not great from the perspective of the parts of society which aren't employed by government but which nonetheless depend on government for, you know, providing goods and services.
And so this is ultimately a thing that we have to resolve through the political system, by pushing back a little bit and saying, "Hey, you kind of have to be good at what you do, and these days that involves making software that is also good at what you do." I have no magic bullet for how to make that a, you know, stunning rallying cry for political parties, but it's probably something that needs to get said in a lot of places for enough decades until the message sinks in.

Patrick, this has been an amazing conversation, and I already want to have you back. Almost each one of your answers could be its own full conversation. I have one last question: how does bit rot happen? I mean, even I, you know, go away on vacation for two weeks, come back to my office computer, and things are weird and sort of janky, they don't quite work the same. What is that process? Because you would think it's just words in a database. Like, what did rot? So what's actually going on there?

So the sardonic but true answer, which you have to think of at scale, is that bits in a computer can literally be flipped by gamma rays coming from outer space that interact with the physical manifestation of your memory in the computer. And that's one cause of this. That's true, that does happen, but it isn't the dominant thing that happens. The dominant thing is that there is change in the broader system that must happen on any given basis. Change is a sort of risk. It is not always managed well. Um, this gets back to the fact that a commanding majority of systemic downtime at well-managed software companies is caused by attempts to upgrade the system that go less than optimally. That is a thing that is amenable to study. Bit rot happens in some cases because you had a constellation of software, etcetera,
installed on your machine and on other machines that your machine connected to, and it was working. You might not say it was working exactly perfectly, because nothing in software is perfect, but it was working. Then something about the constellation changed as a result of a decision made about a machine that is not directly under your control, and that decision must be made at scale in the economy, because software can't be allowed to be static if it is to deliver the things we as a society want from software. And then that change caused some other part of the system to behave in a less great manner, and eventually, you know, you see the ripple effects of it in your daily life. That's the dominant way bit rot happens. It is not the bits actually getting corrupted over time, though again, that is a thing that does happen, and we have things in engineering to control against it.

Well, Patrick, you are the perfect guest for this topic. Really appreciate you coming on Odd Lots.

I really appreciate you having me and would be glad to be back sometime.

Definitely. Thanks so much, Patrick, that was great. I learned so many excellent new terms, like bit rot, blast radius, heroics. "We mutually failed together." That one will come in handy, Joe. Everything we do wrong: mutual.

No, I love that. Patrick, thank you so much.

Thanks very much.

Tracy, I think Patrick was really the perfect guest for that topic. We needed to do this for a while, and I'm glad we did it with Patrick.

Yeah. Well, I also feel like this is something that's going to keep coming up, and so we might have more opportunities with Patrick to do, uh, interesting post-mortems on various tech failures.

Yeah, absolutely. I mean, there are so many interesting things.
I really liked some of the questions you asked about ownership of software, and that really clicked for me. If you're a software company and software is the main product, and I'm thinking about the manufacturing analogy here, like a Taiwan Semiconductor, the institutional knowledge to build something exists within the firm and it just gets handed down, and handed down. When software isn't your main product, like if you're a Southwest, if you're a Citigroup, etcetera, you can sort of see why that process of internal knowledge that works in manufacturing doesn't carry over: you don't get that ongoing feedback, that distribution of knowledge, in some of these large organizations.

Yeah, absolutely. And I think Patrick used the expression "dropping the ball in the middle of both of us." It does seem like that system kind of produces opportunities for, I'm trying to think how to phrase this, for no one to take total responsibility for a system failure like that, right? Because on the one hand, someone designed the software, but on the other hand, maybe it was customized by someone else. Maybe you have two different systems talking to each other and both of them mess up in some way, or there's some sort of misunderstanding. It just seems like there's such a gray area, and maybe this is one of the reasons it's so difficult to fix, because you have all these different things operating together.

Well, even his last answer about how bit rot happens, right? Somewhere among all these interconnected computers, someone has to make a change, because, as he pointed out in his answer to your question about why software is never done, or why it's never a solved problem,
we're always demanding more. So there will never be a time when someone has the luxury of not making a change, when everyone says, "We figured out technology, it's solved."

Well, maybe that's what the AI, what's it, you know, the singularity, GPT...

It could be. But, you know, that makes a lot of sense. Someone has to make a change because that's just how the world works. And then all these other interconnected systems, maybe they're fine with the change, but something happens and eventually they have to change too, and so it's just this constant state of flux.

Can I tell you my one internalized programming lesson? So, um, I mentioned HTML, and then when I was in high school, as part of our computer science class, we had to learn JavaScript and create a program. So I wrote this program. Keep in mind that this was the year two thousand or something like that. It was a digital fortune cookie: you could click on it and it would give you a fortune. And then at the end of this module, we had to sign a contract signing our program over to our computer teacher.

He took ownership?

No, it was an extremely valuable lesson, which is that all the coding you're doing will ultimately belong to someone else, and they'll be able to monetize it. That's the downside: they'll monetize it for you, and maybe you won't get as much. But the upside, I guess, is that you don't have to take responsibility for it. Once you write it, it goes out into the world. The computer professor owns it and can do whatever he wants with it.

I love that that was actually the lesson. Also, I feel like in another universe you could have sold that startup to Facebook for a hundred million dollars when it went super viral. People would ask, "How did this person make their fortune?"
"They made a digital fortune cookie." But I also loved his answer about the public sector, because there you really do have this problem with salary disparities, and it is pretty crazy that the way we're solving this problem in this country is kind of by getting people to volunteer. Going to work for the government in IT and tech is what you do after you're rich and you want to give something back: "Okay, I'm going to go work for the federal government and try to help them update their systems." It's great that people want to do that, and I love that, but it does not seem like a great sustainable solution for having a modern government that can communicate with people and provide services in the way they expect.

No, absolutely, and it's something that we see again and again in various ways. Shall we leave it there?

Let's leave it there.

Okay. This has been another episode of the Odd Lots podcast. I'm Tracy Alloway. You can follow me on Twitter at Tracy Alloway.

And I'm Joe Weisenthal. You can follow me on Twitter at The Stalwart. Follow our guest Patrick McKenzie. He's on Twitter at patio eleven, and check out his Bits About Money newsletter. Follow our producers Carmen Rodriguez at Carmen Armen and Dash Bennett at Dashbot. And check out all of the podcasts here at Bloomberg under the handle at podcasts. For more Odd Lots content, go to Bloomberg dot com slash Odd Lots, where we post transcripts of the episodes, Tracy and I blog, and we have a weekly newsletter that comes out every Friday. Go there, sign up, and get it in your inbox. Thanks for listening.