What Algorithms Say About You

Published Nov 2, 2020, 10:00 AM

Artificial intelligence is letting us make predictive algorithms that translate languages and spot diseases as well as or better than humans. But these systems are also being used to make decisions about hiring and criminal sentencing. Do computers trained on vast datasets of human experience learn human biases, like sexism and racism? Is it possible to create an algorithm that is fair for everyone? And should you have the right to know when these algorithms are being used and how they work?


For links to materials referenced in the episode, suggestions for further learning, and guest bios, visit bravenewplanet.org


Pushkin. You're listening to Brave New Planet, a podcast about amazing new technologies that could dramatically improve our world, or, if we don't make wise choices, could leave us a lot worse off. Utopia or dystopia, it's up to us.

On November eleventh, twenty sixteen, the Babelfish burst from fiction into reality. The Babelfish was conceived forty years ago in Douglas Adams's science fiction classic The Hitchhiker's Guide to the Galaxy. In the story, a hapless Earthling finds himself a stowaway on a Vogon spaceship. When the alien captain starts an announcement over the loudspeaker, his companion tells him to stick a small yellow fish in his ear. Listen, it's important. It's a... I can't just put this in your ear. Suddenly he's able to understand the language. The Babelfish is small, yellow, leech-like, and probably the oddest thing in the universe. It feeds on brainwave energy, absorbing all unconscious frequencies, the practical upshot of which is that if you stick one in your ear, you instantly understand anything said to you in any form of language. At the time, the idea of sticking an instantaneous universal translator in your ear seemed charmingly absurd. But a couple of years ago, Google and other companies announced plans to start selling Babelfish, well, not fish actually, but earbuds that do the same thing. The key breakthrough came in November twenty sixteen, when Google replaced the technology behind its Translate program. Overnight, the Internet realized that something extraordinary had happened. A Japanese computer scientist ran a quick test. He dashed off his own Japanese translation of the opening lines of Ernest Hemingway's short story The Snows of Kilimanjaro, and dared Google Translate to turn it back into English. Here's the opening passage from the Simon and Schuster audiobook. Kilimanjaro is a snow-covered mountain nineteen thousand, seven hundred and ten feet high, and is said to be the highest mountain in Africa. Its western summit is called the Masai Ngaje Ngai, the House of God. Close to the western summit there is the dried and frozen carcass of a leopard. No one has explained what the leopard was seeking at that altitude. Let's just consider that last sentence: no one has explained what the leopard was seeking at that altitude. One day earlier, Google had mangled the back translation, quote: Whether the leopard had what the demand at that altitude? There is no that nobody explained. But now Google Translate returned, quote: No one has ever explained what leopard wanted at that altitude. It was perfect except for a missing "the." What explained the great leap? Well, Google had built a predictive algorithm that taught itself how to translate between English and Japanese by training on a vast library of examples and tweaking its connections to get better and better at predicting the right answer. In many ways, the algorithm was a black box. No one understood precisely how it worked, but it did amazingly well. Predictive algorithms turn out to be remarkably general. They can be applied to predict which movies a Netflix user will want to see next, or whether an eye exam or a mammogram indicates disease. But it doesn't stop there. Predictive algorithms are also being trained to make societal decisions: who to hire for a job, whether to approve a mortgage application, which students to let into a college, which arrestees to let out on bail. But what exactly are these big black boxes learning from massive data sets? Are they gaining deep new insights about people?
Or might they sometimes be automating systemic biases? Today's big question: when should predictive algorithms be allowed to make big decisions about people? And before they judge us, should we have the right to know what's inside the black box?

My name is Eric Lander. I'm a scientist who works on ways to improve human health. I helped lead the Human Genome Project, and today I lead the Broad Institute of MIT and Harvard. In the twenty-first century, powerful technologies have been appearing at a breathtaking pace, related to the Internet, artificial intelligence, genetic engineering, and more. They have amazing potential upsides, but we can't ignore the risks that come with them. The decisions aren't just up to scientists or politicians. Whether we like it or not, we, all of us, are the stewards of a brave new planet. This generation's choices will shape the future as never before.

Coming up on today's episode of Brave New Planet: predictive algorithms. We hear from a physician at Google about how this technology might help keep millions of people with diabetes from going blind. And the idea was, well, if you could retrain the model, you could get to more patients to screen them for disease. The first iteration of the model was on par with US board-certified ophthalmologists. I speak with an AI researcher about how predictive algorithms sometimes learn to be sexist and racist. If you typed in I am a white man, you would get positive sentiment. If you typed in I am a black lesbian, for example, negative sentiment. We hear how algorithms are affecting the criminal justice system. For black defendants, it was much more likely to incorrectly predict that they were going to go on to commit a future crime when they didn't, and for white defendants it was much more likely to predict that they were going to go on to not commit a future crime when they did. And we hear from a policy expert about whether these systems should be regulated. A lot of the horror stories are about fully implemented tools that were in the works for years. There's never a pause button to reevaluate or look at how a system is working in real time. Stay with us.

Chapter one, The Big Black Box. To better understand these algorithms, I decided to speak with one of the creators of the technology that transformed Google Translate. My name is Greg Corrado, and I'm a distinguished scientist at Google Research. Early in his career, Greg had trained in neuroscience, but he soon shifted his focus from organic intelligence to artificial. And that turned out to be really a very lucky moment, because I was becoming interested in artificial intelligence at exactly the moment that artificial intelligence was changing so much. Ever since the field of artificial intelligence started more than sixty years ago, there have been two warring approaches about how to teach machines to do human tasks. We might call them human rules versus machine learning. The way that we used to try to get computers to recognize patterns was to program into them specific rules. So we would say, oh, well, you can tell the difference between a cat and a dog by how long its whiskers are and what kind of fur it has and does it have stripes, and trying to put these rules into computers. It kind of worked, but it made for a lot of mistakes. The other approach was machine learning: let the computer figure everything out for itself, somewhat like the biological brain. The machine learning system is actually built of tiny little decision makers, or neurons.
They start out connected very much in random ways, but we give the system feedback. So, for example, if it's guessing between a cat and a dog and it gets one wrong, we tell the system that it got one wrong, and we make little changes inside so that it's much more likely to recognize that cat as a cat and not mistake it for a dog. Over time, the system gets better and better and better. Machine learning had been around for decades with rather unimpressive results. The number of connections and neurons in those early systems was pretty small. We didn't realize until about two thousand and ten that computers had gotten fast enough and the data sets were big enough that these systems could actually learn from patterns and learn from data better than we could describe rules manually. Machine learning made huge leaps. Google itself became the leading driver of machine learning. In twenty eleven, Corrado joined with two colleagues to form a unit called Google Brain. Among other things, they applied a machine learning approach to language translation. The strategy turned out to be remarkably effective. It doesn't learn French the way you would learn French in high school. It learns French the way you would learn French at home, much more like the way that a child learns the language. We give the machine the English sentence, and then we give it an example of a French translation of that whole sentence. We show a whole lot of them, probably more French and English sentences than you could read in your whole life. And by seeing so many examples of entire sentences, the system is able to learn, oh, this is how I would say this in French. That's actually, at this point, about as good as a bilingual human would produce. Soon Google was training predictive algorithms for all sorts of purposes. We use neural network predictors to help rank search results, to help people organize their photos, to recognize speech, to find driving directions, to help complete emails. Really, anything that you can think of where there's some notion of finding a pattern or making a prediction, artificial intelligence might be at play. Predictive algorithms would become ubiquitous in commerce. They let Netflix know which movies to recommend to each customer, Amazon to suggest products users might be interested in purchasing, and much more. While they're shockingly useful, they can also be inscrutable. Modern neural networks are like a black box. Understanding how they make their predictions can be surprisingly difficult. When you build an artificial neural network, you do not necessarily understand exactly the final state of how it works. Figuring out how it works becomes its own science project. One thing we do know: predictive algorithms are especially sensitive to the choice of examples used to train them. The systems learn to imitate the examples in the data that they see. You don't know how well they will do on things that are very different. So, for example, if you train a system to recognize cats and dogs, but you only ever show it border collies and tabby cats, it's not clear what it will do when you show it a picture of a chihuahua. All it's ever seen is border collies; it may not get the right answer. So its concept of dog is going to be limited by the dogs it's seen. That's right, and this is why diversity of data in machine learning systems is so important. You have to have a data set that represents the entire spectrum of possibilities that you expect the system to work under.
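For readers who want to see the mechanics Corrado describes, here is a minimal sketch in Python of a feedback-trained classifier: the connections start out essentially random, and every wrong guess nudges them slightly. The cat-and-dog features, the tiny data set, and the learning rate are invented for illustration; real systems like Google's have millions of parameters and examples.

import numpy as np

rng = np.random.default_rng(0)

# Each example: [whisker_length_cm, body_mass_kg]; label 1 = dog, 0 = cat.
X = np.array([[3.0, 20.0], [2.5, 30.0], [6.0, 4.0],
              [7.0, 5.0], [3.5, 25.0], [6.5, 3.5]])
y = np.array([1, 1, 0, 0, 1, 0])

w = rng.normal(size=2)   # the "connections" start out random
b = 0.0
lr = 0.01                # how big each little change inside is

def predict(x, w, b):
    # probability the example is a dog
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

for epoch in range(2000):
    for xi, yi in zip(X, y):
        p = predict(xi, w, b)
        error = p - yi        # feedback: how wrong was the guess?
        w -= lr * error * xi  # adjust the connections slightly
        b -= lr * error

# After training, the predictions should be close to the labels.
print([round(float(predict(xi, w, b)), 2) for xi in X])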
Teaching algorithms turns out to be not so different from teaching people. They learn what they see.

Chapter two, Retinal Fundoscopy. It's cool that predictive algorithms can learn to translate languages and suggest movies, but what about more life-changing applications? My name is Lily Peng. I am a physician by training, and I am a product manager at Google. I went to visit Doctor Peng because she and her colleagues are using predictive algorithms to help millions of people avoid going blind. So, diabetic retinopathy is a complication of diabetes that affects the back of the eye, the retina. One of the devastating complications is vision loss. All patients that have diabetes need to be screened once a year for diabetic retinopathy. This is an asymptomatic disease, which means that you do not feel the symptoms. You don't experience vision loss until it's too late. Now, diabetes is epidemic around the world. How many diabetics are there? So by most estimates, there are over four hundred million patients in the world with diabetes. How do you screen a patient to see whether they have diabetic retinopathy? You need to have a special camera called a fundus camera, and it takes a picture through the pupil of the back of the eye. We have a very small supply of retina specialists and eye doctors, and they do a lot more than reading images, so they needed to scale the reading of these images. Four hundred million people with diabetes. There just aren't enough specialists for all the retinal images that need reading, especially in some countries in Asia where resources are limited and the incidence of diabetes is skyrocketing. Two hospitals in southern India recognized the problem and reached out to Google for help. At that point, Google was already sort of well known for image recognition. We were classifying cats and dogs in consumer images, and the idea was, well, if you could retrain the model to recognize diabetic retinopathy, you could potentially help the hospitals in India get to more patients to screen them for disease. How did you and your colleagues set out to attack this problem? So when I first started the project, we had about one hundred thirty thousand images from eye hospitals in India as well as a screening program in the US. Also, we gathered an army of ophthalmologists to grade them. Eight hundred eighty thousand diagnoses were rendered on one hundred thirty thousand images. So we took this training data and we put it in a machine learning model. And how did it do? The first iteration of the model was on par with US board-certified ophthalmologists. Since then, we've made some improvements to the model. And the initial training took about how long? The first time we train a model, it may have taken a couple of weeks. But then the second time you train the next models and next models, it's just shorter and shorter, sometimes overnight. Sometimes overnight. Well, yes, all right. And by contrast, how long does it take to train a board-certified ophthalmologist? So that usually takes at least five years, and then you also have additional fellowship years to specialize in the retina. And at the end of that, you only have one board-certified ophthalmologist. Yes, at the end of that you'd have one very, very well-trained doctor, but that doesn't scale. Yes. So by contrast, a model like this scales worldwide and never fatigues. It consistently gives the same diagnosis on the same image, and it obviously takes a much shorter time to train.
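As a concrete illustration of the "retrain the model" idea, here is a rough Python sketch of transfer learning: take a network pretrained on everyday photos and fit a new classification head that grades fundus images. The directory path, image size, and five-grade labeling scheme are assumptions for illustration; this is not Google's actual pipeline, which is described in the team's published paper (Gulshan et al., JAMA, 2016).

import tensorflow as tf

IMG_SIZE = (299, 299)
NUM_GRADES = 5  # e.g., no DR, mild, moderate, severe, proliferative

base = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=IMG_SIZE + (3,))
base.trainable = False  # start by training only the new head

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # InceptionV3 expects [-1, 1]
    base,
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(NUM_GRADES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Hypothetical folder of graded fundus photos, one subfolder per grade.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "fundus_images/train", image_size=IMG_SIZE, batch_size=32)
model.fit(train_ds, epochs=5)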
That being said, it does a very, very narrow task that is just a very small portion of what that doctor can do. The retinal screening tool is already being used in India. It was recently approved in Europe, and it's under review in the United States. Groups around the world are now working on other challenges in medical imaging, like detecting breast cancers at earlier stages. But I was particularly struck by a surprising discovery by Lily's team: that unexpected information about patients was hiding in their retinal pictures. In the fundus image, there are blood vessels, and so one of the thoughts that we had was, because you can see these vessels, I wonder if we can predict cardiovascular disease from the same image. So we did an experiment where we took fundus images and we trained a model to predict whether or not that patient would have a heart attack in five years. We found that we could tell whether or not this patient may have a vascular event much better than doctors. It speaks to what might be in this data that we've overlooked. The model could make predictions that doctors couldn't from the same type of data. It turned out the computer could also do a reasonable job of predicting a patient's sex, age, and smoking status. The first time I did this with an ophthalmologist, I think she thought I was trolling her. I said, well, here are pictures. Guess which one is a woman, guess which one is a man. Guess which one's a smoker, guess which one is young. Right, these are all tasks that doctors don't generally do with these images. It turns out the model was right ninety eight, ninety nine percent of the time. That being said, there are much easier ways of getting the sex of a patient. So while scientifically interesting, this is one of the most useless clinical predictions ever. So how far can it go? If you gave it preference for rock music or not, what do you think? You know, we tried predicting happiness. That didn't work. So I'm guessing rock music? Oh, probably not, but who knows. So, predictive algorithms can learn a remarkable range of tasks, and they can even discover hidden patterns that humans miss. We just have to give them enough training data to learn from. Sounds pretty fantastic. What could possibly go wrong?

Chapter three, What Could Possibly Go Wrong? If predictive algorithms can use massive data to discover unexpected connections between your eye and your heart, what might they be learning about, say, human society? To answer this question, I took a trip to speak with Kate Crawford, the co-founder and co-director of the AI Now Institute at New York University. When we began, we were the world's first AI institute dedicated to studying the social implications of these tools. To me, these are the biggest challenges that we face right now, simply because we've spent decades looking at these questions from a technical lens at the expense of looking at them through a social and an ethical lens. I knew about Kate's work because we served together on a working group about artificial intelligence for the US National Institutes of Health. I also knew she had an interesting background. I grew up in Australia. I studied a really strange grab bag of disciplines. I studied law, I studied philosophy, and then I got really interested in computer science, and this was happening at the same time as I was writing electronic music on large-scale modular synthesizers, and that's still a thing that I do today.
It's almost like the opposite of artificial intelligence because it's so analog, so I absolutely love it for that reason. In the year two thousand, Kate's band released an album entitled twenty twenty that included a prescient song called Machines Work So That People Have Time to Think. It's funny because we use a sample from an early IBM promotional film that was made in the nineteen sixties, which says machines can do the work so that people have time to think, and we actually ended up sort of cutting it and splicing it in the track, so it ends up saying that people can do the work so that machines have time to think. And strangely, the more that I've been working in the sort of machine learning space, I think, yeah, there's a lot of ways in which actually people are doing the work so that machines can do all the thinking. Kate gave me a crash course on how predictive algorithms not only teach themselves language skills, but also in the process acquire human prejudices, even in something as seemingly benign as language translation. So in many cases, if you say, translate a sentence like she is a doctor into a language like Turkish, and then you translate it back into English. And you're saying Turkish because Turkish has pronouns that are not gendered. Precisely. And so you would expect that you would get the same sentence back, but you do not. It will say he is a doctor. So she is a doctor was translated into gender-neutral Turkish as o bir doktor, which was then back-translated into English as he is a doctor. In fact, you could see how much the predictive algorithms had learned about gender roles just by giving Google Translate a bunch of gender-neutral sentences in Turkish. You got he is an engineer, she is a cook. He is a soldier, but she is a teacher. He is a friend, but she is a lover. He is happy, and she is unhappy. I find that one particularly odd. And it's not just language translation that's problematic. The same sort of issues arise in language understanding. Predictive algorithms were trained to learn analogies by reading lots of text. They concluded that dog is to puppy as cat is to kitten, and man is to king as woman is to queen. But they also automatically inferred that man is to computer programmer as woman is to homemaker. And with the rise of social media, Google used text on the Internet to train predictive algorithms to infer the sentiment of tweets and online reviews. Is it a positive sentiment? Is it a negative sentiment? I believe it was Google who released their sentiment engine, and you could just try it online, you know, put in a sentence and see what you'd get. And again, similar problems emerged. If you typed in I am a white man, you would get positive sentiment. If you typed in I am a black lesbian, for example, negative sentiment. Just as Greg Corrado explained with chihuahuas and border collies, the predictive algorithms were learning from the examples they found in the world, and those examples reflected a lot about past practices and prejudices. If we think about where you might be scraping large amounts of text from, say Reddit, for example, and you're not thinking about how that sentiment might be biased against certain groups, then you're just basically importing that directly into your tool. But it's not just conversations on Reddit. There's the cautionary tale of what happens when Amazon let a computer teach itself how to sift through mountains of resumes for computer programming jobs to find the best candidates to interview.
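The analogy arithmetic described here can be reproduced with off-the-shelf word vectors. Below is a short Python sketch assuming the pretrained Google News embedding distributed through gensim (a sizable download). The exact nearest neighbors depend on which embedding you load and how you query it; the programmer-to-homemaker example is the one reported by Bolukbasi and colleagues in 2016.

import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # pretrained word2vec vectors

# king - man + woman  ->  queen (usually near the top of the list)
print(vectors.most_similar(positive=["king", "woman"],
                           negative=["man"], topn=3))

# man : computer_programmer :: woman : ?
print(vectors.most_similar(positive=["computer_programmer", "woman"],
                           negative=["man"], topn=3))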
So they set up this system, they designed it, and what they found was that very quickly this system had learned to discard and really demote the applications from women. And typically if you had a women's college mentioned, and even if you had the word women's on your resume, your application would go to the bottom of the pile. All right, so how does it learn that? So, first of all, we take a look at who is generally hired by Amazon, and of course they have a very heavily skewed male workforce, and so the system is learning that these are the sorts of people who will tend to be hired and promoted. And it is not a surprise, then, that they actually found it impossible to really retrain this system. They ended up abandoning this tool, because simply correcting for a bias is very hard to do when all of your ground truth data is so profoundly skewed in a particular direction. So Amazon dropped this particular machine learning project, and Google fixed the Turkish-to-English problem. Today, Google Translate gives both he is a doctor and she is a doctor as translation options. But biases keep popping up in predictive algorithms in many settings. There's no systematic way to prevent them. Instead, spotting and fixing biases has become a game of whack-a-mole.

Chapter four, Quarterbacks. Perhaps it's no surprise that algorithms trained in the wild west of the Internet or on tech industry hiring practices learned serious biases. But what about more sober settings, like a hospital? I talked with someone who recently discovered similar problems with potentially life-threatening consequences. Hi, I am Christine Vogeli. I'm the director of evaluation research at Partners Healthcare here in Boston. Partners Healthcare, recently rebranded as Mass General Brigham, is the largest healthcare provider in Massachusetts, a system that has six thousand doctors and a dozen hospitals and serves more than a million patients. As Christine explained to me, the role of healthcare providers in the US has been shifting. The responsibility for controlling costs and ensuring high-quality services is now being put on the hospitals and the doctors. And to me, this makes a lot of sense, right? We really should be the ones responsible for ensuring that there's good quality care and that we're doing it efficiently. Healthcare providers are especially focusing their attention on what they call high-risk patients. Really, what it means is that they have both multiple chronic illnesses and relatively acute chronic illnesses. So give me a set of conditions that a patient might have. Right, so somebody, for example, with cardiovascular disease co-occurring with diabetes, and you know, maybe they also have depression. They're just kind of suffering and trying to get used to having that complex illness and how to manage it. Partners Healthcare offers a program to help these complex patients. We have a nurse or social worker who works as a care manager, who helps with everything from education to care coordination services. But really, that care manager works essentially as a quarterback, arranges everything, but also provides hands-on care to the patient and the caregiver. Yeah, I think it's a wonder how we expect patients to go figure out all the things they're supposed to be doing and how to interact with the medical system without a quarterback. It's incredibly complex. These patients have multiple specialists who are interacting with the primary care physician.
They need somebody to be able to tie it together and be able to create a care plan for them that they can follow, and it pulls everything together from all those specialists. Partners Healthcare found that providing complex patients with quarterbacks both saved money and improved patients' health. For example, they had fewer emergency visits each year. So Partners developed a program to identify the top three percent of patients with the greatest need for the service. Most were recommended by their physicians, but they also used a predictive algorithm, provided by a major health insurance company, that assigns each patient a risk score. What does the algorithm do? When you look at the web page, it really describes itself as a tool to help identify high-risk patients. And that term is a really interesting term to me. What makes a patient high risk? So I think from an insurance perspective, risk means these patients are going to be expensive. From a healthcare organization perspective, these are patients who we think we could help, and that's the fundamental challenge on this one. When the team began to look closely at the results, they noticed that people recommended by the algorithm were strikingly different than those recommended by their doctor. We noticed that black patients overall were underrepresented. Patients with similar numbers of chronic illnesses, if they were black, had a lower risk score than if they were white, and that didn't make sense. Black patients identified by the algorithm turned out to have twenty six percent more chronic illnesses than white patients with the same risk scores. So what was wrong with the algorithm? It was because, given a certain level of illness, black and minority patients tend to use fewer healthcare services, and whites tend to use more, even if they have the same level of chronic... Even if they have the same level of chronic conditions? That's right. So in some sense, the algorithm is correctly predicting the cost associated with the patient, but not the need. Exactly. It predicts costs very well, but we're interested in understanding patients who are sick and have needs. It's important to say that the algorithm only used information about insurance claims and medical costs. It didn't use any information about a patient's race. But of course these factors are correlated with race, due to longstanding issues in American society. Frankly, we have fewer minority physicians than we do white physicians. So the level of trust minorities have with the healthcare system, we've observed, is lower. And we also know that there are just systematic barriers to care that certain groups of patients experience more. So, for example, race and poverty go together, and job flexibility. So all these issues with scheduling, being able to come in, being able to access services are just heightened for minority populations relative to white populations. So someone who just has less economic resources might not be able to get off work... Might not be able to get off work, might not have the flexibility with childcare to be able to come in for a visit when they need to. Exactly. So it means that if one only relied on the algorithm, you wouldn't be targeting the right people. Yes, we would be targeting more advantaged patients who tend to use a lot of healthcare services. When they corrected the problem, the proportion of black patients in the high-risk group jumped from eighteen percent to forty seven percent.
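Here is a sketch, in Python with pandas, of the kind of audit Christine's team ran: group patients by algorithmic risk score and ask whether black and white patients at the same score carry the same burden of chronic illness. The file and column names are hypothetical stand-ins; the published analysis is Obermeyer et al., Science, 2019.

import pandas as pd

patients = pd.read_csv("patients.csv")  # hypothetical extract of claims data
# expected columns: risk_score, race, n_chronic_conditions

# Bucket patients by risk-score decile, then compare illness burden by race.
patients["score_decile"] = pd.qcut(patients["risk_score"], 10, labels=False)
burden = (patients
          .groupby(["score_decile", "race"])["n_chronic_conditions"]
          .mean()
          .unstack("race"))
print(burden)  # at equal scores, a higher mean for one group signals bias

# Who would be flagged for the care-management program?
cutoff = patients["risk_score"].quantile(0.97)   # roughly the "top three percent"
flagged = patients[patients["risk_score"] >= cutoff]
print(flagged["race"].value_counts(normalize=True))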
Christine, together with colleagues from several other institutions, wrote up a paper describing their findings. It was published in Science, the nation's leading research journal, in twenty nineteen. It made a big splash, not least because many other hospital systems were using the algorithm and others like it. We've since changed the algorithm that we use to one that uses exclusively information about chronic illness and not healthcare utilization. And has that worked? We're still testing. We think it's going to work, but as in all of these things, you really need to test it. You need to understand and see if there's actually any biases. In the end, you can't just adopt an algorithm. It's very important to be very conscious about what you're predicting. It's also very important to think about what are the factors you're putting into that prediction algorithm. Even if you believe the ingredients are right, you do actually have to see how it works in practice. Anything that has to do with people's lives, you know, you have to be transparent about it.

Chapter five, COMPAS Transparency. Christine Vogeli and her colleagues were able to get to the bottom of the issue with the medical risk prediction because they had ready access to the Partners Healthcare data and could test the algorithm. Unfortunately, that's not always the case. I traveled to New York to speak with a person who's arguably done more than anyone to focus attention on the consequences of algorithmic bias. My name is Julia Angwin. I'm a journalist. I've been writing about technology for twenty five years, mostly at the Wall Street Journal and ProPublica. Julia grew up in Silicon Valley as the child of a mathematician and a chemist. She studied math at the University of Chicago, but decided to pursue a career in journalism. Her quantitative skills gave her a unique lens to report on the societal implications of technology, and she eventually became interested in investigating high-stakes algorithms. When I learned that there was actually an algorithm that judges used to help decide what to sentence people, I was stunned. I thought, this is shocking. I can't believe this exists, and I'm going to investigate it. What we're talking about is a score that is assigned to criminal defendants in many jurisdictions in this country that aims to predict whether they will go on to commit a future crime. It's known as a risk assessment score, and the one that we chose to look at was called the COMPAS risk assessment score. Based on the answers to a long list of questions, COMPAS gives defendants a risk score from one to ten. In some jurisdictions, judges use the COMPAS score to decide whether defendants should be released on bail before trial. In others, judges use it to decide the length of sentence to impose on defendants who plead guilty or who were convicted at trial. Julia had a suspicion that the algorithm might reflect bias against black defendants. Attorney General Eric Holder had actually given a big speech saying he was concerned about the use of these scores and whether they were exacerbating racial bias, and so that was one of the reasons we wanted to investigate. But investigating wasn't easy. Unlike Christine Vogeli at Partners Healthcare, Julia couldn't inspect the COMPAS algorithm itself. Now, COMPAS isn't a modern neural network. It was developed by a company that's now called Equivant, and it's a much simpler algorithm. It's basically a linear equation that should be easy to understand.
But it's a black box of a different sort. The algorithm is opaque because, to date, Equivant has insisted on keeping it a trade secret. Julia also had no way to download defendants' COMPAS scores from a website, so she had to gather the data herself. Her team decided to focus on Broward County, Florida. Florida has great public records laws, and so we filed a public records request, and we did end up getting eighteen thousand scores. We got scores for everyone who was arrested for a two-year period. Eighteen thousand scores. All right, so then what did you do to evaluate these scores? Well, the first thing we did when we got the eighteen thousand scores was actually we just threw them into a bar chart, black and white defendants. We immediately noticed there were really different-looking distributions. For black defendants, the scores were evenly distributed, meaning one through ten, lowest risk to highest risk, there were equal numbers of black defendants in every one of those buckets. For white defendants, the scores were heavily clustered in the low-risk range. And so we thought, there's two options. All the white people getting scored in Broward County are legitimately really low risk, they're all Mother Teresa, or there's something weird going on. Julia sorted the defendants into those who were rearrested over the next two years and those who weren't. She compared the COMPAS scores that had been assigned to each group. For black defendants, it was much more likely to incorrectly predict that they were going to go on to commit a future crime when they didn't, and for white defendants, it was much more likely to predict that they were going to go on to not commit a future crime when they did. There were twice as many false positives for black defendants as white, and twice as many false negatives for white defendants as black defendants. Julia described the story of two people whose arrest histories illustrate this difference. A young eighteen-year-old black girl named Brisha Borden, who had been arrested after picking up a kid's bicycle from their front yard and riding it a few blocks. The mom came out, yelled at her, that's my kid's bike. She gave it back, but actually by then the neighbor had called the police, and so she was arrested for that. And we compared her with a white man who had stolen about eighty dollars' worth of stuff from a drugstore, Vernon Prater. When teenager Brisha Borden got booked into jail, she got a high COMPAS score, an eight, predicting a high risk that she'd get rearrested. And Vernon Prater, he got a low score, a three. Now, he had already committed two armed robberies and had served time. She was eighteen. She'd given back the bike. And of course, these scores turned out to be completely wrong. She did not go on to commit a future crime in the next two years, and he actually went on to break into a warehouse, steal thousands of dollars of electronics, and he's serving a ten-year term. And so that's what the difference between a false positive and a false negative looks like. It looks like Brisha Borden and Vernon Prater.

Chapter six, Criminal Attitudes. Julia Angwin and her team spent over a year doing research. In May twenty sixteen, ProPublica published their article, headlined Machine Bias. The subtitle, quote: There's software used across the country to predict future criminals, and it's biased against blacks. Julia's team released all the data they had collected so that anyone could check or dispute their conclusions. What happened next was truly remarkable.
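Before going on, the core of the error-rate comparison Julia describes can be sketched in a few lines of Python with pandas: call a score of five or higher "high risk," then compute false positive and false negative rates separately for black and white defendants. The file and column names below are hypothetical; ProPublica released its actual Broward County data and analysis publicly.

import pandas as pd

df = pd.read_csv("broward_scores.csv")
# expected columns: race, decile_score (1-10), reoffended_within_2yrs (0/1)

df["predicted_high_risk"] = df["decile_score"] >= 5

def error_rates(group: pd.DataFrame) -> pd.Series:
    no_reoffense = group[group["reoffended_within_2yrs"] == 0]
    reoffense = group[group["reoffended_within_2yrs"] == 1]
    return pd.Series({
        # labeled high risk but did not reoffend
        "false_positive_rate": no_reoffense["predicted_high_risk"].mean(),
        # labeled low risk but did reoffend
        "false_negative_rate": (~reoffense["predicted_high_risk"]).mean(),
    })

print(df.groupby("race").apply(error_rates))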
The ProPublica article provoked an outcry from some statisticians, who argued that the data actually proved COMPAS wasn't biased. How could they reach the opposite conclusion? It turned out the answer depended on how you define bias. ProPublica had analyzed the COMPAS scores by looking backward, after the outcomes were known. Among people who were not rearrested, they found that black people had been assigned much higher risk scores than white people. That seemed pretty unfair. But statisticians use the word bias to describe how a predictor performs when looking forward, before the outcomes happened. It turns out that black people and white people who received the same risk score had roughly the same chance of being rearrested. That seems pretty fair. So whether COMPAS was fair or unfair depended on your definition of fairness. This sparked an explosion of academic research. Mathematicians showed there's no way out of the problem. They proved a theorem saying it's impossible to build a risk predictor that's fair when looking both backward and forward, unless the arrest rates for black people and white people are identical, which they aren't. The ProPublica article also focused attention on many other ways in which COMPAS scores are biased. Like the healthcare algorithm that Christine Vogeli studied, COMPAS scores don't explicitly ask about a person's race, but race is closely correlated with both the training data and the inputs to the algorithm. First, the training data. COMPAS isn't actually trained to predict the probability that a person will commit another crime. Instead, it's trained to predict whether a person will be arrested for committing another crime. The problem is there's abundant evidence that in situations where black people and white people commit crimes at the same rate, for example, illegal drug use, black people are much more likely to get arrested. So COMPAS is being trained on an unfair outcome. Second, the questionnaire used to calculate COMPAS scores is pretty revealing. Some sections assess peers, work, and social environment. The questions include: how many of your friends and acquaintances have ever been arrested? How many have been crime victims? How often do you have trouble paying bills? Other sections are titled criminal personality and criminal attitude. They ask people to agree or disagree with such statements as, the law doesn't help average people, or, many people get into trouble because society has given them no education, jobs, or future. In a nutshell, the predictor penalizes defendants who are honest enough to admit they live in high-crime neighborhoods or they don't fully trust the system. From the questionnaire, it's not hard to guess how a teenage black girl arrested for something as minor as riding someone else's bicycle a few blocks and returning it might have received a COMPAS score of eight. And it's not hard to imagine why racially correlated questions would do a good job of predicting racially correlated arrest rates. ProPublica didn't win a Pulitzer Prize for its article, but it was a remarkable public service.

Chapter seven, Minority Report. Putting aside the details of COMPAS, I wanted to find out more about the role of predictive algorithms in courts. I reached out to one of the leading legal scholars in the country. I'm Martha Minow. I'm a law professor at Harvard, and I have recently immersed myself in issues of algorithmic fairness. Martha Minow has a remarkable resume.
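One way to see the theorem the mathematicians proved is through a simple identity on the confusion matrix. Write $p$ for a group's base rate of rearrest, $\mathrm{PPV}$ for the chance that someone labeled high risk is in fact rearrested (the forward-looking notion of fairness), and $\mathrm{TPR}$ and $\mathrm{FPR}$ for the true and false positive rates (the backward-looking notions). A little algebra gives:

\[
\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\mathrm{TPR}
\]

So if two groups have different base rates $p$, and the score is equally calibrated for both (same $\mathrm{PPV}$), then the false positive and true positive rates cannot both match across groups, short of a perfect predictor, and vice versa. That is exactly the tension between ProPublica's backward-looking analysis and the statisticians' forward-looking reply; this identity and the related impossibility results appear in work by Chouldechova and by Kleinberg, Mullainathan, and Raghavan.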
From two thousand and nine to twenty seventeen, she served as dean of the Harvard Law School, following now Supreme Court Justice Elena Kagan. Martha also served on the board of the government-sponsored Legal Services Corporation, which provides legal assistance to low-income Americans. She was appointed by her former law student, President Barack Obama. I became very interested in and concerned about the increasing use of algorithms in worlds that touch on my preoccupations with equal protection, due process, constitutional rights, fairness, anti-discrimination. Martha recently co-signed a statement with twenty six other lawyers and scientists raising, quote, grave concerns about the use of predictive algorithms for pretrial risk assessment. I asked her how courts had gotten involved in the business of prediction. The criminal justice system has flirted with the use of prediction forever, including discussions from the nineteenth century on in this country about dangerousness and whether people should be detained preventively. So far, that's not permitted in the United States. It appears in Minority Report and other interesting movies. The movie, starring Tom Cruise, tells the story of a future in which the PreCrime division of the police arrests people for crimes they haven't yet committed. I'm placing you under arrest for the future murder of Sarah Marks. We are arresting individuals who've broken no law, but they will. The use of prediction in the context of sentencing is part of this rather large sphere of discretion that judges have to decide what kind of sentence fits the crime. You're saying in sentencing, one is allowed to use, essentially, information from the PreCrime division about crimes that haven't been committed yet? Well, I am horrified by that suggestion, but I think it's fair to raise it as a concern. The problem is, if we actually acknowledge purposes of the criminal justice system, some of them start to get into the future. So if one purpose is simply incapacitation, prevent this person from walking the streets because they might hurt someone else, there's a prediction built in. So judges have been factoring in predictions about a defendant's future behavior for a long time. And judges certainly aren't perfect. They can be biased or sometimes just cranky. There are even studies showing that judges hand down harsher sentences before lunch breaks than after. Now, the defenders of risk prediction scores will say, well, it's always not what's the ideal, but compared to what? And if the alternative is we're relying entirely on the individual judges and their prejudices, their lack of education, what they had for lunch, isn't this better, that it will provide some kind of scaffold for more consistency? Journalist Julia Angwin has heard the same arguments. Some good friends, right, who really believe in the use of these criminal risk score algorithms, have said to me, look, Julia, the fact is judges are terribly biased, and this is an improvement. And my feeling is, that's probably true for some judges and maybe less true for other judges. But I don't think it is a reason to automate bias, right? Like, I don't understand why you say, okay, humans are flawed, so why don't we make a flawed algorithm and bake it into every decision, because then it's really intractable. Martha also worries that numerical risk scores are misleading. The judges think high numbers mean people are very likely to commit violent crime.
In fact, the actual probability of violence is very low, about eight percent according to a public assessment. And she thinks numerical scores can lull judges into a false sense of certainty. There's an appearance of objectivity because it's math, but is it really? Then, for lawyers, they may have had no math, no numeracy education since high school. Many people go to law school in part because they don't want to do anything with numbers. And there is a larger problem, which is the deference to expertise, particularly scientific expertise. Finally, I wanted to ask Martha if defendants have a constitutional right to know what's inside the black box that's helping to determine their fate. I confess I thought the answer was an obvious yes, until I read a twenty sixteen decision by Wisconsin's Supreme Court. The defendant in that case, Eric Loomis, pled guilty to operating a car without the owner's permission and fleeing a traffic officer. When Loomis was sentenced, the presentencing report given to the judge included a COMPAS score that predicted Loomis had a high risk for committing future crimes. He was sentenced to six years in prison. Loomis appealed, arguing that his inability to inspect the COMPAS algorithm violated his constitutional right to due process. Wisconsin's Supreme Court ultimately decided that Loomis had no right to know how COMPAS worked. Why? First, the Wisconsin court said the score was just one of several inputs to the judge's sentencing decision. Second, the court said even if Loomis didn't know how the score was determined, he could still dispute its accuracy. Loomis appealed to the US Supreme Court, but it declined to hear the case. I find that troubling and not persuasive. If it was up to you, how would you change the law? I actually would require transparency for any use of any algorithm by a government agency or court that has the consequence of influencing, not just deciding, but influencing decisions about individuals' rights. And those rights could be rights to liberty, property, opportunities. So transparency. Transparency. And I'd be able to see what this algorithm does? Absolutely, and have the code, and be able to give it to your own lawyer and your own experts. But should a state be able to buy a computer program that's proprietary? I mean, it would say, well, I'd love to give it to you, but it's proprietary, I can't. Should that be okay? I think not, because if that then limits the transparency, that seems a breach. But you know, this is a major problem, the outsourcing of government activity that has the effect of bypassing restrictions. Take another example: when the US government hires private contractors to engage in war activities, they are not governed by the same rules that govern the US military. She's saying that government can get around constitutional limitations on the government by just outsourcing it to somebody who's not the government. It's currently the case, and I think that's wrong. For her part, journalist Julia Angwin is baffled by the Wisconsin court's ruling. I mean, we have this idea that you should be able to argue against whatever accusations are made. But I don't know how you make an argument against a score. Like, the score says you're a seven, but you think you're a four. How do you make that argument? If you don't know how that seven was calculated, you can't make an argument that you're a four.

Chapter eight, Robo Recruiter.
Even if you never find yourself in a criminal court filling out a COMPAS questionnaire, that doesn't mean you won't be judged by a predictive algorithm. There's actually a good chance it will happen the next time you go looking for a job. I spoke to a scientist at a high-tech company that screens job applicants. My name is Lindsey Zuloaga, and I'm actually educated as a physicist, but now working for a company called HireVue. HireVue is a video interviewing platform. Companies create an interview, candidates can take it at any time that's convenient for them. So they go through the questions and they record themselves answering. So it's really a great substitute for kind of the resume phone screening part of the process. When a candidate takes a video interview, they're creating thousands of unique points of data. A candidate's verbal and nonverbal cues give us insight into their emotional engagement, thinking, and problem-solving style. This combination of cutting-edge AI and validated science is the perfect partner for making data-driven talent decisions. HireVue. You know, we'll have a customer and they are hiring for something like a call center, say it's sales calls. And what we do is we look at past employees that applied, and we look at their video interviews. We look at the words they said, tone of voice, pauses, and facial expressions, things like that, and we look for patterns in how those people with good sales numbers behave as compared to people with low sales numbers. And then we have this algorithm that scores new candidates as they come in, and so we help kind of get those more promising candidates to the top of the pile so they're seen more quickly. So HireVue trains a predictive algorithm on video interviews of past applicants who turned out to be successful employees. But how does HireVue know its program isn't learning sexism or racism or other similar biases? There are lots of reasons to worry. For example, studies from MIT have shown that facial recognition algorithms can have a hard time reading emotions from black people's faces. And how would HireVue's program evaluate videos from people who might look or sound different than the average employee, say, people who don't speak English as a native language, who are disabled, who are on the autism spectrum, or even people who are just a little quirky? Well, Lindsey says HireVue tests for certain kinds of bias. So we audit the algorithm after the fact and see if it's scoring different groups differently in terms of age, race, and gender. So if we do see that happening, a lot of times that's probably coming from the training data. So maybe there is only one female software engineer in this data set; the model might mimic that bias. If we do see any of that adverse impact, we simply remove the features that are causing it. So you can say this model is being sexist. How does the model even know what gender the person is? So we look at all the features, and we find the features that are the most correlated to gender. If there are, we simply remove some of those features. I asked Lindsey why people should believe HireVue's or any company's assurances, or whether something more was needed. You seem thoughtful about this, but there will be many people coming into the industry over time who might not be as thoughtful or as sophisticated as you are. Do you think it would be a good idea to have third parties come in to certify the audits for bias?
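Here is a sketch, in Python, of the two checks Lindsey describes: compare pass rates across groups for adverse impact (the "four-fifths rule" is one common yardstick), and rank interview features by how strongly they correlate with gender so the most gender-laden ones can be dropped before retraining. The file, column names, and cutoff are hypothetical; this is not HireVue's actual code.

import pandas as pd

interviews = pd.read_csv("scored_interviews.csv")
# expected: candidate_id, gender, score, plus many feature columns

# (1) Adverse impact: rate at which each group clears the screening cutoff.
cutoff = interviews["score"].quantile(0.5)
pass_rates = (interviews["score"] >= cutoff).groupby(interviews["gender"]).mean()
impact_ratio = pass_rates.min() / pass_rates.max()
print(pass_rates)
print(f"impact ratio: {impact_ratio:.2f} (commonly flagged if below 0.80)")

# (2) Which features track gender most closely?
features = interviews.drop(columns=["candidate_id", "gender", "score"])
is_female = (interviews["gender"] == "female").astype(float)
correlations = features.corrwith(is_female).abs().sort_values(ascending=False)
print(correlations.head(10))  # candidates for removal before retraining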
I know that's a hard question. I guess I kind of lean towards no. So you're talking about having a third-party entity that comes in and assesses and certifies the audit? You know, because you've described what I think is a really impressive process. But of course, how do we know it's true? You know, you could reveal all your algorithms, but that's probably not the thing you want to do. And so the next best thing is a certifier says, yes, this audit has been done. Probably. You know, your financials presumably get audited. Why not the result of the algorithm? I guess a little bit of the reason I'm not sure about the certification is just, it is mostly just because I feel like I don't know how it would work exactly. Like, you're right totally that finances are audited. I haven't thought about it enough to have like a strong opinion that it should happen, because it's like, okay, we have all these different models, it's constantly changing. How do they audit every single model all the time? I was impressed with Lindsey's willingness as a scientist to think in real time about a hard question, and it turns out she kept thinking about it afterwards. A few months later, she wrote back to me to say that she'd changed her mind. We do have a lot of private information, but if we don't share it, people tend to assume the worst. So I've decided, after thinking about it quite a bit, that I definitely support the third-party auditing of algorithms. Sometimes people just assume we're doing horrible, horrible things, and that can be frustrating. But I do think the more transparent we can be about what we are doing, the better. Several months later, Lindsey emailed again to say that HireVue was now undergoing a third-party audit. She says she's excited to learn from the results.

Chapter nine, Confronting the Black Box. So HireVue, at first reluctant, says it's now engaging external auditors. What about Equivant, whose COMPAS scores can heavily influence prison sentences, but which has steadfastly refused to let anyone even see how their simple algorithm works? Well, just before we released this podcast, I checked back with them. A company spokesperson wrote that Equivant now agrees that the COMPAS scoring process, quote, should be made available for third-party examination, but they weren't releasing it yet, because they first wanted to file for copyright protection on their simple algorithm. So we're still waiting. You might ask, should it be up to the companies to decide? Aren't there laws or regulations? The answer is, there's not much. Governments are just now waking up to the idea that they have a role to play. I traveled back to New York City to talk to someone who's been involved in this question. My name's Rashida Richardson, and I'm a civil rights lawyer that focuses on the social implications of artificial intelligence. Rashida served as the director of policy research at the AI Now Institute at NYU, where she worked with Kate Crawford, the Australian expert on algorithmic bias that I spoke to earlier in the episode. In twenty eighteen, New York City became the first jurisdiction in the US to create a task force to come up with recommendations about government use of predictive algorithms, or, as they call them, automated decision systems. Unfortunately, the task force bogged down in details and wasn't very productive. In response, Rashida led a group of twenty seven experts that wrote a fifty-six-page shadow report, entitled Confronting Black Boxes, that offered concrete proposals.
New York City, it turns out, uses quite a few algorithms to make major decisions. You have the school matching algorithms. You have an algorithm used by the child welfare agency here. You have public benefits algorithms that are used to determine who will qualify for or have their public benefits, whether that's Medicaid or temporary food assistance, terminated, or whether they'll receive access to those benefits. You have a gang database which tries to identify who is likely to be in a gang, and that's both used by the DA's office and the police department. If you had to make a guess, how many predictive algorithms are used by the City of New York? I'd say upwards of thirty, and I'm underestimating with that number. How many of these thirty-plus algorithms are transparent about how they work, about their code? None. So what should New York do? If it was up to you, what should be the behavior of a responsible city with respect to the algorithms it uses? I think the first step is creating greater transparency, some annual acknowledgement of what is being used, how it's being used, whether it's been tested or had a validation study. And then you would also want general information about the inputs or factors that are used by these systems to make predictions, because in some cases you have factors that are just discriminatory or proxies for protected statuses like race, gender, ability status. All right, so step one, disclose what systems you're using. Yes. And then the second step, I think, is creating a system of audits, both prior to procurement and then, once procured, ongoing auditing of the system, to at least have a gauge on what it's doing in real time. A lot of the horror stories we hear are about fully implemented tools that were in the works for years. There's never a pause button to reevaluate or look at how a system is working in real time. And even when I did studies on the use of predictive policing systems, I looked at thirteen jurisdictions; only one of them actually did a retrospective review of their system. So what's your theory about how do you get the auditing done? If you are going to outsource to third parties, I think it's going to have to be some approval process to assess their level of independence, but also any conflict of interest issues that may come up, and then also doing some thinking about what types of expertise are needed. Because I think if you don't necessarily have someone who understands that social context, or even the history of a certain government sector, then you could have a tool that is technically accurate and meets all of the technical standards but is still reproducing harm, because it's not paying attention to that social context. Should a government be permitted to purchase an automated decision system where the code can't be disclosed by contract? No, and in fact, there's movement around creating more provisions that vendors must waive trade secrecy claims once they enter a contract with the government. Rashida says we need laws to regulate the use of predictive algorithms, both by governments and by private companies like HireVue. We're beginning to see bills being explored in different states. Massachusetts, Vermont, and Washington DC are considering setting up commissions to look at the government use of predictive algorithms. Idaho recently passed a first-in-the-nation law requiring that pretrial risk algorithms be free of bias and transparent. It blocks manufacturers of tools like COMPAS from claiming trade secret protection.
And at the national level, a bill was recently introduced in the US Congress, the Algorithmic Accountability Act. The bill would require that private companies ensure certain types of algorithms are audited for bias. Unfortunately, it doesn't require that the results of the audit are made public, so there's still a long way to go. Rashida thinks it's important that regulations don't just focus on technical issues. They need to look at the larger context. Part of the problems that we're identifying with these systems is that they're amplifying and reproducing a lot of the historical and current discrimination that we see in society. There are large questions we've been unable to answer as a society, of how do you deal with the compounded effect of fifty years of discrimination? And we don't have a simple answer, and there's not necessarily going to be a technical solution. But I think having access to more data and an understanding of how these systems are working will help us evaluate whether these tools are even being evaluated and addressing the larger social questions. Finally, Kate Crawford says laws alone likely won't be enough. There's another thing we need to focus on. In the end, it really matters who is in the room designing these systems. If you have people sitting around a conference table, they all look the same. Perhaps they all did the same type of engineering degree. Perhaps they're all men. Perhaps they're all pretty middle class or pretty well off. They're going to be designing systems that reflect their worldview. What we're learning is that the more diverse those rooms are, and the more we can question those kinds of assumptions, the better we can actually design systems for a diverse world.

Conclusion, Choose Your Planet. So there you have it, the two sides of the Brave New Planet. The sixty-year-old dream of artificial intelligence, machines making human-like decisions, has finally become a reality. If a task can be turned into a prediction problem, and if you've got a mountain of training data, algorithms can learn to do the job. Countless applications are possible: translating languages instantaneously, providing expert medical diagnoses for eye diseases and cancer to patients anywhere, improving drug development, all at levels comparable to or better than human experts. But it's also letting governments and companies make automatic decisions about you: whether you should get admitted to college, be hired for a job, get a loan, get housing assistance, be granted bail, or get medical attention. The problem is that algorithms that learn to make human-like decisions based on past human outcomes can acquire a lot of human biases, about gender, race, class, and more, often masquerading as objective judgment. Even worse, you usually don't have a right even to know you're being judged by a machine, or what's inside the black box, or whether the algorithms are accurate or fair. Should laws require that automated decision systems used by governments or companies be transparent? Should they require public auditing for accuracy and fairness? And what exactly is fairness, anyway? Governments are just beginning to wake up to these issues, and they're not sure what they should do. In the coming years, they'll decide what rules to set, or perhaps to do nothing at all. So what can you do? A lot, it turns out. You don't have to be an expert and you don't have to do it alone. Start by learning a bit more.
Invite friends over, virtually or in person when it's safe, for dinner and debate about what we should do. Or organize a conversation at a book club, a faith group, or a campus event. And then email your city or state representatives to ask what they're doing about the issue, maybe even proposing first steps like setting up a task force. When people get engaged, action happens. You'll find lots of resources and ideas at our website, Brave New Planet dot org. It's time to choose our planet. The future is up to us. Brave New Planet is a co-production of the Broad Institute of MIT and Harvard, Pushkin Industries, and the Boston Globe, with support from the Alfred P. Sloan Foundation. Our show is produced by Rebecca Lee Douglas, with Mary Dooe. Theme song composed by Ned Porter, mastering and sound design by James Garver, fact checking by Joseph Fridman, and a Stitt and Enchant. Special thanks to Christine Heenan and Rachel Roberts at Clarendon Communications, to Lee McGuire, Kristen Zarelli, and Justine Levin-Allerhand at the Broad, to Mia Lobel and Heather Fain at Pushkin, and to Eli and Edythe Broad, who made the Broad Institute possible. This is Brave New Planet. I'm Eric Lander.
