Pay Attention to that Human Behind the Curtain (2024)

Editor’s Note: Cassie Kozyrkov is one of the dozens of expert data scientists who shared wisdom learned from years of experience as Google’s Chief Decision Scientist in the upcoming 97 Things About Ethics Everyone in Data Should Know. In this short excerpt, Cassie explores the tenets that underpin bias and fairness in AI. We’d love to hear what you think of this piece.

No technology is free of its creators. Despite our fondest sci-fi wishes, there’s no such thing as an AI system that is truly separate and autonomous…because it starts with us. Though its effect can linger long after you’ve pressed a button, all technology is an echo of the wishes of whoever built it.

Data and Math Don’t Equal Objectivity

If you’re looking to AI as your savior from human foibles, tread carefully. Sure, data and math can increase the amount of information you use in decision making and/or save you from heat-of-the-moment silliness, but how you use them is still up to you.

Look, I know sci-fi sells. It’s much flashier to say “The AI learned to do this task all by itself” than to tell the truth: people used a tool with a cool name to help them write code. They fed in examples they considered appropriate, the tool found some patterns in those examples and turned the patterns into instructions, and then the people checked whether they liked what those instructions did for them.

The truth drips with human subjectivity — look at all those little choices along the way that are left up to people running the project. What shall we apply AI to? Is it worth doing? In which circ*mstances? How shall we define success? How well does it need to work? The list goes on and on.

Tragicomically, adding data to the mix obscures the ever-present human element and creates an illusion of objectivity. Wrapping a glamorous coat of math around the core doesn’t make it any less squishy. Technology always comes from and is designed by people, which means it’s no more objective than we are.


What Is Algorithmic Bias?

Algorithmic bias refers to situations in which a computer system reflects the implicit values of the people who created it. By this definition, even the most benign computer systems are biased; when we apply math toward a purpose, that purpose is shaped by the sensibilities of our times. Is AI exempt? Not at all. Stop thinking of AI as an entity and see it for what it really is: an excellent tool for writing code.

The whole point of AI is to let you explain your wishes to a computer using examples (data!) instead of instructions. Which examples? That depends on what you’re trying to teach your system to do. Think of your dataset as the textbook you’re asking your machine student to learn from.
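To make that concrete, here’s a toy sketch of my own (not from the excerpt; every name and task in it is a hypothetical illustration): the same “is this message shouting?” decision written two ways, once as an explicit hand-written rule, and once as a threshold learned from examples a person chose. The point is that the human-picked examples become the instructions.

```python
# Rule-based approach: a human writes the decision threshold explicitly.
def uppercase_ratio(msg):
    """Fraction of alphabetic characters that are uppercase."""
    letters = [c for c in msg if c.isalpha()]
    return sum(c.isupper() for c in letters) / max(len(letters), 1)

def is_shouting_rule(msg):
    return uppercase_ratio(msg) > 0.5  # threshold chosen by hand

# Example-based approach: the threshold is *learned* from labeled
# examples supplied by a person -- their choices shape the result.
def learn_threshold(examples):
    """examples: list of (message, is_shouting) pairs picked by a human."""
    shouting = [uppercase_ratio(m) for m, label in examples if label]
    calm = [uppercase_ratio(m) for m, label in examples if not label]
    # Midpoint between the calmest "shout" and the loudest "calm" example.
    return (min(shouting) + max(calm)) / 2

examples = [("HELLO THERE", True), ("STOP THAT", True),
            ("hello there", False), ("Nice Day", False)]
t = learn_threshold(examples)

def is_shouting_learned(msg):
    return uppercase_ratio(msg) > t
```

Either way, a human made the call: explicitly in the first version, and via the choice of “textbook” examples in the second.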

Datasets Have Human Authors

When I’ve said that “AI bias doesn’t come from AI algorithms, it comes from people,” some folks have written to tell me that I’m wrong because bias comes from data. Well, we can both be winners…because people make the data. Like textbooks, datasets reflect the biases of their authors.

The data you create for your system to learn from will be biased by how you see the world.

This Is No Excuse to Be a Jerk

Philosophical arguments invalidating the existence of truly unbiased and objective technology don’t give anyone an excuse to be a jerk. If anything, the fact that you can’t pass the ethical buck to a machine puts more responsibility on your shoulders, not less.

Sure, our perceptions are shaped by our times. Societal ideas of virtue, justice, kindness, fairness, and honor aren’t the same today as they were for people living a few thousand years ago, and they may keep evolving. That doesn’t make these ideas unimportant; it only means we can’t outsource them to a heap of wires. They’re the responsibility of all of us, together.

Fairness in AI

Once you appreciate that you are responsible for how you use your tools and where you point them, strive to make yourself aware of how your choices affect the rest of humanity. For example, deciding which application to pursue is a choice that affects other people. Think it through.

Another choice you have is which data to use for AI. You should expect better performance on examples that are similar to what your system learned from. If you choose not to use data from people like me, your system is more likely to make a mistake when I show up as your user. It’s your duty to think about the pain you could cause when that happens.

At a bare minimum, I hope you’d have the common sense to check whether the distribution of your user population matches the distribution in your data. For example, if 100% of your training examples come from residents of a single country, but your target users are global…expect a mess.
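As a minimal sketch of such a check (my own illustration, not from the excerpt; the country labels and counts are made-up), you could compare the categorical mix of your training data against your user population using total variation distance, which is 0 for identical mixes and 1 for completely disjoint ones:

```python
from collections import Counter

def distribution_gap(train_labels, user_labels):
    """Total variation distance between two categorical distributions."""
    def freqs(labels):
        counts = Counter(labels)
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}
    p, q = freqs(train_labels), freqs(user_labels)
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)

# Training data drawn entirely from one country, but users are global:
train = ["US"] * 100
users = ["US"] * 30 + ["IN"] * 30 + ["BR"] * 20 + ["NG"] * 20
gap = distribution_gap(train, users)  # large gap -> expect a mess
```

A large gap doesn’t tell you where the harm will land, only that your “textbook” and your users don’t match; deciding what to do about it is still a human judgment.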

Fair and Aware

I’ve written a lot of words here, when I could have just told you that most of the research on the topic of bias and fairness in AI is about making sure that your system doesn’t have a disproportionate effect on some group of users relative to other groups. The primary focus of AI ethics is on distribution checks and similar analytics.

The reason I wrote so much is that I want you to do even better. Automated distribution checks go only so far. No one knows a system better than its creators, so if you’re building one, take the time to think about whom your actions will affect and how, and do your best to give those people a voice to guide you through your blind spots.

As Google’s Chief Decision Scientist, Cassie Kozyrkov is passionate about helping everyone — Google, its customers, the world! — make better decisions through harnessing the beauty and power of data. She speaks at conferences and meets with leadership teams to empower decision-makers to transform their industries through AI, machine learning, and analytics. At Google, Cassie has advised more than a hundred teams on statistics and machine learning, working most closely with Research & Machine Intelligence, Google Maps, and Ads & Commerce. She has also personally trained more than 15,000 Googlers (executives, engineers, scientists, and even non-technical staff members) in machine learning, statistics, and data-driven decision-making. Prior to joining Google, Cassie spent a decade working as a data scientist and consultant. She is a leading expert in decision science, with undergraduate studies in statistics and economics (University of Chicago) and graduate studies in statistics, neuroscience, and psychology (Duke University and NCSU). When she’s not working, you’re most likely to find Cassie at the theatre, in an art museum, exploring the world, playing board games, or curled up with a good novel.
