What All Data Scientists and Business Leaders Need to Know about Data Ethics and AI Bias


A necessary step in our ongoing national reckoning on systemic racism involves looking beneath the surface, down to the very data underpinning everything we do, from shopping online to applying for jobs. With algorithms, data, and artificial intelligence touching nearly every aspect of our lives today, technologists have struggled to purge these systems of deep-seated biases, since the data behind them rarely accounts for historic social injustices that disproportionately affect certain groups.

Addressing AI bias, along with other pressing issues around data ethics, was the focus of a recent guest lecture by Avriel Epps-Darling, a computational social scientist and PhD student in Human Development at the Harvard Graduate School of Education who is a highly sought-after expert and author on these topics.

Epps-Darling recently spoke to students at one of Correlation-One’s data science training programs to help them think in new ways about how algorithms can be used to promote social equity rather than contribute to the problem.

“Algorithmic bias refers to what we call computational discrimination or disparate impact, whereby unfair outcomes privilege one arbitrary group of people over another in ways that compound existing marginalization in society more broadly,” Epps-Darling said.
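That definition of disparate impact can be made concrete with a simple measurement: compare the rate of favorable outcomes across groups. The sketch below is purely illustrative and not from the lecture; the decision data is invented, and the 0.8 threshold comes from the widely cited “four-fifths rule” used in employment-discrimination analysis.

```python
# Minimal sketch: measuring disparate impact as a ratio of favorable-outcome
# rates between groups. The data and the 0.8 threshold are illustrative
# assumptions, not details from the lecture.

def selection_rate(outcomes):
    """Fraction of favorable (1) outcomes in a list of 0/1 decisions."""
    return sum(outcomes) / len(outcomes)

# Hypothetical approval decisions, grouped by a protected attribute.
decisions_by_group = {
    "group_a": [1, 1, 0, 1, 1, 0, 1, 1],   # 75% approved
    "group_b": [1, 0, 0, 0, 1, 0, 0, 1],   # 37.5% approved
}

rates = {g: selection_rate(d) for g, d in decisions_by_group.items()}
ratio = min(rates.values()) / max(rates.values())

print(f"Selection rates: {rates}")
print(f"Disparate impact ratio: {ratio:.2f}")
# The 'four-fifths rule' heuristic flags ratios below 0.8 as evidence
# of disparate impact.
if ratio < 0.8:
    print("Potential disparate impact: the disadvantaged group's rate is "
          "less than 80% of the advantaged group's rate.")
```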

Of course, algorithms don’t manifest out of thin air. To function, they’re invariably fed data sets, which are subject to the biases of human programmers.

“Nothing about data science is value neutral. Each deployed algorithm is imbued with the values of programmers, the organizations those programmers work for, institutions, culture, and history. As such, they all take on the various biases in society,” she said.

No matter their political beliefs, social class, or personal history, everyone has biases that serve as a lens through which they view the world. When it comes to humans, identifying and pointing out those biases that unfairly affect someone based on race, gender, sexual orientation, or any other identifying factor is relatively straightforward. But when it’s an algorithm or a piece of software doing the discriminating, the issue becomes more complex.

“Models, engineers, algorithms, data sets, and even mathematical formulas don’t exist in a vacuum. They are born from and exist to serve people in the real world. A mistake many people make is to assume that because computers are not living and breathing that they are somehow infallible, and we know this is not the case.” 

Further complicating matters is that no website or app is run by a single algorithm; each is a complex, multifaceted network of software programs and data inputs and outputs, with many different weights across many different layers determining a single end user’s experience. As a result, these systems are frequently plagued by inherent biases, “which give computers the power to criminalize, deny access to resources, or otherwise inconvenience people due to their skin color, gender, age, or ability status,” Epps-Darling said.

One area Epps-Darling cited where this algorithmic bias frequently plays out is the evaluation of job candidates.

“The algorithm has race and gender built into it. Race and gender are two variables being used to make some kind of prediction about whether this person is going to be a good candidate, or the algorithm learns the race and gender differential through users, and then perhaps reflects an underlying social, societal prejudice.”
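This “learning through users” can happen even when race and gender are deliberately excluded from the model. The hedged sketch below illustrates the proxy mechanism with entirely synthetic data; the feature names, correlations, and model choice are assumptions made for illustration, not details from the lecture.

```python
# Sketch: a model trained WITHOUT a protected attribute can still reproduce
# historical bias through a correlated proxy. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000

# Protected attribute (never shown to the model).
group = rng.integers(0, 2, n)

# Proxy feature correlated with group membership (e.g., zip code, school).
proxy = group + rng.normal(0, 0.3, n)

# A genuinely job-relevant feature, distributed identically across groups.
skill = rng.normal(0, 1, n)

# Historical hiring labels that encode past discrimination against group 1.
hired = ((skill - 1.5 * group + rng.normal(0, 0.5, n)) > 0).astype(int)

# Train only on skill and the proxy; 'group' itself is excluded.
X = np.column_stack([skill, proxy])
model = LogisticRegression().fit(X, hired)

# The model still predicts lower hiring rates for group 1, via the proxy.
preds = model.predict(X)
for g in (0, 1):
    print(f"group {g}: predicted hire rate = {preds[group == g].mean():.2f}")
```

Because the proxy carries information about group membership, the model reproduces the historical disparity without ever seeing the protected attribute itself.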

Of course, it’s illegal to discriminate against any potential hire on the basis of race, gender, age, or sexual orientation. But while a human decision-maker can be held accountable for biased hiring practices, an algorithm cannot.

“In complex algorithmic decision making, many times we don’t know which features were used, and there’s usually no human making a conscious effort to discriminate against individuals, so the question becomes, can we hold a machine learning model responsible in a court of law? What would that even look like?”

The question then becomes: of all the parties who could be held accountable, who is most directly responsible for an algorithm’s biases?

“Is it the engineers? The product managers? The corporation’s executives? The shareholders? It’s really kind of up in the air, which is why these things are so difficult to litigate,” Epps-Darling noted.

The necessity of human judgment throughout the development process opens the door for practitioners’ biases, whether conscious or unconscious, to be programmed into the system.

“Unfortunately there are opportunities for our own prejudices, and through those, society’s prejudices against marginalized communities, and sometimes even if we are members of those marginalized communities, to be incorporated into algorithmic systems at virtually every stage in the development process and the machine learning development process,” she said.

Rather than despairing at the way things are, Epps-Darling encouraged the audience to view their path to careers in data science as an opportunity to effect change and upend the status quo.

“Never is the development process a simple plug-and-play of predefined actions. It requires expertise, it requires value-laden decisions about what's appropriate, what's feasible, what's worthy of resources, our time, our energy and our money. What's worth sacrificing—because you're always going to have to make sacrifices when you're building models—and what's the most desirable outcome,” she said.

“It's not always just accuracy. Sometimes we have to complicate our desirable outcome a little bit more than that. In other words, understanding algorithmic bias should not disempower you. It should leave you feeling really agentive, because we are the experts, and we get to make these decisions. If you don't feel like an expert just yet, you will soon, and you're going to use your expertise to make all of these really complex value-laden decisions. That gives you a lot of responsibility, but also gives you a unique opportunity to do a lot of good with predictive models.”
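One concrete way to “complicate” the desirable outcome beyond a single accuracy number is to break error rates out by group. The toy example below is a sketch with invented labels and predictions, showing how a respectable overall accuracy can mask a model that fails one group entirely.

```python
# Sketch: overall accuracy can hide unequal error rates across groups.
# All numbers below are invented for illustration.

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0]
group  = ["a"] * 6 + ["b"] * 6

def false_negative_rate(truth, pred):
    """Fraction of true positives the model misses."""
    positives = [(t, p) for t, p in zip(truth, pred) if t == 1]
    return sum(1 for t, p in positives if p == 0) / len(positives)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"Overall accuracy: {accuracy:.2f}")  # 0.67: looks tolerable

for g in ("a", "b"):
    truth = [t for t, gg in zip(y_true, group) if gg == g]
    pred  = [p for p, gg in zip(y_pred, group) if gg == g]
    # Group "a" is never missed (FNR 0.00); group "b" is always missed (1.00).
    print(f"group {g}: false negative rate = "
          f"{false_negative_rate(truth, pred):.2f}")
```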

In order to make these decisions and do good through data science, Epps-Darling emphasized, “we have to start by bringing all of our pre-existing ideas about this topic to the forefront.”