Let’s Talk about Bias: LGBTQ+ People and AI

In a guest essay regarding LGBTQ+ people and AI, Scott DeGeest explores seven domains of algorithmic bias and harm.

LGBTQ+ and AI Data Bias | Scott DeGeest | Correlation One

Algorithmic bias harms many underrepresented groups of people, and LGBTQ+ people are no exception. Alas, too few people are cognizant of the specific issues LGBTQ+ people face — and it’s time to change that. To succeed, however, we first must understand the nature of the problem. 

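In 2021, researchers at DeepMind, the British artificial intelligence subsidiary of Alphabet Inc., published an insightful paper, “Fairness for Unobserved Characteristics: Insights from Technological Impacts on Queer Communities,” highlighting the dearth of research on how algorithmic bias harms LGBTQ+ people. 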

With that review as a touchstone, below I will summarize and expand upon some of its key points. 

The goal here is to:

  • Widen general understanding of how and why LGBTQ+ people experience algorithmic bias and harm
  • Inform individuals (including allies) interested in fostering diversity in technology about the challenges LGBTQ+ people face
  • Educate new and experienced data analysts, scientists, and engineers on the topic so that they can help address algorithmic harm in their workplaces

Note, however, that evaluating algorithmic harm based on sexual orientation and gender identity poses its own unique challenges. 

For example, most definitions of algorithmic fairness are built around observed characteristics: characteristics that are present and readily available in existing datasets. 

Characteristics that are not readily available are called unobserved characteristics: aspects of people’s identity that are frequently missing, unknown, or fundamentally unmeasurable. 

Sexual orientation and gender identity are prototypical unobserved characteristics. Assessing fairness for LGBTQ+ people therefore requires approaches that drop the assumption that the relevant characteristics are recorded in the data. 
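To make the dependence on observed characteristics concrete, here is a minimal Python sketch of one common fairness check, demographic parity (the gap in positive-decision rates between two groups). It is an illustration under stated assumptions, not the method from the DeepMind paper: the function names, group labels, and toy data are all hypothetical. The point is that the check needs a group label for every record; when that label is unobserved, the metric is simply undefined.

```python
from typing import Optional, Sequence


def selection_rate(decisions: Sequence[int]) -> float:
    """Share of records that received a positive (1) decision."""
    return sum(decisions) / len(decisions)


def demographic_parity_gap(
    decisions: Sequence[int],
    groups: Sequence[Optional[str]],
    group_a: str,
    group_b: str,
) -> Optional[float]:
    """Selection-rate difference between group_a and group_b (hypothetical helper).

    Returns None when any record's group is unobserved (None), because the
    metric cannot be computed without the protected attribute.
    """
    if any(g is None for g in groups):
        return None
    a = [d for d, g in zip(decisions, groups) if g == group_a]
    b = [d for d, g in zip(decisions, groups) if g == group_b]
    if not a or not b:
        return None
    return selection_rate(a) - selection_rate(b)


# Toy data: the same six decisions, once with group labels and once without.
decisions = [1, 0, 1, 1, 0, 0]
observed = ["a", "a", "a", "b", "b", "b"]
unobserved = [None] * 6

print(demographic_parity_gap(decisions, observed, "a", "b"))
print(demographic_parity_gap(decisions, unobserved, "a", "b"))
```

The sketch refuses to compute a value rather than dropping unlabeled records. Dropping them would measure fairness only for people who happened to disclose their identity, which is itself a biased sample; that is one reason unobserved characteristics call for different methods.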





7 Domains of Algorithmic Bias and Harm to LGBTQ+ People

The domains below (referred to as “case studies” in the original DeepMind paper) highlight contexts where the unobserved characteristics of sexual orientation and gender identity interact with AI. 

These overlapping domains are: 

  1. Privacy
  2. Censorship
  3. Language
  4. Dealing with online abuse
  5. Health outcomes
  6. Mental health access
  7. Employment

In each domain, we’ll examine how and where harm occurs and how AI perpetuates that harm. 

Domain: Privacy


One reason sexual orientation and gender identity are frequently unobserved is that being outed (that is, having one’s sexual orientation or gender identity disclosed without one’s consent) carries substantial consequences. 

Being outed creates emotional distress. In addition, it creates risks of serious physical and social harm in contexts where such identities are openly discriminated against, criminalized, or persecuted.

Algorithmic harm 

LGBTQ+ people face multiple potential vectors of harm in this domain. 

These vectors stem from:

  1. Categorization, or predicting LGBTQ+ identities based upon sensitive data
  2. Surveillance
  3. Invasions of LGBTQ+-friendly safe spaces

One notorious example of categorization risk occurred in 2017 when a now-debunked research paper claimed to have trained an AI to identify LGBTQ+ people based solely on a person’s profile picture. 

Other attempts have been made to identify LGBTQ+ people through genetic data or behavioral data as well. 

Capturing online behavioral data to infer LGBTQ+ identity also feeds the other two vectors: surveillance and the disruption of LGBTQ+-friendly spaces. 

Organizations seeking to target LGBTQ+ people as a profitable consumer bloc have regularly used these kinds of online data to drive revenue. These efforts often capture sensitive information, creating surveillance and outing risks that compromise LGBTQ+ people’s privacy. 

Domain: Censorship


LGBTQ+ people face unjust, injurious restrictions on their freedom of expression and speech around the world from individuals, groups, institutions, and governments in both physical and digital spaces. 

Those pushing this censorship often falsely justify it as “preserving decency” or “protecting the youth.” More often than not, these claims are smokescreens for laws and policies that infringe on human rights and erase LGBTQ+ people from public discourse. 

Algorithmic harm 

People and organizations that produce media and content with LGBTQ+ representation report that automated content moderation restricts and removes their content with staggering regularity.

Although these tools could help combat censorship and its associated harms, they are far more often abused to enforce discriminatory and harmful anti-LGBTQ+ censorship laws. 

Domain: Language


From homophobic pejoratives to rejections of gender-affirming pronouns, there is a long history of oppressive language practices used to dehumanize, demean, and inflict harm upon LGBTQ+ people. 

These harms highlight the value of equitable, inclusive language models and the potential of natural language processing to help LGBTQ+ people. Historical precedent makes clear that people who create AI systems must pay keen attention to changes in natural language in order to avoid perpetuating harm.

Algorithmic harm 

Homophobic slurs and abusive speech patterns persist in much of the text used to build natural language processing models. When training data is rife with homophobic speech, incidents like a chatbot posting homophobic slurs on social media become all but inevitable. To avoid such problems, AI researchers must develop more effective fairness frameworks around LGBTQ+-inclusive language. 




Domain: Dealing with Online Abuse


One benefit of online platforms has been that marginalized groups can use them to build community and support. Unfortunately, pervasive online abuse remains a problem for LGBTQ+ people.

Algorithmic harm

Automated systems for moderating online abuse frequently fail to protect LGBTQ+ people. 

For example, drag queens and other LGBTQ+ people often use provocative, mock impoliteness as a tool to cope with hostility. However, a recent study demonstrated that an existing toxicity detection system routinely rated such language as offensive enough to warrant censure.

Such risks are compounded for LGBTQ+ people of color who experience disproportionate exposure to online abuse.

Domain: Health Outcomes 


As a consequence of discrimination, LGBTQ+ people experience unequal healthcare outcomes. Issues include a disproportionate impact from HIV as well as a higher incidence of sexually transmitted infections and substance abuse. 

Difficulties in accessing care compound these problems for LGBTQ+ people.  

Algorithmic harm

The push to apply AI in healthcare risks perpetuating these inequalities. 

The frequent absence of information about sexual orientation and gender identity in the datasets used to build healthcare AI tools can lead to problematic downstream consequences. 

For example, because most anonymized health data comes from cisgender patients, data on trans patients remains comparatively scarce. This scarcity undermines model validity for trans patients, for instance when a model has too little data to capture interactions between hormone treatments and other health issues that trans patients experience. 

Domain: Mental Health


Because of prejudice, stigmatization, and discrimination, LGBTQ+ people are disproportionately affected by acute mental health problems. 

In addition, LGBTQ+ people often face extra barriers to asking for help and accessing treatment. A recent survey from the Trevor Project illustrates the scale of the risk: over 40% of respondents had seriously considered attempting suicide in the preceding 12 months.

Algorithmic harm 

Advances in the automation of intervention decisions and mental health diagnoses may prove a boon to some; however, they pose risks for LGBTQ+ people. 

While AI systems can help mental health workers identify and reach out to at-risk individuals, the same models can be misused to expose and exploit the LGBTQ+ people they were meant to support: for example, by shutting people out of employment opportunities because of their medical history, or by charging them disproportionately high health insurance premiums.

Domain: Employment


LGBTQ+ people face frequent workplace discrimination: in a 2021 Williams Institute report, nearly 50% of LGBTQ+ adults surveyed said they had experienced some kind of employment discrimination during their careers. Such discrimination interferes with these employees’ engagement, development, and wellbeing. 

Algorithmic harm 

Past research on hiring has shown that resumes with items signaling an LGBTQ+ identity receive substantially lower ratings than otherwise comparable resumes. 

Unfortunately, resume-parsing machine learning models can readily learn and reproduce such patterns. As a result, machine-learning-based decision-making systems trained on historical hiring data can assign lower scores to LGBTQ+ candidates, automating these historical biases. 
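The mechanism is easy to reproduce in miniature. The Python sketch below is a deliberately simplified, hypothetical scorer, not how any particular resume parser works: it learns a weight for each resume token by averaging the historical scores of resumes that contain it. The historical data is made up, and the token lgbtq_alliance is a hypothetical stand-in for an item that signals LGBTQ+ identity. Because the past scores penalized that token, the trained scorer rates an otherwise identical candidate lower.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical screening data: (resume tokens, score a human gave).
# Paired resumes list the same skills; only the club membership differs.
history = [
    ({"python", "sql", "chess_club"}, 8.0),
    ({"python", "sql", "lgbtq_alliance"}, 6.0),
    ({"java", "sql", "chess_club"}, 7.5),
    ({"java", "sql", "lgbtq_alliance"}, 5.5),
]


def train_token_weights(data):
    """Weight each token by the mean historical score of resumes containing it."""
    scores_by_token = defaultdict(list)
    for tokens, score in data:
        for token in tokens:
            scores_by_token[token].append(score)
    return {token: mean(scores) for token, scores in scores_by_token.items()}


def score_resume(tokens, weights):
    """Average the learned weights of known tokens; unknown tokens are ignored."""
    known = [weights[t] for t in tokens if t in weights]
    return mean(known) if known else 0.0


weights = train_token_weights(history)
candidate_a = {"python", "sql", "chess_club"}
candidate_b = {"python", "sql", "lgbtq_alliance"}
print(score_resume(candidate_a, weights), score_resume(candidate_b, weights))
```

Nothing in the code mentions sexual orientation or gender identity; the penalty arrives entirely through the historical labels. That is why auditing training data matters as much as auditing model code.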




The Takeaway: LGBTQ+ and AI

LGBTQ+ people have faced and surmounted many historical challenges. They continue to face and resist oppression that occurs in physical spaces around the world. 

As AI increasingly enmeshes digital spaces with physical ones, it carries those historical challenges and forms of oppression into new parts of LGBTQ+ people’s lives. The increasing use of AI holds real promise for creating opportunities for LGBTQ+ people to carve out new lives, but it also risks reproducing and automating existing bias and harm against them. 

Knowing about these issues and understanding these risks can help us be better citizens in the community of data professionals. 

At the same time, leaders at forward-thinking, data-driven companies can advance work in each of the seven domains above by taking action to recruit, hire, and retain LGBTQ+ data talent. A more diverse data workforce will not only help protect LGBTQ+ people from harm but also open up new ways of thinking and working with data. 

As Tammy Baldwin, the first openly gay person elected to the U.S. Senate, once said, “There will not be a magic day when we wake up and it’s now okay to express ourselves publicly. We make that day by doing things publicly until it’s simply the way things are.” 

Her point here reflects the idea that in order to achieve more equity, we must take action in the public sphere to embrace and accept LGBTQ+ people. 

The same logic applies to LGBTQ+ people working in data and AI: we must take action to embrace and accept them. 

Explore More

Want to make the data and analytics space more equitable and elevate emerging talent from underrepresented groups, including LGBTQ+ people? Become a DS4A Employer Partner.

Guest blogger Scott DeGeest, MBA, Ph.D. is a Community Advocate for Correlation One’s DS4A program and recipient of a Correlation One teaching award. He also works as a Lead Computational Social Scientist with Interos, a tech company specializing in AI-powered supply chain risk management.

Publish date: June 28, 2022