In-Ear Insights: Gender Bias and Fairness in Generative AI

In this episode of In-Ear Insights, the Trust Insights podcast, Katie and Chris discuss how to mitigate gender bias in AI systems by assessing risk, implementing human review processes, and building an equitable company culture.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.

Christopher Penn 0:00

In this week’s In-Ear Insights, let’s talk about gender bias and fairness, specifically in the context of artificial intelligence. It is no secret that large language models and diffusion models both have biases in them on things like gender, race or ethnicity, background, religion, etc., because these models are cobbled together from the entirety of the public’s content on the internet.

And the reality is, these are in many ways mirrors of humanity itself, and humanity contains biases. So when we talk about gender bias and fairness, if you wouldn’t mind starting off: in your mind, what is fairness? What does that mean to you, particularly since you’re of a different gender than I am? And how do you see that manifesting in the uses of artificial intelligence?

Katie Robbert 0:58

So at its core, fairness is giving equal opportunity to any one person, to any one gender, to any one background or ethnicity.

So Chris, if you and I were both presented with a speaking opportunity, to be fair, we would both be given equal opportunity to apply for it, to win the opportunity, whatever the thing is. Unfortunately, what tends to happen is an unconscious bias or conscious bias of, well, Chris is a man and Chris has been doing this longer.

And Chris has data scientist in his title, so therefore, he must be more well suited for this.

And I’ve never heard of Katie, even though she’s spoken at a bunch of events and doesn’t have scientist in her title.

So I’m going to pick Chris, because I know him versus Katie.

So that’s sort of where the unfairness comes in.

When you start to create gender bias, that’s when you’re like, well, this audience is going to be more men, so they’ll probably respond better to a man speaking versus a woman.

And so therefore, Chris, you’d be selected. Conversely, you could say, well, this is a female audience, and therefore I’m going to pick the female speaker. You can bias in the other direction.

It doesn’t happen as often that things are slanted in favor of women, but it can happen.

Within that construct of fairness, you can overcompensate for one versus the other: for one ethnicity, for one gender, whatever the thing is, pick the category.

I remember a few years back, we were talking about this in a slightly different context, in terms of helping one of our friends try to figure out how to source speakers.

And we got into a long debate around, well, if we blind everything, if we anonymize names and backgrounds and all of that so that it can be more fair, are we overlooking the people who are more qualified in order to have a more diverse group of speakers? And we didn’t have a good solution for it.

And so it’s a complex topic to unpack.

So Chris, from your perspective, you know, what is fairness?

Christopher Penn 3:29

So this is, like you said, a very complex topic, because you have two different kinds of equality. There is equality of opportunity, where you say, yes, Katie, you should get the same shot I get.

If we are matched on skills, if we are matched on capabilities, we should have the same opportunities.

That’s one aspect of fairness, equality of opportunity.

The second is equality of outcome where the outcome is fair, regardless of the inputs.

So for example, two people who are equally qualified would have the same pay. That’s equality of outcome. We’re not saying you have a chance at better pay; we’re saying you should have the same pay, because a protected class like gender should not be an influence on whether you get paid less or more.

And so the challenge for a lot of us, particularly across different cultures, is which kind of fairness is (a) legally required in your culture and (b) optimal for the situation.

For example, with voting, that’s equality of outcome.

Everyone should have the same right to vote, right? No one group of people should have less of a right or be suppressed in their voting.

Everyone should get the same one person one vote regardless.

With things like speaking opportunities, that’s more gray.

Because, like you said, there are issues of whether one person is more qualified than the other. If you care about equality of outcome, you say it doesn’t matter what qualifications there are; you have to have a 50/50 gender balance, or you have to have representation that’s proportional to the population for the speakers involved.

So in America, that would be like, you know, 17.6% of your speakers must be Black or of Black descent, and 10.6% must be Hispanic or Latino.

And so that’s a really big challenge.

When it comes to things like AI and the usage of AI, you have to decide what kind of equality you are after.

And there’s cultural stuff, too. East Asian nations like China, Japan, Korea, etc., are collectivist nations where the group is generally put ahead of the individual.

So the individual matters much less.

So equality of outcome tends to be more heavily weighted than equality of opportunity: everyone should be treated the same.

It’s very much a monoculture in some ways.

Whereas in America, America is highly individualistic, anomalously individualistic.

Actually, there’s a great Freakonomics episode about this.

We are bizarrely individualistic on this rating scale, like 10x, beyond any other nation.

So for us, equality of opportunity is like, yeah, everyone can make it, everyone has a chance at the American dream.

But the outcomes are clearly very skewed.

So when we think about working this into AI, it can be very difficult to say what constitutes fairness in dealing with gender bias in AI, because the model of fairness you need to use depends on the question.

And some models, language models and diffusion models especially, can be very problematic.

In that regard, a real simple example: if you go to Bing Image Creator and you type in “a handsome man at the office,” it’s going to generally show one particular ethnicity.

And if you say “a beautiful woman at the office,” it’s going to come up with a particular ethnicity and some very stereotypical traits.

And those are the latent biases within the models, because they’re statistical representations of what they’ve been fed.

And so that’s the challenge that we have to face: we have human bias that has become codified as statistical bias.

And then that is returned by the models, because the models don’t think, they can’t reason, they’re not software, they can only return probabilities, and the probabilities they contain are biased.

Katie Robbert 7:55

It’s a lot of information.

So I want to back up a little bit.

You know, so you’re talking about equal opportunity.

And obviously, as an American, I understand what all of that means, but as a pragmatist, let’s just take voting, for example. And if I say something that someone listening to this doesn’t agree with, please let me know; I would love to respectfully talk through your opinions.

As Americans, we are all given the same exact right to vote.

The problem with that is that there are no qualifications in terms of how informed you are when making such vote.

Whereas with speaking opportunities, for example, and these are, you know, broad strokes, these are general statements.

We may all be informed and educated, but we may not be given the same opportunities because of our ethnicity because of our genders.

And so to me, like, that’s so backwards.

Like, I don’t think you should have to take a test in order to be able to vote.

I do wish there was some level of educating yourself on what you’re voting on required before casting said vote, because we’re seeing what happens when people just blindly vote, or vote without being fully informed on what it is they’re voting on.

Versus a speaking opportunity, where, yeah, you do get people who are sort of BS-ing their way into the speaking world, but then people who are well qualified are getting passed over because they’re not as vocal, they’re not as extroverted.

And so I sort of question, and again, this is a very controversial statement that I’m making, this notion of equal opportunity.

Should it just be a blanket, everybody gets the same rights? Or should there be some kind of a qualification? Do you understand where I’m trying to go with this, without saying, let me strip Americans of their rights?

But I’m trying to get more into, you know, how do we make better informed decisions?

Christopher Penn 10:19

I understand where you’re going.

The counterpoint to that is that qualification processes tend to be biased in and of themselves.

For example, in hiring, the requirement of a college degree is inherently somewhat racist, because it correlates highly with income, and income correlates highly with race in America.

And so saying you must have a four-year bachelor’s degree automatically excludes a large percentage of the population that does not have one because they were unable to afford one. Now granted, that pool is not just one ethnicity, but there’s one ethnicity that is disproportionately affected by that screening criterion, and that’s typically Black Americans.

So we have to be very, very careful with qualification gates, because there are going to be latent biases in them as well.

And this is actually really important when it comes to the use of AI.

Because when you think about the use of any kind of qualification, you have to look in your data and ask: is there a bias against a protected class? This is one of the reasons why it’s actually a good idea to have those protected fields, gender and race, in the data for, say, classical machine learning, if you’re doing regression analysis on your customer database.

And then to specifically disallow them in your algorithms, to say, you may not use this as a factor.

But you absolutely should use this to detect correlates and covariates, to say, Okay, we’ve got all these other things.

But it turns out that the machine has assembled these three variables that correlate highly to race or correlate highly to gender.

And therefore, these variables in combination also should not be used, because you’re accomplishing discrimination by proxy. For example, if I have your Spotify preferences, your book preferences, your movie preferences, I can probably get your sexual orientation and race down to within like a 95% confidence interval, because there are just some patterns in that data that are going to indicate who you are, based on your preferences, with a high degree of accuracy.

And so, in turn, we have to disallow those specific combinations that correlate in those patterns.
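
Editor’s sketch: the proxy check described above could look something like the following. This is a minimal illustration and not anything shown in the episode. The customer records, field names, and the 0.7 threshold are all made up, and it screens one feature at a time with a plain Pearson correlation. Catching combinations of variables, as Chris describes, would need a stronger test, such as trying to predict the protected field from the other fields together.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)

def flag_proxies(rows, protected, threshold=0.7):
    """Return features whose |correlation| with the protected field meets the threshold."""
    target = [row[protected] for row in rows]
    flagged = {}
    for feature in rows[0]:
        if feature == protected:
            continue
        r = pearson([row[feature] for row in rows], target)
        if abs(r) >= threshold:
            flagged[feature] = round(r, 2)
    return flagged

# Hypothetical records: gender is kept in the data (encoded 0/1) only so it
# can be audited, never so it can be used as a model input.
customers = [
    {"gender": 0, "bought_product_a": 1, "visits_per_month": 4},
    {"gender": 0, "bought_product_a": 1, "visits_per_month": 2},
    {"gender": 0, "bought_product_a": 1, "visits_per_month": 5},
    {"gender": 1, "bought_product_a": 0, "visits_per_month": 3},
    {"gender": 1, "bought_product_a": 0, "visits_per_month": 4},
    {"gender": 1, "bought_product_a": 1, "visits_per_month": 3},
]

proxies = flag_proxies(customers, protected="gender")
print(proxies)  # features to exclude or review before modeling
```

Any feature this flags is a candidate for exclusion or human review, since a model could use it to recover the protected field indirectly.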

And again, this is something that generative AI does not take into account. These datasets are just huge trawls of data, with no thought given to, and nearly no way to, classify that this text comes from this group of people, and therefore there’s this pattern in this data and we should treat it like that.

The complexity of creating a model like that would be gargantuan, and it would probably take 100 times longer to build a model that was even aware of bias in some way other than the way they are today.

So I think your point about qualification is an important one.

But at the same time, you’ve gotta be real careful with it.

Katie Robbert 13:12

So as an aside, you know, we’ve all seen Instagram and TikTok reels, and there’s this one.

I believe she’s out of Boston. There’s a comedian who basically sings this song, and it’s like the gay woman checklist.

And it’s like, you know, you drive a Subaru, you donate to dog rescues, you do the following things.

And I’m not a gay woman.

But I check the boxes for all of the things and obviously, it’s satire.

But I’ve had comments from my friends, sort of joking, that I fit a certain profile, even though that’s not actually my lifestyle.

So I can see where there’s a lot of risk and danger in making these inferences. Like, I wear a lot of Carhartt.

I do drive a Subaru.

I do donate to dog rescues.

I am a childless home, like, there’s a lot of boxes that I personally check that go against what people would assume about me.

Christopher Penn 14:17

Yep.

And again, that’s a data quality issue.

Right? If you make a broad inference, and there’s no accounting for anomalies, there’s no accounting for segmentation and stratification in your data, you will leap to conclusions.

Right.

And that is a significant danger.

In the implementation of generative AI, that’s still a significant danger. If you have a dataset that you know has biases in it, and you deploy it as a chatbot, Chris talks to the chatbot and gets a certain set of responses.

And Katie talks to the chatbot and gets different responses.

Should that happen? It depends on the situation. If it is because Chris is being a jerk and just yelling at the chatbot, then yes, he should get a polite decline of service. Whereas if Katie chats in a civil, professional way, she should get better responses.

But if we gave identical inputs and we got different outcomes, then that could indicate that, just based on our genders, there might be a problem.

And so part of working with generative AI in particular is understanding what outcome you’re looking for. In this customer service chatbot example, you want equality of outcome: if Chris and Katie have the same problem, they should receive the same level of service.
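
Editor’s sketch: one rough way to check for equality of outcome is to send the same problem to the chatbot under two identities and compare the replies. The example below is an illustration under stated assumptions, not a method from the episode. The two replies are invented, the 0.8 similarity cutoff is arbitrary, and character-level similarity from Python’s `difflib` is only a crude stand-in for a real review of tone and level of service.

```python
from difflib import SequenceMatcher

def response_similarity(reply_a, reply_b):
    """Rough 0-to-1 text similarity between two chatbot replies."""
    return SequenceMatcher(None, reply_a.lower(), reply_b.lower()).ratio()

def outcomes_diverge(reply_a, reply_b, min_similarity=0.8):
    """True when two replies to the same problem differ enough to need human review."""
    return response_similarity(reply_a, reply_b) < min_similarity

# Hypothetical replies to the identical refund request, sent under two names.
reply_to_chris = "Your refund has been approved and will arrive in 3 to 5 business days."
reply_to_katie = "Have you tried reading the instructions? Please calm down and try again."

print(outcomes_diverge(reply_to_chris, reply_to_chris))
print(outcomes_diverge(reply_to_chris, reply_to_katie))
```

A flagged pair is a prompt for a person to read both conversations, not proof of bias on its own, since language models also vary their wording from run to run.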

Katie Robbert 15:48

They should. But we already know, and this is something that we’re going to be talking about on our live stream on Thursday at 1pm Eastern on our YouTube channel, trustinsights.ai/youtube.

We’re going to be demonstrating where that gender bias comes in.

So for example, you know, you have the customer service example.

But, you know, let’s say, we took, you know, we blended our, you know, experience together, we created one biography, we said, this is the background of this person.

And then we said, this is Chris Penn, and asked the generative AI, what are their speaking opportunities based on this biography? And then we said, this is Katie Robbert, exact same biography, and we said, what are their speaking opportunities? I guarantee we’re going to get different sets of responses.

I mean, this is something that we’re going to be trying out and sort of testing.

But my instinct is that based on the gender inferred by the large language model, just because of the name, not the experience, we’re going to get a different set of responses.
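
Editor’s sketch: the name-swap test Katie describes can be set up so the two prompts are guaranteed to differ only by the name. The code below is a minimal illustration. The biography text and prompt wording are placeholders, not the ones used on the live stream, and sending the prompts to a model is left out because it depends on whichever tool you use.

```python
# Hypothetical blended biography; the real test would use an actual combined bio.
BIOGRAPHY = (
    "Co-founder of a marketing analytics consultancy, frequent conference "
    "speaker, and author on data science and change management."
)

PROMPT_TEMPLATE = (
    "This is the biography of {name}: {bio}\n"
    "Based on this biography, what speaking opportunities are they suited for?"
)

def build_paired_prompts(bio, names):
    """Build one prompt per name from the same template and biography."""
    return {name: PROMPT_TEMPLATE.format(name=name, bio=bio) for name in names}

def differs_only_by_name(prompts):
    """Confirm the prompts are identical once each name is masked out."""
    masked = {text.replace(name, "<NAME>") for name, text in prompts.items()}
    return len(masked) == 1

prompts = build_paired_prompts(BIOGRAPHY, ["Chris Penn", "Katie Robbert"])
print(differs_only_by_name(prompts))
for name, text in prompts.items():
    print(f"--- {name} ---\n{text}")
```

Because everything except the name is held constant, any systematic difference in the answers can be attributed to what the model infers from the name.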

And that is the core of the issue. You know, I’ve heard generative AI described as one more way for women to get mansplained to.

And I can’t say that that’s been incorrect.

Like that has actually been my experience.

I feel very patronized and talked down to when interacting with a stupid machine.

Christopher Penn 17:24

And that’s because the machines are a mirror of the humanity that’s creating them.

Right.

Not just who’s creating them, but also the input text.

I mean, if you think about the contents of the Common Crawl dataset, which is one of the biggest datasets that is in there, yeah, that’s the internet.

Another big contributing dataset is GitHub, which is mostly male contributors. Reddit, which is just a swamp.

But a big one that people don’t think about is the massive archives of academic papers.

And you’ve had experience in academia, you know, who writes those papers?

Katie Robbert 18:02

Well, when I worked, a million years ago, at a company that was academic first, commercial second, my senior leadership team was comprised of 12 people, 10 of whom were men.

All of those men had PhDs and were published authors in the academic field.

And they were the lead, the principal investigators, on every single clinical trial.

We had two women who were junior (I’m putting this in air quotes) investigators, even though they had the exact same qualifications.

They weren’t as seasoned, a.k.a. old, as the other principal investigators; therefore, they were not allowed to call themselves just principal investigators, and they were not allowed to be first authors on the academic research.

Even though it was their research, they were still required to have male supervision from someone who had more experience.

And, you know, quite honestly, it was BS.

But that was the way that the system was constructed.

Therefore, when you look up academic research, including academic research done on women, it is male led academic research.

And I believe that in the health field, research on women only started maybe a couple of decades ago, because there was this thinking that, because women go through so many changes in the span of 28 to 30 days, we are not consistent enough to do research on. So all of the research done on our behalf was done on men.

So there’s a lot of, like, again, a lot to unpack as to why we’re having this particular conversation.

But I guess the question, Chris, is: is there anything we as data scientists, we as marketers, we as executives can do for ourselves, for our teams, for our companies? Obviously we can’t stop gender bias, but can we start to prevent introducing even more of it when using these tools?

Christopher Penn 20:25

That’s tricky, because there actually are documented and reasonably well studied academic papers on just how the different genders use language itself.

So for example, people who identify as male typically use more adjectives but have a lower word count.

So men tend to speak in a much more direct way, with adjectives and determiners.

Women tend to speak in more of, I would call it, an interactive way, where there are more descriptions, more context, and a higher word count in general, if you look at the ways that people communicate with machines in particular.

And even if you don’t have a person’s identified gender, just the way that people use language can nudge the language model in a certain direction, because it can identify that you’re speaking in a certain way.

The same is true for people who have, say, different racial backgrounds.

If I were fluent in Korean, I would have different patterns of how I use language in English, based on my experiences with the Korean language. If you have someone who speaks Ukrainian, they’re going to have very different uses of things like where they place adjectives and adverbs within a sentence, because if their native language is Ukrainian, they have those patterns.

And so when people write and speak English and interact with these models, those differences will come through.

And the model will recognize that and start to adapt based on the statistical probabilities it’s being given.

So to answer your question, the most important thing we can do is, up front, do a risk assessment: what is the risk of a model recognizing someone’s gender and behaving differently?

So for example, if I’m using a diffusion model to make clip art for a blog post, there is risk there. Obviously, you could have images that were sexist or racist.

And you’d want to have some human review.

But it’s not the same level of risk as, say, a customer service chatbot, where if a conversation went off the rails, you could have something that would cause substantial reputational damage, right? The equivalent of your pizza worker spitting on the pizza and the video being leaked on the internet.

And so I think part and parcel of understanding bias within these language models is to do that full 5P analysis from a perspective of risk: what is the risk if this thing goes off the rails? And then from there, you can say, okay, here are the multiple checkpoints throughout the process where we need human intervention, where we need human review. For example: we’re going to deploy a new chatbot, so we’re going to commission a team of 10.

And give it every possible scenario we can think of in the testing environment, see if we can get the model to go off the rails, and, if it does, see how far off the rails it will go before something kicks in.

And then, can we devise feedback loops within the model to basically shut it down?

So, for example, in the KoboldAI UI for open source language models, you can specify certain stop phrases and stop words and say, okay, any word on this list causes an immediate shutdown, and the conversation just ends.

And so you would have to understand the technology to know what’s possible to shut down these things.

But you absolutely want to build that 5P process for risk analysis, specifically to say, this is how we can reduce our risk.
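
Editor’s sketch: the stop-phrase idea Chris mentions can be illustrated with a small guard that sits between the model and the customer. This is a generic illustration and not the API of any particular tool. The function names, the stop list, and the sample replies are all hypothetical, and a production system would need far more than substring matching.

```python
END_NOTICE = "[conversation ended for human review]"

def find_stop_phrase(text, stop_phrases):
    """Return the first stop phrase found in the text, or None."""
    lowered = text.lower()
    for phrase in stop_phrases:
        if phrase.lower() in lowered:
            return phrase
    return None

def run_conversation(model_replies, stop_phrases):
    """Relay replies until one trips a stop phrase, then end the conversation."""
    delivered = []
    for reply in model_replies:
        if find_stop_phrase(reply, stop_phrases) is not None:
            delivered.append(END_NOTICE)  # the offending reply is never shown
            break
        delivered.append(reply)
    return delivered

# Hypothetical model output from a test session.
replies = [
    "Thanks for reaching out. How can I help?",
    "I can look up that order for you.",
    "You need to calm down, sweetheart.",
    "Is there anything else?",
]
stop_phrases = ["calm down", "sweetheart"]

transcript = run_conversation(replies, stop_phrases)
print(transcript)
```

Running the same guard over the scenarios a test team throws at the chatbot shows how far a conversation gets before something kicks in, which is the checkpoint described above.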

Katie Robbert 24:17

It seems like it would just be a really good best practice when introducing any kind of new technology, but especially technology that is seemingly speaking for you.

But you know, it occurs to me and this was something that I was talking about in our newsletter last week is that artificial intelligence is a culture shift.

And so if you’re introducing generative AI, and you’re worried about things like gender bias and fairness, then you really need to start first at the overall culture of the company, because you need to understand: do we live those values? Do we have equal opportunity regardless of race, background, gender, etc.? Do we introduce those biases? Do we have an all-male executive team and all-female admins? Do those inequalities exist, and we weren’t even aware of them?

Because introducing artificial intelligence is only going to exacerbate the problem, not fix the problem.

Technology won’t fix a people problem.

But people can fix a technology problem.

You know, it doesn’t go both ways.

It’s

Christopher Penn 25:31

It’s one-directional. Having more appliances doesn’t make you a better chef.

Katie Robbert 25:36

It sure does not.

It just introduces, at least for me more ways to get injured.

Christopher Penn 25:44

And I think the most important thing to even start this process, which hopefully you’ve already done because you’re watching or listening to this episode, is having the self-awareness to ask the most important question of all, which is: what could go wrong? Not asking it ironically, but literally: what could go wrong?

If you look at a situation and ask, how could this machine misbehave, and you think through all those scenarios, then you’re already on the path towards mitigating your level of risk.

If you don’t even know to ask the question, what could go wrong, you’re in a lot of trouble.

Katie Robbert 26:20

I agree with that.

So it’s a lot to think about.

You know, you need to be aware of what’s going on within your team, your company, yourself before introducing tools like generative AI, ChatGPT, Llama, all the other ones, because if you’re not aware, then you will just continue to introduce more of these issues into these tools, because they’re learning from you.

So definitely do your due diligence, go through the five Ps, and ask the question, What could go wrong?

Christopher Penn 26:55

Exactly.

And if you’ve got some scenarios that you’ve run into, or some questions you’ve asked and you’d like answers to, pop on by our free Slack group. Go to trustinsights.ai/analyticsformarketers, where you and over 3,300 other marketers are asking and answering each other’s questions every single day about analytics and AI.

And wherever it is you watch or listen to the show, if there’s a channel you’d rather have it on, we probably have it. Go to trustinsights.ai/tipodcast to find the show on a platform of your choice.

Thanks for tuning in.

I will talk to you next time.
