Katie and Chris answer your marketing, data, and AI questions every Monday.
This week, Marion asks, “How do you scan the data coming out of ChatGPT so it is not skewed politically, religiously, racially, etc?”
Tune in every Monday to get your question answered!
Subscribe to our weekly newsletter to catch past episodes at trustinsights.ai/newsletter
Katie Robbert 0:00
Welcome back to another mailbag Monday where Chris and I are tackling all of your questions each week. So this week, Chris, what question are we handling?
Christopher Penn 0:09
We have an interesting one from the recent MarTech Coffee Talks, from Marion, asking: how do you plan on screening the data that comes out of a system like ChatGPT so that it is not skewed politically, religiously, racially, etc.?
Katie Robbert 0:22
Oh, man. That’s an excellent question, Chris. This is something I feel like you’ve spent a lot of time researching and thinking about, so where would Marion start with this?
Christopher Penn 0:39
The short answer is you can’t.
Katie Robbert 0:42
I was afraid that was gonna be the answer. And before I said it, I wanted to make sure that was really true.
Christopher Penn 0:47
The first place you need to start is to understand what biases exist in the model. OpenAI has published this in their technical documentation. They’ve said, for example, we know that European names are favored with higher positive sentiment than African American names, and that there are negative biases against African American female names within the database, as an example. But here’s the thing: the platform, the technology, is the wrong place to be looking at this. Yes, you want to make sure that the dataset that’s been used to train it is as free of bias as possible. But even in the technical paper for GPT-4, they said there are some unusual trade-offs: to get more fair answers, they had to accept increased bias. They tried to reduce bias to zero and started creating unfair answers, answers that were outright slanderous. So there’s something in the way that we use language as human beings that makes this a trade-off. So I want to go back to the five P’s. The model, GPT-4, Bard, whichever, is the platform, the fourth P. If you expect the model to do all this stuff and to not spit out things that are racially biased, you’re essentially saying you’ve given up on the third P, which is process. The model is just a piece of software, just like Microsoft Word. Microsoft Word will create whatever you type into it, but it’s the person doing the typing, then the process for requirements gathering up front, and then the process for how you handle the output, that is going to reduce or mitigate those biases. And you need to have QA processes in place to be able to do that. Because if you don’t, then yeah, you’re going to get what you get out of the model. But that’s kind of like throwing darts.
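As an illustration of the kind of QA process Chris describes, one possible check is a name swap: send the same prompt with only the name changed, score each response, and compare the averages per group. The Python sketch below is a minimal version under stated assumptions. `generate` and `score_sentiment` are hypothetical stand-ins for whatever model call and sentiment scorer a team uses; they are not part of any real library, and the tolerance is a value you would have to choose yourself.

```python
from statistics import mean
from typing import Callable, Dict, List


def name_swap_means(
    template: str,
    names_by_group: Dict[str, List[str]],
    generate: Callable[[str], str],           # hypothetical: prompt -> model response
    score_sentiment: Callable[[str], float],  # hypothetical: text -> sentiment score
) -> Dict[str, float]:
    """Return the mean sentiment score per group for one prompt template.

    The template must contain a {name} placeholder; every other word in the
    prompt stays identical, so any gap comes from the name alone.
    """
    group_means = {}
    for group, names in names_by_group.items():
        scores = [
            score_sentiment(generate(template.format(name=name)))
            for name in names
        ]
        group_means[group] = mean(scores)
    return group_means


def exceeds_tolerance(group_means: Dict[str, float], tolerance: float) -> bool:
    """True when the spread between the highest and lowest group mean is too large."""
    return max(group_means.values()) - min(group_means.values()) > tolerance
```

A check like this does not remove bias from the model. It only gives the process a place to catch a gap before the output reaches an audience.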
Katie Robbert 2:36
So in that instance, do you think that it’s better to build your own model with your own data versus relying on something like ChatGPT, where you can’t be sure what the inside of that black box looks like?
Christopher Penn 2:54
If you have the technical capabilities to do so, yes. But that’s not a small undertaking. And the hardest part of fine-tuning is actually system monitoring, to see what is happening. You need some kind of reinforcement feedback loop in place to be able to say, okay, this was an appropriate response, this is not an appropriate response. And doing the QA on the model is actually the hardest part, because you’re going to have it spit out 10,000 responses. And then guess what? You’ve got to go through all 10,000 and give each one a thumbs up or a thumbs down.
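The thumbs-up, thumbs-down review Chris describes can be recorded in a very simple structure. The Python sketch below is one minimal way to tally human reviews; the `Review` record and the `summarize` function are illustrative names made up for this example, not part of any fine-tuning toolkit.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Review:
    """One human judgment on one model response (illustrative structure)."""
    response_id: int
    approved: bool  # True = thumbs up, False = thumbs down
    note: str = ""  # optional reason, e.g. why the response was rejected


def summarize(reviews: Iterable[Review]) -> dict:
    """Tally reviews and list the rejected response IDs for follow-up."""
    reviews = list(reviews)
    rejected_ids = [r.response_id for r in reviews if not r.approved]
    total = len(reviews)
    approved = total - len(rejected_ids)
    return {
        "total": total,
        "approved": approved,
        "rejected_ids": rejected_ids,
        "approval_rate": approved / total if total else 0.0,
    }
```

The rejected IDs are the part that feeds the feedback loop: those are the responses someone has to examine to decide what the model should have said instead.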
Katie Robbert 3:30
Yeah. I think the conversation around the information that comes out of these kinds of models is only going to get bigger and more complicated. One of the famous examples that you cite, Chris, is the Amazon example, where they gave the Amazon data science models all of their historical hiring information to help find the right candidates among the people who were applying. And what they found out was that there was a strong gender bias in the model that they weren’t fully aware of. They may have been aware of it, or maybe thought the model wasn’t aware of it, whatever the situation. What ended up happening was that the model, using the historical hiring information, developed a bias to only bring in male candidates as opposed to female candidates. Amazon was able to identify this, but that’s because it was their data. Now, if Chris and I were to go into Amazon and go, oh, this looks like this, but we don’t know Amazon’s data that well, we’d be at a disadvantage. And so when you’re using systems like OpenAI, I don’t think they’re going out of their way to give you biased and unethical data. But that’s just sort of the reality of grabbing these datasets from all over the internet.
Christopher Penn 4:56
And, in particular, with language models, there are grades of bias, right? So I’ll give you a real simple example. If a model spit out something talking to me and said, “Hey, you [racial slur],” okay, clearly that is a racial slur. Clearly, that is not supposed to be there. That’s a flag, right? If a model said, “Hey, let’s double click on that,” that is a corporate term for a certain age group; for other age groups, it’s not in their lingo. So the model is speaking to people in a way that is targeted at one specific age group. That is a much more subtle form of bias, right? If you look at the way that people use language, there are differences in how different genders and different cultures use language. So is the model speaking in a way that is appropriate to the audience it’s speaking to? These are all things that have to be in that process part of the five P’s and defined clearly as requirements. Like, yes, if we are speaking to somebody who is of a certain age, the language should not be substantially different than when speaking to somebody from a different age group, right? If you’re, say, using it for a customer service application, it should not treat people differently, even if they are using language that is specific to them and their age group. My dad will say things like, “Hold your horses, Charlie Brown,” and I’m like, that is an expression that nobody under 60 says.
Katie Robbert 6:35
I mean, I say “hold your horses,” and I’m under 60. But I think, Chris, to your point around the process part of it, it’s not enough to say in your prompt, “speak to me like I’m a woman,” or “speak to me like I’m a man.” That is the absolute wrong way to approach cutting out the bias, because all you’re doing is introducing an additional layer of bias into the output that you’re likely to get. And so in the background section of that prompt engineering, which you can find on trustinsights.ai, in our insights, that’s where you want to include all of those things to take out. So for example, just a really terrible example: because I’m a woman, I could program this and say, every time I talk to ChatGPT, I want it to respond with “Hey, girlfriend,” or “hey, bitch.” That’s a terrible example. But if Chris were then like, “Why is it suddenly calling me girlfriend?” Actually, that’d be kind of funny. I think I need to figure out how to do this now. But you don’t want to introduce bias by saying, “I am a woman, speak to me like I’m a woman.” That’s the absolute backwards way to reduce bias in these kinds of datasets, or in general. Yes, but the bottom line is,
Christopher Penn 8:07
The bottom line here is, and I think Marion’s asking the right kinds of questions, you should always be asking the question: what could go wrong? What could go wrong as we’re deploying this thing? How could this thing go off the rails? And then, do we have sufficient protections in the process portion of our five P’s to reduce the likelihood and the impact when these things go wrong? I think it’s a great question and an interesting question. If you’ve got questions that you’d like to ask for Mailbag Monday, pop on over to our free Slack group. Go to trustinsights.ai/analyticsformarketers, where you and over 3,000 other marketers are asking questions like this every single day. If you want to catch up on previous episodes, go to trustinsights.ai/newsletter and subscribe there; you’ll get the updates every week. Thanks for tuning in. We’ll talk to you next time.
Transcribed by https://otter.ai
Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, INBOX INSIGHTS. Subscribe now for free; new issues every Wednesday!
Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new 10-minute or less episodes every week.