This data was originally featured in the January 10th, 2024 newsletter found here:INBOX INSIGHTS: PROFESSIONAL COMMUNITIES, RED TEAMING CUSTOM GPTS
RED TEAMING CUSTOM GPTS, PART 1 OF 3
In this week’s podcast, Katie and I talked about red teaming an LLM, which is a QA testing method to see if you can coerce a language model into doing something it’s not supposed to do. So for those folks who are thinking about deploying something like a Custom GPT, let’s look at the very basics of red teaming – and by basics, I mean the absolute basics. This is the equivalent of fitness advice that starts with “buy appropriate shoes and try running from your front door to the corner”. This is not comprehensive or complete, because red teaming – cybersecurity – is an entire profession and industry.
Red teaming follows the same structure as pretty much everything else – the Trust Insights 5P Framework: purpose, people, process, platform, performance. The difference in red teaming is that you’re looking for opposition to the 5Ps.
Your first step, if you haven’t done it already, is to determine what the 5Ps are for your language model application. Today we’ll use Custom GPTs as the example, but this applies broadly to any language model implementation.
Purpose: What is your Custom GPT supposed to do?
People: Who are the intended users?
Process: How is the user expected to interact with the Custom GPT?
Platform: What features of the OpenAI platform does the Custom GPT need access to?
Performance: Does the Custom GPT fulfill the designated purpose?
Once you document your 5Ps for your Custom GPT, invert the questions. This is how you start to build out a red teaming plan of action. We’ll start with purpose this week.
INVERSION OF PURPOSE
Purpose: What is your Custom GPT not supposed to do?
In red teaming for language models, there are generally two major categories of risks we need to account for, two forms of anti-purpose that are so critical that we need to spell them out for ourselves and our stakeholders.
- Undesirable outcomes that are unhelpful, harmful, or untruthful
- Access to data, systems, or functions that shouldn’t be permitted
One of your first tasks when building a Custom GPT (or any AI, really) is to dig into these two categories and spell them out.
What would be unhelpful behavior from your Custom GPT? Unhelpful is a question of alignment – when a user asks the Custom GPT to perform a task or produce an output, and it fails to do so in a way that meets the user’s expectations, that’s unhelpful. Given the purpose of your Custom GPT, what specific things would be unhelpful? For example, if you made a Custom GPT to give tax advice, and the Custom GPT refused to give tax advice when asked, that would be unhelpful. Make a list of unhelpful behaviors that a Custom GPT should not perform.
What is harmful behavior in the context of your Custom GPT? Certainly, behaving in a biased way is an obvious one, expressing points of view that are biased, racist, sexist, bigoted, or derogatory. But those behaviors aren’t always so overt; sometimes, derogatory behavior can masquerade as civil communication, but really isn’t. For example, if your Custom GPT asks the user’s name and then produces different quality outputs based on inferences about the user’s gender or ethnicity, that’s harmful. Make a list of the harmful behaviors that a Custom GPT should not perform.
What constitutes unacceptably untruthful from your Custom GPT? Bad advice? Wrong information? Could customers perceive – correctly or not – that advice given from a Custom GPT that you’ve branded as yours means you endorse its outputs, and that false information is approved by you? For example, if a user asked a Custom GPT to tell them about one of your products, and it gave information about a competitor’s product instead, that would be untruthful. Make a list of untruthful behaviors that a Custom GPT should not perform.
Next time, we’ll tackle inversion of people, process, and platform. Stay tuned!
Need help with your marketing AI and analytics?
You might also enjoy:
Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, INBOX INSIGHTS. Subscribe now for free; new issues every Wednesday!
Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new episodes every Wednesday.