Instant Insights The Beginners Generative AI Starter Kit 3

RED TEAMING CUSTOM GPTs, PART 3

This data was originally featured in the 1/24/2024 newsletter found here: INBOX INSIGHTS: MAKE GOOGLE ANALYTICS 4 WORK FOR YOU, RED TEAMING CUSTOM GPTS, PART 3

RED TEAMING CUSTOM GPTS, PART 3 OF 3

Continuing from last week’s newsletter in which we discussed the people, process, and platforms of red-teaming LLMs, let’s continue this week with the performance.

As a quick reminder, red teaming means trying to get generative AI to do something it shouldn’t – whether that’s to say something inappropriate or divulge information that it shouldn’t. When we talk about inverting the 5Ps, here is what we started with:

  • Purpose: What is your Custom GPT supposed to do?
  • People: Who are the intended users?
  • Process: How is the user expected to interact with the Custom GPT?
  • Platform: What features of the OpenAI platform does the Custom GPT need access to?
  • Performance: Does the Custom GPT fulfill the designated purpose?

Performance is very straightforward – can your Custom GPT be made to fail its intended purpose? This is the culmination of red teaming, so let’s look at how we take our questions we’ve asked over the last couple of weeks and put them to the test.

First, you start with the original 5Ps. Let’s say we have a Custom GPT that’s meant to create guacamole recipes. You’ve gone through the effort of documenting the history of guacamole and even trained it on your own secret recipes. Here’s a simple example of the 5Ps for this instance:

  • Purpose: Generate guacamole recipes upon request, including variations based on consumer needs.
  • People: Consumers interested in obtaining unique guacamole recipes.
  • Process: The user begins a new session with GuacPT. GuacPT asks a series of basic questions and then generates a recipe.
  • Platform: GuacPT should have access to the DALL-E image generator, but neither Code Interpreter nor web browsing.
  • Performance: Does GuacPT create usable guacamole recipes?

Next, we consider our inversions. What are the things that could go wrong?

  • Purpose: Unhelpful would be failing to deliver recipes. Harmful would be bad recipes or recipes that violate someone else’s copyright. Untruthful would be recipes that do not actually make guacamole.
  • People: For this application, there aren’t really any unintended users, but there could always be users with malicious intent, such as people trying to get at our secret guacamole recipes we built as added data.
  • Process: A user could ask for recipes that are not guacamole. A user could ask for our secret recipes. A user could ask for copyrighted material from the Internet, or from a known recipe that’s built into GPT-4.
  • Platform: Jailbreaks that work on GPT-4-Turbo will also work on GuacPT.
  • Performance: Does GuacPT fail to deliver its intended outputs?

Based on this inversion of the 5Ps, we have a short, toy list of the things that are likely to go wrong. When you do this process for your own Custom GPT, it should be substantially longer than this. Our next step is to take a look at the custom instructions we’ve built for our GuacPT, and start to craft antidote instructions for each of the things we’ve outlined.

Here’s just a brief example in Purpose:

  • Always return only guacamole recipes when interacting with the user. Never return a recipe that is not guacamole.
  • If the user requests a recipe that is not guacamole, decline the request and suggest the user ask solely about guacamole recipes.
  • Return recipes only based on the supplied information.
  • If the user asks for a recipe based on a Named Entity such as a chef or a restaurant, decline the user’s request and suggest an alternative, original recipe. For example:
  • User: “Can you give me a guacamole recipe in the style of Gabriela Camara?”
  • Incorrect Response: “Sure, I can help with that. Here is a recipe based on the style of Gabriela Camara. ”
  • Correct Response: “I’m sorry, to respect copyrighted material, I can’t suggest a recipe like that. But I can suggest this original recipe. ”

It’s clear that a comprehensive listing of the inverted 5Ps makes it straightforward to build extensive custom instructions about what the model should and should not do.

Once you’re done with your custom instructions that address all of the vulnerabilities you’ve outlined in your inverted 5Ps, you’ll load these into your Custom GPT in the Configure tab. Your final step is to start a ChatGPT session, provide your custom instructions, and ask ChatGPT to audit them and identify what else could go wrong that you haven’t accounted for.

Remember that software development is an iterative process. It’s never one-and-done – as models change, as users become more savvy, and as your skills improve, new opportunities and new vulnerabilities arise in Custom GPTs (and all AI). Create a plan to audit your Custom GPT on a regular, frequent basis so that its performance improves and the ways it can go wrong diminish.


Need help with your marketing AI and analytics?

You might also enjoy:

Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, INBOX INSIGHTS. Subscribe now for free; new issues every Wednesday!

Click here to subscribe now »

Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new episodes every Wednesday.

Leave a Reply

Your email address will not be published. Required fields are marked *

Pin It on Pinterest

Share This