
So What? Pre-requisites for Large Language Model AI Fine-Tuning

So What? Marketing Analytics and Insights Live

airs every Thursday at 1 pm EST.

You can watch on YouTube Live. Be sure to subscribe and follow so you never miss an episode!

In this week’s episode of So What? we focus on prerequisites for large language model AI fine-tuning. We walk through what kinds of data you’ll need, what kind of data governance you’ll need in place, and how to generate the data necessary for fine-tuning. Catch the replay here:

So What? Pre-requisites for Large Language Model AI Fine-Tuning

 

In this episode you’ll learn: 

  • What kinds of data you’ll need
  • What kind of data governance you’ll need in place
  • How to generate the data necessary for fine-tuning

Upcoming Episodes:

  • TBD

 

Have a question or topic you’d like to see us cover? Reach out here: https://www.trustinsights.ai/resources/so-what-the-marketing-analytics-and-insights-show/

AI-Generated Transcript:

Katie Robbert 0:27
Well, hey, how are you, everyone? Happy Thursday. Welcome to So What? The Marketing Analytics and Insights Live show. I’m Katie, joined by Chris and John, and John is, believe it or not, not sitting in front of a fake background. John, that is a real background, not a Zoom background. Where the heck are you today?

John Wall 0:45
This is the winner of the most annoying Zoom background contest that we had last week. I am at Framework, the virtual workspace in Pittsfield, Massachusetts. It’s actually part of the Berkshires; here you can see “the Berkshires make tables turn,” which I always thought was what the bad guy does. I thought the bad guy always turns the tables on you. But what do I know? I have a ridiculous background. Here I am.

Katie Robbert 1:09
Well, we’re happy to have you. All three of us are in one place for the first time in a few weeks, which, again, will change again next week. But while we’re here today, we are talking about prerequisites for large language model AI fine-tuning, which is super easy and rolls right off the tongue. But essentially, what we’re doing today is looking at a large language model like Llama 2 and tuning it to specific use cases and needs. It’s an interesting topic to be covering, because once again, there are a lot of things that go into the prerequisites upfront, before you even get into the model. And I think a lot of the questions, Chris, that you’re getting, especially as you’re heading out to events and speaking about large language models, are: okay, how do I build it? What do I do? How do I make sure that it works exactly for me and what I need? So let’s start to get into that. Where would you like to start today?

Christopher Penn 2:08
Okay, let’s start with let’s start with the the strategic approaches, the general approaches, first, talk about the process, and then we can demo a little bit of it. But obviously, because some of these training processes can take days, we’re not going to do it on the show, it’s gonna be like a cooking show that guy. Uh huh. And the turkey has fully cooked

Katie Robbert 2:27
Magic.

Christopher Penn 2:29
So when we’re talking about fine tuning, we are specifically talking about taking a large language model of your choice. And saying, I want you to work with my data, I want you to, I want you to give answers that are relevant to me. So for example, if you were, if you were a company, and you had, say, a really big HR handbook online, and you wanted a language model that could answer employee questions, that’s a really good example of when you would want to find a some kind of fine tuned model where you could say, this model is designed now it’s gonna be it’s gonna, we’re gonna force it to statistically focus on our employee handbook, right? And our benefits and all this stuff. So the model will, it’s kind of like dogs, right? If you train a bomb sniffing dog, it can’t become a drug sniffing dog, cuz it’s brains. Now we tune these models are exactly the same thing. When you say, Well, this is going to be the HR model. Now. It can’t tell jokes anymore. It can’t right limericks, it won’t, you know, won’t write a screenplay for your next big thing. But it will answer those questions really well, because you have forced it to be focused and give extra weight to your stuff. So that’s conceptually what we’re trying to do.

Katie Robbert 3:41
Okay, and so I’m, I’m an executive, I come to you and I say, Chris, my team needs a large language model. I don’t want to use what ChatGPT has publicly. I want to build my own. That sounds exactly like me, I want, you know, the Katie bought, you know, 900, so that I don’t ever have to sit down and write a blog post ever again. That’s what we’re talking about. Right?

Christopher Penn 4:12
Exactly. We’re talking about getting a model to be to be purpose built for a very specific use case. Okay.

Katie Robbert 4:19
So let’s say this, let’s start with the best part of a use case, which is the user story. The as a persona, I want to so that it’s a three part sentence that tells you a lot about what it is you need to know. So the persona is the people that’s your audience the want to is your intent, that is your process and your platform. And the so that is your purpose and your performance, how you’re measuring, did you do the thing? So this use case would be as a CEO, I want to find to a large language model to my voice specifically, so that I can expedite writing content from my brand.

Christopher Penn 5:10
That would be an excellent start, something like that. And then you probably do want to document the five P’s, right? You do want to spend some time saying, okay, clearly you want to have blogging software. Who’s going to administer the model? Because there is model administration; there’s work that needs to be done to make these things work. There is…

Katie Robbert 5:34
This guy over here. John’s gonna administer it.

John Wall 5:38
Exactly. Excellent.

Christopher Penn 5:40
There is the question of general process: how frequently are things going to change? If you are in HR, your employee handbook probably isn’t going to change a ton from year to year. There may be updates, but it’s not like, say, a Trust Insights client, where we might have new meeting notes from every single client meeting every week, and that database would change a lot. So you want to have an understanding of what the process would be to do that. The big question is platform: what are you going to use? How are you going to use it? There are so many different resources for this, so many different ways of managing it, and we’ll talk about some of the technology. And then, of course, the performance: does the thing actually work or not? With language model tuning, that is not always immediately obvious. So one of the things you want to have is a bench test. Say we’re going to make this the HR bot; you should have a set of questions, like how much PTO is an employee allowed, questions where you know the answers backwards and forwards, so that you can test out the model and say, okay, it just hallucinated and said employees are not permitted any time off and might not be allowed to leave the office, so clearly this bot is not working quite correctly. Or the employees are allowed 365 days of PTO; I think we might have mis-tuned that. So those are the five big things. Now, let’s talk about the two approaches to tuning a model, because there are two very different approaches that you need a conceptual understanding of to know which one is right for you. One is called parameter-efficient tuning, or PEFT, where you take a model and you basically force your data into that model and chip away everything that isn’t you. So again, if it was the employee handbook, you’d force a lot of that data into the model, and you’d take out bunches of stuff that you didn’t need. That’s one branch. The advantage of that approach is that the model is very fast, it is self-contained, it does not require a ton of maintenance, and it’s computationally very efficient once it’s running. But it is computationally very expensive to do that kind of tuning; it takes time and a lot of processing power. If you think about it, it would be like taking a deck of cards: maybe I had all red cards and I want black cards in this deck. I’ve got to rip out half the deck, throw it away, put black cards in, and now I’ve got a deck that’s mixed the way I want, but I had to do a lot of sorting. Or maybe I want none of the queens in the deck, so it’s going to take me time to customize this, right? That’s one approach: computationally expensive upfront, computationally cheap to run, good for speed, bad for updates, because every time you make an update you’ve got to redo the training process over and over again. That’s approach one. The second approach is called, just another mouthful, retrieval-augmented generation, or RAG. So, PEFT and RAG. Retrieval-augmented generation is when you have a base model that does not change, and it’s got to be a fairly big model, and then you have a big document store in a very specific database format where all your stuff goes.
And then every time someone talks to that model, it’s like, instead of having taken out the red cards and put in separate black cards, you have the black cards put on top of the deck. So now you have to sort through twice as many cards to find the ones you’re looking for, but you don’t have to change the underlying base model. The other analogy I use: the PEFT approach is you get a base pizza, you rip off the cheese, you put on the toppings you want, put the cheese back on, put it in the oven, and rebake the thing. As opposed to the retrieval-augmented model, where you have the base pizza and a big bowl of toppings, and every time you take a bite of pizza, you put different toppings on. It’s inefficient, but you don’t have to rebake the pizza every time, right? The pizza can change with every single bite. The advantage of the retrieval-augmented generation approach is that it is computationally fast to get rolling, because there’s really no tuning you need to do in the model. It is slower and computationally more expensive when it’s actually running, but the data can change really fast. If you have a client meeting that week, that week’s client meeting notes go right into your document store, and now the model knows everything that happened that week. So it depends on the approach you want to take. There are these two big forks in the road, and they each have trade-offs. So you have to figure out: what trade-off am I comfortable with?
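To make the retrieval-augmented fork concrete, here is a minimal sketch in Python. It is an illustration only: TF-IDF stands in for a real vector store, the documents are invented, and generate_with_base_model() is a hypothetical placeholder for whichever local or hosted base model you actually run.

```python
# A minimal sketch of the retrieval-augmented generation (RAG) pattern described
# above: the base model never changes; relevant documents are retrieved at
# question time and placed into the prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Client meeting notes, May 2: agreed to shift budget toward LinkedIn ads.",
    "Employee handbook, PTO section: full-time staff accrue 20 days per year.",
    "Sunday newsletter: recap of the Marketing AI Conference keynote.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the question."""
    question_vec = vectorizer.transform([question])
    scores = cosine_similarity(question_vec, doc_matrix)[0]
    top_indexes = scores.argsort()[::-1][:k]
    return [documents[i] for i in top_indexes]

def answer(question: str) -> str:
    """Assemble a prompt from retrieved context; the base model is never re-tuned."""
    context = "\n\n".join(retrieve(question))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return generate_with_base_model(prompt)  # hypothetical call to your unchanged base model
```

New meeting notes simply become one more entry in the document list; nothing about the model itself has to be retrained.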

Katie Robbert 10:26
So let’s, let’s step back for a second. Let’s go back to that user story. Because I have, unsurprisingly, I have a lot of questions for you, Chris. And I’m sure John has some questions, too. We’re both like, pet and REG. That’s about all we got from that spiel. But okay. So let’s say, in this user story, I’m the CEO, I want to build a large language model to mimic my voice, so that I don’t have to do any more of the writing. Of have the two approaches? Well, actually, no, that’s not even the right question. The question is really, how the question that I would expect that you would ask me is, if we are building a large language model that mimics your voice? How often do you think your voice and your tone and your opinions and your perspectives change? And I think that might start to determine which approach you would advise the company who wants to build a large language model would take? So I would think that that would probably be the first question is, how often do you have new information to give this lunch length? No, is that the question you would start with?

Christopher Penn 11:39
I would start with that question, and also, and this is a really big one: do you have the data? Right? If you don’t have data available, you can’t do either, because both approaches require you to have clean data that is categorized. And we’re going to talk about what the preparatory process looks like for that, because you have to structure the data in a very specific way for either approach. Either approach needs some structured data so that you can better understand how you’re going to work with these things.

Katie Robbert 12:14
John, what questions do you have? I mean, this is a lot to take in.

John Wall 12:18
Yeah, well, how about as far as training, then? It seems like the RAG approach, like you said, is better, because you’re going to be laying all your data into that, and obviously you could have integrations feeding it, as opposed to having to do it as a batch kind of thing through PEFT. I mean, is that the right way to think about it, as far as getting it to be more accurate and training it over time?

Christopher Penn 12:43
The biggest thing with the training really is the data quality, right? It ends up being about building the data and structuring the data in such a way that you have good governance over it. That is the biggest, hardest problem: people don’t have good data. The data is messy, the data is not well labeled, the data is sometimes incorrect, or you’re using the wrong data. And there are so many different tools that you can use to do this sort of thing, but without that governance upfront, that preparation, you can’t make these tools work; they simply won’t. If you recall, back about two or three months ago, we did an experiment where we took a bunch of blog posts that Katie wrote and fed them into a fine-tuned GPT-3 model. And you’ll recall that it got the tone right, but there was a lot of garbage in there, and it would spit out weird characters and stuff, because the data going in wasn’t clean.

Katie Robbert 13:48
You know, it’s funny. This is a total intrusive thought that just needs to come out. But as you were talking about, like the HR bot needing to ask it questions, my first question for, you know, the KT bought 9000 would be when I say stop, you say and if it doesn’t say collaborate and listen, then obviously, we need to continue to fine tune the large language model to be more accurate to the actual true authentic KT. And yet,

Christopher Penn 14:20
that’s actually a relevant point like should Vanilla Ice his lyrics be in the document store like is that a, if that’s a thing that you want in there, then that data should be part of the data set. If you have all 90 songs and sitcoms and weird cultural references and things and you’re trying to make something that is custom to your voice, that is knowledge that it probably should have, because it’s not going to be in all your writing. And when you pull one of these models off the shelf, they will have some knowledge of it but not extensive. Art of fine tuning a model is all about saying well what does this what does this model supposed to do? How is it going to sound So much of our voice is a reflection of the books, we’ve read the shows we’ve watched the songs we listened to, and that can be part of the document store. Interesting. So you know, Ville I spell Biv DeVoe all the classics.

John Wall 15:19
Store, have a new tech store.

Katie Robbert 15:24
There is a use case for this, I promise. I mean, this is exciting. This is absolutely the large language model we’re going to be building. You were talking about the data, the governance. And I know that we covered this a couple months back on the first version of the show, but talk a little bit about the data itself. If I’m going strictly off of the use case that I gave you, as a CEO I want to build the model so I don’t have to write anymore, I’m like, well, what do you mean, data? Because when you say that, I’m like, well, I don’t have spreadsheets of this work that I’ve done; I’m going to give you blog posts. So talk a little bit about the kinds of acceptable data to train with, for the different use cases.

Christopher Penn 16:12
Sure. So let’s actually walk through an example. So we can see this. So I have here, the opening of my most recent newsletter, this is the past Sunday’s newsletter, this is just a plain text file, right, and it’s all about the marketing AI conference, this will be one document that you would use for fine tuning. And again, depending on the approach you take, you might do different things with this. So if I was taking the approach of retrieval augmented generation, I’d want to put some metadata in this file, like author name, event, date, and stuff like that, just just plain text, kind of like what you’d have in a regular blog post on your website, so that you would maybe some key words, so that the system would know, okay, that’s what this is about. For the parameter efficient tuning, I would actually almost need to use a a another model to create a prompt. And the response this blog post is the response. This is what we would want the system to generate, if in that example, and so I would go to a system like clause and topics Claude and I would say, you are an expert in gender of AI, you know, blah, blah, blah, your prompt should follow this format, I give it the prompt our the Trust Insights, prompt sheet, which, if you have not gotten your copy, go to TrustInsights.ai AI slash prompt sheet to get your copy of this framework, I give it a what’s called a few shot learning example. And I attach that blog post to it. And what it does is then it generates prompts that you would give to like a ChatGPT, for which the post I’ve written is the answer. And this would have to return into basically a big spreadsheet. So each prompt would have a blog post as the answer. And you would have dozens or hundreds, possibly 1000s of these for the parameter efficient tuning version in one big honkin spreadsheet. And that’s what then gets put into a tuning mechanism like this one. Here’s a tool. The name of it is ridiculous. It’s called ooga booga.

Katie Robbert 18:22
Oh, that was fantastic. And I’m going to try to make you say it as much as possible.

John Wall 18:26
Yeah. Is that an acronym or?

Christopher Penn 18:29
It’s, I don’t know. And is it spelled as it sounds, is spelled exactly as it sounds good. That’s

Katie Robbert 18:36
the best thing I’ve heard all day.

Christopher Penn 18:41
Because you can see, in the system itself, there are options to load your training data set and your evaluation data set and to set the formats, and you have to obey specific formats for evaluations. So your data is those documents, and most software these days works that way. For the retrieval-augmented generation side, this is a piece of software called GPT4All, and there’s a document store where you can say, in your configuration file, I want my documents to be stored in this folder on my laptop, and you can put all your stuff, your meeting notes, in that folder. As long as they’re compatible formats, it will start to do the processing for retrieval-augmented generation. So either way, you need clean data that’s formatted well; that is the kind of data that you would want. For example, you would not want to just rip out all the blog posts that you’ve ever written, Katie, because, to your point, blog posts from Katie Robbert of 2018 are different than blog posts from Katie Robbert of 2023. So you have to go back through and say, do I even want this in the dataset or not? Is this still really me anymore?
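The document-store side can be as simple as pointing the software at a folder, but it helps to see what “clean, well-formatted data” looks like once it is chunked for retrieval. The sketch below is not GPT4All’s actual configuration; the folder path, metadata fields, and chunk size are placeholders.

```python
# Sketch: walk a folder of plain-text documents and split each into chunks with
# light metadata, the shape a retrieval-augmented generation store typically wants.
from pathlib import Path

DOCS_FOLDER = Path("~/Documents/llm-document-store").expanduser()  # hypothetical path
CHUNK_CHARS = 1500  # rough chunk size; tune for your retriever

def load_chunks() -> list[dict]:
    chunks = []
    for path in sorted(DOCS_FOLDER.glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for start in range(0, len(text), CHUNK_CHARS):
            chunks.append({
                "source": path.name,               # lets you trace an answer back to its file
                "modified": path.stat().st_mtime,  # crude freshness signal
                "text": text[start:start + CHUNK_CHARS],
            })
    return chunks

print(f"Loaded {len(load_chunks())} chunks from {DOCS_FOLDER}")
```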

Katie Robbert 19:51
Well, that was going to be one of my questions along those lines, and I know the answer is going to be “it depends,” but I would like you to expand on it: how much data is enough data to train it? If I have, like, two really good blog posts that I’m super proud of, and I’m like, this is it, this captures my voice, I’m assuming that’s not enough, because what I would end up getting back is just a different version of those two blog posts over and over and over again, versus if I give it, you know, a thousand blog posts, I would theoretically get more variation. The real question is this: my understanding is that these models don’t know what you don’t tell them. So I couldn’t necessarily give it a thousand blog posts written by me in which I’ve never talked about large language models, and say, write a blog post about large language models. How does that work? Does it only know the information that it’s fed from my blog posts, or is there other information that comes in?

Christopher Penn 21:07
There is other information in there, and that is a function of what model you’re using as the base model. That is relatively… well, it’s not complicated, it’s just knowing what’s out there. Generally speaking, let me show you an example here. I’m going to go over to Hugging Face, the model repository, which is kind of like the app store for all this stuff. In this case, I’m going to look for models by this one guy who goes by TheBloke, who creates tons of models, and we’ll look for a specific format. You’ll notice these numbers here, 13B, 7B, 70B; anything that ends like that indicates the number of parameters in a model. Think of parameters like this: you go to Subway, the sandwich shop, to get your Subway sandwich. The parameters, essentially, are the number and the diversity of toppings you get on your sandwich. So if you get a Subway sandwich and it’s just, I want a sandwich with cheese, just cheese, that’s a very small number of toppings, very fast for the sandwich artist to put together, but not a particularly satisfying sandwich, right? If, on the other hand, you go into the Subway app and you choose every topping and go double toppings and get a sandwich this tall, that would be the equivalent of a model with more parameters. So 13 billion parameters, 70 billion parameters: a giant sandwich like that is going to take forever for the sandwich artist to assemble, but it’s going to be really flavorful and have a lot of different stuff in it. Model parameters are like that. The larger the number, the more knowledge the model has, typically, but the slower it is. If you want your language model to be able to answer questions about stuff you didn’t provide information for, the bigger the model, the more likely it can do that. A 7 billion parameter model is going to have some big knowledge gaps; it will be able to rephrase and summarize information you give it, but it’s not going to have a lot of outside knowledge. A 30 billion or 70 billion parameter model is going to have a lot of extra knowledge in there, so it might be able to quote Vanilla Ice without you providing the lyrics; it might just know all the 90s rap songs. So you could literally have it weave in stuff. You could say, hey, I want you to incorporate the phrase “you can’t touch this” somewhere in this blog post, and the associated lyrics, and the larger models will do so, while the smaller models might say, I don’t know what that means. And so that’s part of the trade-off too: the size of the model dictates the computational power you need to run it, but it also scales the amount of information it knows. A model like OpenAI’s GPT-4, or Google’s PaLM 2: Google’s PaLM 2 has 570 billion parameters, a massive, massive beast. Claude from Anthropic we think is in the 500 billion parameter range, same for OpenAI’s GPT-4. These are really, really big, and as a result they have a lot of knowledge. That’s why you can ask ChatGPT, you know, give me some restaurant recommendations in Chicago, because it has that somewhere in what it has seen. If you were to ask, say, Vicuna 7B that, it might say:
I’m not really sure, I don’t know the answer to that, I don’t have that information. Or it will just hallucinate something that’s wrong, because it just doesn’t have that innate knowledge.

Katie Robbert 24:54
Everything you’re saying continues to point back to doing those business requirements First setting and being able to set those expectations. And so you know, if I go once again, if I go back to simple user story, there’s a lot of information that needs to be unpacked and documented. So as the CEO, I want to create, you know, the Katie bought 9000, so that I never have to write a blog post again. However, I also then need to set the expectation of it needs to bring in outside pop culture references, it needs to understand, you know, and stay up to date on what’s changing in the AI industry, it needs to, you know, be able to do this be able to do that. And that’s not necessarily the content that I’ve written that I’m giving it, I then need to know where to get all of those outside pieces. And I think if I’m following, that’s where we’re really talking about the fine tuning of these large language models, because it’s not a one and done, you don’t just stand up and go, Okay, now it’s gonna write, you know, blog posts forever, and they’re always going to be brilliant and up to date. This, when you say, Fine tuning, you’re really talking about updating and maintaining, and making sure that the data that exists within the model is the data that you need it to be in order to produce the output. Now, my brain hurts, but I think I got it.

Christopher Penn 26:23
That is correct. Those requirements are really important, and there are actually even more layers of requirements as well. For example, are you working with protected information? That’s going to change what architecture you use, that’s going to change what models you use, because some models are better at certain types of content than others, and that’s going to change the approach you use. You might not want to do, say, parameter-efficient tuning, because if customer information is changing frequently, you might need retrieval-augmented generation, where you have a CRM that’s dumping records in a format that these models can understand. Guess what that does? Now you have security requirements for the document store. Where are you going to store these documents safely and securely? Because they contain personally identifiable information, sensitive protected information, protected health information. That document store is radioactive, right? You’ve got to lock that thing down, be able to control it, be able to produce results for an auditor to say, yes, we’ve guaranteed that this language model is secure, and the data store that feeds it is also secure. Those are extra considerations that had better be part of that user story, because if not, you’re going to have a bad time.
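One small, concrete piece of that governance is scrubbing obvious personally identifiable information before records ever land in the document store. The sketch below only illustrates where that step sits in the pipeline; real HIPAA or PII compliance requires far more than a few regular expressions.

```python
# Sketch: redact obvious PII patterns from a record before it is ingested.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched patterns with labeled placeholders."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text

print(redact("Call Jane at 617-555-0134 or jane@example.com about claim 12-3456."))
```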

Katie Robbert 27:43
John, thoughts? I can see the sunflowers behind you spinning, like the wheels and the gears in your brain.

John Wall 27:51
So, no, your model has to be legit, too legit to quit. That’s really where it needs to be.

Katie Robbert 27:59
Full circle,

Christopher Penn 27:59
and out. So that’s, that’s sort of the prerequisites for fine tuning. And again, it doesn’t matter which approach you take there, you have to do a lot of this legwork upfront. If you want it to work, the risk that a lot of organizations are facing is they just rushing ahead like, oh, let’s let’s turn this thing on. Let’s build it like, Oh, here’s a model. Let’s slap that in place. Well, is that the right one? Are you do you even know what the underlying architecture of the model is, and whether it’s, it’s well suited now. So there’s one called I swear, the names of these models, easily be the names of cannabis strains, there’s one called Cherry blast that is specifically good at coding writing code. So if you pull that model off the shelf as your foundational model for fine tuning, and you’re working with medical records, those two things don’t go well together. Right. So that would be that would be problematic if you picked the wrong model. And again, that’s something that as part of the requirements gathering, hey, this model had better be able to code to oh, well, now we have to look at something very different.

John Wall 29:11
Go ahead, John. Well, I was just going to say, how does that play in, then, if you want to make your architecture updatable? It feels like the RAG approach would give you a dataset that can sit outside of the model, so when there is a new model, it’s just plug and play the new thing and you don’t have to go back. But is that how that works?

Christopher Penn 29:32
That actually is a really important point, and something that came up at the MAICON conference. A lot of vendors who were there had built these very specific fine-tuned, parameter-efficient tuning approaches on GPT-3, the DaVinci model from three years ago, and so much of their infrastructure was tied into this thing that, as new models kept coming out, they could not adapt. So if you work with some of these AI vendors, it’s like, yeah, you’re using a model that’s three years old now, and it’s not the best in class for those particular tasks. When you take an approach like retrieval-augmented generation, yes, there’s higher compute cost, but you can swap the underlying model in and out. As long as you thoughtfully figure out what the abstraction layer looks like between your data and your underlying engine, you can pull the old engine out and put a new engine in, and that approach, as of today, makes the most sense, because these things are changing so fast. For example, the Llama 2 model came out from Meta three weeks ago now. If we take a look at Llama 2, there are 3,365 derivatives of Llama 2 available already. This one is a fine-tuned version for French; there’s an e-commerce frequently asked questions model in here. So as companies like Meta or whoever release new models, you go, well, doesn’t this one look even better? If I go the parameter-efficient tuning approach, I kind of have to restart my training all over again. And granted, it doesn’t necessarily take long, it can take a few days, but…
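The abstraction layer mentioned here can be as small as a single interface that your retrieval and prompting code is written against, so the engine underneath can be replaced without rewriting the rest of the stack. The class names below are illustrative placeholders, not any particular vendor’s API.

```python
# Sketch: a tiny engine-agnostic interface so the base model can be swapped out.
from typing import Protocol

class TextGenerator(Protocol):
    def generate(self, prompt: str) -> str: ...

class LocalLlama2Engine:
    def generate(self, prompt: str) -> str:
        raise NotImplementedError  # call your locally hosted Llama 2 here (hypothetical)

class HostedApiEngine:
    def generate(self, prompt: str) -> str:
        raise NotImplementedError  # call a hosted model API here (hypothetical)

def answer_question(engine: TextGenerator, context: str, question: str) -> str:
    """Retrieval pipeline code depends only on the interface, never on a specific engine."""
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return engine.generate(prompt)
```

Swapping engines then becomes a one-line change at the call site, which is what makes rolling back to last week’s model practical.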

Katie Robbert 31:16
Oh, you might not have a few days to spare.

Christopher Penn 31:20
Exactly. And the other advantage of the retrieval-augmented generation approach is that all these models perform slightly differently and do slightly different things. If you receive complaints from customers, like, hey, your chatbot just said something really offensive, and you know you just swapped in the new model, okay, great, pull that one out, put the previous one back in, and we’ll QA and test and so on. If you went the parameter-efficient tuning route, you’d have to shut that chatbot down until you were able to figure out what had gone wrong, either in the tuning or in the base model. So having the modular approach does make more sense for the environment today, but it comes with those trade-offs.

Katie Robbert 32:03
It also occurs to me that, as you mentioned, you were at the MAICON conference and you talked with a lot of vendors who have built their applications on top of these fine-tuned models. A question that we hear a lot from our audience is, what questions should I be asking a vendor? And I feel like a lot of what you’re talking about is these fine-tuned models, the type of model, the requirements. If you are assessing vendors who have built some sort of an interface on top of a large language model, and they are not asking you for your user stories and your requirements, that’s a huge red flag. Because, to your point, Chris, some of these models are going to pull in PII, some of these models are going to pull in, you know, information on literal llamas, when really your business is all about camels, and they’re different animals altogether. So really make sure, if you’re choosing to rent from a vendor versus build your own, that the vendor is able to partner with you and explain what they’re doing, that they’re not black-boxing it. And even if they’re not black-boxing it, if they’re like, oh, we’re using, you know, Llama Rama Oobabooga 2, they should be able to tell you what is in that model and whether it aligns with your requirements, because that’s the missing piece: do you know what you need, and can they match you with those requirements? That’s going to be a big deal, especially, to your point, around privacy and protected health information. I can tell you from my past experience dealing with the federal government, a HIPAA violation is long and expensive and painful, and that is not time that you get back.

Christopher Penn 34:01
Exactly. Even ask a vendor, hey, what is your foundation model? Because the reality is that only a handful of organizations are going to publish foundation models: Meta, Google, OpenAI, the government of China, the government of Saudi Arabia, and, actually, no, it was the United Arab Emirates that put out Falcon. These foundation models are so huge and so compute-intensive that you almost have to use one of the big ones that big tech builds, because they are the only ones who have the infrastructure to even assemble such a thing. I think Llama was trained on something like 1.3 million hours of compute time, you know, data centers full of compute cards. And when you talk to vendors and they say, oh, we use a proprietary model, okay, so either you have a very small model that you custom trained, which is going to be horrendously underperforming compared to what’s best in the market, or you have no idea what’s in your infrastructure. And you can say, cool, I understand you fine-tuned it; what’s the foundation model you started with? And if they suddenly get really uncomfortable, like, we’re on OpenAI, you know, 002, okay, so you’re four years out of date and your foundation is shaky. Those are questions that you can ask, and if you are trying to vet a vendor and it’s mission critical and you want some help, let us know; we will talk to the vendor for you. Well, I mean, it depends on how much you like the vendor or not. If you don’t like the vendor, have me talk to them; I’ll be rough on them. But we can ask those questions: what’s the foundation model? How did you do your fine-tuning? What kind of fine-tune is it? Now that you’ve watched this episode, you know what the two different branches are, and you have a sense from your requirements of which approach you might want to take. These are all questions you can ask, and should ask vendors: tell me more about what’s going on under the hood. You don’t have to give away all your secrets, but at least give me the broad scope.

Katie Robbert 36:01
And if you’re like me, then I’m just start making up names of models and asking the vendors if they’re using them if they say yes, then you know then they’re definitely full of it. So if they are using the llama Rama Olga Boga, you know, breakers three braking three, Electric Boogaloo. I was actually referencing that yesterday, then you know that there’s definitely something wrong. So John, you know, what would you call your larger model?

John Wall 36:27
Yeah, it was that Breakin’ 3: Electric Boogaloo. Boogaloo would be it.

Katie Robbert 36:34
This is why Chris can’t hang out with us. We just don’t happen seriously enough. We do. You know what it is, it’s a lot of information to take in. And I think that the advice of, you know, bringing us on to help vet those vendors. Because this is a, it’s a big decision. We’ve always talked about making sure you’re doing your requirements, doing your due diligence, asking your vendors, questions about things like a CRM, or an email marketing system, or, you know, a financial system, the stakes for a large language model are different. It’s still a piece of technology that you still need to use the five P’s to figure out where within your organization it fits, and who’s going to be using it. But the data, the stakes for the data that you’re bringing in, and that you’re pushing out are so much higher, that you really need to make sure that you know what you’re getting into. Because it could be a very costly decision if you make the wrong one.

Christopher Penn 37:37
Yep, it can be reputationally damaging, it can just blow things up. All of those are risks that you can mitigate if you have some help. But that’s it; those are the fundamentals of the prerequisites of fine-tuning. You’ll notice we did not actually fine-tune anything in this episode, and again, that’s because the process can take days, depending on the size of your data, what foundation model you’re starting with, and which approach you’re going to take. But when you do it, it’s like the software development lifecycle, so you also need to have a plan for the fine-tuning process. What are the technical requirements? Are you going to do a pilot, an MVP, to see, okay, I’m going to give this a data set of 10 examples and run it through, because that will tune in about an hour, and then you can check the results real quick and say, okay, the tuning process worked, or the tuning process did not work. And then from there you can iterate, kind of like a scrum cycle, where you can do additional revisions and things. All of those are parts of the actual tuning process itself, which we did not touch on today, because this is just gathering the materials you need to be able to start the tuning process.
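The bench test mentioned earlier fits naturally into that pilot loop: a handful of questions with known answers, run after every tuning or ingestion pass, so you can tell in minutes whether the process worked or the model hallucinated. The questions and the ask_model() callable below are hypothetical.

```python
# Sketch: a tiny bench test harness run after each tuning or ingestion cycle.
BENCH_TESTS = [
    {"question": "How many days of PTO does a full-time employee accrue per year?",
     "must_contain": "20 days"},
    {"question": "Who administers the HR handbook model?",
     "must_contain": "John"},
]

def run_bench(ask_model) -> int:
    """ask_model is any callable that takes a question string and returns the model's reply."""
    failures = 0
    for test in BENCH_TESTS:
        reply = ask_model(test["question"])
        passed = test["must_contain"].lower() in reply.lower()
        failures += 0 if passed else 1
        print(f"[{'PASS' if passed else 'FAIL'}] {test['question']}")
    return failures
```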

Katie Robbert 38:54
Well, it’s like any other piece of tech or software, taking the time to do your requirements upfront. And spending more time on that saves you resources and budget and energy, when you actually start to put your hands on the technology itself. That’s always been true of any kind of software development. If you just go right into pushing buttons and, you know, trying to make things happen. It’s going to take longer and cost more money than it would if you just sat down and did something like Oh, I don’t know, the five P’s. What is the question you’re trying to answer? Who needs to be involved? What is the repeatable process? What pieces of equipment are you using? And how do you measure success?

Christopher Penn 39:39
Exactly right. So, any final questions, Katie and John?

Katie Robbert 39:46
I want to know when my Lamma Lamma ding dong is going to be available.

Christopher Penn 39:51
First quarter of never

Katie Robbert 39:53
Perfect. I look forward to it. John?

John Wall 39:57
We’re out of here. Peace.

Christopher Penn 40:00
I can’t touch this next time. Thanks for watching today. Be sure to subscribe to our show wherever you’re watching it. For more resources and to learn more, check out the Trust Insights podcast at trust insights.ai/t AI podcast, and a weekly email newsletter at trust insights.ai/newsletter Got questions about what you saw in today’s episode? Join our free analytics for markers slack group at trust insights.ai/analytics for marketers, see you next time.

Transcribed by https://otter.ai


Need help with your marketing AI and analytics?

You might also enjoy:

Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, INBOX INSIGHTS. Subscribe now for free; new issues every Wednesday!

Click here to subscribe now »

Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new episodes every Wednesday.
