So What? Fine-tuning large language models (LLM)

So What? Marketing Analytics and Insights Live airs every Thursday at 1 pm EST.

You can watch on YouTube Live. Be sure to subscribe and follow so you never miss an episode!

In this week’s episode of So What? we focus on Fine-tuning large language models (LLM). We walk through what they are and what kind of language model is right for your business. Catch the replay here:

So What? It’s our birthday - 5 years of Trust Insights


In this episode you’ll learn: 

  • What exactly are large language models like GPT-4, StableLM, etc.?
  • What kind of language model is right for your business?
  • How fine-tuning language models works, why it might be necessary, and what you should be doing to prepare

Upcoming Episodes:

  • TBD

Have a question or topic you’d like to see us cover? Reach out here:

AI-Generated Transcript:

Katie Robbert 0:25
Well, hey everyone. Happy Thursday. Welcome to So What? The Marketing Analytics and Insights Live show. I am Katie, joined by Chris and John. How’s it going, guys?

Christopher Penn 0:34
Good, good. Just, you know, getting all the pieces together for a lot of fun today.

Katie Robbert 0:39
Yeah. So on today’s episode, we are talking about fine-tuning large language models. And so everybody, and I do mean everybody, not just marketers, is talking about large language models. You probably more commonly know them as things like ChatGPT and Google Bard. There’s also StableLM. So there are a lot of different types of large language models. Today we want to just go over the basics of what they are. If you want to get deeper into the details of what a large language model is, Chris and I actually covered that on the podcast this past week, so you can get that at trust podcast, and then we wrote about it a bit more in the newsletter, that’s trust, where we talked a little bit more about your readiness for a large language model. But today, Chris, we want to talk about, okay, you have one, what the heck do you do with it?

Christopher Penn 1:43
And more specifically, you have one, but maybe the public models, like the models used by ChatGPT or by Bard or by Bing, maybe they’re not doing it for you, right? Maybe you’ve got a very specific domain of expertise where the models just aren’t as good as you would want them to be. And so today, we’re gonna talk about fine-tuning, because that is the way to get those models to do more of what you want. The best analogy is to think about it like a dog, right? Let’s say you buy a bloodhound or something. You can train that dog to be a bomb-sniffing dog. It gets really good at sniffing out explosives, but in the process, it can no longer really sniff out drugs or survivors of an earthquake, right? You’ve tuned that service animal to be very specific about one thing. It’s like training a helper dog, a service dog: there are other things that it can’t do anymore, because its brain capacity has been focused on one thing. Right, it’s not going to run an agility course at the Puppy Bowl, because it’s helping its vision-impaired owner navigate. So that’s kind of what fine-tuning is with large language models, where you’re teaching them: hey, let’s go ahead and do one more specific task, at the expense of not being able to do the more common tasks.

John Wall 3:17
So what’s the next step? Well,

Katie Robbert 3:22
the password here,

Christopher Penn 3:24
the next step is unsurprisingly,

Katie Robbert 3:27
oh my goodness, who knew? Well, you always have to start with what’s the question you’re trying to answer? Why are you doing this in the first place?

Christopher Penn 3:38
Exactly. What is the purpose of the fine-tuned model? Right? So if we think about what these models are good at, there are six basic categories: generation, extraction, summarization, rewriting, classification, and question answering. And so what is it that you want a model to do better, focused on a specific task? Do you want it to write blog posts like a very specific person? Do you want it to be able to classify documents with sentiment analysis? Whatever that thing is, you’ve got to have that purpose established first.

Katie Robbert 4:19
I think what we’re seeing Chris is a lot of people, a lot of companies are using these large language models, purely for generation. That seems to be the most common. We’ve talked about this on past episodes, that it’s not actually the use case that’s best suited, but it’s the one that is most commonly used. And so when we’re talking about fine tuning these large language models for specific voice, that’s really where we’re starting to bring the pieces together.

Christopher Penn 4:51
Exactly right. So today, let’s talk about maybe you want to do some generation. Maybe, Katie, you want to just get some more free time, right? You want to spend some time out on the deck, and writing content takes a good amount of time. So how would we get a model fine-tuned to be able to write more like you? What, from your point of view, would be sort of the people and the process before we start getting into the toys?

Katie Robbert 5:25
Well, I think from the people, you know, obviously, if it’s going to be a model tuned to my voice, we should probably start with me, and what is my voice. And so thankfully, in this instance, I’ve been writing the opening to the newsletter for well over a year, so we have a lot of content to work with. If that were not the case, then before you get into fine-tuning a model for a specific person, that person needs to define what their voice is, and not just say it’s warm, it’s friendly, it’s, you know, coherent. Like, that’s not detailed enough. So they need to provide writing samples that really demonstrate it. And so for me, we have plenty of those samples. So that would be the people.

Christopher Penn 6:09
Yep. And then let’s talk about the process. We’ve got a lot of content. Is all of that content useful for training? So let’s take a quick peek. What we’re going to show here is the back end of the Trust Insights blog. This is the WordPress blog, right? So you’ll see all sorts of great stuff. Now, I could, but shouldn’t, just export this whole thing and stuff it into a large language model. That’s probably the worst possible idea. But what we see in here are things like who the author is, the date, the content of the post, the title of the post, what kind of post it is, the slug, its status, when it was last modified. And so you have all this information in here. From a process perspective, okay, well, obviously, we want it to be your voice, so we should probably know who the author is, and in this case, you’re author two on the blog. But what else would we want to take into consideration?

Katie Robbert 7:14
I would want to do a little bit of analysis of the content I’ve written first. And I don’t know if that you know, exists through this interface. Or if I would look into, you know, a Google Analytics or Search Console to see, okay, great. I’ve written a lot of content. But that doesn’t mean that every single piece of content has resonated. And that’s true for any author. And so I would first want to know, of the pieces that I wrote, what are the top performing ones and use those, you know, provided that they are contextually relevant, not just, you know, a throwaway post, but something that really demonstrates and aligns with what Trust Insights wants to be known for?

Christopher Penn 7:56
Yeah, you won’t find that in this particular data set. And you’re right, that is something that could be extracted. However, one thing that is in here is time. Would you say that your writing has improved since 2018?

Katie Robbert 8:08
I would like to believe it has. I mean, that’s subjective, but I feel confident that it has. It comes a little easier to me, if that makes sense.

Christopher Penn 8:19
No, it makes total sense. And yet, in here is everything dating back actually to November of 2017, when we first actually created this website. And so we probably want to start by saying, maybe how far back do you think, would you say we should go with that where the writing is still representative of your level of comfort today?

Katie Robbert 8:40
Well, I mean, even if we went back two years, to, you know, April of 2021, I feel like that still? That’s what 52 times two is 104 posts to pick from? I mean, John, let me ask you. So John, you thankfully gratefully edit my posts, just about every week? Would you say that they’ve gotten better? Do you spend less time editing them?

John Wall 9:06
Yeah, definitely. Well, the biggest thing you can always tell, and you nailed it, is the speed of generation. You know, when you’re struggling, getting those first drafts out kills you, and then being able to bang them out in relatively less time always gets you on the mark. So yeah, I mean, you start with, you know, nothing on Monday morning, and you’ve already got the idea spun up and the draft done by end of Tuesday. So yeah, you’ve definitely fallen into a cycle the past two years. So I think the past 100 would be good. The big problem we have, because I’ve done similar stuff with Marketing Over Coffee, is you have all these posts that are topical and timely. And you know, nobody wants to read any of those MySpace shows that we did, you know, eight and a half years ago. So that’s a real pain, to try and figure out what stuff you still want out there. But yeah, I think if you go back two years, you’ll be fine with everything that you’ve covered. And then we don’t have to worry too much about timeliness with TI stuff because we tend to be a lot more strategic. So that’s less of a pain point.

Christopher Penn 10:08
All right. So let’s do this, then. Let’s do a quick edit here and say I want the title and the content from our posts where it’s newer than April 2021; it’s a published post, we don’t want anything that was not published; the type is post, so no pages, no landing pages or anything like that; and Katie is the author, author number two. And we end up with 165 pieces of content. That seems like a decent amount. You can see there are the So What? posts, and we also see things like how do you manage expectations. This is all good stuff, right? This is the kind of stuff that we would really want to be focused on teaching a machine to learn.
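
The filter described in this turn could be sketched roughly as follows. This is a minimal illustration and not the query used on the show: it assumes the standard WordPress `wp_posts` column names, and it runs against an in-memory SQLite table filled with invented rows so that it works anywhere. A real WordPress site would be queried through its MySQL database, and the helper name `select_training_posts` is hypothetical.

```python
import sqlite3

# Hypothetical stand-in for the WordPress posts table; every row is invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE wp_posts (
        post_title TEXT, post_content TEXT, post_date TEXT,
        post_status TEXT, post_type TEXT, post_author INTEGER)"""
)
rows = [
    ("Accountability", "<p>Body</p>", "2022-06-01", "publish", "post", 2),
    ("Old post", "<p>Body</p>", "2019-01-15", "publish", "post", 2),
    ("Draft idea", "<p>Body</p>", "2022-07-01", "draft", "post", 2),
    ("Landing page", "<p>Body</p>", "2022-08-01", "publish", "page", 2),
    ("Someone else", "<p>Body</p>", "2022-09-01", "publish", "post", 1),
]
conn.executemany("INSERT INTO wp_posts VALUES (?, ?, ?, ?, ?, ?)", rows)


def select_training_posts(connection, author_id=2, since="2021-04-01"):
    """Return (title, content) for published posts by one author since a date."""
    query = """
        SELECT post_title, post_content FROM wp_posts
        WHERE post_author = ? AND post_date >= ?
          AND post_status = 'publish' AND post_type = 'post'
        ORDER BY post_date
    """
    return connection.execute(query, (author_id, since)).fetchall()


selected = select_training_posts(conn)
print(selected)
```

Only the first invented row survives: the others fail the date, status, type, or author condition in turn.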

Katie Robbert 10:50
Now, what I would say, because I know our content inside and out, is that the so what posts aren’t going to be super helpful in terms of training, the large language model, because that’s a template. That is, you know, it’s something it’s the recap of this particular live stream, every single week, and we just swap out the topic and the links, but it’s not representative of my writing. So what I would do, what I would do, if I, if I were doing this is I would exclude the so what posts,

Christopher Penn 11:24
okay, so let’s go ahead and see,

Katie Robbert 11:28
and it will leave you with a lot less content, but it will be more representative of my style until

Christopher Penn 11:35
Okay, so we’re gonna do that in post-processing. I mean, I could sit here and do it in SQL, but I’m not going to. Let’s go ahead and export all this data, all 165 pieces. Okay. And that’s going to be sort of the first step. Now, the next stop on our tour is we’re going to need to actually do some more processing on this data, because the data itself is, well, it’s not in the best condition. Let me show you what we mean by that. Let’s go put up our fine-tuning here. And let’s see. There we go. We’ll clean that up. Alright, so we have 165 posts, we have the titles, and we have the post content. It’s a bit hard to see, but you’ll notice that the post content contains HTML. So this is not clean text. This is kind of a problem. I mean, it’s not kind of, it’s a pretty big problem. There is no way that you could feed this to a language model and have it create coherent text, because it’s gonna see a bunch of code, and that can actually throw it off and distort it. Like, the outputs would be all kinds of crazy.

Katie Robbert 12:57
Well, it’s gonna think that I’m a robot talking in HTML.

Christopher Penn 13:01
Exactly. So the first thing we’re probably going to want to do is, let’s take a look at this data. Okay, we’re going to want to rename these columns. So let’s rename this first column prompt, and completion is the second column. And then

Katie Robbert 13:20
while you’re doing that, I have a question though. So I’m guessing that a lot of people will be borrowing the content from a website. So you would have HTML, you know, in almost all of those writing samples, let’s say I happen to have all of my posts that I want to use in like a text file, or like 165 text files, would I still run into the same formatting issues? If it wasn’t on a website? No,

Christopher Penn 13:51
you would have different methods of ingesting the data, like, basically do a big file read. But you’d also run into the trouble of how do you separate out, say, the title of the post from the rest of the file?

Katie Robbert 14:06
So the method of ingesting the content, each one’s going to have its own set of roadblocks? Exactly. And, Justin,

Christopher Penn 14:19
Let’s see if this does what you hope it does. Alright, so yes, that’s looking much better. Here’s what we’ve done. Let’s step through this. We’ve renamed the two columns prompt and completion, right? Because the title of the blog post is essentially the prompt. If you were to type that into a large language model, you would say, write a blog post about dark data mining, right? Then we’re going to strip out everything that looks like HTML. So in that text, we’re going to just rip that all out. We’re then going to compute token length. Token length is roughly the number of characters divided by four. Most large language models have a limit to how much text they can take in at a time. With the GPT-3 model, which is what OpenAI offers for tuning right now, it’s 2,048 tokens. With GPT-3.5, it’s 4,096. And with GPT-4, it’s 8,192, and soon to be 32,768. So in this part, we’re going to count up the number of tokens, and also do a rough word count, just to see what that looks like, and then give each row an ID number. And then, like you said, Katie, we’re going to filter out So What? because you said it’s not a good representation. So that leaves us with our prompt, essentially, right? Accountability, where do projects fail. We have our text, and you can now see this is no longer filled with HTML. Now this looks like regular, good old-fashioned text. And we have our token lengths here. For the most part, these look good. And a rough word count. We’re also going to do another filter here: we’re going to filter our token length to be less than 2,048. That will again keep it from blowing up on us. That leaves us with 87 pieces of content that emphasize Katie’s voice, so we’re another step closer to building our own Katie GPT.
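
The cleaning steps walked through above can be sketched in a few lines. The show did this work in R; the Python below is only an illustrative stand-in using the standard library. The function names, the sample posts, and the assumption that recap posts have titles starting with "So What?" are all hypothetical, and the regular expression is a rough tag stripper rather than a full HTML parser.

```python
import re
from html import unescape

TOKEN_LIMIT = 2048  # the fine-tuning limit quoted in the episode


def strip_html(raw):
    """Remove anything that looks like an HTML tag and tidy the whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)   # replace tags with a space
    text = unescape(text)                  # turn &amp; and friends back into characters
    return re.sub(r"\s+", " ", text).strip()


def prepare(posts):
    """posts: list of (title, html_content). Returns cleaned, filtered rows."""
    prepared = []
    for row_id, (title, content) in enumerate(posts, start=1):
        if title.startswith("So What?"):   # assumed naming of the recap posts
            continue
        text = strip_html(content)
        tokens = len(text) // 4            # rough rule of thumb: characters / 4
        if tokens >= TOKEN_LIMIT:          # keep only rows under the limit
            continue
        prepared.append({
            "id": row_id,
            "prompt": title,
            "completion": text,
            "tokens": tokens,
            "words": len(text.split()),
        })
    return prepared


sample = [
    ("Accountability", "<p>Who owns the outcome?</p><p>You do.</p>"),
    ("So What? Fine-tuning", "<p>Weekly recap template.</p>"),
    ("Very long post", "<p>" + "word " * 3000 + "</p>"),
]
cleaned = prepare(sample)
print(cleaned)
```

Of the three invented posts, only the first is kept: the second is a recap and the third exceeds the token limit.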

Katie Robbert 16:26
I don’t know if I’m excited or nervous about. I don’t know how you guys feel. I mean, you talk to me every day, I don’t think I don’t think the world needs more than one of me

John Wall 16:38
remains to be seen. We’ll see how this goes. Go way back well.

Christopher Penn 16:46
So believe it or not, this is the hardest part of fine-tuning, right? To select the data, to clean it, to think through how you want to process it. Because if you don’t do this, if you just pour it all in, you’re going to end up with kind of a mess. Okay, now, here’s the question for fine-tuning. Let me go ahead and just bring up a bit of thought on this front. You have a couple of different choices: you can do public models or private models. A public model is using something like OpenAI’s API to create a custom public model that you can then use in their cloud. So it saves you all the pain of setting up your own servers and downloading an open source model and provisioning an instance and all that fun. But it’s still a third party, so you shouldn’t use it with PII, shouldn’t use it with protected health as

Katie Robbert 17:47
well. And I was gonna say I was at well, actually, rather, I was going to ask, you know, is there a risk to creating your model publicly in that hosted space? Like, sort of taking it to a more extreme? Like, let’s say, I was a very well known very influential person who had strong opinions about things, you know, would there be a risk that somebody could access that model of Katie GPT and start spoofing my voice?

Christopher Penn 18:19
There’s not a risk of them doing that with the model you upload because it’s tied specifically to your OpenAI account, there is a risk that somebody could just scrape your content publicly and build their own.

Katie Robbert 18:30
Well, thankfully, I’m not that well known. So I don’t think anyone’s gonna do that.

Christopher Penn 18:34
You highlight a really important point. This is something that’s happening in the disinformation space, where people are grabbing text, audio, and video of prominent personalities and using large language models and deepfake models to generate fake information. And there was a report put out a couple of days ago that on Twitter, Russian-sponsored and Russian-sourced disinformation has increased by one third, by 33%, since the new management. They open sourced the recommendation engine model, and the hostile actors have figured out how to game it and are flooding it with all this extra propaganda, because that is machine generated.

Katie Robbert 19:23
Knock it off,

John Wall 19:24
just what we needed.

Christopher Penn 19:26
Exactly. So in this case, because this is Katie GPT for writing blog posts, we probably don’t need the private model, right? We probably don’t need something that is on our own hardware. Okay, so how do we do this? Great question. OpenAI does have a whole bunch of instructions on how to actually fine-tune these models. But one of the things we have to think about is which of these models do we want to tune? They have four models: Ada, Babbage, Curie, and DaVinci. Of these models, the main differences are capability and cost, right? So DaVinci is the biggest model; it can process 4,000 tokens. It’s very, very capable and does creative generation, whereas Curie does translation, complex classification, and sentiment. Babbage does semantic search, and Ada does keywords and corrections. So let’s go ahead, and we’re going to now start the fine-tuning process of getting that data file ready. So we’ll go back to our, so there’s our Katie training model. Now, this part here, you’ve got to read the directions, which is always challenging for someone like me. It’s not my favorite thing in the world. But we’re going to prepare this data file. You have to install these tools from OpenAI on your computer, the command line tools. There’s no nice, pretty interface. I mean, I’m sure someone’s selling a piece of software that has a nice interface, but for the most part, it’s not going to be available. So you just use the command line. It will tell you, hey, here’s what we think we see: your file contains 87 prompt-completion pairs, and we generally recommend having at least a few hundred examples; performance tends to increase. It should contain just two columns, and additional columns are present. So it’s telling us, hey, you probably should get rid of those extra columns. Some of these look like they’re getting ignored.
Let’s see what the warnings are here. Your data does not contain a common separator at the end of your prompts, which it should have. It does not contain a common ending at the end of your completions. Completions should start with a whitespace. So it’s gonna say, here’s what we’re going to do: you’ve got to convert to JSONL, a JSON Lines file, we’re going to remove those additional columns and add a suffix separator, and it’s asking, do you agree? And in this case, yes. I mean, we could do this in the R code if we were going to do it on a regular basis. But for right now, for convenience, we’re gonna go ahead and do this. So it’s gone through, it’s set up some stuff, it’s done the fixes, let’s add the suffix.
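
The fixes the tool offers here (a common separator at the end of each prompt, a leading space and a common ending on each completion, one JSON object per line) can also be done by hand. The sketch below is an assumption-laden illustration rather than what the OpenAI tool actually emits: the separator string, the end marker, and the function names are placeholders chosen for this example.

```python
import json

PROMPT_SEPARATOR = "\n\n###\n\n"  # assumed marker for the end of a prompt
COMPLETION_END = " END"           # assumed marker for the end of a completion


def to_jsonl_line(prompt, completion):
    """Format one prompt/completion pair as a single JSON Lines record."""
    record = {
        "prompt": prompt.strip() + PROMPT_SEPARATOR,
        # Completions start with a space and finish with a common ending.
        "completion": " " + completion.strip() + COMPLETION_END,
    }
    return json.dumps(record, ensure_ascii=False)


def write_jsonl(rows, path):
    """Write rows (dicts with 'prompt' and 'completion') to a .jsonl file."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(to_jsonl_line(row["prompt"], row["completion"]) + "\n")


line = to_jsonl_line("Accountability", "Who owns the outcome?")
print(line)
```

Keeping only the two fields in each record also takes care of the "additional columns" warning, since token counts, word counts, and IDs never reach the file.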

Katie Robbert 22:11
I think this is the first time I’ve ever seen software offer to fix the problem for you. Like, it usually just gives you the big fat warning. It’s like, good luck, sucker. But this time, it’s like, okay, so we see the problem, do you want us to just go ahead and take care of that? Like a concierge. It’s actually kind of nice.

Christopher Penn 22:27
It is kind of nice. So we’ve gone through, and you can see now there is the JSONL file of Katie, right? So you can see there’s the prompt, there’s the completion, and there’s all the text in it. There’s no HTML this time. There are line breaks, but there’s no HTML, which is nice to see. Okay, go ahead and close that file. We don’t need to save that file. But now we’ve got our training file. Now, here’s the thing: almost every language model uses this format. So if you were doing this, and you decided, you know what, maybe I want to do a private model instead, maybe you read through the data and went, oh, this is something that shouldn’t be seen by the public. At this point, you can say, okay, I’m not going to use OpenAI, I’m gonna use something else, because you haven’t done anything with the data yet.

Katie Robbert 23:21
So, if I’m understanding correctly, I could almost do like a head-to-head between ChatGPT and Google Bard with my Katie model, and just see which one sounds more like me.

Christopher Penn 23:40
Exactly. That’s exactly right. You could do totally do that.

Katie Robbert 23:43
I see a bakeoff coming up over the next couple of weeks for us

John Wall 23:47
in the future. Chris, how about the models, the DaVinci and the others? Do you have you noticed any difference between those? Do you know what the what those different versions do?

Christopher Penn 23:56
Yeah, it actually lays it out pretty clearly. It says, here’s what each of these models is good at. So we’ll go back here. Ada, it says, is simple classification, keyword finding, and stuff. It’s a very simple model. Babbage does moderate classification, Curie does language translation and complex classification, and DaVinci does generation. So, because we’re trying to make a version of Katie here, we have to use the DaVinci model for generating text. So let’s go ahead and create this model. For create, I’m gonna find Katie’s JSON file here, which we just had up, the Katie training prepared file. And the model I want to use is DaVinci.

John Wall 24:53
so you are generating the Davinci Code here.

Christopher Penn 25:00
Oh, wow. Yeah, yeah, they will do that. Well. And you’ll see it says, Hey, this is this processing this fine tuning, it’s gonna cost $8.80. Which

Katie Robbert 25:17
I think I’m worth it though. So let’s, let’s go ahead

John Wall 25:19
are gonna century Here we go.

Christopher Penn 25:21
You are definitely worth it. But think about that: that was 89 blog posts. That’s it.

Katie Robbert 25:26
I know. And if I had, yeah, thousands, which for larger companies, I can definitely see them willing to make the investment to really, you know, so let’s say that, you know, you usually have a ghostwriter for your executive suite for their thought leadership pieces. I could see where, you know, spending a couple hundred dollars on 1,000 pieces of content to really fine-tune the large language model on the CEO’s voice would be worth it, and a huge financial savings down the line.

Christopher Penn 26:02
Exactly. But the challenge with that is, if you’re going to have multiple purposes, you’re going to need to train multiple models. So if you want to have a generation model for blog posts, that’s its own model. If you want to have a question answering system for, say, an internal knowledge base, that’s another model. And you’ll have to format the data very differently than the way we did.

Katie Robbert 26:27
So I couldn’t use this Katie GPT-3 model to both generate and do q&a With

Christopher Penn 26:39
You can. It will do less well at the Q&A.

Katie Robbert 26:43
Because that wasn’t the content we trained it on. Exactly right. Which totally makes sense. I mean, I feel like you would have some kind of a cooking analogy here of, you know, some sort of classically trained French chef being asked to, oh, I don’t know, make Mexican street food or something, which I’m sure they would do amazingly, still way better than me. But the point being, it’s a mismatch.

Christopher Penn 27:14
Exactly, exactly. So at this point, what’s going to happen now is this data set has been submitted, it’s going to be loaded into OpenAI’s queue, and then it’s going to take probably an hour, maybe two hours, to do all the processing. Because if you remember, from previous episodes of the podcast and things, we’ve talked about how large language models work, and we talked about the architecture behind them. Folks will remember this lovely diagram. What it’s going to do now is take all those blog posts that we put in and run them through this whole process, breaking them into individual words, calculating the embeddings, which is turning words into numbers, and calculating the statistical probabilities among them. And then what it’s going to do is take the existing model that we start with, the DaVinci model, and, using all of Katie’s specific, unique ways of writing, change the weights in this model. It’s going to say, you know, for example, this phrase, the two-word term Vanilla Ice, is going to get a boost in the tuned model because Katie uses it more than it appears in the general model, right? So it will change the probabilities of the entity known as Vanilla Ice, and be more likely to generate that than it would be otherwise. So that’s what’s happening right now: all these blog posts are changing the underlying weights in the DaVinci model to skew it. You’re intentionally skewing; you’re creating an intentional bias to sound more like Katie.
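
The "boost" described in this turn can be pictured with a toy example. To be clear about the assumptions: real fine-tuning adjusts neural network weights through training, not word counts, and every number below is invented. The sketch only shows the direction of the effect, namely that adding one author’s text on top of a general model makes that author’s favored next word more probable.

```python
from collections import Counter


def next_word_probs(counts):
    """Turn raw next-word counts into probabilities."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def blend(base, extra, weight):
    """Add the new author's counts, scaled by weight, on top of the base counts."""
    merged = Counter(base)
    for word, n in extra.items():
        merged[word] += n * weight
    return merged


# Invented counts of the word that follows "vanilla" in each body of text.
base_counts = Counter({"extract": 60, "bean": 30, "ice": 10})
katie_counts = Counter({"ice": 8, "extract": 2})

before = next_word_probs(base_counts)
after = next_word_probs(blend(base_counts, katie_counts, weight=5))
print(before["ice"], after["ice"])
```

In this made-up case the probability of "ice" following "vanilla" rises from one in ten to one in three, which is the kind of intentional skew being described.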

Katie Robbert 28:48
Fun fact, I’ve only quoted Vanilla Ice once in writing, more so in, well, so what’s interesting is we didn’t pull any transcripts from So What? And so that is actually where you would get, I would argue, more of my personality than through the blog posts. So I am sort of interested to see, you know, what the output is in terms of how well it does my voice. I do feel like my writing for the newsletter and for blog posts is very close to how I actually talk to people. But it’s not going to have all of those additional, like, Katie-isms and quirkiness that I think you would, you know, get from a transcript of me actually talking or speaking. So I think that’ll be interesting. And, you know, for people who are considering creating these models, that’s something, you know, so when we were talking about how do you get started if that content doesn’t exist, that might be a way to generate a lot of that content: whoever you’re trying to train the model on, just have them start talking. Have a conversation with them and ask them questions. You get those transcripts, which then turn into the content to train the model on.

Christopher Penn 30:05
Exactly. Now, as I said, this is going to take probably a couple of hours to chew away. When it’s done, inside the developer playground, it will show up in a list of fine-tunes that are specific to your account. So again, this is not something the general public would see; this would be something that is only available in your individual OpenAI account. And then once you have those fine-tunes, you’re able to use them for the different tasks that are available. This one was from this week’s newsletter. Do not under any circumstances use this, because I didn’t clean the HTML, so it just generates gibberish all the time. It’s also using Curie instead of DaVinci. But that’s effectively it. So in terms of how do you fine-tune, we’ve just walked through the process from end to end: start with your purpose, what is the purpose? The people and the processes. Then get your data, clean your data, upload your data, and ultimately make it available to the interface. Now, you can’t use this in a chat window, right? This is not available in ChatGPT. This is available via the playground or the API. So part of your purpose, and part of the performance in the five Ps, is how you’re going to use this thing that exists, and that you paid money for, but that’s not as easy as a chat window.

Katie Robbert 31:34
That was gonna be my question: how the heck do I get to it? But I would imagine, if I’m remembering the different pieces, I could still build out a prompt the same way that I would in the chat window, but use the Katie-tuned model. Is that true?

Christopher Penn 31:54
Um, you do it in the playground.

Katie Robbert 31:57
But that’s yeah, that’s, I guess, whatever you were just looking at. But so there’s the ChatGPT. There’s the chat interface that we’ve all grown used to. And then there’s the playground, but I could build a prompt the same way that I would in the ChatGPT chat in the playground and get the results I’m looking for by using my own model.

Christopher Penn 32:17
Exactly. So let’s it actually turns out, it’s got a ping saying that the Da Vinci model is now available. Right? So it says Use your DaVinci Trust Insights model.

Katie Robbert 32:27
So does it let you rename those, if you have, like, multiple? You can, you have to do it through the API. Okay. No, I was just, more so just as a user. So let’s say we had, you know, 10 people that we built language models around. Just sort of wondering, from a usability standpoint, could we rename them to be, like, John’s model, Chris’s model?

Christopher Penn 32:47
Yes. Yes, you absolutely could. So let’s do write a blog post about the importance of skills evaluation in people management. Let’s see what our new synthetic Katie is like.

Katie Robbert 33:11
Katie is a hot mess. So asthma is?

Christopher Penn 33:16
So this is? This is part of, let’s take a look here. Oh, you can figure out what, where it went off. So change the temperature? Just a bit.

John Wall 33:33
Oh, if it printed HTML to the browser might freak out. Oh, there you go. No, you seem to dial it back.

Christopher Penn 33:38
Well, so again, there’s a data quality issue here. Right, the data quality issue is that even though we removed the HTML, it didn’t remove the links from the text, right? And so this fine-tune has all of those links embedded in there and just has no idea what to do with them. Because there are so many links in the text that, remember, a word is known by the company it keeps, it has the association of all these different words to so many URLs, because we link to our stuff all the time. Right? We’d have to now go back into all this text and rip out all those URLs in the plain text.
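
The extra cleaning pass mentioned here, ripping URLs out of the plain text, might look something like the sketch below. It is an assumption about how one could do it, not what was run on the show: the pattern only catches links that start with `http://`, `https://`, or `www.`, and the example sentence and address are invented.

```python
import re

# Matches bare links; anything attached to the link (such as a trailing
# period) is removed along with it.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)


def strip_urls(text):
    """Remove bare URLs from plain text and collapse the leftover whitespace."""
    without_urls = URL_PATTERN.sub("", text)
    return re.sub(r"\s+", " ", without_urls).strip()


example = "Join our free Slack group at https://example.com/slack today."
print(strip_urls(example))
```

Running a pass like this before building the training file, and spot-checking the output, is the sort of cheap local test that catches the problem before paying for a fine-tune.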

Katie Robbert 34:18
Man, I was so excited to see what I look like as a computer. But per usual, I’m just a mess.

John Wall 34:26
Well, I thought it was gonna be awesome. You’d like to hear this next week’s blog post done, but

Katie Robbert 34:29
I was so excited but no, not so much.

John Wall 34:33
Bound by technology again. Wow, wow.

Christopher Penn 34:37
Yep. And this process also underscores, like, A, the super importance of that data cleaning, but B, this is one of those things where, if you’re a shop that’s going to specialize in this, like Trust Insights, you will want to have a private instance or a private service set up to do your tests, because this just cost us $9 for an unusable model, right, in terms of what it’s spitting out. Which, again, is not a huge deal, but it’s still $9 fewer than we had at the start of the hour. Running this on, like, your computer with a smaller model will at least let you do the diagnostics and make the easy screw-ups first.

Katie Robbert 35:19
Right, but $9 here, $9 there, another $9, it adds up pretty quickly if you’re not doing that work upfront to really clean the data. So, you know, if you bring up, Chris, that slide with the four components: the prompt engineering, the prompt deployment, public and private models, those are the general services that you can offer and accept with all of this generative AI right now. But what’s missing from this, and Chris and I were talking about this earlier, is that upfront governance piece, that management piece of it, that’s really going to help you understand, like, what are you even working with before you get to this stuff? You know, how much content do you have? How clean is it? Is it usable? Is it relevant? So we went through a very quick example: we looked at all the blog posts, we said, you know, cut out the So What? blog posts, but we didn’t do a deep enough dive to see there are too many URLs for the model to read it correctly and give you an output. So we have to go back and start over again, and cost ourselves another $9.
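
The upfront questions raised here (how much content is there, and how clean is it?) can be checked cheaply before paying for a training run. The sketch below is one hypothetical way to do that: it counts words and leftover links per post and flags any post that still contains URLs. The function name, the report fields, and the zero-URL threshold are all assumptions for illustration.

```python
import re

# Assumption: leftover links start with http://, https://, or www.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)

def audit_posts(posts, max_urls_per_post=0):
    """Return one summary row per post: word count, URL count, pass/fail."""
    report = []
    for index, text in enumerate(posts):
        url_count = len(URL_PATTERN.findall(text))
        # Word count is a plain whitespace split, so URLs count as words too.
        word_count = len(text.split())
        report.append({
            "index": index,
            "words": word_count,
            "urls": url_count,
            "ok": url_count <= max_urls_per_post,
        })
    return report

# Made-up sample posts, not real Trust Insights content.
sample_posts = [
    "Clean post with five words",
    "See https://example.com and www.example.org now",
]
audit = audit_posts(sample_posts)
flagged = [row["index"] for row in audit if not row["ok"]]
```

Any post that lands in `flagged` would go back through cleaning before the data set is uploaded.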

Christopher Penn 36:37
Exactly. I asked it something simpler: write a post about B2B marketing. All right. It goes through, how do you know which platforms you should use? So you can see it is doing what we told it to do. It is drawing from our content, right? There’s the phrase that you use every single week in the newsletter, right, which is part of what’s been ingested: the free Slack group, Analytics for Marketers. So this is very clearly us. This is very clearly the Trust Insights blog. This is clearly the things that we talk about. And you can see we didn’t have to do a whole lot for the prompt, because it’s fine-tuned on our data. But, right, it’s still a mess.

Katie Robbert 37:18
It’s still a mess. So my It’s okay. Yeah.

Christopher Penn 37:26
Not doing a great job of saying

John Wall 37:30
No, like that. Yeah, that is your conversational style, though. Your sentence structure there, it did totally grab that.

Katie Robbert 37:37
Well, and that is a really good call out, John. Because when I write, I try to write in smaller paragraphs, more readable, you know, break it up, and it looks like it’s doing the same thing I do. We have to save that sentence, because even the machine is having trouble focusing.

Christopher Penn 37:59
And you can see, you know, from the experiments we’ve done, this captures your tone of voice much more than even a really long prompt, right? This is very much Katie’s voice, it just needs a lot more tuning.

Katie Robbert 38:14
It does. Which makes me really excited for, you know, Katie GPT version two, when we get all of those URLs out of there, because I know from being the one who writes those posts, they’re all roughly between 600 and 800 words each. So there’s a decent amount of content to work with in each of those posts, once we get the URLs out of there.

Christopher Penn 38:42
Exactly. So that’s the process, and I’m glad to see that this tuning went faster than I thought it would, of what it looks like to deploy a fine-tuned large language model. Now, again, depending on your data types, you may want to run this privately, or in Google Colab, maybe, which is a private, secure environment. If you’re working with sensitive information, you have to use a private model. And if you want to be at all responsible and adhere to, probably, many of the NDAs you’ve signed, you’ve got to do it that way. But you can see that, you know, the technology is not difficult; deploying a fine-tuned model is not like arcane surgery. The hard part is that data cleaning.

Katie Robbert 39:32
Well, I mean, to be fair, though, you know, you were showing a lot of technical steps that are second nature to you. It’s basically the language that you speak. If you said to me, okay, here’s the steps, you know, here’s your you know, terminal prompt and here’s this and here’s that like, I would struggle with it so I wouldn’t necessarily call it you know, easy peasy lemon squeezy. I would say it’s gonna take me, you know, some tears and throwing a couple of things and then begging you to do it for me.

John Wall 40:03
That’s a normal data load. Like that’s,

Katie Robbert 40:07
that’s normal day. And I’m like, Yeah, I know,

John Wall 40:08
Even for developers. I’ve seen many a developer cry, you know, trying to get a file in. And I’m good to go: I’m gonna gather up all my medical records and credit report, and we’ll fire them right into the model.

Christopher Penn 40:24
Yep. And if you think about it, too, this also underscores how difficult it is to build a solid fine-tuned model if you don’t have enough data, right? We talked about using this for analytics, being able to give a model some data and say, write some conclusions. Well, it’s really hard to do that if you’ve only got 50 analyses, if you’ve only done 50 reports, which is a lot for a person to do. But for a machine, it’s like, that’s not really enough to work with. If you’ve got 500 to 5,000, that would be better. And you’re like, I don’t think I’ve written 5,000 reports in my life.

Katie Robbert 40:55
Well, and if you think about other examples of machine learning that we’ve talked about in the past, you know, that’s that same example, for predictive forecasting. The more data that you have, the better your predictive forecast is going to be. But if you only have, you know, a small sample, like, you may think, a year’s worth of data, well, I collected a year’s worth of data, well, that one year doesn’t really get into any of the seasonality or anomalies that you could expect. And so having two years, five years, 10 years is really going to be more useful. So it’s the same with fine tuning these large language models, the more content, the more data, the more information you can give it to train on, the more representative it’s going to be.

Christopher Penn 41:39
Exactly. So that’s fine-tuning large language models, the process, which is sort of stages three and four of deploying these things to be as useful as possible. You can see from what we talked about today, it’s not something that you can just throw over the wall and hope it all works out. The machines are not smart enough for that. And if you’re doing it and you would like some help with it, or you just watched this and said, I’m not doing this, no way, why am I doing this, and you’d like help with that, we’re happy to help out.

Katie Robbert 42:16
You can talk to our chief statistician, John Wall.

John Wall 42:19
I’m ready. Two standard deviations to go. Oh,

Christopher Penn 42:27
boy, any parting words?

Katie Robbert 42:30
I want to know if you guys are going to create a marketing over coffee, large language model and just have the AI start generating your podcast episodes?

John Wall 42:38
Oh, that’s a good question. At what point, has there been any talk, will we ever be able to upload audio files? Or are we still stuck, like, having to generate transcripts to do stuff?

Christopher Penn 42:46
For the time being, it’s text based. The GPT-4 API can handle images, but no, no audio yet. Yep.

Katie Robbert 42:58
All right, well, let’s go clean up the mess that is Katie GPT-3 and get her working. Someone’s got to work around here.

John Wall 43:07
AI Katie getting down to business for us. Sounds good.

Christopher Penn 43:12
I’ll talk to you all next week.

Speaker 2 43:15
Thanks for watching today. Be sure to subscribe to our show wherever you’re watching it. For more resources and to learn more, check out the Trust Insights podcast at trust AI podcast and a weekly email newsletter at trust. Got questions about what you saw on today’s episode? Join our free Analytics for Marketers Slack group at trust for marketers. See you next time.


