INBOX INSIGHTS, April 26, 2023: Large Language Models and Fine Tuning

INBOX INSIGHTS: Large Language Models and Fine Tuning (4/26) :: View in browser

Inbox Insights from Trust Insights

👉 67 days until Google Analytics’ Universal Analytics shuts down. Take our GA4 course to get going now on Google Analytics 4

Look Before You Leap Into a Large Language Model

Are you thinking of introducing a large language model, like ChatGPT into your organization?

I’m here this week to be Captain Buzzkill. Why? Because you’re probably not ready.

Let’s take a step back and review the foundational pieces that need to be in place first, before you’re ready for Artificial Intelligence.

The best place to start is at the bottom, with your data. Specifically, your quantitative data. This is the data that tells you what’s happening on your website, how many people downloaded your latest ebook, and how many newsletter subscribers you have.

My first question: How confident are you with your quantitative data? Click on your answer and we’ll summarize the data in a future issue.

I want you to take a minute and give an honest answer. If you’re not strongly, or even somewhat confident in your quantitative data, you’re not ready for AI.

To be confident in the quantitative data means a few things. First, you know where all the quantitative data lives. It can be in various systems, or all funneling into one place (like a data warehouse). Next, you know that the data is clean. If you had to pull the data right now, you’d be able to do so and know that it was correct. Third, you can make decisions with the data.

Think I’m full of bologna? We also need get a handle on your qualitative data, the behavioral data. This is sometimes numeric, sometimes not. Regardless, it’s more complex to collect and analyze than your quantitative data. If you don’t know why people read your blog, or come to your website, you don’t have a handle on your quantitative data. If you put out a video that goes viral and you can’t say for sure why, you need to go back to the beginning.

Let me put it another way. Let’s say you get the green light to start using a large language model with your marketing team. Everyone is super excited that you can produce content at 100x the rate you have been. This will be fantastic for your SEO and content marketing. Right?

Well, remember, I’m Captain Buzzkill today.

Some organizations are looking to build their own large language models because of the amount of content they have created over the years. It sounds good in theory, training a model on all your content so that it learns your voice, your tone, your quirks.

But what if not all your content hits the mark? What if some of your content has never been seen by anyone?

My questions for you: Do you have a good grasp on where the different pieces of your current content work in the customer journey? Do you know which pieces of content convert? Which pieces drive awareness? Which pieces have fallen flat?

If the answer is no, my next question would be: How long would it take you to find the answer. And to that, how confident are you in the answer. Confident enough to make financial decisions with your data?

This is the crux of the issue. Many companies are eager to get started with AI. I get it, it’s cool, has a lot of potential, and every one is doing it. No one wants to be left out.

The challenge is that the majority of companies aren’t ready to introduce AI in a way that will succeed and sustain. Companies are making hasty, sometimes panicked, decisions about how to bring AI into the fold. They are using it as a bandaid to cover up their people (skillsets) and process (data collection) issues.

All to say, a large language model can be a game changer with your marketing. But before you take the leap, make sure you have a parachute.

Are you bringing a large language model into your organization?

Reply to this email or come tell me about it in our free Slack Community, Analytics for Marketers.

– Katie Robbert, CEO

Share With A Colleague

Do you have a colleague or friend who needs this newsletter? Send them this link to help them get their own copy:

Binge Watch and Listen

In this week’s In-Ear Insights, Katie and Chris answer the big question that people are afraid to ask for fear of looking silly: what IS a large language model? Learn what an LLM is, why LLMs like GPT-4 can do what they do, and how to use them best. Tune in to learn more!

Watch/listen to this episode of In-Ear Insights here »

Last week on So What? The Marketing Analytics and Insights Livestream, we discussed how AI can improve your podcasting program with a generative AI bakeoff. Catch the episode replay here!

This Thursday at 1 PM Eastern on our weekly livestream, So What?, we’ll be discussing fine-tuning large language models. Are you following our YouTube channel? If not, click/tap here to follow us!

In Case You Missed It

Here’s some of our content from recent days that you might have missed. If you read something and enjoy it, please share it with a friend or colleague!

Paid Training Classes

Take your skills to the next level with our premium courses.

Free Training Classes

Get skilled up with an assortment of our free, on-demand classes.

Data Diaries: Interesting Data We Found

In this week’s Data Diaries, let’s expand on Katie’s opening by digging a bit deeper into fine-tuning a large language model. Services like OpenAI, as well as open-source projects like the Hugging Face libraries, offer technically-savvy developers the ability to customize a language model to do more of what we want it to do. We talked about this and why you might want to do this in the April 12 issue of the newsletter, so today, let’s look at the step-by-step process of how to do it.

We’re going to work within the context of OpenAI’s architecture, though you’re welcome to use the model and platform of your choice. Some models’ fine-tuning processes are more elaborate than others, so be aware of the technical requirements before you begin by reading the supplied documentation for any given model. If a model doesn’t have robust documentation, probably don’t use it.

You’ll start by installing OpenAI’s command line tools for use in whatever platform you’re on (Windows, Mac, Linux, etc.). Once the tools are installed – which require a current edition of Python 3 – you’ll move onto preparing your data.

It’s at this point that most AI projects go off the rails. To prepare the data, we have to know what data we’ll be fine-tuning with, and to do that, we have to know what we’re trying to accomplish. In the Trust Insights 5P framework – Purpose, People, Process, Platform, Performance – this is the first P, purpose. What exactly do we want this model to do?

Think of fine tuning like training a dog. If you get a dog and you rigorously train it to do, say, explosives detection, that dog excels at that particular task. However, the dog will then be very poor at sniffing out illegal substances, or sniffing for survivors in an earthquake zone. The dog’s brain has been fine tuned for a very specific task to the detriment of other tasks. That’s the difference between a fine-tuned model and a general model like GPT-4 (and why GPT-4 is so much larger and more expensive, because it has to be moderately good at everything).

So, what do you want your model to do? Generate text? Answer questions? Do sentiment analysis? For the purposes of this exercise, let’s say we want to tune a model to write more blog posts, blog posts specific to Trust Insights. How would we go about doing that? We’d start by selecting examples of what we’ve already created – in this case, from our blog:

Post data

Now, to what Katie said, you can’t just dump this raw data into a large language model. Yes, there’s a lot of data, but is it any good? Is content we wrote five years ago a good example, a good baseline for how we write today, or is it like those embarrassing photos your parents used to trot out of your five year old self in cringey Halloween costumes?

The data is also unclean. Even a cursory glance shows HTML markup in the text itself, and that’s not something you want to pass to a language model, lest it start trying to do that itself. So the next step would be to take that data, filter it to maybe the last 2 years of writing, and clean up the content to be plain text. For the sake of illustration, let’s proceed with trying to load the poor quality data to see what happens with the understanding that this is literally the worst possible practice:

Fine tune data clean and load

The fine tuning tools offer a rudimentary amount of screening and suggestions, then prepares the dataset.

A quick inspection of the processed file shows that OpenAI has made no effort to clean the source data – it’s still littered with HTML markup:

Fine tune file

Loading this into OpenAI right now would be the height of folly because of the terrible results it would generate, so let’s go ahead and do that. Again, worst practices, friends. Worst practices.

Fine tune load

The fine tuning process itself costs money; in this case, it’s about $7 for it to learn over 500 blog posts. Once it’s done, the model will be available in the OpenAI API as a custom model for us to use.

That’s the process of fine-tuning. The technical part of the fine-tuning itself is very straightforward; the part that’s exceedingly difficult is cleaning and preparing the data. Done well, you’ll have a powerful model purpose-built to do a specific task better than the big public models. Done poorly, you’ll have spent a lot of money and time on a model that won’t do what you want, so be absolutely clear in your purpose and over-invest in the data preparation time so that the rest of the project succeeds.

Now, if you’ll excuse me, I have a fine-tuned model to delete.

Trust Insights In Action

Job Openings

Here’s a roundup of who’s hiring, based on positions shared in the Analytics for Marketers Slack group and other communities.

Join the Slack Group

Are you a member of our free Slack group, Analytics for Marketers? Join 3000+ like-minded marketers who care about data and measuring their success. Membership is free – join today. Members also receive sneak peeks of upcoming data, credible third-party studies we find and like, and much more. Join today!

Blatant Advertisement

We heard you loud and clear. On Slack, in surveys, at events, you’ve said you want one thing more than anything else: Google Analytics 4 training to get ready for the July 1 cutoff. The newly-updated Trust Insights Google Analytics 4 For Marketers Course is the comprehensive training solution that will get you up to speed thoroughly in Google Analytics 4.

What makes this different than other training courses?

  • You’ll learn how Google Tag Manager and Google Data Studio form the essential companion pieces to Google Analytics 4, and how to use them all together
  • You’ll learn how marketers specifically should use Google Analytics 4, including the new Explore Hub with real world applications and use cases
  • You’ll learn how to determine if a migration was done correctly, and especially what things are likely to go wrong
  • You’ll even learn how to hire (or be hired) for Google Analytics 4 talent specifically, not just general Google Analytics
  • And finally, you’ll learn how to rearrange Google Analytics 4’s menus to be a lot more sensible because that bothers everyone

With more than 5 hours of content across 17 lessons, plus templates, spreadsheets, transcripts, and certificates of completion, you’ll master Google Analytics 4 in ways no other course can teach you.

If you already signed up for this course in the past, Chapter 8 on Google Analytics 4 configuration was JUST refreshed, so be sure to sign back in and take Chapter 8 again!

👉 Click/tap here to enroll today »

Interested in sponsoring INBOX INSIGHTS? Contact us for sponsorship options to reach over 22,000 analytically-minded marketers and business professionals every week.

Upcoming Events

Where can you find Trust Insights face-to-face?

  • B2B Ignite, Chicago, May 2023
  • ISBM, Chicago, September 2023
  • Content Marketing World, DC, September 2023
  • MarketingProfs B2B Forum, Boston, October 2023

Going to a conference we should know about? Reach out!

Want some private training at your company? Ask us!

Stay In Touch, Okay?

First and most obvious – if you want to talk to us about something specific, especially something we can help with, hit up our contact form.

Where do you spend your time online? Chances are, we’re there too, and would enjoy sharing with you. Here’s where we are – see you there?

Featured Partners and Affiliates

Our Featured Partners are companies we work with and promote because we love their stuff. If you’ve ever wondered how we do what we do behind the scenes, chances are we use the tools and skills of one of our partners to do it.

Read our disclosures statement for more details, but we’re also compensated by our partners if you buy something through us.

Legal Disclosures And Such

Some events and partners have purchased sponsorships in this newsletter and as a result, Trust Insights receives financial compensation for promoting them. Read our full disclosures statement on our website.

Conclusion: Thanks for Reading

Thanks for subscribing and supporting us. Let us know if you want to see something different or have any feedback for us!

Need help with your marketing AI and analytics?

You might also enjoy:

Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, INBOX INSIGHTS. Subscribe now for free; new issues every Wednesday!

Click here to subscribe now »

Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new episodes every Wednesday.

One thought on “INBOX INSIGHTS, April 26, 2023: Large Language Models and Fine Tuning

Leave a Reply

Your email address will not be published. Required fields are marked *

Pin It on Pinterest

Share This