So What? How To Get Started With Small Language Models

So What? Marketing Analytics and Insights Live

airs every Thursday at 1 pm EST.

You can watch on YouTube Live. Be sure to subscribe and follow so you never miss an episode!

In this episode, you will discover how to run powerful AI models completely on your own machine, keeping your sensitive data private and off the cloud. You will realize substantial cost savings compared to expensive Large Language Model APIs while getting lightning fast results for many common tasks. You will learn which specific models excel at practical applications like summarizing transcripts, classifying documents, or writing code offline.

Watch the video here:

So What? How To Get Started With Small Language Models 📱

Can’t see anything? Watch it on YouTube here.

In this episode you’ll learn:

  • The difference between local AI models and small language models
  • Why you would choose a small language model
  • How to get started installing small language models locally or in the cloud

Transcript:

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.

Christopher Penn – 00:30

Welcome to So What, the Marketing Analytics and Insights live show. Chris and John here. This week, Katie is off searching for yet another real-life Law and Order: SVU episode. For those who don't know the reference: last year Katie was making her annual pilgrimage to New York City and saw, of course, the event with Luigi and everything that happened there. And so she's back this week in New York pursuing that.

This week we’re talking about small language models, the little siblings to large language models, the kind that power ChatGPT, Google Gemini, Claude, etc. So before we get started, John, what do you know about small language models?

John Wall – 01:28

You know, everything that you've thrown out in previous episodes is stuff we've kicked around. I mean, I love the idea that instead of firing your data off to who knows where, to be used for who knows what, there are a lot of these models that you can just install on a local machine and get the same functionality, but you get to hang onto the data.

But the big thing for me, I get that and have played around with that a little bit as far as setting up a model and just having it work. But I really want to hear about what you’ve got going as far as, okay, so you’ve got it set up, but now what do you do? How do you plug it in?

John Wall – 01:59

I can see, yeah, it’s great that you can now work on your desktop and not have to worry about divulging private info or something going horribly wrong. But I would love to hear more about, okay, so you’ve got it set up on your machine, here’s how to plug it into some other stuff to make it more useful. That’s what I’m really hoping we can dig into a bit.

Christopher Penn – 02:20

Nice. So let's clarify. There's a difference between a small language model and a local AI model. Local models run in your infrastructure: on your laptop, on your phone, in your server room. They don't have to be small. For example, if you have $50,000 worth of hardware, you can download DeepSeek 3.2 and run it in your infrastructure completely privately. Safe, but not small.

A small model: there's no universal agreement as to what constitutes small. However, the general rule of thumb that I use is that if you can run it on a laptop without being connected to the internet, that is small. And again, that depends on the kind of laptop you have. If you have a top-of-the-line MacBook Pro, it can run pretty decently sized models, not DeepSeek, but the newest versions of Qwen 3, etc. If you have your average corporate laptop that can basically run a browser and Microsoft Office and that's it, you are much more constrained.

Because all language models require memory, specifically video memory. The more video memory your computer has, the bigger the models you can run. The exception to that rule is Macs. Macs don't have separate video memory; it's all unified. So however much memory your Mac has, once you've booted it up and started whatever else you want running, whatever's left over is your budget.
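As a rough sketch of that budgeting rule of thumb (the quantization and overhead numbers below are coarse assumptions, not vendor specs):

```python
def model_fits(params_billions: float, bits_per_param: float = 4.0,
               overhead_gb: float = 2.0, free_memory_gb: float = 16.0) -> bool:
    """Rough check: does a quantized model fit in available (V)RAM?

    Weights take roughly parameters * bytes-per-parameter; the extra
    overhead_gb covers KV cache and runtime. All of these numbers are
    coarse rules of thumb, not vendor specs.
    """
    weights_gb = params_billions * (bits_per_param / 8)
    return weights_gb + overhead_gb <= free_memory_gb

# An 8B model quantized to 4 bits wants ~4 GB of weights plus overhead,
# so it fits in a 16 GB budget; a 70B model at 4 bits (~35 GB) does not.
```

The point is the shape of the math, not the exact constants: halving the bits per parameter roughly halves the memory the weights need, which is why quantized models are what you actually run on a laptop.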

So that’s the difference. Small language models can be also in the cloud. So there are some really good providers out there that allow you to run in their clouds.

Christopher Penn – 04:11

A couple that are pretty well known: Groq with a q, nothing to do with Elon's shop. This is a shop that is known mainly for having the fastest inference. When you run some of the models available there, they run crazy fast. And they have things like Qwen 3 in there; they have Kimi K2. Another one that is fairly well known is Deep Infra. This is a company that has hundreds of different models, big and small, that you can run.

So if you want to use small models, you can do it on your hardware or you can rent someone else's hardware. The reason you would rent someone else's hardware is that a company like Groq, for example, has a zero data retention API.

Christopher Penn – 05:05

So if you work at Acme Megalithic Corporation and they give you the cheapest possible piece of hardware, one you can't even run Solitaire on, you're not running a local model on it. If you want to use a small model, you would do it in the cloud.

And the reason you would run a small model is mainly cost at that point—cost and privacy. These small models, when you run them, they run much, much faster because they require much less hardware to run. That also means that they are much more sustainable in terms of environmental impact because they require less compute. If you can take a bicycle to the store and you don’t need to drive the Ferrari, you will save a little bit of carbon.

John Wall – 05:54

That’s interesting. So this small hosted model, it seems like that would be a better application if you were trying to build something. I mean, the idea that it would be up all the time, you don’t have to dedicate your own machine to it. What’s the remaining reasons then for wanting to run a small model locally? I mean, if privacy is a top concern, is mission critical, is that the—

Christopher Penn – 06:19

Yeah, privacy is a big one where you’re like, “We have data that cannot leave our network,” or “cannot even leave this machine”. That’s one reason to run it locally.

Another is cost savings, particularly if you’re doing R&D or you’re building like your first beta tests. You might say, “We’re going to use Meta Llama 4 Scout in the cloud eventually, but while we build and debug this thing, there’s no reason to run up a huge bill or a modest size bill when we can run it locally”. It’ll be slower, but while we work out all the bugs of what we’re working on, there’s no reason to spend that money.

John Wall – 06:59

Very cool. All right. Yes, so that does kind of change the thing. I like the idea of if you’re kind of in the lab or in early development, yeah, you run it local. But then once you get to a point where you want to do maintenance and not have to babysit, I could see throwing it out to the cloud would be a great way to go.

Christopher Penn – 07:15

Yeah. The other thing is if you want to run offline, which is a big one for me because I travel so much. Qwen 3 Coder is one of the best small models for coding. It's still not as smart as the frontier models. Ethan Mollick said this recently: open models, which is what we're talking about with small language models and local AI, typically lag behind state of the art by about eight months. So today's Qwen 3 is about as smart as where, say, GPT-4.1 was back in April. It's not the cream of the crop for a lot of tasks, but it doesn't have to be.

When I get on an airplane, if I’m on an airline that shall not be named and the internet is either not available, highly unstable, or is like $280 for 15 minutes, whatever the case is, I’m just going to use the local model. I’m going to run a small language model on my machine to get some stuff done.

John Wall – 08:20

Yeah, that's a great point too. I'm just presuming that internet and power are ubiquitous, which is not the case in a lot of places.

Christopher Penn – 08:28

Exactly. Yeah. We've all stayed in that hotel at an event that has internet speeds that rival 1999.

John Wall – 08:42

What is this? Katie is saying hello from the train. Okay, I’m okay if you’re on the train, you’re not interrupting tourist time, but well, safe ride on the rails there.

Christopher Penn – 08:55

Exactly. So our first four things are privacy, cost savings, offline capability, sustainability. Those are the reasons that you would consider small language models. And small language models can run on your hardware or someone else’s hardware. What makes them useful is that they are small so that they consume fewer resources.

Now, here's the catch: the smaller a model is, the dumber it is in terms of how much knowledge it contains. Take a model like Gemini 3. We don't know how big it is, but you can hazard a guess; it's probably 3 to 4 trillion parameters, which is like a bookshelf of books that goes around the equator eight times. Compare that to a model like Qwen 3, which is an 80 billion parameter model.

Christopher Penn – 09:49

So it's dozens of times smaller, which means it contains correspondingly less information.

John Wall – 09:58

How about as far as when you configure those, are you able to select what data you want, or it’s just, “No, this is what’s been released and what’s in there for this version”?

Christopher Penn – 10:07

For the purposes that most folks are going to be using these things for, you get what you get. There are a lot of what are called fine tunes where someone else has tuned it to do a specific thing. But for the most part, for the average business user, it is what it is. That brings us to: these work best when you bring the data.

So if you are using a small language model and you say, "Write a blog post about B2B marketing," it's going to give you such slop, such crap, because it has even less knowledge than ChatGPT. It's just going to come up with very mealy-mouthed, boring AI writing. You see this a lot: all those annoying comments on LinkedIn that are 100% machine generated, those are running on the back end.

Christopher Penn – 10:55

Those bots use small language models because of the cost. If you're going to send a hundred thousand robo-comments, you don't want to be spending a dollar per comment with OpenAI. You're like, "I'm going to use Llama 4 Scout and it'll cost me a penny a comment." And you go out and you spam LinkedIn and then we all hate you.

John Wall – 11:12

Right. Well, and the thing with that is you know they're doing it to get around the fact that there's no API and LinkedIn actively discourages automation. So that's more criminal activity.

Christopher Penn – 11:27

Exactly. Now, if you want to run small models on your machine, the first thing you have to know is how much memory you have available. If you're on a PC, you need to know how much video memory you have, which is usually in your Windows settings. If you're on a Mac, you can open up Activity Monitor and look at how much memory is free. That will tell you how much capacity you have.

Then comes the challenging part: figuring out which of the 2.3 million models you should use. There is a fantastic tool that I use, a comparison website called Artificial Analysis.

Christopher Penn – 12:24

We have no relationship with them, but one of the things they've got is the ability to choose from all these different model families. I've put up the biggest, most well known families: Alibaba's Qwen, ByteDance's Seed, Meta's Llama, Mistral's various models, Microsoft's Phi, Google's Gemma, and LG, the refrigerator company, with its EXAONE models. This is sort of an intelligence-wise comparison of what these models are good at, and this particular site creates a nice average of all the different tests. So you can see here that Qwen 3 Next, which is sort of the newest version, scores a 54 on their index. Just to level set, 65 on this index is a human PhD, and 20 is like face rolling. So you see the very smallest version of Google Gemma face rolling.

Christopher Penn – 13:25

Microsoft Phi, face rolling; the regular version of Phi, slightly above face rolling. Qwen, Alibaba, ByteDance, LG, and Mistral all score pretty decently. Now, just for comparison, if we add in regular GPT 5.1, which is the current version of ChatGPT, the non-thinking version: 70 is where the smartest version of ChatGPT is today. So you can see there is a big difference compared to, say, Microsoft Phi, but at the same time, it's not vastly different from Qwen 3.

So what I would suggest for people who want to figure out which model family to use is to use a tool like this and say, "Here's what we're used to." Knowing that we can't do generation, and knowing that we probably can't ask fact-based questions of the model itself, what's going to be closest to what we're used to? In this case, it's either going to be ByteDance or Qwen 3. Those are both Chinese companies: Qwen is made by Alibaba, the e-commerce giant, and Seed is made by ByteDance, the makers of TikTok.

Probably worth mentioning at this point: when you use a small language model on your hardware or with a provider of your choice, it is safe to use even if it's made by a model maker outside your jurisdiction. A lot of people have concerns about using models like DeepSeek, because when you use their version of it, it's on their infrastructure, hosted in the People's Republic of China.

Christopher Penn – 15:15

Yeah, their privacy policy says you have none. So be aware of that. When you take the model and you put it on your computer, it’s now not part of their systems and so it is then safe to use.

So once you know what family, then you have to go get one. And the best way, the easiest way to do that is to use some kind of hosting software. The one that I recommend for most people to use these days is a tool called LM Studio. LM Studio is a local AI system. It’s available for Mac and Windows, and it’s sort of an all-in-one that allows you to download models, run them on your computer, and then also use them if you want to use them for programming.

Christopher Penn – 16:01

What's really nice about it is that you don't have to go copying and pasting and finding model names and stuff like that. It's much easier just to go into the tool and say, "Let's see what's available." We were just looking at Artificial Analysis, and we know that the Qwen family is probably going to be the best for what we want to be doing. So we can see here all the recommended ones for my computer that I could use. This connects to a website called Hugging Face, so it has all 2.3 million models if you wanted them.

I've downloaded these in advance to show you, because these are big files. One of the things to pay attention to: the amount of disk space a model takes up is roughly how much memory it's going to take up.

Christopher Penn – 16:51

So you need to budget for that. If you've got 12 gigabytes of video RAM and you look at a model and say, "Oh, I want this cool model. This looks great. 44 gigabytes," well, you ain't running that. It's not going to fit. But it is nice and simple.

Now, once you have it downloaded, you can start to do—let’s start a new chat. Qwen 3 Coder is loaded. Now, this is a coding model, but it still can answer general questions. Like, let’s give it a terrible one: “What are the best practices for B2B marketing and account based marketing”?
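For what it's worth, that same chat can be driven from a script: LM Studio exposes an OpenAI-compatible API on localhost, port 1234 by default. A minimal sketch, assuming the server is running and the loaded model is named qwen3-coder (the model name is an assumption; use whatever LM Studio shows you):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:1234/v1"):
    """Build an OpenAI-style chat completion request for a local server.

    LM Studio serves an OpenAI-compatible API on port 1234 by default;
    the model name must match whatever you have loaded in the app.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("qwen3-coder", "What are best practices for B2B marketing?")
# With LM Studio running, send it:
# reply = json.loads(urllib.request.urlopen(req).read())
# print(reply["choices"][0]["message"]["content"])
```

Because the API shape matches OpenAI's, most tooling that speaks to ChatGPT can be pointed at your laptop instead.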

John Wall – 17:31

Right.

Christopher Penn – 17:32

Terrible prompt. But you can see how fast it is. It spit out crap, but it spit out crap really fast.

John Wall – 17:44

There’s no delay in serving up the slop.

Christopher Penn – 17:47

Exactly. If you think about it, this is why you'd run something like the back end of a LinkedIn comment bot this way. Because if you've got to make a hundred thousand comments, you need to be fast.

So that's one way to run these tools. Another way is if you're building something, like with a tool such as N8N, which we've covered on past episodes of the live stream. If you're using N8N, you can use local models or small language models, either one. So let's say you went and got an account on Groq or Deep Infra. You could put that in your N8N and say, "Use this instead of Google Gemini or OpenAI."

Christopher Penn – 18:34

And because the prices are so much cheaper and in some cases much faster, you can have a much more satisfactory experience with it because it is just so incredibly fast and inexpensive.
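The nice part about these OpenAI-compatible providers is that swapping from a local model to a cloud one is mostly a base URL and an API key. A hedged sketch (the model name and endpoint shown are illustrative; check the provider's docs for current values):

```python
import json
import urllib.request

def cloud_chat_request(model: str, prompt: str, api_key: str, base_url: str):
    """Same OpenAI-style call as a local model; the only differences
    are the base URL and the bearer token."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# Illustrative only; confirm current endpoints and model IDs in the
# provider's docs before relying on them:
req = cloud_chat_request("llama-4-scout", "Summarize this transcript: ...",
                         api_key="YOUR_KEY",
                         base_url="https://api.groq.com/openai/v1")
```

That one-line swap is also why prototyping locally and deploying to a cloud provider later is so painless.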

John Wall – 18:51

Yeah, the wheels are definitely spinning on that. I’m trying to figure out how to put those pieces together. So the idea is N8N does have it compartmentalized too. You can just pick and choose the models that you want the thing to dig into as it goes. And now, can you have N8N point to LM Studio though?

Christopher Penn – 19:10

Yeah, you can have it point to your local computer, or you can have it point to a cloud provider, either one. So let's say I start with a manual trigger here. Let's see if we can get it to transcribe a podcast. Let's give that a try. So let's read a file. We've got to read a file, and we'll call this users/pen/desktop/live stream, and that'll be the directory we'll read from. Let me go ahead and actually make that folder, and I'll put an episode of Marketing over Coffee in there. So on my desktop we've got this folder.

And now let’s add in some AI. Use a basic language model and we’re going to choose.

Christopher Penn – 20:07

I already have Groq set up. And we're going to use Meta Llama 4 Scout, which can listen to audio. So we'll go back up here and give it a simple prompt that says, "Transcribe this audio." Terrible prompt, don't do this for real. And then we should be able to write to a file: users/cspen/desktop/live stream transcript.md.

So what this will do is talk to Groq. It should read the file. And it said, "I expected..." I need to convert this file, convert it to plain text. Let's make sure that's there. And I need to actually include the data, which would be helpful. We want the data.

Christopher Penn – 21:27

So what we should have here is the ability to bring this data in and give it the MP3. In any event, that's how you would use one of those small language models. You would say, "I want to pass this to the model," whether it's running on your computer or in the cloud, and transcribe it. Now, imagine if you were to put all 900 episodes of Marketing over Coffee in that folder as MP3s. It could process all of them in one shot and just give you a huge, painfully large pile of transcripts.
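That batch idea, point it at a folder and let it chew through every episode, can be sketched like this; the `transcribe` callable stands in for whatever model call you wire up (its signature here is a hypothetical):

```python
from pathlib import Path

def transcript_path(mp3: Path) -> Path:
    """Transcript lands next to the episode: same name, .md extension."""
    return mp3.with_suffix(".md")

def batch_transcribe(folder: str, transcribe) -> list:
    """Run a transcription callable over every MP3 in a folder.

    `transcribe` is whatever backend you wire up, local or cloud;
    hypothetically it takes a Path and returns the transcript text.
    """
    written = []
    for mp3 in sorted(Path(folder).glob("*.mp3")):
        out = transcript_path(mp3)
        out.write_text(transcribe(mp3), encoding="utf-8")
        written.append(out)
    return written
```

Whether `transcribe` hits a model on localhost or a zero-retention cloud API, the loop is identical; that separation is the whole trick.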

John Wall – 22:25

Being able to plug that in just saves you all the hassle of trying to figure out how to work it. Put it up into the cloud somewhere.

Christopher Penn – 22:33

Exactly. So that’s one use case.

Another use case that is super helpful, at least for me, is that you can use these things in coding environments. Again, Qwen 3 Coder is the one we were referring to earlier. And just because it's in a coding environment does not necessarily mean you always have to be writing code; you could be cleaning up transcripts and things like that. But they are also phenomenal coders. Pretty much everybody on the planet nowadays uses a tool called Visual Studio Code or a variant thereof. Way down in the bottom you can connect your small language model, either cloud or local. You can give it a project to work on and it can actually start to write code.

Christopher Penn – 23:22

If I go into my VS Code, I bring in my live stream folder and make sure that I'm connected. Let's just do something silly. Let's make an HTML and CSS infographic about the importance of B2B marketing. Terrible prompt. I'm going to turn on plan mode and it should start talking to the small language model that we have locally. Yep, there it goes. It's working. It's thinking its way through it. And once it's done with its thinking, it will start its generation. And so this is how you would do this on a plane. Here comes the plan, the design elements, and that looks good.

Do I want to focus on anything in particular? Let’s focus on ROI and measurable results.

Christopher Penn – 24:32

And now I can change into act mode and it will literally start writing the code for me to make a nice HTML infographic. Now when you use something like Google Gemini or ChatGPT, this is what’s going on behind the scenes. But you’re able to do this completely under your control. So if you want to make an infographic with sensitive data, you could do this and not have to worry about leaking that information.

A real simple example: you might have, say, notes from your personal medical history, and you might want to put together a flowchart or a timeline of what's been going on with you, and you don't feel comfortable handing that to a cloud provider. So you might say, "You know what, let's make this."

Christopher Penn – 25:15

But you would use a small language model that is either safely cloud hosted or local. And you can still have all the benefits of that AI with none of the privacy concerns.

John Wall – 25:28

It’s still running. How long does something like this usually take? Is this something that should be able to crank out soon or is it?

Christopher Penn – 25:35

So it’s writing HTML, so it could be a little while because it’s having to think through what are all the pieces that it needs to assemble this.

Now if we go back while it’s churning away, if we go back to what we’re talking about earlier about small language models, we talked about the fact that they’re processors. One of the big things is that they’re good at five of the seven use cases. They’re good at extracting information. So if you can give them information—hey look. Infographic just popped up.

John Wall – 26:10

It just fired out.

Christopher Penn – 26:11

It just fired out. I don’t hate that. It’s not bad.

John Wall – 26:16

Yeah, no, that’s a solid webpage, definitely.

Christopher Penn – 26:18

Yeah, it’s even got little interactives and stuff like that. The cost to us for that was zero because it was churning away on my computer. I might say, “Yep, hit run. That’s fine. You have to use the Trust Insights brand style guide”. Let’s see if we can get it to think that through.

Going back to our review of why these things even matter, if you’re providing data, they can do the first five use cases of AI really well. So they can take data and extract data. Let’s say, “Here’s some transcripts, extract out this information”. “Here’s some social media comments, classify them positive or negative sentiment”. “Here is a conference call. Summarize what happened in the conference call”. “Here’s a blog post. Rewrite it into a blog post using only emoji”.

Christopher Penn – 27:17

"Here's a whole bunch of small data pieces. Aggregate them and create a larger piece from them." In all these cases, you're bringing the data and then they are acting on it. So they're very good at that.

Where a lot of people like to use generative AI is asking questions, like, “Hey, I’ve got this situation, what should I do about it”? These small language models, because of their size, do not have enough knowledge. So they’re going to give you wildly wrong information that is not going to be helpful. And the same is true for generation, for making new stuff, like, “Write a blog post about this thing”. Out of the box, they have no ability to know what is true or not. So out of the box, do not use them for that.
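That bring-your-own-data pattern, constrained classification over text you supply, might look like this in code; `ask` is a stand-in for your model call (hypothetical signature):

```python
def classification_prompt(comment: str) -> str:
    """Constrained prompt: supply the data, demand a closed label set.

    Small models do best on classification over text you provide,
    with the output format pinned down tightly.
    """
    return (
        "Classify the sentiment of this social media comment. "
        "Answer with exactly one word, positive or negative.\n"
        "Comment: " + comment
    )

def classify_all(comments, ask):
    """`ask` is a stand-in for your model call, local or cloud;
    hypothetically it takes a prompt string and returns the reply."""
    return [ask(classification_prompt(c)).strip().lower() for c in comments]
```

Notice there is no open-ended question anywhere: the model never gets asked what it knows, only what label fits the text you handed it, which is exactly where a small model holds its own.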

John Wall – 28:03

Right. But it just seems like this is a no-brainer for if you have huge piles, like you said, transcripts or surveys or 500 PDFs of books that you want to churn through. Like anything like that—the summarization and categorization is where you can really make it work.

Christopher Penn – 28:20

Exactly. One of the use cases that I love small language models for is classification and renaming. I use the Meta Llama 4 model, which has multimedia support, to load images, like my entire library of screenshots on my computer that are all named Screenshot 2025-12-whatever. You have no idea what's in a screenshot. I say, "Look at this and rename these thousand screenshots to something that's actually in the image so we know what the heck it is."

That's a simple use case. Another one is academic papers. When you download academic papers from places like arXiv.org, they come in with, you know, 2510.1174v1 as the name of the file, and you're like, "That's totally unhelpful." What if you just take this PDF and name it what the name of the paper is?

Christopher Penn – 29:17

Any of these small language models can take the PDF, if they’re capable of working with it, read it, find the title, and say, “I’m going to rename this file”. And because they have access to the local file system, they can do that. And suddenly your PDF folder, which is filled with just tragic amounts of unnamed stuff, is usable again. So any application where you want to be cleaning stuff up and just doing administrative things, so useful for that.
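A sketch of that rename loop; `describe` stands in for the vision or PDF-reading model call (hypothetical signature), and the sanitizer is the part that keeps model output filesystem-safe:

```python
import re
from pathlib import Path

def safe_filename(title: str, max_len: int = 80) -> str:
    """Turn a model-suggested title into a filesystem-safe name.

    Strips characters that upset filesystems, collapses whitespace,
    and caps the length; falls back to 'untitled' for empty output.
    """
    cleaned = re.sub(r'[\\/:*?"<>|]+', "", title).strip()
    cleaned = re.sub(r"\s+", " ", cleaned)
    return cleaned[:max_len] or "untitled"

def rename_with_model(path: Path, describe) -> Path:
    """Rename a file to the model's description, keeping the extension.

    `describe` is your model call (hypothetical: takes a Path and
    returns a short title string, e.g. from a vision model)."""
    target = path.with_name(safe_filename(describe(path)) + path.suffix)
    path.rename(target)
    return target
```

The sanitizer matters more than it looks: models happily emit colons, slashes, and question marks that a filename cannot contain, so never feed model output straight into a rename.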

John Wall – 29:48

Yeah, cleaning up that pile of graphics is just like a job you’d never get to.

Christopher Penn – 29:53

Exactly. So this is now going through and applying the Trust Insights brand standards because we told it what it was, and now it can just go through and fix up the HTML, fix up the infographic, and make it look less super generic and more appropriate. Let’s see how it’s coming along here. Oh, look, it’s still working. But that’s already getting the right colors, it’s already getting the right fonts, so substantially improved there.

So the last thing is, which models should you use? The ones that I recommend out of the box: if you want a general all-around model that is also good at coding, Qwen 3 Coder is very hard to beat as long as you have the hardware to support it. It is such a good model. If you’re dealing with a lot of multimedia.

Christopher Penn – 30:43

Llama 4 Scout is the one. If you're like, "Hey, I've got screenshots to rename," that's the model to use. If you want a model that writes really well, Mistral's small model is a shockingly good writer. You know how a lot of people complain, "Oh, you can tell ChatGPT wrote that, it's got em-dashes and all that stuff"? Mistral as a literary model is quite good, and there are some versions of it that are really good at things like fiction writing.

If you want to generate images, there’s a model called Flux 2 from Black Forest Labs that is absolutely phenomenal. It is almost as good as Google’s Nano Banana Pro, and can run on your computer. And if you want a model that transcribes the spoken word really well, Nvidia’s Parakeet is the best model.

Christopher Penn – 31:37

And Parakeet is so small it can run on almost any computer. So across the spectrum, that's what's available. Flux 2 is just shockingly good, so let's take a look at Flux 2 Pro. Let's see. Give me an interesting image prompt; let's see what it can come up with here.

John Wall – 31:58

Oh, let’s see. How about a robot with an umbrella in Chicago?

Christopher Penn – 32:03

All right. A Terminator-style robot with an umbrella on the streets of Chicago in the rain, carrying a deep dish pizza. Let's see what it can come up with here. When you're using both local models and cloud-hosted models, you'll note that these things can take inputs, and unlike the big cloud providers, you can set the level of safety, so you can say, "Yeah, I want this model to generate unsafe outputs," should you want to. And, you know what, that's not terrible.

John Wall – 32:45

No, no, that’s not bad at all. I mean, they nailed the pizza. They got the Terminator.

Christopher Penn – 32:49

I don't know where the pizza box lid is, but.

John Wall – 32:53

Another vanishing thing.

Christopher Penn – 32:55

Yeah. And I gotta say, the streets do resemble Chicago. I mean, it’s your typical American thing. And the traffic light’s a little funky back there, but yeah, in general that’s not bad. And I like the fact that it put a leather jacket on it for no reason.

John Wall – 33:12

Right, right. The skins burned off, but somehow the leather jacket survived.

Christopher Penn – 33:16

Exactly. Now Flux 2 you can run on your computer, and you’ll notice something that is somewhat concerning about this output. There’s no watermarks on it, there’s nothing that says this is AI, other than the fact that like, you know, there’s still those little weirdnesses like the lettering on the pizza box is not quite right.

John Wall – 33:38

Right.

Christopher Penn – 33:39

The lighting is a little off. The pizza is really well lit.

John Wall – 33:45

Chicago paying their dues there.

Christopher Penn – 33:46

Exactly. Jon Stewart would be appalled. However, one of the challenges with local models and with small language models, any kind of model, is that you have complete control, which is good and bad. It also means you have to use it responsibly, because it can generate things that the big models, which have a lot more censorship built in, won't let you do.

So to recap, we want to use these things when we want privacy or cost savings, be able to run them offline, and be sustainable. Small language models are differentiated by their size. You can run them anywhere, in the cloud, you can run them on your computer, whereas local models are based on where the model is run. All small language models can be run locally.

Christopher Penn – 34:47

Not all local models can run on small hardware; DeepSeek, for example, needs thousands of dollars worth of equipment. With the models themselves, make sure that you are providing the data. You always need to provide the data, and make sure you have enough memory for them. And of the use case categories, the ones they're really best at are extraction, classification, summarization, rewriting, and synthesis. Question answering and generation can only be done with data you provide; you cannot ask them questions about things you didn't provide the information for.

And, you know, that's the panoply, the five different ones that we use. So with that, where are you going to get started, John?

John Wall – 35:31

You know, I would definitely want to mess around with the image generation, because the baked-in stuff in Adobe Photoshop has not really impressed me that much. It's done a lot of weird stuff. So I would like to see if I can do some more interesting stuff on that front. And then, yeah, I don't know. I think for most use cases I'm still okay going to the web. LM Studio is what I've played with before and would keep going with. But it seems like I'd be better served by spending more time shining up my prompts than making sure that it's accessible offline.

Christopher Penn – 36:08

Yeah, and you can see, by the way, it just popped up the infographic again. So it is now revised. It even got our logo in there, which I like. The labels on these bars are still a little funky, and 100% of this data is hallucinated. So bear that in mind if you're like, "Oh, Trust Insights says that B2B marketing has a 4.2x." This is completely made up by a small language model. Please do not believe it; this is factually untrue. But you can see that when you're on that flight from Boston to San Diego and the internet's not great, you can still get quite a lot done.

John Wall – 36:52

And, you know, you’ve got the page, the HTML, laid out. It’s no big deal to go in there and set those stats to what they’re supposed to be. You’re just not having to go through the grunt work of getting that page laid out.

Christopher Penn – 37:05

Exactly. If you’d like a copy of the show notes infographic here, we’re going to put it in our free Slack group. Go to TrustInsights.ai/analyticsformarketers and you’ll be able to grab a copy of this lovely thing. Any final parting thoughts?

John Wall – 37:22

You know, just dig in there and start breaking stuff. Play with it. Get to figure out what’s going on.

Christopher Penn – 37:28

Yeah. And again, remember, you don’t have to run small models locally if you just don’t want the hassle of it. You can sign up with providers like Deep Infra and Groq (with a Q); it’s pay as you go. I believe I put $50 in with Deep Infra, and even with generating images and stuff like that, let me see what my billing is here: I have currently used 3 cents. I have $48.69 remaining.
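That pay-as-you-go pattern works because providers like Deep Infra expose an OpenAI-compatible chat completions API. As a rough sketch of getting started, here is how you might build such a request with only the Python standard library; the base URL shown is Deep Infra's OpenAI-compatible endpoint, while the model name and API key are illustrative placeholders you would swap for your own.

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, prompt):
    """Construct (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Deep Infra's OpenAI-compatible base URL; model name is an example small model
req = build_chat_request(
    "https://api.deepinfra.com/v1/openai",
    "YOUR_API_KEY",
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "Summarize this transcript in three bullet points.",
)
# To actually send it: urllib.request.urlopen(req), then read the JSON response
```

The same shape works against a local runner like LM Studio, which serves an OpenAI-compatible API on your own machine, so you can swap between cloud and local by changing only the base URL.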

John Wall – 38:01

So you’ve got a balance?

Christopher Penn – 38:03

Yeah. You’ve used a dollar since you signed up with us like six months ago. Because that’s how cheap local models are.

John Wall – 38:14

That’s great. Yeah, it’s just not even entering the equation as far as headaches go.

Christopher Penn – 38:20

Exactly. All right, folks, that’s going to do it for this episode of So What. Thanks for tuning in, and we will see you next week for the final So What of 2025 before we are on holiday break. So make sure you tune in. I have no idea if there’s going to be shenanigans, but there probably will be. Take care. Thanks for watching today. Be sure to subscribe to our show wherever you’re watching it. For more resources and to learn more, check out the Trust Insights podcast at TrustInsights.ai/TIpodcast and our weekly email newsletter at TrustInsights.ai/newsletter. Got questions about what you saw in today’s episode? Join our free Analytics for Marketers Slack Group at TrustInsights.ai/analyticsformarketers. See you next time.


Need help with your marketing AI and analytics?

You might also enjoy:

Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, INBOX INSIGHTS. Subscribe now for free; new issues every Wednesday!

Click here to subscribe now »

Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new episodes every Wednesday.


Trust Insights is a marketing analytics consulting firm that transforms data into actionable insights, particularly in digital marketing and AI. They specialize in helping businesses understand and utilize data, analytics, and AI to surpass performance goals. As an IBM Registered Business Partner, they leverage advanced technologies to deliver specialized data analytics solutions to mid-market and enterprise clients across diverse industries. Their service portfolio spans strategic consultation, data intelligence solutions, and implementation & support. Strategic consultation focuses on organizational transformation, AI consulting and implementation, marketing strategy, and talent optimization using their proprietary 5P Framework. Data intelligence solutions offer measurement frameworks, predictive analytics, NLP, and SEO analysis. Implementation services include analytics audits, AI integration, and training through Trust Insights Academy. Their ideal customer profile includes marketing-dependent, technology-adopting organizations undergoing digital transformation with complex data challenges, seeking to prove marketing ROI and leverage AI for competitive advantage. Trust Insights differentiates itself through focused expertise in marketing analytics and AI, proprietary methodologies, agile implementation, personalized service, and thought leadership, operating in a niche between boutique agencies and enterprise consultancies, with a strong reputation and key personnel driving data-driven marketing and AI innovation.
