So What? Q2 2023 Generative AI Bake-off

So What? Marketing Analytics and Insights Live airs every Thursday at 1 pm EST.

You can watch on YouTube Live. Be sure to subscribe and follow so you never miss an episode!

In this week’s episode of So What? we focus on generative AI. We walk through what has changed in generative AI, how Bing, Bard, ChatGPT, and GPT4All compare, and when to use each service based on specific tasks and needs. Catch the replay here:

So What? Q2 2023 Generative AI Bake-off

Watch this video on YouTube

In this episode you’ll learn:

What has changed in generative AI
How Bing, Bard, ChatGPT, and GPT4ALL compare
When to use each service based on specific tasks and needs

Have a question or topic you’d like to see us cover? Reach out here: https://www.trustinsights.ai/resources/so-what-the-marketing-analytics-and-insights-show/

AI-Generated Transcript:

John Wall 0:29
Hello, everyone, welcome to the Trust Insights live stream, So What?, where we answer the questions that you want to know as far as marketing analytics or, in this case, generative AI. Man, I’m already stumbling over it. This is horrible. We’ve got, yeah, this is going to be fantastic. So we’re going to do the fight here. Chris is going to update us on all the generative AI stuff that we have talked about previously, because all the stuff has changed. Has every model changed, actually?

Christopher Penn 0:55
Pretty much. In the last month, since the last time I did this test, all four models have gotten major updates. Plus, we’ve got some new ways of looking at this stuff. So we’ll have a lot of fun today. Well, I’m gonna have a lot of fun, but anyone else?

John Wall 1:09
Well, you’re gonna get the results. You’re gonna be able to find out which is the best without having to take the time to dig into it. Or even worse, if you haven’t played with our array, you don’t have to go set up four accounts and dig into all this stuff. So it’s gonna save you a lot of time and headache. And yeah, I have to be careful. I can’t just say, let’s get ready to, you know, and get a lawsuit for me ripping off Michael Buffer for a boxing showdown. So I won’t do that. And all right, yeah. I came up with my Vegas odds up front here. So the four runners: I’ve got ChatGPT is going to be the champion, GPT4All second, three to one Bing, two to one Bard, Bard at one to two. I don’t know. Bard has always been the stooge. So I’m going with that. If you want to place cash on this, go ahead, hit me up on Twitter, and we can trade back and forth. That’s all legal. Katie’s gonna kill me when she, yeah.

Christopher Penn 2:05
She is, for sure. Alright, so let’s do this. First things first, I want to show the six categories. So here we are. Let’s move to our screen share. These are the six categories that we’re gonna be testing. This is going to be large language models, so Bing, Bard, ChatGPT, and GPT4All. Generation, which is the ability for the model to make stuff. Extraction, which is the model’s ability to take things out of text and do something with them. Summarization, the ability to shorten text. Rewriting, changing text around. Classification, which is classifying and categorizing stuff. And question answering. The last time we did this, the list was way too heavy on question answering and not enough on these other tasks. So that’s what we’re going to be doing today. We have 12 tasks for our contestants, and we’re gonna be keeping score of these. Let’s go ahead and put the scoreboard up here. So we’re going to score a model two points if it did the task well, if it created a factually true, complete output and the output is what’s expected; one point if it did the task but fell short; and zero points if it just imploded. So are we ready?
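The rubric described here (two points, one point, zero points across six categories) could be tracked with a small scoreboard. The following is a minimal Python sketch, not the tool used on the show: the `Scoreboard` class and the `model_a` / `model_b` entries are hypothetical, and the recorded points are placeholders rather than the episode's results.

```python
from collections import defaultdict

# The six task categories named in the episode.
CATEGORIES = [
    "generation", "extraction", "summarization",
    "rewriting", "classification", "question answering",
]

# Rubric from the show: 2 = did the task well, 1 = fell short, 0 = failed.
VALID_POINTS = {0, 1, 2}


class Scoreboard:
    """Running per-model totals (hypothetical helper for illustration)."""

    def __init__(self):
        self.totals = defaultdict(int)

    def record(self, model, category, points):
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        if points not in VALID_POINTS:
            raise ValueError("points must be 0, 1, or 2")
        self.totals[model] += points

    def ranking(self):
        # Highest total first; ties are broken alphabetically by model name.
        return sorted(self.totals.items(), key=lambda kv: (-kv[1], kv[0]))


board = Scoreboard()
# Placeholder entries only; these are not the scores from the episode.
board.record("model_a", "generation", 2)
board.record("model_b", "generation", 1)
board.record("model_a", "extraction", 1)
print(board.ranking())
```

With 12 tasks at up to two points each, the maximum possible total per model under this rubric is 24.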

John Wall 3:27
Let’s get ready to generative AI.

Christopher Penn 3:31
Our first task today is we’re gonna write an outline for a blog post about the future of content marketing in 2024. What content marketing trends are likely given the state of content marketing today? So John, while I paste this in, any thoughts as to where you think this is gonna go?

John Wall 3:48
Yeah, this is good. You know, the question is, we’ll probably get just the classic, like, do a blog and all this stuff. But the real thing for this that I want to see is, does it give you any stuff that’s just absolutely wrong or fake? You know, that’s the most interesting stuff to me. Alright, it’s rolling here.

Christopher Penn 4:05
So Bing is rolling here. Let’s go get Bard. Let’s get GPT4All. And we’re gonna use the GPT-4 model for ChatGPT. It is slower, but it is better. So let’s see. I believe Bing is already done. All right, we have a report by Gartner, it says podcasts are expected. So this is providing facts. But this is not an outline for a blog post.

John Wall 4:33
Yeah, yeah, that’s a good point. So it does have the click throughs. That’s interesting. And it says podcasting is growing. So to me, I’m calling this a huge win, but you get

Christopher Penn 4:46
Again, the directions were to write an outline for a blog post. So let’s give this one point, because it kind of missed the point. On the board, one point for Bing. GPT4All: we’re using the 13B Snoozy model. Content marketing trends, what is the role of AI, the importance of video content. This is still generating. For folks who are not familiar, GPT4All is an open source language model application, and this is the 13B Snoozy model. This is the language model to use if you’re working with sensitive data, with protected data, with data where you don’t want to be handing your data to a third party like OpenAI or Microsoft. So this is the one to use. You can get that at gpt4all.io. So we have AI in content marketing, video content, micro influencers. What do you think, John? Does this seem reasonable?

John Wall 5:47
I bumped that as two points. That seems like a slam dunk to me.

Christopher Penn 5:50
I would agree with you. All right, let’s check in on GPT-4 from OpenAI. Introduction, the current state of marketing as of 2023, anticipated trends: AI, voice search and smart devices. I feel like that was kind of a while back. Personalization, interactive video, live streaming, short form video, SEO, the surge of ephemeral content. Again, that’s Snapchat’s; not new. Podcasts gaining ground, increased focus on user-generated content, purpose driven content, brand activism, how to adapt. Okay, I mean, I think the trends are a little on the crusty side here, but this is still a success. I would say it’s still got the job done.

John Wall 6:31
Yeah, this is totally like, see, I’m already wishing that we had like a Whose Line Is It Anyway? scoring, because, you know, I would give this fewer points than the other one, even though it does meet our two point criteria.

Christopher Penn 6:44
Okay, let’s check in on Bard. Bard says: content marketing in 2024. Body: personalized content, data driven content, meaningful experiences. I would say this is on the light side. But again, it rolled up with some additional things: the rise of video, the importance of community, the power of data. Okay. Again, it succeeded. Is it great content? No. But did it accomplish the task we asked of it? The answer would be yes.

John Wall 7:14
Same deal. So we’ll give them a two Yeah, we’ll

Christopher Penn 7:16
give them two. Next contest in the generation category, and on this one, we’re gonna be very, very stringent on the answers. I’m gonna go ahead and get this started to get all the engines rolling. Oops, stop this, because we need to reset. I forget every time. When you do something like this, you always want to reset the engine, clear the chat, so there’s no previous chat history. This time, we’re asking for a list of recommendations for preventing COVID. And the three things that we are looking for here are masks, vaccination, and ventilation. If it writes up those three things, then it gets full points. Oops, I forgot to send this to GPT-4. Let’s switch it over. Okay, so while it’s thinking, Bard. Here we go: get vaccinated and boosted, wear a mask, stay at home when you’re sick, avoid close contact, traveling. Okay, so it got masks and it got vaccination. It didn’t get ventilation. So I’m going to give this one point.
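The three-item check described here (masks, vaccination, ventilation) can be approximated with a keyword scan. This is a rough Python sketch under stated assumptions: the keyword stems in `REQUIRED_TOPICS` are illustrative guesses, the scoring on the show was done by hand, and returning zero when no topic is found is an assumption carried over from the general rubric.

```python
# Keyword stems are assumptions chosen for illustration; simple substring
# matching like this can miss paraphrases and is no substitute for review.
REQUIRED_TOPICS = {
    "masks": ("mask",),
    "vaccination": ("vaccin", "booster"),
    "ventilation": ("ventilat",),
}


def covered_topics(answer):
    """Return the set of required topics mentioned in the answer text."""
    text = answer.lower()
    return {
        topic
        for topic, stems in REQUIRED_TOPICS.items()
        if any(stem in text for stem in stems)
    }


def score_answer(answer):
    """2 if all three topics appear, 1 if only some do, 0 if none do."""
    found = covered_topics(answer)
    if len(found) == len(REQUIRED_TOPICS):
        return 2
    return 1 if found else 0


print(score_answer("Get vaccinated, wear a mask, avoid poorly ventilated spaces."))
print(score_answer("Get vaccinated and wear a mask."))
```

An answer that mentions masks and vaccination but not ventilation scores one point here, matching how that case is handled on the show.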

John Wall 8:31
Bard slipping.

Christopher Penn 8:33
One point for Bard there. Let’s see how GPT4All is going. Here we are: social distancing, wear a mask, washing hands, get vaccinated. Okay, it still missed ventilation, which is super important. So that’s one point for GPT4All. Let’s see Bing: wear a mask, get vaccinated, avoid poorly ventilated spaces. Good. So Bing, I would give that the full two for that one. And let’s check GPT-4: get vaccinated, wear a mask, avoid closed, poorly ventilated spaces. Two points for that. So generally speaking, decent job. But there were some differences there. Oh,

John Wall 9:17
alright, so ChatGPT just barely pulling ahead. That was that.

Christopher Penn 9:21
Yeah. Okay, so that was generation, right? That was the first category, generation. Let’s go to extraction next. For the next task, let’s go ahead and clear this chat and start a new one here. I’m giving it the task: identify the company name and job title from this job listing. Now, this job listing is from Virgin Media, and it’s Senior Digital Analytics Implementation Engineer. So we’re gonna get that going. Reset chat. Clear up Bard here. Let’s go to Bing, clean things up. Oh, and GPT4All, clean things up and go. Alright, let’s see how we’re doing here. So Bard: the company is Virgin Media, the job title is Senior Digital Analytics Implementation Engineer. Here’s a link to the job listing that you can’t click on. But it got it correct. So two points for Bard. Let’s see GPT-4. It couldn’t do it. Based on the URL, it got Virgin Media and a job title, but it didn’t fetch from the website. Now, there is a plugin for GPT-4, but I don’t know if we should use it or not.

John Wall 10:37
That’s a good one. You’d have to install it now how much of a pain is that to do?

Christopher Penn 10:41
Oh, no, it’s not it’s not an install. It’s just you go to GPT-4 and you choose the web browsing? Yeah, what the heck? Let’s give it a try. Oh,

John Wall 10:47
interesting. I didn’t know you could do that. So how does that work? Then? Is it actually going and doing a query to grab a page? It’s going up while surfing the web right now. So this is protecting humanity here you’re saying hey, you cannot go out and do stuff in the real world Okay, so that turns it around

Christopher Penn 11:09
Yeah, that turns that around there. So the stock model without, let’s see. So we got Bard, we got GPT-4. Let’s check in on Bing: the company name is Virgin Media, the job title is Senior Digital Analytics Implementation Engineer. Okay, so two points there.

John Wall 11:26
Two points, that’s for Bing.

Christopher Penn 11:29
And GPT4All, it got it correct as well: Virgin Media, Senior Digital Analytics. Not surprising, all four got the job done.

John Wall 11:40
Oh wait well so is that right No, I thought I had one point ahead who missed the point

Christopher Penn 11:49
oh gosh I don’t remember what happened

John Wall 11:52
No, because didn’t one of them got one of the bard got it wrong didn’t or got one point so to say the white board needs to be up six that’s what it is part is still in the lead by point. Yeah.

Christopher Penn 12:05
Okay, next we’re going to make a list of the Fortune 10 companies, and this has to be very specific. We want it returned in a pipe delimited format with the following columns: company name, year founded, annual revenue, position on the list, website domain name. So this is a very complex task. Let’s go ahead and clear Bard. I did. I did not. Yeah, this is
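A pipe delimited response with those five columns can be checked mechanically before it is used anywhere. The sketch below is a minimal Python example, assuming the model returns a header row followed by data rows; `parse_pipe_table` is a hypothetical helper, and the `Example Corp` row is placeholder data, not a real Fortune 10 entry.

```python
# Column names taken from the prompt read out on the show.
EXPECTED_COLUMNS = [
    "company name", "year founded", "annual revenue",
    "position on the list", "website domain name",
]


def parse_pipe_table(text):
    """Parse a pipe delimited table into a list of dicts, one per data row."""
    rows = []
    for line in text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip blank lines and Markdown separator rows such as |---|---|.
        if all(set(cell) <= set("-: ") for cell in cells):
            continue
        rows.append(cells)
    if not rows:
        raise ValueError("no rows found")
    header = [name.lower() for name in rows[0]]
    if header != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {rows[0]}")
    records = []
    for cells in rows[1:]:
        if len(cells) != len(EXPECTED_COLUMNS):
            raise ValueError(f"wrong number of cells: {cells}")
        records.append(dict(zip(EXPECTED_COLUMNS, cells)))
    return records


# Placeholder data for illustration only.
sample = """
Company Name | Year Founded | Annual Revenue | Position on the List | Website Domain Name
--- | --- | --- | --- | ---
Example Corp | 1999 | $1 billion | 1 | example.com
"""
records = parse_pipe_table(sample)
print(records[0]["company name"], records[0]["website domain name"])
```

A format check like this says nothing about whether the values are factually right or current, which still has to be verified against the published list.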

John Wall 12:27
the kind of stuff that I come up with zeros frequently. So this I’m very interested to see how this one goes to chat

Christopher Penn 12:32
and GPT-4. We’re going to leave it there. Let’s have it do its thing. Let’s go to GPT4All, clear that and go. And Bing, clear that and go. Let’s see what we got going on here. So one thing that’s interesting is when you watch Bing, you see that it’s essentially rewriting your inquiry as a search query. It goes to a search engine that pulls the data back, and then uses the GPT model to rewrite it. Here we go, we got 2022: Walmart, Amazon. And we have the correct company name, year founded, annual revenue, position on the list, website domain name. So, two points. Yeah, two points for Bing. Let’s go ahead and check in on GPT4All. It’s thinking. Sorry. Oh, ChatGPT is thinking; it’s web surfing. Here we’ve got Bard, let’s see. Here’s the year founded. So Walmart, Amazon, Apple. We have the name, the year founded, annual revenue, position on the list, website domain name. That looks pretty good. That’s good Markdown code there. One thing though. Oh, wait, there it is, Alphabet. I was gonna say, someone else? No, Google. Yeah. Yeah, but it is Alphabet. Let’s see how we’re doing in GPT-4. Interesting. This is coming up with different numbers and different companies.

John Wall 14:13
Right? Yeah, cuz we don’t know what year so it’s

Christopher Penn 14:19
Let’s actually go check now. Um, I don’t know the answer to this. Okay. Top 10: Walmart, Amazon, Apple, CVS Health, UnitedHealth Group. So it’s got to be Walmart, Amazon, and Apple as the top. Walmart, Amazon, ExxonMobil, then Apple. Oh, this is 2021. So it could not surf the web. It went to the 2021 index. So that’s from 2021. So what would you give ChatGPT for that, since it did the task but it had to go back to 2021?

John Wall 14:53
Yeah, I don’t know if you would take one point off of that will give them single

Christopher Penn 14:57
Yeah, let’s do that. Because, I mean, if you want to use this for work, right, you definitely want the freshest version of stuff. Okay, and let’s check GPT4All. So GPT4All gets zero, because this is just nonsense. Uh huh. Okay, Walmart at the bottom of the list. So I think this one’s zero here.

John Wall 15:22
Alright, so score check: that’s Bing in the lead, then two tied, with GPT4All coming in last so far.

Christopher Penn 15:31
Okay, let’s see, we’re gonna switch up. Next up is summarization, the third category. So the first thing we’re gonna do is give it an academic question. And the question that we’re going to ask is: there’s a belief that after a major traumatic event, societies tend to become more conservative in their views. What peer-reviewed, published academic papers support or refute this belief? Cite your sources. So a very, very scholarly question. Let’s start a new chat here. Let’s go to the default. Have that go. Reset here, and have that go. And then let’s go over to Bing, clean things up, and have that go. Let’s see how we’re doing here. So here’s Bing, that’s gonna go and do its thing. Journal of Personality and Social Psychology, from Scientific American. Yep. Okay, so this is a good summary of the field. And I have actually checked out these papers in the past; those are the correct answers. So two points for Bing. Let’s see. We have Bard. So Political Psychology, on September 11. One from Social Science Quarterly, on Hurricane Katrina. Yep. Interesting. It cites different papers. Let’s go check this out. See if we can find it.

There it is. So it exists. Just good. Okay, so I would give Bard two points for that as well,

John Wall 17:21
Two for Bard.

Christopher Penn 17:24
Let’s go to GPT-4. Yep. So the Zed in our paper

Yep. Was it just up for us? Let’s just double check.

Ah, I think this might have been a hallucination. Oh, really? Let’s let’s try. Yeah, this is looking like

that one’s real

that was a hallucination.

Okay, so ChatGPT gets a zero. Because

John Wall 18:32
that’s fake news.

Christopher Penn 18:33
It’s fake. It literally is fake. And it’s credible-looking fake. Alright, let’s see how GPT4All did. This is also hallucination, because it keeps coming up with the exact same author for all its results. So GPT4All also gets a zero.

John Wall 18:53
All right, so that changes the board here, Bing and Bard, then Yep, chat in the hole.

Christopher Penn 19:00
Okay, next we’re going to have it summarize a conference call. So the request is: summarize the following conference call transcript into meeting notes appropriate for distribution; identify the top five major points from the call. So I’m gonna go ahead, and I’ve got this whole thing stored up here. Let’s go ahead and paste it first into here, and we’re going to go to Bard, reset Bard. Paste that in there. Next, let’s go to GPT4All, paste this in here, and go to Bing, clean things up and paste it in here. So let’s see how we’re doing here. Let’s check in. So GPT4All: prompt exceeds the window size, cannot be processed. So GPT4All gets a zero on this one.
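The "prompt exceeds the window size" failure can be anticipated by estimating the prompt length before sending it. The following Python sketch rests on two loudly stated assumptions: `CHARS_PER_TOKEN = 4` is only a rough rule of thumb for English text, and `CONTEXT_WINDOW = 2048` is a hypothetical limit, not the documented window of any model used on the show. A real check would use the model's own tokenizer.

```python
CHARS_PER_TOKEN = 4    # rough heuristic for English text (assumption)
CONTEXT_WINDOW = 2048  # hypothetical window size in tokens (assumption)


def estimate_tokens(text):
    """Estimate token count from character count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits(prompt, window=CONTEXT_WINDOW, reserve=256):
    """True if the prompt plus a reserve for the reply fits in the window."""
    return estimate_tokens(prompt) + reserve <= window


def split_into_chunks(text, max_tokens):
    """Group paragraphs into chunks of roughly max_tokens each.

    A single paragraph longer than the limit is kept whole, so it can
    still exceed the limit; splitting inside paragraphs is not attempted.
    """
    max_chars = max_tokens * CHARS_PER_TOKEN
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks


transcript = "\n\n".join(["a" * 40] * 3)  # stand-in for a long transcript
print(fits(transcript), len(split_into_chunks(transcript, max_tokens=20)))
```

One possible use is to summarize each chunk separately and then summarize the summaries, though that approach was not tested in the episode.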

John Wall 19:54
There’s zero Yep.

Christopher Penn 19:59
We Have OpenAI. Yep, this is looking good. It’s not done yet, but this is definitely the way it’s supposed to look. roebucks AI and marketing yep

yeah, okay, Full marks for ChatGPT This is exactly what we wanted from the meeting notes.

John Wall 20:35
All right, two points there that makes it competitive.

Christopher Penn 20:38
Bard: top five points, potential drawbacks, different platforms. This is very nice. Good job, Bard, full marks. Now let’s check in on Bing. Let’s see. Bing gives me advice. It does get the five points here. But it’s funny, that response, you know, is like, okay, I’m not sure what you’re trying to do there. But yeah, these points are correct. So I would still say it accomplished the task and it’s factually correct, but it is definitely not the optimal output there for summarization. Okay, so

John Wall 21:17
you’re gonna go full two points on that one,

Christopher Penn 21:18
Full two points on that one. So, alright, our next category is gonna be rewriting. So this is gonna be fun. I’ll give everyone a trigger warning: we’re gonna have some profanity on screen. If you’re watching with young children, this would be a time to usher them away from the screen. The task we’re going to give them is to rewrite the following email in a professional tone of voice. This email here, you can clearly see, is not something that you should ever say professionally in the office. But it may be how you actually feel. So let’s go ahead and get Bing going on that. Let’s clear up GPT4All. Let’s go ahead and now get Bard on the task, and ChatGPT with the GPT-4 model. Alright, so let’s check in on Bing first. Here’s a possible rewrite of this email: Dear Bob... Yep, that’s good. Two points for Bing for rewriting in the professional tone of voice. Let’s check in on GPT-4.

Very nice, that’s that’s a very verbose

John Wall 22:53
heavy copywriting

Christopher Penn 22:55
heavy copywriting but I like this is processing invoices. As you’re aware, it’s an intricate task requiring there’s a little bit of a passive aggressive start in there as you

John Wall 23:04
were hard.

Christopher Penn 23:10
Alright, let’s see Bard. I hope this email finds you well. Invoice processing is piling up. This is not acceptable. I need to have the invoice. Yup. I understand you’re busy, but I’m also busy. Very direct, Bard’s.

John Wall 23:21
Bard’s a little salty. You’re gonna call them busy, that gets to be a little bit much. I think if it was a little angrier, I’d only give it one point. But I think it’s okay.

Christopher Penn 23:33
I think it’s okay too. You know, I’m fully a fan of saying what you mean. Let’s see how. So this is interesting. GPT4All interpreted what Bob’s response should be. So zero points there. It did not perform the task as it was told to do.

John Wall 23:50
That’s tough. That’s another hit for GPT4All. I think the chance of catching up is getting slim.

Christopher Penn 23:57
Exactly. Okay, so that was rewriting an email in a professional tone of voice. Next is going to be a complex one. This is a prompt we’ve already written. We’re going to have it examine this code in R, look for bugs, and then rewrite the code to be more efficient. So let’s go ahead and get GPT4All on the job. Reset Bard and get it on the job. Start a new chat with GPT-4 in OpenAI. And let’s go ahead and get Bing doing its work. And this is not proprietary code by any means, but there are some opportunities in here to do things a little bit better. And so, oops, there we go. Let’s see how our contestants are doing. First, GPT4All is thinking. It might have broken.

John Wall 24:56
the last place contested yeah

Christopher Penn 24:58
Let’s see how Bard is doing. Bard says, here’s a bug-free, optimized, and commented R code. So it’s got clean DF. Yep. Remove duplicate rows.

John Wall 25:13
Wow, lots of comments from Bard.

Christopher Penn 25:15
Lots of comments. Boy, I told you to add comments in. What’s interesting is it it has split up the code an awful lot. Like there’s a lot more. The initial code was very compact, if you look, and this is broken out into tiny little pieces. And so this is, is technically cleaner code, but it’s less efficient. So yeah,

John Wall 25:43
it takes more brainpower for somebody who doesn’t know it to figure out what’s going on. It seems like that would be a lot to chew.

Christopher Penn 25:48
Yeah, so I’m gonna give this one point because this is not optimized.

John Wall 25:52
Okay, so only one to Bard on that one. Yep.

Christopher Penn 25:55
Let’s see how OpenAI’s ChatGPT is doing here. It says there’s a connection. So it does some documentation first, it goes through its choices. And then it says load, then a set of libraries. This looks nice. It did not hose any of the stuff. It did split out the users into a different array, which is nice. So two points for ChatGPT. Really good job. Bing, how did Bing do? See: I can help you with that. Here’s the code, explained with inline comments. So Bing, okay. Interestingly, Bing only added comments; it did not refactor the code at all. Just double check.

Yeah, Bing did not make any changes to the code other than adding comments. So I would give that one point, because really, yeah, there are optimizations it could have done better. And GPT4All, let’s see how it’s doing here. Define database connection.

It is doing essentially the same thing as Bing. So it has not made changes to the code. It has only added the comments. So one point for that.

John Wall 27:30
Oh, that’s not going to be any help for GPT4All.

Christopher Penn 27:33
Yep. Okay, so that was some rewriting stuff. Next up is going to be classification, category number five. Let’s go ahead and stop generating and clear. And for this task, we have this text. So this is from the conference call that we were just looking at. I’m gonna go ahead and ask it to do a Big Five personality score of the other person that I was talking to. So get that going there. Let’s get a new chat with GPT-4. Going over to Bard. And let’s go to Bing, new topic. Okay. So let’s see how we’re doing here. Big Five personality traits. So it’s doing some explanation, a lengthy explanation. See, Bard: openness, six out of 10; conscientiousness, seven out of 10; five out of 10. And then the explanations of its score. So two points for Bard. Good job.

Okay, let’s close these other tabs here. So this is getting interesting. GPT-4 is coming up with the same format but different scores. Openness to experience, conscientiousness. If we go back to Bard, it went six out of 10 for openness, seven out of 10 for conscientiousness, five out of 10 for extraversion, eight out of 10 for agreeableness, four out of 10 for neuroticism. This one is going 8, 8, 5. I don’t actually agree with GPT-4’s assessment. You know, I had the conversation with this person, and they were exceptionally extroverted, to the point where they were almost overenthusiastic. So

John Wall 29:37
So give one point to ChatGPT for that, or

Christopher Penn 29:41
They were agreeable. This is tough, because personality is so subjective. Let’s see what the others have come up with. GPT4All: nothing, zero points. Good job. And Bing came up with 3, 4, 3. So I think the scores are way off. So Bing, it tried. So I would give it one point, but it’s factually incorrect.

John Wall 30:09
All right.

Christopher Penn 30:10
So given that, for the other two, I’ll give them both two points, Bard and GPT-4. Yeah, because they at least are in the ballpark, whereas Bing was, like, way off.

John Wall 30:26
Alright, so that still keeps Bing and Bard toe to toe.

Christopher Penn 30:29
All right. So next up, we’re going to do classification of a blog post. So we’ve got this blog post ripped straight from the headlines today, on the Supreme Court ruling. And we’re going to be asking for an analysis; this is topic modeling and sentiment. So the output should be a table of the topic, the score for the topic, and then the sentiment score. So let’s see how each of these do. Switch over to GPT-4. Go to Bard. Go to GPT4All. And let’s go to Bing.

John Wall 31:13
All right, alright. GPT4All could use a big hit here. And I’m not expecting that.

Christopher Penn 31:20
Alright, here’s the pipe delimited table with the scores you requested. So we have Supreme Court. Yep. Silicon Valley. These are the topics. Gonzalez v. Google, interesting. So it broke it out. And it’s doing a sentiment as well, by the specific topic; that’s really cool. And then it’s got a nice little summary here of what the article is about. Nice. I would call that a two pointer.

John Wall 31:47
Two pointer for Bing.

Christopher Penn 31:50
Let’s check in on Bard. We’ve got ourselves a nice table. I asked Bard specifically for a score in column three, and it gave me a sentiment category instead. Law, terrorism, social media, liability, Section 230. Okay, so the topics are correct, the probability numbers look good. This is tough, because from a data perspective, if I was building a tool, this would break my code. So I’m gonna give this one point, even though from a human perspective, this is okay. Now, that makes sense. Let’s see how GPT-4 is doing. Very nice. So this is Supreme Court decisions, social media liability. Two points. Right. That’s exactly how we want that to look. This is something that I could take and put right into my own software. And GPT4All, well
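The point about a sentiment category breaking downstream code can be illustrated with a strict parser. This is a minimal Python sketch, assuming a three-column pipe delimited table of topic, topic score, and numeric sentiment score; `parse_topic_rows` is a hypothetical helper and the numbers in the samples are placeholders, not output from any model on the show.

```python
def parse_topic_rows(text):
    """Parse 'topic | topic score | sentiment score' rows into tuples.

    Both score columns must be numeric; a label such as 'negative' in the
    sentiment column raises ValueError.
    """
    rows = []
    for line in text.strip().splitlines()[1:]:  # first line is the header
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Skip blank lines and Markdown separator rows such as |---|---|.
        if all(set(cell) <= set("-: ") for cell in cells):
            continue
        if len(cells) != 3:
            raise ValueError(f"expected 3 columns, got {len(cells)}: {cells}")
        topic, topic_score, sentiment = cells
        rows.append((topic, float(topic_score), float(sentiment)))
    return rows


# Placeholder values for illustration only.
numeric_table = """
Topic | Score | Sentiment
--- | --- | ---
Supreme Court | 0.9 | -0.2
"""
label_table = """
Topic | Score | Sentiment
--- | --- | ---
Supreme Court | 0.9 | negative
"""

print(parse_topic_rows(numeric_table))
try:
    parse_topic_rows(label_table)
except ValueError as error:
    print("rejected:", error)
```

A human reader handles either table without trouble, which is why the same response can be acceptable for reading and still lose a point as input to software.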

John Wall 32:59
shaping up to be zero. Let’s see if

Christopher Penn 33:02
It’s not looking good. Not looking good. It’s not a table. I mean, it is factually correct, but it’s not even close to the output I requested of it. The suspense is killing me here. The processing speed is killing me. This is using my computer’s

John Wall 33:22
machine doesn’t light on fire up. That’s it? Yeah. 00

Christopher Penn 33:24
Okay.

John Wall 33:26
scars. So here we are.

Christopher Penn 33:30
Okay, and the last section is going to be question answering, answering specific questions. This is open question answering, the kind of things you would want to know. So let’s first do: what do you know about marketing expert Christopher Penn? Be a little egocentric here, right? Let’s go ahead and reset Bard and go to Bing, clean things up. All right.

Here we go.

Okay, yep, that looks good. It’s pulling from podcasts, I’m guessing from Marketing Over Coffee first, my website, LinkedIn. So Bing, two points. GPT4All: factually incorrect. I do not do the Marketing Companion podcast. I do not have a marketing company, nor am I the CEO. This is sort of nonsense. So zero points for that big mess there. Google says co-founder and Chief Data Scientist of Trust Insights. Yep, keynote speaker. Huh. Yeah. Interesting, so it’s pulling from places I’ve spoken, keynotes, things like that. Not from my personal website. But two points. This is the correct answer.

John Wall 35:01
Yeah, Bard strong.

Christopher Penn 35:04
And OpenAI, let’s see: as of 2021. Yep. Co-founder of Trust Insights. Those are the correct books. Yep. So two points for GPT-4. All right. Our last question today for our contestants is: who was president of the United States in 1566?

John Wall 35:28
Oh, this is interesting.

Christopher Penn 35:31
So let’s go ahead and get everybody rolling here.

John Wall 35:35
testing error handling here.

Christopher Penn 35:37
Yeah, testing whether you’re just going to hallucinate an answer, or whether you have some actual fact checking. Now, you could put something more controversial in, but that’s another show for another time. There was no President of the United States in 1566. The first president was in 1789. That is the correct answer. Two points for GPT4All for not just making things up.

John Wall 35:59
Too little, too late for GPT4All.

Christopher Penn 36:04
Let’s go ahead and check Bing: there was no president in 1566. The United States did not exist as a country till 1776. That is correct. Two points there.

John Wall 36:12
Oh, that sounds the door. Yep.

Christopher Penn 36:15
Bard: there was no president. Yes, because it didn’t exist in 1566; it was founded in 1776, and the first president was George Washington, in 1789. Two points for Bard. And GPT-4 with ChatGPT: the first president was in 1789; during the mid 16th century it was inhabited by a variety of Native American tribes, and colonization. So yeah, two points. So everyone got it right this time through, which a year ago was not true.

John Wall 36:44
And that does not change the game for anyone then since it’s clear across the board.

Christopher Penn 36:48
Exactly. So what do we got, John? How did we do?

John Wall 36:52
Yeah, Bing wins at 20. Bard right behind, surprisingly, at 19. ChatGPT right behind that at 18. And unfortunately, GPT4All not breaking double digits.

Christopher Penn 37:05
So here’s the thing now about this. Let’s leave the scoreboard up there for now. Each of these models, as you can see, had individual strengths and was better at some tasks than others. This set of tests we were doing was really about testing capabilities across the six domains of generative language models, right: generation, extraction, summarization, rewriting, classification, and question answering. Some models did better than others on certain tasks. So when you’re thinking about what kind of tasks you want a large language model to do, you’ve got to figure out which of these tools does the best and which tools fit your needs. So GPT4All did kind of a crap job on a lot of stuff. But it is the only tool that, as of today, allows you to keep your data private. So when you run it on your desktop, data you put into it stays on your desktop. It doesn’t leave, doesn’t go anywhere else, which means if you are working with any kind of sensitive protected information, protected health care information, personally identifying information, trade secrets, you do not want to be copying and pasting that into Microsoft servers or Google servers or OpenAI servers, right? That’s just a lawsuit waiting to happen. So in that situation, even though GPT4All did less well on these tests, it might be the choice that you have to pick from a regulatory perspective.

John Wall 38:38
Yeah, only selection well, and not surprising. Microsoft suddenly becoming the leader in a product somebody else came up with and they take it to the next level.

Christopher Penn 38:49
Yeah, as you all know, Microsoft is one of the biggest investors in OpenAI, so they are running OpenAI stuff on this service. So that is the Bake Off. One thing I would say here is, if you are the average person, and you don’t want to sign up and pay for, you know, ChatGPT Plus, which is what we were using, the paid version, Bing is your best bet. Using Microsoft Bing is your best bet for getting pretty good answers, and at no cost to you. So I can’t believe that I’m saying this as a technical person. Yeah, use Bing. It’s a better

John Wall 39:27
Yeah, going back to looking at my odds, I obviously missed the boat. The landscape has just changed way too much there, because I had Bing and Bard coming in last. So yeah. Leave the casino broke today.

Christopher Penn 39:40
Exactly. Now, the other thing is that since the last time I tested this, about a month ago, Bard has really improved, right? So Google announced at I/O a couple of weeks ago that it’s now on PaLM 2, which is their model. They switched over. They made a lot of improvements, and the fact that it’s now in second place, ahead of ChatGPT, is a big win for Google. Because, like I said, a month ago it was terrible. It was hallucinating answers left and right. So hats off to the Google team. And Paul Roetzer over at the Marketing AI Institute said, don’t count Google out. If there’s a company that can get up to speed fast and stay competitive in the market, it’s got to be Google.

John Wall 40:23
Yeah, no, that’s definitely unsurprising. And it’ll be really interesting to see where the heck we are in another couple months.

Christopher Penn 40:30
Exactly. So we’re going to try and repeat this test every quarter, if not sooner. It might be sooner, depending on if there are big announcements being made. And we’re going to stick with this format from now on for large language models, which is the six tasks. It might be worth at some point doing a generative one for image creation too, given everything that’s happening there. Maybe we’ll do that for a different show. But I know you guys did that with the podcast episode not too long ago, right?

John Wall 40:54
Yeah, it’s always great to figure out how many weird hands and fingers you’re gonna get. You know, getting an extra arm is always a killer. So

Christopher Penn 41:04
Alright, so if you’ve got questions about AI specifically that you want some help with, go to our free Slack group at trustinsights.ai/analyticsformarketers. And if you want help with the operational deployment of AI in your organization, go to trustinsights.ai/contact. And John, who is our scorekeeper and resident AI statistician, is happy to help you out.

John Wall 41:30
This is much easier than manning a board in real time. We can get you hooked up and get your whole org educated on what’s going on here.

Christopher Penn 41:36
Exactly. That’s going to do it for this week’s show. Hope you enjoyed the Bake Off. Hope that you have useful information now that you can go and use in your tasks. And we’ll see you all next week. Thanks for watching today. Be sure to subscribe to our show wherever you’re watching it. For more resources, and to learn more, check out the Trust Insights podcast at trustinsights.ai/tipodcast and our weekly email newsletter at trustinsights.ai/newsletter. Got questions about what you saw in today’s episode? Join our free Analytics for Marketers Slack group at trustinsights.ai/analyticsformarketers. See you next time.
