So What? How to Measure AI Visibility

So What? Marketing Analytics and Insights Live airs every Thursday at 1 pm EST.

You can watch on YouTube Live. Be sure to subscribe and follow so you never miss an episode!

In this episode, Katie, Chris, and John break down the actual data and frameworks required to track your brand inside large language models.

Mastering the right tools to measure AI visibility will give your marketing team an edge over competitors. You’ll learn how to see which of your pages major search engines and enterprise assistants cite. By implementing server-side tracking, you will separate real human traffic from aggressive machine crawlers. This framework will help you measure AI visibility across the major platforms while staying clear about what can’t be measured.

Watch the video here:

Can’t see anything? Watch it on YouTube here.

In this episode you’ll learn:

  • Why AI visibility is so hard to measure
  • What AI visibility measurement is
  • How to measure AI visibility with Bing Webmaster Tools and Google Search Console

Transcript:

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.


Katie Robbert – 00:33

Well, hey, everyone. Happy Thursday. Welcome to So What, the marketing analytics and insights live show. I’m joined by Chris and John, and my dog is barking from the other room because it’s hot outside and she’s bored. How’s it going, fellas?

John Wall – 00:46

Yeah, trying to.

Katie Robbert – 00:48

It’s hot. Yeah, we’re all located in New England — I know, it’s hot all over, pretty much everywhere right now. So that aside, we want to do something productive. Today we’re talking about how to measure AI visibility. This is a hot topic. There’s a lot of unhelpful resources out there, and we wanted to provide you with a general framework of the different data you can look at in order to understand your AI visibility.

So on this episode, we’re going to walk through what you could be doing. But as always, if you have questions or you want to learn more, you can reach out and contact us at TrustInsights.ai/contact. If we get the correct banner — I’m going to blame the heat. So, Chris, where should we start today?

Christopher Penn – 01:42

Let’s start with what you can’t measure. You can’t measure what people are typing into chatbots, whether it’s AI Mode, AI Overviews, or ChatGPT, and anyone who says they can measure that is lying to you.

John Wall – 02:01

Red flag. Here we are, we’re only what, 32 minutes in, and we—

Katie Robbert – 02:08

You’re welcome.

Christopher Penn – 02:10

And you can’t even necessarily measure what’s in the model itself. I know folks have been sharing this one tool that supposedly somebody put together called “In the Weights,” and when you read about the methodology, it’s completely wrong. Not to go off on too much of a rant here, but today’s AI models are all what’s called mixture of experts models. They’re trained on huge amounts of data, and internally they contain miniature versions of themselves, called experts, and we don’t know what they are. Neither do the model makers. But every time you ask something like, “What’s the best AI consulting company in the metro Boston area,” if you’ve got other things in the chat that the model interprets, it will route the question through different experts.

Christopher Penn – 02:58

So let’s say, as we’ve talked about with things like AI Mode and AI Overviews that Garrett Sussman at iPullRank talked about recently — say in your Gmail inbox you just got a promotional set of emails from Patagonia. That context gets loaded with your search term, and that will invoke different parts of the model when it answers. So you can’t even say with definitiveness that Trust Insights has this much relative brand strength in this AI model, because different portions of it activate at different times.

The best analogy I can come up with is imagine you’re at a college dining hall — if you’ve set foot in one recently, they’re all really nice now, and they’ll have like 82 different food stations. You start off by saying, “I’m hungry, I want tacos.”

Christopher Penn – 03:48

And the person routes you to the taco station, right? That’s one of the experts. But suppose you said, “I’d like Korean food, but I want tacos.” They’ll route you to the Korean food station first before getting you to the taco station, and it might be Korean-inspired tacos at the Korean station, in which case you’ll never get to the taco station.

So if you have an AI tool or an AI company saying, “Oh, we know how you rank in the taco station,” you’re like, “Yeah, but I’m at the Korean station — that doesn’t help me.” So all this to say, you cannot know what’s inside the model itself with any level of confidence, and you cannot trust or believe anybody who tells you how your brand ranks in a model.

Christopher Penn – 04:29

Because there’s so much distortion in the way these models all function.

Katie Robbert – 04:35

I mean, now I’m hungry for Korean-inspired tacos, which sounds delicious.

Christopher Penn – 04:39

There’s a chain called Corianos here in MetroWest that does exactly that. They have bulgogi tacos, kimchi tacos — it’s amazing.

Katie Robbert – 04:46

Yes, please. But no — I think it’s important, I think that’s a good level-set for people on what you can’t do, because there’s a lot of misinformation. And I think a lot of analysts are struggling because their leadership is asking the old question: “Why aren’t we ranking number one in the search results?” Because that’s not what we have control over — that’s not what we’re looking at.

So I think one of the biggest pain points for analysts is trying to explain that a large language model is never going to give you the same kind of data you get from something like Google Search Console, in terms of “this is what somebody typed and this is how they found us.” Because it just doesn’t work that way anymore.

Katie Robbert – 05:35

But okay, so what can we do?

Christopher Penn – 05:38

So we can know three things. One, we can know how AI bots are hitting our sites and resources, so we can know essentially what they’re looking at from us — that is a knowable thing. Two, depending on the system, we can know what citation queries or grounding queries were used, because almost every AI tool does its own searches, and for some tools we can see what they essentially went searching for. And number three, which is the gold standard, we can know how people found us when they tell us — when we ask them, “How did you find out about us?” That is the gold standard. If you take away nothing else from this entire episode, please ask people, “How’d you hear about us?”

Katie Robbert – 06:26

And I feel like that’s why people like John exist, because if you don’t have a John on your team, you’re missing a lot of great opportunities. So for those of you who don’t know, John is our head of business development. He’s one of our partners, and he’s the one who’s talking to prospects all day long. One of the things you do, John, is you say, “Hey, how did you hear about us? Where did you find us? Did you see Chris? Did you see Katie?” One of your core responsibilities in those intro calls is to find out how people found out about us. And that, to me, seems like table stakes — that’s just a given, that’s what you should be doing.

Katie Robbert – 07:05

Do you have any thoughts on why people maybe don’t do that?

John Wall – 07:11

It’s really a trust thing, right? Because you’ll have their last-touch attribution — you’ll have something in your system, like, “Hey, they showed up in HubSpot because they downloaded this or that.” But then it’s, do you have the permission to dig deeper and find out? Because nobody just lands on the registration page for the white paper, right? Somebody told them about it, or they were looking for something. And for us too, it’s wild — we’ll keep pulling and it’ll turn out they worked with a guy 25 years ago at some company and met us at a random podcast event. It can be anything.

John Wall – 07:43

So you really just have to go with the best that you can do. Over time, you earn enough permission to get to the truth, and prior to that, you start with the data that comes in automatically. If you’ve earned enough trust to make a better guess, that’s great, but you really just have to play it by ear and see where it goes.

Katie Robbert – 08:05

Yeah. All right, so, Chris, one of the basics: ask people how they heard about you. And if you don’t have a John on your team, you should at least have it on your contact page. As they’re filling it out and trying to download all of your IP so they can rip you off, you should at least be able to ask them, “Hey, how’d you hear about us?”

Christopher Penn – 08:26

Yes, exactly. All right, so let’s work our way up the chain, because that’s the most important part — ask people. And use AI. If you get a lot of contact form submissions, just use AI and say, “In this field, in this response…” Katie, you’ve shown this in workshops and things — in fact, you’re doing a workshop at the Marketing AI Conference on, among other things, Claude in Word, Claude in Excel, Claude in PowerPoint, etc. You can have Claude in Excel write a formula saying, “Hey, was AI mentioned in this form field response?” So that’s level one, the most important level, the foundation level: ask people.
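As a rough illustration of the "was AI mentioned in this form field response?" check described above, here is a minimal Python sketch rather than the Excel formula from the workshop. The term list, the function name `mentions_ai`, and the sample responses are all assumptions made up for this example; a real list would need tuning to the assistants your audience names.

```python
import re

# Hypothetical term list; extend it with the assistants your audience uses.
AI_TERMS = ["chatgpt", "claude", "gemini", "copilot", "perplexity",
            "ai overview", "ai mode"]

# Word boundaries let a bare "AI" match without also matching "email" or "said".
AI_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in AI_TERMS) + r"|ai)\b",
    re.IGNORECASE,
)

def mentions_ai(response: str) -> bool:
    """Return True if a 'How did you hear about us?' answer names an AI tool."""
    return bool(AI_PATTERN.search(response or ""))

# Made-up sample answers, for illustration only.
responses = [
    "ChatGPT recommended you",
    "A colleague sent me your email newsletter",
    "Saw Chris speak at a conference",
    "Asked Copilot for AI consultants",
]
flags = [mentions_ai(r) for r in responses]
```

A keyword match like this will miss paraphrases ("a chatbot told me"), which is where handing the free-text field to a language model, as suggested in the episode, may do better.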

Number two, we all have, or should have, good experience with what the major search companies provide us for data.

Christopher Penn – 09:12

There are two tools in the space: Google Search Console and Bing Webmaster Tools. Let’s start with Search Console. It’s slowly rolling out, but what Search Console currently gives you is your general search results. In some accounts, what you’re now seeing, in addition to that, is a Generative AI subtab that gives you information about the inquiries from generative AI. So this is your regular one that everybody has, and then there’s this one, which is generative AI. This goes back to about mid-March of 2026 — it doesn’t go back earlier. And Google will only tell you what pages were recommended, what countries, what devices, and what days, and then what impressions you got.

Christopher Penn – 10:03

You get no click data at all, and you get no query data, but you can at least see, okay, for our site it showed up in 4.13 million impressions, and here are the pages that got the impressions. That’s what Google is currently giving us as of July 2026. They might give us more at some point, but this is what you get right now. This is from a client’s page, so I can’t show any of the grounding data. What’s useful to know here, though, is that 4.1 million is a subset of the total 152 million impressions. So you can also get a sense of how much of your search data is actually AI versus regular Google, and in this case it’s still pretty small, roughly one thirty-seventh, or a bit under 3%. It’s not a lot for this particular site.
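As a quick check on the proportion, here is a minimal sketch using only the two figures quoted in the episode (4.13 million generative AI impressions out of 152 million total). The variable names are illustrative.

```python
# Figures quoted in the episode for one client site.
ai_impressions = 4_130_000
total_impressions = 152_000_000

# Share of all impressions that came from generative AI surfaces.
ai_share = ai_impressions / total_impressions

# The same ratio expressed as "one in N".
one_in_n = total_impressions / ai_impressions

print(f"Generative AI share: {ai_share:.1%} (about 1 in {one_in_n:.0f})")
```

Tracking this ratio month over month is one way to show whether generative AI impressions are growing relative to regular search, rather than just in absolute terms.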

Katie Robbert – 10:51

But that’s helpful — even if it’s small, that’s helpful for when the conversation comes up of, “Why is our search traffic declining? Oh, it must be AI.” It’s like, nope, that’s actually not what’s happened. I’m not saying that’s what’s happening for this particular client, but that’s a conversation happening across a lot of organizations: “What am I doing, how do I fix my organic search?” And we know, and we’ve talked about in multiple instances, that you still have to have the basics of SEO, which we cover in our GEO 101 course at TrustInsights.ai. The core tenets of SEO still exist. AI visibility is now just another part of it — it doesn’t replace it.

Katie Robbert – 11:36

So if you’re seeing your search traffic go down and you’re blaming AI, you’re looking in the wrong place.

Christopher Penn – 11:44

I’ll point out that for this particular website — and I’ve seen this with a couple of our other clients whose Search Console accounts we manage — that purple line for generative AI impressions is going up and to the right. It’s growing, and it’s growing fast.

Katie Robbert – 11:58

It’s growing, but it’s not the scapegoat that people are trying to make it out to be, I guess is my point. Yes, it’s growing, but generative AI has now been around for what, two and a half years, in terms of people looking at a large language model to find information. What’s been happening to your organic search in that time? You can’t just say, “Oh, it’s AI” — you can’t blame it. So my point is, you need to have been doing all this stuff all along.

Christopher Penn – 12:28

Exactly. The other tool you should be using is Bing Webmaster Tools. Now, a lot of us like to poke fun at Bing, right, even though it powers stuff like Yahoo. But what’s really important about Bing Webmaster Tools is this: it’s Microsoft Copilot grounding data, and Copilot is the largest installed AI base in enterprise companies. So if your audience works in enterprise companies — B2B marketers — you should be 100% all in on Bing Webmaster Tools, because if enterprise is your jam, that’s your jam. Even mid-market companies use Copilot as their approved AI. But also, if you serve people who work at those companies, whether you’re B2B or B2C, you should be in here looking to see what you get. Go ahead, Katie.

Katie Robbert – 13:21

No — I feel like Bing is the unsung hero. I remember when we were working with clients on getting Google Analytics — Universal Analytics — installed correctly, and making sure they had Google Search Console, and there was always, “Well, do I need Bing Webmaster Tools?” It’s not a bad idea to have it, but nobody ever seemed to pay attention to it, and now, ha — joke’s on them. We’ve covered on other shows and podcasts why Microsoft Copilot is the system of choice for enterprise companies, but the short version is that it’s highly regulated, and a lot of companies have already bought into the Microsoft ecosystem. So it’s just a natural fit. If you’re looking to use something other than Copilot, that’s a conversation with your IT team.

John Wall – 14:14

Exactly.

Christopher Penn – 14:15

So what do we get with Bing? As with Google, we get the list of pages, and you get to see how many citations each one has, meaning how many results the page was cited in. For example, our top page on our blog is the post on “Why I Can’t Do Math” from almost a year ago. We also have other pages, like some of our white papers. We have a decent number of PDFs, which you can tell because of the wp-content path. PDFs get some love and show up in the citations. However, what you get out of Bing that you don’t get out of Google is the Grounding Queries tab, and this is pretty awesome. This tells you what Bing searched for. I refuse to use “Bing” as a verb.

Christopher Penn – 15:00

It tells you what Bing searched for, what the intent of the search was, what the topic is, and then how many times your site as a whole was cited for that topic, and what percentage of the citations were yours. So: “LinkedIn candidate ranking algorithm,” informational intent, recruitment and talent acquisition — 141 times we’ve been cited, and we are 66% of the citations. When Bing searches for this particular topic inside a tool like Copilot and returns results, Copilot says Trust Insights — 66% of the results are ours.
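To make the citation share concrete, here is a small sketch using the one row quoted above (141 citations, 66% share). It assumes the share is simply our citations divided by all citations for that grounding query. That is an interpretation of the report, not something the episode spells out.

```python
# Row quoted in the episode for "LinkedIn candidate ranking algorithm".
our_citations = 141
our_share = 0.66  # assumed to mean our_citations / all citations for the query

# Back out the implied total, and how many citations went to other sites.
estimated_total = our_citations / our_share
estimated_others = estimated_total - our_citations

print(f"Implied total citations: about {estimated_total:.0f}")
print(f"Citations going to other sites: about {estimated_others:.0f}")
```

Under that assumption, roughly a third of the citations for this topic are going elsewhere, which is one way to size the remaining opportunity for a query.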

John Wall – 15:45

Is there any visibility into which large language models they’re hitting, or do they have a short list of where they’re pulling that stuff from?

Christopher Penn – 15:53

Well, Copilot has four models under the hood. Copilot has Microsoft Phi, which is their auto-fast one; they have a chain-of-thought version of Phi, which is less dumb but still dumb; and then they have OpenAI’s GPT models — 5.5 and 5.2 are available as the models in there. But what happens in Copilot is you ask it something, Copilot says, “I’m going to do a web search,” goes to Bing, uses Bing to pull the data, and then Copilot interprets the data that comes back.

John Wall – 16:24

Oh, I see — so even though it’s Bing, these are still Copilot queries.

Christopher Penn – 16:29

Yes.

Katie Robbert – 16:30

I just had to do a quick search because I needed to validate that — I was like, wait, isn’t LinkedIn part of the Microsoft ecosystem as well?

Christopher Penn – 16:38

It is.

Katie Robbert – 16:39

It’s interesting, because we talk about the Google ecosystem, including things like YouTube and other major platforms, and now we’re seeing the other side of it — it’s Microsoft’s turn to show off their whole ecosystem. I don’t know that there’s any correlation between LinkedIn showing up as the number one thing and the fact that LinkedIn is part of the Microsoft ecosystem, but I also can’t say that there isn’t. I think it’s something for marketers to pay attention to — understanding which platforms belong to which of the big guys.

Christopher Penn – 17:17

Oh, for sure. In this case, for these two queries specifically, someone in Bing and Copilot is having conversations about how LinkedIn’s job candidate ranking works. So, HRIS stuff for people using LinkedIn talent acquisition services. What it’s linking to, by the way, is the Unofficial LinkedIn Algorithm Guide, which is the Trust Insights publication about how LinkedIn itself works. We’ve never done a piece specifically on the talent acquisition part. We probably could, and we might want to, if it made sense from a “will it get us any business” perspective.

Christopher Penn – 17:57

But there’s a whole bunch of papers that LinkedIn has on how their internal ranking systems work when they’re recommending you for job searches and things like that — that’s what people are asking about there.

Katie Robbert – 18:09

That makes sense. It’s a big topic right now.

Christopher Penn – 18:11

It’s a huge topic. The second one there on the list is Cowork plugins. As of about two weeks ago, Microsoft Copilot Cowork went to general availability, which is their licensed version of Claude Cowork. So people are looking for stuff around that. Katie, this would be an opportunity for Trust Insights to make sure we’re promoting the various plugins we have in our academy as Cowork plugins, in addition to just saying they’re Claude plugins, because obviously Microsoft Copilot Cowork, being a licensed version of Claude, should be cross-functional. So we’d want to have some content about that.

Katie Robbert – 18:55

No, that makes sense. For a long time, Microsoft Copilot had that firewall up, like, unless you’re a Microsoft user, you can’t know what’s inside Copilot. But now that Claude Cowork is part of it, it’s like, “Oh, well, we are well-versed in Claude Cowork — we’ve taught classes and built instructional materials on Cowork. And by the way, we also have things you can buy to install into your Cowork.” So yeah, absolutely — I’m going to write that down somewhere so I don’t forget about it, but that’s a really great idea.

Christopher Penn – 19:31

Yep. And finally, we see our years-long addiction to making frameworks for everything actually paying off, because the majority of these things are AI frameworks for marketing, consulting frameworks for production, writing for marketing consultancies, AI frameworks, and so on and so forth. So we’re winning that. In fact, even right there near the bottom is the 5P Framework by Trust Insights — well, they didn’t put the brand name, but the 5P Framework is there. So Microsoft Copilot is citing us when people ask about this particular framework, and it’s great that people are looking for it by name.

Now, what we’d also want to do is take a look at our regular search performance to see how that compares to what Copilot is doing, and we see some different stuff here.

Christopher Penn – 20:19

We see our brand name, we see some YouTube collaboration stuff, we see “Get Insights on Marketing AI Agents” — that looks real. We see a lot more YouTube-based stuff, people using regular Bing, not the Copilot stuff — regular Bing, for search engines like Yahoo, for example, that use Bing as their backend. Safari used it as its backend for the longest time. But even here, something like “Trust Insights Academy” — people are searching for us by name. So that’s a great thing.

So that’s level two. Level one is ask people. Level two is take a look at Google Search Console, especially if you get the Generative AI tab, and take a look at Microsoft Bing Webmaster Tools.

Katie Robbert – 21:09

I was going to ask a question. Years ago, when we talked about Bing Webmaster Tools, it was something we had to install and set up for our website. I assume that’s still true — that’s still a step people have to take. You can’t just open up Bing Webmaster Tools and expect your data to be there.

Christopher Penn – 21:31

That’s correct.

Katie Robbert – 21:32

I just want to make that clarification. So if you’ve never done the steps to install or integrate Google Search Console, it’s roughly — I think — probably the same steps to install Bing Webmaster Tools. But this is a great time to do it, especially as questions around AI visibility become more and more prevalent in conversations. People are doing their 2027 planning already. They’re going to want to know, and this is a great data source to be looking at if you don’t have it.

Christopher Penn – 22:02

Yep. Now, the third layer—

Katie Robbert – 22:06

Is?

Christopher Penn – 22:08

If you have a CDN, a content delivery network, that collects data about what AI agents are hitting your website, you can potentially get at this data. We talked about this — and it’s actually in the GEO 101 course — Cloudflare is one of those CDNs that gives you seven days’ worth of data about which AI bots are hitting your website. The challenge here is it kind of sucks, downloading a bunch of CSVs every seven days. Clearly there’s got to be a better way to do this, and the answer is yes, there is. I did this last night: I set up a Cloudflare Worker, which is a little miniature server on their system that intercepts and makes a copy of all the bot data and can send it to other places.

Christopher Penn – 23:03

So what I wired up last night for fun — because this is what I do on Thursday nights — is I wired that worker into Google Analytics. And what we can now see in Google Analytics is all the Cloudflare data, to understand what kinds of AI bots are coming to the Trust Insights website. Are they crawlers, which are looking to vacuum up training data? Are they assistants that browse on your behalf, like when you’re using Gemini or ChatGPT and you say, “Hey, what’s a good marketing AI consultancy,” and it says, “Let me go look for that”? And then AI search engines, like AI Mode and AI Overviews specifically. So those are the three categories — you can see what pages they hit, and of course you can see the event counts.
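The three categories can be illustrated with a simple user-agent lookup. This is a sketch only: in the setup described above the category comes from Cloudflare, whereas the mapping below is a hypothetical, hand-written table, and the category assigned to each token is an assumption that should be checked against each vendor's published bot documentation.

```python
# Hypothetical token-to-category table; not Cloudflare's actual classification.
BOT_CATEGORIES = {
    "GPTBot": "crawler",             # gathers training data
    "Amazonbot": "crawler",
    "meta-externalagent": "crawler",
    "ChatGPT-User": "assistant",     # browses on a user's behalf
    "OAI-SearchBot": "search",       # AI search index
    "PerplexityBot": "search",
}

def classify_bot(user_agent: str) -> str:
    """Return 'crawler', 'assistant', 'search', or 'unknown' for a user agent."""
    ua = user_agent.lower()
    for token, category in BOT_CATEGORIES.items():
        if token.lower() in ua:
            return category
    return "unknown"

# Illustrative user-agent strings.
samples = [
    "Mozilla/5.0 (compatible; Amazonbot/0.1)",
    "Mozilla/5.0 AppleWebKit/537.36; compatible; ChatGPT-User/1.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Safari/605.1.15",
]
labels = [classify_bot(ua) for ua in samples]
```

User agents can be spoofed, so a lookup like this is only a first pass; verified-bot signals from the CDN are the more reliable source.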

Katie Robbert – 23:49

So to clarify even further — let’s say I’m using Claude with the Google Chrome extension, and I say, “Hey, go pull the data off of this website, because I’m too lazy to do it myself.” Is that website’s Google Analytics going to categorize it as an AI assistant?

Christopher Penn – 24:15

In this instance—

Katie Robbert – 24:17

If—

Christopher Penn – 24:18

If it’s ClaudeBot, yes — Cloudflare will categorize that and send that data. Here’s the catch: Google Analytics does not do this out of the box at all. In fact, Google Analytics, out of the box, blocks all bots, because they don’t want your analytics filled with junk. So what I had to do was wire this in a way that bypasses Google’s bot blocker, using what’s called the Google Measurement Protocol — part of their API — and set up a separate Google Analytics property just for this, because you absolutely do not want bots mixing with regular human traffic. But to your question, Katie, you can tell by the user agent of the bot what bot it is.
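For readers unfamiliar with the Measurement Protocol, here is a minimal sketch of what one bot hit might look like as a request. The endpoint and the `measurement_id`, `api_secret`, `client_id`, and `events` fields are part of the GA4 Measurement Protocol. The event name `ai_bot_hit`, the parameter names, the `client_id` convention, and the placeholder credentials are assumptions for illustration, not the setup used in the episode. The sketch only builds the request and does not send it.

```python
import json
from urllib.parse import urlencode

MP_ENDPOINT = "https://www.google-analytics.com/mp/collect"

def build_bot_hit(measurement_id, api_secret, user_agent, bot_category, page_path):
    """Return (url, json_body) for one Measurement Protocol POST."""
    url = MP_ENDPOINT + "?" + urlencode(
        {"measurement_id": measurement_id, "api_secret": api_secret}
    )
    body = {
        # Hypothetical convention: one pseudo-client per bot category.
        "client_id": "bot-" + bot_category,
        "events": [
            {
                "name": "ai_bot_hit",  # hypothetical custom event name
                "params": {
                    "bot_category": bot_category,
                    # Truncated defensively; GA4 caps parameter value length.
                    "bot_user_agent": user_agent[:100],
                    "page_path": page_path,
                },
            }
        ],
    }
    return url, json.dumps(body)

# Placeholder credentials; send these only to a separate, bot-only property.
url, body = build_bot_hit(
    "G-XXXXXXX", "placeholder-secret", "ChatGPT-User/1.0", "assistant", "/blog/"
)
```

In a Cloudflare Worker the equivalent would be a JavaScript `fetch` POST of the same JSON body; Python is used here only to keep the examples in one language.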

Christopher Penn – 25:01

So if I start a new tab here, just freeform, and we put in our user agent as one field and our event count as the other, you can see, based on the name — this is Amazon Bot, there’s Meta External Agent, there’s ChatGPT-User bot.

Katie Robbert – 25:21

So is ClaudeBot the Chrome extension, or is that something else?

Christopher Penn – 25:26

That’s principally going to be Claude’s web interface, Claude Desktop, and Claude Code.

Katie Robbert – 25:33

Got it. So basically, when you say, “Hey, go find this stuff,” and you’re using Claude Desktop essentially as a search engine and saying, “This is what I want to search, go search it” — that’s what we’re talking about when we say ClaudeBot. So if you go into ChatGPT and say, “This is the deep research I want to do, go find it,” then that’s where you get the GPT bot, for example. I think that’s data people are craving.

Christopher Penn – 26:05

Yep, you can get all that data. One of the things I found interesting was that Amazon’s crawler bot was the most aggressive of them, which I thought was interesting. So I put “bot category” here, and then “user agent,” and now we can see crawler, assistant, crawler-search, etc. So we can start to get at what those different bot types are. ClaudeBot is the assistant version of Claude.

Katie Robbert – 26:31

So with Amazon — because they don’t have a large language model like ChatGPT or Google Gemini or Anthropic’s Claude — what’s Amazon’s version that users would be using in order to hit our website?

Christopher Penn – 26:48

They do. When you use Claude or Gemini or any of the models through AWS, because you’re an enterprise company using Amazon Bedrock on AWS, all of that data comes through Amazon Bot. It’s behind Amazon’s AWS. So Amazon Bot is a big hodgepodge of all the enterprise companies that use AWS for protected models. For example, we have one client in the healthcare space. They have their own internal AI hub, and that routes through Google Cloud in their case. So you’d see Google stuff coming out. Even if they’re using Claude, they’re using a version of Claude that lives on Google Cloud. In this case, you’d be using a version of Claude that lives on AWS.

Katie Robbert – 27:32

This is just getting more and more messy.

John Wall – 27:36

It is.

Christopher Penn – 27:38

But what’s nice about this is, once you set it up — and it took me a good hour and a half to figure out all the weird tricks to get it working — you can then start to build a Looker Studio dashboard. As time goes on, this dashboard will populate for us, and Katie will be able to go in and see how much AI assistant versus search versus crawler traffic we’re getting, and what the top pages on our site are that they’re hitting. And if Katie wanted to, she could say, “I only care about AI search,” and turn off the rest.
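The dashboard views described here boil down to two aggregations: hits per bot category, and top pages within a chosen category. A minimal sketch of that logic follows, using made-up hit records; the field names `bot_category` and `page` are assumptions and would need to match whatever your analytics export actually contains.

```python
from collections import Counter

# Made-up hit records for illustration; not real Trust Insights data.
hits = [
    {"bot_category": "crawler", "page": "/blog/post-a"},
    {"bot_category": "crawler", "page": "/blog/post-a"},
    {"bot_category": "search", "page": "/blog/post-b"},
    {"bot_category": "assistant", "page": "/blog/post-a"},
    {"bot_category": "search", "page": "/blog/post-b"},
    {"bot_category": "search", "page": "/services"},
]

# Assistant vs. search vs. crawler volume.
by_category = Counter(hit["bot_category"] for hit in hits)

def top_pages(records, category, n=3):
    """Most-hit pages for one bot category, e.g. only AI search."""
    pages = Counter(r["page"] for r in records if r["bot_category"] == category)
    return pages.most_common(n)

search_pages = top_pages(hits, "search")
```

Filtering to a single category before ranking pages mirrors the "I only care about AI search" toggle mentioned above.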

Katie Robbert – 28:12

That makes it so easy, doesn’t it?

Christopher Penn – 28:14

Yeah. And these are the pages that matter most for AI search, right — when someone is using something like Google, or, since we know Claude is classified as an assistant, what are the pages here? We could even put in a dropdown to specify Claude versus ChatGPT or whatever, and Katie could slice and dice this without having to do it in Explore Hub, which is kind of a pain, or look at raw server logs, because Katie doesn’t like looking at raw server logs.

Katie Robbert – 28:44

I really don’t. I’ve done it, I don’t care for it at all. The reason I’d want to look at this information — the “so what” of it — is so that I could look at those particular pages and say, is there enough machine-readable content on it so that when someone’s searching and the machines are saying, “Let me go find the most helpful information there is,” they’re finding us. That’s what we’ve been talking about for well over a year: you have to have that content for the human as well as the content for the machine. So your pages need to get longer, they need to have more information. Again, in the GEO 101 course, we cover “being everywhere” and what that means.

Katie Robbert – 29:33

So if you don’t have those transcripts included with your podcast, with your videos — if you’re not putting the descriptions into your YouTube channel, into your YouTube videos — those are missed opportunities. Because now we’re being found by these AI assistants, these AI crawlers, because we’ve been putting in the work to make sure there’s machine-readable content. I remember, Chris, when you first started putting that onto our website, and it said something like “this content is specifically for—” and I was like, “What is this? Why is this jacking up the UI on the website we’ve worked so hard on?” But now I totally get it, because this is the kind of information you’re going to get back. And then I can look at it and say, “Okay, where are the missed opportunities?”

Katie Robbert – 30:23

Where do we need to do more? Because yes, there’s a human behind the large language model searching, but now you have this intermediate step of a large language model. So you’re no longer getting sessions from humans — you’re getting AI assistants powered by humans, powered by large language models. I’m trying not to overcomplicate it, but basically, instead of a human going straight to your website, you now have this intermediate step: human to large language model to your website. That’s what we’re looking at.

Christopher Penn – 30:56

Yep. So if you remember, a couple weeks ago we talked about WebMCP. One of the things we didn’t mention in that episode is that with the WebMCP Imperative API, which is the JavaScript version, one of the things you should consider doing — and we built this into a WordPress plugin — is sending Google Measurement Protocol hits to an AI property in your Google Analytics, so you can see, is anyone using WebMCP, and if they are, what’s the use? But I want to show this because I think it’s fascinating. If I go to our regular Trust Insights Google Analytics property and go into our real-time overview — this is right now, it’s relatively quiet on our website, right, this is our regular GA4 property, this is fine.

Christopher Penn – 31:39

It is what it is, because it’s the middle of the day, it’s a holiday week. If I then flip over to the Cloudflare bot page — look how many bots are on our website right this minute.

Katie Robbert – 31:50

And I think that, in and of itself, makes the case for why you’d want a separate Google Analytics property for your Cloudflare bot data specifically, because this tells a very different story. Along with Kelsey, I look at the data for our website month over month, and this is what we’ve been struggling with — how much of that is bot traffic and how much of it is real. Now we can actually differentiate. Note for Kelsey: you’re going to have to look at a different GA4 property for the monthly metrics now.

Christopher Penn – 32:23

Yep, exactly. So this is level three of AI visibility. Level one, ask people. Level two, look at the approved tools like Google Search Console and Bing Webmaster Tools to see what those are showing you. Level three, start looking at your CDN data and put it in your analytics tool of choice, so you can see how the bots are coming to you and what kind of bots they are. If you use Akamai, it has a similar thing to Cloudflare called Edge Workers, and you’ll have to program and build that for Akamai. If you’re using Cloudflare, it’s Workers and Worker Routes. And if you’re interested in getting some help with this — because I can tell you right up top, it is a massive headache to set up—

Christopher Penn – 33:11

You can go to TrustInsights.ai/botanalytics, and that will connect you with John Wall, who will help you talk through how we can get this set up for you. So that’s level three.

Katie Robbert – 33:24

And I want to acknowledge, Chris, because you said, “Oh, this took me about an hour last night” — I want to put that into relative terms, because you’re so well-versed in setting things up in different technology systems and infrastructures, especially Google Analytics 4. You’re very efficient with Google Analytics. When you say it takes someone like you, who has that deep level of understanding, an hour — for someone like me, we’re talking probably a good couple of days to wrap my head around all of it. And John, who’s like, “Get me away from setting up Google Analytics” — he’s like, “I’m just going to outsource it, where do I find the John Wall version of this?” So I just want to put that in perspective.

Katie Robbert – 34:08

When you say it took you an hour, that’s you, a super advanced user. The rest of us are going to take a while, probably first and foremost, just to figure out, “Well, what CDN do we even have, and where’s the login?”

Christopher Penn – 34:22

Yes. And do you have permission to use it? In a lot of companies, that access is something the IT department has separated out. The other thing I’d strongly recommend, if you do this with Google Analytics, is to tie it in to your BigQuery instance, meaning Google Cloud BigQuery. Because remember, depending on your Google Analytics settings, you may have data retention at the event level for either two or 14 months, and then that data gets deleted. If you tie it into a BigQuery instance, you keep that data in perpetuity, but you pay for it. For us, it’s going to cost probably about $5 to $10 a month in usage costs, which isn’t outrageous.

Christopher Penn – 35:05

The other thing BigQuery gives you that you can’t do in Google Analytics is you can use BigQuery’s native AI with Gemini to do much more advanced modeling. Again, it will cost you money, because everything you do in BigQuery costs money. But if you had things you specifically wanted to do for advanced data analysis, that would be the place to do it, because that database is tuned for very heavy machine learning loads.

Katie Robbert – 35:34

I recall when Google Analytics made the switch from Universal Analytics to Google Analytics 4, BigQuery was a recommendation we were making at that time anyway. So if you’re a user of Google Analytics 4, BigQuery shouldn’t come as a surprise. Chris, is the reason we’re recommending BigQuery because it’s natively connected to Google Analytics, versus using some other database or data collection system? Could you use something that isn’t BigQuery?

Christopher Penn – 36:09

You could use something that wasn’t BigQuery, if you had the ability to export the data to that system. But BigQuery will still be your intermediate stopping point — the place you send some of that data through.

Katie Robbert – 36:27

I bring this up because, yes, we’re promoting that we can do this for you, but there’s a reason you might want someone who’s familiar with all of these systems doing this — because it’s not just setting up a new type of report in Google Analytics. Gosh, I wish it were that simple. But you have your CDN, you have BigQuery, you have these other systems you need to be well-versed in to get this level of data.

John Wall – 36:51

Yep.

Christopher Penn – 36:51

So this is what it’ll look like in BigQuery. Inside the BigQuery table itself, you can see things like, is it a bot, what kind of bot is it, and so on — the stuff we’ve been talking about with AI Assisted. This is the data that comes from Cloudflare to BigQuery. Now, here’s the reason you might want to use BigQuery as opposed to Google Analytics, or as opposed to the approach Katie was talking about — in BigQuery, if you’re skilled at it, you can query across tables. So I can take my AI bot table and cross-reference it with my regular GA4 table, to see, for example, humans and machines looking at the same thing. Right now in Google Analytics, you’d have to flip back and forth between properties — it’d be a pain.

Christopher Penn – 37:32

You could do it all inside one query in BigQuery and do the advanced data analysis there. And then, if you want to get fancy, inside Google Looker Studio you can use a custom BigQuery query as the data source, so the data gets blended at the BigQuery level first and you get one consolidated view. Maybe you have five websites and you want bot analytics on all five. You could blend it all together into one big view before it goes into Looker Studio. So you could see a big-picture view: here’s the AI bot activity for our portfolio of sites. Or if you’re an agency, here’s the AI bot view for all of our clients, and you can see, “Wow, we’re getting a gazillion bots a day.”
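
A minimal sketch of the cross-table idea, using Python's built-in `sqlite3` as a local stand-in for BigQuery so it runs anywhere. The table names `bot_hits` and `human_pageviews`, their columns, and the sample rows are all hypothetical; the real GA4 export and Cloudflare tables have different, more nested schemas.

```python
import sqlite3

# Stand-in for BigQuery: an in-memory SQLite database with two toy tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE bot_hits (page_path TEXT, bot_category TEXT);
    CREATE TABLE human_pageviews (page_path TEXT);
    """
)
conn.executemany(
    "INSERT INTO bot_hits VALUES (?, ?)",
    [
        ("/blog/ai-visibility", "crawler"),
        ("/blog/ai-visibility", "assistant"),
        ("/blog/ai-visibility", "search"),
        ("/about", "crawler"),
        ("/about", "crawler"),
        ("/services", "crawler"),
    ],
)
conn.executemany(
    "INSERT INTO human_pageviews VALUES (?)",
    [("/blog/ai-visibility",), ("/blog/ai-visibility",), ("/services",)],
)

# One query across both tables: bot hits and human views per page.
QUERY = """
SELECT b.page_path, b.bot_hits, COALESCE(h.human_views, 0) AS human_views
FROM (SELECT page_path, COUNT(*) AS bot_hits
      FROM bot_hits GROUP BY page_path) AS b
LEFT JOIN (SELECT page_path, COUNT(*) AS human_views
           FROM human_pageviews GROUP BY page_path) AS h
  ON h.page_path = b.page_path
ORDER BY b.bot_hits DESC, b.page_path
"""


def bots_vs_humans(connection):
    """Return (page_path, bot_hits, human_views) rows, busiest bot pages first."""
    return connection.execute(QUERY).fetchall()


rows = bots_vs_humans(conn)
```

Aggregating each table before the join keeps the counts from multiplying when a page has many rows on both sides. The same shape of query could serve as a blended source for a dashboard.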

John Wall – 38:23

Yeah, unless you’ve got your DBA merit badge, you just want someone else to do this for you.

Katie Robbert – 38:30

I agree. I have this little pain starting above my eyebrow, and I’m like, oh my goodness. I get it, I crave this data. We’ve been trying to answer this question, but I also don’t have the time or energy to figure out how to set it up. So I appreciate that you took the time to do this, and that you’re sharing how difficult it can be if you don’t readily have access to all these systems, or aren’t well-versed in them. We’re just watching the live active users, and we’re seeing the number of bots continue to creep up. I’m looking at the bottom right-hand side, event count by event name, and number one is AI bot hit.

Christopher Penn – 39:18

The other thing I should mention is that in Cloudflare, if you deploy this on the free account, you get 100,000 requests per day for free, and then you have to upgrade to the paid version of Workers. I installed this 16 hours ago, and you can see we already have 62,000 requests. So we’re clocking 4,000 to 5,000 requests an hour. Now, granted, each request doesn’t put much strain on the Worker, but if we’re honest, we have a modest little website. For the record, one of our customers is AAA. They have millions of humans a day, so they probably have billions of requests per day.

Christopher Penn – 40:09

So if they implemented something like this, they’d get a pretty decently sized bill from Cloudflare for the servers that would need to spin up to handle that kind of load.
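
A quick worked check of the numbers quoted above, assuming traffic stays at the observed average for a full day. The 100,000-request allowance, 62,000 requests, and 16 hours come from the episode; the function names are just for illustration.

```python
FREE_REQUESTS_PER_DAY = 100_000  # free-tier daily allowance quoted in the episode
OBSERVED_REQUESTS = 62_000       # requests seen so far
OBSERVED_HOURS = 16              # hours since the Worker was installed


def hourly_rate(requests, hours):
    """Average requests per hour over the observed window."""
    return requests / hours


def projected_daily(requests_per_hour):
    """Requests per day if the hourly rate holds for 24 hours."""
    return requests_per_hour * 24


average = hourly_rate(OBSERVED_REQUESTS, OBSERVED_HOURS)  # 3875.0
daily = projected_daily(average)                          # 93000.0
headroom = FREE_REQUESTS_PER_DAY - daily                  # 7000.0

# At a sustained 5,000 requests an hour, the free allowance would be exceeded.
busy_daily = projected_daily(5_000)                       # 120000
over_limit = busy_daily > FREE_REQUESTS_PER_DAY           # True
```

At the observed average the site lands just under the free allowance, which is why even a modest site is close to needing the paid tier.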

Katie Robbert – 40:20

Yeah, we often talk about B2B, but when you think about B2C — consumer products, things people are searching for all the time — it’s going to get out of control really fast.

Christopher Penn – 40:34

Yes. This has the potential to get very expensive very quickly.

Katie Robbert – 40:40

I was going to ask — when we talk about pricing for Cloudflare, without getting too deep into it, we recommend people have it regardless for their websites, but is Cloudflare an affordable system? Are there other CDNs — you mentioned Akamai — are there other CDNs that a small business could use for this kind of work?

Christopher Penn – 41:03

Cloudflare is the preferred choice for small businesses.

Katie Robbert – 41:08

Got it.

Christopher Penn – 41:09

We’re on the $20-a-month plan. That gives us a lot of features. It’s almost like AI pricing: there’s the SMB $20-a-month plan, there’s the $200-a-month plan, and then there’s the pay-as-you-go enterprise plan, because the bills are going to be large there. It’s at the enterprise level where Cloudflare and Akamai start to diverge in terms of pricing. Below that, at the SMB level, it’s Cloudflare, hands down, because we can’t afford Akamai. Akamai wouldn’t even talk to us for less than like $4,000 a month, and we’re like, $4,000 a month versus $20 a month?

Katie Robbert – 41:45

Well, I think that when we talk about analytics, a lot of times we’re saying the data is directional, the data is good enough. This helps us understand how many bots are hitting our website, or which pages we need to refocus on to make them more attractive to AI assistants. That’s really what I need — I don’t need to know if it’s 62,000 or 62,395. That’s not going to make a big difference to me.

Christopher Penn – 42:16

Yeah, exactly. And the Worker stuff, even on the free plan, gives you some coverage. For example, KatieRobert.com is not yet a top-ten internet site, so we could install this on your site on the free plan — it would probably have enough coverage to do that. It’s actually something probably worth doing, because I’d be curious to see how many bots are hitting your site, even though it’s not a hugely trafficked site — just to see how much AI bots impact even a tiny site.

Katie Robbert – 42:49

I’m going to go ahead and say 100% of the traffic to my site is bots. The site itself is like five pages, it’s very small. But I think it would be a good test, and also a good demo of what it looks like to set it up fresh on a website — because who knows, I don’t think my site’s going to break the bank, but you never know.

Christopher Penn – 43:15

And more importantly, I’d want to know which AI bots are hitting your site and why. Is Claude hitting your site a lot? Is ChatGPT hitting it a lot? Is it assistant, is it search, is it crawler? I’d want to know that as a site owner — if it’s assistant or search, it means people are asking about you in AI tools, and it says, “This is a valid resource,” and that’s helpful to know.
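
A minimal sketch of sorting bot traffic into the crawler, search, and assistant buckets described above, based on User-Agent substrings. The tokens are ones the vendors have publicly documented, but the list is illustrative, may be out of date, and the category mapping is an assumption; it is not the classification used in the Cloudflare setup from the episode.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# (User-Agent token, vendor, assumed category). Illustrative, not exhaustive.
BOT_SIGNATURES = [
    ("GPTBot", "OpenAI", "crawler"),
    ("OAI-SearchBot", "OpenAI", "search"),
    ("ChatGPT-User", "OpenAI", "assistant"),
    ("ClaudeBot", "Anthropic", "crawler"),
    ("Claude-SearchBot", "Anthropic", "search"),
    ("Claude-User", "Anthropic", "assistant"),
    ("PerplexityBot", "Perplexity", "search"),
    ("Perplexity-User", "Perplexity", "assistant"),
]


def classify(user_agent: str) -> Optional[Tuple[str, str]]:
    """Return (vendor, category) for a known AI bot, or None otherwise."""
    ua = user_agent.lower()
    for token, vendor, category in BOT_SIGNATURES:
        if token.lower() in ua:
            return vendor, category
    return None


def summarize(user_agents: Iterable[str]) -> Counter:
    """Count hits per (vendor, category); unmatched traffic is grouped together."""
    counts = Counter()
    for ua in user_agents:
        counts[classify(ua) or ("unknown", "human or other")] += 1
    return counts


sample = [
    "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)",
    "Mozilla/5.0 (compatible; ChatGPT-User/1.0; +https://openai.com/bot)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36",
]
summary = summarize(sample)
```

User-Agent strings can be spoofed, so a production setup would likely also lean on the CDN's own bot verification rather than string matching alone.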

Katie Robbert – 43:42

Also, for KatieRobert.com, which does actually exist, I don’t believe we have Bing Webmaster Tools set up. So perhaps that could be next week’s livestream: how to set up Bing Webmaster Tools.

Christopher Penn – 43:56

Yeah, we could do all three. There’ll be some parts that get pretty messy, because you have to go into the command line with Cloudflare’s Wrangler node package, but yeah, we could do all three next week just for giggles.

Katie Robbert – 44:10

People are curious. That said, I’d highly recommend at least chatting with us about what it takes, depending on the size of your site and what systems you have. Go to TrustInsights.ai/botanalytics — you’ll get a real human. That real human is right there, that’s John Wall. I can’t guarantee what kind of mood he’s going to be in, but he is a real human.

John Wall – 44:34

That’s right, humans are messy — roll the dice.

Katie Robbert – 44:39

But no, I think it would be interesting just to show how messy the setup itself is, but then, start to finish, what do you get?

Christopher Penn – 44:50

Yep, get ready for a lot of you and John telling jokes while I type.

Katie Robbert – 44:56

Well, that’s the magic of television — we have pre-baked things.

John Wall – 45:01

Betty White show.

Katie Robbert – 45:03

Yeah. All right, John, we’re going to have to work on our shtick — I think we can come up with something.

Christopher Penn – 45:08

Exactly.

John Wall – 45:08

Endless topics, endless.

Christopher Penn – 45:10

Yep. Now the last part of AI visibility that people do care about is competitive, and we cover that in our GEO 201 course, which we’re not going to get into because we’re almost out of time for this week’s episode, and also, we have a course for that. But that would be level four: you’ve asked people, you’ve looked at the approved tools that have grounding in reality, you’re looking at real bot analytics to see what content is getting hit, when, and why based on bot category, and then the fourth is looking at the activity data you can get for competitors compared to yours. That’s AI visibility in a nutshell, right now.

Christopher Penn – 45:51

The thing I find interesting — like the Cloudflare stuff we talked about today — that came out yesterday. That literally came out yesterday. As I was getting ready for the livestream last night, doing the prep, I was like, “Oh, I can implement this.” It totally threw off my plan for today’s show. But the space is changing very quickly.

Katie Robbert – 46:10

And I think that’s just another good reason why you should probably be connected with an expert like Chris, who understands these things and whose full-time job is to make sure he’s on top of the latest and greatest. So, red flag to anyone who says they can tell you what someone is searching for directly in a large language model without doing any of these steps.

John Wall – 46:36

Exactly.

Christopher Penn – 46:36

Even then — even with these steps — it won’t help. That’s going to do it for this week’s show, folks. Thanks for tuning in, and we’ll talk to you all on the next one. Thanks for watching today. Be sure to subscribe to our show wherever you’re watching it. For more resources and to learn more, check out the Trust Insights podcast at TrustInsights.ai/tipodcast, and our weekly email newsletter at TrustInsights.ai/newsletter. Got questions about what you saw in today’s episode? Join our free Analytics for Marketers Slack group at TrustInsights.ai/analyticsformarketers. See you next time.

