So What? Data Horror Stories

So What? Marketing Analytics and Insights Live airs every Thursday at 1 pm EST.

You can watch on YouTube Live. Be sure to subscribe and follow so you never miss an episode!

In this episode of So What?, the Trust Insights live show, you’ll learn about common data horror stories that can plague any marketing team, and how to avoid them. Discover the impact of poor data quality on your marketing efforts and gain practical insights to prevent future data disasters. You’ll understand how to apply the 6C data quality framework, paired with 5P framework context, to audit your data and ensure it’s ready for AI-driven decisions.
Watch the video here:

So What? Data Horror Stories!

Can’t see anything? Watch it on YouTube here.

In this episode you’ll learn:

  • Some of the worst data horror stories we’ve heard
  • How the 6C Framework can save you from a terrible fate
  • How being terrorized by your data can ruin your AI initiatives

Transcript:

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.

Katie Robbert: Welcome back, everyone. Happy Thursday. Welcome to “So What? The Marketing Analytics and Insights Live Show.” I am Katie. Chris is traveling. John is somewhere, but today I am joined by, I think, did you say your name was Clark?

John Wall: Violating the copyright laws playing a John Williams hit. But, yeah, I figured I could wear my costume rather than just getting two hours out of it on Halloween evening. I could get one more play out of it.

Katie Robbert: I love it. I am here for a Halloween costume. I usually go big on Halloween. I used to when I worked in an office. I would always dress up. But now that it’s just me by myself in my house with my dog, she doesn’t care what I do.

John Wall: I don’t know, I think it’s truly part of the marketing creative thing; costumes are a challenge. What kind of story can you tell? What can you pull off? The thing for me that has been crazy is I got into it heavy with my kids, starting from when they were babies. We would get them dressed up as stuff. Now they’re both teenagers, and I have all these pictures of ridiculous costumes over the years, which is fantastic. It keeps me going. So, I’m all for a little bit of Halloween creative muscle exercise. Looking forward to it.

Katie Robbert: Our buddy Brian says he’s wearing his full Transformer costume right now, working at home by himself. He said, “Just kidding,” but I don’t think he is. I think he’s very serious.

John Wall: Autobots, roll out.

Katie Robbert: So today on this week’s live stream, in the spirit of spooky season, we’re talking about data horror stories. We asked our Analytics for Marketers Slack community, which is free to join at TrustInsights.ai/analyticsformarketers, and John, you also posted on LinkedIn asking people for their data horror stories. Unsurprisingly, we want to be able to tie this back to something like the 6C data quality framework. So let me see. I’m going to make this a little bit easier to see. There we go. The six Cs of data quality are clean, complete, comprehensive, calculable, chosen, and credible. John, you did the due diligence of collecting a bunch of these horror stories, so let’s talk about some of them. What were some of the better stories that you got?

John Wall: Well, unsurprisingly, many people were reluctant to share tales of stuff that they’ve really screwed up and crashed and burned on. But I did get a couple. Thankfully, I’m willing to admit to a bunch of disasters of my own, so I have a trove of “burning the whole village” kind of tales that I can throw out here. I had eight that came in. Do you want me to give a quick overview of each and run down through the stack? How do you want to roll it?

Katie Robbert: No, I’m open because, basically, what I got from collecting the stories was, “Okay, we’re not alone.” A lot of people have these stories. I think where people find challenges with these types of things is, “I have these stories, but what do I do about it?” So, I’m happy to have you just go for it because when we get towards the middle of the show, I actually have a little bit of the solutioning. But first, we have to set the stage: What are the problems?

John Wall: All right, that sounds good. So I’ll dive in. I’ve got eight tales. We’ll start at the bottom. Number eight, the least horrifying of the batch, is one for all marketers. Everybody knows the sweat when you hit that send button. You’re sitting there with the email marketing automation, you’ve checked it a dozen times, and you click the button. Fifteen, twenty minutes go by. You’ve moved on to the next thing. You’re all good, and then suddenly the emails start coming in: “Oh, the link in this message is bad.” What are you going to do about that? If you’ve just mailed 60,000 or 250,000 addresses, you’ve now created a huge mass of inbound customer service complaints and just everything across the board.

Who knows how much damage you can do? That’s the first tale of panic right there. Usually it comes down to unclean. It’s just a matter of somebody broke something somewhere along the way. So that was the story that came in. That was enough to be frightening but was the bottom of the list.

Katie Robbert: See if this works. I feel like this deserves a sound effect.

John Wall: We’re not getting the clunk that we need, or at least it’s not coming through to my side. Oh, man, that’s definitely a sad trombone. I got to see on the board. Did that come through? Yeah, there we go.

Katie Robbert: There you go.

John Wall: Okay, we’ll save the.

Katie Robbert: Save the sound effects for another time. I feel like that’s such a relatable thing. This is why we’ve talked about it in past episodes of the live stream and the podcast, and pretty much anything people will listen to us on. Don’t QA your own stuff. You’re going to look at it 10 times, and that 11th time is when someone else is going to look at it and go, “Oh, I see the mistake immediately.” That happens to me all the time because we move too fast. We think we did the right thing. We click on it. We have 8,000 tabs open.

But to your point, then the shameful, “Hey, I clicked on your link, and it goes to a 404,” or, “Hey, I clicked on your link, and I don’t know if you want it to go to that picture of you from your summer vacation,” or, “Hey, I clicked on your link, and nothing happened.”

John Wall: Yeah, nothing but pain comes from that outgoing run. I think another thing with this is you hit a great point: you’ve got to have a process with secondary validation. Somebody else has to do it. Over time, pick out somebody in your organization, like I have. I’ll give a shout out to Erica Halloran, who I haven’t worked with in over a decade, but she was the one person everybody in the company knew about: it could go through five people, none of them finding any mistakes, and it would get to her, and she’d find three mistakes in the thing. Some people are born with that skill set and the eye for that kind of work.

You want to find some of those people and then start buying them lunch or pizza or whatever it takes to get them to help out. Find the experts. Enroll with those folks because that’s the easy way to avoid that train wreck as it comes along.

Katie Robbert: I would say people are going to ask, “Can’t I just use generative AI to do the proofreading and testing?” You can, but with the caveat that it’s also going to miss obvious things. I usually run a lot of my writing through basic spell-checking and grammar checks, that kind of stuff. But AI always misses something because it’s not human. Things like “it” versus “is”: it won’t necessarily correct those, not because it isn’t checking grammar, but because the mistake isn’t blatant enough for it to pick up. So you’re still going to publish things with errors if you don’t have that human intervention. That’s the case for still needing a human in the loop.

You can build really elaborate prompts to catch all that stuff, but I would say you still need a human to go through and make sure that links are working, UTM tracking codes are set up correctly, and that Tag Manager or Adobe Analytics is set up correctly. There’s a whole bunch of it that goes into that one little link in your email.
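Here’s a minimal sketch of the kind of pre-send link check Katie describes, in Python. The link list and required UTM parameters are illustrative assumptions, not from the episode, and a human should still click through the final email.

```python
import requests
from urllib.parse import urlparse, parse_qs

# Hypothetical links pulled from an outgoing email template.
links = [
    "https://www.example.com/offer?utm_source=newsletter&utm_medium=email&utm_campaign=fall",
]

REQUIRED_UTM = {"utm_source", "utm_medium", "utm_campaign"}

for url in links:
    # 1. Does the link resolve without a 4xx/5xx error?
    try:
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
        if status >= 400:
            print(f"BROKEN {url}: HTTP {status}")
    except requests.RequestException as exc:
        print(f"BROKEN {url}: {exc}")

    # 2. Are the expected UTM tracking parameters present?
    missing = REQUIRED_UTM - set(parse_qs(urlparse(url).query))
    if missing:
        print(f"MISSING UTM on {url}: {sorted(missing)}")
```

A check like this catches 404s and missing tracking codes, but it cannot tell you the link points at the wrong page, which is exactly the error only a second human tends to spot.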

John Wall: Absolutely. All right, so at number seven, I have “having to install an Oracle database to be ready for tomorrow.” This is a project thing where there was a group of people going off to an event in Chicago, and they knew that they basically had to have two or three demo systems built to go there. But the problem was they hadn’t calculated that the demo software had so much data in it that it would actually take about two days to build the machines to do the demo. So when they showed up the day before the event and started working, they had to work all through the day. By 6:00 p.m. that night, they were like, “Oh, wait a minute, no, we’re still not even half installed on these machines.”

The team had to stay up through the night, all the way through until the next morning. It was literally a matter of packing a bunch of laptops in a bag and somebody driving them to the airport to meet the team as they were flying out, because they weren’t able to build the machines in time. Normally, you would want to have them built at least two days ahead of time so that the team could play with the machines, make sure everything’s working, and get used to them. But yeah, I filed that one under “not calculable.” You need to figure out how much data you’ve got, how long it takes to process, and how long it takes to run reports or whatever.

Because if you have way too much data, it’s not in the right format, and it takes 10x what you thought to get the job done, you’re going to get yourself into another train wreck.
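That arithmetic is worth doing on paper before the trip. A back-of-the-envelope sketch, with invented numbers, of the estimate the team in this story skipped:

```python
# All numbers here are hypothetical; measure your own throughput first.
rows = 200_000_000        # records the demo database needs
rows_per_second = 2_000   # measured bulk-insert rate on the demo hardware
machines = 2              # demo systems built one after the other

hours = machines * (rows / rows_per_second) / 3600
print(f"{hours:.0f} hours of load time")  # ~56 hours: start more than two days early
```

Five minutes of multiplication would have moved the build to the week before the event.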

Katie Robbert: There’s a lot to unpack with that. I agree, I think that’s calculable. That’s also complete, and probably comprehensive, and probably credible, and probably clean. So let’s just say that one hits all the marks. If that’s only number seven, I’m nervous for what’s to come. That’s definitely a case for, and I know this is going to come as a huge shock to you, John, the 5P framework, which could have helped. Within the Process P, you would have outlined, “We know we need these people. The process is this. Let’s back into a work-back plan and a timeline.” Oh, we probably need more than 24 hours. I’m not shaming whoever submitted this story. Things happen.

People ask for last-minute crazy banana stuff all the time. I get that. That one, to me, if that’s number seven, I’m terrified for what’s to come.

John Wall: Well, the good news is they actually did get the gear there. But the landings get worse as we go. So, coming in at, what are we, number six now? This one is more of a category. A bunch of different stories fell under it, and anybody who’s done marketing ops in a smaller organization has done this: “the spreadsheet is off by only one row.” You’ve got a spreadsheet, and you’re trying to clean it up to load into your email system or for some kind of survey, whatever. You’re basically massaging a spreadsheet, right? That’s the thing.

Everybody always does this: they grab the columns that they want to sort, they miss one of them, and then they do the sort. Now that last column stays in the order it was in, but everything else gets shuffled, and you’ve polluted the data across the board. This came up a dozen different ways, but the one that comes up over and over is the shuffle. You do your email blast, it goes out to the email addresses, but the first names are all off by one. Nobody gets the right first name in their message. A bad link is bad, but you can fix a link; in fact, today on better email systems, you can update a link, and it will just work.

But something like this, you’ve polluted the whole data pile.

John Wall: Everybody gets a bad email, and it just looks horrible for everybody. One upside I did hear: this was the number one driver for people to switch to CRM systems. It happened one too many times, and they finally said, “No, we just need our CRM to automatically dump the data into our email system so that it’s not a manual process, and we remove all chance for error.” So, yeah, spreadsheet jockeying coming in at number six.
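The shuffle John describes is easy to reproduce. A minimal pandas sketch with made-up data, showing how reordering one column in isolation misaligns every row, versus sorting whole rows:

```python
import pandas as pd

df = pd.DataFrame({
    "email":      ["a@x.com", "b@x.com", "c@x.com"],
    "first_name": ["Ann", "Bob", "Cal"],
    "score":      [72, 91, 65],
})

# The horror story: reordering one column by itself. first_name moves,
# email and score stay put, so every row now has the wrong name.
df_bad = df.copy()
df_bad["first_name"] = sorted(df_bad["first_name"], reverse=True)

# The safe version: sort whole rows at once so columns stay aligned.
df_good = df.sort_values("score", ascending=False)
print(df_good)
```

Spreadsheets make the bad version frighteningly easy: selecting a partial range before clicking Sort does exactly what `df_bad` does here.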

Katie Robbert: I remember a friend of mine telling me about his wife’s job. She worked in marketing for a large grocery store chain in New England. He said he was always amazed at her ability to wrangle her spreadsheet, because there were so many columns, and it covered so many things, all in one sheet. You had all of the hidden columns, and then you had things that were locked that you couldn’t see, all in one sheet. She was in the triple letters. Once you go A through Z, it goes AA through ZZ, and then you start to get into triple letters.

It’s just bonkers, beyond my ability to comprehend how you can do your job with something that big. To me, spreadsheets that big and unwieldy fall under comprehensive: the data must cover the question being asked. That’s too much data. How can you answer a question with that much data?

John Wall: You should never do that. If you’ve got anything going beyond ZZ, that needs to be in a database. That’s what databases are made for. The spreadsheet is not a database. You have to get out of that.

Katie Robbert: Yeah.

John Wall: All right, let’s see. So at number five, we’ve seen this firsthand with a bunch of people. It’s basically: your AI project failed. You wanted to do something with AI, but you went through all the hassle of building the thing, you used your data, and your output is all garbage. It gives you hallucinations. Everything is fake and false or looks bad. That one came up many times, but it wasn’t the scariest, because a lot of these are pilot projects, new things that people are trying, so the stakes were kind of low.

But for everybody that does have a crash and burn, your ability to get selected to do the next really crazy pilot project goes to zero if you tank this one. You’ve seen this firsthand too with a bunch of stuff. What’s your take on this one?

Katie Robbert: I’m going to tap into my spooky side and be a fortune teller for a second and say that if people stick around, there may be a solution to that exact problem: the AI-Ready Data Quality Audit at TrustInsights.ai. That’s the demo we’re going to do shortly. It’s what we talk about a lot, and by no means are we the only ones talking about it. It’s a well-known thing that AI is going to amplify the things that are wrong. It’s not going to fix them unless that is your goal. Really, it’s going to say, “This thing is wrong? I’m just going to run with it.”

It’s not going to look at your data and go, “You know what, they could do better,” or “They could be giving me something else,” unless you’re prompting it to do that exact thing. So let’s say you’re using your marketing automation, exporting from your Salesforce system, and you want to find out how many clicks led to conversions, but you don’t have conversions set up. It’s not going to magically make that happen. Or if your data was corrupted for three days that month and you weren’t aware of it, so you’re missing that information, it’s not going to guess and say, “Hey, I wonder what happened on those days? Let me just fill it in for you.” It’s going to amplify.

What Chris says is that AI is a very overeager intern who is a people pleaser. It’s going to try to give you the answer regardless of how bad the data is. That’s why you have to work on that data quality before you give it to AI. If you haven’t done any kind of data quality checks first, it’s a non-starter: don’t let AI just go ahead and make decisions.

John Wall: That sounds good. So, number four, I was not expecting this. Everything I had originally run with was stuff where we’re handling the data: you get data, and something gets screwed up. But there were a ton of votes that came in for the data breach. Somebody broke in, took the data, and everything went south from there. The big thing with that is data security, and that’s not along the lines of anything that we do directly. There’s a whole bunch of vendors you work with to lock up your website and your internal IT. But yeah, I can see how that scares the hell out of people.

Katie Robbert: Oh, for sure. And to your point, John, we don’t do that work specifically, but we do have a framework we recommend to at least start thinking about it. This isn’t going to replace any kind of data privacy or governance, but we recommend RAFT. Responsible, Accountable. Oh, gosh, I should probably know what they all stand for, and now I’m putting myself on the spot and exposing myself. The idea is that if you don’t have any kind of policy in place, and I highly recommend that you work with a proper legal team to put that kind of thing in place, RAFT is at least going to get you started thinking about it. So it’s R-E-S-P-E-C-T. Here we go: Respect, Accountability, Fairness, and Transparency.

We frame it in terms of responsible AI use, but I’ve also used it to think about data privacy, because when you go through all of those steps, it’s: How are we asking people to give us data? Are we being clear about how the data is being collected? Are we being transparent? Would we want our data collected in that way? How are we securing that data? If you build a contact form in WordPress or HubSpot, a lot of these systems have some of those security controls built in. Don’t skip them just because you think it’s going to be annoying to the end user to have to check “I’m not a robot” or fill in those six scrambled letters to validate who they are.

Katie Robbert: Those are there for a reason. So you’re right, it doesn’t fall under data quality specifically, but data breaches are awful, and people can inject false data into your data, which then does become a data quality issue.

John Wall: Yeah, that was kind of a different genre of horror movie, and when somebody brought it up, I was like, “Oh, yeah, that’s not in our thing.” But, man, that is absolutely horrible over there too.

Katie Robbert: It’s the Freddy versus Jason crossover. You don’t want to see it coming.

John Wall: It’s a buzzsaw when it comes at you. No one survives that. Okay, coming in at number three, now we’re getting into the big guns. This is one I have, knock on wood, never done myself, so I’m wearing the badge of, “Oh, this is good.” But mailing the blacklist. This comes up all the time. You’ve got your email system, and for some reason, on some day, somebody messes up, or in loading a list, they grab that list over there of, “Okay, here are the people that said they’re going to sue us and they hate us,” and the crazies are always in that pile.

John Wall: For some reason, somebody grabs that and throws it right into the next mailing so that they get the $5 off coupon. That’s when customer service explodes immediately, because those people can’t take one extra email. But, yeah, I was kind of psyched. I was like, “Yeah, that was one firecracker I’ve never lit myself.” It’s a whole lot of pain when that happens.

Katie Robbert: Well, we can think about it this way. This is something that happens to a lot of people when they think about their email hygiene. So it falls under clean, and it falls under complete. Sometimes someone’s unsubscribe link just stops working. So you think you’ve unsubscribed from something, but they’re not aware that you’ve unsubscribed. They’re sending out their weekly emails or whatever, and you keep getting them. That is a data quality issue on their side, because they don’t have a clean list of people who have opted in, and they don’t have a complete set of people who’ve opted out. It goes back to number eight, I think it was: testing the links, making sure that things like unsubscribe work for the end user.

You know this: there is nothing more frustrating than unsubscribing from something multiple times and having it continue to show up. It’s damaging to the brand’s reputation, it’s damaging to their email deliverability, and it’s really poor data quality on their side. It doesn’t even have to be as drastic as sending to the wrong list; not having a clean unsubscribe process is a huge data quality issue on its own.
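A minimal sketch of the suppression step both of these stories are missing; the file and column names are hypothetical, and the point is simply that the opt-out and do-not-contact lists gate every send:

```python
import pandas as pd

# Hypothetical exports: the campaign list plus every suppression source.
send_list = pd.read_csv("campaign_list.csv")   # has an 'email' column
unsubs = pd.read_csv("unsubscribes.csv")
dnc = pd.read_csv("do_not_contact.csv")

def norm(s: pd.Series) -> pd.Series:
    # 'A@X.com ' and 'a@x.com' are the same person; normalize first.
    return s.str.strip().str.lower()

suppressed = set(norm(unsubs["email"])) | set(norm(dnc["email"]))
safe = send_list[~norm(send_list["email"]).isin(suppressed)]

print(f"Dropped {len(send_list) - len(safe)} suppressed addresses before sending")
safe.to_csv("campaign_list_safe.csv", index=False)
```

The safest version of this runs automatically inside the send pipeline, so nobody has to remember to do it on deadline day.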

John Wall: Absolutely. In most cases, those people are just a waste of resources in every way, shape, and form. They’re raising their hand saying, “Don’t talk to me.” If you ever have to talk to them, it’s a waste of time, and there’s never a chance of them ever buying anything. It’s like playing with quicksand: you’re going to get stuck over there. Which is another horrible thing that’s really overrated, but now I’m getting pulled aside by quicksand.

Katie Robbert: Well, yeah. I mean, when we were growing up, how dangerous did we think quicksand was going to be by the time we were adults?

John Wall: Yeah, every TV show had that one quicksand episode; it was an easy target. But no, there’s not much quicksand in New England, that’s for sure.

Katie Robbert: I think we’re okay.

John Wall: All right, so coming in at number two. This one was interesting because, I mean, it’s not urban legend, it was real. The thing with this one was that tons of people said it happened to them, but it hasn’t happened as much lately. Very few people had this problem in the last three or four years, but tons of people had it before, and it’s simply: the backups don’t work. Something goes south.

So you send somebody off to grab the off-site backup of whatever, and you find out that the kid next door has been using them as a magnet, or the tapes got flooded, or, another one, the systems say the backups are running, but they’re not actually running. You even see this with family members, normal people: their computer gets hit with a virus, or something weird happens, a hard drive crashes, and they have no backups of any kind, anywhere. They’re back to zero, a complete clean slate. So yeah, huge process thing. You’ve got to have a backup process. Of course, as I said, now with everything in the cloud, people tend to trust the cloud to do it.

I get that. It’s a lot of work, and it’s a pain to do backups. If you want to trust the cloud, I mean, these are huge companies, and it is their job. I have occasionally heard really weird stories of something getting lost here and there, but not that often. But if you do take a backup, even if it’s just Time Machine on Apple, just go in there once in a while and open up a file from three weeks ago, just to make sure that if you lost something, you could get to it. Because one of the top nightmares is waking up to a blank computer and having lost everything.
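John’s “open a file once in a while” advice can also be automated as a first line of defense. A tiny sketch, with a hypothetical backup folder and naming scheme, that checks a backup exists and is recent; actually restoring a file periodically is still the real test:

```python
from pathlib import Path
from datetime import datetime, timedelta

backup_dir = Path("/backups/crm")  # hypothetical backup location

# Find the newest backup archive, if any exist at all.
newest = max(backup_dir.glob("*.sql.gz"),
             key=lambda p: p.stat().st_mtime, default=None)

if newest is None:
    print("NO BACKUPS FOUND: the 'backups are running' light may be lying")
else:
    age = datetime.now() - datetime.fromtimestamp(newest.stat().st_mtime)
    status = "OK" if age < timedelta(days=2) else "STALE"
    print(f"{status}: {newest.name} is {age.days} days old")
```

A stale or missing file is exactly the “backups say they’re running but aren’t” failure from the story, caught before you need the restore.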

Katie Robbert: I think that, in terms of data quality and process, to your point, we as consumers and as businesses have become very reliant on third-party tools to do that work for us. In terms of our data quality and credibility, the data must be collected in a valid way, and it must be complete. If we’re using a third-party tool like a CRM or marketing automation, we need to understand what their backup policies are. We do some backup of our own data at Trust Insights, but we’re a small team, and if Chris is doing backups, I couldn’t tell you where they live. So that is a huge data quality issue for us that we need to address when he’s back from wherever he is.

It’s true for a lot of companies. They don’t know if there is a backup, where it lives, who has access to it, what’s on it, how much is being backed up, is it being overwritten every 30 days, or is it completely comprehensive of everything you’ve ever done? Those are really good questions to ask. So when you’re thinking about data backups, data quality, you can ask all of the 6C questions, like, “Is it going to be clean when I get it? Is it going to be one for one? Is it going to be complete? Is it going to be comprehensive?” And so on and so forth. What’s the point of having a backup if you can’t use it when you need it?

John Wall: Yeah, absolutely. When you need it, you bet, it’s got to be there. That’s a no-fail thing. I will give a shout out to WP Engine, our website provider. We’ve used them for years, Chris and I have both been big fans of theirs, and daily backup is just part of the thing. If anything tanks, you can go back there, click a button, and say, “Make it like it was yesterday.” That kind of SaaS service is a wonderful thing, and it’s worth the extra bucks. All right, so at number one. This was the biggest one, the complete nuclear blow-up. I know this is nail-biting.

Katie Robbert: I’m nervous.

John Wall: So this is an old marketing case study that was around. It was a company that did holiday promotions and sales. The holiday period was their greatest point of sales; it made the year for them, and the whole thing ran on catalog mailing. One of these companies where they print up a couple million catalogs, drop them in the mail, pay a couple million for postage, and then the orders come in, and everything’s great. Somebody in the data department screwed up as they were putting the mailing together. Normally, they do this mailing for the top 10% of customers. The best of the best, the folks that buy like crazy. The top decile is who gets this catalog.

They screwed up, and instead of the top 10%, they got the 90 to 100 group, which, somebody’s thinking 90 to 100 is great, but no, actually that’s the worst 10%. The people who never really buy; the buy-one-candlestick-every-five-years kind of people. Mailing the bottom decile tanked the entire company. They ended up going bankrupt. The catalogs went out, none of the business came in, they didn’t have the money to cover everything, and it completely burned the business to the ground. That was the end of the place. They had to close up shop. They ran the wrong select, and it cost them the company.

Katie Robbert: The chosen data was incorrect.

John Wall: Yes, the select. They grabbed the wrong select, and because of that one file, everybody lost their job.
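A hedged pandas sketch of the select in question, with hypothetical columns. The lesson is to check which direction the ranking runs, and to sanity-check the segment’s spend, before anything gets printed:

```python
import pandas as pd

# Hypothetical customer file; lifetime_value is total spend to date.
customers = pd.read_csv("customers.csv")  # columns: email, lifetime_value

# Spend percentile: 1.0 = biggest spender, 0.0 = smallest.
customers["spend_pctile"] = customers["lifetime_value"].rank(pct=True)

# The intended select: the best 10% of customers.
top_decile = customers[customers["spend_pctile"] >= 0.90]

# The fatal select from the story: in a file ranked with 1 = best
# customer, the "90 to 100" slice is the worst decile, not the best.
# Always confirm which direction the ranking runs before pulling a list.

# Cheap sanity check before a multimillion-dollar mailing: the chosen
# segment should dramatically out-spend the file average.
print(top_decile["lifetime_value"].mean(), customers["lifetime_value"].mean())
```

If the “best customers” segment doesn’t out-spend the file average by a wide margin, stop the presses and check the select.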

Katie Robbert: That’s brutal. Yikes. And that’s the thing: we talk about data quality as these six aspects, but at that larger scale, it can have devastating business impact if you’re not thinking about basic data quality. You’ve covered the eight horror stories that you collected, and I feel like we’ve covered all of the tenets of data quality. When we were talking in the pre-show, though, you didn’t feel like you had a good example for credible.

John Wall: Yeah, credibility, we talk about it in terms of validity. It’s about how you grab the data: is that correct? I did have mailing the blacklist as burning your credibility, but that’s not the same thing. So could you give us an example of where you would see credible? How often does that come up, and what does it usually involve?

Katie Robbert: Yeah, absolutely. A couple of weeks ago, we did a very quick example, which you can find at TrustInsights.ai/YouTube in the “So What?” playlist, where we were able to show that our Google Analytics 4 data was not being collected in a credible way despite the way we had set up the system. We feel very confident in how the system is set up; it’s one of the things we offer as a service to our clients. But despite that, we don’t have control over the system.

So when we took a look at the data compared to, I forget exactly what we compared it to, we were able to see that there was enough missing data and enough unusable data that it was not credible, not collected in a valid way. So think about your third-party systems and the data you’re using to make decisions. Do you know the definitions of that data? Do you know if they’re sampling versus pulling everything? Another really good example of this is survey data. When Chris and I worked at the agency, one of the things we used to do for the other teams was run Google Consumer Surveys, which I loved. I love writing surveys. I love running surveys. You can do an employee feedback survey, any kind of survey.

The challenge we ran into is that a lot of teams would approach us and say, “I want to run a survey,” and I’d be like, “Great, let’s talk about it.” And they’d say, “So my client has a headline that they want to run. I just need the data to back it up.” That is not how you run a survey, because you’re already going into it with a bias about how you want it to be answered. You’re asking questions that support the end result instead of asking unbiased questions to find out what people actually think and feel about something. So that’s a big thing in terms of credibility. Anytime you see a methodology statement, or “how we collect this data,” or “how we use this data,” that’s where credible factors in.

Credibility really fits nicely with market research. How are you collecting the data? I like to joke with you and Chris and everyone that I’m an “N of 1.” I am not a representative sample. I am not enough data to say I represent this many people. That’s also credibility. If you’re pulling a small sample of information, is it representative enough? Is it statistically valid in a way that covers me and you and Chris and Kelsey and our Analytics for Marketers community? My one opinion is not credible enough to cover everybody. We need more information to say, “Okay, this is how everybody feels about something.”

I could go on my soapbox all day about credibility, but especially when we’re using third-party tools like our SEO tools and our web tracking tools, you have to think about how that data is collected and where they’re getting it from. Now, in the day and age of AI and machine learning, if they can’t collect it, are they inferring it? Basically, are they guessing at what it should be to make a complete data set? Those are the things you really want to dig into.

John Wall: That sounds good. Yeah. It’s funny, Chris is on his flight to London, but somehow he’s got enough bandwidth to check in, and he put his backup process over in Slack for you. So he is reporting in and validating for us.

Katie Robbert: I will have to look at that later, but thank you, Chris, we appreciate that. We’ve talked about the horror stories, and I feel like everybody has one. It doesn’t have to be something that took down a whole company. It doesn’t have to be something that got you fired. It could just be a small, niggly little thing. But the next question is, “How do we fix this? How do we get ahead of it?” Especially when we’re trying to use AI paired with artifact. Wait, what did I just say? AI paired with our data. We want to have data that is ready. So that’s where something like, where did it go, the AI-Ready Data Quality Audit comes in, and I am going to do a quick demo of it.

With the examples, John, that you were able to get from people, I think they were fantastic, but we didn’t get samples of the data, so we can’t say, “This is how we would fix it.” So I am manufacturing another data horror story. The way I would try to get ahead of the data slasher coming to murder me in my sleep is to do something like use Google Colab. I like Google Colab, and Chris has talked about it, because I’m not someone who’s about to sit down and write code in order to fix my data. I’m just not going to do it. It would take me way too long, and it’s a really poor use of my time.

I have access to Colab because we’re a Google shop; through Google Workspace, we have access to Google Colab, and I believe people can use a free version of it as well. So now I’ve said Google Colab about 20 times and, much like Beetlejuice, invoked its presence into our data horror story episode. What I’m going to do is start a new notebook. When you open Colab, you are given this conversation box, and I’m just going to click on “New Notebook,” and it’s going to be fresh.

What I’m going to do, and I need your help with this, John, is, let’s say I wanted to look at the SEO data for katierobbert.com. I want to see if it’s clean, if it’s complete, if it’s comprehensive. What I have, and people can’t see because I’m sharing my whole screen, let me see if I can fix this. I’m not as savvy with this stuff as Chris is; he’s way better at it than I am. Here we go. What I have is a set of instructions, and the prompt starts, “Analyze the included data using the 6C data quality framework.” This has already been written out, and this is what we can do for other people. It walks through the 6C data quality framework: Is it clean?

It goes through a bunch of questions that you would naturally ask about the data. The other thing I have to include, and what I always recommend people do, is have a sense of the 5P framework. Are you following me so far? Is this confusing, or is it straightforward?

John Wall: No, we’re right with you there.

Katie Robbert: Okay, so the 5P framework: Purpose, People, Process, Platform, Performance. Purpose: What’s the question you’re trying to answer? What’s the problem you’re trying to solve? People: Who is this for? Who’s doing the thing? Who needs to be involved? Process: How are you doing the thing? How are you collecting the data? How are you using the data? Platform: What tools? In this case, where did you extract the data from, and where is it going? Performance: What is your expected output? This is all part of your prompt as well. The first part of the prompt is the six Cs and the questions you would ask; the second part is your 5P framework. So, Purpose: the intended user of this data is a business user preparing SEO reporting. Makes sense. People: the end user is non-technical. Process.

The outcome for the end user is an SEO report explaining what happened in the company’s SEO program for the last 60 days, and the time frame of the data is the last 60 days. Platform: the tooling of the user is spreadsheet and slide software. Then Performance: process the data file and produce a detailed, complete, comprehensive audit using the 6C data quality framework, which, again, is the first part of the prompt up here. The final audit should be a document in markdown format, in an outline, with a diagnostic of each of the six Cs and a rating on a scale of 0 to 10 for each, based on the analysis. What that means is Google Colab is going to take my data file, which I’ve already downloaded from Ahrefs, our SEO tool.

Use whatever SEO tool you have or whatever tool you’re looking at. It’s going to say, “Okay, based on this prompt, I have six aspects to judge this data file on.” So out of a score of 60, how did it do? It’s going to go through piece by piece. Because I’m giving it the 5P framework context, it’s going to know whether this person can do what they’re intending to do. Otherwise, if you’re just asking, “Is it clean and free from errors? Is there no missing information?” you haven’t given it the context to answer that question. So you need to pair the 5P framework with the six Cs. Now I feel like I’ve been talking a lot. Are you still following me? Does this still make sense?
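For readers who want to try this themselves, here is a minimal sketch of how the two halves of the prompt fit together. The wording is illustrative, not the exact Trust Insights prompt:

```python
SIX_C_QUESTIONS = """\
Analyze the included data using the 6C data quality framework:
1. Clean - is it free from errors, duplicates, and formatting problems?
2. Complete - are there missing rows, dates, or values?
3. Comprehensive - does it cover the question being asked?
4. Calculable - can the needed metrics be computed from it?
5. Chosen - is this the right data for the decision at hand?
6. Credible - was it collected in a valid, unbiased way?
"""

FIVE_P_CONTEXT = """\
Context (5P framework):
- Purpose: prepare an SEO report covering the last 60 days.
- People: the end user is a non-technical business user.
- Process: the data was exported from an SEO tool for the last 60 days.
- Platform: the user works in spreadsheet and slide software.
- Performance: produce a markdown audit scoring each C from 0 to 10.
"""

prompt = SIX_C_QUESTIONS + "\n" + FIVE_P_CONTEXT
print(prompt)  # paste into the Colab chat box along with the uploaded CSV
```

The 5P half is what turns generic quality questions into ones the model can actually judge against an intended use.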

John Wall: Yeah, this is it. We’ve got a prompt ready to run here. Are we ready to get the magic happening?

Katie Robbert: We’re going to get the magic happening. Again, all I have done so far is open Google Colab and click “New Notebook.” What I have down here, just like ChatGPT or Google Gemini or Claude, is a chat window. This is where you put your information. So I’m going to upload a file, which I’ve already got on my machine; you can also put it over here on the side and select it from there. I’m uploading it straight from my machine, keeping this as simple as possible. Now I’m going to copy the entire prompt that we just talked through, paste it into the chat box, and hit send, just like you would in any other system.

The difference here is that Google Colab is going to write the code that another system would struggle with or would ask you to write yourself. So it’s gone through. It says, “Happy to help.” See, there it is, the helpful, people-pleasing intern. “Happy to help. I can analyze the data using the 6C framework.” Colab gives you a plan: “Based on what you’ve asked, here’s what I’m going to do. First I’m going to load the data, then I’m going to analyze the cleanliness, the completeness, the comprehensiveness, the chosenness, the credibility,” and so on and so forth.

It’s going to go through the actual 6C framework, generate the document, and finish the task, and you’re going to say, “Yep, that’s amazing,” and accept the auto-run. “Are you sure you want to auto-run?” “Yep, I do.” What you see down here is the plan it just outlined, and it’s going to go through it step by step. This might take a second, John, hopefully not too long. While we’re waiting, Chris is saying, “You’re doing great, boss. Thanks.” It’s funny, I’m very comfortable doing these kinds of technical demos; I’m just not comfortable doing them live. I usually have everything prepared in advance. I don’t like to have it run live.

My little bit of nervousness is just about making sure the system doesn’t crash, because I want to be able to show people something. But, John, you’re primarily responsible for sales and new business. What kind of data that you use on a daily basis do you think we would want to run through a data quality audit?

John Wall: Well, the big thing is the whole CRM file. How many of those emails have we talked to recently? That’s the whole data pile that you live and die by every day. It’s a matter of what’s in there: if you can have a scoring system to help bubble stuff up to the top, and then something else to clean out the records that are garbage. In fact, that was one of the data stories that came in. Sunny Hunt was talking about a project where they deduplicated a record database from seven million down to three and a half million records in five months. That kind of stuff goes straight to productivity.

Bottom line, if you can clean the data up so that all the activity you’re taking hits real records, then when you’re looking at a funnel diagram, you’re doubling that point of the funnel by clearing out that much junk.
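The first pass of a dedupe like the one Sunny describes is usually exact matching on a normalized key; real projects layer fuzzy matching on top. A minimal pandas sketch with hypothetical file and column names:

```python
import pandas as pd

crm = pd.read_csv("crm_export.csv")  # hypothetical columns: email, created_at

# Normalize the key first so case and whitespace don't hide duplicates.
crm["email_norm"] = crm["email"].str.strip().str.lower()

# Keep the oldest record per address so its history stays attached.
deduped = (
    crm.sort_values("created_at")
       .drop_duplicates(subset="email_norm", keep="first")
)

print(f"{len(crm):,} records in, {len(deduped):,} out, "
      f"{len(crm) - len(deduped):,} duplicates removed")
```

Deciding which duplicate survives, and merging activity history into it, is where the five months of real work goes.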

Katie Robbert: Yeah, I think that makes sense. You mentioned lead scoring, and I think we can cover that in next week’s live stream, if you’re game. Just as a little preview, while this run is just about done: on a previous episode, which you can find at TrustInsights.ai/YouTube, you and Chris walked through putting together a sales playbook. The playbook is really comprehensive in terms of our services and our approach and the sales frameworks, all of those things. We put that together, and now it exists. What we don’t have in our CRM at the moment is lead scoring set up. We just haven’t gotten around to doing it.

We’ve been busy servicing clients and speaking and being on the road and doing things like what we’re doing in Google Colab. It occurred to me to experiment to see, “Could I take our sales playbook and could I take what we know about what we can do in our CRM and marry those things together to come up with a plan for lead scoring?” The answer was yes. Now, is it valid? I don’t know. I gave it to you to look over. So I’m still waiting for your feedback.

John Wall: Yeah. Well, the big question mark for us is what our CRM provider supports. There are all kinds of different pricing tiers, and unfortunately, our provider is a big fan of, “Oh, every time you want to do something else, you’re going to have to run the credit card to turn that on.” So if there’s enough there for us to kick around, I am definitely up for talking about it, because it’s always interesting to see if you can pull some quality leads out of the stack. But it’s always a question of how much it costs and how much you’re really willing to pay, because, unfortunately, the majority of that stuff is second tier.

The customers and prospects that come up and want to talk and have a deal to get done, you don’t need scoring for those folks; you just work them through the cycle, and you go. Lead scoring is great for trying to find stuff that slipped through the cracks, but it’s usually stuff that’s not as warm. So it’s never top priority, but the larger you get, the more money you can find hidden in there.

Katie Robbert: I think that’s a great segue back into data quality, because you’ve just touched on a lot of potential data quality considerations in lead scoring. So we can talk about that next week. This week, we’re talking about a data quality audit. Now everything has run; you see, everything is green. So I’m going to close that and take a look at what we’ve got. What it’s been doing in the background is taking my data file. This is something else that, if you aren’t aware, you may have already run into yourself: regular, I’m going to call them regular, large language models like ChatGPT or Gemini or Claude are not great at reading CSV files or Excel files.

They kind of struggle with that. Google Colab, because it’s more of a developer-focused tool, is the place where you would want a CSV file read, so that’s what I gave it. What it’s been doing in the background, while John and I have been chit-chatting, is taking the prompt I gave it to evaluate the data file through the 6Cs paired with the 5Ps, and it’s going to give me the results. It wrote the code for me and ran it in the background; I didn’t have to do any of that. So let’s take a look at what we have. The task: “Analyze the data in the file using the 6C data quality framework.” It’s done that. It went through step by step, and we saw those green check marks happening.

So it loaded the data and took a look. This is a snapshot of what the data looks like and what the system was seeing. Step one, “Analyze the cleanliness”: check for duplicate records, outliers, inconsistent formatting, and so on. It did that. This is all the stuff that I was not interested in doing myself, that I’m happy to have the system do. It did all of those things, and then kept going: examine unique values, analyze comprehensiveness. It basically checked through the six Cs of data quality based on the prompt I gave it. I won’t bore you with all the technical details; I’m happy to answer those questions over in our free Slack community, Analytics for Marketers.
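The cleanliness step the notebook generates looks roughly like this. A sketch with a hypothetical Ahrefs export and column names, not the notebook’s actual output:

```python
import pandas as pd

df = pd.read_csv("ahrefs_export.csv")  # hypothetical export with a 'date' column

dup_rows = df.duplicated().sum()   # duplicate records
missing = df.isna().sum()          # gaps per column
dates_ok = pd.to_datetime(df["date"], errors="coerce").notna().all()

# Crude outlier screen: anything more than 3 standard deviations out.
numeric = df.select_dtypes("number")
outliers = ((numeric - numeric.mean()).abs() > 3 * numeric.std()).sum()

print(f"duplicates: {dup_rows}, dates parse cleanly: {dates_ok}")
print("missing values:\n", missing[missing > 0])
print("outlier counts:\n", outliers[outliers > 0])
```

The value of the Colab approach is that you never have to write this yourself; the point of showing it is just that the checks are ordinary, inspectable code, not magic.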

If we keep going, the thing I’m most interested in is the end result, because what you get at the end is an audit document. “Create a markdown document summarizing the data quality audit based on the six Cs. Score each C on a scale of 0 to 10.” At the end, you get this result. I could take this document, paste it into a text file, and start to go through it, which is probably something I’m going to do. At a high level I can see cleanliness: 8 out of 10. Completeness: 5 out of 10. Comprehensive: 6 out of 10. Chosen: 7 out of 10. These are way higher than I thought they would be. Credibility: 8 out of 10. Calculability: 9 out of 10. And an overall summary.

So you have this information here, and then you get this really nice, easy-to-read summary. All of this work was done for us, John, while we were shooting the breeze about lead scoring. It all happened in the background. I have my diagnostic for each of the six Cs and my justification for why it was scored that way. I also have the more detailed findings and my insights and next steps. I have my roadmap if I want to increase the score; let’s say it’s 40 out of 60, and I don’t want to use anything that’s below a 55. I have my insights and next steps for what I need to do to fix the data. So: when presenting this data to a non-technical user, provide clear definitions for each SEO metric. Okay, great.

Explore options for addressing the missing data in August, the crawled pages. So it’s not a complete data set. That is a huge red flag. If I were saying I need to make big decisions about my website, I need to increase my SEO and my awareness, I need to do all the things, I would be using incomplete data. That’s a big problem. Again, I’ve run through everything pretty quickly, but any thoughts on that, John?
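The roll-up itself is trivial once the six scores exist. A sketch using the scores read out above; the 55-out-of-60 cutoff is Katie’s illustrative threshold, not a fixed rule:

```python
# Scores from the audit above, out of 10 each.
scores = {
    "clean": 8, "complete": 5, "comprehensive": 6,
    "chosen": 7, "credible": 8, "calculable": 9,
}

total, maximum = sum(scores.values()), 10 * len(scores)
THRESHOLD = 55  # illustrative go/no-go cutoff

print(f"6C total: {total}/{maximum}")  # 43/60 for these scores
for c, s in sorted(scores.items(), key=lambda kv: kv[1]):
    if s < 7:
        print(f"Fix first: {c} ({s}/10)")
print("GO" if total >= THRESHOLD else "NO-GO: remediate before letting AI use this data")
```

Sorting the Cs from weakest to strongest turns the audit into a remediation to-do list, with completeness at the top here.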

John Wall: Yeah, it’s solid. It’s a great way to take the data file, throw it in there, and find out where the red flags are, and you’ve got your next steps to dig into and clean up. The big thing is that it’s definitely a go/no-go report. If you’re getting ones and twos on everything, you say, “Okay, we have to go back to the drawing board. This is going to kill the whole project.” Whereas the scores here are actually pretty high. I would be interested in running the reports and seeing if the resulting reports pass the smell test too. But overall, it looks like a pretty solid file. It’s not that angry with it at all, that’s for sure.

Katie Robbert: No, not at all. The overall summary says: “The dataset provides a valuable, high-level look at website SEO performance, presented in a format highly compatible with spreadsheet and slide software. It is generally clean and credible, likely originating from a reputable SEO tool. The chosen metrics are relevant for basic reporting. However, the data has significant limitations in completeness. It lacks comprehensiveness, missing crucial granular data. The data is also susceptible to various forms of noise,” which falls under chosen. So it’s saying, “Yeah, you can use it. However, proceed with caution, because these things could be a problem.” I just wanted to round that out. We talked about the data horror stories, things that can happen to all of us, but there are ways to get ahead of them.

So the AI-Ready Data Quality Audit, which you can get more information about at TrustInsights.ai. I’ve now said certain words so many times that they’ve kind of lost meaning a little bit. It’s not going to solve all the problems, but it is going to help you, especially with that horror story, John, that you told of, “I gave my data to AI, and it just kind of fell apart. It just failed.” You don’t want to do that.

John Wall: I feel like, in a classic horror story form, we’ve laid everything out, and now things seem to be good again. I needed to have somebody drag me off, like some last shock close, so we could get back to the fact that the data is always there and can always be a problem. But we can stick with the happy ending for this one.

Katie Robbert: All right, any final thoughts? So you’re going to be Clark Kent, Superman for Halloween. What are your kids going to be?

John Wall: My son is going to be Nightwing. We’ve gone DC Comics this year, so it’ll be Superman and Nightwing. My daughter is, she had a ’50s outfit that she had for school, but she’s going as a Lorax tomorrow for trick-or-treat. So she has a bunch of friends that are all going to do it. So there’ll be a pack of Loraxes. So yeah, we’ll see how that one goes. She’s always running with the pack, which is usually amusing. So we’ll have to get some pictures and see how that goes.

Katie Robbert: I love it. Well, if you want to tell us about your Halloween costume or join tomorrow’s Question of the Day, which is Halloween-themed, you can join our free Slack community at TrustInsights.ai/analyticsformarketers. I don’t know, John, I’m exhausted. I did more talking this week than I usually do. Chris needs to come back soon.

John Wall: We need to get Chris back on. Chris, you’re somewhere over the Atlantic, but still watching. Travel safe.

Katie Robbert: All right, thanks for watching, everyone, and we’ll see you next time.

Speaker 3: Thanks for watching today. Be sure to subscribe to our show wherever you’re watching it. For more resources and to learn more, check out the Trust Insights podcast at TrustInsights.ai and our weekly email newsletter at TrustInsights.ai/newsletter. Got questions about what you saw in today’s episode? Join our free Analytics for Marketers Slack group at TrustInsights.ai/analyticsformarketers. See you next time.


Need help with your marketing AI and analytics?

You might also enjoy:

Get unique data, analysis, and perspectives on analytics, insights, machine learning, marketing, and AI in the weekly Trust Insights newsletter, INBOX INSIGHTS. Subscribe now for free; new issues every Wednesday!

Click here to subscribe now »

Want to learn more about data, analytics, and insights? Subscribe to In-Ear Insights, the Trust Insights podcast, with new episodes every Wednesday.


Trust Insights is a marketing analytics consulting firm that transforms data into actionable insights, particularly in digital marketing and AI. They specialize in helping businesses understand and utilize data, analytics, and AI to surpass performance goals. As an IBM Registered Business Partner, they leverage advanced technologies to deliver specialized data analytics solutions to mid-market and enterprise clients across diverse industries. Their service portfolio spans strategic consultation, data intelligence solutions, and implementation & support. Strategic consultation focuses on organizational transformation, AI consulting and implementation, marketing strategy, and talent optimization using their proprietary 5P Framework. Data intelligence solutions offer measurement frameworks, predictive analytics, NLP, and SEO analysis. Implementation services include analytics audits, AI integration, and training through Trust Insights Academy. Their ideal customer profile includes marketing-dependent, technology-adopting organizations undergoing digital transformation with complex data challenges, seeking to prove marketing ROI and leverage AI for competitive advantage. Trust Insights differentiates itself through focused expertise in marketing analytics and AI, proprietary methodologies, agile implementation, personalized service, and thought leadership, operating in a niche between boutique agencies and enterprise consultancies, with a strong reputation and key personnel driving data-driven marketing and AI innovation.
