This data was originally featured in the August 6th, 2025 newsletter found here: INBOX INSIGHTS, August 6, 2025: T-Shaped Marketers, Dataset Validation
In this week’s Data Diaries, let’s talk about dataset validation.
One of the most important skills you can build in processing data is to understand correlation and pattern matching. Being able to put data sets side by side and see how they march in lockstep or drift apart from each other is an incredibly valuable and underrated skill.
This becomes even more important once you experience a data quality change. You’ve probably had the experience of seeing a data source decline in quality over time. If you’ve ever done any work in social media marketing, you’ve seen the quality of social media reporting go down. If you’ve ever done any work in web analytics, you have almost certainly seen the quality of your Google Analytics data decline.
However, understanding the scope and severity of a decline in data quality is impossible if you don’t have something to measure it against. For example, if you’ve ever wondered about the impact of ad blockers on your website traffic, you would need to compare your analytics numbers with actual page hits measured by your server itself. People can suppress ad and analytics tracking all they want, but they still can’t suppress the fact that they are using resources on your server, and that is something you can measure.
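As a rough illustration of the ad blocker comparison, the sketch below computes what share of server-measured hits the analytics tool also recorded, week by week. All numbers are invented for the example, and the names `server_hits`, `analytics_pageviews`, and `capture_rates` are hypothetical. It also assumes the server counts have already been filtered down to human page requests, which real access logs would need before a comparison like this is fair.

```python
# Hypothetical weekly counts, invented for illustration. Real values would
# come from your analytics export and your web server's access logs.
server_hits = [10000, 10400, 9800, 10100, 10300, 9900]
analytics_pageviews = [8200, 8400, 7800, 7700, 7500, 6900]


def capture_rates(tracked, actual):
    """Share of server-measured hits that the analytics tool also recorded."""
    return [t / a for t, a in zip(tracked, actual)]


rates = capture_rates(analytics_pageviews, server_hits)
for week, rate in enumerate(rates, start=1):
    print(f"week {week}: analytics captured {rate:.1%} of server hits")

# Difference between the first and last week's capture rate. A positive
# value means the analytics tool is seeing a smaller share of real traffic.
decline = rates[0] - rates[-1]
print(f"capture rate fell by {decline:.1%} over the period")
```

A steadily falling capture rate would point to growing tracking loss rather than falling traffic, since the server-side counts stay roughly flat in this made-up data.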
This became a hot topic in the last week as the US federal government published employment data that made headline news. Without getting into politics, one of the most important things you can do with that sort of data is cross-match it against other data sources to see if its quality is changing over time. You would take unemployment data from the government and cross-match it against ADP payroll data or Indeed.com job postings, then look for correlations to see if the data source in question is beginning to drift away from established patterns.
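One way to look for that kind of drift is a rolling correlation: compute the correlation between the two series over a sliding window and watch for windows where it drops. The sketch below is a minimal, standard-library-only version. The two series are made-up numbers, not real employment or payroll figures, and the window size and the 0.8 threshold are arbitrary assumptions you would tune for your own data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rolling_correlation(xs, ys, window):
    """Correlation of xs and ys over each consecutive window of points."""
    return [
        pearson(xs[i:i + window], ys[i:i + window])
        for i in range(len(xs) - window + 1)
    ]


# Invented monthly values: `official` tracks `reference` closely for the
# first six months, then moves in the opposite direction.
reference = [1, 3, 2, 5, 4, 6, 5, 8, 7, 9, 8, 10]
official = [3, 7, 5, 11, 9, 13, 12, 9, 10, 8, 9, 7]

WINDOW = 6             # assumed window length, in months
DRIFT_THRESHOLD = 0.8  # assumed cutoff for "no longer in lockstep"

rolling = rolling_correlation(official, reference, WINDOW)
drifting = [i for i, r in enumerate(rolling) if r < DRIFT_THRESHOLD]

for i, r in enumerate(rolling):
    flag = "  <-- possible drift" if i in drifting else ""
    print(f"months {i + 1}-{i + WINDOW}: r = {r:+.2f}{flag}")
```

A flagged window does not say which of the two sources changed, only that the historical relationship has broken down. Comparing against a third source helps narrow that down.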
In the past, this required some knowledge of statistics and good statistical software, but today you can do this with a prompt in a tool like Google Colab. You upload your individual data sets and write a detailed prompt for Colab’s built-in AI assistant, which is powered by Gemini, and it will write the code to perform the statistical analysis and give you an answer.

The most difficult part is knowing what to ask for. And you can use regular Google Gemini or ChatGPT or any generative AI platform of your choice to develop the list of questions and statistical methods that would give you the greatest insight.
The key takeaway is to never trust a data set by itself. Always have some means of validating it against other similar data, so that you can tell if and when the data set in question starts to decline in quality. Where should you start? As a marketer, look at the correlations among all of the KPIs in your customer journey to spot drift and see where data sources might be going awry.