{PODCAST} In-Ear Insights: What is Big Data Analytics?

In this week’s episode, Katie and Chris walk through what big data analytics is. What criteria separate regular data from big data? What are the four Vs of big data? How does big data analytics play a role in marketing analytics? Why don’t more marketers use big data and big data analytics to improve marketing? Tune in to find out!


Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.

Christopher Penn 0:17

In this week’s In-Ear Insights, we’re talking big data analytics: what is big data analytics? Why do we care about it? And how do we get started with it? So Katie, let’s start off with just a simple one: what is big data analytics?

Katie Robbert 0:32

My understanding of big data analytics is it is the analyzing of big data, big data being large; the term big data refers to volume.

And so it’s large quantities of complex data.

So it’s not just having, you know, five years of Twitter likes, it would be sort of the five years of Twitter likes and comments and sentiment, and all of the different elements that go into it.

And that is what comprises Big Data, essentially.

And so it’s basically data that is larger than you yourself, the human can handle.

So it’s more data than you can just put together and analyze into a pie chart in a spreadsheet, is the basic definition.

And so the analytics part of it, is, you need to find more sophisticated ways to analyze and do something with the data.

And that’s where machine learning and artificial intelligence and coding and all that stuff, Chris, that you do comes in.

Christopher Penn 1:37

Yeah, the tongue-in-cheek definition I’ve heard is that big data is anything that doesn’t fit into a spreadsheet.

The more formal definition is exactly as you said. IBM did a bunch of work on this years and years ago, and came up with sort of this four-V framework that defines big data.

It’s volume, velocity, variety, and veracity.

So how much data is obviously a big part. How fast you are acquiring the data is important, right? If it’s just a static database, okay, you can work with that and take samples.

But if it’s like the Twitter firehose, and there are millions and millions of new rows appearing every second, that’s definitely big data.

Variety is the different types of data within your database.

So again, using social media, as an example, you have text, you have audio, you have video, you have maybe even interactive stuff.

And then the last one, which is always the biggest challenge in data science, is veracity: how truthful, how good is your data, right? If your data is filled with garbage and errors, then you’ve got a problem.

So big data analytics is something you can talk about in two different senses. There is running analytics on your data, right, which is, you know, what’s the sentiment of this, or how many people said this. But there’s also the analytics of the big data itself.

So the size of the data, the speed of the data, the correctness of the data.

So it’s almost like the diagnostics that tell you about the data, kind of like all the instruments on your car’s dashboard telling you how the car is doing while you are doing the driving, right? Your speed, how much gas you have in the tank, and things like that.

So you have these two different branches of big data analytics, one is very much a data engineering thing.

And that is a profession unto itself.

And then the other is the useful part, which is turning the data itself into some kind of insights.

So when we’re talking about big data analytics, I think for marketing, we’re mostly talking about the second case, which is taking usefulness out of the data.

But if you get the first part wrong, you can’t do the second part.
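The first branch described above, diagnostics about the data itself, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the sample records, the field names (`ts`, `kind`, `body`), and the choice to approximate veracity as the share of rows with a non-empty body, which is only a crude proxy for data quality.

```python
from datetime import datetime

# Hypothetical sample records; field names are assumptions for illustration.
records = [
    {"ts": "2021-08-01T12:00:00", "kind": "text", "body": "Great post"},
    {"ts": "2021-08-01T12:00:01", "kind": "video", "body": None},
    {"ts": "2021-08-01T12:00:02", "kind": "text", "body": "Thanks"},
    {"ts": "2021-08-01T12:00:04", "kind": "audio", "body": ""},
]


def four_v_report(rows):
    """Summarize volume, velocity, variety, and veracity for a batch of rows."""
    times = [datetime.fromisoformat(r["ts"]) for r in rows]
    span = (max(times) - min(times)).total_seconds()
    # Crude veracity proxy (an assumption): a row counts as good if body is non-empty.
    complete = sum(1 for r in rows if r["body"])
    return {
        "volume": len(rows),                                   # how much data
        "velocity": len(rows) / span if span else float(len(rows)),  # rows per second
        "variety": sorted({r["kind"] for r in rows}),          # distinct data types
        "veracity": complete / len(rows),                      # share of usable rows
    }


report = four_v_report(records)
print(report)
```

Like the dashboard in the car analogy, a report of this kind says nothing about what the data means; it only indicates whether the data is in good enough shape to analyze.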

Katie Robbert 3:51

I remember maybe a decade ago, when the term big data just started sort of like catching on.

And I have a very specific memory of sitting in the VP of Technology’s office.

And he and I were having a conversation, I the product manager, him the tech person, about how do we introduce this idea of big data to the stakeholders without them sort of latching on and thinking it’s just a buzzword, because the buzzword we had just sort of been weaning them off of was the cloud.

And so now we had to introduce big data.

And so, I mean, it totally backfired.

Because all I heard about for six months was big data, we need to collect big data, we need to collect big data, and they didn’t know what it meant.

And so I bring this up, one, because it’s amusing, but two, because I feel like with any kind of buzzword, with any kind of term, especially around data, there’s just such a misunderstanding. And so before we dive into the analytics of it, you know, you and I have just defined what big data is.

But I want to put it in a little bit more of a concrete example of an everyday marketing team. Does an everyday marketing team have big data? Or are they just working with data? Like, what is the difference? For, say, Trust Insights, you know, every day we can go in, we can look at our Google Analytics, maybe we use some connectors to bring some other data into our Data Studio dashboards.

And so now we’re combining two different data sources, and three or four different metrics. You know, does that qualify as big data? Or what would we need to be doing in order to have big data?

Christopher Penn 5:52

So all marketers have access to big data, right? There are three levels or layers of technologies that we need to understand, the data lake, the database, and the data warehouse, in order to make sense of sort of big and small data.

A data lake is one or more repositories of heterogeneous data.

So this is all your social media data.

This is visits to your website, this is calls to your call center.

This is the number of impressions of a billboard on the highway.

And it’s all scattered everywhere, it’s not structured, it’s a hot mess, right? This is everything you could wrap your arms around in terms of data.

And that data lake is sort of the first part of what marketers would need to do to start using data of any size big or small.

And you can tell just by listening to it just how challenging that is, because you have so many different data types: there’s photos, and reels, and, you know, stories on Instagram, there’s TikTok videos, there’s all this stuff that has data.

And then there’s all the metadata, which is data that describes data around it, that goes into this big lake of some kind.

There are many, many systems; IBM Db2, for example, can operate data lakes.

From there, think about that like a kid’s toy box, right? Where all the toys are thrown in a big toy box.

From there, you probably want to start organizing the toy box, right? Maybe you get some smaller toy boxes, put the stuffed animals in one, put the Legos in another, and so on.

And now you’re creating databases.

Right? The databases are that data cleaned up and separated, with the useful stuff kept and the less useful stuff discarded.

So now you’ve got your box of Legos, and you’ve got your box of Matchbox cars and stuff like that.

In marketing terms, that would be like when we export data from Facebook and we get Facebook’s useless 44-tab Excel spreadsheet, right? So we have to take the data out of that, clean it up, and put it into essentially a rectangular form, which is a database, right? A single spreadsheet is effectively a database.

That’s an easy way to think about it.

From there, what we want to do is figure out, okay, of all the stuff that we’ve just compiled into these nice rectangular data tables, how do we extract summaries and useful bits out of it? Because we don’t need everything, for example, that Facebook gives us; we don’t need everything that YouTube gives us.

For YouTube, for example, we want to know the video title, the date, the number of views, the number of likes and comments, do we need to know, you know, percentage of premium card members? No, probably, we don’t need to look at that.

So we’ll take summaries and extracts of each of these tables, and put them into a meta table called a data warehouse.

Right? Again, if you have the kid’s toy box, first you gather all the toys in one toy box, then you separate them out by individual toy type.

And now you’re just taking notes; maybe you’re trying to put together a holiday gift list for your kid.

You think, okay, well, you’ve got 44 Matchbox cars, and, you know, eight of them are red, and you’ve got 28 stuffed animals, and most of them are, you know, under 12 inches tall.

And now you’ve got these summaries. That’s your data warehouse: you’ve distilled down your data into something that is usable, that you can draw quick insights from.

A lot of the time for marketers, depending on the company size, you may not have a data warehouse; you may instead have a Google Data Studio dashboard which distills down some of those things.

For Trust Insights, for example, one of the things we do is we have this data lake called WordPress, right, which is our website with all the stuff in it.

We have databases in that, like our form fills, and then we have to clean those form fills up and, you know, knock out junk and stuff like that, so that we can summarize.

Okay, what are the things that marketers care about? We have that BigQuery table that we process behind the scenes; that’s a data warehouse.

So it’s like distillation: you have a bunch of grapes that get smashed up and fermented into a wine, and then that gets distilled down into a brandy.

Those are the three layers. Every marketer has the potential to be using big data.

And big data analytics is the process of converting all that stuff into that distilled product that you can then say, okay, gosh, Katie, it seems like people needing help with their analytics is the thing that has been recently on people’s minds.

So we should create more content about that. That’s the process of big data analytics. It’s a little oversimplified; there’s a lot more nuance to it.

But for marketing purposes, that’s a good place to start.
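The three layers described in this turn (data lake, then databases, then data warehouse) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how Trust Insights or any real platform implements it: the source names, field names, and junk-filtering rule are all hypothetical, and plain lists and dictionaries stand in for real storage systems.

```python
from collections import defaultdict

# "Data lake": heterogeneous raw records from several hypothetical sources.
data_lake = [
    {"source": "youtube", "title": "Intro", "views": "120", "likes": "10"},
    {"source": "youtube", "title": "Demo", "views": "80", "likes": "4"},
    {"source": "forms", "email": "a@example.com", "topic": "analytics"},
    {"source": "forms", "email": "", "topic": "spam"},  # junk row, no email
    {"source": "forms", "email": "b@example.com", "topic": "analytics"},
]


def build_databases(lake):
    """Sort raw records into one cleaned, rectangular table per source."""
    tables = defaultdict(list)
    for rec in lake:
        if rec["source"] == "youtube":
            # Keep only the useful columns and convert text to numbers.
            tables["youtube"].append(
                {"title": rec["title"], "views": int(rec["views"]), "likes": int(rec["likes"])}
            )
        elif rec["source"] == "forms" and rec["email"]:
            # Assumed junk rule: drop form fills with no email address.
            tables["forms"].append({"email": rec["email"], "topic": rec["topic"]})
    return dict(tables)


def build_warehouse(databases):
    """Distill each table into the few summary numbers worth reporting."""
    topics = defaultdict(int)
    for row in databases.get("forms", []):
        topics[row["topic"]] += 1
    return {
        "youtube_total_views": sum(r["views"] for r in databases.get("youtube", [])),
        "valid_form_fills": len(databases.get("forms", [])),
        "form_fills_by_topic": dict(topics),
    }


databases = build_databases(data_lake)
warehouse = build_warehouse(databases)
print(warehouse)
```

The point of the separation is the same as in the toy box analogy: the cleaning happens once, between the lake and the databases, so the warehouse step only has to count and summarize.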

Katie Robbert 10:31

It occurs to me that you just described exactly how I organize all of my camping gear. I have all of this gear.

And then I got little packing cubes that I can label.

So I have, like, coffee, and everything that’s associated with coffee; I have, you know, the pillows and the sleeping pads.

And I have like, you know, light sources.

And then I have like fire starter things and kits.

And so it’s all broken down into little cubes.

And I don’t need all of that stuff for every single trip.

And so I can say, Okay, I have these 10 cubes.

For this particular trip, I only need eight of them.

And thankfully, because I organized and labeled them, I can just grab them all and stick them in a backpack and go.

And so I really like that analogy, because I think it’s something that, you know, you can put into terms of travel, like people with their packing cubes, or like your kitchen cabinets: this cabinet is coffee, this cabinet is dishes, this cabinet is perishables or whatever. Don’t come to my house; my kitchen is not well organized like that.

So I like that analogy.

And so bringing it back to marketers, I feel like there’s probably this blend of, I’m just using the basic data just to get by, just to sort of see what the heck is going on,

versus the other side of it, which is like, I have so much data, I don’t even know where to start, it’s a mess.

And so I feel like finding that in-between, companies that have well-organized data, data they can just grab off a shelf and go, today I want to look at, you know, just the social media data, let me just grab that and take a look at it.

That’s the ideal.

But I feel like that doesn’t really exist.

And so when we’re talking about big data analytics, we’re talking about that ideal state of, I have all of my data neatly compartmentalized on the shelf, I found it from all the different places I’ve organized it.

And now when I just want to look at visits to my website, I can just grab that container, pull it down, take a look at it, and then put it back.

And why is that so hard to achieve?

Christopher Penn 12:46

Because most companies don’t invest in data engineering.

So data engineering is the people, processes, and platforms that are responsible for getting all the data from wherever it is and putting it into some kind of big data system.

So Apache Hadoop would be an example, or Apache Spark, something that can ingest crazy amounts of data in every conceivable format, still attach metadata to it, and begin that process of categorizing and sorting.

And that’s actually where a lot of AI and machine learning comes into play.

Because you do have to do a lot of that categorizing and sorting and labeling as the data is coming in when you’re talking about real big data.

Because of the velocity, right, the second V, you’ve got to be tagging that stuff as it comes in so that it is categorized, because you simply can’t process that data once it’s in place.

When you’re talking petabytes and zettabytes worth of data, there’s no way for you to easily classify, you know, 100 trillion records, right? When you look at web analytics, or app analytics, like Netflix, Netflix is tracking every single thing that you tap on your screen.

It’s watching how long you watch a movie, you know, what movies you watch and in which order; that data is categorized and tagged as it’s happening so that their systems on the back end can more easily digest it down.

When we talk about marketing, most of the time, we are dealing with situations where people don’t have the infrastructure upfront to ingest the data.

They don’t have the governance to tag it.

And then they don’t have the systems to turn that into databases and then data warehouses.

We have a client right now that’s doing a major data warehouse project.

And you can see that they’re struggling so much because, with all the existing systems, they don’t have a data lake, right?

They’re trying the inadvisable procedure of taking data straight from raw sources and putting it into a data warehouse. So instead of putting it in a data lake like Apache Hadoop, then turning it into databases, then turning it into a data warehouse, they’ve missed some steps.

And that’s a data engineering problem: if you don’t know the architecture, and you don’t know the way the system is supposed to look, it’s very hard to make good choices for extracting the data.

So as a marketer, and this is where that process and that people part is so important.

As a marketer, if you don’t know what a data lake is, and you don’t know that you should have one to bring all your marketing data into, then you’re just going to keep struggling, you know, cherry-picking data, you know, left, right, and center to try and make sense of it.

Or the cost may not be worth the returns. Like, at Trust Insights, we don’t have a Hadoop cluster, right? We don’t really need one. We are doing okay enough right now doing exactly that, cherry-picking the data from the different sources, converting it, and then putting it into the format we need it to be in. And as we grow, that may become a necessity; we may have to do that at some point.

And a lot of what you see happening now in marketing technology with things like customer data platforms are the band-aids that marketing services and software companies are trying to apply to the fact that nobody did the data engineering properly.
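The idea from earlier in this turn, tagging data as it comes in rather than classifying it after the fact, can be sketched as a Python generator. This is a deliberately simplified stand-in: the episode describes AI and machine learning doing the categorizing, while this sketch swaps in fixed lookup rules. The event fields (`user`, `action`) and the category names are hypothetical.

```python
from datetime import datetime, timezone


def tag_event(event):
    """Attach metadata to one event as it arrives (rules are hypothetical)."""
    action = event.get("action", "")
    if action in {"play", "pause", "stop"}:
        category = "playback"
    elif action in {"tap", "scroll", "search"}:
        category = "navigation"
    else:
        category = "unknown"
    # Return a new record so the raw event is left untouched.
    return {
        **event,
        "category": category,
        "ingested_at": datetime.now(timezone.utc).isoformat(),
    }


def ingest(stream):
    """Tag each event lazily, one at a time, as the stream is consumed."""
    for event in stream:
        yield tag_event(event)


incoming = [
    {"user": 1, "action": "tap"},
    {"user": 1, "action": "play"},
    {"user": 2, "action": "cast"},
]
tagged = list(ingest(incoming))
for row in tagged:
    print(row["action"], "->", row["category"])
```

Because `ingest` is a generator, each record is tagged at the moment it is read, so downstream steps never see an untagged record and nothing has to be reclassified in bulk later.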

Katie Robbert 16:19

You know, as you’re describing it, I do find it interesting.

And, you know, you and I have both been in those conversations where the lack of investment in data engineering is still so surprising to both of us.

But we are both, you know, on the side of, it’s the foundation. And, you know, I find it interesting because a company like Netflix, especially when they moved from their physical product to an online streaming platform,

is unique in a way: they knew that part of their product had to be a recommendation engine.

And therefore, they put data engineering as a priority to, you know, building and launching this, you know, online service.

Whereas, let’s just take Trust Insights, for example, we are not a recommendation engine.

And so if you and I were not who we are, then we might not put data engineering as a high priority, because we just want to sell things.

And, you know, why do I need to spend money on data engineering when I’m just trying to sell T-shirts and hats and, you know, Crocs and whatever? And so I guess the question I’m trying to poorly ask is, when it comes to big data analytics and data engineering, how do we do a better job of convincing companies that this needs to be an upfront priority, not an afterthought, so that, you know, 20 years down the road you’re not struggling to go, oh, I can’t even figure out what my customers want, let alone how they’re coming into my website?

Christopher Penn 18:01

Why haven’t you organized your kitchen?

Katie Robbert 18:04

Well, because I’m lazy, and it’s hot.

Because it feels overwhelming, because it feels like a big, daunting task.

And I’m willing to pay someone else to do it for me.

But they have to do it exactly the way that I want it to be done.

So I want to tell them how it should be done.

I just don’t want to do the work myself, and finding that person to do the work for me, and then having to pay for it.

And then maybe not having it exactly the way that I want it is also daunting and overwhelming.

Christopher Penn 18:37

And you just answered why companies don’t do that data engineering.

Katie Robbert 18:41

I really tried hard to answer that question properly.

It’s, you know, so that becomes a question of why didn’t I organize my kitchen when I first moved into my house, so that it was organized correctly, instead of just throwing stuff on the shelves and saying, I’ll deal with it later.

I know why I should have. I’m trying to draw the parallel between, you know, my lack of organization in my kitchen and companies not putting a priority on data engineering from the get-go.

Christopher Penn 19:17

But it’s the same thing. You know, at the end of the day, everything that we do, all the decisions we make are human decisions, right? And when you look at the way companies behave, it’s just an aggregate of a bunch of human decisions. So that “I’m unwilling to do the work myself, but I’m also unwilling to have somebody do it in a way that I don’t want it done”

accurately describes the situation that most marketers are in, with the added challenge of “I also don’t know what half the things I might get even do.”

Right? At least in your kitchen,

you know, like, this is a blender, I know what a blender does,

I know why it’s in here.

When you put something like a new social listening tool in front of a marketer, or you put in a Hadoop cluster, right, marketers are like, what is this? I don’t even know what this appliance does. I look at it and I can’t even tell what its function is. Is it a blender? Does it cook things? Does it chop things? I don’t even know.

And because of that knowledge gap,

and these are really big knowledge gaps,

marketers don’t have the language to describe what they could or could not do.

They know their purpose.

But they can’t deconstruct that purpose into the people, the processes and the platforms, they need to accomplish that purpose.

And that’s why big data fails.

That’s why big data analytics fail.

Because you don’t understand those components.

You don’t know what’s needed in each of those components, and therefore you can’t make good decisions.

It’s like, if you didn’t know what any of the dials on your dashboard of your car did, could you drive the car? Probably. Could you do so safely? No.

Right? You’d be going 85 miles an hour in a 25 mile an hour zone.

And you would have no idea that that’s a bad thing, because you don’t know how to use a speedometer.

You don’t even know what a speedometer does.

And so how do you get to that point? That’s where addressing the knowledge gap has to start. You have your purpose, and presumably you understand your purpose.

So now, who are the people, what are the processes, and what are the technologies you need to accomplish that purpose?

And this is where strong collaboration with your IT team, with your agencies and stuff, needs to be a priority. Sit down, you know, have beers with your IT team on Fridays and say, so we’ve got this thing we’re trying to do, how would you guys do it? Right? They don’t understand marketing, you don’t understand IT.

So you’ve got to start that cross discipline dialogue.

Katie Robbert 21:45

I would add to that also, it’s not a set it and forget it kind of a thing, like you can’t set up, you know, your big data infrastructure, your data lake, you can’t set up all of these things, and then just not maintain it.

So again, bringing it back to that example of my kitchen cabinets.

Who among us doesn’t have a cabinet where every time you open it, you get avalanched with Tupperware, you know, lids, containers, all those sorts of things.

And my husband and I, probably about once a quarter, take the time to reorganize it and match everything up.

But then we both collectively fail to maintain it because it takes too long.

When I’m unloading the dishwasher, I just kind of throw stuff in where it’s supposed to go, he does the same, and then we look at it and go,

why couldn’t we just keep this organized?

And I feel like, you know, for the companies that do put a priority on data engineering, it’s the maintenance piece of it where it also falls apart.

And so, you know, look at, you know, any of the hundreds of millions of companies who have hired consultants to come in and give them recommendations to do the thing; the consultants go away, and then everything falls apart, because nobody’s willing to put the lid back on the Tupperware and put it on the right shelf once a day.

Because that takes too long, I just want to throw the data in there and say good luck, I’ll deal with it later.

Christopher Penn 23:14

Or, even worse, you have executives and stakeholders who are like, Oh, it’s somebody else’s problem.

And then they don’t even consider the processes that need to be in place to make big data analytics work for you.

To your point, this morning I was rewriting some of our code to handle the BigQuery tables that we use for form data processing.

This is stuff that needs maintenance. This is stuff that, you know, I have on my list: once a quarter, just go look at our software and make sure everything is still running.

And every now and again, I get a lovely surprise: oh, this library stopped working, or this library has changed.

But that comes from our purpose.

As an analytics and management consulting firm, our purpose is to help our clients make better decisions.

Therefore, our tools need to be kept in working order, right? Well, last week, I was working on some predictive analytics stuff, I ended up rewriting our entire predictive analytics library, we had to just throw everything we used to do out and rebuild it from scratch.

It’s a lot more efficient now.

But if that wasn’t a priority, if we didn’t say, oh, that serves our purpose, it wouldn’t have gotten done.

And then in three months, we literally would have stopped working because that library is no longer supported.

We have to do things very differently.

So it always comes back to purpose.

And what is a priority? Big data analytics is just another set of processes and technologies.

If you don’t make it a priority, it’s not going to get done.

Katie Robbert 24:50

And so as we’re sort of, you know, wrapping up our conversation about big data analytics, that’s sort of like the end state.

There’s a lot of work that has to be done.

Before you can even get to analyzing your, quote unquote, big data, and that comes from, you know, good data quality, collecting it, collecting it in such a way that is consistent, making sure that the data that you’re collecting is clean, making sure the data that you’re collecting is known and organized and accessible.

And then, you know, you were talking about, Chris, how you might have photos and videos and clicks and likes. Those are all different kinds of metrics that you can’t just shove together and say this is the analysis; there needs to be some normalization. Maybe the date stamp from Twitter comes in differently from the date stamp from your CRM. I was doing a very basic analysis last week of Google Trends and Talkwalker data.

And that was the first thing I had to do: see whether my dates even matched up, because of the date stamps, one came in as a string and one came in as, you know, the day, month, year.

And it’s those small details; I call them small details, but they’re actually very big details.

But that’s the part of the planning that people tend to skip over, and they just go to the big data analytics, the, you know, tell me what I have. Well, I don’t know; you didn’t even give me the keys to your front door.
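The date-matching step described above can be sketched with Python’s standard `datetime` module. The formats in `KNOWN_FORMATS` and the sample rows are assumptions for illustration only; the actual Google Trends and Talkwalker exports may use different formats, so inspect a few real rows before relying on any list like this.

```python
from datetime import date, datetime

# Hypothetical date formats; real exports may differ, so check them first.
KNOWN_FORMATS = ("%Y-%m-%d", "%d %B %Y", "%m/%d/%Y")


def normalize_date(value):
    """Return a datetime.date for a date object or a string in a known format."""
    if isinstance(value, datetime):
        return value.date()
    if isinstance(value, date):
        return value
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date format: {value!r}")


# Two made-up sources whose date stamps arrive in different shapes.
trends_rows = [("2021-08-01", 72), ("2021-08-08", 65)]
listening_rows = [("1 August 2021", 340), ("8 August 2021", 295)]

trends = {normalize_date(d): v for d, v in trends_rows}
listening = {normalize_date(d): v for d, v in listening_rows}

# Only dates present in both sources can be compared side by side.
shared_dates = sorted(set(trends) & set(listening))
print(shared_dates)
```

One caution on the design: a format such as `%m/%d/%Y` is ambiguous with day-first dates, so the order of `KNOWN_FORMATS` encodes a decision about which interpretation wins. Raising an error on unknown formats, rather than guessing, keeps bad dates from silently entering the analysis.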

Christopher Penn 26:20

I’ll leave you with this quote from IBM’s World of Watson back in 2016. I think it was one of the most insightful things, and it took the wind out of a lot of people’s sails, but it’s 100% true.

If you’re not good at small data, you won’t be any good at big data, right? So get your people and your processes in place to deal with small data, like normalizing dates, or just even knowing where your data is.

Once you get competent at small data, then you can graduate to big data, but not before.

Katie Robbert 26:49


That’s exactly true.

You can’t just throw a piece of technology on top of a problem and hope for the best.

I mean, you can, but it’s not going to go well; it’s going to be a big ol’ waste of money and time.

But getting the people and the processes straightened out beforehand is going to save you a lot of headache in the long run.

Christopher Penn 27:06

And a whole bunch of money.

If you’ve got comments or questions or things that you’ve thought about with big data, pop on over to our free Slack group. Go to trustinsights.ai/analyticsformarketers, where you and over 2,500 other marketers are asking and answering each other’s questions every single week.

And wherever it is you watch or listen to the show, if there’s a place you’d rather get it, go to trustinsights.ai/tipodcast, where you can find the show on pretty much every channel.

Thanks for tuning in.

And we’ll talk to you next week.
