In this episode of In-Ear Insights, you’ll enjoy the Data Science 101 session at Social Media Marketing World given by Chris. An off-the-books speaker meetup talk, this is the first time this content has been shown publicly, and due to A/V issues, the slides were hand-drawn and can be found below.
Enjoy the workshop!
What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode.
Christopher Penn 0:00
So this is Data Science 101, the lowest-tech presentation here.
What do I do today? This is a presentation that has not been given in public yet.
So I’m going to walk through it, and I would love for you to give feedback. Questions you have, you can either ask them along the way or ask them at the end; either one is fine.
And the goal is for you to understand what data science is, why it matters to us as marketers, and then the different pieces and what they mean.
And then afterwards, hopefully, it encourages you to start at least examining: is this a path that’s right for you? If it’s not, who should you be working with in order to make it work? So let’s get started.
Data science itself is fundamentally this: meaningful insights from data using the scientific method. Right? Marketing hasn’t been so good about using data. For those of you who were in my talk yesterday, you saw in the CMO Survey that we’re down now to 37% of organizations using data to make decisions.
According to the most recent CMO Survey, it’s 30-something percent; anyway, basically 70% of people are guessing and making shit up.
For decisions. You look shocked.
This is gonna be shocking.
It’s never gotten above 40% in the history of the survey, and so we’re bad at using data.
But the irony is, if you look in that same survey, when asked, CMOs say that of the things that deliver marketing impact, more than social, more than mobile, it is analytics.
So we know it works, but we’re not doing it.
Which I don’t understand; I don’t get people’s logic, except that they think it’s this mysterious dark art where, you know, you’re like sacrificing goats, and cauldrons and stuff.
It’s not; it’s just math.
But the key is the meaningful insights from data using the scientific method.
That’s what data science is, and that’s why we care about it. It is composed of four things.
It is business skills and business acumen, being able to understand the business and know what the business is about.
Because a lot of what happens in big companies is you have a data science team that are filled with hardcore scientists who don’t necessarily understand the company.
They know how to do the tests and stuff, but they don’t understand the logic behind it.
In this whole coronavirus thing, there have been some fascinating examples.
Johns Hopkins University has been publishing their code that models the virus and all its stuff, and it’s so fascinating because they write in the same programming language I write in, and I can download their code.
I can look at the math, I can see, like, okay, you’re using, you know, ARIMA, time differencing, you’re using seasonal ARIMA. Cool, I get that. But I don’t understand any of the other decisions because I’m not an epidemiologist.
I don’t know why they made some of the design choices they made.
So that first area is business and domain expertise.
You’ve got to have that.
The second area is the scientific skills: the ability to think scientifically and use the scientific method. The third area is the technical skills: data engineering, coding, etc.
And the fourth area is the mathematics: statistics, probability, linear algebra, more complex types of algebra, calculus. By the way, this should be something of a reassurance:
I failed statistics in college.
My final score in stats was 37 out of 100.
The only reason I passed that class is because they graded on a curve, because not one student got above 50% on the exam, because the teacher was a phenomenal mathematician.
Outstanding, world-class, published; couldn’t teach to save his life.
Right? So immediately on the first day of class he launches right in, and we had no idea what he was saying, so we all failed. But you do need those mathematical skills, and you can learn them.
I had to reteach myself statistics from the beginning because my college experience did not do such a good job for me.
Now, that’s a lot of stuff packed into one discipline.
It really is four jobs for the salary of one.
What data science is not: data science is not analytics.
Analytics is just the process of: can you figure out what happened, based on data and whatever tools you want? What happened? It literally comes from the Greek word analyein, which means to unlock or to loosen up.
And so can you get stuff out of your data? That’s analytics.
Data science is not reporting: data visualization, showcasing dashboards, all that stuff.
You may use those tools and methods in data science, but fundamentally, that is not your job.
Reporting is a separate discipline entirely.
Data science is not statistics alone.
Again, stats is a big part, 25% of the job, but it’s not the whole thing.
Data science is not engineering.
So one of the big things that you’ll run into later on is technical data engineering limits.
So you will need the help of the IT department in some cases, or, you know, cloud services, to deal with large volumes of data.
Once you get past a certain point, just the data alone becomes a challenge.
When you look at, for example, how natural language models work in machine learning and AI, some of these things are billions and billions of bytes of data that machines have to crunch.
And that’s a whole other engineering effort.
When you look at how LinkedIn works, LinkedIn has seven core servers around the planet with 16 petabytes of RAM; they’re trying to keep everything you see on LinkedIn stored live on those seven computers, constantly in sync.
The engineering behind that is astonishing, right? That’s not data science, but you may use data science methods to make that work.
And finally, data science is not machine learning and AI.
Machine learning and AI use data science very heavily.
But the fundamental output of data science is an answer to a question.
It is a theory that you can put into practice.
The fundamental output of machine learning and AI is software machines that have written software, and you’re putting that into production.
So they’re very different outcomes, but they use a lot of the same methodology.
It’s kind of like data science is like grilling, right? Then you have baking and boiling and poaching and all these other methods.
And there’s a lot of similarities, some of the same ingredients and things.
But fundamentally, you can’t really grill soup.
Right? You can try, it’s not gonna go well.
And so these other disciplines share a lot of commonalities with data science, but they’re not data science itself.
So why do we care about data science if it doesn’t make software and stuff like that? Well, first, faster decisions.
Right now we have two pathways to decision making in marketing.
One is struggling mightily to try and prove things rather unscientifically, and two is making shit up.
Neither of those creates the best possible outcomes.
We want faster decisions, particularly in an environment where we’re getting flooded by data all the time.
Second, we want to lower costs and risk.
I am astonished at the number of companies that will say, we are going to invest a million dollars in this ad campaign.
We ask them, okay, well, how do you know that’s the right audience? I don’t know.
I’m just gonna do the campaign.
So you could potentially waste a million dollars.
Well, how about testing? Testing costs money. Yeah, $25,000 worth of testing.
But that’s a lot of money.
Yeah, but you’re going to spend $1,025,000 to not waste a million dollars.
So the idea being, with data science, if you can show people that there’s a practical, repeatable methodology, maybe they’ll waste less money, and then hopefully we all get the money as a bonus.
Third is beating competitors.
So when you use data science methods to investigate, to prove or disprove hypotheses, you can discover new things that you didn’t know before that you can use as competitive advantage, things that competitors, especially those who don’t have data science capabilities, will never discover. I was doing a project in the hotel room the other day, looking at Medium posts.
I wanted to look at what makes a Medium article work really well.
So I pulled all this data, was experimenting with it, playing around, and I had a hypothesis.
Maybe, you know, the search value or referring domains drove traffic to Medium posts.
It turns out there are very few metrics that actually indicate what makes a Medium post successful, at least on the surface. I’ve got a lot more work to do on that front.
But if I discover what it is, then I have an advantage over everybody else on Medium trying to publish because I will know what works.
And I can go and test it and prove it and make a theory.
So if you do that in your industry, you know, if there’s a certain trend that makes people want to buy high-end audio equipment, what is that trend? Can you find it before somebody else does, take advantage of it, and run with it? If that’s the case, data science is a way to do that.
And the fourth is new opportunities. What are the ways that you can discover things that simply did not exist as options, whether it’s new product lines or new audience segments? The example I love to give here: how many of you are familiar with My Little Pony, the toy line? Okay, how many of you are familiar with bronies? Okay, so the few of you who’ve seen the Netflix special know exactly what I’m talking about.
For those who have never heard this term, bronies are a collection of men, typically 26 to 50 years old, who are, like, rabidly in love with My Little Pony culture, spend exorbitant amounts of money, do the whole cosplay thing, etc.
If you followed a traditional methodology, using demographics and whatever, you would set all your ad budget to the 8-to-13-year-old girl, right? And maybe their parents. And you’d totally miss this whole other audience that has much more disposable income and is willing to buy in large quantities.
That’s a new opportunity that, unless you were using the scientific method to explore and find new opportunities, you wouldn’t know was there.
You might discover it by accident and then get a Netflix special made out of it, yeah.
But it wouldn’t have been intentional.
So using data science to find those new opportunities.
So that’s essentially what data science is and why we care about it.
One of the first exercises that I think is so important, for when people say, how do I get started with it?
It’s to understand what data you have.
And the first place to do that is with KPIs, key performance indicators.
It’s the number that you get a bonus for or you get fired for.
Right? That’s a KPI.
One of the first exercises that is so important to do when you’re trying to get started with data science in marketing is to do a KPI map. Take your funnel, whatever it is, and ask, from the bottom of the funnel to the top, what number gets you fired if you don’t do your job? And then, who gets fired? So revenue goes to zero, right? Everybody gets fired.
But first of all, the CMO is the first one whose head is on the block, right? If closed deals go to zero, the VP of sales is getting canned.
Right? If leads go to zero, the VP of marketing is getting canned. If audience goes to zero, your brand manager’s getting canned.
You want to know this because when you start working in data science in marketing, you need to figure out whose data you’re working with, and then what is the outcome that they care about the most, the one that’s going to get them fired or promoted.
Getting data science project started in a company is
Unknown Speaker 11:09
Christopher Penn 11:12
Now, getting a data science project started in a company requires some buy-in from executives.
And the easiest way to get someone’s buy-in is to say, hey, I’m going to help you not get fired this year.
That’s the starting point.
So when you do that mapping, you can figure out: you’ve got this data, who’s going to be most impacted by it? And then how can we work together with them to get budget, to get interest, to get executive sponsorship, things like that.
The easiest way to build one of these KPI maps is to look at your org chart.
And then, in your org chart, for every single person, ask what number they can get fired for.
And if you find people who say, I don’t know what this number is, then guess what: you can probably remove that position from your company because they don’t do anything.
I actually have had conversations with a very large technology company, which will remain unnamed.
And somebody asked on a conference call, so, you know, what are you held accountable for? And the guy was like, I don’t know.
I show up to work.
I’m like, really?
Unknown Speaker 12:06
How does that work?
Christopher Penn 12:10
So if you do this exercise, you’ll be able to figure out whose data you’re working with.
And really zero in on what we can do to find what makes that data move, whether it’s closed deals or new leads, whatever the case may be. B2B, B2C, doesn’t matter.
So we talked about the business side; let’s talk about the science side of data science. The scientific method, for those of you who fell asleep in science class in high school, is the seven step process: question, define, predict, test, collect, analyze, refine, and observe.
So the hardest part of data science is that first question, what is it that we’re trying to solve? What interesting problem do you want to solve? In my case, the other night was medium.
I want to know what makes a medium post work.
For a lot of us in social media marketing, the question is: does anything we’re doing in social media have any impact? I think I heard that question more than any other at this conference: I’ve got all this data,
but I don’t know if I’m moving the needle.
So you have that first question. Again, that first question goes back to the business acumen, knowing what questions to ask.
Second step is to define the data you’re going to need.
So if you have a question about, is social media having any impact? What data do you need? To answer that question, you need your activity, you need the metrics around your accounts, you need maybe data about the posts and the content you’re creating, content types.
You need an outcome, one of the KPIs, to measure against, and you need to know what format it is in. Is it numbers? Is it words? What kind of information are you trying to gather? The third step is that prediction, the hypothesis.
This is the part that everyone screws up. A hypothesis is a provably true or false single condition.
So if I say, I’m not sure my social media has impact, but I think it might,
that is not a provably true or false statement.
That’s just vague talking to the air.
If I say, my activity on Facebook has boosted my lead generation by 5%, that is a statement I can prove true or false.
I can test that and prove, yes, it’s 5% or above, or no, it’s not.
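One way to treat the 5% statement as something provably true or false is a one-sided z-test on lead conversion rates. The sketch below is only an illustration, not the method used in the talk: the function name `lift_test`, the visit and lead counts, and the choice of a Wald-style test are all assumptions.

```python
from math import sqrt
from statistics import NormalDist


def lift_test(base_leads, base_visits, test_leads, test_visits, min_lift=0.05):
    """One-sided Wald z-test of H0: test rate <= (1 + min_lift) * base rate.

    All inputs are hypothetical counts: leads and visits in a baseline
    period, and in a period with the Facebook activity being evaluated.
    """
    p_base = base_leads / base_visits
    p_test = test_leads / test_visits
    k = 1.0 + min_lift
    # How far the observed test rate sits above the "baseline plus 5%" bar.
    diff = p_test - k * p_base
    # Standard error of that difference for two independent proportions.
    se = sqrt(p_test * (1 - p_test) / test_visits
              + k ** 2 * p_base * (1 - p_base) / base_visits)
    z = diff / se
    p_value = 1.0 - NormalDist().cdf(z)
    return {"observed_lift": p_test / p_base - 1.0, "z": z, "p_value": p_value}


# Made-up numbers: 400 leads from 20,000 visits before, 520 from 20,000 after.
result = lift_test(base_leads=400, base_visits=20000,
                   test_leads=520, test_visits=20000)
print(result)
```

A small p-value supports the claim that the lift is above 5%; a large one means the data cannot confirm it. A before/after comparison like this does not rule out other causes, which is why the talk later insists on continuing to observe.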
Now in the beginning stages, any data science project, there’s a temptation to want to try and answer all the questions at once.
But that hypothesis is so important, because if you’re not clear about testing one thing, unless you’re really good at advanced calculus, you can only answer one thing at a time.
We see this the most in A/B testing.
When people turn on Google Optimize or whatever on their site, and they start funneling traffic to their site,
they’re like, oh, we could test this, we could test this button color and this text here and change this picture here.
And you’re like, okay, cool.
You’re going to need like a million visitors a day to get a statistically relevant result.
And you still don’t have a clear hypothesis.
You’re just kind of guessing.
So you need a framework to figure out what exactly you’re going to try and test what is your hypothesis.
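To see why testing many page elements at once demands so much traffic, here is a rough sample-size sketch. It uses the standard normal-approximation formula for comparing two proportions; the 2% base conversion rate, the 10% relative lift, and the function names are assumptions chosen for illustration, and no correction for multiple comparisons is applied.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(base_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed per variant to detect a relative lift."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def total_visitors(n_elements, options_per_element, base_rate, relative_lift):
    """Traffic for a full-factorial test: every combination is its own variant."""
    variants = options_per_element ** n_elements
    return variants * visitors_per_variant(base_rate, relative_lift)


n_single = visitors_per_variant(0.02, 0.10)
print("one variant:", n_single)
print("button x text x picture, two options each:",
      total_visitors(3, 2, 0.02, 0.10))
```

Each extra element multiplies the number of combinations, so the traffic requirement grows quickly. Testing one clear hypothesis at a time keeps it manageable.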
We were talking before the session began about Bob Stone’s 1968 direct marketing framework: list, offer, creative. When you’re doing any kind of direct mail (it was postal mail back then):
Do you have the right audience? Do you have the right offer? Do you have the right creative? Now when you translate this to modern-day A/B testing on the web, a lot of people go right to the creative: change the button colors, change the font, change the language. Well, great.
But if your audience is 55 and up, and all your traffic is 25 and below, it doesn’t matter what the offer and the creative are, right? Because they’re not going to buy it, because they don’t want your product.
It’s the wrong audience.
So when you’re doing your prediction, if you’re struggling, it means you have probably incorrectly defined the problem.
You’ve got to take a step back and fix it.
The good news is, for a lot of data science projects, the test, collect, and analyze phases are handled by software.
Now, Google Optimize, Optimizely, Hotjar, all these different companies offer testing facilities with varying degrees of statistical relevance.
But they all break down if your hypothesis is not clear.
SurveyMonkey, same thing.
After you’ve got your results, you refine your hypothesis.
If you did that test and you proved that Facebook actually got a 10% lift on your leads, then you refine that to say, I believe that Facebook generates 10% of the phone leads. You refine your hypothesis.
If you need to, you can go back and start the process.
If it turns out your hypothesis was wrong, you start the process again and try and figure out where you went off the rails.
Once you’ve refined it and you’ve got a theory, which is a repeatable proof,
you observe it and see if it continues to hold true.
You may say, for right now, Facebook delivers a 10% lift in leads.
And then tomorrow Mark Zuckerberg pulls the rug out from underneath you again and your conversion rates go to zero, because he can, right, and you didn’t give him enough of your wallet.
And so we have to continue to observe to make sure that our theory continues to hold true.
That’s the science part.
This is the part that’s missing from almost all data science.
When you talk to people, when you think about hiring a data scientist or hiring a data science agency, this part goes missing. People rush right to the technology and ignore the science part of data science.
So as you talk to people, as you consider hiring agencies or in house staff, make sure they’re actually doing science.
The next section is the math.
We’re not going to do a lot of math here, A, because you’re not set up for it, and B, because we’re doing this on paper; this isn’t a college class.
One important lesson about math, and numbers in particular, in data science but also in analytics and reporting: you absolutely need to compare and contrast numbers.
Numbers by themselves don’t mean anything.
They have to be put in some form of context.
So if you come up with, you know, our lift on Facebook activity is 11%, great.
So what? Compared to last month it was 8%.
Okay, we got a change, or compared to Twitter, which is 4%.
Okay, now there’s a context that we can understand.
So even if you don’t do any of the rest of the math, just keeping in mind that comparison and contrast helps you make decisions with data is going to drastically change how you do your reporting.
There are four types of math that you should know to get started.
So there would be a slide for this, but I’m gonna have you imagine this instead. Don’t close your eyes, because you will fall asleep.
It’s the afternoon.
It’s like, imagine your Google Analytics, right? You’ll open up Google Analytics, you’ll see the little line chart and stuff like that.
What does that tell you? What does that show you? It’ll show you the last 30 days, but it doesn’t really tell you anything. You can’t extract a whole lot of meaning from it, except maybe it’s going kind of upwards or kind of downwards.
Now, if you were to take that information out of Google Analytics, put it in an Excel spreadsheet, and sort it by, you know, greatest number of users to least number of users, you get kind of like this little bar chart, right? And there’s a mathematical concept called a measure of centrality.
We call it things like average, or median, or mode, statistical terms.
The median is literally the value in the middle. The average is: take all the things, add them up, and divide by the number of things.
That’s the average.
A measure of centrality is important because when we have two of them, like the median and the mean, or average,
if there’s a big difference, it tells you something about the data.
So let’s say that your average is substantially greater than your median.
Good example: Bill Gates walks into this room, and the average income of the room jumps to $10 million.
And, as we all know, the median stays the same, or maybe goes a little bit up because one more person walked into the room.
But that tells you there was a big outlier that came in.
Now if your Google Analytics data looks like that, where suddenly there’s a big anomaly,
that may tell you that you had a campaign or something that went really, really well.
And so it behooves you to go and look at why.
What caused the average to skew so far away from the median? On the other hand, if it goes the other way, and the average falls below the median, that means that you actually are in a lot of trouble.
Right? Because it means your site is in decline; there are no things pulling that average up.
So your campaigns aren’t working, your email’s not working, your social media for sure isn’t working, because you’re not getting those unusual anomalies on the upside.
It’s all downside.
So just having those two simple measures gives you a sense of, okay, is my marketing helping me grow or not grow? Basic stats, Stats 101 for that.
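The mean-versus-median comparison described above can be sketched in a few lines. The daily user counts below are invented, with one spike day standing in for a campaign that went really well; `centrality_gap` is a hypothetical helper name, not something from the talk.

```python
from statistics import mean, median


def centrality_gap(values):
    """Return the mean, the median, and how far the mean sits from the median."""
    avg = mean(values)
    mid = median(values)
    # Positive ratio: unusually high days pull the mean up.
    # Negative ratio: unusually low days drag it down.
    return {"mean": avg, "median": mid, "gap_ratio": (avg - mid) / mid}


# Ten days of made-up daily users, including one outlier day.
daily_users = [410, 395, 402, 388, 420, 405, 399, 2950, 415, 392]
summary = centrality_gap(daily_users)
print(summary)
```

Here the single spike day drags the mean far above the median, which is the signal to go and look at what happened on that day.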
Now, if you were to take that same Google Analytics data and assign it to bins, like the top 25% of days had this many visitors, the middle 25% had this many, the other side had this number and this number, what you get looks like a bell curve, right?
And again, just like with averages, if the bell curve leans to one side or the other of the middle, that tells you whether your campaigns are doing better or worse. Basic distribution.
It’s called kurtosis, which sounds like bad breath, but it’s not. When you do that with Google Analytics data, it tells you there’s something going on there if it’s leaning in one direction.
Generally speaking, you want to be leaning more towards the right, because it means growth, you’re continuing to grow.
If you’re leaning more to the left, it means you’re starting to decline.
If you visualized that in a chart and just had it as part of a dashboard, you could walk into the office and with just one look at that curve go, how’s our marketing doing in the last 28 days? Are we leaning to the right or leaning to the left? If we’re leaning to the left, maybe we need to, you know, dust off a campaign or send an extra email or post more cat photos on Instagram, whatever the case is. Just by looking at the distribution, you have a very quick but sophisticated analysis of your marketing.
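A minimal sketch of the binning idea follows, with invented visitor counts and hypothetical function names. One naming note: the transcript says kurtosis, but the statistic that measures a distribution leaning to one side is usually called skewness, while kurtosis describes how heavy the tails are. The code computes skewness. Whether a given lean means growth or decline is the speaker's interpretation and is not something the number proves on its own.

```python
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: positive means a longer tail of high values."""
    m = mean(values)
    s = pstdev(values)  # assumes the values are not all identical
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


def bin_counts(values, n_bins=5):
    """Count how many values fall into each of n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1  # fall back to 1 if all values are equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / width), n_bins - 1)  # top edge joins last bin
        counts[idx] += 1
    return counts


# Made-up daily visitors: mostly ordinary days plus a few unusually big ones.
right_tail = [100, 105, 110, 115, 120, 125, 130, 200, 260, 400]
left_tail = [500 - v for v in right_tail]  # mirror image of the same data

print(bin_counts(right_tail), round(skewness(right_tail), 2))
print(bin_counts(left_tail), round(skewness(left_tail), 2))
```

On a dashboard, the bin counts would be the bars of the histogram and the sign of the skewness would be the one-glance summary of which way it leans.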
The third type of math that you’ll do a lot of is called regression, regression analysis, which is understanding how two variables relate to each other.
So imagine you have your Google Analytics data for 30 days.
And then you have, say, your Twitter analytics for the 30 days, and you’re looking at visits to your website from Google Analytics and you look at, I don’t know, retweets on Twitter.
If you plotted that out, you get this kind of scatter chart.
And if you try and find a line mathematically that creates a relationship between the two, you might discover yes, these two things are related.
There’s a correlation between the two.
If it goes up and to the right like that, that means that there’s a positive correlation: as Twitter retweets go up, maybe your Google Analytics traffic goes up, or vice versa.
If the line is flat, it means there’s no relationship. If the line goes this way,
it means that for every retweet you get you lose visitors, which is unlikely. But regression helps us understand the relationship of variables, and it is one of the most powerful techniques that you’ll use in data science because it can lead to interesting answers.
It can lead to interesting answers you didn’t expect, especially when you start putting things together.
I was down at the Agorapulse booth earlier today.
And I was exporting all of my Twitter and Facebook activity metrics for the last, I think it was, 90 days, and I had my Google Analytics organic searches.
And I put it into IBM Watson Studio.
And I said, tell me if any of my social media activities increase my number of searches. And it mixed and matched all these things, still doing regression, just a lot of it.
And it finally told me, hey, these three variables, in combination, seem to have a relationship with the outcome you care about. I was like, oh, now I know what to test.
Right? I have my question, I’ve defined my data, and using that technique, now I have something to test. Say, if I share more content on Twitter that is links,
and I consistently grow followers every day, I should see a commensurate increase in organic searches.
So now I can go test that and make sure that there’s a causation, right? So regression: super important.
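The two-variable case (retweets against site visits) can be sketched as an ordinary least-squares line plus a correlation coefficient. The numbers are invented and `linear_fit` is a hypothetical helper; this is not the multi-variable analysis Watson Studio ran in the talk, only the simplest version of the same idea.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Least-squares slope and intercept, plus Pearson correlation r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx               # change in y per one-unit change in x
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)       # -1 to 1; near 0 means no linear relation
    return slope, intercept, r


# Made-up daily data: retweets on Twitter and visits from Google Analytics.
retweets = [2, 5, 1, 8, 4, 7, 3, 9]
visits = [118, 185, 102, 236, 165, 215, 141, 262]

slope, intercept, r = linear_fit(retweets, visits)
print(f"visits ~ {intercept:.1f} + {slope:.1f} * retweets (r = {r:.3f})")
```

A line going up and to the right shows as a positive slope and an r near 1. As the talk stresses, this only shows correlation; confirming causation still takes a test.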
The fourth technique that you’ll do a lot of especially when it comes to numeric data is called clustering, what things go together.
So again, if you take your Google Analytics and your Twitter data, are there clusters, clumps of data that move together? My friend Tom Webster calls this lumpy data. It’s really useful to know if your data is lumpy or not, because it tells you some things go together a lot.
So maybe a certain type of tweet clusters together, or maybe a certain type of Instagram post generates a certain type of points.
If you cluster, you can identify those and go, there’s a relationship there, we need to investigate that further.
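As a minimal illustration of finding lumps, here is a one-dimensional k-means sketch over invented per-post engagement numbers. K-means is only one of several clustering techniques and the talk does not name a specific one, so treat the algorithm choice, the function name `kmeans_1d`, and the data as assumptions.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Group numbers into k clusters by repeatedly moving cluster centers.

    Requires k >= 2. Starting centers are spread evenly across the sorted
    data so the result is deterministic.
    """
    ordered = sorted(values)
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assign every value to its nearest center.
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if group is empty).
        new_centers = [sum(g) / len(g) if g else centers[i]
                       for i, g in enumerate(groups)]
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return centers, groups


# Made-up likes per post: most posts are ordinary, a few do far better.
likes = [12, 15, 11, 14, 95, 102, 13, 98, 16, 105]
centers, groups = kmeans_1d(likes, k=2)
print(centers)
print(groups)
```

The posts that land in the high group are the ones worth inspecting for a shared trait, such as format or topic, which then becomes a hypothesis to test.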
All these beginning techniques, again, help you with that prediction.
So try to learn them.
The good news is you don’t need much more than an Excel spreadsheet to do any of this.
This is still relatively straightforward math. You can use more advanced tools, but for the most part, all the beginning-level math can be done in a spreadsheet; that’s kind of what they do.
Fourth area of data science is the technology.
This is the part that kind of drives me up a wall about data science today.
There are a lot of people who go through this, like, six-week crash course in data science, you know, get your nanodegree certificate or something.
And I’m like, you know, this is a profession.
I don’t know that I’d go to a doctor who took, you know, a six-week crash course in heart surgery; I think I’d go to someone who went to medical school.
But the technology is important because it makes a lot of this work go faster.
Everything we’ve talked about so far, you can do by hand. You will claw your eyes out, but you can do it by hand.
The technology helps you make it go faster.
One of the things that happens a lot, especially in social media and even public relations, is we have processes that are repetitive, but they don’t scale well because you have to use humans.
I used to work in an agency.
They had these folks called account coordinators, the junior-most role in the agency.
Some of them were literally copying and pasting data from one spreadsheet to another 40 hours a week.
It was repetitive.
It was repeatable, but it didn’t scale.
So could we use technology to speed that up? The answer is yes.
The technologies that you need to know are based on your level of skill.
What level of skill do you have with technology? If you’re just getting started, learning the ins and outs of Google Analytics and learning the ins and outs of a spreadsheet will actually get you fairly far in data analysis and in data science, because you can conduct experiments with that.
You can’t build, like, big models and things, but you may not have to just to answer some important questions, right? Learning how to use Google Data Studio or other visualization tools: super important.
If you spent a year just digging into that and finding out every little button and feature you’d be more capable than most marketers, just by knowing those tools really well.
At the intermediate level, you now start... Yes?
Google Data Studio. At the intermediate level, you’re now starting to talk about getting into more advanced computation and visualization.
So there’s a package called Tableau, which was acquired by Salesforce, I think, about a year ago.
It is reassuringly expensive.
But it is one of the best visualization tools for, you know, doing complex things like these regression and clustering, you can do a lot of that within Tableau natively.
And it can take in things like your existing Excel spreadsheets, and just make the processes go faster.
If you want to start getting into the computation using more advanced statistics, IBM Watson Studio is a fantastic piece of software that has both code and no code environments, where you can drag and drop math formulas together to do more complex equations and ultimately be able to answer more sophisticated questions.
That’s sort of an intermediate level.
And you could spend, again, another year learning those tools, spending a couple of hours a day just playing around with them to see what they do.
And then at the advanced level, you start getting into programming languages.
You will reach a point in data science where the tools simply cannot do what you want them to do.
You have to build the tool. You’re at that point where, like in cooking, a chef would need to go to an iron forge just to create a specially shaped frying pan for just one task.
What would be the equivalent of that? The two languages that are most commonly used are R, which is a statistical language, and Python.
Then there’s SQL and JSON: SQL for databases, JSON for NoSQL databases.
R and Python will basically let you do pretty much everything else.
Everything that you read in data science and in AI is in one of those two languages.
The example from before about Johns Hopkins releasing their coronavirus data processes: all their stuff is written in R, so if you want to be able to work with that, you download it and you can run it. The good news is these things are free.
Right? These things are not so free.
Right? Watson Studio is free for the first 50 hours.
Tableau is not free ever.
Actually, that’s not true.
Tableau Public is free.
Google Sheets, Google Analytics, that is free. But these languages are free, and there are great classes and universities that will teach you how to work in those languages, also for free.
The one that I would recommend you start with, because it’s completely free but it is reputable, is called cognitiveclass.ai, from IBM. Full disclosure, they’re my company’s business partner.
But there’s whole courses you can take and get certificates to start building capabilities in these languages.
And some of this other stuff too; they actually have a data science track that is very well done.
It’s a little outdated in some parts of the code, but that’s okay to start.
Again, be very careful when you’re hiring or evaluating an agency partner or an employee. They need to have this and this and this and this, right, which is kind of a unicorn; it can be kind of tough.
But if they don’t have those, if they’re only strong in one of these areas, then you’re going to find a lot of gaps very, very quickly.
The last thing about data science is what’s called the soft skills. This is the part that nobody talks about.
I don’t know why, except that maybe we don’t follow your soft skills.
There are seven of them to be an effective data scientist.
First you have to be open.
You have to be able to collaborate and play well with others.
A lot of folks in marketing, particularly social media marketing, have a very self-centric view of themselves, right? Just hop on Instagram, and you can see: hey, buy my book, and then, you bought my book? Oh, by the way, buy my book. You know, that’s like their entire feed.
That’s fine for marketing.
That’s not so fine for data science.
If you look at what’s happening, again, in the coronavirus issue, we have made such strides, and it is so inspiring to see scientists saying, you know, here’s all our code, here’s all our data.
Everyone in every nation is working together, combining their code and their data to try and find a solution to this thing.
All their code’s open source.
That’s the essence of what really good science is: collaboration.
So that skill is super important.
Egos can be absolutely devastating to your efforts.
Second, you have to be resilient as a person. You have to be okay with getting punched in the face a lot.
Part of science is failure, trial and error, and if you’re not comfortable failing, and more importantly, if your company is not comfortable failing, that can be a massive, massive problem for your culture.
If they’re so risk-averse that they refuse to accept failure, data science is not going to go well there.
Right? Update your LinkedIn profile.
Third, you have to be curious.
Curiosity is probably the second most important attribute of a data scientist.
And again, in marketing, what tends to happen is we get so wrapped up in the day-to-day, like, I’ve just got to get this important thing done, I’ve just got to get this off my to-do list, that you don’t allow yourself to occasionally fall down that rabbit hole, like, what if I do this, or what if I do that? Well, maybe there’s a way to answer this.
Maybe I’m missing some data over here.
Having that curiosity, that thing that makes you really want to, you know, work after hours on it just to get the answer, is an important part of the personality you need as a data scientist.
The fourth is you have to be patient.
Because once you start working with larger and larger data sets, it takes a long time to do some of the stuff.
I was processing those Medium posts, I think I had about 100,000 of them, trying to do word analysis on them.
And it looks like it’s probably going to take about 75 minutes just for the first run through the code.
And then I find out, by the way, it’s going to be wrong, and I have to do it again for another 75 minutes.
So you have to be really patient with stuff because it will go wrong a lot.
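One way to soften that wait, sketched here under assumptions (the posts below are made-up stand-ins, not the Medium data from the talk, and the `count_words` helper is hypothetical), is to run the word analysis on a small sample first, check that the output looks sane, and use the timing to roughly project how long the full run will take:

```python
import re
import time
from collections import Counter


def count_words(posts):
    """Count lowercase word frequencies across a list of text posts."""
    counts = Counter()
    for post in posts:
        counts.update(re.findall(r"[a-z']+", post.lower()))
    return counts


# Hypothetical stand-in for the real corpus (made-up text).
posts = ["Data science is fun", "Marketing data is messy data"] * 500

# Time a small sample before committing to the full run.
sample = posts[:100]
start = time.perf_counter()
sample_counts = count_words(sample)
elapsed = time.perf_counter() - start

# Rough linear projection of the full run time from the sample timing.
estimated_full_seconds = elapsed * len(posts) / len(sample)
top_word, top_count = sample_counts.most_common(1)[0]

print(top_word, top_count)
print(f"Estimated full run: {estimated_full_seconds:.4f} seconds")
```

The projection assumes run time grows linearly with the number of posts, which will not hold for every kind of analysis.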
Which goes nicely with being persistent.
Again, there’s this habit we have, more than ever now in modern society, to just accept the first answer that comes along.
Yep, that’s the answer.
Okay, let’s move on to the next thing.
But is that the answer? We don’t know.
Maybe it is.
Maybe it isn’t.
And so being persistent, constantly questioning, is this really the answer, is a part of that personality type you’re looking for.
The sixth is being humble; again, being able to work together.
As a data scientist, the data is more important than the scientist.
Right? The work is more important than the person who’s doing it.
A lot of your work will not be used by you.
The result is handed off to, you know, someone else in the company, and they get the credit.
They get the gold star.
You have to be okay with that.
You have to be okay with, yep, our analysis was good and helped the company make a million extra dollars, and, you know, the salesperson gets the gold star. You’ve got to be okay with that.
And that one’s personally a struggle of mine, because I can be a bit of a narcissist.
But the number one thing about the science is you have to be passionate about it; you have to love doing the stuff.
For those who follow me on Facebook,
I do projects on Saturday nights for fun, just to see, like, hey, here’s a cool data set,
I’m going to monkey with it.
There was a thing published on all the different alcohols in the world and the various alcohol concentrations by drink type.
And I wanted to see if there’s a pattern in the data. So, you know, other people are actually drinking on Saturday nights; I’m just analyzing the data.
But that sort of drive to want answers to how the world works, and why you see the things that you see in the world and the news, is an important part of data science.
And so those soft skills are arguably almost as important as the hard skills when you’re hiring, or when you’re getting started yourself.
When you evaluate an agency partner, if you’re talking to somebody and they’re the opposite of humble, and they won’t share their results, and they’re impatient, they’re going to be a terrible fit, very difficult to work with, right? It’s equally true in your company.
If your boss is none of these things, then you’re not going to have a good experience trying to explore what it means to be a data scientist.
You have to do it on your own time and things like that.
You will probably not find a data scientist.
And it will take you a while to become one.
Your average journey is probably four to six years, if you do it well and thoroughly in all these areas.
Right? The coding part you can knock down probably in about a year to become competent; not fluent, not expert, but minimally competent.
The same is true for the math, you can probably conquer that in about a year.
But then everything else is part of experience.
One of the things that sets apart an expert from just someone who got a crash course certificate is the expert will know what’s going to go wrong.
When you look at the data, the expert will say, this is what’s going to go wrong.
You know, from experience, from painful experience, here are the five ways this project’s going to fail.
Right? That’s what you’re looking for when you’re asking, does this person know what they’re doing? So ask them questions like, okay, here’s the data set, tell me the ways this could be biased.
If it’s like, oh, no, it’s fine, that’s a warning sign.
Almost every data set can be biased.
So just questions like that.
When you’re interviewing someone, or when you’re being interviewed, be ready for those things, to say, like, how will this go sideways on you?
If you’re working in a larger company, there’s a good chance that you may not need or even have a lot of data scientists. You may want individuals in each of these areas, and then you as the project manager would sort of coordinate them together.
But I would encourage you, if you are interested in this stuff, and you have these personality traits, to start digging into the stuff. Learn the math first, then learn the code.
Because the code will make a lot more sense with the math.
And if you learn to code and you’re like, well, but why am I doing these things, I don’t understand the context of what I’m doing, the math will give you the context for the code part.
And again, play with it as much as you like.
Start easy and then work your way into the more complex problems.
Find a problem that you’re passionate about too. Like, if there’s something that you really care about,
there’s a data set out there somewhere that you can play with to start testing your skills over and over again.
We’re looking at one on hate crimes, reporting of hate crimes against LGBTQ communities.
And there’s a lot of missing data, there’s a ton of it.
Because even in the United States, different jurisdictions have different definitions of what constitutes a crime.
So being able to say, huh, how is this biased? How can we accommodate that? Could we model for it? It’s a project of passion like that: I want an answer to this, I want to know just how bad this problem is.
So find the data set for that problem you care deeply about. It’ll be a lot easier to learn all this stuff.
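A first pass at the missing data question above can be as simple as counting how much is missing and where. The sketch below is not from the talk: the jurisdictions, years, and counts are entirely made up, and an empty cell is assumed to mean that nothing was reported.

```python
import csv
import io
from collections import defaultdict

# Hypothetical, made-up records; an empty count means "not reported".
raw = """jurisdiction,year,reported_incidents
A,2018,12
A,2019,
B,2018,
B,2019,
C,2018,7
C,2019,9
"""

missing = defaultdict(int)
total = defaultdict(int)
for row in csv.DictReader(io.StringIO(raw)):
    total[row["jurisdiction"]] += 1
    if row["reported_incidents"].strip() == "":
        missing[row["jurisdiction"]] += 1

# Share of rows with no reported value, per jurisdiction.
missing_share = {j: missing[j] / total[j] for j in total}
print(missing_share)
```

If the missing share differs a lot between jurisdictions, as it does in this toy data, any total computed across them is likely to be biased toward the places that report.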
Any questions on all this stuff? If you suck at math, relearn it.
And I’m not joking about that.
Math is a language, right? It is a literal language.
And if, the first time around, you had a bad teacher, you have to find a different way of learning. That’s one of the things that I’m astonished about now, with tools like Khan Academy and all these things online.
There’s a concept in neuro-linguistic programming called throwing mattresses, right? Imagine your brain is a door.
There are only certain ways a mattress will get through that, right? Some ways, it’s not going to get through the doorway.
Every teacher is throwing mattresses in a slightly different way.
You’ve got to find the teacher who’s throwing the mattresses in the way that will get through your doorway.
So it’s not that you suck at math or you’re bad at math or that, you know, math failed you; it is that you didn’t learn from a teacher who was compatible with you. Find one. Question in the back?
Unknown Speaker 38:36
Christopher Penn 38:43
they’re good starters.
Especially if you
Unknown Speaker 38:47
Unknown Speaker 38:49
I mean nothing about
Christopher Penn 38:50
I think programs like that are great if you know in your heart that you need someone to nudge you along, that you don’t have the internal drive to take a self-paced course, right? If you just know that about yourself, I think those programs are great because they will do that for you.
There’ll be a teacher to kind of, you know, nudge you or kick you in the butt or whatever.
But if you know that you have the self-discipline to go through a self-training course, I would go with one of the open courses first.
And then if you find out, yep, that wasn’t enough, or I’ve still got a lot of gaps, then you could use, you know, one of the more advanced courses to supplement your knowledge.
But yeah, like the IBM courses, I think, would be great starting points.
But again, if you need to have that person supervising you just to keep you motivated,
do that, because that’s less about the course quality and more about how you learn.
Unknown Speaker 39:51
of these things separately, told you
Unknown Speaker 39:53
that they can work together to be a certain manner.
So like what types of job titles would you get like
Unknown Speaker 40:00
Just data scientist? Or are there some other, like, roundabout ways to utilize the skills listed? Like, I can apply for data scientist jobs, but I don’t actually have anything in my background that would
Unknown Speaker 40:13
drive all I need to be a data scientist.
But I have a lot of the pieces that you’ve listed.
Christopher Penn 40:18
I would actually go through one of these courses first.
And it should be easy.
Just blaze right through it, and then go on to kaggle.com.
That’s Kaggle, K-A-G-G-L-E dot com.
There’s a lot of competitions you can enter.
They’re like, Hey, here’s a data set and there’s a prize and a bounty and stuff.
And you’ll build models and see how you do with real-world models. Like, Netflix put up a huge data set and they said, help us build the next recommendation engine.
J&J put up, help us do this product management. CFPB put up a financial data set: help us find fraud in this data set.
And if you do that, if you win the Kaggle competition, A, you get a large pile of money, and B, you get portfolio pieces that you can then use to say, okay, as someone who’s a junior data scientist, here’s the work that I’ve done. You can talk about the work that you’ve done because it’s real-world experience.
I also would recommend, you know, do an internship or volunteer at an organization. There are so many companies that have so much marketing data that they use none of, right? Find an opportunity and build a portfolio, like in any career. Some roles that will get you in will be, like, data analyst, business analyst, BI practitioner, things like that, on the way towards data science.
But ultimately, using the data science methodology is going to get you the best result for being able to prove what you do.
Unknown Speaker 41:40
So with the
Unknown Speaker 41:45
Unknown Speaker 41:48
when I said yesterday, and he said, that’s what it was.
That is that’s that’s what it needs that that’s what I would want to refresh on.
Christopher Penn 41:57
Yes, that’s right, Stats 101 is what you want. Thank you.
Unknown Speaker 42:02
I realized that
Unknown Speaker 42:05
generally speaking, you know, for marketing, regardless of your industry or whatever it is, is there kind of a common core set of things that you would start with?
Unknown Speaker 42:16
Regardless of the KPI, your innovation, you know, you’re just gonna start, like, I have so much data, I wouldn’t even know which questions to start asking.
Unknown Speaker 42:23
I wonder about a lot of Yeah.
Christopher Penn 42:27
I’d start with Google Analytics, because it’s the best middle-of-the-funnel measurement system there is. Answer, you know, what’s creating conversions, and then start digging around there.
If you know that you can answer a lot of other questions.
I think we’re getting kicked out.