What Predicts Podcast Episode Popularity?

One of the great benefits of having both partners and co-hosts of Marketing Over Coffee as Trust Insights team members is that we have access to 12 years of performance data from one of the longest-running, consistently published podcasts in the world. A dataset that long should allow us to predict podcast episode popularity. Now, we will not make the bold claim that extrapolating one podcast’s data will yield any kind of universal truth; that’s the epitome of terrible data science. However, looking at over a decade of a single podcast’s data does give us some interesting insights into podcasting itself, especially for a show that is predominantly B2B in nature (marketers talking to marketers).

First Assessment: Audience

Before we did anything else, we looked at the basics: what does audience growth look like over 12 years?

For those who are relatively new to podcasting, old-school podcasters call the years from 2005 to 2011 the “Golden Age of Podcasting”. This was a period when everyone thought podcasting would be the next big thing, even though the overall addressable audience was very, very small. Marketing Over Coffee began in 2007, in the middle of this first “summer of podcasting”.

Then came podcasting’s long winter, from 2011 to 2016. Not much happened in that half-decade. Podcasting conferences became new media and social media conferences, and the space quietly grew, but so slowly compared to the meteoric rise of social networks that only the die-hard podcasters stuck it out: the folks who enjoyed making shows for the creative outlet it gave them.

Podcasting’s winter thawed as several factors came together: smartphone ownership, streaming audio services in need of content that didn’t come with massive licensing fees (unlike music), and brands looking to extend their reach outside of social media. The poster child was the immensely popular Serial podcast, from the creators of This American Life.

Today, podcasting is enjoying an incredible summer, and we see this in Marketing Over Coffee’s growth (as well as industry-wide, as detailed in Edison Research’s Podcast Consumer reports).

Deep Dive: Driving Factors of Show Popularity

Beyond the amazing growth of podcasting as an industry and the overall Marketing Over Coffee audience, we wanted to understand what made episodes popular. Were there specific topics that consistently resonated with listeners? Did shows with guest interviews drive interest? Did show notes bring in audiences, or was the podcast’s growth purely word of mouth?

To answer these questions, we needed to bring out the heavy hitters, the data science toolkit. Let’s first understand what data is available. We took data from four sources:

Google Analytics, for sessions to web pages containing show notes for each episode
Libsyn, for downloads of actual MP3 files
Feedburner, for subscribers and item views in RSS feeds that didn’t necessarily yield website visits
Our own blend of SEO, social, language, topic, and clickstream data for each episode’s show notes page

The First Hurdle: Episode Differences

Immediately after bringing in the data, we noticed a problem: an MP3 file’s downloads in Libsyn didn’t necessarily have a corresponding blog entry that was easily detected and matched up in Google Analytics or Feedburner, and episode titles in the Libsyn feed didn’t necessarily match the titles of the show notes pages on the blog. To solve this, we had to use a text-matching technique called string distance, which estimates how similar two pieces of text are.

For example, “Now with more fun” and “Now with more fun!” are virtually identical; “Special interview with Simon Sinek” and “Special interview w/Simon Sinek” are closely related. String distance estimation helps us unify text that is closely related but not identical. Instead of relying on a subset or sample of the data, we were able to use almost every episode thanks to this matching method.
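To make the idea concrete, here is a minimal sketch of string-distance matching in Python. It uses Levenshtein (edit) distance, which is only one of several string distance measures; the article does not say which measure or software was actually used, so the function names and the 0.8 threshold below are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Scale edit distance to 0..1; 1.0 means identical after lowercasing."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def best_match(title, candidates, threshold=0.8):
    """Return the most similar candidate title, or None if nothing
    clears the (assumed, hand-picked) similarity threshold."""
    if not candidates:
        return None
    score, match = max((similarity(title, c), c) for c in candidates)
    return match if score >= threshold else None


# Titles from the examples above; the blog list is hypothetical.
blog_titles = ["Now with more fun!", "Special interview with Simon Sinek"]
print(best_match("Special interview w/Simon Sinek", blog_titles))
```

In practice the threshold needs tuning by hand: set it too low and unrelated episodes get merged, too high and near-identical titles are treated as different episodes.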

The Second Hurdle: Correlates

When using machine learning software to develop variable importance or predictor estimates (how important any given variable is to a target outcome), strongly correlated variables tend to screw things up. For example, in measuring the popularity of any given episode, Feedburner’s download numbers were very highly correlated with Libsyn’s download numbers without being predictive: they were essentially a second measurement of the same thing. We had to go through the data and knock out correlates that would mess up the model without offering any kind of value.
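A rough sketch of that kind of filtering, in plain Python, might look like the following. The numbers are made up for illustration and are not Marketing Over Coffee data; the column names, the 0.95 cutoff, and the greedy keep-the-first strategy are all assumptions, not a description of the method used in the analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlates(columns, threshold=0.95):
    """Walk the columns in order and keep each one only if it is not a
    near-duplicate (|r| >= threshold) of a column that was already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold
               for other in kept.values()):
            kept[name] = values
    return kept


# Invented per-episode numbers, listed with the target outcome first so
# that near-copies of the target are removed as well.
columns = {
    "libsyn_downloads":     [1000, 1200, 900, 1500, 1100],
    "feedburner_downloads": [980, 1210, 910, 1480, 1090],
    "social_clicks":        [40, 10, 55, 30, 20],
}

kept = drop_correlates(columns)
print(list(kept))  # feedburner_downloads is dropped as a near-duplicate
```

The cutoff is deliberately high: a genuinely useful predictor should correlate with the outcome, so only columns that are effectively the same measurement get removed.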

The Third Hurdle: Validation

Building machine learning models isn’t a matter of waving a magic wand or waiting for the AI to do everything. Machines are only as good as the data and software they have to work with. In this case, we had to build and test five different machine learning models to determine which offered the best results, as judged by error rates and measures of goodness of fit on the dataset. For those with a statistical background, we specifically focused on R² and RMSE as the dual measures of model validity.
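For readers who want to see what those two measures compute, here is a small sketch using the standard definitions of RMSE and R². The actual and predicted values are invented, and the two "models" are hypothetical stand-ins rather than any of the five models tested.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error: typical size of a prediction miss,
    in the same units as the target (lower is better)."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Share of the variance in the target that the predictions explain
    (1.0 is perfect, 0.0 is no better than always guessing the mean)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Invented downloads per episode and two hypothetical models' predictions.
actual = [1000, 1200, 900, 1500, 1100]
predictions = {
    "model_a": [1050, 1150, 950, 1400, 1100],
    "model_b": [1140, 1140, 1140, 1140, 1140],  # always predicts the mean
}

for name, predicted in predictions.items():
    print(name, round(rmse(actual, predicted), 1),
          round(r_squared(actual, predicted), 3))

# Pick the model with the lowest error.
best = min(predictions, key=lambda name: rmse(actual, predictions[name]))
print("best:", best)
```

Looking at both numbers together is useful because RMSE tells you how far off predictions are in real units, while R² tells you how much better the model is than a naive baseline. For a fair comparison, both should be computed on episodes the model did not see during training.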

The Findings

Once we found a winning model, we looked at the results. What did we find?

Unsurprisingly, the number of subscribers to the actual RSS feed/podcast was by far the strongest predictor of episode popularity. Having a loyal audience with a passive mechanism for delivery will do more to keep your podcast consistently performing well than anything else. Episodes that received the most clicks from socially-shared sources also did well in terms of overall listens, followed by episodes that received clicks within RSS/podcasting applications. Another strong predictor was the number of website visitors/users to an episode’s show notes. Of all the topics the show covers, social media related topics tended to give show popularity a minor boost, followed by AI.

Equally interesting, we didn’t find some things we expected. Whether an episode featured guests or just the regular hosts had no predictive importance for how well it would do, contrary to what we expected since MOC has had many notable guests over the years. Neither did many other topics, nor did individual social media shares themselves. Also interesting was that SEO data (things like linking domains) offered only weak predictive value for a show’s popularity.

Takeaways

It would, for the most part, be a capital mistake to blindly apply these findings to your podcast without doing an analysis of your own; however, some of the general findings should be applicable to most podcasters and their shows:

Build a loyal subscriber base. Instead of relying on social media, ads, etc. to drive listens, get people to opt into delivery of your podcast to them with as little effort as possible.
Create content that gets attention in the feeds and apps. John is well known for his catchy, humorous, and punchy episode titles, and MOC listeners reward his efforts by clicking/tapping through on them.
Carefully research the topics your audience wants to hear about.
Experiment with guest hosts, guest interviews, etc., but chances are, if your podcast is built around your personal brand, it’s you that your listeners want to hear from most.

Above all else, if you truly love podcasting, if you love the art and creative outlet of it, stick with it. Who knows how big your show will be in the next dozen years?
