So What? AI Podcast Editing: Level Up Your Workflow

So What? Marketing Analytics and Insights Live airs every Thursday at 1 pm EST.

You can watch on YouTube Live. Be sure to subscribe and follow so you never miss an episode!

In this episode of So What?, the Trust Insights weekly livestream, you’ll learn how to use AI podcast editing techniques to enhance your workflow and take your podcast to the next level. Discover expert insights and learn about the latest AI tools that will streamline your editing process. You’ll also learn how to leverage AI for podcast growth, marketing, and even creating show notes. Don’t miss out on these valuable tips to save time and improve the quality of your podcast with AI.

Watch the video here:

Can’t see anything? Watch it on YouTube here.

In this episode you’ll learn: 

  • Advanced AI editing techniques: Noise reduction, mastering, and even generating sound effects with AI.
  • Streamlining editing: Discover tools that make editing and post-production easier.
  • Expert insights: We’re sharing our own tips and tricks for incorporating AI into our podcasting workflow.

Have a question or topic you’d like to see us cover? Reach out here:


Please note the following transcript is AI-generated and may not be entirely accurate:

Katie Robbert 0:37
Well, hey everybody, happy Thursday. Welcome to So What?, the Marketing Analytics and Insights live show. I am Katie, joined by Chris and John. How are you fellas?

John Wall 0:50
Oh wait, wait. You caught me unaware!

Katie Robbert 0:53
Ugh, John! Happens every week! Hahaha. This week and over the next few weeks, we’ll be talking about AI and podcasting. This week, we’re starting with AI podcast editing: leveling up your workflow. We’ll be talking about advanced AI techniques, streamlining editing, and expert insights. Podcasting is obviously not a new kind of content creation. I have the world’s oldest podcasters with me, Chris and John, who have been doing Marketing over Coffee for how many years now?

John Wall 0:54
237 years.

Katie Robbert 0:56
237 or 17 years, which is really impressive. When the pandemic hit, everybody transitioned to video and podcasts, and there was a big boom where everybody had a podcast. According to a lot of people, it’s one of the, quote unquote (and I’m putting this in big fat quotes), “easiest” types of content to start. But as I know from both of you, there’s a lot more that goes into creating a podcast beyond having really good content ideas, hosts, and guests. You also have to edit it, produce it, and then disseminate it. So, as the two reigning podcast experts in the field, the undisputed champions, where would you like to start today?

Christopher Penn 2:25
Let’s start with appropriate uses of AI. I think that’s a really great place to start, because it’s something we obviously care a lot about.

Specifically, I want to start with the Trust Insights TRIPS framework. If you’re unfamiliar with it, if you go to, you can get a free copy of this little diagram here. TRIPS stands for time, repetition, importance, pleasantness, and sufficient data. When it comes to evaluating which tasks to hand off to AI, this is a good framework.

So, briefly. One, time: how much time does any given task consume? The more time it consumes, the better a candidate it is for AI.

Two, repetition: how repetitive is the task? The more repetitive it is, the better a candidate it is for AI.

Three, importance: how important is the task? This one is reversed. The less important it is, the better a candidate it is for AI, because you need less human review. You absolutely should not hand the entire show off to AI; that’s going to go badly for you. But something like audio leveling is clearly not super important, as long as your microphone is good.

Four, pleasantness: how much do you enjoy the task? There are a lot of parts to podcast production. Which parts do you enjoy doing? Don’t hand those off to AI! Part of the reason you do a podcast is because you enjoy it.

Five, sufficient data: how many existing examples of the task do you have? The more examples you have, the better a candidate it is for AI.
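To make the TRIPS criteria concrete, here is a minimal Python sketch that ranks podcast tasks by how well they suit AI. The 1 to 5 ratings, the equal weighting, and the example tasks and scores are illustrative assumptions, not part of the Trust Insights framework; the only rules taken from the discussion are that time, repetition, and sufficient data favor AI, while importance and pleasantness count against it.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A production task rated 1 (low) to 5 (high) on each TRIPS criterion.

    The 1-5 scale is an illustrative assumption, not part of the framework.
    """
    name: str
    time: int          # how much time the task consumes
    repetition: int    # how repetitive the task is
    importance: int    # how important it is (higher = needs more human review)
    pleasantness: int  # how much you enjoy doing it
    data: int          # how many existing examples you have


def trips_score(task: Task) -> int:
    """Higher score = better candidate for AI.

    Time, repetition, and sufficient data count in favor of AI.
    Importance and pleasantness are inverted (6 - rating) because
    important or enjoyable tasks should stay with humans.
    """
    return (
        task.time
        + task.repetition
        + (6 - task.importance)
        + (6 - task.pleasantness)
        + task.data
    )


def rank_tasks(tasks: list[Task]) -> list[tuple[str, int]]:
    """Return (name, score) pairs, best AI candidates first."""
    scored = [(t.name, trips_score(t)) for t in tasks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical ratings, loosely based on tasks mentioned in this episode.
    tasks = [
        Task("Trim ums and ahhs", time=4, repetition=5, importance=2, pleasantness=1, data=5),
        Task("Audio leveling", time=2, repetition=5, importance=2, pleasantness=2, data=5),
        Task("Choose the story and questions", time=4, repetition=2, importance=5, pleasantness=5, data=3),
    ]
    for name, score in rank_tasks(tasks):
        print(f"{score:2d}  {name}")
```

An equal-weight sum keeps the sketch simple; in practice you might treat a very high importance rating as a veto rather than just a lower score, in line with the advice not to hand the whole show to AI.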

So John, with that in mind, think about everything from the moment you sit down to start a new episode to the moment you have the final file ready to upload. That’s the scope of today’s show: from start to finished product. What are the major tasks for producing a podcast?

John Wall 4:30
So are we going all the way back to before it’s recorded? Or are we starting with, okay, you’ve got your recording in hand?

Christopher Penn 4:37
Even before it’s recorded! I think that’s a good way to think this through.

John Wall 4:41
Okay. So it ends up that I usually budget 10 hours for 30 to 40 minutes of audio, because you’ve got the whole process of coming up with the topic and lining up the guest. There’s a lot of slack in that part of the plan. You and I have a standing weekly session, so there are no logistics that go into it. But then there are people like Malcolm Gladwell, where I can spend three and a half years trying to lock down when we’re going to talk, where you’re at, and all that kind of stuff. So there are logistics there. Then, for us, there’s a lot of prep: putting together the story, letting the guest pre-approve the questions, and doing a couple of rounds of back and forth before you even start recording. When you set up the recording session, you basically budget an hour for it, just in case somebody has tech problems.

Then, once you’ve got the file, it goes into post-production. That’s usually a good three hours for 30 minutes’ worth of audio.

The big part of it for me is that there are a lot of tools out there, but it still comes down to the fact that I have to listen to the whole thing and make sure the storyline holds up and the content is solid. So a lot of the audio tools and automated things don’t save me much time, because I still have to go through the whole thing. But yeah, that’s the thumbnail: 10 hours, and figure three of it is post-production. That gets the majority of the shows out the door.

Christopher Penn 6:12
Of those tasks, which ones do you like least?

John Wall 6:15
Oh, that’s a great question.

Trimming ums and ahhs and stuff. I always say I’m going to find a way to automate that, and I never do. That would be at the top of my list, because I know for a fact I can easily shave an hour off the editing time if I just get the obvious edits out. There’s going to be stuff where I’m like, okay, they didn’t answer this question right, so I’m going to take their second take of the answer. That’s a judgment call. But there’s a lot of stuff, like the point where the guy was coughing and choking or whatever, that can be taken out without worrying about it.

So yeah, cleaning up the ums and ahhs, the obvious stuff. And, I don’t know, I never thought about this; this is getting me to think more. Anything at the front end, anything prior to recording, I’m up for speeding up too. I should probably be doing a better job of having a system that schedules the appointments and picks the dates, because I have my HubSpot calendars there. It’s all there, but I’m still manually doing the “Okay, how’s Tuesday at three?” thing, which is just a waste of everybody’s time.

Christopher Penn 7:21
Got it, okay.

So let’s do a few examples. First, when it comes to a podcast episode, particularly if you want to set yourself apart and be distinct, you should have a sense of what’s already been done. Generative AI is really useful for this.

So I’m going to go ahead and share my screen. Today, let’s do something with the LinkedIn algorithm. We’ve talked about this in the past, but let’s do a new version.

Let’s call this “previous shows.” This could be your shows or other people’s shows. What you want is to not repeat yourself and not be just like everybody else. That doesn’t make for compelling listening or watching. So let’s say we want to do a topical survey of shows that have been done about the LinkedIn algorithm. The prompt says, “I’m going to provide you with a series of transcripts from other shows about this topic. Create a consolidated outline from this data.” These transcripts are right out of YouTube. Again, we want to not reinvent the wheel, and we don’t want to be like everybody else, because everybody else is not us.

Katie Robbert 8:54
Well, Chris, and maybe this is what you’re going to get into, but it sounds like before you even got to generative AI, you had to do a decent amount of research and data pulling just to get to this part.

Christopher Penn 9:09
I was lazy. I just typed “LinkedIn algorithm hacks” into YouTube search, opened the top 20 videos, and extracted all the closed captions from them.

Katie Robbert 9:17
Alright. So, laziness aside, because that’s you personally: it sounds like, in general, someone who’s taking this seriously, unlike you clearly, would actually do some real research first. They would know who their competitors are and what topics they cover, and they would find that information before bringing it in here. That’s actually a decent amount of time, especially if you’re not totally clear on exactly what you’re looking for.

Christopher Penn 9:49
So, particularly if you’re starting a new show or rebooting your current one: who’s your audience, and who do you want your audience to be? You can go back to a previous episode of So What? on how to build an ideal customer profile, and that process works just as well for understanding who your audience is. You can do that through things like your Google Analytics data, if you have it. You can do it through listener surveys, which you should be running on a regular basis; we send a Marketing over Coffee survey, I think, once a quarter. You can also pull it from your customer service inbox, from people emailing you, or from the contact form on your show.

All that to say, you absolutely should be doing that groundwork in advance to say, okay, here’s what the show is going to be about.

Katie Robbert 10:34
And if you want to get those past episodes of our live stream that Chris just mentioned, you can go to and find our So What? playlist.

Christopher Penn 10:46
Okay, so the analysis is done. Now I ask: what aspects of the LinkedIn algorithm have these existing shows missed or not talked about? Again, this is a useful tactic for figuring out what the blind spots are, the green spaces. And if you want to be super clever, like we did a couple of episodes ago on So What?, you can take things like LinkedIn’s own academic papers, blogs, etc., and say, okay, here’s what’s straight from the horse’s mouth. Let’s compare it to what the YouTube gurus have to say and see what the difference is.

I actually did this recently for a friend, and I was honestly shocked at how much the so-called YouTube experts have missed about LinkedIn. If you’re interested in that guide, it’s on the Trust Insights website: we have the unofficial (emphasis on unofficial) LinkedIn algorithm guide.

Katie Robbert 11:44
I ask about that, Chris, because what I didn’t see you do, which is something you do almost every time we bring up generative AI, is prime the model with “here’s what is going on with the LinkedIn algorithm.” You said that’s what to do if you want to get advanced, but in this example, is there a reason you didn’t do that this time?

Christopher Penn 12:10
There’s no reason I didn’t do it this time. I was being lazy.

Katie Robbert 12:14
Okay, so for the sake of the show, please stop being lazy. We’re trying to educate the people!

Christopher Penn 12:19
Yes. Here’s the thing about white papers and the like: if you want to make really intelligent use of them, instead of priming the model manually, you can take the white paper of your choice, drop it into the model, and say, “Here’s what you need to know about the LinkedIn algorithm.” The paper we prepared with all those talking points is essentially one big piece of priming data. That’s the way it was written. I actually did not write that paper for humans; I wrote it for machines. So that priming data goes in, and you can get up and running even faster.

So for the sake of this example, what we’re saying is, here’s the existing stuff. I asked the model what it knew about LinkedIn, and then asked, what did these examples miss? So now, if we were doing an episode on the LinkedIn algorithm for, say, Marketing over Coffee, we could look at this list of topics and go, “Hmm, there are a lot of things the YouTube gurus have not covered about LinkedIn.” John and I could do seven different episodes of our podcast just on this topic.

So if we think back to our TRIPS framework, we can save a lot of that time-consuming pre-show research by grabbing data that’s already available, feeding it into the generative AI model, and asking, “What are the gaps? What can we cover that people aren’t talking about?”
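The priming-then-gap-analysis workflow Chris describes (priming data first, then competitor transcripts, then the question about what they missed) can be sketched as plain prompt assembly. This minimal Python example only builds the text you would paste into a generative AI tool; it does not call any model API, and the helper name, section labels, and prompt wording are hypothetical.

```python
def build_gap_prompt(topic: str, priming_doc: str, transcripts: list[str]) -> str:
    """Assemble one prompt: priming knowledge, competitor transcripts, then the ask.

    Hypothetical helper; adapt the wording to whatever model or tool you use.
    """
    if not transcripts:
        raise ValueError("Provide at least one transcript to compare against.")
    parts = [
        f"Here is background knowledge about {topic}:",
        priming_doc.strip(),
        "",
        f"Here are transcripts from {len(transcripts)} existing shows about {topic}:",
    ]
    # Label each transcript so the model can say which show covered what.
    for i, text in enumerate(transcripts, start=1):
        parts.append(f"--- Transcript {i} ---")
        parts.append(text.strip())
    parts += [
        "",
        "First, create a consolidated outline of what these shows cover.",
        f"Then, using the background knowledge, list the aspects of {topic} "
        "that these shows missed or did not talk about.",
    ]
    return "\n".join(parts)


if __name__ == "__main__":
    prompt = build_gap_prompt(
        "the LinkedIn algorithm",
        "Placeholder text standing in for a white paper.",
        ["First video's captions...", "Second video's captions..."],
    )
    print(prompt)
```

Putting the priming document ahead of the transcripts mirrors the order described in the episode: tell the model what is true first, then show it what others have said.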

Okay, next, let’s talk about assets and things you might want to consider for your production process. If you’re just getting started, or maybe you only have some basic gear, you might not have the world’s best microphone, and you don’t need one to start. There are some basic and advanced tools for cleaning up your audio. I’m going to play a sample real quick here. This is right off my iPhone.

“Suppose I wanted to record a podcast but I didn’t have the world’s best microphone. In fact, all I had was my iPhone. What could…”

So that’s right off the iPhone. This tool, Adobe Podcast, like other tools such as Descript, essentially uses a generative model that rebuilds the sound based on what it thinks a studio recording sounds like. That exact same recording fed through this tool sounds like this.

“Now, suppose I wanted to record a podcast but I didn’t have the world’s best microphone, when in fact, all I had was my iPhone. What could I do to help improve…”

You can still tell it’s generated sound, because there are some little quirks in the voice. But if all you’ve got is your phone and you still want a decent-sounding podcast, this is one of the tools you could use to clean up your audio, especially if you’re, say, at a conference or an event. There’s John Wall on the conference floor, you pull out your iPhone, start a voice memo, and say, “John, John, I have a question about this,” and you get a response. But it comes out with all the background noise, so this is one of the tools you can use to clean that up.

Katie Robbert 15:28
Question. This makes sense for pre-recorded audio, but a lot of us do live shows, like this livestream. Are there tools that will clean up the audio live? It’s happened before: I forgot to plug in the correct microphone for the livestream, and we got halfway through the episode before realizing my audio sounded really bad compared to both of yours. Are there tools that will do this kind of enhancement live?

Christopher Penn 15:59
There are tools that can do it live, and they are reassuringly expensive and require a lot of extra hardware. They’re very similar to the Auto-Tune that musicians use, so your $20,000 rack of equipment will have a processor. However, almost every recording studio software (we’re currently using StreamYard) can record not only what’s on screen but also individual tracks. For example, when we’re done with this livestream, it’s going to produce video and audio tracks for you, me, and John. I can take any one of those, run it through a tool like this, and then fix any individual track later in editing software.

Katie Robbert 16:41
Sure, which is great for post-production, but for live, it sounds like the best bet is for me to set a reminder to plug in the correct microphone, versus paying $20,000 to fix 45 minutes a week of crappy audio.

Christopher Penn 16:56
That’s correct, that’s correct.

The other thing, and again, this is a post-production or offline production thing, not a live thing: sometimes you will have a week where you just cannot record, for one reason or another. Maybe you’re out sick. Maybe you’ve contracted bird flu, who knows. There are tools for that. ElevenLabs and HeyGen are two of the best-known tools in the space for generating from trained data: you give the tool your existing voice print or video print, and it generates audio or video from the text you provide. Let’s do a very quick sample.

“I, content detection is a losing battle. People and companies are racing to detect AI content from schools trying to see if a term paper has been written with ChatGPT to media sharing services like Instagram trying to determine if an image is AI created.”

So that’s ElevenLabs trained on two minutes of my voice. Again, you can tell it’s machine generated. There are enough oddities that if you know who I am as a person, you can tell it’s not quite right. But if you have to get the show up for one reason or another, other people have used these techniques reasonably successfully to at least create something.

Katie Robbert 18:28
I will throw my disagreement hat in the ring to say it’s not even that it’s oddities, that sounded nothing like you at all.

John Wall 18:38
Well, the thing I found interesting was, yeah, I think it was way off as Chris. But as a human voice, it is on the mark. If you were listening to some podcast where you didn’t know the guest, it does sound human. I didn’t hear anything that made me think, oh yeah, somebody obviously fed some text in there. So it gets points for that, but I think we’re not going to replace Chris with ElevenLabs this week.

Christopher Penn 19:01
Yeah, exactly.

The other show asset you might want to look at is music. You’ve seen on our livestream that we have that intro music. I made that in GarageBand with little loops and stuff. Nowadays, that’s unnecessary, because you have tools for it. One example, Soundraw, lets you give it a prompt and it will generate music. There’s a whole list of sites. In fact, yesterday my son was complaining about running, so I made a song composed entirely of his complaints, which we won’t play here; we’ll put it in our Analytics for Marketers Slack group if you want to hear it. But if I wanted theme music for the show, here’s an example I just generated.

The prompt for that was upbeat progressive house music in a major key, and so on and so forth. For stuff like your waiting music before, say, your livestream starts, there’s nothing wrong with that. An additional advantage for folks who are loading shows onto places like YouTube: AI-generated music is very unlikely to trigger Content ID claims. Whereas I have had shows struck from YouTube because I used the Apple GarageBand loops, which, like, every other musician on the planet also uses. Those same loops are enough to trigger Content ID and force me to dispute with YouTube that I made that particular sequence. Yes, it sounds just like Bob the disco god over here. But the AI stuff is not getting flagged at all by Content ID algorithms.

Katie Robbert 20:53
It’s interesting, because a lot of what you’re talking about in this pre- and post-production isn’t only time savings in terms of research and putting the show together. With things like this, where you can generate AI, mediocre-ish music, at least you’re not paying a license fee or having to find an artist who can create something original for you. I don’t know what Soundraw costs, but I’m assuming that creating something in here costs a lot less than licensing music from an artist.

Christopher Penn 21:36
Yes. For example, there’s music at the end of my shows on YouTube that I paid a human musician to compose for me, and it cost me $500 out of pocket. Soundraw’s cost is substantially less than that: it’s like 20 bucks a month, and you get about 1,000 music generations a month.

The downside of Soundraw, and of all AI-created content, is that because it’s machine generated, you cannot copyright it. So if someone else likes that music, they can just lift it, and there’s nothing you can do about it. Whereas if there’s a human component, you can send a music industry lawyer to get in someone’s face, metaphorically.

Katie Robbert 22:24
Well, I’m also assuming that, depending on who you are, what your standards are, and what your show is about, it could also be fairly generic music. There is a lot of sameness in AI-generated things. So yes, somebody could take it because they like it, and it’s not copyrighted. And I know you put it together quickly, but if you ask me personally, the clip you put together is not great. It’s not something I would want associated with our company or our brand, because it’s just kind of meh. There’s nothing special about it. But depending on the context, it could be fantastic. It could be completely appropriate for what you need, and the cost is very low.

And Chris, how long did it take you to put together that music sample?

Christopher Penn 23:21
This one, just now? About 15 seconds; you can’t beat that. Well, plus the time to write the prompt. Now, I will say you do need some experience writing prompts for music specifically, because you only get 200 characters, so you have to know the trigger words for what you want to create. Or, if you go into custom mode, you can put in the lyrics you want, and it will compose around them, and then you put in the style. For example, at the S&P conference I was speaking at recently, I did a demo where I said, “Okay, we’re going to use Google’s Gemini to create song lyrics first, and then paste them into Soundraw,” and it did a really good job. Again, if you want to hear some of the Soundraw compositions, you can go to the Analytics for Marketers Slack group and listen to some of them, which is always fun. So that’s yet another resource for a podcaster, especially if you’re starting out, or maybe you don’t have intro or outro music, and you want something that is good enough.

With generative AI music in particular, you’ve got to do 10 or 15 variations, because inevitably, like 80% of them are going to suck. They will be musically okay, but not what you asked for. I wanted something that sounded like Holst’s The Planets, and instead I got Taylor Swift. You have to work on refining your vocabulary to get the one you want.
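Using only the figures quoted above (about $20 a month for about 1,000 generations, 10 to 15 attempts per usable track, and $500 for the commissioned piece), the rough per-track arithmetic works out as follows. Actual pricing may differ by plan and change over time.

```latex
\begin{aligned}
\text{cost per generation} &\approx \frac{\$20}{1000} = \$0.02 \\
\text{cost per usable track} &\approx (10 \text{ to } 15) \times \$0.02 = \$0.20 \text{ to } \$0.30 \\
\frac{\text{commissioned track}}{\text{usable AI track}} &\approx \frac{\$500}{\$0.30} \text{ to } \frac{\$500}{\$0.20} \approx 1{,}667 \text{ to } 2{,}500
\end{aligned}
```

This ignores the monthly minimum: if you only ever need one track, that track effectively costs the full $20, which is still far below $500. It also leaves out the copyright trade-off discussed above.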

But it’s yet another tool. Okay, so let’s talk about editing. Probably one of the most popular tools in the space right now is Descript. Another very popular one is Adobe Premiere. If you are making a podcast in 2024, you should be making it with video, period. The reason is that the number one discovery engine for podcasts is YouTube, so your content has to go on YouTube. That’s why So What? is on YouTube. That’s why In Ear Insights is on YouTube. That’s why the Almost Timely Newsletter is on YouTube: because that’s the discovery engine for podcasts.

Sounds Profitable has talked about this, Edison Research has talked about this. If you’re not making video, you’re missing out.

One of the things that’s nice about Descript is that you can load in your raw video clips and do a lot of the editing in here. For example, you can take out long spaces. You can edit a word or set of words, and it will attempt to generate audio to replace what was said. And for people who like to read off a prompter, it will programmatically try to adjust your eyeballs so that you’re always looking at the camera.

Unknown Speaker 26:05
Yikes! I don’t like how you said that! Leave my eyes alone, man!

Christopher Penn 26:20
Which, I mean, those things are nice. You can add chapters and chapter markers if you’re doing that sort of thing on YouTube, close up those word gaps, and create clips and highlight reels. So you can create a highlight reel from your show and then export it.
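Closing up word gaps and removing long spaces both start with finding stretches of near-silence. Here is a minimal Python sketch of that idea using a simple RMS energy threshold on raw mono samples. The window size, threshold, and minimum pause length are assumptions, and tools like Descript likely use more sophisticated detection than this.

```python
import math


def find_long_pauses(samples, sample_rate, window_ms=20, threshold=0.02, min_pause_s=0.75):
    """Return (start_s, end_s) spans where mono audio stays below an RMS threshold.

    samples: sequence of floats in [-1.0, 1.0].
    window_ms, threshold, and min_pause_s are illustrative defaults, not
    values taken from Descript or any other tool.
    """
    window = max(1, int(sample_rate * window_ms / 1000))
    pauses = []
    quiet_start = None  # sample index where the current quiet run began
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms < threshold:
            if quiet_start is None:
                quiet_start = i
        elif quiet_start is not None:
            if (i - quiet_start) / sample_rate >= min_pause_s:
                pauses.append((quiet_start / sample_rate, i / sample_rate))
            quiet_start = None
    # A quiet run may continue to the end of the recording.
    if quiet_start is not None and (len(samples) - quiet_start) / sample_rate >= min_pause_s:
        pauses.append((quiet_start / sample_rate, len(samples) / sample_rate))
    return pauses


if __name__ == "__main__":
    sr = 1000
    tone = [0.5 * math.sin(2 * math.pi * 50 * n / sr) for n in range(sr)]
    audio = tone + [0.0] * sr + tone  # 1 s of tone, 1 s of silence, 1 s of tone
    print(find_long_pauses(audio, sr))  # prints [(1.0, 2.0)]
```

The minimum pause length matters: natural breaths between phrases are short, so only runs longer than `min_pause_s` are reported as candidates to tighten.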

So let me find it; I have a highlight reel stored here somewhere. Let’s see. Can I download this just from the interface? No, I can’t. But it will pull out what it thinks the highlights of the video are. Now, it has not figured out that there are two of us on screen, so you get a lovely video of our shells.

Yeah, of neither of us.

Neither of us. Yeah. So its detection still needs some work there. But for a lot of people, this is a time saver, because the built-in AI editors let you process a show and turn it into a bunch of stuff relatively quickly.

If you don’t want to use Descript, I personally prefer Adobe Premiere. Again, you run the transcription in here, and then you can look for things like pauses and say, I’m going to delete those pauses and extract them. What are the filler words we used in that episode? Again, you can delete those, lift them out of the transcript and the video, and what you end up with is a show that is much more tightly edited. So, John, to what you were saying about chopping out the ums and ahhs, this will get rid of 90% of them fairly easily.
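Filler-word removal like this works from a transcript with word-level timestamps. As a rough illustration (not Premiere’s or Descript’s actual method), this Python sketch takes hypothetical (word, start, end) tuples, flags an assumed list of filler words, and merges back-to-back fillers into cut ranges an editor could apply. Words like “like” and “so” are deliberately excluded, since, as John noted, some cuts are judgment calls.

```python
import re

# Assumed filler list; tune it for your speakers. Ambiguous words such as
# "like" or "so" are left out on purpose, since cutting them is a judgment call.
FILLERS = {"um", "umm", "uh", "uhh", "ah", "ahh", "er", "hmm"}


def filler_cuts(words, gap_tolerance=0.05):
    """Return (start_s, end_s) cut ranges for filler words.

    words: list of (text, start_s, end_s) tuples from a word-level transcript.
    Consecutive fillers separated by at most gap_tolerance seconds become one
    cut; any non-filler word in between always starts a new cut.
    """
    cuts = []
    prev_was_filler = False
    for text, start, end in words:
        token = re.sub(r"[^a-z]", "", text.lower())  # drop case and punctuation
        is_filler = token in FILLERS
        if is_filler:
            if prev_was_filler and start - cuts[-1][1] <= gap_tolerance:
                cuts[-1] = (cuts[-1][0], end)  # extend the previous cut
            else:
                cuts.append((start, end))
        prev_was_filler = is_filler
    return cuts


if __name__ == "__main__":
    transcript = [
        ("So", 0.00, 0.20), ("um,", 0.25, 0.50), ("uh", 0.52, 0.70),
        ("the", 0.80, 0.90), ("algorithm", 0.90, 1.40), ("Um.", 2.00, 2.30),
    ]
    print(filler_cuts(transcript))  # [(0.25, 0.7), (2.0, 2.3)]
```

Producing a cut list rather than editing the media directly keeps a human in the loop: you can review each range before applying it, which matches the TRIPS advice about importance.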

Katie Robbert 27:54
What does that do to the video? For an audio file, I understand, but doesn’t it make the video itself a little bit more choppy?

Christopher Penn 28:08
It does, so there’s a trade-off. You get more jump cuts, and super rapid cuts are all the rage right now on YouTube and TikTok. In fact, Descript has a multi-cam feature, and we are recording in multi-cam right now: when you get the files from StreamYard, you get individual camera feeds for each of us. Load all three, and the multi-cam feature lets you cut very quickly back and forth to whoever is talking, with different camera angles, to create that very short-attention-span-theater video that people on the internet really like.

Katie Robbert 28:47
Except for this person. I’m an N of 1. I can’t.

John Wall 28:51
One thing, though: if you have multi-cam, you don’t get those jumps. If somebody needs an edit, you can hop to another speaker and hide all that stuff, which works well.

Katie Robbert 29:04
So, okay, you’re talking about Descript or Adobe Premiere. When you pulled up the highlight reel, it struggled because of the way we shoot the show in StreamYard, with the cameras side by side, versus just whoever’s speaking on camera, which I know other shows do. Should someone consider changing the way they record to adapt to these systems? Or should they try to find an editing tool that works with how they record their podcast?

Christopher Penn 29:39
It depends. So with StreamYard, when you download local recordings, you get what you see on screen plus the individual camera feeds themselves. So you could pull all three camera feeds into this as well and then have it do the true multi-cam, boom boom boom back and forth all the time.
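The “cut to whoever is talking” behavior can be approximated from the individual tracks StreamYard provides. This hypothetical Python sketch assumes you have already measured each speaker’s loudness per fixed time window (for example, RMS per half second from each person’s audio track); it picks the loudest speaker per window and enforces a minimum shot length so the edit does not flicker. It illustrates the idea only and is not how Descript or Premiere implement multi-cam editing.

```python
def camera_switches(levels, window_s=0.5, min_shot_s=2.0):
    """Build a simple edit decision list from per-speaker loudness.

    levels: dict mapping speaker name -> list of loudness values, one per
            time window; every list must have the same length.
    Returns (start_s, speaker) pairs, one per camera change.
    window_s and min_shot_s are illustrative assumptions.
    """
    if not levels:
        return []
    speakers = list(levels)
    n = len(levels[speakers[0]])
    if any(len(values) != n for values in levels.values()):
        raise ValueError("All speaker tracks must have the same number of windows.")
    min_windows = max(1, round(min_shot_s / window_s))
    edits = []
    current, shot_len = None, 0  # active camera and how many windows it has held
    for i in range(n):
        loudest = max(speakers, key=lambda s: levels[s][i])
        if current is None:
            current = loudest
            edits.append((0.0, current))
        elif loudest != current and shot_len >= min_windows:
            # Only cut once the current shot has lasted min_shot_s, to avoid flicker.
            current, shot_len = loudest, 0
            edits.append((i * window_s, current))
        shot_len += 1
    return edits


if __name__ == "__main__":
    levels = {
        "Katie": [0.9, 0.8, 0.9, 0.7, 0.1, 0.1, 0.2, 0.1],
        "Chris": [0.1, 0.2, 0.1, 0.1, 0.8, 0.9, 0.9, 0.8],
    }
    print(camera_switches(levels))  # [(0.0, 'Katie'), (2.0, 'Chris')]
```

The minimum shot length is the design choice that matters most: without it, a cough or a quick “mm-hmm” from another host would trigger a distracting cut.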

Katie Robbert 29:56
I think that’s worth testing, just to see what it looks like for us, as a proof of concept. When we do the podcast, for example, our cameras are always side by side, and some of the editing systems we’ve pulled it into absolutely can’t handle side-by-side cameras. They cut off both of our heads and just show that weird middle space, because they’re looking for the center of the screen. So I wonder what it would look like if we tried some of that.

Christopher Penn 30:24
Yeah, it’s worth a try, it’s worth a test. Again, there are a bunch of different tools that all do this stuff: Descript does, Premiere does. I personally edit in Premiere mainly because it’s local software, which means I don’t need an internet connection for it. I can do it on the plane, etc. Given how much I travel, that’s not a bad thing.

On to other parts of the production process: Premiere, Descript, and all these different services can produce both transcripts and caption files, and you should be doing so 100% of the time. The captioning system I personally prefer is called Whisper, a model from OpenAI. I prefer it because it is much more accurate in terms of the words people are saying. However, you can use whatever system you like. Of the tools I’ve used, Premiere is probably the worst at capturing words accurately. It has a much more limited vocabulary right now, though that may change as their internal models improve over time.
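For anyone running Whisper locally: the open-source `openai-whisper` Python package’s `transcribe()` returns a result whose `"segments"` entries include `start`, `end`, and `text`, and Whisper’s command-line tool can already write SRT files directly. The sketch below simply shows what that conversion involves, assuming segments shaped like Whisper’s and using only the standard library.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Convert Whisper-style segments (dicts with start, end, text) to SRT text.

    Each cue is: a 1-based number, a "start --> end" timing line, the caption
    text, then a blank line separating it from the next cue.
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # With openai-whisper installed, segments would typically come from:
    #   import whisper
    #   result = whisper.load_model("base").transcribe("episode.mp3")  # placeholder file
    #   segments = result["segments"]
    segments = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 4.0, "text": " Welcome to the show."},
    ]
    print(segments_to_srt(segments))
```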

Katie Robbert 31:28
Question. I’m not using either of those systems to pull out clips for the livestream; I’m the one who puts together the clips that go up on social. I import the transcript into the video editing system to generate the captions, so I would still need to review that transcript to make sure it’s accurate. Is that what’s happening here? Are you importing a transcript, or is it just guessing based on what it gets from the video? Can you import an edited transcript? Does that make sense?

Christopher Penn 32:11
Yes. A common format for captions is called SRT, and an SRT file is nothing but a plain text file. So you can take that plain text file from anything and edit it yourself. For example, if I pull this up here, this is from the most recent episode of In Ear Insights. I can take this transcript here, which is the SRT captions file, and make changes to it if I want to. Or, if I’m feeling clever, I can feed it to a generative AI tool and say, “Fix grammatical errors, misspellings, and obvious context errors in this SRT file.” It will then reproduce the SRT file, or you could go through and fix things yourself as well.
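Because an SRT file is plain text (a cue number, a `start --> end` timing line, one or more lines of caption text, then a blank line), small fixes can also be applied deterministically. This Python sketch parses well-formed cues, applies a hypothetical dictionary of known mis-hearings, and writes the file back with the timings untouched. It assumes UTF-8 text without a byte-order mark, and it complements, rather than replaces, reviewing the captions yourself or the generative AI cleanup described in the episode.

```python
import re
from dataclasses import dataclass

# Timing line, e.g. "00:00:01,000 --> 00:00:03,000"
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


@dataclass
class Cue:
    index: int
    timing: str
    lines: list[str]


def parse_srt(text: str) -> list[Cue]:
    """Split SRT text into cues. Assumes well-formed blocks separated by blank lines."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.splitlines()
        if len(rows) < 2 or not TIMING.match(rows[1]):
            raise ValueError(f"Malformed cue: {block!r}")
        cues.append(Cue(int(rows[0]), rows[1], rows[2:]))
    return cues


def apply_corrections(cues: list[Cue], corrections: dict[str, str]) -> list[Cue]:
    """Replace whole-word mistakes in caption text; timings are never touched."""
    fixed = []
    for cue in cues:
        lines = []
        for line in cue.lines:
            for wrong, right in corrections.items():
                pattern = rf"\b{re.escape(wrong)}\b"
                line = re.sub(pattern, lambda _m, r=right: r, line)  # literal replacement
            lines.append(line)
        fixed.append(Cue(cue.index, cue.timing, lines))
    return fixed


def to_srt(cues: list[Cue]) -> str:
    """Serialize cues back to SRT text, one blank line between cues."""
    return "\n".join(f"{c.index}\n{c.timing}\n" + "\n".join(c.lines) + "\n" for c in cues)


if __name__ == "__main__":
    srt = (
        "1\n00:00:01,000 --> 00:00:03,000\nI'm Chris Pen.\n\n"
        "2\n00:00:03,500 --> 00:00:05,000\nChris Penn here again.\n"
    )
    # Hypothetical correction list for a name the captioner keeps mishearing.
    print(to_srt(apply_corrections(parse_srt(srt), {"Chris Pen": "Chris Penn"})))
```

The word-boundary match is what keeps the already-correct “Chris Penn” in the second cue from becoming “Chris Pennn”.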

Katie Robbert 32:57
That, personally, sounds like a better plan to me than letting these tools guess at what you’re saying, especially if you have someone who mumbles, or who has an accent that makes certain words hard to understand, or if, for whatever reason, the audio is hard to make out. I think uploading that edited transcript as an SRT file is probably a better option than having the system guess at what you said.

Christopher Penn 33:33
So they all guess. That’s the thing: working just from the video files, they all guess. Like I said, that’s why I like Whisper. Whisper guesses the best.

Katie Robbert 33:43
Okay. That’s not what I was saying, but that’s fine. Okay.

Christopher Penn 33:51
So that’s one option. If you happen to be someone who makes audio-only shows, you have not made video, and you don’t want to make video, that’s perfectly valid. There are tools, a very popular one being Headliner, where you can upload the audio from your show and it will make what’s called an audiogram, which is basically a waveform video that you then put up on YouTube. So even though you are not personally on screen, you can load that imagery in and get yourself a decent-looking video.

So let’s do an audiogram. We want it for YouTube. I’m going to upload the full episode. It says “error loading user settings.” Alright, let’s try it again. Apparently, it’s having a hard time today. There we go. Let’s select a specific file and upload it. I’m just going to use that very brief audio clip I made earlier. Let’s see, I have it here. Over here.

There we go, and load. Continue, skip. This is English. And I can go ahead and choose a template. This one looks nice. Continue. And it will then start to generate this waveform file. Now, I don’t know how long this is going to take, so I should have some previous ones stored here.

Stay here, I will not be leaving. But that creates a video file that you can then upload to YouTube, and it will just have the audio plus a dancing animation on screen.
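Headliner handles the rendering for you, but the core idea of a waveform audiogram is simple: chop the audio into short windows and draw one bar per window, sized by how loud that window is. The sketch below computes those bar heights from a WAV file using only Python’s standard library. It illustrates the concept, not how Headliner works internally; it assumes 16-bit PCM samples (mono or interleaved stereo), and `waveform_bars` is a hypothetical name.

```python
import array
import math
import sys
import wave


def waveform_bars(path: str, bars_per_second: int = 10) -> list[float]:
    """Return one RMS loudness value (0.0 to 1.0) per window of a 16-bit WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM audio")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples per bar, counting every channel of each frame.
    window = max(1, rate // bars_per_second) * channels
    bars = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        bars.append(rms / 32768)  # normalize to full scale
    return bars
```

Turning those values into animated bars is then a job for a video tool; Headliner does both steps in one place.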

Katie Robbert 35:44
So, John, you mentioned at the top of the episode that you spend about 10 hours on every 40-or-so-minute episode. Are you using any of these systems? Or is this conversation sparking some “Oh, this would save me two hours here or an hour here” ideas?

John Wall 36:06
Yeah, well, it’s rare that I’ll do video. Once in a while, if we have an episode that we know has traction, I’ll release it as video. But yeah, I definitely want to play around with Headliner, to take a couple of episodes from the archives and throw them up there, because I know we can get more search traffic that way. And then, yeah, you hit the mark. I still haven’t found an easy tool that will make video clips formatted the way we like. I think the thing to test would be the StreamYard individual video clips, and my guess is that we should just do those vertically, because that works for TikTok and Instagram, and it doesn’t look bad on LinkedIn if you have vertical, even though it’s not what people normally expect. So yeah, video is definitely the one area where I need to play around more with the tools to see if something is finally at a point where it’s not so painful that I always put it at the bottom of the priority list.

Christopher Penn 37:06
Okay. Let’s continue on.

There is a Python tool, and this requires a bit more advanced skill, called Matchering. Matchering is an audio mastering tool that allows you to essentially take the style of one piece of audio and map it onto your audio. So for example, if you like the way, say, Taylor Swift mixes her songs on The Tortured Poets Department, you could have this tool interpret what she and her audio engineers did, and then essentially copy those settings and apply them to your own music.

In the case of podcasting, you can do exactly the same thing. If there’s a show whose sound you like, you can take that style, none of the content, just the style, and apply it to your own show. So for example, if you like how NPR’s Fresh Air or Car Talk sounds, you can lift that style and programmatically apply it to your podcast to improve the mix, as opposed to just going with whatever a tool like StreamYard puts out by default. If there are aspects of voice or audio that you want to copy from a style perspective, this is a useful tool for doing that. It is free and open source, but it does require some technical skill.
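Matchering itself matches several characteristics of a reference track, including frequency response and loudness. One of those ingredients is easy to illustrate: measure how loud the reference is and apply the gain that brings your audio to the same level. The sketch below does only that, using RMS level on lists of floating-point samples between -1.0 and 1.0. It is a simplified illustration with hypothetical function names, not Matchering’s algorithm or API; consult the project’s documentation to use the real tool.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a block of samples (1.0 = full scale)."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def to_dbfs(level: float) -> float:
    """Convert a linear level to decibels relative to full scale (dBFS)."""
    return 20 * math.log10(level)


def match_loudness(target: list[float], reference: list[float]) -> list[float]:
    """Scale target so its RMS level matches the reference's, hard-clipping at +/-1.0."""
    target_level = rms(target)
    if target_level == 0:
        return list(target)  # silence: nothing to scale
    gain = rms(reference) / target_level
    # A real mastering chain would use a limiter here rather than hard clipping.
    return [max(-1.0, min(1.0, s * gain)) for s in target]
```

Printing `to_dbfs(rms(reference))` before and after is a quick way to see the level change in familiar units.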

Now, let’s get to the last part of the production process, part one. This is the part that, in chatting with folks on the recent LIPSON podcast, I heard a lot of people say they really don’t like: writing things like show summaries and YouTube captions. This is 100% something you should be doing with generative AI because, unlike the usual use cases of generative AI, this is just summarization. And because you’re providing the raw data, the probability of hallucination is almost zero.

So let’s do an example with an episode of In Ear Insights. I’m going to provide some instructions here. Let’s switch to Gemini Pro and turn off the safety settings. I’m going to give it a prompt and upload my most recent In Ear Insights episode transcript.

And my prompt is: You’re an expert YouTube content manager. You know how to craft content that appeals to users on YouTube, and so on and so forth. You’ll create an appealing, enticing summary for a YouTube video by Christopher Penn and Katie Robbert based on the provided transcript. The summary should not give away too many details, so the reader is compelled to watch the video. Write in the second person, emphasizing what the viewer will learn. Write in active voice. Avoid adverbs. Avoid business jargon. Start each summary with “In this episode of In Ear Insights, the Trust Insights podcast, Katie and Chris discuss…” Write four sentences focused on the benefits to the listener. After the caption, generate a comma-separated list of YouTube tags appropriate to the content of the video.

So this is something that a friend of mine recently yelled at me about. She was looking at our YouTube channels and said, “You’re not using YouTube tags, why not?” I said, “I was supposed to?” She said, “You should know better. Why don’t you have your AI do that?” And I said, “That’s a good idea.” So for the last couple of months, every episode of In Ear Insights also gets a full plethora of YouTube tags to help other people discover it. And the output here is: “In this episode of In Ear Insights, the Trust Insights podcast, Katie and Chris discuss how to spot AI snake oil salespeople. You’ll learn the subtle clues that differentiate genuine experts from those looking to make a quick buck off the latest trend,” and so on and so forth. So instead of sitting here trying to figure out what I should write for this part of the process, I have a canned prompt. You can make this into a custom GPT or Gem or whatever, and just have the machine do it for you. This one’s a no-brainer that everyone should be doing.
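Because the prompt asks for the tags as a comma-separated list after the summary, a few lines of code can tidy that list before you paste it into YouTube Studio. The sketch below splits the list, strips whitespace and stray `#` signs, drops case-insensitive duplicates, and stops adding tags once a character budget is reached. `clean_tags` is a hypothetical helper, and the 500-character default reflects YouTube’s commonly cited limit for the tags field; treat that as an assumption, since YouTube may count characters differently (for example, for tags containing spaces).

```python
def clean_tags(raw: str, max_chars: int = 500) -> list[str]:
    """Split a comma-separated tag list, de-duplicate it, and keep it under max_chars."""
    tags, seen, used = [], set(), 0
    for piece in raw.split(","):
        tag = piece.strip().lstrip("#").strip()
        key = tag.lower()
        if not tag or key in seen:
            continue  # skip empty entries and case-insensitive duplicates
        # Count the tag plus one separating comma toward the budget.
        cost = len(tag) + (1 if tags else 0)
        if used + cost > max_chars:
            break  # keep the model's ordering: earlier tags take priority
        tags.append(tag)
        seen.add(key)
        used += cost
    return tags
```

Stopping at the first tag that does not fit, rather than hunting for shorter ones later, keeps the tags the model listed first, which are usually the most relevant.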

Katie Robbert 41:20
So are YouTube tags essentially their version of keywords?

Christopher Penn 41:23
Yeah, essentially, or hashtags.

Katie Robbert 41:27
Oh, yeah. You should have known better, Chris. Yep, exactly.

Christopher Penn 41:33
And so now this is part of the process. And again, this is the value of generative AI: I don’t have to remember to do that every week. Now it’s part of the prompt, it just gets spit out, I copy and paste it, and off we go, every single time, which is super helpful.

The other thing that you can do, and that we do now: I’ve started a new prompt here. These are the transcripts themselves. They kind of look like walls of text, and they capture a lot of our speech mannerisms, like ending sentences with the word “right,” or having the same word repeated a couple of times in a row. So of course, we have a prompt to fix up the transcript.
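Some of those mannerisms are mechanical enough to clean up before the transcript ever reaches a language model. Here is a minimal regex sketch, assuming plain-text input, that collapses immediately repeated words such as “the the” or “that, that.” It is an illustrative pre-pass with a hypothetical function name, not part of the prompt described here, and it cannot tell a stutter from legitimate repetition like “had had,” so the result still needs review.

```python
import re

# A word, then one or more repeats of it separated by whitespace and an optional comma.
REPEATED_WORD = re.compile(r"\b(\w+)(?:,?\s+\1\b)+", re.IGNORECASE)


def collapse_repeats(text: str) -> str:
    """Replace runs like 'the the' or 'that, that' with a single occurrence."""
    return REPEATED_WORD.sub(r"\1", text)
```

With `re.IGNORECASE`, the backreference also matches across capitalization, so “That, that aspect” becomes “That aspect.”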

So we’re going to go ahead and put my prompt in, switch to Gemini Pro, and turn off all my safeties. I want to just trim off this bottom part of the prompt here. And then let’s upload this transcript, which is again from the most recent episode of In Ear Insights, and hit go.

This prompt is extensive. I have given it some objectives and some context for what it’s going to be doing. Here are the speakers; it’s a two-host podcast. Here’s the target audience and the language. There are specific technical terms to look for, like ChatGPT, LLMs, and IBM watsonx, so that if it’s unclear how a term should be interpreted, it will know. And then there’s a bunch of instructions: don’t paraphrase, maintain the original wording and sentence structure, remove filler words, remove false starts and other speech interruptions, use em dashes, preserve speaker names, remove duplicate words, here are some examples, and so on and so forth.

Here’s an example of how to do it right. Here’s the original text, and I hand-wrote what it should look like when it’s done. Then I ran the same text through ChatGPT, because I wanted it to be wrong, and ChatGPT is very disobedient. I got a copy back, took the ChatGPT version, and said, “This is the wrong way to do it, do not do this.”

What it does is spit out a nicely cleaned-up transcript. This is what goes on our website, because we want well-written, high-quality content that’s extensive, long, and full of good words on the Trust Insights website. So this really cleans up that aspect. And again, this would take you a couple of hours to edit by hand, and instead, it’s done in less than a couple of minutes.
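The prompt described here follows a recognizable structure: objective, context, speakers, a glossary of technical terms, instructions, and then one example done right and one done wrong. The actual Trust Insights prompt is not published here, so the function, parameter names, and wording below are hypothetical. This is just one way to keep those sections in a reusable template, so that when a model changes you update a single piece instead of rewriting everything.

```python
def build_cleanup_prompt(
    transcript: str,
    speakers: list[str],
    context: str,
    glossary: list[str],
    instructions: list[str],
    good_example: tuple[str, str],
    bad_example: str,
) -> str:
    """Assemble a transcript-cleanup prompt from reusable sections (hypothetical layout)."""
    original, corrected = good_example
    sections = [
        "Objective: Clean up the podcast transcript below. Do not paraphrase.",
        f"Context: {context}",
        "Speakers: " + ", ".join(speakers),
        "Spell these technical terms exactly as written: " + ", ".join(glossary),
        "Instructions:\n" + "\n".join(f"- {item}" for item in instructions),
        f"RIGHT way (original, then corrected):\n{original}\n---\n{corrected}",
        f"WRONG way. Do not do this:\n{bad_example}",
        f"Transcript to clean:\n{transcript}",
    ]
    # Blank lines between sections keep each part easy for the model (and you) to find.
    return "\n\n".join(sections)
```

Keeping the negative example as its own labeled section mirrors the right-way/wrong-way pairing described above.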

Katie Robbert 44:12
Well, prior to generative AI becoming available a few years ago, we did have someone trying to clean up the transcripts manually, and it was a painful process, because it never quite translates into a good finished product if it’s not done by the people who were in the conversation. That may have just been because of the people we had working on it, but it was never quite right.

Christopher Penn 44:42
Exactly. So this does a great job of transcript cleanup so that we can save that time. And again, with all this stuff, like your summaries: I have another version of the summary prompt for social media posts. So when I write the social media posts to share in Agorapulse, I don’t bother writing them anymore; I just have a prompt do it. It captures all the highlights, with the same sets of instructions. Once the episode is made and you’re ready to start putting it up, this is a great way to streamline the workflow.

Katie Robbert 45:17
And I think something that John mentioned, that he can’t necessarily replace with a machine, is that regardless of how it gets put together, he, as the human, still needs to review everything and listen to the episode. And in this instance, we would say, “Well, you still need to review what AI has put out: make sure it hasn’t made any egregious errors, make sure the transcripts it’s giving you are correct, make sure the social posts it’s writing make sense.” Don’t just accept what it writes and then post it blindly. So there is still human intervention, but, to your point, you’re streamlining a lot of the pieces.

One of my least favorite parts of producing a podcast is writing the show notes. Like, okay, here’s what happened, here’s what they talked about. I’m the type of person where, once the conversation is over, it’s completely out of my head. And if I’m involved in the conversation, I want to be present for it, not taking notes during it. So it was a very clunky process, and it took a lot of time to go back, rewatch the episode, take the notes, put them together, and then edit. It was a whole thing. So this, I think, is a huge time saver, if for nothing else than getting those prompts to clean up the transcripts, or even getting the transcripts cleaned up to turn into an SRT file for the captions. There are a lot of places where I see opportunities to streamline. So if you’re doing everything manually and it’s bogging you down, it’s certainly worth exploring some of these tools.

Christopher Penn 47:03
Yes. Okay, let’s tackle a few of the questions we’ve gotten. Bob was asking, “I’m trying to download transcripts of my YouTube videos. I can’t figure out how. What am I doing wrong?”

The utility to do this is a piece of software called youtube-dl, or more precisely its fork, yt-dlp. If you Google Y-T dash D-L-P, you’ll find it. It’s a piece of command-line software: you give it a command, and it will download, say, just the caption files from YouTube videos. This is what I do. You can download an entire channel’s captions. So if you knew you were going to be pitching a show because you want to be a guest on it, you could download the captions from the last 100 episodes of that show on YouTube and ask: How does the guest normally talk? How does the host normally talk? What questions can I expect? And so on and so forth. So this is how you do that.
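If you prefer to script this, yt-dlp can also be used as a Python library. The sketch below assumes `pip install yt-dlp`. The option keys are yt-dlp’s own embedding options (they correspond to command-line flags such as `--skip-download`, `--write-subs`, `--write-auto-subs`, `--sub-langs`, and `--playlist-end`), while the function names, the output folder, the filename template, and the 100-video default are illustrative choices. Check the yt-dlp documentation for your installed version.

```python
def caption_options(out_dir: str = "captions", max_videos: int = 100, lang: str = "en") -> dict:
    """yt-dlp options: fetch subtitles only (uploaded or auto-generated), no video."""
    return {
        "skip_download": True,        # don't download the video itself
        "writesubtitles": True,       # captions uploaded by the creator
        "writeautomaticsub": True,    # YouTube's auto-generated captions
        "subtitleslangs": [lang],
        "playlistend": max_videos,    # stop after this many items in a channel or playlist
        "outtmpl": f"{out_dir}/%(upload_date)s %(title)s [%(id)s].%(ext)s",
    }


def download_captions(url: str, **options) -> None:
    """Download caption files for a video, playlist, or channel URL."""
    import yt_dlp  # third-party: pip install yt-dlp

    with yt_dlp.YoutubeDL(caption_options(**options)) as ydl:
        ydl.download([url])
```

For example, `download_captions("https://www.youtube.com/@SomeChannel/videos", max_videos=100)` would pull captions for a channel’s most recent uploads; the channel handle there is a placeholder.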

Maroon is asking, “Are you sharing these prompts anywhere?” I don’t typically share our internal prompts for this, but I might make an exception for the transcript one. However, to get it, you must be a member of Analytics for Marketers. So if you’re not in Analytics for Marketers, go to It’s free to join, no cost. I will put the transcript cleaner prompt in our Slack group for people to take advantage of. You will need to edit it, because if you don’t, your transcripts will come out with “Chris” and “Katie” as the speakers, which would be very poor transcription. We appreciate the free advertising, but that’s probably not going to work out so well for you.

Same for Sarah. Sarah was saying that she didn’t like the way Descript does its summaries and things like that. Again, that’s the case with all these tools. And this is actually a good question. Sarah asked, “How does someone who’s newer to this write these kinds of prompts to clean up a transcript?” It’s a lot of trial and error. Part of the reason we don’t publish a lot of our prompts is that they’re always in flux; they’re always changing. As models change, you have to update them. But also, a prompt is a piece of software, and like any software development, it’s iterative; there are those cycles. We don’t formally have a Scrum process around them, but we did a couple of episodes of the Trust Insights podcast on how generative AI is like the software development lifecycle.

Katie Robbert 49:24
The other thing I would add is, if you’re trying to figure out how to write a prompt, ask the generative AI tool! Say, “This is what I’m trying to do. I want to write a prompt that I can use over and over again. Here’s my goal. Here are the things that I have. Here’s what I want to happen. Can you help me put together a prompt?” It’s very meta, but let the generative AI do the work of working with generative AI! And yes, it’s sort of interacting with itself, but it’s going to get you farther along than sitting down at a blank page and trying to figure out how to write it yourself.

Christopher Penn 50:02
Don’t give away all our secrets!

Katie Robbert 50:04
That’s a very poorly kept secret.

Christopher Penn 50:09
And it is, though there is a technique for having AI engineer prompts that is very, very useful. So in the last 50 minutes, we’ve talked about ways to use generative AI to create assets, to understand your show, to build show strategies, to summarize, and to edit. I hope folks can take even just one of these techniques and save themselves some time. If you save yourself just an hour a week, or a couple of hours a month, I hope the show was worth it.

Any final parting words, Katie?

Katie Robbert 50:40
Honestly, I think you should just do some experimentation. You don’t have to automate everything, especially if you’re someone who really likes to be hands-on. But there could be some things out there that work for you. I’m not someone who likes to give up control of everything, I know, huge shock. But there are times when these generative AI prompts are super helpful, because then I can focus on the things that are most important to me. So I would say, definitely experiment.

And so over the next couple of weeks, as we mentioned, we’ll be talking about AI and podcasting. And so next week, you and John will be talking about AI podcast growth.

Christopher Penn 51:29
Exactly, how to market your podcast with generative AI. So we hope to see you for that episode next week, same time,

Katie Robbert 51:38
Same bat channel.

Christopher Penn 51:39
I know, all these folks are so new, so young. “Same bat time, same bat channel” does not resonate with, like, half the audience anymore.

Katie Robbert 51:46
They should go learn it. Go look it up.

Christopher Penn 51:50
We’ll talk to you on the next one. Thanks for watching today. Be sure to subscribe to our show wherever you’re watching it. For more resources and to learn more, check out the Trust Insights podcast at, and our weekly email newsletter at Got questions about what you saw on today’s episode? Join our free Analytics for Marketers Slack group at See you next time!

