This data was originally featured in the June 11th, 2025 newsletter found here: INBOX INSIGHTS, June 11, 2025: When AI-First Goes Wrong Part 3, Speakability and Text to Speech AI
In this week’s Data Diaries, let’s go behind the scenes on one part of our AI-Ready Marketing Strategy Kit: the audiobook version. We generated it with the text to speech system Eleven Labs, but if you’ve read the kit, you know that it’s not conducive to being read aloud.
The secret here is something called speakability, a close cousin of singability. When music composers write lyrics for singers, not only do they have to think about meaning, they have to think about how easy a song is to sing.
For example, vowel placement is a big part of singing. Songs that are easy to sing have vowels placed where they can be made big or drawn out. Another example is tough consonant clusters that don’t sound great when sung, like the “phth” in the word diphthong.
Speakability is the same thing in many respects. Whether a human or a machine is reading something aloud, the text has to flow. It has to be free of interruptions and weird structures – like en dash asides in the middle of sentences – that can throw off both a human and a machine.
And for text like the AI Kit, one of the worst things you can give a text to speech system is a list. Lists tend to REALLY mess up a text to speech system. You often end up with an audio recording in which “bullet point” and other marks are read aloud.
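To make the idea concrete, here is a minimal Python sketch that flags the kinds of structures described above before text goes to a speech engine. The patterns and the function name `find_speakability_issues` are illustrative assumptions, not the checks from our actual prompt, and they cover only a few obvious cases.

```python
import re

# Hypothetical patterns for structures that tend to read badly aloud.
# These are illustrative guesses, not the checks used in the real prompt.
PATTERNS = {
    "bullet": re.compile(r"^\s*[-*\u2022]\s+"),
    "numbered item": re.compile(r"^\s*\d+[.)]\s+"),
    "checkbox": re.compile(r"^\s*(?:[-*]\s+)?\[[ xX]\]\s+"),
    # An en dash or em dash with spaces on both sides, used as an aside.
    "dash aside": re.compile(r"\s[\u2013\u2014]\s"),
}


def find_speakability_issues(text):
    """Return (line_number, issue_name) pairs for every pattern a line matches."""
    issues = []
    for number, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                issues.append((number, name))
    return issues
```

A line can match more than one pattern, so a Markdown checkbox such as `- [ ] Audit data` is reported as both a bullet and a checkbox. A check like this only finds the trouble spots; rewriting them into speech that flows is the part we handed to generative AI.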
If you’ve downloaded the AI Kit, you know it’s rife with lists, checklists, bullets, checkboxes… everything that will screw up a text to speech system. So what did we do? We put the entire thing through a speakability prompt using Google Gemini 2.5.
We prompted it to essentially rewrite the guide as a spoken script, smoothing out all the known bumps that trip up text to speech systems.
Okay, but where did that prompt come from? We commissioned a Deep Research report on the things that most commonly trip up text to speech systems and used it as part of the prompt, then had Gemini summarize the things to look for in text. The result is a large prompt (too large to share here) that transforms ANY text into a speaking script for humans and machines.
The net result is something that the Eleven Labs software had a much easier time reading aloud, which means a much smoother listening experience for you.
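The workflow above can be sketched in a few lines of Python. This is not our actual prompt, which is far longer; the pitfall list, the wording, and the function name `build_speakability_prompt` are all hypothetical stand-ins that show the shape of the approach: a list of known trouble spots plus the source text, assembled into one instruction.

```python
# Hypothetical pitfalls; the real list came from a research report
# that is not reproduced here.
EXAMPLE_PITFALLS = [
    "bulleted, numbered, and checkbox lists",
    "dash-delimited asides in the middle of sentences",
    "headings and other visual-only formatting",
]


def build_speakability_prompt(source_text, pitfalls):
    """Assemble one prompt string asking a model to rewrite text as a spoken script."""
    if not source_text.strip():
        raise ValueError("source_text must not be empty")
    # One pitfall per line so the model can address each in turn.
    pitfall_lines = "\n".join(f"- {pitfall}" for pitfall in pitfalls)
    return (
        "Rewrite the text below as a script to be read aloud by a person "
        "or a text to speech system. Keep the meaning intact.\n"
        "Smooth out these known trouble spots:\n"
        f"{pitfall_lines}\n\n"
        "TEXT:\n"
        f"{source_text.strip()}"
    )
```

The returned string would then be passed to whichever generative AI model you use. No API call is shown here because that part depends on your provider.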
The key takeaway is this: use generative AI tools to reformat text for the purposes you want. You did the hard work of creating your content – now let generative AI transform it for use in nearly any system or format.
Be sure to catch our livestream on Thursday at 1 PM at TrustInsights.ai/youtube for more content transformation ideas.