A Golden Retriever wearing headphones listening to music.
PHOTO: chendongshan

The last couple of days seem to have been all about podcasting. As well as hosting, recording and reviewing the latest edition of the Hyland | Nuxeo Content Journeys podcast during the day, my evenings have been spent editing and doing post-production on a new pop culture podcast my wife and I recently launched.

In between removing all the umms, ahhs and extraneous noises (thanks to the cat jumping on the table), as well as adding in the various sound effects we use for each episode, I started to think about the structure of what I was putting together from a content model perspective.

When we’ve finished with all the fiddly post-production on both podcasts, we end up with a single large audio file — and its associated text-based show notes — that are uploaded to a service that distributes the show across eight different podcast platforms.

All nice and clean, and we get back some great analytics on how different episodes perform, which platforms are the most popular, and the geography and demographics of our listeners, all of which helps us tailor future content as needed.

How to Extract Value From Audio Assets

But what about the content within those audio files? The Content Journeys podcast is structured around six questions that we ask each of our guests. Three are common, and three are tailored for the individual guest. In almost every episode, the guest will provide some valuable piece of insight or advice. If I want to highlight that response and use it elsewhere — for a social post, or as a pull quote on a blog for instance — how do I get to it?

Must I listen back to the audio track, and then copy and paste the sound bite into a new audio file? Or run a transcription process to turn the audio into text and then do a search? Alternatively, if I want to pull together a compilation of how all the guests from last year answered question No. 1, do I have to choose between the same time-consuming manual processes?

No Longer a Text-Based World

This isn’t a new phenomenon: I started my career in the aerospace industry more than 30 years ago, and even back then we were starting to work with structured content models for text, enabling information to be searched, reused and reconfigured as needed.

I’ve seen a lot of development and different applications of that idea in the intervening decades, but they have still been primarily text-based, because most of our content delivery platforms have been that way, from paper to CD-ROMs to websites to text messaging and more. But things are changing: audio is now not only a delivery medium but also an interactive interface to the digital world.

Treat Audio Assets as Modular Objects

To really extract the value from all the audio assets we are creating, we need to stop treating them as monolithic file types and start thinking of them as collections of modular objects. I’ve started to do that to some extent in the personal podcasts I produce.
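As a rough illustration of what "a collection of modular objects" could mean in practice, the Python sketch below describes an episode as a list of time-coded segments rather than one opaque file. Everything in it is an assumption made for the example: the `Segment` fields, the episode identifiers, the timings and the `answers_to` helper are all hypothetical and do not come from any particular podcast or content management tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Segment:
    """One addressable piece of an episode (hypothetical model)."""
    episode: str                       # illustrative episode identifier
    kind: str                          # e.g. "intro", "answer", "credits"
    start_s: float                     # offset into the audio file, in seconds
    end_s: float
    question_no: Optional[int] = None  # which standard question, if any
    speaker: Optional[str] = None


def answers_to(segments, question_no):
    """Return every answer to one question, ordered by episode and start time."""
    hits = [
        s for s in segments
        if s.kind == "answer" and s.question_no == question_no
    ]
    return sorted(hits, key=lambda s: (s.episode, s.start_s))


# Made-up catalog covering two episodes.
catalog = [
    Segment("ep01", "intro", 0, 60),
    Segment("ep01", "answer", 60, 240, question_no=1, speaker="Guest A"),
    Segment("ep01", "answer", 240, 480, question_no=2, speaker="Guest A"),
    Segment("ep02", "intro", 0, 55),
    Segment("ep02", "answer", 55, 300, question_no=1, speaker="Guest B"),
]

# A cut list for a "how everyone answered question No. 1" compilation.
cut_list = [(s.episode, s.start_s, s.end_s) for s in answers_to(catalog, 1)]
```

With segment boundaries recorded this way, a compilation becomes a query against the catalog rather than a listen-through of every episode. The resulting cut list could then be handed to whatever audio editor is in use.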

The standard sound effects we reuse are stored separately and can easily be dropped in as needed, although it’s still a copy-and-paste operation. We have recorded two versions of the standard introduction and credits sequences, and we have experimented with recording the main body of the shows as separate modules, based on subject matter, that we can then stitch together as needed.

Doing so makes it easier to find a pull quote when you only have to search a 15-minute segment rather than an hour-long show. But we still need a better way to add metadata around these basic objects, so we can start to apply some business logic to automate the configuration of the modules for specific shows or audience preferences.
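To make the idea of metadata plus business logic a little more concrete, here is a minimal Python sketch of how tagged modules might be assembled into a show automatically. It is only one possible shape for such a model: the `Module` fields, the file names, the "short" variant and the fallback rule are all invented for illustration and are not taken from any real production workflow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """A reusable audio module plus the metadata needed to select it."""
    file: str                      # hypothetical file name
    role: str                      # "intro", "body" or "credits"
    duration_s: int
    topics: frozenset = frozenset()
    variant: str = "standard"


def pick(library, role, variant):
    """Pick the requested variant of a role, falling back to "standard"."""
    candidates = [m for m in library if m.role == role]
    for wanted in (variant, "standard"):
        for m in candidates:
            if m.variant == wanted:
                return m
    raise LookupError(f"no {role} module in library")


def assemble(library, wanted_topics, variant="standard"):
    """Build a running order: intro, matching body modules, credits."""
    wanted = set(wanted_topics)
    body = [m for m in library if m.role == "body" and wanted & m.topics]
    return [pick(library, "intro", variant), *body,
            pick(library, "credits", variant)]


# Made-up module library.
library = [
    Module("intro_standard.wav", "intro", 45),
    Module("intro_short.wav", "intro", 15, variant="short"),
    Module("body_film.wav", "body", 900, topics=frozenset({"film"})),
    Module("body_comics.wav", "body", 840, topics=frozenset({"comics"})),
    Module("body_tv.wav", "body", 780, topics=frozenset({"tv", "film"})),
    Module("credits_standard.wav", "credits", 30),
]

# A film-focused cut with the short intro; there is no short credits
# module here, so the rule falls back to the standard one.
show = assemble(library, {"film"}, variant="short")
running_order = [m.file for m in show]
total_seconds = sum(m.duration_s for m in show)
```

The point of the sketch is that the selection rule lives in data and a few lines of logic rather than in a manual copy-and-paste step, so the same library could yield different configurations for different shows or audience preferences.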

Building More Complex Taxonomies

And it’s not just for audio productions like podcasts that we need to start thinking about the content model, either. How many times a day do you talk to one of your smart devices? Audio as the user interface also requires a new way of thinking about our content. At the moment we are still very much modeling audio-assistant content on a question-and-response paradigm, but we need to be thinking beyond that: We need to build more complex taxonomies and models that understand natural language syntax, similes, regional variants and a changing linguistic landscape.

To make the best use of those content types, we need to start thinking about both the content strategy for when, where and how we want to use audio and the content models that will allow us to mark up, search, manipulate, configure and reuse intelligent audio content going forward.