
Artificial Intelligence (AI Voices) versus Real Voices



Would you like to be taught by a robot? It’s possible you already have been. There are many online courses out there that have been produced on low budgets, and rather than pay for a human voiceover artist to narrate the content, the course developers have used synthetic voices.

We’ve certainly all been told what to do, or given information, by an AI voice – be it our sat nav, Siri or Alexa.

Up to now you’ve usually been able to tell an AI voice from a real one. Even if the tone of the synthetic voice was quite good, it still lacked emotion, or phrases would sound a little odd, and you could tell a machine had stitched them together. However, they are getting better – and quickly.

AI voices are based on real voices, and a voiceover artist (or artists) somewhere has been involved in helping to create the AI voice by recording thousands of phrases in different ways. The words have then been broken down into their smallest building blocks (phonemes) so that computers can build up realistic-sounding words and phrases for any piece of text.

That’s a bit worrying for people in my profession – are our jobs going to be taken by machines? Obviously I hope not.

Last week I attended the One Voice Conference. It should have taken place in a lovely hotel right on the Thames in London. Thanks to coronavirus, however, all the sessions and socials were online, which actually worked remarkably well – and the drinks were considerably cheaper!

There were two sessions during the conference that dealt with AI. The first was an entire session on Text To Speech (TTS), as AI voice technology is also known, and this looked at how far AI speech has come and which sectors of our industry it is starting to be used in. It felt like quite a depressing session – parts of our industry look set to be taken over by machines, possibly in the next few years, particularly phone systems and online training.

The second session was about e-learning, and this I thought was a bit more positive. The presenter, Elinor Hamilton, talked about the advantages that real human voices have over artificially created ones. Although robot voices don’t get tired or need breaks, or get ill – which all seem like marvellous advantages over humans – there are some really important things that we humans can do that the machines just can’t. At least not yet.

The best AI voices are now at the point where they can express emotion and sound very realistic (they have even been taught to cry), but most are still a bit robot-like.

Something that an artificial voice definitely can’t do right now is spot mistakes. There have been a number of times when I’ve spotted typos in scripts that totally change the meaning of a sentence, and clients have always been really grateful when I’ve pointed them out.

Imagine if the script had been fed into a computer and converted to speech, and it wasn’t properly proofed afterwards – it could be embarrassing at best, and dangerous at worst. I’ve voiced important things like health and safety videos, and instructional videos for medical clinicians – you definitely don’t want those to be wrong…

It’s a similar story with scripts that have been translated from another language. The translator may have done a good job – but there are always little words and phrases that aren’t quite right – and which only a native speaker would spot. You need a human to pick up these nuances that make the difference between the audience knowing they are listening to something that has been translated, and not realising it was originally in another language.

Another point that Elinor made, which is especially pertinent for the times we are currently living in, is that people need people. We crave human contact, and at this time of social distancing, I think we are appreciating more than ever the little opportunities we have to see and speak to another person. With that in mind, who wants to be talked to by an artificial voice right now?

A machine will not collaborate with you. It will only do what you tell it to do. A real voiceover artist can offer helpful input to make your project the best it can be.

At the end of the day, an artificial voice may sound almost indistinguishable from a human voice, but there will still be something missing. A machine has no soul – and I think we can hear that. In fact, here’s an example. I was asked to record a voicemail greeting for a local company, which had been using a robotic voice. Here you can hear the before and after recordings side by side. See which you prefer!

I think that companies that really care about their brand will continue to use real human voices. It might be more expensive, but it’s the difference between buying something cheap and mass-produced, and something that has been lovingly hand-crafted (voiced) specifically for their project.

Those in my profession who pay attention to detail and are easy to work with will, in my opinion, continue to be used by companies that care about the audio that represents them.

That’s my hope anyway!