This article was published in the Ergon Magazine SMART insights 2021.
“Hey Siri”, “Hey Google”, “Alexa!”. We have fallen under the spell of voice assistants. From their beginnings in the home, developments in voice recognition are now paving the way for their growing use in business. Understanding commands, or even entire conversations, and responding in seconds, digital assistants are fast becoming our favourite little helpers.
Initially viewed with suspicion as the latest gimmick, smart speaker systems are becoming increasingly sophisticated. Already established in English-speaking markets, they are now catching on in the German-speaking world, which is catching up. Studies show that people quickly get used to life with a voice assistant, and Switzerland is no exception. Voice assistants have all sorts of uses: on smartphones, in smart homes and, increasingly, in business. These days we talk rather than type. But how do they work? A voice user interface (VUI) connects to a system that receives spoken commands and responds to users by voice.
Systems like this add significant value in advisory meetings. They supply relevant information in real time, take the minutes and produce a summary. You might recognise the scenario: you're consulting with experts and they have to go off to check or research something, leaving you twiddling your thumbs. We're so used to having everything at the touch of a button.
Complex answers in real time
It doesn’t have to be this way. Digital assistants do some of the thinking for you: they take your questions, do the research and provide up-to-date data, so you're not left waiting. Sound easy? It isn't. If you have ever used a voice assistant, you’ll know that there are times they just don’t understand. It is difficult to catch all the variety and nuance of the spoken word, and the diversity of Swiss dialects really doesn't help. Depending on the emphasis, a sentence with the same word order might be a question or a statement. Yet much is possible with the latest advances. The technology can recognise words, understand sentences and even interpret speech. Understanding individual sentences is all very well, but each one must also be placed in the context of the conversation. Let's take an example:
Ms Banking advisor
“Would you like ten of those shares?”
Mr Bank client
“No, just five, please, but two more of the others.”
In this example, neither “those shares” nor “the others” can be identified without the earlier context of the conversation. Interpreting sequenced information of this kind, such as a conversation between two people spread across several sentences and speakers, is a real challenge for the system.
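To make the problem concrete: the meaning of “those shares” and “the others” lives in earlier turns of the conversation. The following is a minimal sketch, assuming a hypothetical `DialogueContext` that simply remembers which securities have been mentioned and which one is currently in focus; “Company A” and “Company B” are placeholder names. Real assistants rely on trained coreference and dialogue-state models rather than a fixed list of phrases.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DialogueContext:
    """Hypothetical conversation memory: securities in order of mention."""
    mentioned: List[str] = field(default_factory=list)
    focus: Optional[str] = None  # security currently under discussion

    def mention(self, security: str, focus: bool = False) -> None:
        """Record that a security came up in the conversation."""
        if security not in self.mentioned:
            self.mentioned.append(security)
        if focus:
            self.focus = security

    def resolve(self, phrase: str) -> Optional[str]:
        """Map a referring expression to a security using earlier turns."""
        if phrase == "those shares":
            return self.focus
        if phrase == "the others":
            # Most recently mentioned security that is not the one in focus.
            others = [s for s in self.mentioned if s != self.focus]
            return others[-1] if others else None
        return None  # unknown phrase: better to ask the speaker than to guess


ctx = DialogueContext()
ctx.mention("Company B")              # came up earlier in the meeting
ctx.mention("Company A", focus=True)  # "Would you like ten of those shares?"
# "No, just five, please, but two more of the others."
order = {ctx.resolve("those shares"): 5, ctx.resolve("the others"): 2}
print(order)  # {'Company A': 5, 'Company B': 2}
```

Without the two `mention` calls from earlier turns, both phrases resolve to `None`, which is exactly the situation of an assistant that only hears the client's last sentence.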
Newer voice assistants are increasingly able to handle a natural way of speaking. Machine learning and artificial intelligence help them comprehend human language. During an advisory meeting, for example, these assistants can answer complex questions, provide additional information, give recommendations and even make predictions.
In the example given here, a voice assistant supports Ms Banking advisor by offering a contemporaneous summary of the discussion and making additional investment recommendations on the basis of updated client preferences. Mr Bank client, meanwhile, benefits from an uninterrupted, personalised advisory session. Keeping minutes of the meeting also minimises compliance risks.
“Understanding speech patterns and interpreting sequenced information are critical success factors.”
More than comprehension
In advisory-led sectors such as banking and insurance, digital assistants can help human advisors to do their jobs by relieving them of repetitive or administrative tasks like minute-taking. The advisor can then give the client their full attention and respond more professionally to their needs. This leads to a better outcome for the client and greater satisfaction all round. Advisors have more time for human interaction and for critical business tasks such as making difficult decisions or analysing complex situations.
Having a voice assistant attend and record meetings also adds a safeguard to the advisory process. For example, if a client places trades that exceed the limits agreed with the bank, the digital assistant will flag them. Automatic controls like these help to stop risk tolerances from being breached.
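One simple way to picture such a control is a rule that totals the orders captured during the meeting and compares them with the limits agreed for each position. The following is a minimal sketch, assuming the orders have already been extracted from the conversation and that limits are expressed as a maximum order value per security; the `Order` type, the figures and the limit structure are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Order:
    security: str
    quantity: int
    unit_price: float  # assumed to be in CHF


def flag_limit_breaches(orders: List[Order], limits: Dict[str, float]) -> List[str]:
    """Return securities whose total order value in this session exceeds the agreed limit.

    Securities without an agreed limit are never flagged.
    """
    totals: Dict[str, float] = defaultdict(float)
    for order in orders:
        totals[order.security] += order.quantity * order.unit_price
    return sorted(
        security
        for security, total in totals.items()
        if security in limits and total > limits[security]
    )


orders = [
    Order("Company A", 5, 120.0),  # total 600
    Order("Company B", 2, 900.0),  # together with the next order: 2700
    Order("Company B", 1, 900.0),
]
limits = {"Company A": 1000.0, "Company B": 2000.0}
print(flag_limit_breaches(orders, limits))  # ['Company B']
```

The function only flags; whether a flagged order is stopped or discussed with the client remains the advisor's decision.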
Acceptance factors
Several factors determine whether people will accept a digital assistant. Quality is key: an assistant is of no help if its error rate is too high. The technology has to recognise different speakers, changes of language and different dialects. It also has to understand speech patterns, interpret the back-and-forth through which information builds up, and pick up on emphasis. All of these are critical acceptance factors.
In the business setting, the confidentiality of advisory sessions is another important aspect. Sensitive discussions demand security and discretion. If you're going to use a voice assistant, everyone present has to agree. It can help to clarify its role and the added value it delivers. The reason data is being gathered should also be explained in advance, and the client should be clear about exactly what happens to their information. Are recordings saved verbatim or used for training purposes? Being able to switch the assistant on and off as the situation demands is also important.
Ultimately, the assistant has to be a help, not a hindrance. Depending on the situation, it may sit passively in the background, emit low-key signals or take an active part in the conversation through voice output. It must be configured to fit the business environment so that it adds value for everyone.
VUI offers huge development potential, and assistants can already listen in on and understand general conversation. Nevertheless, some practical obstacles remain.
The problem of pronunciation
There is no single Swiss German language. Every area has its accents, every region its dialects. If even the Swiss sometimes struggle to understand each other, what hope is there for anyone else? There is huge linguistic variation but no standard rules, so it's no wonder some software struggles. At the same time, each variant is spoken by relatively few people. That is a problem, because vast amounts of data are required to train machine learning models to understand dialogue using natural language processing (NLP). As yet, nobody has produced such a language corpus for Swiss German.
Focus on domain relevance
The more general the conversation, the tougher the task for a digital assistant. To achieve better results, it therefore makes sense to focus on the relevant sections of the discussion, in which the assistant is expected to provide support. This limits the vocabulary required, makes expressions and contexts clearer, and reduces errors. Business applications have an advantage here because they can be restricted to the relevant phases without neglecting the job they are there to do. The major AI players may have access to larger volumes of data, but they lack focus and specialisation, so their assistants are still very limited in a specialist domain like banking or insurance.
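To illustrate why a narrower vocabulary helps: if the assistant only needs to cover investment-advice terms, a misrecognised word can be snapped to the nearest term in a short domain list, while everyday words are left alone. The following is a minimal sketch that uses the similarity matching in Python's standard `difflib` module as a stand-in for the far more sophisticated domain adaptation used in real speech recognisers; the term list and the similarity cutoff are hypothetical.

```python
import difflib
from typing import List

# Hypothetical domain vocabulary for an investment-advice session.
BANKING_TERMS = ["shares", "bonds", "dividend", "portfolio", "risk", "fund", "limit"]


def snap_to_domain(tokens: List[str], vocabulary: List[str], cutoff: float = 0.75) -> List[str]:
    """Replace each recognised token with its closest domain term, if one is similar enough.

    Tokens with no sufficiently similar domain term are kept unchanged.
    """
    result = []
    for token in tokens:
        match = difflib.get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
        result.append(match[0] if match else token)
    return result


# Raw recogniser output with two misrecognised domain words.
print(snap_to_domain(["the", "dividen", "and", "portfollio"], BANKING_TERMS))
# ['the', 'dividend', 'and', 'portfolio']
```

The smaller and more specific the list, the less likely an ordinary word is to be "corrected" into a domain term by mistake; the cutoff trades recall against such false corrections.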
Beating barriers to multi-modal interaction
In many cases, an audio track alone is not enough. Multi-modal interaction uses other channels alongside voice. Back to Mr Bank client: he gestures to a diagram on a piece of paper and asks what it means. A purely audio-based assistant lacks the visual context to understand the question. Sometimes the advisor can solve this by stating the implicit context explicitly. Physical visualisations, such as projections onto a sheet of paper, can also supply the necessary context and thus support language comprehension. Interpreting visual data directly is also possible, although it is no small technical feat.
A journey
By actually understanding what we say, voice assistants take us into uncharted waters. They go far beyond simple spoken commands and are developing towards increasingly free, natural language without specific trigger words. The added value is clear, but exactly how the interaction must look and sound to convince clients, advisors and companies of its benefits remains to be seen.
Potential barriers should be seen as an opportunity to improve the client experience. It can be made more interactive, for example, by linking physical elements with virtual content. This, in turn, delivers useful input for improving the quality of advisory sessions, while giving the client a voice experience that is even more user-friendly and easy on the ear.
It is only a matter of time until the combination of language and artificial intelligence is so advanced that voice assistants are an intrinsic part of the business world. Our working lives are changing fast and the future will require us to work closely with our digital colleagues. One thing should not be forgotten, however: machines are there to help humans, not replace them.