Data is the fuel that marketing runs on.
As voice interfaces become more popular, much of the data about how users are interacting through natural speech does not make its way to marketers.
When it does, marketers are not yet fully equipped to analyze how voice interaction works. The reason: voice interaction is different in many ways from the text- and click-based interaction that digital marketers have previously encountered.
In other words, a game-changing interface is taking over, but in many ways, marketers are encountering a kind of black box.
Two kinds of voice interfaces
To understand how voice interfaces differ, it’s necessary to understand the two major categories they fall into.
Pulse Labs CEO Abhishek Sutham shared with RampUp that one category is essentially a voice recognition system offering command and response, such as voice commands in cars and cable TV systems. It translates the spoken command into text, and a text-based command system then operates on that text from there.
The other kind of voice interface is the intelligent voice agent connected through the cloud to databases and a centralized intelligence, such as Amazon’s Alexa or Google Assistant. “Intelligent voice agents are where all voice interfaces are headed,” Sutham said.
Two of the most common use cases for intelligent voice agents are for searches and as interfaces for branded custom applications.
An example of a voice-based search is someone asking, “Thai restaurant nearby,” and the voice assistant returning a handful of options.
Sutham said that voice-driven searches currently account for about 13% of all searches and about 20% of mobile searches. Research firm eConsultancy predicts that 30% of all searches will be voice-based by next year.
Branded custom applications
Branded custom applications, as their name suggests, enable companies to get more creative and place content within voice data platforms. One example, from Capital One, reports a credit card balance. Another, from wine retailer MySomm, can tell you what kind of wine pairs best with a certain dish or cuisine.
Pulse Labs has recently formed a partnership with marketing research firm Kantar to develop branded and syndicated reports about data from branded custom voice applications. These are called Skills on Alexa devices and Actions on Google Assistant ones.
These reports effectively act as measurement reports for brands, sharing how effective the Skills are. Outside of such reports, the data from branded custom applications often remains with the brand or the platform.
Reports from Pulse focus on the usability of these applications, based on the behavior of a selected panel of remote users. But, Sutham said, the application data available to analysts outside the platforms is limited, providing only the intent and what he called “the slot,” or parameters, of the user’s request.
For example, a user might ask the Delta Airlines-created Skill on an Alexa smart speaker for flight options from Los Angeles to Seattle on Thursday morning. The data delivered to Pulse shows the user is looking for a flight with this departure location, this destination, the requested dates and times, and other specific parameters.
What is not available to Pulse, however, is the wording of the verbal request, or any information about the tone.
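To make the intent-and-slot idea concrete, here is a minimal sketch of the kind of payload a Skill receives for the flight request above. The field names loosely follow the Alexa Skills Kit request format, but the exact structure and slot names are assumptions for illustration, not the platform's actual schema:

```python
# Hypothetical sketch of an intent/slot payload; field names only
# approximate the Alexa Skills Kit request format.
request = {
    "intent": {
        "name": "SearchFlightsIntent",
        "slots": {
            "origin": {"name": "origin", "value": "Los Angeles"},
            "destination": {"name": "destination", "value": "Seattle"},
            "departureDay": {"name": "departureDay", "value": "Thursday"},
            "timeOfDay": {"name": "timeOfDay", "value": "morning"},
        },
    }
}

def extract_slots(req):
    """Return {slot_name: value} — the only signal the analyst sees."""
    return {name: slot["value"] for name, slot in req["intent"]["slots"].items()}

slots = extract_slots(request)
# Nothing in this payload captures the user's wording or tone —
# "flight options from Los Angeles to Seattle" and "ugh, get me out of
# LA to Seattle Thursday morning" produce the same slots.
```

The point of the sketch is what is absent: the structure carries the parameters of the request, but the conversational signals are stripped out before the data ever reaches the brand.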
In other words, the brand behind the application has no insight into the various signals—some obvious, some hidden—inside a conversational request. (As with virtually all kinds of voice interaction, the brand also has little insight into how or whether the user has given any kind of consent for use of their voice-related data, except that there is often some consent language included when a custom app is installed. We’ll cover this topic in a future piece.)
It’s as if a salesperson at a high-end clothing retailer were handed a note saying a visitor to the store wants a suit of this color and this size, without hearing the request spoken by the customer. Spoken aloud, the request could include hints that the customer wanted a specific brand but was less decisive about color and had questions about style.
“The main thing in voice is it’s very nuanced,” Sutham said. On the web, he added, a marketer can guide a user through at least some part of a brand-desired journey. “In voice, the user is in the driver’s seat.”
For voice-controlled applications, the conversational details about how the apps are driven are largely hidden from the marketer. The platform (such as Amazon, Google, or Apple) has access to those details, but Sutham said they aren’t made available to outsiders.
Even if the exact wording and tone of the voice request were available, Sutham points out that it would lack benchmarked context.
For example: if a Skill has 173 users who buy the product within 24 hours of interacting with the application, is that a good or bad result? At this point in the evolution of voice interfaces, there are few yardsticks.
Last year, UK-based digital agency Roast launched what it said was the first Voice Search Ranking Report for Google Assistant. Pulse and Roast are among the first to offer voice-specific data.
Released quarterly, the report—now under the aegis of Roast’s voice-focused companion agency, Rabbit & Pork—looks at how Google Assistant responds to more than 10,000 key phrases, such as “What’s the best HDTV?”
While Pulse reports the results of specific parameter requests to applications without reference to the exact language, Rabbit & Pork tracks the results from common key phrases for voice search.
But Rabbit & Pork also faces broad areas of terra incognita in voice search, for at least two reasons: voice returns very few results, and requests are conversational by nature.
Here, said Rabbit & Pork Managing Director and Founder John Campbell, is a typical result his agency has found for this specific HDTV request:
User – Hey Google, what is the best HD TV?
Google Assistant – I found a list on the website tomsguide.com. Here are the first three: LG C9 OLED TV; the best 4K TV overall. Vizio M Series Quantum; our favorite smart-TV value. And TCL 6-Series 65-inch Roku TV; a killer big-screen value with Roku. Do you want to hear more?
At the time of this query, the text results for the same search term showed Tom’s Guide as the third-placed site on the results page. Campbell noted that voice-based search “doesn’t always pull the first choice.”
Google Assistant pulls from different websites at different times, depending on its confidence in the accuracy of its sources, which is based on user responses and other factors.
In text-based search results, the top websites are shown, but voice search on a smart speaker generally gives only one to three results, so a variation in sources has a much bigger impact on responses. Additionally, instead of sourcing from websites, Google Assistant might pull a query about local pizza from, for example, Google My Business.
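The dynamic described above can be sketched in a few lines. This is a hypothetical illustration, not Google's actual selection algorithm: a text results page exposes the whole ranked list, while the single voice answer goes to whichever source the assistant is currently most confident in, which need not be the top-ranked text result.

```python
# Hypothetical sketch: why voice answers vary more than text results.
# The confidence scores below are invented for illustration.
text_ranking = ["site-a.example", "site-b.example", "tomsguide.com"]

def voice_answer(confidence):
    """Pick the single source with the highest confidence score."""
    return max(confidence, key=confidence.get)

# The third-ranked text result can still win the one voice slot
# if the assistant is most confident in its accuracy.
chosen = voice_answer({"site-a.example": 0.6,
                       "site-b.example": 0.5,
                       "tomsguide.com": 0.8})
# A text user still sees all three sites; a voice user hears only `chosen`.
```

Because the whole answer hinges on one winner-take-all choice, a small shift in the confidence scores swings voice visibility far more than it would move a site's position on a ten-link text page.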
Speech is different
How can a marketer plan for such variable and limited results? There are virtually no ads on smart speakers or other voice-only devices, and no way to directly buy first position. A marketer can help make a website’s information voice-friendly by employing voice schema and voice-friendly listings such as “#1 HDTV brand, #2 HDTV brand,” but this doesn’t offer the detail and reliable ordering of results that marketers know from text-based search.
And that’s not even accounting for the fact that it’s a spoken request.
Speech is a different animal than text. One user might say to Alexa on the Amazon Echo: “Italian restaurants near me.”
Another user might say: “Alexa, I’m really hungry. It’s raining outside, so tell me the Italian restaurants that deliver to my location.”
For text searches, humans have learned how to distill their search terms into phrasing the engine can parse. But once humans enter conversational mode, there can be as many variations as there are in human speech.
The conversational flavor
For marketers, none of the conversational flavor of requests, such as tone, urgency, or tentativeness, comes through in the first wave of voice data reports, like those from Pulse Labs or Rabbit & Pork, yet they are vital elements in how humans respond to requests or conversational interactions.
In some ways, the marketers’ dilemma in understanding the voice revolution is similar to the challenges faced inside augmented or virtual reality environments.
The ways users interact with product choices via voice or in virtual environments are fundamentally different from the ways they interact with a text-based, point-and-click browser.
The key reason: user interactions through voice, AR, and VR are coming closer to resembling human interactions in the physical world. As voice agents become more intelligent, the conversational nature of the voice interaction will become more prominent and more like talking to another human.
After 25 years of learning how to capture and analyze data from visual-based online interaction, marketers now need to acquire more data and develop better terms and context to better understand what happens when voice takes over the interaction.
As voice does take over, the entire infrastructure will need to better address how and when users consent to the use of the data in their spoken words. That piece has not been ironed out yet, but with consumers increasingly relying on voice assistants and data regulation growing, it is likely to be an area of focus in the future.