Fujitsu Laboratories has developed voice-analysis technology that automatically identifies the points in a conversation at which customers feel satisfied or dissatisfied with service personnel.
Service provided by personnel in support of customers, such as at call centers and bank teller windows, directly relates to the company’s image, making training critical. Given this, in the past, efforts have been made to convert conversations with customers to text with speech recognition technology in order to grasp a customer’s sense of satisfaction. However, there are cases where the same words can convey either satisfaction or dissatisfaction, depending upon the tone of voice. For this reason, based on word content alone, it was impossible to fully grasp the customer’s sense of satisfaction.
Now, using a method that takes into account not only the average pitch of the voice and the degree of variation, but also the characteristic changes at relative points within voice data that covers multiple words – namely at the start or end of speech – Fujitsu has succeeded in a highly accurate quantification of voice cheerfulness. In addition, using machine learning in conjunction with customer-service evaluations, Fujitsu has developed a technology that automatically identifies the times in a conversation when a customer is satisfied or dissatisfied, achieving roughly 70% accuracy when compared with determinations made by human listeners.
In a field trial using this technology in Fujitsu Limited and Fujitsu FSAS Inc. call centers, Fujitsu confirmed increases in the efficiency of training, such as monitoring and evaluation of support personnel and feedback on the results, reducing the time required by about 30%. Moreover, there was a greater degree of acceptance by both the evaluator and the person being evaluated due to increased objectivity.
Going forward, Fujitsu Laboratories intends to incorporate this technology into Human Centric AI Zinrai, the AI technology of Fujitsu Limited, and offer it as a product for use in customer-service evaluations and in the training of service personnel in a variety of enterprises that emphasize communication, such as banks and retail stores.
In communication, whether remote, as in call centers and distance learning services, or face-to-face, as at bank teller windows, the service provided to customers by service personnel is directly related to the company’s image. This means that training for service personnel is seen as extremely important.
Until now, information was sometimes gathered through customer surveys and used in training, but in many cases, the only data that could be obtained was the overall evaluation of the service. This made it difficult to determine what service personnel did wrong in the conversation. Also, in addition to analyzing purchasing history for individual customers, surveys, and other marketing activities, there is a need to discern from the customer’s actual voice, as heard in call centers, what the customer wants with regard to specific products or services, for example, what the customer’s requirements are or what points need improvement. It is therefore hoped that data from conversations gathered in service situations where personnel interact directly with customers can be used in such areas as evaluating the quality of service, training employees, and developing marketing strategies.
Previously, there were efforts to grasp customer feelings by converting conversations with customers to text using voice recognition software. However, actual conversations do not always follow ordinary grammar and can be affected by surrounding noise, so converting them to text posed many technical difficulties, including frequent conversion errors. In addition, even when people use the same words, the meaning may change in a variety of ways depending on their emotions, so it was difficult to correctly interpret the customer’s emotions using methods that analyze speech that has been converted to text.
About the Technology
Fujitsu has now developed technology to automatically determine times in a conversation when customers feel satisfied or dissatisfied based on features of their voices in relation to the way they are speaking with service personnel. With this technology, customer service personnel and service providers can rapidly evaluate service content and improve support methods. Features of the newly developed technology are as follows.
Technology to quantify “voice cheerfulness” from patterns of changes in voice pitch
A cheerful voice is ordinarily one with a high tone, or one in which the voice’s tone and volume change a great deal. Moreover, Fujitsu Laboratories has determined through proprietary research that cheerful voices have unique characteristics of change at the beginnings and endings of statements. As a result, by using a method that takes into account unique changes at relative positions in voice data across multiple words, in addition to analyzing a voice’s average pitch and changes, Fujitsu Laboratories has succeeded in accurately quantifying the cheerfulness of a voice (Figure 2).
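The kinds of features described above can be illustrated with a minimal Python sketch. It is not Fujitsu's proprietary method: it assumes a per-frame pitch contour (in Hz) has already been extracted for one utterance, and the function names, the `edge_fraction` parameter, and the choice of a least-squares slope to describe the beginning and ending of the utterance are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def slope(values):
    """Least-squares slope of values against their frame index (0.0 if fewer than 2 points)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def pitch_features(pitch_hz, edge_fraction=0.2):
    """Summarize one utterance's pitch contour (hypothetical feature set).

    pitch_hz: per-frame pitch estimates in Hz for the voiced frames.
    edge_fraction: assumed share of frames treated as the "start" and "end".
    """
    if not pitch_hz:
        raise ValueError("pitch_hz must contain at least one frame")
    edge = max(2, int(len(pitch_hz) * edge_fraction))
    return {
        "mean_pitch": mean(pitch_hz),            # overall height of the voice
        "pitch_variation": pstdev(pitch_hz),     # how much the pitch moves
        "start_slope": slope(pitch_hz[:edge]),   # pitch change at the beginning
        "end_slope": slope(pitch_hz[-edge:]),    # pitch change at the ending
    }


features = pitch_features([100, 110, 120, 130, 140, 140, 140, 140, 130, 120])
```

In this sketch a rising start and a falling end appear as a positive `start_slope` and a negative `end_slope`. How such features map to a cheerfulness score is not disclosed in the source and is left out here.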
In addition, because perceived voice cheerfulness correlates strongly with degree of satisfaction, Fujitsu Laboratories has quantified the sense of satisfaction in a conversation by applying a proprietary conversion formula, based on survey results, to the quantified cheerfulness of a voice (Figure 3). By combining this with customer-service evaluations, and using machine learning to find a threshold point between satisfaction and dissatisfaction, Fujitsu Laboratories has developed technology to automatically identify times during a conversation when the customer is satisfied or dissatisfied. Because the threshold is learned, a customer adopting the technology can feed in the results of its own on-site service evaluations, so the evaluation criteria can be customized for each location.
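As a rough illustration of learning a threshold point from evaluations, the following Python sketch searches for the cut-off on a satisfaction score that best agrees with human labels. The conversion formula and the learning method actually used are proprietary and not described here, so this simple exhaustive search, the function name `learn_threshold`, and the sample data are assumptions only.

```python
def learn_threshold(scores, labels):
    """Return (threshold, accuracy) for the rule "score >= threshold means satisfied".

    scores: satisfaction scores for conversation segments (assumed already computed).
    labels: True if a human evaluator judged the segment satisfied, else False.
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and the same length")
    ordered = sorted(set(scores))
    # Candidate cut-offs: below every score, between neighbouring scores, above every score.
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)

    best_threshold, best_accuracy = None, -1.0
    for t in candidates:
        correct = sum((s >= t) == y for s, y in zip(scores, labels))
        accuracy = correct / len(scores)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


# Hypothetical evaluations from one site.
threshold, accuracy = learn_threshold([0.2, 0.4, 0.6, 0.8], [False, False, True, True])
```

Running the same function on each site's own evaluations would give a separate threshold per location, which is one plausible reading of the per-location customization mentioned above.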
With this technology, Fujitsu Laboratories was able to determine times of satisfaction and dissatisfaction in conversations in customer service situations with about 70% accuracy compared with results of a determination made with the human ear. This has made it easier, in training customer service personnel, for the person receiving the results to understand which parts of the conversation were good or bad, leading to efficient improvement in service skills.
This technology was field tested in three call centers operated by Fujitsu and Fujitsu FSAS. In addition to this technology, the trial used evaluation tools that also implemented technology to identify problem areas in customer interaction, such as when service personnel spoke over customers or paused too long before responding. The field test confirmed that training, which consists of monitoring and evaluating service personnel and providing feedback on the results, became considerably more efficient, with a 30% reduction in the time required. It also confirmed that both the evaluator and the person being evaluated accepted the evaluation significantly more readily because of its increased objectivity.
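The two interaction problems mentioned above, speaking over the customer and pausing too long before responding, can be sketched in Python as follows. The sketch assumes that speaker turns with start and end times are already available. The `Turn` type, the `find_issues` function, and the 2-second pause limit are hypothetical and do not describe the evaluation tools used in the trial.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str   # "agent" or "customer"
    start: float   # seconds from the beginning of the call
    end: float


def find_issues(turns, max_pause=2.0):
    """Flag agent responses that overlap the customer or follow a long silence.

    Only consecutive customer-to-agent turn pairs are examined; max_pause is
    an assumed limit in seconds.
    """
    issues = []
    ordered = sorted(turns, key=lambda t: t.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.speaker == "customer" and cur.speaker == "agent":
            gap = cur.start - prev.end
            if gap < 0:
                issues.append(("overlap", cur.start))    # agent began before customer finished
            elif gap > max_pause:
                issues.append(("long_pause", prev.end))  # silence started when customer stopped
    return issues


call = [
    Turn("customer", 0.0, 5.0),
    Turn("agent", 4.5, 8.0),
    Turn("customer", 8.5, 12.0),
    Turn("agent", 15.0, 18.0),
]
issues = find_issues(call)
```

Each flagged item carries a timestamp, so an evaluator could jump directly to the relevant part of the recording.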
This technology can be expected to lead to improvements in customers’ sense of satisfaction and training efficiency due to increased service skills in a variety of customer contact situations, including call centers.