Authors
Xu Jing, Technical Consulting Engineer, Intel
Kyle Park, Cloud Software Development Engineer, Intel
Minkyu Joo, Account Executive, Intel
Soongu Kwon, Research Engineer, R&D Center, KT
Jaehan Park, Project Manager, R&D Center, KT
At a Glance
- South Korea’s telecoms leader worked with Intel to optimize and evaluate the performance of its new personalized text-to-speech (P-TTS) service.
- KT’s optimized code ran 22 percent faster on PyTorch 1.9 than on PyTorch 1.7 while maintaining voice quality and the number of concurrent user connections.
- KT can confidently deploy its high-volume P-TTS services on Intel® architecture, benefiting data center efficiency and total cost of ownership.
Executive Summary
As South Korea’s premier telecommunications company, KT Corporation needs powerful, cost-effective infrastructure for deploying innovative artificial intelligence (AI) services at scale. The company was especially interested in whether Intel® Xeon® Scalable processors could provide the performance, latency, and throughput for its growing portfolio of personalized text-to-speech (P-TTS) services. These services use deep learning to enhance voice-based services for businesses and consumers.
Technologists from KT and Intel worked together to optimize performance of the company’s P-TTS service. Software engineers used Intel® Extension for PyTorch*, which builds on the Intel® oneAPI Deep Neural Network Library (Intel oneDNN). Engineers tested the code on 2nd Gen Intel Xeon Scalable processors with PyTorch 1.9, which incorporates many enhancements from Intel Extension for PyTorch and Intel oneDNN.
The optimized solution met KT’s rigorous service level agreements (SLAs) for deploying its P-TTS services. Running the P-TTS service on Intel Xeon Scalable processors rather than GPUs can help KT reduce total cost of ownership (TCO) and use its server infrastructure efficiently.
Business Background
KT Corporation was founded in 1981 as Korea Telecom. Today, it is Korea’s largest comprehensive communications operator, with more than 53 million users of its services. Its portfolio includes mobile telecommunications, internet, fixed-line telephony, and other solutions. KT is also a leader in 5G networks, providing coverage to 85 cities and 1.41 million subscribers.[1]
KT’s growth plan includes commercializing a variety of convergence services based on intelligent networks. Services range from blockchain, to augmented reality and virtual reality (AR/VR), to connected cars. Services are deployed in media and content, finance, real estate, and other industries.
Personalized Text-to-Speech
Personalized text-to-speech adds deep learning to traditional text-to-speech, making it possible to read text in a customized voice that closely matches an input voice. P-TTS is an important part of KT’s new business portfolio. The company has incorporated P-TTS into services it delivers to millions of users.
KT’s P-TTS capabilities offer new opportunities in areas such as enterprise computing, retail, education, and consumer electronics. The global TTS market was valued at USD 2.0 billion in 2020. The market is predicted to grow to USD 5.0 billion by 2026, a compound annual growth rate (CAGR) of 14.6 percent.[2]
KT’s P-TTS solutions are helping enterprises evolve their call centers into AI-enhanced smart contact centers (AICCs). P-TTS and AI can supplement the work of counselors by acting as voice-enabled AI chatbots. This approach improves counselor productivity and speeds counselor training.
Enterprises also use P-TTS to generate real-time conversation logs and offer product recommendations and suggestions on the counselor’s screen. Banks, credit card companies, universities, and other organizations use KT’s P-TTS-enabled AICC solutions to improve customer responsiveness while reducing costs.[3]
In the consumer space, P-TTS can deliver satisfying, customizable experiences. For example, parents can train KT’s P-TTS-enabled virtual assistant, GIGA-Genie*, to read a bedtime story in the parent’s voice. KT has also used P-TTS and voice synthesis technology to create unique voices for deaf individuals based on family voices and speech patterns.
KT’s P-TTS Service Requirement
KT has a large private cloud based on Intel Xeon Scalable processors, and is eager to expand the cloud to support P-TTS and other AI-based services. Doing so enables KT to optimize the use of existing infrastructure and improve efficiency for its data centers. Deploying on Intel processors can also reduce TCO compared to deployment on GPUs, by offering lower energy consumption and power costs.
Before deploying the services on its Intel-based cloud, KT needed to know that the cloud would meet the company’s stringent SLAs for performance, latency, throughput, and voice quality. The company also wanted to be sure that the solution could accommodate varying numbers of concurrent user connections, including periods of peak connectivity.
[1] Caring about You: KT Integrated Report 2020, https://corp.kt.com/archive/ipgrpt/attach/2020/2020_ENG_Archive.pdf
[2] Markets and Markets, Text-to-Speech Market, https://www.marketsandmarkets.com/Market-Reports/text-to-speech-market-2434298.html
[3] KT Customer Service Case – Next-Generation Contact Center Future and AI Conference, April 20, 2021. Translated from Korean to English via Google Translate. https://blog.naver.com/songcoolsu/222317239061
Optimizing P-TTS with Intel® AI Technologies
KT developed its P-TTS software using PyTorch, an open-source framework for machine learning that was developed primarily at Facebook’s AI Research Lab. Intel works to optimize performance and time-to-market for PyTorch, adding features aimed at delivering outstanding performance out of the box on Intel platforms.
Depending on the optimization technique, this work takes advantage of the Intel® oneAPI Deep Neural Network Library (oneDNN) and upstreams pull requests to the official PyTorch GitHub repository. In other cases, Intel contributes direct operator optimizations to ATen, the PyTorch tensor library. Enhancements are often available first through Intel Extension for PyTorch, which is part of Intel oneAPI.
Intel oneDNN
Intel oneDNN is an open-source, cross-platform performance library for deep learning applications. Previously known as Intel® Math Kernel Library for Deep Neural Networks (Intel MKL-DNN), oneDNN aims to improve performance for deep learning applications and frameworks on Intel technologies. Intel oneDNN provides numerous optimizations, including support for Intel® Advanced Vector Extensions 512 (Intel AVX-512), whose wider 512-bit vector instructions deliver low-latency floating-point operations compared to the earlier Intel AVX2 instructions.
Intel oneDNN is distributed as part of the Intel oneAPI DL Framework Developer Toolkit and the Intel oneAPI Base Toolkit. It includes basic building blocks for neural networks optimized for Intel architecture processors, Intel processor graphics, and Intel® Xe graphics.
Intel oneDNN also offers experimental support for Arm 64-bit architecture (AArch64), NVIDIA GPUs, OpenPOWER Power ISA (PPC64), and IBM Z (s390x). The oneAPI toolkit provides a straightforward migration path for deep learning codes that have been developed and trained on GPUs.
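As a quick check, developers can confirm that their PyTorch build picks up oneDNN (still reported as MKL-DNN in the build configuration of the PyTorch versions discussed here) and which CPU instruction sets it can dispatch to. The snippet below is a minimal sketch; exact version strings and capability lists vary by PyTorch release.

```python
import torch

# Check whether the PyTorch build includes oneDNN (reported as "mkldnn"/"MKL-DNN"
# in the build configuration of the PyTorch versions discussed in this brief).
print("oneDNN available:", torch.backends.mkldnn.is_available())

# Print the full build configuration, which lists the bundled oneDNN/MKL versions
# and the CPU capabilities (such as AVX-512) the build can dispatch to.
print(torch.__config__.show())
```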
Intel Extension for PyTorch
Intel Extension for PyTorch is an open-source project maintained by Intel that, as the name suggests, extends the official PyTorch framework. The project provides features that further improve the out-of-the-box PyTorch experience, along with up-to-date Intel optimizations that boost the performance of deep learning models on Intel platforms.
Building on oneDNN, Intel Extension for PyTorch supports mixed precision operators and multiple data types. The software optimizes performance-critical DNN operations such as convolution and batch normalization, and supports a variety of graph fusion optimizations.
As part of the Intel® AI Analytics Toolkit powered by oneAPI, Intel Extension for PyTorch can be downloaded and installed from GitHub at no charge. Developers can take advantage of many of its improvements without changes to existing code.
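As an illustration, applying the extension to an existing inference model typically takes only a few extra lines. The sketch below uses the ipex.optimize() front end found in recent releases of the extension and a small placeholder network; KT’s actual P-TTS model and settings are not public and are not shown here.

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# A small placeholder network standing in for a P-TTS model.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(80, 80, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.conv(x))

model = TinyNet().eval()

# ipex.optimize() applies Intel's operator and graph optimizations
# (built on oneDNN) without changes to the model definition itself.
model = ipex.optimize(model)

# Run inference as usual; torch.no_grad() avoids autograd overhead.
mel_frames = torch.rand(1, 80, 200)  # batch, channels, time steps
with torch.no_grad():
    output = model(mel_frames)
```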
Results
The KT and Intel software engineers used the PyTorch profiler to analyze the solution’s performance, and used Intel Extension for PyTorch and oneDNN to explore opportunities for optimizing the code. They identified several operators that appeared to be possible bottlenecks and optimized them.
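A profiling pass of this kind can be reproduced with the built-in PyTorch profiler. The sketch below assumes PyTorch 1.8.1 or later (where the torch.profiler module was introduced) and uses a stand-in inference function rather than KT’s P-TTS pipeline.

```python
import torch
from torch.profiler import profile, ProfilerActivity

def run_inference(model, example_input):
    # Stand-in for one P-TTS synthesis request.
    with torch.no_grad():
        return model(example_input)

# Placeholder model and input; substitute the real model under test.
model = torch.nn.Sequential(torch.nn.Linear(256, 256), torch.nn.ReLU()).eval()
example_input = torch.rand(32, 256)

# Profile CPU activity and sort operators by total CPU time to surface
# the most expensive candidates for optimization.
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    run_inference(model, example_input)

print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```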
The engineers then evaluated the code on a newer version of PyTorch, version 1.9, which integrates a number of Intel optimizations along with upgraded versions of Intel MKL and Intel oneDNN.
The team found that real-time factor (RTF), a common performance measure for speech synthesis systems, had improved significantly from version 1.7 (Table 1). The optimized CPU-based solution increased RTF performance by 22 percent while maintaining voice quality and the number of connections.
Table 1. Real-Time Factor (RTF) Improvement on Bare Metal Servers+
| Metric | Cores | Number of instances | Ratio, PyTorch 1.9.1 vs. 1.7.1 (higher means 1.9.1 runs faster) |
| RTF@1 | 3 | 1 | 1.22 |
+Platform: 2nd Gen Intel Xeon Scalable processors, the CentOS 8* operating system, and PyTorch 1.7.1 and 1.9.1.
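For reference, real-time factor is commonly computed as the ratio of synthesis time to the duration of the generated audio, so values below 1.0 mean the system generates speech faster than real time. The sketch below illustrates the calculation with placeholder values rather than KT’s measured numbers.

```python
import time
import torch

def real_time_factor(synthesize, text, sample_rate=22050):
    """Return (rtf, seconds_of_audio) for one synthesis call.

    RTF = processing time / duration of generated audio.
    """
    start = time.perf_counter()
    waveform = synthesize(text)              # 1-D tensor of audio samples
    elapsed = time.perf_counter() - start
    audio_seconds = waveform.numel() / sample_rate
    return elapsed / audio_seconds, audio_seconds

# Placeholder synthesizer standing in for the P-TTS model:
# it sleeps briefly and returns one second of silence.
def fake_synthesize(text):
    time.sleep(0.05)
    return torch.zeros(22050)

rtf, seconds = real_time_factor(fake_synthesize, "Hello from the sketch")
print(f"RTF: {rtf:.3f} for {seconds:.1f} s of audio")
```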
Ongoing Collaboration
Satisfied with the optimization results, KT is deploying the new P-TTS service at scale on its current cloud infrastructure.
KT and Intel continue to collaborate on a range of topics. As part of this collaboration, KT plans to explore the impact of 3rd Gen Intel Xeon Scalable processors for P-TTS and other AI-enabled services. New 3rd Gen Intel Xeon Scalable processors provide integrated AI acceleration with Intel DL Boost technology.
These third-generation processors support new Intel AVX-512 instructions that use the bfloat16 (Brain Floating Point) format to improve the performance of many deep learning tasks. 3rd Gen Intel Xeon Scalable processors also handle mixed precision modes, offering the flexibility to choose INT8, FP16, and FP32 for increased throughput while maintaining accuracy.
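As a hedged sketch of how bfloat16 inference might be enabled on processors with this support, the snippet below assumes a PyTorch release with CPU autocast (1.10 or later) and a recent Intel Extension for PyTorch; it again uses a placeholder network rather than KT’s deployed model.

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

# Placeholder network standing in for a P-TTS model.
model = nn.Sequential(nn.Linear(256, 256), nn.ReLU()).eval()

# Ask the extension to prepare the model for bfloat16 execution.
model = ipex.optimize(model, dtype=torch.bfloat16)

# CPU autocast runs supported operators in bfloat16 and falls back
# to float32 where needed, trading a small amount of precision for throughput.
example_input = torch.rand(8, 256)
with torch.no_grad(), torch.cpu.amp.autocast(dtype=torch.bfloat16):
    output = model(example_input)
```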
Conclusion
As an international telecommunications innovator and South Korea’s largest telecom company, KT is a leader in creating AI-enhanced digital services for home, work, and on the go. By working closely with Intel and incorporating Intel Xeon Scalable processors into its P-TTS services, KT can continue to deploy these new services at scale on powerful, cost-effective infrastructure.
Learn more. Contact your Intel representative or visit intel.com/ai.
Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.
Your costs and results may vary.
Intel technologies may require enabled hardware, software or service activation.
Intel does not control or audit third-party data. You should consult other sources to evaluate accuracy.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.