Intel Poland: Creating the voice- and vision-enabled assistive devices of the future

We_Are_Intel1 · 02-19-2020

Learn how this software engineering manager and his team in Gdansk are using artificial intelligence to train your smart assistants to be smarter.

Meet Grzegorz Nawrot, a software engineering manager in the Infrastructure and Platform Solutions Group (IPSG) at Intel Poland. He and his team are developing solutions for smart assistants for global vendors like Amazon and Microsoft.

We recently spoke with Grzegorz about his career, the impact of artificial intelligence in assistive technologies, and what IPSG looks for in an ideal candidate.

What does your team do and what projects are you working on?

Our domain is everything that has to do with audio, including voice and speech. We deliver an audio solution based on the integrated audio digital signal processor (DSP). Basically, it is a dedicated audio coprocessor to which the processing of any audio content can be offloaded.

Our most visible product is Intel® Wake on Voice (Intel® WOV). It was developed in collaboration with teams in Poland and Germany. The technology supports multilingual recognition of a spoken key phrase, e.g. “Hi, Alexa.” The low-power audio DSP detects that key phrase, which triggers the voice assistant to respond to the question that follows it.

How does this neural network-based processing work? We train the model to recognize a specific key phrase and to respond with a clear indication of how certain it is that the key phrase was detected. Every language requires a different model. We have a fully automated, repeatable process for obtaining the language model from the training material: for example, 50,000 key phrases spoken by different people of all ages (children, women, men) with different dialects of a given language. This material is then used to train the neural network model.
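To make the idea of a model that reports its certainty more concrete, here is a minimal sketch of how per-frame confidence scores could be turned into a wake decision. It is not Intel WOV's implementation: the function name `detect_key_phrase`, the moving-average smoothing, and the `threshold` and `window` values are all assumptions chosen for illustration, and the scores are made-up numbers standing in for a trained model's output.

```python
from collections import deque
from typing import Iterable, Iterator, Tuple


def detect_key_phrase(
    frame_scores: Iterable[float],
    threshold: float = 0.8,   # hypothetical confidence needed to wake
    window: int = 5,          # hypothetical number of frames to average
) -> Iterator[Tuple[int, float]]:
    """Yield (frame_index, confidence) when the smoothed score reaches the threshold.

    frame_scores stands in for the per-frame certainty (0.0 to 1.0) that a
    trained key-phrase model might emit; no real model is involved here.
    """
    recent = deque(maxlen=window)  # sliding window of the latest scores
    armed = True                   # prevents repeated triggers for one utterance
    for index, score in enumerate(frame_scores):
        recent.append(score)
        confidence = sum(recent) / len(recent)  # moving average smooths spikes
        if armed and confidence >= threshold:
            armed = False
            yield index, confidence
        elif confidence < threshold:
            armed = True  # re-arm once the confidence has dropped again


if __name__ == "__main__":
    # Made-up scores: quiet, then a spoken key phrase, then quiet again.
    scores = [0.1, 0.2, 0.9, 0.95, 0.97, 0.99, 0.98, 0.2, 0.1, 0.1]
    for index, confidence in detect_key_phrase(scores, threshold=0.8, window=3):
        print(f"wake at frame {index} (confidence {confidence:.2f})")
```

With these example scores the detector fires once, at frame index 4, and stays quiet until the smoothed confidence has fallen back below the threshold. Averaging over a short window is one simple way to stop a single noisy frame from waking the assistant.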

We have an anechoic chamber in Gdansk. It is isolated from all ambient noise and ensures there is no reflection of sound. We use this environment to test our audio solutions and verify the quality of our intellectual property, ensuring that we deliver the targeted parameters of every solution.

What is the purpose and impact of this technology?

With any new technology, usability is critical. If you try a new feature and it is neither precise nor user-friendly, you stop using it. The trend has been for interfaces to move from text to touch, and now we are moving to voice and speech.

Our Intel WOV solution is integrated as the hardware keyword spotter for Cortana. We worked with Microsoft to enable this support, and with Amazon to deliver the brand-new Alexa for PC incorporating Intel WOV.

We are not just targeting PCs. We have Intel WOV on smart devices like intelligent refrigerators that allow people to ask what is inside and generate a shopping list.

For people to adopt these new voice- and speech-based user interfaces, we must ensure very good quality and precision in recognition. We are improving recognition both of speech and of input from additional sensors. For example, we are also working on integrating vision. Future devices are going to be equipped with capabilities like these, which will improve people’s interactions with them. Vision could simply be used to detect that somebody is approaching the device. This would switch on the device, so it is ready to respond to your interactions. You would not need to touch it or talk to it, just get close to it.

What makes your group successful?

Our pursuit of innovation and driving solutions. For example, with Intel WOV, we are already on the third generation of our solution. We are developing technologies fully based on neural networks and capable of learning much faster. This is innovation. This is also made possible by our close cooperation with our partners.

There is no routine. Every day is different from the previous one. The pace of the changes that we introduce and collaborate on with our partners is really the thing that drives everybody forward. It is very satisfying when you see solutions you have worked on delivered to the market, accepted by people, used, and enjoyed.

What would the ideal candidate joining your team bring to the table?

We are always looking for experts in the areas of audio, voice, and speech, and for programmers comfortable with C, C++, and Java. However, the most important thing we look for in any candidate is an eagerness to learn new things.

If you look at a person’s education, especially for people who have only recently graduated from university, of course they have not had a chance to do anything close to what we are working on at Intel Poland.

But if they have taken opportunities to try different things, and if they can quickly learn and jump from one area to another, this shows the capability for learning that we value in our candidates.

Interested in opportunities at Intel Poland? Check out available openings here: https://www.intel.com/content/www/us/en/jobs/locations/poland.html