CLi37 · ‎03-15-2015

In a quiet environment, voice recognition worked excellently. But in an environment with background noise, the recognition rate dropped sharply. This is a problem for real applications.

samontab · ‎03-15-2015

The good old cocktail party effect...

Brennon_W_ · ‎03-17-2015

That is always going to be the case with most voice rec tech.

I've found that the recording quality of audio streams used for recognition is almost always quite low. My understanding is that this is a performance trade-off: a high bit rate would take much longer to process and would also cause memory constraints that translate into shorter lengths of audio being processed (at least in an async recognition service that takes the context of the entire sentence into account).
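To put rough numbers on the bit rate trade-off, here is a minimal sketch of the buffer-size arithmetic for uncompressed PCM audio. The function name `pcm_buffer_bytes` and the two example formats are my own assumptions for illustration; they are not taken from any particular recognition SDK.

```python
def pcm_buffer_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Bytes needed to hold uncompressed PCM audio of the given length."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds

# Hypothetical example formats (assumptions, not SDK defaults):
# a low-rate speech format versus CD-quality stereo, 10 seconds each.
speech_bytes = pcm_buffer_bytes(16000, 16, 1, 10)
cd_bytes = pcm_buffer_bytes(44100, 16, 2, 10)

print("16 kHz mono:     ", speech_bytes, "bytes")
print("44.1 kHz stereo: ", cd_bytes, "bytes")
print("ratio:           ", cd_bytes / speech_bytes)
```

Under these assumptions the higher-quality format needs about 5.5 times the memory for the same sentence length, and the recognizer has that many more samples to process.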

The only way around this is to concentrate at a hardware level on noise reduction.

I wish it was much better also.

Best of luck.

Cheers

samontab · ‎03-17-2015

I think the problem is that our brains are so great at their job that most people assume that a lot of "trivial tasks", like listening to a conversation in a noisy environment, will be easy for a computer when in fact they are really hard.

It's good to have a brain :)

CLi37 · ‎03-17-2015

Brennon W. wrote:

That is always going to be the case with most voice rec tech.

I've found that the recording quality of audio streams used for recognition is almost always quite low. My understanding is that this is a performance trade-off: a high bit rate would take much longer to process and would also cause memory constraints that translate into shorter lengths of audio being processed (at least in an async recognition service that takes the context of the entire sentence into account).

The only way around this is to concentrate at a hardware level on noise reduction.

But even white noise can make voice recognition worse in RS. There are many methods for cancelling noise, and the mic array was supposed to work better. Slow computing speed can be solved by software optimization or hardware acceleration. If the basic functions are not fast enough, complex applications may become unusable, e.g. running voice recognition, face recognition, and gesture recognition in multiple threads.
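As a rough illustration of why a mic array is expected to help with white noise, here is a minimal simulation that averages several microphones. It assumes the speech arrives time-aligned at every mic and that each mic adds independent Gaussian white noise; the function names and parameters are hypothetical, and this is not a description of how RS actually processes its array.

```python
import math
import random

def snr_db(clean, noisy):
    """Signal-to-noise ratio in dB, given the clean signal and a noisy copy."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy))
    return 10 * math.log10(signal_power / noise_power)

def simulate_mic(clean, noise_std, rng):
    """One microphone: the clean signal plus independent white noise."""
    return [s + rng.gauss(0.0, noise_std) for s in clean]

def average_mics(recordings):
    """Average time-aligned recordings sample by sample."""
    count = len(recordings)
    return [sum(samples) / count for samples in zip(*recordings)]

rng = random.Random(0)  # fixed seed so the run is repeatable
sample_rate = 16000
# One second of a 440 Hz tone stands in for speech (an assumption).
clean = [math.sin(2 * math.pi * 440 * t / sample_rate)
         for t in range(sample_rate)]

mics = [simulate_mic(clean, 0.5, rng) for _ in range(4)]
single_snr = snr_db(clean, mics[0])
array_snr = snr_db(clean, average_mics(mics))

print("single mic SNR (dB):", round(single_snr, 1))
print("4-mic average SNR (dB):", round(array_snr, 1))
```

With independent noise, averaging N mics cuts the noise power by a factor of N, so four mics should gain roughly 10*log10(4), about 6 dB. Real rooms are harder: background talkers and reverberation are correlated across mics, so the gain in practice depends on steering delays and geometry.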

Poor voice recognition with background noise