In a quiet environment, voice recognition worked very well. But with background noise, the recognition rate dropped sharply. This is a problem for real applications.
The good old cocktail party effect...
That is always going to be the case with most voice rec tech.
I've found that the recording quality of audio streams used for recognition is almost always quite low. My understanding is that this is a performance trade-off: a high bit rate would take much longer to process and would also cause memory constraints that translate to shorter lengths of audio being processed (at least in an async recognition service that takes the context of the entire sentence into account).
The only way around this is to concentrate at a hardware level on noise reduction.
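To put rough numbers on the bit-rate trade-off described above, here is a minimal sketch that assumes uncompressed PCM audio. The two formats compared (16 kHz mono and 44.1 kHz stereo, both 16-bit) are illustrative assumptions, not the settings of any particular recognition service, and `pcm_bytes` is a hypothetical helper name.

```python
def pcm_bytes(sample_rate_hz, bits_per_sample, channels, seconds):
    """Return the size in bytes of uncompressed PCM audio."""
    bytes_per_sample = bits_per_sample // 8
    return sample_rate_hz * bytes_per_sample * channels * seconds


# Assumed, illustrative formats for a 10-second utterance.
speech_quality = pcm_bytes(16_000, 16, 1, 10)  # low-rate mono stream
cd_quality = pcm_bytes(44_100, 16, 2, 10)      # high-rate stereo stream

print(speech_quality)               # 320000
print(cd_quality)                   # 1764000
print(cd_quality / speech_quality)  # 5.5125
```

Under these assumptions the higher-quality stream carries about 5.5 times as much data for the same utterance, which is the kind of extra processing and memory cost the low recording quality is meant to avoid.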
I wish it was much better also.
Best of luck.
Cheers
I think the problem is that our brains are so great at their job that most people assume that a lot of "trivial tasks", like listening to a conversation in a noisy environment, will be easy for a computer when in fact they are really hard.
It's good to have a brain :)
Brennon W. wrote:
That is always going to be the case with most voice rec tech.
I've found that the recording quality of audio streams used for recognition is almost always quite low. My understanding is that this is a performance trade-off: a high bit rate would take much longer to process and would also cause memory constraints that translate to shorter lengths of audio being processed (at least in an async recognition service that takes the context of the entire sentence into account).
The only way around this is to concentrate at a hardware level on noise reduction.
But even white noise can make voice recognition worse in RS. There are many methods for noise cancellation, and the mic array was supposed to work better. Slow computing speed can be addressed by software optimization or hardware acceleration. If the basic functions are not fast enough, complex applications may become unusable, e.g. running voice recognition, face recognition, and gesture recognition in multiple threads.
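The multi-threaded scenario mentioned above could be sketched roughly as follows. The three recognizer functions are hypothetical stand-ins that only sleep and return a string; they are not calls into any real SDK, and `run_all` is an assumed helper name. The sketch only shows the shape of running the three tasks concurrently on one input frame.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for real recognizers; each just simulates work.
def recognize_voice(frame):
    time.sleep(0.05)
    return "voice:" + frame


def recognize_face(frame):
    time.sleep(0.05)
    return "face:" + frame


def recognize_gesture(frame):
    time.sleep(0.05)
    return "gesture:" + frame


def run_all(frame):
    """Run the three stand-in recognizers concurrently on one frame."""
    tasks = {
        "voice": recognize_voice,
        "face": recognize_face,
        "gesture": recognize_gesture,
    }
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        # Submit every task first so they overlap, then collect results.
        futures = {name: pool.submit(fn, frame) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


results = run_all("frame-0")
print(results["voice"])  # voice:frame-0
```

Because `run_all` waits for every task before returning, the slowest recognizer sets the overall latency, which is why each basic function has to be fast on its own. In Python specifically, threads only overlap when the work releases the interpreter lock (as `time.sleep` or native library calls typically do); CPU-bound pure-Python recognizers would need processes instead.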