Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Voice Activity Detection

Pablo_Royo
Beginner
277 Views

Hi

I am testing your VAD example. It works, but it seems quite dependent on the audio frame size. It works quite well for a frame length of 32 ms of speech, but when I change that value to fit my requirements (80 ms), it performs badly. Is there any reason why IPP VAD is tuned specifically to this value? Also, is there any document apart from the VADcommon.h header file that explains how to select proper values for all the VAD parameters?

Thanks

1 Reply
Vyacheslav_Baranniko
New Contributor II
277 Views

Hello Pablo,

Thanks for using IPP VAD.

I guess you are using the one from the speech-recognition samples, right? Actually, its functionality was ported to another sample, IPP NR, which is located in the speech-codec samples. This IPP NR sample supports noise reduction along with several VAD algorithms: G723, G729, and the one ported from speech-recognition (i.e. IPP_VAD).

All VAD algorithms provided by IPP have hardcoded frame sizes: G723 (30 ms) and G729 (10 ms) are narrowband VADs, while AMRWB (20 ms) and the original IPP VAD (16 ms) are wideband VADs. Please note that IPP VAD supports a 16 ms frame. The 32 ms value is the size of its internal decision window buffer, which has a 16 ms overlap, so its frame size is still 16 ms.
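To make the relationship between the 16 ms frame and the 32 ms decision window concrete, here is a minimal sketch in plain C. It does not call IPP and is not taken from the IPP sample: the 16 kHz sample rate, the `DecisionWindow` type, and the `window_init`/`window_push` helpers are assumptions introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption for illustration: 16 kHz wideband sampling. */
#define SAMPLE_RATE_HZ 16000
#define FRAME_MS       16   /* frame size stated in the reply */
#define WINDOW_MS      32   /* decision window size stated in the reply */

#define FRAME_LEN  (SAMPLE_RATE_HZ * FRAME_MS / 1000)   /* 256 samples */
#define WINDOW_LEN (SAMPLE_RATE_HZ * WINDOW_MS / 1000)  /* 512 samples */

/* Hypothetical container for the sliding decision window. */
typedef struct {
    short  samples[WINDOW_LEN];
    size_t frames_seen;
} DecisionWindow;

void window_init(DecisionWindow *w)
{
    assert(w != NULL);
    memset(w, 0, sizeof *w);
}

/* Slide the window by one frame: the oldest 16 ms is dropped and the
 * newest 16 ms is appended, so consecutive windows overlap by 16 ms. */
void window_push(DecisionWindow *w, const short *frame)
{
    assert(w != NULL && frame != NULL);
    memmove(w->samples, w->samples + FRAME_LEN,
            (WINDOW_LEN - FRAME_LEN) * sizeof(short));
    memcpy(w->samples + (WINDOW_LEN - FRAME_LEN), frame,
           FRAME_LEN * sizeof(short));
    w->frames_seen++;
}
```

Under these assumptions the caller still feeds 16 ms (256 samples) at a time. The 32 ms figure only describes how much history a decision looks at.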

I think both AMRWB (20 ms) and IPP VAD (16 ms) will suit your wideband 80 ms task.
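One way to read this suggestion is that an 80 ms buffer can be cut into whole frames of the size the VAD expects (five 16 ms frames or four 20 ms frames) and the per-frame decisions combined afterwards. The sketch below shows that splitting in plain C. It is an illustration under assumptions, not the IPP sample code: `FrameVadFn`, `energy_vad`, `count_active_frames`, and the threshold of 500 are hypothetical, and `energy_vad` is a simple amplitude check standing in for whichever real VAD is used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-frame detector signature: returns 1 for speech, 0 otherwise. */
typedef int (*FrameVadFn)(const short *frame, size_t frame_len);

/* Stand-in detector: mean absolute amplitude against an arbitrary threshold.
 * This is NOT the IPP VAD algorithm; it only makes the sketch runnable. */
int energy_vad(const short *frame, size_t frame_len)
{
    long sum = 0;
    for (size_t i = 0; i < frame_len; i++) {
        sum += frame[i] < 0 ? -(long)frame[i] : (long)frame[i];
    }
    return (sum / (long)frame_len) > 500;
}

/* Split a block into fixed-size frames, run the detector on each one,
 * and return how many frames were flagged as speech. */
size_t count_active_frames(const short *block, size_t block_len,
                           size_t frame_len, FrameVadFn vad)
{
    assert(block != NULL && vad != NULL);
    assert(frame_len > 0 && block_len % frame_len == 0);

    size_t active = 0;
    for (size_t off = 0; off < block_len; off += frame_len) {
        if (vad(block + off, frame_len)) {
            active++;
        }
    }
    return active;
}
```

At an assumed 16 kHz sample rate, an 80 ms block is 1280 samples, so `frame_len` would be 256 for 16 ms frames or 320 for 20 ms frames. Whether the block counts as speech when any frame is active or only when a majority is active is an application-level choice that this sketch leaves to the caller.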

Regards,

Vyacheslav
