Pablo_Royo
Beginner

Voice Activity Detection

Hi

I am testing your VAD example. It works, but it seems quite dependent on the audio frame size. It works quite well for a frame length of 32 ms of speech, but when I change that value to fit my requirements (80 ms) it performs badly. Is there any reason why IPP VAD is specially suited to this value? Also, is there any documentation apart from the VADcommon.h header file that explains how to select proper values for all the VAD parameters?

Thanks

Vyacheslav_Baranniko
New Contributor II


Hello Pablo,

Thanks for using IPP VAD.

I guess you are using the one from the speech-recognition samples, right? Its functionality was actually ported to another sample, IPP NR, which is located in the speech-codec samples. The IPP NR sample supports noise reduction along with several VAD algorithms: G723, G729, and the one ported from speech recognition (i.e. IPP_VAD).

All VAD algorithms provided by IPP have hardcoded frame sizes: G723 (30 ms) and G729 (10 ms) are narrowband VADs; AMRWB (20 ms) and the original IPP VAD (16 ms) are wideband VADs. Please note that IPP VAD supports a 16 ms frame. The 32 ms value is the size of its internal decision window buffer, which has a 16 ms overlap, so its frame size is still 16 ms.

I think both AMRWB (20 ms) and IPP VAD (16 ms) will suit your wideband 80 ms task, since an 80 ms block divides evenly into four 20 ms frames or five 16 ms frames.
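
To illustrate the idea of feeding an 80 ms block to a VAD with a fixed frame size, here is a minimal sketch in C. It is not IPP code: the callback type `vad_frame_fn`, the helper names, and the toy energy detector are all hypothetical stand-ins for whichever VAD you actually call, and the 16 kHz sampling rate used in the usage note is an assumption (a common wideband rate, but not stated in this thread).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-frame VAD callback (NOT an IPP signature):
   returns nonzero if the sub-frame is judged to contain speech. */
typedef int (*vad_frame_fn)(const short *samples, size_t count, void *state);

/* Number of samples in a frame of frame_ms milliseconds. */
size_t samples_per_frame(unsigned sample_rate_hz, unsigned frame_ms)
{
    return (size_t)sample_rate_hz * frame_ms / 1000u;
}

/* Split a block into sub-frames of the VAD's native length and count how
   many are flagged as speech. Returns -1 if the block is not a whole
   number of sub-frames. */
int vad_count_speech_frames(const short *block, size_t block_len,
                            size_t frame_len, vad_frame_fn vad, void *state)
{
    assert(block != NULL && vad != NULL);
    if (frame_len == 0 || block_len % frame_len != 0)
        return -1;

    int speech = 0;
    for (size_t off = 0; off < block_len; off += frame_len) {
        if (vad(block + off, frame_len, state))
            speech++;
    }
    return speech;
}

/* Toy stand-in detector for demonstration only: flags a frame as speech
   when its mean absolute amplitude exceeds the threshold in *state. */
int toy_energy_vad(const short *samples, size_t count, void *state)
{
    long threshold = *(const long *)state;
    long long sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += samples[i] < 0 ? -(long long)samples[i] : (long long)samples[i];
    return count > 0 && sum / (long long)count > threshold;
}
```

Under the 16 kHz assumption, `samples_per_frame(16000, 80)` gives the block length and `samples_per_frame(16000, 16)` or `samples_per_frame(16000, 20)` gives the sub-frame length to pass as `frame_len`. How the per-frame decisions are combined into one decision for the 80 ms block (any frame, majority, all frames) is an application choice; the sketch only returns the count.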

Regards,

Vyacheslav
