Re: Anyone working with G.729 decoder?

softrite · ‎12-16-2009

I am curious if anyone has gotten this thing to work in a live RTP environment... other than just running the Intel samples, which encode and decode WAV audio at the same time. That is all the sample code seems to do. That, of course, is easy to do. Real-world scenarios are different.

I have decoded a live VoIP RTP stream using the Intel IPP G.729.1 decoder. It is attached here. Please give it a listen and let me know if any of you can tell what is wrong. It starts out fairly good but then gets worse and worse. The packets are arriving in 20-byte sets, so I am assuming these are two 10 ms frames, and I am using the frametype of "5".

See here below: "G729B compatible mode, depending on two consecutive 10ms frames with numbers 1,2 comprised into one 20ms frame: Voice1+Voice2 (5)"
I am guessing that the (5) means the frametype value to be passed in but that is only a guess since nowhere does it explain what to set the frametype to.
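For readers checking the "20 bytes = two 10 ms frames" assumption, here is a minimal sketch of how a plain G.729 RTP payload is laid out under RFC 3551: zero or more 10-byte speech frames, optionally followed by one 2-byte Annex B SID frame. The function name is hypothetical and this is not IPP code; it only illustrates the payload arithmetic for G.729, not for G.729.1.

```python
VOICE_BYTES = 10  # one 10 ms G.729 speech frame at 8 kbit/s (80 bits)
SID_BYTES = 2     # one Annex B SID (comfort noise) frame


def split_g729_payload(payload: bytes):
    """Split an RTP payload into ("voice" | "sid", frame_bytes) tuples.

    Hypothetical helper: assumes the RFC 3551 layout for G.729.
    """
    n_voice, rest = divmod(len(payload), VOICE_BYTES)
    if rest not in (0, SID_BYTES):
        raise ValueError(f"unexpected G.729 payload length: {len(payload)}")
    frames = [
        ("voice", payload[i * VOICE_BYTES:(i + 1) * VOICE_BYTES])
        for i in range(n_voice)
    ]
    if rest == SID_BYTES:
        # A SID frame, if present, is always the last frame in the packet.
        frames.append(("sid", payload[n_voice * VOICE_BYTES:]))
    return frames


# A 20-byte payload splits into two speech frames.
kinds = [kind for kind, _ in split_g729_payload(bytes(20))]
```

Under that layout a 20-byte payload is two speech frames, a 12-byte payload is one speech frame plus a SID frame, and a 2-byte payload is a SID frame alone.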

This here is about all that Intel offers for documentation on how to set up this interface. There is some explanation of the various API parameters in the IPP USC Interface Reference Manual, but nothing on how to use them anywhere... so I am stuck. In this doc there is a rough model of a sample, but it encodes and decodes immediately, as I stated above, which doesn't represent the timing issues encountered in a real-world environment. Also, it doesn't show how to deal with VAD/SID and lost packets (PLC). Struggling through and trying to "read the code", which has almost no comments in it, is very time consuming.

================================================
USC G729.1 codec is supported with following parameters:

Codec names: IPP_G7291

Compression algorithms: Embedded CELP (50-4000 Hz), Time-Domain Bandwidth Extension (TD-BWE) for the higher band (4000-7000 Hz) and transform coding scheme referred as Time-Domain Aliasing Cancellation (TDAC) for full band (50-7000Hz).

Linkage: USC_G7291_Fxns,

Sampling: 16bit linear 8000 and 16000 Hz.

Frame: 20ms. Two 10ms frames in G729B compatible narrowband mode or 20ms for wideband.

Bitrates: 8000 bps (160 bpf, 20 bytes either one 20ms wb frame or two equal 10ms NB frames), 12000 bps (240 bpf, 30 bytes), 14000 bps (280 bpf, 35 bytes), 16000 bps (320 bpf, 40 bytes), 18000 bps (360 bpf, 45 bytes), 20000 bps (400 bpf, 50 bytes), 22000 bps (440 bpf, 55 bytes), 24000 bps (480 bpf, 60 bytes), 26000 bps (520 bpf, 65 bytes), 28000 bps (560 bpf, 70 bytes), 30000 bps (600 bpf, 75 bytes), 32000 bps (640 bpf, 80 bytes).

Voice Activity Detection: VAD supported for 729B compatible mode (decoder only)

Packet Loss Concealment: PLC supported

Frame type (value):

G729EV wb codec: Voice (0), Erased (1).

G729B compatible mode, depending on two consecutive 10ms frames with numbers 1,2 comprised into one 20ms frame: Voice1+Voice2 (5), SID1+Voice2 (6), Voice1+SID2 (9), SID1+SID2 (10), Erased( -1).

Standard: ITU-T G.729.1, Interoperable with ITU-T G.729, Annexes A, B
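The numbers in the excerpt above can be cross-checked with a few lines of arithmetic. The sketch below recomputes bits and bytes per 20 ms frame from the bitrate, and writes the G729B-compatible frame-type values as a lookup table. The bit-flag reading (Voice1=1, SID1=2, Voice2=4, SID2=8) is only an observation that happens to fit the four listed values; it is not stated in the Intel text, and all names here are hypothetical.

```python
FRAME_MS = 20  # G.729.1 frame duration from the excerpt


def bits_and_bytes_per_frame(bps: int):
    """Return (bits per frame, bytes per frame) for a 20 ms frame."""
    bits = bps * FRAME_MS // 1000
    return bits, bits // 8


# Frame-type values exactly as listed for G729B compatible mode.
G729B_FRAME_TYPE = {
    ("voice", "voice"): 5,
    ("sid", "voice"): 6,
    ("voice", "sid"): 9,
    ("sid", "sid"): 10,
}
ERASED = -1

# Assumed (undocumented) flag values that reproduce the table above.
FIRST_HALF = {"voice": 1, "sid": 2}
SECOND_HALF = {"voice": 4, "sid": 8}


def frame_type_from_flags(first: str, second: str) -> int:
    """Combine the assumed per-half flags into one frame-type value."""
    return FIRST_HALF[first] | SECOND_HALF[second]


table = {bps: bits_and_bytes_per_frame(bps) for bps in (8000, 12000, 14000, 32000)}
```

If the flag reading is right, the value 5 the original poster passed in corresponds to two voice sub-frames, which matches a 20-byte payload at 8000 bps.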

Vyacheslav_Baranniko · ‎12-17-2009

IPP implemented the early G.729.1 v1.0 (2006). The ITU has since issued Amendments 1, 2, 3 (2007), 4, 5 (2008) and Corrigendum 1 (2009), which are included in G.729.1 v1.5. A DTX update (Annex C) is among them. Currently IPP has no plans to support those.

Vyacheslav, IPP speech codecs

softrite · ‎12-18-2009

So are you suggesting that the G.729 codec is not usable? Or that it would take a lot of additional work to make it something usable? DTX is not needed in most instances. The current IPP 729 has VAD/SID/CNG/PLC in it already. Many of the additions you name here are extensions VoIP would not use, because they extend the range and sampling rate, which is not usually what people in charge of internet traffic are looking for when they choose to go with 729 to save bandwidth. The 2006 ITU 729 AB spec supported by the IPP G.729.1 codec should function, and I am saying my RTP sample does not contain any of the 2007, 2008, 2009 extensions and it is still garbled. Is there anyone at Intel support who understands this codec and can listen to the attachments (here) and suggest what might be wrong?
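On the PLC point: concealment can only kick in if the receiver tells the decoder which frames never arrived. A minimal sketch of that bookkeeping, assuming packets arrive as (sequence number, payload) pairs and that a missing packet is signalled by yielding None; the function name and the max_gap threshold are hypothetical, and nothing here is IPP API.

```python
def with_erasures(packets, max_gap=50):
    """Yield payloads in arrival order, inserting None per missing packet.

    `packets` is an iterable of (seq, payload) with 16-bit RTP sequence
    numbers. Duplicates and late packets are dropped; jumps larger than
    `max_gap` are treated as a resync and are not filled.
    """
    prev = None
    for seq, payload in packets:
        if prev is not None:
            delta = (seq - prev) & 0xFFFF  # handles 65535 -> 0 wraparound
            if delta == 0 or delta >= 0x8000:
                continue  # duplicate or late packet: ignore it
            if delta <= max_gap:
                for _ in range(delta - 1):
                    yield None  # one marker per lost packet
        prev = seq
        yield payload


stream = [(65534, b"a"), (65535, b"b"), (1, b"c"), (1, b"c"), (2, b"d")]
result = list(with_erasures(stream))
```

Each None would then be handed to the decoder as an erased frame (the Intel listing gives Erased as -1 in G729B compatible mode) instead of being silently skipped.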

Vyacheslav_Baranniko · ‎12-20-2009

ITU G.729 and ITU G.729.1 are two different specifications. From the thread name it is not clear which codec you mean. Among many others, IPP provides USC_G7291_Fxns (ITU G.729.1, which my previous post was about) and USC_G729I_Fxns (ITU G.729). The IPP USC_G729I_Fxns codec is fully conformant to the latest ITU spec, while the IPP USC_G7291_Fxns codec is not up to date with the latest ITU specs.

In any case, it would be nice to have not only the decoded output attached but also the input RTP stream that was decoded.

Vyacheslav

softrite · ‎01-13-2010

Well, if anyone ever reads this thread on decoding G.729 stuff... lemme save you some time. The Intel samples are cool and all, but they do one thing that is not real world... and it can cause you difficulties. The Intel sample applications round-trip an input file by encoding it and immediately decoding it to show how things work. In the real world, a G.729 conversation over VoIP (for example) always has two streams going simultaneously in opposite directions. If you do what I did and try to decode both streams using one decoder, it will not work. You need to run two separate instances of the entire decoder suite, one for each direction. When the session is done, you may reinit and reuse the instance, or just use C++ objects and dispose of them and new another set. Please send all contributions to softrite in my name, as I possibly just saved you several days or weeks of work!
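To see why one shared decoder garbles two directions, here is a toy illustration. It uses a trivial delta decoder as a stand-in for a stateful speech decoder (G.729 itself is far more complex, and this is not IPP code); the only point is that each output depends on state left behind by the previous packet.

```python
class ToyDeltaDecoder:
    """Stand-in for a stateful decoder: output depends on previous output."""

    def __init__(self):
        self.prev = 0

    def decode(self, delta: int) -> int:
        self.prev += delta
        return self.prev


def toy_encode(samples):
    """Encode each sample as the difference from the previous one."""
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out


caller = [10, 12, 15]   # direction A -> B
callee = [100, 90, 85]  # direction B -> A
a_pkts, b_pkts = toy_encode(caller), toy_encode(callee)

# Wrong: one decoder instance fed packets from both directions, interleaved.
shared = ToyDeltaDecoder()
mixed = [shared.decode(d) for pair in zip(a_pkts, b_pkts) for d in pair]

# Right: one decoder instance per direction.
dec_a, dec_b = ToyDeltaDecoder(), ToyDeltaDecoder()
out_a = [dec_a.decode(d) for d in a_pkts]
out_b = [dec_b.decode(d) for d in b_pkts]
```

With separate instances each direction decodes back to its original samples; with the shared instance the state of one stream leaks into the other and neither comes out right.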

Ying_H_Intel · ‎01-14-2010

Hi Softrite,

Thanks a lot for sharing your experience. I believe more users will now take care to start separate instances for the different stream directions when building a real VoIP solution based on the USC sample code.

Best Regards,

Ying

Vyacheslav_Baranniko · ‎01-14-2010

Thank you for this important observation! For P2P VoIP with endpoints in different locations, a single pair of encode and decode instances can serve two RTP streams: encoding the outgoing one and decoding the incoming one. With several incoming RTP streams (the conference case), each endpoint must have one independent decode instance per stream.
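One way to arrange "one decode instance per incoming stream" is to key decoder instances by the RTP SSRC. The sketch below parses the fixed RTP header (RFC 3550) and keeps a pool of decoders. DecoderPool and the make_decoder factory are hypothetical names; in a real application the factory would wrap whatever creates a USC decoder instance.

```python
import struct

G729_PAYLOAD_TYPE = 18  # static RTP payload type for G.729 (RFC 3551)


def parse_rtp(packet: bytes):
    """Parse an RTP v2 packet into payload type, sequence, SSRC, payload."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, _timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)  # skip the CSRC list
    if b0 & 0x10:  # header extension present
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet) - (packet[-1] if b0 & 0x20 else 0)  # strip padding
    return {"pt": b1 & 0x7F, "seq": seq, "ssrc": ssrc,
            "payload": packet[offset:end]}


class DecoderPool:
    """Hand out one decoder instance per SSRC (hypothetical helper)."""

    def __init__(self, make_decoder):
        self._make = make_decoder
        self._by_ssrc = {}

    def decoder_for(self, ssrc: int):
        if ssrc not in self._by_ssrc:
            self._by_ssrc[ssrc] = self._make()
        return self._by_ssrc[ssrc]

    def release(self, ssrc: int):
        """Drop the decoder when the stream ends."""
        self._by_ssrc.pop(ssrc, None)


header = struct.pack("!BBHII", 0x80, G729_PAYLOAD_TYPE, 7, 160, 0x1111)
parsed = parse_rtp(header + bytes(20))
```

In the two-party case this pool ends up holding one decoder per direction; in the conference case it grows to one per participant.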

Vyacheslav

IPP

Vyacheslav_Baranniko · ‎01-19-2010

It is worth noting that the example code provided in the USC manual (PDF) is not the only IPP speech-codec sample.

There is one more application-like sample. Of course, it is again not the VoIP client or PBX one might expect from IPP, and it is not usable as-is in a real network scenario such as P2P communication or a conference server. Nevertheless, the umc_rtp_speech_codec sample is ready-to-build-and-execute code capable of encoding to or decoding from real RTP streams, for example ones captured using Wireshark, and we hope it can be useful at least in understanding how to build the real thing using USC codecs.

Regards,

Vyacheslav