Hi!
I'm using DSoundAudioRenderer to play uncompressed PCM sound. The audio data is provided by a stream (2 packets/sec), and I put each packet into a MediaData object as soon as it arrives. I experience a delay of about 1-2 seconds when the sound plays. What can cause this problem, and how can I solve it?
Thanks in advance,
Bendeguy
Hi Sergey,
Thanks for your valuable answers. I would like to ask a few more questions about WinMMRenderer. I changed BUF_SIZE to 4096 and BUF_NUM_MAX to 50, and I'm feeding the renderer with PCM data (4096 bytes per packet). Is this setting correct, and does it ensure that all of my audio will be played?
I get the audio (Mu-Law PCM) from a camera, and I start to decode/play it after the second packet arrives. As soon as a packet arrives I decode it, which puts the data into the renderer's buffer. So when the first two packets arrive I decode them, and I decode the next ones as soon as they arrive. An audio packet's playing length is about 500 ms. I decode (between LockInputBuffer and UnLockInputBuffer) 2048-byte chunks of the audio, whose decoded size is 4096 bytes. I continuously get exactly enough audio from the camera, but I noticed that after a while I run out of audio. It looks like the renderer loses some packets or doesn't play them. Is this possible?
Thanks in advance,
Bendeguy
In your case you'll have 50 buffers of 4096 bytes. A buffer is fed to the device when it is full, so you won't get any delay here. Due to a bug in the current release, if your last frame holds less than 4096 bytes, it won't be sent to the device.
I didn't quite understand your algorithm. To get the best result one should follow several rules (this applies not only to UMC renderers; a minimal sketch follows the list):
1) The loop which feeds the renderer should be as small as possible. In our case it should only wait for the moment when the renderer's buffer is free enough (LockInputBuffer), then copy the audio data to the renderer and tell the renderer that the data is updated (UnlockInputBuffer).
2) It is desirable to have a separate thread for this loop.
3) There shouldn't be any delays in feeding the renderer. So the best way is to decode data into a separate buffer and then feed it to the renderer. In any case, everything should keep pace with real time.
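To illustrate the three rules, here is a minimal sketch of such a feeder thread. It is only an illustration, not code from the UMC samples: PcmChunk and GetDecodedChunk are hypothetical helpers (the decoder fills the queue from another thread), and only the Lock/Unlock calls mirror the renderer API used in this thread.
// Hedged sketch of rules 1-3: a dedicated thread whose loop does nothing
// but wait for renderer space, copy one decoded chunk, and unlock.
void FeederThread(UMC::WinMMAudioRender *ar)
{
    UMC::AudioData in;
    PcmChunk chunk;                                  // e.g. 4096 decoded bytes
    while (GetDecodedChunk(&chunk)) {                // blocks until data is ready
        while (ar->LockInputBuffer(&in) != UMC::UMC_OK) {}   // rule 1: only wait
        memcpy(in.GetDataPointer(), chunk.data, chunk.size); // copy audio data
        in.SetDataSize(chunk.size);
        ar->UnLockInputBuffer(&in);                  // tell renderer it's updated
    }
}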
Hi Sergey!
Which part of my algorithm wasn't clear (sorry for my bad English)? I use WinMMRenderer with the changed buffer size and buffer count. I tried to do the playback similar to the ReverbDemo:
LockInputBuffer
DecodeFrame
UnLockInputBuffer
Because of my buffering of the audio (playback starts when the second packet arrives from the camera) there is enough time to decode the frame and send it to the renderer. Example:
Two audio packets arrive, each of 4096 bytes (512 ms playing time):
First packet:
I decode between lock and unlock 2048 bytes -- that is 4096 decoded bytes
I decode between lock and unlock 2048 bytes -- that is 4096 decoded bytes
Second packet:
I decode between lock and unlock 2048 bytes -- that is 4096 decoded bytes
I decode between lock and unlock 2048 bytes -- that is 4096 decoded bytes
As I've seen, the renderer always calls the SendFrame method after this, and from this point the audio is playing. After 512 ms a new packet arrives with size 4096 bytes. At this point the renderer has played the first packet (512 ms) and has another 512 ms of audio to play. So I decode the third packet:
Third packet:
I decode between lock and unlock 2048 bytes -- that is 4096 decoded bytes
I decode between lock and unlock 2048 bytes -- that is 4096 decoded bytes
I have 512 ms for the decoding, so it can't be the source of the delay. And I repeat this as long as the camera sends audio data. So I always have around 512 ms to decode for the renderer. Is this algorithm correct? Or did I miss some basic information about audio rendering?
I noticed that when a packet arrives and I decode it in two parts, both decodings happen before the call of SendFrame. So SendFrame calls the waveOutWrite function two times without waiting. Is that a proper solution? I read about waveOutWrite and found that when waveOutWrite finishes playing it sets a flag, and then the next waveOutWrite can be called. Is this right or not? What happens if I call waveOutWrite while another one is still playing sound?
Thanks in advance,
Bendeguy
Yesterday I wrote you an answer, but when I pushed the Post button everything disappeared, and I couldn't bear to write it again. :-)
From my point of view your algorithm is OK. The Mu-Law decoder works pretty fast, so there's no need to create a separate thread for it.
As far as I can imagine, the whole process of rendering with WinMMAudioRender looks like the following:
When you decode 2 frames from the cam you feed the renderer with 4x4096 bytes (i.e. 1 s at 8 kHz mono). It takes exactly 4 buffers from the renderer. When it has played 2x4096 bytes you get another frame from the cam. In the ideal situation the renderer is playing the 3rd buffer when you get another 4096 bytes from the decoder and feed the renderer with it. When the 4th buffer is finished you send the sixth portion of 4096 bytes to the renderer. Now everything depends on synchronization. If the system was able to give some time to the renderer's thread, then the 5th portion of 4096 bytes has already been sent to the device and you won't hear any delay. BTW, you will use only 4 buffers with this algorithm (not 50).
Answering your questions, I can say that SendFrame calls waveOutWrite as many times as there are buffers it can fill, but not more than BUF_NUM_MAX. I still didn't catch whether you have any problem with WinMMAudioRender now. If you do, I can advise you to decrease the size of the buffers (e.g. 2048 instead of 4096) or to increase the delay. Anyway, it is worth trying.
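To make the waveOutWrite behavior concrete (this is standard WinMM usage, not code from the renderer): waveOutWrite may be called while another buffer is still playing; the driver queues the buffers, plays them back to back, and sets WHDR_DONE in each header when it is finished with it. A minimal sketch for 8 kHz mono 16-bit PCM:
#include <windows.h>
#include <mmsystem.h>                       // link with winmm.lib

void PlayTwoBuffers(short *buf1, short *buf2, DWORD bytes)
{
    // 8000 Hz, mono, 16 bit: nAvgBytesPerSec = 16000, nBlockAlign = 2
    WAVEFORMATEX wfx = { WAVE_FORMAT_PCM, 1, 8000, 16000, 2, 16, 0 };
    HWAVEOUT hwo;
    waveOutOpen(&hwo, WAVE_MAPPER, &wfx, 0, 0, CALLBACK_NULL);

    WAVEHDR h1 = {0}, h2 = {0};
    h1.lpData = (LPSTR)buf1; h1.dwBufferLength = bytes;
    h2.lpData = (LPSTR)buf2; h2.dwBufferLength = bytes;
    waveOutPrepareHeader(hwo, &h1, sizeof(h1));
    waveOutPrepareHeader(hwo, &h2, sizeof(h2));

    waveOutWrite(hwo, &h1, sizeof(h1));     // starts playing at once
    waveOutWrite(hwo, &h2, sizeof(h2));     // queued behind h1, no waiting

    while (!(h2.dwFlags & WHDR_DONE)) Sleep(10);   // wait for the queue to drain
    waveOutUnprepareHeader(hwo, &h1, sizeof(h1));
    waveOutUnprepareHeader(hwo, &h2, sizeof(h2));
    waveOutClose(hwo);
}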
-Sergey
Hello Sergey,
Thank you very much for your valuable help. To tell the truth, unfortunately the audio size isn't exactly 4096 bytes; it varies, for example 5750, 5520, 6000 and maybe other sizes. I've tested my program a little more and found very interesting things. I derive the timing of the audio and video playback from the length of the audio. So if I get a packet which consists of 10 video frames and has 5520 bytes of audio, then I calculate the play length (in ms) of the packet by the formula:
play_length_ms = decoded_audio_size / (sample_rate / 1000 * bytes_per_sample)
               = (audio_size * 2) / (8000 / 1000 * 2)
               = (audio_size * 2) / 16
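The same formula as a small helper (just a restatement of the line above; muLawBytes is the compressed packet size):
// Duration of a Mu-Law packet: each input byte decodes to one 16-bit
// sample, and 8000 samples/s * 2 bytes / 1000 = 16 decoded bytes per ms.
int PacketDurationMs(int muLawBytes)
{
    return (muLawBytes * 2) / 16;
}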
For 5520 bytes this gives 690 ms, i.e. 69 ms between video frames. At the beginning of playback I store the value returned by vm_time_get_tick(). After that I decode a frame, increase the start time by one picture's duration, and wait until I reach the calculated time; then I display the picture. (I don't do the waiting with Sleep, because its resolution of 15.625 ms on Windows is too coarse for my needs; I use timeSetEvent instead, but that isn't an important part of the problem.) I keep doing this as long as the camera sends packets. I tested the timing against the Windows system time and found that I wait exactly as long as I have to, and the processing rate equals the packet arrival rate. So the timing works well, but here comes the problem.
The audio playback is continuous (there is no silence between packets), but it seems to last less time than it should. For example, when I buffer 8 packets of audio and video and start playing, they are synchronized at the beginning. But as time progresses the audio runs ahead of the picture, and in the end I hear the audio almost as soon as I make a sound, while the video still has the delay that comes from buffering the 8 (or however many) packets. After this point I sometimes hear short silences, which come from the variable arrival times of the packets; I mean, sometimes a packet doesn't arrive in time (it is late by a few milliseconds), but I solved that with the buffering. Sorry for my very long letter, but I want to give as much information about the problem as possible. So I think one of two things can cause the problem:
1) The audio device plays the audio faster than it should. I don't really think this is it.
2) I lose some audio data somewhere (or play it without waiting long enough). I'm sure that I decode (between LockInputBuffer and UnlockInputBuffer) all of the incoming audio data.
What can be the source of the problem, and the solution?
Thanks in advance,
Bendeguy
I have one more thought about this whole problem, maybe a third possible source. You said that 4 x 4096 = 16384 bytes is 1 sec, which means 8192 samples/sec, not 8000. I think the camera's sample rate is exactly 8000, and I set 8000 in the renderer too. But if the data were played at 8192 Hz instead of 8000 Hz, that would explain the "loss" of audio data: 8000 / 192 = 41.66, so a one-second drift builds up every 41.66 sec. I think this is the delay I can hear. Is this possible?
Greetings,
Bendeguy
Hi!
First of all you should make sure that the audio playback by itself is OK (without video), i.e. there are no delays, cracks, clicks, etc. From what cam do you grab the stream? In what format? You can play some audio (e.g. a song) and note the time by an ordinary clock to see whether it is played faster than it ought to be. Only when you are sure that there's no problem with the audio should you start trying to synchronize it with video. In your case, when you know the length of the frame you got and the number of video frames inside, there shouldn't be any problem.
BTW, do you have timestamps inside your stream? If you do, you should align the video to the audio using them.
You're right. 8 kHz = 8000 samples/sec = 16000 bytes/sec. It's my mistake.
In your case the renderer will fill only 3 buffers and a part of the 4th at the beginning. Then it'll fill the 4th and a part of the 5th. Finally, in any case, it will play all the data you send it. To increase the number of "ready" buffers you may try to decrease their size.
-Sergey
Hi Sergey!
I use an Axis 211 cam, which produces G.711 (PCM 64 kbit/sec Mu-Law) audio. I decode the audio with the ippsMuLawToLin_8u16s function. I derive the duration of the packet playback from the size of the audio (which doesn't matter if I have no video). As soon as a packet arrives, the audio thread decodes it and puts it into the renderer's buffer. So the audio is continuous, but it is played faster than it should be. At the beginning I hear the audio delayed by as much as I buffered (for example, if I buffered 8 audio packets, then I hear the sound delayed by those 8 packets' playing time), and after this the delay decreases continuously until it reaches almost zero. I tried to stuff every 4000 decoded bytes with 96 more, but then the audio lasted longer than it should. I tried stuffing only 48 bytes per 4000 and everything was fine: the audio and video stayed synchronized. This confirms the problem cannot be the camera, because it produces exactly enough audio. It looks like either I lose about 48 bytes from every 4096-byte packet, or the audio is simply played a little faster than it should be; I lean toward the latter. Does anybody see a possible reason for this behavior?
The 48-byte stuffing works on this computer/sound card, but I think it will cause problems on other systems. So I have two choices:
- write an adaptive algorithm which stuffs as many bytes as needed to keep sync (I really don't want to do this)
- find out what is wrong (if anything is wrong)
Thanks for all previous (and future) help. Greetings,
Bendeguy
Hello Bendeguy!
Could you give the piece of code where you initialize WinMMAudioRender, and the values of BUF_SIZE and BUF_NUM_MAX? The renderer can't play a bit faster or a bit slower if it has enough data; that's impossible.
Try to play a G.711 stream read from a file instead of the camera. I think either you have a bug in your code, or you are trying to tune the renderer with the BUF_SIZE and BUF_NUM_MAX variables, which can affect only the delay before you hear the first sound. I know people who have used this renderer in their real-time applications without any problems.
-Sergey
Hi Sergey!
Here's the code:
arp.info.bitPerSample = 16;
arp.info.channels = 1;
arp.info.sample_frequency = 8000;
arp.info.stream_type = UMC::PCM_AUDIO;
arp.pModuleContext = &HWNDContext;
ar.Init(&arp);
I tried a lot of BUF_SIZE and BUF_NUM_MAX values, but mostly I use BUF_SIZE 4096 and BUF_NUM_MAX 4.
Greetings,
Bendeguy
Hi Sergey!
Can I measure the renderer's processing time for one frame (4096 bytes)? I tried to measure the waiting time of the WaitForSingleObject(m_sm_free_buffers, INFINITE) calls in SendFrame. I used BUF_NUM 2 and BUF_SIZE 4096, so it has to wait one frame's processing time at almost every call. A 4096-byte frame's playing time is 256 ms, but instead I saw that the wait is almost always 246-248 ms, very rarely 257-259 ms, and sometimes 1xx ms. I think this is strange. Don't you?
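For reference, here is how such a wait can be timed (a sketch; it assumes the SendFrame semaphore wait mentioned above, and raises the winmm timer resolution first so timeGetTime is accurate to about 1 ms):
timeBeginPeriod(1);                        // ~1 ms timeGetTime resolution
DWORD t0 = timeGetTime();
WaitForSingleObject(m_sm_free_buffers, INFINITE);
DWORD waited = timeGetTime() - t0;         // ~one buffer's playback time
timeEndPeriod(1);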
Bendeguy
I suppose that you feed the renderer with 4000 bytes (250 ms), not 4096, and the buffer is not sent until it gets 4096 bytes. Try using 4x4000 buffers, or even better 8x2000.
-Sergey
Hi!
I fed the renderer with 4096 bytes. If I fed the renderer with 4000 bytes when it expects 4096, then the audio would come after the video, not before it.
I made a file from the audio coming from the camera. Its length is 5,471,520 bytes, which at 8000 bytes/sec means 683.94 secs. I played it with my decoder/renderer and the play time was 653.94 secs. I decoded all bytes (between LockInputBuffer and UnlockInputBuffer). After this I made an uncompressed PCM file from it and played it with mplayer; the play time was 674 secs. I think this means the problem isn't in my program, because I also saw faster playback with another program (mplayer).
Here's the code of my decoder:
Ipp32s PCMDecoderRenderer::getFrameAndPlay(UMC::AudioData *p_dest) {
    // A full 2048-byte input frame can be assembled from the saved tail of
    // the previous packet plus the data still unread in audio_in.
    if (audio_in.GetDataSize() >= AD_PCM_FRAME_SIZE - size_prevdata) {
        while (ar.LockInputBuffer(p_dest) != UMC::UMC_OK) {}
        if (size_prevdata > 0) {
            // Decode the saved tail first, then complete the frame from the
            // current packet.
            ippsMuLawToLin_8u16s(p_prevdata, (Ipp16s*)p_dest->GetDataPointer(), size_prevdata);
            ippsMuLawToLin_8u16s((Ipp8u*)audio_in.GetDataPointer(), (Ipp16s*)p_dest->GetDataPointer() + size_prevdata, AD_PCM_FRAME_SIZE - size_prevdata);
            ippsMulC_16s_I(4, (Ipp16s*)p_dest->GetDataPointer(), AD_PCM_FRAME_SIZE);   // apply gain
            audio_in.MoveDataPointer(AD_PCM_FRAME_SIZE - size_prevdata);
            size_prevdata = 0;
        }
        else {
            ippsMuLawToLin_8u16s((Ipp8u*)audio_in.GetDataPointer(), (Ipp16s*)p_dest->GetDataPointer(), AD_PCM_FRAME_SIZE);
            ippsMulC_16s_I(4, (Ipp16s*)p_dest->GetDataPointer(), AD_PCM_FRAME_SIZE);   // apply gain
            audio_in.MoveDataPointer(AD_PCM_FRAME_SIZE);
        }
        p_dest->SetDataSize(AD_PCM_DECODEDFRAME_SIZE);
        end += time_incr;
        p_dest->SetTime(start, end);
        start = end;
        ar.UnLockInputBuffer(p_dest);
        return AD_PCM_DECODEDFRAME_SIZE;
    } else if (audio_in.GetDataSize() == 0) {
        // Packet fully consumed and nothing saved.
        return 0;
    } else {
        // Less than one frame left: save the tail for the next packet.
        int size = audio_in.GetDataSize();
        ippsCopy_8u((Ipp8u*)audio_in.GetDataPointer(), p_prevdata, size);
        size_prevdata = size;
        p_dest->SetDataSize(0);
        return -1;
    }
}
AD_PCM_FRAME_SIZE = 2048 and AD_PCM_DECODEDFRAME_SIZE = 4096
Please look at this and tell me if something isn't correct in it. Thanks in advance,
Bendeguy
Hi!
I still didn't understand your algorithm entirely.
If you know that you get Mu-Law audio in 4096-byte portions (512 ms), why do you use p_prevdata?
What is audio_in? A linear buffer? If you MoveDataPointer forward, where do you move it back?
You call SetTime. Why?
Your task is to send all the data to the renderer. It's not your task to cut the data or synchronize it somehow. I would write something like this:
int lastFrame = 0; // should be set to 1 when the last portion arrives
while (!lastFrame) {
    if (audio_in.GetDataSize()) { // we got 4096 bytes
        while (ar.LockInputBuffer(p_dest) != UMC::UMC_OK) {}
        ippsMuLawToLin_8u16s((Ipp8u*)audio_in.GetDataPointer(), (Ipp16s*)p_dest->GetDataPointer(), 4096);
        p_dest->SetDataSize(4096 * sizeof(short)); // how much data we send
        if (!lastFrame) {
            ar.UnLockInputBuffer(p_dest);
        } else {
            ar.UnLockInputBuffer(p_dest, UMC::UMC_END_OF_STREAM); // flush all remaining data to the device
        }
        audio_in.SetDataSize(0); // mark the input consumed
    }
}
Then we should hear all the data we send to the renderer via audio_in.
-Sergey
I don't get the audio in packets of 4096 bytes. They have various sizes, for example 5750, 5520, 6000 and maybe others. When an audio packet arrives I set
status = audio_in.SetBufferPointer(p_source, length);
and after this I decode it with my algorithm:
do {
size_data = dr->getFrameAndPlay(&out_audio);
} while ( size_data > 0);
audio_in is a UMC::AudioData. I cut the data into 4096-byte pieces (that is the decoded size) because the packets have various sizes. This is why I have to use p_prevdata.
Greetings,
Bendeguy
How do you get p_source? What happens when you have a pointer to, e.g., 5750 bytes and you send only 2048 of them to the renderer? Do you keep the data in some buffer and append each new portion to it? Or did you create an AudioData object of a fixed size that is enough in all situations?
Anyway, there's no need to cut the input data. The renderer will do that itself.
-Sergey
When I get a packet, I copy it to my own buffer (it is large enough for any packet size that can arrive). This packet contains audio and video data, and p_source is the pointer to the audio data. I decode only 2048 bytes in one step of my algorithm, but the audio thread calls the decoder until it returns a value greater than 0. For example, if I have 5750 bytes of audio, then the first call returns 4096 (decoded bytes) and decreases 5750 to 3702. Then I call getFrameAndPlay again, which returns 4096 and decreases 3702 to 1654. The next call saves the remaining data for the next packet and returns -1. When a new packet arrives, with size 6000 for example, the next getFrameAndPlay copies the saved bytes into the new decoded frame and decodes 2048 - 1654 = 394 more bytes; it returns 4096 and decreases 6000 to 5606, and so on. I know that this algorithm will not play the last portion of data (the last audio packet's last bytes, pointed to by p_prevdata), but that isn't relevant to my problem.
Greetings,
Bendeguy
Hi Sergey!
It seems the buffer size is a special number. Where does it come from?
BUF_SIZE 1152*16
You mentioned that you know people who have built real-time stream players with WinMM. Which operating system and processor did they use? Thanks in advance,
Greetings,
Bendeguy
At last I've solved the problem. I was right: the audio really is played faster than it should be. I tried it with Media Player and Winamp (just playing a longer, e.g. 10-minute, WAV file) and measured the time with a clock. I found about a 10 s difference between the real play time and the expected play time. I tried this on a few other computers and found that some play the audio correctly and some play it faster. I've read a lot of articles and forum posts about things similar to my problem. At last I installed a Windows XP update called "C-Media AC97 Audio device (04-07-2006)", which solved the problem. I think this means the earlier audio driver was incorrect, and it doesn't seem to be an isolated problem, because I experienced it on some other computers too. So I advise everybody who wants to develop a real-time stream player to test the correctness of the audio playback first.
I think the source of the problem is that the older driver used the real-time clock (RTC) for timing. On my Windows XP the clock resolution is 15.625 ms, which is 16 * 0.977; 0.977 ms is the smallest time interval I can set (to tell the truth, I ask Windows for 1 ms, but it means 0.977 ms to it). So I think the older driver used this and treated 0.977 ms as 1 ms. So when it plays 4096 bytes of uncompressed PCM at 8000 Hz, whose play length must be 4096 / (8 * sizeof(short)) = 256 ms, it plays it for only 0.977 * 256 ~ 250 ms. I measured the times between the WOM_DONE notifications (which are sent when the waveout device finishes playing an audio packet), and they show that a 4096-byte packet's play time was ~250 ms, which fits my theory.
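One way to run that correctness test in code (a hedged sketch: hwo is assumed to be an open waveout device that is continuously playing an 8 kHz stream; a healthy driver gives a ratio near 1.0, while the 0.977 ms theory above predicts about 1 / 0.977 = 1.024):
// Compare the device position (in samples) with elapsed wall-clock time.
MMTIME m0, m1;
m0.wType = m1.wType = TIME_SAMPLES;
waveOutGetPosition(hwo, &m0, sizeof(m0));
DWORD t0 = timeGetTime();
Sleep(60000);                              // let about a minute play
waveOutGetPosition(hwo, &m1, sizeof(m1));
double elapsedSec = (timeGetTime() - t0) / 1000.0;
double deviceSec  = (m1.u.sample - m0.u.sample) / 8000.0;  // 8000 samples/sec
double ratio      = deviceSec / elapsedSec; // ~1.024 on the faulty driver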
Thanks for all the valuable advice and answers.
Greetings,
Bendeguy
