Nios® V/II Embedded Design Suite (EDS)

MP3 decoder too slow on Nios II

Altera_Forum
Honored Contributor II
4,242 Views

Hi all,

I'm designing an MP3 player for the DE2 board that uses libmad for MP3 decoding, but decoding is currently too slow for real-time playback. The core is the fastest Nios II core, with an 8 KB instruction cache and a 16 KB data cache, and the CPU clock is set to 100 MHz. It still takes 380 s to decode a 274 s MP3 file. Can anyone tell me whether that is reasonable?

I have also tried adding a custom instruction to replace the macro "mad_f_mul" in libmad, but the decoding speed stays the same or even gets worse. I can't understand why.

The platform I'm using is Quartus II 9.1.

Any guidelines? Thanks!
0 Kudos
20 Replies
Altera_Forum

Did you tell gcc that a new custom instruction (CI) is being used?

Altera_Forum

I just use the new CI macro from system.h. Won't gcc compile the new CI automatically? It still gives the correct results.
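
For readers following along, here is a minimal sketch of what such a replacement might look like. As far as I know, the CI macros in system.h expand to gcc built-ins, so no extra compiler flag should be needed. The name ALT_CI_FIXMUL is hypothetical (the real name depends on what the custom instruction is called in SOPC Builder), and the 28 fractional bits are an assumption about libmad's fixed-point format. When the macro is not defined, the code falls back to a plain 64-bit software multiply, so it also builds on a PC.

```c
#include <stdint.h>

/* Assumption: libmad-style fixed point with 28 fractional bits. */
#define FRACBITS 28

/* Portable software multiply: widen to 64 bits, then drop the extra
 * fractional bits. Relies on arithmetic right shift of negative values,
 * which gcc provides. */
static inline int32_t fixmul_sw(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * y) >> FRACBITS);
}

/* ALT_CI_FIXMUL is a hypothetical macro name; the real one would be
 * generated in system.h from the custom instruction's name. */
#ifdef ALT_CI_FIXMUL
#define fixmul(x, y) ((int32_t)ALT_CI_FIXMUL((x), (y)))
#else
#define fixmul(x, y) fixmul_sw((x), (y))
#endif
```

With this layout, 1 << 28 represents 1.0, so fixmul(1 << 27, 1 << 27) is 0.5 * 0.5 and yields 1 << 26. Comparing the hardware path against fixmul_sw on a few values like this is a cheap way to check that the custom instruction returns the same results.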

Altera_Forum

Have you enabled -O2 or -O3?

 

If you rip out all calls into libc, can you fit the whole code into internal memory? 

(It might be that the resident set fits in the cache though.)
Altera_Forum

No, I got that result with optimization off. Does it make that much difference?

I tried enabling -O3 once, but the program didn't work correctly: it seemed to stop at some point and a lot of the code was never executed, so I changed it back to no optimization.

I don't quite understand what you mean by "If you rip out all calls into libc, can you fit the whole code into internal memory? (It might be that the resident set fits in the cache though.)". I use the 512K SRAM as memory and the program size is 167K.
Altera_Forum

Definitely use -O3 - the speed-up is huge. Also, create onchip RAM and find the most often called functions and move them to this RAM until it's full. 

 

Bill
Altera_Forum

Thank you. I shall try -O3 or -O2 again.  

I don't know how to create on-chip RAM and move a function into it. I only know how to add the on-chip RAM component to the system using SOPC Builder. What else should I do?
Altera_Forum

 

--- Quote Start ---  

Thank you. I shall try -O3 or -O2 again.  

I don't know how to create on-chip RAM and move a function into it. I only know how to add the on-chip RAM component to the system using SOPC Builder. What else should I do?

--- Quote End ---  

 

 

Use this before functions going into on-chip RAM:

void function(void) __attribute__((section(".onchip_mem")));
void function(void) { }

Of course, onchip_mem has to match the name used in SOPC Builder.

 

Bill
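
A slightly fuller sketch of the same idea, for anyone trying it: the section attribute goes on the declaration of a hot function, and the definition follows as usual. USE_ONCHIP_RAM is a hypothetical build flag I made up so the same file still compiles on a PC, dot_q28 is only an illustrative hot loop (not a libmad function), and ".onchip_mem" is assumed to match the memory's name in SOPC Builder.

```c
#include <stddef.h>
#include <stdint.h>

/* USE_ONCHIP_RAM is a hypothetical build flag; ".onchip_mem" must match
 * the name of the on-chip memory component in SOPC Builder. */
#ifdef USE_ONCHIP_RAM
#define FAST_CODE __attribute__((section(".onchip_mem")))
#else
#define FAST_CODE /* host build: leave the function in the default section */
#endif

/* The declaration carries the section attribute. */
int32_t dot_q28(const int32_t *a, const int32_t *b, size_t n) FAST_CODE;

/* Illustrative hot loop: fixed-point dot product with 28 fractional bits. */
int32_t dot_q28(const int32_t *a, const int32_t *b, size_t n)
{
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];
    return (int32_t)(acc >> 28);
}
```

Only a handful of the most frequently called functions need this treatment, since the on-chip RAM fills up quickly.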
Altera_Forum

 

--- Quote Start ---  

cpu clk is set to 100Hz.  

--- Quote End ---  

 

I imagine the CPU is set to 100 MHz. If you really are using a 100 Hz CPU clock, that might be the problem.
Altera_Forum

You are right, the CPU is set to 100 MHz.

Altera_Forum

FYI, here is an MP3 player that handles the audio in real time and might be useful: http://www.nioswiki.com/index.php?title=nios2embeddedevaluationkit/mp3_player&highlight=mp3

Altera_Forum

Thanks to you all. I now use -O1 and it runs in real time.

Altera_Forum

I recommend turning the system clock down to find out how much slack you have. For example, I wouldn't rely on it being 'realtime' if it fails to keep up when you drop the frequency by 10%. Also, -O2 should give you better performance.

Altera_Forum

I have tried -O2 and -O3, but the same problem occurs: the program never enters the decoding stage. I use an external interrupt to set a begin-decoding flag and poll this flag in the main function. With -O2 or -O3 the flag is set correctly, but the polling doesn't work. If I set a breakpoint there, the program never reaches that part of the main function. I don't know what the optimizer really does at the different levels. Can you give me some idea?

Altera_Forum

It might be how the polling is implemented. Are you familiar with the keyword 'volatile'? http://en.wikipedia.org/wiki/volatile_variable

Variables that should be volatile are one of the many things to look out for once you start increasing the optimization level. When you declare a variable volatile, you are basically telling the compiler that the value can change at any time without the CPU being involved (a key characteristic of a register in a slave port).

If that doesn't help, maybe you can copy the code you think is the culprit into this post and one of us can figure out why the optimization level is causing problems. *Usually* these problems are caused by the application code and corner cases that the developer hasn't thought of. I say usually since -O3 can sometimes bring in some surprises :)
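
To illustrate the polling case described above, here is a small sketch that runs on a PC. A C signal stands in for the external interrupt, and the names start_decoding, on_interrupt and wait_for_start are made up for the example. On the Nios II the handler would be the registered ISR instead, but the role of volatile is the same.

```c
#include <signal.h>

/* volatile: the flag is written outside the normal program flow, so every
 * read in the polling loop must really go to memory. */
static volatile sig_atomic_t start_decoding = 0;

/* Stand-in for the external-interrupt ISR. */
static void on_interrupt(int signo)
{
    (void)signo;
    start_decoding = 1;
}

/* Polls the flag the way the main loop would; returns 1 once it is set. */
static int wait_for_start(void)
{
    signal(SIGINT, on_interrupt);
    raise(SIGINT);              /* simulate the interrupt firing */
    while (!start_decoding) {
        /* Without volatile, an optimizing build may read the flag once,
         * keep it in a register, and spin here forever. */
    }
    return 1;
}
```

At -O0 every read goes to memory anyway, which is why this kind of bug tends to show up only after the optimization level is raised.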

Altera_Forum

You are right. Declaring the variables 'volatile' solved the problem immediately. The decoding time for the 274 s MP3 file decreased to 127 s with -O3. Thanks a lot.
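
As a rough way to put the earlier slack suggestion into numbers, the sketch below turns the two timings from this thread into a real-time load and an estimated minimum clock. The function names are made up, and the clock estimate assumes decode time scales inversely with CPU clock, which may not hold when external memory is the bottleneck.

```c
/* Fraction of real time spent decoding; below 1.0 is faster than real time. */
static double realtime_load(double decode_s, double audio_s)
{
    return decode_s / audio_s;
}

/* Estimated lowest clock that still keeps up, ASSUMING decode time scales
 * inversely with the CPU clock (memory-bound code may not behave this way). */
static double min_clock_mhz(double clock_mhz, double decode_s, double audio_s)
{
    return clock_mhz * realtime_load(decode_s, audio_s);
}
```

With the figures above, 380 s for 274 s of audio is a load of about 1.39 (too slow), while 127 s is a load of about 0.46, which under the stated assumption leaves room down to roughly 46 MHz.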

Altera_Forum

@BadOmen

Correct, changing from -O2 to -O3 might introduce hard-to-find bugs. With -O3 we had to add -fno-rename-registers to get rid of some functional differences between the DEBUG and RELEASE versions.
Altera_Forum

I am also working on this.

Can you share the code?
Altera_Forum

Yes, but the final program includes more than just the MP3 decoding, so you may need to find the part you need. The program is not with me right now, and I'm not sure I still have it. How can I share the code with you?

Altera_Forum

I am glad to hear from you.

Altera_Forum

bravefjz (http://www.alteraforum.com/forum/member.php?u=34987),

Please advise: were you able to find it?

If not, can you explain how to do it?

Thank you