MP3 decoder too slow on Nios II

Altera_Forum
Honored Contributor II
4,232 views

Hi all,

I'm designing an MP3 player for the DE2 board that uses libmad for MP3 decoding, but decoding is currently too slow for real-time playback. The core is the fastest Nios II core, with an 8 KB instruction cache and a 16 KB data cache, and the CPU clock is set to 100 MHz. It still takes 380 s to decode a 274 s MP3 file. Can anyone tell me whether that is reasonable?

I have also tried adding a custom instruction to replace the "mad_f_mul" macro in libmad, but the decoding speed stayed the same or even got worse. I can't understand why.

The platform I'm using is Quartus II 9.1.

Any guidelines? Thanks!
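
For context on the custom-instruction attempt: mad_f_mul is libmad's fixed-point multiply. The sketch below is not libmad's source. It assumes a signed 32-bit format with 28 fractional bits, and the names fixed_t, FRAC_BITS, FIXED_ONE and fixed_mul are hypothetical. It only illustrates the kind of operation a custom instruction would be replacing: a 32x32 multiply into a 64-bit intermediate, followed by a shift.

```c
#include <stdint.h>

/* Hypothetical names; assumes 28 fractional bits in a signed 32-bit word. */
typedef int32_t fixed_t;
#define FRAC_BITS 28
#define FIXED_ONE ((fixed_t)1 << FRAC_BITS)

/* Multiply two fixed-point values through a 64-bit intermediate so the
 * product cannot overflow before it is scaled back down.
 * Assumes >> on a negative value is an arithmetic shift (true for GCC),
 * so results are truncated toward negative infinity. */
static fixed_t fixed_mul(fixed_t x, fixed_t y)
{
    int64_t product = (int64_t)x * (int64_t)y;
    return (fixed_t)(product >> FRAC_BITS);
}
```

With optimization off, the code around each multiply may cost more than the multiply itself, which could be one reason a custom instruction made little difference. That is a guess, not a measurement.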
0 points
20 replies
Altera_Forum

Did you tell GCC that new custom instructions (CIs) are being used?

Altera_Forum

I just use the new CI macro from system.h. Doesn't GCC pick up the new CI automatically? It still gives the correct results.

Altera_Forum

Have you enabled -O2 or -O3?

 

If you rip out all calls into libc, can you fit the whole code into internal memory? 

(It might be that the resident set fits in the cache though.)

Altera_Forum

No, I got that result with optimization turned off. Does it make that much difference?

I tried enabling -O3 once, but the program didn't work correctly: it seemed to stop at some point and a lot of the code was never executed, so I changed it back to no optimization.

I don't quite understand what you mean by "If you rip out all calls into libc, can you fit the whole code into internal memory? (It might be that the resident set fits in the cache though.)". I use the 512 KB SRAM as memory and the program size is 167 KB.

Altera_Forum

Definitely use -O3; the speed-up is huge. Also, create an on-chip RAM, find the most frequently called functions, and move them into this RAM until it's full.

 

Bill

Altera_Forum

Thank you. I shall try -O3 or -O2 again.

I don't know how to create on-chip RAM and move functions into it. I only know how to add the on-chip RAM component to the system using SOPC Builder. What else should I do?

Altera_Forum

 

--- Quote Start ---

Thank you. I shall try -O3 or -O2 again.

I don't know how to create on-chip RAM and move functions into it. I only know how to add the on-chip RAM component to the system using SOPC Builder. What else should I do?

--- Quote End ---

 

 

Use this before functions going into on-chip RAM:

void function(void) __attribute__((section(".onchip_mem")));
void function(void) { }

Of course, onchip_mem has to match the name used in SOPC Builder.

 

Bill
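
A slightly fuller sketch of the section attribute described above, applied to a small inner loop of the kind that is usually worth moving. Everything here is illustrative: dot_product is a made-up function, ".onchip_mem" is a placeholder that must match the memory name in your own system, and PLACE_IN_ONCHIP is a hypothetical build flag (for example -DPLACE_IN_ONCHIP) so the same file still builds on a host without that memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder section name: must match the on-chip memory component
 * name in your system. */
#define ONCHIP_SECTION_NAME ".onchip_mem"

/* PLACE_IN_ONCHIP is a hypothetical build flag. When it is not defined,
 * the attribute is dropped and the function lands in the default section. */
#ifdef PLACE_IN_ONCHIP
#define ONCHIP_CODE __attribute__((section(ONCHIP_SECTION_NAME)))
#else
#define ONCHIP_CODE
#endif

/* The declaration carries the attribute, as in the post above. */
int32_t dot_product(const int16_t *a, const int16_t *b, size_t n) ONCHIP_CODE;

/* A short, frequently called multiply-accumulate loop. */
int32_t dot_product(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * b[i];
    }
    return acc;
}
```

Checking the linker map file after the build is a simple way to confirm the function really ended up in the on-chip memory region.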

Altera_Forum

 

--- Quote Start ---

cpu clk is set to 100Hz.

--- Quote End ---

I imagine the CPU is set to 100 MHz. If you really are using a 100 Hz CPU clock, that might be the problem.

Altera_Forum

You are right, the CPU is set to 100 MHz.

Altera_Forum

FYI, here is an MP3 player that handles the audio in real time and might be useful: http://www.nioswiki.com/index.php?title=nios2embeddedevaluationkit/mp3_player&highlight=mp3

Altera_Forum

Thanks to you all. I used -O1 and it now runs in real time.

Altera_Forum

I recommend turning the system clock down to find out how much slack you have. For example, I wouldn't rely on it being 'real time' if it fails to keep up when you drop the frequency by 10%. Also, -O2 should give you better performance.

Altera_Forum

I have tried -O2 and -O3, but the same problem occurs: the program never enters the decoding process. I use an external interrupt to set a begin-decoding flag and poll this flag in the main function. With -O2 or -O3, the flag is set correctly, but the polling doesn't work. If I set a breakpoint there, the program never reaches that part of the main function. I don't know what the optimizer really does at the different levels. Can you give me some idea?

Altera_Forum

It might be how the polling is implemented. Are you familiar with the keyword 'volatile'? http://en.wikipedia.org/wiki/volatile_variable

Variables that should be volatile are one of the many things to look out for once you start increasing the optimization level. When you declare a variable volatile, you are basically telling the compiler that its value can change at any time without the code it is optimizing being involved (a key characteristic of a register in a slave port, and equally true of a flag set in an interrupt handler).

If that doesn't help, maybe you can copy the code you think is the culprit into this post and one of us can figure out why the optimization level is causing problems. *Usually* these problems are caused by the application code and corner cases that the developer hasn't thought of. I say usually since -O3 can sometimes bring in some surprises :)
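
A minimal sketch of the flag-polling pattern being discussed, written so it can run on a host. It is not the original poster's code: start_decoding, start_isr and wait_for_start are hypothetical names, and a standard C signal handler stands in for the external interrupt. The fire_at parameter exists only to simulate the interrupt arriving partway through the polling.

```c
#include <signal.h>

/* volatile forces every read in the polling loop to go back to memory.
 * sig_atomic_t is the type standard C guarantees can be written safely
 * from a signal handler. */
static volatile sig_atomic_t start_decoding = 0;

/* Stand-in for the external interrupt handler (hypothetical name). */
static void start_isr(int signum)
{
    (void)signum;
    start_decoding = 1;
}

/* Poll the flag until it is set or max_polls is reached.
 * Returns the number of polls performed. raise() simulates the
 * interrupt arriving on poll number fire_at. */
static long wait_for_start(long max_polls, long fire_at)
{
    long polls = 0;
    while (!start_decoding && polls < max_polls) {
        polls++;
        if (polls == fire_at) {
            raise(SIGINT);  /* runs start_isr() if it was installed */
        }
    }
    return polls;
}
```

To try it, install the handler with signal(SIGINT, start_isr) and then call wait_for_start(1000, 3). In a real system the interrupt arrives on its own and the loop is often just while (!flag) { }. Without volatile, an optimizing compiler is allowed to read the flag once and never look at it again, which matches the symptom of code after the loop never being reached.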

Altera_Forum

You are right. Declaring the variables 'volatile' solved the problem immediately. The decoding time for the 274 s MP3 dropped to 127 s with -O3. Thanks a lot.
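
The timings in this thread can be turned into a rough slack estimate, in the spirit of the earlier advice to lower the clock. The sketch below uses hypothetical helper names and assumes decode time scales inversely with CPU clock, which ignores memories and peripherals that do not follow the CPU clock. On that assumption, 380 s for 274 s of audio is a ratio of about 1.39 (too slow), and 127 s is about 0.46, which would put the break-even clock somewhere near 46 MHz.

```c
/* Ratio of decode time to audio duration.
 * Below 1.0 means the decoder runs faster than real time. */
static double realtime_ratio(double decode_s, double audio_s)
{
    return decode_s / audio_s;
}

/* Lowest clock that would still keep up, ASSUMING decode time scales
 * inversely with the CPU clock. Treat the result as an estimate only. */
static double min_clock_mhz(double decode_s, double audio_s, double clock_mhz)
{
    return clock_mhz * realtime_ratio(decode_s, audio_s);
}
```

For example, min_clock_mhz(127.0, 274.0, 100.0) gives the estimated break-even clock for the -O3 build. Actually lowering the clock and listening for dropouts remains the real test.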

Altera_Forum

@BadOmen

Correct, changing from -O2 to -O3 might introduce hard-core bugs. With -O3 we had to add -fno-rename-registers to get rid of some functional differences between the DEBUG and RELEASE versions.

Altera_Forum

I am also working on this.

Could you share the code?

Altera_Forum

Yes, but the final program includes more than just the MP3 decoding, so you may need to find the part you need. Also, the program is not with me right now, and I'm not sure I still have it. How can I share the code with you?

Altera_Forum

I am glad to hear from you.

 

Altera_Forum

bravefjz (http://www.alteraforum.com/forum/member.php?u=34987),

Please advise: were you able to find it?

If not, can you explain how to do it?

Thank you