Nios® V/II Embedded Design Suite (EDS)

gcc optimisation for movsi_internal

Altera_Forum

The gcc sources (9.0 b141) seem to be missing the pattern that would load a 32-bit value whose low 16 bits are all zero into a register with a single movhi instruction. The patterns for orhi and andhi are present.

 

Adding 'K' as below (at about line 290 of nios2.md):

 

(define_insn "movsi_internal"
  "(register_operand (operands, SImode)
    || reg_or_0_operand (operands, SImode))"
  "@
   stw%o0\\t%z1, %0
   ldw%o1\\t%0, %1
   mov\\t%0, %z1
   movi\\t%0, %1
   movui\\t%0, %1
   movhi\\t%0, %U1
   addi\\t%0, gp, %%gprel(%1)
   movhi\\t%0, %H1\;addi\\t%0, %0, %L1"
)

 

seems to have the desired effect. 
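
As a rough illustration of what the extra alternative buys, the sketch below classifies a 32-bit constant by the cheapest load sequence listed in the pattern above. It assumes the usual meaning of the Nios II constraint letters (I: signed 16-bit, J: unsigned 16-bit, K: low 16 bits zero); the enum and function names are made up for this example and do not come from the gcc sources.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
enum load_kind {
    LOAD_MOVI,        /* constraint 'I': signed 16-bit immediate    */
    LOAD_MOVUI,       /* constraint 'J': unsigned 16-bit immediate  */
    LOAD_MOVHI,       /* constraint 'K': low 16 bits are all zero   */
    LOAD_MOVHI_ADDI   /* anything else: two-instruction sequence    */
};

/* Pick the cheapest way to load the constant v into a register,
 * trying the alternatives in the same order as the insn pattern. */
enum load_kind classify_const(uint32_t v)
{
    /* Sign-extended 16-bit range: 0..0x7FFF or 0xFFFF8000..0xFFFFFFFF. */
    if (v <= 0x7FFFu || v >= 0xFFFF8000u)
        return LOAD_MOVI;
    /* Zero-extended 16-bit range. */
    if (v <= 0xFFFFu)
        return LOAD_MOVUI;
    /* Upper half only: this is the case the 'K' alternative covers. */
    if ((v & 0xFFFFu) == 0)
        return LOAD_MOVHI;
    return LOAD_MOVHI_ADDI;
}
```

Without the 'K' alternative, a value such as 0x12340000 would fall through to the two-instruction case even though the addi adds zero.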

I can't see any reason why this wasn't done - except, perhaps, oversight. 

 

What is the best way of feeding this sort of change back?
7 Replies
Altera_Forum

Maybe somebody from Altera will read this here, or, much better, get in contact with Altera via MySupport:

 

altera mysupport login (https://www.altera.com/myaltera/mal-index.jsp)
Altera_Forum

Hi dsl, 

 

I like the fact that you are doing something about the compiler. Its lack of optimization astounds me.

 

Have you done anything for loads/stores to addresses whose top 16 bits are zero? I want to avoid the unnecessary movhi instruction in this case and use just a movi. Do you see a problem with this?

 

I've never had to look inside gcc and the .md files, etc, but I think I can manage this change and I would like to consider a few others. 

 

Thanks, Peter
Altera_Forum

That should happen provided the address is a compile-time constant (i.e. not the address of a relocatable symbol).

 

Check what gets generated for *((unsigned char *)0x1000).
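
A minimal test case for that check might look like the sketch below. The function names are made up for this example; the idea is to compile it with the Nios II gcc at some optimisation level and inspect the assembly for read_low_byte to see whether a movhi appears before the load.

```c
#include <stdint.h>

/* Hypothetical helper: read one byte from an absolute address.
 * volatile keeps the compiler from dropping or merging the access. */
static inline unsigned char read_byte(uintptr_t addr)
{
    return *(volatile unsigned char *)addr;
}

/* The address is a compile-time constant, not a relocatable symbol,
 * so the compiler knows its upper 16 bits are zero. Only meaningful
 * on a target where 0x1000 is actually mapped. */
unsigned char read_low_byte(void)
{
    return read_byte(0x1000);
}
```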
Altera_Forum

So there's no way to optimize out the unnecessary movhi after relocation? 

 

I don't know enough about the compile/link/etc chain yet to know why this would not be possible.
Altera_Forum

Not really, since every relative branch that crosses the deleted instruction would have to be modified. These offsets are generated as absolute values by the assembler.

Similarly the jump tables used for switch statements would also need hacking. 

 

One possibility is to use a global register variable to point to the C structure that will be mapped to the low address. The compiler will then generate a single instruction for each access. 

 

Another option is to get the data within +-32k of _gp so that gp relative addressing can be used. 

For a system with only M9K memory it is not unreasonable to get all data items addressable that way. 

struct foo foo __attribute__((section(".sdata"))) = { ... }; 

will make the compiler use gp relative addressing for foo.
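
To see whether a given address can be reached that way, a small check like the one below can help. It is only a sketch: the function name is made up, and it assumes the 16-bit offset field of Nios II loads and stores is sign-extended, which is what gives the +-32k reach around the base register.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if addr can be reached from base
 * with a sign-extended 16-bit offset (-32768..32767), else 0.
 * base would be the value of _gp for gp relative addressing. */
int fits_simm16(uint32_t addr, uint32_t base)
{
    uint32_t off = addr - base;   /* wraps modulo 2^32 for addr < base */
    return off <= 0x7FFFu || off >= 0xFFFF8000u;
}
```

The same check with base set to 0 models addressing relative to the zero register: it reaches 0x0000 to 0x7FFF (plus the top 32k of the address space), so not the whole bottom 64k.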
Altera_Forum

So what part of the compiler/linker looks at the section type? Depending on the section type, different assembly is emitted, correct? 

 

Would it be possible to create a new section type, whose meaning is that anything in this section can be guaranteed to be within 64k of 0x0? Then I could emit assembly that references data from this section without using the movhi. Is it possible to restrict section types to specific regions in RAM?
Altera_Forum

Hmmm.... 

You might manage to add sections to the list of 'small' sections in nios2.c for %r0 relative addressing. And then output nnn(r0) instead of the %gprel(nnn)(gp) for such symbols. 

Might be possible to use one of the MACH_DEP symbol flags to qualify SMALL_DATA.