Hi
I experienced a problem with optimized code when adding two 64-bit (alt_64) variables: sometimes the upper word of the result gets an extra increment or decrement by 1 although no overflow (should have) occurred when adding the lower words. The following are actual (negative) input numbers and the expected sum:
0xFFFFFFFFD75C5BDE + 0xFFFFFFFFFFFFF242 = 0xFFFFFFFFD75C4E20
But what I get is a zero upper word and therefore a result that is off by +2^32:
= 0x00000000D75C4E20
In other cases I got an upper word of 0xFFFFFFFE when I expected 0xFFFFFFFF, and AFAIR even 0x00000001 when I expected zero.
Actually it happens in a control loop where a small value like the one above is added to a large sum, positive or negative, in a rather random order. The problem occurs with interrupts disabled, and there is no other hardware/logic/coprocessor that could cause data corruption during the computation. The core is a NIOS II/s with nothing else, running from on-chip memory.
With optimization off (-O0), the sum is always computed correctly. But the code in question is time critical, so that is not an option.
I'm still trying to write an example that reproduces the problem outside my specific application (but have failed so far), and maybe I should try to reproduce it in the ISS. But first I'd like to ask here: is there a known problem with optimization of 64-bit arithmetic? And maybe a known workaround?
Kolja
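For reference, the sum quoted in the question can be checked word by word. The sketch below is only an illustration of how a 32-bit core typically builds a 64-bit add from 32-bit operations (low add, carry, high add). The function name `add64_by_words` is hypothetical, and standard `<stdint.h>` types stand in for `alt_64`.

```c
#include <stdint.h>

/* Add two 64-bit values using only 32-bit operations: add the low
   words, derive the carry from the wrapped result, then add the high
   words plus the carry. */
static uint64_t add64_by_words(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo    = a_lo + b_lo;          /* wraps modulo 2^32 */
    uint32_t carry = (lo < a_lo);          /* 1 if the low add wrapped */
    uint32_t hi    = a_hi + b_hi + carry;  /* high word, also modulo 2^32 */

    return ((uint64_t)hi << 32) | lo;
}
```

Calling `add64_by_words(0xFFFFFFFFD75C5BDEULL, 0xFFFFFFFFFFFFF242ULL)` yields the expected 0xFFFFFFFFD75C4E20. Note that the low-word add does wrap for these inputs, so the carry into the high word is 1.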
4 Replies
In my experience, this does work. It would be helpful if you posted a snippet of code, as these types of problems are almost always type/typecast sort of issues.
Cheers, - slacker
In the snippet below, I get the errors if alt_64 is used for t2, but no errors if alt_32 is used or optimization is turned off.
For explanation: The problematic original code simply does VELOCITY+=ACCELERATION. The code below is a temporary workaround. ACCELERATION is always small enough (positive or negative, magnitude less than 0x1000, and controlled externally) that VELOCITY should never exceed ±(2^31 - 1), so the manual computation of VELOCITY_HI is valid. Actually, when the breakpoint at the end is hit, it is t1 (held in registers) whose upper word is wrong, not VELOCITY (in DPRAM).
You might be tempted to comment on the "DPRAM" definition; yes, the data is in a dual-ported RAM, and another CPU accesses the same area while this code runs in a loop. The other CPU regularly writes ACCELERATION and only reads VELOCITY. Both CPUs have 32-bit access to the DPRAM.
#define VELOCITY (*(volatile alt_64 *)((void *)DPRAM_BASE+8))
#define VELOCITY_LO (*(volatile alt_32 *)((void *)DPRAM_BASE+8))
#define VELOCITY_HI (*(volatile alt_32 *)((void *)DPRAM_BASE+12))
#define ACCELERATION (*(volatile alt_32 *)((void *)DPRAM_BASE+16))
...
while(1) {
...
alt_64 t2; /* it works if you use alt_32 here! */
alt_64 t1;
...
t2 = ACCELERATION;
t1 = VELOCITY + t2;
VELOCITY_LO += t2;
VELOCITY_HI = (VELOCITY_LO >= 0) ? 0 : -1;
/* Don't continue if error occured: good place for a breakpoint */
while(VELOCITY != t1)
asm volatile("nop");
...
} /* while(1) */
The following is the resulting code for the working version with "alt_32 t2" on the left and the erratic code with "alt_64 t2" on the right (only the part that matches the above snippet). In every other place the resulting binaries are exactly the same. DPRAM_BASE is 0x80180, so
- VELOCITY(_LO) is 0x80188 (0x80000+392),
- VELOCITY_HI is 0x8018C (0x80000+396), and
- ACCELERATION is 0x80190 (0x80000+400).
e4: movhi r4,8 | e4: movhi r6,8
e8: addi r4,r4,400 | e8: addi r6,r6,400
ec: ldw r9,0(r4) | ec: ldw r8,0(r6)
f0: ldw r2,0(r17) | f0: ldw r4,0(r6)
f4: ldw r3,4(r17) | f4: ldw r2,0(r17)
f8: ldw r8,0(r17) | f8: srai r5,r8,31
fc: mov r6,r9 | fc: ldw r3,4(r17)
100: srai r7,r9,31 | 100: ldw r8,0(r17)
104: add r8,r8,r9 | 104: add r6,r2,r4
108: stw r8,0(r17) | 108: movhi r10,8
10c: ldw r9,0(r17) | 10c: addi r10,r10,392
110: add r4,r2,r6 | 110: add r8,r8,r4
114: cmpltu r8,r4,r2 | 114: stw r8,0(r17)
118: cmplt r9,r9,zero | 118: ldw r9,0(r17)
11c: movhi r2,8 | 11c: cmpltu r8,r6,r2
120: addi r2,r2,396 | 120: movhi r2,8
124: sub r9,zero,r9 | 124: addi r2,r2,396
128: movhi r10,8 | 128: cmplt r9,r9,zero
12c: addi r10,r10,392 | 12c: sub r9,zero,r9
130: stw r9,0(r2) 130: stw r9,0(r2)
134: ldw r2,0(r10) 134: ldw r2,0(r10)
138: add r5,r3,r7 | 138: add r7,r3,r5
13c: add r8,r8,r5 | 13c: add r8,r8,r7
140: mov r6,r4 | 140: mov r3,r6
144: mov r7,r8 | 144: mov r4,r8
148: beq r2,r4,248 | 148: beq r2,r6,248 <alt_main+0x248>
14c: mov r3,r10 | 14c: mov r5,r10
150: nop 150: nop
154: ldw r2,0(r3) | 154: ldw r2,0(r5)
158: bne r2,r6,150 | 158: bne r2,r3,150 <alt_main+0x150>
15c: ldw r2,4(r3) | 15c: ldw r2,4(r5)
160: bne r2,r7,150 | 160: bne r2,r4,150 <alt_main+0x150>
Thanks for looking at the problem! Kolja
Hi,
I think I found the cause of my problem. Looking just at the code that computes the high word of variable t1, I see only one major difference (register numbers match the alt_64 t2 version, which is listed first): the version where the error happens loads ACCELERATION twice from dual-port memory, first into r8, then into r4. The other version loads it only once and then merely copies the register content around.
If the other CPU in my system changed ACCELERATION in DPRAM between these two accesses, there would actually be +0xDBE in r8 and -0xDBE in r4 (or vice versa). The sign extension in r5 matches the value loaded into r8, but the overflow bit at PC=0x11C is computed from the value loaded into r4.
Do you agree that this might be causing my problems? Then at least I know the cause, can implement proper workarounds, and do not have to worry about wrong 64-bit results in situations where no other CPU accesses the operands. I didn't expect that gcc would produce code that fetches the same volatile operand twice for a single computation.
Erratic alt_64 t2 version:
ec: ldw r8,0(r6) /* r6 = &ACCELERATION */
f0: ldw r4,0(r6)
f4: ldw r2,0(r17) /* r17 = &VELOCITY */
f8: srai r5,r8,31
fc: ldw r3,4(r17)
104: add r6,r2,r4
11c: cmpltu r8,r6,r2
138: add r7,r3,r5
13c: add r8,r8,r7
144: mov r4,r8 /* => new VELOCITY_HI, sometimes wrong */
Working alt_32 t2 version:
ec: ldw r9,0(r4) /* r4 = &ACCELERATION */
f0: ldw r2,0(r17) /* r17= &VELOCITY */
f4: ldw r3,4(r17)
fc: mov r6,r9
100: srai r7,r9,31
110: add r4,r2,r6
114: cmpltu r8,r4,r2
138: add r5,r3,r7
13c: add r8,r8,r5
144: mov r7,r8 /* => new VELOCITY_HI, always correct */
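The effect of the double load can be reproduced on a host machine. The following is a minimal model of the erratic instruction sequence, assuming the two loads may observe different values. The function name `add_with_two_loads` and its parameters are hypothetical, and the comments map each line to the instruction it is meant to mirror.

```c
#include <stdint.h>

/* Model of the erratic sequence: the addend is fetched twice.
   acc_for_sign feeds the sign extension, acc_for_low feeds the
   low-word add and therefore the carry. */
static uint64_t add_with_two_loads(uint64_t velocity,
                                   int32_t acc_for_sign,
                                   int32_t acc_for_low)
{
    uint32_t v_lo = (uint32_t)velocity;
    uint32_t v_hi = (uint32_t)(velocity >> 32);

    uint32_t sign  = (acc_for_sign < 0) ? 0xFFFFFFFFu : 0u; /* srai r5,r8,31   */
    uint32_t lo    = v_lo + (uint32_t)acc_for_low;          /* add r6,r2,r4    */
    uint32_t carry = (lo < v_lo);                           /* cmpltu r8,r6,r2 */
    uint32_t hi    = carry + (v_hi + sign);                 /* add r7,r3,r5; add r8,r8,r7 */

    return ((uint64_t)hi << 32) | lo;
}
```

With both arguments equal to -0xDBE and VELOCITY at 0xFFFFFFFFD75C5BDE, the model returns 0xFFFFFFFFD75C4E20. With +0xDBE for the sign and -0xDBE for the low word, it returns 0x00000000D75C4E20, which is the off-by-2^32 value reported at the start of the thread.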
Or did I put the "volatile" in the wrong place, and should I have defined something like
#define VELOCITY (*(alt_64 *volatile)((void *)DPRAM_BASE+8))
instead of
#define VELOCITY (*(volatile alt_64 *)((void *)DPRAM_BASE+8))
?
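On the two spellings: in C, `volatile alt_64 *` is a pointer to a volatile object, whereas `alt_64 *volatile` is a volatile pointer to an ordinary object, so with the second form the memory accesses themselves are no longer volatile-qualified. A workaround that does not depend on how the compiler widens a volatile operand is to read ACCELERATION exactly once into a plain 32-bit local and do the 64-bit arithmetic on that copy, which matches the observation that the alt_32 t2 variant works. The sketch below illustrates this under several assumptions: a static array stands in for the dual-ported RAM, `<stdint.h>` types stand in for `alt_32`/`alt_64`, and `step_velocity` is a hypothetical name.

```c
#include <stdint.h>

/* Stand-in for the dual-ported RAM (assumption for illustration):
   [0] = VELOCITY_LO, [1] = VELOCITY_HI, [2] = ACCELERATION. */
static volatile uint32_t dpram_words[3];

#define VELOCITY_LO   (dpram_words[0])
#define VELOCITY_HI   (dpram_words[1])
#define ACCELERATION  (dpram_words[2])

/* One control-loop step: VELOCITY += ACCELERATION.
   Conversions between unsigned and signed types rely on the
   two's-complement behaviour that GCC implements. */
static int64_t step_velocity(void)
{
    int32_t  acc = (int32_t)ACCELERATION;  /* exactly one volatile read */
    uint32_t lo  = VELOCITY_LO;
    uint32_t hi  = VELOCITY_HI;

    int64_t velocity = (int64_t)(((uint64_t)hi << 32) | lo);
    velocity += (int64_t)acc;              /* widening uses the local copy */

    VELOCITY_LO = (uint32_t)velocity;
    VELOCITY_HI = (uint32_t)((uint64_t)velocity >> 32);
    return velocity;
}
```

Because `acc` is an ordinary local, the sign extension and the low-word add are guaranteed to use the same value, whatever the other CPU writes in the meantime.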
Ah! Moving the "volatile" specifier behind the '*' did the trick!
Kolja