Intel® Fortran Compiler

Debugging issue

jimdempseyatthecove

Intel Visual Fortran Composer XE 2011 Update 10 Integration for Microsoft Visual Studio* 2010, 12.1.3530.2010

This is a general compiler issue, although I noticed it while preparing the application for use with VTune. I am posting it here because I believe it is a compiler issue and not a VTune issue.

I wanted to use VTune to assist in locating a threading issue via the "Thread Sharing Events" feature. I am doing this on a Release-like configuration (full optimizations with debug information). Debugging is somewhat difficult in this configuration, and I was hoping "Thread Sharing Events" might localize a coding error to a small section of code.

In order to use this feature (IOW, when the "Thread Sharing Events" button is pressed), a warning message came up stating that I must enable Parallel Debug Checks (/debug:parallel). So for all the projects in this configuration I set that option to Yes, then rebuilt.

After the rebuild I noticed that an entirely new problem had shown up: NaNs were now being generated throughout the application. To track this down I used WinDbg to watch for memory changes at one of the earliest locations I had found being trashed with NaN.
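For anyone chasing a similar problem, a check like the following can help narrow down where the first NaN appears before resorting to memory breakpoints. This is only a minimal sketch: the array `a` and the function `first_nan` are hypothetical names, not code from my application, and with aggressive floating-point optimization settings the compiler may not preserve NaN checks, so stricter floating-point options may be needed.

```fortran
program find_first_nan
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan, ieee_value, ieee_quiet_nan
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(5)          ! hypothetical data, stands in for application state
  integer  :: idx

  a = [1.0_dp, 2.0_dp, 3.0_dp, 4.0_dp, 5.0_dp]
  a(4) = ieee_value(a(4), ieee_quiet_nan)   ! plant a NaN for the demonstration

  idx = first_nan(a)
  print *, 'first NaN at index', idx        ! index 4 for the data above

contains

  ! Return the index of the first NaN in x, or 0 if there is none.
  pure integer function first_nan(x)
    real(dp), intent(in) :: x(:)
    integer :: k
    first_nan = 0
    do k = 1, size(x)
      if (ieee_is_nan(x(k))) then
        first_nan = k
        return
      end if
    end do
  end function first_nan

end program find_first_nan
```

Calling such a check after each major computation step brackets the statement that first produces the NaN.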

In the Disassembly window (mixed source and disassembly view) I found:

00000001403A4E77 vmovsd xmm0,qword ptr [rbx+rdx+8]
& pFiniteSolutionSIMD.rBeadIPIF(1,JBB).v &
& + pFiniteSolutionSIMD.rBeadDIPIF(1,JBB).v * pFiniteSolutionSIMD.rDELTIS.v
00000001403A4E7D vmulsd xmm1,xmm0,mmword ptr [rdi+18h]
pFiniteSolutionSIMD.rBeadIPIF(1,JBB).v = &
00000001403A4E83 vaddsd xmm15,xmm1,mmword ptr [rdi+rsi+8]
00000001403A4E89 test byte ptr [rax],2
00000001403A4E8C jne 00000001403A535E
00000001403A4E92 vmovsd qword ptr [rdi+rsi],xmm14
00000001403A4E97 test byte ptr [rax],2
00000001403A4E9A jne 00000001403A5334
00000001403A4EA0 vmovsd qword ptr [rdi+rsi+8],xmm15
pFiniteSolutionSIMD.rBeadIPIF(2,JBB).v = &
00000001403A4EA6 test byte ptr [rax],10h
00000001403A4EA9 jne 00000001403A531C
00000001403A4EAF test byte ptr [rax],10h
00000001403A4EB2 jne 00000001403A52FD
00000001403A4EB8 test byte ptr [rax],10h
00000001403A4EBB jne 00000001403A52E5
00000001403A4EC1 mov rdx,qword ptr [rbp+3C0h]
00000001403A4EC8 vmovsd xmm0,qword ptr [rbx+rdx]
& pFiniteSolutionSIMD.rBeadIPIF(2,JBB).v &
& + pFiniteSolutionSIMD.rBeadDIPIF(2,JBB).v * pFiniteSolutionSIMD.rDELTIS.v
00000001403A4ECD vmulsd xmm1,xmm0,mmword ptr [rdi+10h]
pFiniteSolutionSIMD.rBeadIPIF(2,JBB).v = &
00000001403A4ED3 vaddsd xmm14,xmm1,mmword ptr [rdi+r14]

Of particular interest is the vmulsd in the next-to-last assembler statement, which uses memory address [rdi+10h].

Looking at the registers, RDI contains 0x30. A value this small is indicative of an offset/index relative to a base address, not of a base address itself.

I thought this strange, so I enabled the Output file option to produce the .ASM listing. In the .ASM file I find:

;;; pFiniteSolutionSIMD => pTether0.pFiniteSolutionXMM
test BYTE PTR [rax], 16 ;113.5
$LN21:
jne .B1.435 ; Prob 0% ;113.5
$LN22:
; LOE rdi xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13 xmm15
.B1.5:: ; Preds .B1.435 .B1.4
$LN23:
mov r15, QWORD PTR [784+rdi] ;113.5


Note that the local pointer pFiniteSolutionSIMD resides in r15.
Later on in the same section of code we find:

$LN525:
test BYTE PTR [rax], 16 ;210.13
$LN526:
jne .B1.382 ; Prob 0% ;210.13
$LN527:
; LOE rax rbx rsi rdi r12 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13
.B1.156:: ; Preds .B1.154 .B1.382 ; Infreq
$LN528:
test BYTE PTR [rax], 16 ;210.13
$LN529:
jne .B1.381 ; Prob 0% ;210.13
$LN530:
; LOE rax rbx rsi rdi r12 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13
.B1.158:: ; Preds .B1.156 .B1.381 ; Infreq
$LN531:
test BYTE PTR [rax], 16 ;210.13
$LN532:
jne .B1.380 ; Prob 0% ;210.13
$LN533:
; LOE rax rbx rsi rdi r12 r14 r15 xmm6 xmm7 xmm8 xmm9 xmm10 xmm11 xmm12 xmm13
.B1.159:: ; Preds .B1.380 .B1.158 ; Infreq
$LN534:
mov rdx, QWORD PTR [984+rbp] ;210.13
$LN535:
vmovsd xmm0, QWORD PTR [r14+rdx] ;210.13
$LN536:
vmulsd xmm1, xmm0, QWORD PTR [16+r15] ;212.62
$LN537:
vaddsd xmm14, xmm1, QWORD PTR [r12+rdi] ;210.13

Now note that the assembler file uses r15, which contains the local pointer pFiniteSolutionSIMD and is therefore the correct base address of the structure.

There are several issues with the compiler:

1) The object code that ends up in the executable does not match the assembly listing produced by the same build (r15 in the listing vs rdi in the executable).
*** resulting in bad code ***

2) Adding /debug:parallel converted all my packed vectors to scalars. Without /debug:parallel I get vmovpd, vmulpd and vaddpd. With /debug:parallel the small vectors (2-wide in this case) get separated into individual scalars.

3) /debug:parallel is adding flag tests, which is fine, except that the same test occurs three times in a row (test byte ptr [rax], 10h). Granted, this may be by design, as it would extend the conflict detection window for adverse thread-to-thread shared variable usage. So this may be a non-issue.

The problem I have with /debug:parallel converting vectors to scalars for use with "Thread Sharing Events" is that I cannot detect sharing events that occur while using 2-wide or 4-wide vectors.
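For readers without the source, the statement in question has roughly the following shape. This is a hedged sketch only: the type name `TypeXMM`, the routine `advance`, and the array sizes are assumptions made for illustration, and only the `.v` component and the value = value + rate * delta update pattern come from the listing above. An update of this form on a 2-wide component is the kind of code a vectorizing compiler can turn into packed instructions.

```fortran
module xmm_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)

  ! Hypothetical 2-wide container; the real type in the application is not shown.
  type :: TypeXMM
    real(dp) :: v(2)
  end type TypeXMM

contains

  ! Advance each bead: value = value + rate * delta, on both lanes at once.
  subroutine advance(ipif, dipif, deltis)
    type(TypeXMM), intent(inout) :: ipif(:)
    type(TypeXMM), intent(in)    :: dipif(:)
    type(TypeXMM), intent(in)    :: deltis
    integer :: j
    do j = 1, size(ipif)
      ipif(j)%v = ipif(j)%v + dipif(j)%v * deltis%v
    end do
  end subroutine advance

end module xmm_sketch

program demo
  use xmm_sketch
  implicit none
  type(TypeXMM) :: ipif(3), dipif(3), deltis
  integer :: j

  do j = 1, 3
    ipif(j)%v  = [1.0_dp, 2.0_dp]
    dipif(j)%v = [0.5_dp, 0.25_dp]
  end do
  deltis%v = [2.0_dp, 4.0_dp]

  call advance(ipif, dipif, deltis)
  print *, ipif(1)%v   ! lanes: 1 + 0.5*2 = 2 and 2 + 0.25*4 = 3
end program demo
```

Comparing the generated code for a small case like this with and without /debug:parallel should show whether the packed-to-scalar conversion reproduces outside the full application.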

Jim Dempsey
