I have a relatively large routine (3000 Fortran statements). I discovered (with VTune) that the compiler was NOT generating vectorized code. In many places I'm operating on 4-vectors of 4-byte floats. I would have thought that the compiler would use the SIMD instructions that fetch and operate on 4 floats at a time. For example, with this source:
EmitterProjectedCornerPoint(1:4,2) = EmitterPartialProjectedVerts(1:4,2) + CalcPoint(1:4,2)*EmitterMu(1:4)
the compiler produces:
mov eax, -0x10
RENDERING+0x12ba7:movss xmm0, DWORD PTR [eax+027a58960h]
mulss xmm0, DWORD PTR [eax+027a588e0h]
addss xmm0, DWORD PTR [eax+027a58a64h]
movss DWORD PTR [eax+027a58990h], xmm0
add eax, 0x4h
jnz RENDERING+0x12ba7
Interestingly, it doesn't even unroll the loop.
If I take this typical line of code and put it in a very small routine, the compiler generates the expected SIMD instructions that fetch 4 floats at a time. No loop involved: one move, one multiply, and another move. My compiler options are: /nologo /Zi /O3 /QxP /Qparallel /assume:buffered_io /free /module:"Release" /object:"Release" /libs:static /threads /c
In the compiler's defense (as it were), it does report that it has run out of space. I get the following message:
Space exceeded in Data Dependence Test in _MAIN__
Subdivide routine into smaller ones to avoid optimization loss
And . . . if I use /QaxP, the out-of-space message is NOT issued, but the compiler generates code that doesn't use SIMD instructions at all; the old arithmetic unit instructions are used.
So (finally!) my questions:
1) What 'space' is it that the compiler is running out of? Is there something that I can do/set/indicate?
2) Evidently I don't really understand the difference between /QxP and /QaxP. Shouldn't /QaxP also properly vectorize this code? I'm not getting a message that the compiler has run out of space . . .
Please don't send me to Premier Support. I've been going round and round with them for two weeks (TWO WEEKS!) and have gotten nowhere. Has anyone else encountered difficulty getting code vectorized?
David
If you have more than one subroutine in the file, but don't need interprocedural optimization, /Qipo- or /Qip- may help. If you do need ipo, there is /QipoN (make N object files rather than 1).
The big hammer, at your own risk, is to set -override_limits.
The compiler cuts off optimization for large files in order to avoid danger of getting hung or out of memory.
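For reference, the subdivision that the compiler message recommends could look like the minimal sketch below. It assumes the arrays in the question are `real(4)` with a leading dimension of 4. The module name `vec4_kernels`, the routine name `project_corner`, and the test values are hypothetical. Whether a given compiler version actually vectorizes it still has to be checked in the generated code.

```fortran
! Sketch only: vec4_kernels, project_corner and the test data are
! hypothetical; the array names are taken from the question.
module vec4_kernels
  implicit none
contains
  ! Small leaf routine: res = base + point * mu on four 4-byte floats.
  pure subroutine project_corner(res, base, point, mu)
    real(4), intent(out) :: res(4)
    real(4), intent(in)  :: base(4), point(4), mu(4)
    res = base + point * mu
  end subroutine project_corner
end module vec4_kernels

program demo
  use vec4_kernels
  implicit none
  real(4) :: EmitterProjectedCornerPoint(4,2)
  real(4) :: EmitterPartialProjectedVerts(4,2)
  real(4) :: CalcPoint(4,2), EmitterMu(4)

  ! Assumed test values, chosen so the result is exact in single precision.
  EmitterPartialProjectedVerts = 1.0
  CalcPoint = 2.0
  EmitterMu = [0.5, 1.0, 1.5, 2.0]
  EmitterProjectedCornerPoint = 0.0

  ! Each column (:,2) is contiguous, so it can be passed as a 4-vector.
  call project_corner(EmitterProjectedCornerPoint(:,2), &
                      EmitterPartialProjectedVerts(:,2), &
                      CalcPoint(:,2), EmitterMu)

  ! Self-check against the hand-computed values 1 + 2*mu.
  if (any(abs(EmitterProjectedCornerPoint(:,2) - [2.0, 3.0, 4.0, 5.0]) > 1.0e-6)) then
    error stop "unexpected result"
  end if
  print *, EmitterProjectedCornerPoint(:,2)
end program demo
```

Keeping the arithmetic in a short routine with explicit `intent` attributes gives the dependence analysis far fewer statements to consider than a 3000-statement routine does.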
David,
While you are waiting for a fix, you might try experimenting by creating a user-defined type:
type Vec4
real(4) :: v(4)
end type Vec4
...
type(Vec4) :: EmitterProjectedCornerPoint(nCorners), EmitterPartialProjectedVerts(nVerts), CalcPoint(nPoints)
type(Vec4) :: EmitterMu
EmitterProjectedCornerPoint(2)%v = EmitterPartialProjectedVerts(2)%v + CalcPoint(2)%v*EmitterMu%v
You might find that the compiler has less to think about when the code is written this way.
Jim Dempsey
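As a rough, self-contained version of the fragment above, the sketch below can be compiled on its own to inspect the generated code. The array size `nCorners = 4` and the initial values are assumptions, since the thread gives neither. All three arrays use the same size here for brevity.

```fortran
! Sketch only: nCorners and the initial values are assumed for illustration.
program vec4_demo
  implicit none
  type Vec4
    real(4) :: v(4)
  end type Vec4
  integer, parameter :: nCorners = 4   ! assumed size, not given in the thread
  type(Vec4) :: EmitterProjectedCornerPoint(nCorners)
  type(Vec4) :: EmitterPartialProjectedVerts(nCorners), CalcPoint(nCorners)
  type(Vec4) :: EmitterMu
  integer :: i

  ! Fill the inputs with simple, exactly representable values.
  do i = 1, nCorners
    EmitterPartialProjectedVerts(i)%v = 1.0
    CalcPoint(i)%v = real(i)
  end do
  EmitterMu%v = [0.5, 1.0, 1.5, 2.0]

  ! The statement from the question, written on whole 4-element components.
  EmitterProjectedCornerPoint(2)%v = EmitterPartialProjectedVerts(2)%v &
                                   + CalcPoint(2)%v * EmitterMu%v

  ! Self-check against the hand-computed values 1 + 2*mu.
  if (any(abs(EmitterProjectedCornerPoint(2)%v - [2.0, 3.0, 4.0, 5.0]) > 1.0e-6)) then
    error stop "unexpected result"
  end if
  print *, EmitterProjectedCornerPoint(2)%v
end program vec4_demo
```

The extent of `v` is fixed at 4 in the type definition, so every `%v` operand carries a known trip count with it. That may simplify the compiler's analysis, though this is not guaranteed.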
