I've read the description of this optimization option several times, but for the life of me I can't make any sense of the explanation of what it does. The program with this option set to No runs about as fast as another one using similar code where it's set to Yes, so there seems to be little or no speed hit from setting it to No. Am I likely to run into related problems with the optimizer, and is there some real disadvantage to setting Omit Frame Pointers to No?
By default 'Omit Frame Pointers' is No (love that double negative), which is the same as saying Enable Frame Pointers is Yes. Frame pointers, together with incremental linking turned off, are required for Traceback. If frame pointers are omitted, the compiler has an additional register to work with, and this may boost computational speed.
Can you post your 20 lines? Include subroutine header and declared variables.
Make sure you are not running one configuration with Run Time checks enabled, or array bounds checking enabled, or one with optimizations off. A run time ratio of 20:1 is indicative of these differences.
By omitting the frame pointer, your stack allocation is reduced by 4 bytes compared with including it. This may change the alignment of local variables, as well as other things that affect the processor caches.
Also, other options may be affecting your code generation. I would suggest the following:
Create a configuration based on the "Debug" configuration (I use "DebugFast"). Then set optimizations to Max Speed and set the other optimization options to match your "Release" settings.
a) Compile one way and set a break point at the start of the 20-line code section. On break, open the disassembly window. You do not have to understand assembly code to do this. Select the section of the disassembly window that represents the 20 lines of your source code. This may be hundreds of lines. Copy to the clipboard (Ctrl-C). Open a new text file window and paste the text into it. Save the window (Save As, using an appropriate name such as "With FP" or "Without FP").
b) Compile the other way and set a break point at the start of the 20-line code section. Select, copy and paste into an additional new text window, and save it under the other name.
Now, with both windows available, select a side-by-side pane view and examine the code for differences. Ignore the hex address differences.
Now then,
1) If the number of statements is the same, this indicates that the performance issue is due to the memory alignment of local variables .OR. that the alignment of the working data arrays is now unfavorable for processor cache access. Consider using directives to force alignment of the local variables that you identify as sensitive.
!dec$ attributes align : 16 :: TOSVX1
2) If the number of statements is slightly different due to the extra register being available in the "Without FP" build, then the problem is still likely the same as 1) above.
3) If the number of statements varies significantly, then suspect that different compiler options are in effect. Note that default options may be different with and without "Omit Frame Pointers". You may have to experiment with options to get what you want.
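To see whether an alignment directive like the one in 1) actually took effect, a minimal sketch along these lines can help. It assumes a compiler that accepts the non-standard LOC intrinsic (Intel Fortran and gfortran both do); the `!dec$` line is specific to Intel/Compaq compilers, and other compilers will typically treat it as a comment. The program and variable names are made up for illustration.

```fortran
program align_check
  implicit none
  ! Ask for 16-byte alignment of AJR (Intel/Compaq directive).
  !dec$ attributes align : 16 :: ajr
  complex(kind=8) :: ajr
  integer(kind=8) :: addr

  ajr = (1.0d0, 2.0d0)

  ! LOC is a vendor extension that returns the address as an integer.
  addr = loc(ajr)

  ! A remainder of 0 means the variable sits on a 16-byte boundary.
  print '(a,z16.16,a,i0)', 'LOC(ajr) = 0x', addr, '   mod 16 = ', mod(addr, 16_8)
  print *, 'ajr = ', ajr
end program align_check
```

Building this once with and once without Omit Frame Pointers, and comparing the `mod 16` value, shows whether the option moves the variable off its boundary.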
Also, remember that
If you are unable to resolve the performance issue using the above hints then consider using VTune (or other code profiler) to find the hot spot. This may lend a little more insight to the problem.
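Before reaching for a profiler, a coarse timer around the suspect block can confirm which configuration is slow and by how much. The sketch below is only an illustration: it uses the standard SYSTEM_CLOCK intrinsic around a stand-in loop shaped like a column update, and the array size `n` and the initial values are assumptions, not taken from the real program.

```fortran
program time_block
  implicit none
  integer, parameter :: n = 400            ! hypothetical problem size
  complex(kind=8) :: a(n, n), d(n)
  complex(kind=8) :: ajr
  integer(kind=8) :: t0, t1, rate
  integer :: i, j

  a = (0.001d0, 0.0d0)
  d = (1.0d0, 0.0d0)

  call system_clock(t0, rate)              ! start tick and ticks per second
  do j = 1, n
    ajr = d(j)
    do i = j + 1, n
      d(i) = d(i) - a(i, j) * ajr          ! stand-in for the block being timed
    end do
  end do
  call system_clock(t1)                    ! end tick

  print '(a,f12.6,a)', 'elapsed: ', real(t1 - t0, 8) / real(rate, 8), ' s'
  ! Printing a result keeps the optimizer from discarding the loop.
  print *, 'checksum: ', sum(d)
end program time_block
```

Run the same source under both configurations outside the debugger; if the ratio is nowhere near 20:1 here, the slowdown probably depends on the real program's data layout rather than on the loop itself.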
Jim Dempsey
Here's the code:
      SUBROUTINE XX1 (A,D,NROW,IP,LD2)
      IMPLICIT NONE
      INTEGER (KIND = 4) :: NROW,LD2
      INTEGER (KIND = 4) :: IP(NROW)
      COMPLEX (KIND = 8) :: A(NROW, *),D(LD2)
      INTEGER (KIND = 4) :: I,J,J1,J2,IXJ,PJ,JP1,R,R1,R2
      COMPLEX (KIND = 8) :: AJR
      . . .
      DO R = R1, R2
        . . .
        IXJ=0
c ------- Problem block -------
        DO J = J1, J2
          IXJ=IXJ+1
          PJ=IP(J)
          AJR=D(PJ)
          A(J,R)=AJR
          D(PJ)=D(J)
          JP1=J+1
          DO I = JP1, NROW
            D(I)=D(I)-A(I,IXJ)*AJR
          END DO
        END DO
c -----------------------------
        . . .
      END DO
      . . .
The assembly code generated as you suggested, with Omit Frame Pointers = Yes and No seems identical for the two cases (from DO J = J1, J2 through the second END DO), with one exception. At the very beginning of the block, right after DO J = J1, J2, the "Yes" option has a single additional instruction:
00492467 mov eax,dword ptr [ebp-0Ch]
Is this significant, or should I proceed as though the two are identical?
So where should I begin looking for the "something else"?
Frame Pointers can be set in both IVF and VC++, and the two settings need not be the same. Indeed, my IVF 9.1 on .NET 2003 defaults to NO and YES, respectively. As the documentation cautions, they had better both be set to NO for Traceback to deliver anything meaningful. How do the different settings affect EBP usage?
The mov would not be significant.
If the tests were run with the same assembly code, i.e. you captured the disassembly window and ran the test in the debugger for both configurations, and the run times are still that different, then there could be an alignment issue. (Eliminate the possibility that what you see in the debugger is NOT what ran the tests.)
If A(NROW, *) and D(LD2) are not allocatable, then try using
cDEC$ ATTRIBUTES ALIGN:16 :: A
in the appropriate module that declares the variable, substituting the actual array name for the dummy argument name.
Also, in XX1, force alignment on AJR:
cDEC$ ATTRIBUTES ALIGN:16 :: AJR
COMPLEX (KIND = 8) :: AJR
Prior to doing this, you can use the debugger Memory window to convert the symbolic name to a hex address. Examine where A(1,1) and D(1) are located.
An alignment with the hex address ending in 0 for A(1,1) and D(1) would yield the best performance. Your loop is small. If full optimizations are on, then AJR is likely registerized (it is complex and would occupy 2 registers).
The alignment issue alone would not account for a 10x difference in performance. What could, though, is cache collisions.
Therefore it is likely that D(I) and A(I,IXJ) are unfortunately placed in a manner that causes cache collisions in one of the configurations.
For diagnostics, you can obtain the locations of the first elements used in the inner DO loop:
write(*,*) LOC(D(JP1)), LOC(A(JP1,IXJ))
DO I = JP1, NROW
Check out what you get with and without the Frame Pointer
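One way to read the LOC output is to reduce each address modulo the sizes that matter to the cache. The sketch below is a self-contained illustration, not the original program: the 64-byte line size and the 4096-byte stride at which two addresses compete for the same cache set are assumptions that depend on the processor, and the array shapes are made up.

```fortran
program loc_report
  implicit none
  integer(kind=8), parameter :: line_bytes  = 64     ! assumed cache-line size
  integer(kind=8), parameter :: alias_bytes = 4096   ! assumed set-conflict stride
  integer, parameter :: nrow = 1000                  ! hypothetical array size
  complex(kind=8), allocatable :: a(:, :), d(:)
  integer(kind=8) :: pa, pd

  allocate (a(nrow, 4), d(nrow))

  ! LOC is a vendor extension; only the addresses are used here.
  pd = loc(d(1))
  pa = loc(a(1, 1))

  call report('D(1)  ', pd)
  call report('A(1,1)', pa)

  ! A small value here means the two streams map to nearby cache sets.
  print '(a,i0)', 'distance mod alias stride: ', mod(abs(pa - pd), alias_bytes)

contains

  subroutine report(name, addr)
    character(len=*), intent(in) :: name
    integer(kind=8), intent(in) :: addr
    print '(a,a,z16.16,2(a,i0))', name, ' 0x', addr, &
      '  mod 16 = ', mod(addr, 16_8), '  mod line = ', mod(addr, line_bytes)
  end subroutine report

end program loc_report
```

If the numbers differ between the two builds of the real program, that points at data placement; if they are identical, the slowdown more likely comes from somewhere else.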
Good luck hunting
Jim Dempsey
I haven't run any tests with other than optimization for maximum speed (basically, the default Release settings). Under those conditions, the problem occurs only when:
-- I've compiled with the default real size of 8 bytes, AND
-- No debug information is being created, AND
-- Omit Frame Pointers = Yes
It's completely repeatable.
Because the problem disappears when I specify that debug information be written, I've been able to troubleshoot it only by having the program write diagnostic information to a file.
The problem is occurring at the following Windows API call:
      lngResult = ReadFile(hFileHandle,lngLoc1,lngBytesToProc,
     & lngLocBytesProc,NULL_OVERLAPPED)
lngResult, hFileHandle, lngLoc1, lngBytesToProc, and lngLocBytesProc are all INTEGER(KIND = 4). lngLoc1 = LOC(A(1)), where A is an ALLOCATABLE array of type COMPLEX(KIND = 8) (when compiled for the default 8-byte real type) with about 1,000,000 elements, and lngLocBytesProc = LOC(lngBytesProc), where lngBytesProc is an INTEGER (KIND = 4) variable.
The file being read is about 200 MB in size. lngBytesToProc is 8477344. (The A array is filled with several reads from different parts of the file.)
In normal operation, the function call returns a value of 1 (success) for lngResult, and variable lngBytesProc contains 8477344, indicating that the requested number of bytes have been read. The file pointer is at 0 before the call and at 8477344 afterward. The values in the first 529834 (8477344/16) fields of A are those from the file.
When the problem conditions exist, the function call still returns 1 (success), but lngBytesProc contains zero, indicating that no bytes were in fact read. Array A is unchanged by the read operation, confirming that no bytes were read, or at least that none were put into the array. And the file pointer is still at zero after the function call. I've confirmed that the values being sent to the function call are exactly the same when the problem does and doesn't occur, and that the content of the file being read is identical.
I can't find anything wrong with my code which, incidentally, has been working fine for several years compiled with CVF 6.6. So it looks to me like a bug in the IVF 9.1 compiler. For now, I'm going to just leave Omit Frame Pointers = No for all my IVF compilations.
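Since the call reports success while transferring nothing, it is worth checking the byte count as well as the return value after every read. The sketch below is a portable illustration of that check only; it does not call ReadFile, it simply plugs in the values reported above, and the helper name `read_ok` is made up. For what it's worth, a synchronous ReadFile that returns success with zero bytes read is also how Win32 reports a read starting at or past end of file, so the handle and file position are worth logging too.

```fortran
program read_check
  implicit none
  integer(kind=4), parameter :: bytes_per_elem = 16   ! COMPLEX(KIND=8): 2 x 8 bytes
  integer(kind=4) :: lngResult, lngBytesProc, lngBytesToProc

  ! Values from the failing case described in the post.
  lngResult      = 1
  lngBytesToProc = 8477344
  lngBytesProc   = 0

  if (read_ok(lngResult, lngBytesProc, lngBytesToProc)) then
    print '(a,i0,a)', 'read ok: ', lngBytesProc / bytes_per_elem, ' elements filled'
  else
    print '(a,i0,a,i0,a,i0)', 'read failed or short: result = ', lngResult, &
      ', bytes read = ', lngBytesProc, ', bytes requested = ', lngBytesToProc
  end if

contains

  ! True only when the call succeeded AND moved every requested byte.
  logical function read_ok(res, got, wanted)
    integer(kind=4), intent(in) :: res, got, wanted
    read_ok = (res /= 0) .and. (got == wanted)
  end function read_ok

end program read_check
```

In the normal case (lngBytesProc = 8477344) the same check passes and reports 8477344/16 elements, matching the count given above.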
I can repeatably cause failure of the ReadFile call by changing nothing but Omit Frame Pointers; all other compiler options are the same for the successful and failing cases.
What happens when you compile everything with Omit Frame Pointers set to Yes except for the modules that perform the ReadFile? Compiling this way will likely fix the problem and still provide the performance "tweak" of having an extra register available elsewhere. If this does not fix the problem, then the problem is deeper than ReadFile, and the symptoms just coincidentally happened to show up in ReadFile.
Jim Dempsey
