Hi..
Would you please help me understand what is wrong with the code structure below? I spent a couple of months converting my straight-line code into modules and subroutines, in a form similar to the one below. Now my code runs four times slower. By the way, I didn't use "interface" in the new structure, and the code works fine. The big issue right now is the speed.
The new object-oriented code structure:
    module arrayMod
      real, dimension(:,:), allocatable :: theArray
    end module arrayMod

    program test
      use arrayMod
      implicit none
      call arraySub
      write(*,*) theArray
    end program test

    subroutine arraySub
      use arrayMod
      write(*,*) 'Inside arraySub()'
      ! ... perform operations on theArray ...
    end subroutine arraySub
The old straightforward code structure:
    program test
      implicit none
      real, dimension(:,:), allocatable :: theArray
      ! ... perform the operations on theArray ...
      write(*,*) theArray
    end program test
This isn't your real program. Please show the actual code, before and after would help. You have left out the most important part - what is actually taking the time! There is also no object-oriented code here.
I would also recommend that "arraySub" be a procedure in the module rather than external.
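As a minimal sketch of that recommendation, the subroutine can be moved into the module's contains section, which gives it an explicit interface automatically and lets the compiler see the caller and callee together. The array size and the loop body here are placeholders invented for illustration; they are not taken from the real solver.

```fortran
module arrayMod
  implicit none
  ! Deferred-shape module array; must be allocated before use.
  real, allocatable :: theArray(:,:)
contains
  ! Module procedure: its interface is known to every unit that uses arrayMod.
  subroutine arraySub()
    integer :: i, j
    ! Placeholder work (hypothetical): fill the array column by column,
    ! with the first index innermost so memory access is stride-1.
    do j = 1, size(theArray, 2)
      do i = 1, size(theArray, 1)
        theArray(i, j) = real(i + j)
      end do
    end do
  end subroutine arraySub
end module arrayMod

program test
  use arrayMod
  implicit none
  allocate(theArray(3, 3))   ! hypothetical size
  call arraySub()
  write(*,*) theArray
end program test
```

Whether this alone changes the run time depends on the real code; it mainly removes the external-procedure call and makes argument and interface errors visible at compile time.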
This kind of question usually requires more digging, at least to the extent of comparing the results of -qopt-report=4. A factor of 4 would seem to imply missed vectorization or some such thing which should show up, maybe with reasons, in that optional output.
Vectorization Advisor might facilitate identification of such performance regressions.
Steve Lionel (Intel) wrote:
This isn't your real program. Please show the actual code, before and after would help. You have left out the most important part - what is actually taking the time! There is also no object-oriented code here.
I would also recommend that "arraySub" be a procedure in the module rather than external.
Thanks Steve for such a quick reply.
The real program is more than 3000 lines. I can email it to you without any hesitation. The real code is an explicit finite-difference solver for two partial differential equations with some preconditioning steps.
By the way, I taught myself Fortran, and I have reached a point where the usual internet material is no longer improving my coding skills.
Tim P. wrote:
This kind of question usually requires more digging, at least to the extent of comparing the results of -qopt-report=4. A factor of 4 would seem to imply missed vectorization or some such thing which should show up, maybe with reasons, in that optional output.
Vectorization Advisor might facilitate identification of such performance regressions.
Dear Tim,
I compiled the code with the -qopt-report=4 option and have attached the output here. Is there a manual that explains the contents of this report file?
Thanks.
We need to compare the report entries, before and after, for the sections of code that are losing time. A convenient way to do this is by running under Intel Parallel Advisor. The beta Advisor supports back to XE2016 fully and is sufficiently useful with 2015.
In the recent compilers, the optrpt will quote compile options, so, for example, we can see the target instruction set and -align: settings.
The report shows some "not vectorized" loops associated with decisions not to interchange loops. It seems ambiguous whether "imperfect loop nest" bears on this. Imperfect means roughly that there are operations inside the outer loop but outside the inner one. So you might look at any places where you changed the code in such a way, and check the suggestions about enabling outer loop vectorization by !$omp simd.
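To make "imperfect loop nest" concrete, here is a small sketch, not taken from the attached report or the real solver. The array names and sizes are hypothetical. The statements `s = 0.0` and `rowsum(i) = s` sit inside the outer loop but outside the inner one, and `!$omp simd` asks the compiler to vectorize the outer loop, whose index is the stride-1 index of `b`. The directive is treated as an ordinary comment unless OpenMP SIMD support is enabled at compile time (with Intel compilers, -qopenmp-simd or -qopenmp, as far as I know).

```fortran
program outer_simd_demo
  implicit none
  integer, parameter :: n = 256, m = 64   ! hypothetical sizes
  real :: b(n, m), rowsum(n), s
  integer :: i, j

  call random_number(b)

  ! Imperfect nest: "s = 0.0" and "rowsum(i) = s" are inside the outer
  ! loop but outside the inner loop. The directive requests vectorization
  ! of the outer (i) loop; s and j must be private to each SIMD lane.
  !$omp simd private(s, j)
  do i = 1, n
    s = 0.0
    do j = 1, m
      s = s + b(i, j)
    end do
    rowsum(i) = s
  end do

  ! Sanity check against the intrinsic row sums.
  write(*,*) 'max deviation from sum(b, dim=2):', &
             maxval(abs(rowsum - sum(b, dim=2)))
end program outer_simd_demo
```

Comparing the optimization report for a loop like this with and without the directive is one way to see whether the compiler's "not vectorized" decision changes.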
You surely got the compiler tied up in knots at source line 1199. Also note the comments that -O3 would be better than some unspecified more aggressive optimization setting the compiler appears to have picked up. If you have exceeded some compiler internal threshold and caused it to stop optimizing, that could well account for your slowdown. Compiling thousands of source lines in a single compilation unit can easily provoke such problems (and form a test case to see whether a more up to date compiler handles it better).
