But I saw little or no speed increase when I ran my double precision floating-point arithmetic in both 32-bit and 64-bit builds. I was hoping, and had also read on the Intel website, that I would get a significant performance boost from 64-bit computing for my double precision and quad precision floating-point arithmetic. Am I missing something here, such as a proper coding technique for the 64-bit architecture, or some other information? It might also be that some of my compiler options need to be changed; I am using the default maximum speed (/O2) option right now.
Thanks all, Nittin
Thanks, Nittin
There is not much special to do, other than being careful when calling OS and library routines that expect address-sized integer arguments.
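As a rough illustration of what "address-sized" means here, the sketch below prints the widths involved. It assumes a compiler that provides the standard `iso_c_binding` module; the program and variable names are made up for the example. On a typical 64-bit build the address-sized kind is 64 bits wide while the default integer usually stays at 32 bits, and double precision is 64 bits in either build.

```fortran
program address_sized_ints
  ! Illustrative sketch only: names are hypothetical, not from the thread.
  use, intrinsic :: iso_c_binding, only: c_intptr_t, c_ptr, c_loc
  implicit none
  real(kind(1.0d0)), target :: x(4)
  integer                   :: default_int   ! default integer kind
  integer(c_intptr_t)       :: addr          ! integer wide enough to hold an address
  type(c_ptr)               :: p

  x = 0.0d0
  default_int = 0

  ! Take the address of x and reinterpret its bits as an address-sized integer.
  p    = c_loc(x)
  addr = transfer(p, addr)

  print '(a,i0)', 'bits in default integer : ', bit_size(default_int)
  print '(a,i0)', 'bits in c_intptr_t      : ', bit_size(addr)
  print '(a,i0)', 'bits in double precision: ', storage_size(x)
  print '(a,z0)', 'address of x (hex)      : ', addr
end program address_sized_ints
```

An argument declared with the default integer kind would be too narrow to carry such an address in a 64-bit build, which is the case to watch for when passing handles or addresses to OS and library routines.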
A slightly off-topic question: the 'do loop' given below takes 2 sec to compute if I don't use OpenMP directives.
But with OpenMP, as shown below, it takes 60 sec! Any idea why there is such a regression in performance? I have a Core 2 Duo processor.
i, s, m, j_val, b and track(:,:) are integers.
Perm_Mat is a big, dense, double precision matrix (25 MB),
and h, h1, oneby2 and oneby6 are double precision constants.
C(s:m,4) is double precision.
It is basically getting the 4 coefficients of a spline polynomial interpolation.
!-----------------------------------------
!$OMP PARALLEL DO PRIVATE(I) DEFAULT(SHARED)
do i=s,m
j_val = track(j_imd(i,body),4) + b
C(i,1) = Perm_Mat(j_val,1)
C(i,2) = ( Perm_Mat((j_val+1),1) - Perm_Mat(j_val,1) )*h1 - h*oneby6*( &
2.00d0*Perm_Mat((j_val),2) + Perm_Mat((j_val+1),2) )
C(i,3) = Perm_Mat(j_val,2)*oneby2
C(i,4) = ( Perm_Mat((j_val+1),2) - Perm_Mat((j_val),2) )*h1*oneby6
end do
!$OMP END PARALLEL DO
!-----------------------------------------
Thanks Steve and all
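One thing that stands out in the loop above: `j_val` is assigned on every iteration but is not listed in the `PRIVATE` clause, so under `DEFAULT(SHARED)` all threads write to the same variable. That is a data race (the results can be wrong), and the cache line holding `j_val` also gets passed back and forth between the cores. Whether this explains the whole jump from 2 sec to 60 sec is not established here. The sketch below shows the loop with `j_val` made private and checks it against a serial run. The array sizes, the constants, the fill values and the `idx` array (standing in for `track(j_imd(i,body),4) + b`) are assumptions made only so the example is runnable.

```fortran
program spline_coeffs_omp
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  ! Hypothetical sizes and constants, chosen only to make the sketch runnable.
  integer, parameter :: n = 1000, s = 1, m = 500
  real(dp), parameter :: h = 0.1_dp
  real(dp), parameter :: h1 = 1.0_dp/h
  real(dp), parameter :: oneby2 = 0.5_dp
  real(dp), parameter :: oneby6 = 1.0_dp/6.0_dp
  real(dp) :: Perm_Mat(n,2), C(s:m,4), C_ref(s:m,4)
  integer  :: idx(s:m)   ! stands in for track(j_imd(i,body),4) + b
  integer  :: i, j_val

  ! Fill the inputs with arbitrary test data.
  do i = 1, n
     Perm_Mat(i,1) = sin(0.01_dp*i)
     Perm_Mat(i,2) = cos(0.01_dp*i)
  end do
  do i = s, m
     idx(i) = 1 + mod(7*i, n-1)   ! in 1..n-1, so idx(i)+1 stays in bounds
  end do

  ! Serial reference result.
  do i = s, m
     j_val      = idx(i)
     C_ref(i,1) = Perm_Mat(j_val,1)
     C_ref(i,2) = ( Perm_Mat(j_val+1,1) - Perm_Mat(j_val,1) )*h1 - h*oneby6*( &
                  2.0_dp*Perm_Mat(j_val,2) + Perm_Mat(j_val+1,2) )
     C_ref(i,3) = Perm_Mat(j_val,2)*oneby2
     C_ref(i,4) = ( Perm_Mat(j_val+1,2) - Perm_Mat(j_val,2) )*h1*oneby6
  end do

  ! Parallel version: j_val is private, so each thread has its own copy.
  !$OMP PARALLEL DO DEFAULT(SHARED) PRIVATE(i, j_val)
  do i = s, m
     j_val  = idx(i)
     C(i,1) = Perm_Mat(j_val,1)
     C(i,2) = ( Perm_Mat(j_val+1,1) - Perm_Mat(j_val,1) )*h1 - h*oneby6*( &
              2.0_dp*Perm_Mat(j_val,2) + Perm_Mat(j_val+1,2) )
     C(i,3) = Perm_Mat(j_val,2)*oneby2
     C(i,4) = ( Perm_Mat(j_val+1,2) - Perm_Mat(j_val,2) )*h1*oneby6
  end do
  !$OMP END PARALLEL DO

  ! The two results should agree.
  if (maxval(abs(C - C_ref)) > 1.0e-12_dp) then
     error stop 'parallel and serial results differ'
  end if
  print '(a)', 'parallel and serial results agree'
end program spline_coeffs_omp
```

Built without an OpenMP option the directives are treated as comments and both loops run serially; with OpenMP enabled (for example `/Qopenmp` with Intel Fortran on Windows or `-fopenmp` with gfortran) the second loop is shared between threads. Note also that a loop this light, a few loads and multiplies per iteration, may gain little from two threads even once the race is removed.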
I guess I have to rewrite my code. I tried following the tips here on a small part of my code and, guess what, I got a performance boost of 43% for that part, which is impressive. Enough to motivate me to change my whole code.
Thanks Tim, and all, for the great help!
One other suggestion to improve performance:
I noticed the following on my Q6600 (which has SSE3 but not SSE4.1).
If I compile with "optimized for SSE4.1 and requires SSE4.1" (as opposed to "optimized for SSE3 and requires SSE3"), the compiler produces better SSE3 code (15% faster for my application). Out of 750 source files, only 1 produced code that actually used SSE4.1, in a single instruction (and it would fail to run with an illegal instruction error). For that source file only, I compile with "optimized for SSE3 and requires SSE3". The rest (except for the main program) are compiled with SSE4.1. The main program has to be compiled for what is available on your platform, as there is a sanity check in the IVF startup code.
As a note to any Intel people reading this: I recommend adding an option like /warn_sse4 that emits a warning when SSE4.1 instructions are emitted. This way other users can take advantage of the better SSE3 optimizations performed with SSE4.1 enabled.
An alternative: an option used in conjunction with "optimized for SSE3 and requires SSE3" that does the following. Compile using SSE4.1 optimizations with a warning enabled for emitted SSE4.1 instructions; if the warning fires, abort the compilation and restart using the older SSE3-only optimizations. Compilation would only take longer when SSE4.1 instructions are actually used. In my case, only 1 file out of 750 would have been compiled twice. That is a good tradeoff for the 15% performance boost for SSE3 instruction sequences.
Jim Dempsey
I don't think we can make useful generalizations from your most recent comment. I have noticed that certain optimizations, which were reserved for -QxS until recently, are becoming available with -QxP in the latest compiler updates. Until recently, -QxP, in principle, optimized for the Prescott CPU of 4 years ago, and didn't take advantage of all of the more recent developments in the compiler. None of this necessarily works any one way in 100% of examples.
To be specific, where the compiler vectorizes unaligned memory accesses, it sometimes uses full cache line unrolling, with scalar loads across the cache line boundary, so as to avoid straddling cache lines with movups. In my examples, all cases which do this under -QxT or -QxS do it under -QxP as well with the latest compilers. The extra code expansion could sometimes aggravate instruction cache or TLB misses. That problem might occur more often on certain early models in the Prescott series.
Several of my colleagues on the application side have agreed in requesting that older options should perform well on the current CPUs, while maintaining the documented instruction set compatibility with the older CPUs. That appears to fit with your desire.
By a year from now, many more CPUs will be on the market which perform better with unaligned movups, so the trend toward full cache line unrolling will have to reverse when optimizing for those CPUs. It is already possible that the same optimizations don't work consistently for Core 2 Duo and Penryn, or for desktop and laptop CPUs.
Proliferation of compiler switches is not entirely productive.