If your two codes use a lot of memory and have different memory access patterns (e.g. if the second involves a lot of temporary array copies), you might be sensitive to differences between the two systems: the Linux one might have less memory and be paging more, and it has much lower memory bandwidth than the Windows system.
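To make "different memory access patterns" concrete, here is a minimal C sketch that sums the same matrix twice. The function names are hypothetical and not taken from either poster's code. One version walks memory with unit stride. The other jumps a full row on every step, so when rows are longer than a cache line, consecutive accesses land on different cache lines. Both return the same value, but on matrices much larger than cache the strided version typically runs noticeably slower, and more so on a system with lower memory bandwidth. C stores arrays row-major while Fortran stores them column-major, so in Fortran the fast ordering is the reverse: keep the first index in the innermost loop.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an nrows x ncols matrix stored row-major in a flat array,
   visiting elements in memory order (unit stride). */
double sum_unit_stride(const double *m, size_t nrows, size_t ncols) {
    assert(m != NULL);
    double s = 0.0;
    for (size_t i = 0; i < nrows; ++i)
        for (size_t j = 0; j < ncols; ++j)
            s += m[i * ncols + j];   /* consecutive addresses */
    return s;
}

/* Same sum, column by column: each inner step jumps ncols doubles, so
   when a row is longer than a cache line, every access hits a new line. */
double sum_large_stride(const double *m, size_t nrows, size_t ncols) {
    assert(m != NULL);
    double s = 0.0;
    for (size_t j = 0; j < ncols; ++j)
        for (size_t i = 0; i < nrows; ++i)
            s += m[i * ncols + j];   /* stride of ncols elements */
    return s;
}
```

Timing both functions on a matrix of a few thousand rows and columns on each machine is one quick way to see how sensitive each system is to stride.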
You are using different compilers on the two systems. I would carefully compare the vectorization reports from the two (-vec-report2 on Linux, /Qvec-report2 on Windows), especially for B, and check whether any important loops are vectorized in one case but not the other. You might also turn on the report for higher-level loop optimizations with -opt-report-phase hlo (/Qopt-report-phase hlo) and compare. Perhaps the intrinsic or Fortran 90 array notation is not being optimized as effectively with the older compiler.
This is a big factor, though...
Does the program have an initialization phase where it is reading in data and/or performing a large number of allocations?
Does the program have a termination phase where it is writing out data and/or performing a large number of deallocations?
If so, or if you don't know, try inserting timer code where you obtain the time _after_ the initialization phase, and again _before_ the termination phase. In other words, time just the computational part of the application.
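A minimal C sketch of where the two timer calls go, assuming a POSIX system with clock_gettime. The names wall_seconds, time_compute_phase and sum_of_squares are hypothetical; sum_of_squares merely stands in for the real computational kernel. In the Fortran code itself, the SYSTEM_CLOCK intrinsic can play the role of wall_seconds.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Wall-clock seconds from the POSIX monotonic clock. */
double wall_seconds(void) {
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  /* silence unused-variable warnings when NDEBUG is set */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for the program's computational kernel. */
double sum_of_squares(const double *data, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += data[i] * data[i];
    return s;
}

typedef double (*compute_fn)(const double *data, size_t n);

/* Times only the compute phase. The caller is assumed to have finished
   initialization (input read, arrays allocated) before calling, and to
   do termination work (output, deallocation) only after it returns. */
double time_compute_phase(compute_fn compute, const double *data, size_t n,
                          double *result) {
    assert(compute != NULL && result != NULL);
    double t0 = wall_seconds();   /* start: after initialization */
    *result = compute(data, n);
    double t1 = wall_seconds();   /* stop: before termination */
    return t1 - t0;
}
```

Call time_compute_phase once the input is read and the arrays are allocated, and do the output and deallocation only after it returns, so I/O and allocator costs on either platform stay out of the A/B comparison.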
B running 10-20 times slower relative to A on the same machine (but a different platform) cannot be accounted for by the platform UNLESS one system gets better cache hit rates for B relative to A than the other system does.
E5530: 4 cores with HT; L3 1 x 8 MB, L2 4 x 256 KB
E5410: 4 cores without HT; L2 2 x 6 MB
The caches are significantly different. If you can eliminate the time to initialize and shut down the program and still see the A/B relative performance difference, then it might be worth running VTune or another profiler capable of reporting cache hit/miss data.
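As a rough first check of whether the cache difference could matter, estimate the working set and compare it with what one core can reach. Assuming a single-threaded run, on the E5530 that thread can use the whole shared 8 MB L3, while on the E5410 each 6 MB L2 is shared by a pair of cores, so the thread sees at most 6 MB. The sketch below uses hypothetical helper and macro names and an illustrative working set of three double-precision arrays of 300,000 elements. That is 7,200,000 bytes, which fits in 8 MiB but not in 6 MiB. It ignores associativity, other data and code, so it only suggests where to look; a profiler's cache hit/miss counts are the real evidence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes touched by narrays arrays of n elements of elem_size bytes each. */
size_t working_set_bytes(size_t narrays, size_t n, size_t elem_size) {
    assert(elem_size == 0 || n <= SIZE_MAX / elem_size);
    size_t per_array = n * elem_size;
    assert(per_array == 0 || narrays <= SIZE_MAX / per_array);
    return narrays * per_array;
}

/* Nonzero if a working set of ws bytes fits in a cache of cache_bytes. */
int fits_in_cache(size_t ws, size_t cache_bytes) {
    return ws <= cache_bytes;
}

/* Hypothetical constants: cache reachable by one thread (1 MB = 1024*1024). */
#define E5530_L3_BYTES ((size_t)8 * 1024 * 1024)   /* shared by all 4 cores */
#define E5410_L2_BYTES ((size_t)6 * 1024 * 1024)   /* shared by a core pair */
```

With the example sizes, fits_in_cache(working_set_bytes(3, 300000, sizeof(double)), E5530_L3_BYTES) is true while the same check against E5410_L2_BYTES is false. This is the kind of boundary case where the two machines could behave very differently.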
Jim Dempsey