Hi,
I am trying to identify the Intel 11.1 compiler flag(s) best suited to maximizing the performance of NAMD (an object-oriented molecular dynamics code, version 2.1b1) on Nehalem (Intel Xeon X7560) processors.
Some flags that I thought might serve the purpose, such as -fast and -mtune=SS4.2, fail.
I would appreciate any advice on choosing flags for the Nehalem processor with Intel 11.1.
I doubt that our colleagues who occasionally work on this code read this forum, and they are unlikely to be familiar with such an old version. I thought it was a C++ code, as your choice of buzzwords might imply. Why not start from the options suggested on their site?
http://ftp.ks.uiuc.edu/Research/namd/wiki/index.cgi?NamdAtNCSA
except that -xT (which targets Core 2 / SSSE3) should be replaced by almost any other architecture option suitable for Nehalem. -fast implies -xhost, which in turn implies -xSSE4.2 when compiling on a Nehalem. icpc probably won't recognize -mtune, but should recognize -msse4.2.
Using your MPI's affinity (process pinning) facilities will be quite important, and your MPI may not automatically recognize the topology of Nehalem-EX. The topology is particularly confusing when HyperThreading is enabled; you will likely get the best performance by assigning one rank per physical core and placing contiguous ranks on the same CPU (socket).