Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

OpenMP / memory saturation

Sean_G_
Beginner

Hi,

I am working on an application which I'm pretty sure is memory bound. I tried some simple OpenMP parallelization, but there was no speedup, which seems to confirm that the kernel is indeed memory bound.

However, if Intel's newer architectures really look like this: http://software.intel.com/sites/default/files/m/d/4/1/d/8/5-3-figure-1.gif shouldn't I be able to pin one thread to a core in the second group of four cores to get increased memory bandwidth?

It seems like pinning a thread to a core might take some work, so I wanted to see if this makes sense before I tried it.

Thanks

1 Solution
TimP
Honored Contributor III

In most cases, a dual-CPU Xeon supports higher total memory bandwidth than a single CPU (typically 40% more). As you hinted, you would need to set affinity so that the work is distributed evenly across the CPUs. For example, OMP_PROC_BIND=spread should work with OMP_NUM_THREADS=2.
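Before investing time in manual pinning, it may help to check where the runtime actually puts the threads. The following is only a sketch: `ThreadPlacement` and `report_placement` are hypothetical names made up for illustration, and the place number is only reported when the OpenMP runtime supports version 4.5 or later (otherwise it stays at -1).

```cpp
#include <cassert>
#include <cstdio>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical record type: one entry per OpenMP thread.
struct ThreadPlacement {
    int thread;  // OpenMP thread number
    int place;   // place the thread is bound to, or -1 if unbound/unknown
};

// Hypothetical helper: asks every thread of a parallel region where it runs.
std::vector<ThreadPlacement> report_placement() {
    std::vector<ThreadPlacement> placements;

    #pragma omp parallel
    {
        int thread = 0;
        int place = -1;
#ifdef _OPENMP
        thread = omp_get_thread_num();
#if _OPENMP >= 201511  // omp_get_place_num() was added in OpenMP 4.5
        place = omp_get_place_num();
#endif
#endif
        // Serialize access to the shared vector and to stdout.
        #pragma omp critical
        {
            placements.push_back({thread, place});
            std::printf("thread %d runs in place %d\n", thread, place);
        }
    }

    assert(!placements.empty());  // at least the initial thread reports
    return placements;
}
```

Build with OpenMP enabled (for example `-qopenmp` with the Intel compiler or `-fopenmp` with GCC/Clang), call `report_placement()` from `main`, and run with `OMP_NUM_THREADS=2 OMP_PROC_BIND=spread`. Adding `OMP_PLACES=sockets` is my own assumption, not part of the advice above; with it, the two threads would be expected to report different places on a dual-socket machine.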

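You may also need to arrange your OpenMP code so that the application gets local memory placement by first touch: each thread should initialize the data it will later work on, so that those pages are allocated in memory local to that thread's CPU. This frequently neglected consideration is difficult to explain concisely.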
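A minimal sketch of the first-touch idea, assuming a simple triad-style kernel (the function name `triad_first_touch` and the kernel itself are illustrative, not taken from the original application). The arrays are allocated without being written, then initialized in a parallel loop that uses the same static schedule as the compute loop, so each thread touches first the pages it later works on.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical kernel: computes a[i] = b[i] + scalar * c[i] and returns sum(a).
double triad_first_touch(std::ptrdiff_t n, double scalar) {
    assert(n > 0);
    const std::size_t count = static_cast<std::size_t>(n);

    // new double[count] leaves the elements uninitialized, so for large
    // arrays the pages are typically not touched (and not placed) yet.
    // A std::vector<double>(count) would zero-fill on the calling thread
    // and defeat the purpose.
    std::unique_ptr<double[]> a(new double[count]);
    std::unique_ptr<double[]> b(new double[count]);
    std::unique_ptr<double[]> c(new double[count]);
    double* pa = a.get();
    double* pb = b.get();
    double* pc = c.get();

    // First touch: each thread initializes the chunk it will compute on.
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        pa[i] = 0.0;
        pb[i] = 1.0;
        pc[i] = 2.0;
    }

    // Same bounds and same static schedule, so the same thread handles
    // the same range of i as in the initialization loop.
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        pa[i] = pb[i] + scalar * pc[i];
    }

    double sum = 0.0;
    #pragma omp parallel for schedule(static) reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        sum += pa[i];
    }
    return sum;
}
```

This only helps if the threads stay where they are between the two loops, which is why it goes together with the affinity settings. If the data is initialized serially (or read from a file by one thread), all pages tend to land on one CPU's memory and the second CPU's bandwidth goes unused.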

You may not get as much advantage from streaming-store optimization when running multiple threads as on a single thread.

4 Replies
KitturGanesh
Employee

Hi, the response from Tim answers your question. Also, the user manual has some detailed sections on optimization using OpenMP, such as parallelization caveats, worksharing and so on, which should help.

Sean_G_
Beginner

Thanks, Tim.  I am aware of the first-touch stuff (at least in theory), but I just wanted some confirmation that this would be a reasonable approach before I put in the time.

Kittur, thanks for your reply as well, I will have a look at the manuals.

KitturGanesh
Employee

Thanks Sean. Yes, the user manual should give you a good start. There are also several articles in the Intel Developer Zone at http://software.intel.com/; go to the content for compilers and search for relevant knowledge base articles under the Development/Tools/Resources/Content Library sections.
