Analyzers
Community support for Analyzers (Intel VTune™ Profiler, Intel Advisor, Intel Inspector)

Memory analysis with VTune

mnt
New Contributor I

Hello,

I wrote a program that runs 4 threads on an i5-6600 CPU; each thread simply accesses a large array with a fixed stride. The only parameter I change is the stride. I expect the run with the large stride to cause more memory accesses because of its poorer locality. The pictures below show the VTune output for two runs, one with a large stride and one with a small stride.
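
To make the locality expectation concrete, here is a rough sketch that counts how many distinct cache lines a strided walk touches. It assumes 64-byte cache lines (the line size on the i5-6600), an array that starts on a cache-line boundary, and no wrap-around; the function name `lines_touched` and the element size passed to it are hypothetical, not taken from the attached code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, array base aligned to a line boundary. */
#define CACHE_LINE_BYTES 64

/* Count the distinct cache lines touched by `accesses` loads that start at
 * index 0 and advance by `stride` elements of `elem_size` bytes each.
 * Each access is attributed to the line holding the element's first byte. */
size_t lines_touched(size_t accesses, size_t stride, size_t elem_size)
{
    assert(stride > 0 && elem_size > 0);
    if (accesses == 0)
        return 0;

    size_t step_bytes = stride * elem_size;

    /* A step of at least one line means every access lands on a new line. */
    if (step_bytes >= CACHE_LINE_BYTES)
        return accesses;

    /* A smaller step cannot skip a line, so every line from the first
     * to the last accessed byte is touched. */
    size_t last_byte = (accesses - 1) * step_bytes;
    return last_byte / CACHE_LINE_BYTES + 1;
}
```

With 4-byte elements (an assumed size), `lines_touched(1000, 1, 4)` gives 63 lines while `lines_touched(1000, 16, 4)` gives 1000, so the same number of accesses can pull in far more lines when the stride is large.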

[Image: mnt_0-1679055329744.png]

[Image: mnt_1-1679055346420.png]

My question is: why does the run with the longer execution time (and more LLC misses) show higher core utilization, 94% vs. 41%?

Also, the DRAM bandwidth for the longer run is lower than for the other run. I expected the reverse. Any ideas?

AlekhyaV_Intel
Moderator

Hi,


Thank you for posting in Intel Communities. Could you please provide the following details so that we can investigate your issue further?

  1. Details about the application you attached to VTune Profiler.
  2. A sample reproducer, i.e. the code you have written and all the commands used to compile and analyze it.
  3. How did you spawn the threads?


Regards,

Alekhya


mnt
New Contributor I

Hello,

I have attached the code. A sample run command is `./a.out 4000000000 4 10 4000000`. The first number is the array size, the second is the number of threads, the third is the stride, and the fourth is the number of accesses.

The code is compiled with a standard gcc command using -O3.

I am not sure what you mean by how the threads are spawned. The code uses the standard pthread library.
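
Since the attachment is not reproduced in the thread, the following is only a minimal sketch of what such a test might look like, not the attached code itself. It takes the same four parameters (array size, number of threads, stride, number of accesses) and spawns the threads with `pthread_create`. The names `run_strided`, `worker`, and `worker_arg`, the one-byte element type, the per-thread starting offsets, and the wrap-around at the end of the array are all assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Per-thread parameters and result (hypothetical layout). */
typedef struct {
    const unsigned char *data;
    size_t n, stride, accesses, start;
    unsigned long sum;
} worker_arg;

/* Read `accesses` elements, advancing by `stride` and wrapping at `n`. */
static void *worker(void *p)
{
    worker_arg *a = p;
    size_t step = a->stride % a->n;
    size_t idx = a->start % a->n;
    unsigned long sum = 0;
    for (size_t i = 0; i < a->accesses; i++) {
        sum += a->data[idx];
        idx = (idx + step) % a->n;
    }
    a->sum = sum; /* storing the result keeps the loop from being optimized away */
    return NULL;
}

/* Returns the sum over all threads, or 0 if allocation fails. */
unsigned long run_strided(size_t n, int nthreads, size_t stride, size_t accesses)
{
    assert(n > 0 && nthreads > 0 && stride > 0);
    unsigned char *data = malloc(n);
    worker_arg *args = calloc((size_t)nthreads, sizeof *args);
    pthread_t *tids = malloc((size_t)nthreads * sizeof *tids);
    if (!data || !args || !tids) {
        free(data); free(args); free(tids);
        return 0;
    }
    memset(data, 1, n); /* touch every page so reads hit real memory */

    int started = 0;
    for (int t = 0; t < nthreads; t++) {
        /* Assumption: each thread starts in its own slice of the array. */
        args[t] = (worker_arg){ data, n, stride, accesses,
                                (size_t)t * (n / (size_t)nthreads), 0 };
        if (pthread_create(&tids[t], NULL, worker, &args[t]) != 0)
            break;
        started++;
    }

    unsigned long total = 0;
    for (int t = 0; t < started; t++) {
        pthread_join(tids[t], NULL);
        total += args[t].sum;
    }
    free(data); free(args); free(tids);
    return total;
}
```

Because every element is set to 1, `run_strided(1024, 4, 10, 1000)` should return 4000, which gives a quick sanity check that all threads ran. A sketch like this would be built with something like `gcc -O3 -pthread`.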

mnt
New Contributor I

The issue has been solved. Please lock this thread.

AlekhyaV_Intel
Moderator

Hi,


Glad to know that your issue is resolved. Thanks for letting us know. If you need any further assistance, please post a new question as this thread will no longer be monitored by Intel.


Regards,

Alekhya

