Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)
5249 Discussions

Memory analysis with vtune

mnt
New Contributor I
1,522 Views

Hello,

I wrote a program that runs 4 threads on an i5-6600 CPU and simply accesses a large array with a fixed stride. The only parameter I change is the stride size. I expect the run with the larger stride to generate more memory accesses because of poorer locality. The pictures below show the output of two runs, one with a large stride and one with a small stride.
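
To make the locality expectation concrete, here is a minimal sketch (not taken from the attached code) that counts how many distinct cache lines a strided walk touches. The function name `lines_touched` and its parameters are hypothetical. It assumes a line-aligned start address, no wrap-around, and accesses that never straddle a line. The i5-6600 uses 64-byte cache lines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of distinct cache lines touched by n accesses
 * that start at a line-aligned address and advance by stride_bytes each time.
 * Assumes no wrap-around and that each access fits inside a single line. */
uint64_t lines_touched(uint64_t n, uint64_t stride_bytes, uint64_t line_bytes)
{
    assert(line_bytes > 0);
    if (n == 0)
        return 0;
    /* A stride of at least one line puts every access on a new line. */
    if (stride_bytes >= line_bytes)
        return n;
    /* A smaller stride cannot skip a line, so every line up to the one
     * holding the last access is touched. */
    return ((n - 1) * stride_bytes) / line_bytes + 1;
}
```

Under these assumptions, once the stride reaches 64 bytes every access lands on a new line, so a larger stride no longer increases the number of lines fetched per access.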

[Image: mnt_0-1679055329744.png]

[Image: mnt_1-1679055346420.png]

My question is: why does the run with the longer execution time (and more LLC misses) show higher core utilization, 94% vs. 41%?

Also, the DRAM bandwidth of the longer-running run is lower than that of the other run. I expected the opposite. Any idea why?

0 Kudos
4 Replies
AlekhyaV_Intel
Moderator
1,478 Views

Hi,


Thank you for posting in Intel Communities. Could you please answer the questions below so that we can investigate your issue further?

  1. Details about the application you attached to VTune Profiler.
  2. A sample reproducer, i.e. the code you have written and all the commands used to compile and analyze it.
  3. How did you spawn the threads?


Regards,

Alekhya


mnt
New Contributor I
1,475 Views

Hello,

I have attached the code. A sample run command is `./a.out 4000000000 4 10 4000000`. The first number is the array size, the second is the number of threads, the third is the stride, and the fourth is the number of accesses.

The code is compiled with a standard gcc command using -O3.

I don't know what you mean by how the threads are spawned. The code uses the standard pthread library.
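
For readers without the attachment, here is a minimal sketch of what a benchmark with these four parameters might look like. It is an assumption, not the attached code: the names `worker_arg`, `worker`, and `run_strided`, the byte-sized elements, the per-thread start offsets, and the wrap-around indexing are all hypothetical. Compile with something like `gcc -O3 -pthread`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-thread parameters; not taken from the attached code. */
typedef struct {
    const uint8_t *data;
    size_t len, stride, accesses, start;
    uint64_t sum;
} worker_arg;

/* Each thread reads `accesses` elements, advancing by `stride` and wrapping. */
static void *worker(void *p)
{
    worker_arg *a = p;
    size_t step = a->stride % a->len;
    size_t idx = a->start % a->len;
    uint64_t sum = 0;
    for (size_t i = 0; i < a->accesses; i++) {
        sum += a->data[idx];
        idx = (idx + step) % a->len;
    }
    a->sum = sum; /* storing the result keeps the loads from being optimized away */
    return NULL;
}

/* Runs the strided walk on nthreads threads over one shared array filled
 * with 1s and returns the sum of all values read (0 if allocation fails). */
uint64_t run_strided(size_t len, int nthreads, size_t stride, size_t accesses)
{
    assert(len > 0 && nthreads > 0);
    uint8_t *data = malloc(len);
    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    worker_arg *args = malloc((size_t)nthreads * sizeof *args);
    uint64_t total = 0;
    int created = 0;

    if (data && tid && args) {
        for (size_t i = 0; i < len; i++)
            data[i] = 1; /* also forces the pages to be physically allocated */
        for (int t = 0; t < nthreads; t++) {
            args[t] = (worker_arg){ data, len, stride, accesses,
                                    (len / (size_t)nthreads) * (size_t)t, 0 };
            if (pthread_create(&tid[t], NULL, worker, &args[t]) != 0)
                break;
            created++;
        }
        for (int t = 0; t < created; t++) {
            pthread_join(tid[t], NULL);
            total += args[t].sum;
        }
    }
    free(args);
    free(tid);
    free(data);
    return total;
}
```

With the sample command above, the corresponding call would be `run_strided(4000000000, 4, 10, 4000000)`. Because the array holds only 1s, the return value should equal threads times accesses, which gives a quick sanity check. The modulo in the inner loop adds some compute per access, so a real measurement might use a different indexing scheme.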

mnt
New Contributor I
1,436 Views

The issue has been solved. Please lock this thread.

AlekhyaV_Intel
Moderator
1,410 Views

Hi,


Glad to know that your issue is resolved. Thanks for letting us know. If you need any further assistance, please post a new question as this thread will no longer be monitored by Intel.


Regards,

Alekhya

