I am trying to speed up my program through parallelization, but I cannot get a speedup factor of more than 3 on an 8-core Core i7 CPU.
I suspect I am hitting a DRAM access bottleneck.
Concurrency Analysis provides a very good diagram of the CPU load for each thread over time.
Is it possible to show memory access metrics (L1, L2, L3 Bound, DRAM Bound) on this diagram?