I wrote a program that performs 10 independent pointer chases, and I have verified that 99% of the pointer-chase steps go to main memory.
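
For readers who want something concrete, here is a minimal sketch of what such a loop could look like. It is an illustration only, not the exact program described above: the names `node_t`, `build_chain`, `chase`, and `NUM_CHAINS` are hypothetical, and it assumes a 64-byte cache line so that each step touches a different line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NUM_CHAINS 10   /* independent chases, matching the count above */
#define LINE_BYTES 64   /* assumed cache-line size */

/* One node per (assumed) cache line; malloc does not guarantee line
   alignment, so a real benchmark may prefer an aligned allocation. */
typedef struct node {
    struct node *next;
    char pad[LINE_BYTES - sizeof(struct node *)];
} node_t;

/* Small xorshift PRNG so the shuffle does not depend on RAND_MAX. */
static uint64_t xorshift64(uint64_t *s)
{
    uint64_t x = *s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *s = x;
}

/* Link n nodes into a single random cycle (Sattolo's algorithm), so a
   chase visits every node before repeating. Returns 0 on success. */
static int build_chain(node_t *nodes, size_t n, uint64_t seed)
{
    assert(n > 0);
    size_t *idx = malloc(n * sizeof *idx);
    if (idx == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        idx[i] = i;
    uint64_t state = seed ? seed : 1;   /* xorshift state must be nonzero */
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&state) % i);   /* j in [0, i-1] */
        size_t tmp = idx[i];
        idx[i] = idx[j];
        idx[j] = tmp;
    }
    for (size_t i = 0; i < n; i++)
        nodes[i].next = &nodes[idx[i]];
    free(idx);
    return 0;
}

/* Advance NUM_CHAINS chases in lockstep. The loads of different chains do
   not depend on each other, so the CPU can overlap their cache misses.
   The return value keeps the compiler from deleting the loop. */
static uintptr_t chase(node_t *heads[NUM_CHAINS], size_t steps)
{
    node_t *p[NUM_CHAINS];
    for (int c = 0; c < NUM_CHAINS; c++)
        p[c] = heads[c];
    for (size_t s = 0; s < steps; s++)
        for (int c = 0; c < NUM_CHAINS; c++)
            p[c] = p[c]->next;
    uintptr_t sink = 0;
    for (int c = 0; c < NUM_CHAINS; c++)
        sink ^= (uintptr_t)p[c];
    return sink;
}
```

To make almost every step miss the caches, the node array would need to be much larger than the last-level cache. Each chain can start at a different node of one large cycle or in its own separately built array.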
My system's maximum memory bandwidth is 59.61 GB/s.
According to a memory bandwidth monitoring program, one thread generates about 10 GB/s of memory bandwidth.
When I increase the number of processes, the bandwidth generated by each process drops, and the sum of the per-process bandwidth values is only about 32 GB/s.
I can't get more than 32 GB/s of total memory bandwidth no matter how many threads I add.
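
To make the arithmetic behind these figures explicit, the sketch below sums per-process readings and expresses the total as a fraction of peak. The helper names `total_bandwidth` and `utilization` are hypothetical. Only the 32 GB/s total and the 59.61 GB/s peak come from the measurements above; the individual per-process readings are not listed here, so none are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the bandwidth readings (in GB/s) reported for each process. */
static double total_bandwidth(const double *per_process_gbs, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += per_process_gbs[i];
    return sum;
}

/* Fraction of the peak memory bandwidth that was actually achieved. */
static double utilization(double total_gbs, double peak_gbs)
{
    assert(peak_gbs > 0.0);
    return total_gbs / peak_gbs;
}
```

With the numbers above, `utilization(32.0, 59.61)` comes out to roughly 0.54, so the plateau sits at a little over half of the peak.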
So I think this result comes from bank conflicts, which lead to low utilization of the memory bandwidth.
Is this a reasonable explanation, or should I consider other factors?