Currently, I measure the L2 and L3 cache hit ratio and misses for a C++ code snippet using the Intel PCM library. Can the PCM library also be used to collect the additional information described below?
I would like to instrument all load and store instructions executed by a snippet of C++ code. Following is the information I would like to collect for each such instruction:
* Virtual memory address being accessed for the data
* Was the access a cache hit or not? If it was a hit, which level of cache did it hit in?
If the PCM library does not support this, is there an alternative option to collect this info?
I'm working on an Ivy Bridge machine: Intel(R) Xeon(R) CPU E5-2697 v2, 2 sockets with 12 cores per socket (Hyper-Threading is enabled, hence 24 logical cores per socket).
For optimizing memory access, I would highly recommend using a profiler like Intel VTune or Linux perf. If you instrument the code, you will disturb the measurements. For this reason, these tools rely on statistical sampling for their measurements.
If you truly want to instrument your code, you can give Intel Pin a try. It will allow you to trace all memory accesses.
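To give an idea of what such a Pin tool looks like, here is a minimal sketch modeled on the memory-trace sample that ships with the Pin kit. It logs the instruction address and the effective (virtual) data address of every executed load and store. The names `FormatRecord`, `trace_file` and `memtrace.out` are my own placeholders, and the `__has_include` guard is only there so that the formatting helper can be compiled without the Pin kit (it assumes GCC, Clang or any C++17 compiler). Note that Pin only sees the instruction stream, so this gives you addresses, not whether an access hit in a particular cache level.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats one trace record: "<instruction address>: <R|W> <data address>".
std::string FormatRecord(char kind, std::uintptr_t ip, std::uintptr_t addr) {
    char buf[64];
    std::snprintf(buf, sizeof buf, "0x%llx: %c 0x%llx",
                  static_cast<unsigned long long>(ip), kind,
                  static_cast<unsigned long long>(addr));
    return buf;
}

#if __has_include("pin.H")  // Pin glue, compiled only when the Pin kit is present
#include "pin.H"

static FILE* trace_file = nullptr;  // placeholder output handle

// Analysis routines: Pin calls these before each executed load / store.
static VOID RecordRead(VOID* ip, VOID* addr) {
    std::fprintf(trace_file, "%s\n",
                 FormatRecord('R', (std::uintptr_t)ip, (std::uintptr_t)addr).c_str());
}

static VOID RecordWrite(VOID* ip, VOID* addr) {
    std::fprintf(trace_file, "%s\n",
                 FormatRecord('W', (std::uintptr_t)ip, (std::uintptr_t)addr).c_str());
}

// Instrumentation routine: runs once per instruction when Pin first sees it
// and attaches an analysis call to every memory operand.
static VOID Instruction(INS ins, VOID*) {
    const UINT32 operands = INS_MemoryOperandCount(ins);
    for (UINT32 op = 0; op < operands; ++op) {
        if (INS_MemoryOperandIsRead(ins, op)) {
            INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordRead,
                                     IARG_INST_PTR, IARG_MEMORYOP_EA, op, IARG_END);
        }
        if (INS_MemoryOperandIsWritten(ins, op)) {
            INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordWrite,
                                     IARG_INST_PTR, IARG_MEMORYOP_EA, op, IARG_END);
        }
    }
}

// Called when the traced application exits.
static VOID Fini(INT32, VOID*) {
    std::fclose(trace_file);
}

int main(int argc, char* argv[]) {
    if (PIN_Init(argc, argv)) return 1;               // PIN_Init returns true on error
    trace_file = std::fopen("memtrace.out", "w");     // placeholder file name
    if (trace_file == nullptr) return 1;
    INS_AddInstrumentFunction(Instruction, nullptr);
    PIN_AddFiniFunction(Fini, nullptr);
    PIN_StartProgram();                               // never returns
    return 0;
}
#endif
```

Once built with the Pin kit's makefiles, a tool like this is launched as `pin -t <path to tool> -- <application>`. The sketch does no locking around the output file, so with a multithreaded application the records of different threads will be interleaved. Expect a large slowdown and a very large trace file.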
The C++ code I need to profile is part of a bigger program, and it's called from the JVM via a JNI call.
Can the Pin tool be used to instrument such an application, i.e. to instrument JNI calls?