Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Can't use PCM

korso
Beginner
Hello,
I'm trying to use Intel Performance Counter Monitor to measure the L2 and L3 cache hit ratios in custom code. I'm following the example snippet on the website, but I get the following error when I try to initialize the processor counters:
[bash]Testing Intel PCM
Num (logical) cores: 4
Num sockets: 1
Threads per core: 1
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2799999993 Hz
Reached instance creation
WARNING: Core 1 IA32_PERFEVTSEL0_ADDR are not zeroed 1114660
mpedrero@huracan:~/Dropbox/uma/padding/IPC$ [/bash]
And my code:
[cpp]#include "cpucounters.h"
#include <iostream>
#define F 10000
#define C 10000
using namespace std;

int main(){
    cout<<"Testing Intel PCM\n";
    PCM * m = PCM::getInstance();
    cout<<"Reached instance creation"<<endl;
    if (m->program() != PCM::Success) return -1;
    cout<<"Reached counter programming"<<endl;
    SystemCounterState before_sstate = getSystemCounterState();
    // Begin of custom code
    [...]
    for(int i=0;i [...] = 1.0; } }
    // End of custom code
    SystemCounterState after_sstate = getSystemCounterState();
    cout << "RESULTS:" [...]
[/cpp]
The problematic line is:
[cpp]if (m->program() != PCM::Success) return -1;[/cpp]
The rest of the executables provided in the PCM download (like pcm.x and pcm-sensor.x) work seamlessly.
Any advice? It's almost a copy of the code on this page:
Thanks in advance!
PS: I can't find any documentation for PCM. Is it available anywhere?
PS2: My system is an i7 860 with Ubuntu 10.04 x86_64.
28 Replies
Roman_D_Intel
Employee
Hi Korso,
what is the value returned by m->program()? It contains an error code, one of:
[cpp] enum ErrorCode { Success = 0, MSRAccessDenied = 1, PMUBusy = 2, UnknownError };[/cpp]
Best regards,
Roman
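A minimal sketch of reporting that return value instead of silently returning -1. The enum is copied from the reply above so the snippet compiles without the PCM headers. `describeErrorCode` and its hint texts are hypothetical and not part of PCM; the PMUBusy hint reflects only what this thread reports (code 2 appeared together with the "not zeroed" warning and went away after a reboot).

```cpp
#include <cassert>
#include <string>

// Local copy of the enum quoted above (in PCM it lives inside the PCM class).
enum ErrorCode { Success = 0, MSRAccessDenied = 1, PMUBusy = 2, UnknownError };

// Hypothetical helper, not a PCM function: map a return code to a readable hint.
std::string describeErrorCode(int code) {
    switch (code) {
        case Success:
            return "Success: counters programmed";
        case MSRAccessDenied:
            return "MSRAccessDenied: the MSRs could not be accessed (check privileges)";
        case PMUBusy:
            return "PMUBusy: the counters are in use or were left programmed by an earlier run";
        default:
            return "UnknownError: unrecognised error code";
    }
}
```

With PCM available, the call site could then print `describeErrorCode(m->program())` before deciding whether to continue.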
korso
Beginner
Hi Roman,
You gave me the key. I got error code 2, so I rebooted the computer (it's a workstation, so it's normally on 24/7) and now the same code works. Thank you.
Nevertheless, when I run the program it prints "Number of PCM instances", and the number increases every time I run the code. I suppose there should be a method or something similar to destroy the instance. In fact, I'm guessing that this could be the cause of my earlier issues. Can you help me with this?
Thank you again.
PS: Is there no documentation available for Intel PCM? It seems a pretty useful library, but without a README or docs it is difficult to use properly.
Roman_D_Intel
Employee
Hi korso,
good point. We did not have the cleanup call in the article example. Call m->cleanup(); on your program exit to destroy the PCM instance properly.
We are trying to put more of our time into documenting PCM better. Your feedback is very helpful on what we need to improve.
We have programmer documentation for PCM in browsable doxygen HTML format (the doxygen project file is included in the package) that documents PCM methods including program(...), cleanup() and others.
Thanks,
Roman
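One way to make sure cleanup() runs on every exit path, including early returns, is a small scope guard. This is only a sketch: `CleanupGuard` and `FakeMonitor` are hypothetical names, not part of PCM, and the guard assumes nothing beyond the wrapped type having a cleanup() member, as PCM does according to the reply above.

```cpp
#include <cassert>

// Calls cleanup() on the wrapped object when the guard goes out of scope.
// Monitor can be any type with a cleanup() member function.
template <class Monitor>
class CleanupGuard {
public:
    explicit CleanupGuard(Monitor* m) : monitor_(m) {}
    ~CleanupGuard() { if (monitor_) monitor_->cleanup(); }
private:
    CleanupGuard(const CleanupGuard&);             // non-copyable
    CleanupGuard& operator=(const CleanupGuard&);  // non-assignable
    Monitor* monitor_;
};

// Stand-in used only to exercise the guard without PCM installed.
struct FakeMonitor {
    int cleanups;
    FakeMonitor() : cleanups(0) {}
    void cleanup() { ++cleanups; }
};
```

With PCM, the guard would be declared as `CleanupGuard<PCM> guard(m);`. It is probably safest to create it only after program() has returned PCM::Success, so that a failed run does not touch counters it never programmed.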
korso
Beginner
Works perfectly, thank you Roman. I've generated doc files, so I'll study them next week.
Subrata_M_
Beginner
Hi Roman, Can you please share the link to programmer doxygen documentation? Subrata
Roman_D_Intel
Employee
Subrata, we do not host the doxygen documentation on a web site. You can easily generate it locally if you install the doxygen tool. After you have installed it, just run "doxygen" without parameters in the main PCM directory (it contains the "Doxyfile" project file). It will generate the HTML documentation, which you can open with your browser.
Best regards,
Roman
Bingyi_C_
Beginner

Hey Korso and Roman, I am new to PCM and I am also trying to use Intel Performance Counter Monitor to measure the L2 and L3 cache hit ratios in my custom code. I use exactly the same code, and I compile it with the following command:

g++ -O mycode.cpp -o mycode -I ./PCM/ -L ./PCM/cpucounters.o ./PCM/msr.o -lpthread

However, it keeps reporting:

undefined reference to `PCM::getInstance()'
undefined reference to `PCM::program(PCM::ProgramMode, void*)'
undefined reference to `getSystemCounterState()'

I tried passing the other .o files with -L as well, but it still doesn't work. Could you give me some hints on how to link the PCM library into my custom code? I couldn't find any detailed user manual on using PCM in custom code apart from http://software.intel.com/en-us/articles/intel-performance-counter-monitor-a-better-way-to-measure-cpu-utilization (this page). Thanks a lot!

Patrick_F_Intel1
Employee

Hello Bingyi,

I'm guessing that you are calling the undefined-reference functions with arguments that don't agree with the declaration.

The declaration for getInstance() is:

static PCM * getInstance();

Is that how you are calling it?

What happens when you compile it like:

g++ -O mycode.cpp ./PCM/cpucounters.cpp ./PCM/msr.cpp -o mycode -I ./PCM/ -lpthread

The problem might also be the ordering of the object files... if g++ uses a single pass linker... you might need to repeat cpucounters.o like:

g++ -O mycode.cpp -o mycode -I ./PCM/ -L ./PCM/cpucounters.o ./PCM/msr.o ./PCM/cpucounters.o -lpthread

but I doubt this is the issue (since msr.cpp doesn't use getInstance()).

There is also the possibility that C++ name mangling is the issue... perhaps you are enabling mangling in some cases and not in others...

But to me, this error is almost always a C++ compiling/linking issue, not a problem with PCM.

Pat

Bingyi_C_
Beginner

Hey Pat,

Thanks for the fast reply. It works!!! 

If I use the first solution you mentioned, the command line is as follows:

g++ -O mycode.cpp ./PCM/cpucounters.cpp ./PCM/msr.cpp ./PCM/pci.cpp ./PCM/client_bw.cpp -o 8-4 -I ./PCM -lpthread

It compiles, with the following warnings:

./IntelPerformanceCounterMonitorV2.5.1/cpucounters.cpp: In function ‘void print_mcfg(const char*)’:
./IntelPerformanceCounterMonitorV2.5.1/cpucounters.cpp:2606:61: warning: ignoring return value of ‘ssize_t read(int, void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
./IntelPerformanceCounterMonitorV2.5.1/cpucounters.cpp:2615:65: warning: ignoring return value of ‘ssize_t read(int, void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
./IntelPerformanceCounterMonitorV2.5.1/pci.cpp: In constructor ‘PciHandleMM::PciHandleMM(uint32, uint32, uint32, uint32)’:
./IntelPerformanceCounterMonitorV2.5.1/pci.cpp:610:65: warning: ignoring return value of ‘ssize_t read(int, void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]
./IntelPerformanceCounterMonitorV2.5.1/pci.cpp:615:69: warning: ignoring return value of ‘ssize_t read(int, void*, size_t)’, declared with attribute warn_unused_result [-Wunused-result]

Afterward, I ran the code. The result is as follows:

Instructions per clock:-1
L3 cache hit ratio:0.479371
Bytes read:461888

Patrick_F_Intel1
Employee

I wonder if you can find where PciHandleMM() is defined....

Bingyi_C_
Beginner

The output doesn't seem very reasonable:

Instructions per clock:-1
L3 Cache Misses2049425
L2 Cache Misses3376068
L3 cache hit ratio:0.392955
L2 cache hit ratio0

Bingyi_C_
Beginner

Is there anything wrong with my code?

Patrick_F_Intel1
Employee

Probably. IPC should not be -1. I would start by looking at why IPC is -1. Or, if there are error messages before that, fix those messages first.

I assume that PCM, compiled correctly, doesn't report IPC = -1. So I would look for what you are doing differently from what 'unchanged PCM' does. I'm sorry, but I don't have time to debug your code.

Pat
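One plausible reading of the -1, sketched here under an assumption this thread does not confirm: that a ratio such as IPC is computed from the difference between two counter snapshots, and that -1 is returned as a sentinel when no clock cycles were counted in between (for example because the counters were never programmed). `Snapshot` and `ipcBetween` are hypothetical names, not PCM types or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of two free-running counters (not a PCM type).
struct Snapshot {
    std::uint64_t instructions;
    std::uint64_t cycles;
};

// Instructions per clock over the interval between two snapshots.
// Returns -1 when no cycles were counted, i.e. the counters did not advance.
double ipcBetween(const Snapshot& before, const Snapshot& after) {
    const std::uint64_t cycles = after.cycles - before.cycles;
    if (cycles == 0) {
        return -1.0;  // sentinel: nothing was measured
    }
    const std::uint64_t instructions = after.instructions - before.instructions;
    return static_cast<double>(instructions) / static_cast<double>(cycles);
}
```

Under that reading, an IPC of -1 says that the measurement never started, which points at counter setup rather than at the code being measured.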

Rolf_Andersson
Beginner

Bingyi, I'm unable to download your code snippet. Can you post it again?

Bingyi_C_
Beginner

Sure, Rolf, the code is as follows:

#include "/local/homes/bingyiloc/IntelPerformanceCounterMonitorV2.5.1/cpucounters.h"

#define F 200
#define C 200

using namespace std;

int main(){
    PCM *m = PCM::getInstance();
    SystemCounterState before_sstate = getSystemCounterState();
    // Begin of custom code
    cout<<"bingyi's code is working"<<endl;
    double matrix;
    for(int i=0;i<100;i++){
        for(int j=0;j<100;j++){
            matrix = 1.0;
        }
    }
    cout<<"bingyi's code is finished!!"<<endl;
    // End of custom code
    SystemCounterState after_sstate = getSystemCounterState();
    cout << "Instructions per clock:" << getIPC(before_sstate,after_sstate)<<endl;
    cout <<"L3 Cache Misses"<< getL3CacheMisses(before_sstate,after_sstate)<<endl;
    cout << "L2 Cache Misses"<<getL2CacheMisses(before_sstate,after_sstate)<<endl;
    cout << "L3 cache hit ratio:" << getL3CacheHitRatio(before_sstate,after_sstate)<<endl;
    cout << "L2 cache hit ratio"<<getL2CacheHitRatio(before_sstate, after_sstate)<<endl;
    m->cleanup();
}

Patrick_F_Intel1
Employee

Please read Roman's reply to Korso above and follow the already given instructions.

Pat

Rolf_Andersson
Beginner

Bingyi, 

it seems you are missing a call to the program method. May I suggest that you add:

[cpp]m->program (PCM::DEFAULT_EVENTS, NULL);[/cpp]

after

[cpp]PCM* m = PCM::getInstance ();[/cpp]

I got the following result on my machine:

[plain]bingyi's code is working
bingyi's code is finished!!
Instructions per clock:0.584452
L3 Cache Misses: 6893
L2 Cache Misses: 11591
L3 cache hit ratio: 0.405314
L2 cache hit ratio: 0.409767
Cleaning up[/plain]

Hope this helps,
Rolf 

Rolf_Andersson
Beginner

I just wrote a post that got queued for review for some reason.
To add to that post: it seems that your matrix access is out of bounds (using F and C instead of i and j).

I'll repost my previous comment if it gets lost.

/Rolf
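A minimal sketch of the indexing point, assuming the version Rolf looked at declared an F by C matrix and wrote to matrix[F][C] (the listing pasted in the thread stores into a scalar instead, so this is a guess at the attachment). `makeOnes` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill a rows x cols matrix with 1.0, indexing with the loop variables.
// Valid indices run from 0 to rows-1 and 0 to cols-1, so matrix[rows][cols]
// would refer to an element outside the matrix.
std::vector<std::vector<double> > makeOnes(std::size_t rows, std::size_t cols) {
    std::vector<std::vector<double> > matrix(rows, std::vector<double>(cols, 0.0));
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            matrix[i][j] = 1.0;  // i and j, not the dimensions
        }
    }
    return matrix;
}
```

std::vector keeps the elements on the heap, which also matters for sizes like the 10000 x 10000 used earlier in the thread: a local array of that many doubles would not fit in a typical stack.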

Bingyi_C_
Beginner

Hey Rolf, 

Thanks so much for your reply!! I appreciate it very much!!

I added the code

m->program (PCM::DEFAULT_EVENTS, NULL);

However the output turns out to be:

Num logical cores: 24
Num sockets: 2
Threads per core: 2
Core PMU (perfmon) version: 3
Number of core PMU generic (programmable) counters: 4
Width of generic (programmable) counters: 48 bits
Number of core PMU fixed counters: 3
Width of fixed counters: 48 bits
Nominal core frequency: 2200000000 Hz
Package thermal spec power: 95 Watt; Package minimum power: 46 Watt; Package maximum power: 145 Watt;


WARNING: Core 0 IA32_PERFEVTSEL0_ADDR are not zeroed 5439548
bingyi's code is working
bingyi's code finished!

Instructions per clock:-1

L3 Cache Misses4301948
L2 Cache Misses4301948
L3 cache hit ratio:0
L2 cache hit ratio0
Cleaning up

It doesn't seem right to me. Is that because of the warning? Should I set IA32_PERFEVTSEL0_ADDR to zero?

Patrick_F_Intel1
Employee

So... Roman told Korso to check the error code. Are you checking the error code?
