Intel PCM: cache miss rate per process

I am using an Intel Xeon X5675 processor and Intel PCM. Is there a way to find the cache data access pattern of each process? At the least, can I find the last-level cache miss rate per process?

17 Replies

I suppose that the parent process (conhost.exe), which runs the PCM user-mode module, will be the one that is measured. Another option is that the PCM driver, which runs in an arbitrary context, can measure the performance of the thread currently executing on a logical processor. I do not know if there is an option to profile the cache data accesses of a specific process.

Thank you, iliyapolak.

"Another option is that PCM driver which runs in arbitrary context can measure performance of currently executed thread on logical processor"

I need the access pattern of threads. Can you tell me how this works?

Hello Narran,

PCM doesn't keep track of threads or processes so it can't report stats by process or thread. PCM just reads the counters periodically.

You need something like Intel VTune Amplifier or, on Linux, 'perf' (or VTune). These utilities sample the instruction pointer, so they can attribute events to threads and processes.
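As a rough illustration of the per-process route with 'perf', one could attach to a running process with something like `perf stat -x, -e LLC-loads,LLC-load-misses -p <PID> -- sleep 10` and compute the miss rate from the CSV output. The sketch below only does the parsing step. It assumes the CSV layout "value, unit, event name, ..." (which can differ between perf versions and options), assumes both events are available on the CPU in question, and uses made-up sample numbers.

```python
def parse_perf_csv(text):
    """Return {event_name: count} from `perf stat -x,` style output.

    Assumed field order: counter value, unit, event name, then extras.
    """
    counts = {}
    for line in text.splitlines():
        fields = line.strip().split(",")
        # Skip blank lines and rows such as "<not counted>" or "<not supported>".
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        counts[fields[2]] = int(fields[0])
    return counts


def llc_miss_rate(counts):
    """Last-level cache load miss rate, or None if no loads were counted."""
    loads = counts.get("LLC-loads", 0)
    misses = counts.get("LLC-load-misses", 0)
    return misses / loads if loads else None


# Hypothetical sample output, not taken from a real run.
SAMPLE = """\
1000000,,LLC-loads,2000000000,100.00,,
250000,,LLC-load-misses,2000000000,100.00,,
"""

if __name__ == "__main__":
    rate = llc_miss_rate(parse_perf_csv(SAMPLE))
    print(f"LLC load miss rate: {rate:.2%}")
```

Running the same measurement once per PID gives a per-process figure, which is what PCM alone cannot provide.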

Pat

Thank you, Pat.

Thank you, iliyapolak. "Another option is that PCM driver which runs in arbitrary context can measure performance of currently executed thread on logical processor". Can you tell me how I can do this?

Hi narran,

As Pat said, PCM does not track the instruction pointer of the currently executing thread's functions and does not map it to loaded executables.

>>>thank you iliyapolak. "Another option is that PCM driver which runs in arbitrary context can measure performance of currently executed thread on logical processor". can you tell me how i can do this>>>

This is simply how the system, and drivers in particular, work. That is, I suppose that msr.sys, when executed, reads and writes the MSRs and communicates with the console application, which in turn runs inside the console host process.

If you are sure that your app is the only thing running, then you can say that what PCM reports is due to your app. But in practice, especially on servers, many other things are running. Even on my poor little laptop, there are many things running. Apparently our IT dept thinks an idle CPU is a terrible thing to waste and they busy up the system constantly. Okay... so maybe the last bit wasn't too relevant to your question...

Even on a home desktop the processor is kept quite busy, for example by AV software or even spyware consuming CPU cycles.

Sorry for going off topic, but I could not resist. Why are my posts still queued for admin approval?

I don't know why the posts are queued. The forums are sometimes attacked by spammers (resulting in lots of unwanted email), so the postings might be getting queued and checked. I'll see if this is the case.

Patrick and iliyapolak, I tried perf. It gives good information about threads, but I need the data accessed by threads from within a program, not from a command. I am working on finding threads that share data, i.e. that access the same data (from a local or remote cache). I need to get the addresses of the data accessed by every running thread, from a program. Is there a tool or software that can help me?

Hi Pat,

I think that developing a more heuristic anti-spam filter, one that takes into account so-called "previous user behaviour", could reduce how often the anti-spam filter is triggered.

Just my thoughts:)

Hi narran,

I do not know if you can track the data being accessed by particular threads, but you can profile your app with Xperf, which will give a nice graphical breakdown of thread activity and the threads' call stacks.

As for the data accessed by a thread, you will need a kernel- and user-mode debugger like WinDbg to inspect it and find an access pattern. You can, for example, write the source code, compile it with debug information, and trace the execution inside a user-mode debugger.

Hello Narran,

You need a tool (like Intel VTune) which samples the instruction pointer and can measure events related to false sharing.

Please see the article http://software.intel.com/en-us/articles/avoiding-and-identifying-false-sharing-among-threads

Pat

Patrick Fay, iliyapolak: I am working on a scheduling project that schedules threads on multi-core processors based on the amount of data shared between the threads. I need to know whether two threads share some data, say from the same cache line. How about using Intel Pin or Valgrind to get that data? Can you suggest some other lightweight tool?

If your code is actually scheduling the threads, then (assuming you've programmed the appropriate event) you could take the difference in the event count between the point where a thread is switched in and the point where it is switched out. This would tell you that the thread about to be switched out had X shared cache line hits. But it wouldn't tell you with which other thread you were sharing cache lines... just that you are sharing cache lines with some other thread.
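A minimal sketch of that bookkeeping, assuming the scheduler can call a hook on every switch-in and switch-out. The `read_counter` callable is hypothetical: it stands in for whatever code reads the raw value of the programmed event on the current core, and counter wraparound is ignored here.

```python
class SwitchDeltaAccounting:
    """Attribute a per-core event counter to threads via context-switch deltas."""

    def __init__(self, read_counter):
        # read_counter: hypothetical callable returning the raw event count.
        self.read_counter = read_counter
        self.totals = {}      # thread id -> accumulated event count
        self.current = None   # thread currently running on this core
        self.start = 0        # counter value when it was switched in

    def switch_in(self, tid):
        self.current = tid
        self.start = self.read_counter()

    def switch_out(self):
        # Events that occurred while the outgoing thread was running.
        delta = self.read_counter() - self.start
        self.totals[self.current] = self.totals.get(self.current, 0) + delta
        self.current = None
        return delta


if __name__ == "__main__":
    # Fake counter readings standing in for real hardware reads.
    readings = iter([100, 140, 140, 200, 200, 230])
    acct = SwitchDeltaAccounting(lambda: next(readings))
    for tid in (1, 2, 1):
        acct.switch_in(tid)
        acct.switch_out()
    print(acct.totals)
```

As noted above, these totals say how many shared-line hits each thread saw, not which thread it shared with.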

The precise event-based sampling (PEBS) events record the address of the data that caused the event. So if there is a PEBS cache line hit event on whatever processor you have, and if you can find a utility to report the PEBS registers (programming PEBS events is a non-trivial task), then you could look at the HITM/HITS (hit modified/hit shared) addresses of one thread and compare them with the same PEBS info from other threads to see which threads are sharing how much data. But this would be a lot of computation, probably too much to put into a real-time scheduler. This sort of analysis is done mostly by benchmarkers trying to minimize shared cache line hits or at least reduce remote NUMA node accesses.
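The comparison step could look roughly like the sketch below, assuming the sampled data addresses have already been collected per thread by some other tool. The input dictionary, the function names, and the 64-byte line size are assumptions for illustration; nothing here programs or reads PEBS itself.

```python
from itertools import combinations

CACHE_LINE_BYTES = 64  # assumed line size; check the target processor


def lines_touched(addresses):
    """Map sampled data addresses to the set of cache line indices they fall in."""
    return {addr // CACHE_LINE_BYTES for addr in addresses}


def shared_line_counts(samples_by_thread):
    """Return {(tid_a, tid_b): number of cache lines sampled in both threads}."""
    lines = {tid: lines_touched(addrs) for tid, addrs in samples_by_thread.items()}
    shared = {}
    for a, b in combinations(sorted(lines), 2):
        common = len(lines[a] & lines[b])
        if common:
            shared[(a, b)] = common
    return shared


if __name__ == "__main__":
    # Made-up sample addresses keyed by thread id.
    samples = {
        1: [0x1000, 0x1008, 0x2000],
        2: [0x1030, 0x3000],
        3: [0x4000],
    }
    print(shared_line_counts(samples))
```

The pairwise comparison grows quadratically with the number of threads, which is one reason this fits offline analysis better than a scheduler's hot path.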

Pat
