Analyzers
Talk to fellow users of Intel Analyzer tools (Intel VTune™ Profiler, Intel Advisor)

Profiling a single-threaded, single-process application running on a single CPU with General Exploration

Prasanth_G_
Beginner

I am trying to profile a process on Linux that runs on a single CPU of a Broadwell system (model name: Intel(R) Xeon(R) CPU D-1540 @ 2.00GHz). With the default VTune configuration I get a CPI rate of 2.828; with more samples (a smaller sampling interval) it rises to 3.148. I understand that one or two delay functions in the Bottom-up view are driving up the CPI rate, but I do not understand why "vmlinux" shows up there with a CPI rate of 1.588. According to the system configuration, the CPU I am running on is dedicated to this process, and any Linux kernel activity should happen on a different CPU. Does the CPI of 1.588 for vmlinux mean that this is not the case? Any help is greatly appreciated, as are any other suggestions or comments on the results and the VTune configuration pasted below.
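One way to check the "dedicated CPU" assumption from inside Linux is to compare the process's CPU affinity with the kernel's list of isolated CPUs. The sketch below is an illustration, not something from the original post: it assumes Linux, Python 3, and that isolation was configured with the `isolcpus=` boot parameter, which is what `/sys/devices/system/cpu/isolated` reports. `parse_cpulist` and `describe_cpu_isolation` are hypothetical helper names. Note that isolation only keeps the scheduler from placing other tasks on that CPU; the kernel still runs there for the pinned process's own system calls and page faults, and, unless IRQ affinity is configured separately, for interrupts.

```python
import os
from typing import Optional, Set


def parse_cpulist(text: str) -> Set[int]:
    """Parse a Linux cpulist string such as "2-4,11" into {2, 3, 4, 11}."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means no CPUs are isolated
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def describe_cpu_isolation(pid: int = 0) -> dict:
    """Return the CPUs `pid` may run on (0 = the calling process) and the
    CPUs the kernel reports as isolated (None if the file is unreadable)."""
    allowed = os.sched_getaffinity(pid)
    isolated: Optional[Set[int]]
    try:
        with open("/sys/devices/system/cpu/isolated") as f:
            isolated = parse_cpulist(f.read())
    except OSError:
        isolated = None  # non-Linux system or restricted sysfs
    return {"allowed": sorted(allowed), "isolated": isolated}
```

For example, calling `describe_cpu_isolation(12345)`, with 12345 standing in for the profiled process's PID, should report a single allowed CPU that also appears in the isolated set if the system is configured as described.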

VTune configuration for General Exploration:

1. Attach to a process over an SSH session.

2. Automatically stop after 60 seconds.

3. Analyze child processes.

4. Duration estimate: under 1 minute.

5. Collection data: 0

6. Slow frames: 40, Fast frames: 100 (default values)

7. CPU mask: 11 


RESULTS:

With defaults:

Elapsed Time:    60.059s
    Clockticks:    151,600,000
    Instructions Retired:    53,600,000
    CPI Rate:    2.828
    MUX Reliability:    0.948
    Front-End Bound:    0.191
        Front-End Latency:    0.106
            ICache Misses:    0.026
            ITLB Overhead:    0.009
            Branch Resteers:    0.047
            DSB Switches:    0.000
            Length Changing Prefixes:    0.000
            MS Switches:    0.106
        Front-End Bandwidth:    0.086
            Front-End Bandwidth DSB:    0.026
            Front-End Bandwidth MITE:    0.237
            Front-End Bandwidth LSD:    0.000
    Bad Speculation:    0.046
    Back-End Bound:    0.584
        Memory Bound:    0.217
            L1 Bound:    0.237
            L2 Bound:    0.000
            L3 Bound:    0.000
            DRAM Bound:    0.211
            Store Bound:    0.000
        Core Bound:    0.367
            Divider:    0.000
            Port Utilization:    0.923
                Cycles of 0 Ports Utilized:    0.633
                Cycles of 1 Port Utilized:    0.290
                Cycles of 2 Ports Utilized:    0.053
                Cycles of 3+ Ports Utilized:    0.079
    Retiring:    0.178
        General Retirement:    0.113
        Microcode Sequencer:    0.065
        Assists:    0.000
    Total Thread Count:    5
    Paused Time:    0s


With a smaller sampling interval, set via event-config=CPU_CLK_UNHALTED.THREAD:sa=200000,INST_RETIRED.ANY:sa=200000 as suggested in another recent post:

Elapsed Time:    60.001s
    Clockticks:    1,454,600,000
    Instructions Retired:    462,000,000
    CPI Rate:    3.148
    MUX Reliability:    0.984
    Front-End Bound:    0.067
        Front-End Latency:    0.063
            ICache Misses:    0.019
            ITLB Overhead:    0.003
            Branch Resteers:    0.025
            DSB Switches:    0.000
            Length Changing Prefixes:    0.000
            MS Switches:    0.121
        Front-End Bandwidth:    0.004
            Front-End Bandwidth DSB:    0.000
            Front-End Bandwidth MITE:    0.179
            Front-End Bandwidth LSD:    0.000
    Bad Speculation:    0.009
    Back-End Bound:    0.769
        Memory Bound:    0.365
            L1 Bound:    0.294
            L2 Bound:    0.000
            L3 Bound:    0.234
            DRAM Bound:    0.000
            Store Bound:    0.000
        Core Bound:    0.404
            Divider:    0.000
            Port Utilization:    0.660
                Cycles of 0 Ports Utilized:    0.415
                Cycles of 1 Port Utilized:    0.242
                Cycles of 2 Ports Utilized:    0.110
                Cycles of 3+ Ports Utilized:    0.049
    Retiring:    0.155
        General Retirement:    0.056
        Microcode Sequencer:    0.099
        Assists:    0.000
    Total Thread Count:    5
    Paused Time:    0s
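
The CPI values above follow directly from the raw counts (CPI Rate = Clockticks / Instructions Retired). Because a sampled event count is roughly the number of samples times the sample-after value (sa), the second run's totals also show how many samples each number rests on. The Python sketch below assumes that samples-times-sa relationship and treats the samples as Poisson-distributed, which gives only a rough error estimate. The sample-after value of the default run is not stated in the post, so only the sa=200000 run is analyzed for error. A single row such as vmlinux, with far fewer samples than the whole process, would carry a correspondingly larger error.

```python
import math


def cpi(clockticks: float, instructions: float) -> float:
    """Cycles per instruction: unhalted core cycles / retired instructions."""
    return clockticks / instructions


def sample_stats(clockticks: float, instructions: float, sample_after: int):
    """Approximate sample counts and the relative error of CPI, assuming
    count = samples * sample_after and independent Poisson sample counts."""
    n_clk = clockticks / sample_after
    n_inst = instructions / sample_after
    rel_err = math.sqrt(1 / n_clk + 1 / n_inst)  # error propagation for a ratio
    return n_clk, n_inst, rel_err


default_cpi = cpi(151_600_000, 53_600_000)
fine_cpi = cpi(1_454_600_000, 462_000_000)
n_clk, n_inst, rel_err = sample_stats(1_454_600_000, 462_000_000, 200_000)

print(f"default CPI  : {default_cpi:.3f}")  # 2.828
print(f"sa=200000 CPI: {fine_cpi:.3f}")     # 3.148
print(f"samples: {n_clk:.0f} clockticks, {n_inst:.0f} instructions")  # 7273, 2310
print(f"approx. relative CPI error: {rel_err:.1%}")  # 2.4%
```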

Dmitry_R_Intel1
Employee

Hello Prasanth,

When a process makes, say, a system call, the kernel code runs in the context of that process, on the same CPU. This is most likely what is happening in your case.
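This effect can be observed without VTune. The Linux-only sketch below uses a hypothetical `syscall_demo` helper (an illustration, not part of VTune): it pins the calling thread to one CPU, issues many `read()` system calls on `/dev/zero`, and reports the kernel-mode CPU time charged to the process together with the CPU it ran on, taken from field 39 of `/proc/thread-self/stat`. Because the kernel zero-fills the buffer inside each `read()` on the caller's CPU, that work shows up as system time of the pinned process itself, the same kind of work a profiler attributes to vmlinux.

```python
import os


def _current_cpu() -> int:
    """CPU the calling thread last ran on (field 39 of /proc/thread-self/stat)."""
    with open("/proc/thread-self/stat") as f:
        stat = f.read()
    # Field 2 (comm) may contain spaces, so split after its closing ')'.
    fields = stat[stat.rindex(")") + 2:].split()
    return int(fields[39 - 3])  # fields[0] is field 3 (state)


def syscall_demo(iterations: int = 2000, chunk: int = 1 << 20) -> dict:
    """Pin to one CPU, issue `iterations` read() calls of `chunk` bytes on
    /dev/zero, and report where the kernel work was accounted."""
    original = os.sched_getaffinity(0)
    cpu = min(original)
    os.sched_setaffinity(0, {cpu})  # pin the calling thread to a single CPU
    try:
        buf = bytearray(chunk)
        fd = os.open("/dev/zero", os.O_RDONLY)
        before = os.times()
        try:
            for _ in range(iterations):
                os.readv(fd, [buf])  # the kernel fills buf in our context
        finally:
            os.close(fd)
        after = os.times()
        return {
            "pinned_cpu": cpu,
            "ran_on_cpu": _current_cpu(),
            "system_seconds": after.system - before.system,
            "user_seconds": after.user - before.user,
        }
    finally:
        os.sched_setaffinity(0, original)  # restore the original affinity
```

On a typical Linux system, `syscall_demo()` reports `ran_on_cpu` equal to `pinned_cpu` and a `system_seconds` value that is a noticeable share of the total, even though no other process was involved.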
