Novice
190 Views

High CPU usage for the first 1 or 2 hours on RHEL 7.6/7.7/7.8

Hello All,

We have a multicore, multithreaded, media-intensive application developed with IPP and built with the Intel compiler with AVX optimization enabled. We also have a JDK (1.8.0_181-b13) installed, and Java code that handles the signaling and control operations of the media application, which itself runs in C/C++. This is done through JNI, running on other cores.

On RHEL 6 our application runs with constant CPU usage under a fixed test load. On RHEL 7, however, the CPU usage is high for the first hour or so (sometimes it takes 2 hours). After that, without any change on our side, the CPU usage drops by 5-10%. Initially we thought this was caused by tuned, but the behavior persists even after uninstalling tuned. We are using the default RHEL kernel (3.10.0-957.el7.x86_64).

We are using Intel Xeon CPUs [Intel(R) Xeon(R) Gold 6132 CPU @ 2.60GHz] in a 2-socket, multi-CPU server [ProLiant DL380 Gen10]. Our application runs on a bare-metal server, not in a VM or under KVM.

The OS is vanilla RHEL 7.8.

We are using numactl, and the product is pinned to 3 cores on NUMA node 1. The cpupower.service is not running on our server. The Intel P-state and C-state drivers are not installed. Turbo Boost is disabled. nohz_full and CPU isolation are disabled in our setup. The irqbalance service is running in default mode. The /proc/sys/vm/ settings were restored to the default OS values after tuned was uninstalled. The Spectre and Meltdown patches are applied. Any clue or help will be highly appreciated.
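To pin down exactly when the drop happens, it may help to log per-core utilization of the pinned cores over the first few hours. The following is only a minimal sketch, assuming the standard Linux /proc/stat layout (user, nice, system, idle, iowait, irq, softirq, steal); the function names and the core IDs in the usage note are hypothetical and not part of our product.

```python
import time


def parse_cpu_times(text):
    """Parse /proc/stat text into {cpu_id: (busy_jiffies, total_jiffies)}."""
    times = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-core lines such as "cpu7 ..."; skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        cpu_id = int(fields[0][3:])
        # First eight counters: user nice system idle iowait irq softirq steal.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + values[4]  # idle + iowait count as "not busy"
        total = sum(values)
        times[cpu_id] = (total - idle, total)
    return times


def read_cpu_times(path="/proc/stat"):
    """Read and parse the current counters from /proc/stat."""
    with open(path) as f:
        return parse_cpu_times(f.read())


def utilization(before, after, cpu_id):
    """Percent of non-idle time on one core between two snapshots."""
    busy = after[cpu_id][0] - before[cpu_id][0]
    total = after[cpu_id][1] - before[cpu_id][1]
    return 100.0 * busy / total if total else 0.0


def log_utilization(cpus, interval_s=60, samples=180):
    """Print one timestamped utilization line per interval for the given cores."""
    prev = read_cpu_times()
    for _ in range(samples):
        time.sleep(interval_s)
        cur = read_cpu_times()
        stamp = time.strftime("%H:%M:%S")
        print(stamp, " ".join(f"cpu{c}={utilization(prev, cur, c):.1f}%" for c in cpus))
        prev = cur
```

For example, `log_utilization([24, 25, 26])` would print one line per minute for three hours; the core IDs here are placeholders for whichever cores the process is actually pinned to.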

Thanks 

Soumit

 

11 Replies
Black Belt
165 Views

Can you run Intel VTune hotspots analysis?

Run at least 10 such analyses in order to observe the variation in the results.
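One way to quantify that variation is to take the same metric from each run (for example, the CPU time of the top hotspot) and compute its spread. This is a minimal sketch; `summarize_runs` is a hypothetical helper, and the values must be copied by hand from the VTune summaries.

```python
import statistics


def summarize_runs(cpu_times_s):
    """Summarize one metric collected from repeated runs of the same analysis."""
    mean = statistics.mean(cpu_times_s)
    stdev = statistics.stdev(cpu_times_s)  # sample standard deviation, needs >= 2 runs
    return {
        "runs": len(cpu_times_s),
        "mean": mean,
        "stdev": stdev,
        # Coefficient of variation: spread relative to the mean, in percent.
        "cv_percent": 100.0 * stdev / mean,
    }
```

If the coefficient of variation across runs is of the same order as the 5-10% drop being investigated, a single pair of "high" and "low" captures is not enough to draw conclusions.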

Black Belt
155 Views

I forgot to add that you can restrict event counting to either kernel-mode code or user-mode code and look at the resulting hotspots.

I do not recommend sampling counter overflows in both user mode and kernel mode at once. In my own observation, the kernel-mode activity of the perf-based (driverless) collection is very large and will skew the results.

Novice
139 Views

Hello, 

Thanks for replying to this thread and for suggesting VTune. We are trying it, but we are facing an issue when we try to capture hotspot data with a target PID. We are trying to resolve that.

We are new to VTune and are looking into the documentation. If you can suggest how to enable or disable event counting for kernel and user space, it will be of great help.

Regards,

Soumit

 

Black Belt
132 Views

Hello,

You may create a *custom analysis and open the EVENT drop-down menu, then choose either USER or OS.

[Screenshot: Bernard_0-1602155685670.png]

Run the same analysis at least 10 times and observe the results; there may be variation in some of the performance metrics reported by VTune.

>>>We are trying but we are facing some issue when we try to capture hotspot data with target PID.>>>

If possible, try to launch your executable under VTune control instead of attaching to a running PID.

*Of course, this is relevant to hotspot analysis as well.
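Outside of VTune, the same user/kernel split can be approximated with the Linux perf tool, whose event modifiers `:u` and `:k` restrict counting to user mode or kernel mode. The sketch below only builds the `perf stat` command line; note that this swaps in perf rather than VTune, and `perf_stat_command` and the PID are hypothetical.

```python
def perf_stat_command(pid, mode, seconds=10, events=("cycles", "instructions")):
    """Build a `perf stat` command that counts events in one privilege mode only.

    mode is "user" (perf modifier :u) or "kernel" (perf modifier :k).
    """
    suffix = {"user": "u", "kernel": "k"}[mode]
    event_list = ",".join(f"{event}:{suffix}" for event in events)
    # Attaching with -p and giving "sleep N" as the command makes perf
    # count on the target process for N seconds and then print the totals.
    return ["perf", "stat", "-e", event_list, "-p", str(pid), "--", "sleep", str(seconds)]
```

The returned list can be passed to `subprocess.run(...)`, once with `"user"` and once with `"kernel"`, during both the high-usage and the low-usage phase, so that the two modes are measured separately rather than together.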

Black Belt
127 Views

I forgot to add: if you are allowed to, you may of course share the results with us.

Novice
99 Views

Hello,

We are able to get the Performance Snapshot. I will need to check with legal before sharing the capture with the community.

But when we try to run the hotspot analysis, we end up with an error due to PIN_MAX_THREADS. Can you please let us know whether this can be bypassed or configured? We could not find any details on this.

I have pasted the error.

Black Belt
90 Views

>>>But when we try to capture the hotspot analysis, we end up having the error due to PIN_MAX_THREADS. Can you please let us know if this can be bypassed?>>>

You should ask for help on the Intel VTune forum.

I have never (as a VTune user) encountered the problem described in your response.

 

Beginner
55 Views

Hi,

 

We have been able to capture hardware-based hotspots, and we are trying to analyze the samples.

We found that the same section of code uses a different amount of CPU time once the process has been executing for 1 hour or a little more. Until then, the CPU usage is higher.

We need a little help from you regarding the possible cause of this. I am attaching screenshots of the assembly analysis from when the CPU usage is higher and from when it is lower. They include the CPU time and the instructions-retired numbers.

Can you please help us understand the possible reason for this?

 

Thanks and Best Regards.

Black Belt
46 Views

Looking at the assembly, I presume that the attached part of the code is executed by some loop at a higher level.

There are two argument loads into %r14 and %r13 (possibly pointers?), and later there is a machine code sequence that looks like a possible "pointer chase". I suppose that the high number of retired instructions and the CPU time spent may be related to ineffective caching of the pointee data. The code at addresses 0x29795, 0x29799 and 0x2979d is probably executed serially and represents some kind of data structure manipulation. The code at 0x297b8 depends on the result at 0x2979d (hence it was marked by VTune).

It is interesting to consider how much skid has skewed the results. VTune's preconfigured hotspot analysis relies on the INSTR_RETIRED.PREC_DIST event, and for a large number of loop iterations or other long-running hot code the samples converge well, so the results are more precise.

It is hard to know exactly what has happened without seeing the whole picture, presumably at source level, and without additional samples of other performance events (cache-hierarchy related).
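As a first rough check, the CPU time and instructions-retired numbers from the two captures can be turned into cycles per instruction (CPI) to separate "more instructions executed" from "the same instructions running slower". This is only a sketch: it assumes the cores run at the nominal 2.6 GHz (Turbo Boost is reported as disabled, but the actual clock may still differ), and `cycles_per_instruction` and `compare_phases` are hypothetical helpers.

```python
def cycles_per_instruction(cpu_time_s, instructions_retired, core_hz=2.6e9):
    """Estimate CPI from CPU time, assuming a fixed core clock (an assumption)."""
    return cpu_time_s * core_hz / instructions_retired


def compare_phases(early, late, core_hz=2.6e9):
    """Compare two captures given as (cpu_time_s, instructions_retired) tuples."""
    cpi_early = cycles_per_instruction(early[0], early[1], core_hz)
    cpi_late = cycles_per_instruction(late[0], late[1], core_hz)
    return {
        "cpi_early": cpi_early,
        "cpi_late": cpi_late,
        # > 1 means more instructions were retired in the early (high-CPU) phase.
        "instr_ratio": early[1] / late[1],
        # > 1 means each instruction cost more cycles in the early phase.
        "cpi_ratio": cpi_early / cpi_late,
    }
```

If `instr_ratio` is close to 1 while `cpi_ratio` is clearly above 1, the same work is simply taking more cycles in the early phase, which would be consistent with memory or cache stalls. If `instr_ratio` itself is above 1, more instructions are being executed early on, which points toward a different code path or extra work rather than slower execution.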

 

Black Belt
30 Views

@ArnabGanguly 

Can you show the source code implementation?

Beginner
14 Views

Hi Bernard,

 

I am checking whether I can do that, and I will get back to you on it.

In the meantime, if you have any other suggestions, please let me know.

 

Thanks & Best Regards
