What does __kmp_wait_yield_4 imply in vTune hot spot analysis?

YW
Beginner

Hi,

I am rewriting part of the LibSVM code for vectorization in order to accelerate it on Xeon Phi. Unfortunately, the performance of my rewritten code is the same as, if not slightly worse than, the original on Xeon Phi. To find the reason, I profiled both programs using vTune hot spot analysis. According to the results, the target code segment takes significantly less time in the rewritten version. However, most of the rewritten version's time goes to __kmp_wait_yield_4, which accounts for only a tiny amount of time in the original version.

Does anyone know what it means? I tried to Google it but got very little information.

BTW, my vectorized code gets about a 20% improvement overall when run on the host.

Thanks in advance!

22 Replies
jimdempseyatthecove
Honored Contributor III

>>why do you think y[] is better to be double?

If y[] is double and you use G*y, you avoid converting char to int and then int to double.

If alpha_status is made double too, with UPPER_BOUND == -1.0 and LOWER_BOUND == 1.0, then the test becomes (y != alpha_status).

(Verify the signs of the ..._BOUND values.)
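As a rough illustration of the comparison trick above, here is a minimal C++ sketch. The encoding (FREE == 0.0 and the two bound constants as doubles), the function names, and the branchy reference test are all assumptions made for illustration, not the actual LibSVM code. Which sign goes with which bound depends on the loop being rewritten, so it still needs to be verified as noted above.

```cpp
#include <cassert>

// Hypothetical status encoding, using the signs suggested above.
constexpr double LOWER_BOUND = 1.0;
constexpr double UPPER_BOUND = -1.0;
constexpr double FREE = 0.0;

// Assumed branchy reference test: for y == +1 the element qualifies when it
// is not at the lower bound; for y == -1 when it is not at the upper bound.
bool eligible_branchy(signed char y, double status) {
    if (y == +1) {
        return status != LOWER_BOUND;
    }
    return status != UPPER_BOUND;
}

// Branch-free form: with y stored as +1.0 / -1.0, one compare is enough.
bool eligible_compare(double y, double status) {
    return y != status;
}

// Counts disagreements between the two forms over every y/status pair.
int count_mismatches() {
    const double statuses[] = {LOWER_BOUND, UPPER_BOUND, FREE};
    const signed char labels[] = {+1, -1};
    int mismatches = 0;
    for (signed char y : labels) {
        for (double s : statuses) {
            if (eligible_branchy(y, s) != eligible_compare(static_cast<double>(y), s)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

Under this encoding the two forms agree for every combination, and the single compare on doubles is friendlier to the vectorizer than a branch on a char.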

Although C++ should promote the literal at compile time, please get into the habit of writing literals in the same type as the variable. Other languages might not be so kind when you port the code. For example, Fortran keeps each literal in the type written in the source code and, by convention, performs the conversion at run time. Therefore:

someDouble = 0.1

is different from

someDouble = 0.1D+0
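The same effect can be reproduced in C++ by writing the literal with an explicit single-precision suffix. The following is only a sketch; the function name literal_gap is made up for illustration.

```cpp
#include <cassert>
#include <cmath>

// Returns the absolute difference between 0.1 written as a float literal
// and 0.1 written as a double literal, both stored in a double.
double literal_gap() {
    double from_float = 0.1f;  // single-precision literal, widened on assignment
    double from_double = 0.1;  // double-precision literal
    return std::fabs(from_float - from_double);
}
```

The gap is nonzero because 0.1 is rounded to single precision before it is widened, which is what happens to the unsuffixed Fortran literal in the first assignment.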

Jim Dempsey

YW
Beginner

Thanks a lot for your help, everyone! I finally figured out that the soaring __kmp_wait_yield_4 time in vTune was caused by my accidentally nested OMP parallelization. FYI.

I really appreciate all your help, especially Jim's continued advice on the vectorization issues, which has been extremely helpful to me.
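For readers who hit the same symptom, here is a minimal C++ sketch of how accidental nesting can arise and one way to guard against it. The functions dot, row_dots, and already_parallel are hypothetical and are not taken from LibSVM or from the code discussed in this thread. The sketch assumes the inner kernel has its own parallel loop and is also called from an outer parallel loop. Time in __kmp_wait_yield_4 generally appears to be OpenMP runtime threads spinning while they wait, so this is only one possible cause.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// True when the caller is already inside an active parallel region.
// Falls back to false when the code is built without OpenMP.
inline bool already_parallel() {
#ifdef _OPENMP
    return omp_in_parallel() != 0;
#else
    return false;
#endif
}

// Hypothetical inner kernel with its own parallel loop. The if clause
// keeps the loop on one thread when called from a parallel region.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    double sum = 0.0;
    const long n = static_cast<long>(a.size());
    #pragma omp parallel for reduction(+:sum) if(!already_parallel())
    for (long i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Hypothetical outer loop. Calling dot() here is where the nesting happens.
std::vector<double> row_dots(const std::vector<std::vector<double>>& rows,
                             const std::vector<double>& x) {
    std::vector<double> out(rows.size());
    const long m = static_cast<long>(rows.size());
    #pragma omp parallel for
    for (long r = 0; r < m; ++r) {
        out[r] = dot(rows[r], x);
    }
    return out;
}
```

Calling omp_get_level() inside the suspect function is another quick check: a value above 1 there means the function is running inside nested parallel regions.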
