Intel® C++ Compiler
Community support and assistance for creating C++ code that runs on platforms based on Intel® processors.

building libiomp5md.dll

pertosa__andrea
Beginner

Hello!

I have been struggling all weekend to build libiomp5md.dll with Visual Studio 2010, and I wonder if anyone can offer some help. First, let me explain the motivation.

One of our software components, the solver, uses OpenMP and detects the maximum number of threads by calling omp_get_max_threads().

We noticed that if that call is made as soon as the product is launched, we get 8 threads throughout the application. If that call is not made at launch, the solver only gets 4 threads when it starts. This leads us to believe that another component is setting the maximum number of threads to 4, and that this becomes the global value used later.

We have no idea how to verify whether this is the case, so I thought I could build the Intel multithreading library libiomp5md.dll (which is imported by several other DLLs in the product) in debug mode and see what makes calls into that library.

I downloaded the source code from here [https://github.com/llvm-mirror/openmp/tree/release_35], installed CMake, and set out to build that component... and got nowhere!

The build seems to choke on a post-build event, even though the individual projects do not have any post-build events specified in VS. Here is the error I see:

4>CustomBuild:
4>  Building Custom Rule D:/work/openmp-release_35/runtime/CMakeLists.txt
4>  CMake does not need to re-run because D:/work/openmp-release_35/runtime/build/CMakeFiles/generate.stamp is up-to-date.
4>  Generating libiomp.rc
4>  Too many argument(s)
4>  Try --help option for more information.
4>C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\Microsoft.CppCommon.targets(151,5): error MSB6006: "cmd.exe" exited with code 255.
4>
4>Build FAILED.
 
etc.

This is the command I used to generate the solution file:
 
>cmake -DCMAKE_C_COMPILER=cl -DCMAKE_CXX_COMPILER=cl -G "Visual Studio 10 2010 Win64" -Darch=32e -DCMAKE_BUILD_TYPE=Debug -Dversion=5  ..
 
A general question first: how can I detect which component is making a call into the OpenMP library? Is my strategy correct?
As for the build: has anyone run into this issue? Has anyone succeeded in building libiomp5md.dll with TRACING enabled?
All I want is a library that reports some information at runtime so I can investigate further.
Thanks & Regards.
 
Andrea P.
Altair Engineering. 
 
Olga_M_Intel
Employee

Let me ask a couple of questions.

What are those "magic" numbers - 8 and 4 threads? ;)  Is it a 4-core system with hyper-threading you are running on?

Do you set any OpenMP environment variables (OMP_*, KMP_*) before or during the application run?

Do you call any other omp_* API (except omp_get_max_threads())?

Do you use any resource management system that could limit the number of cores to use?

Also can you please set the following environment variables before running your application and provide output here?

set KMP_VERSION=1

set KMP_SETTINGS=1

set KMP_AFFINITY=verbose

Thank you.

pertosa__andrea
Beginner

Hi Olga, thanks for your reply.

If I set OMP_NUM_THREADS=8 in the command window from which I start the product, that setting is retained, and our (downstream) application tells us that 8 threads were used in the solver.

This could be a solution, except that we are responsible for just one part of the software, and defining a global setting for everything seems problematic. So I thought it would be better to understand who or what sets the maximum number of threads to 4 and address the issue from there.

Note that this setting persists through the product ONLY if I set the environment variable in the cmd window from which I start the product. If I set it via Python (we have a Python console in our product), the value I set (e.g. 8) doesn't seem to be retained.

I found that the part of the code that limits the threads to 4 (or 1 on Linux) is the OpenSceneGraph library, which is used for the user interface. Since we neither compile nor own that code, we cannot see where the maximum number of threads is defined in OpenSceneGraph, but it is likely that the library limits the number of threads or otherwise manages resources.

So that is what I found. I am unsure what the solution is at this point, but at least I know the culprit.

As for your other questions, here is the output:

1mbd:hw>Qt: Untested Windows version 6.2 detected!
 
--- 07-FEB-2018 11:19:08 ---
 
 
 
Intel(R) OMP Copyright (C) 1997-2013, Intel Corporation. All Rights Reserved.
Intel(R) OMP version: 5.0.20130227
Intel(R) OMP library type: performance
Intel(R) OMP link type: dynamic
Intel(R) OMP build time: 2013-02-27 09:53:18 UTC
Intel(R) OMP build compiler: Intel C++ Compiler 12.1
Intel(R) OMP alternative compiler support: yes
Intel(R) OMP API version: 3.1 (201107)
Intel(R) OMP dynamic error checking: no
Intel(R) OMP thread affinity support: not used
Intel(R) OMP debugger support version: 1.1
 
User settings:
 
   KMP_AFFINITY=verbose
   KMP_SETTINGS=1
   KMP_VERSION=1
 
Effective settings:
 
   KMP_ABORT_DELAY=0
   KMP_ABORT_IF_NO_IRML=false
   KMP_ALIGN_ALLOC=64
   KMP_ALL_THREADPRIVATE=128
   KMP_ALL_THREADS=32768
   KMP_ASAT_DEC=1
   KMP_ASAT_FAVOR=0
   KMP_ASAT_INC=4
   KMP_ASAT_INTERVAL=5
   KMP_ASAT_TRIGGER=5000
   KMP_ATOMIC_MODE=1
   KMP_BLOCKTIME=200
   KMP_CPUINFO_FILE: value is not defined
   KMP_DETERMINISTIC_REDUCTION=false
   KMP_DUPLICATE_LIB_OK=false
   KMP_FORCE_REDUCTION: value is not defined
   KMP_FOREIGN_THREADS_THREADPRIVATE=true
   KMP_FORKJOIN_BARRIER="2,2"
   KMP_FORKJOIN_BARRIER_PATTERN="hyper,hyper"
   KMP_FORKJOIN_FRAMES=false
   KMP_GTID_MODE=2
   KMP_HANDLE_SIGNALS=false
   KMP_INIT_AT_FORK=true
   KMP_INIT_WAIT=2048
   KMP_ITT_PREPARE_DELAY=0
   KMP_LIBRARY=throughput
   KMP_LOCK_KIND=queuing
   KMP_MALLOC_POOL_INCR=1M
   KMP_MONITOR_STACKSIZE: value is not defined
   KMP_NEXT_WAIT=1024
   KMP_NUM_LOCKS_IN_BLOCK=1
   KMP_PLAIN_BARRIER="2,2"
   KMP_PLAIN_BARRIER_PATTERN="hyper,hyper"
   KMP_REDUCTION_BARRIER="1,1"
   KMP_REDUCTION_BARRIER_PATTERN="hyper,hyper"
   KMP_SCHEDULE="static,balanced;guided,iterative"
   KMP_SETTINGS=true
   KMP_STACKOFFSET=0
   KMP_STACKSIZE=4M
   KMP_STORAGE_MAP=false
   KMP_TASKING=2
   KMP_TASK_STEALING_CONSTRAINT=1
   KMP_USE_IRML=false
   KMP_VERSION=true
   KMP_WARNINGS=true
   OMP_DYNAMIC=false
   OMP_MAX_ACTIVE_LEVELS=2147483647
   OMP_NESTED=false
   OMP_NUM_THREADS: value is not defined
   KMP_AFFINITY="verbose,none"
   OMP_PLACES: value is not defined
   OMP_PROC_BIND="false"
   OMP_WAIT_POLICY=PASSIVE
 
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,4,5}
OMP: Info #156: KMP_AFFINITY: 4 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total cores)
OMP: Info #147: KMP_AFFINITY: Internal thread 0 bound to OS proc set {0,1,4,5}
Intel(R) OMP Intel(R) RML support: not using
Olga_M_Intel
Employee

Hello!

Thanks for providing the output.

What I can see from it is that your application is definitely running with limited resources:

OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,4,5}

OMP: Info #156: KMP_AFFINITY: 4 available OS procs

OMP: Info #157: KMP_AFFINITY: Uniform topology

OMP: Info #179: KMP_AFFINITY: 1 packages x 2 cores/pkg x 2 threads/core (2 total cores)

It means that only 4 processors are available to your process. So when you set OMP_NUM_THREADS=8, you get oversubscription, which would be harmful for performance.
