When we run a single "icpc -V" process, the command usually finishes in about 1 to 1.5 seconds. When we run multiple icpc processes concurrently, something appears to serialize them. For example, if we run 16 "icpc -V" commands concurrently, each one takes approximately 16 seconds to return. The execution time keeps growing as we add processes.
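For anyone who wants to reproduce the timing, here is a rough Python sketch of how the measurement could be scripted. It is not the script we used; the helper names time_one and time_concurrent are made up for illustration, and it assumes the command to test (for example icpc) is on the PATH.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def time_one(cmd):
    """Run cmd once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    # Output is discarded; only the elapsed time matters here.
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


def time_concurrent(cmd, n):
    """Launch n copies of cmd at roughly the same time.

    Returns a list with the wall-clock duration of each copy.
    Each worker thread just blocks on its own child process.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(time_one, [cmd] * n))
```

Calling time_concurrent(["icpc", "-V"], 1) and then time_concurrent(["icpc", "-V"], 16) and comparing the largest duration in each list should show whether the per-process time grows with the number of concurrent processes.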
After examining strace output and various flexlm debug output, the delay and serialization appear to occur when something in icpc (flexlm?) scans all devices in the system using libudev. Why does icpc scan all devices (and stat and readlink them, too)? The device scan happens after the license server sends the license information, at which point the following is printed: "INTEL_LMD: checkoutfilter: returns ACCEPT". The scan happens before we see "Checkout succeeded". Is there any way to disable the device scan, or to speed it up?
The only way I've found to make the Intel C++ Compiler run at acceptable speeds is to create zero-length libudev.so.0 and libudev.so.1 files and add their directory to my LD_LIBRARY_PATH when using icpc. This gets rid of the device scan, but it is not the safest thing to do. Is there a better workaround?
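To make the workaround concrete, here is a minimal Python sketch of the same idea: create the two empty files in a private directory and build an environment whose LD_LIBRARY_PATH lists that directory first. The function name make_udev_stub_env is hypothetical, and the same caveat applies as above: this is not a safe or supported configuration, so the environment should only be used for the icpc invocation itself.

```python
import os
import tempfile


def make_udev_stub_env(stub_dir=None, base_env=None):
    """Create zero-length libudev stubs and return an environment dict
    whose LD_LIBRARY_PATH starts with the directory holding them."""
    if stub_dir is None:
        # No directory given: use a fresh private temporary directory.
        stub_dir = tempfile.mkdtemp(prefix="udev-stub-")
    os.makedirs(stub_dir, exist_ok=True)
    for name in ("libudev.so.0", "libudev.so.1"):
        # Zero-length placeholder files, as described in the post.
        with open(os.path.join(stub_dir, name), "wb"):
            pass
    env = dict(os.environ if base_env is None else base_env)
    old = env.get("LD_LIBRARY_PATH", "")
    # Prepend so the stub directory is searched before system paths.
    env["LD_LIBRARY_PATH"] = stub_dir + (os.pathsep + old if old else "")
    return env
```

The returned dict can be passed as env= to subprocess.run(["icpc", ...]) so that only the compiler process sees the stubs and the rest of the session keeps the real libudev.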
Note: We have had similar slowness in earlier Intel C++ Compiler releases, too. Some previous releases only looked for libudev.so.0, so the compiler worked reasonably fast on a system that did not have libudev.so.0. However, those releases were slow on systems that did have libudev.so.0.
I cannot send the log file, but I can answer questions about it if you have any. It should not be too difficult to find the situations under which icpc (or flexlm, assuming you have the source) will use libudev to scan devices. If you do a "strings icpc | grep udev", there are some suspicious symbols.
Also, I was not quite correct in my first post. The Intel C++ Compiler device scan performs a stat and readlink on many thousands of devices. It also tries to open and read a significant number of devices: again many thousands, but only about a quarter as many as it performs a stat and readlink on.
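Since I cannot share the log, here is a rough Python sketch of how the per-syscall counts could be tallied from an strace output file. The function name count_device_syscalls is hypothetical, and the path prefixes (/sys/, /dev/, /run/udev/) are my assumption about where a libudev scan would look. It also assumes plain strace lines, optionally prefixed with a pid, and no timestamps.

```python
import re
from collections import Counter

# Captures the syscall name and the first quoted argument of an strace
# line, allowing an optional "[pid N] " or "N " prefix (strace -f).
SYSCALL_RE = re.compile(r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\([^"]*"([^"]*)"')


def count_device_syscalls(lines, prefixes=("/sys/", "/dev/", "/run/udev/")):
    """Count syscalls whose first quoted argument is under a device path.

    Returns a Counter mapping syscall name -> number of matching calls.
    """
    prefixes = tuple(prefixes)
    counts = Counter()
    for line in lines:
        m = SYSCALL_RE.match(line)
        if m and m.group(2).startswith(prefixes):
            counts[m.group(1)] += 1
    return counts
```

With a trace captured by something like "strace -f -o icpc.trace icpc -V", passing the open file to count_device_syscalls gives the totals for stat, readlink, open and so on, which makes it easy to compare the stat/readlink volume with the open volume.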
One suggestion was to borrow licenses. This does enable the Intel C++ Compiler to acquire licenses in a timely manner, but eventually we run out of licenses. The early return of borrowed licenses does not work properly (in v11.13.*.* of FlexNet), even though Intel's own documentation states that it does: https://software.intel.com/en-us/articles/intel-flexlm-license-borrowing-capability