I have a 32-processor AMD system with 64 gigabytes of main memory. I'm trying to run up to 32 processes (independent processes -- *not* threads), each performing raycasting operations using rtcIntersect on triangle meshes. Each process uses about 1.25 gigabytes -- thus there is plenty of available memory. The raycasters run normally when running one at a time. A single process runs for about 8 minutes and uses about 8 minutes of CPU time. All processes are single-threaded. Each process iterates over 6 scenes, calling rtcCommit 6 times during the run. The accumulated time for the rtcCommit calls is about 40 seconds of wall-clock time and 40 seconds of CPU time.
When starting 32 processes simultaneously, they get stalled on the rtcCommit calls. Within a single job that normally takes 8 minutes, the rtcCommit calls still use about 40 seconds of CPU time, but about 1000 seconds (more than 15 minutes) of wall-clock time. The processes all complete successfully after about 25 minutes.
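For anyone wanting to reproduce the CPU-time versus wall-clock comparison above, here is a minimal sketch of one way to measure both around a single call. It assumes a POSIX system (clock_gettime); the names Timing, process_cpu_seconds and time_call are made up for this illustration and are not part of Embree.

```cpp
#include <chrono>
#include <time.h>
#include <utility>

// Wall-clock and CPU seconds consumed by one call (hypothetical helper type).
struct Timing {
    double wall_seconds;
    double cpu_seconds;
};

// CPU time used so far by this process, summed over all of its threads (POSIX).
inline double process_cpu_seconds() {
    timespec ts{};
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return static_cast<double>(ts.tv_sec) + 1e-9 * static_cast<double>(ts.tv_nsec);
}

// Runs work() once and reports how much wall-clock and CPU time it took.
template <typename F>
Timing time_call(F&& work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const double cpu_start = process_cpu_seconds();
    std::forward<F>(work)();
    const double cpu_end = process_cpu_seconds();
    const auto wall_end = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(wall_end - wall_start).count(),
            cpu_end - cpu_start};
}
```

Wrapping each commit, e.g. time_call([&] { rtcCommit(scene); }), and summing the results per process would give the two totals quoted above. A wall-clock total far larger than the CPU total means the call is mostly waiting rather than computing.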
I don't see any reason why these processes would interfere with each other to such an extent. They should be running completely independently and well within the limits of the system resources. Is there something about the internals of rtcCommit that would cause it to perform poorly in a multi-processing environment?
Further observations indicate that rtcCommit calls are all forced to use only CPU0. Is there a configuration option for that? How can I tell embree to allow rtcCommit calls to use any available CPU?
In order to get the jobs to run single-threaded, I have set "nthreads=1" in the rtcInit call. It seems that embree has misinterpreted this to mean that only one CPU is available globally for all rtcCommit calls across all processes.
Is there some other option for telling rtcCommit to run single-thread per process?
The different processes using Embree should run independently without any problems. What version of Embree are you using?
A couple of things to check:
- If you are not already using it, please try the latest version of Embree, which should be 2.6.1 at this point.
- Use "threads=1" instead of "nthreads=1". If that still does not force Embree to use only a single thread, you could "brute force" the number of internal threads to 1 by putting "g_numThreads=1;" at line 307 of rtcore.cpp.
- Does the application somehow set the affinity of the application thread to CPU0?
- Peak memory consumption is typically reached during the rtcCommit call (scene setup, BVH build, etc.). Please double-check that a single process does not use more than 2 GB at this point; otherwise swapping to hard disk will occur with 32 processes.
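To check the affinity question directly, the process can print the set of CPUs its main thread is allowed to run on, once before and once after rtcInit. A minimal sketch, assuming Linux with glibc (sched_getaffinity); allowed_cpus is a hypothetical helper name, not an Embree function.

```cpp
#include <sched.h>
#include <vector>

// Returns the indices of the CPUs the calling thread may run on (Linux-specific).
// An empty result means sched_getaffinity failed.
std::vector<int> allowed_cpus() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    std::vector<int> cpus;
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return cpus;  // pid 0 means "the calling thread"; failure leaves the list empty
    }
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &mask)) {
            cpus.push_back(cpu);
        }
    }
    return cpus;
}
```

If the list shrinks to just CPU 0 after rtcInit, something inside that call has pinned the thread. If it is already restricted before rtcInit, the launcher or the application itself is the place to look.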
Carsten, thanks for your suggestions.
The behavior described in my original post is for Embree 2.4, with "threads=1". ('nthreads' was a typo in the forum post only).
Embree 2.6.1 behaves differently, but still not entirely as I would expect.
Using 2.6.1, with the default rtcInit() (i.e. no threads setting), my executables are running single-threaded as normal, except for the rtcCommit calls, which use as many CPUs as are available. For example, if I start 4 instances, they will run on 4 CPUs most of the time, and spread out over all 32 CPUs, briefly, for the duration of the rtcCommit calls. I believe this is the expected behavior, and is the same as Embree 2.4.
The difference occurs when setting "threads=1" to try to force rtcCommit to run single threaded. With Embree 2.6.1, the rtcCommits are still spreading out over all 32 CPUs, apparently ignoring the threads=1 setting. But even worse, now the executable code *outside* the rtcCommits is forced to run only on CPU0. Furthermore, it seems that *any* value for threads has the same effect. For example, I set "threads=4", and all code, except the rtcCommit calls, is forced to run only on CPU0.
I don't believe my application code is messing with affinity settings. Other code runs normally on this machine. It is only code linked to the Embree library which is behaving strangely.
I have verified that the memory usage is not causing swapping.
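For reference, one simple way to confirm the peak from inside each process is to ask the kernel for the peak resident set size just before exit. This is only a sketch, assuming Linux, where ru_maxrss is reported in kilobytes; peak_rss_kb is a hypothetical helper name.

```cpp
#include <sys/resource.h>

// Peak resident set size of this process so far, in kilobytes
// (Linux reports ru_maxrss in KB). Returns -1 if getrusage fails.
long peak_rss_kb() {
    struct rusage usage {};
    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1;
    }
    return usage.ru_maxrss;
}
```

With 64 GB shared by 32 processes, a value below roughly 2,000,000 KB per process keeps the total within physical memory.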
I have not yet tried setting g_numThreads in the source, since I'm currently testing with the binary distribution.
The issue appears to be that the threads=N option did two things internally: setting the number of used worker threads to N and setting the affinity of the threads (and probably also of the main thread). Setting the affinity causes the issues you see. The threads option was only intended to be used for debugging. I will change the behavior for the upcoming release in the following way:
1) threads=N will set the number of TBB worker threads to N
2) set_affinity=1 will separately set the affinity if that is required
However, we recommend not using the rtcInit parameters at all to configure the TBB threads. It is best to use the tbb::task_scheduler_init object to configure the TBB threads from your application. Embree will then use that TBB configuration as well.
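Until that release is available, a possible stopgap for users of the binary distribution is to save the main thread's affinity mask before rtcInit and restore it afterwards. This is a sketch only, assuming Linux with glibc. AffinityGuard is a hypothetical class, not part of Embree or TBB, and whether it fully works around the behavior in 2.6.1 is untested. It only affects the calling thread, not Embree's worker threads.

```cpp
#include <sched.h>

// Saves the calling thread's CPU affinity mask on construction and
// restores it on destruction (Linux-specific, hypothetical helper).
class AffinityGuard {
public:
    AffinityGuard() {
        CPU_ZERO(&saved_);
        // pid 0 means "the calling thread".
        valid_ = (sched_getaffinity(0, sizeof(saved_), &saved_) == 0);
    }

    ~AffinityGuard() {
        // Restore only if the original mask was read successfully.
        if (valid_) {
            sched_setaffinity(0, sizeof(saved_), &saved_);
        }
    }

    AffinityGuard(const AffinityGuard&) = delete;
    AffinityGuard& operator=(const AffinityGuard&) = delete;

private:
    cpu_set_t saved_;
    bool valid_ = false;
};
```

Usage would be to put the initialization in its own scope, e.g. { AffinityGuard guard; rtcInit("threads=1"); }, so that any affinity change made to the main thread during rtcInit is undone when the scope ends.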