Intel® Quartus® Prime Software
Intel® Quartus® Prime Design Software, Design Entry, Synthesis, Simulation, Verification, Timing Analysis, System Design (Platform Designer, formerly Qsys)

Design Space Explorer does not make use of all CPUs in the system

Altera_Forum
Honored Contributor II

I have Quartus Prime 18.0 Standard, and I've been using it on a Windows-based dual Xeon E5-2699 system (high-end as CPUs go, but not particularly cutting-edge: these CPUs are almost 4 years old). Each CPU has 22 cores with 2 hyperthreaded logical processors each.

 

Since compilation is incredibly slow (I am currently working on a design that averages 2 hours per compile, most of it spent in the fitter), proper use of available resources is a must.  

 

First of all, the settings cap each Quartus instance at 16 processors. That I can understand (maybe the algorithm does not parallelize well), but there's still room for improvement.

 

Since timing seems to vary quite a bit between runs, I took to using Design Space Explorer to do multiple compiles in parallel. As a side note, it would be nice if DSE let me vary the code between exploration points (e.g. define a different macro in each point, so I could leverage that macro to try different permutations of my code.) But I digress. 

 

Now, the problem is that all instances launched by DSE are locked to socket #1. So I have 8 instances trying to compile at once, but they are all crowding the same 22 cores, and whenever they all go multithreaded, they start competing for resources. I see 100% CPU usage on socket #1 and 0% CPU usage on socket #2. As a result, compilation takes longer than it should.

 

From the point of view of the Windows API, the CPUs are arranged in two "processor groups". The simplest solution would be to detect the number of processor groups and pick one of the groups at random when Quartus is launched. It's literally 10 lines of code.
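To illustrate the idea, here is a minimal sketch in Python, not whatever Quartus itself would use. `GetActiveProcessorGroupCount` is a real kernel32 call; the helper names `active_group_count` and `pick_group` are made up for this example, and actually binding the launched process to the chosen group is not shown.

```python
import ctypes
import random
import sys


def active_group_count() -> int:
    """Return the number of active Windows processor groups (1 elsewhere)."""
    if sys.platform != "win32":
        return 1  # processor groups are a Windows concept; assume a single group
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # WORD GetActiveProcessorGroupCount(void)
    kernel32.GetActiveProcessorGroupCount.restype = ctypes.c_ushort
    return int(kernel32.GetActiveProcessorGroupCount()) or 1


def pick_group(group_count: int, rng: random.Random) -> int:
    """Pick one processor group index uniformly at random."""
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    return rng.randrange(group_count)


if __name__ == "__main__":
    groups = active_group_count()
    print(f"{groups} group(s), picked group {pick_group(groups, random.Random())}")
```

On a two-socket machine that Windows splits into two groups, each launch would land on either group with equal probability, so 8 parallel compiles would on average be spread 4 and 4.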

 

What's especially baffling is that I am somehow prevented from changing process affinity directly. I'd be willing to move some instances onto socket #2 by hand through the Task Manager, except that if I try to do that to any of the 8 instances of quartus_fit.exe, I get an error message: "Unable to access or set process affinity" / "The operation could not be completed" / "Access is denied". I never realized it was even possible to block access to process affinity. (This is Windows Server 2012 and I'm doing it as Administrator, so UAC is not the issue.) I can reassign the affinities of quartus_sh.exe and quartus_worker.exe (not that it does me any good). And the quartus_fit.exe process is not protected, because I can attach to it with a debugger just fine. It just won't let me touch its affinity for some reason.

 

How do I go about submitting a bug report / feature request?
Altera_Forum
Honored Contributor II

Hi, 

 

You can file a service request at http://mysupport.altera.com

 

Best Regards, 

Anand Raj Shankar 

(This message was posted on behalf of Intel Corporation)
Altera_Forum
Honored Contributor II

 


 

 

I was hoping to bypass low-level support drones, but I guess that's the only option. 

 

I did some more digging. The problem is not limited to DSE. Even if I launch multiple compilations in parallel by hand, by opening 8 command prompts and executing quartus_sh -flow in each, they all end up crowding socket #1 and they all refuse to be reassigned. This only takes effect some time into the compilation. For the first 10 to 20 minutes, quartus_fit.exe can be reassigned freely and may even be executing on socket #2, but at some later point it migrates onto socket #1 and "locks up".

 

On closer inspection, it may simply be impossible to change affinity through the Task Manager once the process begins spawning threads. After that, it's necessary to move threads from socket to socket individually.

As a workaround, I wrote a short app based on this sample https://msdn.microsoft.com/en-us/library/windows/desktop/ms686852(v=vs.85).aspx that enumerates running processes and manually redistributes the threads of all instances of quartus_fit.exe between sockets. I put it in the Task Scheduler on a 30-minute interval. Works like a charm - it seems to cut the compilation time by at least 25%.
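The redistribution step can be sketched roughly as follows. This is not the actual app, just an illustration in Python of one plausible policy, and the policy is an assumption: each quartus_fit.exe process is assigned to a processor group in turn, and all of its threads follow it. `plan_thread_groups` is a hypothetical helper. Enumerating the threads (for example with `CreateToolhelp32Snapshot`) and applying the plan (for example with `SetThreadGroupAffinity`) are Windows-specific and left out.

```python
from typing import Dict, Iterable, List, Tuple


def plan_thread_groups(
    threads: Iterable[Tuple[int, int]], group_count: int
) -> Dict[int, List[int]]:
    """Map each processor group index to the thread IDs that should run on it.

    `threads` holds (process_id, thread_id) pairs for the target processes.
    Processes are dealt to groups round-robin in order of process ID, and
    every thread is placed on the group of its owning process.
    """
    if group_count < 1:
        raise ValueError("group_count must be at least 1")
    pairs = list(threads)
    pids = sorted({pid for pid, _ in pairs})
    group_of_pid = {pid: i % group_count for i, pid in enumerate(pids)}
    plan: Dict[int, List[int]] = {g: [] for g in range(group_count)}
    for pid, tid in pairs:
        plan[group_of_pid[pid]].append(tid)
    return plan


if __name__ == "__main__":
    # Made-up process and thread IDs, purely for illustration.
    sample = [(100, 1), (100, 2), (200, 3), (300, 4), (200, 5)]
    print(plan_thread_groups(sample, 2))
```

Keeping all threads of one process on the same socket is a design choice of this sketch; spreading the threads of a single process across sockets would also satisfy the description above.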