
Unknown bottleneck when spawning a moderate number of copies of single-threaded processes

Mark2000
Beginner
697 Views
I recently configured a computer with an i9-13900KF on a motherboard with a Z690 chipset, running Ubuntu 22.04. I'm running simulation software that is single-threaded with high CPU usage, and I get about 10 it/s in the simulation. As I start additional copies of the simulation, the speed drops towards ~1 it/s as N_processes approaches 20 (fewer than the 32 hardware threads), with some slowdown even with only 2-3 processes. Core usage is near maximum on each of the N_processes cores. Clock speeds appear to remain roughly constant, with no clear sign of thermal or other throttling. RAM utilization is only about 30% (of 64 GB), and no swap is being used. There is nearly zero disk I/O for reads or writes (which is expected for this simulation).
 
When I ran a simple stress-testing Python script (which computes square roots over and over) on single and multiple threads, it did not show the slowdown and kept a constant rate as long as the number of instances was less than the number of threads.
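
For anyone wanting to reproduce this comparison, here is a minimal sketch of such a stress test. It is not the script used above: the function names (`sqrt_rate`, `sqrt_worker`, `measure_rates`), the batch size, and the process counts are all placeholders chosen for illustration. It starts N independent processes that each compute square roots in a loop and reports the per-process rate, so a drop in it/s as N grows would show up directly.

```python
import math
import multiprocessing as mp
import time


def sqrt_rate(duration_s):
    """Compute square roots for roughly duration_s seconds; return iterations/s."""
    count = 0
    x = 2.0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        # Work in batches so the timer call does not dominate the loop.
        for _ in range(10_000):
            x = math.sqrt(x + 1.0)
        count += 10_000
    return count / (time.perf_counter() - start)


def sqrt_worker(duration_s, results):
    """Process entry point: run the loop and send the rate back to the parent."""
    results.put(sqrt_rate(duration_s))


def measure_rates(n_procs, duration_s=1.0):
    """Run n_procs workers at the same time and return each worker's rate."""
    results = mp.Queue()
    procs = [mp.Process(target=sqrt_worker, args=(duration_s, results))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    rates = [results.get() for _ in procs]  # one result per worker
    for p in procs:
        p.join()
    return rates


if __name__ == "__main__":
    # Hypothetical process counts; adjust to the machine under test.
    for n in (1, 2, 4, 8):
        rates = measure_rates(n)
        print(f"{n:2d} processes: mean {sum(rates) / n:,.0f} it/s per process")
```

Because this loop touches almost no memory, a flat rate here alongside a slowdown in the real simulation would suggest looking at shared resources such as cache or memory bandwidth rather than raw core throughput.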
 
I reran both tests using numactl to ensure each script was running on its own independent thread and got the same results, even when restricting the runs to only the performance cores or only the standard (efficiency) cores. Is there some other limitation of the CPU that I could be hitting and should be profiling, or some resource the CPU needs that would be bottlenecking performance and that I haven't accounted for?
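
The pinning step can also be done from inside Python rather than through numactl. The sketch below assumes Linux (`os.sched_setaffinity` is not available on every platform), and `pin_current_process` is a hypothetical helper name, not something from the simulation software. It restricts the calling process to a single logical CPU and reads the mask back to confirm it took effect.

```python
import os


def pin_current_process(cpu):
    """Restrict the calling process to one logical CPU and return the new mask."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity control is not available on this platform")
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("CPUs available to this process:", allowed)
    # Pin to the first allowed CPU purely as an example.
    print("Now pinned to:", pin_current_process(allowed[0]))
```

Which logical CPU numbers belong to performance or efficiency cores is not assumed here; `lscpu --extended` lists the topology on a given machine.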
 
Any suggestions are appreciated, as I expected significantly better multithreaded performance than this.
4 Replies
DeividA_Intel
Employee
650 Views

Hello Mark2000,

Thank you for posting on the Intel® communities. I am sorry to know that you are having issues with your Intel® Core™ i9-13900KF Processor.

In order to better assist you, please provide the following:

1. What is the brand and model name of your motherboard?
2. Can you share more details about the final goal of your tests? What should be the expected behavior with the Intel® Core™ i9-13900KF Processor?
3. What is the name of the application that you are using? Can you provide a link?
4. What is the BIOS version installed?
5. Can you share a video of the issue?
6. Did you check with the motherboard manufacturer to confirm compatibility?

Regards,
Deivid A.
Intel Customer Support Technician


Mark2000
Beginner
638 Views

Hey Deivid,

 

Thanks for the response. To answer your questions:

  1. The motherboard is an MSI Z690-P, which I've confirmed supports 13th gen Intel Core processors.
  2. My final goal for the tests is to be able to run multiple simulations at the same time, one per thread, at the same speed that a single simulation on a single thread can run. I have seen this expected behavior, with each single-threaded simulation maintaining the same speed, on an Apple M2 Pro processor running 12 simulations, one on each of 12 cores, as well as on an AMD Ryzen Threadripper 3960X (Ubuntu 22.04) on 48/48 threads. For reference, the 3960X has 0.25 MB/thread of L2 cache vs the i9-13900's 1.0 MB/thread, and 2.66 MB/thread of L3 cache compared to 1.125 MB/thread for the 13900. Those are fairly evenly matched on a per-thread basis, and given that I see slowdowns even when I'm using only 10/32 threads on the i9 (leaving each thread more L2/L3 cache than on the Ryzen), I don't think that the i9-13900 is underspecced for the task I'm giving it (unless there is another resource I'm not considering). Monitoring I/O and RAM during the tests, neither is under significant pressure compared to what it is capable of. While temperatures have been relatively high later in testing, the behavior is seen immediately, before the chip has a chance to heat up and encounter any thermal throttling.
  3. The software is a high-performance C++ dynamics simulation package (i.e. very math heavy) wrapped in Python that I'm a contributor to (link). It is single-threaded.
  4. The latest BIOS version available on the MSI website is being used, AMI BIOS 7D36vAC.
  5. Sorry, I can't take a video. To describe the test I'm running: I have a script that runs my simulation indefinitely and returns the speed of the simulation in iterations/second. I record the average speed (higher it/s is better, and it is expected to stay constant as long as the number of simulations is less than the number of threads), then start another instance of the task in the background and record the speed again, repeating this for more and more additional tasks. Unfortunately, I don't have the exact results of this test saved and won't be able to rerun it for a few days, but it starts around 10 it/s when only a single process is running, and decreases to about 1 it/s when 20 processes are running. I'm happy to provide results once I get a chance to rerun it, if that would be helpful.
  6. Yes, compatibility is confirmed on the manufacturer's website.
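
The per-thread cache figures in item 2 can be reproduced with a few lines of arithmetic. This is only a sketch: the cache totals below are assumptions taken from public spec sheets (they are not stated in this thread), and `cache_per_thread` is a hypothetical helper. L2 is private to a core or core cluster, so only the L3 figure is recomputed for the 10-active-thread case.

```python
# Cache totals in MB. These are ASSUMED from public spec sheets,
# not taken from the thread itself.
CHIPS = {
    "Threadripper 3960X": {"threads": 48, "l2_mb": 12.0, "l3_mb": 128.0},
    "Core i9-13900KF": {"threads": 32, "l2_mb": 32.0, "l3_mb": 36.0},
}


def cache_per_thread(total_mb, active_threads):
    """Evenly divide a cache total across the threads that are in use."""
    return total_mb / active_threads


if __name__ == "__main__":
    for name, chip in CHIPS.items():
        l2 = cache_per_thread(chip["l2_mb"], chip["threads"])
        l3 = cache_per_thread(chip["l3_mb"], chip["threads"])
        print(f"{name}: L2 {l2:.3f} MB/thread, L3 {l3:.3f} MB/thread (all threads)")

    # L3 share on the i9 when only 10 of its 32 threads are busy.
    l3_partial = cache_per_thread(CHIPS["Core i9-13900KF"]["l3_mb"], 10)
    print(f"Core i9-13900KF with 10 active threads: L3 {l3_partial:.3f} MB/thread")
```

An even split is a simplification; how much cache a process actually gets depends on what the other processes are touching.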

If you have any suggestions of other resource monitoring tools I should check out to look for possible bottlenecks, please let me know.

Thanks,

Mark

DeividA_Intel
Employee
607 Views

Hello Mark2000,

Thank you for the information provided.

I will proceed to check the issue internally and post back soon with more details.

Best regards,
Deivid A.
Intel Customer Support Technician


DeividA_Intel
Employee
553 Views

Hello Mark2000,

Thank you for your patience. We have a dedicated forum for this kind of issue and product, the Intel Developer Zone, where you will receive the appropriate support on this and any other concerns you may have related to this product.

Here you will find the links to access the website and the community forums:

Please keep in mind that this thread will no longer be monitored by Intel.

Regards,
Deivid A.
Intel Customer Support Technician

