Software Tuning, Performance Optimization & Platform Monitoring

Unknown bottleneck when spawning a moderate number of copies of single-threaded processes

Mark2000

Was referred to this forum from a previous thread:

https://community.intel.com/t5/Processors/Unknown-bottleneck-when-spawning-a-moderate-number-of-copies-of/m-p/1492416/emcs_t/S2h8ZW1haWx8dG9waWNfc3Vic2NyaXB0aW9ufExJRVc2S0ZPRVpLUzZHfDE0OTI0MTZ8U1VCU0NSSVBUSU9OU3xoSw#M63498

I recently configured a computer with an i9-13000KF on a motherboard with a Z690 chipset (MSI Z690-p), running Ubuntu 22.04. I'm running simulation software that is single-threaded with high CPU usage, and a single instance gets about 10 it/s. As I start more copies of the simulation, the speed drops towards ~1 it/s as N_processes approaches 20 (<32 cores), with some slowdown even at only 2-3 processes. Core usage is near maximum on each of the N_processes cores. Clock speeds appear to remain roughly constant, with no clear sign of thermal or other throttling. RAM utilization is only about 30% (of 64 GB), and no swap is being used. Disk read/write I/O is nearly zero (which is expected for this simulation).
 
When I ran a simple stress-testing Python script (it computes square roots over and over) as one instance and as multiple instances, it did not show the slowdown and kept a constant rate as long as the number of instances was less than the number of threads.
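For reference, a stress test of this kind could look roughly like the sketch below. This is not the original script: the function names `sqrt_rate` and `measure_scaling`, the durations, and the process counts are all made up for illustration. It only shows the idea of running N copies of a compute-only loop and comparing the per-process rate as N grows.

```python
import math
import time
from multiprocessing import Pool


def sqrt_rate(duration_s: float) -> float:
    """Compute square roots in a tight loop; return iterations per second."""
    count = 0
    acc = 0.0
    start = time.perf_counter()
    deadline = start + duration_s
    while time.perf_counter() < deadline:
        # Work in batches so the clock is not read on every iteration.
        for i in range(1000):
            acc += math.sqrt(i + count)
        count += 1000
    elapsed = time.perf_counter() - start
    return count / elapsed


def measure_scaling(n_procs: int, duration_s: float = 0.5) -> list[float]:
    """Run n_procs copies of the loop at once; return each copy's rate."""
    with Pool(n_procs) as pool:
        # chunksize=1 hands out one task at a time, so normally each worker
        # runs exactly one copy (an approximation, not a guarantee).
        return pool.map(sqrt_rate, [duration_s] * n_procs, chunksize=1)


if __name__ == "__main__":
    # Hypothetical process counts; adjust to the machine being tested.
    for n in (1, 2, 4):
        rates = measure_scaling(n)
        print(f"{n} processes: slowest copy {min(rates):,.0f} it/s")
```

A loop like this touches almost no memory, so it mostly exercises the cores themselves. A flat per-process rate here says little about how a workload with a larger working set would scale.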
 
I reran both tests using numactl to ensure each instance was running on its own independent thread and got the same results, even when using only performance cores or only standard cores. Is there some other limitation of the CPU that I could be hitting and should be profiling, or some resource the CPU needs that could be bottlenecking performance and that I haven't accounted for? One suspect is the VRMs being too wimpy on the lower-end MSI Z690-p board that I'm using, but I don't know how to profile for this definitively.
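The pinning described above was done with numactl. The same idea can be sketched from inside a Python process with `os.sched_setaffinity`, which is Linux-only. The helper name `pin_to_cpu` is hypothetical, and the sketch makes no assumption about which logical CPU numbers are performance cores on a given machine; that mapping would have to be checked separately (for example with `lscpu --extended`).

```python
import os


def pin_to_cpu(cpu: int) -> set[int]:
    """Restrict the calling process to one logical CPU; return the new mask."""
    # pid 0 means "the calling process".
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Only CPUs already in the current mask can be selected.
    allowed = sorted(os.sched_getaffinity(0))
    pinned = pin_to_cpu(allowed[0])
    print(f"allowed before: {allowed}, pinned to: {sorted(pinned)}")
```

Calling this at the start of each test instance with a different CPU number gives one instance per logical CPU, which is a rough equivalent of launching each copy under its own numactl binding.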
 
Any suggestions are appreciated, as I expected significantly better multithreaded performance than this. 