Software Tuning, Performance Optimization & Platform Monitoring
Discussion regarding monitoring and software tuning methodologies, Performance Monitoring Unit (PMU) of Intel microprocessors, and platform updating.

Unknown bottleneck when spawning a moderate number of copies of single-threaded processes

Mark2000

I was referred to this forum from a previous thread:

https://community.intel.com/t5/Processors/Unknown-bottleneck-when-spawning-a-moderate-number-of-copies-of/m-p/1492416/emcs_t/S2h8ZW1haWx8dG9waWNfc3Vic2NyaXB0aW9ufExJRVc2S0ZPRVpLUzZHfDE0OTI0MTZ8U1VCU0NSSVBUSU9OU3xoSw#M63498

I recently configured a computer with an i9-13000KF on a motherboard with a Z690 chipset (MSI Z690-p), running Ubuntu 22.04. I am running simulation software that is single-threaded with high CPU usage, and a single instance gets about 10 it/s. As I start more copies of the simulation, the speed drops toward ~1 it/s as N_processes approaches 20 (<32 cores), with some slowdown even at only 2-3 processes. Core usage is near maximum on each of the N_processes cores in use. Clock speeds appear to remain roughly constant, with no clear sign of thermal or other throttling. RAM utilization is only about 30% (of 64 GB), and no swap is being used. Disk read/write I/O is nearly zero (which is expected for this simulation).
 
When I ran a simple stress-testing Python script (it computes square roots over and over) as a single instance and as multiple instances, it did not show the slowdown and kept a constant rate per instance as long as the number of instances was less than the number of hardware threads.
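For anyone who wants to reproduce this comparison, a minimal sketch of such a test might look like the following. This is not the original script. The names `WORKER` and `measure`, the iteration count, and the list of process counts are all illustrative assumptions. It starts N independent Python processes that each time a square-root loop, then reports the mean per-process rate.

```python
import subprocess
import sys

# Source for one single-threaded worker (hypothetical stand-in for the
# stress script): time a sqrt loop and print iterations per second.
WORKER = """
import math, time
n = {iterations}
start = time.perf_counter()
acc = 0.0
for i in range(1, n + 1):
    acc += math.sqrt(i)
print(n / (time.perf_counter() - start))
"""


def measure(n_procs, iterations=2_000_000):
    """Start n_procs worker processes together; return each one's it/s."""
    code = WORKER.format(iterations=iterations)
    procs = [
        subprocess.Popen([sys.executable, "-c", code],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(n_procs)
    ]
    # communicate() waits for each child to exit and collects its output.
    return [float(p.communicate()[0]) for p in procs]


if __name__ == "__main__":
    for n in (1, 2, 4, 8):  # assumed process counts; adjust to the machine
        rates = measure(n)
        print(f"{n:2d} processes: mean {sum(rates) / len(rates):,.0f} it/s each")
```

A loop like this touches almost no memory, so a flat per-process rate only shows that the cores scale for purely compute-bound work. It does not exercise the resources that cores share.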
 
I reran both tests using numactl to pin each process to its own hardware thread and got the same results, even when restricting the runs to only the performance cores or only the standard (efficiency) cores. Is there some other limitation of the CPU that I could be hitting and should be profiling, or some shared resource the CPU needs that could be bottlenecking performance and that I haven't accounted for? One suspect is the VRMs being too weak on the lower-end MSI Z690-p board that I'm using, but I don't know how to profile for this definitively.
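The pinning step can also be done from inside a Python process rather than through numactl. The sketch below is a Linux-only illustration using the standard library's `os.sched_setaffinity`. The helper name `pin_to_cpu` is hypothetical. Which logical CPU numbers belong to performance or efficiency cores is not assumed here and should be checked on the machine itself (for example with `lscpu --extended`).

```python
import os


def pin_to_cpu(cpu):
    """Restrict the calling process to one logical CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})   # pid 0 means the calling process
    return os.sched_getaffinity(0)   # report the resulting CPU set


if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed before pinning:", allowed)
    # Pin to the first CPU this process may use; a real run would choose
    # a distinct CPU number for each copy of the workload.
    print("allowed after pinning: ", sorted(pin_to_cpu(allowed[0])))
```

Pinning this way only controls where the process is scheduled. It does not isolate the process from other work on the same core or from resources shared across cores.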
 
Any suggestions are appreciated, as I expected significantly better multithreaded performance than this. 