Example 1: Run one instance of process A. It takes 14 seconds to complete, and one CPU core is maxed out the entire time.
Example 2: Run two instances of process A at the same time. They take 23 seconds to complete, and two CPU cores are maxed out the entire time.
So how can example 2 take so much longer?
If the processes were IO bound, then how can the 2 CPU cores be maxed out the whole time?
I understand there are many things that prevent perfect 2x scaling, but I don't understand how both CPU cores can be maxed out if something else is preventing better scaling.
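To put numbers on the gap, here is a rough sketch of the arithmetic in Python. The function name `scaling_report` is made up for illustration, and it assumes each of the two instances does the same amount of work as the single run.

```python
def scaling_report(t_one, t_n, n):
    """Compare n concurrent instances against a single instance.

    t_one: wall-clock seconds for one instance running alone
    t_n:   wall-clock seconds for n instances running together
    """
    slowdown = t_n / t_one        # how much longer each instance took
    speedup = n * t_one / t_n     # throughput gain over running the jobs one by one
    efficiency = speedup / n      # 1.0 would be perfect scaling
    return slowdown, speedup, efficiency


# Timings from the two examples above: 14 s alone, 23 s for two at once.
slowdown, speedup, efficiency = scaling_report(14.0, 23.0, 2)
print(f"per-instance slowdown: {slowdown:.2f}x")   # 1.64x
print(f"throughput speedup:    {speedup:.2f}x")    # 1.22x
print(f"parallel efficiency:   {efficiency:.0%}")  # 61%
```

So the second core buys only about 22% more throughput, even though both cores show 100% load.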
Any ideas appreciated,
Thank you for your response; it was very helpful.
I now need to look for the proper techniques to measure lock contention, memory bandwidth usage, and other possibilities that are not easy to detect using only the Sysinternals utilities.
I have no idea how to check on these yet, but that's what Google is for so I'm going to find out.
Try Intel's VTune. They have a trial version.
They also have other utility programs.
Examining where your program spends its time may shed some light on the problem. VTune will detect a fair number of your problems (once you get the hang of using it).
I do not use VTune myself, as I have an AMD-based system; for that I use CodeAnalyst, which has similar functionality.
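Before setting up a profiler, it may also help to repeat the timing experiment in a script so the numbers are reproducible. The following is only a sketch: `time_concurrent` is a hypothetical helper, and `CMD` is a stand-in workload that should be replaced with the real command line for process A.

```python
import subprocess
import sys
import time


def time_concurrent(cmd, n):
    """Start n copies of cmd at once; return wall-clock seconds until all have exited."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(n)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start


# Stand-in workload (assumption): replace with the actual command for process A.
CMD = [sys.executable, "-c", "sum(i * i for i in range(2_000_000))"]

if __name__ == "__main__":
    base = time_concurrent(CMD, 1)  # single-instance baseline
    for n in (1, 2):
        t = time_concurrent(CMD, n)
        print(f"{n} instance(s): {t:.2f}s, slowdown vs. one instance: {t / base:.2f}x")
```

If the slowdown stays near 1.0x for a small, cache-friendly stand-in but climbs for the real process, that points toward a shared resource such as memory bandwidth or cache rather than the scheduler.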