In one of my projects, I spawn multiple TBB tasks to do a job and wait for them to complete. The application performs far better on my development machine (Intel Core(TM) i5-4590 CPU @ 3.30 GHz with 4 cores) than on the deployment machine (Intel Xeon(R) CPU E5-2648L 0 @ 1.80 GHz with 32 cores). I was expecting a performance boost on the deployment machine because of its higher core count, but the results are quite the opposite.
According to this article, at a very high level, a simple rule applies:
More cores = more multitasking
Higher clock speed = faster task completion
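Taking this rule at face value for my two machines gives a rough back-of-the-envelope comparison. This assumes both CPUs do the same amount of work per clock cycle, which ignores turbo boost, cache, and memory effects, so treat the numbers as an estimate only:

$$
\frac{f_{\text{dev}}}{f_{\text{deploy}}} = \frac{3.30}{1.80} \approx 1.83,
\qquad
\frac{N_{\text{deploy}}\, f_{\text{deploy}}}{N_{\text{dev}}\, f_{\text{dev}}} = \frac{32 \times 1.80}{4 \times 3.30} = \frac{57.6}{13.2} \approx 4.36
$$

By this estimate, a single task would take about 1.8 times longer on the deployment machine, while the aggregate throughput would be about 4.4 times higher if all cores were kept busy.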
On the deployment machine, the CPU is idle most of the time overall, and the processing time is almost three times that on my development machine. As I understand it, the TBB task scheduler does the load balancing and does not create a thread for each task.
So, how do I put that idle CPU capacity to use and get the processing done in less time?
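To illustrate what I mean by tasks not mapping one-to-one onto threads, here is a minimal sketch in plain C++ with `std::thread`. It is neither TBB's scheduler (which uses work stealing) nor my actual project code; `run_tasks_on_pool` and the placeholder workload are hypothetical names introduced only for this illustration. A fixed set of workers pulls tasks from a shared counter, and the function reports how many tasks each worker ran.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not a TBB API): runs `task_count` small tasks on a
// fixed pool of `worker_count` threads and returns, per worker, the number
// of tasks that worker executed.
std::vector<std::size_t> run_tasks_on_pool(std::size_t task_count,
                                           unsigned worker_count) {
    assert(worker_count > 0);

    std::atomic<std::size_t> next_task{0};            // shared task index
    std::vector<std::size_t> executed(worker_count, 0);
    std::vector<std::thread> workers;
    workers.reserve(worker_count);

    for (unsigned w = 0; w < worker_count; ++w) {
        workers.emplace_back([&, w] {
            std::size_t local_count = 0;
            for (;;) {
                // Claim the next unprocessed task; stop when none are left.
                const std::size_t i = next_task.fetch_add(1);
                if (i >= task_count) break;

                // Placeholder workload standing in for the real job.
                volatile double sink = 0.0;
                for (int k = 0; k < 1000; ++k) sink = sink + k * 0.5;

                ++local_count;
            }
            // Publish the count once to avoid repeated writes to shared memory.
            executed[w] = local_count;
        });
    }

    for (auto& t : workers) t.join();
    return executed;
}
```

Calling `run_tasks_on_pool(1000, std::thread::hardware_concurrency())` and printing the returned counts shows how the tasks were spread across the workers. The per-worker counts always add up to the number of tasks, regardless of how many threads exist.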