I am using Intel TBB threads and Intel parallelism in my application for performance optimization, and it works well on my development machine, with an average CPU load of about 10%. But when I deploy the application on the target machine, a high-end server with 32 cores and plenty of RAM, it uses 80-90% of the CPU in the same scenario. The server is custom-built hardware running RHEL 6.
What could be the root cause of this problem?
It is hard to say, since there are a number of reasons why an application might not scale as expected. For example, depending on the data layout, the application could suffer from false sharing, or threads might contend for the same memory block due to how the application's logic is implemented.
In your particular case, it seems strange that on the development machine the CPU load is only about 10%, which is quite low and usually indicates that the threads are mostly idle, i.e. not doing useful work most of the time. On the machine with more cores the CPU utilization is much higher, which could be a sign of contention, or the application may simply be generating more work because it detects that the machine has more computing power.
To understand this, you would want to profile the application (e.g. with `perf` or Intel VTune) to see what the threads are actually doing.