all serial with
"That is, before multi-threading, procA takes 200 sec, procB takes 100 sec, and procC takes 200 sec => total 500 sec"
Then you modified procB only and observed:
"After multi-threading (4 threads), procA takes 220 sec, procB takes 60 sec, and procC takes 230 sec => total 510 sec"
Actually, you modified more than just procB: you also said that you start your thread pool before the loop.
This indicates that your auxiliary threads are dragging down performance outside procB: procB gained 40 sec, but procA and procC, which you did not parallelize, lost 20 sec and 30 sec. To diagnose where the overhead originates, leave in the code that starts your thread pool and replace your parallel procB with your serial procB.
IOW: start your thread pool, enter your loop executing your serial code, exit the loop, and take the timing. My guess is that you will see a slowdown in all three procs. That would indicate that your threads' idle state isn't really idle (for example, the workers may be spinning while they wait instead of sleeping), and you will have to look at your code to see why.
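A minimal sketch of that experiment, in Python purely for illustration since your own code is not shown. Everything here is hypothetical: `IdlePool` stands in for your real thread pool (its workers block on an event, so they are truly idle), `work` stands in for the serial procA/procB/procC, and `time_phases` is just a timing helper. Substitute your own pool start-up and your own serial procs.

```python
import threading
import time


def time_phases(phases, iterations):
    """Run every (name, func) phase serially, once per iteration.

    Returns the cumulative wall-clock seconds spent in each phase.
    """
    totals = {name: 0.0 for name, _ in phases}
    for _ in range(iterations):
        for name, func in phases:
            start = time.perf_counter()
            func()
            totals[name] += time.perf_counter() - start
    return totals


class IdlePool:
    """Hypothetical stand-in for the real pool: workers sleep until stopped."""

    def __init__(self, n_threads):
        self._stop = threading.Event()
        # Each worker blocks in Event.wait(), so it consumes no CPU while idle.
        self._threads = [
            threading.Thread(target=self._stop.wait, daemon=True)
            for _ in range(n_threads)
        ]

    def __enter__(self):
        for t in self._threads:
            t.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()          # wake the workers so they can exit
        for t in self._threads:
            t.join()
        return False


def work(n=200_000):
    """Placeholder for a serial proc: some CPU-bound arithmetic."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    phases = [("procA", work), ("procB", work), ("procC", work)]

    baseline = time_phases(phases, iterations=5)       # no pool running
    with IdlePool(4):                                  # pool started, never used
        with_pool = time_phases(phases, iterations=5)  # same serial code

    for name in baseline:
        print(f"{name}: {baseline[name]:.3f} s without pool, "
              f"{with_pool[name]:.3f} s with idle pool")
```

If the two columns are close, the idle pool is harmless and the overhead lies elsewhere. If all three procs slow down once the pool is started, the pool's idle threads are consuming resources even though no work was handed to them.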