Can you go into more detail here?
A pipeline typically consists of a serial input IO stage, a serial output IO stage, and a set of parallel computational stages in between. If performance is limited by the IO stages, switching to parallel_for gains nothing, because the bottleneck is IO, not computation. And if performance is limited by the parallel computational stages, parallel_for again gains nothing, because all cores are already loaded.
Do you have in mind a pipeline consisting only of serial stages?