Your design is an intriguing generalization of a pipeline stage. If I understand it correctly, a parallel stage can be viewed as "each item has a unique context" and a serial_in_order stage can be viewed as "each item has the same context". Your extension allows intermediate degrees of parallelism that map neatly to resource sharing.
The implementation would seem to be a variation on "class input_buffer" in pipeline.cpp, with some hashing involved, right?
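To check that I am reading the idea correctly, here is a rough sketch of what I imagine, not a description of your code or of anything in pipeline.cpp. The class name keyed_serializer, its submit method, and the bucket layout are all hypothetical. Each item carries a context key; the key is hashed to a bucket, and an item is parked if another item from the same bucket is already running. Unlike serial_in_order, this sketch only preserves arrival order within a bucket, not token order.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical helper (not part of TBB): tasks whose context keys hash to the
// same bucket run one at a time; tasks in different buckets may overlap.
class keyed_serializer {
public:
    explicit keyed_serializer(std::size_t n_buckets) : buckets_(n_buckets) {}

    // Run `task` now if its bucket is idle; otherwise park it behind the
    // task that is currently running in that bucket.
    void submit(std::size_t context_key, std::function<void()> task) {
        bucket& b = buckets_[std::hash<std::size_t>{}(context_key) % buckets_.size()];
        {
            std::lock_guard<std::mutex> lock(b.m);
            if (b.busy) {
                b.pending.push_back(std::move(task));
                return;
            }
            b.busy = true;
        }
        // This thread now owns the bucket: run the task, then drain anything
        // that was parked meanwhile. The lock is never held while a task runs.
        for (;;) {
            task();
            std::lock_guard<std::mutex> lock(b.m);
            if (b.pending.empty()) {
                b.busy = false;
                return;
            }
            task = std::move(b.pending.front());
            b.pending.pop_front();
        }
    }

private:
    struct bucket {
        std::mutex m;
        bool busy = false;
        std::deque<std::function<void()>> pending;
    };
    std::vector<bucket> buckets_;
};
```

With a single bucket this degenerates to a serial stage, and with many buckets and distinct keys it approaches a parallel one. A hash collision merely serializes two unrelated contexts, which is safe but costs some parallelism.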
It would be good if you could share your code. The contribution process is described here.