Here is the general recommendation for nested loops: "If your own program has nested loops and the computation time used by the innermost loop is small, consider adding task annotations around the next outermost loop. Use the Suitability Report Average Instance Time as a guide."
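To illustrate the recommendation, here is a minimal sketch of placing the site and task annotations around the outer loop rather than the inner one. The function `sum_rows`, the workload, and the names `outer_loop_site` and `row_task` are hypothetical. The no-op macro definitions are only stand-ins so the sketch compiles without Advisor installed; in a real build you would include `advisor-annotate.h` instead.

```cpp
#include <cstddef>
#include <vector>

// Assumption: stand-in no-op macros so this compiles without Intel Advisor.
// In a real project, include "advisor-annotate.h" before this point instead.
#ifndef ANNOTATE_SITE_BEGIN
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_ITERATION_TASK(name)
#define ANNOTATE_SITE_END()
#endif

// Hypothetical workload: each inner-loop iteration does very little work,
// so the task is marked on the outer loop, where one iteration covers a whole row.
double sum_rows(const std::vector<std::vector<double>>& grid) {
    double total = 0.0;
    ANNOTATE_SITE_BEGIN(outer_loop_site);      // parallel site encloses the outer loop
    for (std::size_t i = 0; i < grid.size(); ++i) {
        ANNOTATE_ITERATION_TASK(row_task);     // one task per outer iteration
        double row = 0.0;
        for (std::size_t j = 0; j < grid[i].size(); ++j) {
            row += grid[i][j];                 // small inner computation, left unannotated
        }
        total += row;                          // shared accumulation across tasks
    }
    ANNOTATE_SITE_END();
    return total;
}
```

Calling `sum_rows` on a 3x4 grid of ones returns 12. Note that `total` is updated by every task, so it would need a reduction or similar handling once the loop is actually parallelized.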
If you have a sample program with the nested loop you have in mind (with a similar distribution of computation among the nested loops), please share it and I'd be happy to provide recommendations.
Also, please check out the Parallel Universe case study titled "Embrace Parallelism with Intel(R) Parallel Advisor" (the cover story of http://software.intel.com/sites/products/parallelmag/parallel_mag_Issue8.pdf).
Intel Developer Support