1. It would be helpful if anybody could let us know the steps to convert a while loop to parallel_while.
2. Also, I've done a parallel_for in a program. I wanted to know how to go about doing the same for the other for loops in the same program.
Thanks in advance. I appreciate your help.
You will need a new class for each unique loop you wish to convert to parallel execution. Each class defines a function call operator (operator()) containing the corresponding for-loop. Then, in the function that contains multiple non-nested loops, construct an instance of each class and pass it to the corresponding parallel_for call.
Alternatively, you could try a compiler that supports the C++0x lambda constructs, which allow you to specify the for-loop in each parallel_for call as a lambda, so the loops stay in-line in the original function. Intel C++ Compiler V11 supports lambdas.
As was previously mentioned, parallel_while has been deprecated in favor of parallel_do, but it was never intended as a general replacement for while. The parallel_while, with its stream interface, and the parallel_do, using iterators, each seek to enable some parallel computation on a loop with an inherent serialization in advancing to the next item. Concurrency occurs only insofar as the loop can serially spawn tasks faster than they can be completed in parallel. If the loop body itself can add additional work items, that will improve the scaling.
We implemented the AES algorithm in VC++ using Intel TBB. Using a 4 MB file as input, we get the following results:
Serial: 1 min, 30 sec
The processor used: Intel Pentium M 1.7 GHz, Centrino platform, cores: 1, threads: 1
We were quite surprised by the readings and in fact didn't expect such a huge gain. Could you please explain this? We used auto_partitioner to determine the grain size.
We appreciate your help
Well, having seen such miraculous "improvements" in the past while parallelizing code, I'd first caution you to make sure all the work is getting done--that is, verify that the parallel code gets the correct result. As I've discovered to my chagrin in the past, it's amazing how much work you can get done if you don't do it all ;-).
If that all checks out, about all I can think of that might explain it would be the advantage in cache locality that you gain by partitioning the work via the Intel TBB constructs. We have seen cases of super-linear scaling due to the improvements in cache use, though I don't have enough information to know whether this applies in your case. Are you using a parallel_for to process the buffered file?
Our readings are:
AES Encryption with parallelization: 40 sec
AES Encryption without parallelization: 120 sec
AES Decryption with parallelization: 40 sec
AES Decryption without parallelization: 126 sec
Isn't this limited to ECB mode of operation, though, which fails to hide some of the structure of the encrypted data? And doesn't that limit applicability? What does TLS require? Just curious...