anoop66
Beginner
73 Views

Help in College Project with parallel_while

Hi, I am doing a final-year project as part of my engineering curriculum in Bangalore, India. I don't have any appreciable knowledge of Intel TBB. So far I've implemented only a parallel_for, for processing files. I would like help with parallel_while. I have the O'Reilly book, but I am unable to understand anything from it.

1. It would be helpful if anybody could let us know the steps to convert a while loop to parallel_while.
2. Also, I've converted one loop in a program to parallel_for. How do I go about doing the same for the other for loops in the same program?

Thanks
10 Replies
RafSchietekat
Black Belt
73 Views

parallel_while has been replaced by parallel_do. Please also consult TBB's Tutorial and Reference Manual.
geoffrey-burling
Beginner
73 Views

Anoop66, I'm unclear about exactly what you are asking for help with. Are you trying to nest parallel_for loops, or are you looking for the different ways you can parallelize for loops? The Intel Knowledge Base has numerous snippets of example code showing how to create parallel_for loops, for example this one -- http://software.intel.com/en-us/articles/all-facilities-fibonacci/ -- which generates a Fibonacci sequence.

Geoff
anoop66
Beginner
73 Views

Sir, thanks for your quick reply. Could you also please let me know how to incorporate multiple parallel_for loops in a single program? I appreciate your help, and thank you in advance.
anoop66
Beginner
73 Views

Sir, I have many for loops in my C++ program. I have converted one of them to parallel_for by creating a class for it and overloading operator(), and I would like to do the same for the other loops. They are not nested, just plain old for loops. So, do I make new classes for them, split the program into multiple .cpp files, or create .h files?

Thanks in advance.
I appreciate your help.
robert-reed
Valued Contributor II
73 Views

You will need a new class for each unique loop you wish to convert to parallel execution. Each class defines one function call operator (operator()) that contains the corresponding for-loop. Then, in the function that held the original non-nested loops, construct an instance of each class and pass it to the corresponding parallel_for call.

Alternatively, you could try a compiler that supports the lambda constructs of the C++0x draft standard, which let you specify the for-loop in each parallel_for call as a lambda, so the loops stay in-line in the original function. Intel C++ Compiler V11 supports lambdas.

As was previously mentioned, parallel_while has been deprecated in favor of parallel_do, but it was never intended as a general replacement for while. The parallel_while with its stream interface and the parallel_do using iterators each seek to enable some parallel computation despite an inherent serialization in the loop: advancing to the next item. Concurrency occurs only insofar as the loop can serially spawn tasks faster than they can be completed in parallel. If the loop body itself can add additional work items, that will improve the scaling.

anoop66
Beginner
73 Views

Thanks a lot. You answered my question at great length. I didn't quite get the lambda part, but that's OK: we finished converting our first module.

We implemented the AES algorithm in VC++ using Intel TBB. Using a 4 MB file as input, we get the following results:

Serial: 1 min 30 sec
Parallel: 30 sec

Processor used: Intel Pentium M 1.7 GHz, Centrino platform, cores: 1, threads: 1
RAM: 512 MB

We were rather surprised by these readings and didn't expect such a huge gain. Could you please explain it? We used auto_partitioner for the grain size.

We appreciate your help.
robert-reed
Valued Contributor II
73 Views

Well, having seen such miraculous "improvements" in the past while parallelizing code, I'd first caution you to make sure all the work is getting done--that is, verify that the parallel code gets the correct result. As I've discovered to my chagrin in the past, it's amazing how much work you can get done if you don't do it all ;-).
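One simple way to run that check is a byte-for-byte comparison of the two outputs. The sketch below assumes both versions leave their output in memory; the Bytes typedef and the helper first_mismatch are hypothetical names, not part of TBB or of the poster's program.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<unsigned char> Bytes;

// Returns the index of the first byte where the two outputs differ,
// or -1 if they are identical. If one output is a strict prefix of the
// other, the length of the shorter one is returned.
long first_mismatch(const Bytes& serial_out, const Bytes& parallel_out) {
    const std::size_t n = std::min(serial_out.size(), parallel_out.size());
    for (std::size_t i = 0; i < n; ++i) {
        if (serial_out[i] != parallel_out[i])
            return static_cast<long>(i);
    }
    if (serial_out.size() != parallel_out.size())
        return static_cast<long>(n);
    return -1;
}
```

Decrypting the parallel ciphertext with the serial code and comparing the result against the original file is a useful second check, since it also catches blocks that were skipped in both directions.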

If that all checks out, about all I can think of that might explain it would be the advantage in cache locality that you gain by partitioning the work via the Intel TBB constructs. We have seen cases of super-linear scaling due to the improvements in cache use, though I don't have enough information to know whether this applies in your case. Are you using a parallel_for to process the buffered file?

anoop66
Beginner
73 Views

Well, we are getting the same output for both the serial and parallel versions of the program.
Our readings are:

AES encryption with parallelization: 40 sec
AES encryption without parallelization: 120 sec

AES decryption with parallelization: 40 sec
AES decryption without parallelization: 126 sec
RafSchietekat
Black Belt
73 Views

A threefold improvement already: imagine what it would be if you actually used multiple cores (I guess I missed something here)!

Isn't this limited to ECB mode of operation, though, which fails to hide some of the structure of the encrypted data? And doesn't that limit applicability? What does TLS require? Just curious...
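To illustrate the point about ECB, here is a toy sketch. The "cipher" below is a made-up byte transformation, NOT AES and not secure; the 4-byte block size and all function names are assumptions for illustration. It shows that ECB blocks are independent (so the loop can be split across a parallel_for) and leak repeated plaintext, while CBC-style chaining makes each block depend on the previous ciphertext block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::vector<std::uint8_t> Bytes;
const std::size_t kBlock = 4;  // toy block size; AES uses 16 bytes

// Toy stand-in for a block cipher: NOT AES, only an invertible scramble.
void toy_encrypt_block(std::uint8_t* block, std::uint8_t key) {
    for (std::size_t i = 0; i < kBlock; ++i)
        block[i] = static_cast<std::uint8_t>((block[i] ^ key) + 1);
}

// ECB: each block is encrypted on its own, so iterations are independent.
// Input length is assumed to be a multiple of kBlock.
Bytes ecb_encrypt(Bytes data, std::uint8_t key) {
    for (std::size_t off = 0; off + kBlock <= data.size(); off += kBlock)
        toy_encrypt_block(&data[off], key);
    return data;
}

// CBC-style chaining: each block is XORed with the previous ciphertext
// block before encryption, so block k cannot start before block k-1.
Bytes cbc_encrypt(Bytes data, std::uint8_t key, std::uint8_t iv_byte) {
    std::uint8_t prev[kBlock];
    for (std::size_t i = 0; i < kBlock; ++i) prev[i] = iv_byte;
    for (std::size_t off = 0; off + kBlock <= data.size(); off += kBlock) {
        for (std::size_t i = 0; i < kBlock; ++i) data[off + i] ^= prev[i];
        toy_encrypt_block(&data[off], key);
        for (std::size_t i = 0; i < kBlock; ++i) prev[i] = data[off + i];
    }
    return data;
}

// True if block number a and block number b hold the same bytes.
bool blocks_equal(const Bytes& d, std::size_t a, std::size_t b) {
    for (std::size_t i = 0; i < kBlock; ++i)
        if (d[a * kBlock + i] != d[b * kBlock + i]) return false;
    return true;
}
```

Encrypting two identical plaintext blocks with ecb_encrypt yields two identical ciphertext blocks, which is the structure leak mentioned above; with cbc_encrypt the two ciphertext blocks differ, at the cost of a serial dependency during encryption.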