Hi Joex26
Seriously, joking aside:
I think it would be better if you ran the test with both ICC 11.x and VC2008, or the latest versions.
You will probably discover approximately the same results if the code was not written specifically to favor ICC.
See if you can also change some while loops to for loops, with a correct private local pointer, for possible vectorization (see the sketch after this post).
Also look at TBB and OpenMP used for the (SCTP) side, etc.
It is also better if you can use a recent part such as a Core i7, Atom, or ULV processor.
You are welcome here with the free choice of your own judgment, and with your favorite compiler.
With or without the ICC compiler, success to all the members of your team.
Computers the size of a phone have real potential for success, now or tomorrow, I hope.
Kind regards
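The while-to-for suggestion above deserves an illustration. The following is only a minimal sketch of my own (the function names and data are hypothetical, not from this thread): rewriting a pointer-chasing while loop as a counted for loop over a private local pointer, which gives an auto-vectorizer such as ICC's a trip count it can analyze.

[cpp]
#include <cstddef>

// Before: the loop bound is expressed through a moving pointer, which many
// auto-vectorizers refuse to analyze.
void scale_while(float *data, std::size_t n, float factor)
{
    float *p = data;
    while (p != data + n)
    {
        *p *= factor;
        ++p;
    }
}

// After: a counted for loop over a private local pointer. The trip count is
// known up front, so the compiler can emit SIMD code for the loop body.
void scale_for(float *data, std::size_t n, float factor)
{
    float *const local = data;          // private local pointer
    for (std::size_t i = 0; i < n; ++i)
        local[i] *= factor;
}
[/cpp]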
After reflection, I add this to the exchange (with flowers):
As a precaution: if tomorrow I buy the Nokia phone, I must not find a penguin inserted in it (optimized for me) that bites my ear;
[cpp]
//omp_set_nested (c);
#pragma omp parallel shared(a,p) private(j,k,x)
{
  for (int i = 0; i <= c - 1; i++)
  {
    // WRONG ADDED JUST FOR VERIFICATION CROSS AND KMP_AFFINITY 0
    // if((i % 2) ==0){p=0;}else{p=1;}cpu_set_t mask;CPU_ZERO(&mask);CPU_SET(p,&mask);
    // if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) <0) {perror("sched_setaffinity");}

    if (p > d) //FLAG HALT AS SATISFACT POINT
    {
      i = c - 1;
    }
    #pragma omp sections nowait
    {
      #pragma omp section
      for (j = 0; j <= la - 1; j++)
      {
        x = 0;
        if (a == b[0])
        {
          for (k = j; k <= j + lc - 1; k++)
          {
            if (a == b && x <= lc - 1)
            {
              x++;
            }
            if (x == lc)
            {
              noc++;
              pos[noc] = k - x;
              p++;
            } // if (x == lc)
          } // for (k = j; k <= j + lc - 1; k++)
        } // if (a == b[0])
      } // for (j = 0; j <= la - 1; j++)
      // end #pragma omp section
    } // end #pragma omp sections nowait
  } // for (int i = 0; i <= c - 1; i++)
} // end #pragma omp parallel
// begin serial code
for (int i = 0; i <= c - 1; i++)
{
  std::cout << noc << " <-OCC-> " << b << std::endl;
  int m = 1;
  while (pos != 0)
  {
    std::cout << pos << " <-POS-> " << b << " <-TO-> " << pos + strlen (b) << std::endl;
    m++;
  }
}
return (0);
}
[/cpp]
Bustaf, sorry about calling you Joe (joex26 started this thread). I will add additional comments to the reformatted version of _your_ code:

[cpp]
//omp_set_nested (c);
// ^^ your comment to turn off nested is OK

// vv pragma to begin parallel region
#pragma omp parallel shared(a,p) private(j,k,x)
{
// ^^ scoping brace for parallel region
// -- all threads in team are running through this region
// vv each thread executes the following for loop
  for (int i = 0; i <= c - 1; i++)
  {
  // -- each thread arrives here with i=0,1,2,...,c-1
  // -- and arrives here at uncontrollable times
    // WRONG ADDED JUST FOR VERIFICATION CROSS AND KMP_AFFINITY 0
    // if((i % 2) ==0){p=0;}else{p=1;}cpu_set_t mask;CPU_ZERO(&mask);CPU_SET(p,&mask);
    // if (sched_setaffinity(0, sizeof(cpu_set_t), &mask) <0) {perror("sched_setaffinity");}

    if (p > d) //FLAG HALT AS SATISFACT POINT
    {
      i = c - 1;
    }
    // vv pragma to divide up current team into sections
    #pragma omp sections nowait
    {
    // ^^ brace to begin scope of sections
    //    (and implicit first section)
    // -- first thread of team reaching sections executes this section
    // *** CAUTION ***
    // *** this sections has nowait AND is enclosed in a loop,
    // *** making it possible that a thread may enter this sections
    // *** on iteration n+1 _prior_ to other team members entering
    // *** the sections on iteration n. This violates "all threads must
    // *** pass through sections" (although eventually they will).
    // *** The specification is silent on whether an implementation
    // *** can work around this.
    // vv due to the lack of statements between the sections' { and the
    // vv #pragma omp section, the following section is the 1st section
    // vv of the sections and is therefore redundant
      #pragma omp section
      // vv no brace (ok), therefore the following for statement is section 1
      for (j = 0; j <= la - 1; j++)
      {
        x = 0;
        if (a == b[0])
        {
          for (k = j; k <= j + lc - 1; k++)
          {
            if (a == b && x <= lc - 1)
            {
              x++;
            }
            if (x == lc)
            {
              noc++;
              pos[noc] = k - x;
              p++;
            } // if (x == lc)
          } // for (k = j; k <= j + lc - 1; k++)
        } // if (a == b[0])
      } // for (j = 0; j <= la - 1; j++)
      // end #pragma omp section
    } // end #pragma omp sections nowait
  } // for (int i = 0; i <= c - 1; i++)
} // end #pragma omp parallel
// begin serial code
for (int i = 0; i <= c - 1; i++)
[/cpp]

Comments:

The sections with nowait inside a loop within a parallel region (and without a barrier) operates under unspecified rules. If you were to remove the nowait (or add a barrier), then only one thread would perform productive work (as explained earlier). This would be equivalent to making the sections and section into a single section.

If you want each thread to enter the for(j loop, then you must resolve the possibility that multiple threads may concurrently execute "noc++" with the same value of i, in which case the result is not determinate. A similar (but not quite the same) issue exists with "pos[noc] = k - x", where multiple threads execute the statement at the same time with the same value of i; in that case you would be using an indeterminate value of noc as a subscript to store a value of "k - x" which may itself also differ between threads.

Unless you want gibberish in noc and pos, the above code (as you wrote it) is senseless.

Jim Dempsey
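To make Jim's point concrete, here is a minimal sketch of my own (not code from either poster; the function signature and names are hypothetical) of one way to remove the races on noc and pos: parallelize over j with omp for and serialize the two shared updates in a critical section.

[cpp]
#include <cstring>
#include <omp.h>

// Hypothetical reworking of the inner search loop: each thread scans a
// disjoint range of j, and the shared updates run under a critical section
// so noc++ and pos[noc] = ... can no longer interleave between threads.
void find_occurrences(const char *a, int la, const char *b, int lc,
                      int &noc, int pos[])
{
    noc = 0;
    #pragma omp parallel for
    for (int j = 0; j <= la - lc; j++)
    {
        if (std::memcmp(a + j, b, lc) == 0)   // full match starting at j
        {
            #pragma omp critical              // serialize the pair of updates
            {
                noc++;
                pos[noc] = j;                 // record the match position
            }
        }
    }
}
[/cpp]

An alternative that avoids the critical section in the inner loop is to have each thread collect matches in a private container and merge them after the parallel region.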
Instead of trying to drown the fish in nonsense literature,
please give code in your own style with the same functionality and concrete results.
Using time as the reference to resolve this difference is reality, not literature:
-rwxr-xr-x 1 daemon staff 82094 Nov 7 06:35 ficat5   (ICC compiled)
real 0m0.202s   (slow, since the 82094-byte binary must be read by the system; it is 4x the GNU size)
user 0m0.004s
sys  0m0.000s

-rwxr-xr-x 1 daemon staff 21410 Nov 7 06:37 ficat5   (GNU compiled)
real 0m0.097s
user 0m0.004s
sys  0m0.004s
Sorry, I have other tasks to do, so less time to spend with you.
Bustaf
I forgot ...
Just compile exactly what I have written, without the -parallel flag so as not to mix it with (auto-)vectorization,
and read your screen:
ipo: remark #11001: performing single-file optimizations
ipo: remark #11005: generating object file /tmp/ipo_icpcKZn3QH.o
ficat5.cc(165): (col. 16) remark: OpenMP DEFINED SECTION WAS PARALLELIZED.
ficat5.cc(165): (col. 16) remark: OpenMP DEFINED REGION WAS PARALLELIZED.
ficat5.cc(46): (col. 1) remark: OpenMP DEFINED SECTION WAS PARALLELIZED.
ficat5.cc(34): (col. 1) remark: OpenMP DEFINED REGION WAS PARALLELIZED.
Also add -par-report=3 if you want:
ipo: remark #11001: performing single-file optimizations
ipo: remark #11005: generating object file /tmp/ipo_icpcEbE1oi.o
ficat5.cc(165): (col. 16) remark: OpenMP DEFINED SECTION WAS PARALLELIZED.
ficat5.cc(165): (col. 16) remark: OpenMP DEFINED REGION WAS PARALLELIZED.
procedure: main
ficat5.cc(46): (col. 1) remark: OpenMP DEFINED SECTION WAS PARALLELIZED.
ficat5.cc(34): (col. 1) remark: OpenMP DEFINED REGION WAS PARALLELIZED.
procedure: count_array_occurs
procedure: __sti__$E
Maybe ... did the compiler use ghost threads to do the work, and write a fictive report
just so that you would be happy???
Or, as this shows, is the compiler a dummy?? The catastrophic result is not a dummy.
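Whether "ghost threads" really ran can be checked directly rather than argued about. A minimal sketch of my own (not from this thread): print the thread IDs inside the parallel region; if the region is genuinely parallel, several distinct IDs appear in the output.

[cpp]
#include <cstdio>
#include <omp.h>

int main()
{
    #pragma omp parallel
    {
        #pragma omp critical   // keep the output lines from interleaving
        std::printf("thread %d of %d is alive\n",
                    omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}
[/cpp]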