Please try an OpenMP pragma here, since auto-parallelization does not seem to be working for this loop, for reasons I do not know. Put something like #pragma omp parallel for in front of the loop (as in the sample code below); the OpenMP region then gets parallelized. You can still keep the -parallel option for the other parts of the program. With -openmp -parallel together, the OpenMP region is parallelized, while -parallel stays in effect for the rest of the code.
In this case, since there is no inner (fine-grained) loop, the already parallelized loop does not get vectorized, as the message shows when you use the -openmp option. Still, I expect the OpenMP pragma to be the better way to go here, because it gets the loop parallelized. So try the OpenMP pragma instead of relying on the auto-parallel feature, even though the compiler should have auto-parallelized this loop.
Sample code:
void row(double* restrict matrix, double* restrict filled_row, int col_num, int row_num, int which_row)
{
    int col_num_local = col_num;
#pragma omp parallel for
#pragma ivdep
// #pragma vector always
    for (int ii = 0; ii < col_num_local; ii++)
    { filled_row[ii] = matrix[ii*row_num + which_row]; } // This is line 22
}
int main()
{
    double matrix[2000] = {0};
    double filled_row[1500] = {0};
    // which_row must be smaller than row_num, otherwise the reads run past the end of matrix[]
    row(matrix, filled_row, 1000, 2, 1);
    return 0;
}
Please give it a try and let me know.
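To see whether the OpenMP region really runs with more than one thread, a small check along the following lines might help. This is only a sketch: team_size is a hypothetical helper name, not something from the code above, and it assumes the file is built with the compiler's OpenMP option (the -openmp mentioned above). Without that option the _OPENMP macro is not defined and the helper simply reports 1.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* only available when OpenMP is enabled */
#endif

/* Hypothetical helper: returns the number of threads in an OpenMP
   parallel region, or 1 when the file is compiled without OpenMP. */
int team_size(void)
{
    int n = 1;  /* value reported when OpenMP is not enabled */
#ifdef _OPENMP
#pragma omp parallel
    {
        /* only the master thread writes n, so there is no data race */
#pragma omp master
        n = omp_get_num_threads();
    }
#endif
    assert(n >= 1);  /* a team always has at least one thread */
    return n;
}
```

Printing team_size() once at program start, for example with printf("%d\n", team_size());, would show whether the OpenMP option actually took effect for the build.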
There are certainly many cases where unsigned short/int arithmetic inhibits vectorization. Try removing unsigned from the loop index and from the data arithmetic, and check whether that makes any difference.
You may also try #pragma vector always, and look up /c /Qrestrict and the restrict keyword in the compiler documentation.
Please post the relevant piece of code and the command-line options you are using.
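As a rough illustration of the suggestions above (signed loop index, restrict-qualified pointers, optional #pragma vector always), a loop might be written like this. It is a sketch only: gather_stride and its parameter names are made up for the example, and the pragma is guarded with __INTEL_COMPILER on the assumption that other compilers should simply skip it.

```c
#include <assert.h>

/* Hypothetical helper (not from the original code): copies every
   stride-th element of src, starting at offset, into dst.
   The loop index is a plain signed int and both pointers are
   restrict-qualified, so the compiler may assume they do not overlap. */
void gather_stride(const double *restrict src, double *restrict dst,
                   int count, int stride, int offset)
{
    /* precondition: offset selects a position inside one stride */
    assert(count >= 0 && stride > 0 && offset >= 0 && offset < stride);

    /* Intel-specific vectorization hint, skipped by other compilers */
#ifdef __INTEL_COMPILER
#pragma vector always
#endif
    for (int i = 0; i < count; ++i) {
        dst[i] = src[i * stride + offset];
    }
}
```

With a source array holding 0.0 through 5.0, calling gather_stride(src, dst, 3, 2, 1) picks the elements at indices 1, 3 and 5. Whether the loop is actually vectorized still has to be checked in the compiler's vectorization report.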
[cpp]
#include <stddef.h>

// matrix is only read, filled_row is written, so const belongs on matrix
void row(const double* restrict matrix, double* restrict filled_row, const size_t col_num, const size_t row_num, const size_t which_row)
{
    #pragma ivdep
    #pragma vector always
    for ( size_t ii = 0 ; ii < col_num ; ++ii )
    {
        const size_t matrix_ii = (ii * row_num) + which_row ;
        filled_row[ii] = matrix[matrix_ii] ;
    }
}
[/cpp]
I think the information on vectorization and parallelization is available in the Intel Compiler User and Reference Guide, in the chapter "Optimizing Applications".
The need for automated tools for checking parallel code was recognized years ago; the lineage of Parallel Studio goes back through Intel Thread Checker to KAI Assure.
