Hi everyone,

I recently ran into an issue I don't understand. I wrote an iterative flow solver that carries out some calculations in each iteration, then calculates a residual and starts over if the residual is still too large. I moved the calculation part into a function that is called in each iteration. It looks like this:

for (int j = 1; j < (ny - 1); j++) {
    for (int i = boundLeft; i < (bruttoLength - boundRight); i++) {
        i0 = i + j * bruttoLength;
        utilde[i0] = 0;

        // differences in x-direction
        if (i == 1) {
            utilde[i0] = utilde[i0] - (-v / (12 * a * dx) - 1 / (12 * pow(dx, 2))) * u[i0 + 3];
            utilde[i0] = utilde[i0] - (v / (2 * a * dx) + 1 / (3 * pow(dx, 2))) * u[i0 + 2];
            utilde[i0] = utilde[i0] - (-3 * v / (2 * a * dx) + 1 / (2 * pow(dx, 2))) * u[i0 + 1];
            utilde[i0] = utilde[i0] - (v / (4 * a * dx) + 11 / (12 * pow(dx, 2))) * utilde[i0 - 1];
            own_share = own_share + (5 * v / (6 * a * dx) - 5 / (3 * pow(dx, 2)));
        } else {
            if (i == (bruttoLength - 2)) {
                utilde[i0] = utilde[i0] - (v / (12 * a * dx) - 1 / (12 * pow(dx, 2))) * utilde[i0 - 3];
                utilde[i0] = utilde[i0] - (-v / (2 * a * dx) + 1 / (3 * pow(dx, 2))) * utilde[i0 - 2];
                utilde[i0] = utilde[i0] - (3 * v / (2 * a * dx) + 1 / (2 * pow(dx, 2))) * utilde[i0 - 1];
                utilde[i0] = utilde[i0] - (-v / (4 * a * dx) + 11 / (12 * pow(dx, 2))) * u[i0 + 1];
                own_share = own_share + (-5 * v / (6 * a * dx) - 5 / (3 * pow(dx, 2)));
            } else {
                utilde[i0] = utilde[i0] - (v / (12 * a * dx) - 1 / (12 * pow(dx, 2))) * u[i0 + 2];
                utilde[i0] = utilde[i0] - (-2 * v / (3 * a * dx) + 4 / (3 * pow(dx, 2))) * u[i0 + 1];
                utilde[i0] = utilde[i0] - (2 * v / (3 * a * dx) + 4 / (3 * pow(dx, 2))) * utilde[i0 - 1];
                utilde[i0] = utilde[i0] - (-v / (12 * a * dx) - 1 / (12 * pow(dx, 2))) * utilde[i0 - 2];
                own_share = own_share + (-5 / (2 * pow(dx, 2)));
            }
        }

        // repeat equivalent code for y-direction. It has the same structure as above

        // SOR-Share
        utilde[i0] = utilde[i0] / own_share * w + (1 - w) * u[i0];
        own_share = 0;
    }
}

As long as the code is executed as a function call (arguments are 3 integers and 2 double pointers) the performance is really bad. As soon as I copy the code directly into my loop there is a massive speed up.

I tried both versions with and without compiler optimization enabled (/O2) and measured the average execution time of the code snippet above. It looks like the version with the function call is only lightly optimized, as its execution time did not improve much (3x faster, compared to 12x faster without the function call).
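
For reference, a measurement of this kind could be set up roughly as follows. This is only a minimal sketch, not the code used for the numbers above: `average_seconds` and `dummy_kernel` are hypothetical names, the dummy workload merely stands in for the solver sweep, and the standard `clock()` function is assumed to be precise enough for the run times involved.

```c
#include <assert.h>
#include <time.h>

/* Keeps the compiler from optimizing the dummy workload away. */
static volatile double sink;

/* Hypothetical placeholder for the code being timed. */
void dummy_kernel(void)
{
    double sum = 0.0;
    for (int i = 0; i < 100000; i++) {
        sum += i * 0.5;
    }
    sink = sum;
}

/* Returns the average time in seconds of one call to kernel,
 * measured with clock() over the given number of repeats. */
double average_seconds(void (*kernel)(void), int repeats)
{
    assert(repeats > 0);
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        kernel();
    }
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / repeats;
}
```

Averaging over many repeats smooths out timer resolution and one-off effects such as cold caches, which matters when comparing an optimized and an unoptimized build.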

I'm not sure if this is the root of the problem though. Can anybody give me some advice? Of course I could leave the whole calculation part inside my while-loop, but that looks very confusing. It would be much clearer to move it into a separate function.

I'm using the compiler that comes with Intel Parallel Studio XE 2017.

Best regards.

Hi, Jan

If you pass pointers as function parameters, the compiler has to assume that they may alias each other. The data dependences it then assumes prevent some optimizations from being applied. You can qualify those pointers with the restrict keyword to eliminate the dependence, e.g.:

void fun(double *restrict a, double *restrict b)

If you are compiling a C file, please specify /Qstd=c99 or -std=c99 so that the restrict keyword is recognized. Alternatively, you can use the option /Qrestrict or -restrict.
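
As a minimal sketch of what this could look like on a complete function: the kernel below is a hypothetical, much simplified stand-in for your solver (`relax_sweep` and its 1-D averaging stencil are assumptions for illustration, not your scheme). Unlike your code, it reads only from `u` and writes only to `utilde`.

```c
#include <assert.h>

/*
 * Hypothetical, simplified relaxation sweep over the interior of a 1-D array.
 * The restrict qualifiers promise the compiler that u and utilde do not
 * overlap, so a store to utilde[i] cannot change any value read from u.
 */
void relax_sweep(int n, double w,
                 const double *restrict u, double *restrict utilde)
{
    assert(n >= 3);
    for (int i = 1; i < n - 1; i++) {
        /* New estimate: average of the two neighbours in the old field. */
        double avg = 0.5 * (u[i - 1] + u[i + 1]);
        /* SOR-style blend of the new estimate with the old value. */
        utilde[i] = w * avg + (1.0 - w) * u[i];
    }
}
```

Keep in mind that restrict is a promise made by the caller: if the two arrays actually overlap, the behavior is undefined.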

You can also try option /Qopt-report:5 to generate an optimization report to examine different optimizations on your code.

Hope this helps.

Thanks.

**copy the code directly into my loop there is a massive speed up**. Your for loop is a complex one; however, it looks like it was vectorized in the second case. Take a look at the optimization reports to understand what went wrong in the first case.

Thank you very much guys! You helped me a lot!
