vu64

Beginner

10-11-2009
04:07 AM

37 Views

Also, because I now use addAcc(i,j) only to compute the acceleration for j, I removed some temporaries from this function and saw a speedup of more than two times. I think this is due to less memory allocation in this frequently called function.

    void addAcc(int i, int j) {
        // Compute the force between bodies and apply to each as an acceleration

        // compute the distance between them
        double dx = body
        double dy = body
        double dz = body
        double distsq = dx*dx + dy*dy + dz*dz;
        if (distsq < MINDIST) distsq = MINDIST;
        double dist = sqrt(distsq);
        double ai = GFORCE*body
        body
        body
        body
    }

I would be very grateful if you could explain this or point out any mistakes.

Accepted Solutions

robert-reed

Valued Contributor II

10-11-2009
10:08 AM

37 Views

Quoting - vu64

Also, because I now use addAcc(i,j) only to compute the acceleration for j, I removed some temporaries from this function and saw a speedup of more than two times. I think this is due to less memory allocation in this frequently called function.

I would be very grateful if you could explain this or point out any mistakes.

Since the allocation of local function variables is usually accomplished by a single stack adjustment on entry to the function, having more local temporaries should not affect performance (within reasonable limits), but the question of running

    // Compute the accelerations of the bodies
    for (i = 0; i < n - 1; ++i)
        for (j = i + 1; j < n; ++j)
            addAcc(i, j);

will need to be modified to look something like this:

    // Compute the accelerations of the bodies
    for (i = 0; i < n; ++i)
        for (j = 0; j < n; ++j)
            if (i != j) addAcc(i, j);

Otherwise, not all the *j* bodies will be considered in computing the accelerations for an *i* body. I plan to explore this alternative as I move the blog series forward. Thank you for the question.
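The loop change described above can be checked with a small sketch that compares a triangular loop using a two-sided update against a full loop using a one-sided update. The `Body` struct and its field names, the values of `GFORCE` and `MINDIST`, and the helper names `pairTerm`, `addAccPair`, `addAccOneSided`, `accumulateTriangular` and `accumulateFull` are all assumptions made for illustration; the thread does not show the real definitions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical body layout; the real struct is not shown in the thread.
struct Body {
    double pos[3];
    double acc[3];
    double mass;
};

// Assumed values, chosen only for illustration.
const double GFORCE  = 1.0;
const double MINDIST = 1e-4;

// Offset from body i to body j plus the common factor G / r^3.
struct PairTerm {
    double d[3];
    double gOverR3;
};

PairTerm pairTerm(const std::vector<Body>& body, std::size_t i, std::size_t j) {
    assert(i != j);
    PairTerm t;
    double distsq = 0.0;
    for (int k = 0; k < 3; ++k) {
        t.d[k] = body[j].pos[k] - body[i].pos[k];
        distsq += t.d[k] * t.d[k];
    }
    if (distsq < MINDIST) distsq = MINDIST;  // clamp very small distances
    t.gOverR3 = GFORCE / (distsq * std::sqrt(distsq));
    return t;
}

// Two-sided update: one call applies the interaction to both bodies.
void addAccPair(std::vector<Body>& body, std::size_t i, std::size_t j) {
    const PairTerm t = pairTerm(body, i, j);
    for (int k = 0; k < 3; ++k) {
        body[i].acc[k] += t.gOverR3 * body[j].mass * t.d[k];
        body[j].acc[k] -= t.gOverR3 * body[i].mass * t.d[k];
    }
}

// One-sided update: only body i is written.
void addAccOneSided(std::vector<Body>& body, std::size_t i, std::size_t j) {
    const PairTerm t = pairTerm(body, i, j);
    for (int k = 0; k < 3; ++k)
        body[i].acc[k] += t.gOverR3 * body[j].mass * t.d[k];
}

// Triangular loop: each unordered pair once, valid with the two-sided update.
void accumulateTriangular(std::vector<Body>& body) {
    const std::size_t n = body.size();
    for (std::size_t i = 0; i + 1 < n; ++i)
        for (std::size_t j = i + 1; j < n; ++j)
            addAccPair(body, i, j);
}

// Full loop: every ordered pair with i != j, needed with the one-sided update.
void accumulateFull(std::vector<Body>& body) {
    const std::size_t n = body.size();
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j) addAccOneSided(body, i, j);
}
```

Under these assumptions both loops should produce the same accelerations up to floating-point rounding, with the full loop doing roughly twice as many distance computations. Note that the inner loop of the full version starts at j = 0, so that body 0 is also counted as a *j* body.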

4 Replies

vu64

Beginner

10-11-2009
03:45 PM

37 Views

Quoting - Robert Reed (Intel)

Since the allocation of local function variables is usually accomplished by a single stack adjustment on entry to the function, having more local temporaries should not affect performance (within reasonable limits), but the question of running

    // Compute the accelerations of the bodies
    for (i = 0; i < n - 1; ++i)
        for (j = i + 1; j < n; ++j)
            addAcc(i, j);

will need to be modified to look something like this:

    // Compute the accelerations of the bodies
    for (i = 0; i < n; ++i)
        for (j = 0; j < n; ++j)
            if (i != j) addAcc(i, j);

Otherwise, not all the *j* bodies will be considered in computing the accelerations for an *i* body. I plan to explore this alternative as I move the blog series forward. Thank you for the question.

Here is my parallel version:

    parallel_for(blocked_range<int>(0, n),
        [&] (const blocked_range<int>& r) {
            for (int i = r.begin(); i != r.end(); ++i) {
                for (int j = 0; j < i; ++j)
                    addAcc(i,j);
                for (int j = i+1; j < n; ++j)
                    addAcc(i,j);
            }
        }, auto_partitioner());
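The reason a range-based split like this can be safe is that each *i* iteration writes only to body *i* and merely reads the others. A minimal sketch of that property follows, assuming a one-sided update. The chunks are run one after another here, in place of TBB's `parallel_for`, so that the example needs only the standard library; the `Body` struct, the constant values, and the names `addAccOneSided` and `accumulateRange` are hypothetical and not taken from the thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical body layout; the real struct is not shown in the thread.
struct Body {
    double pos[3];
    double acc[3];
    double mass;
};

// Assumed values, chosen only for illustration.
const double GFORCE  = 1.0;
const double MINDIST = 1e-4;

// One-sided update: reads bodies i and j, writes only body[i].acc.
void addAccOneSided(std::vector<Body>& body, std::size_t i, std::size_t j) {
    assert(i != j);
    double d[3];
    double distsq = 0.0;
    for (int k = 0; k < 3; ++k) {
        d[k] = body[j].pos[k] - body[i].pos[k];
        distsq += d[k] * d[k];
    }
    if (distsq < MINDIST) distsq = MINDIST;  // clamp very small distances
    const double a = GFORCE * body[j].mass / (distsq * std::sqrt(distsq));
    for (int k = 0; k < 3; ++k)
        body[i].acc[k] += a * d[k];
}

// Work for one chunk [begin, end) of the i range. With TBB this would be the
// loop inside the lambda, with begin = r.begin() and end = r.end().
void accumulateRange(std::vector<Body>& body, std::size_t begin, std::size_t end) {
    const std::size_t n = body.size();
    for (std::size_t i = begin; i != end; ++i) {
        for (std::size_t j = 0; j < i; ++j)
            addAccOneSided(body, i, j);
        for (std::size_t j = i + 1; j < n; ++j)
            addAccOneSided(body, i, j);
    }
}
```

Because no chunk writes to a body owned by another chunk, the chunks can be processed in any order, for example `accumulateRange(body, 2, 4)` before `accumulateRange(body, 0, 2)`, and should give the same accelerations as a single pass over the whole range. A two-sided update that also wrote to body *j* would not have this property.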

robert-reed

Valued Contributor II

10-11-2009
05:34 PM

37 Views

Quoting - vu64

    parallel_for(blocked_range<int>(0, n),
        [&] (const blocked_range<int>& r) {
            for (int i = r.begin(); i != r.end(); ++i) {
                for (int j = 0; j < i; ++j)
                    addAcc(i,j);
                for (int j = i+1; j < n; ++j)
                    addAcc(i,j);
            }
        }, auto_partitioner());

Yeah, that should produce the correct result. I'll add it to my topic set and give it a try when my blog series gets there. Thanks!

vu64

Beginner

10-11-2009
09:53 PM

37 Views

Quoting - Robert Reed (Intel)

Yeah, that should produce the correct result. I'll add it to my topic set and give it a try when my blog series gets there. Thanks!

I have also tried using blocked_range2d over the matrix, but it seems to run slower. I hope you can look into that as well. Thanks.
