Hi,
I am trying to parallelize an ITK application. The application has a very prominent loop with 50 iterations. I parallelized this loop using parallel_reduce() and ran the parallelized application on an 8-core machine. I noticed that a grain size of 1 gives better performance than larger grain sizes. I am puzzled by this and wondering how it could be possible, or under which conditions it happens. Can anybody shed some light?
thanks,
fiju
I cannot really shed light on your problem, but I would advise that you verify (using prints or a debugger) how many tasks are actually created. Since TBB normally splits the range in two, and your iteration count (50) is not a power of two, you could see different load-balancing behavior for different grain sizes.
Also, I don't quite understand what your question is to begin with: the smallest grain size would give you the most opportunity for load-balancing, if overhead and scalability aren't an issue.