- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi to everyone,
I am working on a OpenCL Monte Carlo code for particle transport, and in order to alleviate the thread divergence, due to the stochastic nature of the method, I am trying to implement an event-by-event simulation, i.e. process concurrently particles that will carry out a specific process (e.g. photo-electric effect, compton scattering, etc.).
Several buffers containing particle information (energy, position, etc) are stored in global memory. Additionally, a buffer containing an index to the next event (i.e. physical process) is associated to each particle. First the most frequent next event is determined using a histogram-like kernel, as the following:
void __kernel freq_kernel( // Number of data per work-item to be analyzed int nperwork_item, // Particle stack __global int *gstack_stat, // Frequency of events __global int *gfreq_events) { int gid = get_global_id(0); int lid = get_local_id(0); int lsize = get_local_size(0); int gsize = get_global_size(0); // Local array containing frequency of events __local int lfreq_events[MX_EVENTS]; // Initialize local frequency array. for(int i=lid; i<MX_EVENTS; i+=lsize) { lfreq_events = 0; } barrier(CLK_LOCAL_MEM_FENCE); // Compute the frequency of events local array. int ndata = gsize*nperwork_item; for(int i=gid; i<ndata; i+=gsize) { atomic_add(&lfreq_events[gstack_stat-1], 1); } barrier(CLK_LOCAL_MEM_FENCE); // Finally, save local results to global memory. for(int i=lid; i<MX_EVENTS; i+=lsize) { atomic_add(&gfreq_events, lfreq_events); } barrier(CLK_LOCAL_MEM_FENCE); }
The result is an array with the frequency of each possible event. Then it is transferred to the host and the event with the highest frequency is selected as the next event to be processed. In the host we have the following code:
int next_event = 0; int max_count = 0; for (cl_uint idevice = 0; idevice < number_of_devices; idevice++) { max_count = *std::max_element(h_freq_events[idevice].begin(), h_freq_events[idevice].end()); next_event = (int)std::distance(h_freq_events[idevice].begin(), std::max_element(h_freq_events[idevice].begin(), h_freq_events[idevice].end())); }
Now the idea should be to determine which particles must be simulated in the next step. Essentially, I would like to obtain the indexes of such particles in order to gather its particle information (energy, position, etc...) from global memory. A serial alternative could be the following:
for (cl_uint idevice = 0; idevice < number_of_devices; idevice++) { for (int i = 0; i<global_work_size[idevice] * nperwork_item[idevice]; i++) { if (h_pstack_stat[idevice] == next_event) { h_mask[idevice].push_back(i); } } }
So I need an advice, do you now an efficient way to obtain the needed indexes?, it is clear that calculating them on the host is very expensive and I am looking for a more efficient way. I have looked for information online without success. Maybe I should first order the indexes array, together with the others holding the particle information, and then carry out the gather operation?. Thanks for your help!.
Link Copied
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page