- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi !
I am having quite a strange problem on the cpu ( I don't have intel gpu to try this, but works on nvidia ).
I have a kernel which makes everything crash at an assignment depending on my structure.
If I uncomment the _pad0 variable the crash does not appear. The sizes are the same on the host and device.
Here is my structure:
[cpp]
typedef struct capsule_s
{
vector2d_t p; // center of mass position
vector2d_t v; // center of mass velocity
scalar_t r; // Radius
scalar_t mass_density;
scalar_t color_field;
//scalar_t _pad0; // For removing the crash with intel cpu.
} capsule_t;[/cpp]
vector2d_t is defined as cl_float2 for host and float2 in kernels. scalar_t as cl_float and float.
In the kernel I have something similar to:
[cpp]
capsule_t geom_old = in_capsules[idx];
/*
Code that doesn't crash.
*/
out_capsules[idx] = geom_new; // Here it crashes
[/cpp]
More precisely I found that all of this fail :
[cpp]
out_capsules[idx].v = geom_new.v; // fails
/----------
float2 tmp = geom_new.v; // no failure here.
out_capsules[idx].v = tmp; // fails
/----------
out_capsules[idx].v.x = geom_new.v.x; // fails
[/cpp]
And this only happens for this member. I am not sure if this is related to alignement since other members assignments do not cause problems.
Any help is welcome :)
Thanks in advance !
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Is it possible to attach the minimal kernel that results in the crash?
Thanks,
Raghu
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Of course ! Thx for the reply :)
Here is the whole kernel if you want the including header and other functions I can send if you want.
[cpp]
__kernel void bacteria_step ( __global capsule_t* in_capsules,
__global capsule_t* out_capsules,
__global int* box_map,
__global gp_box_t* boxes,
__global gp_command_t* commands,
float timestep,
float boundary_sphere_radius )
{
int idx = get_global_id(0);
int box_idx = box_map[idx];
gp_box_t box = boxes[box_idx];
capsule_t geom_old = in_capsules[idx];
vector2d_t pressure_force = (vector2d_t) (.0f, .0f );
vector2d_t viscosity_force = (vector2d_t) (.0f, .0f );
vector2d_t surface_normal = (vector2d_t) (.0f, .0f );
scalar_t lagr_ci = .0f;
for(int ii = 0; ii < GP_BOX_POP_SIZE + GP_BOX_GHOST_SIZE; ii++)
{
int i = (ii < GP_BOX_POP_SIZE) ? box.gp_box_pop[ii] : box.gp_box_ghost[ii-GP_BOX_POP_SIZE];
if ( (i != GP_UNDEF_ID) )
{
capsule_t geom_alt = in_capsules;
if ( i != idx )
{
pressure_force -= pressure_between_two ( geom_old, geom_alt );
viscosity_force += viscosity_between_two ( geom_old, geom_alt );
}
surface_normal += ( MASS / geom_alt.mass_density ) * poly6_kernel_nabla ( geom_old.p - geom_alt.p, SMOOTHING_LEN );
lagr_ci += ( MASS / geom_alt.mass_density ) * poly6_kernel_nabla_2 ( geom_old.p - geom_alt.p, SMOOTHING_LEN );
}
}
viscosity_force *= VISCOSITY;
vector2d_t surface_force = -SURF_TENSION * lagr_ci * ( surface_normal / length ( surface_normal ) );
capsule_t geom_new = geom_old;
/////////////
// Leap-frog
/////////////
vector2d_t tot_force = pressure_force + REST_DENSITY * GRAVITY + viscosity_force;
tot_force += length ( surface_normal ) >= THRESHOLD ? surface_force : .0f;
vector2d_t accel = tot_force / geom_new.mass_density;
vector2d_t old_vel = geom_new.v;
vector2d_t new_vel = geom_new.v + accel * timestep; // vel_.5dt = vel_-.5dt + dt*accel.
geom_new.p += timestep * new_vel;
geom_new.v = ( old_vel + new_vel ) * .5f;
handle_sphere_env_collision ( &geom_new,
boundary_sphere_radius,
timestep );
out_capsules[idx] = geom_new;
/////////////
// Update map
/////////////
int2 box_coord = (int2){ floor(geom_new.p.x / GP_BOX_URADIUS),
floor(geom_new.p.y / GP_BOX_URADIUS) };
if ( (box_coord.x != box.gp_box_coord.x) || (box_coord.y != box.gp_box_coord.y) )
{
commands[idx].gp_command_type = GP_COMMAND_MOVE;
commands[idx].gp_command_pos = box_coord;
commands[idx].gp_command_data = idx;
}
else
commands[idx].gp_command_type = GP_COMMAND_NONE;
}
[/cpp]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Romain c. wrote:
Here is the whole kernel if you want the including header and other functions I can send if you want.
Yes please.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I would suspect that the struct layout (size in bytes) on the host compiler is different from the opencl CPU compiler. Can u please try using 'packed' keyword for the struct definition on Host and Kernel as well, see if that helps. (OpenCL 1.2 specification 6.11.1)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tried with packed for kernel side, and used the pragmas push, pack and pop ( not sure about this ) on host side, it makes the program not crash but size of the structure is 28 on kernel but still 32 on host. So data is not correct after...
Thanks for helping :)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Can you post the host side structure (with the pragmas you added)?
Raghu
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Here it is. Names are a bit different but everything is as before. Thx.
[cpp]
#ifndef __SPH_KERNEL__
#pragma pack(push) /* push current alignment to stack */
#pragma pack(1) /* set alignment to 1 byte boundary */
#endif
typedef struct
#ifdef __SPH_KERNEL__
__attribute__ ((packed))
#endif
capsule_s
{
vector2d_t p; // center of mass position
vector2d_t v; // center of mass velocity
scalar_t r; // Radius
scalar_t mass_density;
scalar_t rr;
} capsule_t;
#ifndef __SPH_KERNEL__
#pragma pack(pop)
#endif
[/cpp]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
With or without packing the size of the above structure will be 28. Are you sure you are not picking up the definition of the structure from some other header?
Try writing a simple test program, declare capsule_s and try to print the size of the structure. What do you get?
Thanks,
Raghu
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tried what you suggested with a c++ and c compiler only using the struct declaration with cl_types. On c++ compiler the size is indeed 28 but on a C99 compiler it is 32 ( both clang and intel one ).
Moreover in c99 I think that the compiler add padding to be aligned with the largest data in the struct. For instance here, largest size is 8, and 32 is multiple of 8. Am I right?
Thanks for your patience :)
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page