OpenCL* for CPU
Ask questions and share information on Intel® SDK for OpenCL™ Applications and OpenCL™ implementations for Intel® CPU.
Announcements
This forum covers OpenCL* for CPU only. OpenCL* for GPU questions can be asked in the GPU Compute Software forum. Intel® FPGA SDK for OpenCL™ questions can be ask in the FPGA Intel® High Level Design forum.
1718 Discussions

Crash when assigning two structures.

Romain_c_
Beginner
658 Views

Hi !

I am having quite a strange problem on the cpu ( I don't have intel gpu to try this, but works on nvidia ).

I have a kernel which makes everything crash at an assignment depending on my structure.

If I uncomment the _pad0 variable the crash does not appear. The sizes are the same on the host and device.

Here is my structure:

[cpp]

typedef struct capsule_s
{
    vector2d_t p; // center of mass position
    vector2d_t v; // center of mass velocity
    scalar_t    r; // Radius
    scalar_t    mass_density;
    scalar_t    color_field;
    //scalar_t _pad0; // For removing the crash with intel cpu.
} capsule_t;[/cpp]

vector2d_t is defined as cl_float2 for host and float2 in kernels. scalar_t as cl_float and float.

In the kernel I have something similar to:

[cpp]

capsule_t geom_old = in_capsules[idx];
/*
Code that doesn't crash.
*/

out_capsules[idx] = geom_new;  // Here it crashes

[/cpp]

More precisely I found that all of this fail :

[cpp]

out_capsules[idx].v = geom_new.v;  // fails

/----------

float2 tmp =  geom_new.v; // no failure here.

out_capsules[idx].v = tmp; // fails

/----------

out_capsules[idx].v.x = geom_new.v.x;  // fails

[/cpp]

And this only happens for this member. I am not sure if this is related to alignement since other members assignments do not cause problems.

Any help is welcome :)

Thanks in advance !

 

0 Kudos
10 Replies
Raghupathi_M_Intel
658 Views

Is it possible to attach the minimal kernel that results in the crash?

Thanks,
Raghu

0 Kudos
Romain_c_
Beginner
658 Views

Of course ! Thx for the reply :)

Here is the whole kernel if you want the including header and other functions I can send if you want.

[cpp]

__kernel void bacteria_step ( __global capsule_t* in_capsules, 
                                          __global capsule_t* out_capsules,
                                          __global int* box_map,
                                          __global gp_box_t* boxes,
                                          __global gp_command_t* commands,
                                           float timestep,
                                           float boundary_sphere_radius )
{
    int idx = get_global_id(0);

    int box_idx = box_map[idx];
    gp_box_t box = boxes[box_idx];
    capsule_t geom_old = in_capsules[idx];
    
    vector2d_t pressure_force  = (vector2d_t) (.0f, .0f );
    vector2d_t viscosity_force = (vector2d_t) (.0f, .0f );
    vector2d_t surface_normal  = (vector2d_t) (.0f, .0f );
    scalar_t   lagr_ci         = .0f;
    for(int ii = 0; ii < GP_BOX_POP_SIZE + GP_BOX_GHOST_SIZE; ii++)
    {
        int i = (ii < GP_BOX_POP_SIZE) ? box.gp_box_pop[ii] : box.gp_box_ghost[ii-GP_BOX_POP_SIZE];
        if ( (i != GP_UNDEF_ID) )
        {
            capsule_t geom_alt = in_capsules;
            if ( i != idx )
            {   
                pressure_force  -= pressure_between_two ( geom_old, geom_alt );
                viscosity_force += viscosity_between_two ( geom_old, geom_alt );
            }

            surface_normal +=  ( MASS / geom_alt.mass_density ) * poly6_kernel_nabla ( geom_old.p - geom_alt.p, SMOOTHING_LEN );
            lagr_ci += ( MASS / geom_alt.mass_density ) * poly6_kernel_nabla_2 ( geom_old.p - geom_alt.p, SMOOTHING_LEN );
        }
    }

    viscosity_force *= VISCOSITY;
    vector2d_t surface_force = -SURF_TENSION * lagr_ci * ( surface_normal / length ( surface_normal ) );
    capsule_t geom_new = geom_old;
    
    /////////////
    // Leap-frog
    /////////////
    
    vector2d_t tot_force = pressure_force +  REST_DENSITY * GRAVITY + viscosity_force;
    tot_force += length ( surface_normal ) >= THRESHOLD ? surface_force : .0f;
    vector2d_t accel     = tot_force / geom_new.mass_density;

    vector2d_t old_vel   = geom_new.v;
    vector2d_t new_vel   = geom_new.v + accel * timestep; // vel_.5dt = vel_-.5dt + dt*accel.

    geom_new.p += timestep * new_vel;
    geom_new.v = ( old_vel + new_vel ) * .5f;

   
    handle_sphere_env_collision ( &geom_new,
                                  boundary_sphere_radius, 
                                  timestep );

    out_capsules[idx] = geom_new;
    /////////////
    // Update map
    /////////////
    
    
    int2 box_coord = (int2){ floor(geom_new.p.x / GP_BOX_URADIUS),
                             floor(geom_new.p.y / GP_BOX_URADIUS) };

    if ( (box_coord.x != box.gp_box_coord.x) || (box_coord.y != box.gp_box_coord.y) )
    {
        commands[idx].gp_command_type = GP_COMMAND_MOVE;
        commands[idx].gp_command_pos = box_coord;
        commands[idx].gp_command_data = idx;
    }
    else
        commands[idx].gp_command_type = GP_COMMAND_NONE;
        
}

[/cpp]

0 Kudos
Raghupathi_M_Intel
658 Views

Romain c. wrote:

Here is the whole kernel if you want the including header and other functions I can send if you want.

Yes please.

0 Kudos
Romain_c_
Beginner
657 Views

I have put them in a zip file.

Also I forgot to say if I remove the color_field (and padding) member it works too.

0 Kudos
Rami_J_Intel
Employee
657 Views

I would suspect that the struct layout (size in bytes) on the host compiler is different from the opencl CPU compiler. Can u please try using 'packed' keyword for the struct definition on Host and Kernel as well, see if that helps. (OpenCL 1.2 specification 6.11.1)

0 Kudos
Romain_c_
Beginner
657 Views

I tried with packed for kernel side, and used the pragmas  push, pack and pop ( not sure about this ) on host side, it makes the program not crash but size of the structure is 28 on kernel but still 32 on host. So data is not correct after...

Thanks for helping :)

0 Kudos
Raghupathi_M_Intel
658 Views

Can you post the host side structure (with the pragmas you added)?

Raghu

0 Kudos
Romain_c_
Beginner
658 Views

Here it is. Names are a bit different but everything is as before. Thx.

[cpp]

#ifndef __SPH_KERNEL__
#pragma pack(push)  /* push current alignment to stack */
#pragma pack(1)     /* set alignment to 1 byte boundary */
#endif
typedef struct 
#ifdef __SPH_KERNEL__
    __attribute__ ((packed)) 
#endif
    capsule_s
{
    vector2d_t p; // center of mass position
    vector2d_t v; // center of mass velocity
    scalar_t   r; // Radius
    scalar_t   mass_density;
    scalar_t   rr;
} capsule_t;
#ifndef __SPH_KERNEL__
#pragma pack(pop) 
#endif

[/cpp]

0 Kudos
Raghupathi_M_Intel
658 Views

With or without packing the size of the above structure will be 28. Are you sure you are not picking up the definition of the structure from some other header?

Try writing a simple test program, declare capsule_s and try to print the size of the structure. What do you get?

Thanks,
Raghu

0 Kudos
Romain_c_
Beginner
658 Views

I tried what you suggested with a c++ and c compiler only using the struct declaration with cl_types. On c++ compiler the size is indeed 28 but on a C99 compiler it is 32 ( both clang and intel one ).

Moreover in c99 I think that the compiler add padding to be aligned with the largest data in the struct. For instance here, largest size is 8, and 32 is multiple of 8. Am I right?

Thanks for your patience :)

0 Kudos
Reply