
Why does scif_register() return 0?

Bryan_H_2

Hi, All.

My team is trying to transfer bulk data between the host and the MIC using SCIF RMA. All of our demo code works fine. When we add SCIF RMA to the real project, the connection succeeds, but scif_register() returns 0 and scif_vwriteto() then fails. We have checked many times: the code we are inserting is exactly the same as the demo code, which works.

Has anyone encountered this problem when using SCIF RMA? Could you offer some advice?

Thanks a lot:)

Bryan

 
Frances_R_Intel

Have you tried checking the value of errno when you return from scif_register? The man page for scif_register has explanations for the different error numbers. 

If I had to make a wild guess, without benefit of any real facts, I would suspect that in your real code you are making calls to the coi library (which uses scif) or are using offload directives (which make calls to the coi library) and there is a conflict. But check errno and see what it says.

Bryan_H_2

Hi, Frances.

Thank you for your reply. We checked the return value (0) and the status (Success) from scif_register(). It seems that scif_register() works, but it returned a local_offset of 0, as in the code below. We can send the returned local_offset back to the peer using scif_send(), but when the peer actually calls scif_vwriteto(), it prints "scif_vwriteto failed with error." I have attached our SendData and RecvData code below. Do you have any ideas?

Thanks again.

Bryan.

    int RecvData(scif_epd_t epd, void *pData, int size)
    {
        int control_msg = 0;
        off_t local_offset;
        if(local_offset = scif_register(epd, pData, size, 0, SCIF_PROT_READ | SCIF_PROT_WRITE, 0) < 0)
        {
            printf("scif_register failed with error : %d\n", get_curr_status());
            printf("scif_register error: %s\n", strerror(errno));
            exit(-1);
        }
        printf("scif_register status: %d\n", get_curr_status());
        printf("scif_register status: %s\n", strerror(errno));
        BARRIER(epd, "register window done");
        scif_send(epd, &local_offset, sizeof(local_offset), 1);

        BARRIER(epd, "waiting on peer vwriteto");
        return size;
    }

    int SendData(scif_epd_t epd, void *pData, int size)
    {

        off_t remote_offset;
        int control_msg = 0, err;

        BARRIER(epd, "peer register window done");
        
        scif_recv(epd, &remote_offset, sizeof(remote_offset), 1);
        if ((err = scif_vwriteto(epd, pData, 0x1000, remote_offset, 1))){
            printf("scif_vwriteto failed with error.");
        }

        BARRIER(epd, "vwriteto done");
        return size;
    }
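
A likely cause, judging only from the code posted above: in C, `<` binds more tightly than `=`, so `local_offset = scif_register(...) < 0` stores the result of the comparison (0 when the call succeeds) instead of the offset. That would explain why local_offset is always 0, and why the peer's scif_vwriteto() then fails, since it targets an offset that may not be registered. The sketch below reproduces the effect with a stand-in function, `fake_register()`, which is hypothetical and only assumes the same return convention as scif_register() (an offset on success, -1 on failure). The offset value 0x80000 is arbitrary.

```c
#include <sys/types.h>

/* Hypothetical stand-in for scif_register(): returns a registered
 * offset on success and -1 on failure. The value is arbitrary. */
off_t fake_register(void)
{
    return (off_t)0x80000;
}

/* Mirrors the posted condition. The compiler parses
 *     local_offset = fake_register() < 0
 * as
 *     local_offset = (fake_register() < 0)
 * so the variable receives the comparison result, not the offset. */
off_t offset_without_parens(void)
{
    off_t local_offset;

    local_offset = (fake_register() < 0);
    if (local_offset) {
        return -1;              /* taken only when the call failed */
    }
    return local_offset;        /* 0 on success: the offset is lost */
}

/* Parenthesizing the assignment keeps the returned offset and still
 * tests it for failure. */
off_t offset_with_parens(void)
{
    off_t local_offset;

    if ((local_offset = fake_register()) < 0) {
        return -1;
    }
    return local_offset;        /* the offset returned by the call */
}
```

Applied to the posted RecvData(), this would mean writing `if ((local_offset = scif_register(...)) < 0)`. Two related points: errno is only meaningful after a call has reported failure, so printing strerror(errno) on the success path does not confirm that the registration produced a usable offset; and GCC and Clang typically warn about the unparenthesized form when building with -Wall.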