Intel® oneAPI Base Toolkit
Support for the core tools and libraries within the base toolkit that are used to build and deploy high-performance data-centric applications.

sycl::queue::memcpy (sycl::handler::memcpy) with std::string_view and static storage duration

breyerml
Novice

OS: Ubuntu 20.04.2 LTS
DPCPP compiler version: Intel(R) oneAPI DPC++ Compiler 2021.1 (2020.10.0.1113)

I was experimenting with samples from the "Data Parallel C++" book and ran into an issue involving USM, memcpy, and static storage duration.

#include <CL/sycl.hpp>

#include <cstring>
#include <iostream>
#include <string>
#include <string_view>
#include <type_traits>
using namespace sycl;


template <typename T>
std::size_t str_size(const T& str) {
    if constexpr (std::is_pointer_v<T>) {
        return std::strlen(str);
    } else {
        return std::size(str);
    }
}
template <typename T>
const char* str_data(const T& str) {
    if constexpr (std::is_pointer_v<T>) {
        return str;
    } else {
        return std::data(str);
    }
}


template <typename T>
void test() {
    T str = "12345678";
    const std::size_t size = str_size(str);
    
    // use std::memcpy
    std::cout << "  std::memcpy" << std::endl;
    {
        queue Q;
        char* result = malloc_shared<char>(size + 1, Q);
        // init
        for (std::size_t i = 0; i < size; ++i) {
            result[i] = '-';
        }
        result[size] = '\0';
        std::cout << "    " << result << std::endl;
        
        // perform memcpy
        std::memcpy(result, str_data(str), size + 1);
        std::cout << "    " << result << std::endl;
        
        // kernel
        Q.parallel_for(size, [=](id<1> i) { result[i] += 1; }).wait();
        std::cout << "    " << result << std::endl;
        std::cout << "    -> " 
                  << (std::string_view{result} == std::string_view{"23456789"})
                  << std::endl;
        free(result, Q);  // release the USM allocation
    }
    
    // use sycl::queue::memcpy
    std::cout << "  sycl::queue::memcpy" << std::endl;
    {
        queue Q;
        char* result = malloc_shared<char>(size + 1, Q);
        // init
        for (std::size_t i = 0; i < size; ++i) {
            result[i] = '-';
        }
        result[size] = '\0';
        std::cout << "    " << result << std::endl;
        
        // perform memcpy
        Q.memcpy(result, str_data(str), size + 1).wait();
        std::cout << "    " << result << std::endl;
        
        // kernel
        Q.parallel_for(size, [=](id<1> i) { result[i] += 1; }).wait();
        std::cout << "    " << result << std::endl;
        std::cout << "    -> " 
                  << (std::string_view{result} == std::string_view{"23456789"})
                  << std::endl;
        free(result, Q);  // release the USM allocation
    }
    
    // use sycl::handler::memcpy
    std::cout << "  sycl::handler::memcpy" << std::endl;
    {
        queue Q;
        char* result = malloc_shared<char>(size + 1, Q);
        // init
        for (std::size_t i = 0; i < size; ++i) {
            result[i] = '-';
        }
        result[size] = '\0';
        std::cout << "    " << result << std::endl;
        
        // perform memcpy
        Q.submit([&](handler& h) {
            h.memcpy(result, str_data(str), size + 1);
        }).wait();
        std::cout << "    " << result << std::endl;
        
        // kernel
        Q.parallel_for(size, [=](id<1> i) { result[i] += 1; }).wait();
        std::cout << "    " << result << std::endl;
        std::cout << "    -> " 
                  << (std::string_view{result} == std::string_view{"23456789"})
                  << std::endl;
        free(result, Q);  // release the USM allocation
    }
    
    std::cout << std::endl;
}


int main() {
    std::cout << std::boolalpha;
    
    // std::string
    std::cout << "std::string" << std::endl;
    test<std::string>();
    
    std::cout << "std::string_view" << std::endl;
    test<std::string_view>();
    
    std::cout << "const char*" << std::endl;
    test<const char*>();
    
    
    return 0;
}

The idea is the following:
- create a string
- allocate a USM shared buffer of matching size (and fill it with '-')
- copy the string into the USM buffer
- modify the string inside a kernel

This works fine for std::string, std::string_view, and const char* when using std::memcpy.
However, with sycl::queue::memcpy or sycl::handler::memcpy it only works for std::string; with std::string_view or const char*, the respective memcpy operation copies garbage.

Here is a sample output from my machine:

std::string
  std::memcpy
    --------
    12345678
    23456789
    -> true
  sycl::queue::memcpy
    --------
    12345678
    23456789
    -> true
  sycl::handler::memcpy
    --------
    12345678
    23456789
    -> true

std::string_view
  std::memcpy
    --------
    12345678
    23456789
    -> true
  sycl::queue::memcpy
    --------
    ��7
    ��8D@�O�
    -> false
  sycl::handler::memcpy
    --------
     �
h�! 
    !�
      i�"!
    -> false

const char*
  std::memcpy
    --------
    12345678
    23456789
    -> true
  sycl::queue::memcpy
    --------
     �
h�! �"
    !�
      i�"!�"
    -> false
  sycl::handler::memcpy
    --------
    ��7
    ��8D@��$
    -> false

The code has been compiled with

dpcpp -std=c++17 memcpy.cpp

I suspect the incorrect results are related to the static storage duration of the string literal that the const char* points to (and that the std::string_view therefore wraps).

Does SYCL explicitly disallow copying the contents of a C-style string literal with its memcpy functions?

EDIT:
According to https://intel.github.io/llvm-docs/doxygen/classcl_1_1sycl_1_1queue.html#a6bc6a510e5e9abcbf1ee904d4b86edbf, sycl::queue::memcpy accepts only USM pointers for both the source and the destination. However, the SYCL standard explicitly states on page 301: "Copies numBytes of data from the pointer src to the pointer dest. Both dest and src may be either host or USM pointers. For more detail on USM, please see Section 4.8." If both pointers had to be USM pointers, memcpy would be of rather limited use...

AbhishekD_Intel
Moderator

Hi,


Thanks for reaching out to us.

We are looking into your issue and will update you as soon as we have more information.


Warm Regards,

Abhishek


Sravani_K_Intel
Moderator

Hi,


Thanks for reporting this. The issue is caused by a bug in the DPC++ runtime when copying from the .data section to the device. It has been escalated to the development team and will be fixed in a timely manner.


DPC++ is aligned with the SYCL specification for sycl::queue::memcpy(): both the source and the destination pointers can be either USM or host pointers. The documentation at the link above contains an error and will be corrected shortly.



Thanks.


Asim_YarKhan
Beginner

What is the status of this issue?

I am seeing problems when copying (with sycl::queue::memcpy) from a USM shared allocation back to host memory allocated with C++ new.
The error usually shows up as a hang when I perform copies of different sizes.

If I do the copy with a for loop instead, everything works, which suggests that this is a sycl::queue::memcpy issue.

Thanks,

Asim

 
