The following snippet demonstrates a difference between shared memory (malloc_shared) and device memory (malloc_device) that I don't fully understand. A simple kernel uses the memory to build and iterate through a linked list, and the host then explicitly copies the memory back. When I use device memory, everything works as expected. When I change to shared memory, the output is 0,0,0,0 instead of the expected 1,2,3,4. My best guess is that the runtime controls when shared memory is implicitly copied between the device and the host. If this is the cause, is there a detailed description of how this works, and is there any way to get the code to work as expected (e.g. some additional synchronization call to force the copy)?
I'm building and running this in the Intel DevCloud with an Arria 10 device.
#if defined(FPGA_EMULATOR)
INTEL::fpga_emulator_selector device_selector;
#else
INTEL::fpga_selector device_selector;
#endif
queue q(device_selector, dpc_common::exception_handler);
size_t num_items = 4;
size_t num_bytes = num_items * sizeof(Node);
// When I use malloc_device() here instead, everything works as expected
Node *linked_list = malloc_shared<Node>(num_items, q);
q.memset(linked_list, 0, num_bytes).wait();
auto linked_e = q.submit([&](handler &h) {
  h.single_task<LinkedKernel>([=]() {
    linked_list[0].data = 0;
    linked_list[1].data = 1;
    linked_list[2].data = 2;
    linked_list[3].data = 3;
    linked_list[3].next = &(linked_list[2]);
    linked_list[2].next = &(linked_list[1]);
    linked_list[1].next = &(linked_list[0]);
    linked_list[0].next = nullptr;
    Node *head = &(linked_list[3]);
    for (Node *next = head; next != nullptr; next = next->next)
      next->data += 1;
  });
});
linked_e.wait();
Node *host_list = (Node *) malloc(num_bytes);
memset(host_list, 0, num_bytes);
q.memcpy(host_list, linked_list, num_bytes).wait();
for (size_t i = 0; i < num_items; ++i)
  std::cout << host_list[i].data << std::endl;
// Expected Output:
// 1
// 2
// 3
// 4
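For reference, the kernel logic can be checked in isolation with a host-only sketch like the one below, which uses no SYCL at all. The Node layout is an assumption, since the snippet above does not show its definition, and the helper names build_and_increment and run_on_host are hypothetical. It only confirms that the list construction and traversal themselves yield 1,2,3,4; it says nothing about how the runtime handles shared allocations on the FPGA.

```cpp
#include <array>
#include <cstddef>

// Assumed Node layout; the original post does not show the real definition.
struct Node {
  Node *next;
  int data;
};

// Same statements as the kernel body: fill the data fields, link the nodes
// in reverse order (3 -> 2 -> 1 -> 0), then walk the list and add 1 to each.
inline void build_and_increment(Node *linked_list) {
  linked_list[0].data = 0;
  linked_list[1].data = 1;
  linked_list[2].data = 2;
  linked_list[3].data = 3;
  linked_list[3].next = &(linked_list[2]);
  linked_list[2].next = &(linked_list[1]);
  linked_list[1].next = &(linked_list[0]);
  linked_list[0].next = nullptr;
  Node *head = &(linked_list[3]);
  for (Node *next = head; next != nullptr; next = next->next)
    next->data += 1;
}

// Runs the logic on a zero-initialized host array (standing in for the
// memset'ed allocation) and returns the data fields in index order.
inline std::array<int, 4> run_on_host() {
  std::array<Node, 4> nodes{};  // value-initialized: null pointers, zero data
  build_and_increment(nodes.data());
  std::array<int, 4> out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = nodes[i].data;
  return out;
}
```

Calling run_on_host() returns the values 1, 2, 3, 4 in index order, matching the expected output listed above. Note that in the original snippet the next pointers copied into host_list still refer to the linked_list allocation, so only the data fields are meaningful in the host copy.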
Hi,
Please refer to the Examples of Stall-free and Stallable Memory Systems in the Intel High Level Synthesis Compiler Pro Edition: Best Practices Guide:
