Hello,
in short: in SYCL, when querying the amount of free memory on a GPU using the function
dev.get_info<sycl::ext::intel::info::device::free_memory>()
the reported value is wrong.
The function always reports the amount of free memory on the first stack of the 2-stack GPU, regardless of whether the sycl::device corresponds to the first or the second stack.
I use
export ZE_FLAT_DEVICE_HIERARCHY="FLAT"
so the 2-stack GPU corresponds to two sycl::devices available to the user, one for each stack.
I use PVC 1550 GPUs on a private instance on the Tiber devcloud.
icpx version 2025.0.1
I see the issue with older icpx versions too (with those you might need `export ZES_ENABLE_SYSMAN=1` to enable the free memory query).
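As a side note (not part of the issue itself), the free memory query can be guarded by an aspect check. This is just a minimal sketch, assuming the sycl::aspect::ext_intel_free_memory aspect from the sycl_ext_intel_device_info extension is available in your compiler version:

    // Sketch: only query free memory when the device advertises support for it.
    // With some driver/runtime versions this also requires ZES_ENABLE_SYSMAN=1.
    if(d.has(sycl::aspect::ext_intel_free_memory))
    {
        size_t mem_free = d.get_info<sycl::ext::intel::info::device::free_memory>();
        printf("free = %zu B\n", mem_free);
    }
    else
    {
        printf("free_memory query not supported on this device\n");
    }

The reproducer below skips this check for brevity.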
Reproducer code:
#include <cstdio>
#include <cstdlib>
#include <stdexcept>
#include <vector>
#include <sycl/sycl.hpp>

int main(int argc, char ** argv)
{
    if(argc <= 1) throw std::runtime_error("not enough arguments");
    int gpu_idx = atoi(argv[1]);

    // Collect all Level Zero GPU devices (with FLAT hierarchy, one per stack).
    std::vector<sycl::device> gpus_all = sycl::device::get_devices(sycl::info::device_type::gpu);
    std::vector<sycl::device> gpus_levelzero;
    for(sycl::device & gpu : gpus_all)
    {
        if(gpu.get_backend() == sycl::backend::ext_oneapi_level_zero)
        {
            gpus_levelzero.push_back(gpu);
        }
    }
    printf("There are %zu levelzero GPUs\n", gpus_levelzero.size());

    // Report capacity and free memory of every device.
    for(size_t i = 0; i < gpus_levelzero.size(); i++)
    {
        sycl::device & d = gpus_levelzero[i];
        size_t mem_capacity = d.get_info<sycl::info::device::global_mem_size>();
        size_t mem_free = d.get_info<sycl::ext::intel::info::device::free_memory>();
        printf(" GPU %2zu: capacity = %12zu B = %6zu MiB, free = %12zu B = %6zu MiB\n", i, mem_capacity, mem_capacity >> 20, mem_free, mem_free >> 20);
    }

    // Allocate 60 GiB on the selected device.
    sycl::queue q(gpus_levelzero[gpu_idx]);
    size_t allocsize = (size_t{60} << 30);
    void * ptr = sycl::malloc_device(allocsize, q);
    printf("Allocated on GPU %2d: %zu B = %zu MiB, ptr = %p\n", gpu_idx, allocsize, allocsize >> 20, ptr);

    // Report free memory again after the allocation.
    printf("Current free memory:\n");
    for(size_t i = 0; i < gpus_levelzero.size(); i++)
    {
        sycl::device & d = gpus_levelzero[i];
        size_t mem_free = d.get_info<sycl::ext::intel::info::device::free_memory>();
        printf(" GPU %2zu: free = %12zu B = %6zu MiB\n", i, mem_free, mem_free >> 20);
    }

    // Free the allocation and report free memory one more time.
    sycl::free(ptr, q);
    printf("Memory freed\n");
    printf("Current free memory:\n");
    for(size_t i = 0; i < gpus_levelzero.size(); i++)
    {
        sycl::device & d = gpus_levelzero[i];
        size_t mem_free = d.get_info<sycl::ext::intel::info::device::free_memory>();
        printf(" GPU %2zu: free = %12zu B = %6zu MiB\n", i, mem_free, mem_free >> 20);
    }
    return 0;
}
It finds all Level Zero GPU devices and reports their memory capacity and free memory. It then allocates 60 GiB of memory on the GPU selected by the command-line index and prints the amount of free memory on each GPU again. Finally, it frees the allocation and reports the free memory on each GPU one more time.
Compile with
icpx -fsycl source.cpp -o program.x
and run as
./program.x <device_index_where_to_allocate>
Output with `./program.x 0` (or any other even index lower than the number of GPUs), shortened:
There are 16 levelzero GPUs
GPU 0: capacity = 68719476736 B = 65536 MiB, free = 68673966080 B = 65492 MiB
GPU 1: capacity = 68719476736 B = 65536 MiB, free = 68673966080 B = 65492 MiB
GPU 2: capacity = 68719476736 B = 65536 MiB, free = 68673970176 B = 65492 MiB
GPU 3: capacity = 68719476736 B = 65536 MiB, free = 68673970176 B = 65492 MiB
...
Allocated on GPU 0: 64424509440 B = 61440 MiB, ptr = 0xff00000000200000
Current free memory:
GPU 0: free = 4119027712 B = 3928 MiB
GPU 1: free = 4119027712 B = 3928 MiB
GPU 2: free = 68547817472 B = 65372 MiB
GPU 3: free = 68547817472 B = 65372 MiB
...
Memory freed
Current free memory:
GPU 0: free = 4119089152 B = 3928 MiB
GPU 1: free = 4119142400 B = 3928 MiB
GPU 2: free = 68548362240 B = 65372 MiB
GPU 3: free = 68548427776 B = 65372 MiB
...
You can see that I allocated the 60 GiB only on GPU 0, yet GPU 1 also reports the reduced free memory.
Output with `./program.x 1` (or any other odd index lower than the number of GPUs), shortened:
There are 16 levelzero GPUs
GPU 0: capacity = 68719476736 B = 65536 MiB, free = 68673966080 B = 65492 MiB
GPU 1: capacity = 68719476736 B = 65536 MiB, free = 68673966080 B = 65492 MiB
GPU 2: capacity = 68719476736 B = 65536 MiB, free = 68673974272 B = 65492 MiB
GPU 3: capacity = 68719476736 B = 65536 MiB, free = 68673974272 B = 65492 MiB
...
Allocated on GPU 1: 64424509440 B = 61440 MiB, ptr = 0xff00000000200000
Current free memory:
GPU 0: free = 68543537152 B = 65368 MiB
GPU 1: free = 68543537152 B = 65368 MiB
GPU 2: free = 68547821568 B = 65372 MiB
GPU 3: free = 68547821568 B = 65372 MiB
...
Memory freed
Current free memory:
GPU 0: free = 68543533056 B = 65368 MiB
GPU 1: free = 68543533056 B = 65368 MiB
GPU 2: free = 68548452352 B = 65372 MiB
GPU 3: free = 68548493312 B = 65372 MiB
...
Here I allocated the 60 GiB on GPU 1, yet the free memory query reports that almost all of its memory is still free.
Furthermore (this is probably a separate issue), after I release the memory with sycl::free, the free device memory query still treats it as allocated -- see the last group of free memory reports in the outputs above.
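For cross-checking, one could also query per-stack free memory directly through the Level Zero Sysman API instead of the SYCL extension. Below is only a rough sketch to drop into the reproducer above, under the assumption that ZES_ENABLE_SYSMAN=1 is set (so the core device handle can be used as a Sysman handle) and that the program is linked with -lze_loader; I am not certain how the FLAT-mode stack devices map to Sysman memory modules:

    #include <level_zero/zes_api.h>
    #include <sycl/ext/oneapi/backend/level_zero.hpp>

    // Sketch: print free/total memory of every Sysman memory module behind a SYCL device.
    void print_sysman_free_memory(const sycl::device & d)
    {
        // Native Level Zero handle of the SYCL device; with ZES_ENABLE_SYSMAN=1
        // it can also be used as a Sysman device handle.
        ze_device_handle_t ze_dev = sycl::get_native<sycl::backend::ext_oneapi_level_zero>(d);
        zes_device_handle_t zes_dev = (zes_device_handle_t)ze_dev;

        uint32_t count = 0;
        zesDeviceEnumMemoryModules(zes_dev, &count, nullptr);
        std::vector<zes_mem_handle_t> modules(count);
        zesDeviceEnumMemoryModules(zes_dev, &count, modules.data());

        for(uint32_t m = 0; m < count; m++)
        {
            zes_mem_state_t state{};
            state.stype = ZES_STRUCTURE_TYPE_MEM_STATE;
            zesMemoryGetState(modules[m], &state);
            // Note: depending on the driver, a FLAT-mode stack device might still
            // enumerate the memory modules of the whole card.
            printf("  module %u: free = %zu B, total = %zu B\n", m, (size_t)state.free, (size_t)state.size);
        }
    }

Calling this for each device in the loops above would show whether the mismatch is also visible at the Sysman level or only through the SYCL extension.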
Link to docs about the device hierarchy: https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2025-0/exposing-the-device-hierarchy.html
Am I doing something wrong? Is this expected on the 2-stack GPU? Can this be fixed?
Thanks,
Jakub