Intel® Distribution of OpenVINO™ Toolkit

Can OpenVINO’s NPU plugin create_tensor() accept a pre-allocated NPU device memory pointer as an argument?

namuk
Beginner

Hello,

I’m developing a deep learning application on Intel hardware using OpenVINO, and I have a few questions:

  1. Aside from tensor-creation APIs such as create_tensor(), is there any API in OpenVINO for allocating NPU device memory?

  2. I’d like to implement my own memory pool for the NPU. When using the NPU plugin, is it possible to pass a pre-allocated device memory address as an argument to RemoteContext::create_tensor()? I noticed that one of the create_tensor() overloads takes an AnyMap parameter and supports a SHARED_BUF enum—does that relate to this functionality?

  3. In the following function, can I pass a pre-allocated device memory address as the void *buffer argument?

     
    inline ZeroBufferTensor create_tensor(const element::Type type, const Shape &shape, void *buffer);

    As I understand it, the NPU plugin’s create_tensor() only accepts a memory handle. With the CPU plugin, I can pass a pre-allocated host pointer to create_tensor(), and it seems the GPU plugin supports this too. But it doesn’t look like the NPU plugin does—am I correct?

  4. Finally, if I call compile_model() with the NPU device selected but use ov::Tensor (rather than RemoteTensor) for the inputs and outputs—passing pre-allocated host memory addresses—does OpenVINO internally copy that data into NPU device memory?

Thank you!

Peh_Intel
Moderator

Hi namuk,


Thanks for your questions. Let me check with the engineering team for the precise answers and get back to you. 



Regards,

Peh


Peh_Intel
Moderator

Hi namuk,

 

Please refer to the answers below.

 

1) Aside from tensor-creation APIs such as create_tensor(), is there any API in OpenVINO for allocating NPU device memory?

 

In OpenVINO, memory allocation on NPU devices is primarily managed via create_tensor() in conjunction with a RemoteContext. There is no public API to allocate raw NPU device memory directly outside of this mechanism; memory must be wrapped within the OpenVINO tensor abstraction for compatibility with the NPU plugin.

 

That said, the NPU plugin internally builds on Level Zero, so advanced users could in principle manage memory themselves with zeMemAllocDevice(); however, OpenVINO does not provide a direct or supported API for raw device memory management outside of its RemoteTensor interface.
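Purely for illustration, a raw Level Zero allocation done outside OpenVINO could look roughly like the sketch below (error handling omitted; the first driver/device is assumed to be the NPU, and this path is not an officially supported way to feed memory to the plugin):

#include <ze_api.h>
#include <cstddef>

// Sketch only: allocate raw device memory directly through Level Zero,
// bypassing OpenVINO. Error checks are omitted for brevity.
void* allocate_device_memory(size_t size) {
    zeInit(0);                                    // initialize all Level Zero drivers

    uint32_t driver_count = 1;
    ze_driver_handle_t driver = nullptr;
    zeDriverGet(&driver_count, &driver);          // take the first driver

    uint32_t device_count = 1;
    ze_device_handle_t device = nullptr;
    zeDeviceGet(driver, &device_count, &device);  // take the first device (assumed: the NPU)

    ze_context_desc_t ctx_desc = {ZE_STRUCTURE_TYPE_CONTEXT_DESC, nullptr, 0};
    ze_context_handle_t context = nullptr;
    zeContextCreate(driver, &ctx_desc, &context);

    ze_device_mem_alloc_desc_t mem_desc = {ZE_STRUCTURE_TYPE_DEVICE_MEM_ALLOC_DESC, nullptr, 0, 0};
    void* ptr = nullptr;
    zeMemAllocDevice(context, &mem_desc, size, /*alignment=*/64, device, &ptr);

    // The buffer must eventually be released with zeMemFree(context, ptr).
    return ptr;
}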

 

 

2) I’d like to implement my own memory pool for the NPU. When using the NPU plugin, is it possible to pass a pre-allocated device memory address as an argument to RemoteContext::create_tensor()? I noticed that one of the create_tensor() overloads takes an AnyMap parameter and supports an SHARED_BUF enum—does that relate to this functionality?

 

I would say that is partially true. The SHARED_BUF property in the AnyMap is meant to allow importing external memory into the NPU plugin. However, for this to work, the pointer must point to memory allocated with Level Zero-compatible APIs (e.g., via zeMemAllocShared()), not just any buffer. The NPU plugin internally checks and validates the memory handle to ensure it is a valid, device-accessible address.

So, while you can use SHARED_BUF, you must ensure the buffer was allocated appropriately via the Level Zero APIs. A standard malloc or new allocation won’t work here.
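As a rough sketch of how that could look in code (the model path is a placeholder, and the property names ov::intel_npu::mem_type, ov::intel_npu::mem_handle and MemType::SHARED_BUF are taken from the NPU remote-properties header and may differ between releases, so verify them against your installed version):

#include <openvino/openvino.hpp>
#include <openvino/runtime/intel_npu/remote_properties.hpp>

int main() {
    ov::Core core;
    auto compiled = core.compile_model("model.xml", "NPU");   // placeholder model path
    ov::RemoteContext context = compiled.get_context();       // the NPU plugin's RemoteContext

    // The pointer must come from a Level Zero-compatible allocator
    // (e.g. zeMemAllocShared), not from malloc/new.
    void* shared_buffer = nullptr;  // obtained from your own Level Zero-based memory pool

    ov::AnyMap params = {
        {ov::intel_npu::mem_type.name(), ov::intel_npu::MemType::SHARED_BUF},
        {ov::intel_npu::mem_handle.name(), shared_buffer}
    };
    auto remote_tensor = context.create_tensor(ov::element::f32,
                                               compiled.input().get_shape(),  // assumes one static input
                                               params);

    auto request = compiled.create_infer_request();
    request.set_input_tensor(remote_tensor);
    request.infer();
    return 0;
}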

 

3) In the following function, can I pass a pre-allocated device memory address as the void *buffer argument?

inline ZeroBufferTensor create_tensor(const element::Type type, const Shape &shape, void *buffer);

As I understand it, the NPU plugin’s create_tensor() only accepts a memory handle. With the CPU plugin, I can pass a pre-allocated host pointer to create_tensor(), and it seems the GPU plugin supports this too. But it doesn’t look like the NPU plugin does—am I correct?

 

Yes, this API overload exists, but it is only supported for host-accessible memory, such as with the CPU plugin and, in some cases, the GPU plugin.

 

For the NPU plugin, this is not supported because the plugin does not directly accept host-side memory pointers as device tensors; it expects a valid memory handle, not a raw pointer. Instead, create a RemoteTensor through a RemoteContext.

In short, the NPU plugin’s create_tensor() does not support raw host memory pointers the way the CPU and GPU plugins do.
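To make the contrast concrete, the sketch below (placeholder model path, single static f32 input assumed) shows the host-pointer path that works with the CPU plugin; with the NPU, the same ov::Tensor constructor only produces an ordinary host tensor, and a device-resident tensor has to come from the RemoteContext path described above:

#include <vector>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto compiled = core.compile_model("model.xml", "CPU");   // placeholder model path
    auto request = compiled.create_infer_request();

    // Application-owned, pre-allocated host buffer.
    ov::Shape shape = compiled.input().get_shape();
    std::vector<float> host_buffer(ov::shape_size(shape));

    // CPU plugin: wrapping an existing host pointer is supported and avoids a copy.
    ov::Tensor input(ov::element::f32, shape, host_buffer.data());
    request.set_input_tensor(input);
    request.infer();
    return 0;
}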

 

4) Finally, if I call compile_model() with the NPU device selected but use ov::Tensor (rather than RemoteTensor) for the inputs and outputs—passing pre-allocated host memory addresses—does OpenVINO internally copy that data into NPU device memory?

 

Yes. When using standard ov::Tensor inputs (host memory) with an NPU-compiled model:

  • OpenVINO automatically performs a host-to-device copy for the inputs (and a device-to-host copy for the outputs).
  • This copy happens internally as part of infer() or start_async().

So, if performance is a concern and you want to avoid host-device copies, use RemoteTensors mapped to device memory instead.
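
For reference, the default path looks roughly like this sketch (placeholder model path, single static f32 input assumed): the application keeps its own host buffers and the host-device copies happen implicitly inside infer():

#include <vector>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    auto compiled = core.compile_model("model.xml", "NPU");   // placeholder model path
    auto request = compiled.create_infer_request();

    // Plain host tensor wrapping application memory; the plugin copies it into
    // NPU device memory as part of infer() and copies the outputs back afterwards.
    ov::Shape shape = compiled.input().get_shape();
    std::vector<float> host_input(ov::shape_size(shape));
    ov::Tensor input(ov::element::f32, shape, host_input.data());

    request.set_input_tensor(input);
    request.infer();

    ov::Tensor output = request.get_output_tensor();   // host-accessible copy of the result
    return 0;
}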

 

Hope this information helps.

 

 

Regards,

Peh

Peh_Intel
Moderator

Hi namuk,


This thread will no longer be monitored since we have provided answers. If you need any additional information from Intel, please submit a new question. 



Regards,

Peh


namuk
Beginner

Thank you. It has been very helpful.
