Thank you for posting in the Intel community forum. I hope all is well, and apologies for the delayed response.
I would recommend trying the loop unroll pragma, which should reduce the latency.
More information, including a code snippet and implementation details, can be found here.
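As a rough illustration of the suggestion above, here is a minimal sketch of a fully unrolled loop. It assumes your toolchain accepts the `#pragma unroll` spelling (Intel compilers do; others may ignore it with a warning), and the function name `dot4` and the trip count of 4 are hypothetical, not taken from your code. Unrolling works best when the loop bound is known at compile time, and the actual latency gain depends on your design.

```cpp
#include <cstddef>

// Hypothetical fixed trip count, chosen only for illustration.
constexpr std::size_t N = 4;

// Hypothetical example: dot product of two small fixed-size arrays.
int dot4(const int (&a)[N], const int (&b)[N]) {
    int sum = 0;
    // Ask the compiler to fully unroll this loop.
    // Assumption: the toolchain recognizes "#pragma unroll"; compilers that
    // do not will ignore it (possibly with a warning) and keep a normal loop.
    #pragma unroll
    for (std::size_t i = 0; i < N; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

If full unrolling uses too many resources, a partial factor such as `#pragma unroll 2` is usually accepted as well; please check the documentation for your specific compiler version.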
Please do let us know if that helps.
Note: there are also detailed steps on loop best practices here, which are recommended reading; hopefully they will give you some insight.
Greetings. As we have not received any further clarification on the information provided, this thread will now be transitioned to community support and we will no longer monitor it. For new queries, please feel free to open a new thread and we will be right with you. It was a pleasure having you here.