Principle of locality

Intel® Integrated Performance Primitives


Hi,

Say I am doing something like this:

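...
/* Rxx[i] = X[i] * conj(X[i+1]), i = 0..N-2 */
ippsConj_32fc(X,Tmp,N);
ippsMul_32fc(X,Tmp+1,Rxx,N-1);

/* Rxy = conj(X) * Y; A2 = |Rxy|, P2 = arg(Rxy) */
ippsMul_32fc(Tmp,Y,Rxy,N);
ippsMagnitude_32fc(Rxy,A2,N);
ippsPhase_32fc(Rxy,P2,N);

/* Ryy[i] = Y[i] * conj(Y[i+1]), i = 0..N-2 */
ippsConj_32fc(Y,Tmp,N);
ippsMul_32fc(Y,Tmp+1,Ryy,N-1);

/* Ryy += Rxx, then magnitude and phase. Element N-1 of Rxx and Ryy
   is not written by the N-1 length multiplies above, so it has to
   be initialized elsewhere (e.g. zeroed). */
ippsAdd_32fc_I(Rxx,Ryy,N);
ippsMagnitude_32fc(Ryy,A1,N);
ippsPhase_32fc(Ryy,P1,N);
...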

with N being e.g. 512.
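For a sense of scale, here is a rough working-set estimate for the buffers in the snippet above. It is a sketch that assumes 8-byte complex elements (Ipp32fc is two 32-bit floats), 4-byte real elements (Ipp32f), and counts each buffer once (Tmp is reused); the function name `working_set_bytes` is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: bytes touched by the snippet for vector length n.
 * Assumes 8-byte complex (Ipp32fc) and 4-byte real (Ipp32f) elements,
 * each buffer counted once. */
size_t working_set_bytes(size_t n)
{
    const size_t complex_bufs = 6; /* X, Y, Tmp, Rxx, Ryy, Rxy */
    const size_t real_bufs    = 4; /* A1, A2, P1, P2 */
    return complex_bufs * n * 8 + real_bufs * n * 4;
}
```

For N = 512 this gives 32768 bytes (32 KiB), which is on the order of the L1 data cache of many recent x86 cores. The intermediate buffers may therefore largely stay in L1 or L2 between IPP calls, so any gain from fusing would come mostly from fewer loads/stores and less call overhead rather than from avoiding main-memory traffic; only a benchmark can settle it.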

Would it be faster to do all of this in a single for loop over the vector length N, using intrinsics, because of the principle of locality? And, if possible, to parallelize it with OpenMP or TBB? What are your thoughts on this?

Thanks,
Thor Andreas
