Intel® Integrated Performance Primitives
Deliberate problems developing high-performance vision, signal, security, and storage applications.

Principle of locality

thorsan
Hi,

Say I am doing something like this:

...
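ippsConj_32fc(X,Tmp,N);            /* Tmp = conj(X) */
ippsMul_32fc(X,Tmp+1,Rxx,N-1);     /* Rxx[i] = X[i]*conj(X[i+1]) */

ippsMul_32fc(Tmp,Y,Rxy,N);         /* Rxy[i] = conj(X[i])*Y[i] */
ippsMagnitude_32fc(Rxy,A2,N);      /* A2 = |Rxy| */
ippsPhase_32fc(Rxy,P2,N);          /* P2 = arg(Rxy) */

ippsConj_32fc(Y,Tmp,N);            /* Tmp = conj(Y) */
ippsMul_32fc(Y,Tmp+1,Ryy,N-1);     /* Ryy[i] = Y[i]*conj(Y[i+1]) */

ippsAdd_32fc_I(Rxx,Ryy,N);         /* Ryy += Rxx */
ippsMagnitude_32fc(Ryy,A1,N);      /* A1 = |Ryy| */
ippsPhase_32fc(Ryy,P1,N);          /* P1 = arg(Ryy) */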
...

with N being e.g. 512.

Would it be faster to do all of this in a single for loop over the vector length N using intrinsics, because of the principle of locality? And, if possible, to parallelize that loop using OpenMP or TBB? What are your thoughts on this?

Thanks,
Thor Andreas