Hi,
Say I am doing something like this:
...
ippsConj_32fc(X,Tmp,N);
ippsMul_32fc(X,Tmp+1,Rxx,N-1);
ippsMul_32fc(Tmp,Y,Rxy,N);
ippsMagnitude_32fc(Rxy,A2,N);
ippsPhase_32fc(Rxy,P2,N);
ippsConj_32fc(Y,Tmp,N);
ippsMul_32fc(Y,Tmp+1,Ryy,N-1);
ippsAdd_32fc_I(Rxx,Ryy,N);
ippsMagnitude_32fc(Ryy,A1,N);
ippsPhase_32fc(Ryy,P1,N);
...
with N being e.g. 512.
Would it be faster to do all this in a single for loop over the vector length N using intrinsics, thanks to the principle of locality? And, if possible, to parallelize that loop using OpenMP or TBB? What are your thoughts on this?
Thanks,
Thor Andreas