Hello,
My question comes from the following code:
float fa[128] __attribute__((align(64)));
float fb[128] __attribute__((align(64)));
for(j=0; j<100000000; j++) {
    for(k=0; k<128; k++) {
        fa[k]=a*fa[k]+fb[k];
    }
}
When I compile it for the Xeon Phi with icc and the -no-vec option, it takes about 124 s to complete; with auto-vectorization it needs only 1.5 s. That is a speedup of about 80x, even though the vector units can process only 16 floats at once.
Doing the same on an Intel Xeon E5-1620 v2 @ 3.70GHz results in 5.6 s with -no-vec and 1.5 s with auto-vectorization.
All tests were done using only 1 core.
Why does the Xeon Phi speed up so well with vector instructions when the Xeon doesn't? Shouldn't the Xeon speed up 8 times, since its vector registers are 256 bits wide?
A couple of things to check:
1) Does the compiler skip iterations in one or both cases, seeing that you don't use the intermediate results?
2) Is prefetch more efficient in the vectorized case (1 prefetch to L2 and 1 to L1 per cache line)?
Regarding 1): since fa is used on both sides of the assignment, no iteration should be discardable.
2) Shouldn't everything fit in the L1 cache after the first outer iteration anyway? So prefetches shouldn't be too significant.
With vector instructions and aligned data the compiler can incorporate a memory reference into the FP add operation and into the FP multiply operation. In scalar code these will have to be separate MOV instructions (since most will not be 64-byte aligned), which could double the number of instructions in the inner loop.
It should be easy to check the assembly code to see if this is part of the problem.
