Hello,
I am trying to port a C program that uses PowerPC AltiVec vector operations to an Intel Xeon server using Intel MKL 9.0. I'm finding most of the operations I need for trig, FFTs, dot products, etc.
However, I cannot find vector operations for basic things like vector add and threshold. AltiVec has hardware-optimized instructions such as 'vadd', which returns the element-wise sum of two vectors, and 'vthres', which applies a scalar threshold to a vector.
It looks like these operations are not directly supported in Intel MKL, BLAS, or LAPACK.
It appears that I must write my own functions and rely on compiler optimization and OpenMP parallel support.
If these functions are available in Intel MKL, please tell me where.
Otherwise, how can I write these functions to ensure we use hardware SIMD instructions on a single CPU? Our real-time algorithms process a large number of small vectors, which we pin to a single CPU (to avoid context-switch time). So I'm looking for hardware optimization, but not SMP parallel support.
What pragma declarations must I use, and what compiler flags will help ensure the use of SIMD operations as I implement these basic vector operations?
Thank you,
Scott James
Software Engineer
Mantech Real-time Systems Laboratory
scott.james@mantech.com
941-377-6775 x270
1 Reply
Intel IPP may have functions that would interest you. Otherwise, you may want compiler auto-vectorization. Auto-vectorization requires compiler flags such as ICL /QxW (SSE2), and may require 'restrict' qualifiers and ICL pragmas such as
#pragma vector always
#pragma vector aligned
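As a concrete illustration of the advice above, here is a minimal sketch of an AltiVec-style 'vadd' written as a plain C loop the compiler can auto-vectorize. The function name and signature are my own invention, not a library API; note that `#pragma vector aligned` is a promise to ICL that the arrays are 16-byte aligned (other compilers simply ignore these pragmas):

```c
#include <assert.h>

/* Element-wise vector add, written so the Intel compiler can auto-vectorize it.
   'restrict' asserts that the three arrays do not overlap, which removes the
   aliasing concerns that otherwise block vectorization. */
void vadd(const float *restrict a, const float *restrict b,
          float *restrict c, int n)
{
#pragma vector always   /* vectorize even if the compiler's heuristics say no  */
#pragma vector aligned  /* ICL only: promises 16-byte-aligned data -- unsafe if
                           the arrays are not actually aligned                 */
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

Compiled with ICL /QxW (or a later equivalent), the vectorization report should confirm the loop was vectorized.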
Intel 10.0 compilers have improved auto-vectorization of short loops of constant length, or of preferred lengths that may be specified in a #pragma.
The 10.0 compilers also include more comprehensive short-vector math library support for auto-vectorization.
If you use a compiler without auto-vectorization, you could use the SIMD intrinsics from headers such as emmintrin.h.
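For the intrinsics route, here is a sketch of the two operations from the original question using SSE intrinsics. The function names are hypothetical, the loops assume n is a multiple of 4, and 'vthres' is interpreted here as clamping each element to an upper limit (one plausible reading of "applies a scalar threshold"):

```c
#include <emmintrin.h>  /* SSE2; also pulls in the SSE float intrinsics */

/* AltiVec-style vadd: c[i] = a[i] + b[i], four floats per iteration.
   Uses unaligned loads/stores so no alignment guarantee is needed. */
void vadd4(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
}

/* AltiVec-style vthres (assumed semantics): clamp each element to 'limit'. */
void vthres4(const float *a, float limit, float *c, int n)
{
    __m128 vlim = _mm_set1_ps(limit);  /* broadcast the scalar to all 4 lanes */
    for (int i = 0; i < n; i += 4)
        _mm_storeu_ps(c + i, _mm_min_ps(_mm_loadu_ps(a + i), vlim));
}
```

A real-time version handling many small vectors would typically align the buffers to 16 bytes and switch to _mm_load_ps/_mm_store_ps, plus a scalar tail loop for lengths that are not a multiple of 4.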
