Dear Intel technician,
On a Skylake computer, I built my application with AVX512 instead of SSE2 using Intel parallel_studio_xe_2018.1.038, and its performance is worse than before: the AVX512 build takes longer to run than the SSE2 build on Skylake computers. I cannot understand why. Could Intel tech support staff tell me what is causing such poor performance with AVX512 instructions? Thanks in advance.
By the way, do you have any publicly available Fortran OpenMP-based test programs that confirm the performance of AVX512 instructions on Skylake computers?
My application is Fortran code with OpenMP parallelism. It runs on both Linux and Windows and is built with the Intel 2018 Fortran Compiler. The tests were done on 2-socket and 4-socket Skylake multi-core computers.
Key options used under Linux:
OPENMP_FLAGS = -qopenmp -arch CORE-AVX512 -align array64byte
BASE_FLAGS = -O3 -auto -cm -w -fpp -DPN -DF95 -DLINUX -DLINUX_X64 -DOPENMP_VER -DSR3_LIB -threads
I am looking forward to hearing from you. Your early response is highly appreciated.
Dingjun
Please refer to this article:
https://software.intel.com/en-us/articles/tuning-simd-vectorization-when-targeting-intel-xeon-processor-scalable-family
In addition, see the -qopt-zmm-usage option described in the Release Notes:
https://software.intel.com/en-us/articles/intel-fortran-compiler-180-for-linux-release-notes-for-intel-parallel-studio-xe-2018#new_options
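As a rough illustration of what those two links point at, the fragment below sketches how the flags might look in a Makefile. It reuses the OPENMP_FLAGS variable name from the original post; REPORT_FLAGS is a hypothetical name introduced here, and whether these settings help depends on the application. With -xCORE-AVX512 the compiler defaults to low ZMM usage, so 512-bit registers are used sparingly unless -qopt-zmm-usage=high is given. The optimization report shows which loops were actually vectorized.

```make
# Sketch only, not a verified fix for this application.
# -xCORE-AVX512 targets the Skylake server instruction set;
# -qopt-zmm-usage=high asks the vectorizer to use 512-bit ZMM registers freely
# (the default with -xCORE-AVX512 is "low").
OPENMP_FLAGS = -qopenmp -xCORE-AVX512 -qopt-zmm-usage=high -align array64byte

# REPORT_FLAGS is a hypothetical variable name: add it to the compile line
# to get per-loop vectorization reports (*.optrpt files).
REPORT_FLAGS = -qopt-report=5 -qopt-report-phase=vec
```

Comparing the reports from the SSE2 and AVX512 builds for the hottest loops is one way to see whether those loops vectorize at all before comparing timings.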
Hi, Devorah,
I have re-run our application following your advice.
If the following options are used, the elapsed time of our application is 7401.82 seconds.
Key options used under Linux:
OPENMP_FLAGS = -qopenmp -arch SSE2 -align array16byte
BASE_FLAGS = -O3 -auto -cm -w -fpp -DPN -DF95 -DLINUX -DLINUX_X64 -DOPENMP_VER -DSR3_LIB -threads
If the following options are used, the elapsed time of our application is 8006.94 seconds. The performance deteriorates.
OPENMP_FLAGS = -qopenmp -arch CORE-AVX512 -xCORE-AVX512 -align array64byte -qopt-zmm-usage=high
BASE_FLAGS = -O3 -auto -cm -w -fpp -DPN -DF95 -DLINUX -DLINUX_X64 -DOPENMP_VER -DSR3_LIB -threads -fimf-force-dynamic-target -fimf-use-svml=true
The Intel Fortran Compiler (ifort) is '/opt/intel/parallel_studio_xe_2018.1.038/compilers_and_libraries_2018/linux/bin/intel64/ifort'.
Could you give me more help? I want to speed up our application by using AVX512 instructions on our Skylake computers.
I look forward to hearing from you.
Best regards,
Dingjun
If I want to move our OpenMP-based Fortran application from SSE2 to AVX512 under Linux, could someone tell me what I should change in the Makefile and the Fortran source code so that the AVX512 build runs faster than the SSE2 build on our Skylake computers? Thanks in advance.
I look forward to your reply. This is urgent.
Best regards,
Dingjun
This issue is best reported via our Online Service Center at https://supporttickets.intel.com/ for further investigation. We would need to look at the sources, in addition to other information.
Instructions on how to file a ticket are available here:
https://software.intel.com/en-us/articles/how-to-create-a-support-request-at-online-service-center
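Since a ticket usually goes faster with a small reproducer, here is a minimal sketch of a Fortran OpenMP test kernel that could be built once with the SSE2 flags and once with the AVX512 flags from this thread. It is not an Intel-provided benchmark; the program name, array length, and repeat count are assumptions chosen for illustration. The loop is a simple triad, which tends to be limited by memory bandwidth, so it may show little difference between the two builds. That would itself suggest whether the real application is bandwidth-bound rather than compute-bound.

```fortran
! Sketch only: a STREAM-style triad for comparing SSE2 and AVX512 builds.
! The sizes below are illustrative assumptions, not tuned values.
program triad_bench
  use omp_lib
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 20000000    ! array length (assumed; larger than cache)
  integer, parameter :: nrep = 20       ! number of timed repetitions (assumed)
  real(dp), allocatable :: a(:), b(:), c(:)
  real(dp) :: t0, t1, s
  integer :: i, r

  allocate(a(n), b(n), c(n))

  ! Initialise in parallel with the same static schedule as the timed loop,
  ! so each thread first touches the pages it will later work on.
  !$omp parallel do schedule(static)
  do i = 1, n
     a(i) = 0.0_dp
     b(i) = 1.0_dp
     c(i) = 2.0_dp
  end do
  !$omp end parallel do

  s = 3.0_dp
  t0 = omp_get_wtime()
  do r = 1, nrep
     ! Threads split the iterations; each thread's chunk is vectorized.
     !$omp parallel do simd schedule(static)
     do i = 1, n
        a(i) = b(i) + s * c(i)
     end do
     !$omp end parallel do simd
  end do
  t1 = omp_get_wtime()

  print '(a,i0)',    'threads: ', omp_get_max_threads()
  print '(a,f10.4)', 'seconds: ', t1 - t0
  print '(a,f10.4)', 'check a(n): ', a(n)   ! 1.0 + 3.0*2.0
end program triad_bench
```

Building it with each OPENMP_FLAGS line quoted in this thread and running with the same OMP_NUM_THREADS gives two timings to attach to the support request alongside the application numbers.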
