
eric_p_

Beginner


01-13-2017
02:19 AM


Difference Performance Datatype dependency

Hi all, and thanks for your help,

I wrote a simple Fortran program that implements a dot product between two arrays: one is double precision, and the datatype of the other varies.

```fortran
PROGRAM datatype
  USE omp_lib
  implicit none
  double precision, allocatable, dimension(:,:,:) :: A, B, C
  integer(kind=1),  allocatable, dimension(:,:,:) :: D
  integer(kind=4),  allocatable, dimension(:,:,:) :: E
  integer(kind=8),  allocatable, dimension(:,:,:) :: F
  real,             allocatable, dimension(:,:,:) :: G
  LOGICAL,          allocatable, dimension(:,:,:) :: H
  integer :: t, i, j, k, size = 500, repetition = 40
  double precision :: time, time1

  ALLOCATE(A(size,size,size), B(size,size,size), C(size,size,size))
  A = 4.
  B = 1.

  ! double * double
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * b(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME double", time/DBLE(repetition)
  DEALLOCATE(B)

  ! double * real (single precision)
  ALLOCATE(G(size,size,size))
  G = 240.
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * g(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME float", time/DBLE(repetition)
  DEALLOCATE(G)

  ! double * integer(kind=1)
  ALLOCATE(D(size,size,size))
  D = 240   ! note: 240 is outside the integer(kind=1) range -128..127
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * d(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME int8", time/DBLE(repetition)
  DEALLOCATE(D)

  ! double * integer(kind=4)
  ALLOCATE(E(size,size,size))
  e = 240
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * e(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME int32", time/DBLE(repetition)
  DEALLOCATE(E)

  ! double * integer(kind=8)
  ALLOCATE(F(size,size,size))
  f = 240
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * f(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME int64", time/DBLE(repetition)
  DEALLOCATE(F)

  ! double * LOGICAL (non-standard: relies on an ifort extension)
  ALLOCATE(H(size,size,size))
  h = .True.
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * h(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME logical", time/DBLE(repetition)
END PROGRAM
```

I tried this code on a Broadwell Intel(R) Xeon(R) E5-2697 v4 @ 2.30GHz and on an Intel Xeon Phi 7250 (KNL).

Time per repetition in seconds, as printed by the program:

| Second operand | Broadwell (1 core) | KNL (1 core) |
|---|---|---|
| double | 0.314651775360107 | 0.545190346240997 |
| float | 0.256021851301193 | 0.608061379194260 |
| int8 | 0.218752950429916 | 0.749213725328445 |
| int32 | 0.245272749662399 | 0.718595725297928 |
| int64 | 0.319928669929504 | 0.730906349420547 |
| logical | 0.245576351881027 | 0.544638276100159 |

On the Broadwell architecture, the best performance was obtained with **double * int8** and the worst with **double * double** and **double * int64**. I think the better performance with int8 comes from better cache use, which masks the cost of converting from int8 to double. Is that right?
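A back-of-the-envelope check of the cache hypothesis. Each array in the program has 500^3 = 125 million elements, which is about 1 GB for a double-precision array. Nothing stays in cache between sweeps, so the loop is more likely limited by memory traffic than by the int8-to-double conversion. The module below is a minimal sketch that counts bytes moved per iteration. It rests on two assumptions, not measurements: every operand streams from main memory once per sweep, and each store to `c` costs an extra write-allocate read. The module and function names are hypothetical.

```fortran
! Back-of-the-envelope memory-traffic model for the benchmark loop
!   c(k,j,i) = a(k,j,i) * x(k,j,i) + 5.2
! Assumptions (not measured): every array streams from main memory once
! per sweep, and each store to c first reads the destination cache line
! (write-allocate), so c costs two transfers per element.
module traffic_model
  implicit none
  private
  public :: bytes_per_element, array_bytes
contains
  ! Bytes moved per iteration when a and c are double precision and the
  ! second operand x has x_bytes bytes per element.
  pure integer function bytes_per_element(x_bytes) result(nbytes)
    integer, intent(in) :: x_bytes
    integer, parameter :: a_bytes = 8, c_bytes = 8
    ! load a + load x + write-allocate read of c + store c
    nbytes = a_bytes + x_bytes + 2 * c_bytes
  end function bytes_per_element

  ! Footprint in bytes of one n x n x n array of elem_bytes-byte elements.
  pure integer(kind=8) function array_bytes(n, elem_bytes)
    integer, intent(in) :: n, elem_bytes
    array_bytes = int(n, kind=8)**3 * elem_bytes
  end function array_bytes
end module traffic_model
```

For example, `bytes_per_element(storage_size(1_1)/8)` gives 25 bytes for the int8 case, `bytes_per_element(8)` gives 32 for double * double, and `array_bytes(500, 8)` is 10^9 bytes. Under these assumptions the cases fall into three groups: int8 (25 bytes), then the 4-byte types (28), then double and int64 (32). That is roughly the order of the Broadwell timings above. With streaming (non-temporal) stores, the write-allocate term would disappear and the absolute numbers would change.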

What I don't understand is that on KNL the behaviour is the opposite. I analyzed the compiler optimization report. In both cases the double-precision operand determines the vector length, and therefore the number of operations per clock cycle.
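The lane arithmetic behind this is simple. With AVX2, a 256-bit register holds 4 doubles; with AVX-512 (as on KNL), a 512-bit register holds 8. Here is a minimal sketch with hypothetical names. It assumes the compiler converts every narrower operand up to double before the multiply, so a loop mixing double and int8 runs at the double lane count. This is consistent with the `vector length 4` that the `-xAVX2` report posted later in this thread shows for all six compute loops. Lane count alone need not decide the runtime of a loop this large, because that also depends on how fast operands arrive from memory.

```fortran
! Lane-count arithmetic for SIMD registers (sketch; ignores unrolling and
! any extra registers a compiler uses while converting narrow types).
module simd_lanes
  implicit none
contains
  ! Elements of element_bits bits that fit in one register_bits-bit register.
  pure integer function lanes(register_bits, element_bits)
    integer, intent(in) :: register_bits, element_bits
    lanes = register_bits / element_bits
  end function lanes

  ! For a loop that mixes element types, narrower operands are converted up
  ! to the widest type, so the widest element sets the vector length.
  pure integer function mixed_lanes(register_bits, element_bits)
    integer, intent(in) :: register_bits
    integer, intent(in) :: element_bits(:)
    mixed_lanes = register_bits / maxval(element_bits)
  end function mixed_lanes
end module simd_lanes
```

For instance, `lanes(256, 8)` is 32, which matches the `vector length 32` that the same report gives for the loop filling the int8 array. `mixed_lanes(256, [64, 8, 64])` is 4 and `mixed_lanes(512, [64, 8, 64])` is 8.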

Can someone help me understand this behaviour?

Thanks

Best regards

Eric

3 Replies


TimP

Black Belt


01-13-2017
03:53 AM


Those vector aligned directives presumably have no effect when placed after the DO. Did you include -align array64byte -xHost in your compiler option line, and check with -opt-report=4?
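Here is a sketch of the placement described above, assuming ifort's `!DIR$ VECTOR ALIGNED` semantics. The directive goes on the line immediately before the `DO` it should affect, not inside the loop body. It is an assertion rather than a hint: if a column does not in fact start on a vector boundary, the aligned loads it permits can fault. With `-align array64byte` the array base is 64-byte aligned. However, each column of a double-precision array keeps that alignment only if the leading dimension is a multiple of 8 elements, and 500 is not (500 * 8 = 4000 bytes). The module and routine names are hypothetical.

```fortran
! Sketch: the ifort directive !DIR$ VECTOR ALIGNED applies to the DO loop
! that immediately follows it, so it goes on the line before "do k".
! Compilers that do not recognise the directive treat it as a comment.
module directive_placement
  implicit none
contains
  ! c = a*b + 5.2 over 3-D arrays. The directive asserts that every column
  ! a(:,j,i), b(:,j,i), c(:,j,i) starts on a vector boundary; only use it
  ! when that is really true for the caller's arrays.
  subroutine scale_add(a, b, c)
    double precision, intent(in),  contiguous :: a(:,:,:), b(:,:,:)
    double precision, intent(out), contiguous :: c(:,:,:)
    integer :: i, j, k
    do i = 1, size(a, 3)
      do j = 1, size(a, 2)
        !DIR$ VECTOR ALIGNED
        do k = 1, size(a, 1)
          c(k,j,i) = a(k,j,i) * b(k,j,i) + 5.2d0
        end do
      end do
    end do
  end subroutine scale_add
end module directive_placement
```

Whether ifort honours the directive in this position is best confirmed in the `-qopt-report` output for the inner loop.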

The logical promotion to double precision isn't valid Fortran and should be rejected by ifort if you set appropriate options, such as -standard-semantics.
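If the LOGICAL case is meant to treat `.TRUE.` as 1 and `.FALSE.` as 0, the standard `MERGE` intrinsic expresses that portably. It makes the conversion explicit instead of relying on a compiler extension, where the integer value assigned to `.TRUE.` can depend on compiler options. This is a sketch with hypothetical names; whether it vectorizes as well as the original is something to check in the optimization report.

```fortran
! Standard-conforming alternative to multiplying by a LOGICAL array:
! turn each .true./.false. into 1.0d0/0.0d0 explicitly with MERGE.
module logical_operand
  implicit none
contains
  subroutine scale_add_logical(a, h, c)
    double precision, intent(in),  contiguous :: a(:,:,:)
    logical,          intent(in),  contiguous :: h(:,:,:)
    double precision, intent(out), contiguous :: c(:,:,:)
    integer :: i, j, k
    do i = 1, size(a, 3)
      do j = 1, size(a, 2)
        do k = 1, size(a, 1)
          ! merge(tsource, fsource, mask) picks 1.0d0 where h is .true.
          c(k,j,i) = a(k,j,i) * merge(1.0d0, 0.0d0, h(k,j,i)) + 5.2d0
        end do
      end do
    end do
  end subroutine scale_add_logical
end module logical_operand
```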

I don't see how integer(8) could imply different cache behavior from real(8) unless you have alignment issues.
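One way to check for alignment issues is to compute where each inner column starts. This sketch uses hypothetical names and assumes the array base is 64-byte aligned. With `size = 500`, a double-precision column is 4000 bytes, so half of the columns start 32 bytes past a 64-byte boundary. Those columns are still aligned for 256-bit AVX2 vectors but not for 512-bit ones. An int8 column is 500 bytes, so only one column in 16 starts on a 64-byte boundary.

```fortran
! Where does each inner column of an n x n x n array start, relative to an
! align_bytes boundary? Assumes the array base itself is aligned, e.g. by
! ifort's -align array64byte.
module column_alignment
  implicit none
contains
  ! Misalignment in bytes of column (:, j, i); 0 means it is aligned.
  pure integer function column_misalignment(n, elem_bytes, j, i, align_bytes) result(r)
    integer, intent(in) :: n, elem_bytes, j, i, align_bytes
    integer(kind=8) :: offset
    ! element offset of (1, j, i) is (j-1)*n + (i-1)*n*n
    offset = (int(j - 1, kind=8) + int(i - 1, kind=8) * n) * n * elem_bytes
    r = int(mod(offset, int(align_bytes, kind=8)))
  end function column_misalignment
end module column_alignment
```

For example, `column_misalignment(500, 8, 2, 1, 64)` is 32, while `column_misalignment(512, 8, 2, 1, 64)` is 0.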

I think your characterization of this as a dot product is misleading.
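This sketch, with hypothetical names, makes the distinction concrete. The benchmark loop writes a new array `c = a*b + 5.2`, storing one result per element. A dot product instead accumulates `sum(a*b)` into a single scalar and stores no per-element results. Standard Fortran also provides the `DOT_PRODUCT` intrinsic for the latter.

```fortran
! Contrast between the benchmark's element-wise update and a dot product.
module dot_vs_elementwise
  implicit none
contains
  ! What the benchmark loop computes: a new array, one store per element.
  pure function scale_add(a, b) result(c)
    double precision, intent(in) :: a(:), b(:)
    double precision :: c(size(a))
    c = a * b + 5.2d0
  end function scale_add

  ! A dot product: the element-wise products are summed into one scalar.
  pure double precision function dot(a, b)
    double precision, intent(in) :: a(:), b(:)
    integer :: k
    dot = 0.0d0
    do k = 1, size(a)
      dot = dot + a(k) * b(k)
    end do
  end function dot
end module dot_vs_elementwise
```

With `a = [1, 2, 3]` and `b = [4, 5, 6]`, `scale_add` returns three values (9.2, 15.2, 23.2), while `dot` returns the single value 32, the same as `dot_product(a, b)`.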


eric_p_

Beginner


01-13-2017
06:21 AM


Hi Tim,

```
ifort -align array64byte -xAVX2 -qopt-report4 -qopenmp -O1 ./datatype_3d.f90 -o ./dt.x

TIME double 0.333785974979401
TIME float 0.284216976165771
TIME int1 0.262497049570084
TIME int32 0.265798276662827
TIME int64 0.316275727748871
TIME logical 0.246519052982330
```

And with -O2:

```
ifort -align array64byte -xAVX2 -qopt-report4 -qopenmp -O2 ./datatype_3d.f90 -o ./dt.x

TIME double 0.297000402212143
TIME float 0.246595245599747
TIME int1 0.217311549186707
TIME int32 0.247010648250580
TIME int64 0.294282549619675
TIME logical 0.253364574909210
```
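To relate these timings to memory bandwidth, divide the bytes each sweep is assumed to move by the measured time per sweep. This sketch uses hypothetical names, and the bytes-per-element figure is an input, not something measured in this thread. For example, it would be 32 bytes for double * double and 25 for int8 if each store to `c` also incurs a write-allocate read.

```fortran
! Effective bandwidth implied by a measured time per sweep.
module effective_bandwidth
  implicit none
contains
  ! GB/s (1 GB = 1e9 bytes) for one sweep over n**3 elements, each moving
  ! bytes_per_elem bytes (an assumed figure, e.g. from a traffic model).
  pure double precision function gbytes_per_s(n, bytes_per_elem, seconds)
    integer, intent(in) :: n, bytes_per_elem
    double precision, intent(in) :: seconds
    gbytes_per_s = dble(n)**3 * dble(bytes_per_elem) / seconds / 1.0d9
  end function gbytes_per_s
end module effective_bandwidth
```

With the -O2 timings above, this gives roughly 13.5 GB/s for double * double and 14.4 GB/s for int8 on one Broadwell core. Effective bandwidths this close to each other would be consistent with a loop limited by memory traffic rather than by arithmetic or type conversions.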

For -O2, the opt report:

```
Intel(R) Advisor can now assist with vectorization and show optimization
  report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.

Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.1.132 Build 20161005

Compiler options: -align array64byte -xAVX2 -qopt-report4 -qopenmp -O2 -o ./dt.x

    Report from: Interprocedural optimizations [ipo]

  WHOLE PROGRAM (SAFE) [EITHER METHOD]: false
  WHOLE PROGRAM (SEEN) [TABLE METHOD]: false
  WHOLE PROGRAM (READ) [OBJECT READER METHOD]: false

INLINING OPTION VALUES:
  -inline-factor: 100
  -inline-min-size: 30
  -inline-max-size: 230
  -inline-max-total-size: 2000
  -inline-max-per-routine: 10000
  -inline-max-per-compile: 500000

In the inlining report below:
   "sz" refers to the "size" of the routine. The smaller a routine's size,
      the more likely it is to be inlined.
   "isz" refers to the "inlined size" of the routine. This is the amount
      the calling routine will grow if the called routine is inlined into it.
      The compiler generally limits the amount a routine can grow by having
      routines inlined into it.

Begin optimization report for: DATATYPE

    Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (DATATYPE) [1/1=100.0%] ./datatype_3d.f90(1,9)
  -> EXTERN: (1,9) for_set_reentrancy
  -> EXTERN: (17,1) for_alloc_allocatable
  -> EXTERN: (17,1) for_check_mult_overflow64
  -> EXTERN: (17,1) for_alloc_allocatable
  -> EXTERN: (17,1) for_check_mult_overflow64
  -> EXTERN: (17,1) for_alloc_allocatable
  -> EXTERN: (17,1) for_check_mult_overflow64
  -> EXTERN: (23,8) omp_get_wtime
  -> EXTERN: (34,8) omp_get_wtime
  -> EXTERN: (36,1) for_write_seq_lis_xmit
  -> EXTERN: (36,1) for_write_seq_lis
  -> EXTERN: (38,1) for_dealloc_allocatable
  -> EXTERN: (40,1) for_alloc_allocatable
  -> EXTERN: (40,1) for_check_mult_overflow64
  -> EXTERN: (43,8) omp_get_wtime
  -> EXTERN: (54,8) omp_get_wtime
  -> EXTERN: (56,1) for_write_seq_lis_xmit
  -> EXTERN: (56,1) for_write_seq_lis
  -> EXTERN: (59,1) for_dealloc_allocatable
  -> EXTERN: (64,1) for_alloc_allocatable
  -> EXTERN: (64,1) for_check_mult_overflow64
  -> EXTERN: (67,8) omp_get_wtime
  -> EXTERN: (78,8) omp_get_wtime
  -> EXTERN: (80,1) for_write_seq_lis_xmit
  -> EXTERN: (80,1) for_write_seq_lis
  -> EXTERN: (83,1) for_dealloc_allocatable
  -> EXTERN: (85,1) for_alloc_allocatable
  -> EXTERN: (85,1) for_check_mult_overflow64
  -> EXTERN: (88,8) omp_get_wtime
  -> EXTERN: (99,8) omp_get_wtime
  -> EXTERN: (101,1) for_write_seq_lis_xmit
  -> EXTERN: (101,1) for_write_seq_lis
  -> EXTERN: (103,1) for_dealloc_allocatable
  -> EXTERN: (105,1) for_alloc_allocatable
  -> EXTERN: (105,1) for_check_mult_overflow64
  -> EXTERN: (108,8) omp_get_wtime
  -> EXTERN: (119,8) omp_get_wtime
  -> EXTERN: (121,1) for_write_seq_lis_xmit
  -> EXTERN: (121,1) for_write_seq_lis
  -> EXTERN: (123,1) for_dealloc_allocatable
  -> EXTERN: (125,1) for_alloc_allocatable
  -> EXTERN: (125,1) for_check_mult_overflow64
  -> EXTERN: (128,8) omp_get_wtime
  -> EXTERN: (139,8) omp_get_wtime
  -> EXTERN: (141,1) for_write_seq_lis_xmit
  -> EXTERN: (141,1) for_write_seq_lis

    Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at ./datatype_3d.f90(19,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(19,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(19,1)
      <Peeled loop for vectorization>
         remark #25015: Estimate of max trip count of loop=3
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(19,1)
         remark #15388: vectorization support: reference A(:,:,:) has aligned access
         remark #15305: vectorization support: vector length 4
         remark #15399: vectorization support: unroll factor set to 4
         remark #15309: vectorization support: normalized vectorization overhead 0.833
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15449: unmasked aligned unit stride stores: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 3
         remark #15477: vector cost: 0.750
         remark #15478: estimated potential speedup: 3.430
         remark #15488: --- end vector cost summary ---
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(19,1)
      <Remainder loop for vectorization>
         remark #15389: vectorization support: reference A(:,:,:) has unaligned access
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15305: vectorization support: vector length 4
         remark #15309: vectorization support: normalized vectorization overhead 2.600
         remark #15301: REMAINDER LOOP WAS VECTORIZED
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(19,1)
      <Remainder loop for vectorization>
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(20,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(20,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(20,1)
      <Peeled loop for vectorization>
         remark #25015: Estimate of max trip count of loop=3
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(20,1)
         remark #15388: vectorization support: reference B(:,:,:) has aligned access
         remark #15305: vectorization support: vector length 4
         remark #15399: vectorization support: unroll factor set to 4
         remark #15309: vectorization support: normalized vectorization overhead 0.833
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15449: unmasked aligned unit stride stores: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 3
         remark #15477: vector cost: 0.750
         remark #15478: estimated potential speedup: 3.430
         remark #15488: --- end vector cost summary ---
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(20,1)
      <Remainder loop for vectorization>
         remark #15389: vectorization support: reference B(:,:,:) has unaligned access
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15305: vectorization support: vector length 4
         remark #15309: vectorization support: normalized vectorization overhead 2.600
         remark #15301: REMAINDER LOOP WAS VECTORIZED
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(20,1)
      <Remainder loop for vectorization>
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(29,35)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(25,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(26,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(28,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
            remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15399: vectorization support: unroll factor set to 4
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 2
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 8
            remark #15477: vector cost: 1.750
            remark #15478: estimated potential speedup: 4.440
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=31
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(28,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
            remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15427: loop was completely unrolled
            remark #15301: REMAINDER LOOP WAS VECTORIZED
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(41,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(41,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(41,1)
      <Peeled loop for vectorization>
         remark #25015: Estimate of max trip count of loop=7
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(41,1)
         remark #15389: vectorization support: reference G(:,:,:) has unaligned access
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15305: vectorization support: vector length 8
         remark #15399: vectorization support: unroll factor set to 2
         remark #15309: vectorization support: normalized vectorization overhead 1.667
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15451: unmasked unaligned unit stride stores: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 3
         remark #15477: vector cost: 0.370
         remark #15478: estimated potential speedup: 5.840
         remark #15488: --- end vector cost summary ---
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(41,1)
      <Remainder loop for vectorization>
         remark #15389: vectorization support: reference G(:,:,:) has unaligned access
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
         remark #15305: vectorization support: vector length 4
         remark #15309: vectorization support: normalized vectorization overhead 2.167
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(49,15)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(45,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(46,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(48,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
            remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15399: vectorization support: unroll factor set to 4
            remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 2
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 9
            remark #15477: vector cost: 2.000
            remark #15478: estimated potential speedup: 4.370
            remark #15487: type converts: 1
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=31
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(48,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
            remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15427: loop was completely unrolled
            remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
            remark #15301: REMAINDER LOOP WAS VECTORIZED
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(65,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(65,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(65,1)
         remark #25408: memset generated
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(65,1)
            remark #15389: vectorization support: reference D(:,:,:) has unaligned access
            remark #15381: vectorization support: unaligned access used inside loop body
            remark #15305: vectorization support: vector length 32
            remark #15309: vectorization support: normalized vectorization overhead 0.600
            remark #15300: LOOP WAS VECTORIZED
            remark #15451: unmasked unaligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 2
            remark #15477: vector cost: 0.150
            remark #15478: estimated potential speedup: 10.660
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=3
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(65,1)
         <Remainder loop for vectorization>
            remark #25015: Estimate of max trip count of loop=96
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(73,15)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(69,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(70,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(72,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
            remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15399: vectorization support: unroll factor set to 4
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 1
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15450: unmasked unaligned unit stride loads: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 10
            remark #15477: vector cost: 2.000
            remark #15478: estimated potential speedup: 4.840
            remark #15487: type converts: 1
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=31
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(72,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
            remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15427: loop was completely unrolled
            remark #15301: REMAINDER LOOP WAS VECTORIZED
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(86,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(86,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(86,1)
      <Peeled loop for vectorization>
         remark #25015: Estimate of max trip count of loop=7
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(86,1)
         remark #15389: vectorization support: reference E(:,:,:) has unaligned access
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15305: vectorization support: vector length 8
         remark #15309: vectorization support: normalized vectorization overhead 3.333
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15451: unmasked unaligned unit stride stores: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 2
         remark #15477: vector cost: 0.370
         remark #15478: estimated potential speedup: 4.220
         remark #15488: --- end vector cost summary ---
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(86,1)
      <Remainder loop for vectorization>
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(94,15)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(90,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(91,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(93,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
            remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15399: vectorization support: unroll factor set to 4
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 2
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 10
            remark #15477: vector cost: 2.000
            remark #15478: estimated potential speedup: 4.840
            remark #15487: type converts: 1
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=31
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(93,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
            remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15427: loop was completely unrolled
            remark #15301: REMAINDER LOOP WAS VECTORIZED
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(106,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(106,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(106,1)
      <Peeled loop for vectorization>
         remark #25015: Estimate of max trip count of loop=3
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(106,1)
         remark #15389: vectorization support: reference F(:,:,:) has unaligned access
         remark #15381: vectorization support: unaligned access used inside loop body
         remark #15305: vectorization support: vector length 4
         remark #15309: vectorization support: normalized vectorization overhead 3.333
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15451: unmasked unaligned unit stride stores: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 2
         remark #15477: vector cost: 0.750
         remark #15478: estimated potential speedup: 2.500
         remark #15488: --- end vector cost summary ---
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(106,1)
      <Remainder loop for vectorization>
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(114,15)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(110,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(111,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(113,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
            remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
            remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15399: vectorization support: unroll factor set to 4
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 2
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 10
            remark #15477: vector cost: 4.250
            remark #15478: estimated potential speedup: 2.320
            remark #15487: type converts: 1
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=31
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(113,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
            remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
            remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15427: loop was completely unrolled
            remark #15301: REMAINDER LOOP WAS VECTORIZED
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(126,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(126,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(126,1)
         remark #25408: memset generated
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(126,1)
            remark #15389: vectorization support: reference H(:,:,:) has unaligned access
            remark #15381: vectorization support: unaligned access used inside loop body
            remark #15305: vectorization support: vector length 8
            remark #15309: vectorization support: normalized vectorization overhead 0.600
            remark #15300: LOOP WAS VECTORIZED
            remark #15451: unmasked unaligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 2
            remark #15477: vector cost: 0.620
            remark #15478: estimated potential speedup: 2.660
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=3
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(126,1)
         <Remainder loop for vectorization>
            remark #25015: Estimate of max trip count of loop=24
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(134,15)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(130,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(131,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(133,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
            remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15399: vectorization support: unroll factor set to 4
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 2
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 10
            remark #15477: vector cost: 2.000
            remark #15478: estimated potential speedup: 4.840
            remark #15487: type converts: 1
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=31
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(133,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
            remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
            remark #15305: vectorization support: vector length 4
            remark #15427: loop was completely unrolled
            remark #15301: REMAINDER LOOP WAS VECTORIZED
         LOOP END
      LOOP END
   LOOP END
LOOP END

    Report from: Code generation optimizations [cg]

./datatype_3d.f90(65,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
./datatype_3d.f90(65,1):remark #34026: call to memset implemented as a call to optimized library version
./datatype_3d.f90(126,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
./datatype_3d.f90(126,1):remark #34026: call to memset implemented as a call to optimized library version
./datatype_3d.f90(1,9):remark #34051: REGISTER ALLOCATION : [MAIN__] ./datatype_3d.f90:1

    Hardware registers
        Reserved     :    2[ rsp rip]
        Available    :   39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save  :    6[ rbx rbp r12-r15]
        Assigned     :   30[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm15]

    Routine temporaries
        Total         :    1144
            Global    :     390
            Local     :     754
        Regenerable   :     193
        Spilled       :     171

    Routine stack
        Variables     :     276 bytes*
            Reads     :      10 [8.00e+00 ~ 0.0%]
            Writes    :      26 [2.40e+01 ~ 0.0%]
        Spills        :    1328 bytes*
            Reads     :     183 [5.44e+03 ~ 2.8%]
            Writes    :     175 [1.17e+03 ~ 0.6%]

    Notes

        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.

===========================================================================
```

Thanks

Eric


eric_p_

Beginner


01-13-2017
06:58 AM

Hi Tim,

I also tried it on KNL:

ifort -align array64byte -axMIC-AVX512 -qopt-report4 -qopenmp -O2 ./datatype_3d.f90 -o ./dt.x

TIME double 0.277145397663116

TIME float 0.262126672267914

TIME int1 0.287773197889328

TIME int32 0.259244447946548

TIME int64 0.570451277494431

TIME logical 0.262114471197128

The results look more reasonable: float is now faster than double. But in the optimization report, the longest vector length is for the 8-bit integer, even though it is multiplied by a double.
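The vector lengths in the report can be read with simple arithmetic: the most elements one register can hold is the register width divided by the element width. The module below is a minimal sketch of that arithmetic, assuming 512-bit AVX-512 registers (the `-axMIC-AVX512` target); the names `vec_lanes`, `avx512_bits`, `lanes_per_vector` and `mixed_loop_lanes` are hypothetical helpers, not part of the original program. Under that assumption a register holds 8 double-precision values but up to 64 `integer(kind=1)` values. In the report that follows, the vector length 32 belongs to the `D = 240` fill at line 65, which stores only bytes, while the compute loop at line 72 (`c(k,j,i) = a(k,j,i) * d(k,j,i) +5.2`) reports vector length 8, like the other main compute loops that involve the double-precision `a`. The rule in `mixed_loop_lanes`, where the widest element type sets the lane count, is a simplifying assumption that matches those compute loops; the compiler can still pick fewer lanes than the bound, as the 32 (rather than 64) for the byte fill shows.

```fortran
! Hypothetical helper module (not part of the original program).
! Assumption: 512-bit vector registers, as on KNL with -axMIC-AVX512.
module vec_lanes
  implicit none
  integer, parameter :: avx512_bits = 512
contains
  ! Upper bound on lanes for one element type:
  ! register width in bits / element width in bits.
  pure integer function lanes_per_vector(elem_bits, vector_bits)
    integer, intent(in) :: elem_bits, vector_bits
    lanes_per_vector = vector_bits / elem_bits
  end function lanes_per_vector

  ! Simplified model (assumption): a loop mixing two element types is
  ! limited by the wider one, e.g. double * integer(kind=1) -> 8 lanes.
  pure integer function mixed_loop_lanes(bits_a, bits_b, vector_bits)
    integer, intent(in) :: bits_a, bits_b, vector_bits
    mixed_loop_lanes = lanes_per_vector(max(bits_a, bits_b), vector_bits)
  end function mixed_loop_lanes
end module vec_lanes
```

From a driver program, `lanes_per_vector(storage_size(1.0d0), avx512_bits)` gives 8, and `mixed_loop_lanes(storage_size(1.0d0), storage_size(1_1), avx512_bits)` also gives 8. `storage_size` is the Fortran 2008 intrinsic that returns an element's size in bits (8 for `integer(kind=1)` with ifort).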

Intel(R) Advisor can now assist with vectorization and show optimization report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.

Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.1.132 Build 20161005

Compiler options: -align array64byte -axMIC-AVX512 -qopt-report4 -qopenmp -O2 -o ./dt.x

Report from: Interprocedural optimizations [ipo]

WHOLE PROGRAM (SAFE) [EITHER METHOD]: false
WHOLE PROGRAM (SEEN) [TABLE METHOD]: false
WHOLE PROGRAM (READ) [OBJECT READER METHOD]: false

INLINING OPTION VALUES:
-inline-factor: 100
-inline-min-size: 30
-inline-max-size: 230
-inline-max-total-size: 2000
-inline-max-per-routine: 10000
-inline-max-per-compile: 500000

In the inlining report below:
"sz" refers to the "size" of the routine. The smaller a routine's size, the more likely it is to be inlined.
"isz" refers to the "inlined size" of the routine. This is the amount the calling routine will grow if the called routine is inlined into it. The compiler generally limits the amount a routine can grow by having routines inlined into it.
Begin optimization report for: DATATYPE

Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (DATATYPE) [1/1=100.0%] ./datatype_3d.f90(1,9)
-> EXTERN: (1,9) for_set_reentrancy
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (23,8) omp_get_wtime
-> EXTERN: (34,8) omp_get_wtime
-> EXTERN: (36,1) for_write_seq_lis_xmit
-> EXTERN: (36,1) for_write_seq_lis
-> EXTERN: (38,1) for_dealloc_allocatable
-> EXTERN: (40,1) for_alloc_allocatable
-> EXTERN: (40,1) for_check_mult_overflow64
-> EXTERN: (43,8) omp_get_wtime
-> EXTERN: (54,8) omp_get_wtime
-> EXTERN: (56,1) for_write_seq_lis_xmit
-> EXTERN: (56,1) for_write_seq_lis
-> EXTERN: (59,1) for_dealloc_allocatable
-> EXTERN: (64,1) for_alloc_allocatable
-> EXTERN: (64,1) for_check_mult_overflow64
-> EXTERN: (67,8) omp_get_wtime
-> EXTERN: (78,8) omp_get_wtime
-> EXTERN: (80,1) for_write_seq_lis_xmit
-> EXTERN: (80,1) for_write_seq_lis
-> EXTERN: (83,1) for_dealloc_allocatable
-> EXTERN: (85,1) for_alloc_allocatable
-> EXTERN: (85,1) for_check_mult_overflow64
-> EXTERN: (88,8) omp_get_wtime
-> EXTERN: (99,8) omp_get_wtime
-> EXTERN: (101,1) for_write_seq_lis_xmit
-> EXTERN: (101,1) for_write_seq_lis
-> EXTERN: (103,1) for_dealloc_allocatable
-> EXTERN: (105,1) for_alloc_allocatable
-> EXTERN: (105,1) for_check_mult_overflow64
-> EXTERN: (108,8) omp_get_wtime
-> EXTERN: (119,8) omp_get_wtime
-> EXTERN: (121,1) for_write_seq_lis_xmit
-> EXTERN: (121,1) for_write_seq_lis
-> EXTERN: (123,1) for_dealloc_allocatable
-> EXTERN: (125,1) for_alloc_allocatable
-> EXTERN: (125,1) for_check_mult_overflow64
-> EXTERN: (128,8) omp_get_wtime
-> EXTERN: (139,8) omp_get_wtime
-> EXTERN: (141,1) for_write_seq_lis_xmit
-> EXTERN: (141,1) for_write_seq_lis

Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference A(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15388: vectorization support: reference A(:,:,:) has aligned access
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 6.920
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference A(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference B(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15388: vectorization support: reference B(:,:,:) has aligned access
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 6.920
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference B(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(29,35)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(25,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(26,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(28,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 8
remark #15477: vector cost: 0.870
remark #15478: estimated potential speedup: 8.920
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(28,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(41,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #15301: PEEL LOOP WAS VECTORIZED
remark #25015: Estimate of max trip count of loop=1
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 16
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.180
remark #15478: estimated potential speedup: 11.840
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(49,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(45,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(46,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(48,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 9
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 8.780
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(48,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #25408: memset generated
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference D(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 0.058
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #15388: vectorization support: reference D(:,:,:) has aligned access
remark #15305: vectorization support: vector length 32
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.090
remark #15478: estimated potential speedup: 6.850
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference D(:,:,:) has aligned access
remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 0.058
remark #25015: Estimate of max trip count of loop=96
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(73,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(69,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(70,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(72,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 9.760
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(72,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(86,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference E(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 0.938
remark #25015: Estimate of max trip count of loop=15
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #15389: vectorization support: reference E(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.180
remark #15478: estimated potential speedup: 8.210
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference E(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(94,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(90,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(91,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(93,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 9.760
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(93,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(106,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference F(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #15389: vectorization support: reference F(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 4.610
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference F(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(114,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(110,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(111,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(113,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 2.120
remark #15478: estimated potential speedup: 4.590
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(113,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #25408: memset generated
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #15389: vectorization support: reference H(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 0.600
remark #15300: LOOP WAS VECTORIZED
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.310
remark #15478: estimated potential speedup: 2.660
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=1
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference H(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 0.938
remark #25015: Estimate of max trip count of loop=24
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(134,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(130,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(131,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(133,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 9.760
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(133,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END

Report from: Code generation optimizations [cg]

./datatype_3d.f90(1,9):remark #34051: REGISTER ALLOCATION : [MAIN__] ./datatype_3d.f90:1

Hardware registers
Reserved : 2[ rsp rip]
Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
Callee-save : 6[ rbx rbp r12-r15]
Assigned : 2[ rax rdx]

Routine temporaries
Total : 9
Global : 7
Local : 2
Regenerable : 1
Spilled : 0

Routine stack
Variables : 0 bytes*
Reads : 0 [0.00e+00 ~ -nan%]
Writes : 0 [0.00e+00 ~ -nan%]
Spills : 0 bytes*
Reads : 0 [0.00e+00 ~ -nan%]
Writes : 0 [0.00e+00 ~ -nan%]

Notes
*Non-overlapping variables and spills may share stack space, so the total stack size might be less than this.

===========================================================================

Begin optimization report for: DATATYPE [future_cpu_22]

Report from: Interprocedural optimizations [ipo]

INLINE REPORT: (DATATYPE) [1/1=100.0%] ./datatype_3d.f90(1,9)
-> EXTERN: (1,9) for_set_reentrancy
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (23,8) omp_get_wtime
-> EXTERN: (34,8) omp_get_wtime
-> EXTERN: (36,1) for_write_seq_lis_xmit
-> EXTERN: (36,1) for_write_seq_lis
-> EXTERN: (38,1) for_dealloc_allocatable
-> EXTERN: (40,1) for_alloc_allocatable
-> EXTERN: (40,1) for_check_mult_overflow64
-> EXTERN: (43,8) omp_get_wtime
-> EXTERN: (54,8) omp_get_wtime
-> EXTERN: (56,1) for_write_seq_lis_xmit
-> EXTERN: (56,1) for_write_seq_lis
-> EXTERN: (59,1) for_dealloc_allocatable
-> EXTERN: (64,1) for_alloc_allocatable
-> EXTERN: (64,1) for_check_mult_overflow64
-> EXTERN: (67,8) omp_get_wtime
-> EXTERN: (78,8) omp_get_wtime
-> EXTERN: (80,1) for_write_seq_lis_xmit
-> EXTERN: (80,1) for_write_seq_lis
-> EXTERN: (83,1) for_dealloc_allocatable
-> EXTERN: (85,1) for_alloc_allocatable
-> EXTERN: (85,1) for_check_mult_overflow64
-> EXTERN: (88,8) omp_get_wtime
-> EXTERN: (99,8) omp_get_wtime
-> EXTERN: (101,1) for_write_seq_lis_xmit
-> EXTERN: (101,1) for_write_seq_lis
-> EXTERN: (103,1) for_dealloc_allocatable
-> EXTERN: (105,1) for_alloc_allocatable
-> EXTERN: (105,1) for_check_mult_overflow64
-> EXTERN: (108,8) omp_get_wtime
-> EXTERN: (119,8) omp_get_wtime
-> EXTERN: (121,1) for_write_seq_lis_xmit
-> EXTERN: (121,1) for_write_seq_lis
-> EXTERN: (123,1) for_dealloc_allocatable
-> EXTERN: (125,1) for_alloc_allocatable
-> EXTERN: (125,1) for_check_mult_overflow64
-> EXTERN: (128,8) omp_get_wtime
-> EXTERN: (139,8) omp_get_wtime
-> EXTERN: (141,1) for_write_seq_lis_xmit
-> EXTERN: (141,1) for_write_seq_lis

Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference A(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15388: vectorization support: reference A(:,:,:) has aligned access
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 6.920
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference A(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference B(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15388: vectorization support: reference B(:,:,:) has aligned access
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 6.920
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference B(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(29,35)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(25,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(26,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(28,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388:
vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ] remark #15305: vectorization support: vector length 8 remark #15399: vectorization support: unroll factor set to 2 remark #15300: LOOP WAS VECTORIZED remark #15448: unmasked aligned unit stride loads: 2 remark #15449: unmasked aligned unit stride stores: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 8 remark #15477: vector cost: 0.870 remark #15478: estimated potential speedup: 8.920 remark #15488: --- end vector cost summary --- remark #25015: Estimate of max trip count of loop=31 LOOP END LOOP BEGIN at ./datatype_3d.f90(28,4) <Remainder loop for vectorization> remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ] remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ] remark #15305: vectorization support: vector length 4 remark #15427: loop was completely unrolled remark #15301: REMAINDER LOOP WAS VECTORIZED LOOP END LOOP END LOOP END LOOP END LOOP BEGIN at ./datatype_3d.f90(41,1) remark #25101: Loop Interchange not done due to: Original Order seems proper remark #25452: Original Order found to be proper, but by a close margin remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(41,1) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(41,1) <Peeled loop for vectorization> remark #15389: vectorization support: reference G(:,:,:) has unaligned access remark #15381: vectorization support: unaligned access used inside loop body remark #15305: vectorization support: vector length 8 remark #15309: vectorization support: normalized vectorization overhead 1.250 remark #15301: PEEL LOOP WAS VECTORIZED remark #25015: Estimate of max trip count of 
loop=1 LOOP END LOOP BEGIN at ./datatype_3d.f90(41,1) remark #15389: vectorization support: reference G(:,:,:) has unaligned access remark #15381: vectorization support: unaligned access used inside loop body remark #15305: vectorization support: vector length 16 remark #15399: vectorization support: unroll factor set to 2 remark #15309: vectorization support: normalized vectorization overhead 1.667 remark #15300: LOOP WAS VECTORIZED remark #15442: entire loop may be executed in remainder remark #15451: unmasked unaligned unit stride stores: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 3 remark #15477: vector cost: 0.180 remark #15478: estimated potential speedup: 11.840 remark #15488: --- end vector cost summary --- LOOP END LOOP BEGIN at ./datatype_3d.f90(41,1) <Remainder loop for vectorization> remark #15389: vectorization support: reference G(:,:,:) has unaligned access remark #15381: vectorization support: unaligned access used inside loop body remark #15305: vectorization support: vector length 8 remark #15309: vectorization support: normalized vectorization overhead 1.250 remark #15301: REMAINDER LOOP WAS VECTORIZED LOOP END LOOP END LOOP END LOOP BEGIN at ./datatype_3d.f90(49,15) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(45,2) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(46,3) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(48,4) remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ] remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ] remark #15305: vectorization support: vector length 8 remark #15399: vectorization support: unroll factor 
set to 2 remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ] remark #15300: LOOP WAS VECTORIZED remark #15448: unmasked aligned unit stride loads: 2 remark #15449: unmasked aligned unit stride stores: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 9 remark #15477: vector cost: 1.000 remark #15478: estimated potential speedup: 8.780 remark #15487: type converts: 1 remark #15488: --- end vector cost summary --- remark #25015: Estimate of max trip count of loop=31 LOOP END LOOP BEGIN at ./datatype_3d.f90(48,4) <Remainder loop for vectorization> remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ] remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ] remark #15305: vectorization support: vector length 4 remark #15427: loop was completely unrolled remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ] remark #15301: REMAINDER LOOP WAS VECTORIZED LOOP END LOOP END LOOP END LOOP END LOOP BEGIN at ./datatype_3d.f90(65,1) remark #25101: Loop Interchange not done due to: Original Order seems proper remark #25452: Original Order found to be proper, but by a close margin remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(65,1) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(65,1) remark #25408: memset generated remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(65,1) <Peeled loop for vectorization> remark #15389: vectorization support: reference D(:,:,:) has unaligned access remark #15381: vectorization support: unaligned 
access used inside loop body remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override remark #15305: vectorization support: vector length 16 remark #15309: vectorization support: normalized vectorization overhead 0.058 remark #25015: Estimate of max trip count of loop=31 LOOP END LOOP BEGIN at ./datatype_3d.f90(65,1) remark #15388: vectorization support: reference D(:,:,:) has aligned access remark #15305: vectorization support: vector length 32 remark #15309: vectorization support: normalized vectorization overhead 3.333 remark #15300: LOOP WAS VECTORIZED remark #15449: unmasked aligned unit stride stores: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 2 remark #15477: vector cost: 0.090 remark #15478: estimated potential speedup: 6.850 remark #15488: --- end vector cost summary --- remark #25015: Estimate of max trip count of loop=3 LOOP END LOOP BEGIN at ./datatype_3d.f90(65,1) <Remainder loop for vectorization> remark #15388: vectorization support: reference D(:,:,:) has aligned access remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. 
Use vector always directive or -vec-threshold0 to override remark #15305: vectorization support: vector length 16 remark #15309: vectorization support: normalized vectorization overhead 0.058 remark #25015: Estimate of max trip count of loop=96 LOOP END LOOP END LOOP END LOOP END LOOP BEGIN at ./datatype_3d.f90(73,15) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(69,2) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(70,3) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(72,4) remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ] remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ] remark #15305: vectorization support: vector length 8 remark #15399: vectorization support: unroll factor set to 2 remark #15300: LOOP WAS VECTORIZED remark #15448: unmasked aligned unit stride loads: 1 remark #15449: unmasked aligned unit stride stores: 1 remark #15450: unmasked unaligned unit stride loads: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 10 remark #15477: vector cost: 1.000 remark #15478: estimated potential speedup: 9.760 remark #15487: type converts: 1 remark #15488: --- end vector cost summary --- remark #25015: Estimate of max trip count of loop=31 LOOP END LOOP BEGIN at ./datatype_3d.f90(72,4) <Remainder loop for vectorization> remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ] remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ] remark #15305: 
vectorization support: vector length 4 remark #15427: loop was completely unrolled remark #15301: REMAINDER LOOP WAS VECTORIZED LOOP END LOOP END LOOP END LOOP END LOOP BEGIN at ./datatype_3d.f90(86,1) remark #25101: Loop Interchange not done due to: Original Order seems proper remark #25452: Original Order found to be proper, but by a close margin remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(86,1) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(86,1) <Peeled loop for vectorization> remark #15389: vectorization support: reference E(:,:,:) has unaligned access remark #15381: vectorization support: unaligned access used inside loop body remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override remark #15305: vectorization support: vector length 4 remark #15309: vectorization support: normalized vectorization overhead 0.938 remark #25015: Estimate of max trip count of loop=15 LOOP END LOOP BEGIN at ./datatype_3d.f90(86,1) remark #15389: vectorization support: reference E(:,:,:) has unaligned access remark #15381: vectorization support: unaligned access used inside loop body remark #15305: vectorization support: vector length 16 remark #15309: vectorization support: normalized vectorization overhead 3.333 remark #15300: LOOP WAS VECTORIZED remark #15442: entire loop may be executed in remainder remark #15451: unmasked unaligned unit stride stores: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 2 remark #15477: vector cost: 0.180 remark #15478: estimated potential speedup: 8.210 remark #15488: --- end vector cost summary --- LOOP END LOOP BEGIN at ./datatype_3d.f90(86,1) <Remainder loop for vectorization> remark #15389: vectorization support: reference E(:,:,:) has unaligned access remark #15381: vectorization support: unaligned 
access used inside loop body remark #15305: vectorization support: vector length 8 remark #15309: vectorization support: normalized vectorization overhead 1.250 remark #15301: REMAINDER LOOP WAS VECTORIZED LOOP END LOOP END LOOP END LOOP BEGIN at ./datatype_3d.f90(94,15) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(90,2) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(91,3) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(93,4) remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ] remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ] remark #15305: vectorization support: vector length 8 remark #15399: vectorization support: unroll factor set to 2 remark #15300: LOOP WAS VECTORIZED remark #15448: unmasked aligned unit stride loads: 2 remark #15449: unmasked aligned unit stride stores: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 10 remark #15477: vector cost: 1.000 remark #15478: estimated potential speedup: 9.760 remark #15487: type converts: 1 remark #15488: --- end vector cost summary --- remark #25015: Estimate of max trip count of loop=31 LOOP END LOOP BEGIN at ./datatype_3d.f90(93,4) <Remainder loop for vectorization> remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ] remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ] remark #15305: vectorization support: vector length 4 remark #15427: loop was completely unrolled remark #15301: REMAINDER LOOP 
WAS VECTORIZED LOOP END LOOP END LOOP END LOOP END LOOP BEGIN at ./datatype_3d.f90(106,1) remark #25101: Loop Interchange not done due to: Original Order seems proper remark #25452: Original Order found to be proper, but by a close margin remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(106,1) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(106,1) <Peeled loop for vectorization> remark #15389: vectorization support: reference F(:,:,:) has unaligned access remark #15381: vectorization support: unaligned access used inside loop body remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override remark #15305: vectorization support: vector length 2 remark #15309: vectorization support: normalized vectorization overhead 1.250 remark #25015: Estimate of max trip count of loop=7 LOOP END LOOP BEGIN at ./datatype_3d.f90(106,1) remark #15389: vectorization support: reference F(:,:,:) has unaligned access remark #15381: vectorization support: unaligned access used inside loop body remark #15305: vectorization support: vector length 8 remark #15309: vectorization support: normalized vectorization overhead 3.333 remark #15300: LOOP WAS VECTORIZED remark #15442: entire loop may be executed in remainder remark #15451: unmasked unaligned unit stride stores: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 2 remark #15477: vector cost: 0.370 remark #15478: estimated potential speedup: 4.610 remark #15488: --- end vector cost summary --- LOOP END LOOP BEGIN at ./datatype_3d.f90(106,1) <Remainder loop for vectorization> remark #15389: vectorization support: reference F(:,:,:) has unaligned access remark #15381: vectorization support: unaligned access used inside loop body remark #15305: vectorization support: vector length 8 remark #15309: vectorization 
support: normalized vectorization overhead 1.364 remark #15301: REMAINDER LOOP WAS VECTORIZED LOOP END LOOP END LOOP END LOOP BEGIN at ./datatype_3d.f90(114,15) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(110,2) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(111,3) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(113,4) remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ] remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ] remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ] remark #15305: vectorization support: vector length 8 remark #15399: vectorization support: unroll factor set to 2 remark #15300: LOOP WAS VECTORIZED remark #15448: unmasked aligned unit stride loads: 2 remark #15449: unmasked aligned unit stride stores: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 10 remark #15477: vector cost: 2.120 remark #15478: estimated potential speedup: 4.590 remark #15487: type converts: 1 remark #15488: --- end vector cost summary --- remark #25015: Estimate of max trip count of loop=31 LOOP END LOOP BEGIN at ./datatype_3d.f90(113,4) <Remainder loop for vectorization> remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ] remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ] remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ] 
remark #15305: vectorization support: vector length 4 remark #15427: loop was completely unrolled remark #15301: REMAINDER LOOP WAS VECTORIZED LOOP END LOOP END LOOP END LOOP END LOOP BEGIN at ./datatype_3d.f90(126,1) remark #25101: Loop Interchange not done due to: Original Order seems proper remark #25452: Original Order found to be proper, but by a close margin remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(126,1) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(126,1) remark #25408: memset generated remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(126,1) remark #15389: vectorization support: reference H(:,:,:) has unaligned access remark #15381: vectorization support: unaligned access used inside loop body remark #15305: vectorization support: vector length 16 remark #15309: vectorization support: normalized vectorization overhead 0.600 remark #15300: LOOP WAS VECTORIZED remark #15451: unmasked unaligned unit stride stores: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 2 remark #15477: vector cost: 0.310 remark #15478: estimated potential speedup: 2.660 remark #15488: --- end vector cost summary --- remark #25015: Estimate of max trip count of loop=1 LOOP END LOOP BEGIN at ./datatype_3d.f90(126,1) <Remainder loop for vectorization> remark #15389: vectorization support: reference H(:,:,:) has unaligned access remark #15381: vectorization support: unaligned access used inside loop body remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. 
Use vector always directive or -vec-threshold0 to override remark #15305: vectorization support: vector length 4 remark #15309: vectorization support: normalized vectorization overhead 0.938 remark #25015: Estimate of max trip count of loop=24 LOOP END LOOP END LOOP END LOOP END LOOP BEGIN at ./datatype_3d.f90(134,15) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(130,2) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(131,3) remark #15542: loop was not vectorized: inner loop was already vectorized LOOP BEGIN at ./datatype_3d.f90(133,4) remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ] remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ] remark #15305: vectorization support: vector length 8 remark #15399: vectorization support: unroll factor set to 2 remark #15300: LOOP WAS VECTORIZED remark #15448: unmasked aligned unit stride loads: 2 remark #15449: unmasked aligned unit stride stores: 1 remark #15475: --- begin vector cost summary --- remark #15476: scalar cost: 10 remark #15477: vector cost: 1.000 remark #15478: estimated potential speedup: 9.760 remark #15487: type converts: 1 remark #15488: --- end vector cost summary --- remark #25015: Estimate of max trip count of loop=31 LOOP END LOOP BEGIN at ./datatype_3d.f90(133,4) <Remainder loop for vectorization> remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ] remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ] remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ] remark #15305: vectorization support: vector length 4 remark #15427: 
loop was completely unrolled remark #15301: REMAINDER LOOP WAS VECTORIZED LOOP END LOOP END LOOP END LOOP END Report from: Code generation optimizations [cg] ./datatype_3d.f90(65,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation ./datatype_3d.f90(65,1):remark #34026: call to memset implemented as a call to optimized library version ./datatype_3d.f90(126,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation ./datatype_3d.f90(126,1):remark #34026: call to memset implemented as a call to optimized library version ./datatype_3d.f90(1,9):remark #34051: REGISTER ALLOCATION : [MAIN__.Z] ./datatype_3d.f90:1 Hardware registers Reserved : 2[ rsp rip] Available : 63[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm31 k0-k7] Callee-save : 6[ rbx rbp r12-r15] Assigned : 49[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm31 k1-k3] Routine temporaries Total : 1283 Global : 448 Local : 835 Regenerable : 212 Spilled : 179 Routine stack Variables : 276 bytes* Reads : 10 [8.00e+00 ~ 0.0%] Writes : 26 [2.40e+01 ~ 0.0%] Spills : 1432 bytes* Reads : 223 [6.46e+03 ~ 3.1%] Writes : 195 [2.18e+03 ~ 1.0%] Notes *Non-overlapping variables and spills may share stack space, so the total stack size might be less than this. 
=========================================================================== Begin optimization report for: DATATYPE [generic] Report from: Interprocedural optimizations [ipo] INLINE REPORT: (DATATYPE) [1/1=100.0%] ./datatype_3d.f90(1,9) -> EXTERN: (1,9) for_set_reentrancy -> EXTERN: (17,1) for_alloc_allocatable -> EXTERN: (17,1) for_check_mult_overflow64 -> EXTERN: (17,1) for_alloc_allocatable -> EXTERN: (17,1) for_check_mult_overflow64 -> EXTERN: (17,1) for_alloc_allocatable -> EXTERN: (17,1) for_check_mult_overflow64 -> EXTERN: (23,8) omp_get_wtime -> EXTERN: (34,8) omp_get_wtime -> EXTERN: (36,1) for_write_seq_lis_xmit -> EXTERN: (36,1) for_write_seq_lis -> EXTERN: (38,1) for_dealloc_allocatable -> EXTERN: (40,1) for_alloc_allocatable -> EXTERN: (40,1) for_check_mult_overflow64 -> EXTERN: (43,8) omp_get_wtime -> EXTERN: (54,8) omp_get_wtime -> EXTERN: (56,1) for_write_seq_lis_xmit -> EXTERN: (56,1) for_write_seq_lis -> EXTERN: (59,1) for_dealloc_allocatable -> EXTERN: (64,1) for_alloc_allocatable -> EXTERN: (64,1) for_check_mult_overflow64 -> EXTERN: (67,8) omp_get_wtime -> EXTERN: (78,8) omp_get_wtime -> EXTERN: (80,1) for_write_seq_lis_xmit -> EXTERN: (80,1) for_write_seq_lis -> EXTERN: (83,1) for_dealloc_allocatable -> EXTERN: (85,1) for_alloc_allocatable -> EXTERN: (85,1) for_check_mult_overflow64 -> EXTERN: (88,8) omp_get_wtime -> EXTERN: (99,8) omp_get_wtime -> EXTERN: (101,1) for_write_seq_lis_xmit -> EXTERN: (101,1) for_write_seq_lis -> EXTERN: (103,1) for_dealloc_allocatable -> EXTERN: (105,1) for_alloc_allocatable -> EXTERN: (105,1) for_check_mult_overflow64 -> EXTERN: (108,8) omp_get_wtime -> EXTERN: (119,8) omp_get_wtime -> EXTERN: (121,1) for_write_seq_lis_xmit -> EXTERN: (121,1) for_write_seq_lis -> EXTERN: (123,1) for_dealloc_allocatable -> EXTERN: (125,1) for_alloc_allocatable -> EXTERN: (125,1) for_check_mult_overflow64 -> EXTERN: (128,8) omp_get_wtime -> EXTERN: (139,8) omp_get_wtime -> EXTERN: (141,1) for_write_seq_lis_xmit -> EXTERN: (141,1) 
for_write_seq_lis

Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

LOOP BEGIN at ./datatype_3d.f90(19,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(19,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(19,1)
      <Peeled loop for vectorization>
         remark #25015: Estimate of max trip count of loop=1
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(19,1)
         remark #15388: vectorization support: reference A(:,:,:) has aligned access
         remark #15305: vectorization support: vector length 2
         remark #15399: vectorization support: unroll factor set to 4
         remark #15309: vectorization support: normalized vectorization overhead 0.833
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15449: unmasked aligned unit stride stores: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 4
         remark #15477: vector cost: 1.500
         remark #15478: estimated potential speedup: 2.550
         remark #15488: --- end vector cost summary ---
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(19,1)
      <Remainder loop for vectorization>
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(20,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(20,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(20,1)
      <Peeled loop for vectorization>
         remark #25015: Estimate of max trip count of loop=1
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(20,1)
         remark #15388: vectorization support: reference B(:,:,:) has aligned access
         remark #15305: vectorization support: vector length 2
         remark #15399: vectorization support: unroll factor set to 4
         remark #15309: vectorization support: normalized vectorization overhead 0.833
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15449: unmasked aligned unit stride stores: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 4
         remark #15477: vector cost: 1.500
         remark #15478: estimated potential speedup: 2.550
         remark #15488: --- end vector cost summary ---
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(20,1)
      <Remainder loop for vectorization>
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(29,35)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(25,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(26,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(28,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
            remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
            remark #15305: vectorization support: vector length 2
            remark #15399: vectorization support: unroll factor set to 4
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 2
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 9
            remark #15477: vector cost: 4.000
            remark #15478: estimated potential speedup: 2.220
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=62
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(28,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
            remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
            remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
            remark #15305: vectorization support: vector length 2
            remark #15309: vectorization support: normalized vectorization overhead 0.714
            remark #25436: completely unrolled by 4
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(41,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(41,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(41,1)
      <Peeled loop for vectorization>
         remark #25015: Estimate of max trip count of loop=3
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(41,1)
         remark #15388: vectorization support: reference G(:,:,:) has aligned access
         remark #15305: vectorization support: vector length 4
         remark #15399: vectorization support: unroll factor set to 2
         remark #15309: vectorization support: normalized vectorization overhead 1.667
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15449: unmasked aligned unit stride stores: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 3
         remark #15477: vector cost: 0.750
         remark #15478: estimated potential speedup: 3.680
         remark #15488: --- end vector cost summary ---
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(41,1)
      <Remainder loop for vectorization>
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(49,15)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(45,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(46,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(48,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
            remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
            remark #15305: vectorization support: vector length 2
            remark #15399: vectorization support: unroll factor set to 4
            remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 1
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15450: unmasked unaligned unit stride loads: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 10
            remark #15477: vector cost: 4.500
            remark #15478: estimated potential speedup: 2.200
            remark #15487: type converts: 1
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=62
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(48,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
            remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
            remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
            remark #15305: vectorization support: vector length 2
            remark #15309: vectorization support: normalized vectorization overhead 0.667
            remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
            remark #25436: completely unrolled by 4
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(65,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(65,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(65,1)
         remark #25408: memset generated
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(65,1)
            remark #15389: vectorization support: reference D(:,:,:) has unaligned access
            remark #15381: vectorization support: unaligned access used inside loop body
            remark #15305: vectorization support: vector length 16
            remark #15309: vectorization support: normalized vectorization overhead 0.500
            remark #15300: LOOP WAS VECTORIZED
            remark #15451: unmasked unaligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 2
            remark #15477: vector cost: 0.370
            remark #15478: estimated potential speedup: 4.920
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=6
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(65,1)
         <Remainder loop for vectorization>
            remark #25015: Estimate of max trip count of loop=96
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(73,15)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(69,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(70,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(72,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
            remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
            remark #15305: vectorization support: vector length 2
            remark #15399: vectorization support: unroll factor set to 4
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 1
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15450: unmasked unaligned unit stride loads: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 11
            remark #15477: vector cost: 4.500
            remark #15478: estimated potential speedup: 2.410
            remark #15487: type converts: 1
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=62
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(72,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
            remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
            remark #15305: vectorization support: vector length 2
            remark #15309: vectorization support: normalized vectorization overhead 0.667
            remark #15301: REMAINDER LOOP WAS VECTORIZED
            remark #25015: Estimate of max trip count of loop=2
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(86,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(86,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(86,1)
      <Peeled loop for vectorization>
         remark #25015: Estimate of max trip count of loop=3
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(86,1)
         remark #15388: vectorization support: reference E(:,:,:) has aligned access
         remark #15305: vectorization support: vector length 4
         remark #15309: vectorization support: normalized vectorization overhead 3.333
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15449: unmasked aligned unit stride stores: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 2
         remark #15477: vector cost: 0.750
         remark #15478: estimated potential speedup: 2.500
         remark #15488: --- end vector cost summary ---
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(86,1)
      <Remainder loop for vectorization>
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(94,15)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(90,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(91,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(93,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
            remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
            remark #15305: vectorization support: vector length 2
            remark #15399: vectorization support: unroll factor set to 4
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 1
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15450: unmasked unaligned unit stride loads: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 11
            remark #15477: vector cost: 4.500
            remark #15478: estimated potential speedup: 2.410
            remark #15487: type converts: 1
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=62
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(93,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
            remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
            remark #15305: vectorization support: vector length 2
            remark #15309: vectorization support: normalized vectorization overhead 0.667
            remark #15301: REMAINDER LOOP WAS VECTORIZED
            remark #25015: Estimate of max trip count of loop=2
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(106,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(106,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(106,1)
      <Peeled loop for vectorization>
         remark #25015: Estimate of max trip count of loop=1
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(106,1)
         remark #15388: vectorization support: reference F(:,:,:) has aligned access
         remark #15305: vectorization support: vector length 2
         remark #15399: vectorization support: unroll factor set to 4
         remark #15309: vectorization support: normalized vectorization overhead 0.833
         remark #15300: LOOP WAS VECTORIZED
         remark #15442: entire loop may be executed in remainder
         remark #15449: unmasked aligned unit stride stores: 1
         remark #15475: --- begin vector cost summary ---
         remark #15476: scalar cost: 2
         remark #15477: vector cost: 1.500
         remark #15478: estimated potential speedup: 1.290
         remark #15488: --- end vector cost summary ---
      LOOP END
      LOOP BEGIN at ./datatype_3d.f90(106,1)
      <Remainder loop for vectorization>
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(114,15)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(110,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(111,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(113,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
            remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
            remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
            remark #15305: vectorization support: vector length 2
            remark #15399: vectorization support: unroll factor set to 4
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 2
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 11
            remark #15477: vector cost: 9.000
            remark #15478: estimated potential speedup: 1.220
            remark #15487: type converts: 1
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=62
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(113,4)
         <Remainder loop for vectorization>
            remark #25436: completely unrolled by 4
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(126,1)
   remark #25101: Loop Interchange not done due to: Original Order seems proper
   remark #25452: Original Order found to be proper, but by a close margin
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(126,1)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(126,1)
         remark #25408: memset generated
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(126,1)
         <Peeled loop for vectorization>
            remark #25015: Estimate of max trip count of loop=3
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(126,1)
            remark #15388: vectorization support: reference H(:,:,:) has aligned access
            remark #15305: vectorization support: vector length 4
            remark #15309: vectorization support: normalized vectorization overhead 3.333
            remark #15300: LOOP WAS VECTORIZED
            remark #15442: entire loop may be executed in remainder
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 2
            remark #15477: vector cost: 0.750
            remark #15478: estimated potential speedup: 1.450
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=6
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(126,1)
         <Remainder loop for vectorization>
            remark #25015: Estimate of max trip count of loop=24
         LOOP END
      LOOP END
   LOOP END
LOOP END

LOOP BEGIN at ./datatype_3d.f90(134,15)
   remark #15542: loop was not vectorized: inner loop was already vectorized
   LOOP BEGIN at ./datatype_3d.f90(130,2)
      remark #15542: loop was not vectorized: inner loop was already vectorized
      LOOP BEGIN at ./datatype_3d.f90(131,3)
         remark #15542: loop was not vectorized: inner loop was already vectorized
         LOOP BEGIN at ./datatype_3d.f90(133,4)
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
            remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
            remark #15305: vectorization support: vector length 2
            remark #15399: vectorization support: unroll factor set to 4
            remark #15300: LOOP WAS VECTORIZED
            remark #15448: unmasked aligned unit stride loads: 1
            remark #15449: unmasked aligned unit stride stores: 1
            remark #15450: unmasked unaligned unit stride loads: 1
            remark #15475: --- begin vector cost summary ---
            remark #15476: scalar cost: 11
            remark #15477: vector cost: 4.500
            remark #15478: estimated potential speedup: 2.410
            remark #15487: type converts: 1
            remark #15488: --- end vector cost summary ---
            remark #25015: Estimate of max trip count of loop=62
         LOOP END
         LOOP BEGIN at ./datatype_3d.f90(133,4)
         <Remainder loop for vectorization>
            remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
            remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
            remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
            remark #15305: vectorization support: vector length 2
            remark #15309: vectorization support: normalized vectorization overhead 0.667
            remark #15301: REMAINDER LOOP WAS VECTORIZED
            remark #25015: Estimate of max trip count of loop=2
         LOOP END
      LOOP END
   LOOP END
LOOP END

Report from: Code generation optimizations [cg]

./datatype_3d.f90(65,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
./datatype_3d.f90(65,1):remark #34026: call to memset implemented as a call to optimized library version
./datatype_3d.f90(126,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
./datatype_3d.f90(126,1):remark #34026: call to memset implemented as a call to optimized library version
./datatype_3d.f90(1,9):remark #34051: REGISTER ALLOCATION : [MAIN__.A] ./datatype_3d.f90:1

    Hardware registers
        Reserved     :    2[ rsp rip]
        Available    :   39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
        Callee-save  :    6[ rbx rbp r12-r15]
        Assigned     :   28[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm13]

    Routine temporaries
        Total         :    1109
            Global    :     356
            Local     :     753
        Regenerable   :     194
        Spilled       :     137

    Routine stack
        Variables     :     276 bytes*
            Reads     :      10 [8.00e+00 ~ 0.0%]
            Writes    :      26 [2.40e+01 ~ 0.0%]
        Spills        :    1056 bytes*
            Reads     :     151 [3.22e+03 ~ 1.3%]
            Writes    :     139 [9.02e+02 ~ 0.4%]

    Notes

        *Non-overlapping variables and spills may share stack space,
         so the total stack size might be less than this.

===========================================================================

Thanks

Eric
