Hi all, and thanks for your help,
I wrote a simple Fortran program that implements a dot product between two arrays, one in double precision and the other of a varying data type.
```fortran
PROGRAM datatype
  USE omp_lib
  implicit none
  double precision, allocatable, dimension(:,:,:) :: A, B, C
  integer(kind=1),  allocatable, dimension(:,:,:) :: D
  integer(kind=4),  allocatable, dimension(:,:,:) :: E
  integer(kind=8),  allocatable, dimension(:,:,:) :: F
  real,             allocatable, dimension(:,:,:) :: G
  LOGICAL,          allocatable, dimension(:,:,:) :: H
  integer :: t, i, j, k, size = 500, repetition = 40
  double precision :: time, time1

  ALLOCATE(A(size,size,size), B(size,size,size), C(size,size,size))
  A = 4.
  B = 1.

  ! double * double
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * b(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME double", time/DBLE(repetition)
  DEALLOCATE(B)

  ! double * real
  ALLOCATE(G(size,size,size))
  G = 240.
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * g(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME float", time/DBLE(repetition)
  DEALLOCATE(G)

  ! double * integer(kind=1)
  ALLOCATE(D(size,size,size))
  D = 240   ! note: 240 is outside the range of integer(kind=1), -128..127
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * d(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME int8", time/DBLE(repetition)
  DEALLOCATE(D)

  ! double * integer(kind=4)
  ALLOCATE(E(size,size,size))
  e = 240
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * e(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME int32", time/DBLE(repetition)
  DEALLOCATE(E)

  ! double * integer(kind=8)
  ALLOCATE(F(size,size,size))
  f = 240
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * f(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME int64", time/DBLE(repetition)
  DEALLOCATE(F)

  ! double * logical
  ALLOCATE(H(size,size,size))
  h = .True.
  time = omp_get_wtime()
  do t = 1, repetition
    do i = 1, size
      do j = 1, size
        do k = 1, size
          !dir$ vector aligned
          c(k,j,i) = a(k,j,i) * h(k,j,i) + 5.2
        enddo
      enddo
    enddo
  enddo
  time = omp_get_wtime() - time
  print *, "TIME logical", time/DBLE(repetition)
END PROGRAM
```
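One measurement detail that may be worth ruling out (a hypothesis, not a diagnosis): C is allocated but never written before the first timed loop, so the double * double case is the first to touch its 1 GB of memory and may absorb first-touch page-fault costs that the later cases do not pay. The hypothetical module below (`bench_helpers` and `time_mul_add` are illustrative names, not from the post) touches C and runs one untimed pass before starting the clock. It uses the standard `system_clock` instead of `omp_get_wtime` so it does not depend on OpenMP.

```fortran
! Hypothetical helper (not part of the original program): times the
! double * double kernel only after C has been touched and one untimed
! pass has run, so first-touch page faults stay outside the timer.
module bench_helpers
  use iso_fortran_env, only: int64
  implicit none
  private
  public :: time_mul_add
contains
  subroutine time_mul_add(a, b, c, repetition, seconds)
    double precision, intent(in)    :: a(:,:,:), b(:,:,:)
    double precision, intent(inout) :: c(:,:,:)
    integer, intent(in)             :: repetition
    double precision, intent(out)   :: seconds   ! average wall time per pass
    integer(int64) :: t0, t1, rate
    integer :: t

    c = 0d0            ! first touch of C happens here, outside the timer
    call kernel()      ! untimed warm-up pass
    call system_clock(t0, rate)
    do t = 1, repetition
      call kernel()
    end do
    call system_clock(t1)
    seconds = dble(t1 - t0) / dble(rate) / dble(repetition)
  contains
    ! Same loop nest and expression as the original (5.2 is default REAL).
    subroutine kernel()
      integer :: i, j, k
      do i = 1, size(c, 3)
        do j = 1, size(c, 2)
          do k = 1, size(c, 1)
            c(k, j, i) = a(k, j, i) * b(k, j, i) + 5.2
          end do
        end do
      end do
    end subroutine kernel
  end subroutine time_mul_add
end module bench_helpers
```

In the program above it would stand in for the first timing block as `call time_mul_add(A, B, C, repetition, time)` right after `B = 1.`; each other data type would need its own copy of the kernel with the second argument's type changed.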
I tried this code on a Broadwell Intel(R) Xeon(R) E5-2697 v4 @ 2.30GHz and on an Intel Xeon Phi 7250 (KNL).
```
BROADWELL (1 core)
TIME double   0.314651775360107
TIME float    0.256021851301193
TIME int8     0.218752950429916
TIME int32    0.245272749662399
TIME int64    0.319928669929504
TIME logical  0.245576351881027
-------------------------------------------------
KNL (1 core)
TIME double   0.545190346240997
TIME float    0.608061379194260
TIME int8     0.749213725328445
TIME int32    0.718595725297928
TIME int64    0.730906349420547
TIME logical  0.544638276100159
```
On the Broadwell architecture, the best performance was obtained with double * int8 and the worst with double * double (double * int64 is about the same). I think the better performance with int8 comes from better cache use, which masks the cost of converting int8 to double. Is that right?
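A quick way to test the cache hypothesis is to convert each time into an effective memory bandwidth. The sketch below assumes a simple traffic model (an assumption, not a measurement): per element, 8 bytes of A are read, `bytes_x` bytes of the second operand are read and 8 bytes of C are written, with write-allocate traffic ignored. The element sizes assumed are the usual ifort defaults: 8 bytes for double precision and integer(kind=8), 4 for real, integer(kind=4) and default LOGICAL, 1 for integer(kind=1). The module and function names are hypothetical.

```fortran
! Rough memory-traffic model (an assumption, not a measurement): each
! element of c = a*x + 5.2 reads 8 bytes of A, bytes_x bytes of x and
! writes 8 bytes of C. Write-allocate and prefetch effects are ignored.
module traffic_estimate
  use iso_fortran_env, only: int64
  implicit none
contains
  pure function effective_gbps(n, bytes_x, seconds) result(gbps)
    integer(int64), intent(in)   :: n        ! number of elements per pass
    integer, intent(in)          :: bytes_x  ! bytes per element of x
    double precision, intent(in) :: seconds  ! measured time per pass
    double precision             :: gbps
    gbps = dble(n) * dble(8 + bytes_x + 8) / seconds / 1d9
  end function effective_gbps
end module traffic_estimate
```

With n = 500**3 and the Broadwell timings above, every case comes out at roughly 9.4 to 10.2 GB/s, so the time tracks the number of bytes moved rather than the arithmetic. That is consistent with (though it does not prove) a loop limited by memory bandwidth, where a narrower second operand helps by reducing DRAM traffic; the arrays (125 MB to 1 GB each) are far larger than the caches, so nothing is reused between passes.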
What I don't understand is that on KNL the behaviour is the opposite. I analyzed the compiler optimization report, and in both cases the double-precision array determines the vector length, and therefore the number of operations per clock cycle.
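Fortran's mixed-mode rules convert the narrower operand to the kind of the double-precision operand before the arithmetic, which is why the double-precision array sets the vector width. Written out explicitly for the integer(kind=1) case, the loop body is equivalent to the hypothetical elemental function below. It also makes visible that the literal 5.2 in the original is default REAL (assumed 32-bit here, as with ifort's defaults), so it is added as the single-precision value widened to double, not as 5.2d0.

```fortran
! Explicit form of c = a * d + 5.2 for a real64 array a and an int8 array d:
! both d and the default-real literal 5.2 are converted to real64 first,
! so the multiply and add are done at double-precision vector width.
module mixed_mode
  use iso_fortran_env, only: int8, real32, real64
  implicit none
contains
  elemental function mul_add_int8(a, d) result(c)
    real(real64), intent(in)  :: a
    integer(int8), intent(in) :: d
    real(real64)              :: c
    c = a * real(d, real64) + real(5.2_real32, real64)
  end function mul_add_int8
end module mixed_mode
```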
Can someone help me understand this behaviour?
Thanks
Best regards
Eric
Those vector aligned directives presumably have no effect when placed after the DO. Did you include -align array64byte -xHost in your compiler options, and check with -opt-report=4?
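A minimal sketch of the placement described above, assuming ifort: the directive goes on the line immediately before the DO it applies to (here the innermost `do k`). One caution: with size = 500 each column is 500 * 8 = 4000 bytes, which is a multiple of 32 but not of 64, so even with -align array64byte every other column starts 32 bytes past a 64-byte boundary. For KNL's 64-byte vectors the alignment assertion would then be false for those columns. The module and routine names are hypothetical.

```fortran
! Sketch only: the same kernel with the Intel directive moved to the line
! directly before the DO it is meant to apply to. A compiler that does not
! recognise the directive treats the "!dir$" line as a comment.
module directive_placement
  implicit none
contains
  subroutine mul_add(a, b, c, n)
    integer, intent(in)           :: n
    double precision, intent(in)  :: a(n, n, n), b(n, n, n)
    double precision, intent(out) :: c(n, n, n)
    integer :: i, j, k
    do i = 1, n
      do j = 1, n
        ! Asserts aligned access for every column; only valid if each
        ! a(1,j,i), b(1,j,i), c(1,j,i) really is aligned to the vector width.
        !dir$ vector aligned
        do k = 1, n
          c(k, j, i) = a(k, j, i) * b(k, j, i) + 5.2
        end do
      end do
    end do
  end subroutine mul_add
end module directive_placement
```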
The logical promotion to double precision isn't valid Fortran and should be rejected by ifort if you set appropriate options, such as -standard-semantics.
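A standard-conforming alternative, assuming the intent is to multiply by 1 where H is .true. and by 0 where it is .false. (the numeric value a compiler extension gives .true. is compiler-specific, so this is an assumption): use MERGE to select a double-precision weight. The elemental function name is hypothetical; it can be applied to whole arrays as `C = mul_add_mask(A, H)`.

```fortran
! Hypothetical standard-conforming version of the LOGICAL case.
module logical_weights
  implicit none
contains
  elemental function mul_add_mask(a, h) result(c)
    double precision, intent(in) :: a
    logical, intent(in)          :: h
    double precision             :: c
    ! merge(tsource, fsource, mask): 1d0 where h is .true., else 0d0
    c = a * merge(1d0, 0d0, h) + 5.2
  end function mul_add_mask
end module logical_weights
```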
I don't see how integer(8) could imply different cache behavior from real(8) unless you have alignment issues.
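One way to check for alignment issues is to look at each array's address modulo the vector width, both for the first element and for a later column. The sketch below uses iso_c_binding; converting a c_ptr to an integer with TRANSFER is a widely used idiom rather than something the standard guarantees, and `misalignment` is a hypothetical name.

```fortran
! Returns how many bytes the address in p lies past the previous multiple
! of `boundary` (0 means aligned to `boundary` bytes).
module align_check
  use iso_c_binding, only: c_ptr, c_intptr_t
  implicit none
contains
  function misalignment(p, boundary) result(offset)
    type(c_ptr), intent(in) :: p
    integer, intent(in)     :: boundary
    integer                 :: offset
    integer(c_intptr_t)     :: addr
    addr = transfer(p, addr)          ! address as an integer (common idiom)
    offset = int(modulo(addr, int(boundary, c_intptr_t)))
  end function misalignment
end module align_check
```

The arrays must have the TARGET attribute, e.g. `double precision, allocatable, target, dimension(:,:,:) :: A`; then, with `use iso_c_binding, only: c_loc`, `print *, misalignment(c_loc(A(1,1,1)), 64), misalignment(c_loc(A(1,2,1)), 64)` shows the offsets of the first and second columns.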
I think your characterization of this as a dot product is misleading.
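To make the distinction concrete: a dot product reduces two arrays to one scalar and only reads memory, while the benchmark produces one output element per input pair and also writes the 1 GB array C. The hypothetical module below contrasts the two on 1-D arrays, using the DOT_PRODUCT intrinsic for the former.

```fortran
module dot_vs_elementwise
  implicit none
contains
  ! A dot product: two arrays in, one scalar out, no array stores.
  pure function dot(a, b) result(s)
    double precision, intent(in) :: a(:), b(:)
    double precision             :: s
    s = dot_product(a, b)
  end function dot

  ! What the benchmark computes: one result per element pair,
  ! written to a third array.
  pure function mul_add(a, b) result(c)
    double precision, intent(in) :: a(:), b(:)
    double precision             :: c(size(a))
    c = a * b + 5.2
  end function mul_add
end module dot_vs_elementwise
```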
Hi Tim,
```
ifort -align array64byte -xAVX2 -qopt-report4 -qopenmp -O1 ./datatype_3d.f90 -o ./dt.x
TIME double   0.333785974979401
TIME float    0.284216976165771
TIME int1     0.262497049570084
TIME int32    0.265798276662827
TIME int64    0.316275727748871
TIME logical  0.246519052982330
```
And with -O2:
```
ifort -align array64byte -xAVX2 -qopt-report4 -qopenmp -O2 ./datatype_3d.f90 -o ./dt.x
TIME double   0.297000402212143
TIME float    0.246595245599747
TIME int1     0.217311549186707
TIME int32    0.247010648250580
TIME int64    0.294282549619675
TIME logical  0.253364574909210
```
For -O2, the optimization report:
Intel(R) Advisor can now assist with vectorization and show optimization
report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.1.132 Build 20161005
Compiler options: -align array64byte -xAVX2 -qopt-report4 -qopenmp -O2 -o ./dt.x
Report from: Interprocedural optimizations [ipo]
WHOLE PROGRAM (SAFE) [EITHER METHOD]: false
WHOLE PROGRAM (SEEN) [TABLE METHOD]: false
WHOLE PROGRAM (READ) [OBJECT READER METHOD]: false
INLINING OPTION VALUES:
-inline-factor: 100
-inline-min-size: 30
-inline-max-size: 230
-inline-max-total-size: 2000
-inline-max-per-routine: 10000
-inline-max-per-compile: 500000
In the inlining report below:
"sz" refers to the "size" of the routine. The smaller a routine's size,
the more likely it is to be inlined.
"isz" refers to the "inlined size" of the routine. This is the amount
the calling routine will grow if the called routine is inlined into it.
The compiler generally limits the amount a routine can grow by having
routines inlined into it.
Begin optimization report for: DATATYPE
Report from: Interprocedural optimizations [ipo]
INLINE REPORT: (DATATYPE) [1/1=100.0%] ./datatype_3d.f90(1,9)
-> EXTERN: (1,9) for_set_reentrancy
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (23,8) omp_get_wtime
-> EXTERN: (34,8) omp_get_wtime
-> EXTERN: (36,1) for_write_seq_lis_xmit
-> EXTERN: (36,1) for_write_seq_lis
-> EXTERN: (38,1) for_dealloc_allocatable
-> EXTERN: (40,1) for_alloc_allocatable
-> EXTERN: (40,1) for_check_mult_overflow64
-> EXTERN: (43,8) omp_get_wtime
-> EXTERN: (54,8) omp_get_wtime
-> EXTERN: (56,1) for_write_seq_lis_xmit
-> EXTERN: (56,1) for_write_seq_lis
-> EXTERN: (59,1) for_dealloc_allocatable
-> EXTERN: (64,1) for_alloc_allocatable
-> EXTERN: (64,1) for_check_mult_overflow64
-> EXTERN: (67,8) omp_get_wtime
-> EXTERN: (78,8) omp_get_wtime
-> EXTERN: (80,1) for_write_seq_lis_xmit
-> EXTERN: (80,1) for_write_seq_lis
-> EXTERN: (83,1) for_dealloc_allocatable
-> EXTERN: (85,1) for_alloc_allocatable
-> EXTERN: (85,1) for_check_mult_overflow64
-> EXTERN: (88,8) omp_get_wtime
-> EXTERN: (99,8) omp_get_wtime
-> EXTERN: (101,1) for_write_seq_lis_xmit
-> EXTERN: (101,1) for_write_seq_lis
-> EXTERN: (103,1) for_dealloc_allocatable
-> EXTERN: (105,1) for_alloc_allocatable
-> EXTERN: (105,1) for_check_mult_overflow64
-> EXTERN: (108,8) omp_get_wtime
-> EXTERN: (119,8) omp_get_wtime
-> EXTERN: (121,1) for_write_seq_lis_xmit
-> EXTERN: (121,1) for_write_seq_lis
-> EXTERN: (123,1) for_dealloc_allocatable
-> EXTERN: (125,1) for_alloc_allocatable
-> EXTERN: (125,1) for_check_mult_overflow64
-> EXTERN: (128,8) omp_get_wtime
-> EXTERN: (139,8) omp_get_wtime
-> EXTERN: (141,1) for_write_seq_lis_xmit
-> EXTERN: (141,1) for_write_seq_lis
Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15388: vectorization support: reference A(:,:,:) has aligned access
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 4
remark #15309: vectorization support: normalized vectorization overhead 0.833
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.750
remark #15478: estimated potential speedup: 3.430
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference A(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 2.600
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Remainder loop for vectorization>
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15388: vectorization support: reference B(:,:,:) has aligned access
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 4
remark #15309: vectorization support: normalized vectorization overhead 0.833
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.750
remark #15478: estimated potential speedup: 3.430
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference B(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 2.600
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Remainder loop for vectorization>
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(29,35)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(25,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(26,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(28,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 8
remark #15477: vector cost: 1.750
remark #15478: estimated potential speedup: 4.440
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(28,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(41,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 5.840
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 2.167
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(49,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(45,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(46,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(48,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 4
remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 9
remark #15477: vector cost: 2.000
remark #15478: estimated potential speedup: 4.370
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(48,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #25408: memset generated
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #15389: vectorization support: reference D(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 32
remark #15309: vectorization support: normalized vectorization overhead 0.600
remark #15300: LOOP WAS VECTORIZED
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.150
remark #15478: estimated potential speedup: 10.660
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
<Remainder loop for vectorization>
remark #25015: Estimate of max trip count of loop=96
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(73,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(69,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(70,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(72,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 2.000
remark #15478: estimated potential speedup: 4.840
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(72,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(86,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #15389: vectorization support: reference E(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 4.220
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
<Remainder loop for vectorization>
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(94,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(90,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(91,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(93,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 2.000
remark #15478: estimated potential speedup: 4.840
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(93,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(106,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #15389: vectorization support: reference F(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.750
remark #15478: estimated potential speedup: 2.500
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
<Remainder loop for vectorization>
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(114,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(110,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(111,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(113,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 4.250
remark #15478: estimated potential speedup: 2.320
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(113,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #25408: memset generated
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #15389: vectorization support: reference H(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 0.600
remark #15300: LOOP WAS VECTORIZED
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.620
remark #15478: estimated potential speedup: 2.660
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
<Remainder loop for vectorization>
remark #25015: Estimate of max trip count of loop=24
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(134,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(130,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(131,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(133,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 2.000
remark #15478: estimated potential speedup: 4.840
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(133,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
Report from: Code generation optimizations [cg]
./datatype_3d.f90(65,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
./datatype_3d.f90(65,1):remark #34026: call to memset implemented as a call to optimized library version
./datatype_3d.f90(126,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
./datatype_3d.f90(126,1):remark #34026: call to memset implemented as a call to optimized library version
./datatype_3d.f90(1,9):remark #34051: REGISTER ALLOCATION : [MAIN__] ./datatype_3d.f90:1
Hardware registers
Reserved : 2[ rsp rip]
Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
Callee-save : 6[ rbx rbp r12-r15]
Assigned : 30[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm15]
Routine temporaries
Total : 1144
Global : 390
Local : 754
Regenerable : 193
Spilled : 171
Routine stack
Variables : 276 bytes*
Reads : 10 [8.00e+00 ~ 0.0%]
Writes : 26 [2.40e+01 ~ 0.0%]
Spills : 1328 bytes*
Reads : 183 [5.44e+03 ~ 2.8%]
Writes : 175 [1.17e+03 ~ 0.6%]
Notes
*Non-overlapping variables and spills may share stack space,
so the total stack size might be less than this.
===========================================================================
Thanks
Eric
Hi Tim,
I also tried it on KNL:
```
ifort -align array64byte -axMIC-AVX512 -qopt-report4 -qopenmp -O2 ./datatype_3d.f90 -o ./dt.x
TIME double   0.277145397663116
TIME float    0.262126672267914
TIME int1     0.287773197889328
TIME int32    0.259244447946548
TIME int64    0.570451277494431
TIME logical  0.262114471197128
```
The results appear more reasonable: the 32-bit types (float, int32 and logical) have the best times. However, in the optimization report the longest vector length is for the 8-bit integer, even though it is multiplied by a double.
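In the -xAVX2 report above, the vector length of 32 belongs to the loop at line 65, the `D = 240` initialization (which the compiler also turned into a memset), while the multiply loop at line 72 reports vector length 4. The KNL report is cut off here, but a rough rule of thumb is lanes = register width / element width, and in the multiply loops the element that matters is the double-precision result. A minimal sketch of that arithmetic, with a hypothetical function name:

```fortran
module vector_lanes
  implicit none
contains
  ! Number of elements of element_bits bits that fit in one SIMD register
  ! of register_bits bits (e.g. 256 for AVX2, 512 for AVX-512).
  pure function lanes(register_bits, element_bits) result(n)
    integer, intent(in) :: register_bits, element_bits
    integer             :: n
    n = register_bits / element_bits
  end function lanes
end module vector_lanes
```

With 256-bit AVX2 registers this gives 32 lanes for 1-byte integers, 8 for 4-byte types and 4 for doubles, matching the initialization and multiply loops in the report; 512-bit AVX-512 registers on KNL double each figure.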
Intel(R) Advisor can now assist with vectorization and show optimization
report messages with your source code.
See "https://software.intel.com/en-us/intel-advisor-xe" for details.
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.1.132 Build 20161005
Compiler options: -align array64byte -axMIC-AVX512 -qopt-report4 -qopenmp -O2 -o ./dt.x
Report from: Interprocedural optimizations [ipo]
WHOLE PROGRAM (SAFE) [EITHER METHOD]: false
WHOLE PROGRAM (SEEN) [TABLE METHOD]: false
WHOLE PROGRAM (READ) [OBJECT READER METHOD]: false
INLINING OPTION VALUES:
-inline-factor: 100
-inline-min-size: 30
-inline-max-size: 230
-inline-max-total-size: 2000
-inline-max-per-routine: 10000
-inline-max-per-compile: 500000
In the inlining report below:
"sz" refers to the "size" of the routine. The smaller a routine's size,
the more likely it is to be inlined.
"isz" refers to the "inlined size" of the routine. This is the amount
the calling routine will grow if the called routine is inlined into it.
The compiler generally limits the amount a routine can grow by having
routines inlined into it.
Begin optimization report for: DATATYPE
Report from: Interprocedural optimizations [ipo]
INLINE REPORT: (DATATYPE) [1/1=100.0%] ./datatype_3d.f90(1,9)
-> EXTERN: (1,9) for_set_reentrancy
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (23,8) omp_get_wtime
-> EXTERN: (34,8) omp_get_wtime
-> EXTERN: (36,1) for_write_seq_lis_xmit
-> EXTERN: (36,1) for_write_seq_lis
-> EXTERN: (38,1) for_dealloc_allocatable
-> EXTERN: (40,1) for_alloc_allocatable
-> EXTERN: (40,1) for_check_mult_overflow64
-> EXTERN: (43,8) omp_get_wtime
-> EXTERN: (54,8) omp_get_wtime
-> EXTERN: (56,1) for_write_seq_lis_xmit
-> EXTERN: (56,1) for_write_seq_lis
-> EXTERN: (59,1) for_dealloc_allocatable
-> EXTERN: (64,1) for_alloc_allocatable
-> EXTERN: (64,1) for_check_mult_overflow64
-> EXTERN: (67,8) omp_get_wtime
-> EXTERN: (78,8) omp_get_wtime
-> EXTERN: (80,1) for_write_seq_lis_xmit
-> EXTERN: (80,1) for_write_seq_lis
-> EXTERN: (83,1) for_dealloc_allocatable
-> EXTERN: (85,1) for_alloc_allocatable
-> EXTERN: (85,1) for_check_mult_overflow64
-> EXTERN: (88,8) omp_get_wtime
-> EXTERN: (99,8) omp_get_wtime
-> EXTERN: (101,1) for_write_seq_lis_xmit
-> EXTERN: (101,1) for_write_seq_lis
-> EXTERN: (103,1) for_dealloc_allocatable
-> EXTERN: (105,1) for_alloc_allocatable
-> EXTERN: (105,1) for_check_mult_overflow64
-> EXTERN: (108,8) omp_get_wtime
-> EXTERN: (119,8) omp_get_wtime
-> EXTERN: (121,1) for_write_seq_lis_xmit
-> EXTERN: (121,1) for_write_seq_lis
-> EXTERN: (123,1) for_dealloc_allocatable
-> EXTERN: (125,1) for_alloc_allocatable
-> EXTERN: (125,1) for_check_mult_overflow64
-> EXTERN: (128,8) omp_get_wtime
-> EXTERN: (139,8) omp_get_wtime
-> EXTERN: (141,1) for_write_seq_lis_xmit
-> EXTERN: (141,1) for_write_seq_lis
Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference A(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15388: vectorization support: reference A(:,:,:) has aligned access
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 6.920
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference A(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference B(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15388: vectorization support: reference B(:,:,:) has aligned access
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 6.920
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference B(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(29,35)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(25,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(26,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(28,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 8
remark #15477: vector cost: 0.870
remark #15478: estimated potential speedup: 8.920
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(28,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(41,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #15301: PEEL LOOP WAS VECTORIZED
remark #25015: Estimate of max trip count of loop=1
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 16
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.180
remark #15478: estimated potential speedup: 11.840
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(49,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(45,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(46,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(48,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 9
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 8.780
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(48,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #25408: memset generated
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference D(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 0.058
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #15388: vectorization support: reference D(:,:,:) has aligned access
remark #15305: vectorization support: vector length 32
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.090
remark #15478: estimated potential speedup: 6.850
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference D(:,:,:) has aligned access
remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 0.058
remark #25015: Estimate of max trip count of loop=96
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(73,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(69,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(70,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(72,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 9.760
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(72,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(86,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference E(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 0.938
remark #25015: Estimate of max trip count of loop=15
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #15389: vectorization support: reference E(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.180
remark #15478: estimated potential speedup: 8.210
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference E(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(94,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(90,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(91,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(93,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 9.760
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(93,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(106,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference F(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #15389: vectorization support: reference F(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 4.610
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference F(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(114,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(110,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(111,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(113,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 2.120
remark #15478: estimated potential speedup: 4.590
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(113,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #25408: memset generated
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #15389: vectorization support: reference H(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 0.600
remark #15300: LOOP WAS VECTORIZED
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.310
remark #15478: estimated potential speedup: 2.660
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=1
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference H(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 0.938
remark #25015: Estimate of max trip count of loop=24
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(134,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(130,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(131,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(133,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 9.760
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(133,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
Report from: Code generation optimizations [cg]
./datatype_3d.f90(1,9):remark #34051: REGISTER ALLOCATION : [MAIN__] ./datatype_3d.f90:1
Hardware registers
Reserved : 2[ rsp rip]
Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
Callee-save : 6[ rbx rbp r12-r15]
Assigned : 2[ rax rdx]
Routine temporaries
Total : 9
Global : 7
Local : 2
Regenerable : 1
Spilled : 0
Routine stack
Variables : 0 bytes*
Reads : 0 [0.00e+00 ~ -nan%]
Writes : 0 [0.00e+00 ~ -nan%]
Spills : 0 bytes*
Reads : 0 [0.00e+00 ~ -nan%]
Writes : 0 [0.00e+00 ~ -nan%]
Notes
*Non-overlapping variables and spills may share stack space,
so the total stack size might be less than this.
===========================================================================
Begin optimization report for: DATATYPE [future_cpu_22]
Report from: Interprocedural optimizations [ipo]
INLINE REPORT: (DATATYPE) [1/1=100.0%] ./datatype_3d.f90(1,9)
-> EXTERN: (1,9) for_set_reentrancy
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (23,8) omp_get_wtime
-> EXTERN: (34,8) omp_get_wtime
-> EXTERN: (36,1) for_write_seq_lis_xmit
-> EXTERN: (36,1) for_write_seq_lis
-> EXTERN: (38,1) for_dealloc_allocatable
-> EXTERN: (40,1) for_alloc_allocatable
-> EXTERN: (40,1) for_check_mult_overflow64
-> EXTERN: (43,8) omp_get_wtime
-> EXTERN: (54,8) omp_get_wtime
-> EXTERN: (56,1) for_write_seq_lis_xmit
-> EXTERN: (56,1) for_write_seq_lis
-> EXTERN: (59,1) for_dealloc_allocatable
-> EXTERN: (64,1) for_alloc_allocatable
-> EXTERN: (64,1) for_check_mult_overflow64
-> EXTERN: (67,8) omp_get_wtime
-> EXTERN: (78,8) omp_get_wtime
-> EXTERN: (80,1) for_write_seq_lis_xmit
-> EXTERN: (80,1) for_write_seq_lis
-> EXTERN: (83,1) for_dealloc_allocatable
-> EXTERN: (85,1) for_alloc_allocatable
-> EXTERN: (85,1) for_check_mult_overflow64
-> EXTERN: (88,8) omp_get_wtime
-> EXTERN: (99,8) omp_get_wtime
-> EXTERN: (101,1) for_write_seq_lis_xmit
-> EXTERN: (101,1) for_write_seq_lis
-> EXTERN: (103,1) for_dealloc_allocatable
-> EXTERN: (105,1) for_alloc_allocatable
-> EXTERN: (105,1) for_check_mult_overflow64
-> EXTERN: (108,8) omp_get_wtime
-> EXTERN: (119,8) omp_get_wtime
-> EXTERN: (121,1) for_write_seq_lis_xmit
-> EXTERN: (121,1) for_write_seq_lis
-> EXTERN: (123,1) for_dealloc_allocatable
-> EXTERN: (125,1) for_alloc_allocatable
-> EXTERN: (125,1) for_check_mult_overflow64
-> EXTERN: (128,8) omp_get_wtime
-> EXTERN: (139,8) omp_get_wtime
-> EXTERN: (141,1) for_write_seq_lis_xmit
-> EXTERN: (141,1) for_write_seq_lis
Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference A(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15388: vectorization support: reference A(:,:,:) has aligned access
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 6.920
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference A(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference B(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15388: vectorization support: reference B(:,:,:) has aligned access
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 6.920
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference B(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(29,35)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(25,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(26,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(28,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 8
remark #15477: vector cost: 0.870
remark #15478: estimated potential speedup: 8.920
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(28,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(41,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #15301: PEEL LOOP WAS VECTORIZED
remark #25015: Estimate of max trip count of loop=1
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 16
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.180
remark #15478: estimated potential speedup: 11.840
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference G(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(49,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(45,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(46,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(48,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 9
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 8.780
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(48,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #25408: memset generated
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference D(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 0.058
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #15388: vectorization support: reference D(:,:,:) has aligned access
remark #15305: vectorization support: vector length 32
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.090
remark #15478: estimated potential speedup: 6.850
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference D(:,:,:) has aligned access
remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 0.058
remark #25015: Estimate of max trip count of loop=96
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(73,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(69,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(70,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(72,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 9.760
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(72,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(86,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference E(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 0.938
remark #25015: Estimate of max trip count of loop=15
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #15389: vectorization support: reference E(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.180
remark #15478: estimated potential speedup: 8.210
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference E(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(94,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(90,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(91,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(93,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 9.760
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(93,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(106,1)
<Peeled loop for vectorization>
remark #15389: vectorization support: reference F(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: peel loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 1.250
remark #25015: Estimate of max trip count of loop=7
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #15389: vectorization support: reference F(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 4.610
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference F(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 8
remark #15309: vectorization support: normalized vectorization overhead 1.364
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(114,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(110,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(111,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(113,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 2.120
remark #15478: estimated potential speedup: 4.590
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(113,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #25408: memset generated
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #15389: vectorization support: reference H(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 0.600
remark #15300: LOOP WAS VECTORIZED
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.310
remark #15478: estimated potential speedup: 2.660
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=1
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
<Remainder loop for vectorization>
remark #15389: vectorization support: reference H(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 0.938
remark #25015: Estimate of max trip count of loop=24
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(134,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(130,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(131,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(133,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
remark #15305: vectorization support: vector length 8
remark #15399: vectorization support: unroll factor set to 2
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 1.000
remark #15478: estimated potential speedup: 9.760
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=31
LOOP END
LOOP BEGIN at ./datatype_3d.f90(133,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
remark #15305: vectorization support: vector length 4
remark #15427: loop was completely unrolled
remark #15301: REMAINDER LOOP WAS VECTORIZED
LOOP END
LOOP END
LOOP END
LOOP END
Report from: Code generation optimizations [cg]
./datatype_3d.f90(65,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
./datatype_3d.f90(65,1):remark #34026: call to memset implemented as a call to optimized library version
./datatype_3d.f90(126,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
./datatype_3d.f90(126,1):remark #34026: call to memset implemented as a call to optimized library version
./datatype_3d.f90(1,9):remark #34051: REGISTER ALLOCATION : [MAIN__.Z] ./datatype_3d.f90:1
Hardware registers
Reserved : 2[ rsp rip]
Available : 63[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm31 k0-k7]
Callee-save : 6[ rbx rbp r12-r15]
Assigned : 49[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm31 k1-k3]
Routine temporaries
Total : 1283
Global : 448
Local : 835
Regenerable : 212
Spilled : 179
Routine stack
Variables : 276 bytes*
Reads : 10 [8.00e+00 ~ 0.0%]
Writes : 26 [2.40e+01 ~ 0.0%]
Spills : 1432 bytes*
Reads : 223 [6.46e+03 ~ 3.1%]
Writes : 195 [2.18e+03 ~ 1.0%]
Notes
*Non-overlapping variables and spills may share stack space,
so the total stack size might be less than this.
===========================================================================
Begin optimization report for: DATATYPE [generic]
Report from: Interprocedural optimizations [ipo]
INLINE REPORT: (DATATYPE) [1/1=100.0%] ./datatype_3d.f90(1,9)
-> EXTERN: (1,9) for_set_reentrancy
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (17,1) for_alloc_allocatable
-> EXTERN: (17,1) for_check_mult_overflow64
-> EXTERN: (23,8) omp_get_wtime
-> EXTERN: (34,8) omp_get_wtime
-> EXTERN: (36,1) for_write_seq_lis_xmit
-> EXTERN: (36,1) for_write_seq_lis
-> EXTERN: (38,1) for_dealloc_allocatable
-> EXTERN: (40,1) for_alloc_allocatable
-> EXTERN: (40,1) for_check_mult_overflow64
-> EXTERN: (43,8) omp_get_wtime
-> EXTERN: (54,8) omp_get_wtime
-> EXTERN: (56,1) for_write_seq_lis_xmit
-> EXTERN: (56,1) for_write_seq_lis
-> EXTERN: (59,1) for_dealloc_allocatable
-> EXTERN: (64,1) for_alloc_allocatable
-> EXTERN: (64,1) for_check_mult_overflow64
-> EXTERN: (67,8) omp_get_wtime
-> EXTERN: (78,8) omp_get_wtime
-> EXTERN: (80,1) for_write_seq_lis_xmit
-> EXTERN: (80,1) for_write_seq_lis
-> EXTERN: (83,1) for_dealloc_allocatable
-> EXTERN: (85,1) for_alloc_allocatable
-> EXTERN: (85,1) for_check_mult_overflow64
-> EXTERN: (88,8) omp_get_wtime
-> EXTERN: (99,8) omp_get_wtime
-> EXTERN: (101,1) for_write_seq_lis_xmit
-> EXTERN: (101,1) for_write_seq_lis
-> EXTERN: (103,1) for_dealloc_allocatable
-> EXTERN: (105,1) for_alloc_allocatable
-> EXTERN: (105,1) for_check_mult_overflow64
-> EXTERN: (108,8) omp_get_wtime
-> EXTERN: (119,8) omp_get_wtime
-> EXTERN: (121,1) for_write_seq_lis_xmit
-> EXTERN: (121,1) for_write_seq_lis
-> EXTERN: (123,1) for_dealloc_allocatable
-> EXTERN: (125,1) for_alloc_allocatable
-> EXTERN: (125,1) for_check_mult_overflow64
-> EXTERN: (128,8) omp_get_wtime
-> EXTERN: (139,8) omp_get_wtime
-> EXTERN: (141,1) for_write_seq_lis_xmit
-> EXTERN: (141,1) for_write_seq_lis
Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=1
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
remark #15388: vectorization support: reference A(:,:,:) has aligned access
remark #15305: vectorization support: vector length 2
remark #15399: vectorization support: unroll factor set to 4
remark #15309: vectorization support: normalized vectorization overhead 0.833
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 4
remark #15477: vector cost: 1.500
remark #15478: estimated potential speedup: 2.550
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(19,1)
<Remainder loop for vectorization>
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=1
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
remark #15388: vectorization support: reference B(:,:,:) has aligned access
remark #15305: vectorization support: vector length 2
remark #15399: vectorization support: unroll factor set to 4
remark #15309: vectorization support: normalized vectorization overhead 0.833
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 4
remark #15477: vector cost: 1.500
remark #15478: estimated potential speedup: 2.550
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(20,1)
<Remainder loop for vectorization>
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(29,35)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(25,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(26,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(28,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
remark #15305: vectorization support: vector length 2
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 9
remark #15477: vector cost: 4.000
remark #15478: estimated potential speedup: 2.220
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=62
LOOP END
LOOP BEGIN at ./datatype_3d.f90(28,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(29,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(29,15) ]
remark #15388: vectorization support: reference B(k,j,i) has aligned access [ ./datatype_3d.f90(29,26) ]
remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 0.714
remark #25436: completely unrolled by 4
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(41,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
remark #15388: vectorization support: reference G(:,:,:) has aligned access
remark #15305: vectorization support: vector length 4
remark #15399: vectorization support: unroll factor set to 2
remark #15309: vectorization support: normalized vectorization overhead 1.667
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 3
remark #15477: vector cost: 0.750
remark #15478: estimated potential speedup: 3.680
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(41,1)
<Remainder loop for vectorization>
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(49,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(45,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(46,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(48,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
remark #15305: vectorization support: vector length 2
remark #15399: vectorization support: unroll factor set to 4
remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 10
remark #15477: vector cost: 4.500
remark #15478: estimated potential speedup: 2.200
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=62
LOOP END
LOOP BEGIN at ./datatype_3d.f90(48,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(49,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(49,15) ]
remark #15388: vectorization support: reference G(k,j,i) has aligned access [ ./datatype_3d.f90(49,26) ]
remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 0.667
remark #15417: vectorization support: number of FP up converts: single precision to double precision 1 [ ./datatype_3d.f90(49,4) ]
remark #25436: completely unrolled by 4
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #25408: memset generated
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(65,1)
remark #15389: vectorization support: reference D(:,:,:) has unaligned access
remark #15381: vectorization support: unaligned access used inside loop body
remark #15305: vectorization support: vector length 16
remark #15309: vectorization support: normalized vectorization overhead 0.500
remark #15300: LOOP WAS VECTORIZED
remark #15451: unmasked unaligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.370
remark #15478: estimated potential speedup: 4.920
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=6
LOOP END
LOOP BEGIN at ./datatype_3d.f90(65,1)
<Remainder loop for vectorization>
remark #25015: Estimate of max trip count of loop=96
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(73,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(69,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(70,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(72,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
remark #15305: vectorization support: vector length 2
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 11
remark #15477: vector cost: 4.500
remark #15478: estimated potential speedup: 2.410
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=62
LOOP END
LOOP BEGIN at ./datatype_3d.f90(72,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(73,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(73,15) ]
remark #15388: vectorization support: reference D(k,j,i) has aligned access [ ./datatype_3d.f90(73,26) ]
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 0.667
remark #15301: REMAINDER LOOP WAS VECTORIZED
remark #25015: Estimate of max trip count of loop=2
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(86,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
remark #15388: vectorization support: reference E(:,:,:) has aligned access
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.750
remark #15478: estimated potential speedup: 2.500
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(86,1)
<Remainder loop for vectorization>
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(94,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(90,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(91,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(93,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
remark #15305: vectorization support: vector length 2
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 11
remark #15477: vector cost: 4.500
remark #15478: estimated potential speedup: 2.410
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=62
LOOP END
LOOP BEGIN at ./datatype_3d.f90(93,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(94,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(94,15) ]
remark #15388: vectorization support: reference E(k,j,i) has aligned access [ ./datatype_3d.f90(94,26) ]
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 0.667
remark #15301: REMAINDER LOOP WAS VECTORIZED
remark #25015: Estimate of max trip count of loop=2
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(106,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=1
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
remark #15388: vectorization support: reference F(:,:,:) has aligned access
remark #15305: vectorization support: vector length 2
remark #15399: vectorization support: unroll factor set to 4
remark #15309: vectorization support: normalized vectorization overhead 0.833
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 1.500
remark #15478: estimated potential speedup: 1.290
remark #15488: --- end vector cost summary ---
LOOP END
LOOP BEGIN at ./datatype_3d.f90(106,1)
<Remainder loop for vectorization>
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(114,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(110,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(111,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(113,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(114,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(114,15) ]
remark #15388: vectorization support: reference F(k,j,i) has aligned access [ ./datatype_3d.f90(114,26) ]
remark #15410: vectorization support: conversion from int to float will be emulated [ ./datatype_3d.f90(114,26) ]
remark #15305: vectorization support: vector length 2
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 2
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 11
remark #15477: vector cost: 9.000
remark #15478: estimated potential speedup: 1.220
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=62
LOOP END
LOOP BEGIN at ./datatype_3d.f90(113,4)
<Remainder loop for vectorization>
remark #25436: completely unrolled by 4
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #25101: Loop Interchange not done due to: Original Order seems proper
remark #25452: Original Order found to be proper, but by a close margin
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #25408: memset generated
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(126,1)
<Peeled loop for vectorization>
remark #25015: Estimate of max trip count of loop=3
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
remark #15388: vectorization support: reference H(:,:,:) has aligned access
remark #15305: vectorization support: vector length 4
remark #15309: vectorization support: normalized vectorization overhead 3.333
remark #15300: LOOP WAS VECTORIZED
remark #15442: entire loop may be executed in remainder
remark #15449: unmasked aligned unit stride stores: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 2
remark #15477: vector cost: 0.750
remark #15478: estimated potential speedup: 1.450
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=6
LOOP END
LOOP BEGIN at ./datatype_3d.f90(126,1)
<Remainder loop for vectorization>
remark #25015: Estimate of max trip count of loop=24
LOOP END
LOOP END
LOOP END
LOOP END
LOOP BEGIN at ./datatype_3d.f90(134,15)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(130,2)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(131,3)
remark #15542: loop was not vectorized: inner loop was already vectorized
LOOP BEGIN at ./datatype_3d.f90(133,4)
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
remark #15305: vectorization support: vector length 2
remark #15399: vectorization support: unroll factor set to 4
remark #15300: LOOP WAS VECTORIZED
remark #15448: unmasked aligned unit stride loads: 1
remark #15449: unmasked aligned unit stride stores: 1
remark #15450: unmasked unaligned unit stride loads: 1
remark #15475: --- begin vector cost summary ---
remark #15476: scalar cost: 11
remark #15477: vector cost: 4.500
remark #15478: estimated potential speedup: 2.410
remark #15487: type converts: 1
remark #15488: --- end vector cost summary ---
remark #25015: Estimate of max trip count of loop=62
LOOP END
LOOP BEGIN at ./datatype_3d.f90(133,4)
<Remainder loop for vectorization>
remark #15388: vectorization support: reference C(k,j,i) has aligned access [ ./datatype_3d.f90(134,4) ]
remark #15388: vectorization support: reference A(k,j,i) has aligned access [ ./datatype_3d.f90(134,15) ]
remark #15388: vectorization support: reference H(k,j,i) has aligned access [ ./datatype_3d.f90(134,26) ]
remark #15305: vectorization support: vector length 2
remark #15309: vectorization support: normalized vectorization overhead 0.667
remark #15301: REMAINDER LOOP WAS VECTORIZED
remark #25015: Estimate of max trip count of loop=2
LOOP END
LOOP END
LOOP END
LOOP END
Report from: Code generation optimizations [cg]
./datatype_3d.f90(65,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
./datatype_3d.f90(65,1):remark #34026: call to memset implemented as a call to optimized library version
./datatype_3d.f90(126,1):remark #34014: optimization advice for memset: increase the destination's alignment to 16 (and use __assume_aligned) to speed up library implementation
./datatype_3d.f90(126,1):remark #34026: call to memset implemented as a call to optimized library version
./datatype_3d.f90(1,9):remark #34051: REGISTER ALLOCATION : [MAIN__.A] ./datatype_3d.f90:1
Hardware registers
Reserved : 2[ rsp rip]
Available : 39[ rax rdx rcx rbx rbp rsi rdi r8-r15 mm0-mm7 zmm0-zmm15]
Callee-save : 6[ rbx rbp r12-r15]
Assigned : 28[ rax rdx rcx rbx rsi rdi r8-r15 zmm0-zmm13]
Routine temporaries
Total : 1109
Global : 356
Local : 753
Regenerable : 194
Spilled : 137
Routine stack
Variables : 276 bytes*
Reads : 10 [8.00e+00 ~ 0.0%]
Writes : 26 [2.40e+01 ~ 0.0%]
Spills : 1056 bytes*
Reads : 151 [3.22e+03 ~ 1.3%]
Writes : 139 [9.02e+02 ~ 0.4%]
Notes
*Non-overlapping variables and spills may share stack space,
so the total stack size might be less than this.
===========================================================================
Thanks
Eric