Intel® Fortran Compiler
Array assignment to all elements

Nicholas_S_

Hi,

One way to assign values to all the elements of an array is:

Do i=1, N

    A(i)=2*B(i)
    C(1,i)=2*B(i)

End do

Another possibility is to use array syntax:

A(:)=2*B(:)

C(1,:)=2*B(:)

Which is the best-performing way to assign values to an entire array? And which is best when using parallel computing?

Thank you.

 

jimdempseyatthecove

Although the compiler can auto-parallelize loops (when that option is enabled), it is often much better to parallelize explicitly with OpenMP (which also has to be enabled at compile time).

!$OMP PARALLEL DO
Do i=1, N

    A(i)=2*B(i)
    C(1,i)=2*B(i)

End do
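
For reference, a minimal complete program built around the loop above might look like the sketch below. The sizes N and M, the array declarations, and the self-check are assumptions added for illustration; they are not from the original post. OpenMP is typically enabled with `-qopenmp` (Linux) or `/Qopenmp` (Windows) on the Intel compilers, or `-fopenmp` with gfortran; without such a flag the directives are treated as comments and the loop runs serially.

```fortran
program omp_assign
    implicit none
    integer, parameter :: N = 1000000   ! hypothetical problem size
    integer, parameter :: M = 4         ! hypothetical first extent of C
    real, allocatable :: A(:), B(:), C(:,:)
    integer :: i

    allocate(A(N), B(N), C(M,N))
    B = 1.0
    C = 0.0

    ! The loop index i is private to each thread; A, B, and C are shared.
    ! Each iteration writes distinct elements, so there is no data race.
    !$OMP PARALLEL DO
    do i = 1, N
        A(i)   = 2*B(i)
        C(1,i) = 2*B(i)
    end do
    !$OMP END PARALLEL DO

    ! Self-check: every assigned element should equal 2.0.
    if (any(A /= 2.0) .or. any(C(1,:) /= 2.0)) error stop "unexpected result"
    print *, "ok"
end program omp_assign
```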

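Note: C(1,i)= above is accessed with a stride equal to the extent of C's first dimension (N if C is N by N), because Fortran stores arrays in column-major order. This is inefficient when used as above. It would be more efficient to swap the indices (in the allocation and everywhere else the array is used) so that the innermost (or only) loop runs over the leftmost index (this is the reverse of C/C++).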
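
To make the stride concrete, here is a small sketch that fills a 2-D array in storage order and then prints one row and one column. The extents LD and NCOL are made-up values for illustration only.

```fortran
program column_major_stride
    implicit none
    integer, parameter :: LD = 3, NCOL = 4   ! hypothetical extents
    integer :: C(LD, NCOL)
    integer :: k

    ! reshape fills C in array element (storage) order, so the value
    ! stored in each element is its 1-based position in memory.
    C = reshape([(k, k = 1, LD*NCOL)], [LD, NCOL])

    ! Varying the right index jumps LD elements at a time in memory.
    print '(a, *(i3))', "C(1,:) sits at positions:", C(1,:)   ! 1 4 7 10
    ! Varying the left index walks through adjacent memory positions.
    print '(a, *(i3))', "C(:,1) sits at positions:", C(:,1)   ! 1 2 3
end program column_major_stride
```

The row C(1,:) touches every LD-th element, while the column C(:,1) is contiguous, which is why the leftmost index belongs in the innermost loop.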

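Note 2: if you change the index order, the store to C(i,1) becomes contiguous and can be vectorized with ordinary unit-stride stores, without needing scatter instructions (which the strided C(1,i) store would require, and which only some CPUs support).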

!$OMP PARALLEL DO SIMD
Do i=1, N

    A(i)=2*B(i)
    C(i,1)=2*B(i) ! requires change in index order

End do
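
A complete version of the swapped-index loop could be sketched as follows. The sizes N and M and the checks at the end are assumptions for illustration; the key change is that C is allocated as C(N,M), so N is the contiguous extent.

```fortran
program omp_simd_assign
    implicit none
    integer, parameter :: N = 1000000   ! hypothetical problem size
    integer, parameter :: M = 4         ! hypothetical second extent of C
    real, allocatable :: A(:), B(:), C(:,:)
    integer :: i

    allocate(A(N), B(N), C(N,M))   ! N is now the left (contiguous) extent
    B = 1.0
    C = 0.0

    ! Iterations are split across threads, and each thread's chunk
    ! is vectorized.
    !$OMP PARALLEL DO SIMD
    do i = 1, N
        A(i)   = 2*B(i)
        C(i,1) = 2*B(i)   ! unit-stride store into the first column
    end do
    !$OMP END PARALLEL DO SIMD

    ! Self-check: the first column is set, the other columns are untouched.
    if (any(A /= 2.0) .or. any(C(:,1) /= 2.0)) error stop "unexpected result"
    if (any(C(:,2:) /= 0.0)) error stop "other columns were modified"
    print *, "ok"
end program omp_simd_assign
```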

Additional note on the above: the loop does very little computation relative to its memory fetches and stores, so it is likely to be limited by memory bandwidth rather than by the CPU. For loops like this, you may find it more efficient to restrict the number of threads to a small-ish number. The most effective number will vary from system to system, and for the above loop it would likely depend on the number of memory channels available on the system.
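
One way to look for a good thread count is to time the loop with the NUM_THREADS clause at several values, as in the rough sketch below. The array size, the doubling scan, and the program name are assumptions for illustration, and the timings will differ on every system. It must be built with OpenMP enabled because it uses omp_lib. The OMP_NUM_THREADS environment variable is an alternative that needs no code change.

```fortran
program thread_count_scan
    use omp_lib   ! provides omp_get_wtime and omp_get_max_threads
    implicit none
    integer, parameter :: N = 20000000   ! hypothetical size, meant to exceed cache
    real, allocatable :: A(:), B(:)
    integer :: i, nt
    double precision :: t0, t1

    allocate(A(N), B(N))
    B = 1.0
    A = 0.0   ! touch A once so first-use page faults are not timed

    ! Try 1, 2, 4, ... threads up to the runtime's maximum.
    nt = 1
    do while (nt <= omp_get_max_threads())
        t0 = omp_get_wtime()
        !$OMP PARALLEL DO SIMD NUM_THREADS(nt)
        do i = 1, N
            A(i) = 2*B(i)
        end do
        !$OMP END PARALLEL DO SIMD
        t1 = omp_get_wtime()
        print '(a, i4, a, f10.6, a)', "threads:", nt, "  time:", t1 - t0, " s"
        nt = nt * 2
    end do

    if (any(A /= 2.0)) error stop "unexpected result"
end program thread_count_scan
```

A single timing per thread count is noisy, so repeating each measurement a few times before drawing conclusions is advisable.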

Jim Dempsey
