
## Array assignment to all elements


Hi,

One way to assign values to all elements of an array is:

```
Do i=1, N
   A(i)=2*B(i)
   C(1,i)=2*B(i)
End do
```

Another possibility is to use this:

```
A(:)=2*B(:)
C(1,:)=2*B(:)
```

Which is the best way to assign values to an entire array? And does the answer change with parallel computing?

Thank you.

1 Solution

Although the compiler can auto-parallelize loops (when that option is enabled), it is often much better to parallelize explicitly with OpenMP (which also has to be enabled at compile time).

```
!$OMP PARALLEL DO
Do i=1, N
   A(i)=2*B(i)
   C(1,i)=2*B(i)
End do
```

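Note: the C(1,i)= store above has a stride equal to the size of C's first dimension (N, if C is allocated N by N), because Fortran stores arrays in column-major order. This is inefficient when used as above. It would be more efficient to swap the indices (in the allocation and everywhere else the array is used) so that the innermost (or only) loop runs over the leftmost index. This is the reverse of C/C++.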
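To make the stride in the note above concrete, here is a small sketch. It numbers the elements of an m-by-n array in storage order, so each value is that element's position in memory. The module and function names (`layout_demo`, `storage_positions`) are made up for this illustration.

```fortran
module layout_demo
  implicit none
contains
  ! Number the elements of an m-by-n array 1, 2, 3, ... in storage order.
  ! RESHAPE fills the result in array element order, which in Fortran
  ! means the leftmost index varies fastest (column-major).
  function storage_positions(m, n) result(pos)
    integer, intent(in) :: m, n      ! hypothetical extents for the demo
    integer :: pos(m, n)
    integer :: k
    pos = reshape([(k, k = 1, m*n)], [m, n])
  end function storage_positions
end module layout_demo
```

For a 3-by-4 array, `pos(:,1)` holds 1, 2, 3 (neighbouring memory locations), while `pos(1,:)` holds 1, 4, 7, 10 (each step jumps by the size of the first dimension).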

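Note 2: if you swap the index order, the C(i,1)=... store becomes contiguous and can be vectorized with ordinary vector stores, with no need for scatter instructions (the strided C(1,i) form would typically need scatter stores, which only some CPUs support).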

```
!$OMP PARALLEL DO SIMD
Do i=1, N
   A(i)=2*B(i)
   C(i,1)=2*B(i) ! requires change in index order
End do
```

Additional note on the above: the loop does very little computation relative to its memory fetches and stores. For loops like this, you may find it more efficient to restrict the number of threads to a small-ish number. The most effective number will vary from system to system. For the loop above, it would likely depend on the number of memory channels available on the system.
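One way the thread restriction could look, as a minimal sketch: the standard OpenMP `num_threads` clause caps the team size for this one loop. The module name, the subroutine name `fill_scaled`, and the `nthreads` argument are hypothetical. The sketch assumes C has already been declared with the swapped index order, and that A and the first dimension of C are at least as long as B. OpenMP has to be enabled when compiling (`-qopenmp` with the Intel compilers on Linux, `/Qopenmp` on Windows); without it the directives are treated as comments and the loop runs serially.

```fortran
module scaled_fill
  implicit none
contains
  ! Set A(i) = 2*B(i) and C(i,1) = 2*B(i), using at most nthreads threads.
  subroutine fill_scaled(a, b, c, nthreads)
    real,    intent(out)   :: a(:)
    real,    intent(in)    :: b(:)
    real,    intent(inout) :: c(:,:)    ! leftmost index is the contiguous one
    integer, intent(in)    :: nthreads  ! hypothetical tuning knob, must be > 0
    integer :: i

    ! num_threads limits the team for this construct only.
    !$omp parallel do simd num_threads(nthreads)
    do i = 1, size(b)
      a(i)    = 2.0 * b(i)
      c(i, 1) = 2.0 * b(i)
    end do
    !$omp end parallel do simd
  end subroutine fill_scaled
end module scaled_fill
```

A call such as `call fill_scaled(A, B, C, 4)` can then be timed with a few different thread counts to find what works best on a given machine.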

Jim Dempsey
