Solved: array reshape in-place

mrentropy1 · ‎08-12-2009

I have a huge (10^8 element, say) rank-1 array. I want to reshape this into a rank-3 array.

Since the array is big, I want to do it in-place with NO shuffling of data around. The numbers themselves should NOT move.

I know about Fortran-ordering and C-ordering so I'm ok on that score; I know where in memory the data needs to be put.

I know about the intrinsics RESHAPE() and PACK() but it's not clear this is what I need - it seems from what I've read in the Intel lit that these won't do the job. Am I mistaken?

Any ideas? In principle it seems simple - just a matter of internal relabeling of the array - but in practice, not so clear....

Thanks,

mrentropy

Steven_L_Intel1 · ‎08-13-2009

[plain]use, intrinsic :: ISO_C_BINDING
real, allocatable, target :: rank1_array(:)
real, pointer :: rank3_array(:,:,:)
...
! Assume rank1_array has been allocated. It doesn't have to be ALLOCATABLE,
! but TARGET is required.
!
! Here's where the magic happens
call C_F_POINTER (C_LOC(rank1_array), rank3_array, [100,100,1000])
!
! Now rank3_array refers to the same data as rank1_array, but as a
! 3-dimensional array with shape (100,100,1000)[/plain]
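
For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same idea. The program name and the small extents n1, n2, n3 are illustrative assumptions and not part of the original post. It checks that element (i,j,k) of the rank-3 view is the same storage as linear element i + (j-1)*n1 + (k-1)*n1*n2 of the rank-1 array, which is what Fortran (column-major) ordering implies.

```fortran
program reshape_in_place
  use, intrinsic :: ISO_C_BINDING
  implicit none
  integer, parameter :: n1 = 2, n2 = 3, n3 = 4   ! small illustrative extents
  real, allocatable, target :: rank1_array(:)
  real, pointer :: rank3_array(:,:,:)
  integer :: i, j, k, n

  allocate(rank1_array(n1*n2*n3))
  do n = 1, size(rank1_array)
     rank1_array(n) = real(n)
  end do

  ! Associate a rank-3 pointer with the same storage; no data is copied.
  call C_F_POINTER(C_LOC(rank1_array), rank3_array, [n1, n2, n3])

  ! Column-major order: (i,j,k) is linear element i + (j-1)*n1 + (k-1)*n1*n2.
  do k = 1, n3
     do j = 1, n2
        do i = 1, n1
           n = i + (j-1)*n1 + (k-1)*n1*n2
           if (rank3_array(i,j,k) /= rank1_array(n)) stop "mapping mismatch"
        end do
     end do
  end do

  ! A write through the rank-3 view is visible in the rank-1 array.
  rank3_array(n1,n2,n3) = -1.0
  print *, rank1_array(n1*n2*n3)   ! last element, now -1.0
end program reshape_in_place
```

The check loop stops with a message if the two views ever disagree, so a silent run followed by the final print suggests that the pointer really aliases the original data.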

mrentropy1 · ‎08-13-2009

Thank you SO MUCH. You just saved me a day of head-pounding.

BTW for reasons having to do with what's been passed around to which procedure &c, I didn't want to allocate space in the rank-1 array, but rather allocate it in the rank-3 array, generate a rank-1 pointer, pass that to a routine that filled it in, and then access the data from the original rank-3 array. (For simplicity, I didn't describe the situation like this in my original post....)

So from reading the Intel docs, it appears I can make BOTH arrays pointers, and do it like so:

[plain]program steve_modified
  use, intrinsic :: ISO_C_BINDING
  implicit none
  integer, pointer :: r1_(:)
  integer, pointer :: r3_(:,:,:)
  integer :: i, j, k
  !
  allocate(r3_(4,4,4))
  !
  call C_F_POINTER (C_LOC(r3_), r1_, [64])
  !
  ! Filling data into r1_(:):
  do i = 1,64
     r1_(i) = i
  end do
  !
  ! Reading data out of r3_(:,:,:):
  do k=1,4
     do j=1,4
        do i=1,4
           print *, i, "  ",j, "  ", k, "  ", r3_(i,j,k)
        end do
     end do
  end do
  !
end program steve_modified[/plain]

... which seems to work just fine.

Thanks again!

Peter (mrentropy)

Steven_L_Intel1 · ‎08-13-2009

Yep, that's fine. C_F_POINTER is one of my favorite features from F2003 - you can do so much with it and get rid of so many non-portable hacks.

mrentropy1 · ‎08-13-2009

BTW one other complication that I didn't mention is that both arrays are zero-indexed on their first index.

As a matter of style, I suppose this is something to avoid... but there's a reason for it.

(Does this slow, speed up, or make no difference to array element read speed, anyway?)

Just in case anybody else has this problem, here's how I solved it. Note that you can't add or subtract a C_INT32_T to or from the result of C_LOC(), so you have to use some other trick to avoid a SIGSEGV. Here I go from rank-2 to rank-4 (since, again, that's what I'm actually doing, not rank-1 to rank-3 :) - the white lies keep piling up...):

[plain]program steve_mod2
  use, intrinsic :: ISO_C_BINDING
  implicit none
  integer, pointer :: r1_(:,:)
  integer, pointer :: r1b_(:,:)
  integer, pointer :: r3_(:,:,:,:)
  integer :: i, j, k, m
  !
  allocate(r3_(0:2,4,4,4))
  !
  call C_F_POINTER (C_LOC(r3_), r1b_, [3,64])
  !
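  ! Re-point with a zero lower bound on the first index (F2003 bounds
  ! specification). A plain r1_ => r1b_(2:,:) would have bounds (1:2,1:64),
  ! so r1_(0,:) would be out of bounds.
  r1_(0:,1:) => r1b_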
  r1b_ => null()
  !
  forall (m=0:2)
     forall (i = 1:64)
        r1_(m,i) = (m + 1) * i
     end forall
  end forall
  !
  do m = 0,2
     print *, " ------------------------- "
     print *, " m = ", m
     print *, " ------------------------- "
     do k=1,4
        do j=1,4
           do i=1,4
              print *, i, "  ",j, "  ", k, "  ", r3_(m,i,j,k)
           end do
        end do
     end do
  end do
  !
end program steve_mod2
[/plain]

Steven_L_Intel1 · ‎08-13-2009

The origin doesn't really have an effect on performance. What you'd really like is the F2003 pointer remapping feature, but we don't have that implemented yet. See this thread for another method.
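
For reference, here is a minimal sketch of what F2003 pointer bounds remapping looks like on a compiler that implements it. The program name and the array name flat are hypothetical; the extents mirror the rank-2/rank-4 example above. In F2003 the target of a remapping must be rank-1, so the storage is declared as a flat array here, and both views get their zero lower bound directly from the pointer assignment.

```fortran
program remap_sketch
  implicit none
  integer, target  :: flat(3*4*4*4)      ! hypothetical rank-1 storage
  integer, pointer :: r2(:,:)
  integer, pointer :: r4(:,:,:,:)
  integer :: m, i, j, k, n

  ! Bounds remapping: explicit lower and upper bounds on the pointer,
  ! rank-1 target on the right-hand side. No data is copied.
  r2(0:2, 1:64) => flat
  r4(0:2, 1:4, 1:4, 1:4) => flat

  ! Fill through the rank-2 view.
  do i = 1, 64
     do m = 0, 2
        r2(m,i) = (m + 1) * i
     end do
  end do

  ! Read back through the rank-4 view: r4(m,i,j,k) shares storage with
  ! r2(m,n) where n = i + (j-1)*4 + (k-1)*16.
  do k = 1, 4
     do j = 1, 4
        do i = 1, 4
           n = i + (j-1)*4 + (k-1)*16
           do m = 0, 2
              if (r4(m,i,j,k) /= (m + 1) * n) stop "remapping mismatch"
           end do
        end do
     end do
  end do

  print *, lbound(r4, 1), ubound(r4, 1)   ! first index runs from 0 to 2
end program remap_sketch
```

Compared with the C_F_POINTER approach, this needs no ISO_C_BINDING and no intermediate pointer, but it only works where the compiler supports the feature.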

mrentropy1 · ‎08-13-2009

Very, very cool. Thanks for pointing me to that link!

Regards,

Peter

brunocalado · ‎08-14-2009

Sorry, but is RESHAPE() more expensive than the methods above? I'm doing the same thing using RESHAPE(). Do you think one of your techniques would be better?

thks

Steven_L_Intel1 · ‎08-14-2009

They're different. RESHAPE is a function that returns a copy of the array with a different shape. You then have to do something with that, such as assign it to another array. Depending on how you use it, an actual copy may not be made, but you can't just use RESHAPE to reference an array with different rank. RESHAPE has lots of other uses.

brunocalado · ‎08-15-2009

[cpp]LOGICAL (KIND=logicalSize), INTENT(IN) 	:: gene(geneSize, ygeneSize)
LOGICAL (KIND=logicalSize) 		:: extendGene(geneSize*ygeneSize)

extendGene = RESHAPE(gene,(/geneSize*ygeneSize/))


! Is that equivalent to this?

use, intrinsic :: ISO_C_BINDING
real, allocatable, target :: extendGene(:)
real, pointer :: gene(:,:)

call C_F_POINTER (C_LOC(extendGene), gene, [geneSize, ygeneSize])


[/cpp]

The two do the same thing, right? Which is faster?

tks

Hirchert__Kurt_W · ‎08-16-2009

No, they do not do the same thing. Your RESHAPE example provides a 1-dimensional view of the 2-dimensional array gene (an operation not always efficiently implementable) and assigns it to the 1-dimensional array extendGene (doubling the memory usage and probably adding the cost of doing an extra copy). Your C_F_POINTER example provides a 2-dimensional view (named gene) of the 1-dimensional array extendGene.

To do "the same thing", the RESHAPE part should look like

[cpp]real, allocatable :: extendGene(:)

! gene does not exist
! just use RESHAPE(extendGene,[geneSize,ygeneSize])
! wherever you would use gene in the C_F_POINTER example[/cpp]

Even then, there is significant functional difference -- in the C_F_POINTER example, you can assign to gene and change extendGene, but you aren't allowed to assign to the RESHAPE of an array.
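
A small sketch of that functional difference, reusing the names gene and extendGene from the question. The program name, the tiny sizes, and the extra array geneCopy are assumptions made for illustration only.

```fortran
program view_vs_copy
  use, intrinsic :: ISO_C_BINDING
  implicit none
  integer, parameter :: geneSize = 2, ygeneSize = 3   ! illustrative sizes
  real, allocatable, target :: extendGene(:)
  real, pointer :: gene(:,:)                 ! rank-2 view of extendGene
  real :: geneCopy(geneSize, ygeneSize)      ! hypothetical array holding the RESHAPE result
  integer :: n

  allocate(extendGene(geneSize*ygeneSize))
  extendGene = [(real(n), n = 1, geneSize*ygeneSize)]

  ! View: gene aliases the storage of extendGene.
  call C_F_POINTER(C_LOC(extendGene), gene, [geneSize, ygeneSize])

  ! Value: the result of RESHAPE is assigned into separate storage.
  geneCopy = RESHAPE(extendGene, [geneSize, ygeneSize])

  gene(1,1) = 99.0
  print *, extendGene(1)   ! now 99.0: the write went through the view
  print *, geneCopy(1,1)   ! still 1.0: the copy is independent

  ! RESHAPE(extendGene, [geneSize, ygeneSize]) = 0.0   ! invalid: a function result is not a variable
end program view_vs_copy
```

Whether the assignment to geneCopy costs an extra temporary on top of the copy depends on the compiler, which is the point of the performance discussion that follows.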

Which is faster? That depends on the quality of your compiler:

If a compiler is sufficiently clever, it should be able to implement RESHAPE in this case without copying the data, while the attributes target and pointer necessary to the C_F_POINTER version will interfere with optimization. With such a compiler, RESHAPE would be faster.

In less clever compilers, RESHAPE is merely a library routine and it always copies the data into a new location. In such compilers, the cost of this copying is nearly always much greater than what is lost because of the target and pointer attributes.

In a different kind of less clever compiler, RESHAPE is implemented in-line, but the in-line code always does the copy because the compiler doesn't do enough analysis to distinguish between the cases that can be done without copying from those that can't (even though the cases that can be done without copying occur far more frequently).

In short, under the best of circumstances, RESHAPE is faster, but under less favorable circumstances, using C_F_POINTER will be faster.

Unfortunately, nearly all compilers in their early releases are "less clever", so many programmers "know" that RESHAPE is an expensive operation. They then tend to avoid using RESHAPE. This, in turn, means that the compiler implementors end up with little incentive to make their compilers "more clever".

Perhaps Steve (or some other Intel employee) can comment on how clever the Intel compiler is in its handling of RESHAPE.

[As an aside, there are a number of other intrinsic functions with this same property -- most of the time they can be implemented without any copying of the input argument, but early implementations that unconditionally copied have led programmers to assume that they are necessarily expensive. TRANSFER, SPREAD, and CSHIFT are examples of intrinsics with this property.]

-Kurt

Steven_L_Intel1 · ‎08-17-2009

Our compiler tries to eliminate unnecessary temporaries, but it is not perfect in this regard.