Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Reading a big-endian file?

steve_o_
Beginner
2,745 Views
 

I'm trying to read a big endian file from http://yann.lecun.com/exdb/mnist/

So far I open the file like this, in a test program:

Program Arraytest3

    Type magic
        byte :: byte1  ! INTEGER(1)
        byte :: byte2
        byte :: dataType
        byte :: dimentions
    END Type magic

    Type Images
        Type(magic)     :: magic_number
        INTEGER(kind=4) :: numberOfImages
        INTEGER(kind=4) :: numberOfRows
        INTEGER(kind=4) :: numberOfColumns
        byte            :: imageCount(1)
    END Type Images

    Type (Images) :: idx

    open(10, CONVERT='LITTLE_ENDIAN', file='t10k-images-idx3-ubyte', access='stream', status='old', FORM='BINARY')
    read(10) idx
    close(10)

    !! Need to byteFlip each byte
    print*,"numberOfImages", idx%numberOfImages
    print*,"numberOfRows", idx%numberOfRows
    print*,"numberOfColumns", idx%numberOfColumns
    print*,"magic_number", idx%imageCount(1)

    print*,"Should be 10000 28x28 images"
end Program Arraytest3

 

Referring to the Intel page at https://software.intel.com/en-us/node/524834, none of the keywords seems to have any effect; the result is always the same:

 numberOfImages   270991360

 numberOfRows   469762048

 numberOfColumns   469762048

 

Now if I put these numbers into my calculator and byte flip them then I get the expected result. Does this mean Convert doesn't work on derived types?
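
As a sanity check on the numbers above, here is a minimal sketch that reverses the bytes of a 32-bit integer using only standard intrinsics. The program name `byteswap_demo` and the function `swap32` are illustrative and do not come from the thread.

```fortran
program byteswap_demo
   use, intrinsic :: iso_fortran_env, only: int8, int32
   implicit none

   ! The two distinct values reported in the question
   print *, swap32(270991360_int32)   ! 0x10270000 -> 0x00002710 = 10000
   print *, swap32(469762048_int32)   ! 0x1C000000 -> 0x0000001C = 28

contains

   ! Reverse the four bytes of a 32-bit integer (hypothetical helper)
   pure function swap32(x) result(y)
      integer(int32), intent(in) :: x
      integer(int32) :: y
      integer(int8)  :: b(4)
      b = transfer(x, b)             ! view the integer as four bytes
      y = transfer(b(4:1:-1), y)     ! reassemble them in reverse order
   end function swap32

end program byteswap_demo
```

The swapped values are 10000 and 28, which matches the expected 10000 images of 28x28 pixels and suggests that no conversion was applied during the read.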

TimP
Honored Contributor III
2,745 Views

If you want conversion from big-endian, why do you specify that the file is little_endian?

mecej4
Honored Contributor III
2,745 Views

Redundant/conflicting specifiers in OPEN can lead to problems. Remove access='stream' from the OPEN statement.

See https://software.intel.com/en-us/node/524840#A8125889-2623-465B-963D-3678F2D5DEC1, where it says "Converted data should have basic data types, or arrays of basic data types. Derived data types are disabled."
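
To illustrate the quoted restriction, here is a minimal sketch that reads the header into an array of a basic type. It assumes a little-endian host and a compiler that accepts the non-standard CONVERT= specifier (Intel Fortran and gfortran both do). The file name `demo-header.bin` is hypothetical; the program writes a synthetic 16-byte header so that it does not depend on the MNIST download.

```fortran
program convert_intrinsic_demo
   use, intrinsic :: iso_fortran_env, only: int8, int32
   implicit none
   integer(int32) :: hdr(4)
   integer :: u

   ! Write a synthetic big-endian header: 2051, 10000, 28, 28
   open(newunit=u, file='demo-header.bin', access='stream', &
        form='unformatted', status='replace')
   write(u) int([0,0,8,3, 0,0,39,16, 0,0,0,28, 0,0,0,28], int8)
   close(u)

   ! CONVERT= is a compiler extension; the item list is an array of a
   ! basic type, which is the case the documentation says is supported
   open(newunit=u, file='demo-header.bin', access='stream', &
        form='unformatted', convert='big_endian', status='old')
   read(u) hdr
   close(u, status='delete')

   print *, hdr
end program convert_intrinsic_demo
```

With the conversion applied, the four values should be 2051, 10000, 28 and 28.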

steve_o_
Beginner
2,745 Views

Hi Tim

That's just the last copy of the code; I tried all the options.

steve_o_
Beginner
2,745 Views

Hi @mecej4

I did see the link you posted; it was the first thing I tried, and it didn't work for me for the reason given there. But surely that restriction only applies to the use of F_UFMTENDIAN=little;big:10,20, which is what I had my shell environment variable set to?

I tried removing stream and setting CONVERT to IBM and BIG_ENDIAN, without success. I guess the restriction also applies to structures here, despite not being stated.

PS What's the difference here in the picture from the ref page, or am I being dyslexic?

https://software.intel.com/sites/default/files/managed/08/83/9C8C9E9B-FE4F-4EB1-B850-8CEC8A8B0618-imageId=A6A0C0E1-C865-422E-A2A0-2AD5BE5BBE73.jpg

 

mecej4
Honored Contributor III
2,746 Views

steve o. wrote:

PS What's the difference here in the picture from the ref page, or am I being dyslexic?

https://software.intel.com/sites/default/files/managed/08/83/9C8C9E9B-FE4F-4EB1-B850-8CEC8A8B0618-imageId=A6A0C0E1-C865-422E-A2A0-2AD5BE5BBE73.jpg

My eyes are used to big-endian decimal strings, but it appears that the image is a work in progress. Moreover, I have not traced the link back to any text, so ...

Here is a modification that works on SUSE Linux, but it does so by tricking the compiler into doing something that is not officially provided for:

Program Arraytest3

    Type magic
        sequence
        byte :: byte1  ! INTEGER(1)
        byte :: byte2
        byte :: dataType
        byte :: dimentions
    END Type magic

    Type Images
        sequence
        Type(magic)     :: magic_number
        INTEGER(kind=4) :: numberOfImages
        INTEGER(kind=4) :: numberOfRows
        INTEGER(kind=4) :: numberOfColumns
        byte            :: imageCount(1)
    END Type Images

    Type (Images)   :: idx
    INTEGER(kind=4) :: iidx(4)
    ! Overlay a plain integer array on the first 16 bytes of idx, so that
    ! CONVERT= sees an intrinsic type instead of a derived type
    equivalence(iidx,idx)

    open(10, CONVERT='BIG_ENDIAN', file='t10k-images-idx3-ubyte', access='stream', status='old')
    read(10) iidx   ! reads only the 16-byte header; imageCount is not read
    close(10)

    print*,"numberOfImages", idx%numberOfImages
    print*,"numberOfRows", idx%numberOfRows
    print*,"numberOfColumns", idx%numberOfColumns
    ! The magic number is converted as one 4-byte integer, so print it via iidx(1)
    print*,"magic_number", iidx(1)

    print*,"Should be 10000 28x28 images"
end Program Arraytest3

steve_o_
Beginner
2,745 Views

It appears you can't use structures. The following works, though; sorry for the messy code ;-)

Program Arraytest3

    byte            :: m1, m2, m3, m4   ! the four magic-number bytes
    Integer(kind=4) :: I, R, C          ! image count, rows, columns

    ! The derived types are left over from the earlier attempt and are
    ! no longer used for the read
    Type magic
        byte :: byte1  ! INTEGER(1)
        byte :: byte2
        byte :: dataType
        byte :: dimentions
    END Type magic

    Type Images
        Type(magic)     :: magic_number
        INTEGER(kind=4) :: numberOfImages
        INTEGER(kind=4) :: numberOfRows
        INTEGER(kind=4) :: numberOfColumns
        byte            :: imageCount(1)
    END Type Images

    Type (Images) :: idx

    ! Read scalars of intrinsic type so that CONVERT= is applied
    open(10, CONVERT='IBM', file='t10k-images-idx3-ubyte', status='old', FORM='BINARY')
    read(10) m1
    read(10) m2
    read(10) m3
    read(10) m4
    read(10) I
    read(10) R
    read(10) C

    close(10)

    print*,"numberOfImages", I
    print*,"numberOfRows", R
    print*,"numberOfColumns", C

    print*,"Should be 10000 28x28 images"
end Program Arraytest3

 numberOfImages       10000
 numberOfRows          28
 numberOfColumns          28
 Should be 10000 28x28 images

steve_o_
Beginner
2,745 Views

 

Sorry, our postings seemed to cross 

I just tried your code on OS X 10.9.5 and it also works; however, it does flag the following warning:

arraytest3.f90(12): warning #6380: The structure length is not a multiple of its largest element; could create misalignments for arrays of this type.   [IMAGES]
    Type Images
---------^

I wouldn't have thought of doing it your way, thanks ;)

Not to worry - fixed it 

I did read that this is an obsolescent feature in Fortran 90/95; I guess it should be safe to use for 40 years or so.

mecej4
Honored Contributor III
2,745 Views

That warning is issued on Linux, too, but it pertains to arrays of type IMAGES. If you do create an array later, you could pad the type with enough junk bytes to align to 8-byte or 16-byte boundaries, or change the type of ImageCount to INTEGER.
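
The padding suggestion could look roughly like the sketch below. The component names `bytes` and `pad` are illustrative and not from the thread; three filler bytes bring the type from 17 to 20 bytes, a multiple of its 4-byte largest element, which is the condition the warning text names. Changing `imageCount` to a 4-byte INTEGER would give the same size.

```fortran
program padded_type_demo
   use, intrinsic :: iso_fortran_env, only: int8, int32
   implicit none

   type :: magic
      sequence
      integer(int8) :: bytes(4)
   end type magic

   type :: images
      sequence
      type(magic)    :: magic_number
      integer(int32) :: numberOfImages
      integer(int32) :: numberOfRows
      integer(int32) :: numberOfColumns
      integer(int8)  :: imageCount(1)
      integer(int8)  :: pad(3)         ! filler bytes (assumption): 17 -> 20
   end type images

   type(images) :: idx

   ! Report the size in bytes that the compiler actually uses for the type
   print *, storage_size(idx) / 8
end program padded_type_demo
```

Printing the size is only a check; the exact layout remains up to the compiler, so it is worth confirming on the target platform.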

Steven_L_Intel1
Employee
2,745 Views

At present, we do not do CONVERT= on components of derived types/structures. This is a historical artifact due to our support of UNIONs and we've discussed changing it, but it would introduce an incompatibility.
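
One way to avoid depending on CONVERT= at all is to read the header as raw bytes and assemble the integers by hand. The sketch below uses only standard Fortran 2003/2008 features; the helper `be32` and the file name `demo-idx3-ubyte` are hypothetical, and the program writes a synthetic header so that it can run without the MNIST file.

```fortran
program idx_header_portable
   use, intrinsic :: iso_fortran_env, only: int8, int32
   implicit none
   integer(int8) :: raw(16)
   integer :: u

   ! Synthetic IDX header in big-endian byte order: 2051, 10000, 28, 28
   open(newunit=u, file='demo-idx3-ubyte', access='stream', &
        form='unformatted', status='replace')
   write(u) int([0,0,8,3, 0,0,39,16, 0,0,0,28, 0,0,0,28], int8)
   close(u)

   ! Read the 16 header bytes exactly as stored, with no conversion
   open(newunit=u, file='demo-idx3-ubyte', access='stream', &
        form='unformatted', status='old')
   read(u) raw
   close(u, status='delete')

   print *, 'magic number   ', be32(raw(1:4))
   print *, 'numberOfImages ', be32(raw(5:8))
   print *, 'numberOfRows   ', be32(raw(9:12))
   print *, 'numberOfColumns', be32(raw(13:16))

contains

   ! Combine four bytes, most significant first, into a 32-bit integer
   pure function be32(b) result(v)
      integer(int8), intent(in) :: b(4)
      integer(int32) :: v
      integer :: k
      v = 0
      do k = 1, 4
         ! mask with 255 so that bytes above 127 are not sign-extended
         v = ior(ishft(v, 8), iand(int(b(k), int32), 255_int32))
      end do
   end function be32

end program idx_header_portable
```

Because the bytes are combined arithmetically, the result (2051, 10000, 28, 28) does not depend on the byte order of the host or on any compiler extension.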

steve_o_
Beginner
2,744 Views

Thanks Steve

I have it working now. In case anyone else is looking for a Fortran MNIST file reader (http://yann.lecun.com/exdb/mnist/) in future: I converted the unsigned bytes in the file to an array of 16-bit integers, which seems to have done the trick (the values in the file apparently represent grayscale pixels from 0 to 255). Hopefully my use of zext is correct; it seems to give reasonable results.

Now to plumb it into my Boltzmann Machine code

Program Arraytest3

    INTEGER :: AllocateStatus, I

    Type Magic
        sequence
        byte :: byte1  ! INTEGER(1)
        byte :: byte2
        byte :: dataType
        byte :: dimentions
    END Type Magic

    Type Images
        sequence
        Type(Magic)     :: magicNumber
        INTEGER(kind=4) :: numberOfImages
        INTEGER(kind=4) :: numberOfRows
        INTEGER(kind=4) :: numberOfColumns
    END Type Images

    INTEGER(2), DIMENSION(:,:,:), ALLOCATABLE :: imageMatrix  ! pixel values 0..255
    INTEGER(1), DIMENSION(:,:),   ALLOCATABLE :: buffer       ! one raw image

    Type (Images)   :: idx
    INTEGER(kind=4) :: iidx(4)
    ! Overlay an integer array on idx so that CONVERT= is applied to the header
    Equivalence(iidx,idx)

    open(10, CONVERT='BIG_ENDIAN', file='t10k-images-idx3-ubyte', access='stream', status='old')
    read(10) iidx

    ALLOCATE(imageMatrix(idx%numberOfColumns,idx%numberOfRows,idx%numberOfImages), &
             buffer(idx%numberOfColumns,idx%numberOfRows), STAT = AllocateStatus)
    IF (AllocateStatus /= 0) STOP "*** Not enough memory, try a smaller network ***"

    ! Pixels are stored row by row, so the column index varies fastest
    Do I = 1, idx%numberOfImages
        read(10) buffer
        imageMatrix(:,:,I) = zext(buffer)  ! zero-extend the unsigned bytes
    EndDo

    close(10)

    DEALLOCATE(buffer)

    print*,"numberOfImages", idx%numberOfImages
    print*,"numberOfRows", idx%numberOfRows
    print*,"numberOfColumns", idx%numberOfColumns

    print*, imageMatrix(:,:,10)

    DEALLOCATE(imageMatrix)

end Program Arraytest3
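
ZEXT is an Intel extension, so for other compilers a standard-conforming equivalent may be useful. The sketch below is an assumption about what the conversion needs to do, namely treat each signed byte as an unsigned value from 0 to 255; the program and variable names are illustrative.

```fortran
program unsigned_byte_demo
   use, intrinsic :: iso_fortran_env, only: int8, int16
   implicit none
   integer(int8)  :: buffer(3)
   integer(int16) :: pixels(3)

   ! -1 as a signed byte has the bit pattern 0xFF, i.e. 255 when unsigned
   buffer = [0_int8, 127_int8, -1_int8]

   ! Widen first, then mask off the sign-extension bits
   pixels = iand(int(buffer, int16), 255_int16)

   print *, pixels   ! 0 127 255
end program unsigned_byte_demo
```

Since both `int` and `iand` are elemental, the same expression works on a whole image buffer, for example `imageMatrix(:,:,I) = iand(int(buffer, 2), 255_2)` with the kinds used in the program above.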

 
