Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Problem with LNBLNK and len_trim

reidar
New User
1,921 Views

I have a small subroutine transferring text from keyboard entries, etc from lower to uppercase letters. I used this routine for CVF and the version shown below I also apply for IVF Win32 applications. What happens is that the program crashes with the error message:Unhandled exception at 0x7790DF7C (ntdll.dll) in CaFeMS1-IVF.exe: 0xC0000005: Access violation writing location 0x002A0FFC... before the line ILENGTH = len_trim(STRING) - 1 where a breakpoint is set.

The subroutine is linked from  a library file.

Can anyone give me a tip?

Program listing follows here:

! IF(I.GT.IEND)GOTO 2000 ! 25/5-2000

! ICOM is character*2

READ(IDEDIT,202)ICOM

CALL TRANS_TO_UPPERCASE(ICOM,2)

SUBROUTINE TRANS_TO_UPPERCASE(STRING,LENGTH)

use IFPORT

INTEGER*4 KAR,ILENGTH,LENGTH

CHARACTER*(LENGTH)STRING

c CHARACTER*(40)STRING

! ILENGTH = LNBLNK(STRING)

ILENGTH = len_trim(STRING) - 1

C....Transfer STRING to uppercase letters....................

DO 1000 I=1,ILENGTH

KAR=0

KAR=ICHAR(STRING(I:I))

IF(KAR.GE.97)STRING(I:I)=CHAR(KAR-32)

1000 CONTINUE

RETURN

END

 

0 Kudos
26 Replies
andrew_4619
Honored Contributor II
1,690 Views
character(len=*) :: string 

Strings length has been defined in the caller. You can get in in the sub with len(string). What is the length passed in mean to be the decalred length or a sub string length?

I would use ascii >=65 to 90<= BTW

0 Kudos
mecej4
Honored Contributor III
1,690 Views

As you probably know by now, fragments of code with some bug are nearly impossible to diagnose in a reasonable amount of time, and the number of ways in which the code could have failed can be more than you want to read about. You did not show the code where the subroutine is called, so we do not know whether the actual arguments match the dummy arguments.

The subroutine that you showed has the following potential bugs:

1. The second argument is not used except to declare the desired length of the first argument. What happens if the caller passes a value in LENGTH that is not equal to the length of STRING? What happens when STRING contains a single non-blank character followed by blanks?

2. Consider what happens with several edge cases:

   (i) The trimmed string length is 1

   (ii) One or more characters in the part of STRING that are to be uppercased are in the set ['{','}','|','~']

3. What is IFPORT needed for? You commented out the line with a reference to LNBLNK.

0 Kudos
Paul_Curtis
Valued Contributor I
1,690 Views

Optimal methods of changing character case have been around for a long time.  Try this:

SUBROUTINE updncase (string, updn)
	IMPLICIT NONE
	CHARACTER(LEN=*), INTENT(INOUT)	:: string
	INTEGER, INTENT(IN)				:: updn
	INTEGER							:: j, nc
	CHARACTER(LEN=26), PARAMETER	:: lower = 'abcdefghijklmnopqrstuvwxyz'
	CHARACTER(LEN=26), PARAMETER	:: upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
	
	SELECT CASE (updn)
	CASE (1)		
		DO j = 1, LEN(string)
			nc = INDEX(lower, string(j:j))
			IF (nc > 0) string(j:j) = upper(nc:nc)
		END DO
	CASE (2)		
		DO j = 1, LEN(string)
			nc = INDEX(upper, string(j:j))
			IF (nc > 0) string(j:j) = lower(nc:nc)
		END DO
	END SELECT	
END SUBROUTINE updncase	 

 

0 Kudos
reidar
New User
1,690 Views

Thank you all for comments. It turned out that the root cause was due to something that happened during migration from CVF to IVF. 

Paul: If I understand your code, the routine updncase change the case either from upper lo lower or vice versa.. My routine takes a mixed string and transfer all characters to uppercase.

0 Kudos
mecej4
Honored Contributor III
1,690 Views

reidar wrote:
Paul: If I understand your code, the routine updncase change the case either from upper lo lower or vice versa. My routine takes a mixed string and transfer all characters to uppercase.

In that case, you need the following half (case updn = 1) of Paul's code. On X86 and X64, the intrinsic INDEX can use the fast string search (SCASB) instruction with a repetition count of 26. If you are going to pass strings with trailing blanks to this subroutine, you may wish to use LEN_TRIM instead of LEN, or, if your strings will have no embedded blanks, exit the DO loop when a blank character is encountered.

SUBROUTINE upcase (string)
   IMPLICIT NONE
   CHARACTER(LEN=*), INTENT(INOUT)	:: string
   INTEGER				:: j, nc
   CHARACTER(LEN=26), PARAMETER	:: lower = 'abcdefghijklmnopqrstuvwxyz'
   CHARACTER(LEN=26), PARAMETER	:: upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
	
   DO j = 1, LEN(string)
      nc = INDEX(lower, string(j:j))
      IF (nc > 0) string(j:j) = upper(nc:nc)
   END DO
END SUBROUTINE upcase	 

 

0 Kudos
andrew_4619
Honored Contributor II
1,690 Views

mecej4 wrote:
.....the intrinsic INDEX can use the fast string search (SCASB) instruction with a repetition count of 26. 

Very Interesting. I have long realised the INDEX is a wonderful (and quick) tool (which I use frequently) but was unaware of  SCASB which I have just read up on.  In this instance however I would have thought checking a character against 26 characters 'quickly' within a loop would not be significantly different in time to checking if an ascii integer was between upper and lower limit values within a loop. I am not going to test this hypothesis as even if there was a 'big' difference (x5) it would not make a worthwhile benefit as I am not doing gazillions up upp/low casing. Anyhow thank you for the education!

0 Kudos
mecej4
Honored Contributor III
1,690 Views

Andrew, if you found the SCAS instruction interesting, you may find the much less used XLAT instruction ( http://www.felixcloutier.com/x86/XLAT:XLATB.html )equally interesting.

One could set up one 256-byte array, say UPR,  set to Z'00',...,Z'FF', except that UPR(97:122) should be set to Z'45',...,Z'5A'. This sets up the "translation table" which, combined with string LODS, XLATB and STOS in a tight loop would be very fast.

On the other hand, the C library the IFort uses already contains the standard functions strupr(), etc., which could be used as a more portable, safer, alternative.

0 Kudos
JVanB
Valued Contributor II
1,690 Views

I was trying to find an expression that determined whether a character was a letter or not that wasn't too expensive to evaluate...

module M
   implicit none
   contains
      subroutine upcase(string)
         character(*), intent(INOUT) :: string
         call sub(string,len(string))
         contains
            subroutine sub(string,len)
               integer, intent(in) :: len
               character, intent(INOUT) :: string(len)
               string = merge(achar(ibclr(iachar(string),5)),string, &
                  ibits(dim(ibclr(iachar(string),5),64)-1,0,8)<26)
            end subroutine sub
      end subroutine upcase
end module M

program P
   use M
   implicit none
   integer i
   character(40) string
   string = 'yOU WiLl `FInD'' iT @ThE [13{ oaK tRee.'
   write(*,'(a)') trim(string)
   call upcase(string)
   write(*,'(a)') trim(string)
end program P

Output with ifort:

yOU WiLl `FInD' iT @ThE [13{ oaK tRee.
YOU WILL `FIND' IT @THE [13{ OAK TREE.

 

Steve_Lionel
Honored Contributor III
1,690 Views

RO, you're slipping! I would have expected at least five ICEs here....

0 Kudos
Paul_Curtis
Valued Contributor I
1,691 Views

The implicit point of updncase() is that a precomputed or preset lookup table (LUT) is usually (always?) much more efficient than on-the-spot computation, regardless of how fascinatingly compactly (and incomprehensibly) the computation can be written [merge(achar(ibclr(iachar(   ... really?].

Many years ago I wrote an image-processing code for a fluorescence system which computed the ratio image of two video frames by using the numerator and denominator pixel values as indices into a LUT whose values were the ratio for each combination of n/d; was very fast, worked at frame rates, where an attempt to actually compute the ratio image would have killed the project. 

0 Kudos
JVanB
Valued Contributor II
1,691 Views

LUTs touch memory or blow out cache. Computation is fast on 21st century CPUs. Three filse are needed for this test: UpCase.asm

; UpCase.asm
; fasm UpCase.asm
format ms64 coff
section '.code' readable executable
public UpCase
UpCase:
   push rsi
   push rdi
   sub rsp, 40
   test rdx, rdx
   js epilog
   mov r8, rcx
   mov rsi, rcx
   neg ecx
   and ecx, 31
   vbroadcastss ymm1, [bias]
   vbroadcastss ymm2, [lower]
   vbroadcastss ymm3, [bit]
   jz main_part
   mov rdi, rsp
   sub rdx, rcx
   cmovl ecx, edx
   mov eax, ecx
   REP movsb
   vmovdqu ymm0, [rsp]
   vpaddb ymm4, ymm0, ymm1
   vpcmpgtb ymm4, ymm4, ymm2
   vpand ymm4, ymm4, ymm3
   vpandn ymm0, ymm4, ymm0
   vmovdqu [rsp], ymm0
   mov rsi, rsp
   mov rdi, r8
   mov ecx, eax
   REP movsb
   mov rsi, rdi
   jle epilog
main_part:
   cmp rdx, 31
   jle tail
   vmovdqa ymm0, [rsi]
   vpaddb ymm4, ymm0, ymm1
   vpcmpgtb ymm4, ymm4, ymm2
   vpand ymm4, ymm4, ymm3
   vpandn ymm0, ymm4, ymm0
   vmovdqa [rsi], ymm0
   add rsi, 32
   sub rdx, 32
   jmp main_part
tail:
   test rdx, rdx
   jz epilog
   mov r8, rsi
   mov rdi, rsp
   mov ecx, edx
   REP movsb
   vmovdqu ymm0, [rsp]
   vpaddb ymm4, ymm0, ymm1
   vpcmpgtb ymm4, ymm4, ymm2
   vpand ymm4, ymm4, ymm3
   vpandn ymm0, ymm4, ymm0
   vmovdqu [rsp], ymm0
   mov rsi, rsp
   mov rdi, r8
   mov ecx, edx
   REP movsb
epilog:
   add rsp, 40
   pop rdi
   pop rsi
ret
public RDTSC1
RDTSC1:
   RDTSC
   shl rdx, 32
   or rax, rdx
   ret
section '.data' readable writeable
temp rb 32
bias dd 05050505h
lower dd 65656565h
bit dd 20202020h

And test.f90:

! test.f90
! ifort test.f90 UpCase.obj
module M
   use ISO_C_BINDING
   implicit none
   private
   public UpCase, RDTSC, updncase
   interface
      subroutine UpCase(string, length) bind(C,name='UpCase')
         import
         implicit none
         character(KIND=C_CHAR), intent(INOUT) :: string(*)
         integer(C_SIZE_T), value :: length
      end subroutine UpCase
   end interface
   interface
      function RDTSC() bind(C,name='RDTSC1')
         import
         implicit none
         integer(C_INT64_T) RDTSC
      end function RDTSC
   end interface
   contains
      SUBROUTINE updncase (string, updn)
         IMPLICIT NONE
         CHARACTER(LEN=*), INTENT(INOUT) :: string
         INTEGER, INTENT(IN)    :: updn
         INTEGER       :: j, nc
         CHARACTER(LEN=26), PARAMETER :: lower = 'abcdefghijklmnopqrstuvwxyz'
         CHARACTER(LEN=26), PARAMETER :: upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
 
         SELECT CASE (updn)
         CASE (1)  
            DO j = 1, LEN(string)
               nc = INDEX(lower, string(j:j))
               IF (nc > 0) string(j:j) = upper(nc:nc)
            END DO
         CASE (2)  
            DO j = 1, LEN(string)
               nc = INDEX(upper, string(j:j))
               IF (nc > 0) string(j:j) = lower(nc:nc)
            END DO
         END SELECT 
      END SUBROUTINE updncase  
end module M

program P
   use M
   use ISO_C_BINDING
   implicit none
   character(LEN=:,KIND=C_CHAR), allocatable :: Bible
   integer(C_INT64_T) t0, t1
   integer size
! http://ebible.org/kjv/kjvtxt.zip
   open(10,file='kjv.txt',status='old',access='stream')
   inquire(unit=10,size=size)
   write(*,'(*(g0))') 'File size = ',size
   allocate(character(LEN=size,KIND=C_CHAR) :: Bible)
   read(10) Bible
   close(10)
   t0 = RDTSC()
   call UpCase(Bible,len(Bible,C_SIZE_T))
   t1 = RDTSC()
   open(10,file='UpCase.txt',status='replace',access='stream')
   write(10) Bible
   close(10)
   write(*,'(*(g0))') 'Clock cycles for UpCase = ', t1-t0

   open(10,file='kjv.txt',status='old',access='stream')
   read(10) Bible
   close(10)
   t0 = RDTSC()
   call updncase(Bible,1)
   t1 = RDTSC()
   open(10,file='updncase.txt',status='replace',access='stream')
   write(10) Bible
   close(10)
   write(*,'(*(g0))') 'Clock cycles for updncase = ', t1-t0
end program P

Also it requires kjv.txt. To get it, go to http://ebible.org/kjv/kjvtxt.zip unzip it and run combine.bat. I didn't post it here because it's 4 MB. Then you can assemble UpCase.asm with FASM and compile test.f90 with ifort. My output:

File size = 4523438
Clock cycles for UpCase = 897107
Clock cycles for updncase = 375973272

The first time is for UpCase  and the second for updncase. Windiff.exe found the output files UpCase.txt and updncase.txt to be identical. subroutine UpCase was about 400X faster than subroutine updncase because it could process 32 bytes in the 11 instructions of its main_part: loop. That's the kind of thing I was trying to achieve with my earlier Fortran code, but there's a bit of a problem that Fortran doesn't define overflows as nicely as it could. Still I could have done better in my original Fortran code. Couldn't get it to ICE even when I incorporated it into an initialization expression -- must be a compiler bug :)

EDIT: The program had the file size hardwired incorrectly, so I fixed that by using INQUIRE to get the correct size.

mecej4
Honored Contributor III
1,691 Views

First of all, THANKS to RO for his interesting posts.

Out of curiosity, I added RO's Fortran Upcase from #9 to his test program of #12, and removed the UPDN=2 code from UPDNCASE. On a laptop with an I5-4200U CPU, I get

Clock cycles for UpCase    =      1104393 (  1.1E6)
Clock cycles for ROUpcase  =     22040992 ( 22E6)
Clock cycles for PCUpcase  =    328797223 (329E6)

Some comments:

  • Repeating the runs gave slightly different clock counts. Not more than the first three digits are repeatable.
  • The run (time for reading the bible three times, converting and writing the uppercase version three times) took 0.2 s. Of this time, 0.05 s was spent on file I/O.
  • The timing will vary with the CPU generation and model
  • I opened kjv.txt in a text editor, Select+All, Uppercase Selection; this took a fraction of a second. Undo (to revert) took about a second.
  • RO's upcase.asm will require a CPU with AVX instructions (Sandy Bridge and newer).
  • Based on marketing literature, I tended to think of AVX as more for floating point and image processing. RO has taught us that AVX instructions can be used for text processing.
  • RO can probably backport upcase.asm so that it will work on CPUs with SSE2, and thereby improve his standing as a repeat offender.

 

0 Kudos
andrew_4619
Honored Contributor II
1,691 Views

I really enjoy following this type of thread but back in the real world the conclusion is that all the methods are 'fast' for most practical purposes.

0 Kudos
Steve_Lionel
Honored Contributor III
1,691 Views

It gets more fun when you need to support character sets that don't have a 1:1 mapping between lower and upper case!

0 Kudos
mecej4
Honored Contributor III
1,691 Views

andrew_4619 wrote:
I really enjoy following this type of thread but back in the real world the conclusion is that all the methods are 'fast' for most practical purposes.

Agreed, but it is fun to escape from the real world for a break now and then.

Also, when someone shows us things that we had never imagined were possible, sometimes they can become very useful.

See this Gfortran bug report for something of this sort: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 . One of the other posters here and on C.L.F. had come across the problem earlier, but his complaints were being sidelined. Zeroing in on the problem (a tenfold slow-down in formatting real numbers) helped getting the it to be recognized and fixed.

0 Kudos
mecej4
Honored Contributor III
1,691 Views

Steve Lionel (Ret.) wrote:
 It gets more fun when you need to support character sets that don't have a 1:1 mapping between lower and upper case!

And no fun at all when the character set has no concept of case, such as Indian languages ?

0 Kudos
JVanB
Valued Contributor II
1,691 Views

I tried a couple of translations of my assembly code into Fortran:

      subroutine trial1(string,length) bind(C)
         use ISO_C_BINDING
         implicit none
         character(KIND=C_CHAR), intent(INOUT), target :: string(*)
         integer(C_SIZE_T), value :: length
         type(C_PTR) address
         integer(C_INT8_T), pointer :: data(:)
         integer(C_INT8_T), parameter :: &
            bias = int(Z'05',C_INT8_T), &
            lower = int(Z'65',C_INT8_T), &
            bit = int(Z'20',C_INT8_T)
         address = C_LOC(string)
         call C_F_POINTER(address,data,[length])
         where(data+bias > lower) data = iand(not(bit),data)
      end subroutine trial1
      subroutine trial2(string,length) bind(C)
         use ISO_C_BINDING
         implicit none
         character(KIND=C_CHAR), intent(INOUT), target :: string(*)
         integer(C_SIZE_T), value :: length
         type(C_PTR) address
         integer(C_INT8_T), pointer :: data(:)
         integer(C_INT8_T), parameter :: &
            bias = int(Z'05',C_INT8_T), &
            lower = int(Z'65',C_INT8_T), &
            bit = int(Z'20',C_INT8_T), &
            truemask = int(Z'FF',C_INT8_T), &
            falsemask = int(Z'00',C_INT8_T)
         address = C_LOC(string)
         call C_F_POINTER(address,data,[length])
         data = iand(not(iand(merge(truemask,falsemask, &
            data+bias > lower),bit)),data)
      end subroutine trial2

It seemed that /arch:core-avx2 helped the WHERE version out a lot, but not the more literal translation:

D:\>ifort /nologo /arch:host test2.f90 UpCase.obj

D:\>test2
File size = 4523438
Clock cycles for UpCase = 875272
Clock cycles for updncase = 393948858
Clock cycles for trial1 = 35775283
Clock cycles for trial2 = 12334499

D:\>ifort /nologo /arch:core-avx2 test2.f90 UpCase.obj

D:\>test2
File size = 4523438
Clock cycles for UpCase = 847216
Clock cycles for updncase = 379932665
Clock cycles for trial1 = 9246359
Clock cycles for trial2 = 9139662

Windiff considered UpCase.txt, updncase.txt, trial1.txt, and trial2.txt to be identical.

 

0 Kudos
FortranFan
Honored Contributor II
1,691 Views

Repeat Offender wrote:

.. Couldn't get it to ICE even when I incorporated it into an initialization expression -- must be a compiler bug :) ..

Mega kudos to Intel Fortran team on the lack of ICEs thus far!

Is the posted assembler derived from the non-standard C function strupr?  I can't use FASM at the moment and the posted assembler code won't work with Microsoft ML64.  Perhaps someone will post the tick counts with the strupr C function, here's the code snippet toward it!  

module m

   use, intrinsic :: iso_c_binding, only : c_char, c_ptr, c_loc, c_size_t

   implicit none

   private

   interface

      function strupr( sptr ) bind(C, name="strupr") result(ups)
      ! non-standard C function; #include <string.h>
      ! char *strupr(char *string);
         import :: c_ptr
         implicit none
         ! Argument list
         type(c_ptr), intent(in), value :: sptr
         ! function result
         type(c_ptr) :: ups
      end function

   end interface

   public :: UpperCase

contains

   subroutine UpperCase( string )

      ! Argument list
      character(kind=c_char, len=*), intent(inout), target :: string

      ! Local variables
      type(c_ptr) :: add_s

      add_s = strupr( c_loc(string) )

      return

   end subroutine UpperCase

end module m

program p

   use iso_c_binding, only : c_char, c_size_t
   use iso_fortran_env, only : output_unit
   use m, only : UpperCase

   implicit none

   ! Local variables
   character(len=*), parameter :: fmtg = "(*(g0))"
   character(kind=c_char,len=:), allocatable :: Bible
   integer(c_size_t) :: fsize
   integer :: lun

   ! http://ebible.org/kjv/kjvtxt.zip
   open( newunit=lun, file='kjv.txt', status='old', access='stream' )

   inquire( unit=lun, size=fsize )
   write( output_unit, fmt=fmtg ) 'File size = ', fsize
   allocate (character(len=fsize, kind=c_char) :: Bible )

   read(unit=lun) Bible
   close(unit=lun)

   call UpperCase( Bible )

   write( output_unit, fmt=fmtg ) Bible(:42)

   stop

end program p

Upon execution I get:

C:\Temp>ifort /standard-semantics /warn:all p.f90
Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R
) 64, Version 18.0.1.156 Build 20171018
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.

Microsoft (R) Incremental Linker Version 14.12.25835.0
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:p.exe
-subsystem:console
p.obj

C:\Temp>p.exe
File size = 4523438

THE FIRST BOOK OF MOSES, CALLED GENESIS

C:\Temp>

 

0 Kudos
JVanB
Valued Contributor II
1,691 Views

I tried strupr and _strupr_s

   interface
      function strupr_s(str, numberOfElements) bind(C,name='_strupr_s')
         use ISO_C_BINDING
         implicit none
         integer(C_INT) strupr_s
         character(KIND=C_CHAR), intent(INOUT) :: str(*)
         integer(C_SIZE_T), value :: numberOfElements
      end function strupr_s
   end interface
   interface
      function strupr(str) bind(C,name='strupr')
         use ISO_C_BINDING
         implicit none
         integer(C_INT) strupr
         character(KIND=C_CHAR), intent(INOUT) :: str(*)
      end function strupr
   end interface

Results with \arch:core-avx2 were

File size = 4523438
Clock cycles for UpCase = 882187
Clock cycles for updncase = 392973204
Clock cycles for trial1 = 9601161
Clock cycles for trial2 = 9834643
Clock cycles for strupr = 35929257

No output for _strupr_s, which always crashed. BTW, you can probably persuade ml64 to assemble at least RDTSC1; if not I posted a pure Fortran version of RDTSC somewhere.

0 Kudos
FortranFan
Honored Contributor II
1,475 Views

Repeat Offender wrote:

.. Results with \arch:core-avx2 were

File size = 4523438
Clock cycles for UpCase = 882187

Holy XXX!  Remarkable indeed, hats off!

Presumably C, C++ based implementations all do similar pointer-based traversals and end up with similar tick counts, here's a result with C++ STL and Microsoft toolset:

#include <iostream>
#include <fstream>
#include <sstream>
using namespace std;

extern "C" int64_t RDTSC1();

int main( )
{
   std::ifstream ifs ("kjv.txt", std::ifstream::in);

   stringstream strStream;
   strStream << ifs.rdbuf();
   string Bible = strStream.str();

   ifs.close();

   int64_t t0 = RDTSC1();
   transform(Bible.begin(), Bible.end(), Bible.begin(), ::toupper);
   int64_t t1 = RDTSC1();

   std::cout << "Clock cycles for C++ STL Transform: " << (t1-t0) << "\n";
   std::cout << Bible.substr(0,41) << "\n";

   return 0;
}
C:\Temp>cl /c /EHsc ucase.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

ucase.cpp

C:\Temp>link ucase.obj UpCase.obj /subsystem:console /out:p.exe
Microsoft (R) Incremental Linker Version 14.00.24215.1
Copyright (C) Microsoft Corporation.  All rights reserved.


C:\Temp>p.exe
Clock cycles for C++ STL Transform: 85559157

THE FIRST BOOK OF MOSES, CALLED GENESIS


C:\Temp>

 

0 Kudos
Reply