Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.
Announcements
FPGA community forums and blogs on community.intel.com are migrating to the new Altera Community and are read-only. For urgent support needs during this transition, please visit the FPGA Design Resources page or contact an Altera Authorized Distributor.

Problem with LNBLNK and len_trim

reidar
New User
7,525 Views

I have a small subroutine transferring text from keyboard entries, etc from lower to uppercase letters. I used this routine for CVF and the version shown below I also apply for IVF Win32 applications. What happens is that the program crashes with the error message:Unhandled exception at 0x7790DF7C (ntdll.dll) in CaFeMS1-IVF.exe: 0xC0000005: Access violation writing location 0x002A0FFC... before the line ILENGTH = len_trim(STRING) - 1 where a breakpoint is set.

The subroutine is linked from  a library file.

Can anyone give me a tip?

Program listing follows here:

! IF(I.GT.IEND)GOTO 2000 ! 25/5-2000

! ICOM is character*2

READ(IDEDIT,202)ICOM

CALL TRANS_TO_UPPERCASE(ICOM,2)

SUBROUTINE TRANS_TO_UPPERCASE(STRING,LENGTH)

use IFPORT

INTEGER*4 KAR,ILENGTH,LENGTH

CHARACTER*(LENGTH)STRING

c CHARACTER*(40)STRING

! ILENGTH = LNBLNK(STRING)

ILENGTH = len_trim(STRING) - 1

C....Transfer STRING to uppercase letters....................

DO 1000 I=1,ILENGTH

KAR=0

KAR=ICHAR(STRING(I:I))

IF(KAR.GE.97)STRING(I:I)=CHAR(KAR-32)

1000 CONTINUE

RETURN

END

 

0 Kudos
26 Replies
JVanB
Valued Contributor II
1,510 Views

Mmmm... I restructured my UpCase subroutine and found it had a bug that could bite at small lengths, so I post the restructured subroutine which I have tested for different cases of alignments and lengths.

; UpCase2.asm
; fasm UpCase2.asm
format ms64 coff
section '.code' readable executable
public UpCase
UpCase:
   mov r9, rsi
   mov r10, rdi
   test rdx, rdx
   jle epilog
   mov r8, rcx
   mov rsi, rcx
   neg ecx
   and ecx, 31
   mov eax, 00056520h
   vmovd xmm1, eax
   vpbroadcastb ymm3, xmm1
   vpsrld ymm1, ymm1, 8
   vpbroadcastb ymm2, xmm1
   vpsrld ymm1, ymm1, 8
   vpbroadcastb ymm1,xmm1
   jz main_part
   lea rdi, [rsp+8]
   mov eax, edx
   sub rdx, rcx
   cmovl ecx, eax
   mov eax, ecx
   REP movsb
   vmovdqu ymm0, [rsp+8]
   vpaddb ymm4, ymm0, ymm1
   vpcmpgtb ymm4, ymm4, ymm2
   vpand ymm4, ymm4, ymm3
   vpandn ymm0, ymm4, ymm0
   vmovdqu [rsp+8], ymm0
   lea rsi, [rsp+8]
   mov rdi, r8
   mov ecx, eax
   REP movsb
   mov rsi, rdi
   jle epilog
main_part:
   mov ecx, edx
   and rdx, -32
   jz tail
   add rsi, rdx
   neg rdx
main_loop:
   vmovdqa ymm0, [rsi+rdx]
   vpaddb ymm4, ymm0, ymm1
   vpcmpgtb ymm4, ymm4, ymm2
   vpand ymm4, ymm4, ymm3
   vpandn ymm0, ymm4, ymm0
   vmovdqa [rsi+rdx], ymm0
   add rdx, 32
   js main_loop
tail:
   and ecx, 31
   jz epilog
   mov r8, rsi
   mov eax, ecx
   lea rdi, [rsp+8]
   REP movsb
   vmovdqu ymm0, [rsp+8]
   vpaddb ymm4, ymm0, ymm1
   vpcmpgtb ymm4, ymm4, ymm2
   vpand ymm4, ymm4, ymm3
   vpandn ymm0, ymm4, ymm0
   vmovdqu [rsp+8], ymm0
   lea rsi, [rsp+8]
   mov rdi, r8
   mov ecx, eax
   REP movsb
epilog:
   mov rsi, r9
   mov rdi, r10
ret
public RDTSC1
RDTSC1:
   RDTSC
   shl rdx, 32
   or rax, rdx
   ret

It had similar performance characteristics to the original.

 

0 Kudos
JVanB
Valued Contributor II
1,510 Views

Code that just used XLAT to perform the conversion was faster than the libc stuff:

XLAT1:
   mov r8, rbx
   mov rbx, table
   test rdx, rdx
   jle xlat_done
   mov r9, rcx
   mov r10, rdx
   neg ecx
   and ecx, 7
   jz main_xlat
   sub rdx, rcx
   cmovl ecx, r10d
   add r9, rcx
   neg rcx
xlat_prolog:
   mov al, [r9+rcx]
   xlatb
   mov [r9+rcx], al
   add rcx, 1
   js xlat_prolog
main_xlat:
   mov rcx, rdx
   and rdx, -8
   jl xlat_done
   je xlat_epilog
   add r9, rdx
   neg rdx
xlat_loop:
   mov rax, [r9+rdx]
   repeat 8
      xlatb
      ror rax, 8
   end repeat
   mov [r9+rdx], rax
   add rdx, 8
   js xlat_loop
xlat_epilog:
   and ecx, 7
   jz xlat_done
epilog_loop:
   mov al, [r9+rcx-1]
      xlatb
   mov [r9+rcx-1], al
   sub rcx, 1
   jnz epilog_loop
xlat_done:
   mov rbx, r8
   ret

Output:

File size = 4523438
Clock cycles for UpCase = 922020
Clock cycles for updncase = 395661851
Clock cycles for XLAT = 11453485

 

0 Kudos
mecej4
Honored Contributor III
1,510 Views

Repeat Offender wrote:
I tried strupr and _strupr_s

... No output for _strupr_s, which always crashed.

If you had linked against the static version of the UCRT library, you would probably have seen an assertion-failure pop-up complaining about the buffer not containing a null-terminated string. Something like the following would work:

   char bible[BUFSIZE+1];
   ...
   nin=fread(bible,1,BUFSIZE,fili);
   bible[nin]='\0';                // everyone must null-terminate strings
   err = _strupr_s(txt,nin+1);

But then, a lot of CPU cycles would be wasted in _strupr_s() making sure that the null is present, so why use _strupr_s?

0 Kudos
FortranFan
Honored Contributor III
1,510 Views

Repeat Offender wrote:

Code that just used XLAT to perform the conversion was faster than the libc stuff: ..

Fortran users may prefer to use code they can relate to like the one below that is based on Fortran intrinsics with the ASCII character set and get similar (or even faster) conversion than that based on the non-standard strupr function in C while being functional for their needs:

   subroutine UpperCaseAscii( string )

      ! Argument list
      character(len=*), intent(inout) :: string

      ! Local variables
      integer, parameter :: CASE_OFFSET = iachar('A') - iachar('a')
      integer :: I
      integer :: K

      do I = 1, len_trim(string)
         K = iachar( string(I:I) )
         if ( K >= iachar('a') .and. K <= iachar('z') ) then
            string(I:I) = achar(K + CASE_OFFSET)
         end if
      end do

      return

   end subroutine UpperCaseAscii

The following is a reasonable average of results I see on a couple of different machines:

C:\Temp>p.exe
File size = 4523438
Clock cycles for RO UpCase AVX ASM code = 867051
Clock cycles with C strupr function = 36714453
Clock cycles for UpperCaseAscii (Fortran IACHAR) = 34864104
Clock cycles for PC updncase (CHAR LUT) = 273949856

Ratio of clock cycles: strupr/UpperCaseAscii = 1.05

C:\Temp>

 

0 Kudos
JVanB
Valued Contributor II
1,510 Views

Of, so numberOfElements has to be the size of the whole buffer, counting the terminating NUL. Got that to work.

File size = 4523438
Clock cycles for updncase = 376145820
Clock cycles for ROupcase = 21672422
Clock cycles for UpCase = 805513
Clock cycles for trial1 = 9281737
Clock cycles for trial2 = 9176733
Clock cycles for strupr = 36265847
Clock cycles for strupr_s = 42524356
Clock cycles for XLAT = 11115903

(with /arch:core-avx2)

 

0 Kudos
mecej4
Honored Contributor III
1,510 Views

Repeat Offender wrote:
Oh, so numberOfElements has to be the size of the whole buffer, counting the terminating NUL

Rather, the portion of the buffer that you pass to _strupr_s must contain a NUL somewhere. The conversion stops at the first NUL.

#include <string.h>
#include <stdio.h>

int main( void ){
   char string[25];
   errno_t err;
   strcpy(string,"AbraCobrA"); strcpy(string+10,"CobraAbra");
   err = _strupr_s( string, 20);
   printf( "Upper: %s %s\n", string, string+10 );
}

Output:

Upper: ABRACOBRA CobraAbra

 

0 Kudos
Reply