- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a small subroutine transferring text from keyboard entries, etc from lower to uppercase letters. I used this routine for CVF and the version shown below I also apply for IVF Win32 applications. What happens is that the program crashes with the error message:Unhandled exception at 0x7790DF7C (ntdll.dll) in CaFeMS1-IVF.exe: 0xC0000005: Access violation writing location 0x002A0FFC... before the line ILENGTH = len_trim(STRING) - 1 where a breakpoint is set.
The subroutine is linked from a library file.
Can anyone give me a tip?
Program listing follows here:
! IF(I.GT.IEND)GOTO 2000 ! 25/5-2000
! ICOM is character*2
READ(IDEDIT,202)ICOM
CALL TRANS_TO_UPPERCASE(ICOM,2)
SUBROUTINE TRANS_TO_UPPERCASE(STRING,LENGTH)
use IFPORT
INTEGER*4 KAR,ILENGTH,LENGTH
CHARACTER*(LENGTH)STRING
c CHARACTER*(40)STRING
! ILENGTH = LNBLNK(STRING)
ILENGTH = len_trim(STRING) - 1
C....Transfer STRING to uppercase letters....................
DO 1000 I=1,ILENGTH
KAR=0
KAR=ICHAR(STRING(I:I))
IF(KAR.GE.97)STRING(I:I)=CHAR(KAR-32)
1000 CONTINUE
RETURN
END
Link Copied
- « Previous
-
- 1
- 2
- Next »
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Mmmm... I restructured my UpCase subroutine and found it had a bug that could bite at small lengths, so I post the restructured subroutine which I have tested for different cases of alignments and lengths.
; UpCase2.asm ; fasm UpCase2.asm format ms64 coff section '.code' readable executable public UpCase UpCase: mov r9, rsi mov r10, rdi test rdx, rdx jle epilog mov r8, rcx mov rsi, rcx neg ecx and ecx, 31 mov eax, 00056520h vmovd xmm1, eax vpbroadcastb ymm3, xmm1 vpsrld ymm1, ymm1, 8 vpbroadcastb ymm2, xmm1 vpsrld ymm1, ymm1, 8 vpbroadcastb ymm1,xmm1 jz main_part lea rdi, [rsp+8] mov eax, edx sub rdx, rcx cmovl ecx, eax mov eax, ecx REP movsb vmovdqu ymm0, [rsp+8] vpaddb ymm4, ymm0, ymm1 vpcmpgtb ymm4, ymm4, ymm2 vpand ymm4, ymm4, ymm3 vpandn ymm0, ymm4, ymm0 vmovdqu [rsp+8], ymm0 lea rsi, [rsp+8] mov rdi, r8 mov ecx, eax REP movsb mov rsi, rdi jle epilog main_part: mov ecx, edx and rdx, -32 jz tail add rsi, rdx neg rdx main_loop: vmovdqa ymm0, [rsi+rdx] vpaddb ymm4, ymm0, ymm1 vpcmpgtb ymm4, ymm4, ymm2 vpand ymm4, ymm4, ymm3 vpandn ymm0, ymm4, ymm0 vmovdqa [rsi+rdx], ymm0 add rdx, 32 js main_loop tail: and ecx, 31 jz epilog mov r8, rsi mov eax, ecx lea rdi, [rsp+8] REP movsb vmovdqu ymm0, [rsp+8] vpaddb ymm4, ymm0, ymm1 vpcmpgtb ymm4, ymm4, ymm2 vpand ymm4, ymm4, ymm3 vpandn ymm0, ymm4, ymm0 vmovdqu [rsp+8], ymm0 lea rsi, [rsp+8] mov rdi, r8 mov ecx, eax REP movsb epilog: mov rsi, r9 mov rdi, r10 ret public RDTSC1 RDTSC1: RDTSC shl rdx, 32 or rax, rdx ret
It had similar performance characteristics to the original.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Code that just used XLAT to perform the conversion was faster than the libc stuff:
XLAT1: mov r8, rbx mov rbx, table test rdx, rdx jle xlat_done mov r9, rcx mov r10, rdx neg ecx and ecx, 7 jz main_xlat sub rdx, rcx cmovl ecx, r10d add r9, rcx neg rcx xlat_prolog: mov al, [r9+rcx] xlatb mov [r9+rcx], al add rcx, 1 js xlat_prolog main_xlat: mov rcx, rdx and rdx, -8 jl xlat_done je xlat_epilog add r9, rdx neg rdx xlat_loop: mov rax, [r9+rdx] repeat 8 xlatb ror rax, 8 end repeat mov [r9+rdx], rax add rdx, 8 js xlat_loop xlat_epilog: and ecx, 7 jz xlat_done epilog_loop: mov al, [r9+rcx-1] xlatb mov [r9+rcx-1], al sub rcx, 1 jnz epilog_loop xlat_done: mov rbx, r8 ret
Output:
File size = 4523438 Clock cycles for UpCase = 922020 Clock cycles for updncase = 395661851 Clock cycles for XLAT = 11453485
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Repeat Offender wrote:
I tried strupr and _strupr_s... No output for _strupr_s, which always crashed.
If you had linked against the static version of the UCRT library, you would probably have seen an assertion-failure pop-up complaining about the buffer not containing a null-terminated string. Something like the following would work:
char bible[BUFSIZE+1]; ... nin=fread(bible,1,BUFSIZE,fili); bible[nin]='\0'; // everyone must null-terminate strings err = _strupr_s(txt,nin+1);
But then, a lot of CPU cycles would be wasted in _strupr_s() making sure that the null is present, so why use _strupr_s?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Repeat Offender wrote:
Code that just used XLAT to perform the conversion was faster than the libc stuff: ..
Fortran users may prefer to use code they can relate to like the one below that is based on Fortran intrinsics with the ASCII character set and get similar (or even faster) conversion than that based on the non-standard strupr function in C while being functional for their needs:
subroutine UpperCaseAscii( string ) ! Argument list character(len=*), intent(inout) :: string ! Local variables integer, parameter :: CASE_OFFSET = iachar('A') - iachar('a') integer :: I integer :: K do I = 1, len_trim(string) K = iachar( string(I:I) ) if ( K >= iachar('a') .and. K <= iachar('z') ) then string(I:I) = achar(K + CASE_OFFSET) end if end do return end subroutine UpperCaseAscii
The following is a reasonable average of results I see on a couple of different machines:
C:\Temp>p.exe File size = 4523438 Clock cycles for RO UpCase AVX ASM code = 867051 Clock cycles with C strupr function = 36714453 Clock cycles for UpperCaseAscii (Fortran IACHAR) = 34864104 Clock cycles for PC updncase (CHAR LUT) = 273949856 Ratio of clock cycles: strupr/UpperCaseAscii = 1.05 C:\Temp>
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Of, so numberOfElements has to be the size of the whole buffer, counting the terminating NUL. Got that to work.
File size = 4523438 Clock cycles for updncase = 376145820 Clock cycles for ROupcase = 21672422 Clock cycles for UpCase = 805513 Clock cycles for trial1 = 9281737 Clock cycles for trial2 = 9176733 Clock cycles for strupr = 36265847 Clock cycles for strupr_s = 42524356 Clock cycles for XLAT = 11115903
(with /arch:core-avx2)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Repeat Offender wrote:
Oh, so numberOfElements has to be the size of the whole buffer, counting the terminating NUL
Rather, the portion of the buffer that you pass to _strupr_s must contain a NUL somewhere. The conversion stops at the first NUL.
#include <string.h> #include <stdio.h> int main( void ){ char string[25]; errno_t err; strcpy(string,"AbraCobrA"); strcpy(string+10,"CobraAbra"); err = _strupr_s( string, 20); printf( "Upper: %s %s\n", string, string+10 ); }
Output:
Upper: ABRACOBRA CobraAbra

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- « Previous
-
- 1
- 2
- Next »