- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I have a small subroutine transferring text from keyboard entries, etc from lower to uppercase letters. I used this routine for CVF and the version shown below I also apply for IVF Win32 applications. What happens is that the program crashes with the error message:Unhandled exception at 0x7790DF7C (ntdll.dll) in CaFeMS1-IVF.exe: 0xC0000005: Access violation writing location 0x002A0FFC... before the line ILENGTH = len_trim(STRING) - 1 where a breakpoint is set.
The subroutine is linked from a library file.
Can anyone give me a tip?
Program listing follows here:
! IF(I.GT.IEND)GOTO 2000 ! 25/5-2000
! ICOM is character*2
READ(IDEDIT,202)ICOM
CALL TRANS_TO_UPPERCASE(ICOM,2)
SUBROUTINE TRANS_TO_UPPERCASE(STRING,LENGTH)
use IFPORT
INTEGER*4 KAR,ILENGTH,LENGTH
CHARACTER*(LENGTH)STRING
c CHARACTER*(40)STRING
! ILENGTH = LNBLNK(STRING)
ILENGTH = len_trim(STRING) - 1
C....Transfer STRING to uppercase letters....................
DO 1000 I=1,ILENGTH
KAR=0
KAR=ICHAR(STRING(I:I))
IF(KAR.GE.97)STRING(I:I)=CHAR(KAR-32)
1000 CONTINUE
RETURN
END
Link Copied
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
character(len=*) :: string
Strings length has been defined in the caller. You can get in in the sub with len(string). What is the length passed in mean to be the decalred length or a sub string length?
I would use ascii >=65 to 90<= BTW
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
As you probably know by now, fragments of code with some bug are nearly impossible to diagnose in a reasonable amount of time, and the number of ways in which the code could have failed can be more than you want to read about. You did not show the code where the subroutine is called, so we do not know whether the actual arguments match the dummy arguments.
The subroutine that you showed has the following potential bugs:
1. The second argument is not used except to declare the desired length of the first argument. What happens if the caller passes a value in LENGTH that is not equal to the length of STRING? What happens when STRING contains a single non-blank character followed by blanks?
2. Consider what happens with several edge cases:
(i) The trimmed string length is 1
(ii) One or more characters in the part of STRING that are to be uppercased are in the set ['{','}','|','~']
3. What is IFPORT needed for? You commented out the line with a reference to LNBLNK.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Optimal methods of changing character case have been around for a long time. Try this:
SUBROUTINE updncase (string, updn) IMPLICIT NONE CHARACTER(LEN=*), INTENT(INOUT) :: string INTEGER, INTENT(IN) :: updn INTEGER :: j, nc CHARACTER(LEN=26), PARAMETER :: lower = 'abcdefghijklmnopqrstuvwxyz' CHARACTER(LEN=26), PARAMETER :: upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' SELECT CASE (updn) CASE (1) DO j = 1, LEN(string) nc = INDEX(lower, string(j:j)) IF (nc > 0) string(j:j) = upper(nc:nc) END DO CASE (2) DO j = 1, LEN(string) nc = INDEX(upper, string(j:j)) IF (nc > 0) string(j:j) = lower(nc:nc) END DO END SELECT END SUBROUTINE updncase
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Thank you all for comments. It turned out that the root cause was due to something that happened during migration from CVF to IVF.
Paul: If I understand your code, the routine updncase change the case either from upper lo lower or vice versa.. My routine takes a mixed string and transfer all characters to uppercase.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
reidar wrote:
Paul: If I understand your code, the routine updncase change the case either from upper lo lower or vice versa. My routine takes a mixed string and transfer all characters to uppercase.
In that case, you need the following half (case updn = 1) of Paul's code. On X86 and X64, the intrinsic INDEX can use the fast string search (SCASB) instruction with a repetition count of 26. If you are going to pass strings with trailing blanks to this subroutine, you may wish to use LEN_TRIM instead of LEN, or, if your strings will have no embedded blanks, exit the DO loop when a blank character is encountered.
SUBROUTINE upcase (string) IMPLICIT NONE CHARACTER(LEN=*), INTENT(INOUT) :: string INTEGER :: j, nc CHARACTER(LEN=26), PARAMETER :: lower = 'abcdefghijklmnopqrstuvwxyz' CHARACTER(LEN=26), PARAMETER :: upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' DO j = 1, LEN(string) nc = INDEX(lower, string(j:j)) IF (nc > 0) string(j:j) = upper(nc:nc) END DO END SUBROUTINE upcase
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
mecej4 wrote:
.....the intrinsic INDEX can use the fast string search (SCASB) instruction with a repetition count of 26.
Very Interesting. I have long realised the INDEX is a wonderful (and quick) tool (which I use frequently) but was unaware of SCASB which I have just read up on. In this instance however I would have thought checking a character against 26 characters 'quickly' within a loop would not be significantly different in time to checking if an ascii integer was between upper and lower limit values within a loop. I am not going to test this hypothesis as even if there was a 'big' difference (x5) it would not make a worthwhile benefit as I am not doing gazillions up upp/low casing. Anyhow thank you for the education!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Andrew, if you found the SCAS instruction interesting, you may find the much less used XLAT instruction ( http://www.felixcloutier.com/x86/XLAT:XLATB.html )equally interesting.
One could set up one 256-byte array, say UPR, set to Z'00',...,Z'FF', except that UPR(97:122) should be set to Z'45',...,Z'5A'. This sets up the "translation table" which, combined with string LODS, XLATB and STOS in a tight loop would be very fast.
On the other hand, the C library the IFort uses already contains the standard functions strupr(), etc., which could be used as a more portable, safer, alternative.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I was trying to find an expression that determined whether a character was a letter or not that wasn't too expensive to evaluate...
module M implicit none contains subroutine upcase(string) character(*), intent(INOUT) :: string call sub(string,len(string)) contains subroutine sub(string,len) integer, intent(in) :: len character, intent(INOUT) :: string(len) string = merge(achar(ibclr(iachar(string),5)),string, & ibits(dim(ibclr(iachar(string),5),64)-1,0,8)<26) end subroutine sub end subroutine upcase end module M program P use M implicit none integer i character(40) string string = 'yOU WiLl `FInD'' iT @ThE [13{ oaK tRee.' write(*,'(a)') trim(string) call upcase(string) write(*,'(a)') trim(string) end program P
Output with ifort:
yOU WiLl `FInD' iT @ThE [13{ oaK tRee. YOU WILL `FIND' IT @THE [13{ OAK TREE.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
RO, you're slipping! I would have expected at least five ICEs here....
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
The implicit point of updncase() is that a precomputed or preset lookup table (LUT) is usually (always?) much more efficient than on-the-spot computation, regardless of how fascinatingly compactly (and incomprehensibly) the computation can be written [merge
(
achar
(
ibclr
(
iachar
(
... really?].
Many years ago I wrote an image-processing code for a fluorescence system which computed the ratio image of two video frames by using the numerator and denominator pixel values as indices into a LUT whose values were the ratio for each combination of n/d; was very fast, worked at frame rates, where an attempt to actually compute the ratio image would have killed the project.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
LUTs touch memory or blow out cache. Computation is fast on 21st century CPUs. Three filse are needed for this test: UpCase.asm
; UpCase.asm ; fasm UpCase.asm format ms64 coff section '.code' readable executable public UpCase UpCase: push rsi push rdi sub rsp, 40 test rdx, rdx js epilog mov r8, rcx mov rsi, rcx neg ecx and ecx, 31 vbroadcastss ymm1, [bias] vbroadcastss ymm2, [lower] vbroadcastss ymm3, [bit] jz main_part mov rdi, rsp sub rdx, rcx cmovl ecx, edx mov eax, ecx REP movsb vmovdqu ymm0, [rsp] vpaddb ymm4, ymm0, ymm1 vpcmpgtb ymm4, ymm4, ymm2 vpand ymm4, ymm4, ymm3 vpandn ymm0, ymm4, ymm0 vmovdqu [rsp], ymm0 mov rsi, rsp mov rdi, r8 mov ecx, eax REP movsb mov rsi, rdi jle epilog main_part: cmp rdx, 31 jle tail vmovdqa ymm0, [rsi] vpaddb ymm4, ymm0, ymm1 vpcmpgtb ymm4, ymm4, ymm2 vpand ymm4, ymm4, ymm3 vpandn ymm0, ymm4, ymm0 vmovdqa [rsi], ymm0 add rsi, 32 sub rdx, 32 jmp main_part tail: test rdx, rdx jz epilog mov r8, rsi mov rdi, rsp mov ecx, edx REP movsb vmovdqu ymm0, [rsp] vpaddb ymm4, ymm0, ymm1 vpcmpgtb ymm4, ymm4, ymm2 vpand ymm4, ymm4, ymm3 vpandn ymm0, ymm4, ymm0 vmovdqu [rsp], ymm0 mov rsi, rsp mov rdi, r8 mov ecx, edx REP movsb epilog: add rsp, 40 pop rdi pop rsi ret public RDTSC1 RDTSC1: RDTSC shl rdx, 32 or rax, rdx ret section '.data' readable writeable temp rb 32 bias dd 05050505h lower dd 65656565h bit dd 20202020h
And test.f90:
! test.f90 ! ifort test.f90 UpCase.obj module M use ISO_C_BINDING implicit none private public UpCase, RDTSC, updncase interface subroutine UpCase(string, length) bind(C,name='UpCase') import implicit none character(KIND=C_CHAR), intent(INOUT) :: string(*) integer(C_SIZE_T), value :: length end subroutine UpCase end interface interface function RDTSC() bind(C,name='RDTSC1') import implicit none integer(C_INT64_T) RDTSC end function RDTSC end interface contains SUBROUTINE updncase (string, updn) IMPLICIT NONE CHARACTER(LEN=*), INTENT(INOUT) :: string INTEGER, INTENT(IN) :: updn INTEGER :: j, nc CHARACTER(LEN=26), PARAMETER :: lower = 'abcdefghijklmnopqrstuvwxyz' CHARACTER(LEN=26), PARAMETER :: upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' SELECT CASE (updn) CASE (1) DO j = 1, LEN(string) nc = INDEX(lower, string(j:j)) IF (nc > 0) string(j:j) = upper(nc:nc) END DO CASE (2) DO j = 1, LEN(string) nc = INDEX(upper, string(j:j)) IF (nc > 0) string(j:j) = lower(nc:nc) END DO END SELECT END SUBROUTINE updncase end module M program P use M use ISO_C_BINDING implicit none character(LEN=:,KIND=C_CHAR), allocatable :: Bible integer(C_INT64_T) t0, t1 integer size ! http://ebible.org/kjv/kjvtxt.zip open(10,file='kjv.txt',status='old',access='stream') inquire(unit=10,size=size) write(*,'(*(g0))') 'File size = ',size allocate(character(LEN=size,KIND=C_CHAR) :: Bible) read(10) Bible close(10) t0 = RDTSC() call UpCase(Bible,len(Bible,C_SIZE_T)) t1 = RDTSC() open(10,file='UpCase.txt',status='replace',access='stream') write(10) Bible close(10) write(*,'(*(g0))') 'Clock cycles for UpCase = ', t1-t0 open(10,file='kjv.txt',status='old',access='stream') read(10) Bible close(10) t0 = RDTSC() call updncase(Bible,1) t1 = RDTSC() open(10,file='updncase.txt',status='replace',access='stream') write(10) Bible close(10) write(*,'(*(g0))') 'Clock cycles for updncase = ', t1-t0 end program P
Also it requires kjv.txt. To get it, go to http://ebible.org/kjv/kjvtxt.zip unzip it and run combine.bat. I didn't post it here because it's 4 MB. Then you can assemble UpCase.asm with FASM and compile test.f90 with ifort. My output:
File size = 4523438 Clock cycles for UpCase = 897107 Clock cycles for updncase = 375973272
The first time is for UpCase and the second for updncase. Windiff.exe found the output files UpCase.txt and updncase.txt to be identical. subroutine UpCase was about 400X faster than subroutine updncase because it could process 32 bytes in the 11 instructions of its main_part: loop. That's the kind of thing I was trying to achieve with my earlier Fortran code, but there's a bit of a problem that Fortran doesn't define overflows as nicely as it could. Still I could have done better in my original Fortran code. Couldn't get it to ICE even when I incorporated it into an initialization expression -- must be a compiler bug :)
EDIT: The program had the file size hardwired incorrectly, so I fixed that by using INQUIRE to get the correct size.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
First of all, THANKS to RO for his interesting posts.
Out of curiosity, I added RO's Fortran Upcase from #9 to his test program of #12, and removed the UPDN=2 code from UPDNCASE. On a laptop with an I5-4200U CPU, I get
Clock cycles for UpCase = 1104393 ( 1.1E6) Clock cycles for ROUpcase = 22040992 ( 22E6) Clock cycles for PCUpcase = 328797223 (329E6)
Some comments:
- Repeating the runs gave slightly different clock counts. Not more than the first three digits are repeatable.
- The run (time for reading the bible three times, converting and writing the uppercase version three times) took 0.2 s. Of this time, 0.05 s was spent on file I/O.
- The timing will vary with the CPU generation and model
- I opened kjv.txt in a text editor, Select+All, Uppercase Selection; this took a fraction of a second. Undo (to revert) took about a second.
- RO's upcase.asm will require a CPU with AVX instructions (Sandy Bridge and newer).
- Based on marketing literature, I tended to think of AVX as more for floating point and image processing. RO has taught us that AVX instructions can be used for text processing.
- RO can probably backport upcase.asm so that it will work on CPUs with SSE2, and thereby improve his standing as a repeat offender.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I really enjoy following this type of thread but back in the real world the conclusion is that all the methods are 'fast' for most practical purposes.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
It gets more fun when you need to support character sets that don't have a 1:1 mapping between lower and upper case!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
andrew_4619 wrote:
I really enjoy following this type of thread but back in the real world the conclusion is that all the methods are 'fast' for most practical purposes.
Agreed, but it is fun to escape from the real world for a break now and then.
Also, when someone shows us things that we had never imagined were possible, sometimes they can become very useful.
See this Gfortran bug report for something of this sort: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78549 . One of the other posters here and on C.L.F. had come across the problem earlier, but his complaints were being sidelined. Zeroing in on the problem (a tenfold slow-down in formatting real numbers) helped getting the it to be recognized and fixed.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Steve Lionel (Ret.) wrote:
It gets more fun when you need to support character sets that don't have a 1:1 mapping between lower and upper case!
And no fun at all when the character set has no concept of case, such as Indian languages ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tried a couple of translations of my assembly code into Fortran:
subroutine trial1(string,length) bind(C) use ISO_C_BINDING implicit none character(KIND=C_CHAR), intent(INOUT), target :: string(*) integer(C_SIZE_T), value :: length type(C_PTR) address integer(C_INT8_T), pointer :: data(:) integer(C_INT8_T), parameter :: & bias = int(Z'05',C_INT8_T), & lower = int(Z'65',C_INT8_T), & bit = int(Z'20',C_INT8_T) address = C_LOC(string) call C_F_POINTER(address,data,[length]) where(data+bias > lower) data = iand(not(bit),data) end subroutine trial1 subroutine trial2(string,length) bind(C) use ISO_C_BINDING implicit none character(KIND=C_CHAR), intent(INOUT), target :: string(*) integer(C_SIZE_T), value :: length type(C_PTR) address integer(C_INT8_T), pointer :: data(:) integer(C_INT8_T), parameter :: & bias = int(Z'05',C_INT8_T), & lower = int(Z'65',C_INT8_T), & bit = int(Z'20',C_INT8_T), & truemask = int(Z'FF',C_INT8_T), & falsemask = int(Z'00',C_INT8_T) address = C_LOC(string) call C_F_POINTER(address,data,[length]) data = iand(not(iand(merge(truemask,falsemask, & data+bias > lower),bit)),data) end subroutine trial2
It seemed that /arch:core-avx2 helped the WHERE version out a lot, but not the more literal translation:
D:\>ifort /nologo /arch:host test2.f90 UpCase.obj D:\>test2 File size = 4523438 Clock cycles for UpCase = 875272 Clock cycles for updncase = 393948858 Clock cycles for trial1 = 35775283 Clock cycles for trial2 = 12334499 D:\>ifort /nologo /arch:core-avx2 test2.f90 UpCase.obj D:\>test2 File size = 4523438 Clock cycles for UpCase = 847216 Clock cycles for updncase = 379932665 Clock cycles for trial1 = 9246359 Clock cycles for trial2 = 9139662
Windiff considered UpCase.txt, updncase.txt, trial1.txt, and trial2.txt to be identical.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Repeat Offender wrote:
.. Couldn't get it to ICE even when I incorporated it into an initialization expression -- must be a compiler bug :) ..
Mega kudos to Intel Fortran team on the lack of ICEs thus far!
Is the posted assembler derived from the non-standard C function strupr? I can't use FASM at the moment and the posted assembler code won't work with Microsoft ML64. Perhaps someone will post the tick counts with the strupr C function, here's the code snippet toward it!
module m use, intrinsic :: iso_c_binding, only : c_char, c_ptr, c_loc, c_size_t implicit none private interface function strupr( sptr ) bind(C, name="strupr") result(ups) ! non-standard C function; #include <string.h> ! char *strupr(char *string); import :: c_ptr implicit none ! Argument list type(c_ptr), intent(in), value :: sptr ! function result type(c_ptr) :: ups end function end interface public :: UpperCase contains subroutine UpperCase( string ) ! Argument list character(kind=c_char, len=*), intent(inout), target :: string ! Local variables type(c_ptr) :: add_s add_s = strupr( c_loc(string) ) return end subroutine UpperCase end module m program p use iso_c_binding, only : c_char, c_size_t use iso_fortran_env, only : output_unit use m, only : UpperCase implicit none ! Local variables character(len=*), parameter :: fmtg = "(*(g0))" character(kind=c_char,len=:), allocatable :: Bible integer(c_size_t) :: fsize integer :: lun ! http://ebible.org/kjv/kjvtxt.zip open( newunit=lun, file='kjv.txt', status='old', access='stream' ) inquire( unit=lun, size=fsize ) write( output_unit, fmt=fmtg ) 'File size = ', fsize allocate (character(len=fsize, kind=c_char) :: Bible ) read(unit=lun) Bible close(unit=lun) call UpperCase( Bible ) write( output_unit, fmt=fmtg ) Bible(:42) stop end program p
Upon execution I get:
C:\Temp>ifort /standard-semantics /warn:all p.f90 Intel(R) Visual Fortran Intel(R) 64 Compiler for applications running on Intel(R ) 64, Version 18.0.1.156 Build 20171018 Copyright (C) 1985-2017 Intel Corporation. All rights reserved. Microsoft (R) Incremental Linker Version 14.12.25835.0 Copyright (C) Microsoft Corporation. All rights reserved. -out:p.exe -subsystem:console p.obj C:\Temp>p.exe File size = 4523438 THE FIRST BOOK OF MOSES, CALLED GENESIS C:\Temp>
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tried strupr and _strupr_s
interface function strupr_s(str, numberOfElements) bind(C,name='_strupr_s') use ISO_C_BINDING implicit none integer(C_INT) strupr_s character(KIND=C_CHAR), intent(INOUT) :: str(*) integer(C_SIZE_T), value :: numberOfElements end function strupr_s end interface interface function strupr(str) bind(C,name='strupr') use ISO_C_BINDING implicit none integer(C_INT) strupr character(KIND=C_CHAR), intent(INOUT) :: str(*) end function strupr end interface
Results with \arch:core-avx2 were
File size = 4523438 Clock cycles for UpCase = 882187 Clock cycles for updncase = 392973204 Clock cycles for trial1 = 9601161 Clock cycles for trial2 = 9834643 Clock cycles for strupr = 35929257
No output for _strupr_s, which always crashed. BTW, you can probably persuade ml64 to assemble at least RDTSC1; if not I posted a pure Fortran version of RDTSC somewhere.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Repeat Offender wrote:
.. Results with \arch:core-avx2 were
File size = 4523438 Clock cycles for UpCase = 882187
Holy XXX! Remarkable indeed, hats off!
Presumably C, C++ based implementations all do similar pointer-based traversals and end up with similar tick counts, here's a result with C++ STL and Microsoft toolset:
#include <iostream> #include <fstream> #include <sstream> using namespace std; extern "C" int64_t RDTSC1(); int main( ) { std::ifstream ifs ("kjv.txt", std::ifstream::in); stringstream strStream; strStream << ifs.rdbuf(); string Bible = strStream.str(); ifs.close(); int64_t t0 = RDTSC1(); transform(Bible.begin(), Bible.end(), Bible.begin(), ::toupper); int64_t t1 = RDTSC1(); std::cout << "Clock cycles for C++ STL Transform: " << (t1-t0) << "\n"; std::cout << Bible.substr(0,41) << "\n"; return 0; }
C:\Temp>cl /c /EHsc ucase.cpp Microsoft (R) C/C++ Optimizing Compiler Version 19.00.24215.1 for x64 Copyright (C) Microsoft Corporation. All rights reserved. ucase.cpp C:\Temp>link ucase.obj UpCase.obj /subsystem:console /out:p.exe Microsoft (R) Incremental Linker Version 14.00.24215.1 Copyright (C) Microsoft Corporation. All rights reserved. C:\Temp>p.exe Clock cycles for C++ STL Transform: 85559157 THE FIRST BOOK OF MOSES, CALLED GENESIS C:\Temp>
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page