I find our code is spending a lot of time ensuring strings are lower case. I hoped MAXVAL would scan a character string to see if it was already all lowercase, but MAXVAL only works on character arrays, not strings. If there is a faster way to convert a string to all lowercase, or to check whether it contains any uppercase characters, that would be really helpful.
thanks
scott
Assuming an ASCII internal representation and Latin characters, it's possible to convert the string to an array of integers and perform the conversion efficiently on the array. TRANSFER converts the string to an array and back, so the individual characters can be converted as array elements.
    module M
       implicit none
    contains
       function tolower(string)
          use ISO_FORTRAN_ENV
          character(*) string
          character(len(string)) tolower
          integer(INT8) temp(len(string))
          ! Reinterpret the characters as 8-bit integers
          temp = transfer(string,temp)
          ! Set bit 5 (value 32) only where the code is ASCII 'A'..'Z' (65..90)
          tolower = transfer(merge(IOR(temp,32_INT8), temp, &
             65_INT8 <= temp .AND. temp <= 90_INT8), tolower)
       end function tolower
    end module M

    program P
       use M
       implicit none
       character(50) mess
       mess = 'lEavE $10,000 By tHe Oak trEe On The hIll'
       write(*,'(a)') tolower(mess)
    end program P
Output with ifort:
leave $10,000 by the oak tree on the hill
Scott, you did not state what you wish to do with the string if it contains upper case characters. The following comments assume that converting to all lower case will suffice.
The IPP library (included in Parallel Studio with the C compiler) contains a number of string processing subroutines that you can use. For example,
    program P
       implicit none
       ! Map the Fortran name IPPLcase to the IPP in-place lowercase routine
       !DIR$ ALIAS IPPLcase,'ippsLowercaseLatin_8u_I'
       integer i
       character(40) string
       string = 'yOU WiLl `FInD'' iT @ThE [13{ oaK tRee.'
       write(*,'(a)') trim(string)
       call IPPLcase(string)
       write(*,'(a)') trim(string)
    end program P
Compile and link using the command
ifort lcaseI.f90 ippch.lib
Thanks for the suggestions.
My original intent was to quickly determine whether there are any uppercase characters in the string, to avoid doing the conversion if it was not needed or had already been done. I thought MAXVAL could tell me that. I think the organization of this code means the lowercase() routine gets called multiple times at many levels of the hierarchy to ensure case-independent string comparisons.
Based on VTune running in Visual Studio 2015/ifort 17.1 on a test case, of 3300 total seconds, for_cpstr takes up 600 seconds and the lowcase() subroutine 17 seconds, so there seems to be room for improvement.
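A minimal sketch of the check described above, using the standard SCAN intrinsic, which works directly on a character string (unlike MAXVAL). The module name `upper_check`, the function name `has_upper`, and the constant `UPPER` are hypothetical, and only the 26 unaccented Latin letters are considered. Whether testing first is actually cheaper than converting unconditionally is not established here and would need to be measured.

```fortran
module upper_check
   implicit none
   ! Hypothetical names; assumes only the unaccented letters A-Z matter.
   character(*), parameter :: UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
contains
   ! Returns .true. if string contains at least one character from UPPER.
   pure logical function has_upper(string)
      character(*), intent(in) :: string
      ! SCAN returns the position of the first match, or 0 if there is none.
      has_upper = scan(string, UPPER) /= 0
   end function has_upper
end module upper_check
```

A caller could then write `if (has_upper(string)) call lowcase(string)`, where `lowcase` stands for whatever conversion routine is in use.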
I tried the code from RepeatOffender (thanks) and the total time went down (system load???), but the time for the revised lowcase subroutine using RepeatOffender's logic above went up, probably because the string length is >> the len_trim length. So I am trying something that I hope will avoid processing the full string length:
nlen = len_trim(string)
temp(1:nlen) = transfer(string(1:nlen),temp(1:nlen))
string = transfer(merge(IOR(temp(1:nlen),32_INT8), temp(1:nlen), 65_INT8 <= temp .AND. temp <= 90_INT8), string(1:nlen))
Note that I do not have a range spec (1:nlen) on temp in the logical part of the expression, (65_INT8 <= temp .AND. temp <= 90_INT8).
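One way the trimmed variant might look with every MERGE argument the same shape is sketched below. It assumes ASCII and a one-byte default character kind, as in the TRANSFER approach earlier in the thread. The module name `lowcase_trimmed` is hypothetical, and sizing `temp` by `len_trim(string)` is a choice made for this sketch, not something taken from the original code.

```fortran
module lowcase_trimmed
   use iso_fortran_env, only: int8
   implicit none
contains
   ! In-place lowercase of the non-blank part of string (ASCII assumed).
   subroutine lowcase(string)
      character(*), intent(inout) :: string
      ! Automatic array sized to the trimmed length, so the source,
      ! the result and the mask of MERGE all have the same shape.
      integer(int8) :: temp(len_trim(string))
      integer :: nlen
      nlen = size(temp)
      if (nlen == 0) return            ! all blanks: nothing to convert
      temp = transfer(string(1:nlen), temp)
      ! Set bit 5 (value 32) only where the code is 'A'..'Z' (65..90).
      string(1:nlen) = transfer(merge(ior(temp, 32_int8), temp, &
         65_int8 <= temp .and. temp <= 90_int8), string(1:nlen))
   end subroutine lowcase
end module lowcase_trimmed
```

Called as `call lowcase(mess)`, this leaves the trailing blanks of `mess` untouched. It still uses a temporary array, so it only shortens the work and does not remove the extra memory traffic.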
I will also try mecej4 IPP library.
Is there any Intel C documentation accessible? I ask for this, but also because I heavily use ifort OPEN(BUFFERED...) and wonder whether there is a way (or a need) to do the same for C++ code (ifort ASCII write defaults to flush after every write, correct?).
Thanks.
Here is the documentation for the string functions in IPP: https://software.intel.com/en-us/ipp-dev-reference-string-manipulation . There are several items there that you could use.
If, after detecting uppercase characters, you will convert to lowercase, it may be faster to "assume the worst" and convert every candidate string to lowercase without trying to detect if the conversion is needed. In other words, "Shift first and ask questions later".
See RO's contributions of AVX assembly routines to uppercase strings in https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/757222#comment-1918919 . As he points out, any temporary variables such as your temp should be avoided since they have a high cost in memory accesses.
I don't understand your question about buffering. If you call IPP routines to do string operations, there is no file I/O involved. Nevertheless, the standard C routines setbuf() and setvbuf() are available, if you find them relevant.
Hello
Nothing is magical.
Scanning a string to find a character means looping through the string until you find what you want.
MAXVAL must do the same loop so it will waste time if you use it before making the conversion.
So the simplest will be the best especially for the readability of the code.
Something like this will do the job:
    function to_upper(string)
       ...
       to_upper=string
       do i=1,len(to_upper)
          if( to_upper .... ) to_upper=...
       enddo
    end function
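A filled-in version of that outline could look like the sketch below. The elided parts of the post are not known, so the test and the assignment here are one plausible choice rather than the poster's actual code. The module name `simple_case` is hypothetical, and ASCII is assumed.

```fortran
module simple_case
   implicit none
contains
   ! Straightforward character-by-character upper-casing (ASCII assumed).
   pure function to_upper(string)
      character(*), intent(in) :: string
      character(len(string)) :: to_upper
      integer :: i, code
      to_upper = string
      do i = 1, len(to_upper)
         code = iachar(to_upper(i:i))
         ! ASCII 'a'..'z' are 97..122; the upper-case letters sit 32 below.
         if (code >= 97 .and. code <= 122) to_upper(i:i) = achar(code - 32)
      end do
   end function to_upper
end module simple_case
```

For example, `to_upper('lEavE $10,000 by')` yields `LEAVE $10,000 BY`. A lower-casing counterpart would test for 65..90 and add 32 instead.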
I am all for simplicity, but sometimes the simple version can be awfully slow. The O.P. has stated that these conversions are consuming quite a bit of time. He could certainly try to rearrange his code to reduce the number of case conversions/inspections, but that may not be enough.
The simple table lookup case-shift subroutine given in https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/757222#comment-1918644 converted the KJV Bible text (4.5 MB) with a throughput of 0.0082 bytes/clock, or about 120 clock ticks per character. An SSE case conversion routine (http://www.alfredklomp.com/programming/sse-strings/) achieved 0.8 bytes/clock (both times on my 12 year old PC with an AMD Athlon 4200+ running W10). Repeat Offender's AVX routine achieved about 5 bytes/clock (on a different PC, of course).
Thus, the more complex SSE/AVX version can give a speed-up of 100 to 400 times. Part of the reason for the speed up is that the vectorized SSE2 instructions process 16 characters at a time (32 at a time with AVX).
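For readers unfamiliar with the table-lookup approach mentioned above, here is a minimal sketch of the idea. It is not the routine from the linked post, and no throughput was measured for it. It assumes a one-byte default character kind whose ICHAR values run from 0 to 255, with ASCII in the lower half. The names `table_case`, `table` and `lowcase_table` are hypothetical.

```fortran
module table_case
   implicit none
   private
   public :: lowcase_table
   integer :: i   ! implied-do index, used only to build the table below
   ! 256-entry translation table: identity everywhere except 'A'..'Z'
   ! (codes 65..90), which map to 'a'..'z' (codes 97..122).
   character(1), parameter :: table(0:255) = &
      [(char(i), i = 0, 64), (char(i + 32), i = 65, 90), (char(i), i = 91, 255)]
contains
   ! In-place lowercase of the non-blank part of string via table lookup.
   pure subroutine lowcase_table(string)
      character(*), intent(inout) :: string
      integer :: k
      do k = 1, len_trim(string)
         string(k:k) = table(ichar(string(k:k)))
      end do
   end subroutine lowcase_table
end module table_case
```

The loop has no branch on the character value, but it still handles one character per iteration, which is the limitation the SSE/AVX routines avoid.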
Scott,
Your original post did not state whether the lower-casing operation was to be performed in place. Assuming it is, and if the program is to have a long lifetime, then the newer AVX512 instruction set can be utilized via C interoperability or through use of an AVX512-optimized in-place lower-caser.
The AVX512 instruction set has tests that produce a bit mask of the true results, which can subsequently be passed into a general-purpose register (i.e. a uint64_t variable). This can be used to test and then conditionally write only the vectors that require updating, assuming you have significant runs of lower-case sub-strings that do not require updating.
Jim Dempsey
If each string is assigned once but used many times, it will be smarter to do the conversion just after assignment and then presume it's uppercase. It's the most efficient way to solve the problem.