Intel® Fortran Compiler
Build applications that can scale for the future with optimized code designed for Intel® Xeon® and compatible processors.

Inefficient code for simple string comparison

mriedman
Novice
944 Views

The following reproducer exposes an inefficiency with string comparisons. The compiler generates a call to for_cpstr() that consumes substantial compute time for a case I have. Tried with versions 11.1 and 13.1 and get the same assembly code. What is that call good for ? The compiler should be able to do much better with such a simple piece of code.

subroutine scmp(a,b,l, flag)

  character(*), intent(in) :: a
  character(*), intent(in) :: b
  integer, intent(in)  :: l
  logical, intent(out) :: flag

  if (a(1:l) == b(1:l)) then
    flag = .true.
  else
    flag = .false.
  end if

end subroutine scmp

0 Kudos
7 Replies
Heinz_B_Intel
Employee
944 Views

for_cpstr()  compares two Fortran CHARACTER objects dealing with "strings" of different length and the different relational operators (EQ here). Is this the call itself you complain about (the overhead of the call) or the efficiency of the code implementing for_cpstr() ? I don't see anything,  the compiler could do better.

0 Kudos
mriedman
Novice
944 Views

O.k. I remember that the world of multi-byte-characters and multi-language strings can be rather complex. (Fortran programmers don't do such stuff.)

In my case the string comparison deals with keywords of less than a dozen plain ASCII characters, so the subroutine call overhead gets significant. It seems like for_cpstr() then calls intel_memcmp() so the overhead is even worse. One workaround I found is to start comparing just the first character and immediately return in case of inequality. That single character operation does not invoke for_cpstr().

Specific code generation for equality / inequality would be desired here. I think it could be done with less than 10 assembly instructions.

0 Kudos
Steven_L_Intel1
Employee
944 Views

I think the suggestion is that the compiler special-case comparison of short strings of known same constant length and just do integer compares. That seems like a worthwhile idea to me, as such comparisons are frequently found in Fortran code.

0 Kudos
Heinz_B_Intel
Employee
944 Views

yes - in general it might be helpful to have special implementations of short strings. But here the length is not a compile-time constant and intel_memcmp() does the optimizations like returning as soon as the first character pair is different, using 4-byte/8-byte integer compares and even using the vector-extensions.

the only optimization I see here would be a special version for strings of equal lenght ( saves a very, very few cycles) or in-lining  - but in-lining is not free either ( code size increase, additional register pressure).

It would be interesting to see benchmark data   on how many cycles could be gained really.

0 Kudos
mriedman
Novice
944 Views

See the Oprofile data to show the relevance of this issue in the complete program context. The workaround I mentioned cuts 5.7% of total runtime.

Before workaround
samples  %        symbol name

522549   20.2889  m_tdt512_mp_psi_sweep_cycl_sf_
264716   10.2781  afpktr_
129155    5.0147  __svml_floorf4.R
 . . .
121155    4.7041  for_cpstr
78103     3.0325  _intel_fast_memcmp


After workaround
samples  %        symbol name

519767   25.7614  m_tdt512_mp_psi_sweep_cycl_sf_
162412    8.0497  afpktr_
128690    6.3783  __svml_floorf4.R
 . . .
21856     1.2935  for_cpstr
 . . .
11758     0.6959  _intel_fast_memcmp

0 Kudos
Steven_L_Intel1
Employee
944 Views

Oh, my eyes were not good enough to see that the end position was a variable and not a constant. I agree with Heinz that an optimization here is more difficult.

0 Kudos
jimdempseyatthecove
Honored Contributor III
944 Views

In the above case, the segment (1:l) is the same, in this situation (lengths equal and == operator) you could treat the "arrays" as integer(sizeof character), dimension(l). I do not think the optimization check would be hard to perform and impliment. N.B. this may introduce issues with respect to "l" indexing out of bounds.

Jim Dempsey

0 Kudos
Reply