Severe regressions in ifort17 and 18

Juergen_R_R · ‎04-21-2017

What happened to this once so great compiler? Since v17 and now in 18beta so many things are broken? How could that happen? I will try to come up with a list of issues, but I'm really frustrated. The full test case is here: http://www.hepforge.org/archive/whizard/whizard-2.4.1.tar.gz

Note that you need OCaml besides the ifort compiler. Just do configure, make, and then make check. 4 of 114 unit tests fail, and 127 of 224 functional tests. I'm not yet sure whether these are the same issue(s) that I already reported for the 17.0.X version of ifort. I will try to reduce this.

Kevin_D_Intel · ‎04-21-2017

I’m sorry to hear about the continued problems. It is best to report these via the Online Service Center, http://www.intel.com/supporttickets. It allows better and easier private communication, and if there are multiple unique underlying issues, it enables better individual tracking of their resolution. Thank you.

Juergen_R_R · ‎04-21-2017

I believe that these are the same severe problems that hamper v17. So sad that it always takes ages until they are fixed.

Steve_Lionel · ‎04-21-2017

Some bugs are fixed promptly, others take more time. Your experience seems to be somewhat unusual, based on reports I have seen elsewhere. If you can honor Kevin's request and provide, as best as possible, minimal test cases, that will help speed the fixes. I know from my time doing support that when presented with a large, unknown program and vague symptoms, analysis takes longer and has lower priority. That you are saying some third-party program reports "failures" doesn't necessarily mean the compiler is at fault. Oftentimes it is the test itself that is wrong or that makes inappropriate assumptions.

Juergen_R_R · ‎04-24-2017

I perfectly understand this. But I already filed several bug reports with smaller test cases. That was based on v17 update1 or 2 I believe. We also have only limited time, and we also give priority to compilers where bugs are fixed fast(er).

Juergen_R_R · ‎04-24-2017

By the way, the support platform doesn't work: I cannot submit because I cannot tick a platform. Great service, thanks.

Juergen_R_R · ‎04-24-2017

The support form doesn't accept ANY targeted architecture ... I wanted to click Intel-64, but it doesn't accept any of those. WHY?

Juergen_R_R · ‎04-24-2017

So uploading the case here:

The following code segfaults with ifort v18beta but compiles and runs fine with ifort16, nagfor v6 and gfortran v4.8, 4.9, 5.X, 6.X, 7.X and 8.X:

$ ./whizard_test
*** Error in `./whizard_test': double free or corruption (out): 0x0000000000b03100 ***
Aborted (core dumped)

Kevin_D_Intel · ‎04-24-2017

I will inquire about the issues with the support system. I don't know what the issues are with that. Thank you for submitting the test case here. I will investigate it shortly.

Juergen_R_R · ‎04-24-2017

Now I managed. The drop-down menus don't show whether you have already selected anything for the required fields or not.

jimdempseyatthecove · ‎04-25-2017

You might want to check (fix) your pointer assignments. You have:

if (associated (prt%mass_val)) prt%mass_val = mass

when you should have:

if (associated (prt%mass_val)) prt%mass_val => mass

Similar issues elsewhere.

Jim Dempsey

Juergen_R_R · ‎04-25-2017

Thanks, Jim, for the remark. The corresponding routines are not used in the example shown here. The routines with prt%mass_val = mass are intended for cases where the pointer is already associated and only the value of its target should be changed, while in other functions (erased in the small code snippet) the pointer itself is reassigned. Our code, including the complete test suite, passes gfortran -fcheck=all and nagfor -C=all.
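
For readers following along, here is a minimal sketch of the difference between the two statements. It is not taken from WHIZARD: the type prt_t and the variables storage and mass are hypothetical stand-ins, and only the component name mass_val mirrors the thread.

```fortran
program pointer_assign_demo
  implicit none

  ! Hypothetical stand-in type; only the component name mass_val
  ! mirrors the code discussed in the thread.
  type :: prt_t
     real, pointer :: mass_val => null ()
  end type prt_t

  type(prt_t) :: prt
  real, target :: storage, mass

  storage = 1.0
  mass = 2.0
  prt%mass_val => storage   ! pointer assignment: mass_val now aliases storage

  ! Intrinsic assignment (=): copies the value into the existing target.
  if (associated (prt%mass_val)) prt%mass_val = mass
  if (storage /= 2.0) error stop "value was not copied into the target"
  if (.not. associated (prt%mass_val, storage)) error stop "pointer moved"

  ! Pointer assignment (=>): re-targets the pointer; storage is untouched.
  mass = 3.0
  prt%mass_val => mass
  if (storage /= 2.0) error stop "storage must be unchanged"
  if (.not. associated (prt%mass_val, mass)) error stop "pointer not re-targeted"

  print *, "storage =", storage, " mass_val =", prt%mass_val
end program pointer_assign_demo
```

Both forms are valid Fortran; which one is right depends on whether the target's value or the association itself should change.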

jimdempseyatthecove · ‎04-25-2017

>> ..cases where the pointer is associated and only its value should be changed...

Thanks for clarifying this.

Other than that, I did not notice any coding error. There is one section that, though correct in syntax, may be where the compiler bug is exposed:

subroutine resonance_history_add_resonance (res_hist, resonance)
  ...
  type(resonance_info_t), dimension(:), allocatable :: tmp
  ...
  allocate (tmp (n + n_max_resonances))
  tmp(1:n) = res_hist%resonances(1:n)
  call move_alloc (from=tmp, to=res_hist%resonances)
  ...

The type resonance_info_t contains allocatable entities. In past versions of Fortran, sequences like that above exposed errors. I suspect that tmp gets deallocated twice: Once from the move_alloc, and a second time when the subroutine returns (auto deallocation of local allocatables). This is just a hypothesis.

Jim Dempsey

jimdempseyatthecove · ‎04-25-2017

You could test this with:

subroutine resonance_history_add_resonance (res_hist, resonance)
  ...
  type(resonance_info_t), dimension(:), allocatable :: tmp
  ...
  allocate (tmp (n + n_max_resonances))
  tmp(1:n) = res_hist%resonances(1:n)
! call move_alloc (from=tmp, to=res_hist%resonances)
  res_hist%resonances = tmp
  deallocate (tmp)
  ...

Or simply let tmp auto-deallocate upon return.

Jim Dempsey
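
As a reference point for the hypothesis above, here is a self-contained sketch of the grow-by-move_alloc pattern. The type info_t and the sizes are hypothetical stand-ins for resonance_info_t and n_max_resonances. Under the Fortran standard, move_alloc leaves its from argument unallocated, so a conforming compiler has nothing left to deallocate when the local variable goes out of scope. The sketch only checks that behaviour; it does not reproduce the reported crash.

```fortran
program grow_demo
  implicit none

  ! Hypothetical stand-in for resonance_info_t: any derived type
  ! with an allocatable component.
  type :: info_t
     integer, allocatable :: contents(:)
  end type info_t

  type(info_t), allocatable :: list(:), tmp(:)
  integer :: n, i

  n = 2
  allocate (list(n))
  do i = 1, n
     allocate (list(i)%contents(2))
     list(i)%contents = [i, 10 * i]
  end do

  ! Grow the array: copy into a larger temporary, then move it back.
  allocate (tmp(n + 3))
  tmp(1:n) = list(1:n)                 ! intrinsic assignment deep-copies components
  call move_alloc (from=tmp, to=list)  ! old list is deallocated, tmp becomes unallocated

  if (allocated (tmp)) error stop "tmp should be unallocated after move_alloc"
  if (size (list) /= n + 3) error stop "unexpected size after growth"
  if (any (list(2)%contents /= [2, 20])) error stop "contents were lost"

  print *, "size after growth:", size (list)
end program grow_demo
```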

Juergen_R_R · ‎04-25-2017

No, the move_alloc doesn't seem to be the problem. The problem arises in the subroutine resonance_history_remove_resonance, when assigning res_hist%resonances (i - 1) = res_hist%resonances (i).

Kevin_D_Intel · ‎04-25-2017

I reproduced the failure w/18.0 compiler and noted that the test case runs successfully w/17.0. I submitted it to Development for their analysis.

(Internal tracking id: CMPLRS-42619)

Juergen_R_R · ‎04-25-2017

Thanks, Kevin! We discussed this internally, and we believe that an explicit assignment for the derived type resonance_info_t could maybe solve/circumvent the issue. We'll let you know.

One more thing: using the workaround for the error with internal tracking #02767138 makes 1 of the 4 failing unit tests work again, and only 55 functional tests now fail instead of 127. We will investigate further.

FortranFan · ‎04-25-2017

Juergen R. wrote:

.. We discussed this internally, and we believe that an explicit assignment for the derived type resonance_info_t could maybe solve/circumvent the issue. We'll let you know. ..

I would consider this a blessing in disguise and review all the derived types involved here, starting from the very top, field_data_t. I would then examine closely if components with the POINTER attribute (e.g., mass_val and width_val in field_data_t) are truly necessary or if all such components can now be given the ALLOCATABLE attribute. If POINTERs are still required, I would include defined assignments in all the derived types where such components are included while paying close attention to deep vs shallow copy of such components. In addition, I would apply FINAL bindings (finalizers) to all such types.
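
A defined assignment of the kind discussed above might look like the following sketch. Everything in it is an assumption made for illustration: info_t stands in for resonance_info_t, the component is ALLOCATABLE rather than POINTER, and finalizers are left out. The main program shifts elements down by one, similar to the res_hist%resonances (i - 1) = res_hist%resonances (i) statement mentioned earlier, so that each element assignment goes through the user-written copy routine instead of the compiler-generated one.

```fortran
module info_m
  implicit none

  ! Hypothetical stand-in for resonance_info_t.
  type :: info_t
     integer, allocatable :: contents(:)
   contains
     procedure :: copy => info_copy
     generic :: assignment(=) => copy
  end type info_t

contains

  ! Explicit deep copy, used for every "lhs = rhs" of type info_t.
  subroutine info_copy (lhs, rhs)
    class(info_t), intent(inout) :: lhs
    class(info_t), intent(in) :: rhs
    if (allocated (lhs%contents)) deallocate (lhs%contents)
    if (allocated (rhs%contents)) then
       allocate (lhs%contents(size (rhs%contents)))
       lhs%contents = rhs%contents
    end if
  end subroutine info_copy

end module info_m

program shift_demo
  use info_m
  implicit none
  type(info_t) :: list(3)
  integer :: i

  do i = 1, 3
     allocate (list(i)%contents(i))
     list(i)%contents = i
  end do

  ! Remove the first element by shifting the others down by one.
  do i = 2, 3
     list(i - 1) = list(i)   ! calls info_copy
  end do

  if (size (list(1)%contents) /= 2) error stop "shift failed for element 1"
  if (any (list(2)%contents /= 3)) error stop "shift failed for element 2"
  print *, size (list(1)%contents), size (list(2)%contents)
end program shift_demo
```

Whether this sidesteps the compiler problem in the real code is not established by this sketch; it only shows the mechanism.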

Juergen_R_R · ‎05-03-2017

OK, there are now workarounds for most things. There is, however, one thing which I cannot isolate. I get the basically insane message

forrtl: severe (153): allocatable array or pointer is not allocated

when hitting the line

if (allocated (prt%child)) deallocate (prt%child)

My attempts to construct a small test case failed. This definitely means that we cannot use the Intel compiler any more.

The case can be found in the shower unit test and 10 functional tests: mlm_matching_isr, mlm_matching_fsr, parton_shower_1, parton_shower_2, pythia6_1, pythia6_2, pythia_6_3, pythia6_4 ....

Juergen_R_R · ‎05-03-2017

So what shall I do?

Kevin_D_Intel · ‎05-03-2017

Is this with the 18.0 compiler or 17.0 compiler? If the latter, I don't know if there will be any resolution w/PSXE 2017 Update 4 coming very soon as was the case with one of your reports.