<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Debugging 'Too many communicators'-Error in Intel® MPI Library</title>
    <link>https://community.intel.com/t5/Intel-MPI-Library/Debugging-Too-many-communicators-Error/m-p/1149126#M6050</link>
    <description>&lt;P&gt;I have a large code that fails with the following error:&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc4027cf0, color=0, key=0, new_comm=0x7ffdb50f2bd0) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc401bcf1, color=1, key=0, new_comm=0x7ffed5aa4fd0) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc4027ce9, color=0, key=0, new_comm=0x7ffe37e477d0) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc401bcf1, color=1, key=0, new_comm=0x7ffd511ac4d0) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (0/16384 free on this process; ignore_id=0)
&lt;/PRE&gt;

&lt;P&gt;I would like to debug it. I can reproduce this error in TotalView.&lt;/P&gt;

&lt;P&gt;My first idea is to get the stack trace at the point of the error. If I set a breakpoint on the call to "Get_contextid_sparse_group" or "Comm_split_impl", the error occurs before the breakpoint is reached and TotalView just closes.&lt;/P&gt;

&lt;P&gt;If I set it to "Comm_split" i have so many breakpoint, that I can't find the correct one. How can I set a breakpoint in IntelMPI's errorhandeling routine. Some routine must print this "Too many communicators" error-message. Can I set my break-point there?&lt;/P&gt;

&lt;P&gt;My second idea is to monitor the number of communicators somehow. The line&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;Too many communicators (0/16384 free on this process; ignore_id=0)&lt;/PRE&gt;

&lt;P&gt;indicates that MPI knows how many communicators are free at any given time. How can I, as a developer, monitor this number? Is there a function I can call that returns the number of communicators currently in use?&lt;/P&gt;

&lt;P&gt;I am open to other ideas on how to track down this "communicator leak".&lt;/P&gt;</description>
    <pubDate>Fri, 26 Oct 2018 14:54:32 GMT</pubDate>
    <dc:creator>Redies__Matthias</dc:creator>
    <dc:date>2018-10-26T14:54:32Z</dc:date>
    <item>
      <title>Debugging 'Too many communicators'-Error</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Debugging-Too-many-communicators-Error/m-p/1149126#M6050</link>
      <description>&lt;P&gt;I have a large code that fails with the following error:&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc4027cf0, color=0, key=0, new_comm=0x7ffdb50f2bd0) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc401bcf1, color=1, key=0, new_comm=0x7ffed5aa4fd0) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc4027ce9, color=0, key=0, new_comm=0x7ffe37e477d0) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (0/16384 free on this process; ignore_id=0)
Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(532)................: MPI_Comm_split(comm=0xc401bcf1, color=1, key=0, new_comm=0x7ffd511ac4d0) failed
PMPI_Comm_split(508)................: fail failed
MPIR_Comm_split_impl(260)...........: fail failed
MPIR_Get_contextid_sparse_group(676): Too many communicators (0/16384 free on this process; ignore_id=0)
&lt;/PRE&gt;

&lt;P&gt;I would like to debug it. I can reproduce this error in TotalView.&lt;/P&gt;

&lt;P&gt;My first idea is to get the stack trace at the point of the error. If I set a breakpoint on the call to "Get_contextid_sparse_group" or "Comm_split_impl", the error occurs before the breakpoint is reached and TotalView just closes.&lt;/P&gt;

&lt;P&gt;If I set it to "Comm_split" i have so many breakpoint, that I can't find the correct one. How can I set a breakpoint in IntelMPI's errorhandeling routine. Some routine must print this "Too many communicators" error-message. Can I set my break-point there?&lt;/P&gt;

&lt;P&gt;My second idea is to monitor the number of communicators somehow. The line&lt;/P&gt;

&lt;PRE class="brush:bash; class-name:dark;"&gt;Too many communicators (0/16384 free on this process; ignore_id=0)&lt;/PRE&gt;

&lt;P&gt;indicates that MPI knows how many communicators are free at any given time. How can I, as a developer, monitor this number? Is there a function I can call that returns the number of communicators currently in use?&lt;/P&gt;

&lt;P&gt;I am open to other ideas on how to track down this "communicator leak".&lt;/P&gt;</description>
      <pubDate>Fri, 26 Oct 2018 14:54:32 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Debugging-Too-many-communicators-Error/m-p/1149126#M6050</guid>
      <dc:creator>Redies__Matthias</dc:creator>
      <dc:date>2018-10-26T14:54:32Z</dc:date>
    </item>
    <item>
      <title>I got an answer on Stack Overflow</title>
      <link>https://community.intel.com/t5/Intel-MPI-Library/Debugging-Too-many-communicators-Error/m-p/1149127#M6051</link>
      <description>&lt;P&gt;I got &lt;A href="https://stackoverflow.com/questions/53043835/get-number-of-mpi-communicators-in-use"&gt;an answer on Stack Overflow&lt;/A&gt;, which wraps the MPI_Comm_split and MPI_Comm_free functions with versions that maintain a counter. By reading out this counter, I can determine the number of communicators currently in use.&lt;/P&gt;</description>
      <pubDate>Wed, 31 Oct 2018 13:02:01 GMT</pubDate>
      <guid>https://community.intel.com/t5/Intel-MPI-Library/Debugging-Too-many-communicators-Error/m-p/1149127#M6051</guid>
      <dc:creator>Redies__Matthias</dc:creator>
      <dc:date>2018-10-31T13:02:01Z</dc:date>
    </item>
  </channel>
</rss>

