Ticket #180 (closed docs: fixed)

Opened 14 months ago

Last modified 13 months ago

MPID_Comm->remote_size vs MPID_Comm->local_size

Reported by: goodell Owned by: goodell
Priority: minor Milestone: mpich2-1.1a2
Component: mpich2 Keywords:
Cc:

Description

The values for these two variables are clearly defined when the communicator is an MPID_INTERCOMM, but unclear in the case of an MPID_INTRACOMM. The collective utility routines such as MPIR_Bcast all use local_size as the size of the communicator. However, it looks like some places use remote_size instead. It also looks like most code paths that create intracommunicators set remote_size=local_size, so the inconsistent usage doesn't matter in practice.

I recently got burned by this when I failed to set the local_size in a routine that creates communicators. I suspect that the right thing to do is to set local_size=remote_size and get on with things. We should figure this out and make sure it's clear in the code and comments.

-Dave
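A minimal sketch of the two cases, assuming a heavily simplified, hypothetical `comm_sketch_t` in place of the real MPID_Comm (the type, enum, and helper names are illustrative, not MPICH's): an intracommunicator creation path fills both fields from one value, while an intercommunicator carries two independent sizes.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-in for MPID_Comm: only the
 * two size fields discussed in this ticket are modeled. */
typedef enum { SKETCH_INTRACOMM, SKETCH_INTERCOMM } comm_kind_t;

typedef struct {
    comm_kind_t kind;
    int remote_size; /* intercomm: size of the remote group */
    int local_size;  /* intercomm: size of the local group */
} comm_sketch_t;

/* An intracommunicator has a single group, so both size fields are
 * filled from the same value. Leaving one of them unset is the kind of
 * mistake described above. */
comm_sketch_t make_intracomm(int size)
{
    comm_sketch_t comm = { .kind = SKETCH_INTRACOMM,
                           .remote_size = size,
                           .local_size = size };
    return comm;
}

/* An intercommunicator has two groups whose sizes may differ. */
comm_sketch_t make_intercomm(int local_size, int remote_size)
{
    comm_sketch_t comm = { .kind = SKETCH_INTERCOMM,
                           .remote_size = remote_size,
                           .local_size = local_size };
    return comm;
}
```

With this convention, code that reads either field on an intracommunicator gets the same answer, which is the local_size=remote_size resolution suggested above.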


Change History

  Changed 14 months ago by William Gropp (follow-up: comment 2)

Most routines should use the "remote size" as the size of the group of the possible target processes. In most cases, the "local_size" should only be used in intercommunicator operations.

Certainly for safety, intracomms should set local_size to remote_size. However, this should rarely matter; in which routine was local_size used for an intracomm? It may be that that routine also has a bug and should be using remote_size.

Bill

On Oct 1, 2008, at 4:37 PM, mpich2 wrote:

-- Ticket URL: <https://trac.mcs.anl.gov/projects/mpich2/ticket/180>

William Gropp Paul and Cynthia Saylor Professor of Computer Science University of Illinois Urbana-Champaign
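The convention Bill describes can be sketched as follows. This assumes the same kind of hypothetical, simplified `comm_sketch_t` (not MPICH's actual MPID_Comm), and `target_group_size` and `is_valid_target_rank` are illustrative names: destination ranks are always checked against remote_size, which for an intercommunicator is the other group and for an intracommunicator is the single group.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for MPID_Comm (not MPICH's type). */
typedef enum { SKETCH_INTRACOMM, SKETCH_INTERCOMM } comm_kind_t;

typedef struct {
    comm_kind_t kind;
    int remote_size;
    int local_size;
} comm_sketch_t;

/* The group that point-to-point operations can target is described by
 * remote_size: the other group for an intercommunicator, and the only
 * group for an intracommunicator (where remote_size == local_size). */
int target_group_size(const comm_sketch_t *comm)
{
    return comm->remote_size;
}

/* A destination rank is valid if it indexes into the target group. */
bool is_valid_target_rank(const comm_sketch_t *comm, int rank)
{
    return rank >= 0 && rank < target_group_size(comm);
}
```

For an intercommunicator with a local group of 2 and a remote group of 6, rank 5 is a valid destination even though it exceeds local_size; checking against local_size instead would wrongly reject it.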

  Changed 14 months ago by goodell (in reply to: comment 1)

Replying to William Gropp:

Most routines should use the "remote size" as the size of the group of the possible target processes. In most cases, the "local_size" should only be used in intercommunicator operations. Certainly for safety, intracomms should set local_size to remote_size. However, this should rarely matter; in which routine was local_size used for an intracomm? It may be that that routine also has a bug and should be using remote_size.

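Here are some examples of uses of local_size:

  • https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/trunk/src/mpi/coll/bcast.c#L98
  • https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/trunk/src/mpi/coll/allgather.c#L103
  • https://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/trunk/src/mpi/coll/allreduce.c#L160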

Basically all of the MPIR_<collective> routines do this. I suspect that the only reason that everything works is that

(comm->remote_size == comm->local_size)

is always true for intracommunicators except in my (now fixed) buggy code.

-Dave
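To make that implicit assumption visible, a collective could check the invariant before trusting local_size. The sketch below again uses a hypothetical, simplified `comm_sketch_t`; `sizes_consistent` and `coll_comm_size` are illustrative helpers, not existing MPICH functions. A creation routine that set only one of the two fields would then usually trip the assertion instead of silently producing the wrong group size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for MPID_Comm (not MPICH's type). */
typedef enum { SKETCH_INTRACOMM, SKETCH_INTERCOMM } comm_kind_t;

typedef struct {
    comm_kind_t kind;
    int remote_size;
    int local_size;
} comm_sketch_t;

/* The invariant the collective routines implicitly rely on: for an
 * intracommunicator both fields describe the same single group.
 * Intercommunicators may legitimately have different sizes. */
bool sizes_consistent(const comm_sketch_t *comm)
{
    return comm->kind != SKETCH_INTRACOMM
        || comm->remote_size == comm->local_size;
}

/* Hypothetical helper a collective might call: it reads local_size, as
 * the MPIR_<collective> routines do, but checks the invariant first
 * (the check is compiled out when NDEBUG is defined). */
int coll_comm_size(const comm_sketch_t *comm)
{
    assert(sizes_consistent(comm));
    return comm->local_size;
}
```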

  Changed 14 months ago by Rajeev Thakur

I did them that way because MPI_Comm_size itself does

*size = comm_ptr->local_size;

And the comment in the MPID_Comm structure is

int remote_size; /* Value of MPI_Comm_(remote)_size */
int local_size;  /* Value of MPI_Comm_size for local group */

Rajeev

-----Original Message-----
From: owner-mpich2-bugs@… On Behalf Of mpich2
Sent: Thursday, October 02, 2008 11:17 AM
To: undisclosed-recipients:
Subject: Re: [mpich2-maint] #180: MPID_Comm->remote_size vs MPID_Comm->local_size

-- Ticket URL: <https://trac.mcs.anl.gov/projects/mpich2/ticket/180#comment:2>
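The split Rajeev describes, where MPI_Comm_size reports local_size and the remote size is only meaningful for intercommunicators, can be sketched with a hypothetical, simplified `comm_sketch_t`. `sketch_comm_size`, `sketch_comm_remote_size`, and the `SKETCH_*` return codes are illustrative stand-ins rather than MPICH's implementation; rejecting intracommunicators in the second function reflects the MPI standard, which defines MPI_Comm_remote_size only for intercommunicators.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for MPID_Comm (not MPICH's type). */
typedef enum { SKETCH_INTRACOMM, SKETCH_INTERCOMM } comm_kind_t;

typedef struct {
    comm_kind_t kind;
    int remote_size; /* value of MPI_Comm_remote_size (intercomm only) */
    int local_size;  /* value of MPI_Comm_size for the local group */
} comm_sketch_t;

#define SKETCH_SUCCESS  0
#define SKETCH_ERR_COMM 1 /* hypothetical error code */

/* Mirrors the line quoted above: *size = comm_ptr->local_size; */
int sketch_comm_size(const comm_sketch_t *comm, int *size)
{
    *size = comm->local_size;
    return SKETCH_SUCCESS;
}

/* The remote size is only defined for intercommunicators, so an
 * intracommunicator is rejected and *size is left untouched. */
int sketch_comm_remote_size(const comm_sketch_t *comm, int *size)
{
    if (comm->kind != SKETCH_INTERCOMM)
        return SKETCH_ERR_COMM;
    *size = comm->remote_size;
    return SKETCH_SUCCESS;
}
```

Under this reading, intracommunicator collectives that use the equivalent of MPI_Comm_size naturally read local_size, which is why keeping remote_size equal to it on intracommunicators matters for any code that reads the other field.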

  Changed 13 months ago by balaji

  • owner set to goodell
  • milestone set to mpich2-1.1a2

Needs a wiki entry.

  Changed 13 months ago by goodell

  • status changed from new to closed
  • resolution set to fixed

Fixed by r3478. A clarifying comment was added to the MPID_Comm structure.
