Ticket #363 (closed bug: fixed)
Re: MPI_IN_PLACE bug in Allgatherv in MPE's collchk
| Reported by: | kakollu@… | Owned by: | chan |
|---|---|---|---|
| Priority: | blocker | Milestone: | mpich2-1.1.1 |
| Component: | mpich2 | Keywords: | |
| Cc: |
Description (last modified by chan)
----- "Satyanarayana Kakollu" <kakollu@gmail.com> wrote:

> Hi Anthony,
> Is it safe to use MPI_ALLGATHERV with MPI_IN_PLACE in fortran?
>
> Should we just use the recv buffer as send buffer instead of MPI_IN_PLACE?
>
> Thanks,
> Satya
>
> On Tue, Jan 6, 2009 at 4:45 PM, Anthony Chan <chan@mcs.anl.gov> wrote:
>
> > Hi Satyanarayana,
> >
> > The support of MPI_IN_PLACE for Allgatherv in CollChk library is definitely in 1.0.6p1. My simple test program didn't reveal any problem. If your program is small, could you send it to me so I can check if the collchk library contains any bug ?
> >
> > Thanks,
> > A.Chan
> >
> > ----- "Anthony Chan" <chan@mcs.anl.gov> wrote:
> >
> > > ----- "Rajeev Thakur" <thakur@mcs.anl.gov> wrote:
> > >
> > > > That might be a bug in the collchk library. If sendbuf is MPI_IN_PLACE in Allgatherv, the sendcount argument should be ignored.
> > > >
> > > > Rajeev
> > > >
> > > > _____
> > > >
> > > > From: Satyanarayana Kakollu [mailto:kakollu@gmail.com]
> > > > Sent: Friday, December 19, 2008 9:53 AM
> > > > To: Anthony Chan
> > > > Cc: Rajeev Thakur
> > > > Subject: Re: Trouble with MPI_BCAST
> > > >
> > > > Thank you Rajeev and Anthony,
> > > >
> > > > -mpe=mpicheck give the following message at an MPI_ALL_GATHERV call in our code.
> > > >
> > > > ALLGATHERV (Rank 0) --> Inconsistent datatype signatures detected between local rank 0
> > > >
> > > > I am using the MPI_IN_PLACE option with send count set as '0', can this be the problem ?
> > > >
> > > > Satya
> > > >
> > > > On Wed, Dec 17, 2008 at 10:02 PM, Anthony Chan <chan@mcs.anl.gov> wrote:
> > > >
> > > > Or use "mpicc -mpe=mpicheck" or "mpif90 -mpe=mpicheck" as linker.
> > > >
> > > > A.Chan
> > > >
> > > > ----- "Rajeev Thakur" <thakur@mcs.anl.gov> wrote:
> > > >
> > > > > Satya,
> > > > > Try linking with -lmpe_collchk. It will run MPE's collective call checker to see if there is any discrepancy in the parameters passed to MPI_Bcast. If that doesn't show any errors, try running a simple test program that contains only the broadcast.
> > > > >
> > > > > Rajeev
> > > > >
> > > > > _____
> > > > >
> > > > > From: Satyanarayana Kakollu [mailto:kakollu@gmail.com]
> > > > > Sent: Tuesday, December 16, 2008 5:31 PM
> > > > > To: Rajeev Thakur
> > > > > Subject: Trouble with MPI_BCAST
> > > > >
> > > > > Rajeev,
> > > > >
> > > > > We are seeing that our code is getting stuck at MPI_BCAST on a customer machine. The call simple, all ranks use same size buffer and count, we verified that the root is same on all ranks.
> > > > >
> > > > > The code works on our clusters, but not on the user's machine. Here are the differences between our clusters and the user's machine.
> > > > >
> > > > > Our clusters        User's machine
> > > > >
> > > > > Multi-proc nodes    Single SMP node with 8 cores on two sockets.
> > > > > CentOS 4, RHEL 4    RHEL 5 client version
> > > > > mpich2 1.0.6p1      mpich2 1.0.6p1 (same)
> > > > >
> > > > > We were using gdb to localize the bug to MPI_BCAST two of the 8 ranks do not get past the BCAST. If we replace the BCAST with PT2PT communication it is running well for 1000s of iterations.
> > > > >
> > > > > We linked our applications statically, on the RHEL 4 machine.
> > > > >
> > > > > Can you share your first thoughts about the issue.
> > > > >
> > > > > Thanks,
> > > > > Satya
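
Rajeev's remark in this thread, that sendcount should be ignored when sendbuf is MPI_IN_PLACE in Allgatherv, can be illustrated with a small stand-alone model. The sketch below is not the collchk source and makes no MPI calls: the struct `allgatherv_args` and the functions `effective_sendcount` and `counts_consistent` are hypothetical names invented for illustration, and the model compares element counts only, not full datatype signatures. It shows one plausible way a checker could report a mismatch for an in-place call with sendcount 0; the thread does not establish that collchk actually did this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the count arguments one rank passes to Allgatherv.
 * This is NOT the collchk data structure; all names are illustrative. */
struct allgatherv_args {
    int in_place;           /* nonzero when sendbuf is MPI_IN_PLACE */
    int sendcount;          /* must be ignored when in_place is set */
    const int *recvcounts;  /* recvcounts[i] = elements expected from rank i */
};

/* Elements this rank really contributes. For an in-place call the data
 * already sits in the receive buffer, so the count is recvcounts[rank]. */
static int effective_sendcount(const struct allgatherv_args *a, int rank)
{
    assert(a != NULL && a->recvcounts != NULL && rank >= 0);
    return a->in_place ? a->recvcounts[rank] : a->sendcount;
}

/* Returns 1 when every sender's contribution matches what every receiver
 * expects from it, 0 otherwise. Passing honor_in_place = 0 models a checker
 * that trusts sendcount even for in-place calls. */
static int counts_consistent(const struct allgatherv_args *args, int nranks,
                             int honor_in_place)
{
    for (int sender = 0; sender < nranks; sender++) {
        int sent = honor_in_place
                       ? effective_sendcount(&args[sender], sender)
                       : args[sender].sendcount;
        for (int receiver = 0; receiver < nranks; receiver++) {
            if (args[receiver].recvcounts[sender] != sent)
                return 0;
        }
    }
    return 1;
}
```

With three ranks that all pass MPI_IN_PLACE, sendcount 0, and the same nonzero recvcounts, `counts_consistent(args, 3, 1)` accepts the call while `counts_consistent(args, 3, 0)` rejects it. That rejection is the kind of false report a checker would give if it compared the ignored sendcount against recvcounts.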
