id	summary	reporter	owner	description	type	status	priority	milestone	component	resolution	keywords	cc
363	Re: MPI_IN_PLACE bug in Allgatherv in MPE's collchk	kakollu@…	chan	"
{{{


----- ""Satyanarayana Kakollu"" <kakollu@gmail.com> wrote:

> Hi Anthony,
> Is it safe to use MPI_ALLGATHERV with MPI_IN_PLACE in Fortran?
>
> Should we just use the recv buffer as send buffer instead of
> MPI_IN_PLACE?
>
> Thanks,
> Satya
>
>
>
> On Tue, Jan 6, 2009 at 4:45 PM, Anthony Chan <chan@mcs.anl.gov> wrote:
>
> >
> > Hi Satyanarayana,
> >
> > Support for MPI_IN_PLACE in Allgatherv is definitely present in the
> > CollChk library in 1.0.6p1.  My simple test program didn't reveal
> > any problem.  If your program is small, could you send it to
> > me so I can check whether the collchk library contains a bug?
> >
> > Thanks,
> > A.Chan
> >
> > ----- ""Anthony Chan"" <chan@mcs.anl.gov> wrote:
> >
> > > ----- ""Rajeev Thakur"" <thakur@mcs.anl.gov> wrote:
> > >
> > > > That might be a bug in the collchk library. If sendbuf is MPI_IN_PLACE
> > > > in Allgatherv, the sendcount argument should be ignored.
> > > >
> > > > Rajeev
> > > >
> > > >
> > > >
> > > >   _____
> > > >
> > > > From: Satyanarayana Kakollu [mailto:kakollu@gmail.com]
> > > > Sent: Friday, December 19, 2008 9:53 AM
> > > > To: Anthony Chan
> > > > Cc: Rajeev Thakur
> > > > Subject: Re: Trouble with MPI_BCAST
> > > >
> > > >
> > > > Thank you Rajeev and Anthony,
> > > >
> > > > -mpe=mpicheck gives the following message at an MPI_ALLGATHERV call
> > > > in our code.
> > > >
> > > > ALLGATHERV (Rank 0) --> Inconsistent datatype signatures detected
> > > > between local rank 0
> > > >
> > > > I am using the MPI_IN_PLACE option with the send count set to '0'.
> > > > Can this be the problem?
> > > >
> > > > Satya
> > > >
> > > > On Wed, Dec 17, 2008 at 10:02 PM, Anthony Chan <chan@mcs.anl.gov>
> > > > wrote:
> > > >
> > > >
> > > >
> > > > Or use ""mpicc -mpe=mpicheck"" or ""mpif90 -mpe=mpicheck"" as the linker.
> > > >
> > > > A.Chan
> > > >
> > > >
> > > > ----- ""Rajeev Thakur"" <thakur@mcs.anl.gov> wrote:
> > > >
> > > > > Satya,
> > > > >            Try linking with -lmpe_collchk. It will run MPE's
> > > > > collective call checker to see if there is any discrepancy in the
> > > > > parameters passed to MPI_Bcast. If that doesn't show any errors, try
> > > > > running a simple test program that contains only the broadcast.
> > > > >
> > > > > Rajeev
> > > > >
> > > > >
> > > > >
> > > > >   _____
> > > > >
> > > > > From: Satyanarayana Kakollu [mailto:kakollu@gmail.com]
> > > > > Sent: Tuesday, December 16, 2008 5:31 PM
> > > > > To: Rajeev Thakur
> > > > > Subject: Trouble with MPI_BCAST
> > > > >
> > > > >
> > > > > Rajeev,
> > > > >
> > > > > > We are seeing our code get stuck at MPI_BCAST on a customer
> > > > > > machine. The call is simple: all ranks use the same buffer size
> > > > > > and count, and we verified that the root is the same on all ranks.
> > > > >
> > > > > > The code works on our clusters, but not on the user's machine.
> > > > > > Here are the differences between our clusters and the user's machine.
> > > > >
> > > > >
> > > > > > Our clusters          User's machine
> > > > > >
> > > > > > Multi-proc nodes      Single SMP node with 8 cores on two sockets
> > > > > > CentOS 4, RHEL 4      RHEL 5 client version
> > > > > > mpich2 1.0.6p1        mpich2 1.0.6p1 (same)
> > > > >
> > > > > > We used gdb to localize the bug to MPI_BCAST: two of the 8 ranks
> > > > > > do not get past the BCAST. If we replace the BCAST with PT2PT
> > > > > > communication, it runs well for 1000s of iterations.
> > > > >
> > > > > We linked our applications statically, on the RHEL 4 machine.
> > > > >
> > > > > > Can you share your first thoughts about the issue?
> > > > >
> > > > > Thanks,
> > > > > Satya
> >
}}}"	bug	closed	blocker	mpich2-1.1.1	mpich2	fixed		
