Ticket #618 (closed bug: fixed)

Opened 6 months ago

Last modified 3 weeks ago

nesting and subarray

Reported by: Rob Latham <robl@…>
Owned by: robl
Priority: major
Milestone: mpich2-1.2.1
Component: mpich2
Keywords:
Cc:

Description (last modified by robl)

Hi

Over on mpich-discuss, someone reported a bug in
MPI_TYPE_CREATE_SUBARRAY.  Test program attached -- note the use of
MPI_THREAD_MULTIPLE.

I built MPICH2 r4626 with --enable-g=all and get an error about
incorrect nesting level.  Here's the output from rank 0:

% ~/work/soft/mpich2/bin/mpiexec -l -np 4 ./cp_subarray pvfs2:/pvfs/cptesta
0: In direct memory block for handle type DATATYPE, 1 handles are still
allocated
0: Unexpected value for nesting level = 1
0: Nest stack is:
0:      [0] /home/robl/work/mpich2/src/mpi/datatype/type_create_subarray.c:93
0: [0] 72 at [0x08b45108], h2/src/mpid/common/datatype/dataloop/dataloop.c[380]
0: [0] 72 at [0x08b45020], h2/src/mpid/common/datatype/dataloop/dataloop.c[380]
0: [0] 56 at [0x08b44c90], c/mpid/common/datatype/mpid_datatype_contents.c[62]

- mpi_type_create_subarray *is* a publicly-facing routine and so
  rightly calls MPIR_Nest_incr()

- dataloop.c:380 is an MPIU_Malloc call

- mpid_datatype_contents.c:62 is also an MPIU_Malloc call

Suggestions or ideas?

Thanks
==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

Attachments

cp_subarray.c (2.5 KB) - added by Rob Latham 6 months ago.
Added by email2trac
cp_subarray_new.c (2.2 KB) - added by gropp 6 months ago.
This version demonstrates the problem without the subarray call

Change History

Changed 6 months ago by Rob Latham

Added by email2trac

Changed 6 months ago by Rob Latham

  • id set to 618

This message has 1 attachment(s)

Changed 6 months ago by balaji

  • milestone set to mpich2-1.1

Changed 6 months ago by gropp

The problem isn't in the subarray code. I think the problem is that the ROMIO nest incr code doesn't implement the debugging option, which means that the information on the nest stack is probably not saved (i.e., this is stale data). I found that just using File_open and File_close returned an error.

The other messages are from the handle code - the sample code needs an MPI_Type_free( &subarray );

Changed 6 months ago by gropp

It looks like the File_open is failing and then either it or the close fails to decrement the nest count on the error branch.

1. It would be good for ROMIO to implement the debugging version of the nest macros.

2. We should consider again eliminating the need for the nest macro (all we need is to eliminate the need to call MPI routines directly from other MPI routines)
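Suggestion 2 could look roughly like the following. This is only a sketch: all names here (Sketch_Type_size, Sketch_Pack_size, sketch_type_size_impl) are hypothetical and are not MPICH2 routines. The idea is that each public routine is a thin wrapper over an internal routine, and any other public routine that needs the same work calls the internal one, so no public routine is ever entered from inside another and no nest counter is required.

```c
/* Hypothetical internal routine: does the real work and carries no
 * nest bookkeeping. Not an MPICH2 function. */
static int sketch_type_size_impl(int count, int elem_size)
{
    return count * elem_size;
}

/* Hypothetical public entry point: a thin wrapper over the internal
 * routine. */
int Sketch_Type_size(int count, int elem_size)
{
    return sketch_type_size_impl(count, elem_size);
}

/* A second hypothetical public entry point that needs the same
 * computation. It calls the internal routine rather than
 * Sketch_Type_size, so public routines never nest. */
int Sketch_Pack_size(int count, int elem_size, int header_bytes)
{
    return header_bytes + sketch_type_size_impl(count, elem_size);
}
```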

Changed 6 months ago by gropp

This version demonstrates the problem without the subarray call

Changed 6 months ago by Rob Latham

On Thu, May 28, 2009 at 08:26:07PM -0000, mpich2 wrote:
> Comment (by gropp):
>  It looks like the File_open is failing and then either it or the close
>  fails to decrement the nest count on the error branch.

Indeed. No problems if the file actually exists (I did not realize
this was a read-only test when I sent this bug report).

>  1. It would be good for ROMIO to implement the debugging version of the
>  nest macros.

I can do that, but can you tell me more about the debugging version of
the nest macros?

thanks
==rob

--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA

Changed 6 months ago by gropp

I've added a version of the debugging macros (I'll check it in shortly). The problem is in the definition of this macro (and probably others):

#define MPIO_CHECK_FILE_HANDLE(fh, myname, error_code)          \
if ((fh <= (ADIO_File) 0) ||					\
    ((fh)->cookie != ADIOI_FILE_COOKIE)) {			\
    error_code = MPIO_Err_create_code(MPI_SUCCESS,		\
				      MPIR_ERR_RECOVERABLE,	\
				      myname, __LINE__,		\
				      MPI_ERR_ARG,		\
				      "**iobadfh", 0);		\
    error_code = MPIO_Err_return_file(MPI_FILE_NULL, error_code);\
    goto fn_exit;                                               \
}

The problem is the "goto fn_exit" - in close.c, this bypasses the decrement of the nest counter. Most of the ROMIO mpi-io routines only have an fn_exit, but open.c and close.c have fn_fail, and they rely on the difference between the two. I'm not sure whether the best long term solution is to add fn_fail (and the associated handling) to the other routines or to remove fn_fail from open and close.
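A stripped-down illustration of how such a jump can leave the counter at 1, matching the "Unexpected value for nesting level = 1" message. This is a sketch, not the ROMIO source: nest_count, SKETCH_CHECK_HANDLE, sketch_close_buggy and the error codes are hypothetical stand-ins, and the placement of the decrement just before the fn_exit label is an assumption about the layout.

```c
#include <stddef.h>

/* Hypothetical stand-in for the nest counter. */
static int nest_count = 0;

/* Hypothetical error codes, for this sketch only. */
enum { SKETCH_OK = 0, SKETCH_ERR_BAD_HANDLE = 1 };

/* Simplified check macro: like MPIO_CHECK_FILE_HANDLE, it jumps
 * straight to fn_exit when the handle is invalid. */
#define SKETCH_CHECK_HANDLE(fh, error_code)      \
    if ((fh) == NULL) {                          \
        (error_code) = SKETCH_ERR_BAD_HANDLE;    \
        goto fn_exit;                            \
    }

/* Assumed buggy layout: the decrement sits before the fn_exit label,
 * so the macro's goto skips it. */
int sketch_close_buggy(const int *fh)
{
    int error_code = SKETCH_OK;

    nest_count++;                          /* enter the nested region */
    SKETCH_CHECK_HANDLE(fh, error_code);

    /* ... normal close work would go here ... */

    nest_count--;                          /* skipped on the error path */
fn_exit:
    return error_code;
}

/* Accessor so a caller can inspect the counter. */
int sketch_nest_count(void)
{
    return nest_count;
}
```

Calling sketch_close_buggy with a valid pointer leaves the counter at 0; calling it with NULL returns the error code and leaves the counter at 1.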

Changed 5 months ago by balaji

  • owner set to robl

Changed 5 months ago by thakur

  • milestone changed from mpich2-1.1.1 to mpich2-1.1.2

Changed 3 months ago by balaji

  • milestone changed from mpich2-1.1.2 to mpich2-1.2

Milestone mpich2-1.1.2 deleted

Changed 3 weeks ago by robl

  • status changed from new to closed
  • resolution set to fixed
  • description modified

OK, think this is fixed in r5679 -- moved nest-decr into fn_exit. It's a little spaghetti with fn_fail going to fn_exit but does the trick.
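For readers without the repository at hand, the control flow described here has roughly the following shape. This is a sketch under assumptions, not the code from r5679: nest_depth, fixed_close, the simulate_io_error flag and the error codes are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for the nest counter. */
static int nest_depth = 0;

/* Hypothetical error codes, for this sketch only. */
enum { FIX_OK = 0, FIX_ERR_BAD_HANDLE = 1, FIX_ERR_IO = 2 };

int fixed_close(const int *fh, int simulate_io_error)
{
    int error_code = FIX_OK;

    nest_depth++;                   /* enter the nested region */

    if (fh == NULL) {
        error_code = FIX_ERR_BAD_HANDLE;
        goto fn_exit;               /* early exit, as a check macro would do */
    }
    if (simulate_io_error) {
        error_code = FIX_ERR_IO;
        goto fn_fail;
    }

    /* ... normal close work would go here ... */

fn_exit:
    nest_depth--;                   /* now runs on every path */
    return error_code;

fn_fail:
    /* error-specific handling would go here */
    goto fn_exit;                   /* the "spaghetti": fail funnels into exit */
}

/* Accessor so a caller can inspect the counter. */
int fixed_nest_depth(void)
{
    return nest_depth;
}
```

With the decrement under the fn_exit label, the success path, the early goto fn_exit, and the fn_fail path all leave the counter balanced.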
