Ticket #664 (closed bug: fixed)

Opened 5 months ago

Last modified 5 months ago

FW: Problem with MPI_Type_commit() and assert insegment_ops.c

Reported by: "Rajeev Thakur" <thakur@…> Owned by: rross
Priority: major Milestone: mpich2-1.1.1
Component: mpich2 Keywords:
Cc: jratt@…

Description




  _____

From: mpich2-dev-bounces@mcs.anl.gov
[mailto:mpich2-dev-bounces@mcs.anl.gov] On Behalf Of Joe Ratterman
Sent: Tuesday, June 09, 2009 3:14 PM
To: MPICH2 Dev
Subject: [mpich2-dev] Problem with MPI_Type_commit() and assert
insegment_ops.c


The specifics of this test come from an MPI excerciser that gathered
(using MPIR_Gather) a variety of types, including MPI_SHORT_INT.  The
way that gather is implemented, it created and then sent a struct
datatype of the tmp-data from the software tree and the local-data.  I
pulled out the important bits, and got this test-case.  It asserts on
PPC32 Linux 1.1 and BGP 1.1rc0, but runs fine on 1.0.7.  The
addresses/displacements are fake, but were originally based on the
actual values used inside MPIR_Gather.  It does the type-create on the
first two types just to show that it doesn't always fail.


Error message:


Creating  addr=[0x1,0x2]  types=[8c000003,4c00010d]  struct_displs=[1,2]
blocks=[256,256]  MPI_BOTTOM=(nil)
foo:25
Assertion failed in file segment_ops.c at line 994: *lengthp > 0
internal ABORT - process 0




Code


#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

void foo(void *sendbuf,
         MPI_Datatype sendtype,
         void *recvbuf,
         MPI_Datatype recvtype)
{
  int blocks[2];
  MPI_Aint struct_displs[2];
  MPI_Datatype types[2], tmp_type;

  blocks[0] = 256;
  struct_displs[0] = (size_t)sendbuf;
  types[0] = sendtype;
  blocks[1] = 256;
  struct_displs[1] = (size_t)recvbuf;
  types[1] = MPI_BYTE;

  printf("Creating  addr=[%p,%p]  types=[%x,%x]  struct_displs=[%x,%x]
blocks=[%d,%d]  MPI_BOTTOM=%p\n",
         sendbuf, recvbuf, types[0], types[1], struct_displs[0],
struct_displs[1], blocks[0], blocks[1], MPI_BOTTOM);
  MPI_Type_create_struct(2, blocks, struct_displs, types, &tmp_type);
  printf("%s:%d\n", __func__, __LINE__);
  MPI_Type_commit(&tmp_type);
  printf("%s:%d\n", __func__, __LINE__);
  MPI_Type_free  (&tmp_type);
  puts("Done");
}


int main()
{
  MPI_Init(NULL, NULL);

  foo((void*)0x1,
      MPI_FLOAT_INT,
      (void*)0x2,
      MPI_BYTE);
  sleep(1);
  foo((void*)0x1,
      MPI_DOUBLE_INT,
      (void*)0x2,
      MPI_BYTE);
  sleep(1);
  foo((void*)0x1,
      MPI_SHORT_INT,
      (void*)0x2,
      MPI_BYTE);

  MPI_Finalize();
  return 0;
}




I don't know anything about how this might be fixed, but we are looking
into it as well.

Thanks,
Joe Ratterman
jratt@us.ibm.com

Attachments

part0001.html (4.7 KB) - added by Rajeev Thakur 5 months ago.
Added by email2trac

Change History

Changed 5 months ago by Rajeev Thakur

Added by email2trac

Changed 5 months ago by Rajeev Thakur

  • id set to 664

This message has 1 attachment(s)

Changed 5 months ago by thakur

Adding Rob's reply


Hi,

Those type casts to (size_t) should be to (MPI_Aint).

That assertion is checking that a parameter being passed to Segment_mpi_flatten is > 0. The parameter is the length of the list of regions being passed in by reference to be filled in (the destination of the list of regions). So for some reason we're getting a zero (or possibly negative) value passed in as the length of the arrays.

There's only one place in the struct creation where Segment_mpi_flatten() is called; it's line 666 (evil!) of dataloop_create_struct.c. This is in DLOOP_Dataloop_create_flattened_struct(), which is a function used to make a struct into an indexed type.

The "pairtypes", such as MPI_SHORT_INT, are special cases in MPI in that some of them have more than one "element type" (e.g. MPI_INT, MPI_SHORT_INT) in them. My guess is that there's an assumption in the DLOOP_Dataloop_create_flattened_struct() code path that is having trouble with the pairtype.

I'm surprised that we might have introduced something between 1.0.7 and 1.1; I can't recall anything in particular that has changed in this code path. Someone should check the repo logs and see if something snuck in?

Rob

Changed 5 months ago by thakur

Adding Joe's reply;

I know it's not that important--and clearly not relevant--but BG/P will generate a compiler warning if I use an MPI_Aint cast there. We want to avoid any ambiguity that such a cast would involve (i.e. is it sign extended?), so I use a cast that works correctly for this sort of micro-tests. It is also correct on PPC32. This small example shows the warnings:

$ cat -n size.c

1 #include <mpi.h> 2 3 extern void bar(MPI_Aint a, MPI_Aint b, MPI_Aint c); 4 5 void foo(void* p) 6 { 7 MPI_Aint aint = (MPI_Aint)p; 8 9 MPI_Aint one = (long long int)p;

10 MPI_Aint two = (int)p; 11 12 bar(aint, one, two); 13 }

$ /bgsys/drivers/ppcfloor/comm/bin/mpicc -Wall -g -c size.csize.c: In function 'foo': size.c:7: warning: cast from pointer to integer of different size size.c:9: warning: cast from pointer to integer of different size

Thanks,

Joe Ratterman

Changed 5 months ago by thakur

  • owner set to rross
  • milestone set to mpich2-1.1.1

Changed 5 months ago by thakur

  • cc jratt@… added
  • status changed from new to closed
  • resolution set to fixed

Fixed by RobR in r4874 and r4875

WARNING! You need to either login using OpenID here or enter your email address here before you can create or edit tickets. Otherwise the ticket will get treated as spam. More information on creating tickets can be found here.

Don't forget to add your email address to the cc list to make sure that you get updated of the ticket status.

Add/Change #664 (FW: Problem with MPI_Type_commit() and assert insegment_ops.c)

Author



Change Properties
Action
as closed
Next status will be 'reopened'
 
Note: See TracTickets for help on using tickets.