Ticket #927 (new bug)
Spawn() fails on remote node with nemesis on Windows
| Reported by: | jayesh | Owned by: | jayesh |
|---|---|---|---|
| Priority: | major | Milestone: | mpich2-1.3 |
| Component: | mpich2 | Keywords: | |
| Cc: | lradev@… |
Description (last modified by balaji)
Actually, spawning does work locally but fails remotely with the nemesis channel. As it turns out, the issue is unrelated to C++ and Boost.
Consider the following program, named "tm":
#include <mpi.h>

int main(int argc, char* argv[])
{
    // Request full thread support; the level actually granted is returned in 'supported'.
    int supported;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &supported);

    MPI_Comm parent, children;
    MPI_Comm_get_parent(&parent);
    if (parent == MPI_COMM_NULL) {
        // Only the initial (unspawned) process spawns: one copy of "tm" per host.
        const int NHOST = 2;
        int nproc[NHOST] = {1, 1};
        char* hosts[NHOST] = {"lradev-w02", "lradev-w03"};
        char* progs[NHOST] = {"c:/pub/tm", "c:/pub/tm"};
        MPI_Info infos[NHOST];
        for (int i = 0; i < NHOST; ++i) {
            // The "host" info key selects the node each copy is started on.
            MPI_Info_create(&infos[i]);
            MPI_Info_set(infos[i], "host", hosts[i]);
        }
        MPI_Comm_spawn_multiple(NHOST, progs, NULL, nproc, infos, 0, MPI_COMM_WORLD, &children, NULL);
    }
    MPI_Finalize();
    return 0;
}
lradev-w02 is the local host on which the program is launched, and lradev-w03 is the remote host.
With NHOST==1, i.e. only locally, the program runs fine: it spawns a copy of itself and exits.
However, with NHOST==2 it hangs after spawning one local and one remote copy: locally I can observe two processes named "tm.exe" (plus mpiexec), and on the remote host one "tm.exe" process (plus mpiexec). These processes apparently consume all the CPU available to them and have to be killed.
With the sock channel it works fine both locally and remotely, though of course only in MPI_THREAD_SINGLE mode. With the mt and ssm channels it crashes with an unhandled Win32 exception.
I have your private build installed on both hosts.
