Ticket #926 (reopened bug)

Opened 4 months ago

Last modified 4 months ago

enable-fast

Reported by: balaji
Owned by: goodell
Priority: major
Milestone: mpich2-1.3
Component: mpich2
Keywords:
Cc:

Description

Bcast seems to hang with --enable-fast in the new nightly tests: http://www.mcs.anl.gov/research/projects/mpich2/nightly/new/latest

Change History

Changed 4 months ago by thakur

  • owner set to goodell
  • status changed from new to assigned

I can reproduce this by hand. Built with --enable-fast on thrash. bcast2 runs fine with up to 8 processes. With 10 processes, which triggers the long-message algorithm, it hangs.

Changed 4 months ago by thakur

No, it works. It just takes a while: about 3 minutes 42 seconds with 10 processes.

thrash:/sandbox/thakur/tmp/test/mpi/coll% date; mpiexec -n 10 bcast2; date
Fri Nov  6 16:20:28 CST 2009
 No Errors
Fri Nov  6 16:24:10 CST 2009

Changed 4 months ago by thakur

I ran all the tests in the coll directory. They all completed.

Changed 4 months ago by goodell

So this is just the usual nemesis over-subscription issue being aggravated by tighter loops from --enable-fast, right? Is there anything that actually needs to be done here in the short term?

Changed 4 months ago by thakur

  • status changed from assigned to closed
  • resolution set to wontfix

Probably not. Resolving it for now.

Changed 4 months ago by balaji

Is the problem with the sched_yield() call alone, or with other CPU-yielding calls as well (e.g., usleep(0) or select())? If it is specific to sched_yield(), shouldn't we just give the other routines higher priority and try sched_yield() last?
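
To make the comparison concrete, here are the three yielding calls named above side by side. This is only a minimal sketch, assuming a POSIX system; the names yield_method and yield_cpu are hypothetical and are not MPICH2 identifiers, and this is not how nemesis actually selects its yield routine.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* asks glibc to declare usleep() in strict modes */
#endif
#include <sched.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical selector, for illustration only. */
enum yield_method { YIELD_SCHED, YIELD_USLEEP, YIELD_SELECT };

/* Give up the CPU once using the chosen call.
 * Returns 0 on success and -1 on failure. */
int yield_cpu(enum yield_method method)
{
    switch (method) {
    case YIELD_SCHED:
        /* Asks the scheduler to run another runnable thread, if any. */
        return sched_yield();
    case YIELD_USLEEP:
        /* A zero-length sleep; whether this really deschedules the
         * caller depends on the implementation. */
        return usleep(0);
    case YIELD_SELECT: {
        /* select() with no descriptors acts as a timed sleep. A zero
         * timeout would only poll, so wait 1 microsecond instead. */
        struct timeval tv = { 0, 1 };
        return select(0, NULL, NULL, NULL, &tv);
    }
    }
    return -1;  /* unknown method */
}
```

Timing the oversubscribed bcast2 run once per method would show whether the slowdown is specific to sched_yield() or common to all three.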

Changed 4 months ago by thakur

  • status changed from closed to reopened
  • resolution wontfix deleted

With the default build, the coll tests run quickly. That --enable-fast makes the oversubscribed case so slow is something to look into, but maybe not for this release. Reopening.

Changed 4 months ago by balaji

  • milestone changed from mpich2-1.2.1 to mpich2-1.3

Changed 4 months ago by buntinas

Yes, we should look into select() or sleep() as alternatives.

Hmm. I wonder if --enable-fast disables yield...
