Ticket #926 (reopened bug)

Opened 2 weeks ago

Last modified 2 weeks ago

enable-fast

Reported by: balaji
Owned by: goodell
Priority: major
Milestone: mpich2-1.3
Component: mpich2
Keywords:
Cc:

Description

Bcast seems to be hanging with enable-fast in the new nightly tests: http://www.mcs.anl.gov/research/projects/mpich2/nightly/new/latest


Change History

Changed 2 weeks ago by thakur

  • owner set to goodell
  • status changed from new to assigned

I can reproduce this by hand. Built with --enable-fast on thrash, bcast2 runs fine with up to 8 processes. On 10 processes, which triggers the long-message algorithm, it hangs.

Changed 2 weeks ago by thakur

No, it works. It just takes a while.

thrash:/sandbox/thakur/tmp/test/mpi/coll% date; mpiexec -n 10 bcast2; date
Fri Nov  6 16:20:28 CST 2009
 No Errors
Fri Nov  6 16:24:10 CST 2009

Changed 2 weeks ago by thakur

I ran all the tests in the coll directory. They all completed.

Changed 2 weeks ago by goodell

So this is just the usual nemesis over-subscription issue being aggravated by tighter loops from --enable-fast, right? Is there anything that actually needs to be done here in the short term?

Changed 2 weeks ago by thakur

  • status changed from assigned to closed
  • resolution set to wontfix

Probably not. Resolving it for now.

Changed 2 weeks ago by balaji

Is the problem with the sched_yield() call alone, or with other CPU-yielding calls as well (e.g., usleep(0) or select())? If it is specific to sched_yield(), shouldn't we just give the other routines higher priority and pick sched_yield() only as a last resort?
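To compare the yielding calls mentioned above, a minimal sketch of a polling loop with a pluggable yield primitive might look like the following. The names (yield_method_t, yield_cpu, poll_until_ready, countdown_ready) are hypothetical and are not MPICH's actual progress-engine API. Whether each call really gives up the CPU is OS- and scheduler-dependent, which is the open question here.

```c
#define _DEFAULT_SOURCE          /* expose usleep() in glibc */
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical yield strategies; not MPICH's actual API. */
typedef enum { YIELD_SCHED, YIELD_USLEEP0, YIELD_SELECT } yield_method_t;

/* Returns nonzero once the awaited event (e.g. an incoming message) is there. */
typedef int (*ready_fn)(void *arg);

static void yield_cpu(yield_method_t m)
{
    switch (m) {
    case YIELD_SCHED:
        sched_yield();                  /* scheduler hint; may return at once */
        break;
    case YIELD_USLEEP0:
        usleep(0);                      /* zero-length sleep; effect varies by OS */
        break;
    case YIELD_SELECT: {
        /* no file descriptors, 1 us timeout: used purely as a short sleep */
        struct timeval tv = { .tv_sec = 0, .tv_usec = 1 };
        select(0, NULL, NULL, NULL, &tv);
        break;
    }
    }
}

/* Busy-poll ready() and yield between unsuccessful polls.
   Returns the number of unsuccessful polls. */
static long poll_until_ready(ready_fn ready, void *arg, yield_method_t m)
{
    long polls = 0;
    assert(ready != NULL);
    while (!ready(arg)) {
        polls++;
        yield_cpu(m);
    }
    return polls;
}

/* Demo helper: reports ready after *arg unsuccessful polls. */
static int countdown_ready(void *arg)
{
    int *left = arg;
    return (*left)-- <= 0;
}
```

Because the yield call is isolated in one function, each primitive can be swapped in and timed separately on an oversubscribed run.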

Changed 2 weeks ago by thakur

  • status changed from closed to reopened
  • resolution wontfix deleted

With the default build, the coll tests go through really fast. The fact that --enable-fast makes the oversubscription case so slow is something to look into, but maybe not for this release. Reopening.

Changed 2 weeks ago by balaji

  • milestone changed from mpich2-1.2.1 to mpich2-1.3

Changed 2 weeks ago by buntinas

Yes, we should look into select() or sleep() as alternatives.
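One way to combine these alternatives is an escalating backoff: poll with sched_yield() for a while, then fall back to select() with a growing timeout so that an oversubscribed core is actually released. This is a hypothetical sketch; YIELD_POLLS, MAX_SLEEP_US and the function names are assumptions, not MPICH settings or APIs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Assumed tuning knobs, not MPICH parameters. */
#define YIELD_POLLS   16L    /* polls that only call sched_yield() */
#define MAX_SLEEP_US  1000L  /* cap on a single select() sleep, in microseconds */

typedef int (*ready_fn)(void *arg);

/* Sleep for roughly usec microseconds using select() with no descriptors. */
static void select_sleep(long usec)
{
    struct timeval tv;
    tv.tv_sec = usec / 1000000;
    tv.tv_usec = usec % 1000000;
    select(0, NULL, NULL, NULL, &tv);
}

/* Poll ready(): yield first, then back off with doubling select() sleeps.
   Returns the number of unsuccessful polls; *slept_us receives the total
   requested sleep time. */
static long poll_with_backoff(ready_fn ready, void *arg, long *slept_us)
{
    long polls = 0, sleep_us = 1;
    assert(ready != NULL && slept_us != NULL);
    *slept_us = 0;
    while (!ready(arg)) {
        if (polls < YIELD_POLLS) {
            sched_yield();
        } else {
            select_sleep(sleep_us);
            *slept_us += sleep_us;
            /* double the next sleep, but never beyond the cap */
            sleep_us = (sleep_us * 2 > MAX_SLEEP_US) ? MAX_SLEEP_US : sleep_us * 2;
        }
        polls++;
    }
    return polls;
}

/* Demo helper: reports ready after *arg unsuccessful polls. */
static int countdown_ready(void *arg)
{
    int *left = arg;
    return (*left)-- <= 0;
}
```

The trade-off is latency: once the loop is sleeping, an event that arrives mid-sleep can be noticed up to MAX_SLEEP_US microseconds late, which would hurt the non-oversubscribed case that currently runs fast.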

Hmm. I wonder if --enable-fast disables yield...
