Ticket #27 (new enhancement)
Failed CN can hang the whole pset
Reported by: | iskra | Owned by: | iskra |
---|---|---|---|
Priority: | minor | Milestone: | V1R4 Release |
Component: | ZeptoOS | Version: | |
Keywords: | | Cc: | |
Description
If a compute node (CN) hangs for whatever reason and the ION keeps sending it packets over the tree, the tree network quickly locks up. Packets addressed to the hung node back up into the send FIFO on the ION, where they block all packets destined for the remaining, operational compute nodes.
It is apparently possible to reconfigure the tree interface into loopback mode, read back the blocking packets, and so unlock the tree on the ION. We should ask IBM for the details and implement this.
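To make the failure mode concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: a single bounded, strictly in-order send FIFO shared by all destinations. The class `ToySendFifo` and its methods are hypothetical names invented for illustration, and `drain_blocked()` is merely a stand-in for the loopback trick, whose real procedure is still unknown to us. Nothing here reflects the actual tree hardware interface.

```python
from collections import deque


class ToySendFifo:
    """Toy model (hypothetical) of one bounded, in-order send FIFO on the ION."""

    def __init__(self, capacity, hung_nodes):
        self.capacity = capacity
        self.hung = set(hung_nodes)   # compute nodes that no longer accept packets
        self.queue = deque()          # packets still in the FIFO: (dest, payload)
        self.delivered = []           # packets accepted by their destination

    def _pump(self):
        # Strictly in order: only the head packet may leave the FIFO,
        # and only if its destination is still responsive.
        while self.queue and self.queue[0][0] not in self.hung:
            self.delivered.append(self.queue.popleft())

    def send(self, dest, payload):
        """Enqueue a packet; return False if the FIFO is full."""
        self._pump()
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append((dest, payload))
        self._pump()
        return True

    def drain_blocked(self):
        """Assumed stand-in for the loopback trick: empty the FIFO,
        discarding packets for hung nodes and delivering the rest."""
        dropped = []
        while self.queue:
            packet = self.queue.popleft()
            if packet[0] in self.hung:
                dropped.append(packet)
            else:
                self.delivered.append(packet)
        return dropped


if __name__ == "__main__":
    fifo = ToySendFifo(capacity=4, hung_nodes={3})
    fifo.send(1, "a")                 # delivered immediately
    fifo.send(3, "b")                 # stuck at the head: node 3 is hung
    for dest, payload in [(2, "c"), (3, "d"), (2, "e")]:
        fifo.send(dest, payload)      # queued behind the stuck packet
    print("send to healthy node 1 accepted:", fifo.send(1, "f"))
    print("dropped by drain:", fifo.drain_blocked())
    print("send after drain accepted:", fifo.send(1, "f"))
```

In this model a single packet for hung node 3 at the head of the FIFO is enough to stall traffic to healthy nodes 1 and 2 once the FIFO fills. The drain also suggests that the ION should stop queuing packets for a node it knows to be hung, since otherwise the FIFO would simply block again.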
Change History