TY - GEN
T1 - Software-based fault-tolerant routing algorithm in multi-dimensional networks
AU - Safaei, F.
AU - Rezazad, M.
AU - Khonsari, A.
AU - Fathy, M.
AU - Ould-Khaoua, M.
AU - Alzeidi, N.
PY - 2006
Y1 - 2006
N2 - Massively parallel computing systems are being built with hundreds or thousands of components such as nodes, links, memories, and connectors. The failure of a component in such systems will not only reduce the computational power but also alter the network's topology. The Software-Based fault-tolerant routing algorithm is a popular routing scheme for achieving fault tolerance in networks. This algorithm was initially proposed only for two-dimensional networks [1]. Since higher-dimensional networks have been widely employed in many contemporary massively parallel systems, this paper proposes an approach to extend this routing scheme to these indispensable higher-dimensional networks. Deadlock and livelock freedom and the performance of the presented algorithm have been investigated for networks with different dimensionality and various fault regions. Furthermore, performance results have been presented through simulation experiments.
AB - Massively parallel computing systems are being built with hundreds or thousands of components such as nodes, links, memories, and connectors. The failure of a component in such systems will not only reduce the computational power but also alter the network's topology. The Software-Based fault-tolerant routing algorithm is a popular routing scheme for achieving fault tolerance in networks. This algorithm was initially proposed only for two-dimensional networks [1]. Since higher-dimensional networks have been widely employed in many contemporary massively parallel systems, this paper proposes an approach to extend this routing scheme to these indispensable higher-dimensional networks. Deadlock and livelock freedom and the performance of the presented algorithm have been investigated for networks with different dimensionality and various fault regions. Furthermore, performance results have been presented through simulation experiments.
UR - http://www.scopus.com/inward/record.url?scp=33847119971&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33847119971&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2006.1639644
DO - 10.1109/IPDPS.2006.1639644
M3 - Conference contribution
AN - SCOPUS:33847119971
SN - 1424400546
SN - 9781424400546
T3 - 20th International Parallel and Distributed Processing Symposium, IPDPS 2006
BT - 20th International Parallel and Distributed Processing Symposium, IPDPS 2006
PB - IEEE Computer Society
T2 - 20th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2006
Y2 - 25 April 2006 through 29 April 2006
ER -