The challenge of Fault-Tolerance mechanisms for power-efficient massive parallel systems

Speaker: Prof. Avi Mendelson – CS/EE Technion Haifa Israel

It is expected that hundreds or even thousands of cores will soon be integrated on a single chip. Large-scale systems, such as cloud computers or HPC systems, will therefore be built out of millions of cores, sophisticated communication and vast memory subsystems. In such systems, faults cannot be avoided, so the system must be built to handle them as efficiently as possible.

The talk will present different hardware-based and software-based mechanisms that have been proposed to handle faults in massively parallel systems. It will then focus on new “software-hardware” interfaces we developed as part of the “Teraflux” project, sponsored by the EU under the FP7 research framework.

Short Biography

Avi Mendelson is a professor in the CS and EE departments at the Technion, Israel, and a member of the TCE (Technion Computer Engineering center). He earned his BSc and MSc degrees from the CS department at the Technion, and his PhD from the University of Massachusetts at Amherst (UMass).

Prof. Avi Mendelson has a blend of industrial and academic experience. As part of his industrial role, he spent 11 years at Intel, where he served as a senior researcher and Principal Engineer in the Mobile Computer Architecture Group in Haifa. While at Intel, he was the chief architect of the CMP (multi-core-on-chip) feature of the first dual-core processors Intel developed.

His research interests span areas such as computer architecture, operating systems, power management, reliability, fault tolerance, cloud computing, HPC and GPGPU.

Recently, he served as the general manager of the ISCA-40 conference, which was held in Tel Aviv.

Co-sponsored by

RISC Project

Organized by

SGI Sherm Intel Software