Fault-Tolerant Linux

FT-Linux pioneers full-software-stack fault tolerance on a single SMP machine, based on the Linux software ecosystem. Through a proof of concept, it demonstrates that monolithic operating systems such as Linux can be made fault-resilient.

FT-Linux enables Linux to tolerate transient and permanent hardware failures. It is built on top of Popcorn Linux, the replicated-kernel version of Linux, and aims to provide fault tolerance along the whole software stack, from the OS kernel to applications. FT-Linux targets SMP machines that contain multiple core/processor domains and multiple memory controllers (and eventually I/O controllers), on the presumption that each of them may fail independently. Two main categories of faults are considered: a) permanent failures, such as fail-stop errors, and b) transient errors, such as bit flips in main memory.
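To make the two fault categories concrete, here is a minimal sketch in Python. It is an illustration only and is not taken from the FT-Linux source code: the `Replica` class, `flip_bit`, and `classify` are hypothetical names, and comparing two copies of a value is a deliberately simplified stand-in for whatever detection FT-Linux actually performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    """Hypothetical stand-in for one hardware domain holding a copy of some state."""
    name: str
    value: int
    alive: bool = True

    def read(self) -> Optional[int]:
        # A fail-stopped replica produces no answer at all.
        return self.value if self.alive else None


def flip_bit(word: int, bit: int) -> int:
    """Model a transient memory error by inverting one bit of a stored word."""
    return word ^ (1 << bit)


def classify(primary: Replica, secondary: Replica) -> str:
    """Compare two replicas and report which kind of symptom, if any, is visible."""
    a, b = primary.read(), secondary.read()
    if a is None or b is None:
        return "fail-stop"   # one side stopped responding (permanent failure)
    if a != b:
        return "mismatch"    # copies diverged, consistent with a bit flip
    return "ok"


if __name__ == "__main__":
    p = Replica("primary", 0b1010)
    s = Replica("secondary", 0b1010)
    print(classify(p, s))             # both copies agree
    s.value = flip_bit(s.value, 2)    # inject a transient error
    print(classify(p, s))
    s.alive = False                   # inject a permanent failure
    print(classify(p, s))
```

Note that with only two copies, a mismatch shows that an error occurred but not which copy is wrong; deciding that would need additional information such as a third copy or a checksum.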

 

You can find more information about FT-Linux in the following papers and Virginia Tech thesis: 

Source code and documentation are available online on GitHub.

 
Contact:

Binoy Ravindran, Virginia Tech


This is an open-source project of the Systems Software Research Group at Virginia Tech.

This work is supported in part by AFOSR (grants FA9550-14-1-0163 and FA9550-16-1-0371) and ONR (grants N00014-13-1-0317 and N00014-16-1-2711). Any opinions, findings, and conclusions or recommendations expressed on this site are those of the author(s) and do not necessarily reflect the views of AFOSR or ONR.