Algorithms designed to keep a distributed system functioning even when one or more of its nodes fail.
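One common technique of this kind is majority-quorum replication. Every write is sent to all replicas and counts as committed once a majority acknowledge it, so the system keeps accepting reads and writes as long as fewer than half of the nodes are down. The Python sketch below illustrates the idea under stated assumptions: the `Node` and `QuorumCluster` classes are hypothetical, the cluster is simulated in memory, and a single coordinator assigns version numbers. It deliberately omits rollback of partial writes, coordinator failure, and network partitions, so treat it as an illustration rather than a complete protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory replica; `up=False` simulates a crashed node."""
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)

    def write(self, key, value, version):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        # Keep only the newest version of each key.
        current = self.store.get(key)
        if current is None or current[0] < version:
            self.store[key] = (version, value)

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.store.get(key)  # (version, value) or None


class QuorumCluster:
    """Hypothetical coordinator: a write or read succeeds if a majority respond."""

    def __init__(self, nodes):
        self.nodes = nodes
        self.quorum = len(nodes) // 2 + 1  # strict majority
        self.version = 0  # single coordinator assigns increasing versions

    def put(self, key, value):
        self.version += 1
        acks = 0
        for node in self.nodes:
            try:
                node.write(key, value, self.version)
                acks += 1
            except ConnectionError:
                pass  # tolerate the failed node and keep going
        # Simplification: nodes that did accept a failed write are not rolled back.
        if acks < self.quorum:
            raise RuntimeError(f"write failed: {acks}/{len(self.nodes)} acks")
        return acks

    def get(self, key):
        replies = []
        for node in self.nodes:
            try:
                replies.append(node.read(key))
            except ConnectionError:
                continue
        if len(replies) < self.quorum:
            raise RuntimeError("read failed: no quorum")
        found = [r for r in replies if r is not None]
        if not found:
            return None
        # Any two majorities overlap, so the newest committed version is among the replies.
        return max(found, key=lambda r: r[0])[1]


if __name__ == "__main__":
    nodes = [Node("a"), Node("b"), Node("c")]
    cluster = QuorumCluster(nodes)
    cluster.put("x", 1)
    nodes[0].up = False       # one of three nodes fails
    cluster.put("x", 2)       # still commits: 2 of 3 acknowledge
    print(cluster.get("x"))   # prints 2
```

With three nodes the quorum is two, so the cluster survives one failure; with five nodes it survives two. Losing a majority makes both `put` and `get` raise instead of returning possibly stale data, trading availability for consistency.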