In this paper general simulations of algorithms designed
for fully operational BSP machines on BSP
machines with fault y or unavailable processors, are
deveioped. The fail-stop model is considered for the
fault occurrences, that is, if a processor fails or becomes
unavailable, it remains so until the end of the
computation. The faults are random, in the sense that
a processor may fail independently with probability at
most a, where a is a constant.
Two possible settings for fault occurrences are
considered: the static case where the faults are
static (the faulty or unavailable processors are already
known at the beginning of the computation), and the
dynamic case where the processors may become faulty
or unavailable during the computation. In the case of
static faults, a simulation of an n-processor fault-free
BSP machine on a faulty n-processor BSP machine
is presented with constant slowdown per local computation
step and O(log n . max{ L, g }) slowdown per
communication step, given that a preprocessing has
been done that needs O(n/ log n max{L, g}) time (L
and g are the parameters of the simulating BSP machine).
In the case of dynamic faults, a simulation of an
n-processor fault-free BSP machine on an cn log nprocessor
faulty BSP machine is presented. No dynamic
faults may occur during certain periods of
the simulation. The simulations are randomized and
Monte Carlo: they are guaranteed to be correct with high probability, and the time bounds always hold. To
our knowledge, no previous work on the fault tolerance
of the BSP model exists.