In this paper we present an efficient general simulation strategy
for computations designed for fully operational BSP machines
of n ideal processors, on n-processor dynamic-fauhprone
BSP machines. The fault occurrences are fail-stop
and fully dynamic, i.e., they are ahowed to happen on-line
at any point of the computation, subject to the constraint
that the total number of faulty processors may never exceed
a known fraction. The computational paradigm can be exploited
for robust computations over virtual parallel settings
with volatile underlying infrastructure, such as a Network of
Workstations (where workstations may be taken out of the
virtual parallel machine by their owner).
Our simulation strategy is Las Vegas (i.e. it may never
fail, due to a BACKTRACKING process to robustly stored instances
of the computation). It adopts an adaptive balancing
scheme of the work load among the currently live processors
of the BSP machine. Moreover, the storage schemes
adopted in this work achieve space optimality, which is very
crucial in the BSP cost model, since space overhead is interpreted
into communication overhead, when a fraction of
the work load has to migrate to a currently live processor.
Our strategy is efficient in the sense that, compared
to an optimal off-line adversarial computation under
the same sequence of fault occurrences, it achieves an
0 ((log n . loglog n)“) multiplicative factor times the optimal
work (namely, this measure is in the sense of “competitive
ratio” of on-line analysis). In addition, our scheme
is modular, integrated, and considers many implementation
points. We comment that, up to our knowledge, no previous
work on robust parallel computations has considered fully
dynamic faults in the BSP model, or in general distributed
memory systems. Furthermore, this is the first time where
an efficient Las Vegas simulation in this area is achieved.