Time-Adaptive Self Stabilization
We study the scenario where a transient fault hit f of the n
nodes of a distributed system by corrupting their state. We consider the
basic problem of persistent bit, where the system is required to
maintain a value in the face of transient failures by means of replication.
We give an algorithm to recover the value quickly: the value of the bit
is recovered at all nodes in O(f) time units for any unknown value
of f<n/2. Moreover, complete state quiescence occurs in O(Diam)
time units, where Diam denotes the diameter of the network. This
means that the value persists indefinitely so long as any f<n/2
faults are followed by Omega(Diam) fault-free time units. We prove
lower bounds which show that both time bounds are asymptotically optimal.
Using the algorithm for persistent bit, we present a general transformer
which takes a distributed non-reactive, non-stabilizing protocol P,
and produces a self-stabilizing protocol P' which solves the problem
P solves, with the additional property that if the number of faults
that hit the system after stabilization is f, for any unknown f<n/2,
then the output of P' regains stability in O(f) time units,
and the state stabilizes in O(Diam) time units.
Click here
for proceedings version, and here
for the full version.