Groupware systems allow
physically dispersed teams to collaborate on common tasks across
distance and/or time. In a real-time groupware system, all users are
required to be present at their respective sites at the same time,
whereas a non-real-time groupware system allows users to work on
common tasks at different times. Real-time collaborative editing
systems, which enable groups of geographically distributed users to
view and edit shared documents simultaneously, make groupware
applications more practical. This is even more pronounced if users can
use real-time collaborative editing systems over the Internet.
In
real-time collaborative editing systems, the main issues are good
responsiveness, support for unconstrained collaboration, and tolerance
of failed processes. Hence, if a real-time collaborative editor is to
be used effectively over the Internet, the system should tolerate
client and link failures, because the quality of Internet connections
is unpredictable. There are two main approaches to improving fault
tolerance: replication and persistence. In replication-based
solutions, components are replicated, and the system ensures that all
replicas process the same messages in the same order; if any one of
them fails, the others are still able to continue. Persistence-based
solutions rely on checkpointing, which periodically saves the states
of components so that they can be recovered after a failure.
Checkpoint recovery may be preferable for small problems if local
disks are available, but wide-area replication outperforms checkpoint
recovery for larger-grain problems.
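To make the persistence-based approach concrete, the following is a minimal sketch of checkpointing for a single component. It is an illustration only, not the system described in this work: the class name CheckpointedDocument, the JSON file format, and the policy of checkpointing every few operations are all assumptions made for the example.

```python
import json
import os
import tempfile


class CheckpointedDocument:
    """Toy editable document that checkpoints its state every few operations."""

    def __init__(self, path, interval=3):
        self.path = path          # hypothetical checkpoint file location
        self.interval = interval  # operations between checkpoints (assumed policy)
        self.text = ""
        self.applied = 0          # number of operations applied so far

    def apply_insert(self, pos, s):
        # Apply one insert operation, then checkpoint if the interval is reached.
        self.text = self.text[:pos] + s + self.text[pos:]
        self.applied += 1
        if self.applied % self.interval == 0:
            self.checkpoint()

    def checkpoint(self):
        # Write to a temporary file first, then rename it into place, so a
        # crash during the write cannot leave a half-written checkpoint.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump({"text": self.text, "applied": self.applied}, f)
        os.replace(tmp, self.path)

    @classmethod
    def recover(cls, path, interval=3):
        # Rebuild the component from the last checkpoint, if one exists.
        doc = cls(path, interval)
        if os.path.exists(path):
            with open(path) as f:
                saved = json.load(f)
            doc.text = saved["text"]
            doc.applied = saved["applied"]
        return doc
```

In this sketch, operations applied after the most recent checkpoint are lost on a crash. That gap is the cost of saving state only periodically, whereas a replica that has processed the same messages in the same order holds the up-to-date state.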
In
a real-time collaborative editing system, clients should be able to
rejoin the system after client or link failures. One basic requirement
is that the existing users can continue their work while a crashed
client rejoins the group. Thus, the group’s current state should be
transferred to the rejoining client even in the presence of failures.
From the client’s perspective, it should be able to rejoin the
collaborative editing system without starting from the very beginning.
Starting from scratch typically results in a substantial delay, which
is unnecessary if an efficient approach is applied.
In this research, we devise a
new, efficient approach to support crash recovery in real-time
collaborative editing systems. To provide fault tolerance for the
server, we have developed a primary-backup server scheme that
tolerates a single server failure. If the primary server crashes, the
backup server automatically continues to serve the clients without a
server restart. This research also introduces the notion of local
final state, which records the final state of a client. To protect
clients against crashes, the local final state is stored on permanent
storage at each client site. If a client becomes disconnected because
of a server crash or a link failure, the client is able to rejoin the
collaborative editing system by loading its local final state. To
synchronize the client with the current state of the whole system, the
server also resends some operations according to this local final
state. Determining which operations the server should resend is the
main issue addressed in our research.
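The rejoin procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the method developed in this research: it assumes that the server keeps a totally ordered log with sequence numbers, that the local final state consists of the document text plus the sequence number of the last executed operation, and that the state is saved after every operation. The names Server, Client, operations_after, and last_seq are hypothetical.

```python
import json
import os


class Server:
    """Toy server keeping a totally ordered log of broadcast operations."""

    def __init__(self):
        self.log = []  # log[i] holds the operation with sequence number i + 1

    def broadcast(self, op):
        self.log.append(op)
        return len(self.log)  # sequence number assigned to op

    def operations_after(self, last_seq):
        # Operations that a rejoining client has not yet executed.
        return list(enumerate(self.log[last_seq:], start=last_seq + 1))


class Client:
    def __init__(self, state_path):
        self.state_path = state_path  # hypothetical file on permanent storage
        self.text = ""
        self.last_seq = 0             # sequence number of last executed operation

    def execute(self, seq, op):
        pos, s = op                   # op is (position, inserted string)
        self.text = self.text[:pos] + s + self.text[pos:]
        self.last_seq = seq
        self.save_local_final_state()

    def save_local_final_state(self):
        # Write to a temporary file, then rename, to avoid partial writes.
        tmp = self.state_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"text": self.text, "last_seq": self.last_seq}, f)
        os.replace(tmp, self.state_path)

    @classmethod
    def rejoin(cls, state_path, server):
        # Load the local final state, then execute what the server resends.
        client = cls(state_path)
        if os.path.exists(state_path):
            with open(state_path) as f:
                saved = json.load(f)
            client.text, client.last_seq = saved["text"], saved["last_seq"]
        for seq, op in server.operations_after(client.last_seq):
            client.execute(seq, op)
        return client
```

With a single global sequence number, the set of operations to resend is simply everything after last_seq. In a real collaborative editor, where clients also generate concurrent local operations, choosing the operations to resend is harder, which is why it is treated as the main issue here.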