Real-time Scheduling in Heterogeneous Distributed Systems
Research is supported in part by NSF under Grant EPS-0091900, the University of Nebraska under Grant 26-0511-0019, and the New Mexico Institute of Mining and Technology under Grant 103295.
Reliability Cost-Driven (RCD) algorithm
Distributed systems are increasingly being applied for critical real-time applications, in which each task must be guaranteed a priori to meet its timing constraint. Therefore, efficient scheduling algorithms and schedulability analysis are needed. Many real-time scheduling algorithms, in which schedulability is a main objective function to be maximized, can be found in the literature. However, most of these real-time scheduling algorithms have an assumption that no error occurs in real-time systems. In order to make real-time scheduling algorithms more practical, reliability in addition to task precedence constraints need to be taken into account.
In general, distributed heterogeneous computing involves multiple heterogeneous modules that interact with one another to solve a problem. In a heterogeneous computing system, applications have subtasks that have diverse execution requirements. The subtasks must be assigned to machines (processors) and ordered for execution such that the overall application execution time is minimized. Extensive research has been done on scheduling algorithms for heterogeneous systems. Unfortunately, many scheduling algorithms for heterogeneous systems are non-real-time, which are not suitable for scheduling the tasks with timing constraints. The goal of our research is to study real-time scheduling algorithms for heterogeneous distributed systems.
Most recently, we have proposed two real-time fault-tolerant scheduling algorithms, named RTFTNO and RTFTRC, for periodic real-time tasks in heterogeneous distributed systems. A reliability measure, called reliability cost, is used as one of the objective functions for task scheduling. RTFTRC algorithm tries to minimize the reliability cost, while RTFTNO does not consider such metric. The results of the performance evaluation for these two algorithms show that RTFTRC has better performance than RTFTNO.
Meanwhile, we also have devised a reliability cost-driven (RCD) algorithm, which could schedule tasks with precedence constraints in a heterogeneous distributed system.More specifically, we consider the off-line scheduling of tasks with precedence constraints, represented by directed acyclic graphs (DAG), on a distributed heterogeneous system where processors operate at different speeds and inter-processor links have different bandwidths. Further, imulation results show that the reliability-driven scheduling algorithm significantly outperforms two previously proposed algorithms that are not reliability driven.
Real-time and Fault-tolerant Scheduling in Distributed Systems
In real-time and distributed systems, tasks must be guaranteed a priori to meet its timing constraint. Mission critical real-time tasks must tolerate faults. In other words, mission critical tasks must be completed before their deadlines even the real-time system encounter temporary and permanent faults. Scheduling multiple copies of a task on different processors is a promising approach to support fault-tolerance. If the primary copy of a task cannot be completed due to a fault, its backup copy executes and complete task before its deadline. We propose a series of new algorithms for fault-tolerant scheduling on a real-time distributed system. The algorithms guarantee the completion of a scheduled task before its deadline in the presence of a processor's permanent failures.
Firstly, we attempted to investigate two real-time fault-tolerant scheduling algorithms for independent tasks, which have no precedence constraints. The description of the algorithms and the experiment results can be found in Journal of Software, No5. 2000. The novel idea associate with these two algorithms is that real-time tasks without fault-tolerant requirement are also considered. This is because complex real-time systems consist of both fault-tolerant task and non-fault-tolerant tasks.
Heterogeneous systems involve multiple heterogeneous modules that interact with one another via high-speed network, which also can be heterogeneous. For each real-time task, the execution time on different computing modules are various. A fault-tolerant real-time scheduling is studied to provide a fault-tolerant support in the heterogeneous real-time systems. More information about this algorithm, including its description and simulation results, can be found in the Proceedings of 2000 International Conference on Parallel and Distributed Processing Techniques and Applications.
Above three algorithms are offline scheduling algorithms, which cannot directly be applied in dynamic real-time computing environment. Therefore, a heuristic dynamic scheduling scheme for parallel real-time jobs in a heterogeneous system is presented. The parallel real-time jobs studied in this paper are modelled by directed acyclic graphs (DAG). In the scheduling model, parallel real-time jobs arrive at a heterogeneous system following a Poisson process. The scheduling algorithms developed in this paper take the reliability measure into account, in order to enhance the reliability of the heterogeneous system without any additional hardware cost. In addition, scheduling time and dispatch time are both incorporated into our scheduling scheme so as to make the scheduling result more realistic and precise. Admission control is in place so that a parallel real-time job whose deadline cannot be guaranteed is rejected by the system. The performance of the proposed scheme is evaluated via extensive simulations. The simulation results show that the heuristic algorithm performs significantly better than two other algorithms that do not consider reliability cost. Furthermore, results suggest that shortening the scheduling time results in a higher guarantee ratio. Hence, if parallel scheduling algorithm is devised and employed to shorten the scheduling time, the performance of the heterogeneous system will be further enhanced. The detailed discussion about this algorithm is given in the proceedings of the 30th International Conference on Parallel Processing (ICPP 2001).
Current scheduling work includes
(1) Present static fault-tolerant algorithms that schedule real-time tasks with different deadlines.
(2) Study dynamic scheduling algorithms with fault-tolerant for aperiodic real-time tasks.
(3) Based on RTFTRC, study an efficient algorithm, which releases some assumptions of the the scheduling model.
(4) Consider more DAG types to represent more real applications.
14. An Availability-Aware Task Scheduling Strategy for Heterogeneous Systems.
X. Qin and T. Xie, IEEE Transactions on Computers. Submitted March 2006; revised Aug. 2006 and Dec. 2006; accepted April 2007.
13. TERCOS: A Novel Approach to Exploiting Redundancies in Fault-Tolerant and Real-Time Distributed Systems.
W. Luo, F.-M. Yang, L.-P. Pang, G. Tu, and X. Qin, Proc. 13th IEEE International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Daegu, Korea, Aug. 2007.
X. Qin and H. Jiang, Parallel Computing, vol. 32, no. 5-6, pp. 331-356, June 2006.
X. Qin and H. Jiang, Journal of Parallel and Distributed Computing, vol. 65, no. 8, pp. 885-900, August 2005.
X. Qin, W. Luo, and K. Bellam, Proc. the 4th IEEE International Symposium on Embedded Computing, Ontario, Canada, May 2007.
T. Xie and X. Qin, Proc. IEEE 8th International Conference on Cluster Computing (Cluster'06), Sept. 2006.
8. Fault-tolerant Scheduling for Periodic Tasks in Heterogeneous Systems. [ Abstract ]
W. Luo, F.-M. Yang, L.-P. Pang, and X. Qin, Proc. 3rd Int'l Conf. Autonomic and Trusted Computing (ATC), Sept. 2006.
X. Qin, H. Jiang, D. R. Swanson, Proc. 31st International Conference on Parallel Processing (ICPP), pp.360-368, Aug. 2002.
X. Qin and H. Jiang, Proc. 30th International Conference on Parallel Processing (ICPP), pp.113-122, Sept. 2001.
X. Qin, H Jiang, C. Xie, and Z. Han, Proc. International Conference Parallel and Distributed Computing Systems 2000, pp.617-623, Nov. 2000.
4. Real-time Scheduling for Dependable Multimedia Tasks in Multiprocessor Systems.
X. Qin, L. Pang, Z. Han, and S. Li, Proc. IEEE TENCON 2000, Sept. 2000.
X. Qin, Z. Han, H. Jin, L. Pang, and S. Li, Proc. Int'l Workshop Cluster Computing-Tech, Environments, and Applications, pp.421-427, June 2000.
2. An Efficient Fault-tolerant Scheduling Algorithm for Real-time Tasks with Precedence Constraints in Heterogeneous Systems.
X. Qin and H. Jiang, Technical Report TR-UNL-CSE 2002-0501, Department of Computer Sci. and Eng., Univ. of Nebraska-Lincoln, Feb. 2002.
1. A Fault-tolerant Real-time Scheduling Algorithm for Precedence-Constrained Tasks in Distributed Heterogeneous Systems.
X. Qin and H. Jiang, Technical Report TR-UNL-CSE 2001-1003, Department of Computer Sci. and Eng., Univ. of Nebraska-Lincoln, September 2001.
Updated on 5/18/2007