Scheduling Hard Real-Time Tasks with 1-Processor-Fault-Tolerance

Yingfeng Oh and Sang H. Son

Technical Report No. CS-93-27

May 24, 1993

Scheduling Hard Real-Time Tasks with Tolerance of Multiple Processor Failures

Yingfeng Oh and Sang H. Son¹

Department of Computer Science

University of Virginia, Charlottesville, VA 22903

Abstract

Real-time systems are being extensively used in applications that are mission-critical and life-critical, such as space exploration, aircraft avionics, and robotics. Since these systems usually operate in environments that are non-deterministic, and even hazardous, it is extremely important that hard deadlines of tasks be met even in the presence of certain failures. To tolerate processor failures in a real-time multiprocessor system, the problem of scheduling a set of hard real-time tasks with duplication is studied. We first prove that the problem of scheduling a set of non-preemptive tasks on m ≥ 3 processors to tolerate one arbitrary processor failure is NP-complete even when the tasks share a common deadline. A heuristic algorithm is then proposed to solve the problem. The schedule generated by the scheduling algorithm can tolerate, in the worst case, one arbitrary processor failure, and in the best case ⌊m/2⌋ processor failures, where m is the number of processors in the system. Experimental data and analysis show that the performance of the algorithm is near-optimal. The research described in this paper is part of our on-going research effort to address the problem of supporting timeliness and dependability simultaneously in a system.

Keywords: real-time scheduling, parallel processing, fault-tolerance

1. This work was supported in part by ONR, by DOE, and by IBM.

I. Introduction

The support of computer systems is indispensable to many applications that are mission-critical and life-critical, such as space exploration, aircraft avionics, and robotics. These applications require not only long durations of reliable service, but also timeliness of operations. Computer systems built to support such applications include SIFT [28], FTMP [9], the space shuttle primary computer system [26], and MAFT [11]. These mission-critical systems are mainly parallel or distributed systems embedded in complex, even hazardous, environments under tight constraints on the timeliness and dependability of operations. A great deal of effort has been invested in making computer systems highly dependable and predictable; to cite a few examples, see [1] [10] [12] [17] [18] [21] [22] [24] [27].

Yet, conspicuously lacking in this scenario is a formal approach to supporting timeliness (real-time) and dependability (fault-tolerance) simultaneously in a system at the level of task scheduling. Traditional approaches to providing fault-tolerance and real-time guarantees in a system have separated the two concerns: the timeliness of tasks is ensured through real-time scheduling under the assumption that processors and tasks are fault-free, while the dependability of processors or tasks is achieved through redundancy techniques under the assumption that task deadlines can be met separately. These two assumptions have recently been challenged by several researchers [25], who argue that real-time and fault-tolerant requirements are not orthogonal. Consequently, some efforts [23] have been made to address the joint requirement of timeliness and dependability. However, the approaches adopted so far have either been ad hoc or limited to specific case studies. A formal approach that addresses the problem in a top-down or bottom-up manner is needed, because such an approach is essential to building timeliness and dependability into a single computer system.

Our approach, which we deem formal, is to formalize this real-time fault-tolerant problem at the level of task scheduling, and to study the various cases of this general scheduling problem systematically. In the formulation, tasks are characterized not only by timing constraints but also by degree of redundancy. The scheduling goal is twofold: timeliness and dependability. The general approach to studying scheduling problems is used, i.e., the complexity of the various cases of the scheduling problem is examined, and then optimal or heuristic algorithms are devised to solve them. By studying the general scheduling problem, we hope to be able to answer critical questions in designing highly responsive and resilient computer systems, such as the following: given the overall system requirements of dependability and timeliness, and the characteristics of a task set (possibly with duplicated tasks or different versions) and of the processors, can the task set be scheduled such that the overall system requirements are met? If so, how is it to be scheduled?

Since most cases of the general real-time scheduling problem are intractable, it is reasonable to expect that many cases of the general real-time fault-tolerant scheduling problem are also intractable. This is indeed the case, as shown by some of the results in this paper. This fact, however, neither makes the problem go away nor renders our approach ineffective; rather, it requires that heuristics be developed for the problem instances that are NP-complete. In this paper, we present a formal definition of the scheduling problem, followed by our major results on a special case of it. It is shown that the problem of scheduling a set of real-time tasks with a common deadline on m ≥ 3 processors for the tolerance of one arbitrary processor failure is NP-complete. Since NP-complete problems are widely believed to be computationally intractable, a heuristic algorithm is proposed to obtain an approximate solution. The schedule generated by the scheduling algorithm can tolerate, in the worst case, one arbitrary processor failure, and in the best case ⌊m/2⌋ processor failures, where m is the number of processors in the system. Simulation and analysis have been carried out to evaluate the performance of the algorithm, and it is shown that the algorithm finds the optimal solution in most cases.

The organization of the paper is as follows. Related work is described in Section II. The scheduling problem is defined, and a special case of it proven NP-complete, in Section III. The scheduling heuristic is presented in Section IV. The analysis of the performance of the algorithm and the simulation results are given in Section V. We conclude in Section VI with a look at future work.

II. Related Work

In this section, we focus our review on studies related to the real-time fault-tolerant scheduling problem. It is obvious from our survey that research efforts in this area have been quite limited, and it should be noted that the results reviewed below are only remotely related to the problem we are considering. Balaji et al. [2] presented an algorithm to dynamically distribute the workload of a failed processor to other operable processors. The tolerance of some processor failures is achieved under the condition that the task set is fixed and enough processing power is available to execute it. Bannister and Trivedi [3] considered the allocation of a set of periodic tasks, each of which has the same number of clones, onto a number of processors, so that a certain number of processor failures can be sustained. An approximation algorithm is proposed, and the ratio of the performance of the algorithm to that of the optimal solution, with respect to the balance of processor utilization, is shown to be bounded by 9m/(8(m - r + 1)), where m is the number of processors to be allocated and r is the number of clones for each task. Their allocation algorithm is based on the assumption that sufficient processors are available to accommodate the scheduling of tasks.

Krishna and Shin [13] proposed a dynamic programming algorithm that ensures that backup, or contingency, schedules can be efficiently embedded within the original, "primary" schedule, so that hard deadlines continue to be met even in the face of processor failures. They assume that a given algorithm P, which finds the optimal non-fault-tolerant schedule for the system, can be split into two subalgorithms: P1, which finds the optimal allocation of tasks to processors as well as the optimal schedule by calling P2, which is an optimal scheduler for a single processor. Each task in the task set has a cost function associated with it, and the scheduling goal is to minimize the total cost of the system. The fault-tolerant algorithm Q is derived from algorithm P such that, for each processor, Q takes as its inputs the set of primary clones and the set of backup clones, and produces as its output a fault-tolerant schedule that can sustain up to N_sust processor failures while minimizing the total cost of the system. However, the algorithm in [13] has a severe drawback: the problem of scheduling a set of independent, preemptive tasks with different release times, deadlines, and weight functions so as to minimize the total cost on a single processor is NP-hard [14]. This implies that an efficient algorithm P, which was assumed to exist and used as the base algorithm for Q, is unlikely to be found. Furthermore, no algorithm P has yet been found that can be split into two subalgorithms that are both optimal.

We have investigated several special cases of the real-time fault-tolerant scheduling problem, and two scheduling algorithms [19] [20] have been proposed to obtain approximate solutions to them. The complexity result presented in this paper is the first solid evidence that even a very simple case of the scheduling problem is intractable. The heuristic devised here is an improvement over the previous ones.

III. Problem Formulation and Complexity Result

We assume that processors fail in a fail-stop manner and that the failure of a processor can be detected by other processors. The means of processor monitoring, failure detection, and failure notification are not considered here. We further assume that all tasks have hard deadlines, which must be met even in the presence of processor failures. We say that a task meets its deadline if either its primary copy or its backup copy finishes before or at the deadline. Because processor failures are unpredictable and the task deadlines are hard, no optimal dynamic scheduling algorithm exists. We therefore focus on static scheduling algorithms that ensure task deadlines are met even in the presence of processor failures. The scheduling problem can be formally defined as follows.

A set of n tasks ℜ = {τ1, τ2, …, τn} is given to be scheduled on m processors. Each task is characterized by the tuple τi = (ri, ci, pi, di), where ri is the release time of task i, ci is its computation time, pi is its period, and di is its deadline. If pi is specified as a variable, the task system is termed an aperiodic task system; otherwise, it is a periodic task system. Associated with each task are a number of primary copies and a number of backup copies. A k-Timely-Fault-Tolerant (hereinafter k-TFT) schedule is defined as a schedule in which no task deadline is missed despite k arbitrary processor failures. Given a set ℜ of n tasks and m processors, the scheduling problem (hereinafter referred to as the TFT scheduling problem) can be defined, in terms of a decision problem, as deciding whether there exists a schedule for ℜ on m processors that is k-TFT. In reality, it is more likely that a task set ℜ is given and the scheduling goal is to find the minimum number of processors m such that a k-TFT schedule can be constructed for ℜ on m processors; this is the corresponding optimization problem. If a decision problem is NP-complete, then its corresponding optimization problem is at least as hard.

The TFT scheduling problem is a natural extension of the real-time scheduling problem. Figure 1 depicts the structure of the TFT problem. On the real-time dimension, the parameters are essentially the same as those used in real-time scheduling. On the fault-tolerance dimension, hardware and software redundancy, more specifically processor and task redundancy, are incorporated into the scheduling problem. The scheduling goal is represented by the k-TFT parameter, the meaning of which is given above.

[Figure 1: The TFT Scheduling Problem. The real-time dimension covers preemptive/non-preemptive execution, periodicity, and precedence constraints (independent / tree / …); the fault-tolerance dimension covers processor redundancy and task redundancy (duplicated / multi-version); the scheduling goal is k-TFT (processor / task).]
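To fix notation, here is a minimal sketch of the task model in Python; the representation and field names are ours, not the paper's:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Task:
    """A task tau_i = (r_i, c_i, p_i, d_i) from the set R."""
    name: str
    release: int                   # r_i, release time
    compute: int                   # c_i, computation time
    deadline: int                  # d_i, hard deadline
    period: Optional[int] = None   # p_i; None (a "variable" period) models an aperiodic task

# The special case studied below: independent tasks with a common release time and deadline.
tasks = [Task("tau1", 0, 10, 25), Task("tau2", 0, 8, 25)]
```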

In the following, a special case of the TFT scheduling problem is considered. The tasks are assumed to be independent and non-preemptive. Each task has a primary copy and a backup copy, and the scheduling goal is to achieve 1-TFT for processor failure, i.e., the tolerance of one arbitrary processor failure. We choose to study this case of the TFT problem because it is the simplest one; our strategy for tackling the TFT scheduling problem is to start with the simplest cases and work our way towards more complicated ones.

The task redundancy scheme specified in the above case corresponds to the primary-backup copy approach, or recovery block approach. The primary-backup copy approach requires multiple implementations of a specification [10]: the first implementation is called the primary copy, and the other implementations are called the backup copies. The primary copy and, if necessary, the backup copies execute in series. If the primary copy fails, one of the backup copies is switched in to perform the computation again, and this process is repeated until either correct results are produced or all the backup copies are exhausted. Here we consider a special case of the primary-backup copy approach in which each task has only one backup copy. The following lemmas guarantee that having one backup copy for each task is sufficient for the tolerance of one arbitrary processor failure. The proofs of these lemmas can be found in [20].

Lemma 1: In order to tolerate one or more processor failures and guarantee that the deadline of a task is met using the primary-backup copy approach, the computation time of the task must be less than or equal to half of the period of the task, assuming that the deadline coincides with the period.

Lemma 2: One arbitrary processor failure is tolerated and the deadlines of tasks are met if and only if the primary copy and the backup copy of each task are scheduled on two different processors and there is no overlapping in time between their executions.
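Lemmas 1 and 2 translate directly into a checkable predicate. The sketch below is ours: it tests whether a candidate assignment of primary and backup copies is 1-TFT under a common deadline D, requiring each backup to start no earlier than its primary's finish time, the form used in the formal problem statement below:

```python
def is_1_tft(lengths, schedule, D):
    """lengths: task -> computation time; schedule: task -> ((proc_p, start_p), (proc_b, start_b)).
    True iff the conditions of Lemma 2 hold and every copy meets the common deadline D."""
    busy = {}                                  # processor -> occupied (start, end) intervals
    for task, l in lengths.items():
        if 2 * l > D:                          # Lemma 1: computation time at most half the period
            return False
        (pp, sp), (pb, sb) = schedule[task]
        if pp == pb:                           # Lemma 2: copies on two different processors
            return False
        if sp + l > sb:                        # backup may start only after the primary finishes
            return False
        if sb + l > D:                         # the backup copy itself must meet the deadline
            return False
        busy.setdefault(pp, []).append((sp, sp + l))
        busy.setdefault(pb, []).append((sb, sb + l))
    for iv in busy.values():                   # a valid schedule: no two copies overlap per processor
        iv.sort()
        if any(b0 < a1 for (a0, a1), (b0, b1) in zip(iv, iv[1:])):
            return False
    return True

# e.g., a task of length 10: primary on processor 0 at time 0, backup on processor 1 at time 10
print(is_1_tft({"t1": 10}, {"t1": ((0, 0), (1, 10))}, D=25))   # True
```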

An obvious implication of Lemma 1 is that if the computation time of a task is larger than half of its period, it is impossible to find a schedule which is 1-TFT. This follows from the observation that if the primary copy fails at the very end of its execution, there will not be enough time left to complete a backup copy, assuming that the backup copy has the same computation time requirement as the primary copy. This fact is used implicitly in many situations throughout this paper.

In scheduling the backup copies, we have the option of allowing them to overlap or forbidding them from overlapping. Here we consider the case where the backup copies are not allowed to overlap each other: backup copies of tasks whose primary copies are scheduled on different processors are not allowed to overlap in time in their executions on a processor. For obvious reasons, backup copies of tasks whose primary copies are scheduled on the same processor must not be scheduled to overlap in time on a processor either.

When the given number of processors is two, there obviously exists an optimal algorithm to schedule a set of tasks having a common deadline so as to tolerate one arbitrary processor failure. For more than two processors, however, the scheduling problem is NP-complete, even when the tasks share the same deadline.

Task Sequencing Using Primary-Backup with a Common Deadline (Non-Overlapping of Backups)

Instance: A set ℜ of tasks and a number of processors m ≥ 3; for each task t ∈ ℜ, one primary copy P(t) and one backup copy G(t), a length l(t) ∈ Z+ (the set of natural numbers), a common release time r ∈ Z+, and a common deadline d(t) = D ∈ Z+, with l(P(t)) = l(G(t)) = l(t). No overlapping of backup copies is allowed.

Question: Is there an m-processor schedule σ for ℜ that is 1-TFT, i.e., for each task t ∈ ℜ, σ_i(P(t)) + l(P(t)) ≤ σ_j(G(t)) and σ_j(G(t)) + l(G(t)) ≤ D, where i ≠ j, and i and j designate the indices of the processors on which P(t) and G(t) are scheduled?

Theorem 1: Task Sequencing Using Primary-Backup with a Common Deadline is NP-complete.

Proof: It is sufficient to prove that this scheduling problem is NP-complete even in the case of m = 3. It is easy to verify that the problem is in NP. We next transform the PARTITION problem, a known NP-complete problem, to the scheduling problem.

The PARTITION problem [7] is stated as follows: given a finite set A and a "size" s(a) ∈ Z+ for each a ∈ A, is there a subset A′ ⊆ A such that Σ_{a∈A′} s(a) = Σ_{a∈A−A′} s(a)?

Given an instance A = {a1, a2, …, an} of the PARTITION problem, we construct a task set ℜ, using the primary-backup copy approach, to run on three processors for the tolerance of a single arbitrary processor failure, such that ℜ can be scheduled if and only if there is a solution to the PARTITION problem. ℜ consists of n + 1 tasks, as follows: n tasks with r(τi) = 0, l(τi) = a_i, and d(τi) = 2B for τi ∈ {τ1, τ2, …, τn}, where Σ_{1≤i≤n} a_i = 2B (this can be assumed without loss of generality); and one other task β with r(β) = 0, l(β) = B, and d(β) = 2B.

It is easy to see that this transformation can be constructed in polynomial time. What we need to show is that the set A can be partitioned into two sets S1 and S2 such that Σ_{a∈S1} s(a) = Σ_{a∈S2} s(a) and S1 ∪ S2 = A, if and only if the task set can be scheduled.

First, suppose that A can be partitioned into two sets S1 and S2 such that Σ_{a∈S1} s(a) = Σ_{a∈S2} s(a) = B and S1 ∪ S2 = A. Then for each α ∈ S1, we schedule the primary copy of the task α, with l(α) = a, on processor 2 anywhere in the time interval [0, B), and its backup copy on processor 3 anywhere in the time interval [B, 2B). For each task α ∈ S2 with l(α) = a, the primary copy of α is scheduled on processor 3 anywhere in [0, B) and its backup copy on processor 1 anywhere in [B, 2B). The 2n copies of the n tasks can thus be scheduled on the processors satisfying the condition set in Lemma 2. For task β, its primary copy is scheduled on processor 1 during [0, B), and its backup copy on processor 2 during [B, 2B), as shown in Figure 2. Thus the task set ℜ can be scheduled on three processors such that the schedule is 1-TFT.

[Figure 2: Mapping from PARTITION to Task Sequencing. Over [0, 2B), processor 1 runs P(β) then G(S2), processor 2 runs P(S1) then G(β), and processor 3 runs P(S2) then G(S1).]

Conversely, if the task set ℜ is scheduled on three processors such that the schedule is 1-TFT, we claim that the lengths of the tasks scheduled in the interval [0, B) on processor 2 sum to B, i.e., Σ_{a∈S1} s(a) = B. To tolerate one arbitrary processor failure, the primary copy of a task and its backup copy must be scheduled on two different processors, and their executions must not overlap in time; the latter requirement is guaranteed by the primary-backup copy approach. Since the common deadline is 2B and the total task execution time is 2 × (2B + B) = 6B, any 1-TFT schedule has no idle time in the interval [0, 2B) on any of the three processors. Therefore, any 1-TFT schedule must be equivalent to the schedule shown in Figure 3, once the processors are suitably renamed and the primary copies are moved in front of all the backup copies on each processor. Shuffling the primary copies in front of the backup copies violates no scheduling constraint, since primary copies can start earlier than scheduled and backup copies can start later than scheduled, as long as the release time and deadline constraints are respected. On processor 3, exactly one copy, either primary or backup, of each of the n tasks must be scheduled: any 1-TFT schedule on three processors admits no idle time on any processor, and the primary copy of a task and its backup copy must never be scheduled on the same processor.

[Figure 3: Mapping from Task Sequencing to PARTITION.]

We therefore let the tasks scheduled on processor 2 in the interval [0, B) be the set S1, and the tasks scheduled on processor 1 in the interval [B, 2B) be the set S2. We then have Σ_{a∈S1} s(a) = Σ_{a∈S2} s(a) and S1 ∪ S2 = A; we have solved the PARTITION problem. The scheduling problem is therefore NP-complete.
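The reduction is purely mechanical. The sketch below (our code, following the proof) builds the Theorem 1 scheduling instance from a PARTITION instance:

```python
def partition_to_tft(sizes):
    """Map a PARTITION instance (a list of positive sizes) to the Theorem 1
    instance: n tasks with l = a_i plus a task beta with l = B, all released
    at time 0 with common deadline 2B, on m = 3 processors."""
    total = sum(sizes)
    if total % 2:                 # sum must equal 2B, else PARTITION is trivially "no"
        return None
    B = total // 2
    tasks = {f"tau{i}": a for i, a in enumerate(sizes, 1)}
    tasks["beta"] = B
    return tasks, 2 * B, 3        # (task lengths, common deadline D = 2B, m)

# The sizes {3, 1, 1, 2, 3, 2} sum to 12 (B = 6); PARTITION has a solution
# iff the resulting task set has a 1-TFT schedule on three processors.
print(partition_to_tft([3, 1, 1, 2, 3, 2]))
```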

IV. A 1-Timely-Fault-Tolerant Scheduling Algorithm

Since the scheduling problem is NP-complete, a heuristic scheduling algorithm is presented in this section to obtain approximate solutions.

In scheduling a set of tasks on m processors, the algorithm must be designed to minimize the schedule length on each processor, so that the task set can be successfully scheduled, and at the same time to prevent the overlapping of the primary copy of a task and its backup copy. At first glance, this scheduling problem closely resembles the problem of minimizing the makespan of a schedule in a multiprocessor system. Since scheduling to minimize makespan is NP-complete, several scheduling heuristics have been developed for it, among which LPT [8] and MULTIFIT [6] are notable. However, two key issues set our scheduling problem apart from makespan minimization: backup copies must be scheduled as well as primary copies, and the primary copy of a task may not overlap its backup copy, nor may backup copies of different tasks overlap each other in execution. The MULTIFIT algorithm, though it out-performs LPT in the worst case, is not easily adapted to solve the 1-TFT scheduling problem. The LPT algorithm is therefore adopted here as the base algorithm upon which our scheduling heuristic is developed.

The algorithm starts by scheduling the primary copies on the m processors using the LPT algorithm. It then schedules the backup copies, following the rules described below, such that the primary copy of a task and its backup copy are scheduled on different processors, and the backup copies of the tasks whose primary copies are scheduled on one processor are also scheduled together on one processor. The algorithm is given as follows. Note that D is the common deadline of the tasks.

Algorithm 1 (Input: task set ℜ, m, 1-TFT; Output: success, schedule)

Step 1: Sort the tasks in order of non-increasing computation times and rename them T1, T2, …, Tn. Compute Ω = Σ_{i=1..n} l(Ti). If Ω > mD/2 or l(T1) > D/2, then report that the task set cannot be scheduled on m processors by this algorithm such that a 1-TFT schedule is produced. Otherwise, go to Step 2.

Step 2: Apply the LPT algorithm to schedule the task set on m processors.

Step 3: Sort the primary schedules for the m processors in order of non-increasing schedule lengths. Duplicate the primary schedules to form m backup schedules and append them at the end of the primary schedules (Figure 4a).

Step 4: Swap the backup schedules according to the swapping rules defined below (Figure 4b). Shift the backup schedules to obtain the mixed schedules according to the shifting rules defined below (Figure 4c).

Step 5: Find the maximum length among the mixed schedules and compare it to D. If it is longer than D, the task set cannot be scheduled. Otherwise, the mixed schedules generated in Step 4 are, as a whole, a 1-TFT schedule.

[Figure 4a: Schedules after Appending; Figure 4b: Schedules after Swapping; Figure 4c: Schedules after Shifting. Each shows the m primary schedules followed by the appended, swapped, or shifted backup schedules, with D/2 and D marked.]

The functioning of the algorithm is illustrated by the following simple example.

Example 1: Use Algorithm 1 to schedule the following task set on four processors: ℜ = {τ1, τ2, …, τ7}, {l(τi), i = 1, …, 7} = {10, 8, 8, 7, 6, 6, 3}, r = 0, and D = 25. First, the LPT algorithm is used to schedule the primary copies of the tasks on the four processors, as shown in Figure 5a. Second, the four primary schedules are sorted in non-increasing order. Third, the primary schedules are duplicated to form the backup schedules, which are then appended to the back of the primary schedules. Finally, the backup schedules are swapped and shifted appropriately. The final result is shown in Figure 5b. Note that if the number of processors available is three, the task set cannot be scheduled by this algorithm.

[Figure 5a: Schedule Generated by LPT.]

[Figure 5b: Schedule Generated after Swapping and Appending.]

The primary schedules are sorted before appending in order to minimize the maximum length of the mixed schedules produced by the swapping and shifting processes in the later stages. The swapping process makes sure that the backup copy of a task is not scheduled on the same processor as its primary copy. The purpose of shifting is to minimize the finishing time of each mixed schedule as well as to avoid overlapping among the backup copies of different tasks. To elaborate on the swapping and shifting processes, we formally define the swapping and shifting rules.

Swapping Rules:

(1) If the number of processors m is even, the longest backup schedule is appended behind the shortest primary schedule, the second longest backup schedule is appended behind the second shortest primary schedule, and so forth.

(2) If m is odd, then the backup schedules of the three central processors, i.e., the ones whose positions are in the middle, are appended in a cyclic fashion. The backup schedules of the remaining processors are swapped by following swapping rule (1).

To define the shifting rules, we need the following definitions.

Definition 1: Two processors are called twin processors if the backup copies of the tasks in the primary schedule of one processor are appended after the primary schedule of the other. The two schedules on twin processors are called twin schedules. For example, in Figure 5b, processors 1 and 4 are twin processors, and so are processors 2 and 3.

Definition 2: For the primary schedule of a processor i, l_p(i) is defined as its primary schedule length, and l_q(i) as the computation time of the first task in the primary schedule. Obviously, l_p(i) ≥ l_q(i). Though l_p(i) denotes the length of a schedule, it will also be used to denote the corresponding time interval whose length is l_p(i).

Shifting Rules: Suppose the backup schedule of processor j is appended behind the primary schedule of processor i.

(1) If l_p(i) ≤ D/2 and l_p(j) ≤ D/2, then the tasks in l_p(j) are shifted together ahead in time such that the starting time of the first task in l_p(j) is max{l_p(i), l_p(j)}. If l_p(j) ≠ l_q(j), the starting time of the first task in l_p(j) can instead be moved to l_p(i), and the rest of the backup copies moved ahead accordingly.

(2) If l_p(i) ≤ D/2 and l_p(j) > D/2, the tasks in l_p(j) are shifted together ahead in time such that the starting time of the first task in l_p(j) is l_p(i).

(3) If l_p(i) > D/2 and l_p(j) ≤ D/2, the tasks in l_p(j) are shifted together ahead in time such that the starting time of the first task in l_p(j) is l_p(i).

(4) If l_p(i) > D/2 and l_p(j) > D/2, the tasks in l_p(j) are shifted together ahead in time such that the starting time of the first task in l_p(j) is l_p(i).

(5) Apply the above rules to every schedule on the processors.
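For concreteness, here is a condensed sketch of Algorithm 1 in Python (our code, not the authors'). It implements Steps 1 through 5 with swapping rule (1) and shifting rules (1) through (4), but omits the l_q refinement of shifting rule (1) and the odd-m cyclic swap of swapping rule (2), so it can be more conservative than the full algorithm:

```python
import heapq

def lpt(lengths, m):
    """Step 2 (LPT): longest task first onto the currently least-loaded processor."""
    heap = [(0, i) for i in range(m)]                  # (load, processor index)
    sched = [[] for _ in range(m)]
    for name, l in sorted(lengths.items(), key=lambda kv: -kv[1]):   # Step 1 sort
        load, i = heapq.heappop(heap)
        sched[i].append((name, l))
        heapq.heappush(heap, (load + l, i))
    return sched

def algorithm1(lengths, m, D):
    """Returns per-processor lists of (name, start, end, 'P'|'B'), or None on failure."""
    if m % 2:
        return None                                    # odd-m cyclic swap omitted in this sketch
    if 2 * sum(lengths.values()) > m * D or 2 * max(lengths.values()) > D:
        return None                                    # Step 1 screen: Omega > mD/2 or l(T1) > D/2
    prim = lpt(lengths, m)
    prim.sort(key=lambda s: -sum(l for _, l in s))     # Step 3: non-increasing schedule lengths
    lp = [sum(l for _, l in s) for s in prim]
    out = [[] for _ in range(m)]
    for i, s in enumerate(prim):                       # lay the primary copies out from time 0
        t = 0
        for name, l in s:
            out[i].append((name, t, t + l, "P"))
            t += l
    for i in range(m):                                 # Step 4: the twin of processor i is m-1-i,
        j = m - 1 - i                                  # i.e., longest backups behind shortest primaries
        t = lp[i] if lp[j] > D / 2 else max(lp[i], lp[j])   # shifting rules (1)-(4), base form
        for name, l in prim[j]:
            out[i].append((name, t, t + l, "B"))
            t += l
    if any(s and s[-1][2] > D for s in out):           # Step 5: compare mixed lengths to D
        return None
    return out

# Example 1 from the text: seven tasks, m = 4, D = 25 -> a 1-TFT schedule exists.
tasks = {f"t{i}": l for i, l in enumerate([10, 8, 8, 7, 6, 6, 3], 1)}
print(algorithm1(tasks, 4, 25))
```

On the Example 1 task set this sketch reproduces the four-processor result: every mixed schedule finishes by time 24 ≤ D = 25.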

The schedule thus generated by Algorithm 1 is 1-TFT, as shown by the following theorem.

Theorem 2: Algorithm 1 produces a 1-Timely-Fault-Tolerant schedule.

Proof: Since any primary copy of a task and its backup copy are scheduled on two different processors, as guaranteed by the swapping rules, we need only show that there is no overlapping between the primary copy of a task and its backup copy. Obviously, there is no such overlapping after the swapping process but before the shifting process, so what we need to show is that no overlapping occurs when the shifting is carried out. There are four cases to consider.

Case 1: l_p(i) ≤ D/2 and l_p(j) ≤ D/2. Since the starting time of the first task in l_p(j) is max{l_p(i), l_p(j)}, there is no overlapping between the primary copies of the tasks scheduled on processor j and their corresponding backup copies on processor i. If l_p(j) ≠ l_q(j), there must be at least two tasks in the primary schedule on processor j, and the inequality l_p(i) > l_q(j) must hold: if it did not, the second task on processor j would have been scheduled on processor i by the LPT algorithm. Since l_p(i) > l_q(j), no overlapping can occur between any primary copy and its corresponding backup copy.

Case 2: l_p(i) ≤ D/2 and l_p(j) > D/2. Since l_p(j) > l_p(i), there must be at least two tasks in the primary schedule on processor j. An argument similar to that used in Case 1 shows that no overlapping can occur between any primary copy and its corresponding backup copy.

Case 3: l_p(i) > D/2 and l_p(j) ≤ D/2. No overlapping can possibly occur between any primary copy and its corresponding backup copy in this case.

Case 4: l_p(i) > D/2 and l_p(j) > D/2. Obviously, no overlapping can possibly occur between any primary copy and its corresponding backup copy in this case. Moreover, if this case occurs, no 1-TFT schedule can be generated.

Since Step 5 of the scheduling algorithm ensures that every backup copy finishes before the deadline D, the schedule thus generated is 1-TFT. The theorem holds.

Observation: The schedule generated by Algorithm 1 is 1-TFT in the worst case and ⌊m/2⌋-fault-tolerant in the best case, where m is the number of processors. The schedule is 1-TFT by Theorem 2. It is ⌊m/2⌋-fault-tolerant because the failure of up to ⌊m/2⌋ processors can be sustained if none of the ⌊m/2⌋ failed processors has its twin among them. In the schedule generated by Algorithm 1 as shown in Figure 5b, if processors 1 and 2 fail, their twin processors, processors 3 and 4, can execute the backup copies such that no task deadline is missed.

V. Analysis and Performance Evaluation

In order to evaluate the performance of the scheduling algorithm, we develop another heuristic algorithm that calls Algorithm 1 to solve its corresponding optimization problem. In other words, we assume that the number of processors is not known, and the scheduling goal is to find the minimum number of processors on which a 1-TFT schedule can be constructed for a given task set. We use the standard binary search technique to find this minimum. The algorithm is given as follows:

Algorithm 2 (Input: task set ℜ, 1-TFT; Output: m and schedule)

Step 1: LowerB := ⌊(Σ_{i=1..n} l(Ti))/D⌋; UpperB := n.

Step 2: m := ⌊(LowerB + UpperB)/2⌋; IF LowerB = m THEN { m := m + 1; EXIT }.

Step 3: Invoke Algorithm 1(ℜ, m, 1-TFT, success, schedule); IF success THEN UpperB := m ELSE LowerB := m; goto Step 2.

Example 2: Suppose the same task set is given as in Example 1, and the question is to find the minimum number of processors necessary to execute the task set while tolerating one processor failure. Algorithm 2 returns four processors, which is in fact equal to the optimal number required.

The time complexity of Algorithm 1 is O(n log n + n log m), where n is the number of tasks and m is the number of processors: the sorting process takes O(n log n) time, and the LPT step takes O(n log m) time. Algorithm 2 takes O((n log n + n log m) log n) time, since the binary search is bounded by O(log n) iterations.
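Algorithm 2 is a textbook binary search over the processor count. A sketch (ours), reusing algorithm1 from the listing in Section IV as the feasibility test:

```python
def algorithm2(lengths, D, feasible):
    """Minimum m for which `feasible(m)` holds, per the paper's binary search.
    Assumes, as the paper's loop does, that feasibility is monotone in m and
    that m = n (one processor per task) is always feasible."""
    lower = sum(lengths.values()) // D        # Step 1: LowerB = floor(sum l(Ti) / D)
    upper = len(lengths)                      # Step 1: UpperB = n
    while upper - lower > 1:                  # Step 2 exit test: m = LowerB => answer m + 1
        m = (lower + upper) // 2
        if feasible(m):                       # Step 3: invoke Algorithm 1
            upper = m
        else:
            lower = m
    return upper

# Example 2: the Example 1 task set needs four processors.
tasks = {f"t{i}": l for i, l in enumerate([10, 8, 8, 7, 6, 6, 3], 1)}
print(algorithm2(tasks, 25, lambda m: algorithm1(tasks, m, 25) is not None))
```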

To evaluate the performance of Algorithm 1, we generate task sets randomly and run Algorithm 2. Since the scheduling problem is NP-complete, it is hopeless in practice to use enumeration techniques to find the optimal solution even when the number of tasks is small. To find out how well the algorithm performs, we instead compare against the lowest bound possible for each schedule. Every task is executed twice, once as a primary copy and once as a backup copy, and backup copies are not allowed to overlap, so the minimum number of processors required to schedule the task set is ⌈2·Sum/D⌉, where Sum is the total computation time of the tasks and D is the common deadline or period; the factor of 2 reflects the duplicated execution. We therefore use ⌈2·Sum/D⌉ as the lowest bound possible for each schedule.

Our simulation is carried out in the following fashion. First, a common deadline D is chosen. Then a range of values is chosen, from which the computation times of the tasks are randomly generated; the ratio between the common deadline and the maximum computation time of the tasks is kept between 2 and 7. Algorithm 2 is run for each task set, the size of which is incremented for each run: for each value of D, we generate 100 task sets, each containing one more task than the previous one. Eighty different values of D are chosen from the range 20 to 99. For a particular value of D (D = 90), the performance of the scheduling algorithm is given in Figure 6. It is evident from our extensive simulation that the difference between the number of processors computed by this algorithm and the lowest bound possible is only one or two. We therefore conclude that the performance of the algorithm is near-optimal.

[Figure 6: Performance of the Scheduling Algorithm (D = 90, 1 ≤ Ci ≤ 90).]

The performance of the scheduling heuristic, Algorithm 1, may seem surprisingly good at first glance, but it is not surprising at all on closer inspection. Graham [8] proved that the worst-case performance of LPT is tightly bounded by 4/3 - 1/(3m), where m is the number of processors. That bound, however, is achievable only by a pathological example in which, with the exception of one processor, only two tasks are scheduled on each processor. Coffman and Sethi [5] later generalized Graham's bound to (k+1)/k - 1/(km), where m is the number of processors and k is the least number of tasks on any processor, or the number of tasks on the processor whose last task terminates the schedule. This result shows that the worst-case performance bound for LPT approaches unity approximately as 1 + 1/k. The worst-case performance of Algorithm 1 is therefore expected to be better than 1 + 1/k. In our experiments, each processor is assigned approximately five tasks, so the worst-case performance bounds for both LPT and Algorithm 1 are expected to be less than 1 + 1/5 = 1.2 according to the above analysis. Moreover, a randomly generated task set is quite unlikely to coincide with a worst case for the heuristic.

VI. Conclusion

The contribution of this paper is twofold. First, the NP-completeness result tells us that the TFT scheduling problem is very hard to solve, even in the simple case where there are only three processors and the tasks share a common deadline; heuristic approaches are therefore called for. Second, a scheduling heuristic is proposed that generates a schedule tolerating one arbitrary processor failure, and the performance of the algorithm is shown to be near-optimal.

Many problems remain open, since we have considered only a special case of the general real-time fault-tolerant scheduling problem. Tolerating more than one processor failure requires more than one primary copy or backup copy per task. Good heuristics are also needed to obtain approximate solutions to the scheduling problem where tasks have different deadlines. Furthermore, it would be interesting to derive mathematically the worst-case performance bound of the scheduling algorithm presented in this paper. We are currently investigating these problems.

References²

2. Due to space limits, not all references are listed.

[1] Avizienis, A. "The N-version approach to fault-tolerant software," IEEE Transactions on Software Engineering, SE-11, 1985, pp. 1491-1501.

[2] Balaji, S., et al. "Workload redistribution for fault-tolerance in a hard real-time distributed computing system," FTCS-19, Chicago, Illinois, June 1989, pp. 366-373.

[3] Bannister, J.A. and K.S. Trivedi. "Task allocation in fault-tolerant distributed systems," Acta Informatica, Vol. 20, Springer-Verlag, 1983, pp. 261-281.

[4] Coffman, E.G., Jr. Computer and Job Shop Scheduling Theory, New York: Wiley, 1975.

[5] Coffman, E.G., Jr. and R. Sethi. "A generalized bound on LPT sequencing," Revue Francaise d'Automatique Informatique Recherche Operationelle, Vol. 10, No. 5, Suppl., 1976, pp. 17-25.

[6] Coffman, E.G., Jr., M.R. Garey, and D.S. Johnson. "An application of bin-packing to multiprocessor scheduling," SIAM J. Computing, Vol. 7, 1978, pp. 1-17.

[7] Garey, M.R. and D.S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman and Company, NY, 1978.

[8] Graham, R.L. "Bounds on multiprocessing timing anomalies," SIAM J. Appl. Math., Vol. 17, 1969, pp. 416-429.

[9] Hopkins, A.L., et al. "FTMP: A highly reliable fault-tolerant multiprocessor for aircraft," Proceedings of the IEEE, Vol. 66, No. 10, October 1978.

[10] Johnson, B.W. Design and Analysis of Fault Tolerant Digital Systems, Addison-Wesley, 1989.

[11] Kieckhafer, R.M., C.J. Walter, A.M. Finn, and P.M. Thambidurai. "The MAFT architecture for distributed fault tolerance," IEEE Transactions on Computers, Vol. 37, No. 4, April 1988, pp. 398-405.

[12] Knight, J.C. and P.E. Ammann. "Design fault tolerance," Reliability Engineering and System Safety, Vol. 32, 1991, pp. 25-49.

[13] Krishna, C.M. and K.G. Shin. "On scheduling tasks with a quick recovery from failure," IEEE Transactions on Computers, C-35(5), May 1986, pp. 448-454.

[14] Labetoulle, J., E.L. Lawler, J.K. Lenstra, and A.H.G. Rinnooy Kan. "Preemptive scheduling of uniform machines subject to release dates," Report BW 99, Mathematisch Centrum, Amsterdam, 1979.

[15] Leung, J.Y.T. and J. Whitehead. "On the complexity of fixed-priority scheduling of periodic, real-time tasks," Performance Evaluation, Vol. 2, 1982, pp. 237-250.

[16] Liestman, A.L. and R.H. Campbell. "A fault tolerant scheduling problem," IEEE Transactions on Software Engineering, SE-12(11), November 1986, pp. 1089-1095.

[17] Liu, C.L. and J. Layland. "Scheduling algorithms for multiprogramming in a hard real-time environment," JACM, 20(1), 1973.

[18] Liu, J.W.S., K.-J. Lin, W.-K. Shih, A. Yu, A.-Y. Chung, and W. Zhao. "Algorithms for scheduling imprecise computations," Computer, Vol. 24, No. 5, May 1991, pp. 58-69.

[19] Oh, Y. and S.H. Son. "Multiprocessor support for real-time fault-tolerant scheduling," IEEE 1991 Workshop on Architectural Aspects of Real-Time Systems, San Antonio, Texas, Dec. 1991, pp. 76-80.

[20] Oh, Y. and S.H. Son. "An algorithm for real-time fault-tolerant scheduling in multiprocessor systems," 4th Euromicro Workshop on Real-Time Systems, Athens, Greece, June 1992.

[21] Pradhan, D.K. Fault-Tolerant Computing: Theory and Techniques, Volumes I and II, Prentice-Hall, Englewood Cliffs, N.J., 1986.

[22] Ramamritham, K. and J.A. Stankovic. "Scheduling strategies adopted in Spring: an overview," in Foundations of Real-Time Computing: Scheduling and Resource Allocation, A.M. van Tilborg and G.M. Koob (eds.), 1991.

[23] Ramos-Thuel, S. and J.K. Strosnider. "The transient server approach to scheduling time-critical recovery operations," RTSS, 1991, pp. 286-295.

[24] Sha, L. and J.B. Goodenough. "Real-time scheduling theory and Ada," Computer, April 1990, pp. 53-65.

[25] Shin, K.G., G. Koob, and F. Jahanian. "Fault-tolerance in real-time systems," IEEE Real-Time Systems Newsletter, Vol. 7, No. 3, 1991, pp. 28-34.

[26] Spector, A. and D. Gifford. "The space shuttle primary computer system," CACM, September 1984, pp. 874-900.

[27] Stankovic, J.A. "Misconceptions about real-time computing," IEEE Computer, October 1988, pp. 10-19.

[28] Wensley, J.H., et al. "SIFT: Design and analysis of a fault-tolerant computer for aircraft control," Proceedings of the IEEE, Vol. 66, No. 10, October 1978, pp. 1240-1255.
