The presence, in the loop body, of conditionals whose execution does not depend on the value of the index of the outer loop does not affect the load imbalance resulting from the partitioning schemes described so far; each iteration of the outer loop still performs the same amount of work and, consequently, perfect load balance can be achieved. In the special case where the execution of a conditional depends on a condition involving the index of the outer loop and a constant, then, applying index set splitting [20] prior to partitioning, the conditional may be removed [2].
For instance, let the inequality
bound the index of the outermost, parallel
loop , and the inequality
correspond to a logical expression evaluated
by a conditional in the loop body. Then, the original loop nest can
be split into three
consecutive loop nests, whose indices take values in the following
intervals respectively:
|
Consider the code shown in Figure 1.a; applying (6) and assuming that for the values of L, U, A it is known at compile-time that LAU, the code shown in Figure 1.b results. One approach to partition this code is to partition each loop using any of the two schemes described in Section 3.1, and then assign the same partition of each loop to the same processor; for both loops, if the number of iterations is a multiple of the number of processors, , then perfect load balance is achieved. In the general case, let be the number of loop iterations, and the amount of work in the body of each loop, respectively; assuming that the same partitioning scheme (either by decreasing or increasing order) is applied to both loops, then the expected load imbalance is equal to Whenever , the load imbalance can be reduced by partitioning one of the loops by decreasing order and the other one by increasing order. This approach is followed in the code shown in Figure 1.c.