next up previous
Next: Instantiating the Dynamic Task Up: Compiler Techniques for Synthesizing Previous: Synthesizing the Static Task


Condensing Nodes of the Static Task Graph

The process described above produces a first-cut version of the STG. For many typical modeling studies of parallel program performance, however, a less detailed graph will be sufficient. For instance, a coarse-grain modeling approach could assume that all operations of a single process between two communication points constitute a single task. In order to add this functionality, the compiler traverses the STG and marks contiguous nodes, connected by a control-flow edge, that do not include communication. Such sequences of nodes are then collapsed and replaced in the STG by a single condensed task. Note that such a task will have a single point of entry and a single point of exit. For example, the two large dotted rectangles in Figure 1 (c) correspond to sequences of tasks that can be collapsed into a single condensed task.

To preserve precision, the computation of the scaling function of the new condensed task is of particular importance. Ignoring conditional control flow for the moment, this scaling function is the symbolic sum of the scaling functions of the individual collapsed tasks, each multiplied by the symbolic number of iterations of the surrounding loops, where appropriate (only if these loops are also collapsed).

Figure 2: Collapsing Branches of the Static Task Graph.
  DO I=1,N
     RECV
     DO J=1,M
        S1
        IF (f)
           S2
        ENDIF
        S3
     ENDDO
     SEND
  ENDDO
\psfig{file=Figures/fig2b.eps,width=15truemm}
\psfig{file=Figures/fig2c.eps,width=20truemm}
\psfig{file=Figures/fig2d.eps,width=10truemm}

(a) Source; (b) Initial STG; (c) Collapsed STG if f=G(I); (d) Collapsed STG if f=Y(J)

In cases with conditional control flow, tasks can sometimes be condensed with no loss of accuracy, using sophisticated compiler analysis. For example, no accuracy will be lost in cases where all dynamic instances of the resulting collapsed task have identical computational time; that is, the workload expressions are free of any input parameters whose value changes for different instances of that task. In other cases, condensing would result in some loss of accuracy, and the goals of the modeling study should be used to dictate the degree to which tasks are collapsed together.

To illustrate, consider the code shown in Figure 2 (a). Let $w_1, w_2, w_3$ represent the workload (i.e., the scaling function) for statements S1, S2 and S3, respectively. The initial version of the STG is shown in Figure 2 (b); the nodes inside the dotted rectangle are candidates for collapsing. Assuming that the function f in the IF depends on at least one of I or J, we distinguish between:

The three cases can be differentiated using well-known but aggressive dataflow analysis. We note that the first two cases correspond directly to loop unswitching and identifying loop-invariant code respectively, except that only the static task graph is modified and the code itself is not transformed. A key point to note is that in the first two cases, there is no resulting loss of accuracy in condensing the task graph. For example, in the ASCI benchmark Sweep3D [4] used in Section 4, the one significant branch is in fact of the first type, which can be pulled out of the task and enclosing loops (the analysis would have to be interprocedural because the enclosing loops are not in the same procedure as the branch).



next up previous
Next: Instantiating the Dynamic Task Up: Compiler Techniques for Synthesizing Previous: Synthesizing the Static Task
Rizos Sakellariou 2000-10-16