The Static Task Graph: A Compile-Time Representation

The static task graph (STG), defined above, provides a concise description of program parallelism even for very large programs, and is designed to be synthesized using parallelizing compiler technology. Informally, it is a representation of the static parallel structure of the program. In particular, this representation is defined by the program alone and is independent of runtime input values or computational results. Such a graph must therefore include control flow (both loops and conditional branches), since it has to capture all possible executions of the given program, and it must represent many quantities symbolically, such as the number of iterations of a loop.
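
To make this concrete, the following is a minimal sketch, in Python, of the kinds of nodes such a graph needs. It is not the authors' implementation: all class and field names are illustrative assumptions, and symbolic quantities are kept as plain strings for simplicity.

\begin{verbatim}
# Minimal sketch of static-task-graph node kinds; names are illustrative.
from dataclasses import dataclass, field
from typing import List

@dataclass
class STGNode:
    node_id: int
    successors: List[int] = field(default_factory=list)

@dataclass
class TaskNode(STGNode):
    pass                  # a computational task; its cost is symbolic

@dataclass
class LoopNode(STGNode):
    # Loop bounds are generally unknown at compile time, hence symbolic.
    lower: str = "1"
    upper: str = "N"      # e.g. depends on a program input value
    stride: str = "1"

@dataclass
class BranchNode(STGNode):
    # Both arms must be kept: the taken arm depends on runtime data.
    condition: str = "true"
\end{verbatim}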

The static task graph is actually a collection of graphs, one per procedure. Each call site in the program is represented by an explicit call node in the graph, which records the procedure being called and the actual parameters of the call (as symbolic expressions). Conceptually, such a node can be replaced by a subgraph representing a particular instance of the called procedure.
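
Continuing the sketch above (again with assumed names; the procedure name "smooth" is hypothetical), a call node might record the callee and its symbolic actuals, with one graph kept per procedure:

\begin{verbatim}
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CallNode:
    callee: str         # name of the called procedure
    actuals: List[str]  # actual parameters, as symbolic expressions

# One graph per procedure; expanding a CallNode conceptually splices in
# an instance of the callee's graph with formals bound to the actuals.
graphs: Dict[str, list] = {
    "main":   [CallNode("smooth", actuals=["A", "N/P"])],
    "smooth": [],       # body elided in this sketch
}
\end{verbatim}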

The communication operations in the program are grouped into ``logical communication events''. For example, all the send, receive, and wait operations implementing a single logical SHIFT operation in a message-passing program would be grouped into one such event. Explicit communication task nodes are included in the STG to represent the computational overhead incurred by each thread when performing the communication operations. For each communication event, the set of tasks and the synchronization edges between them depend on the communication pattern and on the communication primitives used (e.g., blocking vs. nonblocking sends and receives). The use of explicit communication tasks has proved extremely useful for capturing complex communication patterns that can overlap communication with computation in arbitrary ways [4]. A communication event descriptor, kept separate from the STG, holds the remaining information about each communication event: the logical pattern of communication (e.g., Shift, Pipelined Shift, Broadcast, or Sum reduction), a symbolic expression describing the communication size, and the indices of the communication task nodes in the STG.
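
As an illustration, such a descriptor might look as follows; the field names, the pattern vocabulary, and the example node indices are assumptions, not the actual encoding used:

\begin{verbatim}
from dataclasses import dataclass
from typing import List

@dataclass
class CommEventDescriptor:
    pattern: str              # logical pattern, e.g. "Shift" or "SumReduction"
    size_expr: str            # symbolic communication size, e.g. "8*N/P"
    task_node_ids: List[int]  # indices of the SEND/RECEIVE/WAIT task nodes
                              # that this event contributed to the STG

shift = CommEventDescriptor("Shift", "8*N/P", task_node_ids=[12, 13, 14])
\end{verbatim}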

Each task node (computation or communication) may actually represent a set of parallel tasks, since the number of parallel tasks may not be known at compile time. Each parallel task node therefore contains a symbolic integer set describing the set of parallel tasks possible at runtime. For example, consider a ``left shift'' communication operation on a 1-dimensional processor grid with $P$ threads numbered $0,1,\ldots,P-1$. In this pattern, each thread $t$ with $t > 0$ sends data to thread $t-1$. Therefore, the SEND task node in the static task graph represents the set of tasks $\{ [i] : 1 \leq i \leq P-1 \}$, and the RECEIVE task node represents the set of tasks $\{ [i] : 0 \leq i \leq P-2 \}$.
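
The sketch below enumerates these two symbolic sets once a concrete value of $P$ is known; a real system would manipulate such sets with a symbolic integer-set library, but plain ranges suffice to illustrate the idea.

\begin{verbatim}
def send_tasks(P):
    # {[i] : 1 <= i <= P-1}
    return [i for i in range(1, P)]

def recv_tasks(P):
    # {[i] : 0 <= i <= P-2}
    return [i for i in range(P - 1)]

print(send_tasks(4))   # [1, 2, 3]
print(recv_tasks(4))   # [0, 1, 2]
\end{verbatim}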

Because each task node in the static task graph may represent multiple parallel task instances, each edge of the static graph must also represent multiple edge instances. Furthermore, the mapping between task instances at the source and sink of the edge need not be a simple identity mapping. For example, in the 1-dimensional shift communication above, the mapping between SEND and RECEIVE task instances can be described as $\{ [i] \rightarrow [j] : 1 \leq i \leq P-1 \wedge j = i-1 \}$. This symbolic integer mapping describes a set of edge instances connecting exactly those pairs of SEND and RECEIVE task instances that correspond to processor pairs that communicate during the SHIFT.
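
Instantiated for a concrete $P$, the mapping yields one edge instance per communicating pair, as in this small sketch:

\begin{verbatim}
def shift_edges(P):
    # {[i] -> [j] : 1 <= i <= P-1 and j = i-1}
    return [(i, i - 1) for i in range(1, P)]

print(shift_edges(4))  # [(1, 0), (2, 1), (3, 2)]
\end{verbatim}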

Finally, the scalar execution behavior of individual tasks is described both by the source code of the task itself and, more abstractly, by symbolic scaling functions. The scaling function for a task node describes how task execution time varies with program input values and internal variables. Each loop node has symbolic expressions describing its loop bounds and stride, and each conditional branch has a symbolic branch condition. The scaling function for each computational task includes a symbolic parameter representing the actual task execution time per loop iteration. After synthesizing the STG, the compiler can generate an instrumented version of the parallel program to measure this per-iteration time directly for all computational task nodes in the program. Additional numerical parameters can be included in the STG, either architecture-independent parameters such as the number of floating-point operations between consecutive memory operations, or architecture-dependent parameters such as the number of cache misses or the actual task execution time per loop iteration.
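
For instance, a scaling function for a simple loop task could be evaluated as below once the measured per-iteration time and the loop's trip count are bound to values. The linear form used here is an illustrative assumption, since the text does not fix the shape of the scaling functions.

\begin{verbatim}
def task_time(t_iter, n_iters):
    # scaling function: (measured per-iteration time) * (symbolic trip
    # count), evaluated here with both quantities bound to values
    return t_iter * n_iters

# e.g. 2.5 microseconds/iteration measured by the instrumented run,
# and a loop whose symbolic bound N evaluates to 10000:
print(task_time(2.5e-6, 10000))   # 0.025 seconds
\end{verbatim}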

