

Implementation and Status

The implementation of the application representation requires several software components, which are described below.

The application representation has been implemented in an extension of the Rice dHPF compiler. The compiler constructs task graphs for the MPI programs that dHPF generates from an input HPF program, and it successfully captures the sophisticated computation partitionings and optimized communication patterns produced by the compiler. The compiler techniques used in this implementation are described in more detail elsewhere [4]. Briefly, the compiler first synthesizes the static task graph and its associated symbolic information after the code has been transformed for parallelization. It then optionally instantiates a dynamic task graph from the static task graph, if this is required for a particular modeling study. This instantiation uses a key capability of the dHPF infrastructure, namely the ability to generate code to enumerate symbolic integer sets and mappings. Finally, the compiler condenses the task graph, as described below.
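To make the representation concrete, the following is a minimal sketch of how a static task graph node might be represented in C. All type and field names here are hypothetical illustrations, not the actual dHPF data structures, which are described in [4].

/* Hypothetical sketch of a static task graph node. The names are
 * illustrative only; see [4] for the actual dHPF structures. */

typedef enum {
    TASK_COMPUTE,   /* a block of computation                    */
    TASK_COMM,      /* a communication event (e.g., an MPI call) */
    TASK_BRANCH,    /* a "significant" branch                    */
    TASK_LOOP       /* a loop header with a symbolic trip count  */
} TaskKind;

/* Opaque symbolic integer expression over program parameters
 * (problem size, number of processes, ...). */
typedef struct SymExpr SymExpr;

typedef struct Task {
    TaskKind      kind;
    SymExpr      *scaling;       /* symbolic scaling (work) function */
    double        per_iter_time; /* per-iteration time, filled in
                                    later by instrumentation         */
    struct Task **succ;          /* successor tasks in the graph     */
    int           nsucc;
} Task;

Under this picture, instantiating the dynamic task graph amounts to enumerating the symbolic integer sets and mappings in each node for concrete parameter values.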

Condensing the task graph happens in two stages in dHPF. First, before instantiating the dynamic task graph, we significantly condense the static task graph by collapsing any sequence of tasks (or loops) that contains no communication and no ``significant'' branches. (Significant branches are branches that can affect the execution time of the final condensed task.) Second, if the compiler instantiates the dynamic task graph, it further condenses this graph. When the dynamic task graph is instantiated, all significant branches are interpreted and resolved, i.e., eliminated from the graph. This may produce sequences of tasks allocated to the same process that are not interrupted by branches; these sequences are collapsed in the second step, as sketched below. The graph resulting from both steps is the final condensed task graph, as defined earlier.
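The second condensation step can be pictured as a single pass over each process's task sequence. The following is a minimal sketch under the hypothetical Task representation above; merge_into() is an assumed helper, shown only to keep the sketch self-contained.

/* Assumed helper (hypothetical): fold src's estimated execution
 * time into dst. */
void merge_into(Task *dst, const Task *src);

/* Condense a per-process task sequence in place. After branch
 * resolution, consecutive computation tasks on the same process
 * can be merged; any communication task starts a new condensed
 * task. */
void condense(Task **sched, int *n)
{
    int out = 0;
    for (int i = 0; i < *n; i++) {
        if (out > 0 &&
            sched[out - 1]->kind == TASK_COMPUTE &&
            sched[i]->kind == TASK_COMPUTE) {
            merge_into(sched[out - 1], sched[i]);
        } else {
            sched[out++] = sched[i];
        }
    }
    *n = out;  /* number of tasks after condensation */
}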

We have successfully used the compiler-generated static task graphs to improve the performance of MPI-Sim, a direct-execution parallel simulator for MPI programs developed at UCLA [2]. To integrate the two systems, additional support has been added to the dHPF compiler. Based on the static task graph, the compiler identifies those computations within the computational tasks whose results can affect the performance of the program (e.g., computations that determine loop bounds or the outcomes of significant branches). The compiler then generates an abstracted MPI program from the static task graph in which these ``essential'' computations are retained, while the remaining computations are replaced with symbolic estimates of their execution time. These symbolic estimates are derived directly from the scaling function of each computational task together with the task's per-iteration execution time. The compiler also generates an instrumented version of the parallel MPI program that measures the per-iteration execution time of each task. The non-essential computations do not have to be simulated in detail; instead, the simulator's clock can simply be advanced by the estimated execution time, as illustrated below. Furthermore, data used only in non-essential computations need not be allocated, leading to potentially large savings in the memory usage of the simulator.
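The sketch below illustrates what an abstracted task might look like for a one-dimensional stencil computation. MPISIM_AdvanceClock is a hypothetical simulator hook, and the routine and parameter names are illustrative assumptions; the actual integration is described in [2].

#include <mpi.h>

/* Hypothetical simulator hook: advances the simulated clock of the
 * calling process without executing any real computation. */
void MPISIM_AdvanceClock(double seconds);

/* Abstracted version of a 1-D stencil task. t_iter is the task's
 * per-iteration execution time, obtained from the compiler's
 * instrumented version of the program. */
void relax_abstracted(int lo, int hi, int left, int right,
                      double t_iter, MPI_Comm comm)
{
    MPI_Status st;
    double send_dummy = 0.0, recv_dummy;

    /* Essential computation: the loop bounds determine the
     * iteration count, which feeds the time estimate below. */
    int n = hi - lo + 1;

    /* Non-essential computation replaced by its estimate: advance
     * the simulated clock instead of executing n stencil updates. */
    MPISIM_AdvanceClock(n * t_iter);

    /* Communication is retained so the simulator models it in
     * detail; the original array need not be allocated, since only
     * the message size and endpoints matter here. */
    MPI_Sendrecv(&send_dummy, 1, MPI_DOUBLE, right, 0,
                 &recv_dummy, 1, MPI_DOUBLE, left, 0, comm, &st);
}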

Experimental results showed dramatic improvements in the simulator's performance without any significant loss in the accuracy of its predictions. For the benchmark programs we studied, the optimized simulator requires between 5 and 2000 times less memory and up to 10 times less time to execute than the original simulator. These savings allow us to simulate systems or problem sizes 10 to 100 times larger than is possible with the original simulator, with little loss in accuracy. For further details, the reader is referred to [2].

