... buffering).[*]
More precisely, the isend task should be split into two tasks: one that performs local initialization and does not depend on the irecv, and a second that can be initiated only once the irecv has been issued but that does not block the local computation on the sending node. This would simply require introducing additional communication tasks into the task graph.
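The split described above can be sketched as a small dependency graph. This is only an illustration under stated assumptions: the task names (`isend_init`, `isend_transfer`, `local_compute`) are hypothetical, and the edge from `isend_init` to `local_compute` is an assumed ordering on the sending node rather than something the text specifies. The point is that the irecv becomes a prerequisite of the second send task only, so it never appears among the prerequisites of the local computation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks it depends on.
# All task names are hypothetical and chosen for illustration only.
deps = {
    "irecv": set(),                             # posted on the receiving node
    "isend_init": set(),                        # local initialization, independent of irecv
    "isend_transfer": {"isend_init", "irecv"},  # may start only once the irecv is issued
    "local_compute": {"isend_init"},            # assumed ordering; no edge from isend_transfer
}


def ancestors(task, graph):
    """Return every task that must complete before `task` can start."""
    seen = set()
    stack = list(graph[task])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(graph[current])
    return seen


# One valid execution order that respects all dependencies.
order = list(TopologicalSorter(deps).static_order())

# The local computation is not held up by the receiver ...
compute_waits_on_irecv = "irecv" in ancestors("local_compute", deps)
# ... while the actual transfer is.
transfer_waits_on_irecv = "irecv" in ancestors("isend_transfer", deps)
```

With a single unsplit isend task that depended on the irecv, any computation ordered after the send would inherit that dependency; splitting the task confines it to `isend_transfer`.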