MUPI is a parallel discrete event simulator that supports software-based experimentation through simulated execution of parallel programs that use the Message Passing Interface (MPI). It handles programs ranging from synthetic benchmarks to unmodified applications, with millions of tasks. Here, we report work in progress on improving the efficiency of MUPI. Among the issues uncovered are scaling problems in implementing barriers and in ordering inter-task messages. Preliminary performance results show that hundreds of virtual MPI ranks can be supported per real processor core. We observe performance improvements of at least 2x. These improvements enable benchmark MPI runs with over 16 million virtual ranks, synchronized in discrete event fashion, on as few as 16,128 real cores of a Cray XT5.
[Pub 131]