A Case Study of the Performance of Speculative Asynchronous Simulation on Parallel Computers

Abstract

Modern supercomputers run thousands of processors in parallel to achieve high computational speeds. At such large processor counts, however, communication and synchronization operations can waste valuable processor time. Communication is the exchange of intermediate data that processors must share at runtime; synchronization ensures that operations on different processors occur in a correct relative order. In this work, we investigated the runtime efficiency of two methods aimed at reducing these costs: asynchronous updates, which target communication, and speculative execution, which targets synchronization. The experiments use a parallel finite difference time domain (FDTD) simulation developed at ORNL, which is widely applicable to simulating a variety of physical phenomena. The simulation uses an iterative algorithm that reduces communication by sending a message asynchronously only when the change in values on a given processor exceeds a threshold. We carried out an empirical performance study of this algorithm. For asynchronous updates, we explored how threshold-based communication affects the overall runtime of the parallel simulation as the number of processors increases. The asynchronous update scheme yielded a significant performance improvement on up to 64 processors, owing to reduced communication. We are currently addressing synchronization, to relax the tight coupling among processors, using speculative execution with sophisticated rollback techniques being developed in an ORNL Laboratory Directed Research and Development (LDRD) project. In speculative execution, processors proceed without waiting for one another, and any violations in the ordering of computations are detected and corrected by rolling back.
Further research is being done to implement a rollback mechanism n
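The threshold-based update rule can be illustrated with a minimal sketch. It assumes a 1D grid split across simulated processors (run sequentially in one Python process), each exchanging one boundary value with each neighbor; a simple Jacobi-style smoothing sweep stands in for the FDTD update. The names `Partition`, `relax`, `should_send`, `simulate`, and `make_parts` are hypothetical, and this is not the ORNL code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    """One simulated processor's slice of a 1D grid, plus one ghost cell per side."""
    values: List[float]
    left_ghost: float = 0.0       # latest value received from the left neighbor
    right_ghost: float = 0.0      # latest value received from the right neighbor
    last_sent_left: float = 0.0   # own left boundary value as of the last send
    last_sent_right: float = 0.0  # own right boundary value as of the last send


def relax(p: Partition) -> None:
    """One Jacobi-style sweep (stand-in for the FDTD update) using possibly stale ghosts."""
    padded = [p.left_ghost] + p.values + [p.right_ghost]
    p.values = [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
                for i in range(1, len(padded) - 1)]


def should_send(current: float, last_sent: float, threshold: float) -> bool:
    """Send only when the boundary value has drifted by more than `threshold`."""
    return abs(current - last_sent) > threshold


def simulate(parts: List[Partition], steps: int, threshold: float) -> int:
    """Run `steps` sweeps over all partitions and return the number of messages sent."""
    n = len(parts)
    # Initial synchronous exchange so every ghost cell starts consistent.
    for i, p in enumerate(parts):
        p.last_sent_left, p.last_sent_right = p.values[0], p.values[-1]
        if i > 0:
            parts[i - 1].right_ghost = p.values[0]
        if i < n - 1:
            parts[i + 1].left_ghost = p.values[-1]
    messages = 0
    for _ in range(steps):
        for p in parts:
            relax(p)
        for i, p in enumerate(parts):
            # Outer ghosts of the first and last partitions stay fixed at 0.0.
            if i > 0 and should_send(p.values[0], p.last_sent_left, threshold):
                parts[i - 1].right_ghost = p.values[0]
                p.last_sent_left = p.values[0]
                messages += 1
            if i < n - 1 and should_send(p.values[-1], p.last_sent_right, threshold):
                parts[i + 1].left_ghost = p.values[-1]
                p.last_sent_right = p.values[-1]
                messages += 1
    return messages


def make_parts(num_parts: int, cells_per_part: int) -> List[Partition]:
    """Split a global grid holding a single unit spike in the middle."""
    total = num_parts * cells_per_part
    grid = [0.0] * total
    grid[total // 2] = 1.0
    return [Partition(grid[k * cells_per_part:(k + 1) * cells_per_part])
            for k in range(num_parts)]


exact = simulate(make_parts(4, 8), steps=50, threshold=0.0)
lazy = simulate(make_parts(4, 8), steps=50, threshold=1e-3)
print(f"messages with threshold 0: {exact}; with threshold 1e-3: {lazy}")
```

With a threshold of 0 the scheme degenerates to sending every changed boundary value; a positive threshold trades some accuracy (neighbors keep computing with stale ghost values) for fewer messages.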
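Speculative execution with rollback can be sketched as a single optimistic, Time Warp-style logical process: it executes each event as soon as it arrives, saves its state before every event, and, when a "straggler" with an earlier timestamp shows up, restores the saved state and re-executes in timestamp order. This is a generic textbook scheme under our own assumptions (one process, a toy integer state, full state saving, no anti-messages or global virtual time), not the rollback techniques of the LDRD project; `SpeculativeProcess`, `receive`, and the event format are hypothetical.

```python
import bisect
from typing import List, Tuple

Event = Tuple[float, int]  # hypothetical (timestamp, digit) pair


class SpeculativeProcess:
    """Hypothetical optimistic logical process with state saving and rollback.

    The toy state appends each event's digit (state * 10 + digit), so the
    final value reveals the order in which events were applied.
    Anti-messages, multiple processes, and fossil collection are omitted.
    """

    def __init__(self) -> None:
        self.state = 0
        self.lvt = float("-inf")            # local virtual time
        self.processed: List[Event] = []    # executed events, in timestamp order
        self.snapshots: List[int] = []      # state saved just before each processed event
        self.rollbacks = 0

    def _execute(self, event: Event) -> None:
        self.snapshots.append(self.state)   # checkpoint before applying the event
        self.state = self.state * 10 + event[1]
        self.processed.append(event)
        self.lvt = event[0]

    def receive(self, event: Event) -> None:
        """Execute an event right away, rolling back first if it is a straggler."""
        if event[0] >= self.lvt:
            self._execute(event)            # speculate: no waiting for other processes
            return
        # Ordering violation: an event from the past arrived after later ones ran.
        self.rollbacks += 1
        k = bisect.bisect_right([t for t, _ in self.processed], event[0])
        undone = self.processed[k:]
        self.state = self.snapshots[k]      # restore state from before processed[k]
        del self.processed[k:]
        del self.snapshots[k:]
        self.lvt = self.processed[-1][0] if self.processed else float("-inf")
        # Re-execute the straggler and the undone events in timestamp order.
        for ev in sorted([event] + undone):
            self._execute(ev)


p = SpeculativeProcess()
for ev in [(1.0, 1), (3.0, 3), (2.0, 2)]:  # (2.0, 2) arrives late
    p.receive(ev)
print(p.state, p.rollbacks)  # prints "123 1"
```

After the rollback, the final state matches a purely sequential run in timestamp order, which is the correctness condition speculative execution must preserve; the price is saved state and re-executed work.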

[Pub 139]

http://science.energy.gov/wdts/

Kalyan Perumalla

Kalyan Perumalla is Founder and President of Discrete Computing, Inc. He led advanced research and development at ORNL and holds senior faculty appointments at UTK, GT, and UNL.

