Publication type: conference [total 59]
2010

On Deciding between Conservative and Optimistic Approaches on Massively Parallel Platforms Christopher Carothers and Kalyan Perumalla Winter Simulation Conference 2010 http://www.wintersim.org Invited
Abstract: Over 5000 publications on parallel discrete event simulation (PDES) have appeared in the literature to date. Nevertheless, few articles have focused on empirical studies of PDES performance on large supercomputer-based systems. This gap is bridged here, by undertaking a parameterized performance study on thousands of processor cores of a Blue Gene supercomputing system. In contrast to theoretical insights from analytical studies, our study is based on actual implementation in software, incurring the actual messaging and computational overheads for both conservative and optimistic synchronization approaches of PDES. Complex and counter-intuitive effects are uncovered and analyzed, with different event timestamp distributions and available levels of concurrency in the synthetic benchmark models. The results are intended to provide guidance to the PDES community in terms of how the synchronization protocols behave at high processor core counts using a state-of-the-art supercomputing system.

Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models Kalyan Perumalla, Sudip Seal 24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation (PADS 2010) 2010 http://www.pads-workshop.org/pads2010.html Best Paper Finalist
Abstract: The spatial scale, runtime speed, and behavioral detail of epidemic outbreak simulations altogether require the use of large-scale parallel processing. Here, an optimistic parallel discrete event execution of a reaction-diffusion simulation model is presented. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving.
Parallel speedup and other runtime performance metrics of the system are tested on a small (8,192-core) Blue Gene / P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundreds of millions in the largest case) are exercised.

Compiler-based Automation Approaches to Reverse Computation Kalyan Perumalla, Christopher Carothers 24th ACM/IEEE/SCS Workshop on Principles of Advanced and Distributed Simulation (PADS 2010) 2010 http://www.pads-workshop.org/pads2010.html Workshop on Reverse Computation
Abstract: Automation is useful to facilitate reverse code generation from normal code. Here, we describe our source-to-source compilation approaches to automatic reverse code generation, developed along three different translation tools/frameworks. At RPI, we are developing frameworks based on PIPS, which is a purely source-to-source translation and optimization tool for parallel computing, and on CLANG/LLVM, which is a full compiler complete with backend processing and optimizers. At ORNL we are continuing development of the seminal reverse computation framework called RCC (Reverse C Compiler) system. For all three systems, we present some implementation issues and challenges encountered in our development.

Efficient Simulation of Agent-Based Models on Multi-GPU and Multi-Core Clusters Brandon G. Aaby, Kalyan S. Perumalla, Sudip K. Seal 3rd International ICST Conference on Simulation Tools and Techniques (SimuTools) 2010 http://www.simutools.org Best Paper Finalist
Abstract: An effective latency-hiding mechanism is presented in the parallelization of agent-based model simulations (ABMS) with millions of agents. The mechanism is designed to accommodate the hierarchical organization as well as heterogeneity of current state-of-the-art parallel computing platforms. We use it to explore the computation vs.
communication trade-off continuum available with the deep computational and memory hierarchies of extant platforms and present a novel analytical model of the tradeoff. We describe our implementation and report preliminary performance results on two distinct parallel platforms suitable for ABMS: CUDA threads on multiple, networked graphical processing units (GPUs), and pthreads on multi-core processors. Message Passing Interface (MPI) is used for inter-GPU as well as inter-socket communication on a cluster of multiple GPUs and multi-core processors. Results indicate the benefits of our latency-hiding scheme, delivering over 100-fold improvement in runtime for certain benchmark ABMS application scenarios with several million agents. This speed improvement is obtained on our system that is already two to three orders of magnitude faster on one GPU than an equivalent CPU-based execution in a popular simulator in Java. Thus, the overall execution of our current work is over four orders of magnitude faster when executed on multiple GPUs.

µπ: A Scalable and Transparent System for Simulating MPI Programs Kalyan Perumalla 3rd International ICST Conference on Simulation Tools and Techniques (SimuTools) 2010 http://www.simutools.org
Abstract: µπ is a scalable, transparent system for experimenting with the execution of parallel programs on simulated computing platforms. The level of simulated detail can be varied for application behavior as well as for machine characteristics. Unique features of µπ are repeatability of execution, scalability to millions of simulated (virtual) MPI ranks, scalability to hundreds of thousands of host (real) MPI ranks, portability of the system to a variety of host supercomputing platforms, and the ability to experiment with scientific applications whose source-code is available.
The set of source-code interfaces supported by µπ is being expanded to support a wider set of applications, and MPI-based scientific computing benchmarks are being ported. In proof-of-concept experiments, µπ has been successfully exercised to spawn and sustain very large-scale executions of an MPI test program given in source code form. Low slowdowns are observed, due to its use of purely discrete event style of execution, and due to the scalability and efficiency of the underlying parallel discrete event simulation engine, µsik. In the largest runs, µπ has been executed on up to 216,000 cores of a Cray XT5 supercomputer, successfully simulating over 27 million virtual MPI ranks, each virtual rank containing its own thread context, and all ranks fully synchronized by virtual time.

High-Performance Simulations for Capturing Feedback and Fidelity in Complex Networked Systems Kalyan Perumalla SIAM Conference on Parallel Processing for Scientific Computing (PP10) 2010 http://www.siam.org/meetings/pp10/ Abstract and Presentation in MS44 Computational Network Science
Abstract: In a variety of complex networked systems, simulation is a powerful method to capture critical feedback effects among inter-dependent processes. Network-based phenomena in areas such as cyberinfrastructure, transportation, epidemiology, and social networks, all offer important analysis problems that need such feedback effects to be accurately captured. However, accurate modeling of feedback effects requires increased levels of model fidelity. Moreover, such high-fidelity, feedback-heavy models are especially characterized by very high computational needs. In this backdrop, the need for high-fidelity simulations is illustrated, with examples of how they are driving new high-performance computing-based solutions in the aforementioned areas.
Our parallel computing approaches are described in the context of very large-scale, high-fidelity simulations in regional-scale transportation network simulations, nation-scale epidemiological simulations, and Internet simulations with detailed models of millions of nodes.

Towards Highly Interactive, GPU-based Evaluation of Evacuation Transport Scenarios at State-Scale Kalyan Perumalla, Brandon Aaby, Srikanth Yoginath, Sudip Seal National Evacuation Conference 2010 http://www.nationalevacuationconference.org
Abstract: In large-scale scenarios, transportation modeling and simulation is severely constrained by simulation time. For example, few real-time simulators exist that can scale to evacuation traffic scenarios at the level of an entire state such as Louisiana (approx. 1 million links) or Florida (2.5 million links). New modeling techniques are needed to overcome severe computational demands of conventional (microscopic or mesoscopic) modeling techniques. Here, a modeling and execution methodology is explored which holds potential to provide a tradeoff among the level of behavioral detail, the scale of transportation network, and real-time execution capabilities. A novel, field-based modeling technique, and its implementation on graphical processing units (GPUs) are presented, as a step forward in enabling large-network transportation modeling and simulation. Although additional research with input from domain experts is needed for refining and validating the models, the techniques reported here afford interactive experience at hitherto unimaginable scales of multi-million road segments. Illustrative experiments on a few state-scale networks are described based on our implementation of this approach in a software system called GARFIELD-EVAC.

2009

Cyber Security Experimentation: Gory Detail or None at All?
Kalyan Perumalla SIAM Annual Meeting 2009 http://www.siam.org/meetings/an09/ Abstract and presentation
Abstract: Unique facets confronted by current cyber security analysis efforts are: tremendous pace of change of ground rules (axioms), apparently wide and deep phenomenological effects, and widely-varying interpretations of security objectives. Together, effective methods for cyber security analysis appear to be swung between two extremes: experimentation-based methods with full, gory detail, and abstraction-based methods with significant simplifications. For methods in between, accuracy considerations tend to swing them rapidly back towards full gory detail, while scientific inquiry and efficiency considerations tend to swing them back towards abstractions. Based on our experience and past evidence, we argue that experimentation with gory detail is the most effective approach in the short- to medium-term, while the other extreme is the relevant one for the longer term. Feasibility will be shown of sustaining the scale and fidelity for the former extreme, namely, experiments with full gory detail.

Switching to High Gear: Opportunities for Grand-scale Real-time Parallel Simulations Kalyan Perumalla ACM International Symposium on Distributed Simulations and Real-time Applications 2009 Keynote
Abstract: The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with 10^5 processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching the parallel simulation research to a higher gear to exploit the new, immense levels of computational power.
The potential for grand-scale real-time solutions is illustrated using preliminary results from prototypes in four example application areas: (a) state- or regional-scale vehicular mobility modeling, (b) very large-scale epidemic modeling, (c) modeling the propagation of wireless network signals in very large, cluttered terrains, and (d) country- or world-scale social behavioral modeling. We believe the stage is perfectly poised for the parallel/distributed simulation community to envision and formulate similar grand-scale, real-time simulation-based solutions in many application areas.

GPU-based Real-Time Execution of Vehicular Mobility Models in Large-Scale Road Network Scenarios Kalyan Perumalla, Brandon Aaby, Srikanth Yoginath and Sudip Seal Proceedings of International Workshop on Principles of Advanced and Distributed Simulation 2009
Abstract: A methodology and its associated algorithms are presented for mapping a novel, field-based vehicular mobility model onto a graphical processing unit computational platform for simulating mobility in large-scale road networks. Of particular focus is the achievement of real-time execution, on desktop platforms, of vehicular mobility on road networks comprised of millions of nodes and links, and multi-million counts of simultaneously active vehicles. The methodology is realized in a system called GARFIELD, whose implementation details and performance study are described. The runtime characteristics of a prototype implementation are presented that show real-time performance in simulations of networks at the scale of a few states of the US road networks.

A Connectionist Modeling Approach to Rapid Analysis of Emergent Social Cognition Properties in Large-Populations Kalyan S. Perumalla and Jack C.
Schryver Human Behavior-Computational Modeling and Interoperability Conference 2009
Abstract: Traditional modeling methodologies, such as those based on rule-based agent modeling, are exhibiting limitations in application to rich behavioral scenarios, especially when applied to large population aggregates. Here, we propose a new modeling methodology based on a well-known "connectionist approach," and articulate its pertinence in new applications of interest. This methodology is designed to address challenges such as speed of model development, model customization, model reuse across disparate geographic/cultural regions, and rapid and incremental updates to models over time.

Coping at the User-Level with Resource Limitations in the Cray Message Passing Toolkit MPI at Scale: How Not to Spend Your Summer Vacation Richard Mills, Forrest Hoffman, Patrick Worley, Kalyan Perumalla, Art Mirin, Glenn Hammond and Barry Smith Proceedings of Cray User Group Meeting 2009

Scalable Parallel Execution of an Event-based Radio Signal Propagation Model for Cluttered 3D Terrains Sudip Seal and Kalyan Perumalla Proceedings of International Conference on Parallel Processing 2009

2008

Efficient Execution on GPUs of Field-based Vehicular Mobility Models Kalyan Perumalla Proceedings of International Workshop on Principles of Advanced and Distributed Simulation 2008

Data Parallel Execution Challenges and Runtime Performance of Agent Simulations on GPUs Kalyan Perumalla and Brandon Aaby Proceedings of Spring Computer Simulation Conference 2008 Best Paper Award

Parallel Vehicular Traffic Simulations using Reverse Computation-based Optimistic Execution Srikanth Yoginath and Kalyan Perumalla Proceedings of International Workshop on Principles of Advanced and Distributed Simulation 2008

2007

Scaling Time Warp-based Discrete Event Execution to 10^4 Processors on a Blue Gene Supercomputer

An Analysis Approach to Large-Scale Vehicular Network Simulations

2006

On Evaluation Needs of Real-life Sensor
Network Deployments Alfred Park, Kalyan S. Perumalla, Vladimir Protopopescu, Mallikarjun Shankar, Frank DeNap and Bryan Gorman Proceedings of European Modeling and Simulation Symposium (EMSS) 2006

Parallel and Distributed Simulation: Traditional Techniques and Recent Advances

A Systems Approach to Scalable Transportation Network Modeling

Discrete-Event Execution Alternatives on GPGPUs Kalyan S. Perumalla Proceedings of International Workshop on Principles of Advanced and Distributed Simulation 2006

On Accounting for the Interplay of Kinetic and Non-Kinetic Aspects of Population Mobility Models

Parallel Execution of Region-Scale Evacuation Traffic Models Kalyan S. Perumalla Proceedings of International Workshop on Principles of Advanced and Distributed Simulation 2006

Scalable Simulation of Electromagnetic Hybrid Codes Kalyan S. Perumalla, Richard M. Fujimoto, Homa Karimabadi Proceedings of International Conference on Computational Science 2006

2005

Virtual Simulator: An Infrastructure for Design and Performance-Prediction of Massively Parallel Codes Kalyan Perumalla, Richard Fujimoto, Santosh Pande, Homa Karimabadi, Jonathan Driscoll, and Yuri Omelchenko Proceedings of Eos Transactions, American Geophysical Union Fall Meeting 2005 Abstract and presentation
Abstract: Large parallel/distributed scientific simulations are very complex, and their dynamic behavior is hard to predict. Efficient development of massively parallel codes remains a computational challenge. For example, almost none of the kinetic codes in use in space physics today have dynamic load balancing capability. Here we present a new infrastructure for design and prediction of parallel codes. Performance prediction is useful to analyze, understand and experiment with different partitioning schemes, multiple modeling alternatives and so on, without having to run the application on supercomputers.
Instrumentation of the model (with the least perturbation to performance) is useful to glean key metrics and understand application-level behavior. Unfortunately, traditional approaches to virtual execution and instrumentation are limited by either slow execution speed or low resolution or both. We present a new high-resolution framework that provides a virtual CPU abstraction (with a full thread context per CPU), yet scales to thousands of virtual CPUs. The tool, called PDES2, presents different levels of modeling interfaces, from general purpose parallel simulations to parallel grid-based particle-in-cell (PIC) codes. The tool itself runs on multiple processors in order to accommodate the high resolution by distributing the virtual execution across processors. Validation experiments of PIC models in the framework using a 1-D hybrid shock application show close agreement of results from virtual executions with results from actual supercomputer runs. The utility of this tool is further illustrated through an application to a parallel global hybrid code.

A New Simulation Technique for Study of Collision-less Shocks: Self Adaptive Simulations Homa Karimabadi, Yuri Omelchenko, Jonathan Driscoll, Richard Fujimoto and Kalyan Perumalla Proceedings of 4th Annual International Astrophysics Conference (IGPP) 2005

A New Methodology for Multi-scale Simulation of Plasmas Homa Karimabadi, Yuri Omelchenko, Jonathan Driscoll, Richard Fujimoto and Kalyan Perumalla Proceedings of 7th International Symposium for Space Simulations (ISSS) 2005

µsik - A Micro-kernel for Parallel/Distributed Simulation Systems Kalyan S. Perumalla Proceedings of ACM/IEEE/SCS Workshop on Parallel and Distributed Simulation (PADS) 2005

Performance Prediction of Large-scale Parallel Discrete Event Models of Physical Systems Kalyan S. Perumalla, Richard M.
Fujimoto, Prashant Thakare, Santosh Pande, Homa Karimabadi, Yuri Omelchenko, Jonathan Driscoll Proceedings of Winter Simulation Conference (WSC) 2005

Optimistic Parallel Discrete Event Simulations of Physical Systems using Reverse Computation Yarong Tang, Kalyan Perumalla, Richard Fujimoto, Homa Karimabadi, Jonathan Driscoll and Yuri Omelchenko Proceedings of ACM/IEEE/SCS Workshop on Parallel and Distributed Simulation (PADS) 2005

2004

Conservative Synchronization of Large-scale Network Simulations Alfred Park, Richard Fujimoto and Kalyan Perumalla Proceedings of ACM/IEEE/SCS Workshop on Parallel and Distributed Simulation (PADS) 2004

A New Approach to Modeling Physical Systems: Discrete Event Simulations of Grid-based Models Homa Karimabadi, Yuri Omelchenko, Jonathan Driscoll, N. Omidi, Richard Fujimoto and Kalyan Perumalla Proceedings of Workshop on State-Of-The-Art in Scientific Computing (PARA) 2004

High Fidelity Modeling of Computer Network Worms Kalyan Perumalla and Srikanth Sundaragopalan Proceedings of Annual Computer Security Applications Conference (ACSAC) 2004

2003

Scalable RTI-based Parallel Simulation of Networks Kalyan Perumalla, Alfred Park, Richard Fujimoto and George Riley Proceedings of ACM/IEEE/SCS Workshop on Parallel and Distributed Simulation (PADS) 2003

Large-Scale Network Simulation - How Big? How Fast?
Richard Fujimoto, Kalyan Perumalla, Alfred Park, Hao Wu, Mostafa Ammar, and George Riley Proceedings of IEEE/ACM International Symposium on Modeling, Analysis and Simulation of Computer Telecommunication Systems (MASC 2003

Power-aware State Dissemination in Mobile Distributed Virtual Environments Weidong Shi, Kalyan Perumalla and Richard Fujimoto Proceedings of ACM/IEEE/SCS Workshop on Parallel and Distributed Simulation (PADS) 2003

2002

Using Reverse Circuit Execution for Efficient Parallel Simulation of Logic Circuits Kalyan Perumalla and Richard Fujimoto Proceedings of The International Society for Optical Engineering (SPIE) Annual Meeting 2002

Experiences Applying Parallel and Interoperable Network Simulation Techniques in On-line Simulations of Military Networks Kalyan Perumalla, Richard Fujimoto, Thom McLean and George Riley Proceedings of ACM/IEEE/SCS Workshop on Parallel and Distributed Simulation (PADS) 2002

Web Services for Extensible Modeling and Simulation Kalyan S. Perumalla Proceedings of Workshop on Extensible Modeling and Simulation Framework (XMSF) 2002

Updateable Simulations Steve Ferenci, Richard Fujimoto, Mostafa Ammar, Kalyan Perumalla and George Riley Proceedings of ACM/IEEE/SCS Workshop on Parallel and Distributed Simulation (PADS) 2002

2001

Distributed Network Simulations using the Dynamic Simulation Backplane George Riley, Mostafa Ammar, Richard Fujimoto, Donghua Xu and Kalyan Perumalla Proceedings of the International Conference on Distributed Computing Systems, April 2001 (ICDCS) 2001

Virtual Time Synchronization over Unreliable Network Transport Kalyan Perumalla and Richard Fujimoto Proceedings of ACM/IEEE/SCS Workshop on Parallel and Distributed Simulation (PADS) 2001

2000

Design of High-performance RTI software Richard Fujimoto, Thom McLean, Kalyan Perumalla and Ivan Tacic Proceedings of Distributed Simulations and Real-time Applications (DS-RT) 2000

An Approach to Federating Parallel Simulators Steve Ferenci, Kalyan Perumalla and
Richard Fujimoto Proceedings of ACM/IEEE/SCS Workshop on Parallel and Distributed Simulation (PADS) 2000

1999

The Effect of State Saving in Optimistic Simulation on a Cache-coherent Non-uniform Memory Access (CC-NUMA) Architecture Christopher Carothers, Kalyan Perumalla and Richard Fujimoto Proceedings of the Winter Simulation Conference 1999

Efficient Optimistic Parallel Simulation using Reverse Computation Christopher Carothers, Kalyan Perumalla and Richard Fujimoto Proceedings of ACM/IEEE/SCS Workshop on Parallel and Distributed Simulation (PADS) 1999

PARINO: A Parallel Branch and Cut Code Jeff Linderoth, Kalyan Perumalla and Martin Savelsbergh Proceedings of INFORMS National Meeting 1999

1998

Efficient Large-scale Process-oriented Parallel Simulations

1997

Time Parallel Generation of Self-similar ATM Traffic Ioannis Nikolaidis, Anthony Cooper, Kalyan Perumalla and Richard Fujimoto Proceedings of the Winter Simulation Conference (WSC) 1997

A Virtual PNNI Network Testbed Kalyan Perumalla, Matthew Andrews and Sandeep Bhatt Proceedings of the Winter Simulation Conference (WSC) 1997

PARINO, A Parallel Integer Optimizer Martin Savelsbergh, Kalyan Perumalla, Jeff Linderoth, and Umakishore Ramachandran Proceedings of International Symposium on Mathematical Programming 1997

1996

An Efficiency Prediction Method for ATM Multiplexer Kalyan Perumalla, Anthony Cooper, Richard Fujimoto Proceedings of Broadband Communications 1996

1994

Parallelizing Sequential Algorithms for the Generalized Assignment Problem Ivan Yanasak, Gautam Shah, Kalyan Perumalla, et al DIMACS Challenge of Parallel Computing 1994

Parallel Algorithms for Maximum Sub-sequence and Sub-array Kalyan Perumalla and Narsingh Deo Proceedings of International Conference on Combinatorics, Graph Theory and Computing 1994

1993

Integrating Aggregate and Vehicle Level Simulations Clark Karr, Robert Francescini and Kalyan Perumalla Proceedings of the 3rd Conference on Computer Generated Forces and Behavioral Representation
1993

A Distributed Algorithm for Ear Decomposition Sridhar Hannenhalli, Kalyan Perumalla and Narayan Chandrasekharan Proceedings of International Conference on Computing and Information (ICCI) 1993

A Debugging Environment for PVM Uday Vemulapati and Kalyan Perumalla Distributed Computing for Aeroscience Applications 1993

1992

Integrating Battlefield Simulations of Different Granularity Clark Karr, Robert Francescini and Kalyan Perumalla Proceedings of the Southeastern Simulation Conference 1992