This is a report of the runtime performance profiling efforts and results for the ECP ExaSGD project (Exascale Computing Project – Stochastic Grid Dynamics at Exascale).
This paper presents a new methodology and algorithms to automatically identify cybersecurity-related claims expressed in natural language form in ICS device documents. The verification pipeline includes automated vendor identification, device document curation, and feature claim identification via sentiment analysis. Our novel matching engine represents the first automated information system available in the cybersecurity domain to directly aid ICS compliance reporting.
We introduce ClaimsBERT, a feature-claims classifier for cybersecurity literature analytics. ClaimsBERT outperforms all other evaluated frameworks aimed at improving the cybersecurity of industrial control systems (ICS).
This report captures a computer science-oriented view of research in PDES and important PDES applications. Needs are outlined in core areas of PDES research as well as cross-cutting directions that positively impact scientific advancement. A selection of priority research directions in advanced computing for PDES is identified.
We extend the classic Naming Game to multiple hearers per speaker in each conversation even while allowing simultaneous speaking and hearing. We simulate the impact on the rate of convergence by varying the number of hearers and investigate the impact of different network types on the global convergence. Multiple network types and agent population sizes are used in the simulation experiments.
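For illustration, the following is a minimal sketch of one multi-hearer conversation round, assuming a simple generalization of the classic success rule (agreeing parties collapse to the spoken word); the paper's exact update rule, network types, and population sizes are not reproduced here.

```python
# Minimal multi-hearer Naming Game round (assumed success rule:
# agreeing hearers and the speaker collapse to the spoken word).
import random

def play_round(vocab, network, num_hearers=3, rng=random):
    """One conversation: a random speaker utters a word to several neighbors."""
    speaker = rng.choice(list(network))
    hearers = rng.sample(network[speaker],
                         min(num_hearers, len(network[speaker])))
    if not vocab[speaker]:
        vocab[speaker].add(f"w{rng.randrange(10**6)}")  # invent a new word
    word = rng.choice(sorted(vocab[speaker]))
    successes = [h for h in hearers if word in vocab[h]]
    for h in successes:
        vocab[h] = {word}               # agreement: collapse to the word
    for h in set(hearers) - set(successes):
        vocab[h].add(word)              # failure: hearer learns the word
    if successes:
        vocab[speaker] = {word}         # speaker collapses on any success

# Toy run on a ring of 20 agents.
network = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
vocab = {i: set() for i in network}
for _ in range(20_000):
    play_round(vocab, network)
print(len({w for v in vocab.values() for w in v}), "distinct words remain")
```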
A characterization of software development metadata is presented in terms of the distributions that best capture the trends in the datasets. This characterization feeds the machine learning components of ZeroIn, which exploit connectivity among the sets of repositories, commits, and developers.
Using multiple classifiers, we verify the feasibility of using metadata from synthetic datasets modeled by a characterization of a few large software repositories and developer profiles. Results show that the metadata-based learning approach appears promising for early flagging of potentially buggy commits in software repositories.
Two key cyber elements affecting the grid’s timing capability are Cyber Resilience and Cyber Trust. The timing services of DarkNet are systematically subjected to a range of cyber phenomena that stress four key performance factors, namely: Accuracy, Manageability, Telemetry, and Visibility. This analysis is designed to provide insights into four important categories of undesirable cyber phenomena: Loss of View (LoV), Loss of Control (LoC), Manipulation of View (MoV), and Manipulation of Control (MoC).
We present preliminary results from characterizing the distribution of 452 million commits in a metadata listing from GitHub repositories. Based on multiple distributions, we find the best and second-best fits across different ranges in the data. The characterization is aimed at synthetic repository generation suitable for use in simulation and machine learning.
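As an illustration of the fitting step, the sketch below ranks a few candidate distributions by log-likelihood over synthetic commit counts; the candidate set and the placeholder data are assumptions, not the GitHub metadata itself.

```python
# Sketch: fit candidate distributions to commit counts and rank fits by
# log-likelihood to obtain best and second-best fits. Synthetic data
# stands in for the real GitHub metadata.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
commits = rng.lognormal(mean=3.0, sigma=1.2, size=10_000)  # placeholder

candidates = {"lognorm": stats.lognorm, "gamma": stats.gamma,
              "expon": stats.expon}
scores = {}
for name, dist in candidates.items():
    params = dist.fit(commits)                       # maximum-likelihood fit
    scores[name] = np.sum(dist.logpdf(commits, *params))

ranked = sorted(scores, key=scores.get, reverse=True)
print("best fit:", ranked[0], " second-best fit:", ranked[1])
```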
We introduce CyBERT, a cybersecurity feature claims classifier based on bidirectional encoder representations from transformers and a key component in our semi-automated cybersecurity vetting for industrial control systems (ICS)… The results showed that CyBERT outperforms these models on validation accuracy and F1 score, validating CyBERT’s robustness and accuracy as a cybersecurity feature claims classifier.
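For readers unfamiliar with the underlying mechanics, the sketch below shows how a BERT-based sequence classifier is typically applied to a candidate claim sentence via the Hugging Face transformers API; the generic bert-base-uncased checkpoint and the two-label scheme are stand-ins, not the actual CyBERT model.

```python
# Sketch of BERT-based sentence classification, as a stand-in for a
# feature-claims classifier like CyBERT. The checkpoint below is the
# generic "bert-base-uncased" model, not the CyBERT weights, so its
# untrained classification head gives arbitrary labels until fine-tuned.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # claim vs. non-claim (assumed labels)
)

sentence = "The device supports TLS 1.2 for all remote connections."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print("predicted label:", logits.argmax(dim=-1).item())
```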
A novel algorithm has been designed, developed, and implemented on modern GPU accelerators, and benchmarked on networks with billions of edges, including Facebook and Twitter networks. Rates of generation exceed 50 billion edges per second.
Mixed Integer Programming (MIP) is a powerful abstraction in combinatorial optimization… Here, we recount the conventional processor-based strategies and focus on configurations where the most promising intersection lies between parallel MIP solver approaches and the specific strengths of accelerated parallel platforms.
A mesoscopic modeling approach is described that strikes a middle ground between macroscopic models based on coupled differential equations and microscopic models built on fine-grained behaviors at the individual entity level. Execution of our implementation scaled to 8192 GPUs of supercomputing platforms demonstrates the ability to rapidly evaluate what-if scenarios several orders of magnitude faster than the conventional methods.
A simplified circuit is used as a case study to uncover and highlight key considerations in the use of traditional numerical simulation methods, and to compare their results with those obtained from alternative methods that are discrete event-based from the outset. Results show the regimes in which the traditional numerical methods and the alternative discrete event methods are applicable, and highlight the need for discrete event approaches that precisely and efficiently resolve the switching dynamics produced by power electronics systems important in emerging grid scenarios, such as large-scale renewable energy.
Cyber-physical systems span a wide spectrum, from long-lived legacy systems to more modern installations. Trust is an issue that arises across the spectrum, albeit with different variants of goals and constraints. On one end of the spectrum, legacy systems are characterized by function-based designs in which trust is an implicitly built-in concept…
A solution is needed for vetting the vendor-supplied feature claims and their adherence to cybersecurity requirements and standards. We are presently engaged in an effort to develop such a system. This paper demonstrates one vital aspect of this effort by proposing an end-to-end framework to accumulate a large repository of ICS device information for this vetting system, curate the dataset, and conduct extensive processing. This framework is designed to use web scraping, data analytics, and Natural Language Processing (NLP) techniques to identify vendor websites, automate the collection of website-accessible documents, and automatically derive metadata from them for identification of product documents relevant to the repository…
Our algorithm cuPPA generates scale-free networks using the preferential-attachment model, custom-designed to exploit multiple GPUs. We generate extremely large scale-free networks of 4 trillion edges in less than 8 minutes using 1,008 NVIDIA Volta GPUs of the Summit supercomputer. This represents the first-ever graph network generation at this scale of parallel execution with over a thousand GPUs. Moreover, our algorithm is uniquely suitable for generating networks in a streaming mode without the need to explicitly store (write to disk) the entire network, and is suitable for targeting even larger scales with quadrillions of edges.
A more comprehensive and practical treatment of memory remains to be undertaken to visit and answer memory-related unknowns. Some of the answers could have a profound impact on classical memory technologies. Reversible execution restores symmetry between memory and computation, correctly reinstating the cost of memory state restoration in the aggregate memory cost.
We study the impact of edge additions on community structure using Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks. We show that, for a fixed network size, the impact of edge additions is greater on networks with initially weak community structure than on networks with strongly clustered structures. We also find that the perceived impact depends on the community detection algorithm used to uncover the communities.
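The shape of such an experiment can be sketched with networkx: generate an LFR benchmark graph, add random edges, and compare the modularity of detected communities before and after. Parameter values below follow the networkx documentation example and are illustrative only.

```python
# Sketch: LFR benchmark graph, random edge additions, and before/after
# modularity of detected communities. Parameters are illustrative.
import random
import networkx as nx
from networkx.algorithms import community

G = nx.LFR_benchmark_graph(
    n=250, tau1=3, tau2=1.5, mu=0.1,
    average_degree=5, min_community=20, seed=10,
)
G.remove_edges_from(nx.selfloop_edges(G))

def detected_modularity(graph):
    """Modularity of communities found by a greedy detection algorithm."""
    comms = community.greedy_modularity_communities(graph)
    return community.modularity(graph, comms)

before = detected_modularity(G)

# Add 100 random edges between previously unconnected node pairs.
rng = random.Random(0)
nodes = list(G)
added = 0
while added < 100:
    u, v = rng.sample(nodes, 2)
    if not G.has_edge(u, v):
        G.add_edge(u, v)
        added += 1

after = detected_modularity(G)
print(f"modularity before: {before:.3f}, after: {after:.3f}")
```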
Cybersecurity auditing for Operational Technology (OT) is critical and has been largely missing from cybersecurity research, especially in the energy sector. In this paper, we present a novel “cybersecurity vetting” approach (CYVET) to the problem of verification and validation of cybersecurity in complex cyber-physical installations underlying modern energy grid systems.
We present message passing interface (MPI)-based distributed-memory parallel algorithms for generating random scale-free networks using the preferential-attachment model. The algorithms have been exercised at scale and speed to generate scale-free networks with one trillion edges in 6 minutes using 1,000 processing elements (PEs).
https://ashraem.confex.com/ashraem/w20/meetingapp.cgi/Paper/26352
We describe an approach to detecting and preventing cyber attacks by continuously comparing the infrastructure state with a real-time digital-twin simulation of it. Specifically, we describe and demonstrate a Digital Twin Framework (DTF) designed to detect and eventually prevent such attacks. The canal lock system’s digital twin uses a recurrent neural network trained from the experimental data collected via the DTF.
The objective of this work is to create an optimal schedule for HVAC operation that reduces cost while satisfying the homeowner's and the equipment's constraints, using model-free Reinforcement Learning (RL)-based optimization. This optimization is addressed through the development of an initial learning testbed and the implementation of RL techniques in a real home. Our preliminary results showed a 17% reduction in total cost and a 15% reduction in power utilization using our RL-based HVAC model, RL-HEMS.
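A minimal tabular Q-learning sketch conveys the flavor of such an optimization; the toy thermostat dynamics, flat tariff, and comfort band below are assumptions, not the RL-HEMS testbed.

```python
# Toy tabular Q-learning for HVAC scheduling: discretized indoor
# temperature, binary on/off action, reward penalizing energy cost
# and discomfort. All parameters are illustrative placeholders.
import random

TEMPS = range(15, 31)            # discretized indoor temperature (deg C)
ACTIONS = (0, 1)                 # 0 = HVAC off, 1 = HVAC on (cooling)
COMFORT = (20, 24)               # acceptable comfort band
PRICE = 0.15                     # flat tariff, $/kWh (assumed)

Q = {(t, a): 0.0 for t in TEMPS for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.95, 0.1

def step(temp, action):
    """Toy dynamics: cooling lowers temperature, idling lets it drift up."""
    nxt = max(15, temp - 1) if action else min(30, temp + 1)
    cost = PRICE if action else 0.0
    discomfort = max(0, COMFORT[0] - nxt) + max(0, nxt - COMFORT[1])
    return nxt, -(cost + discomfort)   # reward penalizes cost + discomfort

temp = 28
for _ in range(50_000):
    a = random.choice(ACTIONS) if random.random() < eps else \
        max(ACTIONS, key=lambda x: Q[(temp, x)])
    nxt, r = step(temp, a)
    best_next = max(Q[(nxt, x)] for x in ACTIONS)
    Q[(temp, a)] += alpha * (r + gamma * best_next - Q[(temp, a)])
    temp = nxt

policy = {t: max(ACTIONS, key=lambda a: Q[(t, a)]) for t in TEMPS}
print(policy)   # roughly: cool above the comfort band, idle below it
```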
Here, we focus on coordinated intelligence about normal and abnormal phenomena from multiple geographically co-located sensors monitoring and controlling a set of co-located devices. Given a set of co-located sensors, we develop an intelligent approach that automatically determines the 'normal' patterns of behavior among the correlated sensors. After normal behavior is extracted, subsequent monitoring detects deviant variations over time.
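The core idea can be sketched as follows: learn the pairwise correlation signature of co-located sensors over a normal window, then flag later windows whose signature deviates beyond a threshold. The signals and the threshold below are illustrative assumptions.

```python
# Sketch: baseline pairwise-correlation signature of co-located sensors,
# then deviation detection on a later window. Data and threshold are
# illustrative placeholders.
import numpy as np

rng = np.random.default_rng(0)

def corr_signature(window):
    """Upper-triangular pairwise correlations of a (time x sensors) window."""
    c = np.corrcoef(window.T)
    iu = np.triu_indices_from(c, k=1)
    return c[iu]

# Training: two sensors tracking the same process, one independent.
t = np.linspace(0, 20, 2000)
base = np.sin(t)
normal = np.column_stack([base + 0.1 * rng.standard_normal(2000),
                          base + 0.1 * rng.standard_normal(2000),
                          rng.standard_normal(2000)])
baseline = corr_signature(normal)

# Monitoring: sensor 1 is tampered with (decoupled from the process).
tampered = normal.copy()
tampered[:, 1] = rng.standard_normal(2000)
deviation = np.max(np.abs(corr_signature(tampered) - baseline))
print("max correlation deviation:", round(float(deviation), 2))
print("alarm:", deviation > 0.5)   # threshold is an assumed parameter
```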
HPC facilities used for scientific computing draw enormous amounts of energy, consuming many megawatt-hours. Using simulation cloning, we reduce energy consumption and track the power drawn by thousands of GPUs, achieving significant aggregate energy savings from cloned simulations.
We empirically study the effectiveness of Recurrent Neural Network (RNN)-based models as the basis of digital twin (DT)-based resilience and uncover the important characteristics of an RNN-based solution through experimentation on a lab-scale Canal Lock CPS emulator with live validations and attack scenarios. For the first time, we demonstrate actual, real-time use of an RNN-based model as a DT performing live analysis on an operational CPS.
In this paper, we present our research and development efforts aimed at addressing the gap in discovering sensors at level 0 in industrial CPS by building a system called Deep-cyberia (Deep Cyber-Physical System Interrogation and Analysis) that incorporates algorithms and interfaces aimed at uncovering sensors and computing estimates of correlations among them.
Our focus is to extract volatile and dynamically changing internal information from CPS level 0-1 devices, and to design preliminary schemes that exploit the extracted information. As a case study, we apply the proposed methodology to a Modicon PLC using the Modbus protocol. We extract the memory layout and subject the device to read operations at the most critical regions of memory. This capability of generating a sequence of volatile memory snapshots for offline, detailed, and sophisticated analysis opens a new class of cybersecurity schemes for CPS forensic analysis, taint analysis, and watermarking.
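A sketch of the kind of read sweep involved is shown below using the pymodbus library; the host address, register range, and block size are placeholders, and the paper's actual extraction procedure for Modicon devices is not reproduced here.

```python
# Sketch: sweep holding-register reads over a device's register space
# with pymodbus, in the spirit of snapshotting volatile regions.
# HOST, ranges, and BLOCK are placeholders; exercise care on real PLCs.
from pymodbus.client import ModbusTcpClient

HOST = "192.168.1.10"      # placeholder PLC address
BLOCK = 64                 # registers per read request (assumed)

client = ModbusTcpClient(HOST)
client.connect()
snapshot = {}
for addr in range(0, 1024, BLOCK):
    rr = client.read_holding_registers(addr, count=BLOCK)
    if not rr.isError():
        snapshot[addr] = rr.registers   # one block of the memory snapshot
client.close()
print(f"captured {len(snapshot)} blocks")
```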
We propose a new redundancy reduction technique for large-scale discrete event simulations, called exact-differential simulation, which simulates only the altered portions of scenarios and their influences in repeated executions while still achieving the same results as the re-execution of entire simulations. We evaluate our approach using two case studies: the PHOLD benchmark and a traffic simulation of Tokyo.
In this paper, we report and analyze performance results from native execution of deep learning on a leadership-class high-performance computing (HPC) system. Using our new code called DeepEx, we present a study of the parallel speed up and convergence rates of learning achieved with native parallel execution. Scaling results are reported from execution on up to 15,000 GPUs using two scientific data sets from atom microscopy and protein folding applications, and also using the popular ImageNet data set.
A novel parallel algorithm, cuPPA, is presented for generating random scale-free networks using the preferential attachment model. The algorithm is custom-designed for the single-instruction multiple-data (SIMD) architecture of GPUs. Our algorithm is the first to exploit GPUs, and also the fastest implementation available today, for generating scale-free networks. On an NVIDIA GeForce 1080 GPU, cuPPA generates a scale-free network of two billion edges in less than 3 s. On multi-GPU platforms, cuPPA-Hash generates a scale-free network of 16 billion edges in less than 7 s using a machine consisting of 4 NVIDIA Tesla P100 GPUs.
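For reference, below is a sequential sketch of preferential attachment via the copy model, the formulation that GPU generators in this line of work build on: a new vertex attaches either to a uniformly random earlier vertex or to an endpoint of a uniformly random earlier edge, the latter being equivalent to degree-proportional selection. The mixing probability and sizes are illustrative, and no GPU parallelism is shown.

```python
# Sequential copy-model sketch of preferential attachment (illustrative;
# not the cuPPA GPU implementation).
import random

def preferential_attachment(n, d, p=0.5, seed=0):
    """Generate edges for n vertices, d edges per new vertex."""
    rng = random.Random(seed)
    edges = []
    for v in range(d, n):                   # vertices 0..d-1 seed the process
        targets = set()
        while len(targets) < d:
            if not edges or rng.random() < p:
                u = rng.randrange(v)         # uniform earlier vertex
            else:
                # Random endpoint of a random earlier edge: this choice is
                # degree-proportional, yielding preferential attachment.
                u = rng.choice(rng.choice(edges))
            targets.add(u)
        edges.extend((v, u) for u in targets)
    return edges

edges = preferential_attachment(n=10_000, d=4)
print(len(edges), "edges generated")
```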
Formation of a butterfly from a pupa, extraction of a live dove from a magician’s empty hat, generation of new particles from high-energy particle collisions and spawning a new dream world from mind in sleep are all examples of a common, fuzzy notion called ‘emergence’. In this paper, I pin the concept of emergence to the element of surprise in a phenomenon. I categorise the various notions of emergence into three main classes. These definitions are used to explain instances of emergence, organised along a continuous spectrum as normality, magic, miracle and error.
In this paper, we present an agent-based model (and a scalable approximation of it) in a closely related spirit. The central feature of this model is that wealth enables an individual to secure more wealth. Using historical data, we initialize the model with US wealth shares in 1988 and show that the model tracks wealth share changes from 1988 to 2012. Simulations to 2088 project that the top 0.01% of the population will possess more than 70% of the total wealth in the economy.
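A toy version of the "wealth begets wealth" mechanism can be sketched in a few lines: each round awards a unit of wealth to an agent chosen with probability proportional to current wealth. The population size, round count, and equal initial shares are illustrative, not the paper's 1988-calibrated data.

```python
# Toy "wealth begets wealth" dynamics: each round, one unit of wealth
# goes to an agent picked with probability proportional to wealth.
import random

def simulate(wealth, rounds, seed=0):
    rng = random.Random(seed)
    agents = range(len(wealth))
    for _ in range(rounds):
        winner = rng.choices(agents, weights=wealth, k=1)[0]
        wealth[winner] += 1.0
    return wealth

wealth = simulate([1.0] * 500, rounds=50_000)   # equal start (assumed)
top = sum(sorted(wealth, reverse=True)[:5])     # top 1% of 500 agents
print(f"top 1% wealth share: {top / sum(wealth):.1%}")
```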
https://www.osti.gov/biblio/1468255
Cloning is a technique to efficiently simulate a tree of multiple what-if scenarios that are unraveled during the course of a base simulation. We present the conceptual simulation framework, algorithmic foundations, and runtime interface of CloneX, a new system we designed for scalable simulation cloning.
A novel parallel algorithm, cuPPA, is presented for generating random scale-free networks using the preferential-attachment model. In one of the best cases, when executed on an NVIDIA GeForce 1080 GPU, cuPPA generates a scale-free network of two billion edges in less than 3 seconds.
Here, we develop a new concurrent model as a relaxation of the classical formulation of the Naming Game and express it in a discrete event style of evaluation. Further, with the uncovered concurrency that was absent in the classical algorithm, we map the concurrent model to parallel discrete event simulation. We present a parallel performance study on networks with hundreds of thousands of individuals.
A novel parallel algorithm, cuPPA, is presented for generating random scale-free networks using the preferential-attachment model. In one of the best cases, when executed on an NVIDIA GeForce 1080 GPU, cuPPA generates a scale-free network of a billion edges in less than 2 seconds.
https://www.sciencedirect.com/science/article/abs/pii/S016792601630044X
We define a new problem of computing the intersections among arbitrarily nested hollow spheres of possibly different sizes, thicknesses, positions, and nesting levels. We describe a new algorithm designed to solve this nested hollow sphere intersection problem and implement it for parallel execution on graphical processing units (GPUs). We present first results about the runtime performance and scaling to hundreds of thousands of spheres, and compare the performance with that from a leading solid object intersection package also running on GPUs.
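The per-pair predicate at the heart of this problem can be stated compactly: two spherical shells intersect exactly when the interval of distances from one center realized by the other shell overlaps the first shell's own [inner, outer] radius interval. The CPU sketch below implements only this pairwise test, not the paper's batched GPU algorithm.

```python
# Reference pairwise test for hollow (shell) sphere intersection.
# A shell is the region between an inner and an outer radius about a
# center. This is a CPU sketch of the predicate only.
import math

def shells_intersect(ca, ra_in, ra_out, cb, rb_in, rb_out):
    d = math.dist(ca, cb)
    # Distances from center A attainable by points of shell B form an
    # interval [lo, hi] (shell B is connected, distance is continuous):
    lo = max(0.0, d - rb_out, rb_in - d)
    hi = d + rb_out
    # Intersection iff that interval overlaps shell A's radius interval:
    return lo <= ra_out and ra_in <= hi

# Example: a small shell nested off-center inside a thin outer shell,
# reaching far enough to touch it.
print(shells_intersect((0, 0, 0), 0.9, 1.0, (0, 0, 0.5), 0.4, 0.6))  # True
```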
https://www.osti.gov/biblio/1347338
https://ieeexplore.ieee.org/abstract/document/7828553
https://www.sciencedirect.com/science/article/pii/S1571066116300706
https://ieeexplore.ieee.org/abstract/document/7560215
https://www.osti.gov/biblio/1408647
Andelfinger’s thesis online
There are many key concepts that, even while being part of everyday life, elude definition. One such is “Art.” Here, possible ways are identified to define Art, along with a description of a few factors that underlie the challenge in arriving at a definition. Additionally, a candidate definition from a scientist’s viewpoint is proposed for an abstract, encompassing model.
Cloud and virtual machine (VM) technologies present new challenges with respect to performance and monetary cost in executing parallel discrete event simulation (PDES) applications…
https://smartech.gatech.edu/bitstream/handle/1853/52321/YOGINATH-DISSERTATION-2014.pdf
A solution methodology and implementation components are presented that can uncover unwanted, unin-t…
Global virtual time (GVT) computation is a key determinant of the efficiency and runtime dynamics of parallel discrete event simulations (PDES)…
This tutorial provides an introduction to the concept of reversible computing, adopting an expanded view…
This tutorial introduces the fundamental principles and algorithms underlying parallel/distributed discrete event simulation (PDES)…
[Pub 150] http://www.acm-sigsim-pads.org/
Reverse computation is presented here as an important future direction in addressing the challenge o…
In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, …
Problems such as fault tolerance and scalable synchronization can be efficiently solved using reversibility of applications…
In modeling and simulating complex systems such as mobile ad-hoc networks (MANETs) in defense communications, …
Romdhanne’s thesis online
The algorithmic and implementation principles are explored in gainfully exploiting GPU accelerators in conjunction with multicore processors on high-end systems…
Virtual machine (VM) technologies, especially those offered via Cloud platforms, present new dimensions with respect to performance and cost in executing parallel discrete event simulation (PDES) applications…
SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands…
Few books comprehensively cover the software and programming aspects of reversible computing. Fillin…
With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES)…
Consider a system of N identical hard spherical particles moving in a d-dimensional box and undergoi…
Consider a system of N identical hard spherical particles moving in a d-dimensional box and undergoing elastic, possibly multi-particle, collisions…
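The core kernel of event-driven hard-sphere simulation is the pairwise collision-time prediction, obtained by solving |Δx + Δv·t| = σ for the earliest positive root; a minimal sketch follows, independent of the paper's multi-particle treatment.

```python
# Pairwise collision-time prediction for hard spheres: solve
# |dx + dv*t| = sigma (sum of radii) for the earliest future contact.
import math

def collision_time(x1, v1, x2, v2, sigma):
    """Earliest t > 0 at which the two spheres touch, or None."""
    dx = [a - b for a, b in zip(x1, x2)]
    dv = [a - b for a, b in zip(v1, v2)]
    b = sum(a * c for a, c in zip(dx, dv))
    if b >= 0:
        return None                  # not approaching: no collision
    dv2 = sum(a * a for a in dv)
    dx2 = sum(a * a for a in dx)
    disc = b * b - dv2 * (dx2 - sigma * sigma)
    if disc < 0:
        return None                  # paths never come within sigma
    return (-b - math.sqrt(disc)) / dv2

# Two unit-radius spheres (sigma = 2) approaching head-on along x:
print(collision_time((0, 0, 0), (1, 0, 0), (5, 0, 0), (-1, 0, 0), 2.0))
# -> 1.5: they start 5 apart, close at rate 2, and touch at gap 2.
```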
Direct solvers based on prefix computation and cyclic reduction algorithms exploit the special struc…
To keep up with the increasing number of processing elements in parallel/distributed computing, traditional tightly-coupled time-stepped models…
In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects a…
This tutorial introduces the typical hardware and software characteristics of extant and emerging supercomputing systems…
In prior work (Yoginath and Perumalla, 2011; Yoginath, Perumalla and Henz, 2012), the motivation, challenges and issues were articulated in favor of virtual time ordering of virtual machines…
SIESTA is capable of computing three-dimensional plasma equilibria with magnetic islands at high spatial resolutions for toroidally confined plasmas…
We report the results of a scaling effort that increases both the speed and resolution of the SIESTA…
The next generation of scalable network simulators employ virtual machines (VMs) to act as high-fidelity models of traffic producer/consumer nodes…
In large-scale scenarios, transportation modeling and simulation is severely constrained by simulati…
Virtual machine (VM)-based simulation is a method used by network simulators to incorporate realistic application behaviors by executing actual VMs as high-fidelity surrogates for simulated end-hosts…
Parallel discrete event simulation (PDES) represents a class of codes that are challenging to scale to large number of processors…
[Pub 128] http://www.aps.org/meetings/meeting.cfm?name=DPP11
MUPI is a parallel discrete event simulator designed for enabling software-based experimentation via…
[Pub 136] http://science.energy.gov/ascr/ascac/meetings/nov-2011/
Radio signal strength estimation is essential in many applications, including the design of military…
Future electric grid technology is envisioned on the notion of a smart grid in which responsive end-…
µπ is a scalable, transparent system for experimenting with the execution of parallel progr…
Parallelizing a domain-specific production code with thousands of lines, developed over several years, is a daunting task…
A block tri-diagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that…
Automation is useful to facilitate reverse code generation from normal code. Here, we describe our …
An effective latency-hiding mechanism is presented in the parallelization of agent-based model simul…
In a variety of complex networked systems, simulation is a powerful method to capture critical feedb…
Over 5000 publications on parallel discrete event simulation (PDES) have appeared in the literature …
The spatial scale, runtime speed, and behavioral detail of epidemic outbreak simulations altogether …
Ultra-scale supercomputing hardware is a reality, reaching peta-scale recently and now moving to exa…
In large-scale scenarios, transportation modeling and simulation is severely constrained by simulati…
Traditional modeling methodologies, such as those based on rule-based agent modeling, are exhibiting…
The study of human social behavioral systems is finding renewed interest in military, homeland secur…
Unique facets confronted by current cyber security analysis efforts are: tremendous pace of change o…
A methodology and its associated algorithms are presented for mapping a novel, field-based vehicular…
Graphical processing units (GPUs) are now established as efficient, alternative computing platforms …
We present a perfectly reversible method for bi-directional generation of samples from computational…
Vehicular traffic simulations are useful in applications such as emergency planning and traffic mana…
Radio signal strength estimation is essential in many applications, including the design of military…
The recent emergence of dramatically large computational power, spanning desktops with multi-core pr…
The recent emergence of dramatically large computational power, spanning desktops with multi-core pr…
Agent-based simulation has been both a large area of study and a widely used tool for scientific research in past years…
Modern supercomputers use thousands of processors running in parallel to achieve their high computat…
Currently, state-saving is employed in many large simulations to realize rollback. Reverse computati…
A detailed introduction to the design, implementation, and use of network simulation tools is presen…
Large parallel/distributed scientific simulations are very complex, and their dynamic behavior is ha…
We describe an approach and our experiences in applying federated simulation techniques to create la…
[Pub 63] ftp://ftp.cc.gatech.edu/pub/tech_reports/1994/GIT-CC-94-44.ps.Z
ISBN 978-1439873403, 1st Edition
Chapman and Hall/CRC
This is a seminal book on reversible computing. Collecting scattered knowledge into one coherent account, this book provides a compendium of both classical and recently developed results on reversible computing. It offers an expanded view of the field that includes the traditional energy-motivated hardware viewpoint as well as the emerging application-motivated software approach. It explores up-and-coming theories, techniques, and tools for the application of reversible computing. The topics covered span several areas of computer science, including high-performance computing, parallel/distributed systems, computational theory, compilers, power-aware computing, and supercomputing.
ISBN 978-1598291100
Morgan & Claypool
A detailed introduction to the design, implementation, and use of network simulation tools is presented. The requirements and issues faced in the design of simulators for wired and wireless networks are discussed. Abstractions such as packet- and fluid-level network models are covered. Several existing simulations are given as examples, with details and rationales regarding design decisions presented. Issues regarding performance and scalability are discussed in detail, describing how one can utilize distributed simulation methods to increase the scale and performance of a simulation environment. Finally, a case study of two simulation tools is presented that have been developed using distributed simulation techniques. This text is essential to any student, researcher, or network architect desiring a detailed understanding of how network simulation tools are designed, implemented, and used.