Software

Listing

The following is a selected set of software artifacts I developed or co-developed.

| Package | Description | Impact |
| --- | --- | --- |
| Deep CYBERIA | Novel software system to detect, identify, correlate, and visualize sensors beyond the network-reachable cyber surface; ORNL research transitioning to the Air Force. | Deployment package using Wireshark, PyShark, Python, Gephi, CRViz. |
| ExaCorona | Scalable generator of simulated datasets for COVID and similar pandemics, aimed at three dimensions of scalability. https://github.com/perumallaks/exacorona | Runs on Linux, macOS, and the Summit supercomputer. |
| MutEnt | Novel mutual entropy computation code for highly scalable and efficient image registration operations on large, high-volume images. | Runs on CPU and GPU platforms (C++, CUDA). Beats the best-known open source implementations available in OpenCV. |
| DeepEx | Manager for novel ORNL code for deep learning, designed for a very light software footprint, scaling to large heterogeneous platforms (GPU and multicore CPU); highly portable compiled implementation for high performance. | Runs on supercomputing platforms with GPUs and CPUs (C++, MPI, CUDA with cuDNN and NCCL). Tested on several networks (VGGNet, etc.) and image data sets. |
| RBLAS | Reversible version of the basic linear algebra subprograms (BLAS) interface, with an implementation that works over traditional (irreversible) BLAS. | The only available reversible linear algebra library. Portable across GPUs and CPUs (C, C++, FORTRAN, CUDA). |
| μπ (MUPI) | The world’s most scalable simulator of Message Passing Interface (MPI) programs. | Tested on up to 216,000 processor cores of a Cray XT5; supports over 227 million virtual tasks. |
| libSynk | Library for high-performance time-synchronized communication on distributed memory platforms; written in C, over sockets, MPI, and shared memory. | Employed by most leading distributed network simulators, including pdns, DaSSF, and GTNetS. |
| µsik | Novel PDES “micro-kernel” unifying most existing virtual time-synchronization techniques; written in C++. | Designed for scalable Time Warp as well as conservative synchronization, for execution on 216,000 processor cores. Being applied to large-scale space physics DES models, neurological simulations, and others. |
| TeD | Domain-specific language and compiler for automated Time Warp-based execution of network models. www.cc.gatech.edu/computing/pads/teddoc.html | Precursor to currently leading parallel/distributed network simulators. Widely disseminated worldwide and well cited in the literature. |
| FDK | High-performance realization of the Department of Defense High Level Architecture (HLA) Runtime Infrastructure (RTI). www.cc.gatech.edu/computing/pads/fdk.html | Among the very few source-available HLA RTI implementations. Well recognized in the HLA community. |
| PARINO | Parallel/distributed branch-and-cut solver for mixed integer programming (MIP). | Incorporated novel cut sharing and distributed management mechanisms. |

Approach

In general, I am a proponent of lean and mean software – that which has as small a footprint as feasible, with as few external dependencies as possible.

I wonder to what extent importing external code is enabling versus constraining. I also wonder about the disk, memory, computation, and network costs the world is paying in return for the functionality and reuse gained from extreme modularization and sharing.