Software
Listing
The following is a selected set of software artifacts I developed or co-developed.
Package | Description | Impact |
Deep CYBERIA | Novel software system to detect, identify, correlate, and visualize sensors beyond the network-reachable cyber surface; ORNL research transitioning to the Air Force | Deployment package using Wireshark, PyShark, Python, Gephi, CRViz. |
ExaCorona | Scalable generator of simulated datasets for COVID and similar pandemics, aimed at three dimensions of scalability. https://github.com/perumallaks/exacorona | Runs on Linux, macOS, and the Summit supercomputer. |
MutEnt | Novel mutual entropy computation code for highly scalable and efficient image registration on large, high-volume images. | Runs on CPU and GPU platforms (C++, CUDA). Beats the best-known open-source implementations available in OpenCV. |
DeepEx | Manager for novel ORNL Deep Learning code designed for a very light software footprint, scaling to large heterogeneous platforms (GPU and multicore CPU), with a highly portable compiled implementation for high performance. | Runs on supercomputing platforms with GPUs and CPUs (C++, MPI, CUDA with cuDNN and NCCL). Tested on several networks (VGGNet, etc.) and image data sets. |
RBLAS | Reversible version of the basic linear algebra subprograms (BLAS) interface, with an implementation that works over traditional (irreversible) BLAS. | The only available reversible linear algebra library. Portable across GPUs and CPUs (C, C++, FORTRAN, CUDA). |
μπ (MUPI) | The world’s most scalable simulator of Message Passing Interface (MPI) programs. | Tested on up to 216,000 processor cores of Cray XT5; supports over 227 million virtual tasks |
libSynk | Library for high performance time-synchronized communication on distributed memory platforms; written in C, over sockets, MPI & shared memory. | Employed by most leading distributed network simulators including pdns, DaSSF & GTNetS |
µsik | Novel PDES “micro-kernel”, unifying most existing virtual time-synchronization techniques; written in C++. | Designed for scalable Time Warp as well as conservative synchronization in executions on 216,000 processor cores. Being applied to large-scale space physics DES models, neurological simulations, and others. |
TeD | Domain-specific language and compiler for automated Time Warp-based execution of network models. www.cc.gatech.edu/computing/pads/teddoc.html | Precursor to currently leading parallel/distributed network simulators. Widely disseminated worldwide and well cited in the literature. |
FDK | High-performance realization of the Department of Defense High Level Architecture (HLA) Runtime Infrastructure (RTI). www.cc.gatech.edu/computing/pads/fdk.html | Among the very few source-available HLA RTI implementations. Well recognized in the HLA community. |
PARINO | Parallel/distributed branch-and-cut solver for mixed integer programming (MIP) | Incorporated novel cut sharing and distributed management mechanisms |
Approach
In general, I am a proponent of lean and mean software: software with as small a footprint as feasible and as few external dependencies as possible.
I wonder to what extent import is enabling versus constraining. I also wonder about the disk, memory, computation, and network costs we are paying in the world in return for functionality and reuse from extreme modularization and sharing.