An Accuracy-Maximization Approach for Claims Classifiers in Document Content Analytics for Cybersecurity

Kimia Ameri, Michael Hempel, Hamid Sharif, Juan Lopez, Kalyan Perumalla

June 2022

Abstract

This paper presents our research approach and findings on maximizing the accuracy of our feature-claims classifier for cybersecurity literature analytics, and introduces the resulting model, ClaimsBERT. Its architecture, selected after extensive evaluation of different approaches, concatenates a feature map with a Bidirectional Encoder Representations from Transformers (BERT) model. We discuss deployment of this new concept and the research insights that led to the selection of Convolutional Neural Networks for its feature-mapping component. We also present results showing that ClaimsBERT outperforms all other evaluated approaches. This new claims classifier is an essential processing stage within our vetting framework, which aims to improve the cybersecurity of industrial control systems (ICS). Furthermore, to maximize the accuracy of ClaimsBERT, we propose an approach for selecting the optimal architecture and determining optimized hyperparameters, in particular the learning rate, number of convolutions, filter sizes, activation function, number of dense layers, and the number of neurons and dropout rate for each layer. Fine-tuning these hyperparameters increased classification accuracy from 76%, obtained with the original BertForSequenceClassification model, to 97% with ClaimsBERT.
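The hyperparameter selection described in the abstract can be pictured as a search over a configuration space. The following is a minimal sketch only: the parameter names follow the abstract, but the candidate values, the helper names (`SEARCH_SPACE`, `iter_configs`, `grid_search`, `placeholder_evaluate`), and the use of exhaustive grid search are assumptions made for illustration, and the paper's actual selection procedure may differ.

```python
from itertools import product

# Hypothetical search space: the hyperparameter names follow the abstract,
# but every candidate value below is an illustrative assumption.
SEARCH_SPACE = {
    "learning_rate": [1e-5, 2e-5, 5e-5],
    "num_conv_layers": [1, 2],
    "filter_size": [2, 3, 5],
    "activation": ["relu", "tanh"],
    "num_dense_layers": [1, 2],
    "neurons_per_layer": [64, 128],
    "dropout_rate": [0.1, 0.3],
}


def iter_configs(space):
    """Yield every combination in the search space as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(space, evaluate):
    """Return (best_config, best_score) for a caller-supplied scoring function.

    Ties keep the first configuration encountered.
    """
    best_config, best_score = None, float("-inf")
    for config in iter_configs(space):
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def placeholder_evaluate(config):
    """Stand-in scorer so the sketch runs; NOT a real model or real accuracy.

    In practice this would train the classifier with `config` and return
    its validation accuracy.
    """
    return (-abs(config["learning_rate"] - 2e-5)
            - 0.01 * abs(config["dropout_rate"] - 0.1))


if __name__ == "__main__":
    best, score = grid_search(SEARCH_SPACE, placeholder_evaluate)
    print(best, score)
```

Replacing `placeholder_evaluate` with a function that trains and validates a real classifier would turn the sketch into an actual search. Exhaustive enumeration grows multiplicatively with each added parameter, so random or staged search is a common alternative for larger spaces.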

Type

Journal article

Publication

Journal of Cybersecurity and Privacy

Open Access: https://www.mdpi.com/2624-800X/1/4/31

Cyber-Physical Cybersecurity Energy Grid AI ML NLP CYVET

Kalyan Perumalla

As a Federal Program Manager in Advanced Scientific Computing Research at the U.S. Dept. of Energy, Office of Science, Kalyan Perumalla manages a $100-million R&D portfolio covering AI, HPC, Quantum, SciDAC, and Basic Computer Science. Over 25 years of R&D leadership experience, he previously led advanced R&D for 17 years as a Distinguished Research Staff Member at the Oak Ridge National Laboratory (ORNL), developing scalable software and applications on the world’s largest supercomputers, including as a line manager and a founding group leader. He has held senior faculty and adjunct appointments at UTK, GT, and UNL, and was an IAS Fellow at Durham University.
