Research on correctness testing techniques for artificial intelligence models
-
Abstract
In recent years, natural language processing and image recognition models have been widely deployed in military equipment. However, the quality and performance of artificial intelligence (AI) models vary widely, which makes effective evaluation and testing difficult, and testing methods tailored to military AI models are urgently needed to evaluate their correctness. AI models are trained on large volumes of raw data and then output predictions through the trained model. Because of the "black-box" nature of these models and problems such as uneven data distribution in the raw datasets, prediction errors occur frequently and can lead to safety incidents that endanger human life and property. Measuring the correctness of AI models has therefore become a critical research topic. This paper approaches the problem from two dimensions: model data and model algorithms. For model data, a data quality evaluation scheme based on clustering algorithms is proposed to identify the raw data that most strongly influence model decision-making. Dataset quality is then improved by modifying or removing the redundant portions of the dataset. For model algorithms, a correctness verification scheme based on fuzz testing is introduced: small perturbations are applied to the original inputs to generate mutated inputs, and the correctness of the model is verified by detecting misclassifications of these mutated inputs. In addition, to assess test adequacy, a neuron coverage-guided testing method is proposed. It maximizes coverage of the neuron state space in order to expose the points at which the model misjudges, thereby improving the correctness of AI models. Together, the proposed techniques aim to provide a systematic approach to evaluating and enhancing the reliability of AI models in mission-critical military applications.
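To make the fuzz testing and neuron coverage ideas in the abstract concrete, the following is a minimal sketch and not the implementation used in this paper. It assumes a toy 2-3-2 ReLU network with hand-picked weights, a neuron counted as covered when its activation exceeds a threshold of 0, and an oracle under which a mutant that stays within a small perturbation budget of its seed should keep the seed's label. All names (`forward`, `activated`, `mutate`, `fuzz`) and all parameter values are hypothetical.

```python
import random

# Hypothetical fixed weights for a toy 2-3-2 ReLU network (illustrative only).
W1 = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]  # hidden-layer weights (3 x 2)
B1 = [0.0, 0.0, -0.2]                        # hidden-layer biases
W2 = [[1.0, -1.0, 0.5], [-1.0, 1.0, 0.5]]    # output-layer weights (2 x 3)
B2 = [0.0, 0.0]                              # output-layer biases


def forward(x):
    """Return (hidden activations, predicted label) for input x."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(W1, B1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W2, B2)]
    return hidden, logits.index(max(logits))


def activated(hidden, threshold=0.0):
    """Indices of hidden neurons whose activation exceeds the threshold."""
    return {i for i, h in enumerate(hidden) if h > threshold}


def mutate(parent, origin, step, budget, rng):
    """Perturb each feature slightly, clipped to stay within budget of the seed."""
    mutant = []
    for p, o in zip(parent, origin):
        v = p + rng.uniform(-step, step)
        mutant.append(min(o + budget, max(o - budget, v)))
    return mutant


def fuzz(seeds, step=0.1, budget=0.3, iterations=300, rng_seed=0):
    """Coverage-guided fuzzing: returns (neuron coverage, misclassified mutants)."""
    rng = random.Random(rng_seed)
    # Each corpus entry remembers the seed it descends from.
    corpus = [(list(s), list(s)) for s in seeds]
    covered = set()
    for s in seeds:
        covered |= activated(forward(s)[0])
    failures = []
    for _ in range(iterations):
        origin, parent = rng.choice(corpus)
        mutant = mutate(parent, origin, step, budget, rng)
        hidden, label = forward(mutant)
        # Oracle (assumption): a small perturbation should not change the label.
        if label != forward(origin)[1]:
            failures.append((origin, mutant))
        # Keep mutants that activate previously uncovered neurons.
        new = activated(hidden) - covered
        if new:
            covered |= new
            corpus.append((origin, mutant))
    return len(covered) / len(B1), failures


if __name__ == "__main__":
    coverage, failures = fuzz([[0.1, 0.0], [0.0, 0.1]])
    print("neuron coverage:", coverage)
    print("misclassified mutants:", len(failures))
```

Clipping each mutant to a fixed budget around its seed keeps the label-preservation oracle meaningful even when mutants are derived from earlier mutants. A real test harness would replace the toy network with the model under test and choose the budget according to what counts as an imperceptible change for that input domain.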
-
-