
Verifying and validating AI in safety-critical systems

Lucas Garcia is a product manager at MathWorks.


AI regulations and verification and validation (V&V) processes will significantly impact safety-critical systems. Engineers need to employ V&V techniques that provide explainability and transparency for the AI models that run those systems. And as they use AI to aid in their V&V processes, it’s essential to explore a variety of testing approaches that address the increasingly complex challenges of AI technologies.

The EU AI Act, which entered into force in August 2024, is the first comprehensive legal framework on artificial intelligence. It addresses the risks of AI and positions Europe to play a leading role globally. It sets clear requirements and obligations for AI developers and deployers regarding specific uses of AI. At the same time, it aims to reduce administrative and financial burdens for businesses, especially small and medium-sized enterprises (SMEs).

In October 2023, the United States White House issued an executive order on AI regulation, highlighting the importance of robust verification and validation (V&V) processes for AI-enabled systems. The order requires AI companies to test specific models and report on them, to ensure that AI systems function as intended and meet specified requirements.

As countries worldwide establish AI regulations, engineers designing AI-enabled systems must meet these newly introduced specifications and standards. This includes safety-critical applications in industries such as automotive and aerospace, where AI is increasingly used in system design. For these AI-enabled safety-critical systems, V&V procedures are becoming crucial to obtaining industry certifications and complying with legal requirements.

V&V in AI-enabled systems

Verification determines whether an AI model is designed and developed per the specified requirements. Validation involves checking whether the product has met the client’s needs and expectations. By employing V&V techniques, engineers can ensure that the AI model’s outputs meet specifications, allowing for early bug detection and data bias mitigation.

One advantage of using AI in safety-critical systems is that AI models can approximate physical systems and validate the design. Engineers simulate entire AI-enabled systems and use the data to test systems in different scenarios, including outlier events. Performing V&V in safety-critical scenarios ensures that an AI-enabled safety-critical system can maintain its performance level under various circumstances.

Most industries that develop AI-enhanced products require engineers to comply with standards before going to market. These certification processes ensure that specific elements are built into these products. Engineers perform V&V to test the functionality of such elements, which makes it easier to obtain certifications.

In the automotive industry, ISO/CD PAS 8800 is a standard being developed to address safety-related properties and risk factors for road vehicles. In aerospace and defense, where certification is mandatory, existing standards such as Software Considerations in Airborne Systems and Equipment Certification (DO-178C) can’t always directly address the unique challenges posed by AI. For this reason, the new ARP6983 process standard is being created to provide guidelines for developing and certifying aeronautical safety-related products implementing AI. Tools such as Deep Learning Toolbox Verification Library and MATLAB Test can help engineers in aviation and automotive adhere to industry standards by streamlining the verification and testing of AI models within larger systems.

The W-shaped development process is a non-linear V&V workflow designed to ensure the accuracy and reliability of AI models. Credit: MathWorks

V&V for safety-critical AI

When performing V&V, the engineer’s goal is to ensure that the AI component meets the specified requirements, is reliable under all operating conditions and, therefore, is safe and ready for deployment. The V&V process for AI involves performing software assurance activities that include a combination of static and dynamic analyses, testing, formal methods and real-world operational monitoring.

V&V processes may vary slightly across industries, but the overarching steps are: analyzing the decision-making process to solve the black box problem, testing the model against representative datasets, conducting AI system simulations and ensuring the model operates within acceptable bounds. These steps are iterative, allowing for continuous refinement and improvement of the AI system as engineers collect new data, gain new insights and integrate operational feedback.

Black box problem

When engineers use an AI model to add automation to a system, one issue that arises is the black box problem. Understanding how AI-based systems make decisions is crucial to providing transparency, enabling engineers and scientists to build trust in model predictions and comprehend decision-making. To tackle the black box problem, they can use feature importance analysis and explainability techniques.

Feature importance analysis is a technique that helps engineers identify which input variables impact a model’s predictions most significantly. Although the analysis works differently for different models, such as tree-based and linear models, the general procedure assigns a feature importance score to each input variable. A higher importance score signifies that the feature has a greater impact on the model’s decision. In the case of a safety-critical system in the automotive industry, variables may include environmental factors, such as precipitation or the presence and behavior of other vehicles.
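One model-agnostic way to compute such scores is permutation importance: shuffle one input at a time and measure how much the prediction error grows. The sketch below is a minimal illustration using only the Python standard library. The braking-distance model, its three inputs and its coefficients are hypothetical stand-ins, not taken from any real system.

```python
import random

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Score each feature by how much shuffling it increases the mean absolute error."""
    rng = random.Random(seed)

    def mae(rows):
        return sum(abs(predict(r) - t) for r, t in zip(rows, y)) / len(y)

    baseline = mae(X)
    scores = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other feature untouched
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increase += mae(shuffled) - baseline
        scores.append(increase / n_repeats)
    return scores

# Hypothetical model: speed matters a lot, rain a little, radio volume not at all
def toy_model(row):
    speed, rain, radio_volume = row
    return 0.5 * speed + 2.0 * rain

rng = random.Random(1)
X = [[rng.uniform(0, 30), rng.uniform(0, 1), rng.uniform(0, 10)] for _ in range(200)]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
```

In this toy setup the speed input receives the highest score and the unused radio-volume input scores exactly zero, which is the ranking an engineer would expect to confirm before trusting the model.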

Explainability techniques offer insights into the model’s behavior. This is especially relevant when the black-box nature of the model prevents us from using other approaches. In the context of images, these techniques identify the regions of an image that contribute the most to the final prediction. This enables engineers to understand the model’s primary focus when making a prediction.
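Occlusion sensitivity is one simple technique of this kind: cover one patch of the image at a time and record how far the model’s score drops. The sketch below assumes a grayscale image stored as a list of pixel rows and a hypothetical scoring function that only looks at the top-left corner. Both are illustrative assumptions rather than a real classifier.

```python
def occlusion_map(score, image, patch=2, fill=0.0):
    """Return a grid of score drops, one entry per occluded patch."""
    base = score(image)
    h, w = len(image), len(image[0])
    heat = []
    for top in range(0, h, patch):
        row_drops = []
        for left in range(0, w, patch):
            occluded = [r[:] for r in image]  # copy so the original stays intact
            for i in range(top, min(top + patch, h)):
                for j in range(left, min(left + patch, w)):
                    occluded[i][j] = fill
            row_drops.append(base - score(occluded))
        heat.append(row_drops)
    return heat

# Hypothetical classifier score: responds only to brightness in the top-left 2x2 corner
def toy_score(img):
    return sum(img[i][j] for i in range(2) for j in range(2)) / 4.0

image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(toy_score, image)
```

The largest drop appears in the top-left cell of the map, showing that this is the region the toy model relies on.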

Tests and simulations

Engineers often evaluate an AI model’s performance in real-world scenarios where the safety-critical system is expected to operate. The goal is to identify limitations and improve the accuracy and reliability of the model. Engineers gather a wide range of real-world representative datasets and clean up the data to make it suitable for testing. Test cases are then designed to evaluate various aspects of the model, such as its accuracy and reproducibility. Finally, the model is applied to the datasets, and the results are recorded and compared to the expected output. The model design is improved according to the outcome of the data testing.
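The record-and-compare step can be sketched as a small test harness. The detector, the 10 m threshold, the labels and the 90 percent pass mark below are hypothetical choices made for illustration. A real test suite would use representative field data and acceptance criteria taken from the system requirements.

```python
def evaluate(model, dataset, min_accuracy=0.9):
    """Compare predictions to expected labels and check that repeated runs agree."""
    first = [model(x) for x, _ in dataset]
    second = [model(x) for x, _ in dataset]  # second pass checks reproducibility
    correct = sum(p == expected for p, (_, expected) in zip(first, dataset))
    accuracy = correct / len(dataset)
    failures = [x for p, (x, expected) in zip(first, dataset) if p != expected]
    return {
        "accuracy": accuracy,
        "reproducible": first == second,
        "passed": accuracy >= min_accuracy and first == second,
        "failures": failures,
    }

# Hypothetical obstacle detector: brake for anything closer than 10 m
def toy_detector(distance_m):
    return "brake" if distance_m < 10.0 else "cruise"

# Each entry pairs an input distance (m) with the expected decision
dataset = [(2.0, "brake"), (9.5, "brake"), (10.0, "brake"), (15.0, "cruise"), (40.0, "cruise")]
report = evaluate(toy_detector, dataset, min_accuracy=0.9)
```

Here the boundary case at exactly 10 m fails, so the report lists it under failures and the model misses the pass mark. This is the kind of limitation the testing step is meant to expose before the design is revised.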

Simulating an AI-enabled system enables engineers to evaluate and assess the system’s performance in a controlled environment. During a simulation, a virtual environment is created that mimics a real-world system under a variety of conditions. Engineers first define the inputs and parameters to simulate a system, such as initial conditions and environmental factors. The simulation is then executed using software such as Simulink, which outputs the system’s responses to the proposed scenario. As in data testing, the simulation results are compared to expected or known outcomes, and the model is improved iteratively.
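The same define, execute and compare loop can be illustrated without a dedicated simulation tool. The plain Python sketch below is not Simulink. It steps a point-mass vehicle toward a stationary obstacle under three scenarios. The controller is a hypothetical stand-in for an AI component, and the friction values, speeds and 5 m margin are assumed numbers.

```python
def simulate_stop(speed_mps, friction, gap_m, controller, dt=0.01, g=9.81):
    """Euler-integrate a vehicle approaching a stationary obstacle; return the remaining gap."""
    while speed_mps > 0.0 and gap_m > 0.0:
        braking = controller(speed_mps, gap_m)  # 0.0 (coast) to 1.0 (full brake)
        speed_mps = max(0.0, speed_mps - braking * friction * g * dt)
        gap_m -= speed_mps * dt
    return gap_m

# Hypothetical stand-in for the AI component: brake fully once the gap falls
# below the dry-road stopping distance plus a 5 m margin
def toy_controller(speed_mps, gap_m):
    return 1.0 if gap_m < speed_mps ** 2 / (2 * 0.8 * 9.81) + 5.0 else 0.0

scenarios = [
    {"speed_mps": 15.0, "friction": 0.8},  # dry road, urban speed
    {"speed_mps": 30.0, "friction": 0.8},  # dry road, motorway speed
    {"speed_mps": 30.0, "friction": 0.4},  # wet road, an outlier for this controller
]

# Expected outcome for every scenario: the vehicle stops with a positive gap
results = [
    simulate_stop(s["speed_mps"], s["friction"], 100.0, toy_controller) > 0.0
    for s in scenarios
]
```

The two dry-road scenarios stop short of the obstacle, while the wet-road scenario does not, because the controller’s assumed stopping distance no longer holds. Comparing results with the expected outcome flags that scenario for the next design iteration.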

Acceptable bounds

For AI models to operate safely and reliably, it’s vital to establish limits and monitor the model’s behavior to ensure that it stays within those boundaries. One of the most common boundary issues occurs when a model has been trained on a limited dataset and encounters out-of-distribution data at runtime. Similarly, a model that is not robust enough can behave unpredictably when its inputs change slightly. Engineers employ bias mitigation and robustification techniques to ensure AI models operate within acceptable bounds.
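A runtime monitor for out-of-distribution inputs can be as simple as a range check against the training data. The sketch below is a deliberately crude version of that idea, assuming numeric inputs, a 10 percent margin and a hypothetical "hand_over" fallback action. Production monitors typically rely on more discriminating detectors.

```python
def fit_bounds(training_rows, margin=0.1):
    """Record the per-feature range seen in training, widened by a relative margin."""
    bounds = []
    for column in zip(*training_rows):
        low, high = min(column), max(column)
        pad = margin * (high - low)
        bounds.append((low - pad, high + pad))
    return bounds

def in_distribution(row, bounds):
    """True if every feature lies inside its recorded training range."""
    return all(low <= value <= high for value, (low, high) in zip(row, bounds))

def guarded_predict(model, row, bounds, fallback):
    """Use the model only inside the training envelope; otherwise return a safe fallback."""
    return model(row) if in_distribution(row, bounds) else fallback

# Hypothetical training data: (speed in m/s, ambient temperature in deg C)
training = [(5.0, 10.0), (20.0, 25.0), (30.0, 35.0)]
bounds = fit_bounds(training)

def toy_model(row):
    return "proceed"

normal = guarded_predict(toy_model, (15.0, 20.0), bounds, "hand_over")
unseen = guarded_predict(toy_model, (15.0, -20.0), bounds, "hand_over")
```

The second call uses a temperature far below anything in the training set, so the monitor bypasses the model and returns the fallback instead.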

One way to mitigate data bias is to create variability in the data used to train the AI model, which reduces the model’s dependence on repeating patterns that restrict its learning. This technique, known as data augmentation, helps ensure fairness and equal treatment of different classes and demographics. In the case of a self-driving car, it may involve using pictures of pedestrians from various angles to help the model detect a pedestrian regardless of their position. Data balancing is often paired with data augmentation and ensures that each data class is represented by a similar number of samples. In the pedestrian example, this means ensuring that the dataset contains a proportionate number of images for each variation of pedestrian scenarios, such as different body shapes, clothing styles, lighting conditions and backgrounds. Data balancing minimizes bias and improves the model’s ability to generalize across diverse real-world situations.
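Augmentation and balancing can be combined in a few lines. The sketch below assumes images stored as lists of pixel rows and uses a horizontal flip as the only augmentation, oversampling each minority class until it matches the largest one. The tiny two-class dataset is hypothetical, and a real pipeline would draw on a wider set of transformations.

```python
import random

def hflip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [row[::-1] for row in image]

def augment_and_balance(samples, seed=0):
    """Oversample minority classes with flipped copies until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for image, label in samples:
        by_class.setdefault(label, []).append(image)
    target = max(len(images) for images in by_class.values())
    balanced = []
    for label, images in by_class.items():
        # Fill the shortfall with mirrored copies of randomly chosen originals
        extra = [hflip(rng.choice(images)) for _ in range(target - len(images))]
        balanced += [(image, label) for image in images + extra]
    return balanced

# Hypothetical dataset: one pedestrian image and four vehicle images
samples = [([[0, 1], [0, 1]], "pedestrian")] + [([[1, 1], [0, 0]], "vehicle")] * 4
balanced = augment_and_balance(samples)
labels = [label for _, label in balanced]
```

After the call, both classes contain four samples, and the added pedestrian samples are mirrored versions of the original image.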

Robustness is a primary concern when deploying neural networks in safety-critical situations. Neural networks are susceptible to misclassification caused by small, often imperceptible changes to their inputs, which poses significant risks. These disturbances can cause a neural network to output incorrect or dangerous results, which is alarming in systems where errors can lead to catastrophes. One solution is integrating formal methods into the development and validation process. Formal methods use rigorous mathematical models to establish and prove correctness properties of neural networks. By applying these methods, engineers can verify, and where necessary improve, the network’s resilience to certain types of disturbances, ensuring higher robustness and reliability in safety-critical applications.
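Interval bound propagation is one formal technique that fits this description. It pushes a box of possible inputs through the network and checks whether the predicted class can change anywhere inside that box. The sketch below assumes a small fully connected ReLU network given as a list of (weights, bias) pairs. The two-layer network and the input point are hypothetical. The check is sound but conservative: a True result is a proof of robustness, while False only means that no proof was found.

```python
def affine_bounds(lower, upper, weights, bias):
    """Propagate an input box through y = W x + b using interval arithmetic."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A positive weight maps low to low; a negative weight swaps the ends
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi

def relu_bounds(lower, upper):
    """ReLU is monotonic, so it can be applied to each bound directly."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

def certify(x, eps, layers, label):
    """True if the label is provably unchanged for every input within eps (L-infinity) of x."""
    lo = [v - eps for v in x]
    hi = [v + eps for v in x]
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, W, b)
        if i < len(layers) - 1:  # no activation after the output layer
            lo, hi = relu_bounds(lo, hi)
    return all(lo[label] > hi[k] for k in range(len(lo)) if k != label)

# Hypothetical two-layer network: hidden = relu(x0 - x1, x1 - x0), output = hidden
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
x = [1.0, 0.0]  # classified as label 0
small = certify(x, 0.1, layers, label=0)
large = certify(x, 0.6, layers, label=0)
```

For a perturbation budget of 0.1 the toy network is certified robust at this input. For a budget of 0.6 the certificate fails, and in this case an input that flips the label does exist within the box, for example (0.4, 0.6).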
