AI for Cyber-Physical Systems

Context: Significant advances in Artificial Intelligence (AI) and Machine Learning (ML) are enabling dramatic, unprecedented capabilities in all spheres of human life, including cyber-physical systems (CPS). The fundamental advantage of AI methods is their ability to handle high-dimensional state spaces and learn decision procedures and control algorithms from data rather than from models. This matters because real-world state spaces are complex, dynamic, and hard to model. Our goal is to bridge this gap: to learn abstract representations of these state spaces and develop decision procedures for the research verticals we investigate - proactive emergency response systems, transit management systems, and electric power grids. The research challenge is not only to develop these methods, but also to consider the societal context in which they are used and to investigate co-design principles that allow us to reason about resilience, assurance, and fault diagnostics for AI-CPS.

Our Work: The problem lies fundamentally in the design and integration workflows employed for building AI-CPS. While significant research is devoted to improving the capabilities and training methods of AI systems, only limited studies focus on the complete integration and interaction uncertainties of AI components, especially in the context of fully deployed systems.

To illustrate the problem, consider that conventional CPS design flows focus on designing a system that satisfies a set of requirements in a given environment. The design process the integrator follows typically requires them to either develop new components or select existing components from a library, assemble them while ensuring the interfaces match, and then deploy the system. The challenge for the integrator is to identify the best possible architectural model that satisfies the relevant reliability and resilience metrics, which are often hard to evaluate as part of the requirements. The difficulty is that, in practice, the environment is only approximated by a surrogate model and is contextualized by a set of scenarios that the system must pass, which at best approximates a proof that the assembled system meets the requirements. As an increasing number of system failures shows, the consequence of this approximation is profound: the designed architecture may fail in the physical environment. The problem is even more pronounced in AI-CPS because AI components carry additional uncertainty; they are not designed directly from a mathematical model of the environment (which is hard to obtain), but from sampled observations that, in most cases, never cover the full state space of the real physical environment.

This places considerable stress on the safety of the design. Our work, in general, focuses on developing the decision procedures used in the system, along with runtime monitors and assurance cases that show, at runtime, whether a fault lies in a component of the system (software, hardware, or AI) or whether the problem emanates from the invalidation of assumptions made about the environment.

Innovation and Research Products:

  1. ReSonAte - We have designed a dynamic assurance approach called ReSonAte that computes the likelihood of unsafe conditions or system failures, taking into account the safety requirements, the assumptions made at design time, past failures in a given operating context, and the likelihood of system component failures. The approach has been demonstrated using two separate autonomous system simulations: CARLA and an unmanned underwater vehicle. We evaluated it across 600 simulation scenes that included distribution shifts, component failures, and a high likelihood of collisions (based on past observations). These tests showed that our methodology achieves a precision of 73% and a recall of 79%. We are currently working on methods to dynamically estimate and learn the conditional probabilities and improve the precision. On average, the framework takes 0.3 milliseconds to compute a risk score. A minimal sketch of this style of risk computation appears after this list.

  2. Assurance Monitors - While ReSonAte monitors safety assurance at the system level, we also need component-level monitors to detect anomalous behavior. Monitors that detect anomalies such as data-validity, pre-condition, post-condition, and user-code failures have been designed for conventional CPS, but a CPS with learning-enabled components (LECs) requires more complex assurance monitors to detect out-of-distribution (OOD) inputs. For this, we have designed an assurance monitor using a β-Variational Autoencoder (β-VAE) [2,3], which can detect whether an input (e.g., an image) to an LEC is OOD. In addition, it can diagnose the precise changes (e.g., brightness, blurriness, occlusion, weather change) in the input that caused it to become OOD. Conceptually, we generate an interpretable latent representation of the inputs using the β-VAE, establish a correspondence between the latent units and the generative factors, and then perform latent-space-based OOD detection and diagnosis. The disentanglement-based diagnosis capability of the β-VAE monitor is the key innovation of this work, and our analysis shows it can be utilized as a multi-class classifier for multi-label datasets. A simplified sketch of such a monitor appears after this list.

  3. Runtime Recovery Procedures - We have also developed runtime recovery procedures that manage the health of the system. These procedures use system design information (the system information flow, requirement and functional decomposition models, and temporal failure propagation graphs) at runtime to identify problems and recover the system by solving a dynamic constraint problem. The goal of this runtime problem is to identify the component configuration that offers the lowest risk of subsequent failures given the currently available resources and environment information. A minimal sketch of this reconfiguration search appears after this list.

  4. DeepNNCar - To test the efficacy of our methods on embedded systems, we have developed DeepNNCar, a low-cost research testbed designed in our lab. It is built on the chassis of a Traxxas Slash 2WD 1/10-scale RC car and is fitted with a forward-looking USB camera, an IR optocoupler, and a 2D LIDAR. The robot's speed and steering are controlled using pulse-width modulation (PWM) by varying the duty cycle; a minimal sketch of this mapping appears after this list. For autonomous driving, the robot uses a modified NVIDIA DAVE-II CNN model that takes the front camera image and the current speed as input and predicts the steering. More information about the car is available in this Medium article. Videos of DeepNNCar with different controllers are available here.
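
The sketch below illustrates, at a very high level, the style of risk computation ReSonAte performs: context-conditioned likelihoods of hazardous events are combined with the chance that their mitigations fail, yielding an overall likelihood of an unsafe condition. The event names, probabilities, and the independence assumption are hypothetical placeholders, not values from the actual framework.

```python
# Minimal, illustrative sketch of dynamic risk scoring in the spirit of ReSonAte.
# All event names and probabilities below are hypothetical placeholders.
from dataclasses import dataclass

@dataclass
class Threat:
    name: str
    p_occurrence: float       # P(threat occurs | current operating context)
    p_barrier_failure: float  # P(the mitigating barrier/monitor fails | context)

def risk_score(threats, p_component_failure):
    """Likelihood that at least one unsafe condition is reached.

    A threat leads to the unsafe condition only if it occurs AND its barrier
    fails; an unmitigated component failure contributes directly.
    """
    p_safe = 1.0 - p_component_failure
    for t in threats:
        p_safe *= 1.0 - t.p_occurrence * t.p_barrier_failure
    return 1.0 - p_safe

if __name__ == "__main__":
    context_threats = [
        Threat("heavy_rain_distribution_shift", p_occurrence=0.30, p_barrier_failure=0.20),
        Threat("occluded_pedestrian", p_occurrence=0.05, p_barrier_failure=0.40),
    ]
    print(f"risk = {risk_score(context_threats, p_component_failure=0.01):.4f}")
```

In the actual framework, such conditional probabilities would come from design-time hazard analysis and observed past failures in the operating context rather than being hard-coded, and (as noted above) we are working on estimating and updating them dynamically.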
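The next sketch shows one way a β-VAE-based assurance monitor could be structured, assuming a PyTorch implementation. The network sizes, latent dimension, and the KL-based score over a subset of "monitored" latent units are simplified stand-ins, not the published architecture or detection statistic.

```python
# Minimal, illustrative beta-VAE OOD monitor sketch (PyTorch); hypothetical
# sizes and statistic, not the exact published monitor.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BetaVAE(nn.Module):
    def __init__(self, input_dim=3 * 64 * 64, latent_dim=30, beta=4.0):
        super().__init__()
        self.beta = beta
        self.encoder = nn.Sequential(nn.Linear(input_dim, 512), nn.ReLU(),
                                     nn.Linear(512, 2 * latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                     nn.Linear(512, input_dim))

    def encode(self, x):
        mu, log_var = self.encoder(x).chunk(2, dim=-1)
        return mu, log_var

    def forward(self, x):
        mu, log_var = self.encode(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterization
        return self.decoder(z), mu, log_var

    def loss(self, x):
        recon, mu, log_var = self(x)
        recon_loss = F.mse_loss(recon, x, reduction="sum")
        # KL divergence per latent unit; beta > 1 encourages disentanglement
        kl_per_dim = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp())
        return recon_loss + self.beta * kl_per_dim.sum()

def ood_score(model, x, monitored_dims):
    """Sum the KL of the latent units previously mapped to a generative factor
    (e.g., brightness); a score above a calibrated threshold flags the input
    as OOD with respect to that factor."""
    mu, log_var = model.encode(x)
    kl = -0.5 * (1 + log_var - mu.pow(2) - log_var.exp())
    return kl[:, monitored_dims].sum(dim=-1)

if __name__ == "__main__":
    torch.manual_seed(0)
    monitor = BetaVAE()
    frames = torch.rand(4, 3 * 64 * 64)  # stand-in for flattened camera frames
    print(ood_score(monitor, frames, monitored_dims=[0, 5]))
```

In this sketch, the monitored latent units for each generative factor would be selected offline by checking which units respond to controlled variations of that factor, which is what enables the diagnosis capability described above.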
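The recovery procedure's configuration selection can be pictured with the following minimal sketch: a search over component alternatives that respects a resource budget, excludes failed components, and picks the configuration with the lowest estimated risk. The component names, budgets, and risk numbers are hypothetical, and the deployed procedure formulates this as a dynamic constraint problem over the system's design models rather than the exhaustive search shown here.

```python
# Minimal, illustrative sketch of runtime reconfiguration as a constrained search.
# All components, costs, and risk values are hypothetical placeholders.
from itertools import product
import math

# Candidate implementations per function: (name, estimated risk, CPU cost)
CANDIDATES = {
    "perception": [("camera_lec", 0.05, 40), ("lidar_only", 0.15, 25)],
    "planning":   [("learned_planner", 0.08, 30), ("rule_based_planner", 0.12, 10)],
}

def best_configuration(candidates, cpu_budget, failed=frozenset()):
    """Select one implementation per function, excluding failed components,
    so the total CPU cost fits the budget and the combined risk is minimal."""
    functions = list(candidates)
    best, best_risk = None, float("inf")
    for combo in product(*(candidates[f] for f in functions)):
        if any(name in failed for name, _, _ in combo):
            continue
        if sum(cpu for _, _, cpu in combo) > cpu_budget:
            continue
        # Illustrative risk model: components fail independently.
        risk = 1.0 - math.prod(1.0 - r for _, r, _ in combo)
        if risk < best_risk:
            best_risk = risk
            best = {f: impl for f, (impl, _, _) in zip(functions, combo)}
    return best, best_risk

if __name__ == "__main__":
    config, risk = best_configuration(CANDIDATES, cpu_budget=60, failed={"camera_lec"})
    print(config, f"estimated risk = {risk:.3f}")
```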
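Finally, the sketch below shows how normalized steering and throttle commands could be mapped to PWM duty cycles, assuming a Raspberry Pi-style GPIO interface (RPi.GPIO). The pin numbers, PWM frequency, and duty-cycle range are hypothetical and would require calibration on the actual DeepNNCar hardware.

```python
# Minimal, illustrative sketch of driving steering/throttle over PWM by varying
# the duty cycle. Pins, frequency, and duty-cycle ranges are hypothetical.
import RPi.GPIO as GPIO

STEER_PIN, THROTTLE_PIN = 12, 13  # hypothetical PWM-capable pins
PWM_FREQ_HZ = 100

def to_duty_cycle(command, low=10.0, high=20.0):
    """Map a normalized command in [-1, 1] onto a duty-cycle percentage."""
    command = max(-1.0, min(1.0, command))
    return low + (command + 1.0) * (high - low) / 2.0

GPIO.setmode(GPIO.BCM)
GPIO.setup(STEER_PIN, GPIO.OUT)
GPIO.setup(THROTTLE_PIN, GPIO.OUT)
steer_pwm = GPIO.PWM(STEER_PIN, PWM_FREQ_HZ)
throttle_pwm = GPIO.PWM(THROTTLE_PIN, PWM_FREQ_HZ)
steer_pwm.start(to_duty_cycle(0.0))      # center the wheels
throttle_pwm.start(to_duty_cycle(-1.0))  # start stopped

try:
    # In the real pipeline these commands would come from the DAVE-II CNN.
    steer_pwm.ChangeDutyCycle(to_duty_cycle(0.25))     # slight right
    throttle_pwm.ChangeDutyCycle(to_duty_cycle(-0.6))  # gentle speed
finally:
    steer_pwm.stop()
    throttle_pwm.stop()
    GPIO.cleanup()
```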

Follow-ups: Further information is available at the following links.

  1. DeepNNCar: A Testbed for Autonomous Algorithms
  2. Deep NN Car Repository
  3. ReSonAte: A Runtime Risk Assessment Framework for Autonomous Systems