Why This Matters

Software in cyber-physical systems such as aircraft increasingly implements critical functionality, yet established fault-tolerance techniques rely on hardware redundancy and do not address latent software defects. This work adapts aviation safety principles to software, using model-based design and a component architecture to support fault containment and recovery. The hierarchical health management structure combines fast, localized recovery with system-wide diagnosis without requiring system redesign.

What We Did

This paper presents model-based software health management techniques for real-time systems that detect, diagnose, and mitigate faults in complex software components. The approach carries traditional System Health Management from avionics over to software, building on a component-based architecture and the ARINC Component Model. It introduces a two-level hierarchy: a Component-level Health Manager (CLHM) detects anomalies in an individual component and handles them locally, and a System-level Health Manager (SHM) manages the health of the overall system. The framework includes monitors, anomaly detection via runtime verification, diagnosis using a timed fault propagation (TFPG) model, and mitigating actions.
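The two-level hierarchy described above can be illustrated with a minimal Python sketch. This is not the paper's implementation or the ARINC Component Model API: the class names, the `range` monitor, the threshold, and the mitigation labels are all hypothetical, chosen only to show a component-level manager acting locally and reporting upward.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Anomaly:
    component: str
    monitor: str
    value: float


@dataclass
class SystemHealthManager:
    """System level: collects anomalies reported by component-level managers."""
    reports: List[Anomaly] = field(default_factory=list)

    def report(self, anomaly: Anomaly) -> None:
        self.reports.append(anomaly)

    def suspects(self) -> List[str]:
        # Stand-in for real diagnosis: list the components that reported anomalies.
        return sorted({a.component for a in self.reports})


@dataclass
class ComponentHealthManager:
    """Component level: evaluates local monitors and picks a local mitigation."""
    component: str
    shm: SystemHealthManager
    # monitor name -> predicate that returns True when the value is acceptable
    monitors: Dict[str, Callable[[float], bool]]
    # monitor name -> label of the local mitigation action (hypothetical labels)
    mitigations: Dict[str, str]

    def check(self, monitor: str, value: float) -> Optional[str]:
        if self.monitors[monitor](value):
            return None  # healthy: nothing to report
        # Report upward first, then return the locally chosen mitigation.
        self.shm.report(Anomaly(self.component, monitor, value))
        return self.mitigations.get(monitor, "IGNORE")


shm = SystemHealthManager()
clhm = ComponentHealthManager(
    component="accelerometer_1",
    shm=shm,
    monitors={"range": lambda v: abs(v) <= 10.0},
    mitigations={"range": "USE_LAST_GOOD_VALUE"},
)
print(clhm.check("range", 0.5))
print(clhm.check("range", 99.0))
print(shm.suspects())
```

The design choice the sketch highlights is the split of responsibilities: the component-level manager reacts immediately with limited local knowledge, while the system-level manager only accumulates evidence for later diagnosis.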

Key Results

The framework is applied to a case study based on the Boeing 777 Air Data Inertial Reference Unit, where it detects and mitigates the effects of component-level failures such as failed accelerometers. The results show that the system can hypothesize root failure sources using the timed fault propagation (TFPG) model and that mitigation actions can be executed automatically at the component level; the abstract notes that system-level mitigation approaches are still under investigation. The approach lets a system recover from individual component failures while maintaining overall functionality and preventing failures from cascading.
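To make the idea of timed fault propagation concrete, here is a simplified consistency check in Python. It is a sketch under stated assumptions, not the paper's diagnosis engine: the graph, node names, and timing bounds are hypothetical, the graph is assumed acyclic, every discrepancy is treated as an OR node, and alarms that should have fired but did not are ignored.

```python
from typing import Dict, List, Tuple

# Edge map: source -> list of (target, t_min, t_max) propagation delays.
# All names and timing values are hypothetical, for illustration only.
Graph = Dict[str, List[Tuple[str, float, float]]]

GRAPH: Graph = {
    "FM_accelerometer": [("D_bad_accel_data", 0.0, 1.0)],
    "FM_processor": [("D_stale_output", 0.0, 2.0)],
    "D_bad_accel_data": [("D_stale_output", 1.0, 3.0)],
}


def delay_bounds(graph: Graph, source: str) -> Dict[str, Tuple[float, float]]:
    """Earliest and latest propagation delay from source to each reachable node."""
    bounds: Dict[str, Tuple[float, float]] = {source: (0.0, 0.0)}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        lo, hi = bounds[node]
        for target, t_min, t_max in graph.get(node, []):
            cand = (lo + t_min, hi + t_max)
            old = bounds.get(target)
            # OR semantics (assumed): the first arriving path triggers the node,
            # so both bounds are minimized over incoming paths.
            new = cand if old is None else (min(old[0], cand[0]), min(old[1], cand[1]))
            if new != old:
                bounds[target] = new
                frontier.append(target)
    return bounds


def consistent_failure_modes(
    graph: Graph, failure_modes: List[str], alarms: Dict[str, float]
) -> List[str]:
    """Return the failure modes that could explain every observed alarm time."""
    if not alarms:
        return list(failure_modes)
    result = []
    for fm in failure_modes:
        bounds = delay_bounds(graph, fm)
        if not all(d in bounds for d in alarms):
            continue  # some alarm is unreachable from this failure mode
        # The fault onset t0 must satisfy t - hi <= t0 <= t - lo for each alarm.
        earliest = max(t - bounds[d][1] for d, t in alarms.items())
        latest = min(t - bounds[d][0] for d, t in alarms.items())
        if earliest <= latest:
            result.append(fm)
    return result


FAILURE_MODES = ["FM_accelerometer", "FM_processor"]
observed = {"D_bad_accel_data": 10.0, "D_stale_output": 12.0}
print(consistent_failure_modes(GRAPH, FAILURE_MODES, observed))
```

In this toy graph, only the accelerometer failure mode can reach both observed discrepancies within the timing bounds, so it is the single remaining hypothesis. A full reasoner would also rank hypotheses and account for missing or spurious alarms.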

Cite This Paper

@inproceedings{Dubey2011a,
  author = {Dubey, Abhishek and Karsai, G. and Mahadevan, N.},
  booktitle = {2011 Aerospace Conference},
  title = {Model-based software health management for real-time systems},
  year = {2011},
  month = {mar},
  pages = {1--18},
  abstract = {Complexity of software systems has reached the point where we need run-time mechanisms that can be used to provide fault management services. Testing and verification may not cover all possible scenarios that a system will encounter, hence a simpler, yet formally specified run-time monitoring, diagnosis, and fault mitigation architecture is needed to increase the software system's dependability. The approach described in this paper borrows concepts and principles from the field of {\textquotedblleft}Systems Health Management{\textquotedblright} for complex systems and implements a two level health management strategy that can be applied through a model-based software development process. The Component-level Health Manager (CLHM) for software components provides a localized and limited functionality for managing the health of a component locally. It also reports to the higher-level System Health Manager (SHM) which manages the health of the overall system. SHM consists of a diagnosis engine that uses the timed fault propagation (TFPG) model based on the component assembly. It reasons about the anomalies reported by CLHM and hypothesizes about the possible fault sources. Thereafter, necessary system level mitigation action can be taken. System-level mitigation approaches are subject of on-going investigations and have not been included in this paper. We conclude the paper with case study and discussion.},
  category = {conference},
  contribution = {lead},
  doi = {10.1109/AERO.2011.5747559},
  file = {:Dubey2011a-Model-based_software_health_management_for_real-time_systems.pdf:PDF},
  issn = {1095-323X},
  keywords = {software health management, fault diagnosis, mitigation, model-based design, ARINC-653, component architecture},
  tag = {platform},
  month_numeric = {3}
}

Quick Info

Year: 2011
Keywords: software health management, fault diagnosis, mitigation, model-based design, ARINC-653, component architecture
Research Areas: CPS, middleware, Explainable AI