Why This Matters

Rising software complexity in aerospace systems makes it very difficult to analyze and prepare for every possible fault scenario at design time, and classical run-time fault-tolerance techniques alone have proven insufficient. This work applies Software Health Management concepts to component-based software, providing automated run-time fault detection, diagnosis, and mitigation driven by formal models that capture the temporal and causal dependencies between components.

What We Did

This technical report describes the software health management approach developed by the authors' research group for component-based ARINC-653 software in aerospace applications. The architecture has two levels: a Component-Level Health Manager (CLHM) for each component and a System-Level Health Manager (SLHM) for the system as a whole. The SLHM diagnoses faults using a Timed Failure Propagation Graph (TFPG) model, and the main focus of the report is the fault mitigation architecture, which uses goal-based deliberative reasoning to choose the best recovery actions.
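The division of labor between the two managers can be pictured with the minimal Python sketch below. It is not the report's implementation: the class names, the deadline and postcondition monitors, and the configuration table are hypothetical. Each CLHM stand-in checks one component invocation and escalates any alarm; the SLHM stand-in then picks, from a fixed table, the configuration that avoids suspected components while preserving the most important goals. That selection is a greatly simplified stand-in for the goal-based deliberative reasoning the report describes, and the isolation step in the usage example is naive where the report uses TFPG diagnosis.

```python
from dataclasses import dataclass

# Hypothetical sketch of the CLHM/SLHM split; names, monitors, and
# configurations are illustrative assumptions, not the report's interfaces.

@dataclass(frozen=True)
class Alarm:
    component: str
    monitor: str   # which (hypothetical) monitor fired
    time: float

class ComponentLevelHM:
    """CLHM stand-in: runs monitors around one component invocation and escalates."""

    def __init__(self, name, deadline, slhm):
        self.name = name
        self.deadline = deadline
        self.slhm = slhm

    def check(self, start, end, output_ok):
        alarms = []
        if end - start > self.deadline:
            alarms.append(Alarm(self.name, "deadline", end))
        if not output_ok:
            alarms.append(Alarm(self.name, "postcondition", end))
        for alarm in alarms:
            # A real CLHM could apply a local mitigation here before escalating.
            self.slhm.report(alarm)
        return alarms

class SystemLevelHM:
    """SLHM stand-in: collects alarms and picks a goal-preserving configuration."""

    def __init__(self, goal_priority, configurations):
        self.goal_priority = goal_priority      # goal -> importance weight
        self.configurations = configurations    # name -> (components used, goals served)
        self.alarms = []

    def report(self, alarm):
        self.alarms.append(alarm)

    def mitigate(self, faulty):
        """Return the configuration that avoids faulty components and keeps the most goal weight."""
        best_name, best_value = None, -1
        for name in sorted(self.configurations):
            components, goals = self.configurations[name]
            if components & faulty:
                continue  # would keep using a component believed to be faulty
            value = sum(self.goal_priority[g] for g in goals)
            if value > best_value:
                best_name, best_value = name, value
        return best_name  # None if no configuration avoids the faulty components

if __name__ == "__main__":
    slhm = SystemLevelHM(
        {"navigate": 10, "log": 1},
        {
            "nominal": ({"gps", "nav", "logger"}, {"navigate", "log"}),
            "imu_backup": ({"imu", "nav"}, {"navigate"}),
            "safe_mode": ({"logger"}, {"log"}),
        },
    )
    gps = ComponentLevelHM("gps", deadline=0.010, slhm=slhm)
    gps.check(start=0.0, end=0.025, output_ok=True)  # deadline overrun raises an alarm
    # Naive isolation: blame the alarming component (the report uses TFPG diagnosis instead).
    faulty = {a.component for a in slhm.alarms}
    print(slhm.mitigate(faulty))  # imu_backup
```

This one-step choice over a fixed table only illustrates the idea of selecting recovery actions by the goals they preserve; it does not reproduce the report's reasoner.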

Key Results

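The architecture demonstrates automatic synthesis of health management infrastructure from component models, including monitoring code, diagnosis information, and mitigation strategies; in particular, the TFPG model is derived from the component assembly model. The TFPG-based diagnosis engine reasons about cascading fault effects to isolate the faulty component(s), after which goal-based deliberative reasoning selects the system-level mitigation action best suited to recovering from the identified failure mode, improving run-time dependability for safety-critical aerospace systems.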
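A TFPG-style consistency check can be sketched in a few dozen lines of Python. This is not the report's diagnosis engine: it assumes a single fault, OR-type discrepancies only, no operating modes, and monitors that never miss or falsely raise alarms, and the node names and timing bounds are hypothetical. For each candidate failure mode it computes how soon each discrepancy could and must appear, then checks whether one fault-onset time explains every alarm timestamp and whether any expected alarm is overdue.

```python
import heapq
import math

# Hypothetical TFPG encoding: node -> list of (successor, (t_min, t_max)) edges.

def shortest_times(graph, source, bound):
    """Dijkstra from a failure mode over edge bound 0 (t_min) or 1 (t_max)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for succ, interval in graph.get(node, []):
            nd = d + interval[bound]
            if nd < dist.get(succ, math.inf):
                dist[succ] = nd
                heapq.heappush(heap, (nd, succ))
    return dist

def onset_window(graph, failure_mode, monitored, alarms, now):
    """Return the fault-onset interval (a, b) consistent with the alarms, or None."""
    # OR semantics: a node fires on the first arriving propagation, so its delay
    # after onset lies between the shortest t_min path and the shortest t_max path.
    lo = shortest_times(graph, failure_mode, 0)
    hi = shortest_times(graph, failure_mode, 1)
    a, b = 0.0, now
    for disc, t_obs in alarms.items():
        if disc not in lo:
            return None  # this failure mode cannot reach the alarmed discrepancy
        a, b = max(a, t_obs - hi[disc]), min(b, t_obs - lo[disc])
    if a > b:
        return None  # no single onset time fits every alarm timestamp
    for disc in monitored:
        if disc in hi and disc not in alarms and b + hi[disc] < now:
            return None  # an expected alarm should already have fired
    return (a, b)

def diagnose(graph, failure_modes, monitored, alarms, now):
    """Map every consistent single-fault hypothesis to its onset window."""
    result = {}
    for fm in failure_modes:
        window = onset_window(graph, fm, monitored, alarms, now)
        if window is not None:
            result[fm] = window
    return result

if __name__ == "__main__":
    tfpg = {
        "FM_gps": [("D_gps_out", (1.0, 2.0))],
        "FM_nav": [("D_nav_pos", (0.0, 1.0))],
        "D_gps_out": [("D_nav_pos", (2.0, 4.0))],
        "D_nav_pos": [("D_guidance", (1.0, 3.0))],
    }
    monitored = {"D_gps_out", "D_nav_pos", "D_guidance"}
    alarms = {"D_nav_pos": 5.0, "D_guidance": 7.0}
    print(diagnose(tfpg, ["FM_gps", "FM_nav"], monitored, alarms, now=10.0))
    # {'FM_nav': (4.0, 5.0)}
```

Here FM_gps is rejected because the monitored D_gps_out discrepancy would have fired by t = 4 at the latest, while FM_nav explains both alarms with a fault onset between t = 4 and t = 5.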

Cite This Paper

@techreport{Mahadevan2013a,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Balasubramanian, Daniel and Karsai, Gabor},
  institution = {Institute for Software Integrated Systems, Vanderbilt University},
  title = {Deliberative Reasoning in Software Health Management},
  year = {2013},
  month = apr,
  number = {ISIS-13-101},
  type = {Technical Report},
  abstract = {Rising software complexity in aerospace systems makes them very difficult to analyze and prepare for all possible fault scenarios at design-time. Therefore, classical run-time fault-tolerance techniques, such as self-checking pairs and triple modular redundancy are used. However, several recent incidents have made it clear that existing software fault tolerance techniques alone are not sufficient. To improve system dependability, simpler, yet formally specified and verified run-time monitoring, diagnosis, and fault mitigation are needed. Such architectures are already in use for managing the health of vehicles and systems. Software health management is the adaptation and application of these techniques to software. In this paper, we briefly describe the software health management technique and architecture developed by our research group. The foundation of the architecture is a real-time component framework (built upon ARINC-653 platform services) that defines a model of computation for software components. Dedicated architectural elements: the Component Level Health Manager (CLHM) and System Level Health Manager (SLHM) provide health management services: anomaly detection, fault source isolation, and fault mitigation. The SLHM includes a diagnosis engine that uses a Timed Failure Propagation Graph (TFPG) model derived from the component assembly model, and it reasons about cascading fault effects in the system and isolates the fault source component(s). Thereafter, the appropriate system level mitigation action is taken. The main focus of this article is the description of the fault mitigation architecture that uses goal-based deliberative reasoning to determine the best mitigation actions for recovering the system from the identified failure mode.},
  attachments = {http://www.isis.vanderbilt.edu/sites/default/files/TechReport2013.pdf},
  contribution = {lead},
  file = {:Mahadevan2013a-Deliberative_reasoning_in_software_health_management.pdf:PDF},
  keywords = {software health management, ARINC-653, component models, fault diagnosis, timed failure propagation, aerospace systems},
  tag = {platform}
}
Quick Info
Year 2013
Keywords
software health management, ARINC-653, component models, fault diagnosis, timed failure propagation, aerospace systems
Research Areas
CPS, ML for CPS