Why This Matters

Large real-time systems require fault responses that are not only functionally correct but also satisfy temporal properties like liveness, safety, and bounded responsiveness. The innovation of this work is the rigorous integration of formal verification techniques into autonomous fault mitigation, enabling designers to prove that mitigation actions will execute correctly and meet timing deadlines. This bridges the gap between theoretical fault management concepts and practical deployment in safety-critical systems.

What We Did

This paper presents a comprehensive real-time fault mitigation framework for large-scale real-time systems that combines reflex and healing architectures with formal verification. The framework enables autonomous fault diagnosis through discrete event modeling and uses hierarchical reflex engines to execute mitigation actions that respond to both time-based and event-based triggers. The work introduces timed automata models to formally specify and verify the fault mitigation strategies deployed in real-time systems.

Key Results

The paper demonstrates transformation of a reflex and healing framework into a network of timed automata models that can be verified using model checking tools like UPPAAL. Case studies show how the framework can formally specify and verify properties such as liveness and bounded time responsiveness for fault mitigation strategies. The approach enables verification of both individual reflex engines and complex multi-engine fault management hierarchies.

Full Abstract

Cite This Paper

@article{Dubey2007,
  author = {Dubey, Abhishek and Nordstrom, Steven and Keskinpala, Turker and Neema, Sandeep and Bapty, Ted and Karsai, Gabor},
  journal = {Innovations in Systems and Software Engineering},
  title = {Towards a verifiable real-time, autonomic, fault mitigation framework for large scale real-time systems},
  year = {2007},
  number = {1},
  pages = {33--52},
  volume = {3},
  abstract = {Designing autonomic fault responses is difficult, particularly in large-scale systems, as there is no single {\textquoteleft}perfect{\textquoteright} fault mitigation response to a given failure. The design of appropriate mitigation actions depend upon the goals and state of the application and environment. Strict time deadlines in real-time systems further exacerbate this problem. Any autonomic behavior in such systems must not only be functionally correct but should also conform to properties of liveness, safety and bounded time responsiveness. This paper details a real-time fault-tolerant framework, which uses a reflex and healing architecture to provide fault mitigation capabilities for large-scale real-time systems. At the heart of this architecture is a real-time reflex engine, which has a state-based failure management logic that can respond to both event- and time-based triggers. We also present a semantic domain for verifying properties of systems, which use this framework of real-time reflex engines. Lastly, a case study, which examines the details of such an approach, is presented.},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/isse/DubeyNKNBK07},
  contribution = {lead},
  doi = {10.1007/s11334-006-0015-7},
  file = {:Dubey2007-Towards_a_verifiable_real-time_autonomic_fault_mitigation_framework.pdf:PDF},
  project = {cps-middleware,cps-reliability},
  tag = {platform},
  timestamp = {Sun, 28 May 2017 01:00:00 +0200},
  url = {https://doi.org/10.1007/s11334-006-0015-7},
  keywords = {fault mitigation, timed automata, formal verification, real-time systems, reflex engines, model checking, autonomous computing}
}
Quick Info
Year 2007
Keywords
fault mitigation timed automata formal verification real-time systems reflex engines model checking autonomous computing
Research Areas
CPS emergency Explainable AI
Search Tags

Towards, verifiable, real, time, autonomic, fault, mitigation, framework, large, scale, systems, fault mitigation, timed automata, formal verification, real-time systems, reflex engines, model checking, autonomous computing, CPS, emergency, Explainable AI, 2007, Dubey, Nordstrom, Keskinpala, Neema, Bapty, Karsai