Resilient Design and Operation of Complex Cyber-Physical Systems

Context: Cyber-Physical Systems (CPS) encompass a wide range of modern engineered systems, including smart transit, smart emergency response, and the smart grid. The central challenge in these systems is constructing and operating them safely and efficiently. Wherever possible, we follow a component-based design methodology for building these systems. The guiding principles of component-based design are interfaces with well-defined execution models, compositional semantics, and compositional analysis. However, a number of challenges remain to be resolved: (a) performance management; (b) modularization and adaptation of the design as the requirements and the environment change; (c) safe and secure design of the system itself, ensuring that new designs and component additions can be compositionally analyzed and operated throughout the life cycle of the system; (d) fault diagnostics and failure isolation to detect and triage problems online; and (e) reconfiguration and recovery to adapt dynamically to failures and environmental changes, ensuring the safe completion of mission tasks.

Component-based design: Our work in system integration and middleware for cyber-physical systems spans more than a decade. Together with Prof. Gabor Karsai at the Institute for Software Integrated Systems and Prof. Aniruddha Gokhale and Prof. Doug Schmidt at the Distributed Object Computing Group at Vanderbilt University, we have worked on CORBA, DDS, and system performance modeling. One of our key contributions is the ARINC-653 Component Model (ACM), which combines the principle of spatial and temporal partitioning with the interaction patterns derived from the CORBA Component Model (CCM). The main extensions over the CCM are as follows: (a) the synchronous (call-return) and asynchronous (publish-subscribe) interfaces can be equipped with monitors that validate pre- and post-conditions over the data passed on the respective interface; (b) the relevant portions of the component's state can be observed via a dedicated state interface, enabling the monitoring of invariants; (c) the component's resource usage can be monitored via a resource interface that the component uses for allocating and releasing resources; and (d) the timing of component execution can be observed via a control interface, so that execution-time violations can be detected. With these extensions, component-level monitoring can evaluate pre- and post-conditions on method invocations, verify state invariants, track resource usage, and monitor the timing behavior of the component.
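To make the four monitor types concrete, here is a minimal Python sketch of one monitored operation. It is an illustration under our own assumptions, not the ACM implementation (which runs on ARINC-653 partitions rather than in Python); `MonitoredComponent`, `Violation`, and the choice to reject a call whose precondition fails are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Violation:
    kind: str    # "pre", "post", "invariant", "resource", or "deadline"
    detail: str


@dataclass
class MonitoredComponent:
    """Hypothetical wrapper that attaches ACM-style monitors to one operation."""
    handler: Callable[[Any], Any]       # the component's operation
    pre: Callable[[Any], bool]          # precondition over the input data
    post: Callable[[Any, Any], bool]    # postcondition over (input, output)
    invariant: Callable[[dict], bool]   # invariant over the observable state
    deadline_s: float                   # allowed execution time per call
    max_resources: int                  # resource budget for this component
    state: dict = field(default_factory=dict)
    allocated: int = 0
    violations: List[Violation] = field(default_factory=list)

    def allocate(self, n: int) -> None:
        """Resource interface: record an allocation and check the budget."""
        self.allocated += n
        if self.allocated > self.max_resources:
            self.violations.append(
                Violation("resource", f"{self.allocated} > {self.max_resources}"))

    def release(self, n: int) -> None:
        self.allocated -= n

    def invoke(self, data: Any) -> Any:
        """Call the operation, recording any monitor violations."""
        if not self.pre(data):
            self.violations.append(Violation("pre", repr(data)))
            return None                       # reject the call without executing it
        start = time.monotonic()
        result = self.handler(data)
        elapsed = time.monotonic() - start     # control interface: timing check
        if elapsed > self.deadline_s:
            self.violations.append(Violation("deadline", f"{elapsed:.6f}s"))
        if not self.post(data, result):
            self.violations.append(Violation("post", repr(result)))
        if not self.invariant(self.state):    # state interface: invariant check
            self.violations.append(Violation("invariant", repr(self.state)))
        return result
```

For example, a doubling operation with precondition `x >= 0` returns normally for 3, while a call with -1 is rejected and logged as a `pre` violation; allocating beyond the budget is logged as a `resource` violation.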

This work was eventually extended and incorporated into the DREMS (Distributed Real-Time Embedded Managed Systems) component model for networked CPS. DREMS prescribed a single-threaded execution model for components, which avoids the synchronization primitives that often lead to non-analyzable code and can cause run-time deadlocks and race conditions. One of the key innovations in DREMS was the development of fine-grained privileges for controlling access to system services. As part of this effort, we developed a novel Multi-Level Security (MLS) information-sharing policy for distributed architectures. Recently, this model has been extended to a decentralized architecture for the smart grid within the Resilient Information Architecture Platform for Smart Grid (RIAPS).
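The benefit of the single-threaded execution model can be sketched in Python: each component owns one worker thread and one message queue, so its handlers run one at a time, to completion, and can update component state without locks. This is a hypothetical illustration (`SingleThreadedComponent`, `on`, `post`, and `stop` are our names, not the DREMS API), and it says nothing about the DREMS privilege or MLS mechanisms.

```python
import queue
import threading
from typing import Any, Callable, Dict, Optional, Tuple


class SingleThreadedComponent:
    """Illustrative component whose handlers all run on one dedicated thread.

    Because messages are dispatched one at a time, handlers can mutate
    self.state without locks.
    """

    def __init__(self) -> None:
        self.state: Dict[str, Any] = {"count": 0}
        self._handlers: Dict[str, Callable[[Any], None]] = {}
        self._inbox: "queue.Queue[Optional[Tuple[str, Any]]]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def on(self, port: str, handler: Callable[[Any], None]) -> None:
        """Register the handler for messages arriving on a port."""
        self._handlers[port] = handler

    def start(self) -> None:
        self._thread.start()

    def post(self, port: str, msg: Any) -> None:
        # Safe to call from any thread: queue.Queue is thread-safe.
        self._inbox.put((port, msg))

    def stop(self) -> None:
        self._inbox.put(None)       # sentinel: process queued messages, then exit
        self._thread.join()

    def _run(self) -> None:
        while True:
            item = self._inbox.get()
            if item is None:
                break
            port, msg = item
            self._handlers[port](msg)   # runs to completion before the next message
```

Messages may be posted from any thread; the serialization happens at the queue, so the handlers themselves contain no synchronization primitives.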

Fault detection and diagnostics: Building on this foundation, we have also been developing mechanisms for anomaly detection, fault source isolation, and system recovery. In particular, we use a discrete event model that captures the causal and temporal relationships between failure modes (causes) and discrepancies (effects) in a system, thereby modeling failure cascades while accounting for the propagation constraints imposed by operating modes, protection elements, and timing delays. This formalism, called the Temporal Causal Diagram (TCD), builds on prior work on Timed Failure Propagation Graphs (TFPG). It can model the effects of faults and protection mechanisms, and it can incorporate fine-grained, physics-based diagnostics into an integrated, system-level diagnostics scheme. The approach is distinctive in that it does not require complex real-time computations with high-fidelity models; instead, it reasons with efficient graph algorithms over the anomalies observed in the system. When fine-grained results are needed and computing resources and time are available, the diagnostic hypotheses can be refined with physics-based diagnostics. Finally, we use both data-driven approaches, such as LSTM and graph neural networks, and the TCD models to predict the effects of failures.
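The flavor of the timing reasoning can be shown with a simplified Python sketch in the spirit of TFPG consistency checking. It is not the TCD algorithm: it assumes a single fault, OR-type discrepancies (a discrepancy fires on the first arriving propagation), no operating modes or protection elements, and that every observed alarm must be explained. Edges carry hypothetical `[t_min, t_max]` delay bounds, and a failure mode is kept as a hypothesis only if some single onset time `t0` places every observed alarm inside its propagation window.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Edge: (source, target, t_min, t_max) propagation delay bounds.
Edge = Tuple[str, str, float, float]


def shortest(edges: List[Edge], src: str, use_max: bool) -> Dict[str, float]:
    """Dijkstra over t_min (or t_max) delays from one failure mode."""
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for u, v, lo, hi in edges:
        adj.setdefault(u, []).append((v, hi if use_max else lo))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def consistent_fault_times(edges: List[Edge], fault: str,
                           alarms: Dict[str, float]) -> Optional[Tuple[float, float]]:
    """Interval of onset times t0 consistent with all observed alarms, or None."""
    # With OR semantics, the first arrival triggers a discrepancy, so both the
    # earliest and the latest activation are shortest-path sums.
    earliest = shortest(edges, fault, use_max=False)
    latest = shortest(edges, fault, use_max=True)
    lo, hi = float("-inf"), float("inf")
    for disc, t in alarms.items():
        if disc not in earliest:
            return None                       # this fault cannot reach the alarm
        # t in [t0 + earliest, t0 + latest]  <=>  t0 in [t - latest, t - earliest]
        lo = max(lo, t - latest[disc])
        hi = min(hi, t - earliest[disc])
    return (lo, hi) if lo <= hi else None


def diagnose(edges: List[Edge], failure_modes: List[str],
             alarms: Dict[str, float]) -> List[str]:
    """Failure modes whose timing is consistent with every observed alarm."""
    return [fm for fm in failure_modes
            if consistent_fault_times(edges, fm, alarms) is not None]
```

For instance, with edges FM1→D1 [1, 2], D1→D2 [1, 3], and FM2→D2 [0.5, 1], alarms at D1 = 10 and D2 = 12 are consistent only with FM1, whose onset must lie in [8, 9]; FM2 cannot reach D1 at all.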

One of the key benefits of our formalized, component-based construction is that we can generate a Timed Failure Propagation Graph (TFPG) from software assemblies and then use it at runtime to isolate faulty components. This is possible because the data and behavioral dependencies (and hence the fault propagation paths) across an assembly of software components can be deduced from the well-defined, restricted set of interaction patterns supported by the framework. We have also shown that fault containment techniques can provide the primary protection against failures propagating into high-criticality components, and can protect the system health management framework itself.
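A minimal Python sketch of why restricted interaction patterns make propagation derivable: given hypothetical port-level descriptions of an assembly, the propagation edges follow mechanically from publish-subscribe connections and intra-component data dependencies, and a faulty component can be isolated by checking which failure modes reach all anomalous ports. The naming scheme (`Comp.port`, `Comp.FM`) and the untimed reachability check are simplifying assumptions; the framework's generated TFPGs also carry timing and mode information.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical assembly description:
#   components:  name -> {"outputs": [ports], "inputs": {in_port: [out_ports it drives]}}
#   connections: (publisher "Comp.port", subscriber "Comp.port")
Connection = Tuple[str, str]


def build_propagation_graph(components: Dict[str, dict],
                            connections: List[Connection]) -> Dict[str, Set[str]]:
    """Derive fault-propagation edges from a component assembly."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for name, spec in components.items():
        for out in spec["outputs"]:
            graph[f"{name}.FM"].add(f"{name}.{out}")     # a faulty component corrupts its outputs
        for inp, outs in spec.get("inputs", {}).items():
            for out in outs:
                graph[f"{name}.{inp}"].add(f"{name}.{out}")  # internal data dependency
    for pub, sub in connections:
        graph[pub].add(sub)                              # publish-subscribe link
    return graph


def reachable(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """All nodes reachable from start (iterative DFS)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def isolate(graph: Dict[str, Set[str]], components: Iterable[str],
            anomalies: Iterable[str]) -> List[str]:
    """Components whose failure mode alone could explain all anomalous ports."""
    observed = set(anomalies)
    return sorted(c for c in components
                  if observed <= reachable(graph, f"{c}.FM"))
```

For a Sensor → Filter → Ctrl pipeline, anomalies on `Filter.clean` and `Ctrl.cmd` implicate both Sensor and Filter; an additional anomaly on `Sensor.data` narrows the candidates to Sensor alone.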

Recovery: We have also researched mechanisms to recover from component failures, either by reinstalling the failed components automatically or, in case of device and hardware failures, by restoring system functionality with alternative compositions. The key idea is to encode and use the design space of the cyber-physical system. This design space represents the state of an entire platform. It includes information about the available resources, well-known faults, system goals, the objectives and corresponding functionalities that help achieve those goals, the components that provide these functionalities, and the different ways in which these components can be deployed and configured (captured using a domain-specific language).

The design space can expand or shrink as related entities are added or removed. A configuration point represents a valid configuration, which includes information about a specific deployment scenario given a set of component instances and the physical nodes on which these instances can be deployed. A change in the state of a platform is represented by a transition from one configuration point to another within the same design space. The initial configuration point represents the initial state, whereas the current configuration point represents the current state of the platform. Configuration points and their transitions are central to the self-reconfiguration mechanism we have developed. The key idea is to reconfigure by transitioning from a faulty configuration point to a new, valid one, which we find by encoding the problem for efficient SMT solvers. Additionally, if past information about component failures is available, we can choose the reconfiguration that maximizes the likelihood of mission success.
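The reconfiguration step can be sketched as a small constrained optimization. Our mechanism uses SMT solvers; to keep this Python sketch self-contained, it swaps in exhaustive search with `itertools.product`, which is only practical for toy design spaces. A configuration point is modeled as a mapping from component instances to nodes; the constraints are that nodes must be alive and within capacity; and the objective, assuming independent node failures with hypothetical per-node reliability estimates from past data, maximizes the probability that all used nodes survive, breaking ties by the number of migrations.

```python
import itertools
import math
from typing import Dict, Optional, Set


def reconfigure(current: Dict[str, str],
                demand: Dict[str, int],
                capacity: Dict[str, int],
                failed: Set[str],
                reliability: Dict[str, float]) -> Optional[Dict[str, str]]:
    """Pick a new configuration point after node failures (brute force, not SMT).

    current:     component -> node (the current configuration point)
    demand:      component -> resource units it needs
    capacity:    node -> resource units available
    failed:      nodes that are no longer usable
    reliability: node -> assumed probability the node survives the mission
    """
    comps = sorted(current)
    alive = sorted(n for n in capacity if n not in failed)
    best, best_key = None, None
    for nodes in itertools.product(alive, repeat=len(comps)):
        load = dict.fromkeys(alive, 0)
        for c, n in zip(comps, nodes):
            load[n] += demand[c]
        if any(load[n] > capacity[n] for n in alive):
            continue                                    # violates a capacity constraint
        # Mission succeeds only if every node in use survives (independence assumed).
        p_success = math.prod(reliability[n] for n in set(nodes))
        moves = sum(current[c] != n for c, n in zip(comps, nodes))
        key = (p_success, -moves)                       # maximize success, then minimize migrations
        if best_key is None or key > best_key:
            best, best_key = dict(zip(comps, nodes)), key
    return best                                          # None if no valid configuration exists
```

Under these assumptions, packing components onto fewer highly reliable nodes can beat spreading them out, since each additional node in use is another point of failure.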

Publications in this area

A. Chhokra, C. Barreto, A. Dubey, G. Karsai, and X. Koutsoukos, Power-Attack: A comprehensive tool-chain for modeling and simulating attacks in power systems, in 9th Workshop on Modeling and Simulation of Cyber-Physical Energy Systems, MSCPES@CPSIoTWeek, 2021.
```
@inproceedings{ajay2021powerattack,
  author = {Chhokra, Ajay and Barreto, Carlos and Dubey, Abhishek and Karsai, Gabor and Koutsoukos, Xenofon},
  booktitle = {9th Workshop on Modeling and Simulation of Cyber-Physical Energy Systems, MSCPES@CPSIoTWeek},
  title = {Power-Attack: A comprehensive tool-chain for modeling and simulating attacks in power systems},
  year = {2021},
  category = {workshop},
  contribution = {colab},
  keywords = {power grid},
  project = {cps-reliability},
  tag = {platform,power}
}
```
Due to the increased deployment of novel communication, control, and protection functions, the grid has become vulnerable to a variety of attacks. Designing robust machine-learning-based attack detection and mitigation algorithms requires large amounts of data, which in turn relies heavily on a representative environment in which different attacks can be simulated. This paper presents a comprehensive tool-chain for modeling and simulating attacks in power systems. The paper makes the following contributions. First, we present a probabilistic domain-specific language to define multiple attack scenarios and simulation configuration parameters. Second, we extend the PyPower-dynamics simulator with protection system components to simulate cyber attacks in the control and protection layers of the power system. Finally, we demonstrate multiple attack scenarios with a case study based on the IEEE 39-bus system.
S. Basak, S. Sengupta, S.-J. Wen, and A. Dubey, Spatio-temporal AI inference engine for estimating hard disk reliability, Pervasive and Mobile Computing, vol. 70, p. 101283, 2021.
```
@article{BASAK2021101283,
  author = {Basak, Sanchita and Sengupta, Saptarshi and Wen, Shi-Jie and Dubey, Abhishek},
  journal = {Pervasive and Mobile Computing},
  title = {Spatio-temporal AI inference engine for estimating hard disk reliability},
  year = {2021},
  issn = {1574-1192},
  pages = {101283},
  volume = {70},
  contribution = {lead},
  doi = {10.1016/j.pmcj.2020.101283},
  keywords = {Remaining useful life, Long short term memory, Prognostics, Predictive health maintenance, Hierarchical clustering},
  tag = {ai4cps, platform},
  url = {http://www.sciencedirect.com/science/article/pii/S1574119220301231}
}
```
This paper focuses on building a spatio-temporal AI inference engine for estimating hard disk reliability. Most electronic systems, such as hard disks, routinely collect reliability parameters in the field to monitor the health of the system. Changes in these parameters over time are monitored, and any observed changes are compared with known failure signatures. If the trajectory of the measured data matches that of a failure signature, operators are alerted to take corrective action. However, operators are most interested in identifying failures before they occur. The state-of-the-art methodology, including our prior work, is to train machine learning models on temporal sequence data capturing the variations across multiple features and to use them to predict the remaining useful life of the devices. However, as we show in this paper, temporal prediction alone is not sufficient: it can lead to low precision, and the uncertainty around the prediction is very large. This is primarily due to the non-uniform progression of feature patterns over time. Our hypothesis is that accuracy can be improved by combining temporal prediction methods with a spatial analysis that compares the values of key SMART features across devices of the same model within a fixed time window (unlike the temporal method, which uses data from a single device over a much larger historical window). In this paper, we first describe both the temporal and spatial approaches and the methods used to select various hyperparameters, and then show a workflow that combines these two methodologies and provide comparative results. Our results illustrate that the average precision of temporal methods using long short-term memory networks to predict impending failures in the next ten days was 84 percent. To improve precision, we take the set of disks identified as potential failures and apply spatial anomaly detection methods to those disks. This helps us remove false positives from the temporal prediction results and provide a tighter bound on the set of disks with impending failures.
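The two-stage filtering step can be sketched in Python. The sketch keeps the structure described above (temporal candidates first, then a comparison against peer disks of the same model in a fixed window) but, as a stand-in for the paper's spatial anomaly detection, uses the median/MAD-based modified z-score with the commonly used cutoff of 3.5; the function name and the single-feature input are assumptions.

```python
import statistics
from typing import Dict, List, Set


def spatial_filter(candidates: Set[str],
                   window_values: Dict[str, float],
                   threshold: float = 3.5) -> List[str]:
    """Keep temporally flagged disks that are also spatial outliers.

    window_values maps every disk of the same model to one SMART feature value
    aggregated over the same fixed time window. A candidate is kept if its
    modified z-score (median/MAD based) exceeds `threshold` in magnitude.
    """
    values = list(window_values.values())
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate peer group: keep only candidates that differ from the median.
        return sorted(d for d in candidates if window_values[d] != med)
    kept = []
    for disk in candidates:
        z = 0.6745 * (window_values[disk] - med) / mad   # modified z-score
        if abs(z) > threshold:
            kept.append(disk)
    return sorted(kept)
```

A candidate that the temporal model flagged but whose feature value sits near the peer median is treated as a likely false positive and dropped.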
H. M. Mustafa, M. Bariya, K. S. Sajan, A. Chhokra, A. Srivastava, A. Dubey, A. von Meier, and G. Biswas, RT-METER: A Real-Time, Multi-Layer Cyber–Power Testbed for Resiliency Analysis, in 9th Workshop on Modeling and Simulation of Cyber-Physical Energy Systems, MSCPES@CPSIoTWeek, 2021.
```
@inproceedings{rtmeter2021,
  author = {Mustafa, Hussain M. and Bariya, Mohini and Sajan, K.S. and Chhokra, Ajay and Srivastava, Anurag and Dubey, Abhishek and von Meier, Alexandra and Biswas, Gautam},
  booktitle = {9th Workshop on Modeling and Simulation of Cyber-Physical Energy Systems, MSCPES@CPSIoTWeek},
  title = {RT-METER: A Real-Time, Multi-Layer Cyber–Power Testbed for Resiliency Analysis},
  year = {2021},
  category = {workshop},
  contribution = {colab},
  keywords = {power grid},
  project = {cps-reliability},
  tag = {platform,power}
}
```
In this work, we present a Real-Time, Multi-layer cybEr–power TestbEd for Resiliency analysis (RT-METER) to support power grid operation and planning. The developed cyber-power testbed provides a mechanism for end-to-end validation of advanced tools for cyber-power grid monitoring, control, and planning. By integrating a host of features across three core layers (physical power system, communication network, and monitoring/control center) with advanced tools, the testbed allows for the simulation of rich and varied cyber-power grid scenarios and the generation of realistic sensor, system, and network data. Developing advanced tools to assist operators during complex and challenging scenarios is essential for the successful operation of the future grid. We detail a suite of algorithmic tools validated with the developed testbed on realistic grid data.
A. Chhokra, S. Hasan, A. Dubey, and G. Karsai, A Binary Decision Diagram Based Cascade Prognostics Scheme For Power Systems, in 2020 American Control Conference (ACC), 2020, pp. 3011–3016.
```
@inproceedings{chokraACC2020,
  author = {Chhokra, Ajay and Hasan, Saqib and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {2020 American Control Conference (ACC)},
  title = {A Binary Decision Diagram Based Cascade Prognostics Scheme For Power Systems},
  year = {2020},
  month = jul,
  pages = {3011-3016},
  contribution = {minor},
  doi = {10.23919/ACC45564.2020.9147902},
  issn = {2378-5861},
  keywords = {Load modeling;Binary decision diagrams;Circuit faults;Power system stability;Analytical models;Boolean functions;Binary Decision Diagrams;Contingency Analysis;Nonlinear Optimization;Sensitivity Analysis;Load Curtailment},
  tag = {platform,power}
}
```
Cascading outages in power systems are a rare but important phenomenon with huge social and economic implications. Due to the inherent complexity and heterogeneity of components in power systems, analyzing and predicting the current and future states of the system is a challenging task. In this paper, we address prognosis of cascading outages in power systems with a novel approach based on reduced ordered binary decision diagrams. We present a systematic way of synthesizing these decision diagrams based on a simple cascade model. We also describe a workflow for finding emergency load curtailment actions as part of the mitigation strategy. Finally, we show the applicability of our approach using the standard IEEE 14-bus system.
C. Hartsell, N. Mahadevan, H. Nine, T. Bapty, A. Dubey, and G. Karsai, Workflow Automation for Cyber Physical System Development Processes, in 2020 IEEE Workshop on Design Automation for CPS and IoT (DESTION), 2020.
```
@inproceedings{Hartsell_2020,
  author = {Hartsell, Charles and Mahadevan, Nagabhushan and Nine, Harmon and Bapty, Ted and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {2020 IEEE Workshop on Design Automation for CPS and IoT (DESTION)},
  title = {Workflow Automation for Cyber Physical System Development Processes},
  year = {2020},
  month = apr,
  publisher = {IEEE},
  contribution = {colab},
  doi = {10.1109/DESTION50928.2020.00007},
  isbn = {9781728199948},
  journal = {2020 IEEE Workshop on Design Automation for CPS and IoT (DESTION)},
  tag = {platform}
}
```
Development of Cyber Physical Systems (CPSs) requires close interaction between developers with expertise in many domains to meet ever-increasing demands for improved performance, reduced cost, and more system autonomy. Each engineering discipline commonly relies on domain-specific modeling languages, and analysis and execution of these models is often automated with appropriate tooling. However, integration between these heterogeneous models and tools is often lacking, and most of the burden for inter-operation of these tools is placed on system developers. To address this problem, we introduce a workflow modeling language for the automation of complex CPS development processes and implement a platform for execution of these models in the Assurance-based Learning-enabled CPS (ALC) Toolchain. Several illustrative examples are provided which show how these workflow models are able to automate many time-consuming integration tasks previously performed manually by system developers.
T. Bapty, A. Dubey, and J. Sztipanovits, Cyber-Physical Vulnerability Analysis of IoT Applications Using Multi-Modeling, in Modeling and Design of Secure Internet of Things, John Wiley & Sons, Ltd, 2020, pp. 161–184.
```
@inbook{baptydubeyjanos2020,
  author = {Bapty, Ted and Dubey, Abhishek and Sztipanovits, Janos},
  chapter = {8},
  pages = {161-184},
  publisher = {John Wiley & Sons, Ltd},
  title = {Cyber-Physical Vulnerability Analysis of IoT Applications Using Multi-Modeling},
  year = {2020},
  isbn = {9781119593386},
  booktitle = {Modeling and Design of Secure Internet of Things},
  contribution = {colab},
  doi = {10.1002/9781119593386.ch8},
  eprint = {https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781119593386.ch8},
  keywords = {energy injection, home automation system, IoT-based cyber-physical systems, low-level physical vulnerabilities, multi-modeling approach, vulnerability analysis},
  tag = {platform},
  url = {https://onlinelibrary.wiley.com/doi/abs/10.1002/9781119593386.ch8}
}
```
Using the Smart Home as a use case, we examine the vulnerabilities in the system across the technologies used in its implementation. A typical smart home will contain a variety of sensors, actuators (e.g. for opening doors), communication links, storage devices, video cameras, network interfaces, and control units. Each of these physical components and subsystems must be secure in order for the overall system to be secure. Typical security analysis focuses on the defined interfaces of the system: network security via firewalls, communications encryption, and authentication at terminals. Unfortunately, many of these devices in the Internet of Things (IoT) space are susceptible to physical attacks via electromagnetic, acoustic, or thermal energy. Properly designed electromagnetic (EM) waveforms can access a range of vulnerabilities, providing unanticipated entry points into the system. In this chapter, we discuss a multi-modeling methodology for analyzing cyber-physical vulnerabilities, assessing the system across geometric, electronic, and behavioral domains. A home automation system is used as an example, showing a methodology for assessing vulnerabilities in hardware. The example exploits the use of EM energy injection. A multi-model of the system captures the geometric structure of the hardware with links to behavioral models. Low-energy EM pathways are discovered that may impact system behavior. Computation is minimized by applying analysis of EM effects only at behavior-critical inputs and outputs. The chapter also discusses a methodology for system-level impact analysis. The final conclusion is that susceptibility at the physical layer presents many attack surfaces, due to the large number of heterogeneous IoT devices, mandating that physical dimensions be considered in vulnerability analysis and risk mitigation.
K. Sajan, M. Bariya, S. Basak, A. K. Srivastava, A. Dubey, A. von Meier, and G. Biswas, Realistic Synchrophasor Data Generation for Anomaly Detection and Event Classification, in 8th Workshop on Modeling and Simulation of Cyber-Physical Energy Systems, MSCPES@CPSIoTWeek, 2020.
```
@inproceedings{basak2020mscpes,
  author = {Sajan, Kaduvettykunnal and Bariya, Mohini and Basak, Sanchita and Srivastava, Anurag K. and Dubey, Abhishek and von Meier, Alexandra and Biswas, Gautam},
  booktitle = {8th Workshop on Modeling and Simulation of Cyber-Physical Energy Systems, MSCPES@CPSIoTWeek},
  title = {Realistic Synchrophasor Data Generation for Anomaly Detection and Event Classification},
  year = {2020},
  category = {workshop},
  contribution = {lead},
  keywords = {transactive},
  project = {cps-reliability},
  tag = {platform,power}
}
```
The push to automate and digitize the electric grid has led to widespread installation of Phasor Measurement Units (PMUs) for improved real-time wide-area system monitoring and control. Nevertheless, transforming large volumes of high-resolution PMU measurements into actionable insights remains challenging. A central challenge is creating flexible and scalable online anomaly detection in PMU data streams. PMU data can hold multiple types of anomalies arising in the physical system or the cyber system (measurements and communication networks). Improving grid situational awareness in the presence of noisy measurement data and Bad Data (BD) anomalies has become increasingly important. A number of machine learning, data analytics, and physics-based algorithms have been developed for anomaly detection, but they need to be validated with realistic synchrophasor data. Access to field data is very challenging due to confidentiality and security concerns. This paper presents a method for generating realistic synchrophasor data for a given synthetic network, as well as algorithms for event and bad data detection and classification. The developed algorithms include Bayesian and change-point techniques to identify anomalies, a statistical approach for event localization, and a multi-step clustering approach for event classification. The algorithms have been validated with satisfactory results on multiple examples of power system events, including faults and load/generator/capacitor variations and switching, for an IEEE test system. The synchrophasor data set will be made publicly available to other researchers.
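As a generic illustration of the change-point idea (not the specific Bayesian or change-point algorithms developed in the paper), a two-sided CUSUM detector flags the first sample at which the accumulated deviation from a nominal mean exceeds a threshold. The parameter values in the usage note are hypothetical.

```python
from typing import List, Optional


def cusum(samples: List[float], mean: float, drift: float,
          threshold: float) -> Optional[int]:
    """Two-sided CUSUM: index of the first sample where a mean shift is flagged.

    drift is the slack subtracted at every step (suppresses small noise);
    threshold is the alarm level for either cumulative sum.
    """
    pos = neg = 0.0
    for i, x in enumerate(samples):
        pos = max(0.0, pos + (x - mean) - drift)   # accumulates upward shifts
        neg = max(0.0, neg - (x - mean) - drift)   # accumulates downward shifts
        if pos > threshold or neg > threshold:
            return i
    return None                                    # no change point detected
```

For a frequency stream that steps from 60.0 Hz to 59.9 Hz at index 10, `cusum(samples, mean=60.0, drift=0.01, threshold=0.2)` flags index 12: each post-step sample adds about 0.09 to the downward sum, which first exceeds 0.2 after three samples.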
A. Chhokra, N. Mahadevan, A. Dubey, and G. Karsai, Qualitative fault modeling in safety critical Cyber Physical Systems, in 12th System Analysis and Modelling Conference, 2020.
```
@inproceedings{chhokrasam2020,
  author = {Chhokra, Ajay and Mahadevan, Nagabhushan and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {12th System Analysis and Modelling Conference},
  title = {Qualitative fault modeling in safety critical Cyber Physical Systems},
  year = {2020},
  contribution = {minor},
  tag = {platform}
}
```
One of the key requirements for designing safety-critical cyber-physical systems (CPS) is to ensure resiliency. Typically, the cyber sub-system in a CPS is equipped with protection devices that quickly detect and isolate faulty components to avoid failures. However, these protection devices can have internal faults that cause cascading failures, leading to system collapse. Thus, to guarantee the resiliency of the system, it is necessary to identify the root cause(s) of a given system disturbance in order to take appropriate control actions. Correct failure diagnosis in such systems depends on an integrated fault model of the system that captures the effect of faults in CPS as well as the nominal and faulty operation of protection devices, sensors, and actuators. In this paper, we propose a novel graph-based qualitative fault modeling formalism for CPS, called Temporal Causal Diagrams (TCDs), that allows system designers to effectively represent faults and their effects in both physical and cyber sub-systems. The paper also discusses in detail the fault propagation and execution semantics of a TCD model by translating it to timed automata, providing an efficient means to quickly analyze, validate, and verify the fault model. Finally, we show the efficacy of the modeling approach with the help of a case study from an energy system.
S. Eisele, C. Barreto, A. Dubey, X. Koutsoukos, T. Eghtesad, A. Laszka, and A. Mavridou, Blockchains for Transactive Energy Systems: Opportunities, Challenges, and Approaches, IEEE Computer, 2020.
```
@article{eisele2020Blockchains,
  author = {Eisele, Scott and Barreto, Carlos and Dubey, Abhishek and Koutsoukos, Xenofon and Eghtesad, Taha and Laszka, Aron and Mavridou, Anastasia},
  journal = {IEEE Computer},
  title = {Blockchains for Transactive Energy Systems: Opportunities, Challenges, and Approaches},
  year = {2020},
  contribution = {lead},
  tag = {platform,decentralization,power}
}
```
The emergence of blockchains and smart contracts has renewed interest in electrical cyber-physical systems, especially in the area of transactive energy systems. However, despite recent advances, there remain significant challenges that impede the practical adoption of blockchains in transactive energy systems, which include implementing complex market mechanisms in smart contracts, ensuring safety of the power system, and protecting residential consumers’ privacy. To address these challenges, we present TRANSAX, a blockchain-based transactive energy system that provides an efficient, safe, and privacy-preserving market built on smart contracts. Implementation and deployment of TRANSAX in a verifiably correct and efficient way is based on VeriSolid, a framework for the correct-by-construction development of smart contracts, and RIAPS, a middleware for resilient distributed power systems.
S. Eisele, T. Eghtesad, N. Troutman, A. Laszka, and A. Dubey, Mechanisms for Outsourcing Computation via a Decentralized Market, in 14th ACM International Conference on Distributed and Event Based Systems, 2020.
```
@inproceedings{eisele2020mechanisms,
  author = {Eisele, Scott and Eghtesad, Taha and Troutman, Nicholas and Laszka, Aron and Dubey, Abhishek},
  booktitle = {14th ACM International Conference on Distributed and Event Based Systems},
  title = {Mechanisms for Outsourcing Computation via a Decentralized Market},
  year = {2020},
  acceptance = {25.5},
  category = {selectiveconference},
  contribution = {lead},
  keywords = {transactive},
  tag = {platform,decentralization}
}
```
As the number of personal computing and IoT devices grows rapidly, so does the amount of computational power that is available at the edge. Since many of these devices are often idle, there is a vast amount of computational power that is currently untapped, and which could be used for outsourcing computation. Existing solutions for harnessing this power, such as volunteer computing (e.g., BOINC), are centralized platforms in which a single organization or company can control participation and pricing. By contrast, an open market of computational resources, where resource owners and resource users trade directly with each other, could lead to greater participation and more competitive pricing. To provide an open market, we introduce MODiCuM, a decentralized system for outsourcing computation. MODiCuM deters participants from misbehaving, which is a key problem in decentralized systems, by resolving disputes via dedicated mediators and by imposing enforceable fines. However, unlike other decentralized outsourcing solutions, MODiCuM minimizes computational overhead since it does not require global trust in mediation results. We provide analytical results proving that MODiCuM can deter misbehavior, and we evaluate the overhead of MODiCuM using experimental results based on an implementation of our platform.
P. Ghosh, S. Eisele, A. Dubey, M. Metelko, I. Madari, P. Volgyesi, and G. Karsai, Designing a decentralized fault-tolerant software framework for smart grids and its applications, Journal of Systems Architecture, vol. 109, p. 101759, 2020.
```
@article{GHOSH2020101759,
  author = {Ghosh, Purboday and Eisele, Scott and Dubey, Abhishek and Metelko, Mary and Madari, Istvan and Volgyesi, Peter and Karsai, Gabor},
  journal = {Journal of Systems Architecture},
  title = {Designing a decentralized fault-tolerant software framework for smart grids and its applications},
  year = {2020},
  issn = {1383-7621},
  pages = {101759},
  volume = {109},
  contribution = {minor},
  doi = {10.1016/j.sysarc.2020.101759},
  keywords = {Component, Fault tolerance, Distributed systems, Smart grid},
  tag = {platform},
  url = {http://www.sciencedirect.com/science/article/pii/S1383762120300539}
}
```
The vision of the ‘Smart Grid’ anticipates a distributed real-time embedded system that implements various monitoring and control functions. As the reliability of the power grid is critical to modern society, the software supporting the grid must support fault tolerance and resilience of the resulting cyber-physical system. This paper describes the fault-tolerance features of a software framework called Resilient Information Architecture Platform for Smart Grid (RIAPS). The framework supports various mechanisms for fault detection and mitigation and works in concert with the applications that implement the grid-specific functions. The paper discusses the design philosophy for and the implementation of the fault tolerance features and presents an application example to show how it can be used to build highly resilient systems.
S. Hasan, A. Dubey, G. Karsai, and X. Koutsoukos, A game-theoretic approach for power systems defense against dynamic cyber-attacks, International Journal of Electrical Power & Energy Systems, vol. 115, 2020.
```
@article{Hasan2020,
  author = {Hasan, Saqib and Dubey, Abhishek and Karsai, Gabor and Koutsoukos, Xenofon},
  journal = {International Journal of Electrical Power \& Energy Systems},
  title = {A game-theoretic approach for power systems defense against dynamic cyber-attacks},
  year = {2020},
  issn = {0142-0615},
  volume = {115},
  contribution = {colab},
  doi = {10.1016/j.ijepes.2019.105432},
  file = {:Hasan2020-A_Game_Theoretic_Approach_for_Power_Systems_Defense_against_Dynamic_Cyber_Attacks.pdf:PDF},
  keywords = {Cascading failures, Cyber-attack, Dynamic attack, Game theory, Resilience, Smart grid, Static attack, smartgrid, reliability},
  project = {cps-reliability},
  tag = {platform,power},
  url = {http://www.sciencedirect.com/science/article/pii/S0142061519302807}
}
```
Technological advancements in today’s electrical grids give rise to new vulnerabilities and increase the potential attack surface for cyber-attacks that can severely affect the resilience of the grid. Cyber-attacks are increasing in both number and sophistication, and these attacks can be strategically organized in chronological order (dynamic attacks), where they are instantiated at different time instants. The chronological order of attacks enables us to uncover attack combinations that can cause severe system damage, but this concept has remained unexplored due to the lack of dynamic attack models. Motivated by this idea, we consider a game-theoretic approach to design a new attacker-defender model for power systems. Here, the attacker can strategically identify the chronological order in which the critical substations and their protection assemblies can be attacked in order to maximize the overall system damage. The defender, in turn, can intelligently identify the critical substations to protect so that the system damage is minimized. We apply the developed algorithms to the IEEE 39-bus and 57-bus systems with finite attacker/defender budgets. Our results show the effectiveness of these models in improving the system resilience under dynamic attacks.
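The attacker-defender structure can be illustrated with a toy, brute-force Stackelberg game in Python: the defender commits to protecting a budgeted set of substations, and the attacker then picks the ordered attack sequence that maximizes damage among the unprotected ones. The damage model here (per-substation base losses plus hypothetical pairwise follow-up bonuses that make order matter) is a stand-in for the paper's power-system models, and exhaustive enumeration only scales to tiny instances.

```python
import itertools
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple


def attack_damage(seq: Sequence[str], base: Dict[str, float],
                  followup: Dict[Tuple[str, str], float]) -> float:
    """Toy damage model in which attack order matters.

    followup[(a, b)] is extra damage if b is attacked at some point after a
    (a hypothetical stand-in for cascading effects).
    """
    dmg = sum(base[s] for s in seq)
    for i, a in enumerate(seq):
        for b in seq[i + 1:]:
            dmg += followup.get((a, b), 0.0)
    return dmg


def best_attack(targets: List[str], protected: Set[str], k: int,
                base: Dict[str, float],
                followup: Dict[Tuple[str, str], float]) -> Tuple[Tuple[str, ...], float]:
    """Attacker's best ordered sequence of up to k unprotected substations."""
    options = [s for s in targets if s not in protected]
    best: Tuple[Tuple[str, ...], float] = ((), 0.0)
    for seq in itertools.permutations(options, min(k, len(options))):
        d = attack_damage(seq, base, followup)
        if d > best[1]:
            best = (seq, d)
    return best


def best_defense(targets: List[str], d_budget: int, a_budget: int,
                 base: Dict[str, float],
                 followup: Dict[Tuple[str, str], float]) -> Tuple[FrozenSet[str], float]:
    """Defender protects d_budget substations to minimize the attacker's best damage."""
    best: Tuple[FrozenSet[str], float] = (frozenset(), float("inf"))
    for prot in itertools.combinations(sorted(targets), d_budget):
        _, d = best_attack(targets, set(prot), a_budget, base, followup)
        if d < best[1]:
            best = (frozenset(prot), d)
    return best
```

With three substations of unit base damage and a follow-up bonus of 5 when S2 is hit after S1, the unprotected attacker's best sequence is (S1, S2) with damage 7, while the reverse order yields only 2; protecting S1 caps the attacker at 2.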
Z. Kang, R. Canady, A. Dubey, A. Gokhale, S. Shekhar, and M. Sedlacek, A Study of Publish/Subscribe Middleware Under Different IoT Traffic Conditions, in Proceedings of the 7th Workshop on Middleware and Applications for the Internet of Things, M4IoT@Middleware, 2020.
```
@inproceedings{m4iot2020,
  author = {Kang, Zhuangwei and Canady, Robert and Dubey, Abhishek and Gokhale, Aniruddha and Shekhar, Shashank and Sedlacek, Matous},
  booktitle = {Proceedings of the 7th Workshop on Middleware and Applications for the Internet of Things, M4IoT@Middleware},
  title = {A Study of Publish/Subscribe Middleware Under Different
                    IoT Traffic Conditions},
  year = {2020},
  contribution = {minor},
  tag = {platform}
}
```
Publish/Subscribe (pub/sub) semantics are critical for IoT applications due to their loosely coupled nature. Although OMG DDS, MQTT, and ZeroMQ are mature pub/sub solutions used for IoT, prior studies show that their performance varies significantly under different load conditions and QoS configurations, which makes middleware selection and configuration decisions hard. Moreover, the load conditions and the role of QoS settings in prior comparison studies are neither comprehensive nor well documented. To address these limitations, we (1) propose a set of performance-related properties for pub/sub middleware and investigate their support in DDS, MQTT, and ZeroMQ; (2) perform systematic experiments under three representative, lab-based real-world IoT use cases; and (3) improve DDS performance by applying three of our proposed QoS properties. Empirical results show that DDS has the most thorough QoS support and more reliable performance in most scenarios. In addition, its Multicast, TurboMode, and AutoThrottle QoS policies can effectively improve DDS performance in terms of throughput and latency.
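The loose coupling that makes pub/sub attractive, and the way a QoS setting changes what a subscriber sees, can be illustrated with a toy in-process broker. This is a hypothetical sketch, not the DDS, MQTT, or ZeroMQ API; the bounded per-subscriber queue only loosely mirrors a DDS HISTORY policy of kind KEEP_LAST with a given depth.

```python
from collections import defaultdict, deque

class Broker:
    """Toy in-process topic broker (hypothetical; not a real middleware API)."""

    def __init__(self):
        self._subs = defaultdict(list)  # topic -> list of subscriber queues

    def subscribe(self, topic, depth):
        # Each subscriber gets a bounded queue: once full, the oldest sample
        # is discarded, loosely mirroring a KEEP_LAST history of `depth`.
        q = deque(maxlen=depth)
        self._subs[topic].append(q)
        return q

    def publish(self, topic, sample):
        # The publisher only names the topic; it never learns who listens.
        for q in self._subs[topic]:
            q.append(sample)
```

For example, after five samples on one topic, a subscriber with depth 1 holds only the latest sample while one with depth 3 holds the last three; the publisher code is identical in both cases.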
S. Nannapaneni, S. Mahadevan, A. Dubey, and Y.-T. T. Lee, Online monitoring and control of a cyber-physical manufacturing process under uncertainty, Journal of Intelligent Manufacturing, pp. 1–16, 2020.
```
@article{nannapaneni2020online,
  author = {Nannapaneni, Saideep and Mahadevan, Sankaran and Dubey, Abhishek and Lee, Yung-Tsun Tina},
  journal = {Journal of Intelligent Manufacturing},
  title = {Online monitoring and control of a cyber-physical manufacturing process under uncertainty},
  year = {2020},
  pages = {1--16},
  contribution = {minor},
  doi = {https://doi.org/10.1007/s10845-020-01609-7},
  publisher = {Springer},
  tag = {platform}
}
```
Recent technological advancements in computing, sensing and communication have led to the development of cyber-physical manufacturing processes, where a computing subsystem monitors the manufacturing process performance in real time by analyzing sensor data and implements the necessary control to improve the product quality. This paper develops a predictive control framework where control actions are implemented after predicting the state of the manufacturing process or product quality at a future time using process models. In a cyber-physical manufacturing process, the product quality predictions may be affected by uncertainty sources from the computing subsystem (resource and communication uncertainty), manufacturing process (input uncertainty, process variability and modeling errors), and sensors (measurement uncertainty). In addition, due to the continuous interactions between the computing subsystem and the manufacturing process, these uncertainty sources may aggregate and compound over time. In some cases, some process parameters needed for model predictions may not be precisely known and may need to be derived from real-time sensor data. This paper develops a dynamic Bayesian network approach, which enables the aggregation of multiple uncertainty sources, parameter estimation and robust prediction for online control. As the number of process parameters increases, their estimation using sensor data in real time can be computationally expensive. To facilitate real-time analysis, variance-based global sensitivity analysis is used for dimension reduction. The proposed methodology of online monitoring and control under uncertainty, and dimension reduction, are illustrated for a cyber-physical turning process.
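Variance-based global sensitivity analysis ranks parameters by first-order Sobol indices, the fraction of output variance explained by each input alone; parameters with small indices are candidates to fix rather than estimate online. The sketch below uses a Saltelli-style Monte Carlo estimator with independent uniform inputs. The linear `model` is a hypothetical stand-in for a process model, chosen because its indices are known analytically (0.2 and 0.8).

```python
import random

def model(x):
    # Hypothetical stand-in for a process model: x[1] matters more than x[0].
    return x[0] + 2.0 * x[1]

def first_order_sobol(f, dim, n, seed=0):
    """Monte Carlo first-order Sobol indices using the A/B sample-matrix
    estimator, with inputs independent and uniform on [0, 1]."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(dim)] for _ in range(n)]
    B = [[rng.random() for _ in range(dim)] for _ in range(n)]
    fA = [f(a) for a in A]
    fB = [f(b) for b in B]
    pooled = fA + fB
    mean = sum(pooled) / (2 * n)
    var = sum((y - mean) ** 2 for y in pooled) / (2 * n - 1)
    indices = []
    for i in range(dim):
        # A_B^i: each row of A with its column i replaced by B's column i.
        fABi = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        vi = sum(yb * (yabi - ya) for ya, yb, yabi in zip(fA, fB, fABi)) / n
        indices.append(vi / var)
    return indices
```

With `n=20000`, `first_order_sobol(model, 2, 20000)` returns values close to `[0.2, 0.8]`; in a dimension-reduction setting, only the high-index parameters would be kept in the online estimation problem.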
S. Shekhar, A. Chhokra, H. Sun, A. Gokhale, A. Dubey, X. Koutsoukos, and G. Karsai, URMILA: Dynamically Trading-off Fog and Edge Resources for Performance and Mobility-Aware IoT Services, Journal of Systems Architecture, 2020.
```
@article{SHEKHAR2020101710,
  author = {Shekhar, Shashank and Chhokra, Ajay and Sun, Hongyang and Gokhale, Aniruddha and Dubey, Abhishek and Koutsoukos, Xenofon and Karsai, Gabor},
  journal = {Journal of Systems Architecture},
  title = {URMILA: Dynamically Trading-off Fog and Edge Resources for Performance and Mobility-Aware IoT Services},
  year = {2020},
  contribution = {colab},
  issn = {1383-7621},
  doi = {https://doi.org/10.1016/j.sysarc.2020.101710},
  keywords = {Fog/Edge Computing, User Mobility, Latency-sensitive IoT Services, Resource Management, middleware, performance},
  project = {cps-middleware},
  tag = {platform,transit},
  url = {http://www.sciencedirect.com/science/article/pii/S1383762120300047}
}
```
The fog/edge computing paradigm is increasingly being adopted to support a range of latency-sensitive IoT services due to its ability to assure the latency requirements of these services while supporting the elastic properties of cloud computing. IoT services that cater to user mobility, however, face a number of challenges in this context. First, since user mobility can incur wireless connectivity issues, executing these services entirely on edge resources, such as smartphones, will rapidly drain the battery. In contrast, executing these services entirely on fog resources, such as cloudlets or micro data centers, will incur higher communication costs and increased latencies in the face of fluctuating wireless connectivity and signal strength. Second, a high degree of multi-tenancy on fog resources involving different IoT services can lead to performance interference issues due to resource contention. In order to address these challenges, this paper describes URMILA, which makes dynamic resource management decisions to achieve effective trade-offs between using the fog and edge resources while ensuring that the latency requirements of the IoT services are met. We evaluate URMILA’s capabilities in the context of a real-world use case on an emulated but realistic IoT testbed.
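The fog-versus-edge trade-off described above can be sketched as a per-request placement rule: among placements that meet the latency bound, pick the one that draws the least battery, and fall back to the fastest placement otherwise. The cost model and all numbers here are hypothetical and ignore fog-side multi-tenancy interference; this is not URMILA's actual decision algorithm.

```python
from dataclasses import dataclass

@dataclass
class Option:
    name: str
    latency_ms: float
    battery_mj: float  # energy drawn from the handset's battery (hypothetical)

def choose(options, latency_bound_ms):
    """Least battery-hungry option that meets the latency bound; if none
    does, fall back to the fastest option."""
    feasible = [o for o in options if o.latency_ms <= latency_bound_ms]
    if feasible:
        return min(feasible, key=lambda o: o.battery_mj)
    return min(options, key=lambda o: o.latency_ms)

def options_for(signal_quality):
    # Hypothetical numbers: offloading to fog costs little battery, but its
    # network delay grows as wireless signal quality in (0, 1] degrades.
    edge = Option("edge", latency_ms=180.0, battery_mj=900.0)
    fog = Option("fog", latency_ms=40.0 + 120.0 / signal_quality, battery_mj=150.0)
    return [edge, fog]
```

Under these assumptions, a 200 ms bound sends work to the fog at full signal quality but keeps it on the edge device when signal quality halves, which is the mobility-driven switching the abstract motivates.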
B. Potteiger, F. Cai, A. Dubey, X. Koutsoukos, and Z. Zhang, Security in Mixed Time and Event Triggered Cyber-Physical Systems using Moving Target Defense, in 2020 IEEE 23rd International Symposium on Real-Time Distributed Computing (ISORC), 2020, pp. 89–97.
```
@inproceedings{Potteiger2020,
  author = {{Potteiger}, B. and {Cai}, F. and {Dubey}, A. and {Koutsoukos}, X. and {Zhang}, Z.},
  booktitle = {2020 IEEE 23rd International Symposium on Real-Time Distributed Computing (ISORC)},
  title = {Security in Mixed Time and Event Triggered Cyber-Physical Systems using Moving Target Defense},
  year = {2020},
  pages = {89-97},
  contribution = {minor},
  doi = {https://doi.org/10.1109/ISORC49007.2020.00022},
  tag = {platform}
}
```
Memory corruption attacks such as code injection, code reuse, and non-control data attacks have become widely popular for compromising safety-critical Cyber-Physical Systems (CPS). Moving target defense (MTD) techniques such as instruction set randomization (ISR), address space randomization (ASR), and data space randomization (DSR) can be used to protect systems against such attacks. CPS often use time-triggered architectures to guarantee predictable and reliable operation. However, MTD techniques can introduce unpredictable time delays. To protect CPS against memory corruption attacks, MTD techniques can be implemented in a mixed time and event-triggered architecture that provides capabilities for maintaining safety and availability during an attack. This paper presents a mixed time and event-triggered MTD security approach based on the ARINC 653 architecture that provides predictable and reliable behavior during normal operation, and rapid attack detection followed by reconfiguration. We leverage a hardware-in-the-loop testbed and an advanced emergency braking system (AEBS) case study to show the effectiveness of our approach.
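Instruction set randomization can be illustrated at the byte level: the stored code image is XOR-encoded with a secret per-process key and decoded at fetch time, so bytes injected by an attacker who lacks the key decode to garbage. The class and method names below are hypothetical, real ISR operates in hardware or an emulator, and the mixed time and event-triggered scheduling of the paper is not modeled; `rekey` only stands in for the event-triggered re-randomization step.

```python
import secrets

def randomize(code: bytes, key: bytes) -> bytes:
    """XOR-encode (or decode) a code image with a repeating key.
    XOR is its own inverse, so the same function decodes at fetch time."""
    return bytes(c ^ key[i % len(key)] for i, c in enumerate(code))

class RandomizedProcess:
    """Hypothetical toy process whose code image is stored randomized."""

    def __init__(self, code: bytes):
        self._plain = code  # trusted copy used when re-randomizing
        self.rekey()

    def rekey(self):
        # Moving target: draw a fresh key and re-encode the trusted image,
        # e.g. after an attack has been detected. This also discards any
        # tampered bytes in the stored image.
        self.key = secrets.token_bytes(8)
        self.image = randomize(self._plain, self.key)

    def fetch(self) -> bytes:
        # What the (emulated) processor would actually execute.
        return randomize(self.image, self.key)

    def inject(self, offset: int, payload: bytes):
        # An attacker who does not know the key writes raw bytes.
        end = offset + len(payload)
        self.image = self.image[:offset] + payload + self.image[end:]
```

An injected 8-byte payload is fetched as `payload XOR key`, which differs from the payload unless the key happens to be all zeros (probability 2^-64).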
H. Tu, Y. Du, H. Yu, A. Dubey, S. Lukic, and G. Karsai, Resilient Information Architecture Platform for the Smart Grid: A Novel Open-Source Platform for Microgrid Control, IEEE Transactions on Industrial Electronics, vol. 67, no. 11, pp. 9393–9404, 2020.
```
@article{riaps2020,
  author = {{Tu}, H. and {Du}, Y. and {Yu}, H. and {Dubey}, Abhishek and {Lukic}, S. and {Karsai}, G.},
  journal = {IEEE Transactions on Industrial Electronics},
  title = {Resilient Information Architecture Platform for the Smart Grid: A Novel Open-Source Platform for Microgrid Control},
  year = {2020},
  number = {11},
  pages = {9393-9404},
  volume = {67},
  contribution = {colab},
  tag = {platform}
}
```
Microgrids are seen as an effective way to achieve reliable, resilient, and efficient operation of the power distribution system. Core functions of the microgrid control system are defined by the IEEE Standard 2030.7; however, the algorithms that realize these functions are not standardized, and are a topic of research. Furthermore, the corresponding controller hardware, operating system, and communication system to implement these functions vary significantly from one implementation to the next. In this article, we introduce an open-source platform, resilient information architecture platform for the smart grid (RIAPS), ideally suited for implementing and deploying distributed microgrid control algorithms. RIAPS provides a design-time tool suite for development and deployment of distributed microgrid control algorithms. With support from a number of run-time platform services, developed algorithms can be easily implemented and deployed into real microgrids. To demonstrate the unique features of RIAPS, we propose and implement a distributed microgrid secondary control algorithm capable of synchronized and proportional compensation of voltage unbalance using distributed generators. Test results show the effectiveness of the proposed control and the salient features of the RIAPS platform.
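Distributed secondary control schemes commonly rely on neighbors exchanging values until they agree, which is one way distributed generators can share a compensation task proportionally. The abstract does not state the algorithm used, so the following is only a generic discrete-time average-consensus iteration, not the RIAPS control algorithm; the ring topology and per-unit values in the usage note are hypothetical.

```python
def consensus(values, neighbors, epsilon, steps):
    """Discrete-time average consensus: each node moves toward its neighbors.
    For a connected undirected graph and 0 < epsilon < 1/max_degree, all
    values converge to the average of the initial values."""
    x = list(values)
    for _ in range(steps):
        x = [xi + epsilon * sum(x[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x
```

For example, four generators on a ring (`neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}`) starting from normalized shares `[0.9, 0.2, 0.5, 0.4]` converge to a common value of 0.5 with `epsilon=0.3`, since the maximum degree is 2. Each node only ever uses its neighbors' values, which is what makes the scheme deployable as distributed components.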
A. Laszka, A. Mavridou, S. Eisele, E. Stachtiari, and A. Dubey, VeriSolid for TRANSAX: Correct-by-Design Ethereum Smart Contracts for Energy Trading, in First International Summer School on Security and Privacy for Blockchains and Distributed Ledger Technologies, BDLT 2019, Vienna, Austria, 2019.
```
@inproceedings{LaszkaVerisolid2019,
  author = {Laszka, Aron and Mavridou, Anastasia and Eisele, Scott and Stachtiari, Emmanouela and Dubey, Abhishek},
  booktitle = {First International Summer School on Security and Privacy for Blockchains and Distributed Ledger Technologies, BDLT 2019, Vienna, Austria},
  title = {VeriSolid for TRANSAX: Correct-by-Design Ethereum Smart Contracts for Energy Trading},
  year = {2019},
  month = sep,
  category = {workshop},
  contribution = {colab},
  file = {:LaszkaVerisolid2019Poster.pdf:PDF},
  keywords = {blockchain, transactive},
  project = {cps-blockchains,transactive-energy},
  tag = {platform,decentralization,power}
}
```
The adoption of blockchain-based platforms is rising rapidly. Their popularity is explained by their ability to maintain a distributed public ledger, providing reliability, integrity, and auditability without a trusted entity. Recent platforms, e.g., Ethereum, also act as distributed computing platforms and enable the creation of smart contracts, i.e., software code that runs on the platform and automatically executes and enforces the terms of a contract. Since smart contracts can perform any computation, they allow the development of decentralized applications, whose execution is safeguarded by the security properties of the underlying platform. Due to their unique advantages, blockchain-based platforms are envisioned to have a wide range of applications, ranging from finance to the Internet-of-Things. However, the trustworthiness of the platform guarantees only that a smart contract is executed correctly, not that the code of the contract is correct. In fact, a large number of contracts deployed in practice suffer from software vulnerabilities, which are often introduced due to the semantic gap between the assumptions that contract writers make about the underlying execution semantics and the actual semantics of smart contracts. A recent automated analysis of 19,336 smart contracts deployed in practice found that 8,333 of them suffered from at least one security issue. Although this study was based on smart contracts deployed on the public Ethereum blockchain, the analyzed security issues were largely platform-agnostic. Security vulnerabilities in smart contracts present a serious issue for two main reasons. Firstly, smart-contract bugs cannot be patched. By design, once a contract is deployed, its functionality cannot be altered even by its creator. Secondly, once a faulty or malicious transaction is recorded, it cannot be removed from the blockchain (“code is law” principle). 
The only way to roll back a transaction is by performing a hard fork of the blockchain, which requires consensus among the stakeholders and undermines the trustworthiness of the platform. In light of this, it is crucial to ensure that a smart contract is secure before deploying it and trusting it with significant amounts of cryptocurrency. To this end, we present the VeriSolid framework for the formal verification and generation of contracts that are specified using a transition-system based model with rigorous operational semantics. VeriSolid provides an end-to-end design framework, which, combined with a Solidity code generator, allows the correct-by-design development of Ethereum smart contracts. To the best of our knowledge, VeriSolid is the first framework to promote a model-based, correctness-by-design approach for blockchain-based smart contracts. Properties established at any step of the VeriSolid design flow are preserved in the resulting smart contracts, guaranteeing their correctness. VeriSolid fully automates the process of verification and code generation, while enhancing usability by providing easy-to-use graphical editors for the specification of transition systems and natural-like language templates for the specification of formal properties. By performing verification early at design time, VeriSolid provides a cost-effective approach since fixing bugs later in the development process can be very expensive. Our verification approach can detect typical vulnerabilities, but it may also detect any violation of required properties. Since our tool applies verification at a high level, it can provide meaningful feedback to the developer when a property is not satisfied, which would be much harder to do at the bytecode level. We present the application of VeriSolid on smart contracts used in Smart Energy Systems such as transactive energy platforms. 
In particular, we used VeriSolid to design and generate the smart contract that serves as the core of the TRANSAX blockchain-based platform for trading energy futures. The designed smart contract allows energy producers and consumers to post offers for selling and buying energy. Since optimally matching selling offers with buying offers can be very expensive computationally, the contract relies on external solvers to compute and submit solutions to the matching problem, which are then checked by the contract. Using VeriSolid, we defined a set of safety properties and we were able to detect bugs after performing analysis with the NuSMV model checker.
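The division of labor described above (external solvers compute a matching, the contract only checks it) relies on verification being much cheaper than optimization. The sketch below checks feasibility of a submitted matching under a hypothetical data model: integer quantities and prices (as is typical for on-chain code), matching delivery intervals, buyer price at least the seller price, and no offer over-allocated. It is not the TRANSAX contract and does not compare solution quality.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Offer:
    kind: str      # "sell" or "buy"
    energy: int    # offered quantity (hypothetical unit, e.g. Wh)
    price: int     # sellers: minimum price; buyers: maximum price
    interval: int  # delivery time slot

def is_valid_matching(offers, trades):
    """Check a solver-submitted matching. `offers` maps id -> Offer and
    `trades` is a list of (sell_id, buy_id, energy) tuples."""
    used = defaultdict(int)
    for sell_id, buy_id, energy in trades:
        s, b = offers.get(sell_id), offers.get(buy_id)
        if s is None or b is None or s.kind != "sell" or b.kind != "buy":
            return False
        if energy <= 0 or s.interval != b.interval or b.price < s.price:
            return False
        used[sell_id] += energy
        used[buy_id] += energy
    # No offer may be matched for more energy than it offered.
    return all(used[k] <= offers[k].energy for k in used)
```

The check runs in time linear in the number of trades, whereas finding an optimal matching is the expensive step the contract delegates to off-chain solvers.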
A. Dubey, W. Emfinger, A. Gokhale, P. Kumar, D. McDermet, T. Bapty, and G. Karsai, Enabling Strong Isolation for Distributed Real-Time Applications in Edge Computing Scenarios, IEEE Aerospace and Electronic Systems Magazine, vol. 34, no. 7, pp. 32–45, Jul. 2019.
```
@article{Dubey2019c,
  author = {Dubey, Abhishek and {Emfinger}, W. and {Gokhale}, A. and {Kumar}, P. and {McDermet}, D. and {Bapty}, T. and {Karsai}, G.},
  journal = {IEEE Aerospace and Electronic Systems Magazine},
  title = {Enabling Strong Isolation for Distributed Real-Time Applications in Edge Computing Scenarios},
  year = {2019},
  issn = {1557-959X},
  month = jul,
  number = {7},
  pages = {32-45},
  volume = {34},
  contribution = {lead},
  doi = {10.1109/MAES.2019.2905921},
  file = {:Dubey2019c-Enabling_Strong_Isolation_for_Distributed_Real-Time_Applications_in_Edge_Computing_Scenarios.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware,cps-reliability},
  tag = {platform}
}
```
Distributed coexisting applications found in the military and space domains, which operate over managed but shared computing resources at the edge, require strong isolation from each other. The state of the art for computation sharing at the edge is traditionally based on Docker and similar pseudovirtualization features. Our team has been working on an end-to-end architecture that provides strong spatial and temporal isolation similar to what has become standard in avionics communities. In this paper, we describe an open-source extension to Linux that we have designed and implemented for our distributed real-time embedded managed systems (DREMS) architecture. The key concepts are the partitioning scheduler, strong security design, and a health management interface.
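Temporal partitioning in the avionics style assigns each partition fixed, non-overlapping windows inside a repeating major frame, so one partition can never consume another's CPU time. The sketch below only emulates that idea in user space with hypothetical partition names and window sizes; the DREMS partitioning scheduler itself is implemented as a Linux kernel extension.

```python
from typing import List, Optional, Tuple

Window = Tuple[str, int, int]  # (partition name, offset in ms, duration in ms)

def validate(windows: List[Window], major_frame: int) -> None:
    """Reject schedules whose windows overlap or spill past the major frame."""
    end = 0
    for name, offset, duration in sorted(windows, key=lambda w: w[1]):
        if duration <= 0 or offset < end or offset + duration > major_frame:
            raise ValueError(f"bad window for partition {name!r}")
        end = offset + duration

def active_partition(windows: List[Window], major_frame: int,
                     t: int) -> Optional[str]:
    """Partition that owns the CPU at absolute time t (ms); None means idle."""
    phase = t % major_frame
    for name, offset, duration in windows:
        if offset <= phase < offset + duration:
            return name
    return None
```

With windows `[("nav", 0, 40), ("comms", 40, 30), ("health", 80, 10)]` in a 100 ms major frame, time 185 ms falls in the `health` window and 75 ms is idle slack. Validating the table once, before it is activated, is what makes the timing guarantee static.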

M. Wilbur, A. Dubey, B. Leão, and S. Bhattacharjee, A Decentralized Approach for Real Time Anomaly Detection in Transportation Networks, in IEEE International Conference on Smart Computing, SMARTCOMP 2019, Washington, DC, USA, 2019, pp. 274–282.
```
@inproceedings{Wilbur2019,
  author = {Wilbur, Michael and Dubey, Abhishek and Le{\~{a}}o, Bruno and Bhattacharjee, Shameek},
  booktitle = {{IEEE} International Conference on Smart Computing, {SMARTCOMP} 2019, Washington, DC, USA},
  title = {A Decentralized Approach for Real Time Anomaly Detection in Transportation Networks},
  year = {2019},
  month = jun,
  acceptance = {29},
  pages = {274--282},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/smartcomp/WilburDLB19},
  category = {selectiveconference},
  contribution = {lead},
  doi = {10.1109/SMARTCOMP.2019.00063},
  file = {:Wilbur2019-A_Decentralized_Approach_for_Real_Time_Anomaly_Detection_in_Transportation_Networks.pdf:PDF},
  keywords = {transit, reliability},
  project = {cps-reliability,smart-transit,smart-cities},
  tag = {ai4cps,platform,decentralization,incident,transit},
  timestamp = {Wed, 16 Oct 2019 14:14:54 +0200},
  url = {https://doi.org/10.1109/SMARTCOMP.2019.00063}
}
```
A. Dubey, G. Karsai, P. Völgyesi, M. Metelko, I. Madari, H. Tu, Y. Du, and S. Lukic, Device Access Abstractions for Resilient Information Architecture Platform for Smart Grid, Embedded Systems Letters, vol. 11, no. 2, pp. 34–37, 2019.
```
@article{Dubey2019,
  author = {Dubey, Abhishek and Karsai, Gabor and V{\"{o}}lgyesi, P{\'{e}}ter and Metelko, Mary and Madari, Istv{\'{a}}n and Tu, Hao and Du, Yuhua and Lukic, Srdjan},
  journal = {Embedded Systems Letters},
  title = {Device Access Abstractions for Resilient Information Architecture Platform for Smart Grid},
  year = {2019},
  number = {2},
  pages = {34--37},
  volume = {11},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/esl/DubeyKVMMTDL19},
  contribution = {lead},
  doi = {10.1109/LES.2018.2845854},
  file = {:Dubey2019-Device_Access_Abstractions_for_Resilient_Information_Architecture_Platform_for_Smart_Grid.pdf:PDF},
  keywords = {middleware, smartgrid},
  project = {cps-middleware},
  tag = {platform,power},
  timestamp = {Fri, 05 Jul 2019 01:00:00 +0200},
  url = {https://doi.org/10.1109/LES.2018.2845854}
}
```
This letter presents an overview of design mechanisms to abstract device access protocols in the resilient information architecture platform for smart grid, a middleware for developing distributed smart grid applications. These mechanisms are required to decouple the application functionality from the specifics of the device mechanisms built by the device vendors.
P. Ghosh, S. Eisele, A. Dubey, M. Metelko, I. Madari, P. Völgyesi, and G. Karsai, On the Design of Fault-Tolerance in a Decentralized Software Platform for Power Systems, in IEEE 22nd International Symposium on Real-Time Distributed Computing, ISORC 2019, Valencia, Spain, 2019, pp. 52–60.
```
@inproceedings{Ghosh2019,
  author = {Ghosh, Purboday and Eisele, Scott and Dubey, Abhishek and Metelko, Mary and Madari, Istv{\'{a}}n and V{\"{o}}lgyesi, P{\'{e}}ter and Karsai, Gabor},
  booktitle = {{IEEE} 22nd International Symposium on Real-Time Distributed Computing, {ISORC} 2019, Valencia, Spain},
  title = {On the Design of Fault-Tolerance in a Decentralized Software Platform for Power Systems},
  year = {2019},
  pages = {52--60},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/isorc/GhoshEDMMVK19},
  category = {selectiveconference},
  contribution = {minor},
  doi = {10.1109/ISORC.2019.00018},
  file = {:Ghosh2019-On_the_Design_of_Fault-Tolerance_in_a_Decentralized_Software_Platform_for_Power_Systems.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware,cps-reliability},
  tag = {platform,decentralization,power},
  timestamp = {Wed, 16 Oct 2019 14:14:53 +0200},
  url = {https://doi.org/10.1109/ISORC.2019.00018}
}
```
The vision of the ‘Smart Grid’ assumes a distributed real-time embedded system that implements various monitoring and control functions. As the reliability of the power grid is critical to modern society, the software supporting the grid must provide fault tolerance and resilience in the resulting cyber-physical system. This paper describes the fault-tolerance features of a software framework called Resilient Information Architecture Platform for Smart Grid (RIAPS). The framework supports various mechanisms for fault detection and mitigation and works in concert with the applications that implement the grid-specific functions. The paper discusses the design philosophy for and the implementation of the fault-tolerance features and presents an application example to show how it can be used to build highly resilient systems.
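A common building block for fault detection in distributed platforms is a timeout-based failure detector: each peer sends periodic heartbeats, and a peer is suspected once no heartbeat has arrived within a timeout. The abstract does not specify RIAPS's detection mechanisms, so this is a generic, hypothetical sketch. Time is passed in explicitly so the logic is deterministic and testable.

```python
class HeartbeatMonitor:
    """Toy timeout-based failure detector (hypothetical, not the RIAPS API).
    A peer is suspected when no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_seen = {}  # peer name -> time of last heartbeat

    def heartbeat(self, peer: str, now: float) -> None:
        self.last_seen[peer] = now

    def suspected(self, now: float) -> set:
        return {p for p, t in self.last_seen.items() if now - t > self.timeout}
```

The timeout trades detection latency against false suspicions under network jitter; a mitigation layer would then react to the suspected set, for example by restarting or relocating the affected component.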
T. Krentz, A. Dubey, and G. Karsai, Short Paper: Towards An Edge-Located Time-Series Database, in IEEE 22nd International Symposium on Real-Time Distributed Computing, ISORC 2019, Valencia, Spain, May 7-9, 2019, 2019, pp. 151–154.
```
@inproceedings{Krentz2019,
  author = {Krentz, Timothy and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {{IEEE} 22nd International Symposium on Real-Time Distributed Computing, {ISORC} 2019, Valencia, Spain, May 7-9, 2019},
  title = {Short Paper: Towards An Edge-Located Time-Series Database},
  year = {2019},
  pages = {151--154},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/isorc/KrentzDK19},
  category = {selectiveconference},
  contribution = {minor},
  doi = {10.1109/ISORC.2019.00037},
  file = {:Krentz2019-Towards_An_Edge-Located_Time-Series_Database.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:53 +0200},
  url = {https://doi.org/10.1109/ISORC.2019.00037}
}
```
Smart infrastructure demands resilient data storage, and emerging applications execute queries on this data over time. Typically, time-series databases serve these queries; however, cloud-based time-series storage can be prohibitively expensive. As smart devices proliferate, the amount of computing power and memory available in our connected infrastructure provides the opportunity to move resilient time-series data storage and analytics to the edge. This paper proposes time-series storage in a Distributed Hash Table (DHT), and a novel key-generation technique that provides time-indexed reads and writes for key-value pairs. Experimental results show this technique meets demands for smart infrastructure situations.
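One way to get time-indexed reads and writes out of a plain key-value DHT is to fold a time bucket into the key, so a range query becomes a short list of point lookups spread over the nodes. The paper's key-generation technique is not described in the abstract; the fixed 60-second bucket, the `series:bucket_start` key format, and the SHA-1 consistent-hashing ring below are hypothetical choices for illustration only.

```python
import bisect
import hashlib

BUCKET_S = 60  # hypothetical bucket width in seconds

def bucket_key(series: str, ts: int) -> str:
    # All samples of one series in the same bucket share a key.
    start = ts - ts % BUCKET_S
    return f"{series}:{start}"

def keys_for_range(series: str, t0: int, t1: int):
    """Keys of every bucket overlapping the closed interval [t0, t1]."""
    start = t0 - t0 % BUCKET_S
    return [f"{series}:{b}" for b in range(start, t1 + 1, BUCKET_S)]

class Ring:
    """Consistent-hashing ring mapping keys to edge nodes (toy DHT)."""

    def __init__(self, nodes):
        self._points = sorted((self._h(n), n) for n in nodes)

    @staticmethod
    def _h(s: str) -> int:
        return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First node clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._points, (self._h(key), ""))
        return self._points[i % len(self._points)][1]
```

A read of `v1` between t=30 s and t=150 s then touches only the buckets `v1:0`, `v1:60`, and `v1:120`, each resolved to a node with `Ring.node_for`. The bucket width trades the number of lookups per query against the size of each stored value.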
A. Mavridou, A. Laszka, E. Stachtiari, and A. Dubey, VeriSolid: Correct-by-Design Smart Contracts for Ethereum, in Financial Cryptography and Data Security - 23rd International Conference, FC 2019, Frigate Bay, St. Kitts and Nevis, Revised Selected Papers, 2019, pp. 446–465.
```
@inproceedings{Mavridou2019,
  author = {Mavridou, Anastasia and Laszka, Aron and Stachtiari, Emmanouela and Dubey, Abhishek},
  booktitle = {Financial Cryptography and Data Security - 23rd International Conference, {FC} 2019, Frigate Bay, St. Kitts and Nevis, Revised Selected Papers},
  title = {VeriSolid: Correct-by-Design Smart Contracts for Ethereum},
  year = {2019},
  pages = {446--465},
  acceptance = {21.9},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/fc/MavridouLSD19},
  category = {selectiveconference},
  contribution = {colab},
  doi = {10.1007/978-3-030-32101-7\_27},
  file = {:Mavridou2019-VeriSolid_Correct_by_Design_Smart_Contracts_for_Ethereum.pdf:PDF},
  keywords = {blockchain},
  project = {cps-blockchains},
  tag = {platform,decentralization},
  timestamp = {Mon, 14 Oct 2019 14:51:20 +0200},
  url = {https://doi.org/10.1007/978-3-030-32101-7\_27}
}
```
The adoption of blockchain based distributed ledgers is growing fast due to their ability to provide reliability, integrity, and auditability without trusted entities. One of the key capabilities of these emerging platforms is the ability to create self-enforcing smart contracts. However, the development of smart contracts has proven to be error-prone in practice, and as a result, contracts deployed on public platforms are often riddled with security vulnerabilities. This issue is exacerbated by the design of these platforms, which forbids updating contract code and rolling back malicious transactions. In light of this, it is crucial to ensure that a smart contract is secure before deploying it and trusting it with significant amounts of cryptocurrency. To this end, we introduce the VeriSolid framework for the formal verification of contracts that are specified using a transition-system based model with rigorous operational semantics. Our model-based approach allows developers to reason about and verify contract behavior at a high level of abstraction. VeriSolid allows the generation of Solidity code from the verified models, which enables the correct-by-design development of smart contracts.
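The value of specifying a contract as a transition system is that its reachable behaviors can be explored exhaustively before any code is deployed. The sketch below is a toy breadth-first reachability check over a hypothetical escrow contract, looking for a trace that pays out more than once; it is not VeriSolid, which uses its own operational semantics, the NuSMV model checker, and Solidity code generation.

```python
from collections import deque

# Hypothetical escrow contract as a transition system:
# (source state, action, target state, pays out?)
TRANSITIONS = [
    ("Created", "fund", "Funded", False),
    ("Funded", "release", "Released", True),
    ("Funded", "refund", "Refunded", True),
]

def find_violation(transitions, initial="Created", max_payouts=1):
    """Breadth-first search over (state, payouts) pairs. Returns a shortest
    action trace that pays out more than `max_payouts` times, or None."""
    start = (initial, 0)
    queue, parent = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        state, payouts = node
        if payouts > max_payouts:
            trace = []
            while parent[node] is not None:  # walk back to the start
                node, action = parent[node]
                trace.append(action)
            return list(reversed(trace))
        for src, action, dst, pays in transitions:
            if src == state:
                nxt = (dst, payouts + (1 if pays else 0))
                if nxt not in parent:
                    parent[nxt] = (node, action)
                    queue.append(nxt)
    return None
```

The correct model has no violating trace, while adding a buggy `("Released", "refund", "Refunded", True)` transition yields the counterexample `["fund", "release", "refund"]`. Feedback at this level of abstraction is far easier to act on than an analysis of deployed bytecode.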
G. Pettet, S. Sahoo, and A. Dubey, Towards an Adaptive Multi-Modal Traffic Analytics Framework at the Edge, in IEEE International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2019, Kyoto, Japan, March 11-15, 2019, 2019, pp. 511–516.
```
@inproceedings{Pettet2019a,
  author = {Pettet, Geoffrey and Sahoo, Saroj and Dubey, Abhishek},
  booktitle = {{IEEE} International Conference on Pervasive Computing and Communications Workshops, PerCom Workshops 2019, Kyoto, Japan, March 11-15, 2019},
  title = {Towards an Adaptive Multi-Modal Traffic Analytics Framework at the Edge},
  year = {2019},
  pages = {511--516},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/percom/PettetSD19},
  category = {workshop},
  contribution = {lead},
  doi = {10.1109/PERCOMW.2019.8730577},
  file = {:Pettet2019a-Towards_an_Adaptive_Multi-Modal_Traffic_Analytics_Framework_at_the_Edge.pdf:PDF},
  keywords = {middleware, transit},
  project = {cps-middleware,smart-transit,smart-cities},
  tag = {platform,incident,transit},
  timestamp = {Wed, 16 Oct 2019 14:14:54 +0200},
  url = {https://doi.org/10.1109/PERCOMW.2019.8730577}
}
```
The Internet of Things (IoT) requires distributed, large-scale data collection via geographically distributed devices. While IoT devices typically send data to the cloud for processing, this is problematic for bandwidth-constrained applications. Fog and edge computing (processing data near where it is gathered, and sending only results to the cloud) has become more popular, as it lowers network overhead and latency. Edge computing often uses devices with low computational capacity; therefore, service frameworks and middleware are needed to efficiently compose services. While many frameworks use a top-down perspective, quality of service is an emergent property of the entire system and often requires a bottom-up approach. We define services as multi-modal, allowing resource and performance tradeoffs. Different modes can be composed to meet an application’s high-level goal, which is modeled as a function. We examine a case study for counting vehicle traffic through intersections in Nashville. We apply object detection and tracking to video of the intersection, which must be performed at the edge due to privacy and bandwidth constraints. We explore the hardware and software architectures, and identify the various modes. This paper lays the foundation to formulate the online optimization problem presented by the system, which makes tradeoffs between the quantity of services and their quality, constrained by available resources.
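The mode-composition problem can be stated compactly: choose one mode per service so that total quality is maximized without exceeding the device's resource budget. The sketch below solves a tiny instance by exhaustive search; the service names, modes, quality scores, and CPU percentages are hypothetical, and the paper's own online optimization formulation is not reproduced here.

```python
from itertools import product

# Hypothetical modes per service: (mode name, quality score, CPU percent)
SERVICES = {
    "detect": [("off", 0, 0), ("tiny-model", 5, 30), ("full-model", 9, 70)],
    "track": [("off", 0, 0), ("every-5th-frame", 3, 10), ("every-frame", 6, 40)],
}

def best_configuration(services, cpu_budget):
    """Pick one mode per service maximizing total quality within the CPU
    budget, by exhaustive search (fine for a handful of services)."""
    names = list(services)
    best = None
    for combo in product(*(services[n] for n in names)):
        cpu = sum(m[2] for m in combo)
        quality = sum(m[1] for m in combo)
        if cpu <= cpu_budget and (best is None or quality > best[0]):
            best = (quality, {n: m[0] for n, m in zip(names, combo)})
    return best
```

With an 80% budget the best configuration pairs the full detection model with sparse tracking (quality 12); at 50% it drops to the tiny model with sparse tracking (quality 8). Exhaustive search grows exponentially with the number of services, so a larger deployment would need an integer-programming or heuristic solver.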
S. Shekhar, A. Chhokra, H. Sun, A. Gokhale, A. Dubey, and X. D. Koutsoukos, Supporting fog/edge-based cognitive assistance IoT services for the visually impaired: poster abstract, in Proceedings of the International Conference on Internet of Things Design and Implementation, IoTDI 2019, Montreal, QC, Canada, 2019, pp. 275–276.
```
@inproceedings{Shekhar2019,
  author = {Shekhar, Shashank and Chhokra, Ajay and Sun, Hongyang and Gokhale, Aniruddha and Dubey, Abhishek and Koutsoukos, Xenofon D.},
  booktitle = {Proceedings of the International Conference on Internet of Things Design and Implementation, IoTDI 2019, Montreal, QC, Canada},
  title = {Supporting fog/edge-based cognitive assistance IoT services for the visually impaired: poster abstract},
  year = {2019},
  pages = {275--276},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/iotdi/ShekharCSGDK19},
  category = {poster},
  contribution = {minor},
  doi = {10.1145/3302505.3312592},
  file = {:Shekhar2019-Supporting_fog_edge-based_cognitive_assistance_IoT_services_for_the_visually_impaired_poster_abstract.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware,smart-cities},
  tag = {platform,transit},
  timestamp = {Fri, 29 Mar 2019 00:00:00 +0100},
  url = {https://doi.org/10.1145/3302505.3312592}
}
```
The fog/edge computing paradigm is increasingly being adopted to support a variety of latency-sensitive IoT services, such as cognitive assistance for the visually impaired, due to its ability to assure the latency requirements of these services while continuing to benefit from the elastic properties of cloud computing. However, user mobility in such applications imposes a new set of challenges that must be addressed before such applications can be deployed and benefit society. This paper presents ongoing work on a dynamic resource management middleware called URMILA that addresses these concerns. URMILA ensures that the service remains available despite user mobility and the ensuing wireless connectivity issues by opportunistically leveraging both fog and edge resources so that the latency requirements of the service are met while extending the battery life of the edge devices. We present the design principles of URMILA’s capabilities and a real-world cognitive assistance application that we have built and are testing on an emulated but realistic IoT testbed.
J. P. Talusan, F. Tiausas, K. Yasumoto, M. Wilbur, G. Pettet, A. Dubey, and S. Bhattacharjee, Smart Transportation Delay and Resiliency Testbed Based on Information Flow of Things Middleware, in IEEE International Conference on Smart Computing, SMARTCOMP 2019, Washington, DC, USA, June 12-15, 2019, 2019, pp. 13–18.
```
@inproceedings{Talusan2019,
  author = {Talusan, Jose Paolo and Tiausas, Francis and Yasumoto, Keiichi and Wilbur, Michael and Pettet, Geoffrey and Dubey, Abhishek and Bhattacharjee, Shameek},
  booktitle = {{IEEE} International Conference on Smart Computing, {SMARTCOMP} 2019, Washington, DC, USA, June 12-15, 2019},
  title = {Smart Transportation Delay and Resiliency Testbed Based on Information Flow of Things Middleware},
  year = {2019},
  pages = {13--18},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/smartcomp/TalusanTYWPDB19},
  category = {workshop},
  contribution = {colab},
  acceptance = {29},
  doi = {10.1109/SMARTCOMP.2019.00022},
  file = {:Talusan2019-Smart_Transportation_Delay_and_Resiliency_Testbed_Based_on_Information_Flow_of_Things_Middleware.pdf:PDF},
  keywords = {middleware, transit},
  project = {cps-middleware,smart-transit},
  tag = {platform,incident,transit},
  timestamp = {Wed, 16 Oct 2019 14:14:54 +0200},
  url = {https://doi.org/10.1109/SMARTCOMP.2019.00022}
}
```
Edge and fog computing paradigms are used to process the big data generated by the increasing number of IoT devices. These paradigms have enabled cities to become smarter in various aspects via real-time data-driven applications. While these paradigms address some flaws of cloud computing, challenges remain, particularly in terms of privacy and security. We create a testbed based on a distributed processing platform called the Information Flow of Things (IFoT) middleware. We briefly describe a decentralized traffic speed query and routing service implemented on this testbed. We configure the testbed to test countermeasure systems that aim to address the security challenges faced by prior paradigms. Using this testbed, we investigate a novel decentralized anomaly detection approach for time-sensitive distributed smart transportation systems.
Y. Zhang, S. Eisele, A. Dubey, A. Laszka, and A. K. Srivastava, Cyber-Physical Simulation Platform for Security Assessment of Transactive Energy Systems, in 7th Workshop on Modeling and Simulation of Cyber-Physical Energy Systems, MSCPES@CPSIoTWeek 2019, Montreal, QC, Canada, 2019, pp. 1–6.
```
@inproceedings{Zhang2019a,
  author = {Zhang, Yue and Eisele, Scott and Dubey, Abhishek and Laszka, Aron and Srivastava, Anurag K.},
  booktitle = {7th Workshop on Modeling and Simulation of Cyber-Physical Energy Systems, MSCPES@CPSIoTWeek 2019, Montreal, QC, Canada},
  title = {Cyber-Physical Simulation Platform for Security Assessment of Transactive Energy Systems},
  year = {2019},
  pages = {1--6},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/cpsweek/ZhangEDLS19},
  category = {workshop},
  contribution = {colab},
  doi = {10.1109/MSCPES.2019.8738802},
  file = {:Zhang2019a-Cyber_Physical_Simulation_Platform_for_Security_Assessment_of_Transactive_Energy_Systems.pdf:PDF},
  keywords = {transactive},
  project = {transactive-energy,cps-reliability},
  tag = {platform,decentralization,power},
  timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
  url = {https://doi.org/10.1109/MSCPES.2019.8738802}
}
```
Transactive energy systems (TES) are emerging as a transformative solution to the problems that distribution system operators face due to the increasing use of distributed energy resources and the rapidly growing scale of managing active distribution systems (ADS). On the one hand, these changes pose a decentralized power system control problem, requiring strategic control to maintain reliability and resiliency for the community and for the utility. On the other hand, they require robust financial markets that allow participation from diverse prosumers. To support the computing and flexibility requirements of TES while preserving privacy and security, distributed software platforms are required. In this paper, we enable the study and analysis of security concerns by developing the Transactive Energy Security Simulation Testbed (TESST), a TES testbed for simulating various cyber attacks. In this work, the testbed is used for TES simulation with a centralized clearing market, highlighting weaknesses in a centralized system. Additionally, we present a blockchain-enabled decentralized market solution supported by distributed computing for TES, which on the one hand can alleviate some of the problems that we identify, but on the other hand may introduce new issues. Future study of these differing paradigms is necessary and will continue as we develop our security simulation testbed.
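For readers unfamiliar with centralized clearing, the sketch below shows one common mechanism, a uniform-price double auction. The abstract does not specify TESST's market rules, so the `Offer` type, the midpoint pricing convention, and the units are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    participant: str
    price: float     # price per kWh (illustrative units)
    quantity: float  # energy in kWh


def clear_market(bids, asks):
    """Uniform-price double auction: match the highest-priced bids with the
    lowest-priced asks while the bid price covers the ask price.
    Returns (clearing_price, trades), where trades are (buyer, seller, qty)."""
    bids = sorted(bids, key=lambda o: o.price, reverse=True)
    asks = sorted(asks, key=lambda o: o.price)
    bid_left = [b.quantity for b in bids]
    ask_left = [a.quantity for a in asks]
    trades = []
    i = j = 0
    marginal = None  # (bid price, ask price) of the last matched pair
    while i < len(bids) and j < len(asks) and bids[i].price >= asks[j].price:
        qty = min(bid_left[i], ask_left[j])
        trades.append((bids[i].participant, asks[j].participant, qty))
        marginal = (bids[i].price, asks[j].price)
        bid_left[i] -= qty
        ask_left[j] -= qty
        if bid_left[i] == 0:
            i += 1
        if ask_left[j] == 0:
            j += 1
    if marginal is None:
        return None, []
    # One common convention: everyone trades at the midpoint of the
    # marginal (last matched) bid and ask.
    return (marginal[0] + marginal[1]) / 2, trades
```

Because a single clearing function sees every bid and ask, it is a natural single point of attack, which is the kind of centralized weakness a security testbed can probe.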
H. Tu, Y. Du, H. Yu, S. Lukic, M. Metelko, P. Volgyesi, A. Dubey, and G. Karsai, A Hardware-in-the-Loop Real-Time Testbed for Microgrid Hierarchical Control, in 2018 IEEE Energy Conversion Congress and Exposition (ECCE), 2018, pp. 2053–2059.
```
@inproceedings{Tu2018,
  author = {{Tu}, H. and {Du}, Y. and {Yu}, H. and {Lukic}, S. and {Metelko}, M. and {Volgyesi}, P. and Dubey, Abhishek and {Karsai}, G.},
  booktitle = {2018 IEEE Energy Conversion Congress and Exposition (ECCE)},
  title = {A Hardware-in-the-Loop Real-Time Testbed for Microgrid Hierarchical Control},
  year = {2018},
  month = sep,
  pages = {2053-2059},
  category = {conference},
  contribution = {minor},
  doi = {10.1109/ECCE.2018.8557737},
  file = {:Tu2018-A_Hardware-in-the-Loop_Real-Time_Testbed_for_Microgrid_Hierarchical_Control.pdf:PDF},
  issn = {2329-3721},
  keywords = {smartgrid},
  project = {cps-middleware,smart-energy},
  tag = {platform,power}
}
```
To maintain stable, flexible, and economic operation of a microgrid, a hierarchical control architecture consisting of primary, secondary, and tertiary control has been proposed. However, the differences in the dynamics of the microgrid, the bandwidths of the control levels, and the speeds of the communication channels make it difficult to comprehensively validate the performance of hierarchical control schemes. In this paper, we propose a hardware-in-the-loop real-time testbed for microgrid hierarchical control. The proposed testbed can be used to validate control performance under different microgrid operating modes (grid-tied or islanded), different primary control schemes (current or voltage mode), and different secondary control approaches (centralized or distributed). The integration of industry-grade hardware that runs primary and secondary control into the testbed allows complete emulation of microgrid operation and facilitates the study of the effects of measurement noise, sampling, and communication delays.
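To make the primary/secondary split concrete, here is a quasi-static sketch of frequency droop (primary) plus a centralized integral restoration loop (secondary). It is not the testbed's control software: it ignores converter dynamics, communication delays, and tertiary control, and the droop coefficients, load, and gain are hypothetical values.

```python
def droop_with_secondary(droop, load, f_nominal=60.0, gain=0.5, steps=50):
    """Quasi-static sketch of two control levels.

    Primary: each DG follows a P-f droop curve P_i = (f_set - f) / m_i.
    Secondary: a central integrator shifts the common set point f_set until
    the frequency error is removed. Dynamics and tertiary control are ignored.
    droop: droop coefficients m_i (Hz per kW); load: total demand (kW).
    Returns (steady-state frequency, list of DG outputs).
    """
    inv_sum = sum(1.0 / m for m in droop)
    shift = 0.0  # secondary-control correction added to every droop set point
    for _ in range(steps):
        # Power balance sum_i (f_nominal + shift - f) / m_i = load, solved for f.
        freq = f_nominal + shift - load / inv_sum
        # Secondary control integrates the frequency error.
        shift += gain * (f_nominal - freq)
    freq = f_nominal + shift - load / inv_sum
    powers = [(f_nominal + shift - freq) / m for m in droop]
    return freq, powers
```

With droop coefficients of 0.01 and 0.02 Hz/kW and a 30 kW load, primary control alone (`steps=0`) settles at 59.8 Hz; the secondary loop restores 60 Hz while keeping the 2:1 power split set by the droop slopes.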
S. Nannapaneni, S. Mahadevan, and A. Dubey, Real-Time Control of Cyber-Physical Manufacturing Process Under Uncertainty, in Proceedings of ASME 2018 13th International Manufacturing Science and Engineering Conference, 2018, vol. Volume 3: Manufacturing Equipment and Systems.
```
@inproceedings{Nannapaneni2018,
  author = {Nannapaneni, Saideep and Mahadevan, Sankaran and Dubey, Abhishek},
  booktitle = {Proceedings of ASME 2018 13th International Manufacturing Science and Engineering Conference},
  title = {Real-Time Control of Cyber-Physical Manufacturing Process Under Uncertainty},
  year = {2018},
  month = jun,
  note = {V003T02A001},
  series = {International Manufacturing Science and Engineering Conference},
  volume = {Volume 3: Manufacturing Equipment and Systems},
  category = {conference},
  contribution = {minor},
  doi = {10.1115/MSEC2018-6460},
  eprint = {https://asmedigitalcollection.asme.org/MSEC/proceedings-pdf/MSEC2018/51371/V003T02A001/2520174/v003t02a001-msec2018-6460.pdf},
  keywords = {reliability},
  project = {cps-reliability},
  tag = {platform},
  url = {https://doi.org/10.1115/MSEC2018-6460}
}
```
Modern manufacturing processes are increasingly becoming cyber-physical in nature: a computational system monitors system performance and provides real-time process control by analyzing sensor data on process and product characteristics, in order to increase the quality of the manufactured product. Such real-time process monitoring and control techniques are useful in precision and ultra-precision machining processes. However, the output product quality is affected by several uncertainty sources in various stages of the manufacturing process, such as sensor uncertainty, computational system uncertainty, control input uncertainty, and variability in the manufacturing process. The computational system may be a single computing node or a distributed computing network; the latter introduces additional uncertainty due to communication between computing nodes. Because monitoring is continuous, these uncertainty sources aggregate and compound over time, resulting in variations in product quality. Therefore, characterizing the various uncertainty sources and their impact on product quality is necessary to increase the efficiency and productivity of the overall manufacturing process. To this end, this paper develops a two-level dynamic Bayesian network methodology, where the higher level captures the uncertainty in the sensors, control inputs, and manufacturing process, while the lower level captures the uncertainty in the communication between computing nodes. In addition, we illustrate the use of a variance-based global sensitivity analysis approach for dimension reduction in a high-dimensional manufacturing process, in order to enable real-time analysis for process control. The proposed methodologies of process control under uncertainty and dimension reduction are illustrated for a cyber-physical turning process.
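Variance-based global sensitivity analysis commonly ranks inputs by first-order Sobol indices; inputs with small indices are candidates to fix for dimension reduction. The sketch below uses a standard Monte Carlo estimator (Saltelli et al., 2010) in plain Python. It is not the paper's implementation, and the independent U(0, 1) inputs and the toy model in the usage note are assumptions.

```python
import random


def first_order_sobol(model, dim, n=100_000, seed=0):
    """Monte Carlo estimate of first-order Sobol indices S_i = V_i / V.

    Inputs are assumed independent and uniform on [0, 1). V_i uses the
    estimator mean(f(B) * (f(A_B^i) - f(A))) of Saltelli et al. (2010),
    where A_B^i is sample matrix A with column i taken from matrix B.
    """
    rng = random.Random(seed)
    a = [[rng.random() for _ in range(dim)] for _ in range(n)]
    b = [[rng.random() for _ in range(dim)] for _ in range(n)]
    f_a = [model(x) for x in a]
    f_b = [model(x) for x in b]
    # Total output variance estimated from both independent samples.
    pooled = f_a + f_b
    mean = sum(pooled) / len(pooled)
    total_var = sum((y - mean) ** 2 for y in pooled) / len(pooled)
    indices = []
    for i in range(dim):
        # Rows of A with column i swapped in from B.
        f_abi = [model(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        v_i = sum(yb * (yab - ya) for yb, yab, ya in zip(f_b, f_abi, f_a)) / n
        indices.append(v_i / total_var)
    return indices
```

For the toy model `x[0] + 0.5 * x[1]` the analytic indices are 0.8 and 0.2, and the estimates land close to those values; in a high-dimensional process model, inputs whose indices are near zero could be held at nominal values.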
S. Pradhan, A. Dubey, S. Khare, S. Nannapaneni, A. Gokhale, S. Mahadevan, D. C. Schmidt, and M. Lehofer, CHARIOT: Goal-Driven Orchestration Middleware for Resilient IoT Systems, ACM Trans. Cyber-Phys. Syst., vol. 2, no. 3, Jun. 2018.
```
@article{Pradhan2018,
  author = {Pradhan, Subhav and Dubey, Abhishek and Khare, Shweta and Nannapaneni, Saideep and Gokhale, Aniruddha and Mahadevan, Sankaran and Schmidt, Douglas C. and Lehofer, Martin},
  journal = {ACM Trans. Cyber-Phys. Syst.},
  title = {CHARIOT: Goal-Driven Orchestration Middleware for Resilient IoT Systems},
  year = {2018},
  issn = {2378-962X},
  month = jun,
  number = {3},
  volume = {2},
  address = {New York, NY, USA},
  articleno = {16},
  contribution = {lead},
  doi = {10.1145/3134844},
  issue_date = {July 2018},
  keywords = {resilience at the edge, orchestration middleware, cyber-physical systems, Autonomous management},
  numpages = {37},
  project = {cps-middleware,cps-reliability},
  publisher = {Association for Computing Machinery},
  tag = {ai4cps,platform},
  url = {https://doi.org/10.1145/3134844}
}
```
An emerging trend in Internet of Things (IoT) applications is to move the computation (cyber) closer to the source of the data (physical). This paradigm is often referred to as edge computing. If edge resources are pooled together, they can be used as decentralized shared resources for IoT applications, providing increased capacity to scale up computations and minimize end-to-end latency. Managing applications on these edge resources is hard, however, due to their remote, distributed, and (possibly) dynamic nature, which necessitates autonomous management mechanisms that facilitate application deployment, failure avoidance, failure management, and incremental updates. To address these needs, we present CHARIOT, which is orchestration middleware capable of autonomously managing IoT systems consisting of edge resources and applications. CHARIOT implements a three-layer architecture. The topmost layer comprises a system description language, the middle layer comprises a persistent data storage layer and the corresponding schema to store system information, and the bottom layer comprises a management engine that uses information stored persistently to formulate constraints that encode system properties and requirements, thereby enabling the use of satisfiability modulo theory solvers to compute optimal system (re)configurations dynamically at runtime. This article describes the structure and functionality of CHARIOT and evaluates its efficacy as the basis for a smart parking system case study that uses sensors to manage parking spaces.
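CHARIOT formulates deployment constraints for satisfiability modulo theory (SMT) solvers. As a rough illustration of the kind of constraint problem involved, the sketch below replaces the solver with brute-force search over component-to-node assignments under a single memory constraint, then re-solves after a node failure. The component names, capacities, and the single constraint type are hypothetical, and the search returns the first feasible assignment rather than an optimal one.

```python
from itertools import product


def find_deployment(components, nodes, failed=frozenset()):
    """Brute-force stand-in for a constraint-solver query.

    components: dict component -> memory demand (illustrative units)
    nodes: dict node -> memory capacity
    failed: nodes currently unavailable
    Returns the first assignment component -> live node that respects every
    node's capacity, or None if no such assignment exists.
    """
    live = [node for node in nodes if node not in failed]
    names = list(components)
    for choice in product(live, repeat=len(names)):
        load = dict.fromkeys(live, 0)
        for comp, node in zip(names, choice):
            load[node] += components[comp]
        if all(load[node] <= nodes[node] for node in live):
            return dict(zip(names, choice))
    return None
```

For three hypothetical parking-system components needing 2, 3, and 2 units on two nodes with 4 units each, a valid placement exists; marking `edge2` as failed leaves no feasible placement, a case a reconfiguration engine must handle with some fallback policy. The search space grows as (number of nodes) to the power of (number of components), which is why solver-based encodings matter.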
Y. Du, H. Tu, S. Lukic, D. Lubkeman, A. Dubey, and G. Karsai, Resilient Information Architecture Platform for Smart Systems (RIAPS): Case Study for Distributed Apparent Power Control, in 2018 IEEE/PES Transmission and Distribution Conference and Exposition (T&D), 2018, pp. 1–5.
```
@inproceedings{DuTu2018a,
  author = {{Du}, Y. and {Tu}, H. and {Lukic}, S. and {Lubkeman}, D. and Dubey, Abhishek and {Karsai}, G.},
  booktitle = {2018 IEEE/PES Transmission and Distribution Conference and Exposition (T{\&}D)},
  title = {Resilient Information Architecture Platform for Smart Systems (RIAPS): Case Study for Distributed Apparent Power Control},
  year = {2018},
  month = apr,
  pages = {1-5},
  category = {selectiveconference},
  contribution = {minor},
  doi = {10.1109/TDC.2018.8440324},
  file = {:DuTu2018a-Resilient_Information_Architecture_Platform_for_Smart_Systems_Case_Study_Distributed_Apparent_Power_Control.pdf:PDF},
  issn = {2160-8563},
  keywords = {middleware, smartgrid},
  tag = {platform}
}
```
Maintaining voltage and frequency stability in an islanded microgrid is challenging due to the low system inertia. In addition, islanded microgrids have limited generation capability, requiring that all distributed generators (DGs) contribute proportionally to meet the system power consumption. This paper proposes a distributed control algorithm for optimal apparent power utilization in islanded microgrids. The developed algorithm improves system apparent power utilization by maintaining proportional power sharing among DGs. A decentralized platform called the Resilient Information Architecture Platform for Smart Systems (RIAPS), which runs on processors embedded within the DGs, is introduced. The proposed algorithm is fully implemented on the RIAPS platform and validated on a real-time microgrid testbed.
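One generic way to reach proportional sharing with only neighbor-to-neighbor communication is a consensus iteration on each DG's utilization (output divided by rating). The sketch below is that textbook technique, not the paper's algorithm; ratings, outputs, and the communication graph are hypothetical, and the physical coupling through the electrical network is ignored.

```python
def proportional_sharing(ratings, outputs, neighbors, steps=200):
    """Consensus sketch for proportional apparent-power sharing.

    ratings: apparent-power rating of each DG; outputs: initial outputs;
    neighbors: neighbors[i] lists the DGs that DG i exchanges messages with
    (lists are assumed symmetric, the graph connected, and every DG to have
    at least one neighbor).
    Each update shifts output from the DG with higher utilization
    (output / rating) to its lower-utilization neighbor, so total output is
    preserved and all utilizations converge to sum(outputs) / sum(ratings).
    """
    s = list(outputs)
    # Step size keeps eps * degree_i / rating_i <= 0.5 for every DG, a
    # sufficient condition (Gershgorin bound) for convergence of this update.
    eps = 0.5 * min(r / len(neighbors[i]) for i, r in enumerate(ratings))
    for _ in range(steps):
        u = [s[i] / ratings[i] for i in range(len(s))]
        s = [s[i] + eps * sum(u[j] - u[i] for j in neighbors[i]) for i in range(len(s))]
    return s
```

For ratings of 10, 20, and 30 kVA on a line graph with a total output of 18 kVA, the outputs converge to 3, 6, and 9 kVA, that is, 30% utilization at every DG.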
A. Chhokra, A. Dubey, N. Mahadevan, G. Karsai, D. Balasubramanian, and S. Hasan, Hierarchical Reasoning about Faults in Cyber-Physical Energy Systems using Temporal Causal Diagrams, International Journal of Prognostics and Health Management, vol. 9, no. 1, Feb. 2018.
```
@article{Chhokra2018a,
  author = {Chhokra, Ajay and Dubey, Abhishek and Mahadevan, Nagabhushan and Karsai, Gabor and Balasubramanian, Daniel and Hasan, Saqib},
  journal = {International Journal of Prognostics and Health Management},
  title = {Hierarchical Reasoning about Faults in Cyber-Physical Energy Systems using Temporal Causal Diagrams},
  year = {2018},
  month = feb,
  number = {1},
  volume = {9},
  attachments = {https://www.isis.vanderbilt.edu/sites/default/files/ijphm_18_001_0.pdf},
  contribution = {colab},
  file = {:Chhokra2018a-Hierarchical_Reasoning_about_Faults_in_Cyber-Physical_Energy_Systems_using_Temporal_Causal_Diagrams.pdf:PDF},
  keywords = {reliability, smartgrid},
  tag = {platform,power},
  type = {Journal Article},
  url = {https://www.phmsociety.org/node/2290}
}
```
The resiliency and reliability of critical cyber-physical systems like electrical power grids are of paramount importance. These systems are often equipped with specialized protection devices to detect anomalies and isolate faults in order to arrest failure propagation and protect the healthy parts of the system. However, due to limited situational awareness and hidden failures, the protection devices themselves, through their operation (or mis-operation), may cause overloading and the disconnection of parts of an otherwise healthy system. This can result in cascading failures that lead to a blackout. Diagnosis of failures in such systems is extremely challenging because of the need to account for faults in both the physical system and the protection devices, as well as failure-effect propagation across the system. Our approach for diagnosing such cyber-physical systems is based on the concept of Temporal Causal Diagrams (TCDs) that capture the timed discrete models of protection devices and their interactions with a system failure propagation graph. In this paper, we present a refinement of the TCD language with a layer of independent local observers that aid in diagnosis. We describe a hierarchical two-tier failure diagnosis approach and showcase the results for 4 different scenarios involving both cyber and physical faults in a standard Western System Coordinating Council (WSCC) 9-bus system.
A. Chhokra, A. Dubey, N. Mahadevan, S. Hasan, and G. Karsai, Diagnosis in Cyber-Physical Systems with Fault Protection Assemblies, in Diagnosability, Security and Safety of Hybrid Dynamic and Cyber-Physical Systems, M. Sayed-Mouchaweh, Ed. Cham: Springer International Publishing, 2018, pp. 201–225.
```
@inbook{Chhokra2018,
  author = {Chhokra, Ajay and Dubey, Abhishek and Mahadevan, Nagabhushan and Hasan, Saqib and Karsai, Gabor},
  chapter = {Chapter 8},
  editor = {Sayed-Mouchaweh, Moamar},
  pages = {201--225},
  publisher = {Springer International Publishing},
  title = {Diagnosis in Cyber-Physical Systems with Fault Protection Assemblies},
  year = {2018},
  address = {Cham},
  isbn = {978-3-319-74962-4},
  booktitle = {Diagnosability, Security and Safety of Hybrid Dynamic and Cyber-Physical Systems},
  contribution = {colab},
  doi = {10.1007/978-3-319-74962-4_8},
  file = {:Chhokra2018-Diagnosis_In_Cyber-Physical_Systems_with_Fault_Protection_Assemblies.pdf:PDF},
  keywords = {reliability, smartgrid},
  tag = {platform,power},
  url = {https://doi.org/10.1007/978-3-319-74962-4_8}
}
```
Fault Protection Assemblies are used in cyber-physical systems for automated fault isolation. These devices alter the mode of the system using locally available information in order to stop fault propagation. For example, in electrical networks, relays and breakers isolate faults in order to arrest failure propagation and protect the healthy parts of the system. However, these assemblies can themselves have faults, which may inadvertently induce secondary failures. These secondary failures often lead to cascade effects, which then lead to total system collapse. This behavior is often seen in electrical transmission systems, where failures of relays and breakers may cause overloading and the disconnection of parts of an otherwise healthy system. Previously, we developed a consistency-based diagnosis approach for physical systems based on the timed failure propagation graph. We now describe an extension that uses the concept of timed discrete event observers in combination with timed failure propagation graphs to extend the hypothesis space to include the possibility of failures in the fault protection units. Using a simulated power system case study, we show that the combined approach is able to diagnose faults in both the plant and the protection devices.
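The timing part of consistency-based diagnosis on a timed failure propagation graph can be sketched as follows: a failure-mode hypothesis survives if some onset time explains every observed alarm within the propagation delay bounds. This is a simplified, relaxed check under stated assumptions (OR semantics at every node, no mode-dependent edges, no modeling of protection devices or of expected-but-missing alarms, and alarms checked independently of shared paths); the graph, failure-mode, and alarm names are hypothetical.

```python
import heapq


def _shortest_times(graph, source, weight):
    """Dijkstra over propagation edges (dst, t_min, t_max); weight selects
    index 1 (t_min) or 2 (t_max). Returns node -> shortest summed delay."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for edge in graph.get(node, []):
            nd = d + edge[weight]
            if nd < dist.get(edge[0], float("inf")):
                dist[edge[0]] = nd
                heapq.heappush(heap, (nd, edge[0]))
    return dist


def consistent_failure_modes(graph, failure_modes, alarms):
    """Keep the failure modes that can explain every observed alarm
    (alarm -> time) for some unknown onset time t0. With OR semantics a node
    activates on the first arriving path, so its delay from the failure mode
    lies in [shortest t_min path, shortest t_max path]."""
    if not alarms:
        return list(failure_modes)
    result = []
    for fm in failure_modes:
        lo = _shortest_times(graph, fm, 1)
        hi = _shortest_times(graph, fm, 2)
        if any(alarm not in lo for alarm in alarms):
            continue  # some alarm is unreachable from this failure mode
        earliest_t0 = max(t - hi[alarm] for alarm, t in alarms.items())
        latest_t0 = min(t - lo[alarm] for alarm, t in alarms.items())
        if earliest_t0 <= latest_t0:
            result.append(fm)
    return result
```

In a toy graph where `FM1` reaches alarm `A1` in 1 to 2 time units and `A2` one to three units later, while `FM2` reaches only `A2` (in 5 to 6 units), observing `A1` at t = 10 and `A2` at t = 12 leaves `FM1` as the only consistent hypothesis.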
A. Laszka, A. Mavridou, and A. Dubey, Resilient and Trustworthy Transactive Platform for Smart and Connected Communities, in High Confidence Software and Systems Conference, 2018.
```
@inproceedings{DubeyHCSS2018,
  author = {Laszka, Aron and Mavridou, Anastasia and Dubey, Abhishek},
  booktitle = {High Confidence Software and Systems Conference},
  title = {Resilient and Trustworthy Transactive Platform for Smart and Connected Communities},
  year = {2018},
  contribution = {colab},
  keywords = {blockchain},
  project = {cps-reliability},
  tag = {platform,decentralization},
  timestamp = {Wed, 16 Oct 2019 14:14:54 +0200}
}
```
M. García-Valls, A. Dubey, and V. J. Botti, Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges, Journal of Systems Architecture - Embedded Systems Design, vol. 91, pp. 83–102, 2018.
```
@article{GarciaValls2018,
  author = {Garc{\'{\i}}a{-}Valls, Marisol and Dubey, Abhishek and Botti, Vicent J.},
  journal = {Journal of Systems Architecture - Embedded Systems Design},
  title = {Introducing the new paradigm of Social Dispersed Computing: Applications, Technologies and Challenges},
  year = {2018},
  pages = {83--102},
  volume = {91},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/jsa/Garcia-VallsDB18},
  contribution = {colab},
  doi = {10.1016/j.sysarc.2018.05.007},
  file = {:Garcia-Valls2018-Introducing_the_new_paradigm_of_Social_Dispersed_Computing_Applications_Technologies_and_Challenges.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware},
  tag = {platform,decentralization},
  timestamp = {Mon, 16 Sep 2019 01:00:00 +0200},
  url = {https://doi.org/10.1016/j.sysarc.2018.05.007}
}
```
If the last decade viewed computational services as a utility, then surely this decade has transformed computation into a commodity. Computation is now progressively integrated into physical networks in a seamless way that enables cyber-physical systems (CPS) and the Internet of Things (IoT) to meet their latency requirements. Similar to the concepts of “platform as a service” and “software as a service”, both cloudlets and fog computing have found their own use cases. Edge devices (which we call end or user devices for disambiguation) play the role of personal computers, dedicated to a user and to a set of correlated applications. In this new scenario, the boundaries between the network node, the sensor, and the actuator are blurring, driven primarily by the computational power of IoT nodes such as single-board computers and smartphones. The larger volumes of data generated in these networks need clever, scalable, and possibly decentralized computing solutions that can scale independently as required. Any node can be seen as part of a graph, with the capacity to serve as a computing node, a network router node, or both. Complex applications can be distributed over this graph of nodes to improve overall performance, for example the amount of data processed over time. In this paper, we identify this new computing paradigm, which we call Social Dispersed Computing, and analyze its key themes, including a new outlook on its relation to agent-based applications. We illustrate the paradigm with application examples that include next-generation electrical energy distribution networks, next-generation mobility services for transportation, and distributed analysis and identification of non-recurring traffic congestion in cities.
The paper analyzes the existing computing paradigms (e.g., cloud, fog, edge, mobile edge, social), resolving the ambiguity of their definitions, and discusses the relevant foundational software technologies, the remaining challenges, and research opportunities.
S. Hasan, A. Ghafouri, A. Dubey, G. Karsai, and X. D. Koutsoukos, Vulnerability analysis of power systems based on cyber-attack and defense models, in 2018 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference, ISGT 2018, Washington, DC, USA, February 19-22, 2018, 2018, pp. 1–5.
```
@inproceedings{Hasan2018,
  author = {Hasan, Saqib and Ghafouri, Amin and Dubey, Abhishek and Karsai, Gabor and Koutsoukos, Xenofon D.},
  booktitle = {2018 {IEEE} Power {\&} Energy Society Innovative Smart Grid Technologies Conference, {ISGT} 2018, Washington, DC, USA, February 19-22, 2018},
  title = {Vulnerability analysis of power systems based on cyber-attack and defense models},
  year = {2018},
  pages = {1--5},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/isgt/HasanGDKK18},
  category = {selectiveconference},
  contribution = {minor},
  doi = {10.1109/ISGT.2018.8403337},
  file = {:Hasan2018-Vulnerability_analysis_of_power_systems_based_on_cyber-attack_and_defense_models.pdf:PDF},
  keywords = {smartgrid},
  project = {cps-reliability},
  tag = {platform,power},
  timestamp = {Wed, 16 Oct 2019 14:14:57 +0200},
  url = {https://doi.org/10.1109/ISGT.2018.8403337}
}
```
Reliable operation of power systems is a primary challenge for the system operators. With the advancement in technology and grid automation, power systems are becoming more vulnerable to cyber-attacks. The main goal of adversaries is to take advantage of these vulnerabilities and destabilize the system. This paper describes a game-theoretic approach to attacker / defender modeling in power systems. In our models, the attacker can strategically identify the subset of substations that maximize damage when compromised. However, the defender can identify the critical subset of substations to protect in order to minimize the damage when an attacker launches a cyber-attack. The algorithms for these models are applied to the standard IEEE-14, 39, and 57 bus examples to identify the critical set of substations given an attacker and a defender budget.
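The attacker/defender formulation can be read as a min-max search over substation subsets. The brute-force sketch below makes that structure explicit. The damage function is a hypothetical callable supplied by the caller (in the paper's setting it would presumably come from power system analysis on the IEEE bus cases), and exhaustive enumeration is only practical for small systems.

```python
from itertools import combinations


def subsets_up_to(items, budget):
    """All subsets of items with 0..budget elements, as frozensets."""
    for k in range(budget + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


def attacker_best_response(substations, damage, budget, protected):
    """Attacker: compromise at most `budget` unprotected substations,
    maximizing damage(attacked_set)."""
    targets = [s for s in substations if s not in protected]
    return max(subsets_up_to(targets, budget), key=damage)


def defender_best_protection(substations, damage, attack_budget, defense_budget):
    """Defender: protect at most `defense_budget` substations, minimizing
    the damage of the attacker's best response (a min-max search)."""
    def worst_case(protected):
        attacked = attacker_best_response(substations, damage, attack_budget, protected)
        return damage(attacked)
    return min(subsets_up_to(substations, defense_budget), key=worst_case)
```

With a hypothetical damage function in which losing substations S1 and S2 together triggers a large additional loss, an attacker with a budget of two targets that pair, and a defender with a budget of one protects S1, capping the attacker's best response at the pair S2 and S3.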
C. Samal, A. Dubey, and L. J. Ratliff, Mobilytics- An Extensible, Modular and Resilient Mobility Platform, in 2018 IEEE International Conference on Smart Computing, SMARTCOMP 2018, Taormina, Sicily, Italy, June 18-20, 2018, 2018, pp. 356–361.
```
@inproceedings{Samal2018,
  author = {Samal, Chinmaya and Dubey, Abhishek and Ratliff, Lillian J.},
  booktitle = {2018 {IEEE} International Conference on Smart Computing, {SMARTCOMP} 2018, Taormina, Sicily, Italy, June 18-20, 2018},
  title = {Mobilytics- An Extensible, Modular and Resilient Mobility Platform},
  year = {2018},
  pages = {356--361},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/smartcomp/SamalDR18},
  category = {selectiveconference},
  contribution = {lead},
  acceptance = {40},
  doi = {10.1109/SMARTCOMP.2018.00029},
  file = {:Samal2018-Mobilytics-An_Extensible_Modular_and_Resilient_Mobility_Platform.pdf:PDF},
  keywords = {transit},
  project = {smart-transit,smart-cities},
  tag = {platform,transit},
  timestamp = {Wed, 16 Oct 2019 14:14:54 +0200},
  url = {https://doi.org/10.1109/SMARTCOMP.2018.00029}
}
```
Transportation management platforms give communities the ability to integrate the available mobility options and localized transportation demand management policies. A central component of a transportation management platform is the mobility planning application. Given the societal relevance of these platforms, it is necessary to ensure that they operate resiliently. Modularity and extensibility are also critical properties for manageability: modularity makes it easy to isolate faults, and extensibility enables policy updates and the integration of new mobility modes or new routing algorithms. However, state-of-the-art mobility planning applications such as OpenTripPlanner are monolithic, which makes them difficult to scale and modify dynamically. This paper describes Mobilytics, a microservices-based, modular, multi-modal mobility platform that integrates mobility providers, commuters, and community stakeholders. We describe our requirements and architecture, discuss the resilience challenges, and show how our platform continues to function properly in the presence of failures. The patterns and principles manifested in our system may serve as guidelines for current and future practitioners in this field.
F. Sun, A. Dubey, C. Kulkarni, N. Mahadevan, and A. G. Luna, A data driven health monitoring approach to extending small sats mission, in Conference Proceedings, Annual Conference of The Prognostics And Health Management Society, 2018.
```
@inproceedings{Sun2018a,
  author = {Sun, Fangzhou and Dubey, Abhishek and Kulkarni, C and Mahadevan, Nagabhushan and Luna, Ali Guarneros},
  booktitle = {Conference Proceedings, Annual Conference of The Prognostics And Health Management Society},
  title = {A data driven health monitoring approach to extending small sats mission},
  year = {2018},
  category = {conference},
  contribution = {minor},
  file = {:Sun2018a-A_data_driven_health_monitoring_approach_to_extending_small_sats_mission.pdf:PDF},
  keywords = {reliability},
  project = {cps-reliability},
  tag = {platform}
}
```
In the coming years, the International Space Station (ISS) plans to launch several small-sat missions powered by lithium-ion battery packs. Extended versions of such missions require dependable, energy-dense, and durable power sources as well as system health monitoring. Because the devices are subjected to high-demand operating conditions, a good health estimation framework is necessary to increase mission success. This paper describes a hierarchical architecture that combines data-driven anomaly detection methods with a fine-grained model-based diagnosis and prognostics architecture. At the core of the architecture is a distributed stack of deep neural networks that detects and classifies the data traces from nearby satellites based on prior observations. Any identified anomaly is transmitted to the ground, which then uses a model-based diagnosis and prognosis framework to estimate the health state. In parallel, the data traces from the satellites are periodically transported to the ground and analyzed using model-based techniques. This data is then used to train the neural networks, which are run from ground systems and periodically updated. The collaborative architecture enables quick data-driven inference on the satellite and more intensive analysis on the ground, where time and power consumption are often not constrained. The current work demonstrates an implementation of this architecture on an initial battery data set. In the future, we propose to apply this framework to other electric and electronic components on board the small satellites.
S. Hasan, A. Ghafouri, A. Dubey, G. Karsai, and X. Koutsoukos, Heuristics-based approach for identifying critical N-k contingencies in power systems, in 2017 Resilience Week (RWS), 2017, pp. 191–197.
```
@inproceedings{Hasan2017a,
  author = {{Hasan}, S. and {Ghafouri}, A. and Dubey, Abhishek and {Karsai}, G. and {Koutsoukos}, X.},
  booktitle = {2017 Resilience Week (RWS)},
  title = {Heuristics-based approach for identifying critical N-k contingencies in power systems},
  year = {2017},
  month = sep,
  pages = {191-197},
  category = {conference},
  contribution = {colab},
  doi = {10.1109/RWEEK.2017.8088671},
  file = {:Hasan2017a-Heuristics-based_approach_for_identifying_critical_N_k_contingencies_in_power_systems.pdf:PDF},
  issn = {null},
  keywords = {smartgrid},
  project = {cps-reliability,smart-energy},
  tag = {platform,power}
}
```
Reliable operation of electrical power systems in the presence of multiple critical N-k contingencies is an important challenge for system operators. Identifying all the possible critical N-k contingencies to design effective mitigation strategies is computationally infeasible due to the combinatorial explosion of the search space. This paper describes two heuristic algorithms based on the iterative pruning of the candidate contingency set to effectively and efficiently identify all the critical N-k contingencies resulting in system failure. These algorithms are applied to the standard IEEE-14 bus system, IEEE-39 bus system, and IEEE-57 bus system to identify multiple critical N-k contingencies.
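The idea of shrinking the candidate set by pruning can be sketched as follows. This is not either of the paper's heuristics; it is a simple exact pruning rule that assumes failure is monotone (losing more components never restores the system), so any superset of a known critical contingency can be skipped without simulation. The toy failure oracle and line names are hypothetical placeholders for a power-flow simulation.

```python
from itertools import combinations


def minimal_critical_contingencies(components, causes_failure, k_max):
    """Enumerate contingencies up to size k_max, skipping supersets of sets
    already known to be critical (under monotonicity they cannot be minimal)."""
    critical = []
    evaluations = 0
    for k in range(1, k_max + 1):
        for combo in combinations(components, k):
            outage = frozenset(combo)
            if any(known <= outage for known in critical):
                continue  # pruned without running the (expensive) simulation
            evaluations += 1
            if causes_failure(outage):
                critical.append(outage)
    return critical, evaluations


def toy_oracle(outage):
    # Hypothetical: the system fails if line L1 is lost,
    # or if lines L2 and L3 are lost together.
    return "L1" in outage or {"L2", "L3"} <= outage


found, runs = minimal_critical_contingencies(["L1", "L2", "L3", "L4"], toy_oracle, 3)
print(runs)  # 7
```

Here the search returns the contingencies {L1} and {L2, L3} after 7 oracle calls instead of the 14 needed to test every contingency of size 1 to 3.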
S. Hasan, A. Dubey, A. Chhokra, N. Mahadevan, G. Karsai, and X. Koutsoukos, A modeling framework to integrate exogenous tools for identifying critical components in power systems, in 2017 Workshop on Modeling and Simulation of Cyber-Physical Energy Systems (MSCPES), 2017, pp. 1–6.
```
@inproceedings{Hasan2017b,
  author = {{Hasan}, S. and Dubey, Abhishek and {Chhokra}, A. and {Mahadevan}, N. and {Karsai}, G. and {Koutsoukos}, X.},
  booktitle = {2017 Workshop on Modeling and Simulation of Cyber-Physical Energy Systems (MSCPES)},
  title = {A modeling framework to integrate exogenous tools for identifying critical components in power systems},
  year = {2017},
  month = apr,
  pages = {1-6},
  category = {workshop},
  contribution = {colab},
  doi = {10.1109/MSCPES.2017.8064540},
  file = {:Hasan2017b-A_modeling_framework_to_integrate_exogenous_tools_for_identifying_critical_components_in_power_systems.pdf:PDF},
  keywords = {smartgrid},
  tag = {platform,power}
}
```
Cascading failures in electrical power systems are one of the major causes of concern for modern society, as they result in huge socio-economic losses. Tools for analyzing these failures while considering different aspects of the system are typically very expensive. Thus, researchers tend to use multiple tools to perform various types of analysis on the same system model in order to understand the reasons for these failures in detail. Modeling even a simple system in multiple platforms is a tedious, error-prone, and time-consuming process. This paper describes a domain-specific modeling language (DSML) for power systems. It identifies and captures the right abstractions for modeling components in different analysis tools. A framework is proposed that models the system using the developed DSML, identifies the type of analysis to be performed, chooses the appropriate tool(s) from the tool-chain, transforms the model to meet the specifications of the selected tool, and performs the analysis. Case studies on the WSCC-9 Bus System, IEEE-14 Bus System, and IEEE-39 Bus System demonstrate the entire workflow of the framework in identifying critical components for power systems.
J. Bergquist, A. Laszka, M. Sturm, and A. Dubey, On the design of communication and transaction anonymity in blockchain-based transactive microgrids, in Proceedings of the 1st Workshop on Scalable and Resilient Infrastructures for Distributed Ledgers, SERIAL@Middleware 2017, Las Vegas, NV, USA, December 11-15, 2017, 2017, pp. 3:1–3:6.
```
@inproceedings{Bergquist2017,
  author = {Bergquist, Jonatan and Laszka, Aron and Sturm, Monika and Dubey, Abhishek},
  booktitle = {Proceedings of the 1st Workshop on Scalable and Resilient Infrastructures for Distributed Ledgers, SERIAL@Middleware 2017, Las Vegas, NV, USA, December 11-15, 2017},
  title = {On the design of communication and transaction anonymity in blockchain-based transactive microgrids},
  year = {2017},
  pages = {3:1--3:6},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/middleware/BergquistLSD17},
  category = {workshop},
  contribution = {lead},
  doi = {10.1145/3152824.3152827},
  file = {:Bergquist2017-On_the_design_of_communication_and_transaction_anonymity_in_blockchain-based_transactive_microgrids.pdf:PDF},
  keywords = {transactive},
  project = {transactive-energy,cps-middleware,cps-reliability},
  tag = {decentralization,platform},
  timestamp = {Tue, 06 Nov 2018 16:57:13 +0100},
  url = {https://doi.org/10.1145/3152824.3152827}
}
```
Transactive microgrids are emerging as a transformative solution for the problems faced by distribution system operators due to an increase in the use of distributed energy resources and a rapid acceleration in renewable energy generation, such as wind and solar power. Distributed ledgers have recently attracted widespread interest in this domain due to their ability to provide transactional integrity across decentralized computing nodes. However, the existing state of the art has not focused on the privacy-preservation requirement of these energy systems: transaction-level data can provide much greater insight into a prosumer’s behavior than smart meter data. There are specific safety requirements in transactive microgrids to ensure the stability of the grid and to control the load. To fulfil these requirements, the distribution system operator needs transaction information from the grid, which poses a further challenge to the privacy goals. This problem is made worse by the requirement for off-blockchain communication in these networks. In this paper, we extend a recently developed trading workflow called PETra and describe our solution for communication and transactional anonymity.
A. Chhokra, A. Kulkarni, S. Hasan, A. Dubey, N. Mahadevan, and G. Karsai, A Systematic Approach of Identifying Optimal Load Control Actions for Arresting Cascading Failures in Power Systems, in Proceedings of the 2nd Workshop on Cyber-Physical Security and Resilience in Smart Grids, SPSR-SG@CPSWeek 2017, Pittsburgh, PA, USA, April 21, 2017, 2017, pp. 41–46.
```
@inproceedings{Chhokra2017,
  author = {Chhokra, Ajay and Kulkarni, Amogh and Hasan, Saqib and Dubey, Abhishek and Mahadevan, Nagabhushan and Karsai, Gabor},
  booktitle = {Proceedings of the 2nd Workshop on Cyber-Physical Security and Resilience in Smart Grids, SPSR-SG@CPSWeek 2017, Pittsburgh, PA, USA, April 21, 2017},
  title = {A Systematic Approach of Identifying Optimal Load Control Actions for Arresting Cascading Failures in Power Systems},
  year = {2017},
  pages = {41--46},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/cpsweek/ChhokraKHDMK17},
  category = {workshop},
  contribution = {colab},
  doi = {10.1145/3055386.3055395},
  file = {:Chhokra2017-A_Systematic_Approach_of_Identifying_Optimal_Load_Control_Actions_for_Arresting_Cascading_Failures_in_Power_Systems.pdf:PDF},
  keywords = {reliability, smartgrid},
  project = {cps-reliability},
  tag = {platform},
  timestamp = {Tue, 06 Nov 2018 16:59:05 +0100},
  url = {https://doi.org/10.1145/3055386.3055395}
}
```
Cascading outages in power networks cause blackouts, which lead to huge economic and social consequences. The traditional form of load shedding is avoidable in many cases by identifying optimal load control actions. However, if there is a change in the system topology (adding or removing loads, lines, etc.), the calculations have to be performed again. This paper addresses this problem by providing a workflow that 1) generates system models from IEEE CDF specifications, 2) identifies a collection of blackout-causing contingencies, 3) dynamically sets up an optimization problem, and 4) generates a table of mitigation strategies in terms of minimal load curtailment. We demonstrate the applicability of our proposed methodology by finding load curtailment actions for N-k contingencies (k = 1, 2, 3) in the IEEE 14 Bus system.
A. Chhokra, S. Hasan, A. Dubey, N. Mahadevan, and G. Karsai, Diagnostics and prognostics using temporal causal models for cyber physical energy systems, in Proceedings of the 8th International Conference on Cyber-Physical Systems, ICCPS 2017, Pittsburgh, Pennsylvania, USA, April 18-20, 2017, 2017, p. 87.
```
@inproceedings{Chhokra2017a,
  author = {Chhokra, Ajay and Hasan, Saqib and Dubey, Abhishek and Mahadevan, Nagabhushan and Karsai, Gabor},
  booktitle = {Proceedings of the 8th International Conference on Cyber-Physical Systems, {ICCPS} 2017, Pittsburgh, Pennsylvania, USA, April 18-20, 2017},
  title = {Diagnostics and prognostics using temporal causal models for cyber physical energy systems},
  year = {2017},
  pages = {87},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/iccps/ChhokraHDMK17},
  category = {poster},
  contribution = {lead},
  doi = {10.1145/3055004.3064843},
  file = {:Chhokra2017a-Diagnostics_and_prognostics_using_temporal_causal_models_for_cyber_physical_energy_systems.pdf:PDF},
  keywords = {reliability, smartgrid},
  project = {cps-reliability},
  tag = {platform,power},
  timestamp = {Wed, 16 Oct 2019 14:14:57 +0200},
  url = {https://doi.org/10.1145/3055004.3064843}
}
```
Reliable operation of cyber-physical systems such as power transmission and distribution systems is critical for the seamless functioning of a vibrant economy. These systems consist of tightly coupled physical components (energy sources, transmission and distribution lines, and loads) and computational components (protection devices, energy management systems, etc.). Protection devices such as distance relays help prevent failure propagation by isolating faulty physical components. However, these devices rely on hard thresholds and local information, often ignoring system-level effects introduced by the distributed control algorithms. This leads to scenarios wherein a local mitigation in a subsystem could trigger a larger fault cascade, possibly resulting in a blackout. Efficient models and tools that curtail such systemic failures by performing fault diagnosis and prognosis are therefore necessary.
A. Dubey, G. Karsai, and S. Pradhan, Resilience at the edge in cyber-physical systems, in Second International Conference on Fog and Mobile Edge Computing, FMEC 2017, Valencia, Spain, May 8-11, 2017, 2017, pp. 139–146.
```
@inproceedings{Dubey2017,
  author = {Dubey, Abhishek and Karsai, Gabor and Pradhan, Subhav},
  booktitle = {Second International Conference on Fog and Mobile Edge Computing, {FMEC} 2017, Valencia, Spain, May 8-11, 2017},
  title = {Resilience at the edge in cyber-physical systems},
  year = {2017},
  pages = {139--146},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/fmec/DubeyKP17},
  category = {selectiveconference},
  contribution = {lead},
  doi = {10.1109/FMEC.2017.7946421},
  file = {:Dubey2017-Resilience_at_the_edge_in_cyber-physical_systems.pdf:PDF},
  keywords = {reliability},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
  url = {https://doi.org/10.1109/FMEC.2017.7946421}
}
```
As the number of low-cost computing devices at the edge of communication networks increases, there are greater opportunities to enable innovative capabilities, especially in cyber-physical systems. For example, micro-grid power systems can make use of computing capabilities at the edge of a Smart Grid to provide more robust and decentralized control. However, the downside of distributing intelligence to the edge, away from the controlled environment of data centers, is an increased risk of failures. The paper introduces a framework for handling these challenges. The contribution of this framework is to support strategies to (a) tolerate transient faults as they appear due to network fluctuations or node failures, and (b) systematically reconfigure the application if the faults persist.
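The two-level strategy (tolerate transient faults, reconfigure on persistent ones) can be illustrated with a persistence counter over periodic health checks. This is a minimal sketch under assumed semantics, not the framework's actual mechanism; the class name, the three outcome labels, and the consecutive-failure limit are hypothetical.

```python
class FaultPersistenceMonitor:
    """Classify a stream of health checks: tolerate isolated failures,
    escalate to reconfiguration once failures persist (hypothetical policy)."""

    def __init__(self, persistence_limit: int):
        self.persistence_limit = persistence_limit
        self.consecutive_failures = 0

    def observe(self, healthy: bool) -> str:
        if healthy:
            self.consecutive_failures = 0  # a success clears the transient
            return "nominal"
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.persistence_limit:
            return "reconfigure"  # fault persists: e.g., redeploy the component
        return "tolerate"  # transient: e.g., retry or reuse the last good value


monitor = FaultPersistenceMonitor(persistence_limit=3)
checks = [True, False, False, True, False, False, False]
print([monitor.observe(ok) for ok in checks])
# ['nominal', 'tolerate', 'tolerate', 'nominal', 'tolerate', 'tolerate', 'reconfigure']
```

Resetting the counter on any success is one design choice among several; a sliding-window failure rate would be more robust to intermittent faults that never quite reach the consecutive limit.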
A. Dubey, G. Karsai, A. Gokhale, W. Emfinger, and P. Kumar, Drems-os: An operating system for managed distributed real-time embedded systems, in 2017 6th International Conference on Space Mission Challenges for Information Technology (SMC-IT), 2017, pp. 114–119.
```
@inproceedings{Dubey2017b,
  author = {Dubey, Abhishek and Karsai, Gabor and Gokhale, Aniruddha and Emfinger, William and Kumar, Pranav},
  booktitle = {2017 6th International Conference on Space Mission Challenges for Information Technology (SMC-IT)},
  title = {Drems-os: An operating system for managed distributed real-time embedded systems},
  year = {2017},
  organization = {IEEE},
  pages = {114--119},
  category = {conference},
  contribution = {lead},
  file = {:Dubey2017b-Drems-os_An_operating_system_for_managed_distributed_real-time_embedded_systems.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware},
  tag = {platform}
}
```
Distributed real-time and embedded (DRE) systems executing mixed criticality task sets are increasingly being deployed in mobile and embedded cloud computing platforms, including space applications. These DRE systems must not only operate over a range of temporal and spatial scales, but also require stringent assurances for secure interactions between the system’s tasks without violating their individual timing constraints. To address these challenges, this paper describes a novel distributed operating system focusing on the scheduler design to support the mixed criticality task sets. Empirical results from experiments involving a case study of a cluster of satellites emulated in a laboratory testbed validate our claims.
S. Eisele, G. Pettet, A. Dubey, and G. Karsai, Towards an architecture for evaluating and analyzing decentralized Fog applications, in IEEE Fog World Congress, FWC 2017, Santa Clara, CA, USA, October 30 - Nov. 1, 2017, 2017, pp. 1–6.
```
@inproceedings{Eisele2017,
  author = {Eisele, Scott and Pettet, Geoffrey and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {{IEEE} Fog World Congress, {FWC} 2017, Santa Clara, CA, USA, October 30 - Nov. 1, 2017},
  title = {Towards an architecture for evaluating and analyzing decentralized Fog applications},
  year = {2017},
  pages = {1--6},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/fwc/EiselePDK17},
  category = {workshop},
  contribution = {lead},
  doi = {10.1109/FWC.2017.8368531},
  file = {:Eisele2017-Towards_an_architecture_for_evaluating_and_analyzing_decentralized_Fog_applications.pdf:PDF},
  keywords = {middleware},
  project = {cps-reliability,cps-middleware},
  tag = {platform,decentralization},
  timestamp = {Wed, 16 Oct 2019 14:14:51 +0200},
  url = {https://doi.org/10.1109/FWC.2017.8368531}
}
```
As the number of low-cost computing devices at the edge of the network increases, there are greater opportunities to enable novel, innovative capabilities, especially in decentralized cyber-physical systems. For example, in an urban setting, a set of networked, collaborating processors at the edge can be used to dynamically detect traffic densities via image processing and then use those densities to control the traffic flow by coordinating traffic light sequences in a decentralized architecture. In this paper, we describe a testbed and an application framework for such applications.
S. Eisele, I. Madari, A. Dubey, and G. Karsai, RIAPS: Resilient Information Architecture Platform for Decentralized Smart Systems, in 20th IEEE International Symposium on Real-Time Distributed Computing, ISORC 2017, Toronto, ON, Canada, May 16-18, 2017, 2017, pp. 125–132.
```
@inproceedings{Eisele2017b,
  author = {Eisele, Scott and Madari, Istv{\'{a}}n and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {20th {IEEE} International Symposium on Real-Time Distributed Computing, {ISORC} 2017, Toronto, ON, Canada, May 16-18, 2017},
  title = {{RIAPS:} Resilient Information Architecture Platform for Decentralized Smart Systems},
  year = {2017},
  pages = {125--132},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/isorc/EiseleMDK17},
  category = {selectiveconference},
  contribution = {lead},
  doi = {10.1109/ISORC.2017.22},
  file = {:Eisele2017b-RIAPS_Resilient_Information_Architecture_Platform_for_Decentralized_Smart_Systems.pdf:PDF},
  keywords = {middleware},
  project = {smart-transit,smart-cities},
  tag = {platform,decentralization,power},
  timestamp = {Wed, 16 Oct 2019 14:14:53 +0200},
  url = {https://doi.org/10.1109/ISORC.2017.22}
}
```
The emerging Fog Computing paradigm provides an additional computational layer that enables new capabilities in real-time data-driven applications. This is especially interesting in the domain of Smart Grid as the boundaries between traditional generation, distribution, and consumer roles are blurring. This is a reflection of the ongoing trend of intelligence distribution in Smart Systems. In this paper, we briefly describe a component-based decentralized software platform called Resilient Information Architecture Platform for Smart Systems (RIAPS) which provides an infrastructure for such systems. We briefly describe some initial applications built using this platform. Then, we focus on the design and integration choices for a resilient Discovery Manager service that is a critical component of this infrastructure. The service allows applications to discover each other, work collaboratively, and ensure the stability of the Smart System.
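The role of a discovery service (letting applications find each other while tolerating node crashes) can be sketched with a lease-based registry. This is an assumption-laden toy: it is a single in-memory registry rather than the distributed service RIAPS actually uses, and the class, topic, and endpoint names are hypothetical. Leases make crashed providers disappear automatically once they stop renewing.

```python
class DiscoveryRegistry:
    """Toy service registry: providers register an endpoint under a topic
    with a lease that must be renewed; expired entries are dropped on lookup."""

    def __init__(self, lease_seconds: float):
        self.lease_seconds = lease_seconds
        self._expiry = {}  # (topic, endpoint) -> absolute expiry time

    def register(self, topic: str, endpoint: str, now: float) -> None:
        # Registering again simply renews the lease.
        self._expiry[(topic, endpoint)] = now + self.lease_seconds

    def lookup(self, topic: str, now: float) -> list:
        # Purge expired leases so providers that stopped renewing vanish.
        self._expiry = {key: t for key, t in self._expiry.items() if t > now}
        return sorted(ep for (tp, ep) in self._expiry if tp == topic)


reg = DiscoveryRegistry(lease_seconds=10.0)
reg.register("voltage", "tcp://10.0.0.2:5555", now=0.0)
reg.register("voltage", "tcp://10.0.0.3:5555", now=5.0)
print(reg.lookup("voltage", now=8.0))   # both endpoints
print(reg.lookup("voltage", now=12.0))  # ['tcp://10.0.0.3:5555']
```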
A. Ghafouri, A. Laszka, A. Dubey, and X. D. Koutsoukos, Optimal detection of faulty traffic sensors used in route planning, in Proceedings of the 2nd International Workshop on Science of Smart City Operations and Platforms Engineering, SCOPE@CPSWeek 2017, Pittsburgh, PA, USA, April 21, 2017, 2017, pp. 1–6.
```
@inproceedings{Ghafouri2017,
  author = {Ghafouri, Amin and Laszka, Aron and Dubey, Abhishek and Koutsoukos, Xenofon D.},
  booktitle = {Proceedings of the 2nd International Workshop on Science of Smart City Operations and Platforms Engineering, SCOPE@CPSWeek 2017, Pittsburgh, PA, USA, April 21, 2017},
  title = {Optimal detection of faulty traffic sensors used in route planning},
  year = {2017},
  pages = {1--6},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/cpsweek/GhafouriLDK17},
  category = {workshop},
  contribution = {colab},
  doi = {10.1145/3063386.3063767},
  file = {:Ghafouri2017-Optimal_detection_of_faulty_traffic_sensors_used_in_route_planning.pdf:PDF},
  keywords = {transit},
  project = {cps-reliability,smart-transit,smart-cities},
  tag = {ai4cps,platform,incident,transit},
  timestamp = {Tue, 06 Nov 2018 16:59:05 +0100},
  url = {https://doi.org/10.1145/3063386.3063767}
}
```
In a smart city, real-time traffic sensors may be deployed for various applications, such as route planning. Unfortunately, sensors are prone to failures, which result in erroneous traffic data. Erroneous data can adversely affect applications such as route planning and can cause increased travel time. To minimize the impact of sensor failures, we must detect them promptly and accurately. However, typical detection algorithms may lead to a large number of false positives (i.e., false alarms) and false negatives (i.e., missed detections), which can result in suboptimal route planning. In this paper, we devise an effective detector for identifying faulty traffic sensors using a prediction model based on Gaussian Processes. Further, we present an approach for computing the optimal parameters of the detector, which minimize losses due to false-positive and false-negative errors. We also characterize critical sensors, whose failure can have a high impact on the route planning application. Finally, we implement our method and evaluate it numerically using a real-world dataset and the route planning platform OpenTripPlanner.
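The trade-off between false alarms and missed detections can be made concrete with a threshold detector on prediction residuals (observed minus predicted speed). This sketch assumes the residuals are already available, standing in for the paper's Gaussian Process predictions, and picks the threshold that minimizes a weighted error count on labeled data; the costs, labels, and residual values are hypothetical, and this is not the paper's optimization procedure.

```python
def detection_loss(threshold, residuals, faulty, cost_fp, cost_fn):
    """Total cost when an alarm is raised for every |residual| > threshold."""
    loss = 0.0
    for r, is_faulty in zip(residuals, faulty):
        alarm = abs(r) > threshold
        if alarm and not is_faulty:
            loss += cost_fp  # false alarm
        elif not alarm and is_faulty:
            loss += cost_fn  # missed detection
    return loss


def optimal_threshold(residuals, faulty, cost_fp, cost_fn):
    # The loss only changes at observed |residual| values, so those (plus 0)
    # are the only candidate thresholds worth checking.
    candidates = sorted({0.0} | {abs(r) for r in residuals})
    best = min(candidates,
               key=lambda t: detection_loss(t, residuals, faulty, cost_fp, cost_fn))
    return best, detection_loss(best, residuals, faulty, cost_fp, cost_fn)


residuals = [0.1, -0.3, 0.2, 2.5, -3.0, 0.4]
faulty = [False, False, False, True, True, False]
print(optimal_threshold(residuals, faulty, cost_fp=1.0, cost_fn=5.0))  # (0.4, 0.0)
```

Raising `cost_fn` relative to `cost_fp` pushes the chosen threshold down, trading more false alarms for fewer missed faults.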
S. Hasan, A. Chhokra, A. Dubey, N. Mahadevan, G. Karsai, R. Jain, and S. Lukic, A simulation testbed for cascade analysis, in IEEE Power & Energy Society Innovative Smart Grid Technologies Conference, ISGT 2017, Washington, DC, USA, April 23-26, 2017, 2017, pp. 1–5.
```
@inproceedings{Hasan2017,
  author = {Hasan, Saqib and Chhokra, Ajay and Dubey, Abhishek and Mahadevan, Nagabhushan and Karsai, Gabor and Jain, Rishabh and Lukic, Srdjan},
  booktitle = {{IEEE} Power {\&} Energy Society Innovative Smart Grid Technologies Conference, {ISGT} 2017, Washington, DC, USA, April 23-26, 2017},
  title = {A simulation testbed for cascade analysis},
  year = {2017},
  pages = {1--5},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/isgt/HasanCDMKJL17},
  category = {selectiveconference},
  contribution = {lead},
  doi = {10.1109/ISGT.2017.8086080},
  file = {:Hasan2017-A_simulation_testbed_for_cascade_analysis.pdf:PDF},
  keywords = {smartgrid},
  project = {cps-reliability},
  tag = {platform,power},
  timestamp = {Wed, 16 Oct 2019 14:14:57 +0200},
  url = {https://doi.org/10.1109/ISGT.2017.8086080}
}
```
Electrical power systems are heavily instrumented with protection assemblies (relays and breakers) that detect anomalies and arrest failure propagation. However, failures in these discrete protection devices could have inadvertent consequences, including cascading failures resulting in blackouts. This paper aims to model the behavior of these discrete protection devices in nominal and faulty conditions and apply it to the simulation and contingency analysis of cascading failures in power transmission systems. The behavior under fault conditions is used to identify and explain conditions for blackout evolution that are not otherwise obvious. The results are demonstrated using a standard IEEE-14 Bus System.
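To show what modeling a protection device "in nominal and faulty conditions" can mean, here is a sketch of a stepped distance relay with one latent fault mode. The zone reaches (fractions of the protected line impedance) and trip delays are assumed, textbook-style values rather than settings from the paper, and the function and mode names are hypothetical.

```python
from typing import Optional

# Assumed zone settings: (reach as a fraction of line impedance, trip delay in s).
ZONES = [(0.8, 0.0), (1.2, 0.3), (2.0, 1.0)]


def relay_trip_time(z_apparent, z_line: float, mode: str = "nominal") -> Optional[float]:
    """Return the delay after which a distance relay trips, or None if it never trips.

    z_apparent may be a real magnitude or a complex impedance (abs() handles both).
    mode="nominal": trip in the first zone whose reach covers the fault.
    mode="fail_to_trip": latent fault mode where the relay never operates,
    leaving backup protection elsewhere to clear the fault.
    """
    if mode == "fail_to_trip":
        return None
    if mode != "nominal":
        raise ValueError(f"unknown mode: {mode}")
    for reach, delay in ZONES:
        if abs(z_apparent) <= reach * z_line:
            return delay
    return None  # fault lies outside every zone: no trip


print(relay_trip_time(0.5, 1.0))                        # 0.0 (zone 1)
print(relay_trip_time(1.0, 1.0))                        # 0.3 (zone 2)
print(relay_trip_time(0.5, 1.0, mode="fail_to_trip"))   # None
```

In a cascade simulation, swapping a relay's mode from `"nominal"` to `"fail_to_trip"` lets the remaining (slower, wider-reaching) zones of neighboring relays determine which lines are disconnected.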
S. Nannapaneni, A. Dubey, and S. Mahadevan, Performance evaluation of smart systems under uncertainty, in 2017 IEEE SmartWorld, 2017, pp. 1–8.
```
@inproceedings{Nannapaneni2017,
  author = {Nannapaneni, Saideep and Dubey, Abhishek and Mahadevan, Sankaran},
  booktitle = {2017 {IEEE} SmartWorld},
  title = {Performance evaluation of smart systems under uncertainty},
  year = {2017},
  acceptance = {28},
  pages = {1--8},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/uic/NannapaneniDM17},
  category = {selectiveconference},
  contribution = {colab},
  doi = {10.1109/UIC-ATC.2017.8397430},
  file = {:Nannapaneni2017-Performance_evaluation_of_smart_systems_under_uncertainty.pdf:PDF},
  keywords = {performance},
  project = {cps-reliability},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:50 +0200},
  url = {https://doi.org/10.1109/UIC-ATC.2017.8397430}
}
```
This paper develops a model-based framework for the quantification and propagation of multiple uncertainty sources affecting the performance of a smart system. A smart system, in general, performs sensing, control and actuation for proper functioning of a physical subsystem (also referred to as a plant). With strong feedback coupling between several subsystems, the uncertainty in the quantities of interest (QoI) amplifies over time. The coupling in a generic smart system occurs at two levels: (1) coupling between individual subsystems (plant, cyber, actuation, sensors), and (2) coupling between nodes in a distributed computational subsystem. In this paper, a coupled smart system is decoupled and considered as a feed-forward system over time and modeled using a two-level Dynamic Bayesian Network (DBN), one at each level of coupling (between subsystems and between nodes). A DBN can aggregate uncertainty from multiple sources within a time step and across time steps. The DBN associated with a smart system can be learned using available system models, physics models and data. The proposed methodology is demonstrated for the design of a smart indoor heating system (identification of sensors and a wireless network) within cost constraints that enables room-by-room temperature control. We observe that sensor uncertainty has a higher impact on the performance of the heating system compared to the uncertainty in the wireless network.
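The point that feedback coupling accumulates uncertainty across time steps can be illustrated with the simplest possible unrolled model: a scalar, linear-Gaussian plant under proportional control, where each time slice depends only on the previous one. The assumed model is x_{t+1} = a x_t + k (s - y_t) + w_t with sensor reading y_t = x_t + v_t, process noise w_t ~ N(0, q), and sensor noise v_t ~ N(0, r). For this linear-Gaussian chain the mean and variance can be propagated exactly, which is a special case of DBN inference rather than the paper's two-level DBN; all parameter values below are hypothetical.

```python
def propagate(a, k, setpoint, q, r, m0, p0, steps):
    """Propagate mean and variance of the plant state through `steps` time slices.

    Uses m_{t+1} = (a - k) m_t + k s and P_{t+1} = (a - k)^2 P_t + k^2 r + q,
    which follow from the assumed linear plant, sensor, and controller.
    """
    means, variances = [m0], [p0]
    for _ in range(steps):
        m, p = means[-1], variances[-1]
        means.append((a - k) * m + k * setpoint)
        # Within one slice: inherited uncertainty + sensor noise (passed
        # through the controller gain) + process noise.
        variances.append((a - k) ** 2 * p + k ** 2 * r + q)
    return means, variances


# Hypothetical room-temperature loop starting from a known 15 degrees.
means, variances = propagate(a=1.0, k=0.5, setpoint=20.0, q=0.01, r=0.04,
                             m0=15.0, p0=0.0, steps=30)
print(round(means[-1], 3), round(variances[-1], 5))  # 20.0 0.02667
```

The variance grows from zero toward (k^2 r + q) / (1 - (a - k)^2) as uncertainty from both sources is aggregated in every slice; comparing the k^2 r and q terms gives a crude sense of whether sensor or process uncertainty dominates.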
P. Völgyesi, A. Dubey, T. Krentz, I. Madari, M. Metelko, and G. Karsai, Time synchronization services for low-cost fog computing applications, in International Symposium on Rapid System Prototyping, RSP 2017, Shortening the Path from Specification to Prototype, October 19-20, 2017, Seoul, South Korea, 2017, pp. 57–63.
```
@inproceedings{Voelgyesi2017,
  author = {V{\"{o}}lgyesi, P{\'{e}}ter and Dubey, Abhishek and Krentz, Timothy and Madari, Istv{\'{a}}n and Metelko, Mary and Karsai, Gabor},
  booktitle = {International Symposium on Rapid System Prototyping, {RSP} 2017, Shortening the Path from Specification to Prototype, October 19-20, 2017, Seoul, South Korea},
  title = {Time synchronization services for low-cost fog computing applications},
  year = {2017},
  pages = {57--63},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/rsp/VolgyesiDKMMK17},
  category = {selectiveconference},
  contribution = {colab},
  doi = {10.1145/3130265.3130325},
  file = {:Voelgyesi2017-Time_synchronization_services_for_low-cost_fog_computing_applications.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware,cps-reliability},
  tag = {platform,decentralization},
  timestamp = {Tue, 06 Nov 2018 11:07:11 +0100},
  url = {https://doi.org/10.1145/3130265.3130325}
}
```
This paper presents the time synchronization infrastructure for a low-cost run-time platform and application framework specifically targeting Smart Grid applications. Such distributed applications require the execution of reliable and accurate time-coordinated actions and observations both within islands of deployments and across geographically distant nodes. The time synchronization infrastructure is built on well-established technologies: GPS, NTP, PTP, PPS, and Linux with real-time extensions, running on low-cost BeagleBone Black hardware nodes. We describe the architecture, implementation, instrumentation approach, and performance results, and present an example from the application domain. We also discuss an important finding on the effect of the Linux RT_PREEMPT real-time patch on the accuracy of the PPS subsystem and its use for GPS-based time references.
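Among the technologies listed, NTP estimates a node's clock offset from a single two-way message exchange. The sketch below shows the standard offset and round-trip-delay calculation from NTP's four timestamps (as defined in RFC 5905); it is generic background, not code from the platform described in the paper, and the timestamps are hypothetical integer microseconds.

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Clock offset and round-trip delay from one two-way exchange.

    t1: client transmit (client clock)   t2: server receive (server clock)
    t3: server transmit (server clock)   t4: client receive (client clock)
    Assumes symmetric network paths; any path asymmetry appears as offset error.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Server clock 0.5 s ahead, 0.1 s each way, 0.1 s server processing (in microseconds).
print(ntp_offset_and_delay(1_000_000, 1_600_000, 1_700_000, 1_300_000))
# (500000.0, 200000)
```

The symmetric-path assumption is why hardware references such as GPS with PPS are attractive for tight accuracy: they bypass network delay asymmetry entirely.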
S. Pradhan, A. Dubey, S. Neema, and A. Gokhale, Towards a generic computation model for smart city platforms, in 2016 1st International Workshop on Science of Smart City Operations and Platforms Engineering (SCOPE) in partnership with Global City Teams Challenge (GCTC) (SCOPE - GCTC), 2016, pp. 1–6.
```
@inproceedings{Pradhan2016d,
  author = {{Pradhan}, S. and Dubey, Abhishek and {Neema}, S. and {Gokhale}, A.},
  booktitle = {2016 1st International Workshop on Science of Smart City Operations and Platforms Engineering (SCOPE) in partnership with Global City Teams Challenge (GCTC) (SCOPE - GCTC)},
  title = {Towards a generic computation model for smart city platforms},
  year = {2016},
  month = apr,
  pages = {1-6},
  category = {workshop},
  contribution = {colab},
  doi = {10.1109/SCOPE.2016.7515059},
  file = {:Pradhan2016d-Towards_a_Generic_Computation_Model_for_Smart_City_Platforms.pdf:PDF},
  issn = {null},
  keywords = {middleware},
  tag = {platform}
}
```
Smart emergency response systems, smart transportation systems, and smart parking spaces are some examples of multi-domain smart city systems that require large-scale, open platforms for integration and execution. These platforms exhibit a high degree of heterogeneity. In this paper, we focus on software heterogeneity arising from different types of applications. The variability among applications stems from (a) timing requirements, (b) the rate and volume of data they interact with, and (c) behavior depending on whether they are stateful or stateless. These variations result in applications with different computation models. However, a smart city system can comprise multi-domain applications of different types and therefore different computation models. As such, a key challenge that arises is integration: we require some mechanism to facilitate integration and interaction between applications that use different computation models. In this paper, we first identify computation models based on different application types. Second, we present a generic computation model and explain how it maps to the previously identified computation models. Finally, we briefly describe how the generic computation model fits into our overall smart city platform architecture.
A. Chhokra, A. Dubey, N. Mahadevan, and G. Karsai, Poster Abstract: Distributed Reasoning for Diagnosing Cascading Outages in Cyber Physical Energy Systems, in 7th ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS 2016, Vienna, Austria, April 11-14, 2016, 2016, p. 33:1.
```
@inproceedings{Chhokra2016,
  author = {Chhokra, Ajay and Dubey, Abhishek and Mahadevan, Nagabhushan and Karsai, Gabor},
  booktitle = {7th {ACM/IEEE} International Conference on Cyber-Physical Systems, {ICCPS} 2016, Vienna, Austria, April 11-14, 2016},
  title = {Poster Abstract: Distributed Reasoning for Diagnosing Cascading Outages in Cyber Physical Energy Systems},
  year = {2016},
  pages = {33:1},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/iccps/ChhokraDMK16},
  category = {poster},
  contribution = {lead},
  doi = {10.1109/ICCPS.2016.7479113},
  file = {:Chhokra2016-Poster_Abstract_Distributed_Reasoning_for_Diagnosing_Cascading_Outages_in_Cyber_Physical_Energy_Systems.pdf:PDF},
  keywords = {smartgrid},
  project = {cps-reliability},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:57 +0200},
  url = {https://doi.org/10.1109/ICCPS.2016.7479113}
}
```
The power grid incorporates a number of protection elements, such as distance relays, that detect faults and prevent failure effects from propagating to the rest of the system. However, the decisions of these protection elements are influenced only by local information in the form of bus voltage/current (V-I) samples. Due to the lack of a system-wide perspective, erroneous settings, and latent failure modes, protection devices often mis-operate and cause cascading effects that ultimately lead to blackouts. Blackouts around the world have been triggered or worsened by circuit breakers tripping, including the 2003 blackout in North America, where secondary/remote protection relays incorrectly opened the breaker. Tools that help operators find the root cause of the problem online are required. However, high system complexity, the interdependencies between the cyber and physical elements of the system, and the mis-operation of protection devices make failure diagnosis a challenging problem.
A. Dubey, S. Pradhan, D. C. Schmidt, S. Rusitschka, and M. Sturm, The Role of Context and Resilient Middleware in Next Generation Smart Grids, in Proceedings of the 3rd Workshop on Middleware for Context-Aware Applications in the IoT, M4IoT@Middleware 2016, Trento, Italy, December 12-13, 2016, 2016, pp. 1–6.
```
@inproceedings{Dubey2016,
  author = {Dubey, Abhishek and Pradhan, Subhav and Schmidt, Douglas C. and Rusitschka, Sebnem and Sturm, Monika},
  booktitle = {Proceedings of the 3rd Workshop on Middleware for Context-Aware Applications in the IoT, M4IoT@Middleware 2016, Trento, Italy, December 12-13, 2016},
  title = {The Role of Context and Resilient Middleware in Next Generation Smart Grids},
  year = {2016},
  pages = {1--6},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/middleware/DubeyPSRS16},
  category = {workshop},
  contribution = {lead},
  doi = {10.1145/3008631.3008632},
  file = {:Dubey2016-The_Role_of_Context_and_Resilient_Middleware_in_Next_Generation_Smart_Grids.pdf:PDF},
  keywords = {smartgrid, middleware},
  project = {cps-reliability,cps-middleware},
  tag = {platform,power},
  timestamp = {Tue, 06 Nov 2018 16:57:13 +0100},
  url = {https://doi.org/10.1145/3008631.3008632}
}
```
The emerging trends of volatile distributed energy resources and micro-grids are putting pressure on electrical power system infrastructure. This pressure is motivating the integration of digital technology and advanced power-industry practices to improve the management of distributed electricity generation, transmission, and distribution, thereby creating a web of systems. Unlike legacy power system infrastructure, however, this emerging next-generation smart grid should be context-aware and adaptive to enable the creation of applications needed to enhance grid robustness and efficiency. This paper describes key factors that are driving the architecture of smart grids and the orchestration middleware needed to make the infrastructure resilient. We use adaptive protection logic in smart grid substations as a use case to motivate the need for context-awareness and adaptivity.
W. Emfinger, A. Dubey, P. Völgyesi, J. Sallai, and G. Karsai, Demo Abstract: RIAPS - A Resilient Information Architecture Platform for Edge Computing, in IEEE/ACM Symposium on Edge Computing, SEC 2016, Washington, DC, USA, October 27-28, 2016, 2016, pp. 119–120.
```
@inproceedings{Emfinger2016,
  author = {Emfinger, William and Dubey, Abhishek and V{\"{o}}lgyesi, P{\'{e}}ter and Sallai, J{\'{a}}nos and Karsai, Gabor},
  booktitle = {{IEEE/ACM} Symposium on Edge Computing, {SEC} 2016, Washington, DC, USA, October 27-28, 2016},
  title = {Demo Abstract: {RIAPS} - {A} Resilient Information Architecture Platform for Edge Computing},
  year = {2016},
  pages = {119--120},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/edge/EmfingerDVSK16},
  category = {poster},
  contribution = {lead},
  doi = {10.1109/SEC.2016.23},
  file = {:Emfinger2016-Demo_Abstract_RIAPS-A_Resilient_Information_Architecture_Platform_for_Edge_Computing.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware},
  tag = {platform,decentralization,power},
  timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
  url = {https://doi.org/10.1109/SEC.2016.23}
}
```
Emerging CPS/IoT ecosystem platforms such as BeagleBone Black, Raspberry Pi, and Intel Edison, and other edge devices such as SCALE and Paradrop, provide new capabilities for data collection, analysis, and processing at the edge (also referred to as Fog Computing). This allows the dynamic composition of computing and communication networks that can monitor and control physical phenomena closer to the physical system. However, a number of challenges must still be resolved before these platforms see wider use in safety-critical application domains such as the Smart Grid and Traffic Control.
G. Martins, A. Moondra, A. Dubey, A. Bhattacharjee, and X. D. Koutsoukos, Computation and Communication Evaluation of an Authentication Mechanism for Time-Triggered Networked Control Systems, Sensors, vol. 16, no. 8, p. 1166, 2016.
```
@article{Martins2016,
  author = {Martins, Gon{\c{c}}alo and Moondra, Arul and Dubey, Abhishek and Bhattacharjee, Anirban and Koutsoukos, Xenofon D.},
  journal = {Sensors},
  title = {Computation and Communication Evaluation of an Authentication Mechanism for Time-Triggered Networked Control Systems},
  year = {2016},
  number = {8},
  pages = {1166},
  volume = {16},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/sensors/MartinsMDBK16},
  contribution = {minor},
  doi = {10.3390/s16081166},
  file = {:Martins2016-Computation_and_Communication_Evaluation_of an Authentication_Mechanism_for_Time-Triggered_Networked_Control_Systems.pdf:PDF},
  keywords = {reliability},
  project = {cps-middleware,cps-reliability},
  tag = {platform},
  timestamp = {Wed, 14 Nov 2018 00:00:00 +0100},
  url = {https://doi.org/10.3390/s16081166}
}
```
In modern networked control applications, confidentiality and integrity are important features to address in order to protect against attacks. Moreover, networked control systems are a fundamental part of the communication components of current cyber-physical systems (e.g., automotive communications). Many networked control systems employ Time-Triggered (TT) architectures that provide mechanisms for exchanging precise and synchronous messages. TT systems have computation and communication constraints, so to enable secure communication in the network it is important to evaluate the computational and communication overhead of implementing secure communication mechanisms. This paper presents a comprehensive analysis and evaluation of the effects of adding Hash-based Message Authentication Codes (HMAC) to TT networked control systems. The contributions of the paper include (1) the analysis and experimental validation of the communication overhead, as well as a scalability analysis that uses the experimental results for both wired and wireless platforms, and (2) an experimental evaluation of the computational overhead of HMAC based on a kernel-level Linux implementation. An automotive application is used as an example, and the results show that it is feasible to implement a secure communication mechanism without interfering with the existing automotive controller execution times. The methods and results of the paper can be used to evaluate the performance impact of security mechanisms and, thus, to design secure wired and wireless TT networked control systems.
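To make the communication-overhead question concrete, here is a minimal Python sketch of appending a truncated HMAC-SHA256 tag to a TT frame. The frame layout (a 2-byte slot identifier, the payload, then the tag) and the 8-byte tag length are hypothetical assumptions, not the format evaluated in the paper; a deployed TT scheme would also need replay protection (for example, binding the tag to the TT round), which this sketch omits.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 8     # hypothetical truncated tag length in bytes (assumption)
HEADER_LEN = 2  # hypothetical 2-byte slot identifier (assumption)


def _tag(key: bytes, data: bytes) -> bytes:
    """Compute HMAC-SHA256 over data and truncate it to TAG_LEN bytes."""
    return hmac.new(key, data, hashlib.sha256).digest()[:TAG_LEN]


def sign_frame(key: bytes, slot_id: int, payload: bytes) -> bytes:
    """Build a frame laid out as slot id || payload || tag."""
    header = slot_id.to_bytes(HEADER_LEN, "big")
    return header + payload + _tag(key, header + payload)


def verify_frame(key: bytes, frame: bytes) -> Optional[bytes]:
    """Return the payload if the tag verifies, otherwise None."""
    if len(frame) < HEADER_LEN + TAG_LEN:
        return None
    body, tag = frame[:-TAG_LEN], frame[-TAG_LEN:]
    # Constant-time comparison avoids leaking how many tag bytes matched.
    if not hmac.compare_digest(tag, _tag(key, body)):
        return None
    return body[HEADER_LEN:]


def tag_overhead(payload_len: int) -> float:
    """Fraction of each frame spent on the authentication tag."""
    return TAG_LEN / (HEADER_LEN + payload_len + TAG_LEN)
```

With these assumed sizes, a 6-byte control payload yields `tag_overhead(6) == 0.5`: half of every frame is authentication data, which is why short TT frames are where the communication overhead matters most.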
S. Nannapaneni, S. Mahadevan, S. Pradhan, and A. Dubey, Towards Reliability-Based Decision Making in Cyber-Physical Systems, in 2016 IEEE International Conference on Smart Computing, SMARTCOMP 2016, St Louis, MO, USA, May 18-20, 2016, 2016, pp. 1–6.
```
@inproceedings{Nannapaneni2016,
  author = {Nannapaneni, Saideep and Mahadevan, Sankaran and Pradhan, Subhav and Dubey, Abhishek},
  booktitle = {2016 {IEEE} International Conference on Smart Computing, {SMARTCOMP} 2016, St Louis, MO, USA, May 18-20, 2016},
  title = {Towards Reliability-Based Decision Making in Cyber-Physical Systems},
  year = {2016},
  note = {At Workshop},
  pages = {1--6},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/smartcomp/NannapaneniMPD16},
  category = {workshop},
  contribution = {lead},
  doi = {10.1109/SMARTCOMP.2016.7501724},
  file = {:Nannapaneni2016-Towards_Reliability-Based_Decision_Making_in_Cyber-Physical_Systems.pdf:PDF},
  keywords = {reliability, performance},
  project = {cps-reliability},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:54 +0200},
  url = {https://doi.org/10.1109/SMARTCOMP.2016.7501724}
}
```
Cyber-physical systems (CPS) are systems with a tight integration between the computational (also referred to as software or cyber) and physical (hardware) components. While the reliability evaluation of physical systems is well understood and well studied, reliability evaluation of CPS is difficult because software systems do not degrade or follow a well-defined failure model as physical systems do. In this paper, we propose a framework that formulates CPS reliability evaluation as a dependence problem derived from software component dependences, functional requirements, and physical system dependences. We also consider sensor failures and propose a method for estimating software failures in terms of associated hardware and software inputs. This framework is codified in a domain-specific modeling language, where every system-level function is mapped to a set of required components using functional decomposition and function-component association; this provides details about operational constraints and dependences. We also illustrate how the encoded information can be used to make reconfiguration decisions at runtime. The proposed methodology is demonstrated using a smart parking system that provides localization and guidance for parking within indoor environments.
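The function-component association can be illustrated with a small Python sketch. It assumes a hypothetical encoding in which each system-level function lists alternative component sets in preference order; the function is available if any alternative is fully working, a runtime reconfiguration picks the first such alternative, and function reliability is computed by enumerating component states under an assumed independence of component failures. The component names are invented for illustration and are not taken from the paper's smart parking model.

```python
from itertools import product
from typing import Dict, List, Optional, Set

# Hypothetical encoding: a function is provided by any one of several
# alternative component sets, listed in order of preference.
Alternatives = List[Set[str]]


def is_available(alternatives: Alternatives, working: Set[str]) -> bool:
    """A function is available if some alternative has all components working."""
    return any(alt <= working for alt in alternatives)


def pick_alternative(alternatives: Alternatives,
                     working: Set[str]) -> Optional[Set[str]]:
    """Reconfiguration choice: the first alternative that is fully working."""
    for alt in alternatives:
        if alt <= working:
            return alt
    return None


def function_reliability(alternatives: Alternatives,
                         reliability: Dict[str, float]) -> float:
    """Probability that the function is available, assuming independent
    component failures. Enumerates all up/down states (small models only)."""
    components = sorted(set().union(*alternatives))
    total = 0.0
    for states in product([True, False], repeat=len(components)):
        working = {c for c, up in zip(components, states) if up}
        if is_available(alternatives, working):
            p = 1.0
            for c, up in zip(components, states):
                p *= reliability[c] if up else 1.0 - reliability[c]
            total += p
    return total
```

For example, a hypothetical localization function `[{"camera", "vision_sw"}, {"ble_beacon", "trilateration_sw"}]` falls back to the beacon-based alternative when the camera fails. Exhaustive enumeration is exact but exponential in the number of components; larger models would need a smarter dependence analysis.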
H. Neema, W. Emfinger, and A. Dubey, A Reusable and Extensible Web-Based Co-Simulation Platform for Transactive Energy Systems, in Proceedings of the 3rd International Transactive Energy Systems, Portland, Oregon, USA, 2016, vol. 12.
```
@inproceedings{Neema2016,
  author = {Neema, Himanshu and Emfinger, William and Dubey, Abhishek},
  booktitle = {Proceedings of the 3rd International Transactive Energy Systems, Portland, Oregon, USA},
  title = {A Reusable and Extensible Web-Based Co-Simulation Platform for Transactive Energy Systems},
  year = {2016},
  volume = {12},
  category = {workshop},
  contribution = {lead},
  file = {:Neema2016-A_Reusable_and_Extensible_Web-Based_Co-Simulation_Platform_for_Transactive_Energy_Systems.pdf:PDF},
  keywords = {transactive},
  tag = {platform,power}
}
```
The rapid evolution of energy generation technology and the increased use of distributed energy resources (DER) are continually pushing utilities to adapt and evolve their business models to align with these changes. Today, more consumers are also producing energy using green generation technologies, and energy pricing is becoming competitive and transactional, requiring utilities to increase the flexibility of grid operations and incorporate transactive energy systems (TES). However, a major bottleneck is ensuring stable grid operation while gaining efficiency. A comprehensive platform is therefore needed for grid-scale, multi-aspect integrated evaluations. For instance, cyber-attacks on a road traffic controller’s communication network can subtly divert electric vehicles (EVs) into a particular area, causing a surge in grid load due to increased EV charging and human activity, which can potentially disrupt an otherwise robust grid. To evaluate such a scenario, multiple special-purpose simulators (e.g., SUMO, OMNeT++, GridLAB-D) must be run in an integrated manner. To support this, we are developing a cloud-deployed, web- and model-based simulation integration platform that enables integrated evaluations of transactive energy systems and is highly extensible and customizable for utility-specific simulation tools.
S. Pradhan, A. Dubey, T. Levendovszky, P. S. Kumar, W. Emfinger, D. Balasubramanian, W. Otte, and G. Karsai, Achieving resilience in distributed software systems via self-reconfiguration, Journal of Systems and Software, vol. 122, pp. 344–363, 2016.
```
@article{Pradhan2016,
  author = {Pradhan, Subhav and Dubey, Abhishek and Levendovszky, Tihamer and Kumar, Pranav Srinivas and Emfinger, William and Balasubramanian, Daniel and Otte, William and Karsai, Gabor},
  journal = {Journal of Systems and Software},
  title = {Achieving resilience in distributed software systems via self-reconfiguration},
  year = {2016},
  pages = {344--363},
  volume = {122},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/jss/PradhanDLKEBOK16},
  contribution = {lead},
  doi = {10.1016/j.jss.2016.05.038},
  file = {:Pradhan2016-Achieving_resilience_in_distributed_software_systems_via_self-reconfiguration.pdf:PDF},
  keywords = {reliability},
  project = {cps-middleware,cps-reliability},
  tag = {platform,a14cps},
  timestamp = {Mon, 06 Nov 2017 00:00:00 +0100},
  url = {https://doi.org/10.1016/j.jss.2016.05.038}
}
```
Improvements in mobile networking combined with the ubiquitous availability and adoption of low-cost development boards have enabled the vision of mobile Cyber-Physical System (CPS) platforms, such as fractionated spacecraft and UAV swarms. These systems are characterized by computation and communication resources, sensors, and actuators that are shared among different applications. The cyber-physical nature of these systems means that physical environments can affect both resource availability and the software applications that depend on it. While many application development and management challenges associated with such systems have been described in the existing literature, resilient operation and execution have received less attention. This paper describes our work on improving runtime support for resilience in mobile CPS, with a special focus on our runtime infrastructure that provides autonomous resilience via self-reconfiguration. We also describe the interplay between this runtime infrastructure and our design-time tools, as the latter are used to statically determine the resilience properties of the former. Finally, we present a case study to demonstrate and evaluate our design-time resilience analysis and runtime self-reconfiguration infrastructure.
S. Pradhan, A. Dubey, S. Khare, F. Sun, J. Sallai, A. S. Gokhale, D. C. Schmidt, M. Lehofer, and M. Sturm, Poster Abstract: A Distributed and Resilient Platform for City-Scale Smart Systems, in IEEE/ACM Symposium on Edge Computing, SEC 2016, Washington, DC, USA, October 27-28, 2016, 2016, pp. 99–100.
```
@inproceedings{Pradhan2016a,
  author = {Pradhan, Subhav and Dubey, Abhishek and Khare, Shweta and Sun, Fangzhou and Sallai, J{\'{a}}nos and Gokhale, Aniruddha S. and Schmidt, Douglas C. and Lehofer, Martin and Sturm, Monika},
  booktitle = {{IEEE/ACM} Symposium on Edge Computing, {SEC} 2016, Washington, DC, USA, October 27-28, 2016},
  title = {Poster Abstract: {A} Distributed and Resilient Platform for City-Scale Smart Systems},
  year = {2016},
  pages = {99--100},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/edge/PradhanDKSSGSLS16},
  category = {poster},
  contribution = {lead},
  doi = {10.1109/SEC.2016.28},
  file = {:Pradhan2016a-Poster_Abstract_A_Distributed_and_Resilient_Platform_for_City-Scale_Smart_Systems.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware,smart-cities},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:56 +0200},
  url = {https://doi.org/10.1109/SEC.2016.28}
}
```
The advent of the Internet of Things (IoT) is driving several technological trends. The first trend is an increased level of integration between edge devices and commodity computers. This trend, in conjunction with low-power devices, energy harvesting, and improved battery technology, is enabling the next generation of information technology (IT) innovation: city-scale smart systems. These types of IoT systems can operate at multiple time scales, ranging from closed-loop control requiring strict real-time decision and actuation, to near real-time operation with humans in the loop, to long-term analysis, planning, and decision-making.
S. Pradhan, A. Dubey, and A. S. Gokhale, WiP Abstract: Platform for Designing and Managing Resilient and Extensible CPS, in 7th ACM/IEEE International Conference on Cyber-Physical Systems, ICCPS 2016, Vienna, Austria, April 11-14, 2016, 2016, p. 39:1.
```
@inproceedings{Pradhan2016b,
  author = {Pradhan, Subhav and Dubey, Abhishek and Gokhale, Aniruddha S.},
  booktitle = {7th {ACM/IEEE} International Conference on Cyber-Physical Systems, {ICCPS} 2016, Vienna, Austria, April 11-14, 2016},
  title = {WiP Abstract: Platform for Designing and Managing Resilient and Extensible {CPS}},
  year = {2016},
  pages = {39:1},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/iccps/PradhanDG16},
  category = {poster},
  contribution = {lead},
  doi = {10.1109/ICCPS.2016.7479128},
  file = {:Pradhan2016b-WiP_Abstract_Platform_for_Designing_and_Managing_Resilient_and_Extensible_CPS.pdf:PDF},
  keywords = {performance, middleware},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:57 +0200},
  url = {https://doi.org/10.1109/ICCPS.2016.7479128}
}
```
Extensible Cyber-Physical Systems (CPS) are loosely connected, multi-domain platforms that "virtualize" their resources to provide an open platform capable of hosting different cyber-physical applications. These cyber-physical platforms are extensible since resources and applications can be added or removed at any time. However, realizing such a platform requires resolving challenges emanating from different properties; in this paper, we focus on resilience. Resilience is important for extensible CPS to ensure that extending the system does not result in failures and anomalies.
S. Pradhan, A. Dubey, and A. S. Gokhale, Designing a Resilient Deployment and Reconfiguration Infrastructure for Remotely Managed Cyber-Physical Systems, in Software Engineering for Resilient Systems - 8th International Workshop, SERENE 2016, Gothenburg, Sweden, September 5-6, 2016, Proceedings, 2016, pp. 88–104.
```
@inbook{Pradhan2016c,
  author = {Pradhan, Subhav and Dubey, Abhishek and Gokhale, Aniruddha S.},
  pages = {88--104},
  title = {Designing a Resilient Deployment and Reconfiguration Infrastructure for Remotely Managed Cyber-Physical Systems},
  year = {2016},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/serene/PradhanDG16},
  booktitle = {Software Engineering for Resilient Systems - 8th International Workshop, {SERENE} 2016, Gothenburg, Sweden, September 5-6, 2016, Proceedings},
  contribution = {lead},
  doi = {10.1007/978-3-319-45892-2\_7},
  file = {:Pradhan2016c-Designing_a_Resilient_Deployment_and_Reconfiguration_Infrastructure_for_Remotely_Managed_CPS.pdf:PDF},
  keywords = {middleware, reliability},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Tue, 14 May 2019 10:00:48 +0200},
  url = {https://doi.org/10.1007/978-3-319-45892-2\_7}
}
```
Multi-module Cyber-Physical Systems (CPS), such as satellite clusters, swarms of Unmanned Aerial Vehicles (UAV), and fleets of Unmanned Underwater Vehicles (UUV), provide a CPS cluster-as-a-service for CPS applications. The distributed and remote nature of these systems often necessitates the use of Deployment and Configuration (D&C) services to manage the lifecycle of these applications. Fluctuating resources, volatile cluster membership, and changing environmental conditions necessitate resilience. Thus, the D&C infrastructure not only has to undertake basic management actions, such as activating new applications and deactivating existing ones, but also has to autonomously reconfigure existing applications to mitigate failures, including failures of the D&C infrastructure itself. This paper describes the design and architectural considerations to realize such a D&C infrastructure for component-based distributed systems. Experimental results demonstrating the autonomous resilience capabilities are presented.
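As an illustration of the kind of reconfiguration decision such an infrastructure must make after a node failure, the following Python sketch re-places the components hosted on a failed node using a worst-fit-decreasing heuristic. The single scalar resource per node, the data layout, and the heuristic itself are hypothetical assumptions chosen for brevity; they are not the placement algorithm used by the D&C infrastructure described above.

```python
from typing import Dict, Optional


def reconfigure(deployment: Dict[str, str], demand: Dict[str, int],
                capacity: Dict[str, int], failed: str) -> Optional[Dict[str, str]]:
    """Re-place components hosted on a failed node onto surviving nodes.

    deployment maps component -> node, demand maps component -> resource
    units, capacity maps node -> resource units. Returns the new deployment,
    or None if some orphaned component cannot be placed.
    """
    new = {c: n for c, n in deployment.items() if n != failed}
    free = {n: cap for n, cap in capacity.items() if n != failed}
    for comp, node in new.items():
        free[node] -= demand[comp]
    # Place the largest orphans first, each on the node with the most free
    # capacity (ties broken by node name for determinism).
    orphans = sorted((c for c, n in deployment.items() if n == failed),
                     key=lambda c: (-demand[c], c))
    for comp in orphans:
        candidates = [n for n in free if free[n] >= demand[comp]]
        if not candidates:
            return None
        target = min(candidates, key=lambda n: (-free[n], n))
        new[comp] = target
        free[target] -= demand[comp]
    return new
```

Returning `None` rather than a partial deployment lets a caller fall back to other mitigation actions, such as deactivating lower-priority applications, instead of silently running with missing components.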
A. Chhokra, S. Abdelwahed, A. Dubey, S. Neema, and G. Karsai, From system modeling to formal verification, in 2015 Electronic System Level Synthesis Conference (ESLsyn), 2015, pp. 41–46.
```
@inproceedings{Chhokra2015,
  author = {{Chhokra}, A. and {Abdelwahed}, S. and Dubey, Abhishek and {Neema}, S. and {Karsai}, G.},
  booktitle = {2015 Electronic System Level Synthesis Conference (ESLsyn)},
  title = {From system modeling to formal verification},
  year = {2015},
  month = jun,
  pages = {41--46},
  category = {conference},
  contribution = {minor},
  file = {:Chhokra2015-From_system_modeling_to_formal_verification.pdf:PDF},
  issn = {2117-4628},
  keywords = {reliability},
  tag = {platform}
}
```
Due to increasing design complexity, modern systems are modeled at a high level of abstraction. SystemC is widely accepted as a system-level language for modeling complex embedded systems. Verifying these SystemC designs helps prevent errors from propagating down to the hardware. Because SystemC lacks formal semantics, such designs are mostly verified in an unsystematic manner. This paper provides a new modeling environment that enables designers to simulate and formally verify designs by generating SystemC code. The generated SystemC code is automatically translated to timed automata for formal analysis.
A. Chhokra, A. Dubey, N. Mahadevan, and G. Karsai, A component-based approach for modeling failure propagations in power systems, in 2015 Workshop on Modeling and Simulation of Cyber-Physical Energy Systems (MSCPES), 2015, pp. 1–6.
```
@inproceedings{Chhokra2015a,
  author = {{Chhokra}, A. and Dubey, Abhishek and {Mahadevan}, N. and {Karsai}, G.},
  booktitle = {2015 Workshop on Modeling and Simulation of Cyber-Physical Energy Systems (MSCPES)},
  title = {A component-based approach for modeling failure propagations in power systems},
  year = {2015},
  month = apr,
  pages = {1--6},
  category = {workshop},
  contribution = {colab},
  doi = {10.1109/MSCPES.2015.7115412},
  file = {:Chhokra2015a-A_component-based_approach_for_modeling_failure_propagations_in_power_systems.pdf:PDF},
  keywords = {smartgrid},
  tag = {platform,power}
}
```
Resiliency and reliability are of paramount importance for energy cyber-physical systems. Electrical protection systems, including detection elements such as distance relays and actuation elements such as breakers, are designed to protect the system from abnormal operations and arrest failure propagation by rapidly isolating faulty components. However, failures in the protection devices themselves can and do lead to major system events and fault cascades, often leading to blackouts. This paper augments our past work on Temporal Causal Diagrams (TCD), a modeling formalism designed to help reason about failure progressions, by (a) describing a way to generate the TCD model from the system specification, and (b) configuring simulation models to understand the system failure dynamics for TCD reasoners.
D. Balasubramanian, A. Dubey, W. Otte, T. Levendovszky, A. S. Gokhale, P. S. Kumar, W. Emfinger, and G. Karsai, DREMS ML: A wide spectrum architecture design language for distributed computing platforms, Sci. Comput. Program., vol. 106, pp. 3–29, 2015.
```
@article{Balasubramanian2015,
  author = {Balasubramanian, Daniel and Dubey, Abhishek and Otte, William and Levendovszky, Tihamer and Gokhale, Aniruddha S. and Kumar, Pranav Srinivas and Emfinger, William and Karsai, Gabor},
  journal = {Sci. Comput. Program.},
  title = {{DREMS} {ML:} {A} wide spectrum architecture design language for distributed computing platforms},
  year = {2015},
  pages = {3--29},
  volume = {106},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/scp/Balasubramanian15},
  contribution = {colab},
  doi = {10.1016/j.scico.2015.04.002},
  file = {:Balasubramanian2015-DREMS_ML_A_wide_spectrum_architecture_design_language_for_distributed_computing_platforms.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware},
  tag = {platform},
  timestamp = {Sat, 27 May 2017 01:00:00 +0200},
  url = {https://doi.org/10.1016/j.scico.2015.04.002}
}
```
Complex sensing, processing, and control applications running on distributed platforms are difficult to design, develop, analyze, integrate, deploy, and operate, especially if resource constraints, fault tolerance, and security issues are to be addressed. While technology exists today for engineering distributed, real-time component-based applications, many problems remain unsolved by existing tools. Model-driven development techniques are powerful, but very few complete tool chains exist that offer developers an end-to-end solution from design to deployment. There is a need for an integrated model-driven development environment that addresses all phases of the application lifecycle, including design, development, verification, analysis, integration, deployment, operation, and maintenance, with supporting automation in every phase. Arguably, the centerpiece of such a model-driven environment is the modeling language. To that end, this paper presents a wide-spectrum architecture design language called DREMS ML that is itself an integrated collection of individual domain-specific sub-languages. We claim that the language promotes “correct-by-construction” software development and integration by supporting each individual phase of the application lifecycle. Using a case study, we demonstrate how the design of DREMS ML impacts the development of embedded systems.
N. Mahadevan, A. Dubey, A. Chhokra, H. Guo, and G. Karsai, Using temporal causal models to isolate failures in power system protection devices, IEEE Instrum. Meas. Mag., vol. 18, no. 4, pp. 28–39, 2015.
```
@article{Mahadevan2015,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Chhokra, Ajay and Guo, Huangcheng and Karsai, Gabor},
  journal = {{IEEE} Instrum. Meas. Mag.},
  title = {Using temporal causal models to isolate failures in power system protection devices},
  year = {2015},
  number = {4},
  pages = {28--39},
  volume = {18},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/imm/MahadevanDCGK15},
  contribution = {lead},
  doi = {10.1109/MIM.2015.7155770},
  file = {:Mahadevan2015-Using_temporal_causal_models_to_isolate_failures_in_power_system_protection_devices.pdf:PDF},
  keywords = {smartgrid, reliability},
  project = {cps-reliability,smart-energy},
  tag = {platform,power},
  timestamp = {Sun, 28 May 2017 01:00:00 +0200},
  url = {https://doi.org/10.1109/MIM.2015.7155770}
}
```
In this paper, we introduced the modeling paradigm of Temporal Causal Diagrams (TCD). TCDs capture fault propagation and the behavior (nominal and faulty) of system components. An example model for a power transmission system was also described. This TCD model was then used to develop an executable simulation model in Simulink/Stateflow. Although this translation of TCD to an executable model is currently done manually, we are developing model templates and tools to automate this process. Simulation results (i.e., event traces) for a few single- and multi-fault scenarios were also presented. As part of our future work, we wish to test and study the scalability of this approach on a larger power transmission system, taking into account a far richer set of protection elements. Further, we wish to consider more realistic event traces from the fault scenarios, including missing, inconsistent, and out-of-sequence alarms and events.
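The timing aspect of fault propagation can be sketched in Python as follows. This is a simplified sketch in the spirit of TCDs, not the TCD formalism itself: it assumes every propagation edge fires unconditionally with a delay in a [min, max] interval (TCD edges can additionally depend on system modes, which this omits), and it treats an effect as appearing when its first causal path fires. Under those assumptions, the window for each effect is the shortest path over minimum delays and the shortest path over maximum delays, and a fault hypothesis is consistent with an alarm trace only if every alarm falls inside its window. The node names and delays are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical encoding: node -> [(successor, min_delay, max_delay)], delays in ms.
Graph = Dict[str, List[Tuple[str, float, float]]]


def _shortest(graph: Graph, source: str, idx: int) -> Dict[str, float]:
    """Dijkstra over delay bound idx (1 = min delay, 2 = max delay)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for edge in graph.get(node, []):
            nxt, nd = edge[0], d + edge[idx]
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return dist


def arrival_windows(graph: Graph, fault: str) -> Dict[str, Tuple[float, float]]:
    """Earliest/latest first-arrival delay of each effect reachable from fault.

    Because an effect appears when its first causal path fires, the window
    is [min over paths of min delay, min over paths of max delay].
    """
    lo = _shortest(graph, fault, 1)
    hi = _shortest(graph, fault, 2)
    return {node: (lo[node], hi[node]) for node in lo}


def consistent(graph: Graph, fault: str, fault_time: float,
               alarms: Dict[str, float]) -> bool:
    """Check whether observed alarm timestamps fit a hypothesized fault."""
    windows = arrival_windows(graph, fault)
    for alarm, t in alarms.items():
        if alarm not in windows:
            return False  # alarm cannot be explained by this fault
        lo, hi = windows[alarm]
        if not fault_time + lo <= t <= fault_time + hi:
            return False
    return True
```

For instance, if a hypothetical `line_fault` reaches `breaker_open` through a fast primary-relay path ([20, 100] ms after detection) and a slow backup path ([300, 500] ms), the breaker alarm is only consistent if it arrives within the fast path's window. Missing or out-of-sequence alarms, mentioned above as future work, would require relaxing this strict check.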
S. M. Pradhan, A. Dubey, A. S. Gokhale, and M. Lehofer, CHARIOT: a domain specific language for extensible cyber-physical systems, in Proceedings of the Workshop on Domain-Specific Modeling, DSM@SPLASH 2015, Pittsburgh, PA, USA, October 27, 2015, 2015, pp. 9–16.
```
@inproceedings{Pradhan2015,
  author = {Pradhan, Subhav M. and Dubey, Abhishek and Gokhale, Aniruddha S. and Lehofer, Martin},
  booktitle = {Proceedings of the Workshop on Domain-Specific Modeling, DSM@SPLASH 2015, Pittsburgh, PA, USA, October 27, 2015},
  title = {{CHARIOT:} a domain specific language for extensible cyber-physical systems},
  year = {2015},
  pages = {9--16},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/oopsla/PradhanDGL15},
  category = {workshop},
  contribution = {lead},
  doi = {10.1145/2846696.2846708},
  file = {:Pradhan2015-CHARIOT_a_domain_specific_language_for_extensible_cyber-physical_systems.pdf:PDF},
  keywords = {middleware, reliability},
  project = {cps-middleware,cps-reliability},
  tag = {platform},
  timestamp = {Tue, 06 Nov 2018 16:57:16 +0100},
  url = {https://doi.org/10.1145/2846696.2846708}
}
```
Wider adoption, availability, and ubiquity of wireless networking technologies, integrated sensors, actuators, and edge computing devices is facilitating a paradigm shift by allowing us to transition from traditional statically configured vertical silos of Cyber-Physical Systems (CPS) to next-generation CPS that are more open, dynamic, and extensible. Fractionated spacecraft, smart city computing architectures, Unmanned Aerial Vehicle (UAV) clusters, and platoons of vehicles on highways are all examples of extensible CPS, wherein extensibility is implied by the dynamic aggregation of physical resources, the effect of physical dynamics on the availability of computing resources, and the various multi-domain applications hosted on these systems. However, realizing extensible CPS requires resolving design-time and run-time challenges emanating from properties specific to these systems. In this paper, we first describe different properties of extensible CPS: dynamism, extensibility, remote deployment, security, heterogeneity, and resilience. Then we identify design-time challenges stemming from heterogeneity and resilience requirements. We particularly focus on software heterogeneity arising from the availability of various communication middleware. We then present appropriate solutions in the context of a novel domain-specific language, which can be used to design resilient systems while remaining agnostic to middleware heterogeneity. We also describe how this language and its features have evolved from our past work. We use a fractionated spacecraft platform to describe our solution.
S. Nannapaneni, A. Dubey, S. Abdelwahed, S. Mahadevan, and S. Neema, A Model-Based Approach for Reliability Assessment in Component-Based Systems, in PHM 2014 - Proceedings of the Annual Conference of the Prognostics and Health Management Society 2014, 2014.
```
@inproceedings{Nannapaneni2014,
  author = {Nannapaneni, Saideep and Dubey, Abhishek and Abdelwahed, Sherif and Mahadevan, Sankaran and Neema, Sandeep},
  booktitle = {PHM 2014 - Proceedings of the Annual Conference of the Prognostics and Health Management Society 2014},
  title = {A Model-Based Approach for Reliability Assessment in Component-Based Systems},
  year = {2014},
  month = oct,
  category = {conference},
  contribution = {colab},
  file = {:Nannapaneni2014-A_Model-based_approach_for_reliability_assessment_in_component_based_systems.pdf:PDF},
  keywords = {reliability},
  tag = {platform}
}
```
This paper describes a formal framework for reliability assessment of component-based systems with respect to specific missions. A mission comprises different timed mission stages, with each stage requiring a number of high-level functions. The work presented here describes a modeling language to capture the functional decomposition and missions of a system. The components and their alternatives are mapped to basic functions, which are used to implement the system-level functions. Our contribution is the extraction of a mission-specific reliability block diagram from these high-level models of component assemblies. This diagram is then used to compute the mission reliability from the reliability information of the components. The framework can be used for real-time monitoring of system performance, where the reliability of the mission is computed over time as the mission progresses. Other quantities of interest, such as mission feasibility and function availability, can also be computed using this framework. Mission feasibility answers whether the mission can be accomplished given the current state of the components in the system, and function availability indicates whether a function will be available in the future given the current state of the system. The software used in this framework includes the Generic Modeling Environment (GME) and Python: GME is used for modeling the system and Python for the reliability computations. The proposed methodology is demonstrated using a radio-controlled (RC) car carrying out a simple surveillance mission.
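Since the framework computes mission reliability from an extracted reliability block diagram (RBD), a minimal Python sketch of evaluating a series-parallel RBD may help. It assumes independent components and, for the time-dependent case, a constant failure rate so that R(t) = exp(-λt); it does not model dependencies between mission stages that share components. The nested-tuple encoding and the component names are hypothetical, not the representation used in the paper's GME models.

```python
import math
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a block is a component name or a tuple
# ("series" | "parallel", [child blocks]).
Block = Union[str, Tuple[str, list]]


def component_reliability(failure_rate: float, t: float) -> float:
    """Survival probability at time t for a constant failure rate (assumption)."""
    return math.exp(-failure_rate * t)


def rbd_reliability(block: Block, r: Dict[str, float]) -> float:
    """Reliability of a series-parallel RBD with independent components."""
    if isinstance(block, str):
        return r[block]
    kind, children = block
    values = [rbd_reliability(child, r) for child in children]
    if kind == "series":
        return math.prod(values)  # every child must work
    if kind == "parallel":
        return 1.0 - math.prod(1.0 - v for v in values)  # at least one works
    raise ValueError(f"unknown block type: {kind}")


def is_feasible(block: Block, working: Set[str]) -> bool:
    """Mission feasibility: can the RBD be satisfied by the working components?"""
    if isinstance(block, str):
        return block in working
    kind, children = block
    results = [is_feasible(child, working) for child in children]
    if kind == "series":
        return all(results)
    if kind == "parallel":
        return any(results)
    raise ValueError(f"unknown block type: {kind}")
```

For a hypothetical surveillance stage `("series", ["battery", ("parallel", ["camera", "ir_sensor"]), "radio"])`, losing the camera leaves the stage feasible as long as the IR sensor works, while the reliability value drops accordingly when recomputed with updated component estimates.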
S. Pradhan, W. Emfinger, A. Dubey, W. R. Otte, D. Balasubramanian, A. Gokhale, G. Karsai, and A. Coglio, Establishing Secure Interactions across Distributed Applications in Satellite Clusters, in 2014 IEEE International Conference on Space Mission Challenges for Information Technology, 2014, pp. 67–74.
```
@inproceedings{Pradhan2014,
  author = {{Pradhan}, S. and {Emfinger}, W. and Dubey, Abhishek and {Otte}, W. R. and {Balasubramanian}, D. and {Gokhale}, A. and {Karsai}, G. and {Coglio}, A.},
  booktitle = {2014 IEEE International Conference on Space Mission Challenges for Information Technology},
  title = {Establishing Secure Interactions across Distributed Applications in Satellite Clusters},
  year = {2014},
  month = sep,
  pages = {67-74},
  category = {conference},
  contribution = {lead},
  doi = {10.1109/SMC-IT.2014.17},
  file = {:Pradhan2014-Establishing_Secure_Interactions_across_Distributed_Applications_in_Satellite_Clusters.pdf:PDF},
  issn = {null},
  keywords = {middleware},
  tag = {platform}
}
```
Recent developments in small satellites have led to an increasing interest in building satellite clusters as open systems that provide a "cluster-as-a-service" in space. Since applications with different security classification levels must be supported in these open systems, the system must provide strict information partitioning such that only applications with matching security classifications interact with each other. The anonymous publish/subscribe communication pattern is a powerful interaction abstraction that has enjoyed great success in previous space software architectures, such as NASA’s Core Flight Executive. However, the difficulty is that existing solutions that support anonymous publish/subscribe communication, such as the OMG Data Distribution Service (DDS), do not support information partitioning based on security classifications, which is a key requirement for some systems. This paper makes two contributions to address these limitations. First, we present a transport mechanism called Secure Transport that uses a lattice of labels to represent security classifications and enforces Multi-Level Security (MLS) policies to ensure strict information partitioning. Second, we present a novel discovery service that allows us to use an existing DDS implementation with our custom transport mechanism to realize a publish/subscribe middleware with information partitioning based on security classifications of applications. We also include an evaluation of our solution in the context of a use case scenario.
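A lattice of security labels, together with the delivery check a transport could apply before handing a sample to a subscriber, might look like the sketch below. The level names, the compartment sets, and the `may_deliver` helper are illustrative assumptions rather than the Secure Transport design; `strict=True` models the strict partitioning described above (equal labels only), and `strict=False` shows the conventional MLS flow rule in which the receiver must dominate the sender.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Hypothetical hierarchy of classification levels (higher number = more sensitive).
LEVELS = {"UNCLASSIFIED": 0, "CONFIDENTIAL": 1, "SECRET": 2, "TOP_SECRET": 3}


@dataclass(frozen=True)
class Label:
    """A security label: a hierarchical level plus a set of compartments."""
    level: str
    compartments: FrozenSet[str] = frozenset()

    def dominates(self, other: "Label") -> bool:
        """Lattice partial order: higher-or-equal level and a superset of compartments."""
        return (LEVELS[self.level] >= LEVELS[other.level]
                and self.compartments >= other.compartments)

    def join(self, other: "Label") -> "Label":
        """Least upper bound: the higher level and the union of compartments."""
        level = max(self.level, other.level, key=LEVELS.__getitem__)
        return Label(level, self.compartments | other.compartments)


def may_deliver(publisher: Label, subscriber: Label, strict: bool = True) -> bool:
    """Decide whether a sample from `publisher` may reach `subscriber`.

    strict=True  -> strict partitioning: labels must be equal (mutual dominance).
    strict=False -> MLS flow rule: the subscriber must dominate the publisher.
    """
    if strict:
        return publisher.dominates(subscriber) and subscriber.dominates(publisher)
    return subscriber.dominates(publisher)


if __name__ == "__main__":
    secret_a = Label("SECRET", frozenset({"A"}))
    confidential = Label("CONFIDENTIAL")
    print(may_deliver(confidential, secret_a))                # False under strict partitioning
    print(may_deliver(confidential, secret_a, strict=False))  # True: no read-up violation
```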
G. Martins, A. Bhattacharjee, A. Dubey, and X. D. Koutsoukos, Performance evaluation of an authentication mechanism in time-triggered networked control systems, in 2014 7th International Symposium on Resilient Control Systems (ISRCS), 2014, pp. 1–6.
```
@inproceedings{Martins2014,
  author = {{Martins}, G. and {Bhattacharjee}, A. and Dubey, Abhishek and {Koutsoukos}, X. D.},
  booktitle = {2014 7th International Symposium on Resilient Control Systems (ISRCS)},
  title = {Performance evaluation of an authentication mechanism in time-triggered networked control systems},
  year = {2014},
  month = aug,
  pages = {1-6},
  category = {conference},
  contribution = {minor},
  doi = {10.1109/ISRCS.2014.6900098},
  file = {:Martins2014-Performance_Evaluation_of_an_Authentication_Mechanism_in_Time-Triggered_Network_Control_Systems.pdf:PDF},
  issn = {null},
  keywords = {middleware, performance},
  tag = {platform}
}
```
An important challenge in networked control systems is to ensure the confidentiality and integrity of messages in order to secure the communication and prevent attackers or intruders from compromising the system. However, security mechanisms may jeopardize the temporal behavior of the network data communication because of their computation and communication overhead. In this paper, we study the effect of adding Hash-based Message Authentication Codes (HMAC) to a time-triggered networked control system. Time-Triggered Architectures (TTAs) provide deterministic and predictable timing behavior that is used to ensure safety, reliability and fault tolerance properties. The paper analyzes the computation and communication overhead of adding HMAC and its impact on the performance of the time-triggered network. Experimental validation and performance evaluation results using a TTEthernet network are also presented.
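The two kinds of overhead discussed above can be made concrete with Python's standard `hmac` module: the tag appended to each frame is the communication overhead, and the time spent computing it is the computation overhead. This sketch assumes HMAC-SHA256 and a hypothetical 64-byte control message; the hash function and frame layout used in the paper's TTEthernet experiments may differ, and the timing helper only gives a rough host-side number.

```python
import hashlib
import hmac
import time
from typing import Optional

# Assumption: HMAC-SHA256; the tag adds this many bytes to every frame.
TAG_LEN = hashlib.sha256().digest_size  # 32


def sign_frame(key: bytes, payload: bytes) -> bytes:
    """Append an HMAC-SHA256 tag to the payload."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def verify_frame(key: bytes, frame: bytes) -> Optional[bytes]:
    """Return the payload if the tag is valid, otherwise None."""
    if len(frame) < TAG_LEN:
        return None
    payload, tag = frame[:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many tag bytes matched.
    return payload if hmac.compare_digest(tag, expected) else None


def mean_sign_time_us(key: bytes, payload: bytes, n: int = 10_000) -> float:
    """Rough per-message computation overhead of signing, in microseconds."""
    start = time.perf_counter()
    for _ in range(n):
        sign_frame(key, payload)
    return (time.perf_counter() - start) / n * 1e6


if __name__ == "__main__":
    key = b"shared-secret-key"
    payload = bytes(64)  # hypothetical 64-byte control message
    frame = sign_frame(key, payload)
    print(f"communication overhead: {len(frame) - len(payload)} bytes per frame")
    print(f"computation overhead: {mean_sign_time_us(key, payload):.2f} us per frame")
```

In a time-triggered schedule, both numbers would have to fit inside the frame's slot: the extra bytes lengthen transmission time, and the signing and verification time eats into the sender's and receiver's task budgets.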
N. Mahadevan, A. Dubey, G. Karsai, A. Srivastava, and C.-C. Liu, Temporal Causal Diagrams for diagnosing failures in cyber-physical systems, in Annual Conference of the Prognostics and Health Management Society, 2014.
```
@inproceedings{Mahadevan2014,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Karsai, Gabor and Srivastava, Anurag and Liu, Chen-Ching},
  booktitle = {Annual Conference of the Prognostics and Health Management Society},
  title = {Temporal Causal Diagrams for diagnosing failures in cyber-physical systems},
  year = {2014},
  month = jan,
  category = {conference},
  contribution = {colab},
  file = {:Mahadevan2014-Temporal_Causal_Diagrams_for_Diagnosing_Failures_in_Cyber_Physical_Systems.pdf:PDF},
  keywords = {reliability, smartgrid},
  tag = {platform,power}
}
```
Resilient and reliable operation of cyber-physical systems of societal importance, such as Smart Electric Grids, is one of the top national priorities. Due to their critical nature, these systems are equipped with fast-acting, local protection mechanisms. However, misguided protection actions, which are common, together with system dynamics can lead to unintentional cascading effects. This paper describes ongoing work on using Temporal Causal Diagrams (TCD), a refinement of Timed Failure Propagation Graphs (TFPG), to diagnose problems associated with power transmission lines protected by a combination of relays and breakers. The TCD models represent the faults and their propagation as a TFPG, the nominal and faulty behavior of components (including local, discrete controllers and protection devices) as Timed Discrete Event Systems (TDES), and capture the cumulative and cascading effects of these interactions. The TCD diagnosis engine includes an extended TFPG-like reasoner which, in addition to observing alarms and mode changes (as in TFPG), monitors the event traces (which correspond to the behavioral aspects of the model) to generate hypotheses that consistently explain all the observations. In this paper, we show the results of applying TCD to a segment of a power transmission system that is protected by distance relays and breakers.
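The timing-consistency part of TFPG-style reasoning can be illustrated with a small sketch: each edge carries a propagation interval [tmin, tmax], and a failure-mode hypothesis is kept only if some onset time explains every observed alarm. This is a simplified stand-in, not the TCD engine: it assumes an acyclic graph, ignores edge mode constraints and the TDES behavioral models, and over-approximates arrival times by one [earliest, latest] window per node. The node names and delays are hypothetical.

```python
from typing import Dict, List, Tuple

# node -> [(child, tmin, tmax)]; assumed acyclic.
Graph = Dict[str, List[Tuple[str, float, float]]]


def arrival_windows(graph: Graph, source: str) -> Dict[str, Tuple[float, float]]:
    """Earliest/latest propagation delay from `source` to every reachable node."""
    windows = {source: (0.0, 0.0)}
    stack = [source]
    while stack:
        node = stack.pop()
        lo, hi = windows[node]
        for child, tmin, tmax in graph.get(node, []):
            new = (lo + tmin, hi + tmax)
            old = windows.get(child)
            merged = new if old is None else (min(old[0], new[0]), max(old[1], new[1]))
            if merged != old:  # window widened (or first visit): propagate further
                windows[child] = merged
                stack.append(child)
    return windows


def consistent_hypotheses(graph: Graph, failure_modes: List[str],
                          alarms: Dict[str, float]) -> List[Tuple[str, Tuple[float, float]]]:
    """Return (failure mode, feasible onset-time interval) for every mode that
    reaches all alarmed nodes within their observed times."""
    if not alarms:
        return []
    results = []
    for fm in failure_modes:
        w = arrival_windows(graph, fm)
        if not all(a in w for a in alarms):
            continue  # this mode cannot explain some alarm at all
        # Alarm at time ta needs an onset t0 with ta - hi <= t0 <= ta - lo.
        t_lo = max(ta - w[a][1] for a, ta in alarms.items())
        t_hi = min(ta - w[a][0] for a, ta in alarms.items())
        if t_lo <= t_hi:
            results.append((fm, (t_lo, t_hi)))
    return results


if __name__ == "__main__":
    graph = {
        "FM_line_fault": [("D_overcurrent", 0.0, 0.1)],
        "D_overcurrent": [("D_relay_trip", 0.1, 0.5)],
        "FM_spurious_trip": [("D_relay_trip", 0.0, 0.05)],
    }
    alarms = {"D_overcurrent": 1.05, "D_relay_trip": 1.3}
    print(consistent_hypotheses(graph, ["FM_line_fault", "FM_spurious_trip"], alarms))
```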
S. Pradhan, W. Otte, A. Dubey, C. Szabo, A. Gokhale, and G. Karsai, Towards a Self-adaptive Deployment and Configuration Infrastructure for Cyber-Physical Systems, Institute for Software Integrated Systems, Vanderbilt University, Nashville, Technical Report ISIS-14-102, 2014.
```
@techreport{Pradhan2014b,
  author = {Pradhan, Subhav and Otte, William and Dubey, Abhishek and Szabo, Csanad and Gokhale, Aniruddha and Karsai, Gabor},
  institution = {Institute for Software Integrated Systems, Vanderbilt University},
  title = {Towards a Self-adaptive Deployment and Configuration Infrastructure for Cyber-Physical Systems},
  year = {2014},
  address = {Nashville},
  month = {6/2014},
  number = {ISIS-14-102},
  type = {Technical Report},
  attachments = {http://www.isis.vanderbilt.edu/sites/default/files/TechReport2013.pdf},
  contribution = {colab},
  file = {:Pradhan2014b-Towards_a_self-adaptive_deployment_and_configuration_infrastructure_for_CPS.pdf:PDF},
  issn = {ISIS-14-102},
  keywords = {middleware},
  owner = {abhishek},
  tag = {platform},
  timestamp = {2015.10.16},
  url = {http://www.isis.vanderbilt.edu/sites/default/files/Pradhan_SEAMS_TechReport.pdf}
}
```
Multi-module Cyber-Physical Systems (CPSs), such as satellite clusters, swarms of Unmanned Aerial Vehicles (UAV), and fleets of Unmanned Underwater Vehicles (UUV), are examples of managed distributed real-time systems where mission-critical applications, such as sensor fusion or coordinated flight control, are hosted. These systems are dynamic and reconfigurable, and provide a “CPS cluster-as-a-service” for mission-specific scientific applications that can benefit from the elasticity of the cluster membership and heterogeneity of the cluster members. The distributed and remote nature of these systems often necessitates the use of Deployment and Configuration (D&C) services to manage the lifecycle of software applications. Fluctuating resources, volatile cluster membership and changing environmental conditions require resilience. However, due to the dynamic nature of the system, human intervention is often infeasible. This necessitates a self-adaptive D&C infrastructure that supports autonomous resilience. Such an infrastructure must have the ability to adapt existing applications on the fly in order to provide application resilience and must itself be able to adapt to account for changes in the system as well as tolerate failures. This paper describes the design and architectural considerations to realize a self-adaptive D&C infrastructure for CPSs. Previous efforts in this area have resulted in D&C infrastructures that support application adaptation via dynamic re-deployment and re-configuration mechanisms. Our work, presented in this paper, improves upon these past efforts by implementing a self-adaptive D&C infrastructure which itself is resilient. The paper concludes with experimental results that demonstrate the autonomous resilience capabilities of our new D&C infrastructure.
D. Balasubramanian, T. Levendovszky, A. Dubey, and G. Karsai, Taming Multi-Paradigm Integration in a Software Architecture Description Language, in Proceedings of the 8th Workshop on Multi-Paradigm Modeling co-located with the 17th International Conference on Model Driven Engineering Languages and Systems, MPM@MODELS 2014, Valencia, Spain, September 30, 2014, 2014, pp. 67–76.
```
@inproceedings{Balasubramanian2014,
  author = {Balasubramanian, Daniel and Levendovszky, Tihamer and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {Proceedings of the 8th Workshop on Multi-Paradigm Modeling co-located with the 17th International Conference on Model Driven Engineering Languages and Systems, MPM@MODELS 2014, Valencia, Spain, September 30, 2014},
  title = {Taming Multi-Paradigm Integration in a Software Architecture Description Language},
  year = {2014},
  pages = {67--76},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/models/BalasubramanianLDK14},
  category = {workshop},
  contribution = {colab},
  file = {:Balasubramanian2014-Taming_Multi-Paradigm_Integration_in_a_Software_Architecture_Description_Language.pdf:PDF},
  keywords = {middleware},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Thu, 18 Jul 2019 11:36:32 +0200},
  url = {http://ceur-ws.org/Vol-1237/paper7.pdf}
}
```
Software architecture description languages offer a convenient way of describing the high-level structure of a software system. Such descriptions facilitate rapid prototyping, code generation and automated analysis. One of the big challenges facing the software community is the design of architecture description languages that are general enough to describe a wide range of systems, yet detailed enough to capture domain-specific properties and provide a high level of tool automation. This paper presents the multi-paradigm challenges we faced and the solutions we built when creating a domain-specific modeling language for software architectures of distributed real-time systems.
D. Balasubramanian, A. Dubey, W. R. Otte, W. Emfinger, P. S. Kumar, and G. Karsai, A Rapid Testing Framework for a Mobile Cloud, in 25th IEEE International Symposium on Rapid System Prototyping, RSP 2014, New Delhi, India, October 16-17, 2014, 2014, pp. 128–134.
```
@inproceedings{Balasubramanian2014a,
  author = {Balasubramanian, Daniel and Dubey, Abhishek and Otte, William R. and Emfinger, William and Kumar, Pranav Srinivas and Karsai, Gabor},
  booktitle = {25th {IEEE} International Symposium on Rapid System Prototyping, {RSP} 2014, New Delhi, India, October 16-17, 2014},
  title = {A Rapid Testing Framework for a Mobile Cloud},
  year = {2014},
  pages = {128--134},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/rsp/BalasubramanianDOEKK14},
  category = {selectiveconference},
  contribution = {colab},
  doi = {10.1109/RSP.2014.6966903},
  file = {:Balasubramanian2014a-A_Rapid_Testing_Framework_for_a_Mobile_Cloud.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:50 +0200},
  url = {https://doi.org/10.1109/RSP.2014.6966903}
}
```
Mobile clouds, such as network-connected vehicles and satellite clusters, are an emerging class of systems that extend traditional real-time embedded systems: they provide long-term mission platforms made up of dynamic clusters of heterogeneous hardware nodes communicating over ad hoc wireless networks. Besides the inherent complexities entailed by a distributed architecture, developing and testing software for these systems is difficult for a number of other reasons, including the mobile nature of such systems, which can require a model of the physical dynamics of the system for accurate simulation and testing. This paper describes a rapid development and testing framework for a distributed satellite system. Our solutions include a modeling language for configuring and specifying an application’s interaction with the middleware layer, a physics simulator integrated with hardware in the loop to provide the system’s physical dynamics, and the integration of a network traffic tool to dynamically vary the network bandwidth based on the physical dynamics.
W. Emfinger, G. Karsai, A. Dubey, and A. S. Gokhale, Analysis, verification, and management toolsuite for cyber-physical applications on time-varying networks, in Proceedings of the 4th ACM SIGBED International Workshop on Design, Modeling, and Evaluation of Cyber-Physical Systems, CyPhy 2014, Berlin, Germany, April 14-17, 2014, 2014, pp. 44–47.
```
@inproceedings{Emfinger2014,
  author = {Emfinger, William and Karsai, Gabor and Dubey, Abhishek and Gokhale, Aniruddha S.},
  booktitle = {Proceedings of the 4th {ACM} {SIGBED} International Workshop on Design, Modeling, and Evaluation of Cyber-Physical Systems, CyPhy 2014, Berlin, Germany, April 14-17, 2014},
  title = {Analysis, verification, and management toolsuite for cyber-physical applications on time-varying networks},
  year = {2014},
  pages = {44--47},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/cyphy/EmfingerKDG14},
  category = {workshop},
  contribution = {colab},
  doi = {10.1145/2593458.2593459},
  file = {:Emfinger2014-Analysis_verification_and_management_toolsuite_for_cyber-physical_applications_on_time-varying_networks.pdf:PDF},
  keywords = {performance},
  project = {cps-reliability},
  tag = {platform},
  timestamp = {Tue, 06 Nov 2018 00:00:00 +0100},
  url = {https://doi.org/10.1145/2593458.2593459}
}
```
Cyber-Physical Systems (CPS) are increasingly utilizing advances in wireless mesh networking among computing nodes to facilitate communication and control for distributed applications. Factors such as interference or node mobility cause such wireless networks to experience changes in both topology and link capacities. These dynamic networks pose a reliability concern for high-criticality or mixed-criticality systems, which require strict guarantees about system performance and robustness prior to deployment. To address the design-time and run-time verification and reliability concerns created by these dynamic networks, we are developing an integrated modeling, analysis, and run-time toolsuite which provides (1) network profiles that model the dynamics of system network resources and application network requirements over time, (2) design-time verification of application performance on dynamic networks, and (3) management of the CPS network resources during run-time. In this paper we present the foundations for the analysis of dynamic networks and show experimental validations of this analysis. We conclude with a focus on future work and applications to the field.
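One way to picture design-time analysis with network profiles is to compare, slot by slot, the data an application wants to send with the capacity the network offers, and track the backlog that must be buffered. The sketch below is a discrete-time approximation under stated assumptions (fixed-length slots, profiles given in bytes per slot, a single FIFO queue); it is not the toolsuite's actual analysis, and `analyze_profiles` is a hypothetical name.

```python
from typing import List, Tuple


def analyze_profiles(supply: List[float], demand: List[float]) -> Tuple[float, float]:
    """Return (maximum backlog, backlog at end of profile), both in bytes.

    supply[i]: bytes the network can carry in slot i (system profile).
    demand[i]: bytes the application produces in slot i (application profile).
    The maximum backlog is the buffer the sender would need; a non-zero final
    backlog means the demand is not fully served within the profile horizon.
    """
    if len(supply) != len(demand):
        raise ValueError("profiles must cover the same number of slots")
    backlog = 0.0
    max_backlog = 0.0
    for s, d in zip(supply, demand):
        # Unsent data carries over; surplus capacity cannot be saved for later.
        backlog = max(0.0, backlog + d - s)
        max_backlog = max(max_backlog, backlog)
    return max_backlog, backlog


if __name__ == "__main__":
    # Hypothetical link that degrades in slots 2-3 (e.g., while nodes are far apart).
    supply = [100, 100, 20, 20, 100, 100]
    demand = [50, 50, 80, 80, 50, 0]
    print(analyze_profiles(supply, demand))  # buffer needed vs. leftover backlog
```

Checking the returned maximum backlog against a node's buffer size, and the final backlog against zero, gives a simple pass/fail verdict of the kind a design-time check might report.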
G. Karsai, D. Balasubramanian, A. Dubey, and W. Otte, Distributed and Managed: Research Challenges and Opportunities of the Next Generation Cyber-Physical Systems, in 17th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2014, Reno, NV, USA, June 10-12, 2014, 2014, pp. 1–8.
```
@inproceedings{Karsai2014,
  author = {Karsai, Gabor and Balasubramanian, Daniel and Dubey, Abhishek and Otte, William},
  booktitle = {17th {IEEE} International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, {ISORC} 2014, Reno, NV, USA, June 10-12, 2014},
  title = {Distributed and Managed: Research Challenges and Opportunities of the Next Generation Cyber-Physical Systems},
  year = {2014},
  pages = {1--8},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/isorc/KarsaiBDO14},
  category = {selectiveconference},
  contribution = {colab},
  doi = {10.1109/ISORC.2014.36},
  file = {:Karsai2014-Distributed_and_Managed.pdf:PDF},
  keywords = {middleware},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:53 +0200},
  url = {https://doi.org/10.1109/ISORC.2014.36}
}
```
Cyber-physical systems increasingly rely on distributed computing platforms where sensing, computing, actuation, and communication resources are shared by a multitude of applications. Such “cyber-physical cloud computing platforms” present novel challenges because the system is built from mobile embedded devices, is inherently distributed, and typically suffers from highly fluctuating connectivity among the modules. Architecting software for these systems raises many challenges not present in traditional cloud computing. Effective management of constrained resources and application isolation without adversely affecting performance are necessary. Autonomous fault management and real-time performance requirements must be met in a verifiable manner. It is also both critical and challenging to support multiple end-users whose diverse software applications have changing demands for computational and communication resources, while operating on different levels and in separate domains of security. The solution presented in this paper is based on a layered architecture consisting of a novel operating system, a middleware layer, and component-structured applications. The component model facilitates the construction of software applications from modular and reusable components that are deployed in the distributed system and interact only through well-defined mechanisms. The complexity of creating applications and performing system integration is mitigated through the use of a domain-specific model-driven development process that relies on a domain-specific modeling language and its accompanying graphical modeling tools, software generators for synthesizing infrastructure code, and the extensive use of model-based analysis for verification and validation.
P. S. Kumar, A. Dubey, and G. Karsai, Colored Petri Net-based Modeling and Formal Analysis of Component-based Applications, in Proceedings of the 11th Workshop on Model-Driven Engineering, Verification and Validation co-located with 17th International Conference on Model Driven Engineering Languages and Systems, MoDeVVa@MODELS 2014, Valencia, Spain, September 30, 2014, 2014, pp. 79–88.
```
@inproceedings{Kumar2014,
  author = {Kumar, Pranav Srinivas and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {Proceedings of the 11th Workshop on Model-Driven Engineering, Verification and Validation co-located with 17th International Conference on Model Driven Engineering Languages and Systems, MoDeVVa@MODELS 2014, Valencia, Spain, September 30, 2014},
  title = {Colored Petri Net-based Modeling and Formal Analysis of Component-based Applications},
  year = {2014},
  pages = {79--88},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/models/KumarDK14},
  category = {workshop},
  contribution = {colab},
  keywords = {performance},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Tue, 28 May 2019 16:23:34 +0200},
  url = {http://ceur-ws.org/Vol-1235/paper-10.pdf}
}
```
Distributed Real-Time Embedded (DRE) Systems that address safety and mission-critical system requirements are applied in a variety of domains today. Complex, integrated systems like managed satellite clusters expose heterogeneous concerns such as strict timing requirements; complexity in system integration, deployment, and repair; and resilience to faults. Integrating appropriate modeling and analysis techniques into the design of such systems helps ensure predictable, dependable and safe operation upon deployment. This paper describes how we can model and analyze applications for these systems in order to verify system properties such as the absence of deadline violations. Our approach is based on (1) formalizing the component operation scheduling using Colored Petri nets (CPN), (2) modeling the abstract temporal behavior of application components, and (3) integrating the business logic and the component operation scheduling models into a concrete CPN, which is then analyzed. This model-driven approach enables a verification-driven workflow wherein the application model can be refined and restructured before actual code development.
T. Levendovszky, A. Dubey, W. Otte, D. Balasubramanian, A. Coglio, S. Nyako, W. Emfinger, P. S. Kumar, A. S. Gokhale, and G. Karsai, Distributed Real-Time Managed Systems: A Model-Driven Distributed Secure Information Architecture Platform for Managed Embedded Systems, IEEE Software, vol. 31, no. 2, pp. 62–69, 2014.
```
@article{Levendovszky2014,
  author = {Levendovszky, Tihamer and Dubey, Abhishek and Otte, William and Balasubramanian, Daniel and Coglio, Alessandro and Nyako, Sandor and Emfinger, William and Kumar, Pranav Srinivas and Gokhale, Aniruddha S. and Karsai, Gabor},
  journal = {{IEEE} Software},
  title = {Distributed Real-Time Managed Systems: {A} Model-Driven Distributed Secure Information Architecture Platform for Managed Embedded Systems},
  year = {2014},
  number = {2},
  pages = {62--69},
  volume = {31},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/software/LevendovszkyDOBCNEKGK14},
  contribution = {colab},
  doi = {10.1109/MS.2013.143},
  file = {:Levendovszky2014-Distributed_Real_Time_Managed_Systems.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware,cps-reliability},
  tag = {platform},
  timestamp = {Thu, 18 May 2017 01:00:00 +0200},
  url = {https://doi.org/10.1109/MS.2013.143}
}
```
Architecting software for a cloud computing platform built from mobile embedded devices incurs many challenges that aren’t present in traditional cloud computing. Both effectively managing constrained resources and isolating applications without adverse performance effects are needed. A practical design- and runtime solution incorporates modern software development practices and technologies along with novel approaches to address these challenges. The patterns and principles manifested in this system can potentially serve as guidelines for current and future practitioners in this field.
W. R. Otte, A. Dubey, and G. Karsai, A resilient and secure software platform and architecture for distributed spacecraft, in Sensors and Systems for Space Applications VII, 2014, vol. 9085, pp. 121–130.
```
@inproceedings{Otte2014,
  author = {Otte, William R. and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {Sensors and Systems for Space Applications VII},
  title = {{A resilient and secure software platform and architecture for distributed spacecraft}},
  year = {2014},
  editor = {Pham, Khanh D. and Cox, Joseph L.},
  organization = {International Society for Optics and Photonics},
  pages = {121 -- 130},
  publisher = {SPIE},
  volume = {9085},
  category = {conference},
  contribution = {lead},
  doi = {10.1117/12.2054055},
  file = {:Otte2014-A_resilient_and_secure_software_platform_and_architecture_for_distributed_spacecraft.pdf:PDF},
  keywords = {middleware},
  tag = {platform},
  url = {https://doi.org/10.1117/12.2054055}
}
```
A distributed spacecraft is a cluster of independent satellite modules flying in formation that communicate via ad-hoc wireless networks. This system in space is a cloud platform that facilitates sharing sensors and other computing and communication resources across multiple applications, potentially developed and maintained by different organizations. Effectively, such an architecture can realize the functions of monolithic satellites at a reduced cost and with improved adaptivity and robustness. The openness of these architectures poses special challenges because the distributed software platform has to support applications from different security domains and organizations, where information flows have to be carefully managed and compartmentalized. If the platform is used as a robust shared resource, its management, configuration, and resilience become a challenge in themselves. We have designed and prototyped a distributed software platform for such architectures. The core element of the platform is a new operating system whose services were designed to restrict access to the network and the file system, and to enforce resource management constraints for all non-privileged processes. Mixed-criticality applications operating at different security labels are deployed and controlled by a privileged management process that also pre-configures all information flows. This paper describes the design and objective of this layer.
S. Pradhan, W. Otte, A. Dubey, A. Gokhale, and G. Karsai, Key Considerations for a Resilient and Autonomous Deployment and Configuration Infrastructure for Cyber-Physical Systems, in Proceedings of the 11th IEEE International Conference and Workshops on the Engineering of Autonomic and Autonomous Systems (EASe’14), 2014.
```
@inproceedings{Pradhan2014a,
  author = {Pradhan, Subhav and Otte, William and Dubey, Abhishek and Gokhale, Aniruddha and Karsai, Gabor},
  booktitle = {Proceedings of the 11th IEEE International Conference and Workshops on the Engineering of Autonomic and Autonomous Systems (EASe'14)},
  title = {Key Considerations for a Resilient and Autonomous Deployment and Configuration Infrastructure for Cyber-Physical Systems},
  year = {2014},
  organization = {Citeseer},
  category = {conference},
  contribution = {colab},
  file = {:Pradhan2014a-Key_Considerations_for_a_Resilient_and_Autonomous_Deployment_and_Configuration_Infrastructure_for_CPS.pdf:PDF},
  keywords = {middleware, reliability},
  tag = {platform}
}
```
Multi-module Cyber-Physical Systems (CPSs), such as satellite clusters, swarms of Unmanned Aerial Vehicles (UAV), and fleets of Unmanned Underwater Vehicles (UUV), are examples of managed distributed real-time systems where mission-critical applications, such as sensor fusion or coordinated flight control, are hosted. These systems are dynamic and reconfigurable, and provide a “CPS cluster-as-a-service” for mission-specific scientific applications that can benefit from the elasticity of the cluster membership and heterogeneity of the cluster members. The distributed and remote nature of these systems often necessitates the use of Deployment and Configuration (D&C) services to manage the lifecycle of software applications. Fluctuating resources, volatile cluster membership and changing environmental conditions require resilient D&C services. However, the dynamic nature of the system often precludes human intervention during the D&C activities, which motivates the need for a self-adaptive D&C infrastructure that supports autonomous resilience. Such an infrastructure must have the ability to adapt existing applications on-the-fly in order to provide application resilience and must itself be able to adapt to account for changes in the system as well as tolerate failures. This paper makes two contributions towards addressing these needs. First, we identify the key challenges in achieving such a self-adaptive D&C infrastructure. Second, we present our ideas on resolving these challenges and realizing a self-adaptive D&C infrastructure.
N. Mahadevan, A. Dubey, D. Balasubramanian, and G. Karsai, Deliberative, search-based mitigation strategies for model-based software health management, Innov. Syst. Softw. Eng., vol. 9, no. 4, pp. 293–318, Dec. 2013.
```
@article{Mahadevan2013,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Balasubramanian, Daniel and Karsai, Gabor},
  journal = {Innov. Syst. Softw. Eng.},
  title = {Deliberative, search-based mitigation strategies for model-based software health management},
  year = {2013},
  issn = {1614-5046},
  month = dec,
  number = {4},
  pages = {293--318},
  volume = {9},
  address = {Berlin, Heidelberg},
  contribution = {lead},
  doi = {10.1007/s11334-013-0215-x},
  issue_date = {December  2013},
  keywords = {Software health management, Deliberative reasoning, Component-based software development, ARINC-653},
  numpages = {26},
  project = {cps-reliability,cps-middleware},
  publisher = {Springer-Verlag},
  tag = {platform},
  url = {https://doi.org/10.1007/s11334-013-0215-x}
}
```
Rising software complexity in aerospace systems makes them very difficult to analyze and prepare for all possible fault scenarios at design time; therefore, classical run-time fault tolerance techniques such as self-checking pairs and triple modular redundancy are used. However, several recent incidents have made it clear that existing software fault tolerance techniques alone are not sufficient. To improve system dependability, simpler, yet formally specified and verified run-time monitoring, diagnosis, and fault mitigation capabilities are needed. Such architectures are already in use for managing the health of vehicles and systems. Software health management is the application of these techniques to software systems. In this paper, we briefly describe the software health management techniques and architecture developed by our research group. The foundation of the architecture is a real-time component framework (built upon ARINC-653 platform services) that defines a model of computation for software components. Dedicated architectural elements: the Component Level Health Manager (CLHM) and System Level Health Manager (SLHM) provide the health management services: anomaly detection, fault source isolation, and fault mitigation. The SLHM includes a diagnosis engine that (1) uses a Timed Failure Propagation Graph (TFPG) model derived from the component assembly model, (2) reasons about cascading fault effects in the system, and (3) isolates the fault source component(s). Thereafter, the appropriate system-level mitigation action is taken. The main focus of this article is the description of the fault mitigation architecture that uses goal-based deliberative reasoning to determine the best mitigation actions for recovering the system from the identified failure mode.
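The search step of a mitigation planner can be illustrated with a deliberately small example: once diagnosis has isolated the failed components, choose one alternative implementation per function so that no chosen alternative uses a failed component, preferring the plan that changes the fewest functions. This is an exhaustive search offered as a hypothetical stand-in for the goal-based deliberative reasoner described above (which is not reproduced here); the function and component names are made up, and the search grows exponentially with the number of functions.

```python
from itertools import product
from typing import Dict, List, Optional, Set


def best_mitigation(functions: Dict[str, List[Set[str]]],
                    current: Dict[str, int],
                    failed: Set[str]) -> Optional[Dict[str, int]]:
    """Return a function -> alternative-index assignment that avoids all failed
    components with the fewest reconfigurations, or None if no goal-satisfying
    configuration exists.

    functions[f] lists the alternatives for function f; each alternative is the
    set of components it needs. current[f] is the alternative in use now.
    """
    names = sorted(functions)
    best: Optional[Dict[str, int]] = None
    best_cost: Optional[int] = None
    for choice in product(*(range(len(functions[f])) for f in names)):
        assignment = dict(zip(names, choice))
        # Goal: every function is provided by an alternative with no failed parts.
        if any(functions[f][i] & failed for f, i in assignment.items()):
            continue
        cost = sum(1 for f in names if assignment[f] != current.get(f))
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best


if __name__ == "__main__":
    functions = {
        "navigation": [{"gps"}, {"imu", "camera"}],
        "telemetry": [{"radio1"}, {"radio2"}],
    }
    current = {"navigation": 0, "telemetry": 0}
    print(best_mitigation(functions, current, failed={"gps"}))
```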
J. Shi, R. Amgai, S. Abdelwahed, A. Dubey, J. Humphreys, M. Alattar, and R. Jia, Generic modeling and analysis framework for shipboard system design, in 2013 IEEE Electric Ship Technologies Symposium (ESTS), 2013, pp. 420–428.
```
@inproceedings{Shi2013,
  author = {{Shi}, J. and {Amgai}, R. and {Abdelwahed}, S. and Dubey, Abhishek and {Humphreys}, J. and {Alattar}, M. and {Jia}, R.},
  booktitle = {2013 IEEE Electric Ship Technologies Symposium (ESTS)},
  title = {Generic modeling and analysis framework for shipboard system design},
  year = {2013},
  month = apr,
  pages = {420-428},
  category = {workshop},
  contribution = {minor},
  doi = {10.1109/ESTS.2013.6523770},
  file = {:Shi2013-Generic_modeling_and_analysis_framework_for_shipboard_system_design.pdf:PDF},
  issn = {null},
  keywords = {middleware},
  tag = {platform,power}
}
```
This paper proposes a novel modeling and simulation environment for ship design based on the principles of Model Integrated Computing (MIC). The proposed approach facilitates the design and analysis of shipboard power systems and similar systems that integrate components from different fields of expertise. Conventional simulation platforms such as Matlab®, Simulink®, PSCAD® and VTB® require designers to have explicit knowledge of the syntactic and semantic information of the desired domain within the tools. This constraint, however, severely slows down the design and analysis process and leaves cross-domain and cross-platform operations error prone and expensive. Our approach focuses on the development of a modeling environment that provides generic support for a variety of applications across different domains by capturing modeling concepts, composition principles, and operation constraints. As a preliminary demonstration of the modeling concept, this paper limits the scope of design to cross-platform implementations of the proposed environment. We develop an application model of a simplified shipboard power system and use the Matlab engine and the VTB solver separately to evaluate performance in different respects. In the case studies, a fault scenario is pre-specified and tested on the system model. The corresponding time-domain bus voltage magnitude and angle profiles are generated by invoking an external solver, displayed to users, and then saved for future analysis.
N. Mahadevan, A. Dubey, D. Balasubramanian, and G. Karsai, Deliberative Reasoning in Software Health Management, Institute for Software Integrated Systems, Vanderbilt University, techreport ISIS-13-101, 2013.
```
@techreport{Mahadevan2013a,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Balasubramanian, Daniel and Karsai, Gabor},
  institution = {Institute for Software Integrated Systems, Vanderbilt University},
  title = {Deliberative Reasoning in Software Health Management},
  year = {2013},
  month = {04/2013},
  number = {ISIS-13-101},
  type = {techreport},
  attachments = {http://www.isis.vanderbilt.edu/sites/default/files/TechReport2013.pdf},
  contribution = {lead},
  file = {:Mahadevan2013a-Deliberative_reasoning_in_software_health_management.pdf:PDF},
  issn = {ISIS-13-101},
  keywords = {performance, reliability},
  tag = {platform}
}
```
Rising software complexity in aerospace systems makes them very difficult to analyze and prepare for all possible fault scenarios at design time. Therefore, classical run-time fault-tolerance techniques, such as self-checking pairs and triple modular redundancy, are used. However, several recent incidents have made it clear that existing software fault tolerance techniques alone are not sufficient. To improve system dependability, simpler, yet formally specified and verified, run-time monitoring, diagnosis, and fault mitigation capabilities are needed. Such architectures are already in use for managing the health of vehicles and systems. Software health management is the adaptation and application of these techniques to software. In this paper, we briefly describe the software health management technique and architecture developed by our research group. The foundation of the architecture is a real-time component framework (built upon ARINC-653 platform services) that defines a model of computation for software components. Two dedicated architectural elements, the Component Level Health Manager (CLHM) and the System Level Health Manager (SLHM), provide the health management services: anomaly detection, fault source isolation, and fault mitigation. The SLHM includes a diagnosis engine that uses a Timed Failure Propagation Graph (TFPG) model derived from the component assembly model. This engine reasons about cascading fault effects in the system and isolates the fault source component(s). Thereafter, the appropriate system-level mitigation action is taken. The main focus of this article is the description of the fault mitigation architecture that uses goal-based deliberative reasoning to determine the best mitigation actions for recovering the system from the identified failure mode.
A. Dubey, G. Karsai, and N. Mahadevan, Fault-Adaptivity in Hard Real-Time Component-Based Software Systems, in Software Engineering for Self-Adaptive Systems II: International Seminar, Dagstuhl Castle, Germany, October 24-29, 2010 Revised Selected and Invited Papers, R. de Lemos, H. Giese, H. A. Müller, and M. Shaw, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013, pp. 294–323.
```
@inbook{Dubey2010,
  author = {Dubey, Abhishek and Karsai, Gabor and Mahadevan, Nagabhushan},
  editor = {de Lemos, Rog{\'e}rio and Giese, Holger and M{\"u}ller, Hausi A. and Shaw, Mary},
  pages = {294--323},
  publisher = {Springer Berlin Heidelberg},
  title = {Fault-Adaptivity in Hard Real-Time Component-Based Software Systems},
  year = {2013},
  address = {Berlin, Heidelberg},
  isbn = {978-3-642-35813-5},
  booktitle = {Software Engineering for Self-Adaptive Systems II: International Seminar, Dagstuhl Castle, Germany, October 24-29, 2010 Revised Selected and Invited Papers},
  contribution = {lead},
  doi = {10.1007/978-3-642-35813-5_12},
  file = {:Dubey2010-Fault-Adaptivity_in_Hard_Real-Time_Component-Based_Software_Systems.pdf:PDF},
  keywords = {reliability},
  project = {cps-middleware,cps-reliability},
  tag = {platform},
  url = {https://doi.org/10.1007/978-3-642-35813-5_12}
}
```
Complexity in embedded software systems has reached the point where we need run-time mechanisms that provide fault management services. Testing and verification may not cover all possible scenarios that a system encounters; hence, a simpler, yet formally specified, run-time monitoring, diagnosis, and fault mitigation architecture is needed to increase the software system’s dependability. The approach described in this paper borrows concepts and principles from the field of ‘Systems Health Management’ for complex aerospace systems. It implements a novel two-level health management architecture that can be applied in the context of a model-based software development process.

A. Dubey and G. Karsai, Software health management, Innovations in System and Software Engineering, vol. 9, no. 4, p. 217, 2013.

```
@article{Dubey2013,
  author = {Dubey, Abhishek and Karsai, Gabor},
  journal = {{Innovations in System and Software Engineering}},
  title = {Software health management},
  year = {2013},
  number = {4},
  pages = {217},
  volume = {9},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/isse/DubeyK13},
  contribution = {lead},
  doi = {10.1007/s11334-013-0226-7},
  file = {:Dubey2013-Software_Health_Management.pdf:PDF},
  keywords = {reliability},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Tue, 26 Jun 2018 01:00:00 +0200}
}
```

A. Dubey, G. Karsai, N. Mahadevan, A. Srivastava, C. C. Liu, and S. Lukic, Understanding Failure Dynamics in the Smart Electric Grid, in NSF Energy Cyber Physical System Workshop, Washington DC, 2013.

```
@inproceedings{Dubey2013a,
  author = {Dubey, A and Karsai, G and Mahadevan, N and Srivastava, A and Liu, CC and Lukic, S},
  booktitle = {NSF Energy Cyber Physical System Workshop, Washington DC},
  title = {Understanding Failure Dynamics in the Smart Electric Grid},
  year = {2013},
  category = {workshop},
  contribution = {lead},
  file = {:Dubey2013a-Understanding_failture_dynamics_in_the_smart_electric_grid.pdf:PDF},
  keywords = {smartgrid},
  tag = {platform}
}
```

W. Emfinger, P. Kumar, A. Dubey, W. Otte, A. Gokhale, and G. Karsai, DREMS: A toolchain and platform for the rapid application development, integration, and deployment of managed distributed real-time embedded systems, in IEEE Real-time Systems Symposium, 2013.

```
@inproceedings{Emfinger2013,
  author = {Emfinger, William and Kumar, Pranav and Dubey, Abhishek and Otte, William and Gokhale, Aniruddha and Karsai, Gabor},
  booktitle = {IEEE Real-time Systems Symposium},
  title = {{DREMS}: A toolchain and platform for the rapid application development, integration, and deployment of managed distributed real-time embedded systems},
  year = {2013},
  category = {poster},
  contribution = {lead},
  file = {:Emfinger2013-DREMS_A_toolchain_and_platform_for_rapid.pdf:PDF},
  keywords = {middleware},
  tag = {platform}
}
```

W. Otte, A. Dubey, S. Pradhan, P. Patil, A. S. Gokhale, G. Karsai, and J. Willemsen, F6COM: A component model for resource-constrained and dynamic space-based computing environments, in 16th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2013, Paderborn, Germany, June 19-21, 2013, 2013, pp. 1–8.
```
@inproceedings{Otte2013,
  author = {Otte, William and Dubey, Abhishek and Pradhan, Subhav and Patil, Prithviraj and Gokhale, Aniruddha S. and Karsai, Gabor and Willemsen, Johnny},
  booktitle = {16th {IEEE} International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, {ISORC} 2013, Paderborn, Germany, June 19-21, 2013},
  title = {{F6COM:} {A} component model for resource-constrained and dynamic space-based computing environments},
  year = {2013},
  pages = {1--8},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/isorc/OtteDPPGKW13},
  category = {selectiveconference},
  contribution = {lead},
  doi = {10.1109/ISORC.2013.6913199},
  file = {:Otte2013-F6COM_A_Component_Model.pdf:PDF},
  keywords = {middleware},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:53 +0200},
  url = {https://doi.org/10.1109/ISORC.2013.6913199}
}
```
Component-based programming models are well-suited to the design of large-scale, distributed applications because of the ease with which distributed functionality can be developed, deployed, and validated using the models’ compositional properties. Existing component models supported by standardized technologies, such as the OMG’s CORBA Component Model (CCM), have a number of limitations in the context of cyber-physical systems (CPS) that operate in highly dynamic, resource-constrained, and uncertain environments, such as space environments, yet require multiple quality of service (QoS) assurances, such as timeliness, reliability, and security. To overcome these limitations, this paper presents the design of a novel component model called F6COM that is developed for applications operating in the context of a cluster of fractionated spacecraft. Although F6COM leverages the compositional capabilities and port abstractions of existing component models, it provides several new features. Specifically, F6COM abstracts the component operations as tasks, which are scheduled sequentially based on a specified scheduling policy. The infrastructure ensures that at any time at most one task of a component can be active. This eliminates race conditions and deadlocks without requiring the component developer to write complicated and error-prone synchronization logic. These tasks can be initiated due to (a) interactions with other components, (b) expiration of timers, both sporadic and periodic, and (c) interactions with input/output devices. Interactions with other components are facilitated by ports. To ensure secure information flows, every port of an F6COM component is associated with a security label such that all interactions are executed within a security context. Thus, all component interactions can be subjected to Mandatory Access Control checks by a Trusted Computing Base that facilitates the interactions.
Finally, F6COM provides capabilities to monitor task execution deadlines and to configure component-specific fault mitigation actions.
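The single-active-task guarantee can be sketched as a per-component task queue drained by one executor. Because tasks never overlap, the component's state needs no locks. The class, port names, and priority-then-FIFO scheduling policy are assumptions made for illustration only. F6COM lets the scheduling policy be specified, and in F6COM the infrastructure, not the component, owns the executor.

```python
import heapq
import itertools
from typing import Callable


class Component:
    """Hypothetical F6COM-style component: every operation runs as a task, one at a time."""

    def __init__(self, name: str):
        self.name = name
        self._queue = []               # entries: (priority, seq, label, fn)
        self._seq = itertools.count()  # tie-breaker keeps FIFO order among equal priorities
        self.log = []
        self.counter = 0               # component state mutated by tasks without any locking

    def enqueue(self, label: str, fn: Callable[[], None], priority: int = 10):
        # Lower number means higher priority (an assumed policy)
        heapq.heappush(self._queue, (priority, next(self._seq), label, fn))

    def on_message(self, port: str, payload: int, priority: int = 10):
        # Interaction with another component via a port becomes a task
        self.enqueue(f"{port}:{payload}", lambda: self._handle(payload), priority)

    def on_timer(self, timer: str, priority: int = 5):
        # Timer expiration also becomes a task
        self.enqueue(f"timer:{timer}", self._tick, priority)

    def _handle(self, payload: int):
        self.counter += payload

    def _tick(self):
        self.counter *= 2

    def run_pending(self):
        # The single executor: pops and runs exactly one task at a time, to completion
        while self._queue:
            _, _, label, fn = heapq.heappop(self._queue)
            fn()
            self.log.append(label)


c = Component("hypothetical_sensor")
c.on_message("in", 3)
c.on_message("in", 4)
c.on_timer("t1")
c.run_pending()
print(c.log, c.counter)  # ['timer:t1', 'in:3', 'in:4'] 7
```

The timer task runs first only because of the assumed priorities. Under a pure FIFO policy, the two message tasks would run before it.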
S. Pradhan, W. Otte, A. Dubey, A. S. Gokhale, and G. Karsai, Towards a resilient deployment and configuration infrastructure for fractionated spacecraft, SIGBED Review, vol. 10, no. 4, pp. 29–32, 2013.
```
@article{Pradhan2013,
  author = {Pradhan, Subhav and Otte, William and Dubey, Abhishek and Gokhale, Aniruddha S. and Karsai, Gabor},
  journal = {{SIGBED} Review},
  title = {Towards a resilient deployment and configuration infrastructure for fractionated spacecraft},
  year = {2013},
  number = {4},
  pages = {29--32},
  volume = {10},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/sigbed/PradhanODGK13},
  contribution = {lead},
  doi = {10.1145/2583687.2583694},
  file = {:Pradhan2013-Towards_a_resilient_deployment_and_configuration_infrastructure_for_fractionated_spacecraft.pdf:PDF},
  keywords = {reliability},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Tue, 06 Nov 2018 00:00:00 +0100},
  url = {https://doi.org/10.1145/2583687.2583694}
}
```
Fractionated spacecraft are clusters of small, independent modules that interact wirelessly to realize the functionality of a traditional monolithic spacecraft. System F6 (F6 stands for Future, Fast, Flexible, Fractionated, Free-Flying spacecraft) is a DARPA program for fractionated spacecraft. Software applications in F6 are implemented in the context of the F6 Information Architecture Platform (IAP), which provides component-based abstractions for composing distributed applications. The lifecycle of these distributed applications must be managed autonomously by a deployment and configuration (D&C) infrastructure, which can redeploy and reconfigure the running applications in response to faults and other anomalies that may occur during system operation. Addressing these D&C requirements is hard due to the significant fluctuation in resource availabilities, constraints on resources, and safety and security concerns. This paper presents the key architectural ideas that are required in realizing such a D&C infrastructure.
A. Dubey, W. Emfinger, A. Gokhale, G. Karsai, W. R. Otte, J. Parsons, C. Szabo, A. Coglio, E. Smith, and P. Bose, A software platform for fractionated spacecraft, in 2012 IEEE Aerospace Conference, 2012, pp. 1–20.
```
@inproceedings{Dubey2012,
  author = {Dubey, Abhishek and {Emfinger}, W. and {Gokhale}, A. and {Karsai}, G. and {Otte}, W. R. and {Parsons}, J. and {Szabo}, C. and {Coglio}, A. and {Smith}, E. and {Bose}, P.},
  booktitle = {2012 IEEE Aerospace Conference},
  title = {A software platform for fractionated spacecraft},
  year = {2012},
  month = mar,
  pages = {1-20},
  category = {conference},
  contribution = {lead},
  doi = {10.1109/AERO.2012.6187334},
  file = {:Dubey2012-A_software_platform_for_fractionated_spacecraft.pdf:PDF},
  issn = {1095-323X},
  keywords = {middleware},
  tag = {platform}
}
```
A fractionated spacecraft is a cluster of independent modules that interact wirelessly to maintain cluster flight and realize the functions usually performed by a monolithic satellite. This spacecraft architecture poses novel software challenges because the hardware platform is inherently distributed, with highly fluctuating connectivity among the modules. It is critical for mission success to support autonomous fault management and to satisfy real-time performance requirements. It is also both critical and challenging to support multiple organizations and users whose diverse software applications have changing demands for computational and communication resources, while operating at different levels and in separate domains of security. The solution proposed in this paper is based on a layered architecture consisting of a novel operating system, a middleware layer, and component-structured applications. The operating system provides primitives for concurrency, synchronization, and secure information flows; it also enforces application separation and resource management policies. The middleware provides higher-level services supporting request/response and publish/subscribe interactions for distributed software. The component model facilitates the creation of software applications from modular and reusable components that are deployed in the distributed system and interact only through well-defined mechanisms. Two cross-cutting aspects, multi-level security and multi-layered fault management, are addressed at all levels of the architecture. The complexity of creating applications and performing system integration is mitigated through a domain-specific model-driven development process. This process relies on a dedicated modeling language and its accompanying graphical modeling tools, software generators for synthesizing infrastructure code, and the extensive use of model-based analysis for verification and validation.
A. Dubey, G. Karsai, and N. Mahadevan, Formalization of a Component Model for Real-time Systems, Institute for Software Integrated Systems, Vanderbilt University, ISIS-12-102, 2012.
```
@techreport{Dubey2012b,
  author = {Dubey, Abhishek and Karsai, Gabor and Mahadevan, Nagabhushan},
  institution = {Institute for Software Integrated Systems, Vanderbilt University},
  title = {Formalization of a Component Model for Real-time Systems},
  year = {2012},
  month = {04/2012},
  number = {ISIS-12-102},
  attachments = {http://www.isis.vanderbilt.edu/sites/default/files/ISIS-12-102-TechReport.pdf},
  contribution = {lead},
  file = {:Dubey2012b-Formalization_of_a_Component_Model_for_Real-time_Systems.pdf:PDF},
  issn = {ISIS-12-102},
  keywords = {middleware},
  tag = {platform}
}
```
Component-based software development for real-time systems necessitates a well-defined ‘component model’ that allows compositional analysis and reasoning about systems. Such a model defines what a component is, how it works, and how it interacts with other components. It is especially important for real-time systems to have such a component model, as many problems in these systems arise from poorly understood and analyzed component interactions. In this paper we describe a component model for hard real-time systems that relies on the services of an ARINC-653 compliant real-time operating system platform. The model provides high-level abstractions of component interactions, both for the synchronous and asynchronous case. We present a formalization of the component model in the form of timed transition traces. Such formalization is necessary to be able to derive interesting system level properties such as fault propagation graphs from models of component assemblies. We provide a brief discussion about such system level fault propagation templates for this component model.
A. Dubey, N. Mahadevan, and G. Karsai, The Inertial Measurement Unit Example: A Software Health Management Case Study, Institute for Software Integrated Systems, Vanderbilt University, ISIS-12-101, 2012.
```
@techreport{Dubey2012c,
  author = {Dubey, Abhishek and Mahadevan, Nagabhushan and Karsai, Gabor},
  institution = {Institute for Software Integrated Systems, Vanderbilt University},
  title = {The Inertial Measurement Unit Example: A Software Health Management Case Study},
  year = {2012},
  month = {02/2012},
  number = {ISIS-12-101},
  attachments = {http://www.isis.vanderbilt.edu/sites/default/files/TechReport_IMU.pdf},
  contribution = {lead},
  file = {:Dubey2012c-The_Inertial_Measurement_Unit_Example.pdf:PDF},
  issn = {ISIS-12-101},
  keywords = {reliability},
  tag = {platform}
}
```
This report describes a two-level Software Health Management strategy applied to a real-life example, an Inertial Measurement Unit subsystem. We present the detailed design of the component-level and system-level health management strategies. Results are given as the relevant portions of the detailed logs, which show the successful adaptation of the monitor/detect/diagnose/mitigate approach to Software Health Management.
A. Dabholkar, A. Dubey, A. S. Gokhale, G. Karsai, and N. Mahadevan, Reliable Distributed Real-Time and Embedded Systems through Safe Middleware Adaptation, in IEEE 31st Symposium on Reliable Distributed Systems, SRDS 2012, Irvine, CA, USA, October 8-11, 2012, 2012, pp. 362–371.
```
@inproceedings{Dabholkar2012,
  author = {Dabholkar, Akshay and Dubey, Abhishek and Gokhale, Aniruddha S. and Karsai, Gabor and Mahadevan, Nagabhushan},
  booktitle = {{IEEE} 31st Symposium on Reliable Distributed Systems, {SRDS} 2012, Irvine, CA, USA, October 8-11, 2012},
  title = {Reliable Distributed Real-Time and Embedded Systems through Safe Middleware Adaptation},
  year = {2012},
  pages = {362--371},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/srds/DabholkarDGKM12},
  category = {selectiveconference},
  contribution = {lead},
  acceptance = {25},
  doi = {10.1109/SRDS.2012.59},
  file = {:Dabholkar2012-Reliable_Distributed_Real-Time_and_Embedded_Systems_through_Safe_Middleware_Adaptation.pdf:PDF},
  keywords = {middleware, reliability},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:49 +0200},
  url = {https://doi.org/10.1109/SRDS.2012.59}
}
```
Distributed real-time and embedded (DRE) systems are a class of real-time systems formed through a composition of predominantly legacy, closed, and statically scheduled real-time subsystems, which comprise over-provisioned resources to deal with worst-case failure scenarios. The formation of the system-of-systems leads to a new range of faults that manifest at different granularities and to which no statically defined fault tolerance scheme applies. Thus, dynamic and adaptive fault tolerance mechanisms are needed. These mechanisms must execute within the available resources without compromising the safety and timeliness of existing real-time tasks in the individual subsystems. To address these requirements, this paper describes a middleware solution called Safe Middleware Adaptation for Real-Time Fault Tolerance (SafeMAT), which opportunistically leverages the available slack in the over-provisioned resources of individual subsystems. SafeMAT comprises three primary artifacts:

1. a flexible and configurable distributed runtime resource monitoring framework that can pinpoint in real time the available slack in the system, which is then used to make dynamic and adaptive fault tolerance decisions;
2. a safe and resource-aware dynamic failure adaptation algorithm that enables efficient recovery from different granularities of failures within the available slack in the execution schedule, while ensuring that real-time constraints are not violated and resources are not overloaded;
3. a framework that empirically validates the correctness of the dynamic mechanisms and the safety of the DRE system.

Experimental results evaluating SafeMAT on an avionics application indicate that SafeMAT incurs only 9-15% runtime failover and 2-6% processor utilization overheads, thereby providing safe and predictable failure adaptability in real time.
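The idea of recovering "within the available slack" can be illustrated by a small placement sketch. It restarts a failed task's backup on the surviving host that keeps the most spare utilization after admission. As a stand-in for SafeMAT's slack measurements, it uses the classical Liu-Layland rate-monotonic utilization bound, n(2^(1/n) - 1). That bound is a sufficient, not necessary, schedulability test. The classes, host names, and task parameters are hypothetical, and the adaptation algorithm in the paper is driven by its runtime monitoring framework rather than by this static check.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    wcet: float    # worst-case execution time C
    period: float  # period T (deadline assumed equal to period)

    @property
    def utilization(self) -> float:
        return self.wcet / self.period


@dataclass
class Host:
    name: str
    tasks: List[Task] = field(default_factory=list)

    def utilization(self, extra: Optional[Task] = None) -> float:
        tasks = self.tasks + ([extra] if extra is not None else [])
        return sum(t.utilization for t in tasks)


def rm_bound(n: int) -> float:
    # Liu-Layland bound for n periodic tasks under rate-monotonic scheduling
    return n * (2 ** (1 / n) - 1)


def fits(host: Host, task: Task) -> bool:
    # Admit the task only if the host stays under the (sufficient) RM bound
    return host.utilization(task) <= rm_bound(len(host.tasks) + 1)


def place_failover(hosts: List[Host], task: Task, failed: str) -> Optional[str]:
    candidates = [h for h in hosts if h.name != failed and fits(h, task)]
    if not candidates:
        return None  # no safe placement: existing real-time tasks would be put at risk
    best = min(candidates, key=lambda h: h.utilization(task))  # keep the most slack
    best.tasks.append(task)
    return best.name


hosts = [Host("A", [Task("nav", 5, 10)]), Host("B", [Task("log", 2, 10)]), Host("C")]
print(place_failover(hosts, Task("ctrl_backup", 2, 10), failed="C"))  # B
```

Refusing a placement, rather than overloading a host, reflects the requirement that recovery must not compromise the timeliness of existing tasks.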
A. Dubey, N. Mahadevan, and G. Karsai, A deliberative reasoner for model-based software health management, in The Eighth International Conference on Autonomic and Autonomous Systems, 2012, pp. 86–92.
```
@inproceedings{Dubey2012a,
  author = {Dubey, Abhishek and Mahadevan, Nagabhushan and Karsai, Gabor},
  booktitle = {The Eighth International Conference on Autonomic and Autonomous Systems},
  title = {A deliberative reasoner for model-based software health management},
  year = {2012},
  pages = {86--92},
  category = {selectiveconference},
  contribution = {lead},
  note = {Best Paper Award},
  acceptance = {23},
  file = {:Dubey2012a-A_Deliberative_Reasoner_for_Model-Based_Software_Health_Management.pdf:PDF},
  keywords = {performance, reliability},
  tag = {platform}
}
```
While traditional design-time and off-line approaches to testing and verification contribute significantly to improving and ensuring high dependability of software, they may not cover all possible fault scenarios that a system could encounter at runtime. Thus, runtime health management of complex embedded software systems is needed to improve their dependability. Our approach to Software Health Management uses concepts from the field of Systems Health Management: detection, diagnosis, and mitigation. In earlier work, we showed how to use a reactive mitigation strategy, specified as a timed state machine model, for the system health manager. This paper describes the algorithm and key concepts of an alternative, deliberative approach to system mitigation. This approach relies on a function-allocation model to identify alternative component-assembly configurations that can restore the functions needed for the goals of the system.
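The function-allocation idea can be sketched as a search over alternative component sets. A goal requires functions, each function can be provided by alternative component sets, and reconfiguration picks a combination that avoids every failed component. The model contents, the function names, and the "fewest components" preference are hypothetical. The deliberative reasoner in the paper uses its own model and search, which this exhaustive enumeration does not reproduce.

```python
from itertools import product

# Hypothetical function-allocation model: goal -> required functions,
# function -> alternative sets of components that can provide it.
GOALS = {"navigate": ["position", "heading"]}
ALTERNATIVES = {
    "position": [{"GPS", "NavFilter"}, {"IMU", "NavFilter"}],
    "heading": [{"Magnetometer"}, {"IMU"}],
}


def reconfigure(goal, failed, goals=GOALS, alternatives=ALTERNATIVES):
    """Return (allocation, components) restoring the goal, or None if impossible."""
    functions = goals[goal]
    # Keep only the alternatives that use no failed component
    options = [[alt for alt in alternatives[f] if not (alt & failed)] for f in functions]
    best = None
    for choice in product(*options):  # yields nothing if some function has no option
        components = set().union(*choice)
        # Assumed preference: the configuration with the fewest active components
        if best is None or len(components) < len(best[1]):
            best = (dict(zip(functions, choice)), components)
    return best


allocation, components = reconfigure("navigate", failed={"GPS"})
print(sorted(components))  # ['IMU', 'NavFilter']
```

Exhaustive enumeration is fine for a toy model. A real function-allocation model would need a smarter search or constraint solver as the number of alternatives grows.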
N. Mahadevan, A. Dubey, and G. Karsai, Architecting Health Management into Software Component Assemblies: Lessons Learned from the ARINC-653 Component Mode, in 15th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2012, Shenzhen, China, April 11-13, 2012, 2012, pp. 79–86.
```
@inproceedings{Mahadevan2012,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {15th {IEEE} International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, {ISORC} 2012, Shenzhen, China, April 11-13, 2012},
  title = {Architecting Health Management into Software Component Assemblies: Lessons Learned from the {ARINC-653} Component Mode},
  year = {2012},
  pages = {79--86},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/isorc/MahadevanDK12},
  category = {selectiveconference},
  contribution = {lead},
  doi = {10.1109/ISORC.2012.19},
  file = {:Mahadevan2012-Architecting_Health_Management_into_Software_Component_Assemblies.pdf:PDF},
  keywords = {performance, reliability},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:53 +0200},
  url = {https://doi.org/10.1109/ISORC.2012.19}
}
```
Complex real-time software systems require an active fault management capability. While testing, verification and validation schemes and their constant evolution help improve the dependability of these systems, an active fault management strategy is essential to potentially mitigate the unacceptable behaviors at run-time. In our work we have applied the experience gained from the field of Systems Health Management towards component-based software systems. The software components interact via well-defined concurrency patterns and are executed on a real-time component framework built upon ARINC-653 platform services. In this paper, we present the lessons learned in architecting and applying a two-level health management strategy to assemblies of software components.

R. Mehrotra, A. Dubey, S. Abdelwahed, and A. N. Tantawi, Power-Aware Modeling and Autonomic Management Framework for Distributed Computing Systems, in Handbook of Energy-Aware and Green Computing - Two Volume Set, CRC Press, 2012, pp. 621–648.

```
@inbook{Mehrotra2012,
  author = {Mehrotra, Rajat and Dubey, Abhishek and Abdelwahed, Sherif and Tantawi, Asser N.},
  pages = {621--648},
  publisher = {CRC Press},
  title = {Power-Aware Modeling and Autonomic Management Framework for Distributed Computing Systems},
  year = {2012},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/reference/crc/MehrotraDAT12},
  booktitle = {Handbook of Energy-Aware and Green Computing - Two Volume Set},
  contribution = {colab},
  file = {:Mehrotra2012-Power-Aware_Modeling_and_Autonomic_Management_Framework_for_Distributed_Computing_Systems.pdf:PDF},
  keywords = {performance},
  project = {cps-middleware},
  tag = {platform},
  timestamp = {Wed, 12 Jul 2017 01:00:00 +0200},
  url = {http://www.crcnetbase.com/doi/abs/10.1201/b16631-34}
}
```

R. Mehrotra, A. Dubey, S. Abdelwahed, and K. W. Rowland, RFDMon: A Real-time and Fault-tolerant Distributed System Monitoring Approach, in The 8th International Conference on Autonomic and Autonomous Systems ICAS 2012, 2012.
```
@inproceedings{Mehrotra2012a,
  author = {Mehrotra, Rajat and Dubey, Abhishek and Abdelwahed, Sherif and Rowland, Krisa W.},
  booktitle = {The 8th International Conference on Autonomic and Autonomous Systems {ICAS} 2012},
  title = {RFDMon: A Real-time and Fault-tolerant Distributed System Monitoring Approach},
  year = {2012},
  category = {selectiveconference},
  contribution = {lead},
  acceptance = {23},
  file = {:Mehrotra2012a-RFDMon_A_real-time_and_fault-tolerant_distributed_system_monitoring_approach.pdf:PDF},
  keywords = {performance},
  tag = {platform}
}
```
One of the main requirements for building an autonomic system is a robust monitoring framework. In this paper, we present RFDMon, a systematic distributed event-based (DEB) system monitoring framework. It measures the following, accurately, with minimum latency, at a specified rate, and with controllable resource utilization:

- system variables (CPU, memory, disk, and network utilization, etc.);
- system health (temperature and voltage of the motherboard and CPU);
- application performance variables (application response time, queue size, and throughput);
- scientific application data structures (PBS information and MPI variables).

The framework is designed to be:

- tolerant to faults in the monitoring framework itself;
- self-configuring (it can start and stop monitoring nodes and configure monitors with thresholds or change conditions for publishing measurements);
- aware of its own execution on multiple nodes through HEARTBEAT messages;
- extensive (it monitors multiple parameters through periodic and aperiodic sensors);
- resource-constrainable (the computational resources available to monitors can be limited);
- expandable with extra monitors on the fly.

Because RFDMon uses the Data Distribution Service (DDS) middleware, it can be deployed in systems with heterogeneous nodes. Additionally, it can cap the resources consumed by monitoring processes, which reduces its impact on the resources available to applications.
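Heartbeat-based awareness of which nodes are still running can be sketched with a last-seen table and a timeout of a few heartbeat periods. The class name, the three-missed-heartbeats threshold, and the explicit `now` argument are assumptions for illustration. Passing the time explicitly keeps the sketch deterministic and testable. The DDS transport that RFDMon uses to publish HEARTBEAT messages is deliberately left out.

```python
class HeartbeatMonitor:
    """Hypothetical liveness tracker: a node is suspected once its heartbeats stop."""

    def __init__(self, period: float, missed_allowed: int = 3):
        # Declare a node failed after `missed_allowed` consecutive heartbeat periods
        self.timeout = period * missed_allowed
        self.last_seen = {}

    def heartbeat(self, node: str, now: float) -> None:
        # Record the arrival time of the latest heartbeat from `node`
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> list:
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


mon = HeartbeatMonitor(period=1.0)
mon.heartbeat("node1", now=0.0)
mon.heartbeat("node2", now=0.0)
mon.heartbeat("node1", now=2.5)
print(mon.failed_nodes(now=3.5))  # ['node2']
```

Tolerating several missed heartbeats trades detection latency for fewer false alarms caused by transient message delays.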
R. Mehrotra, A. Dubey, S. Abdelwahed, and W. Monceaux, Large Scale Monitoring and Online Analysis in a Distributed Virtualized Environment, in 2011 Eighth IEEE International Conference and Workshops on Engineering of Autonomic and Autonomous Systems, 2011, pp. 1–9.
```
@inproceedings{Mehrotra2011,
  author = {{Mehrotra}, R. and Dubey, Abhishek and {Abdelwahed}, S. and {Monceaux}, W.},
  booktitle = {2011 Eighth IEEE International Conference and Workshops on Engineering of Autonomic and Autonomous Systems},
  title = {Large Scale Monitoring and Online Analysis in a Distributed Virtualized Environment},
  year = {2011},
  month = apr,
  pages = {1-9},
  category = {conference},
  contribution = {colab},
  doi = {10.1109/EASe.2011.17},
  file = {:Mehrotra2011-Large_Scale_Monitoring_and_Online_Analysis_in_a_Distributed_Virtualized_Environment.pdf:PDF},
  issn = {2168-1872},
  keywords = {performance},
  tag = {platform}
}
```
Due to the increasing number and complexity of large-scale systems, performance monitoring and multidimensional quality of service (QoS) management have become difficult and error-prone tasks for system administrators. Recently, the trend has been to use virtualization technology. Virtualization facilitates hosting multiple distributed systems with minimal infrastructure cost by sharing computational and memory resources among multiple instances, and it allows the dynamic creation of even bigger clusters. An effective monitoring technique should not only be fine-grained with respect to the measured variables, but should also give the administrator a high-level overview of all variables in the distributed system that can affect the QoS requirements. At the same time, the technique should not add a performance burden to the system. Finally, it should be integrated with a control methodology that manages the performance of the enterprise system. In this paper, we present a systematic distributed event-based (DEB) performance monitoring approach for distributed systems. It measures system variables (physical/virtual CPU utilization and memory utilization), application variables (application queue size, queue waiting time, and service time), and performance variables (response time, throughput, and power consumption) accurately, with minimum latency, at a specified rate. Furthermore, we show that the proposed monitoring approach can provide input to an application monitoring utility that learns the underlying performance model of the system. This enables successful online control of distributed systems to achieve predefined QoS parameters.
A. Dubey, G. Karsai, and N. Mahadevan, Model-based software health management for real-time systems, in 2011 Aerospace Conference, 2011, pp. 1–18.
```
@inproceedings{Dubey2011a,
  author = {Dubey, Abhishek and {Karsai}, G. and {Mahadevan}, N.},
  booktitle = {2011 Aerospace Conference},
  title = {Model-based software health management for real-time systems},
  year = {2011},
  month = mar,
  pages = {1-18},
  category = {conference},
  contribution = {lead},
  doi = {10.1109/AERO.2011.5747559},
  file = {:Dubey2011a-Model-based_software_health_management_for_real-time_systems.pdf:PDF},
  issn = {1095-323X},
  keywords = {performance, reliability},
  tag = {platform}
}
```
The complexity of software systems has reached the point where run-time mechanisms are needed to provide fault management services. Testing and verification may not cover all possible scenarios that a system will encounter; hence a simpler, yet formally specified, run-time monitoring, diagnosis, and fault mitigation architecture is needed to increase the software system’s dependability. The approach described in this paper borrows concepts and principles from the field of “Systems Health Management” for complex systems and implements a two-level health management strategy that can be applied through a model-based software development process. The Component-level Health Manager (CLHM) provides localized and limited functionality for managing the health of an individual software component. It also reports to the higher-level System Health Manager (SHM), which manages the health of the overall system. The SHM consists of a diagnosis engine that uses a timed failure propagation graph (TFPG) model based on the component assembly. It reasons about the anomalies reported by the CLHMs and hypothesizes about the possible fault sources, after which the necessary system-level mitigation action can be taken. System-level mitigation approaches are the subject of ongoing investigation and are not included in this paper. We conclude the paper with a case study and discussion.
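As a rough illustration of the two-level idea (not the CLHM implementation from the paper), the Python sketch below wraps one component operation with pre-condition, post-condition, and deadline checks and forwards any anomaly to a system-level collector. `ComponentHealthManager`, `SystemHealthManager`, and `Anomaly` are hypothetical names; the TFPG-based diagnosis performed by a real SHM is deliberately omitted.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Anomaly:
    component: str
    kind: str  # "precondition", "postcondition", or "deadline"
    detail: str


@dataclass
class SystemHealthManager:
    """Hypothetical system-level collector; a real SHM would feed a TFPG diagnoser."""
    reports: List[Anomaly] = field(default_factory=list)

    def report(self, anomaly: Anomaly) -> None:
        self.reports.append(anomaly)


@dataclass
class ComponentHealthManager:
    """Hypothetical component-level monitor wrapping a single operation."""
    name: str
    shm: SystemHealthManager
    pre: Callable[[Any], bool]
    post: Callable[[Any, Any], bool]
    deadline_s: float

    def invoke(self, op: Callable[[Any], Any], arg: Any) -> Any:
        if not self.pre(arg):
            self.shm.report(Anomaly(self.name, "precondition", repr(arg)))
            return None  # local mitigation in this sketch: refuse the call
        start = time.monotonic()
        result = op(arg)
        elapsed = time.monotonic() - start
        if elapsed > self.deadline_s:
            self.shm.report(Anomaly(self.name, "deadline", f"{elapsed:.4f}s"))
        if not self.post(arg, result):
            self.shm.report(Anomaly(self.name, "postcondition", repr(result)))
        return result


# Usage: monitor a square-root "component".
shm = SystemHealthManager()
sqrt_mon = ComponentHealthManager(
    "sqrt", shm,
    pre=lambda x: x >= 0,
    post=lambda x, r: abs(r * r - x) < 1e-9,
    deadline_s=0.5,
)
sqrt_mon.invoke(lambda x: x ** 0.5, 16.0)  # returns 4.0, no anomaly
sqrt_mon.invoke(lambda x: x ** 0.5, -1.0)  # pre-condition violation is reported
print([a.kind for a in shm.reports])       # ['precondition']
```

Keeping the local check and the global report separate mirrors the division of labour described above: the component-level monitor only detects and contains, while any reasoning across components is left to the system level.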

R. Mehrotra, A. Dubey, J. Kwalkowski, M. Paterno, A. Singh, R. Herber, and S. Abdelwahed, RFDMon: A Real-Time and Fault-Tolerant Distributed System Monitoring Approach, Vanderbilt University, Nashville, 2011.
```
@techreport{4477,
  author = {Mehrotra, Rajat and Dubey, Abhishek and Kwalkowski, Jim and Paterno, Marc and Singh, Amitoj and Herber, Randolph and Abdelwahed, Sherif},
  institution = {Vanderbilt University},
  title = {RFDMon: A Real-Time and Fault-Tolerant Distributed System Monitoring Approach},
  year = {2011},
  address = {Nashville},
  month = {10/2011},
  number = {ISIS-11-107},
  attachments = {http://www.isis.vanderbilt.edu/sites/default/files/SensorReport_Paper.pdf},
  contribution = {colab},
  keywords = {performance},
  tag = {platform}
}
```
N. Mahadevan, A. Dubey, and G. Karsai, A Case Study On The Application of Software Health Management Techniques, Institute For Software Integrated Systems, Vanderbilt University, Nashville, ISIS-11-101, 2011.
```
@techreport{Mahadevan2011a,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Karsai, Gabor},
  institution = {Institute For Software Integrated Systems, Vanderbilt University},
  title = {A Case Study On The Application of Software Health Management Techniques},
  year = {2011},
  address = {Nashville},
  month = {01/2011},
  number = {ISIS-11-101},
  attachments = {http://www.isis.vanderbilt.edu/sites/default/files/ADIRUTechReport.pdf},
  contribution = {colab},
  file = {:Mahadevan2011a-A_case_study_on_the_application_of_software_health_management_techniques.pdf:PDF},
  tag = {platform}
}
```
The ever-increasing complexity of software used in large-scale, safety-critical cyber-physical systems makes it increasingly difficult to expose and hence correct all potential bugs. There is a need to augment existing fault tolerance methodologies with new approaches that address latent software bugs exposed at runtime. This paper describes an approach that borrows and adapts traditional ‘Systems Health Management’ techniques to improve software dependability through simple formal specification of runtime monitoring, diagnosis, and mitigation strategies. The two-level approach of health management at the component and system level is demonstrated on a simulated case study of an Air Data Inertial Reference Unit (ADIRU). That subsystem was identified as the primary failure source for the in-flight upset of Malaysia Airlines Flight 124 over Perth, Australia, in August 2005.
S. Abdelwahed, A. Dubey, G. Karsai, and N. Mahadevan, Model-based Tools and Techniques for Real-Time System and Software Health Management, in Machine Learning and Knowledge Discovery for Engineering Systems Health Management, CRC Press, 2011, p. 285.
```
@inbook{Abdelwahed2011,
  author = {Abdelwahed, Sherif and Dubey, Abhishek and Karsai, Gabor and Mahadevan, Nagabhushan},
  chapter = {Chapter 9},
  pages = {285},
  publisher = {CRC Press},
  title = {Model-based Tools and Techniques for Real-Time System and Software Health Management},
  year = {2011},
  booktitle = {Machine Learning and Knowledge Discovery for Engineering Systems Health Management},
  contribution = {colab},
  doi = {10.1201/b11580-15},
  keywords = {performance, reliability},
  organization = {CRC Press},
  tag = {platform},
  url = {https://doi.org/10.1201/b11580}
}
```
The ultimate challenge in system health management is the theory for and application of the technology to systems, for instance to an entire vehicle. The main problem the designer faces is complexity; simply the sheer size of the system, the number of data points, anomalies, and failure modes can be overwhelming. Furthermore, systems are heterogeneous and one has to have a systems engineer’s view to understand interactions among systems. Yet, system-level health management is crucial, as faults increasingly arise from system-level effects and interactions. While individual subsystems tend to have built-in redundancy or local anomaly detection, fault management, and prognostics features, the system integrators are required to provide the same capabilities for the entire vehicle, across different engineering subsystems and areas.
A. Dubey, G. Karsai, and N. Mahadevan, A component model for hard real-time systems: CCM with ARINC-653, Softw., Pract. Exper., vol. 41, no. 12, pp. 1517–1550, 2011.
```
@article{Dubey2011,
  author = {Dubey, Abhishek and Karsai, Gabor and Mahadevan, Nagabhushan},
  journal = {Softw., Pract. Exper.},
  title = {A component model for hard real-time systems: {CCM} with {ARINC-653}},
  year = {2011},
  number = {12},
  pages = {1517--1550},
  volume = {41},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/spe/DubeyKM11},
  contribution = {lead},
  doi = {10.1002/spe.1083},
  file = {:Dubey2011-A_component_model_for_hard_real-time_systems_CCM_with_ARINC-653.pdf:PDF},
  keywords = {middleware},
  project = {cps-reliability,cps-middleware},
  tag = {platform},
  timestamp = {Sun, 28 May 2017 01:00:00 +0200},
  url = {https://doi.org/10.1002/spe.1083}
}
```
The size and complexity of software in safety-critical systems are increasing at a rapid pace. One technology that can be used to mitigate this complexity is component-based software development. However, in spite of the apparent benefits of a component-based approach to development, little work has been done in applying these concepts to hard real-time systems. This paper improves the state of the art by making three contributions: (1) we present a component model for hard real-time systems and define the semantics of different types of component interactions; (2) we present an implementation of a middleware that supports this component model, combining an open source CORBA Component Model (CCM) implementation (MICO) with ARINC-653, a state-of-the-art RTOS standard; and (3) we describe a modeling environment that enables the design, analysis, and deployment of component assemblies. We conclude with a discussion of lessons learned during this exercise. Our experiences point towards both extending the CCM and revising ARINC-653.
N. Mahadevan, A. Dubey, and G. Karsai, Application of software health management techniques, in 2011 ICSE Symposium on Software Engineering for Adaptive and Self-Managing Systems, SEAMS 2011, Waikiki, Honolulu, HI, USA, May 23-24, 2011, 2011, pp. 1–10.
```
@inproceedings{Mahadevan2011,
  author = {Mahadevan, Nagabhushan and Dubey, Abhishek and Karsai, Gabor},
  booktitle = {2011 {ICSE} Symposium on Software Engineering for Adaptive and Self-Managing Systems, {SEAMS} 2011, Waikiki, Honolulu, HI, USA, May 23-24, 2011},
  title = {Application of software health management techniques},
  year = {2011},
  acceptance = {27},
  pages = {1--10},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/icse/MahadevanDK11},
  category = {selectiveconference},
  contribution = {colab},
  doi = {10.1145/1988008.1988010},
  file = {:Mahadevan2011-Application_of_software_health_management_techniques.pdf:PDF},
  keywords = {performance, reliability},
  project = {cps-middleware,cps-reliability},
  tag = {platform},
  timestamp = {Tue, 06 Nov 2018 00:00:00 +0100},
  url = {https://doi.org/10.1145/1988008.1988010}
}
```
The growing complexity of software used in large-scale, safety-critical cyber-physical systems makes it increasingly difficult to expose and hence correct all potential defects. There is a need to augment existing fault tolerance methodologies with new approaches that address latent software defects exposed at runtime. This paper describes an approach that borrows and adapts traditional ‘System Health Management’ techniques to improve software dependability through simple formal specification of runtime monitoring, diagnosis, and mitigation strategies. The two-level approach to health management at the component and system level is demonstrated on a simulated case study of an Air Data Inertial Reference Unit (ADIRU). An ADIRU was identified as the primary failure source for the in-flight upset of Malaysia Airlines Flight 124 over Perth, Australia, in 2005.

S. Nordstrom, A. Dubey, T. Keskinpala, S. Neema, and T. Bapty, Autonomic Healing of Model-Based Systems, Journal of Aerospace Computing, Information, and Communication, vol. 8, no. 4, pp. 87–99, 2011.
```
@article{Nordstrom2011,
  author = {Nordstrom, Steven and Dubey, Abhishek and Keskinpala, Turker and Neema, Sandeep and Bapty, Theodore},
  journal = {{Journal of Aerospace Computing, Information, and Communication}},
  title = {Autonomic Healing of Model-Based Systems},
  year = {2011},
  number = {4},
  pages = {87--99},
  volume = {8},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/jacic/NordstromDKNB11},
  contribution = {minor},
  doi = {10.2514/1.31940},
  keywords = {reliability},
  project = {cps-reliability},
  tag = {platform},
  timestamp = {Thu, 18 May 2017 01:00:00 +0200},
  url = {https://doi.org/10.2514/1.31940}
}
```
N. Roy, A. Dubey, and A. S. Gokhale, Efficient Autoscaling in the Cloud Using Predictive Models for Workload Forecasting, in IEEE International Conference on Cloud Computing, CLOUD 2011, Washington, DC, USA, 4-9 July, 2011, 2011, pp. 500–507.
```
@inproceedings{Roy2011a,
  author = {Roy, Nilabja and Dubey, Abhishek and Gokhale, Aniruddha S.},
  booktitle = {{IEEE} International Conference on Cloud Computing, {CLOUD} 2011, Washington, DC, USA, 4-9 July, 2011},
  title = {Efficient Autoscaling in the Cloud Using Predictive Models for Workload Forecasting},
  year = {2011},
  acceptance = {22.4},
  pages = {500--507},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/IEEEcloud/RoyDG11},
  category = {selectiveconference},
  contribution = {colab},
  doi = {10.1109/CLOUD.2011.42},
  file = {:Roy2011a-Efficient_Autoscaling_in_the_Cloud_Using_Predictive_Models_for_Workload_Forecasting.pdf:PDF},
  keywords = {performance},
  project = {cps-middleware},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:54 +0200},
  url = {https://doi.org/10.1109/CLOUD.2011.42}
}
```
Large-scale component-based enterprise applications that leverage Cloud resources expect Quality of Service (QoS) guarantees in accordance with service level agreements between the customer and service providers. In the context of Cloud computing, auto scaling mechanisms hold the promise of assuring QoS properties to the applications while simultaneously making efficient use of resources and keeping operational costs low for the service providers. Despite the perceived advantages of auto scaling, realizing its full potential is hard due to multiple challenges stemming from the need to precisely estimate resource usage in the face of significant variability in client workload patterns. This paper makes three contributions to overcome the general lack of effective techniques for workload forecasting and optimal resource allocation. First, it discusses the challenges involved in auto scaling in the cloud. Second, it develops a model-predictive algorithm for workload forecasting that is used for resource auto scaling. Finally, empirical results are provided that demonstrate that resources can be allocated and deallocated by our algorithm in a way that satisfies the application QoS while keeping operational costs low.
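To make the forecast-then-provision loop concrete, the Python sketch below predicts the next interval's request rate and converts it into a VM count. It is an assumption-laden stand-in, not the paper's model-predictive algorithm: the forecaster is a plain least-squares linear trend over a short window, and `forecast_next`, `vms_needed`, `per_vm_capacity`, and the 20% headroom are hypothetical.

```python
import math
from typing import Sequence


def forecast_next(history: Sequence[float], window: int = 4) -> float:
    """Predict the next-interval request rate with a least-squares linear
    trend over the last `window` samples (a simple stand-in forecaster)."""
    ys = list(history[-window:])
    n = len(ys)
    if n < 2:
        return ys[-1] if ys else 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(range(n), ys))
    slope = sxy / sxx
    # Evaluate the fitted line one step past the window (x = n).
    return max(0.0, y_mean + slope * (n - x_mean))


def vms_needed(predicted_rate: float, per_vm_capacity: float,
               headroom: float = 0.2, min_vms: int = 1) -> int:
    """Smallest VM count whose capacity covers the forecast plus headroom."""
    target = predicted_rate * (1.0 + headroom)
    return max(min_vms, math.ceil(target / per_vm_capacity))


history = [100, 120, 140, 160]  # requests/s observed in past intervals
rate = forecast_next(history)   # 180.0 for this linear trend
print(rate, vms_needed(rate, per_vm_capacity=50))  # 180.0 5
```

Provisioning for the forecast rather than the last observation is what lets scale-up happen before the load arrives; the headroom term is one simple way to absorb forecast error.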
N. Roy, A. Dubey, A. S. Gokhale, and L. W. Dowdy, A Capacity Planning Process for Performance Assurance of Component-based Distributed Systems, in ICPE’11 - Second Joint WOSP/SIPEW International Conference on Performance Engineering, Karlsruhe, Germany, March 14-16, 2011, 2011, pp. 259–270.
```
@inproceedings{Roy2011b,
  author = {Roy, Nilabja and Dubey, Abhishek and Gokhale, Aniruddha S. and Dowdy, Larry W.},
  booktitle = {ICPE'11 - Second Joint {WOSP/SIPEW} International Conference on Performance Engineering, Karlsruhe, Germany, March 14-16, 2011},
  title = {A Capacity Planning Process for Performance Assurance of Component-based Distributed Systems},
  year = {2011},
  pages = {259--270},
  acceptance = {36},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/wosp/RoyDGD11},
  category = {selectiveconference},
  contribution = {colab},
  doi = {10.1145/1958746.1958784},
  file = {:Roy2011b-A_Capacity_Planning_Process_for_Performance_Assurance_of_Component-based_Distributed_Systems.pdf:PDF},
  keywords = {performance},
  project = {cps-middleware},
  tag = {platform},
  timestamp = {Tue, 06 Nov 2018 00:00:00 +0100},
  url = {https://doi.org/10.1145/1958746.1958784}
}
```
For service providers of multi-tiered component-based applications, such as web portals, assuring high performance and availability to their customers without impacting revenue requires effective and careful capacity planning that aims at minimizing the number of resources and utilizing them efficiently, while simultaneously supporting a large customer base and meeting its service level agreements. This paper presents a novel, hybrid capacity planning process that results from a systematic blending of 1) analytical modeling, where traditional modeling techniques are enhanced to overcome their limitations in providing accurate performance estimates; 2) profile-based techniques, which determine performance profiles of individual software components for use in resource allocation and balancing resource usage; and 3) allocation heuristics that determine the minimum number of resources onto which software components are allocated. Our results illustrate that, using our technique, performance (i.e., bounded response time) can be assured while reducing operating costs by using 25% fewer resources and increasing revenues by handling 20% more clients compared to traditional approaches.
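One classic allocation heuristic for "minimum number of resources" problems is first-fit decreasing bin packing. The Python sketch below applies it to component CPU demands; it is offered only as an example of the heuristic family and is not claimed to be the allocation heuristic used in the paper. The function name, the percentage units, and the component names are hypothetical.

```python
from typing import Dict, List


def first_fit_decreasing(demands: Dict[str, float],
                         capacity: float = 100.0) -> List[List[str]]:
    """Pack components (CPU demand as % of one node) onto as few identical
    nodes as the first-fit-decreasing heuristic finds."""
    nodes: List[List[str]] = []
    free: List[float] = []
    # Place the largest demands first; each goes to the first node with room.
    for name, d in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if d > capacity:
            raise ValueError(f"{name} alone exceeds node capacity")
        for i, room in enumerate(free):
            if d <= room:
                nodes[i].append(name)
                free[i] -= d
                break
        else:  # no existing node fits: open a new one
            nodes.append([name])
            free.append(capacity - d)
    return nodes


demands = {"web": 60, "app": 50, "db": 40, "cache": 30, "auth": 20}
print(first_fit_decreasing(demands))  # [['web', 'db'], ['app', 'cache', 'auth']]
```

First-fit decreasing is not guaranteed optimal, but it is fast and usually close; in this example the total demand of 200% fits on exactly two nodes.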
J. Balasubramanian, A. S. Gokhale, A. Dubey, F. Wolf, C. Lu, C. D. Gill, and D. C. Schmidt, Middleware for Resource-Aware Deployment and Configuration of Fault-Tolerant Real-time Systems, in 16th IEEE Real-Time and Embedded Technology and Applications Symposium, RTAS 2010, Stockholm, Sweden, April 12-15, 2010, 2010, pp. 69–78.
```
@inproceedings{Balasubramanian2010,
  author = {Balasubramanian, Jaiganesh and Gokhale, Aniruddha S. and Dubey, Abhishek and Wolf, Friedhelm and Lu, Chenyang and Gill, Christopher D. and Schmidt, Douglas C.},
  booktitle = {16th {IEEE} Real-Time and Embedded Technology and Applications Symposium, {RTAS} 2010, Stockholm, Sweden, April 12-15, 2010},
  title = {Middleware for Resource-Aware Deployment and Configuration of Fault-Tolerant Real-time Systems},
  year = {2010},
  pages = {69--78},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/rtas/BalasubramanianGDWLGS10},
  category = {selectiveconference},
  contribution = {colab},
  doi = {10.1109/RTAS.2010.30},
  file = {:Balasubramanian2010-Middleware_for_Resource-Aware_Deployment_and_Configuration.pdf:PDF},
  keywords = {middleware, performance},
  project = {cps-middleware,cps-reliability},
  tag = {platform},
  timestamp = {Tue, 05 Nov 2019 00:00:00 +0100},
  url = {https://doi.org/10.1109/RTAS.2010.30}
}
```
Developing large-scale distributed real-time and embedded (DRE) systems is hard in part due to the complex deployment and configuration issues involved in satisfying multiple quality of service (QoS) properties, such as real-timeliness and fault tolerance. This paper makes three contributions to the study of deployment and configuration middleware for DRE systems that satisfy multiple QoS properties. First, it describes a novel task allocation algorithm for passively replicated DRE systems that meets their real-time and fault-tolerance QoS properties while consuming significantly fewer resources. Second, it presents the design of a strategizable allocation engine that enables application developers to evaluate different allocation algorithms. Third, it presents the design of a middleware-agnostic configuration framework that uses allocation decisions to deploy application components/replicas and configure the underlying middleware automatically on the chosen nodes. These contributions are realized in the DeCoRAM (Deployment and Configuration Reasoning and Analysis via Modeling) middleware. Empirical results on a distributed testbed demonstrate DeCoRAM’s ability to handle multiple failures and provide efficient and predictable real-time performance.
A. Dubey, G. Karsai, R. Kereskényi, and N. Mahadevan, A Real-Time Component Framework: Experience with CCM and ARINC-653, in 13th IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2010, Carmona, Sevilla, Spain, 5-6 May 2010, 2010, pp. 143–150.
```
@inproceedings{Dubey2010a,
  author = {Dubey, Abhishek and Karsai, Gabor and Keresk{\'{e}}nyi, R{\'{o}}bert and Mahadevan, Nagabhushan},
  booktitle = {13th {IEEE} International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, {ISORC} 2010, Carmona, Sevilla, Spain, 5-6 May 2010},
  title = {A Real-Time Component Framework: Experience with {CCM} and {ARINC-653}},
  year = {2010},
  pages = {143--150},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/isorc/DubeyKKM10},
  category = {selectiveconference},
  contribution = {lead},
  doi = {10.1109/ISORC.2010.39},
  file = {:Dubey2010a-A_Real-Time_Component_Framework_Experience_with_CCM_and_ARINC-653.pdf:PDF},
  keywords = {middleware},
  project = {cps-middleware,cps-reliability},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:53 +0200},
  url = {https://doi.org/10.1109/ISORC.2010.39}
}
```
The complexity of software in systems like aerospace vehicles has reached the point where new techniques are needed to ensure system dependability while improving the productivity of developers. One possible approach is to use precisely defined software execution platforms that (1) enable the system to be composed from separate components, (2) restrict component interactions and prevent fault propagation, and (3) have well-known compositional properties. In this paper we describe the initial steps towards building a platform that combines component-based software construction with hard real-time operating system services. Specifically, the paper discusses how the CORBA Component Model (CCM) could be combined with the ARINC-653 platform services and the lessons learned from this experiment. The results point towards both extending the CCM and revising ARINC-653.

A. Dubey, R. Mehrotra, S. Abdelwahed, and A. N. Tantawi, Performance modeling of distributed multi-tier enterprise systems, SIGMETRICS Performance Evaluation Review, vol. 37, no. 2, pp. 9–11, 2009.
```
@article{Dubey2009,
  author = {Dubey, Abhishek and Mehrotra, Rajat and Abdelwahed, Sherif and Tantawi, Asser N.},
  journal = {{SIGMETRICS} Performance Evaluation Review},
  title = {Performance modeling of distributed multi-tier enterprise systems},
  year = {2009},
  number = {2},
  pages = {9--11},
  volume = {37},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/sigmetrics/DubeyMAT09},
  contribution = {lead},
  doi = {10.1145/1639562.1639566},
  file = {:Dubey2009-Performance_modeling_of_distributed_multi-tier_enterprise_systems.pdf:PDF},
  keywords = {performance},
  project = {cps-middleware},
  tag = {platform},
  timestamp = {Tue, 06 Nov 2018 00:00:00 +0100},
  url = {https://doi.org/10.1145/1639562.1639566}
}
```
A. Dubey, G. Karsai, and S. Abdelwahed, Compensating for Timing Jitter in Computing Systems with General-Purpose Operating Systems, in 2009 IEEE International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, ISORC 2009, Tokyo, Japan, 17-20 March 2009, 2009, pp. 55–62.
```
@inproceedings{Dubey2009c,
  author = {Dubey, Abhishek and Karsai, Gabor and Abdelwahed, Sherif},
  booktitle = {2009 {IEEE} International Symposium on Object/Component/Service-Oriented Real-Time Distributed Computing, {ISORC} 2009, Tokyo, Japan, 17-20 March 2009},
  title = {Compensating for Timing Jitter in Computing Systems with General-Purpose Operating Systems},
  year = {2009},
  pages = {55--62},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/conf/isorc/DubeyKA09},
  category = {selectiveconference},
  contribution = {lead},
  doi = {10.1109/ISORC.2009.28},
  file = {:Dubey2009c-Compensating_for_Timing_Jitter_in_Computing_Systems_with_General-Purpose_Operating_Systems.pdf:PDF},
  keywords = {performance},
  project = {cps-middleware,cps-reliability},
  tag = {platform},
  timestamp = {Wed, 16 Oct 2019 14:14:53 +0200},
  url = {https://doi.org/10.1109/ISORC.2009.28}
}
```
Fault-tolerant frameworks for large-scale computing clusters require sensor programs, which are executed periodically to facilitate performance and fault management. By construction, these clusters use general-purpose operating systems such as Linux that are built for best average-case performance and do not provide deterministic scheduling guarantees. Consequently, periodic applications show jitter in execution times relative to the expected execution time. Obtaining a deterministic schedule for periodic tasks in general-purpose operating systems is difficult without kernel-level modifications such as RTAI and RTLinux. However, due to performance and administrative issues, kernel modification cannot be used in all scenarios. In this paper, we address the problem of jitter compensation for periodic tasks that cannot rely on modifying the operating system kernel. Towards that end, (a) we present motivating examples; (b) we present a feedback-controller-based approach that runs in user space and actively compensates the periodic schedule based on past jitter; and (c) we show through analysis and experiments that this approach is platform-agnostic, i.e., it can be used in different operating systems without modification, and that it maintains a stable system with bounded total jitter.
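The core idea of user-space jitter compensation can be sketched with a proportional-integral (PI) correction of the next sleep interval. The Python below is a hypothetical illustration, not the controller from the paper: `JitterCompensator`, `simulate`, and the gains `kp = 0.5`, `ki = 0.3` are assumptions, and the "operating system" is simulated as a constant overshoot on every sleep so the behaviour is deterministic and testable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class JitterCompensator:
    """Hypothetical PI controller: shortens the next sleep in proportion to
    the latest lateness and to the accumulated lateness."""
    period: float
    kp: float = 0.5
    ki: float = 0.3
    _integral: float = 0.0

    def next_sleep(self, lateness: float) -> float:
        self._integral += lateness
        correction = self.kp * lateness + self.ki * self._integral
        return max(0.0, self.period - correction)


def simulate(periods: int, period: float, os_delay: float,
             kp: float = 0.5, ki: float = 0.3) -> List[float]:
    """Simulated plant: every requested sleep overshoots by `os_delay`.
    Returns the lateness of each wake-up relative to the ideal time k*period."""
    ctl = JitterCompensator(period, kp, ki)
    t, lateness, history = 0.0, 0.0, []
    for k in range(1, periods + 1):
        t += ctl.next_sleep(lateness) + os_delay
        lateness = t - k * period
        history.append(lateness)
    return history


compensated = simulate(periods=60, period=0.01, os_delay=0.002)
uncompensated = simulate(periods=60, period=0.01, os_delay=0.002, kp=0.0, ki=0.0)
print(f"last lateness with PI: {abs(compensated[-1]):.1e} s")
print(f"last lateness without: {uncompensated[-1]:.3f} s")
```

With these gains the lateness error obeys e[k+1] = 1.2 e[k] - 0.5 e[k-1], whose characteristic roots have magnitude about 0.71, so it decays toward zero; without feedback the 2 ms overshoot accumulates to roughly 0.12 s after 60 periods. Real OS jitter is random rather than constant, so a practical controller would also need gain tuning and bounds.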
A. Dubey, S. Nordstrom, T. Keskinpala, S. Neema, T. Bapty, and G. Karsai, Towards a verifiable real-time, autonomic, fault mitigation framework for large scale real-time systems, Innovations in Systems and Software Engineering, vol. 3, no. 1, pp. 33–52, 2007.
```
@article{Dubey2007,
  author = {Dubey, Abhishek and Nordstrom, Steven and Keskinpala, Turker and Neema, Sandeep and Bapty, Ted and Karsai, Gabor},
  journal = {{Innovations in Systems and Software Engineering}},
  title = {Towards a verifiable real-time, autonomic, fault mitigation framework for large scale real-time systems},
  year = {2007},
  number = {1},
  pages = {33--52},
  volume = {3},
  bibsource = {dblp computer science bibliography, https://dblp.org},
  biburl = {https://dblp.org/rec/bib/journals/isse/DubeyNKNBK07},
  contribution = {lead},
  doi = {10.1007/s11334-006-0015-7},
  file = {:Dubey2007-Towards_a_verifiable_real-time_autonomic_fault_mitigation_framework.pdf:PDF},
  project = {cps-middleware,cps-reliability},
  tag = {platform},
  timestamp = {Sun, 28 May 2017 01:00:00 +0200},
  url = {https://doi.org/10.1007/s11334-006-0015-7}
}
```
Designing autonomic fault responses is difficult, particularly in large-scale systems, as there is no single ‘perfect’ fault mitigation response to a given failure. The design of appropriate mitigation actions depends upon the goals and state of the application and environment. Strict time deadlines in real-time systems further exacerbate this problem. Any autonomic behavior in such systems must not only be functionally correct but should also conform to properties of liveness, safety, and bounded time responsiveness. This paper details a real-time fault-tolerant framework that uses a reflex and healing architecture to provide fault mitigation capabilities for large-scale real-time systems. At the heart of this architecture is a real-time reflex engine, which has state-based failure management logic that can respond to both event- and time-based triggers. We also present a semantic domain for verifying properties of systems that use this framework of real-time reflex engines. Lastly, a case study examining the details of such an approach is presented.
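To illustrate what state-based failure management logic responding to both event- and time-based triggers can look like, here is a minimal Python state machine. It is not the paper's reflex engine or its semantic domain; `ReflexEngine`, the state names, and the actions are hypothetical, and the current time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ReflexEngine:
    """Hypothetical reflex logic: transitions fire on named events or when
    the time spent in a state reaches that state's timeout."""
    state: str
    # (state, event) -> (next_state, action)
    on_event: Dict[Tuple[str, str], Tuple[str, str]]
    # state -> (timeout_s, next_state, action)
    on_timeout: Dict[str, Tuple[float, str, str]]
    entered_at: float = 0.0
    log: List[str] = field(default_factory=list)

    def _go(self, nxt: str, action: str, now: float) -> None:
        self.log.append(f"{self.state}->{nxt}: {action}")
        self.state, self.entered_at = nxt, now

    def event(self, name: str, now: float) -> None:
        rule = self.on_event.get((self.state, name))
        if rule:
            self._go(rule[0], rule[1], now)

    def tick(self, now: float) -> None:
        rule = self.on_timeout.get(self.state)
        if rule and now - self.entered_at >= rule[0]:
            self._go(rule[1], rule[2], now)


engine = ReflexEngine(
    state="nominal",
    on_event={("nominal", "heartbeat_missed"): ("suspect", "ping node"),
              ("suspect", "heartbeat"): ("nominal", "clear alarm")},
    on_timeout={"suspect": (2.0, "failed", "restart node")},
)
engine.event("heartbeat_missed", now=10.0)
engine.tick(now=11.0)  # still inside the 2 s window: no transition
engine.tick(now=12.5)  # timeout elapsed: escalate to restart
print(engine.state)    # failed
```

Because every transition is an explicit table entry with a fixed timeout, properties such as "a suspect node is either cleared or restarted within 2 s" can be checked by inspecting the tables, which hints at why a state-based formulation lends itself to verification.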
A. Dubey, X. Wu, H. Su, and T. J. Koo, Computation Platform for Automatic Analysis of Embedded Software Systems Using Model Based Approach, in Automated Technology for Verification and Analysis, Berlin, Heidelberg, 2005, pp. 114–128.
```
@inproceedings{Dubey2005,
  author = {Dubey, Abhishek and Wu, X. and Su, H. and Koo, T. J.},
  booktitle = {Automated Technology for Verification and Analysis},
  title = {Computation Platform for Automatic Analysis of Embedded Software Systems Using Model Based Approach},
  year = {2005},
  address = {Berlin, Heidelberg},
  editor = {Peled, Doron A. and Tsay, Yih-Kuen},
  pages = {114--128},
  publisher = {Springer Berlin Heidelberg},
  category = {selectiveconference},
  contribution = {lead},
  file = {:Dubey2005-Computation_Platform_for_Automatic_Analysis_of_Embedded_Software_Systems_Using_Model_Based_Approach.pdf:PDF},
  isbn = {978-3-540-31969-6},
  keywords = {reliability},
  project = {cps-reliability},
  tag = {platform}
}
```
In this paper, we describe a computation platform called ReachLab, which enables automatic analysis of embedded software systems that interact with a continuous environment. Algorithms are used to specify how the state space of the system model should be explored in order to perform analysis. In ReachLab, both system models and analysis algorithm models are specified in the same framework using the Hybrid System Analysis and Design Language (HADL), which is a meta-model based language. The platform allows the models of algorithms to be constructed hierarchically and promotes their reuse in constructing more complex algorithms. Moreover, the platform is designed so that the design and implementation of analysis algorithms are separate concerns. On one hand, the models of analysis algorithms are abstract, and therefore the design of algorithms can be made independent of implementation details. On the other hand, translators are provided to automatically generate implementations from the models for computing analysis results based on computation kernels. Multiple computation kernels, which are based on specific computation tools such as d/dt and the Level Set toolbox, are supported and can be chosen to enable hybrid state space exploration. An example is provided to illustrate the design and implementation process in ReachLab.

ScopeLab - Design, Operation and Optimization of Smart Cyber-Physical Systems
