CHARIOT
Note: The related slides are available here: CPSWeekTutorial.pptx
Cyber-pHysical Application aRchItecture with Objective-based reconfiguraTion (CHARIOT) is a tool-suite that facilitates the design, development, and management of extensible CPS. CHARIOT comprises design-time and runtime aspects; the entities that constitute CHARIOT are described below:
- Domain-specific modeling language: CHARIOT-ML (Modeling Language) is a Domain-Specific Modeling Language (DSML) that can be used to model applications and systems. Applications are modeled as software components that provide functionalities. Systems are modeled as compositions of one or more functionalities. CHARIOT-ML is a design-time tool.
- Generic component model: CHARIOT implements a novel component model that, unlike existing component models, is middleware agnostic. At its core, this component model relies on the design principle that a software component should have a clean separation of concerns between its computation and communication logic. This component model is part of both the design-time (modeling application components) and runtime (execution of application components) aspects of CHARIOT.
- Autonomous resilience loop: The runtime aspect of CHARIOT also includes the entities that constitute an autonomous resilience loop. The management infrastructure is responsible for managing a platform, whereas the monitoring infrastructure is responsible for monitoring the platform's resources for failures and anomalies. Finally, the resilience infrastructure is responsible for determining how to resolve the failures and anomalies detected by the monitoring infrastructure. As shown in the figure below, these entities form a closed sense-plan-act loop that ensures the functionalities required by the system, which are provided by different applications, are maintained for as long as possible.
What are extensible CPS?
Cyber-Physical Systems (CPS) have traditionally been designed as closed systems for specific domains. This design philosophy was necessitated by stringent requirements on system correctness, reliability, security, and privacy. However, with the increasing push towards open architectures and the emphasis on integrating CPS with the Internet of Things (IoT), cloud computing, and online data analytics, as evidenced by the growing interest in Smart City cyber-physical applications, CPS design and deployment are transitioning towards a more open and dynamic approach. The result is extensible CPS that are not built as single-function systems, but rather as loosely connected, networked platforms comprising subsystems from different domains. These heterogeneous cyber-physical platforms simultaneously host multi-domain cyber-physical applications, and their behavior cannot be encoded a priori; instead, it evolves over time depending on the hosted applications.
The key properties of extensible CPS are:
- Multitenancy: An extensible cyber-physical platform can host multiple applications running simultaneously.
- Dynamicity: The dynamic nature of the resources provided by platforms, and of the functionalities provided by the applications hosted on them, results in a system that can expand or contract at any time.
- Remote deployment: Resources of certain domains (such as UAVs, satellites) are remotely deployed.
- Heterogeneity: Resources of a platform can have distributed ownership resulting in varying hardware, operating system, and middleware.
- Resilience: This is a critical property of extensible cyber-physical platforms, as they are often safety-critical and susceptible to failures and anomalies due to complex interdependencies.
CHARIOT-ML
CHARIOT-ML is a textual DSL developed using Xtext. The figure below presents the first-class modeling concepts in CHARIOT-ML and their interdependencies (left side of the figure), as well as the entities modeled using those concepts (right side of the figure). A brief description of each modeling concept is provided below.
Figure: Modeling concepts and their inter-dependencies in CHARIOT-ML (left side), and entities modeled for a system (right side).
- Data types: The most basic modeling construct, used to model the data types involved in both interaction and computation. CHARIOT-ML supports data types that are common across popular programming languages and middleware solutions. This enables interoperability, since interaction and computation modeled in CHARIOT-ML can be used with different programming languages and middleware solutions.
- Functionalities: These are logical concepts used to model functions with inputs and outputs using data types. Functionalities are provided by components and they can be composed to form objectives.
- Compositions: Logical groups of functionalities, where each functionality can be part of multiple compositions and functionalities of the same composition can have interdependencies. Objectives are instantiations of compositions.
- Components: Applications in CHARIOT are composed of software components that interact with each other. Components have well-defined ports for interaction and use workflows and tasklets to describe computational behavior. These communication and computation aspects have a clean separation of concerns, which keeps components middleware agnostic. Components provide one or more functionalities, and the same functionality can be provided by multiple components.
- Node categories: CHARIOT-ML allows modeling of categories with which nodes can be associated. This allows the creation of logical groups of similar resources.
- Nodes: The nodes that are part of a platform. A node can be associated with a node category. Having separate concepts for nodes and node categories allows easy addition, at runtime, of new nodes belonging to existing categories, or of entirely new node categories.
- Systems: A system consists of a goal that is satisfied by one or more objectives. An objective depends on functionalities provided by components. As such, the system goal, objectives, and functionalities form a tree-like structure of logical concepts that describes the requirements of a system. We call this a goal-based system description.
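To make the goal-based system description more concrete, the following Python sketch models the tree of goal, objectives, and functionalities, and reports which objectives the currently available components cannot satisfy. It is illustrative only: the class names and the satisfaction rule (an objective holds when every functionality it requires is provided by at least one component) are assumptions for this example, not CHARIOT-ML syntax or part of CHARIOT's API.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A software component and the functionalities it provides."""
    name: str
    provides: set[str]


@dataclass
class Objective:
    """An objective depends on a set of functionalities."""
    name: str
    requires: set[str]


@dataclass
class Goal:
    """A system goal, satisfied by one or more objectives."""
    name: str
    objectives: list[Objective] = field(default_factory=list)


def unsatisfied_objectives(goal: Goal, components: list[Component]) -> list[str]:
    """Names of objectives whose required functionalities are not all provided.

    Hypothetical rule: a functionality counts as available if at least one
    component provides it (several components may provide the same one).
    """
    available: set[str] = set()
    for component in components:
        available |= component.provides
    return [o.name for o in goal.objectives if not o.requires <= available]


goal = Goal("monitor_area", [Objective("surveillance", {"camera_feed", "object_detection"})])
components = [Component("CameraDriver", {"camera_feed"})]
print(unsatisfied_objectives(goal, components))  # ['surveillance']
```

Because the check only asks whether some component provides each functionality, redundant providers are what give the system room to keep an objective satisfied after a component is lost.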
CHARIOT Component Model
CHARIOT applications (apps) are, in essence, software components. Each component has a set of ports, workflows, tasklets, and state variables (not shown in the figure below). Ports allow components to interact with each other. Workflows have associated triggers and other specific properties, which determine when and how different computation logic should be executed. Each workflow comprises one or more tasklets. A tasklet is the smallest unit of computation. Tasklets of a workflow can have data dependencies. This architecture provides considerable flexibility in modeling a component's computation, since it yields cleanly separated computation blocks (workflows or tasklets) that can potentially be executed independently.
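Because tasklets of a workflow can have data dependencies, a workflow can be viewed as a dependency graph that has to be executed in topological order. The sketch below assumes Python 3.9+ for the standard graphlib module. The Workflow class, its add_tasklet() and run() methods, and the convention that each tasklet returns new entries for a shared data dictionary are hypothetical and do not reflect CHARIOT's actual component API.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Iterable, List

# A tasklet reads the workflow's shared data and returns new entries to add to it.
Tasklet = Callable[[Dict[str, object]], Dict[str, object]]


class Workflow:
    """Hypothetical workflow: a trigger plus tasklets with data dependencies."""

    def __init__(self, trigger: str) -> None:
        self.trigger = trigger  # e.g. the name of a port or timer (illustrative only)
        self.tasklets: Dict[str, Tasklet] = {}
        self.depends_on: Dict[str, List[str]] = {}

    def add_tasklet(self, name: str, fn: Tasklet, depends_on: Iterable[str] = ()) -> None:
        self.tasklets[name] = fn
        self.depends_on[name] = list(depends_on)

    def run(self, data: Dict[str, object]) -> Dict[str, object]:
        # static_order() yields every tasklet after all the tasklets it depends on;
        # a dependency cycle raises graphlib.CycleError.
        for name in TopologicalSorter(self.depends_on).static_order():
            data.update(self.tasklets[name](data))
        return data


wf = Workflow(trigger="timer_100ms")
# Tasklets are added out of order on purpose; run() still respects dependencies.
wf.add_tasklet("publish", lambda d: {"out": d["sorted"][-1]}, depends_on=["sort"])
wf.add_tasklet("read", lambda d: {"raw": [3, 1, 2]})
wf.add_tasklet("sort", lambda d: {"sorted": sorted(d["raw"])}, depends_on=["read"])
print(wf.run({})["out"])  # 3
```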
As mentioned before, CHARIOT components have a clean separation of concerns between their computation and communication logic. This is an important and conscious design choice for two reasons. First, it allows components to control the execution of their computation logic. CHARIOT components are reactive in nature: each external event (a message on a port, a timer event, or a component life-cycle event) leads to the evaluation of the associated trigger, which eventually results in tasklets being executed. Because components control the execution of their own computation logic, the resulting architecture has predictable and analyzable computation logic, which is important for real-time systems. This approach moves away from traditional middleware and component models, in which any external event results in inversion of control: the related callback (computation logic) is executed in the middleware's thread of control, or the middleware spawns a new thread to execute the callback and thus incurs frequent context switching. These approaches result in unpredictable computation logic. Also, using a thread pool rather than a single-threaded component provides better support for parallelization, as tasklets without dependencies, irrespective of their workflow, can run in parallel.
Second, the communication logic is only responsible for exchanging messages; it does not need to handle the received data. Each component port has an associated buffer, and the communication logic is responsible for managing this buffer. If a component needs to send a message using a certain port, that message is placed in the port's buffer. The communication logic, via different transports, then picks up the message from the buffer and sends it. When a port receives messages, the communication logic is responsible for receiving them and storing them in the port's buffer for the computation logic to use. As such, a component can use different middleware solutions (supported by CHARIOT) simply by using different transports, without changing any of its business/computation logic. This is our approach to supporting heterogeneity. The current implementation of CHARIOT supports two middleware solutions: LCM and RTI DDS.
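The following sketch illustrates the reactive, component-controlled execution described above under simplifying assumptions: external events are only enqueued (no computation runs in the caller's thread), and the component itself decides when to evaluate a trigger and run the associated tasklets, dispatching each wave of dependency-free tasklets to its own thread pool. The ReactiveComponent class and its dictionary-based workflow format are hypothetical; this is not CHARIOT's runtime.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


class ReactiveComponent:
    """Hypothetical component that owns the execution of its computation logic.

    workflows: {trigger: {tasklet_name: (function, [dependency names])}}
    Each tasklet receives a copy of earlier results and returns a value.
    """

    def __init__(self, workflows, max_workers=4):
        self.workflows = workflows
        self.events = queue.Queue()
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    def on_event(self, trigger, payload):
        # Called from communication logic or timers: only enqueue, never compute here.
        self.events.put((trigger, payload))

    def process_next(self):
        # The component's own thread decides when the next event is handled.
        trigger, payload = self.events.get()
        workflow = self.workflows.get(trigger)
        if workflow is None:
            return None  # no workflow is associated with this trigger
        sorter = TopologicalSorter({name: deps for name, (_, deps) in workflow.items()})
        sorter.prepare()
        results = {"payload": payload}
        while sorter.is_active():
            ready = sorter.get_ready()
            # Tasklets in one wave have no unmet dependencies, so they run in parallel.
            futures = {name: self.pool.submit(workflow[name][0], dict(results)) for name in ready}
            for name, future in futures.items():
                results[name] = future.result()
                sorter.done(name)
        return results

    def shutdown(self):
        self.pool.shutdown(wait=True)


comp = ReactiveComponent({
    "sensor_port": {
        "scale": (lambda r: r["payload"] * 2, []),
        "offset": (lambda r: r["payload"] + 1, []),
        "combine": (lambda r: r["scale"] + r["offset"], ["scale", "offset"]),
    }
})
comp.on_event("sensor_port", 5)
print(comp.process_next()["combine"])  # 16
comp.shutdown()
```

Here scale and offset have no dependencies, so they are submitted to the pool together, while combine waits until both have finished.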
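A minimal sketch of the buffer-and-transport split, assuming a hypothetical Transport interface with send() and poll() methods. LoopbackTransport is an in-process stand-in; a real deployment would implement the same interface on top of a middleware such as LCM or RTI DDS. None of these class names come from CHARIOT itself.

```python
from abc import ABC, abstractmethod
from collections import deque


class Transport(ABC):
    """Hypothetical middleware adapter (a real one might wrap LCM or RTI DDS)."""

    @abstractmethod
    def send(self, topic, message):
        """Hand one message to the middleware."""

    @abstractmethod
    def poll(self, topic):
        """Return (and consume) the messages received on a topic."""


class LoopbackTransport(Transport):
    """In-process stand-in for a middleware: sent messages are delivered locally."""

    def __init__(self):
        self.topics = {}

    def send(self, topic, message):
        self.topics.setdefault(topic, deque()).append(message)

    def poll(self, topic):
        pending = self.topics.get(topic, deque())
        received = list(pending)
        pending.clear()
        return received


class Port:
    """A port owns a buffer; computation logic only reads from or writes to it."""

    def __init__(self, topic):
        self.topic = topic
        self.buffer = deque()


class CommunicationLogic:
    """Moves messages between port buffers and whichever transport is plugged in."""

    def __init__(self, transport):
        self.transport = transport

    def flush(self, out_port):
        # Pick up messages the computation logic placed on an outgoing port's buffer.
        while out_port.buffer:
            self.transport.send(out_port.topic, out_port.buffer.popleft())

    def receive(self, in_port):
        # Store newly received messages in an incoming port's buffer.
        in_port.buffer.extend(self.transport.poll(in_port.topic))


comm = CommunicationLogic(LoopbackTransport())
out_port, in_port = Port("temperature"), Port("temperature")
out_port.buffer.append(21.5)  # computation logic only touches the buffer
comm.flush(out_port)
comm.receive(in_port)
print(list(in_port.buffer))  # [21.5]
```

Replacing LoopbackTransport with another Transport implementation changes how messages travel, but the lines that touch out_port.buffer and in_port.buffer stay the same, which is the point of the separation.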
Resilience
The CHARIOT runtime comprises entities that form a closed loop following the sense-plan-act model to provide autonomous resilience. As mentioned before, resilience is a key desired property of extensible CPS; therefore, this autonomous resilience feature is one of the major contributions of CHARIOT. The figure below presents the architectural outline of the CHARIOT resilience loop. As shown in the figure, there are two kinds of nodes: (a) edge nodes, and (b) solver nodes. Edge nodes are deployed in the target physical environment and are equipped with the sensors and actuators required to interact with their surroundings. Usually, these nodes are resource constrained. Solver nodes are backend nodes with ample resources and can therefore run resource-intensive tasks. This setup can be viewed as a multi-layer architecture, where solver nodes are deployed on a cloud (distinguished as compute nodes, described below). Each edge node consists of (a) one or more applications, (b) an instance of a distributed database, (c) an Application Manager (AM), and (d) a Node Monitor (NM). Solver nodes, on the other hand, consist of (a) a Resilience Engine (RE), (b) an instance of a distributed database, and (c) an NM.
CHARIOT uses MongoDB as its distributed database to store (a) the configuration space, (b) the initial configuration point, and (c) the current configuration point. A configuration space represents the state of an entire platform. It includes information about the available resources, well-known faults, system goals, the objectives and corresponding functionalities that help achieve those goals, the components that provide these functionalities, and the different ways in which these components can be deployed and configured. A configuration space can expand or shrink as related entities are added or removed. As shown in the figure below, a configuration space can contain multiple configuration points. A configuration point represents a valid configuration, i.e., a specific deployment scenario for a given set of component instances and the physical nodes on which they can be deployed. A change in the state of a platform is represented by a transition from one configuration point to another within the same configuration space. The initial configuration point represents the initial state of a platform, whereas the current configuration point represents its current state. Configuration points and the transitions between them are critical for our self-reconfiguration mechanism.
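The relationship between a configuration space and its configuration points can be sketched as follows. This is a deliberately reduced model, assuming that a configuration point is just a mapping from component instances to nodes and that validity only means "the node exists and belongs to a category the component may run on"; CHARIOT's actual configuration space also covers goals, objectives, faults, and more. The component names C_A and C_B follow the example in the figure; the node names and categories are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class ConfigurationSpace:
    nodes: Dict[str, str]                    # node name -> node category
    allowed_categories: Dict[str, Set[str]]  # component -> categories it may run on

    def is_valid(self, point: Dict[str, str]) -> bool:
        """A configuration point (component -> node) is valid if every
        assigned node exists and belongs to an allowed category."""
        return all(
            node in self.nodes
            and self.nodes[node] in self.allowed_categories.get(component, set())
            for component, node in point.items()
        )


space = ConfigurationSpace(
    nodes={"edge1": "edge", "edge2": "edge", "cloud1": "cloud"},
    allowed_categories={"C_A": {"edge"}, "C_B": {"edge", "cloud"}},
)
initial_point = {"C_A": "edge1", "C_B": "cloud1"}
print(space.is_valid(initial_point))  # True
del space.nodes["edge1"]              # the space shrinks when a node is removed
print(space.is_valid(initial_point))  # False, so a new point is needed
```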
Figure: CHARIOT runtime resilience loop with deployment and reconfiguration action sequences. The figure also shows a configuration space and configuration points for an example two-component application (C_A and C_B). C_A-F and C_B-F represent individual component failures.
As shown in the figure above, once a system is modeled using CHARIOT-ML and the required artifacts (configuration space) are generated and stored in the database, an RE can be used to compute the initial configuration point for deploying applications, as well as subsequent configuration points for runtime reconfiguration. The latter is the basis of autonomous resilience, as it allows the system to reconfigure by transitioning from a faulty configuration point to a new configuration point computed by an RE. Once it has computed a target configuration point, the RE computes the set of actions required to reach it and stores these actions in the database. At its core, our implementation of the RE is based on Satisfiability Modulo Theories (SMT).
Distributed AMs constitute our management infrastructure. Each node hosts a single AM, which is responsible for managing local application processes. An AM can start new processes or stop existing ones. AMs take these actions when the RE logs the corresponding events into the database.
Finally, the monitoring infrastructure consists of distributed NMs, where each node hosts a single NM. An NM is responsible for detecting node failures (currently the only form of failure handled by CHARIOT) by monitoring the status of the other nodes that are part of a platform. NMs use a heartbeat-based protocol to detect failures of existing nodes as well as the addition of new nodes. Communication between NMs happens through the distributed database: each NM periodically "publishes" its heartbeat by writing to a specific collection in the database and, similarly, periodically monitors the other NMs' heartbeats via the database. Although the failure of a node is detected by the NMs on all other nodes of a platform, only the leader NM is responsible for initiating the reconfiguration mechanism. Since we are using a distributed database (MongoDB with a replica set), we rely on its notion of a leader (the primary replica) to determine the leader node, and therefore the leader NM.
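The sketch below shows the shape of the RE's two steps: find a target configuration point after a failure, then derive the actions needed to reach it. It is not CHARIOT's implementation: an exhaustive search that prefers moving as few components as possible stands in for the SMT-based solver, capacity and other constraints are ignored, and the action format (dictionaries with action, component, and node keys) and node names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional, Set


def find_target_point(nodes: Dict[str, str], allowed: Dict[str, Set[str]],
                      current: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Pick a valid component -> node assignment that moves the fewest components.

    nodes holds only live nodes (name -> category). Exhaustive search stands in
    for an SMT solver here and only scales to tiny examples.
    """
    components = sorted(allowed)
    candidates = [[n for n, cat in nodes.items() if cat in allowed[c]] for c in components]
    best = None
    for assignment in product(*candidates):
        point = dict(zip(components, assignment))
        moves = sum(point[c] != current.get(c) for c in components)
        if best is None or moves < best[0]:
            best = (moves, point)
    return None if best is None else best[1]  # None: no valid point exists


def compute_actions(current: Dict[str, str], target: Dict[str, str]) -> List[Dict[str, str]]:
    """Diff two configuration points into stop/start actions (hypothetical format)."""
    actions = []
    for component in sorted(set(current) | set(target)):
        old, new = current.get(component), target.get(component)
        if old == new:
            continue
        if old is not None:
            actions.append({"action": "stop", "component": component, "node": old})
        if new is not None:
            actions.append({"action": "start", "component": component, "node": new})
    return actions


current = {"C_A": "edge1", "C_B": "cloud1"}
live_nodes = {"edge2": "edge", "cloud1": "cloud"}  # edge1 has failed
allowed = {"C_A": {"edge"}, "C_B": {"edge", "cloud"}}
target = find_target_point(live_nodes, allowed, current)
print(target)  # {'C_A': 'edge2', 'C_B': 'cloud1'}
for action in compute_actions(current, target):
    print(action)
```

A stop action for a component on a failed node is effectively a no-op, but recording it keeps the action list a complete description of the transition between the two configuration points.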
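A minimal sketch of an AM, assuming the RE logs actions as dictionaries with action, component, and node keys (a hypothetical format). A plain Python list stands in for the database collection, the AM polls it rather than being notified of changes, and the launch commands are hypothetical; subprocess.Popen, terminate(), and wait() are standard library calls.

```python
import subprocess
import sys


class ApplicationManager:
    """Hypothetical per-node AM applying actions that the RE logged for its node.

    action_log: a plain list standing in for a database collection.
    commands: component name -> launch command (hypothetical).
    """

    def __init__(self, node, action_log, commands):
        self.node = node
        self.action_log = action_log
        self.commands = commands
        self.processes = {}  # component name -> subprocess.Popen handle
        self.cursor = 0      # index of the next unread log entry

    def poll(self):
        # Read only entries logged since the previous poll.
        new_entries = self.action_log[self.cursor:]
        self.cursor += len(new_entries)
        for entry in new_entries:
            if entry["node"] != self.node:
                continue  # actions for other nodes are handled by their own AMs
            component = entry["component"]
            if entry["action"] == "start" and component not in self.processes:
                self.processes[component] = subprocess.Popen(self.commands[component])
            elif entry["action"] == "stop" and component in self.processes:
                process = self.processes.pop(component)
                process.terminate()
                process.wait()


log = []
am = ApplicationManager("edge2", log, {"C_A": [sys.executable, "-c", "import time; time.sleep(30)"]})
log.append({"action": "start", "component": "C_A", "node": "edge2"})
am.poll()
print(sorted(am.processes))  # ['C_A']
log.append({"action": "stop", "component": "C_A", "node": "edge2"})
am.poll()
print(sorted(am.processes))  # []
```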
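The heartbeat protocol can be sketched as follows. A dictionary stands in for the MongoDB heartbeat collection, the timeout value is arbitrary, and the leader flag is passed in directly rather than derived from the replica set's primary. Timestamps come from an injectable clock; with wall-clock time this assumes loosely synchronized node clocks. None of these names come from CHARIOT.

```python
import time


class NodeMonitor:
    """Hypothetical NM. heartbeats: a dict standing in for the database collection.

    clock defaults to wall-clock time, which assumes loosely synchronized nodes.
    """

    def __init__(self, node, heartbeats, timeout=3.0, clock=time.time):
        self.node = node
        self.heartbeats = heartbeats
        self.timeout = timeout
        self.clock = clock
        self.known = set()  # nodes considered alive at the previous check

    def publish(self):
        # "Publish" this node's heartbeat by writing a timestamp to the shared store.
        self.heartbeats[self.node] = self.clock()

    def check(self):
        """Return (new_nodes, failed_nodes) since the previous check."""
        now = self.clock()
        alive = {n for n, ts in self.heartbeats.items() if now - ts <= self.timeout}
        new_nodes, failed = alive - self.known, self.known - alive
        self.known = alive
        return new_nodes, failed


def on_check(monitor, is_leader, reconfigure):
    # Every NM detects failures, but only the leader initiates reconfiguration.
    new_nodes, failed = monitor.check()
    if is_leader and failed:
        reconfigure(failed)
    return new_nodes, failed


fake_now = [0.0]
store = {}
leader = NodeMonitor("edge1", store, clock=lambda: fake_now[0])
follower = NodeMonitor("edge2", store, clock=lambda: fake_now[0])
triggered = []
leader.publish()
follower.publish()
on_check(leader, True, triggered.append)  # both nodes are discovered as new
fake_now[0] = 5.0
leader.publish()  # edge2 has stopped sending heartbeats
new_nodes, failed = on_check(leader, True, triggered.append)
print(sorted(failed), triggered)  # ['edge2'] [{'edge2'}]
```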
Putting everything together
The figure below presents a target system architecture for CHARIOT. Edge nodes, as described above, are resource-constrained nodes that are equipped with various sensors and/or actuators and deployed in the physical environment. The management and monitoring infrastructures can run on these nodes as long-running platform services. Applications use the available resources for sensing, actuating, and computations that are not resource intensive. These applications can use different middleware solutions to communicate with each other.
Figure: CHARIOT target system architecture.
Not all computation can run on edge nodes; edge nodes should run small computations that require real-time responses. A key point here is that extensible CPS can host heterogeneous applications, and these applications cannot always be deployed on edge nodes that are embedded and resource constrained. As such, extensible CPS require us to view CPS challenges from a collaborative perspective, where it is critical to utilize advances in other computing paradigms, such as cloud computing, to realize these complex systems. Resource-intensive computations without real-time requirements can be deployed on a cloud. This yields a multi-layer architecture in which application properties and requirements determine how close to the physical environment the associated computation runs, and where it can be deployed.
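As a rough illustration of how application properties could determine placement in this multi-layer architecture, the hypothetical rule below keeps real-time applications on edge nodes (failing if they do not fit), sends non-real-time, resource-intensive applications to the cloud, and leaves light non-real-time work on the edge. The App fields, the abstract CPU units, and the light_threshold parameter are assumptions made for this example, not part of CHARIOT.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    real_time: bool
    cpu_demand: float  # hypothetical abstract CPU units


def choose_layer(app: App, edge_cpu_free: float, light_threshold: float = 1.0) -> str:
    """Return "edge" or "cloud" for an application (hypothetical rule)."""
    if app.real_time:
        # Real-time work needs proximity to sensors/actuators, so it must fit on the edge.
        if app.cpu_demand > edge_cpu_free:
            raise ValueError(f"{app.name}: real-time app does not fit on the edge node")
        return "edge"
    # Non-real-time work stays on the edge only if it is light and fits.
    if app.cpu_demand <= min(light_threshold, edge_cpu_free):
        return "edge"
    return "cloud"


print(choose_layer(App("collision_avoidance", True, 0.5), edge_cpu_free=2.0))  # edge
print(choose_layer(App("map_analytics", False, 8.0), edge_cpu_free=2.0))       # cloud
```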
Prerequisites
To follow and run this demo scenario, you should have a working knowledge of Java and basic experience with the UNIX command line. During this tutorial, we will provide you with a virtual appliance that is preconfigured with the CHARIOT tool chain and development environment, so you can develop and experiment with the tool chain virtually, without requiring external embedded devices.
Downloads
Here are a white paper and a poster describing the CHARIOT tool chain.