Architecture


Overview

IRL Infrastructure Fault-Tolerance

Client-Server Interactions in IRL and OGH behaviour


Overview

In order to implement object replication and to preserve infrastructure portability and interoperability, IRL core is designed as a functional level logically set above the ORB:

In order not to introduce single points of failure, each IRL component is replicated and fault-tolerant. IRL components are classified into two main classes, i.e. domain and host specific components. In the picture above, only domain specific components (i.e. the main IRL components) have been shown.

IRL ReplicationManager (RM) represents the management interface of the IRL infrastructure. It exports the ReplicationManager FT-CORBA interface and then allows object group creation/disposal, and membership modification.

IRL FaultNotifier (FN) implements a publish and subscribe engine to provide subscribers with fault notifications. It essentially receives fault reports, detects host failures and propagates object and host fault reports to every object that subscribed for a report about the failure event.

IRL ObjectGroupHandler (OGH) component, is in charge of maintaining consistency among the state of the members of a stateful object group. In particular, an OGH component is associated to each stateful object group. It receives client requests and transforms them in a set of requests addressed to a object group members. OGH knows the composition of its group and can therefore create the real request to all server object belonging to its group. This entity is composed by a set of objects (replicated in order to not introduce a single point of failure) distributed on the hosts of a fault tolerance domain. Shortly speaking, to maintain consistency among the state of the replicas belonging to an object group, IRL interposes a mid-tier between clients and servers. This is necessary only for stateful objects.

As domain specific components implement functionality that have to survive to host crashes, they are replicated by deploying their replicas on different hosts. In current IRL prototype implementation, we adopt a passive replication technique to replicate domain specific components.

Host specific components implement functionality which have not to survive to their host crash. They are:

IRL LocalFailureDetector (LFD). LFD is in charge of monitoring for failures the object residing on its host. An object to be monitored exports a simple method, which LFD periodically invokes to discover object faults. Moreover, LFD periodically sends an heartbeat to FN, in order to let FN perform host-level failure detection.

IRL Factory. This is the only component that should be installed on every host of a fault tolerance domain in order to let IRL Infrastructure perform startup and be able to create and manage new object groups. In particular, IRL Factory creates new copies of the aforementioned domain and host specific components.

 

IRL Infrastructure Fault-Tolerance

In order to avoid single points of failure, each IRL component is replicated. However, different components need different replication techniques for enhancing their availability. The choice of the replication technique for each IRL component is based on the state it maintains (if any) and on its deployment and fault-tolerance requirements. The following table summarizes the replication techniques applied in the current IRL prototype:

Type

Component

Stateful

Technique

Host Specific Local Failure Detector
Yes
Cold Passive
IRL Factory
No
Stateless
Domain Specific ReplicationManager
Yes
Hot Passive
FaultNotifier
Yes
Active
ObjectGroupHandler
Yes
Hot Passive(1)

(1) The hot-passive technique adopted to replicate OGH is quite different from the technique implemented to replicate FN and RM (see Client-Server Interactions in IRL and OGH behaviour).

IRL Factory


As IRL Factory is stateless, we adopted a stateless replication configuration: on each host run two separate processes, both implementing IRL Factory. Failure detection is based on bi-directional I'm alive local messages exchange. When one replica detects the crash of the other, it spawns a new IRL Factory replica, forwards an IRL Factory failure notification to RM, along with the new IRL Factory reference.

Local Failure Detector

LFD maintains a state composed by the list of objects to monitor and their monitoring properties. Hence, we adopt a cold passive replication style for LFD: each time LFD receives reference and properties of a new monitorable object, it stores them on a local log before acknowledging the client. If LFD crashes, IRL Factory creates a new instance that sets its initial state using the log and returns to IRL Factory its reference, which is then forwarded to RM. LFD crash events are monitored by IRL Factory. For this purpose, each IRL Factory replica registers itself within LFD to be monitored. Each time LFD invokes an IRL Factory replica, the latter sets a timeout in order to discover LFD crashes. Furthermore, LFD periodically sends heartbeats to FN replicas that implement push-style failure detection. When LFD crashes, FN stops receiving its heartbeats and detects an host failure on the basis of a local timeout.

The following picture shows host specific components replication, the message they exchange and their relationships.

 

Domain specific components adopts a hot passive replication technique. However, RM is replicated by a wrapper CORBA object that transparently replicates a singleton server object in a passive way, while OGH benefits of the faul notification service implemented by the FaultNotifier to implement failure detection. FN is currently non-replicated. In the following, we shortly describe the pattern implemented to replicate FN and RM. Details on OGH replication are given in "Client-Server Interactions in IRL and OGH behaviour".

IRL RM and IRL FN

IRL replication manager is nondeterministic. As an example, at object group creation and modification time, IRL RM performs outgoing invocations to local factories which nondeterministically return their results. Hence, we implemented a fault tolerant version of RM exploiting an hot passive replication technique. Such replication technique is based on a wrapper implementing the facade pattern. The wrapper encapsulates a generic nondeterministic component. A component implements both Checkpointable and Updateable interfaces, respectively exporting the get_state(), set\_state() and get_update() and set_update() methods, which allow for state and updates writings and readings. The component is wrapped by the Wrapper CORBA object, which dynamically reads the component interface from the CORBA Interface Repository at wrapper creation time and accepts incoming client requests in spite of the actual component implementation. Then, the wrapped component is run on different hosts at FT-infrastructure initialization time.

The following picture illustrates the main steps carried out to serve a client request addressed to the wrapped, passively replicated, object.

 

The steps are:

  1. Upon receiving a client request req, a CORBA Portable Server Interceptor implementing the transparent request redirection mechanism, checks if the wrapper replica is the primary. If it is not the case, it throws a LOCATION_FORWARD exception to the client ORB that will reinvoke the primary replica. Otherwise it let the request flow into the wrapper, which parses the request exploiting the Dynamic Skeleton Interface and then
  2. checks if the request has already been already executed by the component by accessing an internal request logging mechanism. In the affirmative, it immediately returns the result to the client, otherwise
  3. The wrapper forwards the request to the component;
  4. the component performs the operation contained into the request. Then
  5. the primary wrapper invokes the get_update() method of the replicated object. If the update is not empty (i.e. the request invocation modified the object internal state), the wrapper atomically updates the backup(s) by sending an update message composed by a triple <req,res,upd>.
  6. upon receiving the update, the backup(s) (i) updates its replica by invoking the set_update(upd) method and (ii) inserts the <req,res> pair into its log. Then
  7. when the backup(s) has acknowledged the update,
  8. the primary wrapper inserts the <req,res> into its local log. Finally,
  9. the request result is returned to the client

In the IRL current prototype, we exploit the Jgroup group membership service and view-synchronous multicast primitive to implement the ReplicationEngine wrapper sub-component. When a view change event occurs (e.g. because of a primary crash), the first action undertaken by the new primary (i.e., the first member of the new view) is to ask each member for its reference (IOR) to build a new object group refernce (IOGR) and to multicast the new IOGR to each backup wrapper. In this way wrapper replicas update the state of their Portable Server Interceptor. The Checkpointable interface is exploited by the wrapper to perform full state transfer, e.g. when a new wrapped component joins the group. Note that the introduced technique is suitable for being applied to any nondeterministic CORBA object. Moreover, it only requires a CORBA object to implement Checkpointable and Updateable interfaces in order to be passively replicated. Exploiting this technique, we are able to overcome a FT-CORBA limitation, i.e. the limitation of replicating only deterministic CORBA objects. However, in its current implementation, such replication technique has been applied only to IRL RM.

In the current IRL prototype, FN is non replicated, i.e. non fault-tolerant. We are currently working on an actively replicated IRL Fault Notifier as well as to implement RM as a stateless gateway wrapping a COTS fault-tolerant DBMS.

 

Client-Server Interactions in IRL and OGH behaviour

The mid-tier passive replication protocol implemented by OGH is based on perfect failure detection, i.e. FN does not make mistakes when detecting host crashes. This assumption allows to simplify the mid-tier replication protocol implemented by OGH.

Deployment and Initialization

Client-tier. In order to let client applications benefit of transparent client reinvocation even on non FT-CORBA compliant client ORBs, client applications are augmented with the IRL Object Request Gateway (ORGW) component. In short, ORGW is a CORBA Client Request Portable Interceptor that (i) intercept requests addressed to object groups (i.e. using an IOGR), (ii) uniquely identifies them as the FT-CORBA standard prescribes and (iii) iteratively tries to send the request to a correct member, until either it receives a reply (that it returns to the client application) or it has tried all of the IOGR profiles without receiving a reply.

End-tier. Each stateful object group member is transparently wrapped by the IRL Incoming Request Gateway Component (IRGW) that implements a basic FT-CORBA logging mechanism. In short, IRGW adopts the same interface (by exploiting the Dynamic Skeleton Interface and the Interface Repository) and receives all the requests of the member it wraps. Upon receiving a request, IRGW first checks if the request is a reinvocation (exploiting the FT-CORBA compliant unique request identifier). If it is the case, IRGW returns the result that it has previously logged. Otherwise it (i) forwards the request to the member, (ii) waits until a result is produced, (iii) logs the request/reply pair and (iv) it finally returns the result to the client. To perform garbage collection of outdated request/reply pairs, IRGW exploits the FT-CORBA request expiration time contained in the unique request identifier. In order to let OGH perform state synchronization, we assume object group members to implement at least the Checkpointable and optionally the Updateable FT-CORBA interfaces.

Mid-Tier. When IRL RM creates a new stateful object group, it starts a set of OGH replica (each running on a distinct host), other than object group members. Each OGH replica reads the interface of its object group member type from the CORBA Interface Repository in order to parse incoming requests on behalf of its object group members by exploiting a Dynamic Skeleton Interface. Moreover, each replica receives from IRL RM two initial views, i.e. a view containing the identifiers of the OGH replicas (VOGH) and the view containing the identifiers of the object group members Vmembers. Views are dynamically updated by OGH upon receiving object and host fault reports from IRL FN, to which each OGH subscribes as consumer for the objects contained in VOGH+Vmembers.

The Protocol

Scenario 1

The IRL three-tier prototype protocol is illustrated in the following figures. In Scenario 1 client C1 issues request req1 that reaches the primary OGH1. Upon receiving the request, OGH1 piggybacks onto req1 a local sequence number (increased for each served request) and forwards the request to every member R of Vmembers. Each IRGW wrapping a member, upon receiving a request stores the sequence number and then forwards the request to its replica. Then it waits for the result, logs the request and the result and returns the reply to OGH1. Once OGH1 has received the results from all R in Vmembers, it returns the reply to the client. Note that if OGH1 receives another request req2 before completing a request processing (e.g. req1), then req_2 is queued until rep1 is sent to C1. This preserves request ordering at object group members in absence of primary failures. When a member crashes (e.g. R3), FN forwards a fault report to every OGH in VOGH, allowing the primary OGH not to undefinitevely block by waiting a reply from a crashed member.
Scenario 2 illustrates how a primary OGH crash is handled. Backup OGH crashes are simply notified by FN to every OGH that updates VOGH. When OGH1 crashes, FN notifies each backup of the fault. Hence, each OGH updates its local copy of VOGH and decides if it is the new primary. In Scenario 2, OGH2 is the new primary as its id appears as the first element of VOGH, whose elements are ordered basing on replica identifiers. As a primary can fail during the processing of a request without updating all the members of Vmember (e.g. OGH1), OGH2 performs a recovery protocol before starting to serve client requests. Recovery is needed to ensure update atomicity on the members of Vmembers after primary failures. To achieve this, OGH2 first determines if all the members are in the same state by invoking the IRGW get_last() method. Actually, this invocations would be not necessary if the FT-CORBA get_update (get_state) method would return an update (state) sequence number along with their current result. The get_last method takes as input parameter the new primary identifier (the new primary identifier allows IRGW to filter out outdated request coming from crashed primaries) and returns the sequence number of the last request received by IRGW. If all the members return the same sequence number, then they are in a consistent state and OGH2 starts serving client requests. Conversely, i.e. if some member executed an update not executed by some other members, OGH2 gets a state update from one of the most updated members and set the update to the least updated ones. Incremental updates are executed exploiting the FT-CORBA Updateable interface methods (if implemented). Otherwise the Checkpointable interface methods are exploited, performing full state transfers. Then OGH2 starts serving client requests as the new primary, having ensured update atomicity on object group members. As clients implement request retransmission (e.g. by ORGW), C1 reinvokes req1 onto OGH2. In this situation, members are already updated and then the IRGW wrapping each member returns the logged result without repeating the member invocation and preserving the CORBA at-most-once request invocation semantic.

Scenario 2