
Epics

An "epic" is a compound operation composed of a set of nanotransactions. While each nanotransaction in an epic operates under the same atomicity guarantees as in the simple case, the epic abstraction lets users specify the isolation and consistency properties they want from the composition of nanotransactions (in a way similar to a Saga).

To help illustrate how epics differ from nanotransactions, we will use the following example of a simple graph traversal algorithm:

#[nandoize]
fn aggregate(node: &Node, output: &mut Aggregator) {
  if output.visited.contains(node) {
    return;
  }
  output.visited.insert(node.clone());
  output.sum += node.value;
  for neighbor in &node.neighbors {
    aggregate(neighbor, output);
  }
}

and its associated data model:

struct Node {
  value: u64,
  neighbors: PVec<Node>,
}

struct Aggregator {
  sum: u64,
  visited: Set<Node>,
}

"Compilation"

The above nanotransaction code would first be re-written to something like the following:

fn aggregate(node: &Node, output: &mut Aggregator) {
  if output.visited.contains(node) {
    return;
  }
  output.visited.insert(node.clone());
  output.sum += node.value;
  for neighbor in &node.neighbors {
    nando_spawn!("aggregate", neighbor, output);
  }
  yield();
}

The job of the nando_spawn!() macro is to update the transaction log entry corresponding to a given execution of aggregate (under a transactional context, called through the generated aggregate_nando() variant, as covered in the nanotransaction section) with the information that the present transaction intends to spawn a (data-dependent) number of activations of aggregate. Note the word "intends": nando_spawn!() does not immediately schedule the specified function; it only ensures that the intent for it to be scheduled is durably logged. It is the responsibility of downstream components to schedule the target activation.
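A minimal sketch of this intent-logging step. All names here (SpawnIntent, TxLogEntry, log_spawn) are hypothetical stand-ins, not the actual runtime types:

```rust
// Hypothetical stand-ins for the runtime's log types.
#[derive(Debug, Clone, PartialEq)]
struct SpawnIntent {
    target: String, // name of the nando to activate, e.g. "aggregate"
    arg_ref: usize, // stand-in for a reference to the argument object
}

#[derive(Debug, Default)]
struct TxLogEntry {
    spawns: Vec<SpawnIntent>,
    committed: bool,
}

impl TxLogEntry {
    // What nando_spawn!() conceptually does: durably record the *intent*
    // to activate `target`; nothing is scheduled here.
    fn log_spawn(&mut self, target: &str, arg_ref: usize) {
        self.spawns.push(SpawnIntent { target: target.to_string(), arg_ref });
    }
}

fn main() {
    let mut entry = TxLogEntry::default();
    // One intent per neighbor, as in the rewritten aggregate().
    entry.log_spawn("aggregate", 14);
    entry.log_spawn("aggregate", 18);
    assert_eq!(entry.spawns.len(), 2);
    // Scheduling is deferred: the entry is not yet committed.
    assert!(!entry.committed);
    println!("{} spawn intents logged", entry.spawns.len());
}
```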

The yield() function call has two effects:

  • First, a continuation is generated, whose execution is dependent on the successful execution of all the spawned nanotransactions. The point of this continuation is to specify a natural "merge" point for the "forked" computations. In the case of what are in essence tail calls (like above), we can omit this step. This is functionally similar to the way CIEL manages its dynamic task graphs.
  • Second, it drives the current nanotransaction activation to commit (the meaning of "commit" in this context depends on the user's choice of consistency mode). In addition to the information recorded to facilitate recovery (as in nanotransaction logging), the log entry also contains the list of spawns generated by the executed nanotransaction, which are then materialized into activations, either locally or remotely.
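The second effect can be sketched as follows, again with hypothetical types (SpawnIntent, Activation) standing in for the runtime's: committing turns the logged spawn intents into activations on an executor queue.

```rust
use std::collections::VecDeque;

// Hypothetical types: a logged spawn intent and a schedulable activation.
#[derive(Clone)]
struct SpawnIntent { target: &'static str, arg: u64 }
struct Activation { target: &'static str, arg: u64 }

// What yield() conceptually drives: commit the entry's effects and turn
// its logged spawn intents into activations on an executor queue.
fn commit_and_materialize(spawns: &[SpawnIntent], queue: &mut VecDeque<Activation>) {
    for s in spawns {
        queue.push_back(Activation { target: s.target, arg: s.arg });
    }
}

fn main() {
    // The two spawns logged by the rewritten aggregate().
    let spawns = vec![
        SpawnIntent { target: "aggregate", arg: 14 },
        SpawnIntent { target: "aggregate", arg: 18 },
    ];
    let mut queue = VecDeque::new();
    commit_and_materialize(&spawns, &mut queue);
    assert_eq!(queue.len(), 2);
    println!("{} activations scheduled", queue.len());
}
```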

Execution

Let's assume that a user is interacting with a REPL, and already has an invariant pointer to a node, start_node, in a graph of Nodes. A sample graph is provided on the left-hand side of the below image. On the right-hand side we can see what the full "task graph" of invoking aggregate() on start_node with a fresh Aggregator object as output looks like. This task graph is the epic.

Graph Traversal

Rectangles represent the points at which the runtime interposes on the execution of aggregate() (henceforth referred to as control nodes), while square boxes represent the execution of the rewritten aggregate() nanotransaction (henceforth referred to as task nodes).

At each control node, the runtime schedules the execution of further downstream computations and does all the necessary bookkeeping for the continuation of control into either another task node / nanotransaction activation or a "commit" decision point for the epic. The advantage of this approach is that control nodes lend themselves naturally to the process of discovery, and can facilitate both distributed prefetching and continuation across host boundaries (think back to the graph traversal example from the HPTS talk).

For the sample graph, the execution of aggregate(&start_node, /* some output object */) (marked as T1.1 in the figure above) will spawn two further transactions, one for each of its neighbors: T2.1 for the node with the value 14 and T2.2 for the node with the value 18. Because both of them need mutable access to the given output object, they will be scheduled on the same executor queue, one after the other (with the ordering shown in the numbered circles in the figure). Once T2.2 "commits", a decision is made to commit or discard the entire epic. On commit, the future that represented the result of the nanotransaction is resolved, and the user can consume the results of aggregate() through the output object they passed as an argument to the top-level transaction.
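The sequential schedule above can be simulated with an explicit work queue in place of recursion. The graph literal below is a made-up stand-in for the figure: the start node's value of 7 is invented, since the figure only fixes the neighbor values 14 and 18.

```rust
use std::collections::{HashSet, VecDeque};

// Hypothetical in-memory stand-in for the sample graph:
// index -> (value, neighbor indices).
fn run_epic(graph: &[(u64, Vec<usize>)], start: usize) -> u64 {
    let mut visited = HashSet::new();
    let mut sum = 0;
    // Executor queue: activations run one after another because they all
    // need mutable access to the same output object.
    let mut queue = VecDeque::from([start]); // T1.1
    while let Some(id) = queue.pop_front() {
        if !visited.insert(id) {
            continue; // already aggregated
        }
        let (value, neighbors) = &graph[id];
        sum += value;
        for &n in neighbors {
            queue.push_back(n); // nando_spawn!("aggregate", ...)
        }
    }
    sum
}

fn main() {
    // start_node (invented value 7) with neighbors 14 and 18.
    let graph = vec![
        (7, vec![1, 2]), // spawns T2.1 and T2.2
        (14, vec![0]),   // back-edge to start, skipped via `visited`
        (18, vec![]),
    ];
    let total = run_epic(&graph, 0);
    assert_eq!(total, 39); // 7 + 14 + 18
    println!("sum = {total}");
}
```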

Non-tail calls

The actual usefulness of the above dynamic continuation scheme is more evident in non-tail calls. Imagine the below two functions:

#[nandoize]
fn bar(some_other_object: &StructuredObject) -> ... { /* */ }

#[nandoize]
fn foo(some_object: &StructuredObject) {
    let x = /* some value based on some_object's contents */;
    /* invoke bar nando on a reference */
    let intermediate = bar(some_object.left);
    /* compute some value combining x and intermediate */
}

The above would be rewritten as

#[nandoize]
fn bar(some_other_object: &StructuredObject) -> ... { /* */ }

#[nandoize]
fn foo(some_object: &StructuredObject) {
1 |    let x = /* some value based on some_object's contents */;
2 |    /* invoke bar nando on a reference */
3 |    let intermediate = nando_spawn!("bar", some_object.left);
4 |    yield();
5 |    /* compute some value combining x and intermediate */
}

After the corresponding activation of bar() commits, execution of foo() will resume immediately after line 4, regardless of whether the activation of bar() was local or remote. All state up to the yield() statement has been durably persisted, so that we don't have to replay the entire epic if we need to recover from some fault.
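A hypothetical way to picture this resumption is a persisted state machine: everything computed before the yield() (here, x) is captured so foo() can continue once bar()'s result is available. FooState, resume_foo, and the values 10 and 32 are illustrative only:

```rust
// A hypothetical sketch of how foo()'s continuation could be persisted:
// everything computed before yield() (here, `x`) is captured in a state
// enum so execution can resume after the spawned bar() commits.
enum FooState {
    // Before yield(): x is computed and bar has been spawned.
    AwaitingBar { x: u64 },
    Done { result: u64 },
}

// Resume the continuation once bar()'s committed result is available.
fn resume_foo(state: FooState, intermediate: u64) -> FooState {
    match state {
        FooState::AwaitingBar { x } => FooState::Done { result: x + intermediate },
        done => done, // already resumed
    }
}

fn main() {
    // First half of foo(): compute x, log the spawn, persist the state.
    let persisted = FooState::AwaitingBar { x: 10 };
    // ... bar() runs (locally or remotely) and commits, producing 32 ...
    let resumed = resume_foo(persisted, 32);
    match resumed {
        FooState::Done { result } => {
            assert_eq!(result, 42);
            println!("foo resumed with result {result}");
        }
        _ => unreachable!(),
    }
}
```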

The yield() construct also allows us to introduce hints as to the maximum permitted parallelism of computation subtrees. Any set of spawns between two successive yield() statements in a function (or between the function's start and its first yield()) may be scheduled to run concurrently at runtime, though data dependencies may prevent this and guide the runtime towards a sequential schedule. yield()s represent barriers (or "merge points") that must be enforced at runtime.
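The barrier semantics can be sketched with OS threads standing in for concurrent task-node execution: every spawn in a stage may run in parallel, and the yield() joins them all before the next stage starts. run_stages and the per-task work (v * 2) are invented for illustration:

```rust
use std::thread;

// Hypothetical sketch: the spawns between two successive yield()s form a
// stage that may run concurrently; yield() acts as a barrier (join).
fn run_stages(stages: Vec<Vec<u64>>) -> Vec<u64> {
    let mut stage_sums = Vec::new();
    for stage in stages {
        // Launch every task in the stage; they are free to run in parallel.
        let handles: Vec<_> = stage
            .into_iter()
            .map(|v| thread::spawn(move || v * 2)) // stand-in per-task work
            .collect();
        // The yield() barrier: wait for the whole stage before continuing.
        let sum: u64 = handles.into_iter().map(|h| h.join().unwrap()).sum();
        stage_sums.push(sum);
    }
    stage_sums
}

fn main() {
    // Two stages separated by a yield(): {1, 2, 3} then {10}.
    let sums = run_stages(vec![vec![1, 2, 3], vec![10]]);
    assert_eq!(sums, vec![12, 20]);
    println!("stage results: {sums:?}");
}
```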

Motivation

The above mechanism for rewriting large transactions into compositions of continuations is motivated by our goals of ensuring non-interference properties for workflows that derive from compositions of nanotransactions. It allows us to leverage the existing execution model and consistency guarantees of local nanotransactions and build a system of workflows on top of them, where the workflow is established by controlling how the effects of nanotransactions in the same "transaction tree" are composed and made visible to the "outside" world. It also allows us to solve the problem of (potentially) distributed transactions outside the core implementation of nanotransactions themselves.

After a nanotransaction / task is done executing and invokes yield(), it can be considered "done" in the sense that it cannot block other nanotransactions that operate on the same data items from making progress. TODO expand

Handling async code

The notion of a composition of tasks into an epic permits us to re-introduce async constructs into our transactional context. An async block is equivalent to a nanotransaction that will be executed as its own task, with the await in the calling code being replaced by an appropriate yield() construct. These async tasks can be handed off to a special async executor so as not to interfere with the execution of synchronous nanotransactions.

Execution Semantics

Tunable Consistency

By default, epics work in a mode that is functionally equivalent to Snapshot Isolation (SI). Before the invocation of the top-level transaction that kicks the epic off, the runtime establishes a causally consistent snapshot through which all of the epic's reads are resolved and to which all of its writes are directed -- commits of inner nanotransactions are appended to a log that is part of the snapshot and succeed by default. At the end of the epic, a decision is made on whether or not to commit its effects (the writes of all of the nanotransactions that were spawned as part of the epic), with the potential of an abort-and-retry in the case of a dependency conflict with transactions that committed after the epic's snapshot was established. If an epic commits, the entirety of its internal log is appended to the "real" instance log.
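A minimal sketch of this commit decision, with invented types (Epic, try_commit) and a plain version number standing in for the causally consistent snapshot:

```rust
// Hypothetical sketch of the default SI-like mode: an epic buffers the
// commits of its inner nanotransactions in a private log, then checks for
// conflicting writes since its snapshot before appending to the real log.
struct Epic {
    snapshot_version: u64,
    local_log: Vec<(String, u64)>, // (object id, new value) pairs
    read_set: Vec<String>,
}

// `committed_versions`: (object id, version it was last written at),
// i.e. the "real" instance log's view of the world.
fn try_commit(
    epic: Epic,
    committed_versions: &[(&str, u64)],
    instance_log: &mut Vec<(String, u64)>,
) -> bool {
    // Abort-and-retry if any object the epic read was written after the
    // epic's snapshot was established.
    let conflict = epic.read_set.iter().any(|obj| {
        committed_versions
            .iter()
            .any(|(o, v)| *o == obj.as_str() && *v > epic.snapshot_version)
    });
    if conflict {
        return false; // caller retries the epic from a fresh snapshot
    }
    // On commit, the epic's entire internal log is appended to the real log.
    instance_log.extend(epic.local_log);
    true
}

fn main() {
    let epic = Epic {
        snapshot_version: 5,
        local_log: vec![("sum".into(), 39)],
        read_set: vec!["start_node".into()],
    };
    let mut instance_log = Vec::new();
    // start_node was last written at version 3 <= 5, so the epic commits.
    let ok = try_commit(epic, &[("start_node", 3)], &mut instance_log);
    assert!(ok);
    assert_eq!(instance_log.len(), 1);
}
```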

In reality, the runtime does not establish a complete snapshot at the beginning of execution; rather, it stores enough metadata to establish a causally consistent snapshot lazily as the epic's data dependencies are discovered. The mechanism for this is not yet established.

For users that don't want to, and applications that don't need to, pay the cost of snapshot isolation, we can weaken the epic's consistency guarantees to those of a more traditional saga. Namely, the runtime still provides isolation and sequential consistency semantics at the level of individual nanotransactions and the objects they touch, but each inner nanotransaction immediately commits its results. If the epic is forced to roll back for whatever reason (which under this consistency mode would most likely be an application-level rollback() statement), a series of compensating actions will (potentially) need to be applied as part of the epic's rollback.

Compensating (trans)actions

A compensating action might be a simple no-op, a reset of an object's values to what they were before the invocation of the function being compensated for, or some other application-specific logic (although it should probably be restricted to "locally" computable effects, and not include, say, other nanotransaction invocations). The point is that we cannot predict this; it should be up to our users, who know their domain, to supply these.
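A sketch of how user-supplied compensations might be collected and applied in reverse order on rollback. SagaEpic, on_commit, and the balance example are all hypothetical:

```rust
// Hypothetical sketch: each inner nanotransaction of a saga-mode epic may
// register a user-supplied compensating action; on rollback they run in
// reverse order. The closures here are application-specific stand-ins.
struct SagaEpic {
    compensations: Vec<Box<dyn Fn(&mut u64)>>,
}

impl SagaEpic {
    fn new() -> Self {
        SagaEpic { compensations: Vec::new() }
    }
    // Called after an inner nanotransaction commits immediately.
    fn on_commit(&mut self, undo: Box<dyn Fn(&mut u64)>) {
        self.compensations.push(undo);
    }
    // An application-level rollback() applies compensations newest-first.
    fn rollback(&mut self, state: &mut u64) {
        for undo in self.compensations.drain(..).rev() {
            undo(state);
        }
    }
}

fn main() {
    let mut balance: u64 = 100;
    let mut epic = SagaEpic::new();

    balance += 30; // nanotransaction 1 commits immediately...
    epic.on_commit(Box::new(|b| *b -= 30)); // ...with its compensation

    balance += 12; // nanotransaction 2
    epic.on_commit(Box::new(|b| *b -= 12));

    // Application-level rollback: undo both effects, newest first.
    epic.rollback(&mut balance);
    assert_eq!(balance, 100);
    println!("balance restored to {balance}");
}
```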

Ideally, we will be able to build some infrastructure that can warn users of potential anomalies that could be introduced by this lower isolation level, to help them in making the choice between the two.

Distributed Transactions

TODO

Epic Object Versioning

TODO