The problem
Assume a bank with three branches, numbered 1 to 3, whose accounts are spread across two hosts,
B and C, as follows:
Host B:
object id | branch_id | balance
o1 | 1 | 100
o2 | 1 | 30
o3 | 2 | 100
o4 | 3 | 100
Host C:
object id | branch_id | balance
o5 | 2 | 200
o6 | 1 | 42
o7 | 2 | 100
o8 | 1 | 17
Now suppose an audit requires both [ audit(o1, output) ] and [ audit(o2, output) ] to be scheduled, while
[ debit(o1, o2, 50) ] is also under scheduling consideration. The two audit calls must land on the same side of
the debit() call: either both before it or both after it. Otherwise the result is invalid, because the audit
captures partial states from two different points in time.
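The anomaly can be reproduced with a small sketch. It is illustrative only: the in-memory `balances` dict and the bodies of `audit` and `debit` are assumptions, in particular that debit(o1, o2, 50) moves 50 from o1 to o2.

```python
# Hypothetical in-memory model of accounts o1 and o2 from the Host B table.
balances = {"o1": 100, "o2": 30}


def audit(obj, output):
    """Record the current balance of obj in the audit output."""
    output[obj] = balances[obj]


def debit(src, dst, amount):
    """Move amount from src to dst (assumed semantics of debit())."""
    balances[src] -= amount
    balances[dst] += amount


# Bad schedule: the debit lands between the two audit calls.
output = {}
audit("o1", output)       # captures o1 before the debit
debit("o1", "o2", 50)
audit("o2", output)       # captures o2 after the debit

interleaved_total = sum(output.values())   # 100 + 80
true_total = sum(balances.values())        # 50 + 80, the same as 100 + 30

print(interleaved_total, true_total)       # prints "180 130"
```

The audit reports a total of 180 even though the two accounts never held more than 130 between them at any point in time.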
Possible approaches
The example above illustrates why some form of versioning or logical timestamps is needed: it lets us enforce a uniform serial order, which matters most when a nanotransaction spans multiple hosts or objects and needs a consistency guarantee.
Open questions to be answered include:
- How should (multi)versioning actually be implemented in this context? Should there be a "global" transaction log (one transaction log per host for all objects resident on that host), or should we do finer-grained version tracking?
- Assuming we answer the above, how does a nanotransaction "declare" the logical version(s) it needs to operate on?
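To make the first question more concrete, here is a minimal sketch of the per-host log option, assuming a single logical clock per host and snapshot reads at a pinned version. The `HostLog` class and its `write` and `read` methods are hypothetical names introduced for illustration, not an existing design.

```python
from dataclasses import dataclass, field


@dataclass
class HostLog:
    """Hypothetical per-host log: one logical clock for all resident objects."""
    clock: int = 0
    # object id -> list of (version, balance), in increasing version order
    versions: dict = field(default_factory=dict)

    def write(self, updates):
        """Apply a batch of updates atomically under one new version."""
        self.clock += 1
        for obj, balance in updates.items():
            self.versions.setdefault(obj, []).append((self.clock, balance))
        return self.clock

    def read(self, obj, at_version):
        """Return the newest balance of obj with version <= at_version."""
        visible = [b for v, b in self.versions.get(obj, []) if v <= at_version]
        if not visible:
            raise KeyError(f"{obj} has no version <= {at_version}")
        return visible[-1]


host_b = HostLog()
v1 = host_b.write({"o1": 100, "o2": 30})   # initial state, version 1
snapshot = v1                               # the audit pins version 1

a1 = host_b.read("o1", snapshot)
v2 = host_b.write({"o1": 50, "o2": 80})     # debit(o1, o2, 50) commits as version 2
a2 = host_b.read("o2", snapshot)            # still sees the version-1 balance

print(a1 + a2)                              # prints "130"
```

Because both audit reads name the same version, the debit can commit between them without affecting the result. Finer-grained tracking would replace the single `clock` with per-object counters, at the cost of needing some other way to name a consistent cut across objects.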
One possible answer to the second question is to introduce a special kind of object, tentatively called a "directory". A directory is essentially a collection of references to the objects that a given nanotransaction is meant to operate over. It can also carry additional information, such as the target version(s) that apply to the directory itself, to the nanotransactions that operate on it, and to any nanotransactions that the "parent" nanotransaction spawns.
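A directory along these lines might be sketched as follows. Everything here is an assumption made for illustration: the `Directory` class, its `narrowed` method for spawned nanotransactions, and the `store` dict keyed by (object id, version).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Directory:
    """Hypothetical directory: object references plus a target version."""
    object_ids: tuple
    target_version: int

    def narrowed(self, object_ids):
        """Directory for a spawned nanotransaction: a subset of the
        parent's objects that inherits the parent's target version."""
        missing = set(object_ids) - set(self.object_ids)
        if missing:
            raise ValueError(f"not in parent directory: {sorted(missing)}")
        return Directory(tuple(object_ids), self.target_version)


# Hypothetical versioned store: (object id, version) -> balance.
store = {
    ("o1", 1): 100, ("o2", 1): 30,   # before debit(o1, o2, 50)
    ("o1", 2): 50,  ("o2", 2): 80,   # after it
}


def audit(directory, output):
    """Read every object in the directory at the directory's target version."""
    for obj in directory.object_ids:
        output[obj] = store[(obj, directory.target_version)]


parent = Directory(("o1", "o2"), target_version=1)
child = parent.narrowed(["o2"])      # spawned work sees the same version

output = {}
audit(parent, output)
print(output)                        # prints "{'o1': 100, 'o2': 30}"
```

The nanotransaction never names a version on individual reads. It declares one on the directory, and anything it spawns inherits that declaration.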