Notes from the implementation of the transaction support to the objWiki.
At the moment, objWiki uses a simple tree structure, where NodeTree stores a flat list of Nodes. Node can each hold a list of Inputs, and anot list of (sub)Nodes.
or more conveniently:
Purpose of this page is to collect all kinds of transaction mechanisms I can think of, that can be implemented on this kind of database.
By Transaction is understood database transaction mechanism, with two-phase commit protocol and rollbacks.
Example: When two clients work on the same node, and then they both call
.commit(), second transaction fails, and he second client has to repeat it. First transaction finishes successfully, as it is oldest. After that, first client's changes are now part of the database and second client has to repeat the transaction.
Copy on write
Some kind of COW support. Writing creates copy, which is then diffed.
- Doesn't require so much space in memory.
- Requires COW structure. I am not sure how to do that right now.
- Hmm, there is
.get_copy()method that returns copy, maybe do this for all elements?
- Hmm, there is
- Requires tree diff. Maybe just some marking of changed objects (should be known from the COW).
- I've just gave up after toying with the
gcmodules for a while.
Journal, mark & transaction error
Work on the original tree and mark every node that was changed. If you find an already changed element, rollback everything from journal and cancel the transaction. Or depending on the strategy, cancel the other transaction.
Journal can be a list of closures with inverse operations. Should be simple.
- Memory cheap.
- Doesn't require diff.
- Requires rollback journal for each operation.
- Requires some kind of shadow structure.
Deep copy, then diff
Just deep copy the whole tree, work on it, then diff it and merge it back.
Maybe the deep copy could be created from partially loaded nodes, which are copied on the fly only when the transaction actually touches them.
- Relatively simple.
- Requires a lot of memory in the not-lazy-loaded scenario.
- Copy is probably slow.
- Slow diff (you have to compare whole tree - could be improved with indexing).
True Copy on write is hard to implement in python, because you can't really work with garbage collector and weak references in good enough granularity. It would probably be possible, but quick research proved that I don't see clear path toward this goal.
I've tried to implement Journaling approach, that is to define rollback operation for all actions that are possible with the NodeTree and Nodes and Input. It worked in principle, but it was way too hard to define inverse operation for each action, and so the probability of bug is relatively high. And also because you have to somehow allow storing transactions in the tree and there may be multiple transactions in place at one moment, whole thing complicates a lot and you end up with the partial copy of the tree anyway.
In the end, I've decided to focus on the Deep copy approach. I create a copy of the tree, which automatically notices the changes that were made. Theese changes are stored in the transaction. When client calls
.commit() on the transaction, transaction manager checks whether the transaction conflicts with others and if not, copy is merged back into the original tree, with as little changes as possible.
Transaction is just simple container for the UUID's of changed stuff:
class TransactionConflict(Exception): pass class Transaction: def __init__(self, transaction_manager, abort_immediately=False): self.creation_ts = time.time() self.transaction_manager = transaction_manager self.transactions_to_consider =  self.touched_uuids_read = set() self.touched_uuids_write = set() self.changed_objects =  self.invalid = False self.finished = False self.abort_immediately = abort_immediately self.tree = transaction_manager.tree._clone(self) def commit(self): self.transaction_manager.validate_and_commit(self) def invalidate(self): self.invalid = True self.finished = True self.abort_immediately = True def add_finished_transaction(self, transaction): self.transactions_to_consider.append(transaction) def collides_with(self, other): # single way non-transitional? if not self.touched_uuids_read.isdisjoint(other.touched_uuids_write): return True if not self.touched_uuids_write.isdisjoint(other.touched_uuids_write): return True return False def add_r(self, uuid): if self.abort_immediately and self.invalid: raise TransactionConflict() self.touched_uuids_read.add(uuid) def add_w(self, uuid): if self.abort_immediately and self.invalid: raise TransactionConflict() self.touched_uuids_write.add(uuid) def add_changed(self, obj): self.changed_objects.append(obj)
It is used in such way, that all datastructures that wish to offer transaction support add their UUID to the transaction by calling
Most important method is
.collides_with() , which gets another transaction as parameter and if other transaction written into set of UUIDs that this trasnsaction have read from or written to, it returns
class TransactionManager: tree: NodeTree active_transactions: List[Transaction] def __init__(self, tree): self.tree = tree self.active_transactions =  def get_transaction(self) -> Transaction: t = Transaction(self) self.active_transactions.append(t) return t def validate_and_commit(self, transaction): if transaction.invalid: self._invalidate_transaction(transaction) # transactions that finished in the lifetime of this one for finished_transaction in transaction.transactions_to_consider: if transaction.collides_with(finished_transaction): self._invalidate_transaction(transaction) # transactions that begun before this one index = self.active_transactions.index(transaction) for prior_transaction in self.active_transactions[:index]: if transaction.collides_with(prior_transaction): self._invalidate_transaction(transaction) # or invalidate prior? self.active_transactions.remove(transaction) transaction.finished = True self._merge_transaction(transaction) # set this as already finished in still running for active_transaction in self.active_transactions: active_transaction.add_finished_transaction(transaction) def _invalidate_transaction(self, transaction): transaction.invalidate() self.active_transactions.remove(transaction) raise TransactionConflict() def _merge_transaction(self, transaction): self.tree = self.tree._merge_with(transaction.tree) self.tree.reindex_node_directory()
TransactionManager provides and tracks transactions. Most interesting method is
.validate_and_commit(), which gets a transaction as parameter and checks whether it doesn't conflict with previously finished transactions. Then with actually running transactions, that are older than this one. If there is no conflict, the tree in this transaction is merged to the main tree and transaction is stored in all other transactions, so they would look into it when they are committed.
For tracking access to the data structures used in my simple in-memory database, I've created two monstrosities:
class TrackingList(UserList): def __init__(self, initlist=None, uuid=None, transaction=None): super().__init__(initlist) self._uuid = uuid self.track = False self._transaction = transaction def _add_r(self): if self._transaction is not None and self.track: self._transaction.add_r(self._uuid) def _add_w(self): if self._transaction is not None and self.track: self._transaction.add_w(self._uuid) def _log_all_uuids_in(self, item): self._add_w() if self._transaction and self.track and hasattr(item, "all_uuids"): for uuid in item.all_uuids(): self._transaction.add_w(uuid) def __getitem__(self, i): self._add_r() return super().__getitem__(i) def __setitem__(self, i, item): if i < len(self.data): self._log_all_uuids_in(self.data[i]) return super().__setitem__(i, item) def __delitem__(self, i): if i < len(self.data): self._log_all_uuids_in(self.data[i]) return super().__delitem__(i) def append(self, item): self._add_w() return super().append(item) def insert(self, i, item): self._add_w() return super().insert(i, item) def pop(self, i=-1): self._add_w() return super().pop(i) def remove(self, item): self._add_w() return super().remove(item) def clear(self): self._add_w() return super().clear() def copy(self): self._add_w() return super().copy() def count(self, item): self._add_r() return super().count(item) def index(self, item, *args): self._add_r() return super().index(item, *args) def reverse(self): self._add_w() return super().reverse() def sort(self, *args, **kwds): self._add_w() return super().sort(*args, **kwds) def extend(self, other): self._add_w() return super().extend(other)
This is a list, that works as a standard
list, but it also tracks all access and stores uuids of the added / changed / removed elements to the transactions.
Second monstrosity is
Clonable class, which I won't post here. Basically, it is an ABC class, that implements
.__setattr__() which do track access to all properties and store it to the transaction.
Clonable is not explained here, because it also creates partial copies of the trees, which are dynamically loaded from the original tree on the fly. This is to save memory and CPU time by not creating full copy of the tree for each transaction.
Merge could be implemented simply by replacing the original tree with updated one, but I've wanted to keep as much as possible from the original tree. This is because of the references other transactions / parts of the system may use, which would be suddenly invalid.
Thus, it ended up being a little too complicated recursive replacement of only those parts of the tree, that really changed, but there is nothing really interesting about it. Just recursively call
._merge() for the tree, and if it is new item, use it, and if its old, changed, item, merge it.